mirror of
https://git.postgresql.org/git/postgresql.git
synced 2025-01-06 15:24:56 +08:00
Add new README file for pages/checksums
This commit is contained in:
parent
96ef3b8ff1
commit
9df56f6d91
52
src/backend/storage/page/README
Normal file
52
src/backend/storage/page/README
Normal file
@ -0,0 +1,52 @@
|
|||||||
|
src/backend/storage/page/README
|
||||||
|
|
||||||
|
Checksums
|
||||||
|
---------
|
||||||
|
|
||||||
|
Checksums on data pages are designed to detect corruption by the I/O system.
|
||||||
|
We do not protect buffers against uncorrectable memory errors, since these
|
||||||
|
have a very low measured incidence according to research on large server farms,
|
||||||
|
http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf, discussed
|
||||||
|
2010/12/22 on -hackers list.
|
||||||
|
|
||||||
|
Current implementation requires this be enabled system-wide at initdb time.
|
||||||
|
|
||||||
|
The checksum is not valid at all times on a data page!!
|
||||||
|
The checksum is valid when the page leaves the shared pool and is checked
|
||||||
|
when it later re-enters the shared pool as a result of I/O.
|
||||||
|
We set the checksum on a buffer in the shared pool immediately before we
|
||||||
|
flush the buffer. As a result we implicitly invalidate the page's checksum
|
||||||
|
when we modify the page for a data change or even a hint. This means that
|
||||||
|
many or even most pages in shared buffers have invalid page checksums,
|
||||||
|
so be careful how you interpret the pd_checksum field.
|
||||||
|
|
||||||
|
That means that WAL-logged changes to a page do NOT update the page checksum,
|
||||||
|
so full page images may not have a valid checksum. But those page images have
|
||||||
|
the WAL CRC covering them and so are verified separately from this
|
||||||
|
mechanism. WAL replay should not test the checksum of a full-page image.
|
||||||
|
|
||||||
|
The best way to understand this is that WAL CRCs protect records entering the
|
||||||
|
WAL stream, and data page verification protects blocks entering the shared
|
||||||
|
buffer pool. They are similar in purpose, yet completely separate. Together
|
||||||
|
they ensure we are able to detect errors in data re-entering
|
||||||
|
PostgreSQL-controlled memory. Note also that the WAL checksum is a 32-bit CRC,
|
||||||
|
whereas the page checksum is only 16-bits.
|
||||||
|
|
||||||
|
Any write of a data block can cause a torn page if the write is unsuccessful.
|
||||||
|
Full page writes protect us from that, which are stored in WAL. Setting hint
|
||||||
|
bits when a page is already dirty is OK because a full page write must already
|
||||||
|
have been written for it since the last checkpoint. Setting hint bits on an
|
||||||
|
otherwise clean page can allow torn pages; this doesn't normally matter since
|
||||||
|
they are just hints, but when the page has checksums, then losing a few bits
|
||||||
|
would cause the checksum to be invalid. So if we have full_page_writes = on
|
||||||
|
and checksums enabled then we must write a WAL record specifically so that we
|
||||||
|
record a full page image in WAL. Hint bits updates should be protected using
|
||||||
|
MarkBufferDirtyHint(), which is responsible for writing the full-page image
|
||||||
|
when necessary.
|
||||||
|
|
||||||
|
New WAL records cannot be written during recovery, so hint bits set during
|
||||||
|
recovery must not dirty the page if the buffer is not already dirty, when
|
||||||
|
checksums are enabled. Systems in Hot-Standby mode may benefit from hint bits
|
||||||
|
being set, but with checksums enabled, a page cannot be dirtied after setting a
|
||||||
|
hint bit (due to the torn page risk). So, it must wait for full-page images
|
||||||
|
containing the hint bit updates to arrive from the master.
|
Loading…
Reference in New Issue
Block a user