nbtree README: move VACUUM linear scan section.

Discuss VACUUM's linear scan after discussion of tuple deletion by
VACUUM, but before discussion of page deletion by VACUUM.  This
progression is a lot more natural.

Also tweak the wording a little.  It seems unnecessary to talk about how
it worked prior to PostgreSQL 8.2.
Peter Geoghegan 2021-02-17 21:13:15 -08:00
parent 927f453a94
commit 128dd901a5

@@ -214,6 +214,34 @@ page). Since we hold a lock on the lower page (per L&Y) until we have
re-found the parent item that links to it, we can be assured that the
parent item does still exist and can't have been deleted.
VACUUM's linear scan, concurrent page splits
--------------------------------------------
VACUUM accesses the index by doing a linear scan to search for deletable
TIDs, while considering the possibility of deleting empty pages in
passing. This is in physical/block order, not logical/keyspace order.
The tricky part of this is avoiding missing any deletable tuples in the
presence of concurrent page splits: a page split could easily move some
tuples from a page not yet passed over by the sequential scan to a
lower-numbered page already passed over.
To implement this, we provide a "vacuum cycle ID" mechanism that makes it
possible to determine whether a page has been split since the current
btbulkdelete cycle started. If btbulkdelete finds a page that has been
split since it started, and has a right-link pointing to a lower page
number, then it temporarily suspends its sequential scan and visits that
page instead. It must continue to follow right-links and vacuum dead
tuples until reaching a page that either hasn't been split since
btbulkdelete started, or is above the location of the outer sequential
scan. Then it can resume the sequential scan. This ensures that all
tuples are visited. It may be that some tuples are visited twice, but
that has no worse effect than an inaccurate index tuple count (and we
can't guarantee an accurate count anyway in the face of concurrent
activity). Note that this still works if the has-been-recently-split test
has a small probability of false positives, so long as it never gives a
false negative. This makes it possible to implement the test with a small
counter value stored on each index page.
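
The backtracking rule described above can be illustrated with a toy model.
The following is only a sketch under stated assumptions, not the nbtree
code: the Page class, split_page, vacuum_page, bulk_delete, the four-page
index, and the cycle ID value 7 are all hypothetical names and values
chosen for illustration. It assumes that a split stamps both halves with
the running VACUUM's cycle ID, and that the new right sibling may land on a
lower-numbered free block that the scan has already passed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    tuples: list = field(default_factory=list)  # (key, is_dead) pairs
    right_link: Optional[int] = None            # block number of right sibling
    cycle_id: int = 0                           # 0 = not split during a VACUUM


def split_page(pages, blkno, new_blkno, active_cycle_id):
    """Move the upper half of blkno's tuples to new_blkno, its new right sibling."""
    left, right = pages[blkno], pages[new_blkno]
    mid = len(left.tuples) // 2
    right.tuples = left.tuples[mid:]
    left.tuples = left.tuples[:mid]
    right.right_link = left.right_link
    left.right_link = new_blkno
    # Assumption: both halves are stamped with the running VACUUM's cycle ID.
    left.cycle_id = right.cycle_id = active_cycle_id


def vacuum_page(page):
    """Remove dead tuples from one page and return how many were removed."""
    dead = sum(1 for _, is_dead in page.tuples if is_dead)
    page.tuples = [t for t in page.tuples if not t[1]]
    return dead


def bulk_delete(pages, cycle_id, after_block=None, backtrack=True):
    """Scan in physical order; optionally backtrack through right-links."""
    removed = 0
    for scan_blkno in range(len(pages)):
        blkno = scan_blkno
        while True:
            page = pages[blkno]
            removed += vacuum_page(page)
            # Follow the right-link only if this page was split during the
            # current cycle and its sibling is behind the outer scan.
            follow = (backtrack
                      and page.cycle_id == cycle_id
                      and page.right_link is not None
                      and page.right_link < scan_blkno)
            if not follow:
                break
            blkno = page.right_link
        if after_block is not None:
            after_block(scan_blkno)  # hook used to simulate concurrent activity
    return removed


def run(backtrack):
    pages = [
        Page(tuples=[(1, False), (2, False)], right_link=2),
        Page(),  # block 1: free page that a later split will reuse
        Page(tuples=[(10, False), (20, True)], right_link=3),
        Page(tuples=[(30, False), (40, False), (50, True), (60, True)]),
    ]
    cycle_id = 7

    def concurrent_split(scan_blkno):
        # Once the scan has passed block 1, block 3 splits into block 1.
        if scan_blkno == 1:
            split_page(pages, blkno=3, new_blkno=1, active_cycle_id=cycle_id)

    removed = bulk_delete(pages, cycle_id, concurrent_split, backtrack)
    leftover = sum(1 for p in pages for _, is_dead in p.tuples if is_dead)
    return removed, leftover
```

In this model, run(False) scans blocks strictly in physical order and
leaves behind the two dead tuples that the split moved to block 1, while
run(True) follows the right-link from block 3 back to block 1 and removes
them. A false positive in the cycle ID test would only cause an extra page
visit, which matches the point that false positives are tolerable but
false negatives are not.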
Deleting entire pages during VACUUM
-----------------------------------
@@ -371,33 +399,6 @@ as part of the atomic update for the delete (either way, the metapage has
to be the last page locked in the update to avoid deadlock risks). This
avoids race conditions if two such operations are executing concurrently.
VACUUM needs to do a linear scan of an index to search for deleted pages
that can be reclaimed because they are older than all open transactions.
For efficiency's sake, we'd like to use the same linear scan to search for
deletable tuples. Before Postgres 8.2, btbulkdelete scanned the leaf pages
in index order, but it is possible to visit them in physical order instead.
The tricky part of this is to avoid missing any deletable tuples in the
presence of concurrent page splits: a page split could easily move some
tuples from a page not yet passed over by the sequential scan to a
lower-numbered page already passed over. (This wasn't a concern for the
index-order scan, because splits always split right.) To implement this,
we provide a "vacuum cycle ID" mechanism that makes it possible to
determine whether a page has been split since the current btbulkdelete
cycle started. If btbulkdelete finds a page that has been split since
it started, and has a right-link pointing to a lower page number, then
it temporarily suspends its sequential scan and visits that page instead.
It must continue to follow right-links and vacuum dead tuples until
reaching a page that either hasn't been split since btbulkdelete started,
or is above the location of the outer sequential scan. Then it can resume
the sequential scan. This ensures that all tuples are visited. It may be
that some tuples are visited twice, but that has no worse effect than an
inaccurate index tuple count (and we can't guarantee an accurate count
anyway in the face of concurrent activity). Note that this still works
if the has-been-recently-split test has a small probability of false
positives, so long as it never gives a false negative. This makes it
possible to implement the test with a small counter value stored on each
index page.
Fastpath For Index Insertion
----------------------------