From 128dd901a5c87e11c6a8cbe227a806cdc3afd10d Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Wed, 17 Feb 2021 21:13:15 -0800
Subject: [PATCH] nbtree README: move VACUUM linear scan section.

Discuss VACUUM's linear scan after discussion of tuple deletion by
VACUUM, but before discussion of page deletion by VACUUM.  This
progression is a lot more natural.

Also tweak the wording a little.  It seems unnecessary to talk about how
btbulkdelete worked prior to PostgreSQL 8.2.
---
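Reviewer note (not part of the commit): the backtracking rule in the moved
section can be illustrated with a small toy model.  This is only a sketch
under stated assumptions, not the nbtree code: ToyPage, toy_split,
toy_scan_range and NO_PAGE are hypothetical names, a cycle ID of 0 is
assumed to mean "not split in the current cycle", and a split is assumed to
stamp both halves with the current cycle ID.

```c
#include <assert.h>
#include <stddef.h>

#define NO_PAGE ((size_t) -1)   /* toy stand-in for "no right sibling" */

/* Hypothetical toy page; not the real nbtree page layout. */
typedef struct ToyPage
{
    unsigned    cycleid;    /* cycle ID stamped by last split, 0 = none */
    size_t      rightlink;  /* block number of right sibling, or NO_PAGE */
    int         dead;       /* dead tuples currently on the page */
    int         visits;     /* how many times the scan vacuumed the page */
} ToyPage;

/* Simulate a page split that moves nmoved dead tuples from src to dst. */
static void
toy_split(ToyPage *pages, size_t npages, size_t src, size_t dst,
          int nmoved, unsigned cycleid)
{
    assert(src < npages && dst < npages && nmoved <= pages[src].dead);
    pages[dst].dead += nmoved;
    pages[src].dead -= nmoved;
    /* dst becomes the new right sibling of src */
    pages[dst].rightlink = pages[src].rightlink;
    pages[src].rightlink = dst;
    /* Assumption: both halves remember the cycle active at split time. */
    pages[src].cycleid = pages[dst].cycleid = cycleid;
}

/*
 * Vacuum blocks [from, to) in physical order and return the number of
 * dead tuples removed.  After vacuuming a page, follow its right-link
 * backwards if the page was split during this cycle and the link points
 * below the current position of the outer scan.
 */
static int
toy_scan_range(ToyPage *pages, size_t npages, size_t from, size_t to,
               unsigned cycleid)
{
    int         removed = 0;

    assert(from <= to && to <= npages);
    for (size_t scanblkno = from; scanblkno < to; scanblkno++)
    {
        size_t      blkno = scanblkno;

        while (blkno != NO_PAGE)
        {
            ToyPage    *page = &pages[blkno];

            removed += page->dead;
            page->dead = 0;
            page->visits++;

            if (page->cycleid == cycleid &&
                page->rightlink != NO_PAGE &&
                page->rightlink < scanblkno)
                blkno = page->rightlink;    /* backtrack */
            else
                blkno = NO_PAGE;            /* resume the outer scan */
        }
    }
    return removed;
}
```

Scanning in two ranges with a toy_split call in between stands in for a
split that happens concurrently with the scan.  If the split moves tuples
onto a block the scan has already passed, that block gets vacuumed a second
time, which matches the README's remark that some tuples may be visited
twice.
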
 src/backend/access/nbtree/README | 55 ++++++++++++++++----------------
 1 file changed, 28 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README
index 8503fd4e72..62da36e80c 100644
--- a/src/backend/access/nbtree/README
+++ b/src/backend/access/nbtree/README
@@ -214,6 +214,34 @@ page).  Since we hold a lock on the lower page (per L&Y) until we have
 re-found the parent item that links to it, we can be assured that the
 parent item does still exist and can't have been deleted.
 
+VACUUM's linear scan, concurrent page splits
+--------------------------------------------
+
+VACUUM accesses the index by doing a linear scan to search for deletable
+TIDs, while considering the possibility of deleting empty pages in
+passing.  The scan is in physical/block order, not logical/keyspace order.
+The tricky part is to avoid missing any deletable tuples in the
+presence of concurrent page splits: a page split could easily move some
+tuples from a page not yet passed over by the sequential scan to a
+lower-numbered page already passed over.
+
+To implement this, we provide a "vacuum cycle ID" mechanism that makes it
+possible to determine whether a page has been split since the current
+btbulkdelete cycle started.  If btbulkdelete finds a page that has been
+split since it started, and has a right-link pointing to a lower page
+number, then it temporarily suspends its sequential scan and visits that
+page instead.  It must continue to follow right-links and vacuum dead
+tuples until reaching a page that either hasn't been split since
+btbulkdelete started, or is above the location of the outer sequential
+scan.  Then it can resume the sequential scan.  This ensures that all
+tuples are visited.  It may be that some tuples are visited twice, but
+that has no worse effect than an inaccurate index tuple count (and we
+can't guarantee an accurate count anyway in the face of concurrent
+activity).  Note that this still works if the has-been-recently-split test
+has a small probability of false positives, so long as it never gives a
+false negative.  This makes it possible to implement the test with a small
+counter value stored on each index page.
+
 Deleting entire pages during VACUUM
 -----------------------------------
 
@@ -371,33 +399,6 @@ as part of the atomic update for the delete (either way, the metapage has
 to be the last page locked in the update to avoid deadlock risks).  This
 avoids race conditions if two such operations are executing concurrently.
 
-VACUUM needs to do a linear scan of an index to search for deleted pages
-that can be reclaimed because they are older than all open transactions.
-For efficiency's sake, we'd like to use the same linear scan to search for
-deletable tuples.  Before Postgres 8.2, btbulkdelete scanned the leaf pages
-in index order, but it is possible to visit them in physical order instead.
-The tricky part of this is to avoid missing any deletable tuples in the
-presence of concurrent page splits: a page split could easily move some
-tuples from a page not yet passed over by the sequential scan to a
-lower-numbered page already passed over.  (This wasn't a concern for the
-index-order scan, because splits always split right.)  To implement this,
-we provide a "vacuum cycle ID" mechanism that makes it possible to
-determine whether a page has been split since the current btbulkdelete
-cycle started.  If btbulkdelete finds a page that has been split since
-it started, and has a right-link pointing to a lower page number, then
-it temporarily suspends its sequential scan and visits that page instead.
-It must continue to follow right-links and vacuum dead tuples until
-reaching a page that either hasn't been split since btbulkdelete started,
-or is above the location of the outer sequential scan.  Then it can resume
-the sequential scan.  This ensures that all tuples are visited.  It may be
-that some tuples are visited twice, but that has no worse effect than an
-inaccurate index tuple count (and we can't guarantee an accurate count
-anyway in the face of concurrent activity).  Note that this still works
-if the has-been-recently-split test has a small probability of false
-positives, so long as it never gives a false negative.  This makes it
-possible to implement the test with a small counter value stored on each
-index page.
-
 Fastpath For Index Insertion
 ----------------------------