Avoid VACUUM reltuples distortion.

Add a heuristic that avoids distortion in the pg_class.reltuples
estimates used by VACUUM.  Without the heuristic, successive manually
run VACUUM commands (run against a table that is never modified after
initial bulk loading) will each scan the same page: the table's last
heap page.  Eventually pg_class.reltuples may reach the point where that
single heap page is accidentally considered highly representative of the
entire table.  This is likely to be completely wrong, since the last heap
page typically has fewer tuples than average for the table.

It's not obvious that this was a problem prior to commit 44fa8488, which
made vacuumlazy.c consistently scan the last heap page (even when it is
all-visible in the visibility map).  It seems possible that there were
more subtle variants of the same problem that went unnoticed for quite
some time, though.  Commit 44fa8488 simplified certain aspects of when
and how relation truncation was considered, but it did not introduce the
"scan the last page" behavior.  Essentially the same behavior was
introduced much earlier, in commit e8429082.  Until recently, it was
conditioned on whether or not truncation looked promising towards the end
of VACUUM's initial heap pass, which was at least somewhat protective.
That doesn't seem like something that we should be relying on, though.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkNKORurux459M64mR63Aw4Jq7MBRVcX=CvALqN3A88WA@mail.gmail.com
Peter Geoghegan 2022-02-16 17:15:50 -08:00
parent d61a361d1a
commit 74388a1ac3


@@ -1238,6 +1238,25 @@ vac_estimate_reltuples(Relation relation,
	if (scanned_pages == 0)
		return old_rel_tuples;

	/*
	 * When successive VACUUM commands scan the same few pages again and
	 * again, without anything from the table really changing, there is a risk
	 * that our beliefs about tuple density will gradually become distorted.
	 * It's particularly important to avoid becoming confused in this way due
	 * to vacuumlazy.c implementation details. For example, the tendency for
	 * our caller to always scan the last heap page should not ever cause us
	 * to believe that every page in the table must be just like the last
	 * page.
	 *
	 * We apply a heuristic to avoid these problems: if the relation is
	 * exactly the same size as it was at the end of the last VACUUM, and only
	 * a few of its pages (less than a quasi-arbitrary threshold of 2%) were
	 * scanned by this VACUUM, assume that reltuples has not changed at all.
	 */
	if (old_rel_pages == total_pages &&
		scanned_pages < (double) total_pages * 0.02)
		return old_rel_tuples;

	/*
	 * If old density is unknown, we can't do much except scale up
	 * scanned_tuples to match total_pages.