Mirror of https://git.postgresql.org/git/postgresql.git (synced 2025-02-05 19:09:58 +08:00)
Avoid VACUUM reltuples distortion.
Add a heuristic that avoids distortion in the pg_class.reltuples estimates used by VACUUM. Without the heuristic, successive manually run VACUUM commands (run against a table that is never modified after initial bulk loading) will scan the same page in each VACUUM operation. Eventually pg_class.reltuples may reach the point where one single heap page is accidentally considered highly representative of the entire table. This is likely to be completely wrong, since the last heap page typically has fewer tuples than average for the table.

It's not obvious that this was a problem prior to commit 44fa8488, which made vacuumlazy.c consistently scan the last heap page (even when it is all-visible in the visibility map). It seems possible that there were more subtle variants of the same problem that went unnoticed for quite some time, though.

Commit 44fa8488 simplified certain aspects of when and how relation truncation was considered, but it did not introduce the "scan the last page" behavior. Essentially the same behavior was introduced much earlier, in commit e8429082. It was conditioned on whether or not truncation looked promising towards the end of the initial heap pass by VACUUM until recently, which was at least somewhat protective. That doesn't seem like something that we should be relying on, though.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkNKORurux459M64mR63Aw4Jq7MBRVcX=CvALqN3A88WA@mail.gmail.com
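The drift described above can be illustrated with a small stand-alone sketch. It is not code from the patch: `estimate_without_guard` and `repeat_last_page_vacuum` are hypothetical names, and the extrapolation step (keep the old density for unscanned pages, add the tuples actually counted) is an assumed simplification of what the estimator does once the early returns are passed. Under that assumption, repeatedly scanning only a sparse last page pulls the estimate toward roughly `total_pages` times the last page's tuple count.

```c
#include <assert.h>

/*
 * Assumed, simplified extrapolation step (not copied from the patch):
 * pages that were not scanned keep the old tuple density, and the tuples
 * counted on the scanned pages are added on top.
 */
static double
estimate_without_guard(double old_rel_tuples, unsigned total_pages,
                       unsigned scanned_pages, double scanned_tuples)
{
    double old_density;
    double unscanned_pages;

    assert(total_pages > 0 && scanned_pages <= total_pages);

    old_density = old_rel_tuples / total_pages;
    unscanned_pages = (double) total_pages - (double) scanned_pages;

    /* Round to the nearest whole tuple; all inputs are non-negative. */
    return (double) (long long) (old_density * unscanned_pages +
                                 scanned_tuples + 0.5);
}

/*
 * Run n_vacuums simulated VACUUMs against an unchanging table, each one
 * scanning only the last page, and return the final reltuples estimate.
 */
static double
repeat_last_page_vacuum(double initial_reltuples, unsigned total_pages,
                        double last_page_tuples, int n_vacuums)
{
    double reltuples = initial_reltuples;
    int i;

    for (i = 0; i < n_vacuums; i++)
        reltuples = estimate_without_guard(reltuples, total_pages,
                                           1, last_page_tuples);
    return reltuples;
}
```

For example, with 100 pages where 99 hold 100 tuples and the last holds 10, the true count is 9910. Calling `repeat_last_page_vacuum(9910.0, 100, 10.0, 500)` in this model yields an estimate far below that, because each pass replaces a little more of the old density with the last page's density.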
parent d61a361d1a
commit 74388a1ac3
@@ -1238,6 +1238,25 @@ vac_estimate_reltuples(Relation relation,
 	if (scanned_pages == 0)
 		return old_rel_tuples;
 
+	/*
+	 * When successive VACUUM commands scan the same few pages again and
+	 * again, without anything from the table really changing, there is a risk
+	 * that our beliefs about tuple density will gradually become distorted.
+	 * It's particularly important to avoid becoming confused in this way due
+	 * to vacuumlazy.c implementation details.  For example, the tendency for
+	 * our caller to always scan the last heap page should not ever cause us
+	 * to believe that every page in the table must be just like the last
+	 * page.
+	 *
+	 * We apply a heuristic to avoid these problems: if the relation is
+	 * exactly the same size as it was at the end of the last VACUUM, and only
+	 * a few of its pages (less than a quasi-arbitrary threshold of 2%) were
+	 * scanned by this VACUUM, assume that reltuples has not changed at all.
+	 */
+	if (old_rel_pages == total_pages &&
+		scanned_pages < (double) total_pages * 0.02)
+		return old_rel_tuples;
+
 	/*
 	 * If old density is unknown, we can't do much except scale up
 	 * scanned_tuples to match total_pages.
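To see when the new early return engages, the condition from the hunk can be lifted into a stand-alone predicate. This is only a sketch: `reltuples_guard_applies` is a hypothetical helper name, and the `BlockNumber` typedef here is a local stand-in rather than the PostgreSQL header definition. The comparison itself is the one shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for PostgreSQL's block number type (assumption). */
typedef unsigned int BlockNumber;

/*
 * Hypothetical helper: returns true when the heuristic from the diff
 * would keep the old reltuples value unchanged.  Both conditions must
 * hold: the relation has the same number of pages as after the last
 * VACUUM, and fewer than 2% of its pages were scanned this time.
 */
static bool
reltuples_guard_applies(BlockNumber old_rel_pages,
                        BlockNumber total_pages,
                        BlockNumber scanned_pages)
{
    return old_rel_pages == total_pages &&
        scanned_pages < (double) total_pages * 0.02;
}
```

On this reading of the condition, a 100-page table with one scanned page keeps its old estimate (1 is below 2% of 100), while the same scan after the table grew to 101 pages does not. A 40-page table is also not covered when one page is scanned, since a single page is already more than 2% of it.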