2003-03-21 02:14:46 +08:00
|
|
|
pg_autovacuum README
|
2003-06-12 09:36:44 +08:00
|
|
|
--------------------
|
2003-03-21 02:14:46 +08:00
|
|
|
|
2003-06-12 09:36:44 +08:00
|
|
|
pg_autovacuum is a libpq client program that monitors all the
|
2003-09-14 00:26:18 +08:00
|
|
|
databases associated with a PostgreSQL server. It uses the statistics
|
2003-06-12 09:36:44 +08:00
|
|
|
collector to monitor insert, update and delete activity.
|
2003-03-21 02:14:46 +08:00
|
|
|
|
2003-09-14 00:26:18 +08:00
|
|
|
When a table exceeds a insert or delete threshold (for more detail on
|
|
|
|
thresholds, see "Vacuum and Analyze" below) then that table will be
|
|
|
|
vacuumed and/or analyzed.
|
2003-06-12 09:36:44 +08:00
|
|
|
|
2003-09-14 00:26:18 +08:00
|
|
|
This allows PostgreSQL to keep the FSM (Free Space Map) and table
|
|
|
|
statistics up to date, and eliminates the need to schedule periodic
|
|
|
|
vacuums.
|
2003-06-12 09:36:44 +08:00
|
|
|
|
|
|
|
The primary benefit of pg_autovacuum is that the FSM and table
|
2003-09-14 00:26:18 +08:00
|
|
|
statistic information are updated more nearly as frequently as needed.
|
|
|
|
When a table is actively changing, pg_autovacuum will perform the
|
|
|
|
VACUUMs and ANALYZEs that such a table needs, whereas if a table
|
|
|
|
remains static, no cycles will be wasted performing this
|
|
|
|
unnecessarily.
|
2003-06-12 09:36:44 +08:00
|
|
|
|
|
|
|
A secondary benefit of pg_autovacuum is that it ensures that a
|
2003-09-14 00:26:18 +08:00
|
|
|
database wide vacuum is performed prior to XID wraparound. This is an
|
2003-06-12 09:36:44 +08:00
|
|
|
important, if rare, problem, as failing to do so can result in major
|
2003-09-14 00:26:18 +08:00
|
|
|
data loss. (See the section in the _Administrator's Guide_ entitled
|
|
|
|
"Preventing transaction ID wraparound failures" for more details.)
|
2003-06-12 09:36:44 +08:00
|
|
|
|
|
|
|
KNOWN ISSUES:
|
|
|
|
-------------
|
2003-03-21 02:14:46 +08:00
|
|
|
|
2003-09-14 00:26:18 +08:00
|
|
|
pg_autovacuum has been tested under Redhat Linux (by me) and Debian
|
2003-10-16 11:47:28 +08:00
|
|
|
GNU/Linux, Solaris, and AIX (by Christopher B. Browne) and all known
|
|
|
|
bugs have been resolved. Please report any problems to the hackers
|
|
|
|
list.
|
2003-09-14 00:26:18 +08:00
|
|
|
|
|
|
|
pg_autovacuum requires that the statistics system be enabled and
|
|
|
|
reporting row level stats. The overhead of the stats system has been
|
2003-10-16 11:47:28 +08:00
|
|
|
shown to have a significant cost under certain workloads. For
|
|
|
|
instance, a tight loop of queries performing "select 1" was found to
|
|
|
|
run nearly 30% slower when stats were enabled. However, in practice,
|
|
|
|
with more realistic workloads, the stats system overhead is usually
|
|
|
|
nominal.
|
2003-09-14 00:26:18 +08:00
|
|
|
|
|
|
|
pg_autovacuum does not get started automatically by either the
|
|
|
|
postmaster or by pg_ctl. Similarly, when the postmaster exits, no one
|
|
|
|
tells pg_autovacuum. The result of that is that at the start of the
|
|
|
|
next loop, pg_autovacuum will fail to connect to the server and
|
|
|
|
exit(). Any time it fails to connect pg_autovacuum exit()s.
|
|
|
|
|
|
|
|
While pg_autovacuum can manage vacuums for as many databases as you
|
|
|
|
may have tied to a particular PostgreSQL postmaster, it can only
|
|
|
|
connect to a single PostgreSQL postmaster. Thus, if you have multiple
|
|
|
|
postmasters on a particular host, you will need multiple pg_autovacuum
|
|
|
|
instances, and they have no way, at present, to coordinate between one
|
|
|
|
another to ensure that they do not concurrently vacuum big tables.
|
|
|
|
|
|
|
|
TODO:
|
|
|
|
-----
|
|
|
|
|
|
|
|
At present, there are no sample scripts to automatically start up
|
|
|
|
pg_autovacuum along with the database. It would be desirable to have
|
|
|
|
a SysV script to start up pg_autovacuum after PostgreSQL has been
|
|
|
|
started.
|
|
|
|
|
|
|
|
Some users have expressed interest in making pg_autovacuum more
|
|
|
|
configurable so that certain tables known to be inactive could be
|
|
|
|
excluded from being vacuumed. It would probably make sense to
|
|
|
|
introduce this sort of functionality by providing arguments to specify
|
|
|
|
the database and schema in which to find a configuration table.
|
2003-03-21 02:14:46 +08:00
|
|
|
|
2003-10-16 11:47:28 +08:00
|
|
|
It would also be desirable for the daemon to monitor how busy the
|
|
|
|
system is, with a view to deferring vacuums until there is less other
|
|
|
|
activity.
|
|
|
|
|
2003-03-21 02:14:46 +08:00
|
|
|
INSTALL:
|
2003-06-12 09:36:44 +08:00
|
|
|
--------
|
|
|
|
|
2003-09-14 00:26:18 +08:00
|
|
|
As of postgresql v7.4 pg_autovacuum is included in the main source
|
|
|
|
tree under contrib. Therefore you merely need to "make && make
|
|
|
|
install" (similar to most other contrib modules) and it will be
|
|
|
|
installed for you.
|
2003-06-12 09:36:44 +08:00
|
|
|
|
2003-09-14 00:26:18 +08:00
|
|
|
If you are using an earlier version of PostgreSQL, uncompress the
|
|
|
|
tar.gz file into the contrib directory and modify the contrib/Makefile
|
|
|
|
to include the pg_autovacuum directory. pg_autovacuum will then be
|
2003-10-16 11:47:28 +08:00
|
|
|
built as part of the standard postgresql install. It is known to work
|
|
|
|
with v7.3 releases; it is not presently compatible with v7.2.
|
2003-03-21 02:14:46 +08:00
|
|
|
|
2003-09-14 00:26:18 +08:00
|
|
|
make sure that the following are set in postgresql.conf:
|
2003-03-21 02:14:46 +08:00
|
|
|
|
2003-06-12 09:36:44 +08:00
|
|
|
stats_start_collector = true
|
|
|
|
stats_row_level = true
|
|
|
|
|
2003-09-14 00:26:18 +08:00
|
|
|
Start up the postmaster, then execute the pg_autovacuum executable.
|
|
|
|
|
|
|
|
If you have a script that automatically starts up the PostgreSQL
|
|
|
|
instance, you might add in, after that, something similar to the
|
|
|
|
following:
|
2003-03-21 02:14:46 +08:00
|
|
|
|
2003-09-14 00:26:18 +08:00
|
|
|
sleep 10 # To give the database some time to start up
|
|
|
|
$PGBINS/pg_autovacuum -D -s $SBASE -S $SSCALE ... [other arguments]
|
2003-03-21 02:14:46 +08:00
|
|
|
|
|
|
|
Command line arguments:
|
2003-06-12 09:36:44 +08:00
|
|
|
-----------------------
|
|
|
|
|
2003-03-21 02:14:46 +08:00
|
|
|
pg_autovacuum has the following optional arguments:
|
2003-06-12 09:36:44 +08:00
|
|
|
|
2003-03-21 02:14:46 +08:00
|
|
|
-d debug: 0 silent, 1 basic info, 2 more debug info, etc...
|
2003-09-14 00:26:18 +08:00
|
|
|
-D daemonize: Detach from tty and run in background.
|
2003-03-21 02:14:46 +08:00
|
|
|
-s sleep base value: see "Sleeping" below.
|
|
|
|
-S sleep scaling factor: see "Sleeping" below.
|
2004-04-20 05:30:18 +08:00
|
|
|
-v vacuum base threshold: see "Vacuum and Analyze" below.
|
|
|
|
-V vacuum scaling factor: see "Vacuum and Analyze" below.
|
|
|
|
-a analyze base threshold: see "Vacuum and Analyze" below.
|
|
|
|
-A analyze scaling factor: see "Vacuum and Analyze" below.
|
2003-06-12 09:36:44 +08:00
|
|
|
-L log file: Name of file to which output is submitted, otherwise STDERR
|
|
|
|
-U username: Username pg_autovacuum will use to connect with, if not
|
|
|
|
specified the current username is used.
|
2003-03-21 02:14:46 +08:00
|
|
|
-P password: Password pg_autovacuum will use to connect with.
|
2003-09-14 00:26:18 +08:00
|
|
|
-H host: host name or IP to connect to.
|
2003-03-21 02:14:46 +08:00
|
|
|
-p port: port used for connection.
|
|
|
|
-h help: list of command line options.
|
|
|
|
|
2003-09-14 00:26:18 +08:00
|
|
|
Numerous arguments have default values defined in pg_autovacuum.h. At
|
|
|
|
the time of writing they are:
|
2003-06-12 09:36:44 +08:00
|
|
|
|
|
|
|
-d 1
|
|
|
|
-v 1000
|
|
|
|
-V 2
|
2003-09-14 00:26:18 +08:00
|
|
|
-a 500 (half of -v if not specified)
|
2004-04-20 05:30:18 +08:00
|
|
|
-A 1 (half of -V if not specified)
|
2003-06-12 09:36:44 +08:00
|
|
|
-s 300 (5 minutes)
|
|
|
|
-S 2
|
2003-03-21 02:14:46 +08:00
|
|
|
|
|
|
|
|
|
|
|
Vacuum and Analyze:
|
2003-06-12 09:36:44 +08:00
|
|
|
-------------------
|
|
|
|
|
2003-09-14 00:26:18 +08:00
|
|
|
pg_autovacuum performs either a VACUUM ANALYZE or just ANALYZE
|
|
|
|
depending on the mixture of table activity (insert, update, or
|
|
|
|
delete):
|
2003-06-12 09:36:44 +08:00
|
|
|
|
|
|
|
- If the number of (inserts + updates + deletes) > AnalyzeThreshold, then
|
|
|
|
only an analyze is performed.
|
|
|
|
|
2003-09-14 00:26:18 +08:00
|
|
|
- If the number of (deletes + updates) > VacuumThreshold, then a
|
2003-06-12 09:36:44 +08:00
|
|
|
vacuum analyze is performed.
|
|
|
|
|
2003-11-09 11:15:46 +08:00
|
|
|
VacuumThreshold is equal to:
|
2003-06-12 09:36:44 +08:00
|
|
|
vacuum_base_value + (vacuum_scaling_factor * "number of tuples in the table")
|
|
|
|
|
2003-11-09 11:15:46 +08:00
|
|
|
AnalyzeThreshold is equal to:
|
2003-06-12 09:36:44 +08:00
|
|
|
analyze_base_value + (analyze_scaling_factor * "number of tuples in the table")
|
|
|
|
|
|
|
|
The AnalyzeThreshold defaults to half of the VacuumThreshold since it
|
2003-09-14 00:26:18 +08:00
|
|
|
represents a much less expensive operation (approx 5%-10% of vacuum),
|
|
|
|
and running ANALYZE more often should not substantially degrade system
|
|
|
|
performance.
|
2003-03-21 02:14:46 +08:00
|
|
|
|
|
|
|
Sleeping:
|
2003-06-12 09:36:44 +08:00
|
|
|
---------
|
|
|
|
|
|
|
|
pg_autovacuum sleeps for a while after it is done checking all the
|
|
|
|
databases. It does this in order to limit the amount of system
|
2003-09-14 00:26:18 +08:00
|
|
|
resources it consumes. This allows the system administrator to
|
2003-06-12 09:36:44 +08:00
|
|
|
configure pg_autovacuum to be more or less aggressive.
|
|
|
|
|
|
|
|
Reducing the sleep time will cause pg_autovacuum to respond more
|
|
|
|
quickly to changes, whether they be database addition/removal, table
|
|
|
|
addition/removal, or just normal table activity.
|
|
|
|
|
2003-09-14 00:26:18 +08:00
|
|
|
On the other hand, setting pg_autovacuum to sleep values too
|
|
|
|
aggressively (to too short periods of time) can have a negative effect
|
|
|
|
on server performance. For instance, if a table gets vacuumed 5 times
|
|
|
|
during the course of a large set of updates, this is likely to take a
|
|
|
|
lot more work than if the table was vacuumed just once, at the end.
|
2003-06-12 09:36:44 +08:00
|
|
|
|
2003-03-21 02:14:46 +08:00
|
|
|
The total time it sleeps is equal to:
|
|
|
|
|
2003-06-12 09:36:44 +08:00
|
|
|
base_sleep_value + sleep_scaling_factor * "duration of the previous
|
|
|
|
loop"
|
|
|
|
|
|
|
|
Note that timing measurements are made in seconds; specifying
|
2003-09-14 00:26:18 +08:00
|
|
|
"pg_vacuum -s 1" means pg_autovacuum could poll the database up to 60
|
|
|
|
times minute. In a system with large tables where vacuums may run for
|
|
|
|
several minutes, rather longer times between vacuums are likely to be
|
|
|
|
appropriate.
|
2003-06-12 09:36:44 +08:00
|
|
|
|
|
|
|
What pg_autovacuum monitors:
|
|
|
|
----------------------------
|
|
|
|
|
2003-09-14 00:26:18 +08:00
|
|
|
pg_autovacuum dynamically generates a list of all databases and tables
|
|
|
|
that exist on the server. It will dynamically add and remove
|
|
|
|
databases and tables that are removed from the database server while
|
|
|
|
pg_autovacuum is running. Overhead is fairly small per object. For
|
|
|
|
example: 10 databases with 10 tables each appears to less than 10k of
|
|
|
|
memory on my Linux box.
|