pg_autovacuum README
--------------------

pg_autovacuum is a libpq client program that monitors all the
databases associated with a postgresql server. It uses the stats
collector to monitor insert, update and delete activity.

When a table exceeds its analyze or vacuum threshold (more detail
on thresholds below), that table will be analyzed or vacuumed.

This allows postgresql to keep the FSM (free space map) and table
statistics up to date, and eliminates the need to schedule periodic
vacuums.

The primary benefit of pg_autovacuum is that the FSM and table
statistics are updated as needed. When a table is actively
changing, pg_autovacuum will perform the necessary vacuums and
analyzes, whereas if a table remains static, no cycles will be wasted
performing unnecessary vacuums/analyzes.

A secondary benefit of pg_autovacuum is that it ensures that a
database-wide vacuum is performed prior to transaction ID (xid)
wraparound. This is an important, if rare, problem, as failing to do
so can result in major data loss.

KNOWN ISSUES:
-------------
pg_autovacuum has been tested under Red Hat Linux (by me) and Solaris (by
Christopher B. Browne) and all known bugs have been resolved. Please report
any problems to the hackers list.

pg_autovacuum does not get started automatically by either the postmaster or
by pg_ctl. Along the same lines, when the postmaster exits no one tells
pg_autovacuum. The result is that at the start of the next loop,
pg_autovacuum fails to connect to the server and exits. In general,
pg_autovacuum exits any time it fails to connect.

pg_autovacuum requires that the stats system be enabled and reporting
row-level stats. The overhead of the stats system has been shown to be
significant under certain workloads. For instance, a tight loop of queries
performing "select 1" was nearly 30% slower with stats enabled. However,
in practice, with more realistic workloads, the stats system overhead is
usually negligible.

INSTALL:
--------

As of postgresql v7.4, pg_autovacuum is included in the main source tree
under contrib. Therefore you just run make && make install (as with most
other contrib modules) and it will be installed for you.

If you are using an earlier version of postgresql, just uncompress the
tar.gz into the contrib directory and modify contrib/Makefile to include
the pg_autovacuum directory. pg_autovacuum will then be built as part of
the standard postgresql install.

Make sure that the following are set in postgresql.conf:

stats_start_collector = true
stats_row_level = true

Start up the postmaster, then run the pg_autovacuum executable.

Command line arguments:
-----------------------

pg_autovacuum has the following optional arguments:

-d debug: 0 silent, 1 basic info, 2 more debug info, etc.
-D daemonize: detach from the tty and run in the background.
-s sleep base value: see "Sleeping" below.
-S sleep scaling factor: see "Sleeping" below.
-v vacuum base threshold: see "Vacuum and Analyze" below.
-V vacuum scaling factor: see "Vacuum and Analyze" below.
-a analyze base threshold: see "Vacuum and Analyze" below.
-A analyze scaling factor: see "Vacuum and Analyze" below.
-L log file: name of the file to which output is written; otherwise STDERR.
-U username: username pg_autovacuum will use to connect with; if not
   specified, the current username is used.
-P password: password pg_autovacuum will use to connect with.
-H host: host name or IP address to connect to.
-p port: port used for the connection.
-h help: list the command line options.

All arguments have default values defined in pg_autovacuum.h. At the
time of writing they are:

-d 1
-v 1000
-V 2
-a 500 (half of -v if not specified)
-A 1   (half of -V if not specified)
-s 300 (5 minutes)
-S 2

Vacuum and Analyze:
-------------------

pg_autovacuum performs either a vacuum analyze or just an analyze, depending
on the quantity and type of table activity (insert, update, or delete):

- If the number of (inserts + updates + deletes) > AnalyzeThreshold, then
  only an analyze is performed.

- If the number of (deletes + updates) > VacuumThreshold, then a
  vacuum analyze is performed.

VacuumThreshold is equal to:
vacuum_base_value + (vacuum_scaling_factor * "number of tuples in the table")

AnalyzeThreshold is equal to:
analyze_base_value + (analyze_scaling_factor * "number of tuples in the table")

The AnalyzeThreshold defaults to half of the VacuumThreshold since it
represents a much less expensive operation (approx 5%-10% of vacuum), and
running it more often should not substantially degrade system performance.

Sleeping:
---------

pg_autovacuum sleeps for a while after it is done checking all the
databases. It does this in order to limit the amount of system
resources it consumes. This also allows the system administrator to
configure pg_autovacuum to be more or less aggressive.

Reducing the sleep time will cause pg_autovacuum to respond more
quickly to changes, whether they be database addition/removal, table
addition/removal, or just normal table activity.

On the other hand, setting pg_autovacuum's sleep values too aggressively
(for too short a period of time) can have a negative effect on server
performance. If a table gets vacuumed 5 times during the course of a
large update, this is likely to take much longer than if the table was
vacuumed only once, at the end.

The total time it sleeps is equal to:

base_sleep_value + sleep_scaling_factor * "duration of the previous loop"

Note that timing measurements are made in seconds; specifying
"pg_autovacuum -s 1" means pg_autovacuum could poll the database up to 60
times a minute. In a system with large tables where vacuums may run for
several minutes, longer times between vacuums are likely to be appropriate.

What pg_autovacuum monitors:
----------------------------

pg_autovacuum dynamically generates a list of all databases and tables that
exist on the server. It will dynamically add and remove databases and
tables as they are created on or dropped from the database server while
pg_autovacuum is running. Overhead is fairly small per object. For example,
10 databases with 10 tables each appear to use less than 10k of memory on
my Linux box.