On reflection this method seems to be exposing an unreasonable amount of
implementation detail. It wouldn't matter when talking to a remote server
of the identical Postgres version, but it seems likely to make things worse
not better if the remote is a different version with different casting
infrastructure. Instead adopt ruleutils.c's policy of regurgitating the
cast as it was originally specified; including not showing it at all, if
it was implicit to start with. (We must do that because for some datatypes
explicit and implicit casts have different semantics.)
The only place we depended on that was in sending numeric type OIDs in
PQexecParams; but we can replace that usage with explicitly casting
each Param symbol in the query string, so that the types are specified
to the remote by name not OID. This makes no immediate difference but
will be essential if we ever hope to support use of non-builtin types.
Set the remote session's search path to exactly "pg_catalog" at session
start, then schema-qualify only names that aren't in that schema. This
greatly reduces clutter in the generated SQL commands, as seen in the
regression test changes. Per discussion.
Also, rethink use of FirstNormalObjectId as the "built-in object" cutoff
--- FirstBootstrapObjectId is safer, since the former will accept
objects in information_schema for instance.
There's still a lot of room for improvement, but it basically works,
and we need this to be present before we can do anything much with the
writable-foreign-tables patch. So let's commit it and get on with testing.
Shigeru Hanada, reviewed by KaiGai Kohei and Tom Lane
If users create tablespaces inside the old cluster directory, it is
impossible for the delete script to delete _only_ the old cluster files,
so don't create a script in that case, and issue a message to the user.
Cases such as similarity('', '') produced a NaN result due to computing
0/0. Per discussion, make it return zero instead.
This appears to be the basic cause of bug #7867 from Michele Baravalle,
although it remains unclear why her installation doesn't think Cyrillic
letters are letters.
Back-patch to all active branches.
libpgcommon is a new static library to allow sharing code among the
various frontend programs and backend; this lets us eliminate duplicate
implementations of common routines. We avoid libpgport, because that's
intended as a place for porting issues; per discussion, it seems better
to keep them separate.
The first use case, and the only implemented by this patch, is pg_malloc
and friends, which many frontend programs were already using.
At the same time, we can use this to provide palloc emulation functions
for the frontend; this way, some palloc-using files in the backend can
also be used by the frontend cleanly. To do this, we change palloc() in
the backend to be a function instead of a macro on top of
MemoryContextAlloc(). This was previously believed to cause loss of
performance, but this implementation has been tweaked by Tom and Andres
so that on modern compilers it provides a slight improvement over the
previous one.
This lets us clean up some places that were already with
localized hacks.
Most of the pg_malloc/palloc changes in this patch were authored by
Andres Freund. Zoltán Böszörményi also independently provided a form of
that. libpgcommon infrastructure was authored by Álvaro.
The previous coding supposed that the first differing bytes in two varlena
datums must have the same sign difference as their overall comparison
result. This is obviously bogus for text strings in non-C locales, and
probably wrong for numeric, and even for bytea I think it was wrong on
machines where char is signed. When the assumption failed, the function
could deliver a zero or negative penalty in situations where such a result
is quite ridiculous, leading the core GiST code to make very bad page-split
decisions.
To fix, take the absolute values of the byte-level differences. Also,
switch the code to using unsigned char not just char, so that the behavior
will be consistent whether char is signed or not.
Per investigation of a trouble report from Tomas Vondra. Back-patch to all
supported branches.
gbt_var_bin_union() failed to do the right thing when the existing range
needed to be widened at both ends rather than just one end. This could
result in an invalid index in which keys that are present would not be
found by searches, because the searches would not think they need to
descend to the relevant leaf pages. This error affected all the varlena
datatypes supported by btree_gist (text, bytea, bit, numeric).
Per investigation of a trouble report from Tomas Vondra. (There is also
an issue in gbt_var_penalty(), but that should only result in inefficiency
not wrong answers. I'm committing this separately so that we have a git
state in which it can be tested that bad penalty results don't produce
invalid indexes.) Back-patch to all supported branches.
The new option specifies length of aggregation interval (in
seconds). May be used only together with -l. With this option, the log
contains per-interval summary (number of transactions, min/max latency
and two additional fields useful for variance estimation).
Patch contributed by Tomas Vondra, reviewed by Pavel Stehule. Slight
change by Tatsuo Ishii, suggested by Robert Hass to emit an error
message indicating that the option is not currently supported on
Windows.
Beyond 21474, the number of accounts exceed the range for int4. Change the
initialization code to use bigint for account id columns when scale is large
enough, and switch to using int64s for the variables in pgbench code. The
threshold where we switch to bigints is set at 20000, because that's easier
to remember and document than 21474, and ensures that there is some headroom
when int4s are used.
Greg Smith, with various changes by Euler Taveira de Oliveira, Gurjeet
Singh and Satoshi Nagayasu.
This makes 9.3 -> 9.3 upgrades work when they cross the commit that
added persistent multixacts; early 9.3 pg_controldata did not have the
required oldestMultiXact line, and so would fail to upgrade.
per Bruce Momjian
When pg_upgrade can't find required pg_controldata information, report
_which_ cluster is failing, with this message:
The %s cluster lacks some required control information:
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE". UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.
Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.
The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid. Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates. This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed. pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.
Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header. This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.
Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This causes additional WAL-logging, also (there was previously a single
WAL record for a locked tuple; now there are as many as updated copies
of the tuple there exist.)
With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.
As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.
Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane. There's probably room for several more tests.
There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it. Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.
This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com1290721684-sup-3951@alvh.no-ip.org1294953201-sup-2099@alvh.no-ip.org1320343602-sup-2290@alvh.no-ip.org1339690386-sup-8927@alvh.no-ip.org4FE5FF020200002500048A3D@gw.wicourts.gov4FEAB90A0200002500048B7D@gw.wicourts.gov
With AtEOXact applied, --single-transaction makes pg_restore slower, and
has the potential to require lock table configuration, so remove the
argument.
Per suggestion from Tom.
In commit 71450d7fd6, we added code to inform
suitably-intelligent compilers that ereport() doesn't return if the elevel
is ERROR or higher. This patch extends that to elog(), and also fixes a
double-evaluation hazard that the previous commit created in ereport(),
as well as reducing the emitted code size.
The elog() improvement requires the compiler to support __VA_ARGS__, which
should be available in just about anything nowadays since it's required by
C99. But our minimum language baseline is still C89, so add a configure
test for that.
The previous commit assumed that ereport's elevel could be evaluated twice,
which isn't terribly safe --- there are already counterexamples in xlog.c.
On compilers that have __builtin_constant_p, we can use that to protect the
second test, since there's no possible optimization gain if the compiler
doesn't know the value of elevel. Otherwise, use a local variable inside
the macros to prevent double evaluation. The local-variable solution is
inferior because (a) it leads to useless code being emitted when elevel
isn't constant, and (b) it increases the optimization level needed for the
compiler to recognize that subsequent code is unreachable. But it seems
better than not teaching non-gcc compilers about unreachability at all.
Lastly, if the compiler has __builtin_unreachable(), we can use that
instead of abort(), resulting in a noticeable code savings since no
function call is actually emitted. However, it seems wise to do this only
in non-assert builds. In an assert build, continue to use abort(), so that
the behavior will be predictable and debuggable if the "impossible"
happens.
These changes involve making the ereport and elog macros emit do-while
statement blocks not just expressions, which forces small changes in
a few call sites.
Andres Freund, Tom Lane, Heikki Linnakangas
This is now used by ecpg tests, and not clobbered by pg_upgrade
tests. This change won't affect anything that doesn't set this
environment variable, but will enable the buildfarm to control
exactly what port regression test installs will be running on,
and thus to detect possible rogue postmasters more easily.
Backpatch to release 9.2 where EXTRA_REGRESS_OPTS was first used.
(-i), producing only one progress message per 5 seconds along with
elapsed time and estimated remaining time. Also add elapsed time and
estimated remaining time to the default logging(prints one message
each 100000 rows).
Patch contributed by Tomas Vondra, reviewed by Jeevan Chalke and
Tatsuo Ishii.
On non-Windows machines, we use the Unix socket for connections to test
postmasters, so there is no need to create a TCP socket. Furthermore,
doing so causes failures due to port conflicts if two builds are carried
out concurrently on one machine. (If the builds are done in different
chroots, which is standard practice at least in Red Hat distros, there
is no risk of conflict on the Unix socket.) Suppressing the TCP socket
by setting listen_addresses to empty has long been standard practice
for pg_regress, and pg_upgrade knows about this too ... but pg_upgrade's
test.sh didn't get the memo.
Back-patch to 9.2, and also sync the 9.2 version of the script with HEAD
as much as practical.
Because the client encoding might not match the server encoding,
pg_upgrade can't allocate NAMEDATALEN bytes for storage of database,
relation, and namespace identifiers. Instead pg_strdup() the memory and
free it.
Also add C comment in initdb.c about safe NAMEDATALEN usage.
All versions of pg_upgrade upgraded invalid indexes caused by CREATE
INDEX CONCURRENTLY failures and marked them as valid. The patch adds a
check to all pg_upgrade versions and throws an error during upgrade or
--check.
Backpatch to 9.2, 9.1, 9.0. Patch slightly adjusted.
Normally each module is tested in a database named contrib_regression,
which is dropped and recreated at the beginhning of each pg_regress run.
This new mode, enabled by adding USE_MODULE_DB=1 to the make command
line, runs most modules in a database with the module name embedded in
it.
This will make testing pg_upgrade on clusters with the contrib modules
a lot easier.
Second attempt at this, this time accomodating make versions older
than 3.82.
Still to be done: adapt to the MSVC build system.
Backpatch to 9.0, which is the earliest version it is reasonably
possible to test upgrading from.
Pg_upgrade displays file names during copy and database names during
dump/restore. Andrew Dunstan identified three bugs:
* long file names were being truncated to 60 _leading_ characters, which
often do not change for long file names
* file names were truncated to 60 characters in log files
* carriage returns were being output to log files
This commit fixes these --- it prints 60 _trailing_ characters to the
status display, and full path names without carriage returns to log
files. It also suppresses status output to the log file unless verbose
mode is used.
Background workers are postmaster subprocesses that run arbitrary
user-specified code. They can request shared memory access as well as
backend database connections; or they can just use plain libpq frontend
database connections.
Modules listed in shared_preload_libraries can register background
workers in their _PG_init() function; this is early enough that it's not
necessary to provide an extra GUC option, because the necessary extra
resources can be allocated early on. Modules can install more than one
bgworker, if necessary.
Care is taken that these extra processes do not interfere with other
postmaster tasks: only one such process is started on each ServerLoop
iteration. This means a large number of them could be waiting to be
started up and postmaster is still able to quickly service external
connection requests. Also, shutdown sequence should not be impacted by
a worker process that's reasonably well behaved (i.e. promptly responds
to termination signals.)
The current implementation lets worker processes specify their start
time, i.e. at what point in the server startup process they are to be
started: right after postmaster start (in which case they mustn't ask
for shared memory access), when consistent state has been reached
(useful during recovery in a HOT standby server), or when recovery has
terminated (i.e. when normal backends are allowed).
In case of a bgworker crash, actions to take depend on registration
data: if shared memory was requested, then all other connections are
taken down (as well as other bgworkers), just like it were a regular
backend crashing. The bgworker itself is restarted, too, within a
configurable timeframe (which can be configured to be never).
More features to add to this framework can be imagined without much
effort, and have been discussed, but this seems good enough as a useful
unit already.
An elementary sample module is supplied.
Author: Álvaro Herrera
This patch is loosely based on prior patches submitted by KaiGai Kohei,
and unsubmitted code by Simon Riggs.
Reviewed by: KaiGai Kohei, Markus Wanner, Andres Freund,
Heikki Linnakangas, Simon Riggs, Amit Kapila
storage.
Have pg_upgrade use it, and enable server options fsync=off and
full_page_writes=off.
Document that users turning fsync from off to on should run initdb
--sync-only.
[ Previous commit was incorrectly applied as a git merge. ]
Normally each module is tested in aq database named contrib_regression,
which is dropped and recreated at the beginhning of each pg_regress run.
This mode, enabled by adding USE_MODULE_DB=1 to the make command line,
runs most modules in a database with the module name embedded in it.
This will make testing pg_upgrade on clusters with the contrib modules
a lot easier.
Still to be done: adapt to the MSVC build system.
Backpatch to 9.0, which is the earliest version it is reasonably possible
to test upgrading from.