Arrange for postmaster child processes to respond to two environment
variables, PG_OOM_ADJUST_FILE and PG_OOM_ADJUST_VALUE, to determine whether
they reset their OOM score adjustments and if so to what. This is superior
to the previous design involving #ifdef's in several ways. The behavior is
now available in a default build, and both ends of the adjustment --- the
original adjustment of the postmaster's level and the subsequent
readjustment by child processes --- can now be controlled in one place,
namely the postmaster launch script. So it's no longer necessary for the
launch script to act on faith that the server was compiled with the
appropriate options. In addition, if someone wants to use an OOM score
other than zero for the child processes, that doesn't take a recompile
anymore; and we no longer have to cater separately to the two different
historical kernel APIs for this adjustment.
Gurjeet Singh, somewhat revised by me
This SQL-standard feature allows a sub-SELECT yielding multiple columns
(but only one row) to be used to compute the new values of several columns
to be updated. While the same results can be had with an independent
sub-SELECT per column, such a workaround can require a great deal of
duplicated computation.
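As a minimal sketch of the new syntax (table and column names here are
hypothetical):

    UPDATE accounts a
       SET (balance, updated_at) =
           (SELECT p.amount, p.created_at
              FROM payments p
             WHERE p.account_id = a.id);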
The standard actually says that the source for a multi-column assignment
could be any row-valued expression. The implementation used here is
tightly tied to our existing sub-SELECT support and can't handle other
cases; the Bison grammar would have some issues with them too. However,
I don't feel too bad about this since other cases can be converted into
sub-SELECTs. For instance, "SET (a,b,c) = row_valued_function(x)" could
be written "SET (a,b,c) = (SELECT * FROM row_valued_function(x))".
Since most of the system thinks AND and OR are N-argument expressions
anyway, let's have the grammar generate a representation of that form when
dealing with input like "x AND y AND z AND ...", rather than generating
a deeply-nested binary tree that just has to be flattened later by the
planner. This avoids stack overflow in parse analysis when dealing with
queries having more than a few thousand such clauses; and in any case it
removes some rather unsightly inconsistencies, since some parts of parse
analysis were generating N-argument ANDs/ORs already.
It's still possible to get a stack overflow with weirdly parenthesized
input, such as "x AND (y AND (z AND ( ... )))", but such cases are not
mainstream usage. The maximum depth of parenthesization is already
limited by Bison's stack in such cases, anyway, so that the limit is
probably fairly platform-independent.
Patch originally by Gurjeet Singh, heavily revised by me
Any OS user able to access the socket can connect as the bootstrap
superuser and proceed to execute arbitrary code as the OS user running
the test. Protect against that by placing the socket in a temporary,
mode-0700 subdirectory of /tmp. The pg_regress-based test suites and
the pg_upgrade test suite were vulnerable; the $(prove_check)-based test
suites were already secure. Back-patch to 8.4 (all supported versions).
The hazard remains wherever the temporary cluster accepts TCP
connections, notably on Windows.
As a convenient side effect, this lets testing proceed smoothly in
builds that override DEFAULT_PGSOCKET_DIR. Popular non-default values
like /var/run/postgresql are often unwritable to the build user.
Security: CVE-2014-0067
187492b6c2 changed pgstat.c so that
the stats files were saved into the $PGDATA/pg_stat directory when the server
was shut down. But it accidentally forgot to change the location of the
pg_stat_statements permanent stats file. This commit fixes pg_stat_statements
so that its stats file is also saved into $PGDATA/pg_stat at shutdown.
Since this fix changes the file layout, we don't back-patch it to 9.3
where this oversight was introduced.
The original coding in contrib/uuid-ossp created and destroyed a uuid_t
object (or, in some cases, even two of them) each time it was called.
This is not the intended usage: you're supposed to keep the uuid_t object
around so that the library can cache its state across uses. (Other UUID
libraries seem to keep equivalent state behind-the-scenes in static
variables, but OSSP chose differently.) Aside from being quite inefficient,
creating a new uuid_t loses knowledge of the previously generated UUID,
which in theory could result in duplicate V1-style UUIDs being created
on sufficiently fast machines.
On at least some platforms, creating a new uuid_t also draws some entropy
from /dev/urandom, leaving less for the rest of the system. This seems
sufficiently unpleasant to justify back-patching this change.
The previous version of these tests expected uuid_generate_v1() to always
emit MAC addresses with the local-admin and multicast address bits zero.
However, several of the buildfarm critters are reporting values with the
local-admin bit set. (Perhaps they're running inside VMs or jails.)
And a couple are reporting values with the multicast bit set, probably
meaning that the UUID library couldn't read the system MAC address.
Also, it emerges that if OSSP UUID can't read the system MAC address, it
falls back to V1MC behavior wherein the whole node field gets randomized
each time, breaking the test that expected the node field to remain stable
in V1 output. (It looks like e2fs doesn't behave that way, though.)
It's not entirely clear why we can't get a system MAC address, since the
buildfarm scripts would not work without internet access. Nonetheless,
the regression tests had better cope with the case, so adjust the tests
to expect these behaviors.
This reverts commit 45b7abe59e.
It turns out that the %name-prefix syntax without "=" does not work
at all in pre-2.4 Bison. We are not prepared to make such a large
jump in minimum required Bison version just to suppress a warning
message in a version hardly any developers are using yet.
When 3.0 gets more popular, we'll figure out a way to deal with this.
In the meantime, BISONFLAGS=-Wno-deprecated is recommended for
anyone using 3.0 who doesn't want to see the warning.
%name-prefix doesn't use an "=" sign according to the Bison docs, but it
silently accepted one anyway, until Bison 3.0. This was originally a
typo of mine in commit 012abebab1, and we
seem to have slavishly copied the error into all the other grammar files.
Per report from Vik Fearing; analysis by Peter Eisentraut.
Back-patch to all active branches, since somebody might try to build
a back branch with up-to-date tools.
On reflection, the timestamp-advances test might fail if we're unlucky
enough for the time_mid field to change between two calls, since uuid_cmp
is just bytewise comparison and the field ordering has more significant
fields later. Build some field extraction functions so we can do a more
honest test of that. Also check that the version and reserved fields
contain what they should.
The V5 (SHA1 hashing) code wrote 20 bytes into a 16-byte local variable.
This had accidentally failed to fail in my testing and Matteo's, but
buildfarm results exposed the problem.
Allow the contrib/uuid-ossp extension to be built atop any one of these
three popular UUID libraries. (The extension's name is now arguably a
misnomer, but we'll keep it the same so as not to cause unnecessary
compatibility issues for users.)
We would not normally consider a change like this post-beta1, but the issue
has been forced by our upgrade to autoconf 2.69, whose more rigorous header
checks are causing OSSP's header files to be rejected on some platforms.
It's been foreseen for some time that we'd have to move away from depending
on OSSP UUID due to lack of upstream maintenance, so this is a down payment
on that problem.
While at it, add some simple regression tests, in hopes of catching any
major incompatibilities between the three implementations.
Matteo Beccati, with some further hacking by me
Commit 090d0f2050 added new code showing
how it can be useful to set bgw_notify_pid to a non-zero value, but it
failed to make sure that the existing call to RegisterBackgroundWorker
initialized the new field at all.
Report and patch by Shigeru Hanada.
On MinGW, it seems that scanf() doesn't necessarily accept the same format
codes that printf() does, and in particular it may fail to recognize %llu
even though printf() does. Since configure only probes printf() behavior
while setting up the INT64_FORMAT macros, this means it's unsafe to use
those macros with scanf(). We had only one instance of such a coding
pattern, in contrib/pg_stat_statements, so change that code to avoid
the problem.
Per buildfarm warnings. Back-patch to 9.0 where the troublesome code
was introduced.
Michael Paquier
Change the total-transactions counters from int32 to int64 to accommodate
cases where we do more than 2^31 transactions during a run. This patch
does not change the INT_MAX limit on explicit "-t" parameters, but it
does allow the product of the -t and -c parameters to exceed INT_MAX, or
allow a -T limit that is large enough that more than 2^31 transactions
can be completed. While pgbench did not actually fail in such cases,
it did print an incorrect total-transactions count, and some of the
derived numbers such as TPS would have been wrong as well.
Tomas Vondra
C89 says that compound initializers may only contain constant expressions;
a restriction violated by commit 89d00cbe. While we've had no actual field
complaints about this, C89 is still the project standard, and it's not
saving all that much code to break compatibility here. So let's adhere to
the old restriction.
In passing, replace a bunch of hardwired constants "256" with
sizeof(target-variable), just because the latter is more readable and
less breakable. And const-ify where possible.
Back-patch to 9.3 where the nonportable code was added.
Andres Freund and Tom Lane
gbt_macad_union also allocated 12-byte structs where we really need 16.
Per report from Andres Freund. No back-patch since there's no current
risk of a real problem.
The macaddr opclass stores two macaddr structs (each of size 6) in an
index column that's declared as being of type gbtreekey16, ie 16 bytes.
In the original coding this led to passing a palloc'd value of size 12
to the index insertion code, so that data would be fetched past the
end of the allocated value during index tuple construction. This makes
valgrind unhappy. In principle it could result in a SIGSEGV, though
with the current implementation of palloc there's no risk since
the 12-byte request size would be rounded up to 16 bytes anyway.
To fix, add a field to struct gbtree_ninfo showing the declared size of
the index datums, and use that in the palloc requests; and use palloc0
to be sure that any wasted bytes are cleanly initialized.
Per report from Andres Freund. No back-patch since there's no current
risk of a real problem.
pg_stat_replication shows connected replication clients. The ddl test case
never has any replication clients connected, so querying pg_stat_replication
is pointless. To check that a slot has been dropped correctly, query
pg_replication_slots instead.
Andres Freund
The code expands a varbit gist leaf key to a node key by copying the bit
data twice in a varlen datum, as both the lower and upper key. The lower key
was expanded to INTALIGN size, but the padding bytes were not initialized.
That's a problem because when the lower/upper keys are compared, the padding
bytes get compared too when the values are otherwise equal. That could
lead to incorrect query results.
REINDEX is advised for any btree_gist indexes on bit or bit varying data
type, to fix any garbage padding bytes on disk.
Per Valgrind, reported by Andres Freund. Backpatch to all supported
versions.
It's easy to forget using SYSTEMQUOTEs when constructing command strings
for system() or popen(). Even if we fix all the places missing it now, it is
bound to be forgotten again in the future. Introduce wrapper functions that
do the extra quoting for you, and get rid of SYSTEMQUOTEs in all the
callers.
We previously used SYSTEMQUOTEs in all the hard-coded command strings, and
this doesn't change the behavior of those. But user-supplied commands, like
archive_command, restore_command, COPY TO/FROM PROGRAM calls, as well as
pgbench's \shell, will now gain an extra pair of quotes. That is desirable,
but if you have existing scripts or config files that include an extra
pair of quotes, those might need to be adjusted.
Reviewed by Amit Kapila and Tom Lane
Commit a730183926 created rather a mess by
putting dependencies on backend-only include files into include/common.
We really shouldn't do that. To clean it up:
* Move TABLESPACE_VERSION_DIRECTORY back to its longtime home in
catalog/catalog.h. We won't consider this symbol part of the FE/BE API.
* Push enum ForkNumber from relfilenode.h into relpath.h. We'll consider
relpath.h as the source of truth for fork numbers, since relpath.c was
already partially serving that function, and anyway relfilenode.h was
kind of a random place for that enum.
* So, relfilenode.h now includes relpath.h rather than vice-versa. This
direction of dependency is fine. (That allows most, but not quite all,
of the existing explicit #includes of relpath.h to go away again.)
* Push forkname_to_number from catalog.c to relpath.c, just to centralize
fork number stuff a bit better.
* Push GetDatabasePath from catalog.c to relpath.c; it was rather odd
that the previous commit didn't keep this together with relpath().
* To avoid needing relfilenode.h in common/, redefine the underlying
function (now called GetRelationPath) as taking separate OID arguments,
and make the APIs using RelFileNode or RelFileNodeBackend into macro
wrappers. (The macros have a potential multiple-eval risk, but none of
the existing call sites have an issue with that; one of them had such a
risk already anyway.)
* Fix failure to follow the directions when "init" fork type was added;
specifically, the errhint in forkname_to_number wasn't updated, and neither
was the SGML documentation for pg_relation_size().
* Fix tablespace-path-too-long check in CreateTableSpace() to account for
fork-name component of maximum-length pathnames. This requires putting
FORKNAMECHARS into a header file, but it was rather useless (and
actually unreferenced) where it was.
The last couple of items are potentially back-patchable bug fixes,
if anyone is sufficiently excited about them; but personally I'm not.
Per a gripe from Christoph Berg about how include/common wasn't
self-contained.
Some popen() calls were missing SYSTEMQUOTEs, which caused initdb and
pg_upgrade to fail on Windows, if the installation path contained both
spaces and @ signs.
Patch by Nikhil Deshpande. Backpatch to all supported versions.
pgss_post_parse_analyze() neglected to pass the call on to any earlier
occupant of the post_parse_analyze_hook. There are no other users of that
hook in contrib/, and most likely none in the wild either, so this is
probably just a latent bug. But it's a bug nonetheless, so back-patch
to 9.2 where this code was introduced.
Because of gcc -Wmissing-prototypes, all functions in dynamically
loadable modules must have a separate prototype declaration. This is
meant to detect global functions that are not declared in header files,
but in cases where the function is called via dfmgr, this is redundant.
Besides filling up space with boilerplate, this is a frequent source of
compiler warnings in extension modules.
We can fix that by creating the function prototype as part of the
PG_FUNCTION_INFO_V1 macro, which such modules have to use anyway. That
makes the code of modules cleaner, because there is one less place where
the entry points have to be listed, and creates an additional check that
functions have the right prototype.
Remove now redundant prototypes from contrib and other modules.
Specifically, on-stack memset() might be removed, so:
* Replace memset() with px_memset()
* Add px_memset to copy_crlf()
* Add px_memset to pgp-s2k.c
Patch by Marko Kreen
Report by PVS-Studio
Backpatch through 8.4.
Non-existent tablespace directory references can occur if user
tablespaces are created inside data directories and the data directory
is renamed in preparation for running pg_upgrade, and the symbolic links
are not updated.
Backpatch to 9.3.
We were emitting "(SELECT null::typename)", which is usually interpreted
as a scalar subselect, but not so much in the context "x = ANY(...)".
This led to remote-side parsing failures when remote_estimate is enabled.
A quick and ugly fix is to stick in an extra cast step,
"((SELECT null::typename)::typename)". The cast will be thrown away as
redundant by parse analysis, but not before it's done its job of making
sure the grammar sees the ANY argument as an a_expr rather than a
select_with_parens. Per an example from Hannu Krosing.
Add vacuumdb option --analyze-in-stages which runs ANALYZE three times
with different configuration settings, adopting the logic from the
analyze_new_cluster.sh script that pg_upgrade generates. That way,
users of pg_dump/pg_restore can also use that functionality.
Change pg_upgrade to create the script so that it calls vacuumdb instead
of implementing the logic itself.
When extracting trigrams from a regular expression for search of a GIN or
GIST trigram index, it's useful to penalize (preferentially discard)
trigrams that contain whitespace, since those are typically far more common
in the index than trigrams not containing whitespace. Of course, this
should only be a preference not a hard rule, since we might otherwise end
up with no trigrams to search for. The previous coding tended to produce
fairly inefficient trigram search sets for anchored regexp patterns, as
reported by Erik Rijkers. This patch penalizes whitespace-containing
trigrams, and also reduces the target number of extracted trigrams, since
experience suggests that the original coding tended to select too many
trigrams to search for.
Alexander Korotkov, reviewed by Tom Lane
For variadic functions (other than VARIADIC ANY), the syntaxes foo(x,y,...)
and foo(VARIADIC ARRAY[x,y,...]) should be considered equivalent, since the
former is converted to the latter at parse time. They have indeed been
equivalent, in all releases before 9.3. However, commit 75b39e790 made an
ill-considered decision to record which syntax had been used in FuncExpr
nodes, and then to make equal() test that in checking node equality ---
which caused the syntaxes to not be seen as equivalent by the planner.
This is the underlying cause of bug #9817 from Dmitry Ryabov.
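To illustrate the equivalence being restored (a made-up function, just a
sketch of the two call syntaxes):

    CREATE FUNCTION my_concat(VARIADIC parts text[]) RETURNS text
      LANGUAGE sql AS $$ SELECT string_agg(p, '') FROM unnest(parts) AS p $$;

    SELECT my_concat('a', 'b', 'c');                  -- converted by the parser
    SELECT my_concat(VARIADIC ARRAY['a', 'b', 'c']);  -- explicit form, same call

With this patch the planner again treats these two calls as equal.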
It might seem that a quick fix would be to make equal() disregard
FuncExpr.funcvariadic, but the same commit made that untenable, because
the field actually *is* semantically significant for some VARIADIC ANY
functions. This patch instead adopts the approach of redefining
funcvariadic (and aggvariadic, in HEAD) as meaning that the last argument
is a variadic array, whether it got that way by parser intervention or was
supplied explicitly by the user. Therefore the value will always be true
for non-ANY variadic functions, restoring the principle of equivalence.
(However, the planner will continue to consider use of VARIADIC as a
meaningful difference for VARIADIC ANY functions, even though some such
functions might disregard it.)
In HEAD, this change lets us simplify the decompilation logic in
ruleutils.c, since the funcvariadic/aggvariadic flag tells directly whether
to print VARIADIC. However, in 9.3 we have to continue to cope with
existing stored rules/views that might contain the previous definition.
Fortunately, this just means no change in ruleutils.c, since its existing
behavior effectively ignores funcvariadic for all cases other than VARIADIC
ANY functions.
In HEAD, bump catversion to reflect the fact that FuncExpr.funcvariadic
changed meanings; this is sort of pro forma, since I don't believe any
built-in views are affected.
Unfortunately, this patch doesn't magically fix everything for affected
9.3 users. After installing 9.3.5, they might need to recreate their
rules/views/indexes containing variadic function calls in order to get
everything consistent with the new definition. As in the cited bug,
the symptom of a problem would be failure to use a nominally matching
index that has a variadic function call in its definition. We'll need
to mention this in the 9.3.5 release notes.
contrib/test_decoding's "make check" runs two sets of tests. Unless we
specify separate output directories for each set the isolation tests
will overwrite the output from the normal regression set. Doing this
will help the buildfarm collect complete logs.
Any OS user able to access the socket can connect as the bootstrap
superuser and in turn execute arbitrary code as the OS user running the
test. Protect against that by placing the socket in the temporary data
directory, which has mode 0700 thanks to initdb. Back-patch to 8.4 (all
supported versions). The hazard remains wherever the temporary cluster
accepts TCP connections, notably on Windows.
Attempts to run "make check" from a directory with a long name will now
fail. An alternative not sharing that problem was to place the socket
in a subdirectory of /tmp, but that is only secure if /tmp is sticky.
The PG_REGRESS_SOCK_DIR environment variable is available as a
workaround when testing from long directory paths.
As a convenient side effect, this lets testing proceed smoothly in
builds that override DEFAULT_PGSOCKET_DIR. Popular non-default values
like /var/run/postgresql are often unwritable to the build user.
Security: CVE-2014-0067
The new format accepts exactly the same data as the json type. However, it is
stored in a format that does not require reparsing the original text in order
to process it, making it much more suitable for indexing and other operations.
Insignificant whitespace is discarded, and the order of object keys is not
preserved. Duplicate object keys are not kept either; the last value given for
a key is the only one stored.
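For example, the normalization is directly visible on input:

    SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
    -- result: {"a": 3, "b": 1}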
The new type has all the functions and operators that the json type has,
with the exception of the json generation functions (to_json, json_agg etc.)
and with identical semantics. In addition, there are operator classes for
hash and btree indexing, and two classes for GIN indexing, that have no
equivalent in the json type.
This feature grew out of previous work by Oleg Bartunov and Teodor Sigaev, which
was intended to provide similar facilities to a nested hstore type, but which
in the end proved to have some significant compatibility issues.
Authors: Oleg Bartunov, Teodor Sigaev, Peter Geoghegan and Andrew Dunstan.
Review: Andres Freund
This covers all the SQL-standard trigger types supported for regular
tables; it does not cover constraint triggers. The approach for
acquiring the old row mirrors that for view INSTEAD OF triggers. For
AFTER ROW triggers, we spool the foreign tuples to a tuplestore.
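A trivial usage sketch (the foreign table and trigger function names are
hypothetical):

    CREATE TRIGGER audit_remote_orders
      AFTER INSERT OR UPDATE OR DELETE ON remote_orders
      FOR EACH ROW EXECUTE PROCEDURE audit_row_change();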
This changes the FDW API contract; when deciding which columns to
populate in the slot returned from data modification callbacks, writable
FDWs will need to check for AFTER ROW triggers in addition to checking
for a RETURNING clause.
In support of the feature addition, refactor the TriggerFlags bits and
the assembly of old tuples in ModifyTable.
Ronan Dunklau, reviewed by KaiGai Kohei; some additional hacking by me.
Clear errno before calling readdir() and handle old MinGW errno bug
while adding full test coverage for readdir/closedir failures.
Backpatch through 8.4.
I discovered the hard way that on some old shells, the locution
FOO="" unset FOO
does not behave the same as
FOO=""; unset FOO
and in fact leaves FOO set to an empty string. test.sh was inconsistently
spelling it different ways on adjacent lines.
This got broken relatively recently, in commit c737a2e56, so the lack of
field reports to date doesn't represent a lot of evidence that the problem
is rare.
In b89e151054 I had assumed it was ok to use anonymous unions as
struct members, but while a longstanding extension in many compilers,
it's only been standardized in C11.
To fix, remove one of the anonymous unions which tried to hide some
implementation specific enum values and give the other a name. The
latter unfortunately requires changes in output plugins, but since the
feature has only been added a few days ago...
Andres Freund
The previous coding supposed that it could consider just a single join
condition in any one parameterized path for the foreign table. But in
reality, the parameterized-path machinery forces all join clauses that are
"movable to" the foreign table to be evaluated at that node; including
clauses that we might not consider safe to send across. Such cases would
result in an Assert failure in an assert-enabled build, and otherwise in
sending an unsafe clause to the foreign server, which might result in
errors or silently-wrong answers. A lesser problem was that the
cost/rowcount estimates generated for the parameterized path failed to
account for any additional join quals that get assigned to the scan.
To fix, rewrite postgresGetForeignPaths so that it correctly collects all
the movable quals for any one outer relation when generating parameterized
paths; we'll now generate just one path per outer relation not one per join
qual. Also fix bogus assumptions in postgresGetForeignPlan and
estimate_path_cost_size that only safe-to-send join quals will be
presented.
Based on complaint from Etsuro Fujita that the path costs were being
miscalculated, though this is significantly different from his proposed
patch.
Commit 6f37c08057 removed whitespace
from the SQL file but not the expected-output file, and commit
7e8db2dc42 changed the error message
without updating the expected outputs.
This forces an input field containing the quoted null string to be
returned as a NULL. Without this option, only unquoted null strings
behave this way. This helps where some CSV producers insist on quoting
every field, whether or not it is needed. The option takes a list of
fields, and only applies to those columns. There is an equivalent
column-level option added to file_fdw.
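A usage sketch with hypothetical table and column names:

    COPY measurements FROM '/tmp/data.csv'
      WITH (FORMAT csv, FORCE_NULL (reading, unit));

With the option, a quoted null string in the listed columns is read back
as NULL.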
Ian Barwick, with some tweaking by Andrew Dunstan, reviewed by Payal
Singh.
This feature, building on previous commits, allows the write-ahead log
stream to be decoded into a series of logical changes; that is,
inserts, updates, and deletes and the transactions which contain them.
It is capable of handling decoding even across changes to the schema
of the affected tables. The output format is controlled by a
so-called "output plugin"; an example is included. To make use of
this in a real replication system, the output plugin will need to be
modified to produce output in the format appropriate to that system,
and to perform filtering.
Currently, information can be extracted from the logical decoding
system only via SQL; future commits will add the ability to stream
changes via walsender.
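A minimal sketch of that SQL-level interface, assuming the included
test_decoding example plugin and the slot-function names used in the 9.4
documentation:

    SELECT * FROM pg_create_logical_replication_slot('test_slot', 'test_decoding');
    -- ... perform some DML ...
    SELECT * FROM pg_logical_slot_get_changes('test_slot', NULL, NULL);
    SELECT pg_drop_replication_slot('test_slot');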
Andres Freund, with review and other contributions from many other
people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Geoghegan,
Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao,
Michael Paquier, Simon Riggs, Craig Ringer, and Steve Singer.
Testing convert_to(..., 'ISO-8859-1') fails if there isn't a conversion
function available from the database encoding to ISO-8859-1. This has
been broken since day one, but the breakage was hidden by
pg_do_encoding_conversion's failure to complain, up till commit
49c817eab7.
Since the data being converted in this test is plain ASCII, no actual
conversion need happen (and if it did, it would prove little about citext
anyway). So that we still have some code coverage of the convert() family
of functions, let's switch to using convert_from, with SQL_ASCII as the
specified source encoding. Per buildfarm.
A large majority of the callers of pg_do_encoding_conversion were
specifying the database encoding as either source or target of the
conversion, meaning that we can use the less general functions
pg_any_to_server/pg_server_to_any instead.
The main advantage of using the latter functions is that they can make use
of a cached conversion-function lookup in the common case that the other
encoding is the current client_encoding. It's notationally cleaner too in
most cases, not least because of the historical artifact that the latter
functions use "char *" rather than "unsigned char *" in their APIs.
Note that pg_any_to_server will apply an encoding verification step in
some cases where pg_do_encoding_conversion would have just done nothing.
This seems to me to be a good idea at most of these call sites, though
it partially negates the performance benefit.
Per discussion of bug #9210.
The length of the output buffer was calculated based on the size of the
argument hstore. On a platform where sizeof(int) == 4, a huge argument could
make that calculation overflow, causing a too-small buffer to be allocated.
Refactor the function to use a StringInfo instead of pre-allocating the
buffer. Makes it shorter and more readable, too.
Coverity identified a number of places in which it couldn't prove that a
string being copied into a fixed-size buffer would fit. We believe that
most, perhaps all of these are in fact safe, or are copying data that is
coming from a trusted source so that any overrun is not really a security
issue. Nonetheless it seems prudent to forestall any risk by using
strlcpy() and similar functions.
Fixes by Peter Eisentraut and Jozef Mlich based on Coverity reports.
In addition, fix a potential null-pointer-dereference crash in
contrib/chkpass. The crypt(3) function is defined to return NULL on
failure, but chkpass.c didn't check for that before using the result.
The main practical case in which this could be an issue is if libc is
configured to refuse to execute unapproved hashing algorithms (e.g.,
"FIPS mode"). This ideally should've been a separate commit, but
since it touches code adjacent to one of the buffer overrun changes,
I included it in this commit to avoid last-minute merge issues.
This issue was reported by Honza Horak.
Security: CVE-2014-0065 for buffer overruns, CVE-2014-0066 for crypt()
Several functions, mostly type input functions, calculated an allocation
size such that the calculation wrapped to a small positive value when
arguments implied a sufficiently-large requirement. Writes past the end
of the inadvertent small allocation followed shortly thereafter.
Coverity identified the path_in() vulnerability; code inspection led to
the rest. In passing, add check_stack_depth() to prevent stack overflow
in related functions.
Back-patch to 8.4 (all supported versions). The non-comment hstore
changes touch code that did not exist in 8.4, so that part stops at 9.0.
Noah Misch and Heikki Linnakangas, reviewed by Tom Lane.
Security: CVE-2014-0064
We used to have externs for getopt() and its API variables scattered
all over the place. Now that we find we're going to need to tweak the
variable declarations for Cygwin, it seems like a good idea to have
just one place to tweak.
In this commit, the variables are declared "#ifndef HAVE_GETOPT_H".
That may or may not work everywhere, but we'll soon find out.
Andres Freund
postgres_fdw tended to say "unknown error" if it tried to execute a command
on an already-dead connection, because some paths in libpq just return a
null PGresult for such cases. Out-of-memory might result in that, too.
To fix, pass the PGconn to pgfdw_report_error, and look at its
PQerrorMessage() string if we can't get anything out of the PGresult.
Also, fix the transaction-exit logic to reliably drop a dead connection.
It was attempting to do that already, but it assumed that only connection
cache entries with xact_depth > 0 needed to be examined. The folly in that
is that if we fail while issuing START TRANSACTION, we'll not have bumped
xact_depth. (At least for the case I was testing, this fix masks the
other problem; but it still seems like a good idea to have the PGconn
fallback logic.)
Per investigation of bug #9087 from Craig Lucas. Backpatch to 9.3 where
this code was introduced.
The temporary statistics files don't need to be included in the backup
because they are always reset at the beginning of the archive recovery.
This patch changes pg_basebackup so that it skips all files located in
$PGDATA/pg_stat_tmp or the directory specified by the stats_temp_directory
parameter.
WalSndKill was doing things exactly backwards: it should first clear
MyWalSnd (to stop signal handlers from touching MyWalSnd->latch),
then disown the latch, and only then mark the WalSnd struct unused by
clearing its pid field.
Also, WalRcvSigUsr1Handler and worker_spi_sighup failed to preserve
errno, which is surely a requirement for any signal handler.
Per discussion of recent buildfarm failures. Back-patch as far
as the relevant code exists.
The buildfarm says commit 58274728fb doesn't
work so well on Windows. This is because the encoding part of Windows
locale names can be just a code page number, eg "1252", which we don't
consider to be a valid encoding name. Add a check to accept encoding
parts that are case-insensitively string equal; this at least ensures
that the new code doesn't reject any cases that the old code allowed.
Even though the server tries to canonicalize stored locale names, the
platform often doesn't cooperate, so it's entirely possible that one DB
thinks its locale is, say, "en_US.UTF-8" while the other has "en_US.utf8".
Rather than failing, we should try to allow this where it's clearly OK.
There is already pretty robust encoding lookup in encnames.c, so make
use of that to compare the encoding parts of the names. The locale
identifier parts are just compared case-insensitively, which we were
already doing. The major problem known to exist in the field is variant
encoding-name spellings, so hopefully this will be Good Enough. If not,
we can try being even laxer.
Pavel Raiskup, reviewed by Rushabh Lathia
Thinko in error report (and a typo in the message text, too). We're
failing anyway, but it would be good to print something useful first.
Noted while reviewing a patch to make pg_upgrade's locale code laxer.
This change allows us to eliminate the previous limit on stored query
length, and it makes the shared-memory hash table very much smaller,
allowing more statements to be tracked. (The default value of
pg_stat_statements.max is therefore increased from 1000 to 5000.)
In typical scenarios, the hash table can be large enough to hold all the
statements commonly issued by an application, so that there is little
"churn" in the set of tracked statements, and thus little need to do I/O
to the file.
To further reduce the need for I/O to the query-texts file, add a way
to retrieve all the columns of the pg_stat_statements view except for
the query text column. This is probably not of much interest for human
use but it could be exploited by programs, which will prefer using the
queryid anyway.
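A sketch of such a query-text-free call, assuming the boolean parameter is
named showtext as in the shipped 1.2 extension script:

    SELECT queryid, calls, total_time, rows
      FROM pg_stat_statements(showtext := false);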
Ordinarily, we'd need to bump the extension version number for the latter
change. But since we already advanced pg_stat_statements' version number
from 1.1 to 1.2 in the 9.4 development cycle, it seems all right to just
redefine what 1.2 means.
Peter Geoghegan, reviewed by Pavel Stehule
This makes it possible to store lwlocks as part of some other data
structure in the main shared memory segment, or in a dynamic shared
memory segment. There is still a main LWLock array and this patch does
not move anything out of it, but it provides necessary infrastructure
for doing that in the future.
This change is likely to increase the size of LWLockPadded on some
platforms, especially 32-bit platforms where it was previously only
16 bytes.
Patch by me. Review by Andres Freund and KaiGai Kohei.
GIN posting lists are now encoded using varbyte-encoding, which allows them
to fit in much smaller space than the straight ItemPointer array format used
before. The new encoding is used for both the lists stored in-line in entry
tree items, and in posting tree leaf pages.
To maintain backwards-compatibility and keep pg_upgrade working, the code
can still read old-style pages and tuples. Posting tree leaf pages in the
new format are flagged with GIN_COMPRESSED flag, to distinguish old and new
format pages. Likewise, entry tree tuples in the new format have a
GIN_ITUP_COMPRESSED flag set in a bit that was previously unused.
This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with
version 9.4 will therefore have version number 2 in the metapage, while old
pg_upgraded indexes will have version 1. The code treats them the same, but
it might come in handy in the future, if we want to drop support for the
uncompressed format.
Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.
This function provides a way of generating version 4 (pseudorandom) UUIDs
based on pgcrypto's PRNG. The main reason for doing this is that the
OSSP UUID library depended on by contrib/uuid-ossp is becoming more and
more of a porting headache, so we need an alternative for people who can't
install that. A nice side benefit though is that this implementation is
noticeably faster than uuid-ossp's uuid_generate_v4() function.
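Usage is simply (assuming the function name gen_random_uuid() as added to the
pgcrypto extension):

    CREATE EXTENSION IF NOT EXISTS pgcrypto;
    SELECT gen_random_uuid();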
Oskari Saarenmaa, reviewed by Emre Hasegeli
If --progress=2148 or higher was given, the calculation of the next time
to report overflowed, and pgbench would print a progress report very
frequently.
Kingter Wang
Kevin Grittner reports that his compiler complains about inq and outq
being possibly-uninitialized at the point where they are passed to
shm_mq_attach(). They are initialized by the call to
setup_dynamic_shared_memory, but apparently his compiler is inlining
that function and then having doubts about whether the for loop will
always execute at least once. Fix by initializing them to NULL.
This code is intended as a demonstration of how the dynamic shared
memory and dynamic background worker facilities can be used to establish
a group of cooperating processes which can coordinate their activities
using the shared memory message queue facility. By itself, the code
does nothing particularly interesting: it simply allows messages to
be passed through a loop of workers and back to the original process.
But it's a useful unit test, in addition to its demonstration value.
Allow for the possibility that folding a string to lower case makes it
longer (due to replacing a character with a longer multibyte character).
This doesn't change the number of trigrams that will be extracted, but
it does affect the required size of an intermediate buffer in
generate_trgm(). Per bug #8821 from Ufuk Kayserilioglu.
Also install some checks that the input string length is not so large
as to cause overflow in the calculations of palloc request sizes.
Back-patch to all supported versions.
pgp.h used to require including mbuf.h and px.h first. Include those in
pgp.h, so that it can be used without prerequisites. Remove mbuf.h
inclusions in .c files where mbuf.h features are not used
directly. (px.h was always used.)
Although these files get cleaned up if the test runs to completion,
a failure partway through leaves trash all over the floor. The Makefile
ought to be bright enough to get rid of it when you say "make clean".
This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()). We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.
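For example (hypothetical table and columns):

    SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY amount) AS median,
           percentile_disc(ARRAY[0.25, 0.75]) WITHIN GROUP (ORDER BY amount) AS quartiles,
           mode() WITHIN GROUP (ORDER BY category) AS most_common
      FROM orders;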
Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions. To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c. This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need. There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.
In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates. Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.
Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
Instead of changing the tuple xmin to FrozenTransactionId, the combination
of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never
set together, is now defined as HEAP_XMIN_FROZEN. A variety of previous
proposals to freeze tuples opportunistically before vacuum_freeze_min_age
is reached have foundered on the objection that replacing xmin by
FrozenTransactionId might hinder debugging efforts when things in this
area go awry; this patch is intended to solve that problem by keeping
the XID around (but largely ignoring the value to which it is set).
Third-party code that checks for HEAP_XMIN_INVALID on tuples where
HEAP_XMIN_COMMITTED might be set will be broken by this change. To fix,
use the new accessor macros in htup_details.h rather than consulting the
bits directly. HeapTupleHeaderGetXmin has been modified to return
FrozenTransactionId when the infomask bits indicate that the tuple is
frozen; use HeapTupleHeaderGetRawXmin when you already know that the
tuple isn't marked committed or frozen, or want the raw value anyway.
We currently do this in routines that display the xmin for user consumption,
in tqual.c where it's known to be safe and important for the avoidance of
extra cycles, and in the function-caching code for various procedural
languages, which shouldn't invalidate the cache just because the tuple
gets frozen.
Robert Haas and Andres Freund
Previously, lookups of non-existent user names could return "Success";
it will now return "User does not exist" by resetting errno. This also
centralizes the user name lookup code in libpgport.
Report and analysis by Nicolas Marchildon; patch by me
Integer overflow caused negative percentages and a negative remaining time
to be shown, something like this:
239300000 of 3800000000 tuples (-48%) done (elapsed 226.86 s, remaining -696.10 s).
Previous commit e5de601267 modified dblink
to ensure client encoding matched the server. However the added
PQsetClientEncoding() call added significant overhead. Restore original
performance in the common case where client encoding already matches
server encoding by doing nothing in that case. Applies to all active
branches.
Issue reported and work sponsored by Zonar Systems.
The query ID is the internal hash identifier of the statement,
and was not available in pg_stat_statements view so far.
Daniel Farina, Sameer Thakur and Peter Geoghegan, reviewed by me.
Previously missing or invalid service files returned NULL. Also fix
pg_upgrade to report "out of memory" for a null return from
PQconndefaults().
Patch by Steve Singer, rewritten by me
This function formerly crashed if called as a statement-level trigger,
or if a column-name argument wasn't given.
In passing, add the trigger name to all error messages from the function.
(None of them are expected cases, so this shouldn't pose any compatibility
risk.)
Marc Cousin, reviewed by Sawada Masahiko
The command we're telling people to type needs to include double-quoting
around the unfortunately-chosen extension name. Twiddle the textual
quoting so that it looks somewhat sane. Per gripe from roadrunner6.
This patch adds the ability to write TABLE( function1(), function2(), ...)
as a single FROM-clause entry. The result is the concatenation of the
first row from each function, followed by the second row from each
function, etc; with NULLs inserted if any function produces fewer rows than
others. This is believed to be a much more useful behavior than what
Postgres currently does with multiple SRFs in a SELECT list.
This syntax also provides a reasonable way to combine use of column
definition lists with WITH ORDINALITY: put the column definition list
inside TABLE(), where it's clear that it doesn't control the ordinality
column as well.
Also implement SQL-compliant multiple-argument UNNEST(), by turning
UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).
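For example, the multi-argument UNNEST form behaves like this (NULLs pad the
shorter array):

    SELECT * FROM unnest(ARRAY[1, 2, 3], ARRAY['a', 'b']) AS t(n, label);
    -- yields (1,'a'), (2,'b'), (3,NULL)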
The SQL standard specifies TABLE() with only a single function, not
multiple functions, and it seems to require an implicit UNNEST() which is
not what this patch does. There may be something wrong with that reading
of the spec, though, because if it's right then the spec's TABLE() is just
a pointless alternative spelling of UNNEST(). After further review of
that, we might choose to adopt a different syntax for what this patch does,
but in any case this functionality seems clearly worthwhile.
Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
significantly revised by me
This only affects upgrades from 8.3 currently, and is harmless as the
child just generates an error in the script, but we should get it right
in case we ever need this for more complex uses.
Per report from Peter Eisentraut
Previously, pg_upgrade would abort copy_file() on a short write without
setting errno, which the caller would report as an error with the
message "Success". We assume ENOSPC in that case, as we do elsewhere in
the code. Also set errno in some other error cases in copy_file() to
avoid bogus "Success" error messages.
This was broken in 6b711cf37c, so 9.2 and
before are OK.
pgbench formerly failed on lines longer than BUFSIZ, unexpectedly
splitting them into multiple commands. Allow it to work with any
length of input line.
Sawada Masahiko
The NetBSD shell apparently returns non-zero from an unset command if
the variable is already unset. This matters when, as in pg_upgrade's
test.sh, we are working under 'set -e'. To protect against this, we
first set the PG variables to an empty string before unsetting them
completely.
Error found on buildfarm member coypu, solution from Rémi Zara.
The previous commit modified the test case, but I didn't update the cube.out
expected output file at the time because it was not needed on the
platforms I have easy access to. Buildfarm animal 'dugong', running
"Debian 4.0 icc 10.1.011 ia64", has now gone red because of that, so update
it now.
Also adjust cube_3.out. According to git history, it was added to support
64-bit MinGW. There is no such animal in the buildfarm, so I'm doing this
blindly, but it was added quite recently so maybe someone still cares.
If the lower left and upper right corners of a cube are the same, set a
flag in the cube header, and only store one copy of the coordinates. That
cuts the on-disk size into half for the common case that the cube datatype
is used to represent points rather than boxes.
The new format is backwards-compatible with the old one, so pg_upgrade
still works. However, to get the space savings, the data needs to be
rewritten. A simple VACUUM FULL or REINDEX is not enough, as the old
Datums will just be moved to the new heap/index as is. A pg_dump and
reload, or something similar like casting to text and back, will do the
trick.
This patch deliberately doesn't update all the alternative expected output
files, as I don't have access to machines that produce those outputs. I'm
not sure if they are still relevant, but if they are, the buildfarm will
tell us and produce the diff required to fix it. If none of the buildfarm
animals need them, they should be removed altogether.
Patch by Stas Kelvich.
Add asprintf(), pg_asprintf(), and psprintf() to simplify string
allocation and composition. Replacement implementations taken from
NetBSD.
Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Asif Naeem <anaeem.it@gmail.com>
REFRESH MATERIALIZED VIEW CONCURRENTLY was broken for any matview
containing a column of a type without a default btree operator
class. It also did not produce results consistent with a non-
concurrent REFRESH or a normal view if any column was of a type
which allowed user-visible differences between values which
compared as equal according to the type's default btree opclass.
Concurrent matview refresh was modified to use the new operators
to solve these problems.
Documentation was added for record comparison, both for the
default btree operator class for record, and the newly added
operators. Regression tests now check for proper behavior both
for a matview with a box column and a matview containing a citext
column.
Reviewed by Steve Singer, who suggested some of the doc language.
Isolate transaction latency (elapsed time between submitting first
command and receiving response to last command) from client-side delays
pertaining to the --rate schedule. Under --rate, report schedule lag as
defined in the documentation. Report latency standard deviation
whenever we collect the measurements to do so. All of these changes
affect --progress messages and the final report.
Fabien COELHO, reviewed by Pavel Stehule.
With the PGXS boilerplate in place, pg_xlogdump currently fails with an
ominous error message that certain targets cannot be built because
certain files do not exist. Remove that and instead throw a quick error
message alerting the user of the actual problem, which should be easier
to diagnose than the status quo.
Andres Freund
This should have been done when the json functionality was added to
hstore in 9.3.0. To handle this correctly, the upgrade script therefore
uses conditional logic by using plpgsql in a DO statement to add the two
new functions and the new cast. If hstore_to_json_loose is detected as
already present and dependent on the hstore extension nothing is done.
This will require that the database be loaded with plpgsql.
People who have installed the earlier and spurious 1.1 version of hstore
will need to do:
ALTER EXTENSION hstore UPDATE;
to pick up the new functions properly.
Previously a one-dimensional empty array was returned, but its text
representation matched a zero-dimensional array, and there is no way to
dump/reload a one-dimensional empty array.
BACKWARD INCOMPATIBILITY
Per report from elein
Using the infrastructure provided by this patch, it's possible either
to wait for the startup of a dynamically-registered background worker,
or to poll the status of such a worker without waiting. In either
case, the current PID of the worker process can also be obtained.
As usual, worker_spi is updated to demonstrate the new functionality.
Patch by me. Review by Andres Freund.
The planner largely failed to consider the possibility that a
PlaceHolderVar's expression might contain a lateral reference to a Var
coming from somewhere outside the PHV's syntactic scope. We had a previous
report of a problem in this area, which I tried to fix in a quick-hack way
in commit 4da6439bd8, but Antonin Houska
pointed out that there were still some problems, and investigation turned
up other issues. This patch largely reverts that commit in favor of a more
thoroughly thought-through solution. The new theory is that a PHV's
ph_eval_at level cannot be higher than its original syntactic level. If it
contains lateral references, those don't change the ph_eval_at level, but
rather they create a lateral-reference requirement for the ph_eval_at join
relation. The code in joinpath.c needs to handle that.
Another issue is that createplan.c wasn't handling nested PlaceHolderVars
properly.
In passing, push knowledge of lateral-reference checks for join clauses
into join_clause_is_movable_to. This is mainly so that FDWs don't need
to deal with it.
This patch doesn't fix the original join-qual-placement problem reported by
Jeremy Evans (and indeed, one of the new regression test cases shows the
wrong answer because of that). But the PlaceHolderVar problems need to be
fixed before that issue can be addressed, so committing this separately
seems reasonable.
These modules used the YYPARSE_PARAM macro, which has been deprecated
by the bison folk since 1.875, and which they finally removed in 3.0.
Adjust the code to use the replacement facility, %parse-param, which
is a much better solution anyway since it allows specification of the
type of the extra parser parameter. We can thus get rid of a lot of
unsightly casting.
Back-patch to all active branches, since somebody might try to build
a back branch with up-to-date tools.
Pg_Upgrade cannot write the command string to the log file and then call
system() to write to the same file without causing occasional file-share
errors on Windows. So instead, write the command string to the log file
after system(), in those cases.
Backpatch to 9.3.
Tuples belonging to uncommitted transactions should not be
counted as dead.
This is arguably a bug fix that should be back-patched, but
as no one ever noticed until it came time to try to get rid
of SnapshotNow, I'm only doing this in master for now.
Since pg_upgrade -j on Windows uses threads, calling umask()
before/after opening a file via fopen_priv() is no longer possible, so
set umask() as we enter the thread-creating loop, and reset it on exit.
Also adjust internal fopen_priv() calls to just use fopen().
Backpatch to 9.3beta.
This fixes the problem of passing the wrong function pointer when doing
parallel copy/link operations on Windows.
Backpatched to 9.3beta.
Found and patch supplied by Andrew Dunstan
This throttles the transaction rate to a specified target tps, rather than
running at maximum speed. Patch contributed by Fabien COELHO, reviewed by Greg Smith,
and slight editing by me.
Per discussion, it's desirable to eliminate all remaining uses of
SnapshotNow, because it has unpleasant semantics: race conditions
can result in seeing multiple versions of a concurrently updated
row, or none at all. By using GetActiveSnapshot() here, callers
will see exactly those rows that would have been visible if the
invoking query had scanned the table using, for example, a SELECT
statement.
This is slightly different from the old behavior, because commits
that happen concurrently with the scan will not affect the results.
In REPEATABLE READ or SERIALIZABLE modes, where transaction
snapshots are used, commits that have happened since the start of
the transaction will also not affect the results. It is hoped
that this minor incompatibility will be thought an improvement,
or at least no worse than what we did before.
Per discussion on pgsql-hackers, these aren't really needed. Interim
versions of the background worker patch had the worker starting with
signals already unblocked, which would have made this necessary.
But the final version does not, so we don't really need it; and it
doesn't work well with the new facility for starting dynamic background
workers, so just rip it out.
Also per discussion on pgsql-hackers, back-patch this change to 9.3.
It's best to get the API break out of the way before we do an
official release of this facility, to avoid more pain for extension
authors later.
Previously, these functions took a HeapTupleHeader, but upcoming
patches for logical replication will introduce a new snapshot
type under which the tuple's TID will be used to lookup (CMIN, CMAX)
for visibility determination purposes. This makes that information
available. Code churn is minimal since HeapTupleSatisfiesVisibility
took the HeapTuple anyway, and dereferenced it before calling the
satisfies function.
Independently of logical replication, this allows t_tableOid and
t_self to be cross-checked via assertions in tqual.c. This seems
like a useful way to make sure that all callers are setting these
values properly, which has been previously put forward as
desirable.
Andres Freund, reviewed by Álvaro Herrera
This allows us to specify the target relation in several ways ('relname',
'schemaname.relname', or OID) in all pgstattuple functions.
pgstatindex() and pg_relpages() previously could not accept an OID as the
argument.
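For example (hypothetical relation names), the new regclass-argument form:

    SELECT * FROM pgstattuple('public.orders'::regclass);
    SELECT * FROM pgstatindex('orders_pkey'::regclass);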
Per discussion on -hackers, we decided to keep two types of interfaces,
with regclass-type and TEXT-type argument, for each pgstattuple
function because of the backward-compatibility issue. The functions
which have TEXT-type argument will be deprecated in the future release.
Patch by Satoshi Nagayasu, reviewed by Rushabh Lathia and Fujii Masao.
This is SQL-standard with a few extensions, namely support for
subqueries and outer references in clause expressions.
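A simple sketch with hypothetical table and column names:

    SELECT count(*) AS total,
           count(*) FILTER (WHERE status = 'error') AS errors
      FROM events;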
catversion bump due to change in Aggref and WindowFunc.
David Fetter, reviewed by Dean Rasheed.
There is a new API, RegisterDynamicBackgroundWorker, which allows
an ordinary user backend to register a new background worker during
normal running. This means that it's no longer necessary for all
background workers to be registered during processing of
shared_preload_libraries, although the option of registering workers
at that time remains available.
When a background worker exits and will not be restarted, the
slot previously used by that background worker is automatically
released and becomes available for reuse. Slots used by background
workers that are configured for automatic restart can't (yet) be
released without shutting down the system.
This commit adds a new source file, bgworker.c, and moves some
of the existing control logic for background workers there.
Previously, there was little enough logic that it made sense to
keep everything in postmaster.c, but not any more.
This commit also makes the worker_spi contrib module into an
extension and adds a new function, worker_spi_launch, which can
be used to demonstrate the new facility.
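For example, the updated contrib module can be exercised roughly like
this (assuming the extension builds and installs normally):

    CREATE EXTENSION worker_spi;
    -- start a dynamic background worker; returns the new worker's PID
    SELECT worker_spi_launch(1);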
This prevents the client from gobbling up too much memory when the
number of large objects to be removed is very large.
Andrew Dunstan, reviewed by Josh Kupershmidt
Treat a TOAST index just the same as a normal one and get the OID
of the TOAST index from pg_index rather than pg_class.reltoastidxid.
This change allows us to handle multiple TOAST indexes, which is
required infrastructure for the upcoming
REINDEX CONCURRENTLY feature.
Patch by Michael Paquier, reviewed by Andres Freund and me.
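As a rough illustration of the new lookup path ('mytable' is a
placeholder), a relation's TOAST index(es) can now be found through
pg_index like any other index:

    SELECT indexrelid::regclass
    FROM pg_index
    WHERE indrelid = (SELECT reltoastrelid
                      FROM pg_class
                      WHERE oid = 'mytable'::regclass);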
On the command line, GUC option strings are handled by the guc parser,
not by the shell parser, so '' is the proper way to represent a
zero-length string. This reverts commit
3132a9b7ab.
SnapshotNow scans have the undesirable property that, in the face of
concurrent updates, the scan can fail to see either the old or the new
versions of the row. In many cases, we work around this by requiring
DDL operations to hold AccessExclusiveLock on the object being
modified; in some cases, the existing locking is inadequate and random
failures occur as a result. This commit doesn't change anything
related to locking, but will hopefully pave the way to allowing lock
strength reductions in the future.
The major issue that has held us back from making this change in the past
is that taking an MVCC snapshot is significantly more expensive than
using a static special snapshot such as SnapshotNow. However, testing
of various worst-case scenarios reveals that this problem is not
severe except under fairly extreme workloads. To mitigate those
problems, we avoid retaking the MVCC snapshot for each new scan;
instead, we take a new snapshot only when invalidation messages have
been processed. The catcache machinery already requires that
invalidation messages be sent before releasing the related heavyweight
lock; else other backends might rely on locally-cached data rather
than scanning the catalog at all. Thus, making snapshot reuse
dependent on the same guarantees shouldn't break anything that wasn't
already subtly broken.
Patch by me. Review by Michael Paquier and Andres Freund.
Previous code had old/new prefixes on option values, e.g.
--old-datadir=OLDDATADIR. Remove them, for simplicity; now:
--old-datadir=DATADIR. Also update docs to do the same.
Change -u (user) option to -U, for consistency with other tools like
pg_dump and psql. Also expand --user to --username, again for
consistency.
BACKWARD INCOMPATIBILITY
Extend the FDW API (which we already changed for 9.3) so that an FDW can
report whether specific foreign tables are insertable/updatable/deletable.
The default assumption continues to be that they're updatable if the
relevant executor callback function is supplied by the FDW, but finer
granularity is now possible. As a test case, add an "updatable" option to
contrib/postgres_fdw.
This patch also fixes the information_schema views, which previously did
not think that foreign tables were ever updatable, and fixes
view_is_auto_updatable() so that a view on a foreign table can be
auto-updatable.
initdb forced due to changes in information_schema views and the functions
they rely on. This is a bit unfortunate to do post-beta1, but if we don't
change this now then we'll have another API break for FDWs when we do
change it.
Dean Rasheed, somewhat editorialized on by Tom Lane
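For illustration (server and table names are hypothetical), the new
postgres_fdw option can be set per server or per table:

    ALTER SERVER remote_srv OPTIONS (ADD updatable 'false');
    ALTER FOREIGN TABLE remote_accounts OPTIONS (ADD updatable 'false');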
Autovacuum occurring while the test runs could allow some of the inserts to
go into recycled space, thus changing the output ordering of later queries.
While we could complicate those queries to force sorting of their output
rows, it doesn't seem like that would make the test better in any
meaningful way, and conceivably it could hide unexpected diffs. Instead,
tweak the affected queries so that the inserted rows aren't updated by the
following UPDATE. Per buildfarm.
Make slightly better decisions about indentation than what pgindent
is capable of. Mostly breaking out long function calls into one
line per argument, with a few other minor adjustments.
No functional changes - all whitespace.
pgindent ran cleanly (didn't change anything) after.
Passes all regressions.
The behavior is that the required sequence is created locally, which is
appropriate because the default expression will be evaluated locally.
Per gripe from Brad Nicholson that this case was refused with a confusing
error message. We could have improved the error message but it seems
better to just allow the case.
Also, remove ALTER TABLE's arbitrary prohibition against being applied to
foreign tables, which was pretty inconsistent considering we allow it for
views, sequences, and other relation types that aren't even called tables.
This is needed to avoid breaking pg_dump, which sometimes emits column
defaults using separate ALTER TABLE commands. (I think this can happen
even when the default is not associated with a sequence, so that was a
pre-existing bug once we allowed column defaults for foreign tables.)
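A minimal sketch of what is now accepted (server, table, and option
values are hypothetical):

    CREATE FOREIGN TABLE ft_events (
        id   serial,
        note text
    ) SERVER remote_srv OPTIONS (table_name 'events');

    -- ALTER TABLE may now be applied to foreign tables as well,
    -- e.g. as pg_dump does when emitting column defaults separately
    ALTER TABLE ft_events ALTER COLUMN note SET DEFAULT 'n/a';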
Previously, the port number used in this test script was hard-wired at
pg_upgrade's default of 50432; which is not so great because parallel build
runs might conflict. Commit 3d53173e20
removed this setting for the postmasters started by the script proper
(not by pg_upgrade), which didn't do anything to fix that problem and also
guaranteed a failure if there was a live postmaster at the build's default
port number. Instead, select a non-conflicting temporary port number in
the same way that pg_regress.c does. (Its method isn't entirely
bulletproof, but given the lack of complaints I'm not going to worry
about that today.)
In passing, unset MAKEFLAGS and MAKELEVEL to avoid problems with the
script's internal invocations of make, for the same reason pg_regress.c
does: it could cause problems in a parallel make.
This helps guard against changes in the set of reserved keywords from
one version to another. In theory it should only be an issue if we
de-reserve a keyword in a newer release, since that can create the type
of problem shown in bug #8128.
Back-patch to 9.1 where the --quote-all-identifiers option was added.
This code was left over from when pg_upgrade paid attention to PGPORT.
Now it only affects the regression test run before pg_upgrade's own test
run. You can still set PGPORT for that, but there is no reason
to have the test driver default it to 50432.
Choose a saner ordering of parameters (adding a new input param after
the output params seemed a bit random), update the function's header
comment to match reality (cmon folks, is this really that hard?),
get rid of useless and sloppily-defined distinction between
PROCESS_UTILITY_SUBCOMMAND and PROCESS_UTILITY_GENERATED.
The initial coding just descended the index if any of the target trigrams
were possibly present at the next level down. But actually we can apply
trigramsMatchGraph() so as to take advantage of AND requirements when there
are some. The input data might contain false positive matches, but that
can only result in a false positive result, not false negative, so it's
safe to do it this way.
Alexander Korotkov
This adds an idempotent mode in which the start and stop actions exit
successfully if the server was already started or stopped, respectively.
This changes the default behavior of the start action: Before, if the
server was already running, it would print a message and succeed. Now,
that situation will result in an error. When running in idempotent
mode, no message is printed and pg_ctl exits successfully.
It was considered to just make the idempotent behavior the default and
only option, but pg_upgrade needs the old behavior.
This wasn't addressed in the original patch, but it doesn't take very
much additional code to cover the case, so let's get it done.
Since pg_trgm 1.1 hasn't been released yet, I just changed the definition
of what's in it, rather than inventing a 1.2.
Make use of some GUC variables, and add SIGHUP handling to reload
the config file. Patch submitted by Guillaume Lelarge.
Also, report to pg_stat_activity. Per report from Marc Cousin, add
setting of statement start time.
This works by extracting trigrams from the given regular expression,
in generally the same spirit as the previously-existing support for
LIKE searches, though of course the details are far more complicated.
Currently, only GIN indexes are supported. We might be able to make
it work with GiST indexes later.
The implementation includes adding API functions to backend/regex/
to provide a view of the search NFA created from a regular expression.
These functions are meant to be generic enough to be supportable in
a standalone version of the regex library, should that ever happen.
Alexander Korotkov, reviewed by Heikki Linnakangas and Tom Lane
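For example (table and index names are hypothetical):

    CREATE EXTENSION pg_trgm;
    CREATE INDEX doc_body_trgm_idx ON doc USING gin (body gin_trgm_ops);
    -- regular-expression searches can now use the trigram GIN index
    SELECT * FROM doc WHERE body ~ 'postgres(ql)? 9\.3';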
contrib/pg_trgm's make_trigrams() was coded to ignore multibyte character
boundaries and just make trigrams from bytes if USE_WIDE_UPPER_LOWER wasn't
defined. This is a bit odd, since there's no obvious reason why trigram
compaction rules should depend on the presence of towlower() and friends.
What's more, there was an Assert() that would fail if that code path was
fed any multibyte characters.
We need to do something about this since the pending regex-indexing patch
has an assumption that you get just one "trgm" from any three characters.
The best solution seems to be to remove the USE_WIDE_UPPER_LOWER
dependency, which shouldn't really have been there in the first place.
The second loop in make_trigrams() is now just a fast path and not a
potentially incompatible algorithm.
If there is anybody still using Postgres on machines without wcstombs() or
towlower(), and they have non-ASCII data indexed by pg_trgm, they'll need
to REINDEX those indexes after pg_upgrade to 9.3, else searches may fail
incorrectly. It seems likely that there are no such installations, though.
In passing, rename cnt_trigram to compact_trigram, which seems to better
describe its functionality, and improve make_trigrams' test for whether it
has to use the slow path or not (per a suggestion from Alexander Korotkov).
Now that pg_dump no longer dumps invalid indexes, per commit
683abc73df, have pg_upgrade also skip
them. Previously pg_upgrade threw an error if invalid indexes existed.
Backpatch to 9.2, 9.1, and 9.0 (where pg_upgrade was added to git)
Windows sometimes gets upset if we rename a large directory and then try
to use the old name quickly, as seen in occasional buildfarm failures.
So we avoid that by building the old version in the intended
destination in the first place instead of renaming it, similar to the
change made for the same reason in commit b7f8465c.
The main change here is to call security_compute_create_name_raw()
rather than security_compute_create_raw(). This ups the minimum
requirement for libselinux from 2.0.99 to 2.1.10, but it looks
like most distributions will have picked that up before 9.3 is out.
KaiGai Kohei
One of the use-cases for postgres_fdw is extracting data from older PG
servers, so cross-version compatibility is important. Document what we
can do here, and further annotate some of the coding choices that create
compatibility constraints. In passing, remove one unnecessary
incompatibility with old servers, namely assuming that we didn't need to
quote the timezone name 'UTC'.
If the remote database's settings of these GUCs are different from ours,
ambiguous datetime values may be read incorrectly. To fix, temporarily
adopt the remote server's settings while we ingest a query result.
This is not a complete fix, since it doesn't do anything about ambiguous
values in commands sent to the remote server; but there seems little we
can do about that end of it given dblink's entirely textual API for
transmitted commands.
Back-patch to 9.2. The hazard exists in all versions, but this patch
would need more work to apply before 9.2. Given the lack of field
complaints about this issue, it doesn't seem worth the effort at present.
Daniel Farina and Tom Lane
Checksums are set immediately prior to flush out of shared buffers
and checked when pages are read in again. Hint bit setting will
require full page write when block is dirtied, which causes various
infrastructure changes. Extensive comments, docs and README.
A WARNING message is thrown if a checksum fails on a non-all-zeroes page;
an ERROR is also thrown, but it can be disabled with ignore_checksum_failure = on.
Feature enabled by an initdb option, since transition from option off
to option on is long and complex and has not yet been implemented.
Default is not to use checksums.
Checksum used is WAL CRC-32 truncated to 16-bits.
Simon Riggs, Jeff Davis, Greg Smith
Wide input and assistance from many community members. Thank you.
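A hedged usage note: the feature is selected at initdb time via its
data-checksums option, and the new GUC can be used to downgrade failures
while trying to salvage data, e.g.:

    -- downgrade checksum failures from ERROR to WARNING (superuser only),
    -- e.g. while inspecting or dumping a damaged cluster
    SET ignore_checksum_failure = on;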
This should provide some marginal overall savings, since it surely takes
many more cycles for the remote server to deal with the NULL columns than
it takes for postgres_fdw not to emit them. But really the reason is to
keep the emitted queries from looking quite so silly ...
I wasn't going to ship this without having at least some example of how
to do that. This version isn't terribly bright; in particular it won't
consider any combinations of multiple join clauses. Given the cost of
executing a remote EXPLAIN, I'm not sure we want to be very aggressive
about doing that, anyway.
In support of this, refactor generate_implied_equalities_for_indexcol
so that it can be used to extract equivalence clauses that aren't
necessarily tied to an index.
Remove use of PageSetTLI() from all page manipulation functions
and adjust README to indicate change in the way we make changes
to pages. Repurpose those bytes into the pd_checksum field and
explain how that works in comments about page header.
Refactoring ahead of actual feature patch which would make use
of the checksum field, arriving later.
Jeff Davis, with comments and doc changes by Simon Riggs
Direction suggested by Robert Haas; many others providing
review comments.
The semantics of signal(2) are more variable than one could wish; in
particular, on strict-POSIX platforms the signal handler will be reset
to SIG_DFL when the signal is delivered. This demonstrably breaks
pg_test_fsync's use of SIGALRM. The other changes I made are not
absolutely necessary today, because the called handlers all exit the
program anyway. But it seems like a good general practice to use
pqsignal() exclusively in Postgres code, now that we have it available
everywhere.
We had two copies of this function in the backend and libpq, which was
already pretty bogus, but it turns out that we need it in some other
programs that don't use libpq (such as pg_test_fsync). So put it where
it probably should have been all along. The signal-mask-initialization
support in src/backend/libpq/pqsignal.c stays where it is, though, since
we only need that in the backend.
Clarify the docs explaining what commit_delay does, and add a
recommendation about a useful value for it, namely half of the single-page
fsync time reported by pg_test_fsync. This is informed by testing of
the new-in-9.3 implementation of commit_delay; in prior versions it
was far harder to arrive at a useful setting.
In passing, do some wordsmithing and markup-fixing in the same general
area.
Also, change pg_test_fsync's default time-per-test from 2 seconds to 5.
The old value was about the minimum at which the results could be taken
seriously at all, and so seems a tad optimistic as a default.
Peter Geoghegan, reviewed by Noah Misch; some additional editing by me
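A hedged worked example of the recommendation: if pg_test_fsync reports
roughly 2000 microseconds per single-page fsync, half of that would be:

    -- commit_delay is in microseconds; requires suitable privileges to set
    SET commit_delay = 1000;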
Treat expressions as being remotely executable only if all collations used
in them are determined by Vars of the foreign table. This means that, if
the foreign server gets different answers than we do, it's the user's fault
for not having marked the foreign table columns with collations equivalent
to the remote table's. This rule allows most simple expressions such as
"var < 'constant'" to be sent to the remote side, because the constant
isn't determining the collation (the Var's collation would win). There's
still room for improvement, but it's hard to see how to do it without a
lot more knowledge and/or assumptions about what the remote side will do.
Adopt the position that only locally-defined defaults matter. Any defaults
defined in the remote database do not affect insertions performed through
a foreign table (unless they are for columns not known to the foreign
table). While it'd arguably be more useful to permit remote defaults to be
used, making that work in a consistent fashion requires far more work than
seems possible for 9.3.
A test intended to provoke an error on the remote side was coded in such
a way that multiple rows should be updated, so the output would vary
depending on which one was processed first. Per buildfarm.
For datatypes whose output formatting depends on one or more GUC settings,
we have to worry about whether the other server will interpret the value
the same way it was meant. pg_dump has been aware of this hazard for a
long time, but postgres_fdw needs to deal with it too. To fix data
retrieval from the remote server, set the necessary remote GUC settings at
connection startup. (We were already assuming that settings made then
would persist throughout the remote session.) To fix data transmission to
the remote server, temporarily force the relevant GUCs to the right values
when we're about to convert any data values to text for transmission.
This is all pretty grotty, and not very cheap either. It's tempting to
think of defining one uber-GUC that would override any settings that might
render printed data values unportable. But of course, older remote servers
wouldn't know any such thing and would still need this logic.
While at it, revert commit f7951eef89, since
this provides a real fix. (The timestamptz given in the error message
returned from the "remote" server will now reliably be shown in UTC.)
This adds the following:
json_agg(anyrecord) -> json
to_json(any) -> json
hstore_to_json(hstore) -> json (also used as a cast)
hstore_to_json_loose(hstore) -> json
The last provides heuristic treatment of numbers and booleans.
Also, in json generation, if any non-builtin type has a cast to json,
that function is used instead of the type's output function.
Andrew Dunstan, reviewed by Steve Singer.
Catalog version bumped.
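For illustration (the hstore examples assume the hstore extension is
installed):

    SELECT json_agg(t) FROM (VALUES (1, 'a'), (2, 'b')) AS t(id, label);
    SELECT to_json('Hello, "world"'::text);
    -- the loose form heuristically treats '1' as a number and 't' as a boolean
    SELECT hstore_to_json_loose('a=>1, b=>t, c=>x'::hstore);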
We probably need to tell the remote server to use specific timezone and
datestyle settings, and maybe other things. But for now let's just hack
the postgres_fdw regression test to not provoke failures when run in
non-EST5EDT environments. Per buildfarm.
This patch adds the core-system infrastructure needed to support updates
on foreign tables, and extends contrib/postgres_fdw to allow updates
against remote Postgres servers. There's still a great deal of room for
improvement in optimization of remote updates, but at least there's basic
functionality there now.
KaiGai Kohei, reviewed by Alexander Korotkov and Laurenz Albe, and rather
heavily revised by Tom Lane.
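For illustration, ordinary DML now works against a postgres_fdw foreign
table (names are hypothetical):

    INSERT INTO remote_accounts (id, balance) VALUES (42, 0);
    UPDATE remote_accounts SET balance = balance + 100 WHERE id = 42;
    DELETE FROM remote_accounts WHERE id = 42;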
A materialized view has a rule just like a view and a heap and
other physical properties like a table. The rule is only used to
populate the table; references in queries refer to the
materialized data.
This is a minimal implementation, but should still be useful in
many cases. Currently data is only populated "on demand" by the
CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements.
It is expected that future releases will add incremental updates
with various timings, and that a more refined concept of defining
what is "fresh" data will be developed. At some point it may even
be possible to have queries use a materialized view in place of
references to underlying tables, but that requires the other
above-mentioned features to be working first.
Much of the documentation work by Robert Haas.
Review by Noah Misch, Thom Brown, Robert Haas, Marko Tiikkaja
Security review by KaiGai Kohei, with a decision on how best to
implement sepgsql still pending.
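A minimal sketch of the on-demand population described above (names are
hypothetical):

    CREATE MATERIALIZED VIEW order_totals AS
        SELECT customer_id, sum(amount) AS total
        FROM orders
        GROUP BY customer_id;

    -- data is repopulated only when explicitly requested
    REFRESH MATERIALIZED VIEW order_totals;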
This includes backend "COPY TO/FROM PROGRAM '...'" syntax, and corresponding
psql \copy syntax. Like with reading/writing files, the backend version is
superuser-only, and in the psql version, the program is run in the client.
In passing, the psql \copy STDIN/STDOUT syntax is subtly changed: if
stdin/stdout is quoted, it's now interpreted as a filename. For example,
"\copy foo from 'stdin'" now reads from a file called 'stdin', not from
standard input. Before this, there was no way to specify a filename called
stdin, stdout, pstdin or pstdout.
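For illustration (table, path, and command are hypothetical):

    -- backend form: superuser only, the program runs on the server
    COPY mytable FROM PROGRAM 'gunzip -c /srv/data/mytable.dat.gz';
    -- psql form: the program runs on the client
    \copy mytable from program 'gunzip -c mytable.dat.gz'
    -- a quoted 'stdin' now means a file literally named stdin
    \copy mytable from 'stdin'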
This creates a new function in pgport, wait_result_to_str(), which can
be used to convert the exit status of a process, as returned by wait(3),
to a human-readable string.
Etsuro Fujita, reviewed by Amit Kapila.
This program relies on rm_desc backend routines and the xlogreader
infrastructure to emit human-readable rendering of WAL records.
Author: Andres Freund, with many reworks by Álvaro
Reviewed (in a much earlier version) by Peter Eisentraut
Include eval costs of local conditions in remote-estimate mode, and don't
assume the remote eval cost is zero in local-estimate mode. (The best
we can do with that at the moment is to assume a seqscan, which may well
be wildly pessimistic ... but zero won't do at all.)
To get a reasonable local estimate, we need to know the relpages count
for the remote rel, so improve the ANALYZE code to fetch that rather
than just setting the foreign table's relpages field to zero.
On reflection this method seems to be exposing an unreasonable amount of
implementation detail. It wouldn't matter when talking to a remote server
of the identical Postgres version, but it seems likely to make things worse
not better if the remote is a different version with different casting
infrastructure. Instead adopt ruleutils.c's policy of regurgitating the
cast as it was originally specified; including not showing it at all, if
it was implicit to start with. (We must do that because for some datatypes
explicit and implicit casts have different semantics.)
The only place we depended on that was in sending numeric type OIDs in
PQexecParams; but we can replace that usage with explicitly casting
each Param symbol in the query string, so that the types are specified
to the remote by name not OID. This makes no immediate difference but
will be essential if we ever hope to support use of non-builtin types.
Set the remote session's search path to exactly "pg_catalog" at session
start, then schema-qualify only names that aren't in that schema. This
greatly reduces clutter in the generated SQL commands, as seen in the
regression test changes. Per discussion.
Also, rethink use of FirstNormalObjectId as the "built-in object" cutoff
--- FirstBootstrapObjectId is safer, since the former will accept
objects in information_schema for instance.
There's still a lot of room for improvement, but it basically works,
and we need this to be present before we can do anything much with the
writable-foreign-tables patch. So let's commit it and get on with testing.
Shigeru Hanada, reviewed by KaiGai Kohei and Tom Lane
If users create tablespaces inside the old cluster directory, it is
impossible for the delete script to delete _only_ the old cluster files,
so don't create a script in that case, and issue a message to the user.
Cases such as similarity('', '') produced a NaN result due to computing
0/0. Per discussion, make it return zero instead.
This appears to be the basic cause of bug #7867 from Michele Baravalle,
although it remains unclear why her installation doesn't think Cyrillic
letters are letters.
Back-patch to all active branches.
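For example (requires the pg_trgm extension):

    SELECT similarity('', '');   -- previously NaN, now 0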
libpgcommon is a new static library to allow sharing code among the
various frontend programs and backend; this lets us eliminate duplicate
implementations of common routines. We avoid libpgport, because that's
intended as a place for porting issues; per discussion, it seems better
to keep them separate.
The first use case, and the only implemented by this patch, is pg_malloc
and friends, which many frontend programs were already using.
At the same time, we can use this to provide palloc emulation functions
for the frontend; this way, some palloc-using files in the backend can
also be used by the frontend cleanly. To do this, we change palloc() in
the backend to be a function instead of a macro on top of
MemoryContextAlloc(). This was previously believed to cause loss of
performance, but this implementation has been tweaked by Tom and Andres
so that on modern compilers it provides a slight improvement over the
previous one.
This lets us clean up some places that were already doing this with
localized hacks.
Most of the pg_malloc/palloc changes in this patch were authored by
Andres Freund. Zoltán Böszörményi also independently provided a form of
that. libpgcommon infrastructure was authored by Álvaro.
The previous coding supposed that the first differing bytes in two varlena
datums must have the same sign difference as their overall comparison
result. This is obviously bogus for text strings in non-C locales, and
probably wrong for numeric, and even for bytea I think it was wrong on
machines where char is signed. When the assumption failed, the function
could deliver a zero or negative penalty in situations where such a result
is quite ridiculous, leading the core GiST code to make very bad page-split
decisions.
To fix, take the absolute values of the byte-level differences. Also,
switch the code to using unsigned char not just char, so that the behavior
will be consistent whether char is signed or not.
Per investigation of a trouble report from Tomas Vondra. Back-patch to all
supported branches.
gbt_var_bin_union() failed to do the right thing when the existing range
needed to be widened at both ends rather than just one end. This could
result in an invalid index in which keys that are present would not be
found by searches, because the searches would not think they need to
descend to the relevant leaf pages. This error affected all the varlena
datatypes supported by btree_gist (text, bytea, bit, numeric).
Per investigation of a trouble report from Tomas Vondra. (There is also
an issue in gbt_var_penalty(), but that should only result in inefficiency
not wrong answers. I'm committing this separately so that we have a git
state in which it can be tested that bad penalty results don't produce
invalid indexes.) Back-patch to all supported branches.