Instead of repeatedly fishing the data out of the relcache entry,
let's use the version that we cached in the PartitionDispatch. We
could alternatively rip out the PartitionDispatch fields altogether,
but it doesn't make much sense to have them and not use them; before
this patch, partdesc was set but altogether unused. Amit Langote and
I both thought using them was a litle better than removing them, so
this patch takes that approach.
Discussion: http://postgr.es/m/CA+TgmobFnxcaW-Co-XO8=yhJ5pJXoNkCj6Z7jm9Mwj9FGv-D7w@mail.gmail.com
During parallel index scans, if the current page to be read is deleted, we
skip it and try to get the next page for a scan without releasing the buffer
lock on the current page. To get the next page, sometimes it needs to wait
for another process to complete its scan and advance it to the next page.
Now, it is quite possible that the master backend has errored out before
advancing the scan and issued a termination signal for all workers. The
workers failed to notice the termination request during wait because the
interrupts are held due to buffer lock on the previous page. This lead to
all workers being stuck.
The fix is to release the buffer lock on current page before trying to get
the next page. We are already doing same in backward scans, but missed
it for forward scans.
Reported-by: Victor Yegorov
Bug: 15290
Diagnosed-by: Thomas Munro and Amit Kapila
Author: Amit Kapila
Reviewed-by: Thomas Munro
Tested-By: Thomas Munro and Victor Yegorov
Backpatch-through: 10 where parallel index scans were introduced
Discussion: https://postgr.es/m/153228422922.1395.1746424054206154747@wrigleys.postgresql.org
Depending on the platform used, this can cause a crash in the worst
case, or an unhelpful error message, so fail gracefully.
Author: Fabien Coelho
Discussion: https://postgr.es/m/alpine.DEB.2.21.1807262302550.29874@lancre
Backpatch: 11-, where hash() has been added in pgbench.
We suppressed one of these test cases in commit feb1cc559 because
it was failing to produce the expected results on CLOBBER_CACHE_ALWAYS
buildfarm members. But now we need another test with similar behavior,
so let's set up a test file that is expected to vary between regular and
CLOBBER_CACHE_ALWAYS cases, and provide variant expected files.
Someday we should fix plpgsql's failure for change-of-field-type, and
then the discrepancy will go away and we can fold these tests back
into plpgsql_record.sql. But today is not that day.
Discussion: https://postgr.es/m/87wotkfju1.fsf@news-spur.riddles.org.uk
Since commit 6719b238e it's been possible for the values of plpgsql
record field variables to be exposed to the planner as Params.
(Before that, plpgsql never supplied values for such variables during
planning, so that the problematic code wasn't reached.) Other places
that touch potentially-type-mutable Params either cope gracefully or
do runtime-test-and-ereport checks that the type is what they expect.
But eval_const_expressions() just had an Assert, meaning that it either
failed the assertion or risked crashes due to using an incompatible
value.
In this case, rather than throwing an ereport immediately, we can just
not perform a const-substitution in case of a mismatch. This seems
important for the same reason that the Param fetch was speculative:
we might not actually reach this part of the expression at runtime.
Test case will follow in a separate commit.
Patch by me, pursuant to bug report from Andrew Gierth.
Back-patch to v11 where the previous commit appeared.
Discussion: https://postgr.es/m/87wotkfju1.fsf@news-spur.riddles.org.uk
Due to inlining it previously was possible that an ExprContext's
shutdown callback pointed to a JITed function. As the JIT context
previously was shut down before the shutdown callbacks were called,
that could lead to segfaults. Fix the ordering.
Reported-By: Dmitry Dolgov
Author: Andres Freund
Discussion: https://postgr.es/m/CA+q6zcWO7CeAJtHBxgcHn_hj+PenM=tvG0RJ93X1uEJ86+76Ug@mail.gmail.com
Backpatch: 11-, where JIT compilation was added
Previously the attribute was only checked for external functions
inlined, not "static" functions that had to be inlined as
dependencies.
This isn't really a bug, but makes debugging a bit harder. The new
behaviour also makes more sense. Therefore backpatch.
Author: Andres Freund
Backpatch: 11-, where JIT compilation was added
Until now shadowed_variables was the only plpgsql check supported by
plpgsql.extra_warnings and plpgsql.extra_errors. This patch introduces
two new checks - strict_multi_assignment and too_many_rows. Unlike
shadowed_variables, these new checks are enforced at run-time.
strict_multi_assignment checks that commands allowing multi-assignment
(for example SELECT INTO) have the same number of sources and targets.
too_many_rows checks that queries with an INTO clause return one row
exactly.
These checks are aimed at cases that are technically valid and allowed,
but are often a sign of a bug. Therefore those checks are expected to
be enabled primarily in development and testing environments.
Author: Pavel Stehule
Reviewed-by: Stephen Frost, Tomas Vondra
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRA2kKRDKpUNwLY0GeG1OqOp+tLS2yQA1V41gzuSz-hCng@mail.gmail.com
In a USE_UNNAMED_SEMAPHORES build, the default on Linux and FreeBSD
since commit ecb0d20a, we have an array of sem_t objects. This
turned out to reduce performance compared to the previous default
USE_SYSV_SEMAPHORES on an 8 socket system. Testing showed that the
lost performance could be regained by padding the array elements so
that they have their own cache lines. This matches what we do for
similar hot arrays (see LWLockPadded, WALInsertLockPadded).
Back-patch to 10, where unnamed semaphores were adopted as the default
semaphore interface on those operating systems.
Author: Thomas Munro
Reviewed-by: Andres Freund
Reported-by: Mithun Cy
Tested-by: Mithun Cy, Tom Lane, Thomas Munro
Discussion: https://postgr.es/m/CAD__OugYDM3O%2BdyZnnZSbJprSfsGFJcQ1R%3De59T3hcLmDug4_w%40mail.gmail.com
This extends cluster_rel() in such a way that more options can be added
in the future, which will reduce the amount of chunk code for an
upcoming SKIP_LOCKED aimed for VACUUM. As VACUUM FULL is a different
flavor of CLUSTER, we want to make that extensible to ease integration.
This only reworks the API and its callers, without providing anything
user-facing. Two options are present now: verbose mode and relation
recheck when doing the cluster command work across multiple
transactions. This could be used as well as a base to extend the
grammar of CLUSTER later on.
Author: Michael Paquier
Reviewed-by: Nathan Bossart
Discussion: https://postgr.es/m/20180723031058.GE2854@paquier.xyz
Commit 4b0d28de06 has removed the prior checkpoint and related
facilities but has left WAL recycling based on the LSN of the prior
checkpoint, which causes incorrect calculations for WAL removal and
recycling for max_wal_size and min_wal_size. This commit changes things
so as the base calculation point is the last checkpoint generated.
Reported-by: Kyotaro Horiguchi
Author: Kyotaro Horiguchi
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/20180723.135748.42558387.horiguchi.kyotaro@lab.ntt.co.jp
Backpatch: 11-, where the prior checkpoint has been removed.
During the work of upstreaming my previous patches for gdb and perf
support the API changed. Adapt. Normally this wouldn't necessarily be
something to backpatch, but the previous API wasn't upstream, and at
least the gdb support is quite useful for debugging.
Author: Andres Freund
Backpatch: 11, where LLVM based JIT support was added.
Those would use the default ERRCODE_INTERNAL_ERROR, but for foreseeable
failures an errcode ought to be set, ERRCODE_DATA_CORRUPTED making the
most sense here.
While on the way, fix one errcode_for_file_access missing in origin.c
since the code has been created, and remove one assignment of errno to 0
before calling read(), as this was around to fit with what was present
before 811b6e36 where errno would not be set when not enough bytes are
read. I have noticed the first one, and Tom has pinged me about the
second one.
Author: Michael Paquier
Reported-by: Tom Lane
Discussion: https://postgr.es/m/27265.1531925836@sss.pgh.pa.us
Some error messages which report something about a file operation use
as well context which is already provided within the path being worked
on, making things rather duplicated. This creates more work for
translators, and does not actually bring clarity.
More could be done, however in a lot of cases the context used is
actually useful, still that patch gets down things with a good cut.
Author: Michael Paquier
Reviewed-by: Kyotaro Horiguchi, Tom Lane
Discussion: https://postgr.es/m/20180718044711.GA8565@paquier.xyz
The JIT compiled implementation missed maintaining
AggState->{current_set,curaggcontext}. That could lead to trouble
because the transition value could be allocated in the wrong context.
Reported-By: Rushabh Lathia
Diagnosed-By: Dmitry Dolgov
Author: Dmitry Dolgov, with minor changes by me
Discussion: https://postgr.es/m/CAGPqQf165-=+Drw3Voim7M5EjHT1zwPF9BQRjLFQzCzYnNZEiQ@mail.gmail.com
Backpatch: 11-, where JIT compilation support was added
As benchmarks show, using libc's string-to-integer conversion is
pretty slow. At least part of the reason for that is that strtol[l]
have to be more generic than what largely is required inside pg.
This patch considerably speeds up int2/int4 input (int8 already was
already using hand-rolled code).
Most of the existing pg_atoi callers have been converted. But as one
requires pg_atoi's custom delimiter functionality, and as it seems
likely that there's external pg_atoi users, it seems sensible to just
keep pg_atoi around.
Author: Andres Freund
Reviewed-By: Robert Haas
Discussion: https://postgr.es/m/20171208214437.qgn6zdltyq5hmjpk@alap3.anarazel.de
Previously a lot of the error messages referenced the type in the
error message itself. That requires that the message is translated
separately for each type.
Note that currently a few smallint cases continue to reference the
integer, rather than smallint, type. A later patch will create a
separate routine for 16bit input.
Author: Andres Freund
Discussion: https://postgr.es/m/20180707200158.wpqkd7rjr4jxq5g7@alap3.anarazel.de
I blew the dust off a Bourne shell (file date 1996, yea verily) and
tried to run test.sh with it. It mostly worked, but I found that the
temp-directory creation code introduced by commit be76a6d39 was not
compatible, for a couple of reasons: this shell thinks "set -e" should
force an exit if a command within backticks fails, and it also thinks code
within braces should be executed by a sub-shell, meaning that variable
settings don't propagate back up to the parent shell. In view of Victor
Wagner's report that Solaris is still using pre-POSIX shells, seems like
we oughta make this case work. It's not like the code is any less
idiomatic this way; the prior coding technique appeared nowhere else.
(There is a remaining bash-ism here, which is that $RANDOM doesn't do
what the code hopes in non-bash shells. But the use of $$ elsewhere in
that path should be enough to ensure uniqueness and some amount of
randomness, so I think it's okay as-is.)
Back-patch to all supported branches, as the previous commit was.
Discussion: https://postgr.es/m/20180720153820.69e9ae6c@fafnir.local.vm
Double-quote $PGDATA in "find" commands introduced by commit da9b580d8,
in case that path contains spaces or other special characters.
Adjust a few other places so that quoting is done more consistently.
None of the others are actual bugs AFAICS, but it's confusing to readers
if the same thing is done differently in different places.
Noted by Tels.
Discussion: https://postgr.es/m/c96303c04c360bbedaa04f90f515745b.squirrel@sm.webmail.pair.com
Most of test.sh uses traditional backtick syntax for command substitution,
but commit da9b580d8 introduced two uses of $(...) syntax, which is not
recognized by very old shells. Bring those into line with the rest.
Victor Wagner
Discussion: https://postgr.es/m/20180720153820.69e9ae6c@fafnir.local.vm
When built using OpenSSL, pg_strong_random() uses RAND_bytes() to
generate the random number. On very rare occasions that can fail, if
its PRNG has not been seeded with enough data. Additionally, once it
does fail, all subsequent calls will also fail until more seed data is
added. Since this is required during backend startup, this can result
in all new backends failing to start until a postmaster restart.
Guard against that by checking the state of OpenSSL's PRNG using
RAND_status(), and if necessary (very rarely), seeding it using
RAND_poll().
Back-patch to v10, where pg_strong_random() was introduced.
Dean Rasheed and Michael Paquier.
Discussion: https://postgr.es/m/CAEZATCXMtxbzSAvyKKk5uCRf9pNt4UV%2BF_5v%3DgLfJUuPxU4Ytg%40mail.gmail.com
It has been project policy to create toast tables only for those catalogs
that might reasonably need one. Since this judgment call can change over
time, just create one for every catalog, as this can be useful when
creating rather-long entries in catalogs, with recent examples being in
the shape of policy expressions or customly-formatted SCRAM verifiers.
To prevent circular dependencies and to avoid adding complexity to VACUUM
FULL logic, exclude pg_class, pg_attribute, and pg_index. Also, to
prevent pg_upgrade from seeing a non-empty new cluster, exclude
pg_largeobject and pg_largeobject_metadata from the set as large object
data is handled as user data. Those relations have no reason to use a
toast table anyway.
Author: Joe Conway, John Naylor
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/84ddff04-f122-784b-b6c5-3536804495f8@joeconway.com
transformPartitionSpec rejected duplicate simple partition columns
(e.g., "PARTITION BY RANGE (x,x)") but paid no attention to expression
columns, resulting in inconsistent behavior. Worse, cases like
"PARTITION BY RANGE (x,(x))") were accepted but would then result in
dump/reload failures, since the expression (x) would get simplified
to a plain column later.
There seems no better reason for this restriction than there was for
the one against duplicate included index columns (cf commit 701fd0bbc),
so let's just remove it.
Back-patch to v10 where this code was added.
Report and patch by Yugo Nagata.
Discussion: https://postgr.es/m/20180712165939.36b12aff.nagata@sraoss.co.jp
This is essential information when looking at an index that has
"included" columns. Per discussion, follow the style used in \dC
and some other places: column header is "Key?" and values are "yes"
or "no" (all translatable).
While at it, revise describeOneTableDetails to be a bit more maintainable:
avoid hard-wired column numbers and multiple repetitions of what needs
to be identical test logic. This also results in the emitted catalog
query corresponding more closely to what we print, which should be a
benefit to users of ECHO_HIDDEN mode, and perhaps a bit faster too
(the old logic sometimes asked for values it would not print, even
ones that are fairly expensive to get).
Discussion: https://postgr.es/m/21724.1531943735@sss.pgh.pa.us
The multi-argument form of pg_get_indexdef() failed to print anything when
asked to print a single index column that is an included column rather than
a key column. This seems an unintentional result of someone having tried
to take a short-cut and use the attrsOnly flag for two different purposes.
To fix, split said flag into two flags, attrsOnly which suppresses
non-attribute info, and keysOnly which suppresses included columns.
Add a test case using psql's \d command, which relies on that function.
(It's mighty tempting at this point to replace pg_get_indexdef_worker's
mess of boolean flag arguments with a single bitmask-of-flags argument,
which would allow making the call sites much more self-documenting.
But I refrained for the moment.)
Discussion: https://postgr.es/m/21724.1531943735@sss.pgh.pa.us
The code added by 9c7d06d606 was a bit obscure; clarify that by
rewriting the comments. Lack of clarity has already caused bugs, so
it's a worthy goal.
Co-authored-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Michaël Paquier <michael@paquier.xyz>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Petr Jelínek <petr.jelinek@2ndquadrant.com>
Discussion: https://postgr.es/m/87y3fgoyrn.fsf@ars-thinkpad
PostgreSQL 9.4 introduces posting list compression in GIN. This feature
supports online upgrade, so that after pg_upgrade uncompressed posting
lists are compressed on-the-fly. Underlying code appears to always
expect at least one item on uncompressed posting list page. But there
could be completely empty pages, because VACUUM never deletes leftmost
and rightmost pages from posting trees. This commit fixes that.
Reported-by: Sivasubramanian Ramasubramanian
Discussion: https://postgr.es/m/1531867212836.63354%40amazon.com
Author: Sivasubramanian Ramasubramanian, Alexander Korotkov
Backpatch-through: 9.4
This comment was copy-pasted from nodeAppend.c to nodeMergeAppend.c, but
while committing 5220bb7533, I modified wrong copy of it.
Spotted by David Rowley
GatherMergePath (introduced in 10) and CustomPath (introduced in 9.5)
have gone missing. The order of the Path nodes was inconsistent with
what is listed in nodes.h, so make the order consistent at the same time
to ease future checks and additions.
Author: Sawada Masahiko
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAD21AoBQMLoc=ohH-oocuAPsELrmk8_EsRJjOyR8FQLZkbE0wA@mail.gmail.com
Instead of MergeAppendPath, MergeAppend nodes were considered. This
code is not covered by any tests now, which should be addressed at some
point.
This is an oversight from f49842d, which introduced partition-wise joins
in v11, so back-patch down to that.
Author: Michael Paquier
Reviewed-by: Ashutosh Bapat
Discussion: https://postgr.es/m/20180718062202.GC8565@paquier.xyz
This script supposed that if it turned hot_standby_feedback on and then
shut down the standby server, at least one feedback message would be
guaranteed to be sent before the standby stops. But there is no such
guarantee, if the standby's walreceiver process is slow enough --- and
we've seen multiple failures in the buildfarm showing that that does
happen in practice. While we could rearrange the walreceiver logic to
make it less likely, it seems probably impossible to create a really
bulletproof guarantee of that sort; and if we tried, we might create
situations where the walreceiver wouldn't react in a timely manner to
shutdown commands. It seems better instead to remove the script's
assumption that feedback will occur before shutdown.
But once we do that, these last few tests seem quite redundant with
the earlier tests in the script. So let's just drop them altogether
and save some buildfarm cycles.
Backpatch to v10 where these tests were added.
Discussion: https://postgr.es/m/1922.1531592205@sss.pgh.pa.us
The initial version of the included-index-column feature stated that
included columns couldn't be the same as any key column of the index.
While it'd be pretty silly to do that, since the included column would be
entirely redundant, we've never prohibited redundant index columns before
so it's not very consistent to do so here. Moreover, the prohibition
was itself badly implemented, so that it failed to reject columns that
were effectively identical but not spelled quite alike, as reported by
Aditya Toshniwal.
(Moreover, it's not hard to imagine that for some non-btree index types,
such cases would be non-silly anyhow: the index might use a lossy
representation for key columns but be able to support retrieval of the
original form of included columns.)
Hence, let's just drop the prohibition.
In passing, do some copy-editing on the documentation for the
included-column feature.
Yugo Nagata; documentation and test corrections by me
Discussion: https://postgr.es/m/CAM9w-_mhBCys4fQNfaiQKTRrVWtoFrZ-wXmDuE9Nj5y-Y7aDKQ@mail.gmail.com
Historically, we've allowed auxiliary processes to take buffer pins without
tracking them in a ResourceOwner. However, that creates problems for error
recovery. In particular, we've seen multiple reports of assertion crashes
in the startup process when it gets an error while holding a buffer pin,
as for example if it gets ENOSPC during a write. In a non-assert build,
the process would simply exit without releasing the pin at all. We've
gotten away with that so far just because a failure exit of the startup
process translates to a database crash anyhow; but any similar behavior
in other aux processes could result in stuck pins and subsequent problems
in vacuum.
To improve this, institute a policy that we must *always* have a resowner
backing any attempt to pin a buffer, which we can enforce just by removing
the previous special-case code in resowner.c. Add infrastructure to make
it easy to create a process-lifespan AuxProcessResourceOwner and clear
out its contents at appropriate times. Replace existing ad-hoc resowner
management in bgwriter.c and other aux processes with that. (Thus, while
the startup process gains a resowner where it had none at all before, some
other aux process types are replacing an ad-hoc resowner with this code.)
Also use the AuxProcessResourceOwner to manage buffer pins taken during
StartupXLOG and ShutdownXLOG, even when those are being run in a bootstrap
process or a standalone backend rather than a true auxiliary process.
In passing, remove some other ad-hoc resource owner creations that had
gotten cargo-culted into various other places. As far as I can tell
that was all unnecessary, and if it had been necessary it was incomplete,
due to lacking any provision for clearing those resowners later.
(Also worth noting in this connection is that a process that hasn't called
InitBufferPoolBackend has no business accessing buffers; so there's more
to do than just add the resowner if we want to touch buffers in processes
not covered by this patch.)
Although this fixes a very old bug, no back-patch, because there's no
evidence of any significant problem in non-assert builds.
Patch by me, pursuant to a report from Justin Pryzby. Thanks to
Robert Haas and Kyotaro Horiguchi for reviews.
Discussion: https://postgr.es/m/20180627233939.GA10276@telsasoft.com
A collection of typos I happened to spot while reading code, as well as
grepping for common mistakes.
Backpatch to all supported versions, as applicable, to avoid conflicts
when backporting other commits in the future.
This should tame the beast, as there are no other places where off_t is
used in the new error messages.
Reported again by longfin, which complained about walsender.c while I
spotted the other two ones while double-checking.
Some error messages related to file handling are using the code path
context to define their state. For example, 2PC-related errors are
referring to "two-phase status files", or "relation mapping file" is
used for catalog-to-filenode mapping, however those prove to be
difficult to translate, and are not more helpful than just referring to
the path of the file being worked on. So simplify all those error
messages by just referring to files with their path used. In some
cases, like the manipulation of WAL segments, the context is actually
helpful so those are kept.
Calls to the system function read() have also been rather inconsistent
with their error handling sometimes not reporting the number of bytes
read, and some other code paths trying to use an errno which has not
been set. The in-core functions are using a more consistent pattern
with this patch, which checks for both errno if set or if an
inconsistent read is happening.
So as to care about pluralization when reading an unexpected number of
byte(s), "could not read: read %d of %zu" is used as error message, with
%d field being the output result of read() and %zu the expected size.
This simplifies the work of translators with less variations of the same
message.
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20180520000522.GB1603@paquier.xyz
It's more logical this way, since the new ordering matches the way the
tables are created; but in any case, the previous location of PARTITION OF
did not appear carefully chosen anyway (since it didn't match the
location in which it appears in the synopsys either, which is what we
normally do.)
In the PARTITION BY stanza, add a link to the partitioning section in
the DDL chapter, too.
Suggested-by: David G. Johnston
Discussion: https://postgr.es/m/CAKFQuwY4Ld7ecxL_KAmaxwt0FUu5VcPPN2L4dh+3BeYbrdBa5g@mail.gmail.com