XLogPageRead() can retry internally after a pread() system call has
succeeded, in the case of short reads, and page validation failures
while in standby mode (see commit 0668719801). Due to an oversight in
commit 3f1ce973, these cases could leave stale data in the internal
cache of xlogreader.c without marking it invalid. The main defense
against stale cached data on failure to read a page was in the error
handling path of the calling function ReadPageInternal(), but that
wasn't quite enough for errors handled internally by XLogPageRead()'s
retry loop if we then exited with XLREAD_WOULDBLOCK.
1. ReadPageInternal() now marks the cache invalid before calling the
page_read callback, by setting state->readLen to 0. It'll be set to
a non-zero value only after a successful read. It'll stay valid as
long as the caller requests data in the cached range.
2. XLogPageRead() no long performs internal retries while reading
ahead. While such retries should work, the general philosophy is
that we should give up prefetching if anything unusual happens so we
can handle it when recovery catches up, to reduce the complexity of
the system. Let's do that here too.
3. While here, a new function XLogReaderResetError() improves the
separation between xlogrecovery.c and xlogreader.c, where the former
previously clobbered the latter's internal error buffer directly.
The new function makes this more explicit, and also clears a related
flag, without which a standby would needlessly retry in the outer
function.
Thanks to Noah Misch for tracking down the conditions required for a
rare build farm failure in src/bin/pg_ctl/t/003_promote.pl, and
providing a reproducer.
Back-patch to 15.
Reported-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20220807003627.GA4168930%40rfd.leadboat.com
The planner has to special-case indexes on boolean columns, because
what we need for an indexscan on such a column is a qual of the shape
of "boolvar = pseudoconstant". For plain bool constants, previous
simplification will have reduced this to "boolvar" or "NOT boolvar",
and we have to reverse that if we want to make an indexqual. There is
existing code to do so, but it only fires when the index's opfamily
is BOOL_BTREE_FAM_OID or BOOL_HASH_FAM_OID. Thus extension AMs, or
extension opclasses such as contrib/btree_gin, are out in the cold.
The reason for hard-wiring the set of relevant opfamilies was mostly
to avoid a catalog lookup in a hot code path. We can improve matters
while not taking much of a performance hit by relying on the
hard-wired set when the opfamily OID is visibly built-in, and only
checking the catalogs when dealing with an extension opfamily.
While here, rename IsBooleanOpfamily to IsBuiltinBooleanOpfamily
to remind future users of that macro of its limitations. At some
point we might want to make indxpath.c's improved version of the
test globally accessible, but it's not presently needed elsewhere.
Zongliang Quan and Tom Lane
Discussion: https://postgr.es/m/f293b91d-1d46-d386-b6bb-4b06ff5c667b@yeah.net
It was not strictly correct to say that a column list must always include
replica identity columns because that is true for only updates and
deletes.
Author: Peter Smith
Reviwed-by: Vignesh C, Amit Kapila
Backpatch-through: 15, where it was introduced
Discussion: https://postgr.es/m/CAHut+PvOuc9=_4TbASc5=VUqh16UWtFO3GzcKQK_5m1hrW3vqg@mail.gmail.com
Several backend-side loops scanning one or more directories with
ReadDir() (WAL segment recycle/removal in xlog.c, backend-side directory
copy, temporary file removal, configuration file parsing, some logical
decoding logic and some pgtz stuff) already know the type of the entry
being scanned thanks to the dirent structure associated to the entry, on
platforms where we know about DT_REG, DT_DIR and DT_LNK to make the
difference between a regular file, a directory and a symbolic link.
Relying on the direct structure of an entry saves a few system calls to
stat() and lstat() in the loops updated here, shaving some code while on
it. The logic of the code remains the same, calling stat() or lstat()
depending on if it is necessary to look through symlinks.
Authors: Nathan Bossart, Bharath Rupireddy
Reviewed-by: Andres Freund, Thomas Munro, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACV8n-J-f=yiLUOx2=HrQGPSOZM3nWzyQQvLPcccPXxEdg@mail.gmail.com
The sysroot determination is fairly complex and will soon also be needed when
building with meson. Instead of duplicating the logic, move it to a dedicated
shell script invoked both by configure and meson.
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/2180a97c-c026-1b6c-cec8-d6e499f97017@enterprisedb.com
The reverts the following and makes some associated cleanups:
commit f79b803dc: Common SQL/JSON clauses
commit f4fb45d15: SQL/JSON constructors
commit 5f0adec25: Make STRING an unreserved_keyword.
commit 33a377608: IS JSON predicate
commit 1a36bc9db: SQL/JSON query functions
commit 606948b05: SQL JSON functions
commit 49082c2cc: RETURNING clause for JSON() and JSON_SCALAR()
commit 4e34747c8: JSON_TABLE
commit fadb48b00: PLAN clauses for JSON_TABLE
commit 2ef6f11b0: Reduce running time of jsonb_sqljson test
commit 14d3f24fa: Further improve jsonb_sqljson parallel test
commit a6baa4bad: Documentation for SQL/JSON features
commit b46bcf7a4: Improve readability of SQL/JSON documentation.
commit 112fdb352: Fix finalization for json_objectagg and friends
commit fcdb35c32: Fix transformJsonBehavior
commit 4cd8717af: Improve a couple of sql/json error messages
commit f7a605f63: Small cleanups in SQL/JSON code
commit 9c3d25e17: Fix JSON_OBJECTAGG uniquefying bug
commit a79153b7a: Claim SQL standard compliance for SQL/JSON features
commit a1e7616d6: Rework SQL/JSON documentation
commit 8d9f9634e: Fix errors in copyfuncs/equalfuncs support for JSON node types.
commit 3c633f32b: Only allow returning string types or bytea from json_serialize
commit 67b26703b: expression eval: Fix EEOP_JSON_CONSTRUCTOR and EEOP_JSONEXPR size.
The release notes are also adjusted.
Backpatch to release 15.
Discussion: https://postgr.es/m/40d2c882-bcac-19a9-754d-4299e1d87ac7@postgresql.org
Not passing -shared to gcc when building a shared library triggers linking to
the wrong libgcc (libgcc.a instead of libgcc_s.a) and prevents emitting
correct unwind information. It's somewhat surprising that this hasn't caused
known problems so far.
Doing so requires adding path to libgcc to libpath, or linking statically to
libgcc - as the latter increases .so size substantially (for not entirely
obvious reasons), shared linking seems preferrable. It likely is worth
building executables with -shared-libgcc too, but I've not done that here.
Discussion: https://postgr.es/m/20220820174213.d574qde4ptwdzoqz@awork3.anarazel.de
Oversight in commit 418ec3207: it's better to do it like this,
else you have to drop and recreate the extension for each
permutation. tcn.spec only has one permutation at present,
so this doesn't speed it up any, but it's still a bad example.
Buildfarm member bowerbird is (inconsistently) showing different
results for this test case since we enabled ASLR for MSVC builds.
It's not very clear whether that's a bug in its version of libxml2
or the test case is relying on nominally-undefined behavior, ie the
ordering of results from XPath's node(). It seems quite unlikely
that it's *our* bug though, and what's more, using node() adds
nothing to the test coverage so far as our code is concerned.
So, tweak the test to not use node().
For the moment, only change HEAD because we've only seen the
problem there. Perhaps a case will emerge for back-patching.
Discussion: https://postgr.es/m/2655387.1661695793@sss.pgh.pa.us
During dumptuples() the call to writetuple() would pfree any non-null
tuple. This was quite wasteful as this happens just before we perform a
reset of the context which stores all of those tuples.
It seems to make sense to do a bit of a code refactor to make this work,
so here we just get rid of the writetuple function and adjust the WRITETUP
macro to call the state's writetup function. The WRITETUP usage in
mergeonerun() always has state->slabAllocatorUsed == true, so writetuple()
would never free the tuple or do any memory accounting. The only call
path that needs memory accounting done is in dumptuples(), so let's just
do it manually there.
In passing, let's get rid of the state->memtupcount-- code that counts the
memtupcount down to 0 one tuple at a time inside the loop. That seems to
be a rather inefficient way to set memtupcount to 0, so let's just zero it
after the loop instead.
Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvqZXoDCyrfCzZJR0-xH+7_q+GgitcQiYXUjRani7h4j8Q@mail.gmail.com
get_database_list() failed to restore the caller's memory context,
instead leaving current context set to TopMemoryContext which is
how CommitTransactionCommand() leaves it. The callers both think
they are using short-lived contexts, for the express purpose of
not having to worry about cleaning up individual allocations.
The net effect therefore is that supposedly short-lived allocations
could accumulate indefinitely in the launcher's TopMemoryContext.
Although this has been broken for a long time, it seems we didn't
have any obvious memory leak here until v15's rearrangement of the
stats logic. I (tgl) am not entirely convinced that there's no
other leak at all, though, and we're surely at risk of adding one
in future back-patched fixes. So back-patch to all supported
branches, even though this may be only a latent bug in pre-v15.
Reid Thompson
Discussion: https://postgr.es/m/972a4e12b68b0f96db514777a150ceef7dcd2e0f.camel@crunchydata.com
Before now, the cutoffs that VACUUM used to determine which XIDs/MXIDs
to freeze were determined at the start of each VACUUM by taking related
cutoffs that represent which XIDs/MXIDs VACUUM should treat as still
running, and subtracting an XID/MXID age based value controlled by GUCs
like vacuum_freeze_min_age. The FreezeLimit cutoff (XID freeze cutoff)
was derived by subtracting an XID age value from OldestXmin, while the
MultiXactCutoff cutoff (MXID freeze cutoff) was derived by subtracting
an MXID age value from OldestMxact. This approach didn't match the
approach used nearby to determine whether this VACUUM operation should
be an aggressive VACUUM or not.
VACUUM now uses the standard approach instead: it subtracts the same
age-based values from next XID/next MXID (rather than subtracting from
OldestXmin/OldestMxact). This approach is simpler and more uniform.
Most of the time it will have only a negligible impact on how and when
VACUUM freezes. It will occasionally make VACUUM more robust in the
event of problems caused by long running transaction. These are cases
where OldestXmin and OldestMxact are held back by so much that they
attain an age that is a significant fraction of the value of age-based
settings like vacuum_freeze_min_age.
There is no principled reason why freezing should be affected in any way
by the presence of a long-running transaction -- at least not before the
point that the OldestXmin and OldestMxact limits used by each VACUUM
operation attain an age that makes it unsafe to freeze some of the
XIDs/MXIDs whose age exceeds the value of the relevant age-based
settings. The new approach should at least make freezing degrade more
gracefully than before, even in the most extreme cases.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkOv5CEeyOO=c91XnT5WBR_0gii0Wn5UbZhJ=4TTykDYg@mail.gmail.com
Building the ecpg tests with MSVC, with warnings enabled, results in the
following warning:
src/interfaces/ecpg/test/compat_informix/rnull.pgc(19,1): warning C4305: 'initializing': truncation from 'double' to 'float'
The more obvious fix would be an 'f' suffix, but ecpg can't parse that.
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Discussion: https://postgr.es/m/2180a97c-c026-1b6c-cec8-d6e499f97017@enterprisedb.com
If the input word exceeds 1000 bytes, don't pass it to the stemmer;
just return it as-is after case folding. Such an input is surely
not a word in any human language, so whatever the stemmer might
do to it would be pretty dubious in the first place. Adding this
restriction protects us against a known recursion-to-stack-overflow
problem in the Turkish stemmer, and it seems like good insurance
against any other safety or performance issues that may exist in
the Snowball stemmers. (I note, for example, that they contain no
CHECK_FOR_INTERRUPTS calls, so we really don't want them running
for a long time.) The threshold of 1000 bytes is arbitrary.
An alternative definition could have been to treat such words as
stopwords, but that seems like a bigger break from the old behavior.
Per report from Egor Chindyaskin and Alexander Lakhin.
Thanks to Olly Betts for the recommendation to fix it this way.
Discussion: https://postgr.es/m/1661334672.728714027@f473.i.mail.ru
Commit e3ce2de09d rearranged this
function to be able to identify which inherited role had admin option
on the target role, but it got the order of operations wrong, causing
the function to return wrong answers in the presence of non-inherited
grants.
Fix that, and add a test case that verifies the correct behavior.
Patch by me, reviewed by Nathan Bossart
Discussion: http://postgr.es/m/CA+TgmoYamnu-xt-u7CqjYWnRiJ6BQaSpYOHXP=r4QGTfd1N_EA@mail.gmail.com
When reporting failure in check_ functions there is (typically) a text-
file mentioned in the error report which contains further details. Some
check_ functions kept a separate flag variable to indicate failure, and
some just checked the state of the filehandle as it's guaranteed to be
open when the check failed. This refactors the functions to consistently
do the same check on error reporting. As the error report contains the
filepath, it makes more sense to check the filehandle state and skip the
flag variable.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Bruce Momjian <bruce@momjian.us>
Discussion: https://postgr.es/m/595759F6-625B-4ED7-8125-91AF00437F83@yesql.se
The default of lazy symbol resolution means that when the postmaster
first reaches the select() call in ServerLoop, it'll need to resolve
the link to that libc entry point. NetBSD's dynamic loader takes
an internal lock while doing that, and if a signal interrupts the
operation then there is a risk of self-deadlock should the signal
handler do anything that requires that lock, as several of the
postmaster signal handlers do. The window for this is pretty narrow,
and timing considerations make it unlikely that a signal would arrive
right then anyway. But it's semi-repeatable on slow single-CPU
machines, and in principle the race could happen with any hardware.
The least messy solution to this is to force binding of dynamic
symbols at postmaster start, using the "-z now" linker option.
While we're at it, also use "-z relro" so as to provide a small
security gain.
It's not entirely clear whether any other platforms share this
issue, but for now we'll assume it's NetBSD-specific. (We might
later try to use "-z now" on more platforms for performance
reasons, but that would not likely be something to back-patch.)
Report and patch by me; the idea to fix it this way is from
Andres Freund.
Discussion: https://postgr.es/m/3384826.1661802235@sss.pgh.pa.us
Robert Haas reported that his older clang compiler didn't like the two
Asserts which were verifying that the given MemoryContextMethodID was <=
MEMORY_CONTEXT_METHODID_MASK when building with
-Wtautological-constant-out-of-range-compare. In my (David's) opinion,
the compiler is wrong to warn about that. Newer versions of clang don't
warn about the out of range enum value, so perhaps this was a bug that has
now been fixed. To keep older clang versions happy, let's just cast the
enum value to int to stop the compiler complaining.
The main reason for the Asserts mentioned above to exist are to inform
future developers which are adding new MemoryContexts if they run out of
bit space in MemoryChunk to store the MemoryContextMethodID. As pointed
out by Tom Lane, it seems wise to also add a comment to the header for
that enum to document the restriction on these enum values.
Additionally, also fix an incorrect usage of UINT64CONST() which was
introduced in c6e0fe1f2.
Author: Robert Haas, David Rowley
Discussion: https://postgr.es/m/CA+TgmoYGG2C7Vbw1cjkQRRBL3zOk8SmhrQnsJgzscX=N9AwPrw@mail.gmail.com
This reverts commit df0f4feef. It turns out the problem which was causing
the 32-bit ARM and PPC animals to fail was due to a MAXALIGN problem in
slab.c. This was fixed by d5ee4db0e. The padding that was added in
df0f4feef would only do anything on machines where uint64 was not aligned
to 8 bytes. The 32-bit machines which were failing are not in that
category, so revert this commit.
Discussion: https://postgr.es/m/3209100.1661787561@sss.pgh.pa.us
Currently, the replication origin tracking of the tablesync worker is
dropped by the apply worker. So, there will be a small lag between the
tablesync worker exit and its origin tracking got removed. In the
meantime, new tablesync workers can be launched and will try to set up
a new origin tracking. This can lead the system to reach max configured
limit (max_replication_slots) even if the user has configured the max
limit considering the number of tablesync workers required in the system.
We decided not to back-patch as this can occur in very narrow
circumstances and users have to option to increase the configured limit by
increasing max_replication_slots.
Reported-by: Hubert Depesz Lubaczewski
Author: Ajin Cherian
Reviwed-by: Masahiko Sawada, Peter Smith, Hou Zhijie, Amit Kapila
Discussion: https://postgr.es/m/20220714115155.GA5439@depesz.com
c6e0fe1f2 added a new pointer field to SlabBlock to make it 4 bytes larger
on 32-bit machines. Prior to that commit, the size of that struct was a
multiple of 8, which meant that MAXALIGN(sizeof(SlabBlock)) was the same
as sizeof(SlabBlock), however, after c6e0fe1f2, due to the addition of the
new pointer field to store a pointer to the owning context, that was no
longer true on builds with sizeof(void *) == 4.
This problem was highlighted by an Assert failure which was checking that
the pointer given to pfree() was MAXALIGNED. Various 32-bit ARM buildfarm
animals were failing. These have MAXIMUM_ALIGNOF of 8. The only 32-bit
testing I'd managed to do on c6e0fe1f2 had been on x86, which has a
MAXIMUM_ALIGNOF of 4, therefore did not exhibit this issue.
Here we define Slab_BLOCKHDRSZ and copy what is being done in aset.c and
generation.c for doing calculations based on the size of the context's
block type. This means that SlabAlloc() will now always return a
MAXALIGNed pointer.
This also fixes an incorrect sentinel_ok() check in SlabCheck() which was
incorrectly checking the wrong sentinel byte. This must have previously
not caused any issues due to the fullChunkSize never being large enough to
store the sentinel byte.
Diagnosed-by: Tomas Vondra, Tom Lane
Author: Tomas Vondra, David Rowley
Discussion: https://postgr.es/m/CAA4eK1%2B1JyW5TiL%3DyV-3Uq1CrfnTyn0Xrk5uArt31Z%3D8rgPhXQ%40mail.gmail.com
All the code and comments cleaned up here is irrelevant since 495ed0e.
Note that this removes an assumption that CreateRestrictedToken() may
not exist, something that could have happened when running under Windows
NT as the code stated. Rather than assuming that it may not exist, this
causes pg_ctl to fail hard if the function cannot be loaded.
Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20220826112637.GD2342@telsasoft.com
More than twenty years ago (79fcde48b), we hacked the postmaster
to avoid a core-dump on systems that didn't support fflush(NULL).
We've mostly, though not completely, hewed to that rule ever since.
But such systems are surely gone in the wild, so in the spirit of
cleaning out no-longer-needed portability hacks let's get rid of
multiple per-file fflush() calls in favor of using fflush(NULL).
Also, we were fairly inconsistent about whether to fflush() before
popen() and system() calls. While we've received no bug reports
about that, it seems likely that at least some of these call sites
are at risk of odd behavior, such as error messages appearing in
an unexpected order. Rather than expend a lot of brain cells
figuring out which places are at hazard, let's just establish a
uniform coding rule that we should fflush(NULL) before these calls.
A no-op fflush() is surely of trivial cost compared to launching
a sub-process via a shell; while if it's not a no-op then we likely
need it.
Discussion: https://postgr.es/m/2923412.1661722825@sss.pgh.pa.us