The input document to functions JSON_EXISTS(), JSON_QUERY(),
JSON_VALUE(), and JSON_TABLE() can be specified as character or
UTF8-encoded bytea strings. These are automatically converted to
jsonb with an implicit cast before being passed to the jsonpath
machinery.
In the current implementation, errors that occur when parsing the
specified string into a valid JSON document are thrown
unconditionally. This means they are not subject to the explicit or
implicit ON ERROR clause of those functions, which is a standard-
conforming behavior. Add a note to the documentation to mention
that.
Reported-by: Markus Winand
Discussion: https://postgr.es/m/F7DD1442-265C-4220-A603-CB0DEB77E91D%40winand.at
errfinish() assumes that the __FUNC__ and __FILE__ arguments it's
passed are compile-time constant strings that can just be pointed
to rather than physically copied. However, it's possible for LLVM
to generate code in which those pointers point into a dynamically
loaded code segment. If that segment gets unloaded before we're
done with the ErrorData struct, we have dangling pointers that
will lead to SIGSEGV. In simple cases that won't happen, because we
won't unload LLVM code before end of transaction. But it can
happen if the error is thrown within end-of-transaction code run by
_SPI_commit or _SPI_rollback, because since commit 2e517818f those
functions clean up by ending the transaction and starting a new one.
Rather than fixing this by adding pstrdup() overhead to every
elog/ereport sequence, let's fix it by copying the risky pointers
in CopyErrorData(). That solves it for _SPI_commit/_SPI_rollback
because they use that function to preserve the error data across
the transaction end/restart sequence; and it seems likely that
any other code doing something similar would need to do that too.
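
A minimal sketch of the approach, not the actual elog.c code: ErrReport, copy_err_report() and free_err_report_copy() are hypothetical stand-ins for ErrorData and CopyErrorData(), and malloc() stands in for the backend's allocator. The ordinary reporting path keeps borrowing the pointers; only the copy path pays for duplicating the strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for ErrorData. */
typedef struct ErrReport
{
    const char *filename;   /* normally points at a __FILE__ literal */
    const char *funcname;   /* normally points at a function-name literal */
    int         lineno;
} ErrReport;

/* Duplicate a string with malloc(); returns NULL for NULL input. */
static char *
dup_string(const char *s)
{
    char *copy;

    if (s == NULL)
        return NULL;
    copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/*
 * Copy an error report.  The two string fields are physically copied,
 * so the result stays valid even if the memory the source pointed into
 * (for example a dynamically loaded code segment) goes away.
 */
ErrReport
copy_err_report(const ErrReport *src)
{
    ErrReport dst = *src;

    dst.filename = dup_string(src->filename);
    dst.funcname = dup_string(src->funcname);
    return dst;
}

/* Release the strings owned by a report made by copy_err_report(). */
void
free_err_report_copy(ErrReport *e)
{
    free((char *) e->filename);
    free((char *) e->funcname);
    e->filename = NULL;
    e->funcname = NULL;
}
```

The design choice mirrors the trade-off described above: the cost is paid only by callers that need the report to outlive its original context.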
I'm suspicious that this behavior amounts to an LLVM bug (or a
bug in our use of it?), because it implies that string constant
references that should be pointer-equal according to a naive
understanding of C semantics will sometimes not be equal.
However, even if it is a bug and someday gets fixed, we'll have
to cope with the current behavior for a long time to come.
Report and patch by me. Back-patch to all supported branches.
Discussion: https://postgr.es/m/1565654.1719425368@sss.pgh.pa.us
We did not recover the subtransaction IDs of prepared transactions
when starting a hot standby from a shutdown checkpoint. As a result,
such subtransactions were considered as aborted, rather than
in-progress. That would lead to hint bits being set incorrectly, and
the subtransactions suddenly becoming visible to old snapshots when
the prepared transaction was committed.
To fix, update pg_subtrans with prepared transactions' subxids when
starting hot standby from a shutdown checkpoint. The snapshots taken
from that state need to be marked as "suboverflowed", so that we also
check pg_subtrans.
Backport to all supported versions.
Discussion: https://www.postgresql.org/message-id/6b852e98-2d49-4ca1-9e95-db419a2696e0@iki.fi
This went unnoticed, because only a few existing callers of
BackgroundPsql->query used the result, and the ones that did were not
bothered by an extra newline. I noticed because I was about to add a
new test that checks the result.
Backport to all supported versions, since I just backported the
BackgroundPsql facility to all supported versions too.
In pgoutput, when converting the child table's tuple format to match the
parent table's, we temporarily create a new slot to store the converted
tuple. However, we failed to drop such temporary slots, leading to
resource leakage.
Reported-by: Bowen Shi
Author: Hou Zhijie
Reviewed-by: Amit Kapila
Backpatch-through: 15
Discussion: https://postgr.es/m/CAM_vCudv8dc3sjWiPkXx5F2b27UV7_YRKRbtSCcE-pv=cVACGA@mail.gmail.com
When pgstats is initialized for a backend, it uses dsa_attach_in_place()
without a "segment" provided. Hence, no callback is registered to
automatically release the DSA attached once a backend exits. Not doing
any cleanup causes the reference count of the pgstats DSA to
continuously increment, at some point overflowing it (the more
connections there are, the sooner this state is reached). Once the
reference count overflows and then gets back to 0, new backends are not
able to attach to the pgstats DSA, failing startup.
This issue is resolved by adding in the pgstats shutdown hook a call to
dsa_release_in_place(), ensuring that the DSA attached at backend
startup is correctly released, keeping the reference count in check.
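
The failure mode can be illustrated with a toy model. Everything here is an assumption made for illustration: SharedArea, area_attach(), area_release() and run_backends() are hypothetical names, and the counter is deliberately only 8 bits wide so that the wraparound shows up after a few hundred connections rather than after a long uptime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared area with a deliberately narrow reference count. */
typedef struct SharedArea
{
    uint8_t refcnt;
} SharedArea;

/* Attaching fails once the count reads zero: the area looks destroyed. */
bool
area_attach(SharedArea *a)
{
    if (a->refcnt == 0)
        return false;
    a->refcnt++;                /* wraps to 0 after 255 */
    return true;
}

/* Counterpart of attach; this is the call a shutdown hook would make. */
void
area_release(SharedArea *a)
{
    if (a->refcnt > 0)
        a->refcnt--;
}

/*
 * Simulate n backends that connect and then exit.  Returns the number
 * of backends that failed to attach.
 */
int
run_backends(SharedArea *a, int n, bool release_on_exit)
{
    int failures = 0;

    for (int i = 0; i < n; i++)
    {
        if (!area_attach(a))
        {
            failures++;
            continue;
        }
        if (release_on_exit)
            area_release(a);
    }
    return failures;
}
```

Without the release, the count only grows and eventually wraps to zero, after which every new attach is rejected; with the release, the count returns to its starting value after each backend.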
The author of this patch has been able to see this issue on a server
with a long uptime and a high connection turnover.
Issue introduced by 5891c7a8ed, so backpatch down to 15.
Author: Anthonin Bonnefoy
Discussion: https://postgr.es/m/CAO6_XqqJbJBL=M7Ym13TcB4Xnq58vRa2jcC+gwEPBgbAda6B1Q@mail.gmail.com
Backpatch-through: 15
1. TruncateMultiXact() performs the SLRU truncations in a critical
section. Deleting the SLRU segments calls ForwardSyncRequest(), which
will try to compact the request queue if it's full
(CompactCheckpointerRequestQueue()). That in turn allocates memory,
which is not allowed in a critical section. Backtrace:
TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981
postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]
postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]
postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]
postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]
postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]
postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]
postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]
postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]
postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]
postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]
postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]
postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]
postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]
postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]
/lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]
postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]
To fix, bail out in CompactCheckpointerRequestQueue() without doing
anything, if it's called in a critical section. That covers the above
call path, as well as any other similar cases where
RegisterSyncRequest might be called in a critical section.
2. After fixing that, another problem became apparent: the autovacuum
process doing that truncation can deadlock with the checkpointer
process. TruncateMultiXact() sets "MyProc->delayChkptFlags |=
DELAY_CHKPT_START". If the sync request queue is full and cannot be
compacted, the process will repeatedly sleep and retry, until there is
room in the queue. However, if the checkpointer is trying to start a
checkpoint at the same time, and is waiting for the DELAY_CHKPT_START
processes to finish, the queue will never shrink.
More concretely, the autovacuum process is stuck here:
#0 0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570
#2 WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,
wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516
#3 0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)
at ../src/backend/storage/ipc/latch.c:538
#4 0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614
#5 0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495
#6 0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566
#7 0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006
#8 TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201
#9 0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917
#10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760
#11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550
#12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569
and the checkpointer is stuck here:
#0 0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50
#3 0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098
#4 0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464
To fix, add AbsorbSyncRequests() to the loops where the checkpointer
waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to
finish.
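
Both fixes can be sketched in a small single-threaded simulation. This is not the checkpointer code: Sim, compact_queue(), enqueue(), worker_step() and checkpointer_wait() are hypothetical, the in-place duplicate removal stands in for the real compaction (which allocates memory), and emptying the queue stands in for AbsorbSyncRequests().

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_CAPACITY 4

typedef struct Sim
{
    int  queue[QUEUE_CAPACITY]; /* pending sync requests */
    int  nqueued;
    int  crit_section_count;    /* > 0 while in a critical section */
    int  worker_pending;        /* requests the worker still has to send */
    bool delay_flag;            /* worker's DELAY_CHKPT_START-like flag */
} Sim;

/* Fix 1: refuse to compact (do nothing) inside a critical section. */
bool
compact_queue(Sim *s)
{
    int kept = 0;

    if (s->crit_section_count > 0)
        return false;
    for (int i = 0; i < s->nqueued; i++)
    {
        bool dup = false;

        for (int j = 0; j < kept; j++)
            if (s->queue[j] == s->queue[i])
                dup = true;
        if (!dup)
            s->queue[kept++] = s->queue[i];
    }
    if (kept == s->nqueued)
        return false;           /* nothing was freed */
    s->nqueued = kept;
    return true;
}

/* Add a request; fails if the queue is full and cannot be compacted. */
bool
enqueue(Sim *s, int item)
{
    if (s->nqueued >= QUEUE_CAPACITY && !compact_queue(s))
        return false;
    s->queue[s->nqueued++] = item;
    return true;
}

/* One retry by the worker; it clears its flag once everything is sent. */
void
worker_step(Sim *s)
{
    if (s->worker_pending > 0 && enqueue(s, s->worker_pending))
        s->worker_pending--;
    if (s->worker_pending == 0)
        s->delay_flag = false;
}

/*
 * Fix 2: the checkpointer's wait loop.  Returns true if the worker's
 * flag cleared within max_iters iterations.
 */
bool
checkpointer_wait(Sim *s, bool absorb_in_loop, int max_iters)
{
    for (int i = 0; i < max_iters && s->delay_flag; i++)
    {
        if (absorb_in_loop)
            s->nqueued = 0;     /* drain the queue while waiting */
        worker_step(s);         /* the worker runs while we "sleep" */
    }
    return !s->delay_flag;
}
```

With a full queue and the worker inside a critical section, the loop never finishes unless the waiting side drains the queue itself, which is the situation the two backtraces above show.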
Backpatch to v14. Before that, SLRU deletion didn't call
RegisterSyncRequest, which avoided this failure. I'm not sure if there
are other similar scenarios on older versions, but we haven't had
any such reports.
Discussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi
_bt_advance_array_keys didn't take sufficient care at the point where it
decides whether to start a new primitive index scan based on a call to
_bt_check_compare against finaltup (a call with the scan direction
flipped around). The final decision was conditioned on rules about how
the scan key offset sktrig that initially triggered array advancement
(passed to _bt_advance_array_keys from its _bt_checkkeys caller)
compared to the offset set by its own _bt_check_compare finaltup call.
This approach was faulty, in that it allowed _bt_advance_array_keys to
incorrectly start a new primitive index scan that landed on the same
leaf page (on assert-enabled builds it led to an assertion failure).
In general, scans with array keys are expected to never have to read the
same leaf page more than once (barring cases involving cursors, and
cases where the scan restores a marked position for the inner side of a
merge join). This principle was established by commit 5bf748b8.
To fix, make the final decision based on whether the scan key offset set
by the _bt_check_compare finaltup call is an offset to an inequality
strategy scan key. An unsatisfied required inequality strategy scan key
indicates that all of the scan's required equality strategy scan keys
must also be satisfied by finaltup (not just by caller's tuple), and
that there is a decent chance that _bt_first will be able to reposition
the scan to a position many leaf pages ahead of the current leaf page.
Oversight in commit 5bf748b8.
Discussion: https://postgr.es/m/CAH2-Wz=DyHbcg7o6zXqzyiin8WE8vzk4tvU8Lrnh-a=EAvO0TQ@mail.gmail.com
When detaching a partition in concurrent mode, it's possible for partition
descriptors to not match the set that was recently seen when the plan
was made, causing an assertion failure or (in production builds) failure
to construct a working plan. The case that was reported involves
prepared statements, but I think it may be possible to hit this bug
without that too.
The problem is that CreatePartitionPruneState is constructing a
PartitionPruneState under the assumption that new partitions can be
added, but never removed, but it turns out that this isn't true: a
prepared statement gets replanned when the DETACH CONCURRENTLY session
sends out its invalidation message, but if the invalidation message
arrives after ExecInitAppend started, we would build a partition
descriptor without the partition, and then CreatePartitionPruneState
would refuse to work with it.
CreatePartitionPruneState already contains code to deal with the new
descriptor having more partitions than before (and behaving for the
extra partitions as if they had been pruned), but doesn't have code to
deal with fewer partitions than before, and it is naïve about the case
where the number of partitions is the same. We could simply add a
new stanza for fewer partitions than before, and in simple testing it
works to do that; but it's possible to press the test scripts even
further and hit the case where one partition is added and a partition is
removed quickly enough that we see the same number of partitions, but
they don't actually match, causing hangs during execution.
To cope with both these problems, we now memcmp() the arrays of
partition OIDs, and do a more elaborate mapping (relying on the fact
that both OID arrays are in partition-bounds order) if they're not
identical.
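
A rough sketch of that comparison and mapping, under simplifying assumptions: map_partitions() and its parameters are hypothetical, and the real code fills in subplan and subpartition maps rather than a plain index array. The forward-only cursor is what relies on both arrays being in the same (partition-bounds) order.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int Oid;

/*
 * For each partition in the current descriptor, store in map[] the
 * index of the same OID in the plan-time array, or -1 if the plan
 * never saw that partition.
 */
void
map_partitions(const Oid *plan_oids, int plan_n,
               const Oid *cur_oids, int cur_n, int *map)
{
    int cursor = 0;             /* next plan-time entry not yet matched */

    /* Fast path: identical arrays map one to one. */
    if (plan_n == cur_n &&
        memcmp(plan_oids, cur_oids, sizeof(Oid) * plan_n) == 0)
    {
        for (int i = 0; i < cur_n; i++)
            map[i] = i;
        return;
    }

    /* Slow path: search forward from the last match only. */
    for (int i = 0; i < cur_n; i++)
    {
        map[i] = -1;
        for (int j = cursor; j < plan_n; j++)
        {
            if (plan_oids[j] == cur_oids[i])
            {
                map[i] = j;
                cursor = j + 1;
                break;
            }
        }
    }
}
```

An entry of -1 corresponds to treating the partition as if it had been pruned; plan-time partitions that no longer appear are simply never referenced. Note that the same-count case (one partition added, one removed) is caught because the arrays are compared by content, not by length.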
This fix was already pushed in backbranches earlier.
Reported-by: yajun Hu <1026592243@qq.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/18377-e0324601cfebdfe5@postgresql.org
When detaching a partition in concurrent mode, it's possible for partition
descriptors to not match the set that was recently seen when the plan
was made, causing an assertion failure or (in production builds) failure
to construct a working plan. The case that was reported involves
prepared statements, but I think it may be possible to hit this bug
without that too.
The problem is that CreatePartitionPruneState is constructing a
PartitionPruneState under the assumption that new partitions can be
added, but never removed, but it turns out that this isn't true: a
prepared statement gets replanned when the DETACH CONCURRENTLY session
sends out its invalidation message, but if the invalidation message
arrives after ExecInitAppend started, we would build a partition
descriptor without the partition, and then CreatePartitionPruneState
would refuse to work with it.
CreatePartitionPruneState already contains code to deal with the new
descriptor having more partitions than before (and behaving for the
extra partitions as if they had been pruned), but doesn't have code to
deal with fewer partitions than before, and it is naïve about the case
where the number of partitions is the same. We could simply add a
new stanza for fewer partitions than before, and in simple testing it
works to do that; but it's possible to press the test scripts even
further and hit the case where one partition is added and a partition is
removed quickly enough that we see the same number of partitions, but
they don't actually match, causing hangs during execution.
To cope with both these problems, we now memcmp() the arrays of
partition OIDs, and do a more elaborate mapping (relying on the fact
that both OID arrays are in partition-bounds order) if they're not
identical.
Backpatch to 14, where DETACH CONCURRENTLY appeared.
Reported-by: yajun Hu <1026592243@qq.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/18377-e0324601cfebdfe5@postgresql.org
In commit af0e7deb4a, I removed a call to RelationCloseSmgr(), because
the dangling SMgrRelation was no longer an issue. However, we still
need the call when the relation's relfilelocator changes, so that the
new relfilelocator takes effect immediately.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/987b1c8c-8c91-4847-ca0e-879f421680ff%40gmail.com
0452b461bc adds alternative orderings of group-by keys during the query
optimization. This new feature is controlled by the new GUC parameter
enable_group_by_reordering, which accidentally came without documentation.
This commit adds the missing documentation for that GUC.
Reported-by: Bruce Momjian
Discussion: https://postgr.es/m/ZnDx2FYlba_OafQd%40momjian.us
Author: Andrei Lepikhov
Reviewed-by: Pavel Borisov, Alexander Korotkov
RT_NODE_16_SEARCH_EQ() performs comparisons using vector registers
on x86-64 and aarch64. We apply a mask to the resulting bitfield
to eliminate irrelevant bits that may be set. This ensures correct
behavior, but Valgrind complains of the partially-uninitialised
values. So far the warnings have only occurred on aarch64, which
explains why this hasn't been seen earlier.
To fix this warning, initialize the whole fixed-sized part of the nodes
upon allocation, rather than just doing the minimum initialization needed
to function correctly. The initialization for node48 is a bit different
in that the 256-byte slot index array must be populated with "invalid
index" rather than zero. Experimentation has shown that compilers
tend to emit code that uselessly memsets that array twice. To avoid
pessimizing this path, swap the order of the slot_idxs[] and isset[]
arrays so we can initialize with two non-overlapping memset calls.
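
The layout idea can be sketched as follows. The struct is a simplified, hypothetical stand-in for the real node48 (field names other than slot_idxs and isset, and the value chosen for the "invalid index" marker, are assumptions): with isset placed before slot_idxs, the zeroed region and the 0xFF-filled region are adjacent and do not overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INVALID_SLOT_IDX 0xFF   /* assumed "invalid index" marker */

/* Hypothetical, simplified node48 layout. */
typedef struct Node48
{
    uint8_t  kind;
    uint8_t  count;
    uint64_t isset;             /* bitmap of used child slots */
    uint8_t  slot_idxs[256];    /* key byte -> child slot, or invalid */
    void    *children[48];      /* variable part, left uninitialized */
} Node48;

/* Initialize the fixed-size part with two non-overlapping memsets. */
void
node48_init(Node48 *n)
{
    /* Everything before slot_idxs, including padding, becomes zero. */
    memset(n, 0, offsetof(Node48, slot_idxs));
    /* The slot index array is filled with the invalid marker. */
    memset(n->slot_idxs, INVALID_SLOT_IDX, sizeof(n->slot_idxs));
}
```

Had slot_idxs come first, zeroing the header through isset would have covered the array as well, so it would be written twice.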
Reported by Tomas Vondra
Analysis and patch by Tom Lane, reviewed by Masahiko Sawada. I
investigated the behavior of memset calls to overlapping regions,
leading to the above tweaks to node48 as discussed in the thread.
Discussion: https://postgr.es/m/120c63ad-3d12-415f-a7bf-3da451c31bf6%40enterprisedb.com
Apply const qualifiers to char * arguments and fields throughout the
jsonapi. This allows the top-level APIs such as
pg_parse_json_incremental() to declare their input argument as const.
It also reduces the number of unconstify() calls.
Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://www.postgresql.org/message-id/flat/f732b014-f614-4600-a437-dba5a2c3738b%40eisentraut.org
afterTriggerInvokeEvents and AfterTriggerExecute have always
treated it as an error if the trigger OID mentioned in a queued
after-trigger event can't be found. However, that fails to
account for the edge case where the trigger's been dropped in
the current transaction since queueing the event. There seems
no very good reason to disallow that case, so instead silently
do nothing if the trigger OID can't be found.
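
In outline, the new behavior looks like the sketch below. It is illustrative only: Trigger, find_trigger() and fire_queued_event() are hypothetical, and a flat array stands in for the relation's trigger descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical trigger entry. */
typedef struct Trigger
{
    unsigned int oid;
    int          fired;         /* how many times it has been invoked */
} Trigger;

/* Look up a trigger by OID; returns NULL if it no longer exists. */
static Trigger *
find_trigger(Trigger *triggers, int n, unsigned int oid)
{
    for (int i = 0; i < n; i++)
        if (triggers[i].oid == oid)
            return &triggers[i];
    return NULL;
}

/*
 * Process one queued event.  Returns true if the trigger was fired,
 * false if the event was skipped because the trigger is gone.
 */
bool
fire_queued_event(Trigger *triggers, int n, unsigned int oid)
{
    Trigger *t = find_trigger(triggers, n, oid);

    if (t == NULL)
        return false;           /* dropped since queueing: not an error */
    t->fired++;
    return true;
}
```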
This does give up a little bit of bug-detection ability, but I don't
recall that these error messages have ever actually revealed a bug,
so it seems mostly theoretical. Alternatives such as marking
pending events DONE at the time of dropping a trigger would be
complicated and perhaps introduce bugs of their own.
Per bug #18517 from Alexander Lakhin. Back-patch to all
supported branches.
Discussion: https://postgr.es/m/18517-af2d19882240902c@postgresql.org
This changes the import library name from 'postgres.exe.lib' to
'postgres.lib', which is what it was with the old MSVC build system.
Extension builds use that name.
Bug: #18513
Reported-by: Muralikrishna Bandaru <muralikrishna.bandaru@enterprisedb.com>
Contrary to what the comment for the "check" struct member claims,
'pg_upgrade --check' performs only the checks and does not ask the
user for permission to make changes.
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/ZnHk7ci5IuTWVc_c%40nathan
Currently, when the ON EMPTY clause is not present, the ON ERROR
clause (implicit or explicit) dictates the behavior when jsonpath
evaluation in ExecEvalJsonExprPath() results in an empty sequence.
That is an oversight in commit 6185c9737c.
This commit fixes things so that a NULL is returned instead in that
case, which is the default behavior when the ON EMPTY clause is not
present.
Reported-by: Markus Winand
Discussion: https://postgr.es/m/F7DD1442-265C-4220-A603-CB0DEB77E91D%40winand.at
Previously, GetJsonPathVar() allowed a jsonpath expression to
reference any prefix of a PASSING variable's name. For example, the
following query would incorrectly work:
SELECT JSON_QUERY(context_item, jsonpath '$xy' PASSING val AS xyz);
The fix ensures that the length of the variable name mentioned in a
jsonpath expression matches exactly with the name of the PASSING
variable before comparing the strings using strncmp().
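
The difference between the two comparisons can be shown in isolation. Both function names below are hypothetical; the sketch only assumes that the jsonpath side supplies a name with an explicit length, while the PASSING variable name is an ordinary NUL-terminated string.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Old behavior: comparing only ref_len bytes accepts any prefix. */
bool
var_name_matches_prefix(const char *ref, size_t ref_len,
                        const char *var_name)
{
    return strncmp(ref, var_name, ref_len) == 0;
}

/* Fixed behavior: the lengths must agree before the bytes are compared. */
bool
var_name_matches(const char *ref, size_t ref_len, const char *var_name)
{
    return ref_len == strlen(var_name) &&
        strncmp(ref, var_name, ref_len) == 0;
}
```

With these definitions, a reference to "xy" matches a variable named "xyz" under the first function but not under the second.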
Reported-by: Alvaro Herrera (off-list)
Discussion: https://postgr.es/m/CA+HiwqFGkLWMvELBH6E4SQ45qUHthgcRH6gCJL20OsYDRtFx_w@mail.gmail.com
In cost_memoize_rescan(), when calculating the hit_ratio using the calls
and ndistinct estimations, if the value that was set in
MemoizePath.calls had not been processed through clamp_row_est(), then it
was possible that it was set to some non-integer value which could result
in ndistinct being 1 higher than calls due to estimate_num_groups()
performing clamp_row_est() on its input_rows. This could result in
hit_ratio values slightly below 0.0, which would cause an Assert failure.
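
The arithmetic can be reproduced with a simplified model. Several things here are assumptions: clamp_rows() is a hypothetical approximation of clamp_row_est() (round to nearest, minimum 1), hit_ratio() keeps only the (calls - ndistinct) / calls term, and ndistinct is taken as the worst case in which every clamped input row is distinct.

```c
#include <assert.h>

/* Hypothetical approximation of clamp_row_est(): round, minimum 1. */
double
clamp_rows(double rows)
{
    double rounded = (double) (long long) (rows + 0.5);

    return rounded < 1.0 ? 1.0 : rounded;
}

/* Simplified hit ratio: share of calls that find their key cached. */
double
hit_ratio(double calls, double ndistinct)
{
    return (calls - ndistinct) / calls;
}
```

With an unclamped calls of 1.5 and an ndistinct derived from the clamped value 2, the ratio is negative. Feeding the clamped value in for calls as well keeps it at or above zero, which is one way to restore the invariant the Assert checks.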
The value of MemoizePath.calls comes from the final parameter of the
create_memoize_path() function, which has only one true caller.
That caller passes outer_path->rows. All the core code I looked at
always seems to call clamp_row_est() on the Path.rows, so there might
have been no issues with any core Paths causing troubles here. The bug
report was about a CustomPath with a non-clamped row estimate.
The misbehavior as a result of this seems to be mostly limited to the
Assert() failing. Aside from that, it seems the Memoize costs would
just come out slightly higher than they should have, which is likely
fairly harmless.
Reported-by: Kohei KaiGai <kaigai@heterodb.com>
Discussion: https://postgr.es/m/CAOP8fzZnTU+N64UYJYogb1hN-5hFP+PwTb3m_cnGAD7EsQwrKw@mail.gmail.com
Reviewed-by: Richard Guo
Backpatch-through: 14, where Memoize was introduced
Reconstruction of an SP-GiST index by REINDEX CONCURRENTLY may
insert some REDIRECT tuples. This will typically happen in
a transaction that lacks an XID, which leads either to assertion
failure in spgFormDeadTuple or to insertion of a REDIRECT tuple
with zero xid. The latter's not good either, since eventually
VACUUM will apply GlobalVisTestIsRemovableXid() to the zero xid,
resulting in either an assertion failure or a garbage answer.
In practice, since REINDEX CONCURRENTLY locks out index scans
till it's done, it doesn't matter whether it inserts REDIRECTs
or PLACEHOLDERs; and likewise it doesn't matter how soon VACUUM
reduces such a REDIRECT to a PLACEHOLDER. So in non-assert builds
there's no observable problem here, other than perhaps a little
index bloat. But it's not behaving as intended.
To fix, remove the failing Assert in spgFormDeadTuple, acknowledging
that we might sometimes insert a zero XID; and guard VACUUM's
GlobalVisTestIsRemovableXid() call with a test for valid XID,
ensuring that we'll reduce such a REDIRECT the first time VACUUM
sees it. (Versions before v14 use TransactionIdPrecedes here,
which won't fail on zero xid, so they really have no bug at all
in non-assert builds.)
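
The VACUUM-side guard amounts to something like the following sketch. The names are hypothetical, and xid_is_removable() is a crude stand-in for GlobalVisTestIsRemovableXid() that ignores XID wraparound; its assert models the fact that the real test must not be handed a zero XID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define INVALID_XID ((TransactionId) 0)

/* Stand-in visibility test; must not be called with an invalid XID. */
static bool
xid_is_removable(TransactionId xid, TransactionId horizon)
{
    assert(xid != INVALID_XID);
    return xid < horizon;       /* simplified: no wraparound handling */
}

/*
 * Decide whether a REDIRECT tuple can be reduced to a PLACEHOLDER.
 * A REDIRECT carrying no valid XID is reduced the first time it is
 * seen, without consulting the visibility test at all.
 */
bool
redirect_can_become_placeholder(TransactionId redirect_xid,
                                TransactionId horizon)
{
    if (redirect_xid == INVALID_XID)
        return true;
    return xid_is_removable(redirect_xid, horizon);
}
```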
Another solution could be to not create REDIRECTs at all during
REINDEX CONCURRENTLY, making the relevant code paths treat that
case like index build (which likewise knows that no concurrent
index scans can be happening). That would allow restoring the
Assert in spgFormDeadTuple, but we'd still need the VACUUM change
because redirection tuples with zero xid may be out there already.
But there doesn't seem to be a nice way for spginsert() to tell that
it's being called in REINDEX CONCURRENTLY without some API changes,
so we'll leave that as a possible future improvement.
In HEAD, also rename the SpGistState.myXid field to redirectXid,
which seems less misleading (since it might not in fact be our
transaction's XID) and is certainly less uninformatively generic.
Per bug #18499 from Alexander Lakhin. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/18499-8a519c280f956480@postgresql.org