Two closely related bugs are fixed. First, xmin of logical slots was
advanced too early. During xl_running_xacts processing, xmin of the
slot was set to the oldest running xid in the record, but that's wrong:
actually, snapshots which will be used for not-yet-replayed transactions
might consider older txns as running too, so we need to keep xmin back
for them. The problem wasn't noticed earlier because DDL that can
delete a tuple (set xmax) while another not-yet-committed transaction
is looking at it is pretty rare, if not unique: e.g. all forms of
ALTER TABLE which change schema acquire ACCESS EXCLUSIVE lock
conflicting with any inserts. The included test case (test_decoding's
oldest_xmin) uses ALTER of a composite type, which doesn't have such
interlocking.
To deal with this, we must be able to quickly retrieve oldest xmin
(oldest running xid among all assigned snapshots) from ReorderBuffer. To
fix, add another list of ReorderBufferTXNs to the reorderbuffer, where
transactions are sorted by base-snapshot-LSN. This is slightly
different from the existing (sorted by first-LSN) list, because a
transaction can have an earlier LSN but a later Xmin, if its first
record does not obtain an xmin (e.g. xl_xact_assignment). Note this new
list doesn't fully replace the existing txn list: we still need that one
to prevent WAL recycling.
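A toy model of the idea, in Python rather than the C of reorderbuffer.c.
The class and field names are invented for illustration, and the sketch
assumes that snapshot xmins never move backwards as the LSN advances, so
the head of the base-snapshot-LSN list carries the oldest xmin:

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyTxn:
    xid: int
    first_lsn: int                       # LSN of the txn's first record
    base_snapshot_lsn: Optional[int] = None
    base_snapshot_xmin: Optional[int] = None


class ToyReorderBuffer:
    """Hypothetical stand-in for the list ordered by base-snapshot LSN."""

    def __init__(self) -> None:
        self.by_base_snapshot_lsn: List[ToyTxn] = []

    def set_base_snapshot(self, txn: ToyTxn, lsn: int, xmin: int) -> None:
        # Record the snapshot and keep the list sorted by base-snapshot LSN.
        txn.base_snapshot_lsn = lsn
        txn.base_snapshot_xmin = xmin
        keys = [t.base_snapshot_lsn for t in self.by_base_snapshot_lsn]
        self.by_base_snapshot_lsn.insert(bisect.bisect_right(keys, lsn), txn)

    def oldest_xmin(self) -> Optional[int]:
        # Only the head of the list needs to be inspected.
        if not self.by_base_snapshot_lsn:
            return None
        return self.by_base_snapshot_lsn[0].base_snapshot_xmin


rb = ToyReorderBuffer()
early = ToyTxn(xid=100, first_lsn=10)   # first record carries no snapshot
late = ToyTxn(xid=101, first_lsn=20)
rb.set_base_snapshot(late, lsn=20, xmin=95)
rb.set_base_snapshot(early, lsn=30, xmin=98)
print(rb.oldest_xmin())
```

Here the transaction with the earlier first LSN obtains its snapshot
later, so ordering by first LSN would not put the oldest xmin at the
head of the list.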
The second issue concerns SnapBuilder snapshots and subtransactions.
SnapBuildDistributeNewCatalogSnapshot never assigned a snapshot to a
transaction that is known to be a subtxn, which is good in the common
case that the top-level transaction already has one (no point in doing
so), but a bug otherwise. To fix, arrange to transfer the snapshot from
the subtxn to its top-level txn as soon as the kinship gets known.
test_decoding's snapshot_transfer verifies this.
Also, fix a minor memory leak: the refcount of the toplevel's old base
snapshot was not decremented when the snapshot was transferred from the
child.
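A rough Python sketch of the transfer follows. It is not the actual C
code: the names are hypothetical, and the rule that the snapshot with
the earlier LSN wins is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySnapshot:
    xmin: int
    refcount: int = 1


@dataclass
class SnapTxn:
    xid: int
    base_snapshot: Optional[ToySnapshot] = None
    base_snapshot_lsn: Optional[int] = None


def transfer_snapshot_to_parent(top: SnapTxn, sub: SnapTxn) -> None:
    """Called once sub is known to be a subtransaction of top."""
    if sub.base_snapshot is None:
        return
    if top.base_snapshot is None or sub.base_snapshot_lsn < top.base_snapshot_lsn:
        if top.base_snapshot is not None:
            # Release the parent's old snapshot; omitting this step is
            # the kind of leak described above.
            top.base_snapshot.refcount -= 1
        top.base_snapshot = sub.base_snapshot
        top.base_snapshot_lsn = sub.base_snapshot_lsn
    else:
        # Parent already holds an earlier snapshot; drop the child's.
        sub.base_snapshot.refcount -= 1
    sub.base_snapshot = None
    sub.base_snapshot_lsn = None
```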
Liberally sprinkle code comments, and rewrite a few existing ones. This
part is my (Álvaro's) contribution to this commit, as I had to write all
those comments in order to understand the existing code and Arseny's
patch.
Reported-by: Arseny Sher <a.sher@postgrespro.ru>
Diagnosed-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: https://postgr.es/m/87lgdyz1wj.fsf@ars-thinkpad
randomAccess parallel tuplesorts are disallowed because the leader would
try to write to its own leader tape, not because the leader would try to
write to a worker tape directly.
Cleanup from commit 9da0cc3528.
Building a new nbtree index through incremental insertions would always
be slower than our actual approach of sorting using tuplesort,
assembling leaf pages from tuplesort output, and writing and WAL-logging
whole pages. Remove a comment block from the Berkeley days claiming
that incremental insertions might be slightly faster with presorted
input.
Discussion: https://postgr.es/m/CAH2-WzmKs4mLAoFgJ3yHMRYc849efc=dw+pNRb3NEog2oJoCNw@mail.gmail.com
Concurrently with partitioned index development (commit 8b08f7d482),
the code to handle failure to rename indexes was refactored (commit
8b9e9644dc). Turns out that that particular case was untested, which
naturally led it to be broken. Add tests and the missing code line.
Co-authored-by: David Rowley <dgrowley@gmail.com>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>
Discussion: https://postgr.es/m/CAKcux6mfYMS3OX0ywjOiWiGSEKhJf-1zdeTceHFbd0mScUzU5A@mail.gmail.com
The backup history file has not been necessary for recovery since
version 9.0; it is now basically just for informational purposes.
However, the documentation still said that recovery requests the
backup history file in order to proceed. This commit fixes that
documentation bug.
Back-patch to all supported versions.
Author: Yugo Nagata
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20180626174752.0ce505e3.nagata@sraoss.co.jp
find_appinfos_by_relids had quite a large overhead when the number of
items in the append_rel_list was high, as it had to trawl through the
append_rel_list looking for AppendRelInfos belonging to the given
childrelids. Since there can only be a single AppendRelInfo for each
child rel, it seems much better to store an array in PlannerInfo which
indexes these by child relid, making the function O(1) rather than O(N).
This function was only called once inside the planner, so just replace
that call with a lookup to the new array. find_childrel_appendrelinfo
is now unused and thus removed.
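The difference between the two lookups can be sketched as follows. This
is a Python illustration only; the planner itself is C, and the helper
names below are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyAppendRelInfo:
    parent_relid: int
    child_relid: int


def find_appinfo_linear(append_rel_list: List[ToyAppendRelInfo],
                        child_relid: int) -> Optional[ToyAppendRelInfo]:
    # O(N): walk the whole list looking for the child.
    for appinfo in append_rel_list:
        if appinfo.child_relid == child_relid:
            return appinfo
    return None


def build_append_rel_array(append_rel_list: List[ToyAppendRelInfo],
                           num_rels: int) -> List[Optional[ToyAppendRelInfo]]:
    # Built once; relies on each child rel having at most one entry.
    array: List[Optional[ToyAppendRelInfo]] = [None] * num_rels
    for appinfo in append_rel_list:
        array[appinfo.child_relid] = appinfo
    return array


infos = [ToyAppendRelInfo(parent_relid=1, child_relid=c) for c in (2, 3, 4)]
append_rel_array = build_append_rel_array(infos, num_rels=5)
print(append_rel_array[3])   # O(1) lookup by child relid
```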
This fixes a planner performance regression new to v11 reported by
Thomas Reiss.
Author: David Rowley
Reported-by: Thomas Reiss
Reviewed-by: Ashutosh Bapat
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/94dd7a4b-5e50-0712-911d-2278e055c622@dalibo.com
Upper limits for vacuum_cleanup_index_scale_factor GUC and reloption
were initially set to 100.0 in 857f9c36. However, after further
discussion, it appears that some users would like to disable the B-tree
cleanup index scan completely (assuming there are no deleted pages).
vacuum_cleanup_index_scale_factor is used merely to protect against
stale index statistics, and after detailed consideration it appears
that the risk of stale index statistics is low. It would also be nice
to allow advanced users to set higher values of
vacuum_cleanup_index_scale_factor. So, set the upper limit for both the
GUC and the reloption to DBL_MAX.
Author: Alexander Korotkov
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/CAC8Q8tJCb%3DgxhzcV7T6ctx7PY-Ux1oA-AsTJc6cAVNsQiYcCzA%40mail.gmail.com
The previous message for SPI_ERROR_TRANSACTION claimed "cannot begin/end
transactions in PL/pgSQL", but that is no longer true. Nevertheless,
the error can still happen, so reword the messages. The error cases in
exec_prepare_plan() could never happen, so remove them.
This file has been missing the fact that it needs to report back to
callers a proper failure on fsync calls. I have spotted the one in
tar_finish() while Kuntal has spotted the one in tar_close().
Backpatch down to 10, where this code was introduced.
Reported by: Michael Paquier, Kuntal Ghosh
Author: Michael Paquier
Reviewed-by: Kuntal Ghosh, Magnus Hagander
Discussion: https://postgr.es/m/20180625024356.GD1146@paquier.xyz
System calls mixed up in error code paths cause two issues which
several code paths have not handled correctly:
1) For write() calls, the system may sometimes write fewer bytes than
requested without setting errno. Some paths were careful enough to
consider that case and assumed that errno should be set to ENOSPC;
other calls missed that.
2) errno generated by a system call is overwritten by other system calls
which may succeed once an error code path is taken, causing what is
reported to the user to be incorrect.
This patch uses the brute-force approach of correcting all those code
paths. Some refactoring could happen in the future, but this is left as
future work, which is not targeted for back-branches anyway.
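Both points can be modelled in a few lines of Python. This is only an
analogy: the patched code is C and saves and restores errno directly,
whereas Python reports failures as OSError. The injectable write and
close parameters are an artifact of the sketch so that failures can be
simulated.

```python
import errno
import os


def checked_write(fd, data, write=os.write):
    """Issue 1: a short write with no error reported is treated as ENOSPC."""
    written = write(fd, data)
    if written != len(data):
        raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))


def write_and_close(fd, data, write=os.write, close=os.close):
    """Issue 2: cleanup must not replace the error that is reported."""
    try:
        checked_write(fd, data, write)
    except OSError as original:
        try:
            close(fd)
        except OSError:
            pass          # ignore the cleanup failure, keep the original
        raise original
    close(fd)
```

In C the second half corresponds to copying errno into a local variable
before calling close() and restoring it before the error is reported.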
Author: Michael Paquier
Reviewed-by: Ashutosh Sharma
Discussion: https://postgr.es/m/20180622061535.GD5215@paquier.xyz
Two out of three code paths were mapping column numbers correctly if a
partition had different column numbers than the parent table, but the most
commonly used one (recursing in CREATE INDEX to a new index on a
partition) failed to map attribute numbers in expressions. Oddly
enough, attnums in WHERE clauses are already handled correctly
everywhere.
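To illustrate what mapping attribute numbers in an expression means,
here is a small Python model. The tuple-based expression format and the
function names are invented for the sketch; None marks a dropped column.

```python
def build_attno_map(parent_cols, child_cols):
    """Map 1-based parent attribute numbers to child ones, by column name."""
    child_pos = {name: i + 1 for i, name in enumerate(child_cols)
                 if name is not None}
    return {i + 1: child_pos[name] for i, name in enumerate(parent_cols)
            if name is not None}


def map_expression(expr, attno_map):
    """Rewrite every ('var', attno) leaf of a nested-tuple expression."""
    if isinstance(expr, tuple) and len(expr) == 2 and expr[0] == "var":
        return ("var", attno_map[expr[1]])
    if isinstance(expr, tuple):
        return tuple(map_expression(e, attno_map) for e in expr)
    return expr


parent_cols = ["a", "b"]
child_cols = [None, "b", "a"]          # dropped column, then b, then a
attno_map = build_attno_map(parent_cols, child_cols)
index_expr = ("+", ("var", 1), ("var", 2))      # a + b on the parent
print(map_expression(index_expr, attno_map))
```

Without the mapping step, the partition's index expression would still
refer to attribute 1, which in this example is a dropped column.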
Reported-by: Amit Langote
Author: Amit Langote
Discussion: https://postgr.es/m/dce1fda4-e0f0-94c9-6abb-f5956a98c057@lab.ntt.co.jp
Reviewed-by: Álvaro Herrera
Previously, if some or all partitions had no partially aggregated path,
we would still try to generate a partially aggregated path for the
parent, leading to assertion failures or wrong answers.
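The condition can be pictured with a short Python sketch. It assumes,
for illustration only, that each partition is described by a list of
(name, cost) partial paths and that the parent simply takes the cheapest
one per partition; the real planner logic is more involved.

```python
def parent_partial_agg_inputs(children_partial_paths):
    """Return the per-partition inputs for a parent partially aggregated
    path, or None if any partition lacks a partially aggregated path."""
    if not children_partial_paths:
        return None
    if any(not paths for paths in children_partial_paths):
        return None       # at least one partition has nothing to offer
    return [min(paths, key=lambda p: p[1]) for paths in children_partial_paths]


print(parent_partial_agg_inputs([[("p1_hash", 10.0)], []]))
print(parent_partial_agg_inputs([[("p1_hash", 10.0), ("p1_sort", 7.5)],
                                 [("p2_hash", 4.0)]]))
```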
Report by Rajkumar Raghuwanshi. Patch by Jeevan Chalke, reviewed
by Ashutosh Bapat. A few changes by me.
Discussion: http://postgr.es/m/CAKcux6=q4+Mw8gOOX16ef6ZMFp9Cve7KWFstUsrDa4GiFaXGUQ@mail.gmail.com
Commit 16828d5c02 neglected to do this, so upgraded databases would
silently get null instead of the specified default in rows without the
attribute defined.
A new binary upgrade function is provided to perform this and pg_dump is
adjusted to output a call to the function if required in binary upgrade
mode.
Also included is code to drop missing attribute values for dropped
columns. That way if the type is later dropped the missing value won't
have a dangling reference to the type.
Finally the regression tests are adjusted to ensure that there is a row
with a missing value so that this code is exercised in upgrade testing.
Catalog version unfortunately bumped.
Regression test changes from Tom Lane.
Remainder from me, reviewed by Tom Lane, Andres Freund, Alvaro Herrera
Discussion: https://postgr.es/m/19987.1529420110@sss.pgh.pa.us
vacuum_cleanup_index_scale_factor was located in the autovacuum group of
GUCs. However, it affects not only autovacuum but also manually run
VACUUM. The "client connection defaults" group of GUCs is more
appropriate for vacuum_cleanup_index_scale_factor, because the
vacuum_*_age options are already located there.
Also, vacuum_cleanup_index_scale_factor was missing from
postgresql.conf.sample, so add it there with an appropriate comment.
Author: Masahiko Sawada with minor editorialization by me
Discussion: https://postgr.es/m/CAD21AoArsoXMLKudXSKN679FRzs6oubEchM53bHwn8Tp%3D2boNg%40mail.gmail.com
The create_append_path code didn't consider that list_concat will
modify its first argument, leading to inconsistent traversal of the
resulting list. In practice, it won't lead to any user-visible bug,
but change it to make the code behave consistently.
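The hazard is the usual one with in-place concatenation. A Python
analogy (not the C list API; the function names are made up) shows how
the first list changes under the caller:

```python
def concat_in_place(first, second):
    # Analogous to a destructive concat: first is extended and returned.
    first.extend(second)
    return first


def concat_copy(first, second):
    # Builds a new list and leaves both inputs untouched.
    return list(first) + list(second)


subpaths = ["p1", "p2"]
partial_subpaths = ["pp1"]

all_paths = concat_in_place(subpaths, partial_subpaths)
print(subpaths)          # subpaths itself now also contains "pp1"

subpaths = ["p1", "p2"]
all_paths = concat_copy(subpaths, partial_subpaths)
print(subpaths)          # unchanged
```

Any later loop over the first list sees a different set of elements
depending on whether it runs before or after the concatenation.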
Reported-by: Tom Lane
Author: Tom Lane
Reviewed-by: Amit Khandekar and Amit Kapila
Discussion: https://postgr.es/m/32365.1528994120@sss.pgh.pa.us
Pavel Stehule's original patch had support for default namespace, but I
ripped it out before commit -- hence the docs were correct when written,
and I broke them by omission :-(. Remove the offending phrase.
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/1550C5E5-FC70-4493-A226-AA137D831E8D@yesql.se
A typo in numeric_poly_combine caused bogus results for queries using
it, but of course would only manifest if parallel aggregation is
performed. Reported by Rajkumar Raghuwanshi.
David Rowley did the diagnosis and the fix; I editorialized rather
heavily on his regression test additions.
Back-patch to v10 where the breakage was introduced (by 9cca11c91).
Discussion: https://postgr.es/m/CAKcux6nU4E2x8nkSBpLOT2DPvQ5LviJ3SGyAN6Sz7qDH4G4+Pw@mail.gmail.com
According to the SQL standard, the context of XMLTABLE's XPath
row_expression is the document node of the XML input document, not the
root node. This becomes visible when a relative path rather than
absolute is used as row expression. Absolute paths are what was used in
original tests and docs (and the most common form used in examples
throughout the interwebs), which explains why this wasn't noticed
before.
Other functions such as xpath() and xpath_exists() also have this
problem. While not specified by the SQL standard, it would be pretty
odd to leave those functions to behave differently than XMLTABLE, so
change them too. However, this is a backwards-incompatible change.
No backpatch, out of fear of breaking code depending on the original
broken behavior.
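The distinction between the document node and the root element can be
seen with Python's xml.dom.minidom. The helper below is a hand-written
stand-in for a one-step relative path, not an XPath engine, and the
sample document is invented for the example.

```python
from xml.dom.minidom import parseString


def child_elements(context, name):
    """Evaluate the one-step relative path `name` from a context node."""
    return [n for n in context.childNodes
            if n.nodeType == n.ELEMENT_NODE and n.tagName == name]


doc = parseString("<rows><row>1</row><row>2</row></rows>")
document_node = doc                  # context required by the standard
root_element = doc.documentElement   # context used by the old behavior

print(len(child_elements(document_node, "rows")))   # relative path 'rows'
print(len(child_elements(root_element, "rows")))
print(len(child_elements(root_element, "row")))
```

With the document node as context, the relative path rows selects the
root element; with the root element as context it selects nothing, and
only row would match. An absolute path such as /rows/row gives the same
result either way.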
Author: Markus Winand
Reported-By: Markus Winand
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/0684A598-002C-42A2-AE12-F024A324EAE4@winand.at
split_pathtarget_at_srfs() neglected to worry about sortgroupref labels
in the intermediate PathTargets it constructs. I think we'd supposed
that their labeling didn't matter, but it does at least for the case that
GroupAggregate/GatherMerge nodes appear immediately under the ProjectSet
step(s). This results in "ERROR: ORDER/GROUP BY expression not found in
targetlist" during create_plan(), as reported by Rajkumar Raghuwanshi.
To fix, make this logic track the sortgroupref labeling of expressions,
not just their contents. This also restores the pre-v10 behavior that
separate GROUP BY expressions will be kept distinct even if they are
textually equal().
Discussion: https://postgr.es/m/CAKcux6=1_Ye9kx8YLBPmJs_xE72PPc6vNi5q2AOHowMaCWjJ2w@mail.gmail.com
PostgreSQL 11 introduces a compress method for SP-GiST opclasses. That
was mistakenly interpreted as compression support for SP-GiST, while
it actually allows lossy representation of leaf keys.
Author: Alexander Korotkov, based on proposal by Darafei Praliaskouski
Discussion: https://postgr.es/m/CAC8Q8tKbYmNdiyWr7hE4GfMY4fbqHKkFziKgrUuWHH6HJQs3og%40mail.gmail.com
Column expressions that match TEXT or CDATA nodes must return the
contents of the nodes themselves, not the content of non-existing
children (i.e. the empty string).
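A small Python model of the difference, using xml.dom.minidom. The two
helper functions are hypothetical and only mimic the old and the
corrected ways of computing a column value.

```python
from xml.dom.minidom import parseString

TEXT_LIKE = (3, 4)   # DOM nodeType values for TEXT_NODE and CDATA_SECTION_NODE


def content_of_children(node):
    """Old behavior: concatenate the text-like children of the node."""
    return "".join(c.data for c in node.childNodes if c.nodeType in TEXT_LIKE)


def node_value(node):
    """Corrected behavior: a text-like node yields its own content."""
    if node.nodeType in TEXT_LIKE:
        return node.data
    return content_of_children(node)


element = parseString("<a>hello</a>").documentElement
text_node = element.firstChild      # what a path ending in text() selects

print(repr(content_of_children(text_node)))
print(repr(node_value(text_node)))
```

A text node has no children, so the old computation can only produce
the empty string for it.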
Author: Markus Winand
Reported-by: Markus Winand
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/0684A598-002C-42A2-AE12-F024A324EAE4@winand.at
Commit ab72716778 allowed Parallel Append paths to be generated for a
relation that is not parallel safe. Prevent that from happening.
Initial analysis by Tom Lane.
Reported-by: Rajkumar Raghuwanshi
Author: Amit Kapila and Rajkumar Raghuwanshi
Reviewed-by: Amit Khandekar and Robert Haas
Discussion: https://postgr.es/m/CAKcux6=tPJ6nJ08r__nU_pmLQiC0xY15Fn0HvG1Cprsjdd9s_Q@mail.gmail.com
Since their introduction, partition trees have been a bit lossy
regarding temporary relations. Inheritance trees respect the following
patterns:
1) a child relation can be temporary if the parent is permanent.
2) a child relation can be temporary if the parent is temporary.
3) a child relation cannot be permanent if the parent is temporary.
4) The use of temporary relations also implies that both parent and
child need to be from the same session.
Partitions share many similar patterns with inheritance; however, the
handling of the partition bounds makes the situation a bit tricky for
case 1) as the partition code bases a lot of its lookup code upon
PartitionDesc, which does not really look after relpersistence. This
causes, for example, a temporary partition created by session A to be
visible to another session B, preventing session B from creating an
extra partition which overlaps with the temporary one created by A, and
doing so with a non-intuitive error message. There could be use-cases
where mixing permanent partitioned tables with temporary partitions
makes sense, but that would be a new feature. Partitions respect 2), 3)
and 4) already.
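The four patterns can be written down as a small Python check. The
function is hypothetical, and the partition branch assumes that the
resolution is to reject case 1) for partitions, which the text above
implies but does not spell out.

```python
def check_child(parent_temp, child_temp, same_session=True, is_partition=False):
    """Return None if the combination is allowed, else a reason string."""
    if parent_temp and not child_temp:
        return "permanent child of a temporary parent"                 # rule 3
    if parent_temp and child_temp and not same_session:
        return "temporary relations from different sessions"           # rule 4
    if is_partition and child_temp and not parent_temp:
        # Assumption of this sketch: case 1) is not allowed for partitions.
        return "temporary partition of a permanent partitioned table"
    return None


print(check_child(parent_temp=False, child_temp=True))                      # inheritance, case 1
print(check_child(parent_temp=False, child_temp=True, is_partition=True))   # partition, case 1
```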
It is a bit depressing to see those error checks happening in
MergeAttributes() whose purpose is different, but that's left as future
refactoring work.
Back-patch down to 10, which is where partitioning was introduced,
except that default partitions do not apply there. Documentation also
includes limitations related to the use of temporary tables with
partition trees.
Reported-by: David Rowley
Author: Amit Langote, Michael Paquier
Reviewed-by: Ashutosh Bapat, Amit Langote, Michael Paquier
Discussion: https://postgr.es/m/CAKJS1f94Ojk0og9GMkRHGt8wHTW=ijq5KzJKuoBoqWLwSVwGmw@mail.gmail.com
Explain the difference between "make check" and "make installcheck".
Mention the need for --enable-tap-tests (only some of these did so
before). Standardize their wording about how to run the tests.