This allows AM-specific knowledge to be applied during creation of
pg_amop and pg_amproc entries. Specifically, the AM knows better than
core code which entries to consider as required or optional. Giving
the latter entries the appropriate sort of dependency allows them to
be dropped without taking out the whole opclass or opfamily, which
is something we'd like to be able to do to correct obsolescent entries
in extensions.
This callback also opens the door to performing AM-specific validity
checks during opclass creation, rather than hoping that an opclass
developer will remember to test with "amvalidate". For the most part
I've not actually added any such checks yet; that can happen in a
follow-on patch. (Note that we shouldn't remove any tests from
"amvalidate", as those are still needed to cross-check manually
constructed entries in the initdb data. So adding tests to
"amadjustmembers" will be somewhat duplicative, but it seems like
a good idea anyway.)
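For reference, a hypothetical sketch of an AM's callback is below; the
OpFamilyMember field names are illustrative assumptions based on the
description above, not necessarily the final ones:

    void
    btadjustmembers(Oid opfamilyoid, Oid opclassoid,
                    List *operators, List *functions)
    {
        ListCell   *lc;

        foreach(lc, functions)
        {
            OpFamilyMember *member = (OpFamilyMember *) lfirst(lc);

            /* Mark required support procs with a hard dependency, so
             * that optional ones can later be dropped individually. */
            member->ref_is_hard = (member->number == BTORDER_PROC);
        }
    }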
Patch by me, reviewed by Alexander Korotkov, Hamid Akhtar, and
Anastasia Lubennikova.
Discussion: https://postgr.es/m/4578.1565195302@sss.pgh.pa.us
Commit eba77534 fixed an amcheck false positive bug involving
inconsistencies in TOAST input state between table and index. A test
case was added that verified that such an inconsistency didn't result in
a spurious corruption related error.
Test coverage from the test was accidentally lost by commit 501e41dd,
which propagated ALTER TABLE ... SET STORAGE attstorage state to
indexes. This broke the test because it specifically relied on
attstorage not being propagated, which artificially forced there to be
index tuples whose datums were equivalent to the datums in the heap
without actually being bitwise equal.
Fix this by updating pg_attribute directly instead. Commit 501e41dd
made similar changes to a test_decoding TOAST-related test case which
made the same assumption, but overlooked the amcheck test case.
Backpatch: 11-, just like commit eba77534 (and commit 501e41dd).
Avoid repeatedly calling lseek(SEEK_END) during recovery by caching
the size of each fork. For now, we can't use the same technique in
other processes, because we lack a shared invalidation mechanism.
Do this by generalizing the pre-existing caching used by FSM and VM
to support all forks.
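A simplified sketch of the general shape of that caching follows; the
field and function names approximate smgr.c's internals, and the real
code must also adjust the cache on extension and truncation:

    BlockNumber
    smgrnblocks(SMgrRelation reln, ForkNumber forknum)
    {
        BlockNumber result;

        /* Trust the cached value if we have one. */
        if (reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
            return reln->smgr_cached_nblocks[forknum];

        /* Otherwise ask the storage layer (lseek) and remember it. */
        result = smgrsw[reln->smgr_which].smgr_nblocks(reln, forknum);
        reln->smgr_cached_nblocks[forknum] = result;

        return result;
    }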
Discussion: https://postgr.es/m/CAEepm%3D3SSw-Ty1DFcK%3D1rU-K6GSzYzfdD4d%2BZwapdN7dTa6%3DnQ%40mail.gmail.com
This commit makes pg_stat_statements track the total number
of rows retrieved or affected by CREATE TABLE AS, SELECT INTO,
CREATE MATERIALIZED VIEW and FETCH commands.
Suggested-by: Pascal Legrand
Author: Fujii Masao
Reviewed-by: Asif Rehman
Discussion: https://postgr.es/m/1584293755198-0.post@n3.nabble.com
This adds seven methods to the output plugin API, providing support for
streaming changes of large in-progress transactions.
* stream_start
* stream_stop
* stream_abort
* stream_commit
* stream_change
* stream_message
* stream_truncate
Most of this is a simple extension of the existing methods, with
the semantic difference that the transaction (or subtransaction)
is incomplete and may be aborted later (which is something the
regular API does not really need to deal with).
This also extends the 'test_decoding' plugin, implementing these
new stream methods.
The stream_start/stream_stop callbacks are used to demarcate a chunk of
changes streamed for a particular toplevel transaction.
This commit simply adds these new APIs; the upcoming patch to "allow
the streaming mode in ReorderBuffer" will use them.
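An output plugin opts into streaming by filling the new callback
pointers in _PG_output_plugin_init(), along the same lines as
test_decoding; the my_* handlers below are placeholders:

    void
    _PG_output_plugin_init(OutputPluginCallbacks *cb)
    {
        /* existing, non-streaming callbacks */
        cb->startup_cb = my_startup;
        cb->begin_cb = my_begin_txn;
        cb->change_cb = my_change;
        cb->commit_cb = my_commit_txn;
        cb->shutdown_cb = my_shutdown;

        /* new callbacks for streaming in-progress transactions */
        cb->stream_start_cb = my_stream_start;
        cb->stream_stop_cb = my_stream_stop;
        cb->stream_abort_cb = my_stream_abort;
        cb->stream_commit_cb = my_stream_commit;
        cb->stream_change_cb = my_stream_change;
        cb->stream_message_cb = my_stream_message;
        cb->stream_truncate_cb = my_stream_truncate;
    }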
Author: Tomas Vondra, Dilip Kumar, Amit Kapila
Reviewed-by: Amit Kapila
Tested-by: Neha Sharma and Mahendra Singh Thalor
Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
A compressed stream may end with an empty packet. In this case
decompression finishes before reading the empty packet and the
remaining stream packet causes a failure in reading the following
data. This commit makes sure to consume such extra data, avoiding a
failure when decompressing the data. This corner case was easily
reproducible with a data length of 16kB, and has existed since e94dd6a.
A cheap regression test is added to cover this case, based on a random,
incompressible string.
The first attempt at this patch allowed us to find an older failure
within the compression logic of pgcrypto, fixed by b9b6105. This
involved SLES 15 on s390x, where a custom flavor of libz gets used.
Bonus thanks to Mark Wong for providing access to the specific
environment.
Reported-by: Frank Gagnepain
Author: Kyotaro Horiguchi, Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/16476-692ef7b84e5fb893@postgresql.org
Backpatch-through: 9.5
contrib/pgcrypto mishandled the case where deflate() does not consume
all of the offered input on the first try. It reset the next_in pointer
to the start of the input instead of leaving it alone, causing the wrong
data to be fed to the next deflate() call.
This has been broken since pgcrypto was committed. The reason for the
lack of complaints seems to be that it's fairly hard to get stock zlib
to not consume all the input, so long as the output buffer is big enough
(which it normally would be in pgcrypto's usage; AFAICT the input is
always going to be packetized into packets no larger than ZIP_OUT_BUF).
However, IBM's zlibNX implementation for AIX evidently will do it
in some cases.
I did not add a test case for this, because I couldn't find one that
would fail with stock zlib. When we put back the test case for
bug #16476, that will cover the zlibNX situation well enough.
While here, write deflate()'s second argument as Z_NO_FLUSH per its
API spec, instead of hard-wiring the value zero.
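The corrected pattern looks roughly like this (buffer handling
simplified; write_out() is a stand-in for pgcrypto's packet writer, and
the struct layout only approximates the real code):

    st->stream.next_in = (Bytef *) data;
    st->stream.avail_in = len;
    while (st->stream.avail_in > 0)
    {
        st->stream.next_out = st->buf;
        st->stream.avail_out = sizeof(st->buf);
        if (deflate(&st->stream, Z_NO_FLUSH) != Z_OK)
            return PXE_PGP_COMPRESSION_ERROR;
        /* zlib has advanced next_in/avail_in itself; do not reset them */
        write_out(st->buf, sizeof(st->buf) - st->stream.avail_out);
    }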
Per buildfarm results and subsequent investigation.
Discussion: https://postgr.es/m/16476-692ef7b84e5fb893@postgresql.org
This reverts commit 9e10898, after finding out that buildfarm members
running SLES 15 on s390x complain about the compression and
decompression logic of the new test: pipistrelles, barbthroat and
steamerduck.
Those hosts are visibly using hardware-specific changes to improve zlib
performance, requiring more investigation.
Thanks to Tom Lane for the discussion.
Discussion: https://postgr.es/m/20200722093749.GA2564@paquier.xyz
Backpatch-through: 9.5
Add infinities that behave the same as they do in the floating-point
data types. Aside from any intrinsic usefulness these may have,
this closes an important gap in our ability to convert floating
values to numeric and/or replace float-based APIs with numeric.
The new values are represented by bit patterns that were formerly
not used (although old code probably would take them for NaNs).
So there shouldn't be any pg_upgrade hazard.
Patch by me, reviewed by Dean Rasheed and Andrew Gierth
Discussion: https://postgr.es/m/606717.1591924582@sss.pgh.pa.us
A compressed stream may end with an empty packet, and PGP decompression
finished before reading this empty packet from the remaining stream.
This caused a failure in pgcrypto, which treated this case as corrupted
data. This commit makes sure to consume such extra data, avoiding a
failure when decompressing the entire stream. This corner case was
reproducible with a data length of 16kB, and has existed since its
introduction in e94dd6a. A cheap regression test is added to cover this
case.
Thanks to Jeff Janes for the extra investigation.
Reported-by: Frank Gagnepain
Author: Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/16476-692ef7b84e5fb893@postgresql.org
Backpatch-through: 9.5
One change for getObjectIdentity() was missed in 2a10fdc, causing
the module to not compile properly. This was actually the only problem,
and it is easy enough to check that the module compiles on Debian after
installing libselinux1-dev.
Per buildfarm member rhinoceros.
When using the following functions, users could see various errors of
the type "cache lookup failed for OID XXX", raised with elog(), which
should only be used for internal errors:
* pg_describe_object()
* pg_identify_object()
* pg_identify_object_as_address()
The set of APIs managing object addresses for all object types is made
smarter by gaining a new argument "missing_ok" that allows any caller
to control whether an error is raised for an undefined object. The SQL
functions listed above are changed to handle the case where an object is
missing.
Regression tests are added for all object types for the cases where
these are undefined. Before this commit, these cases failed with cache
lookup errors, and now they basically return NULL (minus the name of the
object type requested).
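Illustratively, a caller of the reworked APIs can now do something like
the following (assuming the missing_ok argument is simply threaded
through; the exact call sites may differ):

    char   *description = getObjectDescription(&address, true /* missing_ok */);

    if (description == NULL)
        PG_RETURN_NULL();   /* undefined object, no internal error raised */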
Author: Michael Paquier
Reviewed-by: Aleksander Alekseev, Dmitry Dolgov, Daniel Gustafsson,
Álvaro Herrera, Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAB7nPqSZxrSmdHK-rny7z8mi=EAFXJ5J-0RbzDw6aus=wB5azQ@mail.gmail.com
read_binary_file(), used by SQL functions pg_read_file() and friends,
uses stat to determine the file length to read when not passed an
explicit length as an argument. This is problematic, for example, if
the file being read is a virtual file with a stat-reported length of
zero. Arrange to read until EOF, or until the StringInfo data string
length limit is reached, instead.
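The reading loop now has roughly this shape (simplified sketch; offset
handling omitted, and the variable names are illustrative):

    StringInfoData sbuf;
    char        chunk[8192];
    size_t      nbytes;

    initStringInfo(&sbuf);
    while ((nbytes = fread(chunk, 1, sizeof(chunk), file)) > 0)
    {
        /* errors out if the StringInfo length limit would be exceeded */
        appendBinaryStringInfo(&sbuf, chunk, nbytes);
    }
    if (ferror(file))
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not read file \"%s\"", filename)));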
Original complaint and patch by me, with significant review, corrections,
advice, and code optimizations by Tom Lane. Backpatched to v11. Prior to
that only paths relative to the data and log dirs were allowed for files,
so no "zero length" files were reachable anyway.
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/flat/969b8d82-5bb2-5fa8-4eb1-f0e685c5d736%40joeconway.com
Backpatch-through: 11
Since v13 pg_stat_statements has been able to track the planning time
of statements when the track_planning option is enabled, and its
default was on. But this feature can cause significant spinlock
contention in pg_stat_statements. Because of this, Robins Tharakan
reported that v13 beta1 showed a ~45% performance drop at high DB
connection counts (when compared with v12.3) during a fully-cached
SELECT-only test using pgbench.
To avoid this performance regression in the default setting, this
commit changes the default of pg_stat_statements.track_planning to off.
Back-patch to v13 where pg_stat_statements.track_planning was introduced.
Reported-by: Robins Tharakan
Author: Fujii Masao
Reviewed-by: Julien Rouhaud
Discussion: https://postgr.es/m/2895b53b033c47ccb22972b589050dd9@EX13D05UWC001.ant.amazon.com
TOAST tables have a visibility map and a free space map, so they can
be supported by pgstattuple_approx just fine.
Add test cases to show how various pgstattuple functions accept TOAST
tables. Also add similar tests to pg_visibility, which already
accepted TOAST tables correctly but had no test coverage for them.
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://www.postgresql.org/message-id/flat/27c4496a-02b9-dc87-8f6f-bddbef54e0fe@2ndquadrant.com
Since %c only passes a C "char" to printf, it's incapable of dealing
with multibyte characters. Passing just the first byte of such a
character leads to an output string that is visibly not correctly
encoded, resulting in undesirable behavior such as encoding conversion
failures while sending error messages to clients.
We've lived with this issue for a long time because it was inconvenient
to avoid in a portable fashion. However, now that we always use our own
snprintf code, it's reasonable to use the %.*s format to print just one
possibly-multibyte character in a string. (We previously avoided that
obvious-looking answer in order to work around glibc's bug #6530, cf
commits 54cd4f045 and ed437e2b2.)
Hence, run around and fix a bunch of places that used %c to report
a character found in a user-supplied string. For simplicity, I did
not touch places that were emitting non-user-facing debug messages,
or reporting catalog data that should always be ASCII. (It's also
unclear how useful this approach could be in frontend code, where
it's less certain that we know what encoding we're dealing with.)
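The replacement idiom, for a pointer cp into a user-supplied string, is
along these lines (the message text is just an example):

    ereport(ERROR,
            (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
             errmsg("unexpected character \"%.*s\" in input",
                    pg_mblen(cp), cp)));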
In passing, improve a couple of poorly-written error messages in
pageinspect/heapfuncs.c.
This is a longstanding issue, but I'm hesitant to back-patch because
of the impact on translatable message strings. In any case this fix
would not work reliably before v12.
Tom Lane and Quan Zongliang
Discussion: https://postgr.es/m/a120087c-4c88-d9d4-1ec5-808d7a7f133d@gmail.com
SQL:1999 had the syntax
SUBSTRING(text FROM pattern FOR escapechar)
but this was replaced in SQL:2003 by the clearer
SUBSTRING(text SIMILAR pattern ESCAPE escapechar)
which was never implemented in PostgreSQL. This patch adds the
new syntax as an alternative in the parser, and updates documentation
and tests to indicate that it is now the preferred alternative.
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Vik Fearing <vik@postgresfriends.org>
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
Discussion: https://www.postgresql.org/message-id/flat/a15db31c-d0f8-8ce0-9039-578a31758adb%402ndquadrant.com
This patch removes the hardcoded check for superuser privileges when
executing replication origin functions. Instead, execution is revoked
from public, meaning that those functions can still be executed by a
superuser and that access to them can now be granted to other users.
Author: Martín Marqués
Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/CAPdiE1xJMZOKQL3dgHMUrPqysZkgwzSMXETfKkHYnBAB7-0VRQ@mail.gmail.com
Commit 5e0928005 changed the planner so that, instead of blindly using
DEFAULT_COLLATION_OID when invoking operators for selectivity estimation,
it would use the collation of the column whose statistics we're
considering. This was recognized as still being not quite the right
thing, but it seemed like a good incremental improvement. However,
shortly thereafter we introduced nondeterministic collations, and that
creates cases where operators can fail if they're passed the wrong
collation. We don't want planning to fail in cases where the query itself
would work, so this means that we *must* use the query's collation when
invoking operators for estimation purposes.
The only real problem this creates is in ineq_histogram_selectivity, where
the binary search might produce a garbage answer if we perform comparisons
using a different collation than the column's histogram is ordered with.
However, when the query's collation is significantly different from the
column's default collation, the estimate we previously generated would be
pretty irrelevant anyway; so it's not clear that this will result in
noticeably worse estimates in practice. (A follow-on patch will improve
this situation in HEAD, but it seems too invasive for back-patch.)
The patch requires changing the signatures of mcv_selectivity and allied
functions, which are exported and very possibly are used by extensions.
In HEAD, I just did that, but an API/ABI break of this sort isn't
acceptable in stable branches. Therefore, in v12 the patch introduces
"mcv_selectivity_ext" and so on, with signatures matching HEAD, and makes
the old functions into wrappers that assume DEFAULT_COLLATION_OID should
be used. That does not match the prior behavior, but it should avoid risk
of failure in most cases. (In practice, I think most extension datatypes
aren't collation-aware, so the change probably doesn't matter to them.)
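The v12 wrappers are trivial; sketched here under the assumption that
the _ext variant simply adds a collation parameter after opproc:

    double
    mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
                    Datum constval, bool varonleft, double *sumcommonp)
    {
        /* legacy entry point: preserve the old default-collation behavior */
        return mcv_selectivity_ext(vardata, opproc, DEFAULT_COLLATION_OID,
                                   constval, varonleft, sumcommonp);
    }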
Per report from James Lucas. Back-patch to v12 where the problem was
introduced.
Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com
Commit 1f39bce021 added disk-based hash aggregation, which may spill
incoming tuples to disk. However, it did not request projection to make
the tuples as narrow as possible, which may mean having to spill much
more data than necessary (increasing I/O, pushing other data out of the
page cache, etc.).
This adds CP_SMALL_TLIST in places that may use hash aggregation - we do
that only for AGG_HASHED. It's unnecessary for AGG_SORTED, because that
either uses explicit Sort (which already does projection) or pre-sorted
input (which does not need spilling to disk).
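Schematically, in the createplan.c spots that build the input for hash
aggregation (a sketch only; the exact call sites vary):

    int     flags = CP_LABEL_TLIST;

    if (best_path->aggstrategy == AGG_HASHED)
        flags |= CP_SMALL_TLIST;    /* tuples that may spill should be narrow */

    subplan = create_plan_recurse(root, best_path->subpath, flags);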
Author: Tomas Vondra
Reviewed-by: Jeff Davis
Discussion: https://postgr.es/m/20200519151202.u2p2gpiawoaznsv2%40development
Includes some manual cleanup of places that pgindent messed up,
most of which weren't per project style anyway.
Notably, it seems some people didn't absorb the style rules of
commit c9d297751, because there were a bunch of new occurrences
of function calls with a newline just after the left paren, all
with faulty expectations about how the rest of the call would get
indented.
amcheck expects at least the high key to always exist on a leaf page,
even if it is a deleted page. But a replica reinitializes the page
during replay of page deletion, causing the deleted page to have no
items. Thus, replay of page deletion can cause an error in a concurrent
amcheck run.
This commit relaxes amcheck's expectation, making it tolerate a deleted
page with no items.
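The relaxed check amounts to something like this (simplified sketch):

    /* A deleted page may have been reinitialized with no items during
     * WAL replay on a replica; don't insist on a high key there. */
    if (P_ISDELETED(opaque))
        return;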
Reported-by: Konstantin Knizhnik
Discussion: https://postgr.es/m/CAPpHfdt_OTyQpXaPJcWzV2N-LNeNJseNB-K_A66qG%3DL518VTFw%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Peter Geoghegan
Backpatch-through: 11
The additional pain from level 4 is excessive for the gain.
Also revert all the source annotation changes to their original
wordings, to avoid back-patching pain.
Discussion: https://postgr.es/m/31166.1589378554@sss.pgh.pa.us
In commit 33e05f89c5, we added the option to display WAL usage
statistics in Explain and auto_explain. The display format used two
spaces between each field, which is inconsistent with Buffer usage
statistics, which use one space between each field. Change the format
to make WAL usage statistics consistent with Buffer usage statistics.
This commit also changes the wording "full page writes" to
"full page images" for WAL usage statistics, to make it consistent with
other parts of the code and docs.
Author: Julien Rouhaud, Amit Kapila
Reviewed-by: Justin Pryzby, Kyotaro Horiguchi and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
Add a test case to contrib/amcheck that creates coverage of code paths
that are used to verify posting list tuples (tuples created when
deduplication merges together existing tuples to avoid a leaf page
split).
Writing a trailing semicolon in a macro is almost never the right thing,
because you almost always want to write a semicolon after each macro
call instead. (Even if there was some reason to prefer not to, pgindent
would probably make a hash of code formatted that way; so within PG the
rule should basically be "don't do it".) Thus, if we have a semi inside
the macro, the compiler sees "something;;". Much of the time the extra
empty statement is harmless, but it could lead to mysterious syntax
errors at call sites. In perhaps an overabundance of neatnik-ism, let's
run around and get rid of the excess semicolons wherever possible.
The only thing worse than a mysterious syntax error is a mysterious
syntax error that only happens in the back branches; therefore,
backpatch these changes where relevant, which is most of them because
most of these mistakes are old. (The lack of reported problems shows
that this is largely a hypothetical issue, but still, it could bite
us in some future patch.)
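To illustrate the hazard with a made-up macro:

    /* Wrong: semicolon inside the macro definition */
    #define CLEANUP() do { release_resources(); } while (0);

    if (failed)
        CLEANUP();          /* expands to "...;;": the extra empty statement */
    else                    /* detaches this else -- mysterious syntax error */
        do_more_work();

    /* Right: let each call site supply the semicolon */
    #define CLEANUP() do { release_resources(); } while (0)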
John Naylor and Tom Lane
Discussion: https://postgr.es/m/CACPNZCs0qWTqJ2QUSGJ07B7uvAvzMb-KbG2q+oo+J3tsWN5cqw@mail.gmail.com
The libpq parameters ssl{max|min}protocolversion are renamed to use
underscores, to become ssl_{max|min}_protocol_version. The related
environment variables still use the names introduced in commit ff8ca5f
that added the feature.
Per complaint from Peter Eisentraut (this was also mentioned by me in
the original patch review but the issue got discarded).
Author: Daniel Gustafsson
Reviewed-by: Peter Eisentraut, Michael Paquier
Discussion: https://postgr.es/m/b319e449-318d-e691-4997-1327e166fcc4@2ndquadrant.com
Lack of these checks could cause visible misbehavior, including
assertion failures. This was missed in commit c655077639, whereby
restart_lsn becomes invalid when the size limit is exceeded.
Also reword some existing error messages, and add errdetail(), so that
the reported errors all match in spirit.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200408.093710.447591748588426656.horikyota.ntt@gmail.com
fixup_whole_row_references() did the wrong thing with a dropped column,
resulting in a commit-time warning about a cache reference leak.
I (tgl) added a test case exercising this, but back-patched the test
only as far as v10; the patch didn't apply cleanly to 9.6 and it
didn't seem worth the trouble to adapt it. The bug is pretty old
though, so apply the code change all the way back.
Michael Luo, with cosmetic improvements by me
Discussion: https://postgr.es/m/BYAPR08MB5606D1453D7F50E2AF4D2FD29AD80@BYAPR08MB5606.namprd08.prod.outlook.com
An nbtree split point can be thought of as a point between two adjoining
tuples from an imaginary version of the page being split that includes
the incoming/new item (in addition to the items that really are on the
page). These adjoining tuples are called the lastleft and firstright
tuples.
The variables that represent split points contained a field called
firstright, which is an offset number of the first data item from the
original page that goes on the new right page. The corresponding tuple
from origpage was usually the same thing as the actual firstright tuple,
but not always: the firstright tuple is sometimes the new/incoming item
instead. This situation seems unnecessarily confusing.
Make things clearer by renaming the origpage offset returned by
_bt_findsplitloc() to "firstrightoff". We now have a firstright tuple
and a firstrightoff offset number which are comparable to the
newitem/lastleft tuples and the newitemoff/lastleftoff offset numbers
respectively. Also make sure that we are consistent about how we
describe nbtree page split point state.
Push the responsibility for dealing with pg_upgrade'd !heapkeyspace
indexes down to lower level code, relieving _bt_split() from dealing
with it directly. This means that we always have a palloc'd left page
high key on the leaf level, no matter what. This enables simplifying
some of the code (and code comments) within _bt_split().
Finally, restructure the page split code to make it clearer why suffix
truncation (which only takes place during leaf page splits) is
completely different to the first data item truncation that takes place
during internal page splits. Tuples are marked as having fewer
attributes stored in both cases, and the firstright tuple is truncated
in both cases, so it's easy to imagine somebody missing the distinction.
We've had a mixture of the warnings pragma, the -w switch on the shebang
line, and no warnings at all. This patch removes the -w switch and adds
the warnings pragma to all perl sources missing it. It raises the
severity of the TestingAndDebugging::RequireUseWarnings perlcritic
policy to level 5, so that we catch any future violations.
Discussion: https://postgr.es/m/20200412074245.GB623763@rfd.leadboat.com
Add a DEBUG1 message indicating that verification of the index structure
is underway. Also reduce the severity level of the existing "tree
level" debug message to DEBUG1. It should never have been made DEBUG2.
Any B-Tree index with more than a couple of levels will generally also
have so many pages that the per-page DEBUG2 messages will become
completely unmanageable.
In passing, add a new "Tip" to the docs that advises users that run into
corruption that the debug messages might provide useful additional
context.
Commit d2d8a229bc introduced Incremental Sort, but it was considered
only in create_ordered_paths() as an alternative to regular Sort. There
are many other places that require sorted input and might benefit from
considering Incremental Sort too.
This patch modifies a number of those places, but not all. The concern
is that just adding Incremental Sort to any place that already adds
Sort may increase the number of paths considered, negatively affecting
planning time, without any benefit. So we've taken a more conservative
approach, based on analysis of which places affect a set of queries
that seemed practical. This means some less common queries may not
benefit from Incremental Sort yet.
Author: Tomas Vondra
Reviewed-by: James Coleman
Discussion: https://postgr.es/m/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etZEeAQ@mail.gmail.com
The txid_XXX family of fmgr functions exposes 64 bit transaction IDs to
users as int8. Now that we have an SQL type xid8 for FullTransactionId,
define a new set of functions including pg_current_xact_id() and
pg_current_snapshot() based on that. Keep the old functions around too,
for now.
It's a bit sneaky to use the same C functions for both, but since the
binary representation is identical except for the signedness of the
type, and since older functions are the ones using the wrong signedness,
and since we'll presumably drop the older ones after a reasonable period
of time, it seems reasonable to switch to FullTransactionId internally
and share the code for both.
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Takao Fujii <btfujiitkp@oss.nttdata.com>
Reviewed-by: Yoshikazu Imai <imai.yoshikazu@fujitsu.com>
Reviewed-by: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://postgr.es/m/20190725000636.666m5mad25wfbrri%40alap3.anarazel.de
This commit adds a new option WAL, similar to the existing option
BUFFERS, in the EXPLAIN command. This option allows the information on
WAL record generation added by commit df3b181499 to be included in
EXPLAIN output.
This also allows the WAL usage information to be displayed via
the auto_explain module. A new parameter auto_explain.log_wal controls
whether WAL usage statistics are printed when an execution plan is logged.
This parameter has no effect unless auto_explain.log_analyze is enabled.
Author: Julien Rouhaud
Reviewed-by: Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
This commit adds three new columns in pg_stat_statements output to
display WAL usage statistics added by commit df3b181499.
This commit doesn't bump the version of pg_stat_statements, as that
has already been done for this release in commit 17e0328224.
Author: Kirill Bychik and Julien Rouhaud
Reviewed-by: Julien Rouhaud, Fujii Masao, Dilip Kumar and Amit Kapila
Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
Until now, only selected bulk operations (e.g. COPY) did this. If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY. See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules. Maintainers of table access
methods should examine that section.
To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL. A new GUC,
wal_skip_threshold, guides that choice. If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold. Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.
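Schematically, the commit-time choice is as below; the variable names
are approximate, and wal_skip_threshold is expressed in kilobytes:

    if ((uint64) nblocks * BLCKSZ >= (uint64) wal_skip_threshold * 1024)
        smgrimmedsync(srel, MAIN_FORKNUM);      /* large rel: fsync it */
    else
        log_newpage_range(rel, MAIN_FORKNUM,
                          0, nblocks, true);    /* small rel: copy to WAL */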
Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node. Make relcache.c retain entries for certain
dropped relations until end of transaction.
Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN.
Future servers accept older WAL, so this bump is discretionary.
Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas. Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem. Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout.
Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org