This moves the functions related to performing WAL recovery into the new
xlogrecovery.c source file, leaving xlog.c responsible for maintaining
the WAL buffers, coordinating the startup and switch from recovery to
normal operations, and other miscellaneous stuff that have always been in
xlog.c.
Reviewed-by: Andres Freund, Kyotaro Horiguchi, Robert Haas
Discussion: https://www.postgresql.org/message-id/a31f27b4-a31d-f976-6217-2b03be646ffa%40iki.fi
This is in preparation for the next commit, which will split off
recovery-related code from xlog.c into a new source file. This is the
order that things will happen with the next commit, and the point of
this commit is to make these ordering changes more explicit, while the
next commit mechanically moves the source code to the new file. To aid
review, I added "BEGIN/END function" comments to mark which blocks of
code are moved to which functions in the next commit. They will be gone
in the next commit.
Reviewed-by: Andres Freund, Kyotaro Horiguchi, Robert Haas
Discussion: https://www.postgresql.org/message-id/a31f27b4-a31d-f976-6217-2b03be646ffa%40iki.fi
To tidy up after some recent refactorings in xlog.c. These would be
fixed by the pgindent run we do at the end of the development cycle,
but I want to clean these up now as I'm about to do some more big
refactorings on xlog.c.
src/backend/utils/misc/check_guc is a script that cross-checks the
consistency of the GUCs with postgresql.conf.sample, making sure that
its format is in line with what guc.c has. It has never been run
automatically, and has rotten over the years, creating a lot of false
positives as per a report from Justin Pryzby.
d10e41d has introduced a SQL function to publish the most relevant flags
associated to a GUC, with tests added in the main regression test suite
to make sure that we avoid most of the inconsistencies in the GUC
settings, based on recent reports, but there was nothing able to
cross-check postgresql.conf.sample with the contents of guc.c.
This commit adds a TAP test that covers the remaining gap. It emulates
the most relevant checks that check_guc did, so as any format mistakes
are detected in postgresql.conf.sample at development stage, with the
following checks:
- Check that parameters marked as NOT_IN_SAMPLE are not in the sample
file.
- Check that there are no dead entries in postgresql.conf.sample for
parameters not marked as NOT_IN_SAMPLE.
- Check that no parameters are missing from the sample file if listed in
guc.c without NOT_IN_SAMPLE.
The idea of building a list of the GUCs by parsing the sample file comes
from Justin, and he wrote the regex used in the patch to find all the
GUCs (this same formatting rule basically applies for the last 20~ years
or so). In order to test this patch, I have played with manual
modifications of postgresql.conf.sample and guc.c, making sure that we
detect problems with the GUC rules and the sample file format.
The test is located in src/test/modules/test_misc, which is the best
location I could think about for such sanity checks, rather than the
main regression test suite (src/test/regress) to avoid a new type of
dependency with the source tree.
The first attempt of this patch was b0a55f4, where the location of
postgresql.conf.sample was retrieved using pg_config --sharedir. This
has proven to be an issue for distributions that patch pg_config to
enforce the installation paths at some wanted location (like Debian),
that may not exist when the test is run, hence causing a failure.
Instead of that, as per a suggestion from Andres Freund, rely on the
fact that the test is always executed from its directory in the source
tree and use a relative path to find the sample file. This works for
the CI, VPATH builds and on Windows, and tests like the recovery one
added in f47ed79 rely on that already.
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/Yf9YGSwPiMu0c7fP@paquier.xyz
The 028_pitr_timelines.pl test would sometimes hang, waiting for a WAL
segment that was just filled up to be archived. It was because the
test used 'pg_stat_archiver.last_archived_wal' to check if a file was
archived, but the order that WAL files are archived when a standby is
promoted is not fully deterministic, and 'last_archived_wal' tracks
the last segment that was archived, not the highest-numbered WAL
segment. Because of that, if the archiver archived segment 3, and then
2, 'last_archived_wal' say 2, and the test query would think that 3
has not been archived yet.
Normally, WAL files are marked ready for archival in order, and the
archiver process will process them in order, so that issue doesn't
arise. We have used the same query on 'last_archived_wal' in a few
other tests with no problem. But when a standby is promoted, things
are a bit chaotic. After promotion, the server will try to archive all
the WAL segments from the old timeline that are in pg_wal, as well as
the history file and any new WAL segments on the new timeline. The
end-of-recovery checkpoint will create the .ready files for all the
WAL files on the old timeline, but at the same time, the new timeline
is opened up for business. A file from the new timeline can therefore
be archived before the files from the old timeline have been marked as
ready for archival.
It turns out that we don't really need to wait for the archival in
this particular test, because the standby server is about to be
stopped, and stopping a server will wait for the end-of-recovery
checkpoint and all WAL archivals to finish, anyway. So we can just
remove it from the test.
Add a note to the docs on 'pg_stat_archiver' view that files can be
archived out of order.
Reviewed-by: Tom Lane
Discussion: https://www.postgresql.org/message-id/3186114.1644960507@sss.pgh.pa.us
There is a very good (though non-obvious) reason to avoid relation
truncation during a VACUUM that has triggered the failsafe mechanism,
which was missed before now. Update related comments, so this isn't
forgotten.
Reported-By: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://postgr.es/m/CAFBsxsFiMPxQ-dHZ8tOgktn=+ffeJT3+GinZ4zdOGbmAnCYadA@mail.gmail.com
I think this will shut up a weird warning from buildfarm member
serinus. Perhaps it'd be better to change tsCompareString's
length arguments to unsigned, but that seems more invasive
than is justified.
Part of a general push to remove off-the-beaten-track warnings
where we can easily do so.
Our gcc-on-Solaris buildfarm members emit "incompatible pointer type"
warnings in places where it's not. This is a bit odd, since AFAICT
Solaris follows the POSIX spec in declaring shmdt's argument as
"const void *", and you'd think any pointer argument would satisfy that.
But whatever. Part of a general push to remove off-the-beaten-track
warnings where we can easily do so.
docbook-xsl's index generation stylesheet (autoidx.xsl) has a small bug: It
doesn't include xlink in exclude-result-prefixes. Normally just leads to a a
single xmlns:xlink in the <div> containing the index, but because our
customization emits that, xmlns:xlink intead gets added to every element
output by autoidx.xsl below the <div>, totalling around 100kB.
Adding the spurious xmlns:xlink to the <div> ourselves isn't great, but avoids
the duplication.
Reviewed-By: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/20220213201618.qz6p6noon3wagr3f%40alap3.anarazel.de
checkViewTupleDesc() didn't get the memo that it should verify
same attcollation along with same type/typmod. (A quick scan
did not find other similar oversights.)
Per bug #17404 from Pierre-Aurélien Georges. On another day
I might've back-patched this, but today I'm feeling paranoid
about unnecessary behavioral changes in back branches.
Discussion: https://postgr.es/m/17404-8a4a270ef30a6709@postgresql.org
The XSL stylesheets used a mix of incorrect or outdated namespace
declarations for XHTML, probably based on ancient advice and examples.
Clean all this up.
Besides improving correctness (although probably no impact in
practice, other than possible validation failures), this removes a
bunch of useless namespace declarations in the HTML output.
Reported-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/20220213201618.qz6p6noon3wagr3f%40alap3.anarazel.de
The test has failed a couple of times on buildfarm member 'hoverfly'. It
gets stuck waiting for the standby to archive 000000020000000000000003
WAL segment. I don't understand why, but with DEBUG1, we will get messages
in the log whenever a segment is archived, which hopefully will give a
clue the next time it happens.
The AF_UNIX macro was being used unprotected by HAVE_UNIX_SOCKETS,
apparently since 2008. So the redirection through IS_AF_UNIX() is
apparently no longer necessary. (More generally, all supported
platforms are now HAVE_UNIX_SOCKETS, but even if there were a new
platform in the future, it seems plausible that it would define the
AF_UNIX symbol even without kernel support.) So remove the
IS_AF_UNIX() macro and make the code a bit more consistent.
Discussion: https://www.postgresql.org/message-id/flat/f2d26815-9832-e333-d52d-72fbc0ade896%40enterprisedb.com
PostgreSQL currently accepts numeric literals with trailing
non-digits, such as 123abc where the abc is treated as the next token.
This may be a bit surprising. This commit adds test cases for this;
subsequent commits intend to change this behavior.
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
38bfae3 has mixed the "dump/" and "log/" subdirectories generated in
"pg_upgrade_output.d/", causing the internal dump files to be generated
in "log/" and the log files to be in "dump/", but the opposite should be
done. This was not directly an issue for pg_upgrade runs, as the
internal dump files were still picked up at the location of their
creation, but the newest version of the buildfarm client would have
reported the dump files instead of the log files on failures of
pg_upgrade.
Issue spotted while testing the TAP tests of pg_upgrade.
Previously, replication slots were released in ProcKill() on error, resulting
in reporting replication slot drop of ephemeral slots after the stats
subsystem was already shut down.
To fix this problem, move replication slot release to a before_shmem_exit()
hook that is called before the stats collector shuts down. There wasn't really
a good reason for the slot handling to be in ProcKill() anyway.
Patch by Masahiko Sawada, with very minor polishing by me.
I, Andres, wrote a test for dropping slots during process exit, but there may
be some OS dependent issues around the number of times FATAL error messages
are displayed due to a still debated libpq issue. So that test will be
committed separately / later.
Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-By: Andres Freund <andres@anarazel.de>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoDAeEpAbZEyYJsPZJUmSPaRicVSBObaL7sPaofnKz+9zg@mail.gmail.com
Move scanint8() to numutils.c and rename to pg_strtoint64(). We
already have a "16" and "32" version of that, and the code inside the
functions was aligned, so this move makes all three versions
consistent. The API is also changed to no longer provide the errorOK
case. Users that need the error checking can use strtoi64().
Reviewed-by: John Naylor <john.naylor@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/b239564c-cad0-b23e-c57e-166d883cb97d@enterprisedb.com
On AIX 7.1, struct sockaddr_un is declared to be 1025 bytes long,
but the sun_len field that should hold the length is only a byte.
Clamp the value we try to store to ensure it will fit in the field.
(This coding might need adjustment if there are any machines out
there where sun_len is as wide as size_t; but a preliminary survey
suggests there's not, so let's keep it simple.)
Discussion: https://postgr.es/m/2781112.1644819528@sss.pgh.pa.us
This function is unreferenced in modern usage; it was superseded in 9.1
by a core function of the same name. It has been left in place in the C
code only so that pre-9.1 SQL definitions of the contrib/xml2 functions
would continue to work. Eleven years seems like enough time for people
to have updated to the extension-style version of the xml2 module, so
let's drop this.
We did this once before, in 20540710e, and then reverted it because
the intended change of PGDLLEXPORT markings didn't happen. This
time the reason is to suppress link-time duplicate-symbol warnings
on AIX. That's not worth a lot perhaps, but the value of keeping
this function has surely dropped to about zero by now.
Discussion: https://postgr.es/m/2717731.1644778752@sss.pgh.pa.us
While I was working on a patch to refactor things around xlog.c, I mixed
up EndOfLogTLI and replayTLI at the end of recovery. As a result, if you
recovered to a point with a lower-numbered timeline in a WAL segment
that has a higher TLI in the filename, the end-of-recovery WAL record
was created with invalid PrevTimeLineId. I noticed that while
self-reviewing, but no tests failed. So add a test to cover that corner
case.
Thanks to Amul Sul who also submitted a test case for the same corner
case, although this patch is different from that.
Reviewed-by: Amul Sul, Michael Paquier
Discussion: https://www.postgresql.org/message-id/52bc9ccd-8591-431b-0086-15d9acf25a3f@iki.fi
Discussion: https://www.postgresql.org/message-id/CAAJ_b94Vjt5cXGza_1MkjLQWciNdEemsmiWuQj0d%3DM7JfjAa1g%40mail.gmail.com
This adds to database objects the same version tracking that collation
objects have. There is a new pg_database column datcollversion that
stores the version, a new function
pg_database_collation_actual_version() to get the version from the
operating system, and a new subcommand ALTER DATABASE ... REFRESH
COLLATION VERSION.
This was not originally added together with pg_collation.collversion,
since originally version tracking was only supported for ICU, and ICU
on a database-level is not currently supported. But we now have
version tracking for glibc (since PG13), FreeBSD (since PG14), and
Windows (since PG13), so this is useful to have now.
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/f0ff3190-29a3-5b39-a179-fa32eee57db6%40enterprisedb.com
Some of the queries in the "sanity" tests in the regression test suite
(opr_sanity, type_sanity) are very confusing. One main stumbling
block is that for some probably ancient reason many of the older
queries are written with correlation names p1, p2, etc. independent of
the name of the catalog. This one is a good example:
SELECT p1.oid, p1.oprname, p2.oid, p2.proname
FROM pg_operator AS p1, pg_proc AS p2 <-- HERE
WHERE p1.oprcode = p2.oid AND
p1.oprkind = 'l' AND
(p2.pronargs != 1
OR NOT binary_coercible(p2.prorettype, p1.oprresult)
OR NOT binary_coercible(p1.oprright, p2.proargtypes[0])
OR p1.oprleft != 0);
This is better written as
SELECT o1.oid, o1.oprname, p1.oid, p1.proname
FROM pg_operator AS o1, pg_proc AS p1
WHERE o1.oprcode = p1.oid AND
o1.oprkind = 'l' AND
(p1.pronargs != 1
OR NOT binary_coercible(p1.prorettype, o1.oprresult)
OR NOT binary_coercible(o1.oprright, p1.proargtypes[0])
OR o1.oprleft != 0);
This patch cleans up all the queries in this manner.
(As in the above case, I kept the digits like o1 and p1 even in cases
where only one of each letter is used in a query. This is mainly to
keep the style consistent.)
Discussion: https://www.postgresql.org/message-id/flat/c538308b-319c-8784-e250-1284d12d5411%40enterprisedb.com
Previously we used poll() directly to check for a POLLRDHUP event.
Instead, use the WaitEventSet API to poll the socket for
WL_SOCKET_CLOSED, which knows how to detect this condition on many more
operating systems.
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Maksim Milyutin <milyutinma@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
Provide a way for WaitEventSet to report that the remote peer has shut
down its socket, independently of whether there is any buffered data
remaining to be read. This works only on systems where the kernel
exposes that information, namely:
* WAIT_USE_POLL builds using POLLRDHUP, if available
* WAIT_USE_EPOLL builds using EPOLLRDHUP
* WAIT_USE_KQUEUE builds using EV_EOF
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Maksim Milyutin <milyutinma@gmail.com>
Discussion: https://postgr.es/m/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
Currently, during UPDATE, the unchanged replica identity key attributes
are not logged separately because they are getting logged as part of the
new tuple. But if they are stored externally then the untoasted values are
not getting logged as part of the new tuple and logical replication won't
be able to replicate such UPDATEs. So we need to log such attributes as
part of the old_key_tuple during UPDATE.
Reported-by: Haiying Tang
Author: Dilip Kumar and Amit Kapila
Reviewed-by: Alvaro Herrera, Haiying Tang, Andres Freund
Backpatch-through: 10
Discussion: https://postgr.es/m/OS0PR01MB611342D0A92D4F4BF26C0F47FB229@OS0PR01MB6113.jpnprd01.prod.outlook.com
At some point, Gen_fmgrtab.pl stopped needing the value of defined symbols
from access/transam.h, while genbki.pl starting doing so. The Makefiles
didn't get the memo, so update the relevant dependencies.
Some environments may compile with --with-lz4 while the command "lz4"
goes missing, causing two failures in the TAP tests of pg_verifybackup
(008_untar.pl and 010_client_untar.pl) as the code assumed that the
command always existed with a hardcoded value in src/Makefile.global.
Rather than this method, this adds a ./configure check based on
PGAC_PATH_PROGS() to find automatically the command and get an absolute
path to it.
Both tests need to be adjusted for the case where the command does not
exist, actually, as Makefile.global would set now LZ4 to an empty value
in this case. The TAP tests of pg_receivewal already do that.
Per report from buildfarm member copperhead, as an effect of dab2984.
The origin of the failure is actually babbbb5 that did not centralize
the check for the existence of a "lz4" command at ./configure to shave a
few cycles. Note that one just needs to tweak an environment to move
"lz4" out of the way to reproduce the problem, which is what I did to
test this change.
Per discussion with Robert Haas, Tom Lane, Andres Freund and myself.
Discussion: https://postgr.es/m/Ygc51WVAFGocSu4h@paquier.xyz
As of 1eb6d65, the origin data is optionally stored in a 2PC file
header, with the data filled in EndPrepare() even in the default case
where there is no origin data to add. This was inconsistent with all
the other fields of TwoPhaseFileHeader which are initialized in
StartPrepare(), so move the initialization of origin_lsn and
origin_timestamp there instead. The effect of missing the
initialization at this early stage is only cosmetic based on the current
logic of the code, but could have led to issues in the long-term, and it
is more consistent done this way.
Reported-by: Ranier Vilela
Discussion: https://postgr.es/m/CAEudQAooECJ+gU_RZB-yhioPOV94R4ucoHAf68PiJhLpgpVpBw@mail.gmail.com
"const foo *" is quite different from "foo * const".
This code was evidently trying to avoid casting away
const from the arguments, but entirely failed to do so.
Per study of some buildfarm warnings from anole
(which unfortunately are mostly ignorable, since it
seems not to understand "restrict" very well).
I'm surprised though that nothing else has complained.
Depending on compiler version and optimization level, we might
get a complaint that lazy_scan_heap's "freespace" is used
uninitialized.
Compilers not aware that ereport(ERROR) doesn't return complained
about bbsink_lz4_new().
Assigning "-1" to a uint64 value has unportable results; fortunately,
the value of xlogreadsegno is unimportant when xlogreadfd is -1.
(It looks to me like there is no need for xlogreadsegno to be static
in the first place, but I didn't venture to change that.)
Having the different types of logs in artifacts instruction makes it quicker
to navigate between them. Either that didn't use to be possible, or I just did
it wrong. I saw that / how it works in a patch by Justin Pryzby.
Also remove a few unnecessary uses of **, suggested by Justin Pryzby.
Discussion: https://postgr.es/m/20220203195718.smqo5xg4ygp5qktq@alap3.anarazel.de
Regression tests need to use skip-empty-xacts = false, because there
might be accidental concurrent activity (like autovacuum), particularly
on slow machines. The tests added by 80901b3291 failed to do that in a
couple places, triggering occasional failures on buildfarm.
Fixing the tests however uncovered a bug in the code, because sequence
callbacks did not handle skip-empty-xacts properly. For trasactional
increments we need to check/update the xact_wrote_changes flag, and emit
the BEGIN if it's the first change in the transaction.
Reported-by: Andres Freund
Discussion: https://postgr.es/m/20220212220413.b25amklo7t4xb7ni%40alap3.anarazel.de
Commit 1f39a1c06 implemented write-failure postponement in pqSendSome,
which is above SSL/GSS processing. However, we've now seen failures
indicating that (some versions of?) OpenSSL have a tendency to report
write failures prematurely too. Hence, move the primary responsibility
for postponing write failures down to pqsecure_raw_write(), below
SSL/GSS processing. pqSendSome now sets write_failed only in corner
cases where we'd lost the connection already.
A side-effect of this change is that errors detected in the SSL/GSS
layer itself will be reported immediately (as if they were read
errors) rather than being postponed like write errors. That's
reverting an effect of 1f39a1c06, and I think it's fine: if there's
not a socket-level error, it's hard to be sure whether an OpenSSL
error ought to be considered a read or write failure anyway.
Another important point is that write-failure postponement is now
effective during connection setup. OpenSSL's misbehavior of this
sort occurs during SSL_connect(), so that's a change we want.
Per bug #17391 from Nazir Bilal Yavuz. Possibly this should be
back-patched, but I think it prudent to let it age awhile in HEAD
first.
Discussion: https://postgr.es/m/17391-304f81bcf724b58b@postgresql.org
In commit 1f39a1c06 I made PQisBusy consider conn->write_failed, but
that is now looking like complete brain fade. In the first place, the
logic is quite wrong: it ought to be like "and not" rather than "or".
This meant that once we'd gotten into a write_failed state, PQisBusy
would always return true, probably causing the calling application to
iterate its loop until PQconsumeInput returns a hard failure thanks
to connection loss. That's not what we want: the intended behavior
is to return an error PGresult, which the application probably has
much cleaner support for.
But in the second place, checking write_failed here seems like the
wrong thing anyway. The idea of the write_failed mechanism is to
postpone handling of a write failure until we've read all we can from
the server; so that flag should not interfere with input-processing
behavior. (Compare 7247e243a.) What we *should* check for is
status = CONNECTION_BAD, ie, socket already closed. (Most places that
close the socket don't touch asyncStatus, but they do reset status.)
This primarily ensures that if PQisBusy() returns true then there is
an open socket, which is assumed by several call sites in our own
code, and probably other applications too.
While at it, fix a nearby thinko in libpq's my_sock_write: we should
only consult errno for res < 0, not res == 0. This is harmless since
pqsecure_raw_write would force errno to zero in such a case, but it
still could confuse readers.
Noted by Andres Freund. Backpatch to v12 where 1f39a1c06 came in.
Discussion: https://postgr.es/m/20220211011025.ek7exh6owpzjyudn@alap3.anarazel.de
This reverts commit b0a55f4, to remove for now the TAP test that did the
equivalent of check_guc. The test has been using pg_config --sharedir
to find the location of postgresql.conf.sample. While the buildfarm and
normal build environments rather liked that, this proves to be an issue
for Debian where pg_config is patched to not be relocatable, causing the
test to fail.
Rather than relying on pg_config, we'd better find the sample file based
on its location from the source directory. However, this is also an
issue as a TAP test only offers the build directory as of TESTDIR in the
environment context, so this would fail with VPATH builds. Perhaps the
source path could be provided additionally when running the TAP tests.
Or perhaps we may be able to get away by just switching to a SQL
approach, by using PG_ABS_SRCDIR but this is going to require some extra
loops to get the sample file from the correct path in src/backend/. In
any case, this needs more thoughts, so just revert the test case until
something better is done about this relocation problem.
Reported-by: Christopher Berg
Discussion: https://postgr.es/m/YgYw25OXV5men8Fj@msg.df7cb.de