It is useful for debugging purposes to report the checkpoint LSN and
REDO LSN in log_checkpoints message. It can give more context while
analyzing checkpoint-related issues. pg_controldata reports the last
checkpoint LSN and REDO LSN, but having this information alongside
the log message helps analyze issues that happened previously,
connect the dots and identify the root cause.
Author: Bharath Rupireddy, Kyotaro Horiguchi
Reviewed-by: Michael Paquier, Julien Rouhaud, Nathan Bossart, Fujii Masao, Greg Stark
Discussion: https://postgr.es/m/CALj2ACWt6kqriAHrO+AJj+OmP=suwbktHT5JoYAn-nqZe2gd2g@mail.gmail.com
When locking a specific named relation for a FOR [KEY] UPDATE/SHARE
clause, transformLockingClause() finds the relation to lock by
scanning the rangetable for an RTE with a matching eref->aliasname.
However, it failed to account for the visibility rules of a join RTE.
If a join RTE doesn't have a user-supplied alias, it will have a
generated eref->aliasname of "unnamed_join" that is not visible as a
relation name in the parse namespace. Such an RTE needs to be skipped,
otherwise it might be found in preference to a regular base relation
with a user-supplied alias of "unnamed_join", preventing it from being
locked.
In addition, if a join RTE doesn't have a user-supplied alias, but
does have a join_using_alias, then the RTE needs to be matched using
that alias rather than the generated eref->aliasname, otherwise a
misleading "relation not found" error will be reported rather than a
"join cannot be locked" error.
Backpatch all the way, except for the second part which only goes back
to 14, where JOIN USING aliases were added.
Dean Rasheed, reviewed by Tom Lane.
Discussion: https://postgr.es/m/CAEZATCUY_KOBnqxbTSPf=7fz9HWPnZ5Xgb9SwYzZ8rFXe7nb=w@mail.gmail.com
This commit bumps the runtime value of _WIN32_WINNT to be 0x0A00 for any
builds on Windows. Hence, this makes Windows 10 the minimal requirement
when running PostgreSQL under WIN32, be it for builds of Cygwin, MinGW
or Visual Studio.
The previous minimal runtime version was either Windows Vista when
building with at least Visual Studio 2015 or Windows XP for the rest.
Windows 10 is the most modern version supported by Microsoft, and per
discussion, as we don't have buildfarm members that run older versions
anymore, this is the minimal supported version that suits better for our
needs. This will actually make easier the development of some patches,
two being async I/O and large page handling by avoiding a lot of
compatibility gotchas, on platforms that have most likely few users
anyway.
It is possible to remove MIN_WINNT in win32.h and the macros
IsWindowsXXXOrGreater() that were used in the code at runtime to check
which version of Windows was getting used. The change in pg_locale.c
comes from Juan. Note that all my tests passed, and that the CI is
green. The buildfarm will quickly tell if this needs more adjustments.
Author: Michael Paquier, Juan José Santamaría Flecha
Reviewed-by: Thomas Munro
Discussion: https://postgr.es/m/Yo7tHKD8VCkeNi71@paquier.xyz
By convention, the tab-complete subscription parameters are listed in the
COMPLETE_WITH lists in alphabetical order, but when the "disable_on_error"
parameter was introduced this was not done.
This patch just tidies that up.
Reported-by: Peter Smith
Author: Peter Smith
Reviewed-by: Euler Taveira, Takamichi Osumi
Backpatch-through: 15, where it was introduced
Discussion: https://postgr.es/m/CAHut+PucvKZgg_eJzUW--iL6DXHg1Jwj6F09tQziE3kUF67uLg@mail.gmail.com
That comment might have been true at some point during development, but
definitely isn't anymore.
Reported-By: Melanie Plageman <melanieplageman@gmail.com>
Backpatch: 15-
We hadn't noticed this because it's dead code: there is no
situation where we read raw parse trees from text format.
So maybe the right fix is to remove the function altogether,
but I'll forbear for now; it's not the only dead code in
readfuncs.c, I think.
Noted while comparing existing code to the results of
Peter's auto-generation script.
40af10b57 changed things so we make use of a generation memory context for
storing tuples to be sorted by tuplesort.c. That change does not play
nicely with the changes made in 9f03ca915 (back in 2014). That commit
changed things so that index_form_tuple() is called while switched into
the tuplestore's tuplecontext. In order to fetch the tuple from the index,
index_form_tuple() must do various memory allocations which are unrelated
to the storage of the final returned tuple. Although all of these
allocations are pfree'd, the fact that we now use a generation context
means that the memory for these pfree'd allocations won't be used again by
any other allocation due to generation.c's lack of freelists. This could
result in sorts used for building indexes exceeding maintenance_work_mem
by a very large amount.
Here we fix it so we no longer allocate anything apart from the tuple
itself into the generation context by adding a new version of
index_form_tuple named index_form_tuple_context, which can be called to
specify the MemoryContext to allocate the tuple into.
Discussion: https://postgr.es/m/CAApHDvrHQkiFRHiGiAS-LMOvJN-eK-s762=tVzBz8ZqUea-a_A@mail.gmail.com
Backpatch-through: 15, where 40af10b57 was added.
It's not very robust to assume that each inserted row will produce
exactly one WAL record and that no other WAL records will be generated
in the process, because for example a particular transaction could
always be the one that has to extend clog.
Because these tests are not run by 'make installcheck' but only by
'make check', it may be that in our current testing infrastructure
this can't be hit, but it doesn't seem like a good idea to rely on
that, since unrelated changes to the regression tests or the way
write-ahead logging is done could easily cause it to start happening,
and debugging such failures is a pain.
Adjust the regression test to be less sensitive.
Anton Melnikov, reviewed by Julien Rouhaud
Discussion: http://postgr.es/m/1ccd00d9-1723-6b68-ae56-655aab00d406@inbox.ru
There's no reason anymore to only drop subscription stats if associated with a
slot, now that stats drops are transactional. And since there's now no other
cleanup of stats, this would lead to stats for slot-less subscriptions to get
leaked (however most slot-less subs won't have stats).
Additionally, the comment referring to autovacuum cleaning up stats was
clearly outdated.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAD21AoAwiby3HeJE7vJe16Gr75RFfJ640dyHqvsiUhyKJTXPtw@mail.gmail.com
Backpatch: 15-
We have been using the term RelFileNode to refer to either (1) the
integer that is used to name the sequence of files for a certain relation
within the directory set aside for that tablespace/database combination;
or (2) that value plus the OIDs of the tablespace and database; or
occasionally (3) the whole series of files created for a relation
based on those values. Using the same name for more than one thing is
confusing.
Replace RelFileNode with RelFileNumber when we're talking about just the
single number, i.e. (1) from above, and with RelFileLocator when we're
talking about all the things that are needed to locate a relation's files
on disk, i.e. (2) from above. In the places where we refer to (3) as
a relfilenode, instead refer to "relation storage".
Since there is a ton of SQL code in the world that knows about
pg_class.relfilenode, don't change the name of that column, or of other
SQL-facing things that derive their name from it.
On the other hand, do adjust closely-related internal terminology. For
example, the structure member names dbNode and spcNode appear to be
derived from the fact that the structure itself was called RelFileNode,
so change those to dbOid and spcOid. Likewise, various variables with
names like rnode and relnode get renamed appropriately, according to
how they're being used in context.
Hopefully, this is clearer than before. It is also preparation for
future patches that intend to widen the relfilenumber fields from its
current width of 32 bits. Variables that store a relfilenumber are now
declared as type RelFileNumber rather than type Oid; right now, these
are the same, but that can now more easily be changed.
Dilip Kumar, per an idea from me. Reviewed also by Andres Freund.
I fixed some whitespace issues, changed a couple of words in a
comment, and made one other minor correction.
Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com
Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
50e17ad28 increased the size of ExprEvalStep from 64 bytes up to 88 bytes.
Lots of effort was spent during the development of the current expression
evaluation code to make an instance of this struct as small as possible.
Making this struct larger than needed reduces CPU cache efficiency during
expression evaluation which causes noticeable slowdowns during query
execution.
In order to reduce the size of the struct, here we remove the fn_addr
field. The values from this field can be obtained via fcinfo, just with
some extra pointer dereferencing. The extra indirection does not seem to
cause any noticeable slowdowns.
Various other fields have been moved into the ScalarArrayOpExprHashTable
struct. These fields are only used when the ScalarArrayOpExprHashTable
pointer has already been dereferenced, so no additional pointer
dereferences occur for these. Here we also make hash_fcinfo_data the last
field in ScalarArrayOpExprHashTable so that we can avoid a further pointer
dereference to get the FunctionCallInfoBaseData. This also saves a call to
palloc().
50e17ad28 was added in 14, but it's too late to adjust the size of the
ExprEvalStep in that version, so here we just backpatch to 15, which is
currently in beta.
Author: Andres Freund, David Rowley
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Backpatch-through: 15
Some routines open-coded the construction of DataRow messages. Use
TupOutputState struct and associated functions instead, which was
already done in some places.
SendTimeLineHistory() is a bit more complicated and isn't converted by
this.
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/7e4fdbdc-699c-4cd0-115d-fb78a957fc22@enterprisedb.com
macOS has traditionally used extension .dylib for shared libraries
(used at build time) and .so for dynamically loaded modules (used by
dlopen()). This complicates the build system a bit. Also, Meson uses
.dylib for both, so it would be worth unifying this in order to be
able to get equal build output.
There doesn't appear to be any reason to use any particular extension
for dlopened modules, since dlopen() will accept anything and
PostgreSQL is well-factored to be able to deal with any extension.
Other software packages that I have handy appear to be about 50/50
split on which extension they use for their plugins. So it seems
possible to change this safely.
Discussion: https://www.postgresql.org/message-id/flat/bcc45f78-e3c3-8fb3-7c42-5371b48b5266%40enterprisedb.com
auto_explain.log_parameter_max_length is a new GUC part of the
extension, similar to the corresponding core setting, that controls the
inclusion of query parameters in the logged explain output.
More tests are added to check the behavior of this new parameter: when
parameters logged in full (the default of -1), when disabled (value of
0) and when partially truncated (value different than the two others).
Author: Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/87ee09mohb.fsf@wibble.ilmari.org
Previously the timer was enabled whenever there were any pending stats after
executing a statement, just to then be disabled again when not idle
anymore. That lead to an increase in GetCurrentTimestamp() calls from within
timeout.c compared to 14.
To avoid that increase, leave the timer enabled until stats are reported,
rather than until idle. The timer is only disabled once the pending stats have
been reported.
For me this fixes the increase in GetCurrentTimestamp() calls, there now are
fewer calls in 15 than in 14, in the previously slowed down workload.
While at it, also update assertion in pgstat_report_stat() to be more precise.
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Backpatch: 15-
The new expression step types increased the size of ExprEvalStep by ~4 for all
types of expression steps, slowing down expression evaluation noticeably. Move
them out of line.
There's other issues with these expression steps, but addressing them is
largely independent of this aspect.
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Backpatch: 15-
This reverts most of 91c0570a79, f28bf667f6, fe0972ee5e, afdeff1052. The
only thing left is the retry loop in 019_replslot_limit.pl that avoids
spurious failures by retrying a couple times.
We haven't seen any hard evidence that this is caused by anything but slow
process shutdown. We did not find any cases where walsenders did not vanish
after waiting for longer. Therefore there's no reason for this debugging code
to remain.
Discussion: https://postgr.es/m/20220530190155.47wr3x2prdwyciah@alap3.anarazel.de
Backpatch: 15-
When we changed some built-in functions to use anycompatiblearray
instead of anyarray, we created a dump/restore hazard for user-defined
operators and aggregates relying on those functions: the user objects
have to be modified to change their signatures similarly. This causes
pg_upgrade to fail partway through if the source installation contains
such objects. We generally try to have pg_upgrade detect such hazards
and fail before it does anything exciting, so add logic to detect
this case too.
Back-patch to v14 where the change was made.
Justin Pryzby, reviewed by Andrey Borodin
Discussion: https://postgr.es/m/3383880.QJadu78ljV@vejsadalnx
Noted while comparing existing code to the output of the proposed
patch to automate creation of these functions. Some of the changes
are just cosmetic, but others represent real bugs. I've not
attempted to analyze the user-visible impact.
Back-patch to v15 where this code came in.
Discussion: https://postgr.es/m/1794155.1656984188@sss.pgh.pa.us
We were going into IDLE state too soon when executing queries via
PQsendQuery in pipeline mode, causing several scenarios to misbehave in
different ways -- most notably, as reported by Daniele Varrazzo, that a
warning message is produced by libpq:
message type 0x33 arrived from server while idle
But it is also possible, if queries are sent and results consumed not in
lockstep, for the expected mediating NULL result values from PQgetResult
to be lost (a problem which has not been reported, but which is more
serious).
Fix this by introducing two new concepts: one is a command queue element
PGQUERY_CLOSE to tell libpq to wait for the CloseComplete server
response to the Close message that is sent by PQsendQuery. Because the
application is not expecting any PGresult from this, the mechanism to
consume it is a bit hackish.
The other concept, authored by Horiguchi-san, is a PGASYNC_PIPELINE_IDLE
state for libpq's state machine to differentiate "really idle" from
merely "the idle state that occurs in between reading results from the
server for elements in the pipeline". This makes libpq not go fully
IDLE when the libpq command queue contains entries; in normal cases, we
only go IDLE once at the end of the pipeline, when the server response
to the final SYNC message is received. (However, there are corner cases
it doesn't fix, such as terminating the query sequence by
PQsendFlushRequest instead of PQpipelineSync; this sort of scenario is
what requires PGQUERY_CLOSE bit above.)
This last bit helps make the libpq state machine clearer; in particular
we can get rid of an ugly hack in pqParseInput3 to avoid considering
IDLE as such when the command queue contains entries.
A new test mode is added to libpq_pipeline.c to tickle some related
problematic cases.
Reported-by: Daniele Varrazzo <daniele.varrazzo@gmail.com>
Co-authored-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/CA+mi_8bvD0_CW3sumgwPvWdNzXY32itoG_16tDYRu_1S2gV2iw@mail.gmail.com
Amendment to 84ad713cf8: Not all
prepared statements have a result descriptor. As currently coded,
this would crash when reading pg_prepared_statements. Make those
cases return null for result_types instead. Also add a test case for
it.
As noted by Thomas Munro, CLDR 36 has added SOUND RECORDING COPYRIGHT
(U+2117), and we use CLDR 41, so this can be removed from the set of
special cases.
The set of regression tests is expanded for degree signs, which are two
of the special cases, and a fancy case with U+210C in Latin-ASCII.xml
that we have discovered about when diving into what could be done for
Cyrillic characters (this last part is material for a future patch, not
tackled yet).
While on it, some of the assertions of generate_unaccent_rules.py are
expanded to report the codepoint on which a failure is found, something
useful for debugging.
Extracted from a larger patch by the same author.
Author: Przemysław Sztoch
Discussion: https://postgr.es/m/8478da0d-3b61-d24f-80b4-ce2f5e971c60@sztoch.pl
A previous commit replaced all the calls to this function with
durable_rename() as of dac1ff3, making it used nowhere in the tree.
Using it in extension code is also risky based on the issues described
in this previous commit, so let's remove it. This makes possible the
removal of HAVE_WORKING_LINK.
Author: Nathan Bossart
Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13
durable_rename_excl() attempts to avoid overwriting any existing files
by using link() and unlink(), and it falls back to rename() on some
platforms (aka WIN32), which offers no such overwrite protection. Most
callers use durable_rename_excl() just in case there is an existing
file, but in practice there shouldn't be one (see below for more
details).
Furthermore, failures during durable_rename_excl() can result in
multiple hard links to the same file. As per Nathan's tests, it is
possible to end up with two links to the same file in pg_wal after a
crash just before unlink() during WAL recycling. Specifically, the test
produced links to the same file for the current WAL file and the next
one because the half-recycled WAL file was re-recycled upon restarting,
leading to WAL corruption.
This change replaces all the calls of durable_rename_excl() to
durable_rename(). This removes the protection against accidentally
overwriting an existing file, but some platforms are already living
without it and ordinarily there shouldn't be one. The function itself
is left around in case any extensions are using it. It will be removed
on HEAD via a follow-up commit.
Here is a summary of the existing callers of durable_rename_excl() (see
second discussion link at the bottom), replaced by this commit. First,
basic_archive used it to avoid overwriting an archive concurrently
created by another server, but as mentioned above, it will still
overwrite files on some platforms. Second, xlog.c uses it to recycle
past WAL segments, where an overwrite should not happen (origin of the
change at f0e37a8) because there are protections about the WAL segment
to select when recycling an entry. The third and last area is related
to the write of timeline history files. writeTimeLineHistory() will
write a new timeline history file at the end of recovery on promotion,
so there should be no such files for the same timeline.
What remains is writeTimeLineHistoryFile(), that can be used in parallel
by a WAL receiver and the startup process, and some digging of the
buildfarm shows that EEXIST from a WAL receiver can happen with an error
of "could not link file \"pg_wal/xlogtemp.NN\" to \"pg_wal/MM.history\",
which would cause an automatic restart of the WAL receiver as it is
promoted to FATAL, hence this should improve the stability of the WAL
receiver as rename() would overwrite an existing TLI history file
already fetched by the startup process at recovery.
This is a bug fix, but knowing the unlikeliness of the problem involving
one or more crashes at an exceptionally bad moment, no backpatch is
done. Also, I want to be careful with such changes (aaa3aed did the
opposite of this change by removing HAVE_WORKING_LINK so as Windows
would do a link() rather than a rename() but this was not
concurrent-safe). A backpatch could be revisited in the future. This
is the second time this change is attempted, ccfbd92 being the first
one, but this time no assertions are added for the case of a TLI history
file written concurrently by the WAL receiver or the startup process
because we can expect one to exist (some of the TAP tests are able to
trigger with a proper timing).
Author: Nathan Bossart
Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13
Discussion: https://postgr.es/m/Ym6GZbqQdlalSKSG@paquier.xyz
Use it for RelationSyncEntry->streamed_txns, which is currently using an
integer list.
The API support is not complete, not because it is hard to write but
because it's unclear that it's worth the code space, there being so
little use of XID lists.
Discussion: https://postgr.es/m/202205130830.g5ntonhztspb@alvherre.pgsql
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Attempting such an operation would already fail, but in various and
confusing ways. For example, while in recovery, some elog() messages
would be reported, but these should never be user-facing. This commit
restricts any write operations done on large objects in a read-only
context, so as the errors generated are more user-friendly. This is per
the discussion done with Tom Lane and Robert Haas.
Some regression tests are added to check the case of all the SQL
functions working on large objects (including an update of the test's
alternate output).
Author: Yugo Nagata
Discussion: https://postgr.es/m/20220527153028.61a4608f66abcd026fd3806f@sraoss.co.jp
Amendment to ec40f34224: We also need to
change the way the datum is supplied to int8. Otherwise, the value is
still cut off as an int4, and it will crash on 32-bit platforms.
Several buildfarm members are failing the pg_upgrade test in
REL_15_STABLE, though the identical test is fine in HEAD.
On thorntail it's possible to see that the problem is an
overlength socket path name, and I bet the same is true
on the others.
The normally-started postmasters used in the test are already
set up with short socket directory paths, but we neglected to
tell pg_upgrade itself to do likewise when starting child
postmasters, and indeed it seems to be explicitly selecting
the test directory instead.
Back-patch to v15 where the current test script was introduced.
(The previous script might have the same issue, because I don't
see any use of -s/--socketdir in it either; but we've had no
complaints, so leave it alone for now.)
Discussion: https://postgr.es/m/1410025.1656890531@sss.pgh.pa.us
None of the other bison parsers contains this directive, and it gives
rise to some unfortunate and impenetrable messages, so just remove it.
Backpatch to release 12, where it was introduced.
Per gripe from Erik Rijkers
Discussion: https://postgr.es/m/ba069ce2-a98f-dc70-dc17-2ccf2a9bf7c7@xs4all.nl
Interpret its privileges argument as a comma-separated list of
privilege names, as in has_table_privilege and other functions.
This is actually net less code, since the support routine to
parse that already exists, and we can drop convert_priv_string()
which had no other use-case.
Robins Tharakan
Discussion: https://postgr.es/m/e5a05dc54ba64408b3dd260171c1abaf@EX13D05UWC001.ant.amazon.com
After commit 662dbe2c8, psql tab completion didn't conveniently
support the case of "ALTER EXTENSION foo UPDATE". It'd always
add "TO", which is fine if you want to specify a target version
but not if you don't ... and surely the latter is the much more
common case.
To fix, remove "TO" from the initially offered completion; you now
need to press TAB one additional time to get that. We won't try to
duplicate the old behavior of attempting initial completion on the
target version along with TO. It's too squirrelly to get the quoting
right, and this is such an infrequent usage that it doesn't seem worth
expending a lot of effort and special code on.
Noted by Noah Misch. Back-patch to v15.
Discussion: https://postgr.es/m/20220703083217.GB2476530@rfd.leadboat.com
Per buildfarm member prairiedog, this platform rejects uninitialized
global variables in shared libraries. Back-patch to v10, like the
addition of the variable.
Reviewed by Tom Lane.
Discussion: https://postgr.es/m/20220703030619.GB2378460@rfd.leadboat.com
ecpglib has been calling it once per SQL query and once per EXEC SQL GET
DESCRIPTOR. Instead, if newlocale() has not succeeded before, call it
while establishing a connection. This mitigates three problems:
- If newlocale() failed in EXEC SQL GET DESCRIPTOR, the command silently
proceeded without the intended locale change.
- On AIX, each newlocale()+freelocale() cycle leaked memory.
- newlocale() CPU usage may have been nontrivial.
Fail the connection attempt if newlocale() fails. Rearrange
ecpg_do_prologue() to validate the connection before its uselocale().
The sort of program that may regress is one running in an environment
where newlocale() fails. If that program establishes connections
without running SQL statements, it will stop working in response to this
change. I'm betting against the importance of such an ECPG use case.
Most SQL execution (any using ECPGdo()) has long required newlocale()
success, so there's little a connection could do without newlocale().
Back-patch to v10 (all supported versions).
Reviewed by Tom Lane. Reported by Guillaume Lelarge.
Discussion: https://postgr.es/m/20220101074055.GA54621@rfd.leadboat.com