Each of the libraries incorporates src/port files, which often check
FRONTEND. Build systems disagreed on whether to build libpgtypes this
way. Only libecpg incorporates files that rely on it today. Back-patch
to 9.0 (all supported versions) to forestall surprises.
When the startup process recovers transactions by scanning pg_twophase
directory, it should clear MyLockedGxact after it's done processing each
transaction. Like we do during normal operation, at PREPARE TRANSACTION.
Otherwise, if the startup process exits due to an error, it will try to
clear the locking_backend field of the last recovered transaction. That's
usually harmless, but if the error happens in MarkAsPreparing, while
holding TwoPhaseStateLock, the shmem-exit hook will try to acquire
TwoPhaseStateLock again, and deadlock with itself.
This fixes bug #13128 reported by Grant McAlister. The bug was introduced
by commit bb38fb0d, so backpatch to all supported versions like that
commit.
SLRU_SEGMENTS_PER_PAGE -> SLRU_PAGES_PER_SEGMENT
I introduced this ancient typo in subtrans.c and later propagated it to
multixact.c. I fixed the latter in f741300c, but only back to 9.3;
backpatch to all supported branches for consistency.
After a timeline switch, we would leave behind recycled WAL segments that
are in the future, but on the old timeline. After promotion, and after they
become old enough to be recycled again, we would notice that they don't have
a .ready or .done file, create a .ready file for them, and archive them.
That's bogus, because the files contain garbage, recycled from an older
timeline (or prealloced as zeros). We shouldn't archive such files.
This could happen when we're following a timeline switch during replay, or
when we switch to new timeline at end-of-recovery.
To fix, whenever we switch to a new timeline, scan the data directory for
WAL segments on the old timeline, but with a higher segment number, and
remove them. Those don't belong to our timeline history, and are most
likely bogus recycled or preallocated files. They could also be valid files
that we streamed from the primary ahead of time, but in any case, they're
not needed to recover to the new timeline.
It was previously possible to have the launcher re-execute its main loop
before shutting down if some other signal was received or an error
occurred after getting SIGTERM, as reported by Qingqing Zhou.
While investigating, Tom Lane further noticed that if autovacuum had
been disabled in the config file, it would misbehave by trying to start
a new worker instead of bailing out immediately -- it would consider
itself as invoked in emergency mode.
Fix both problems by checking the shutdown flag in a few more places.
These problems have existed since autovacuum was introduced, so
backpatch all the way back.
While gcc doesn't complain if you declare a function "static" and then
define it not-static, other compilers do; and in any case the code is
highly misleading this way. Add the missing "static" keywords to a
couple of recent patches. Per buildfarm member pademelon.
Previously we would re-use input subexpressions in all expression trees
attached to a Join plan node. However, if it's an outer join and the
subexpression appears in the nullable-side input, this is potentially
incorrect for apparently-matching subexpressions that came from above
the outer join (ie, targetlist and qpqual expressions), because the
executor will treat the subexpression value as NULL when maybe it should
not be.
The case is fairly hard to hit because (a) you need a non-strict
subexpression (else NULL is correct), and (b) we don't usually compute
expressions in the outputs of non-toplevel plan nodes. But we might do
so if the expressions are sort keys for a mergejoin, for example.
Probably in the long run we should make a more explicit distinction between
Vars appearing above and below an outer join, but that will be a major
planner redesign and not at all back-patchable. For the moment, just hack
set_join_references so that it will not match any non-Var expressions
coming from nullable inputs to expressions that came from above the join.
(This is somewhat overkill, in that a strict expression could still be
matched, but it doesn't seem worth the effort to check that.)
Per report from Qingqing Zhou. The added regression test case is based
on his example.
This has been broken for a very long time, so back-patch to all active
branches.
Commit ed9cc2b5df made it unnecessary to pass
start_nblkno to _hash_splitbucket(), and for that matter unnecessary to
have the internal nblkno variable either. My compiler didn't complain
about that, but some did. I also rearranged the use of oblkno a bit to
make that case more parallel.
Report and initial patch by Petr Jelinek, rearranged a bit by me.
Back-patch to all branches, like the previous patch.
psql was already accepting conninfo strings as the first parameter in
\connect, but the way it worked wasn't sane; some of the other
parameters would get the previous connection's values, causing it to
connect to a completely unexpected server or, more likely, not finding
any server at all because of completely wrong combinations of
parameters.
Fix by explicitely checking for a conninfo-looking parameter in the
dbname position; if one is found, use its complete specification rather
than mix with the other arguments. Also, change tab-completion to not
try to complete conninfo/URI-looking "dbnames" and document that
conninfos are accepted as first argument.
There was a weak consensus to backpatch this, because while the behavior
of using the dbname as a conninfo is nowhere documented for \connect, it
is reasonable to expect that it works because it does work in many other
contexts. Therefore this is backpatched all the way back to 9.0.
To implement this, routines previously private to libpq have been
duplicated so that psql can decide what looks like a conninfo/URI
string. In back branches, just duplicate the same code all the way back
to 9.2, where URIs where introduced; 9.0 and 9.1 have a simpler version.
In master, the routines are moved to src/common and renamed.
Author: David Fetter, Andrew Dunstan. Some editorialization by me
(probably earning a Gierth's "Sloppy" badge in the process.)
Reviewers: Andrew Gierth, Erik Rijkers, Pavel Stěhule, Stephen Frost,
Robert Haas, Andrew Dunstan.
You're required to write either RANGE or ROWS to start a frame clause,
but the documentation incorrectly implied this is optional. Noted by
David Johnston.
As with initdb these programs need to run with a restricted token, and
if they don't pg_upgrade will fail when run as a user with Adminstrator
privileges.
Backpatch to all live branches. On the development branch the code is
reorganized so that the restricted token code is now in a single
location. On the stable bramches a less invasive change is made by
simply copying the relevant code to pg_upgrade.c and pg_resetxlog.c.
Patches and bug report from Muhammad Asif Naeem, reviewed by Michael
Paquier, slightly edited by me.
_hash_splitbucket() obtained the base page of the new bucket by calling
_hash_getnewbuf(), but it held no exclusive lock that would prevent some
other process from calling _hash_getnewbuf() at the same time. This is
contrary to _hash_getnewbuf()'s API spec and could in fact cause failures.
In practice, we must only call that function while holding write lock on
the hash index's metapage.
An additional problem was that we'd already modified the metapage's bucket
mapping data, meaning that failure to extend the index would leave us with
a corrupt index.
Fix both issues by moving the _hash_getnewbuf() call to just before we
modify the metapage in _hash_expandtable().
Unfortunately there's still a large problem here, which is that we could
also incur ENOSPC while trying to get an overflow page for the new bucket.
That would leave the index corrupt in a more subtle way, namely that some
index tuples that should be in the new bucket might still be in the old
one. Fixing that seems substantially more difficult; even preallocating as
many pages as we could possibly need wouldn't entirely guarantee that the
bucket split would complete successfully. So for today let's just deal
with the base case.
Per report from Antonin Houska. Back-patch to all active branches.
Slow functions in index expressions might cause this loop to take long
enough to make it worth being cancellable. Probably it would be enough
to call CHECK_FOR_INTERRUPTS here, but for consistency with other
per-sample-row loops in this file, let's use vacuum_delay_point.
Report and patch by Jeff Janes. Back-patch to all supported branches.
ExecOpenScanRelation assumed that any relation listed in the ExecRowMark
list has been locked by InitPlan; but this is not true if the rel's
markType is ROW_MARK_COPY, which is possible if it's a foreign table.
In most (possibly all) cases, failure to acquire a lock here isn't really
problematic because the parser, planner, or plancache would have taken the
appropriate lock already. In principle though it might leave us vulnerable
to working with a relation that we hold no lock on, and in any case if the
executor isn't depending on previously-taken locks otherwise then it should
not do so for ROW_MARK_COPY relations.
Noted by Etsuro Fujita. Back-patch to all active versions, since the
inconsistency has been there a long time. (It's almost certainly
irrelevant in 9.0, since that predates foreign tables, but the code's
still wrong on its own terms.)
It's all very well to claim that a simplistic sort is fast in easy
cases, but O(N^2) in the worst case is not good ... especially if the
worst case is as easy to hit as "descending order input". Replace that
bit with our standard qsort.
Per bug #12866 from Maksym Boguk. Back-patch to all active branches.
GNU readline defines the return value of write_history() as "zero if OK,
else an errno code". libedit's version of that function used to have a
different definition (to wit, "-1 if error, else the number of lines
written to the file"). We tried to work around that by checking whether
errno had become nonzero, but this method has never been kosher according
to the published API of either library. It's reportedly completely broken
in recent Ubuntu releases: psql bleats about "No such file or directory"
when saving ~/.psql_history, even though the write worked fine.
However, libedit has been following the readline definition since somewhere
around 2006, so it seems all right to finally break compatibility with
ancient libedit releases and trust that the return value is what readline
specifies. (I'm not sure when the various Linux distributions incorporated
this fix, but I did find that OS X has been shipping fixed versions since
10.5/Leopard.)
If anyone is still using such an ancient libedit, they will find that psql
complains it can't write ~/.psql_history at exit, even when the file was
written correctly. This is no worse than the behavior we're fixing for
current releases.
Back-patch to all supported branches.
The SGML docs claimed that 1-byte integers could be sent or received with
the "isint" options, but no such behavior has ever been implemented in
pqGetInt() or pqPutInt(). The in-code documentation header for PQfn() was
even less in tune with reality, and the code itself used parameter names
matching neither the SGML docs nor its libpq-fe.h declaration. Do a bit
of additional wordsmithing on the SGML docs while at it.
Since the business about 1-byte integers is a clear documentation bug,
back-patch to all supported branches.
We were using "user mapping for user XYZ" as description for user mappings, but
that's ambiguous because users can have mappings on multiple foreign
servers; therefore change it to "for user XYZ on server UVW" instead.
Object identities for user mappings are also updated in the same way, in
branches 9.3 and above.
The incomplete description string was introduced together with the whole
SQL/MED infrastructure by commit cae565e503 of 8.4 era, so backpatch all
the way back.
When the library already exists in the build directory, "ar" preserves
members not named on its command line. This mattered when, for example,
a "configure" rerun dropped a file from $(LIBOBJS). libpgport carried
the obsolete member until "make clean". Back-patch to 9.0 (all
supported versions).
Up to now RecordTransactionCommit() waited for WAL to be flushed (if
synchronous_commit != off) and to be synchronously replicated (if
enabled), even if a transaction did not have a xid assigned. The primary
reason for that is that sequence's nextval() did not assign a xid, but
are worthwhile to wait for on commit.
This can be problematic because sometimes read only transactions do
write WAL, e.g. HOT page prune records. That then could lead to read only
transactions having to wait during commit. Not something people expect
in a read only transaction.
This lead to such strange symptoms as backends being seemingly stuck
during connection establishment when all synchronous replicas are
down. Especially annoying when said stuck connection is the standby
trying to reconnect to allow syncrep again...
This behavior also is involved in a rather complicated <= 9.4 bug where
the transaction started by catchup interrupt processing waited for
syncrep using latches, but didn't get the wakeup because it was already
running inside the same overloaded signal handler. Fix the issue here
doesn't properly solve that issue, merely papers over the problems. In
9.5 catchup interrupts aren't processed out of signal handlers anymore.
To fix all this, make nextval() acquire a top level xid, and only wait for
transaction commit if a transaction both acquired a xid and emitted WAL
records. If only a xid has been assigned we don't uselessly want to
wait just because of writes to temporary/unlogged tables; if only WAL
has been written we don't want to wait just because of HOT prunes.
The xid assignment in nextval() is unlikely to cause overhead in
real-world workloads. For one it only happens SEQ_LOG_VALS/32 values
anyway, for another only usage of nextval() without using the result in
an insert or similar is affected.
Discussion: 20150223165359.GF30784@awork2.anarazel.de,
369698E947874884A77849D8FE3680C2@maumau,
5CF4ABBA67674088B3941894E22A0D25@maumau
Per complaint from maumau and Thom Brown
Backpatch all the way back; 9.0 doesn't have syncrep, but it seems
better to be consistent behavior across all maintained branches.
"RETURN SQLERRM" prompted plpgsql_exec_function() to read from freed
memory. Back-patch to 9.0 (all supported versions). Little code ran
between the premature free and the read, so non-assert builds are
unlikely to witness user-visible consequences.
The "simple" path for printing VALUES clauses doesn't work if we need
to attach nondefault column aliases, because there's noplace to do that
in the minimal VALUES() syntax. So modify get_simple_values_rte() to
detect nondefault aliases and treat that as a non-simple case. This
further exposes that the "non-simple" path never actually worked;
it didn't produce valid syntax. Fix that too. Per bug #12789 from
Curtis McEnroe, and analysis by Andrew Gierth.
Back-patch to all supported branches. Before 9.3, this also requires
back-patching the part of commit 092d7ded29
that created get_simple_values_rte() to begin with; inserting the extra
test into the old factorization of that logic would've been too messy.
When LockBufferForCleanup() has to wait for getting a cleanup lock on a
buffer it does so by setting a flag in the buffer header and then wait
for other backends to signal it using ProcWaitForSignal().
Unfortunately LockBufferForCleanup() missed that ProcWaitForSignal() can
return for other reasons than the signal it is hoping for. If such a
spurious signal arrives the wait flags on the buffer header will still
be set. That then triggers "ERROR: multiple backends attempting to wait
for pincount 1".
The fix is simple, unset the flag if still set when retrying. That
implies an additional spinlock acquisition/release, but that's unlikely
to matter given the cost of waiting for a cleanup lock. Alternatively
it'd have been possible to move responsibility for maintaining the
relevant flag to the waiter all together, but that might have had
negative consequences due to possible floods of signals. Besides being
more invasive.
This looks to be a very longstanding bug. The relevant code in
LockBufferForCleanup() hasn't changed materially since its introduction
and ProcWaitForSignal() was documented to return for unrelated reasons
since 8.2. The master only patch series removing ImmediateInterruptOK
made it much easier to hit though, as ProcSendSignal/ProcWaitForSignal
now uses a latch shared with other tasks.
Per discussion with Kevin Grittner, Tom Lane and me.
Backpatch to all supported branches.
Discussion: 11553.1423805224@sss.pgh.pa.us
If libpq output buffer is full, pqSendSome() function tries to drain any
incoming data. This avoids deadlock, if the server e.g. sends a lot of
NOTICE messages, and blocks until we read them. However, pqSendSome() only
did that in blocking mode. In non-blocking mode, the deadlock could still
happen.
To fix, take a two-pronged approach:
1. Change the documentation to instruct that when PQflush() returns 1, you
should wait for both read- and write-ready, and call PQconsumeInput() if it
becomes read-ready. That fixes the deadlock, but applications are not going
to change overnight.
2. In pqSendSome(), drain the input buffer before returning 1. This
alleviates the problem for applications that only wait for write-ready. In
particular, a slow but steady stream of NOTICE messages during COPY FROM
STDIN will no longer cause a deadlock. The risk remains that the server
attempts to send a large burst of data and fills its output buffer, and at
the same time the client also sends enough data to fill its output buffer.
The application will deadlock if it goes to sleep, waiting for the socket
to become write-ready, before the server's data arrives. In practice,
NOTICE messages and such that the server might be sending are usually
short, so it's highly unlikely that the server would fill its output buffer
so quickly.
Backpatch to all supported versions.
Commit 9406884af1 cleaned up
column-privilege related leaks in various error-message paths, but ended
up including a few more things than it should have in the back branches.
Specifically, there's no need for the GetModifiedColumns macro in
execMain.c as 9.1 and older didn't include the row in check constraint
violations. Further, the regression tests added to check those cases
aren't necessary.
This patch removes the GetModifiedColumns macro from execMain.c, removes
the comment which was added to trigger.c related to the duplicate macro
definition, and removes the check-constraint-related regression tests.
Pointed out by Robert.
Back-patched to 9.1 and 9.0.
In investigating yesterday's crash report from Hugo Osvaldo Barrera, I only
looked back as far as commit f3aec2c7f5 where the breakage occurred
(which is why I thought the IPv4-in-IPv6 business was undocumented). But
actually the logic dates back to commit 3c9bb8886d and was simply
broken by erroneous refactoring in the later commit. A bit of archives
excavation shows that we added the whole business in response to a report
that some 2003-era Linux kernels would report IPv4 connections as having
IPv4-in-IPv6 addresses. The fact that we've had no complaints since 9.0
seems to be sufficient confirmation that no modern kernels do that, so
let's just rip it all out rather than trying to fix it.
Do this in the back branches too, thus essentially deciding that our
effective behavior since 9.0 is correct. If there are any platforms on
which the kernel reports IPv4-in-IPv6 addresses as such, yesterday's fix
would have made for a subtle and potentially security-sensitive change in
the effective meaning of IPv4 pg_hba.conf entries, which does not seem like
a good thing to do in minor releases. So let's let the post-9.0 behavior
stand, and change the documentation to match it.
In passing, I failed to resist the temptation to wordsmith the description
of pg_hba.conf IPv4 and IPv6 address entries a bit. A lot of this text
hasn't been touched since we were IPv4-only.
The previous coding copied garbage into a local variable, pretty much
ensuring that the intended test of an IPv6 connection address against a
promoted IPv4 address from pg_hba.conf would never match. The lack of
field complaints likely indicates that nobody realized this was supposed
to work, which is unsurprising considering that no user-facing docs suggest
it should work.
In principle this could have led to a SIGSEGV due to reading off the end of
memory, but since the source address would have pointed to somewhere in the
function's stack frame, that's quite unlikely. What led to discovery of
the bug is Hugo Osvaldo Barrera's report of a crash after an OS upgrade,
which is probably because he is now running a system in which memcpy raises
abort() upon detecting overlapping source and destination areas. (You'd
have to additionally suppose some things about the stack frame layout to
arrive at this conclusion, but it seems plausible.)
This has been broken since the code was added, in commit f3aec2c7f5,
so back-patch to all supported branches.
pg_database.datfrozenxid and pg_database.datminmxid were not preserved
for the 'postgres' and 'template1' databases. This could cause missing
clog file errors on access to user tables and indexes after upgrades in
these databases.
Backpatch through 9.0
When ecpg was rewritten to the new protocol version not all variable types
were corrected. This patch rewrites the code for these types to fix that. It
also fixes the documentation to correctly tell the status of array handling.
Conflicts:
doc/src/sgml/ecpg.sgml
Back in 2003 we had a discussion about how to decide which casts to dump.
At the time pg_dump really only considered an object's containing schema
to decide what to dump (ie, dump whatever's not in pg_catalog), and so
we chose a complicated idea involving whether the underlying types were to
be dumped (cf commit a6790ce857). But users
are allowed to create casts between built-in types, and we failed to dump
such casts. Let's get rid of that heuristic, which has accreted even more
ugliness since then, in favor of just looking at the cast's OID to decide
if it's a built-in cast or not.
In passing, also fix some really ancient code that supposed that it had to
manufacture a dependency for the cast on its cast function; that's only
true when dumping from a pre-7.3 server. This just resulted in some wasted
cycles and duplicate dependency-list entries with newer servers, but we
might as well improve it.
Per gripes from a number of people, most recently Greg Sabino Mullane.
Back-patch to all supported branches.
Back in commit 400e2c9344 I rewrote GEQO's
gimme_tree function to improve its heuristic for modifying the given tour
into a legal join order. In what can only be called a fit of hubris,
I supposed that this new heuristic would *always* find a legal join order,
and ripped out the old logic that allowed gimme_tree to sometimes fail.
The folly of this is exposed by bug #12760, in which the "greedy" clumping
behavior of merge_clump() can lead it into a dead end which could only be
recovered from by un-clumping. We have no code for that and wouldn't know
exactly what to do with it if we did. Rather than try to improve the
heuristic rules still further, let's just recognize that it *is* a
heuristic and probably must always have failure cases. So, put back the
code removed in the previous commit to allow for failure (but comment it
a bit better this time).
It's possible that this code was actually fully correct at the time and
has only been broken by the introduction of LATERAL. But having seen this
example I no longer have much faith in that proposition, so back-patch to
all supported branches.
On windows _isnan() (which isnan() is redirected to in port/win32.h)
is declared in float.h, not math.h.
Per buildfarm animal currawong.
Backpatch to all supported branches.
If any error occurred while we were in the middle of reading a protocol
message from the client, we could lose sync, and incorrectly try to
interpret a part of another message as a new protocol message. That will
usually lead to an "invalid frontend message" error that terminates the
connection. However, this is a security issue because an attacker might
be able to deliberately cause an error, inject a Query message in what's
supposed to be just user data, and have the server execute it.
We were quite careful to not have CHECK_FOR_INTERRUPTS() calls or other
operations that could ereport(ERROR) in the middle of processing a message,
but a query cancel interrupt or statement timeout could nevertheless cause
it to happen. Also, the V2 fastpath and COPY handling were not so careful.
It's very difficult to recover in the V2 COPY protocol, so we will just
terminate the connection on error. In practice, that's what happened
previously anyway, as we lost protocol sync.
To fix, add a new variable in pqcomm.c, PqCommReadingMsg, that is set
whenever we're in the middle of reading a message. When it's set, we cannot
safely ERROR out and continue running, because we might've read only part
of a message. PqCommReadingMsg acts somewhat similarly to critical sections
in that if an error occurs while it's set, the error handler will force the
connection to be terminated, as if the error was FATAL. It's not
implemented by promoting ERROR to FATAL in elog.c, like ERROR is promoted
to PANIC in critical sections, because we want to be able to use
PG_TRY/CATCH to recover and regain protocol sync. pq_getmessage() takes
advantage of that to prevent an OOM error from terminating the connection.
To prevent unnecessary connection terminations, add a holdoff mechanism
similar to HOLD/RESUME_INTERRUPTS() that can be used hold off query cancel
interrupts, but still allow die interrupts. The rules on which interrupts
are processed when are now a bit more complicated, so refactor
ProcessInterrupts() and the calls to it in signal handlers so that the
signal handlers always call it if ImmediateInterruptOK is set, and
ProcessInterrupts() can decide to not do anything if the other conditions
are not met.
Reported by Emil Lenngren. Patch reviewed by Noah Misch and Andres Freund.
Backpatch to all supported versions.
Security: CVE-2015-0244
This covers alterations to buffer sizing and zeroing made between imath
1.3 and imath 1.20. Valgrind Memcheck identified the buffer overruns
and reliance on uninitialized data; their exploit potential is unknown.
Builds specifying --with-openssl are unaffected, because they use the
OpenSSL BIGNUM facility instead of imath. Back-patch to 9.0 (all
supported versions).
Security: CVE-2015-0243
Most callers pass a stack buffer. The ensuing stack smash can crash the
server, and we have not ruled out the viability of attacks that lead to
privilege escalation. Back-patch to 9.0 (all supported versions).
Marko Tiikkaja
Security: CVE-2015-0243
Prevent port/snprintf() from overflowing its local fixed-size
buffer and pad to the desired number of digits with zeros, even
if the precision is beyond the ability of the native sprintf().
port/snprintf() is only used on systems that lack a native
snprintf().
Reported by Bruce Momjian. Patch by Tom Lane. Backpatch to all
supported versions.
Security: CVE-2015-0242
Previously very long localized month and weekday strings could
overflow the allocated buffers, causing a server crash.
Reported and patch reviewed by Noah Misch. Backpatch to all
supported versions.
Security: CVE-2015-0241
Previously very long field masks for floats could access memory
beyond the existing buffer allocated to hold the result.
Reported by Andres Freund and Peter Geoghegan. Backpatch to all
supported versions.
Security: CVE-2015-0241
The previous wording claimed that the file was always in /etc, but of
course this varies with the installation layout. Write instead that it
can be found via `pg_config --sysconfdir`. Even though this is still
somewhat incorrect because it doesn't account of moved installations, it
at least conveys that the location depends on the installation.
"ECHO all" is ignored for interactive input, and has been for a very long
time, though possibly not for as long as the documentation has claimed the
opposite. Fix that, and also note that empty lines aren't echoed, which
while dubious is another longstanding behavior (it's embedded in our
regression test files for one thing). Per bug #12721 from Hans Ginzel.
In HEAD, also improve the code comments in this area, and suppress an
unnecessary fflush(stdout) when we're not echoing. That would likely
be safe to back-patch, but I'll not risk it mere hours before a release
wrap.