There were several oversights in recovery code where COMMIT/ABORT PREPARED
records were ignored:
* pg_last_xact_replay_timestamp() (wasn't updated for 2PC commits)
* recovery_min_apply_delay (2PC commits were applied immediately)
* recovery_target_xid (recovery would not stop if the XID used 2PC)
The first of those was reported by Sergiy Zuban in bug #11032, analyzed by
Tom Lane and Andres Freund. The bug was always there, but was masked before
commit d19bd29f07, because COMMIT PREPARED
always created an extra regular transaction that was WAL-logged.
Backpatch to all supported versions (older versions didn't have all the
features and therefore didn't have all of the above bugs).
findDependencyLoops() was not bright about cases where there are multiple
dependency paths between the same two dumpable objects. In most scenarios
this did not hurt us too badly; but since the introduction of section
boundary pseudo-objects in commit a1ef01fe16,
it was possible for this code to take unreasonable amounts of time (tens
of seconds on a database with a couple thousand objects), as reported in
bug #11033 from Joe Van Dyk. Joe's particular problem scenario involved
"pg_dump -a" mode with long chains of foreign key constraints, but I think
that similar problems could arise with other situations as long as there
were enough objects. To fix, add a flag array that lets us notice when we
arrive at the same object again while searching from a given start object.
This simple change seems to be enough to eliminate the performance problem.
Back-patch to 9.1, like the patch that introduced section boundary objects.
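Below is a minimal sketch of the fix in C. The graph representation (`DepGraph`, an adjacency matrix) and the helper names `find_loop()`/`has_loop_through()` are hypothetical and far simpler than pg_dump's actual data structures. What it shows is the idea described above: a flag array that is reset for each start object, so that an object reachable along many dependency paths is searched only once.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_OBJS 64             /* hypothetical fixed capacity */

/* Hypothetical dependency graph: edge[a][b] means object a depends on b. */
typedef struct
{
    int  nobjs;
    bool edge[MAX_OBJS][MAX_OBJS];
} DepGraph;

/*
 * Depth-first search for a path from 'obj' back to 'start'.  'visited'
 * marks objects already explored during this search, so an object that is
 * reachable by many different paths is expanded only once.  'steps' counts
 * expansions so the saving is observable.
 */
static bool
find_loop(const DepGraph *g, int start, int obj, bool *visited, long *steps)
{
    (*steps)++;
    for (int next = 0; next < g->nobjs; next++)
    {
        if (!g->edge[obj][next])
            continue;
        if (next == start)
            return true;        /* came back around: found a loop */
        if (visited[next])
            continue;           /* already searched from this object */
        visited[next] = true;
        if (find_loop(g, start, next, visited, steps))
            return true;
    }
    return false;
}

/* Look for a loop through 'start'; the flags are reset per start object. */
static bool
has_loop_through(const DepGraph *g, int start, long *steps)
{
    bool visited[MAX_OBJS];

    memset(visited, 0, sizeof(visited));
    visited[start] = true;
    return find_loop(g, start, start, visited, steps);
}
```

Consider a layered graph where each object in one layer depends on both objects in the next layer. The number of distinct paths through it grows exponentially with depth. With the flag array, each object is expanded at most once per start object.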
Break the list of available options into an <itemizedlist> instead of
inline sentences. This is mostly motivated by wanting to ensure that the
cross-references to the FSM and VM docs don't cross page boundaries in PDF
format; but it seems to me to read more easily this way anyway. I took the
liberty of editorializing a bit further while at it.
Per complaint from Magnus about 9.0.18 docs not building in A4 format.
Patch all active branches so we don't get blind-sided by this particular
issue again in future.
This is consistent with the POSIX verdict that kill() shall not report
ESRCH for a zombie process. Back-patch to 9.0 (all supported versions).
Test code from commit d7cdf6ee36 depends
on it, and log messages about kill() reporting "Invalid argument" will
cease to appear for this not-unexpected condition.
The executor has thrown errors for negative OFFSET values since 8.4 (see
commit bfce56eea4), but in a moment of brain
fade I taught the planner that OFFSET with a constant negative value was a
no-op (commit 1a1832eb08). Reinstate the
former behavior by only discarding OFFSET with a value of exactly 0. In
passing, adjust a planner comment that referenced the ancient behavior.
Back-patch to 9.3 where the mistake was introduced.
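As a predicate, the corrected planner rule reduces to an equality test. The sketch below uses a hypothetical helper name and a plain integer. The planner itself presumably inspects the constant expression node rather than a bare value.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical planner helper: may a constant OFFSET be dropped from the
 * plan?  Only a value of exactly 0 is a no-op.  A negative constant must
 * be kept, so that the executor raises its usual error at run time.
 */
static bool
offset_is_noop(int64_t offset)
{
    return offset == 0;         /* the mistaken rule effectively was offset <= 0 */
}
```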
get_raw_page tried to validate the supplied block number against
RelationGetNumberOfBlocks(), which of course is only right when
accessing the main fork. In most cases, the main fork is longer
than the others, so that the check was too weak (allowing a
lower-level error to be reported, but no real harm to be done).
However, very small tables could have an FSM larger than their heap,
in which case the mistake prevented access to some FSM pages.
Per report from Torsten Foertsch.
In passing, make the bad-block-number error into an ereport not elog
(since it's certainly not an internal error); and fix sloppily
maintained comment for RelationGetNumberOfBlocksInFork.
This has been wrong since we invented relation forks, so back-patch
to all supported branches.
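Here is a sketch of the corrected check. It assumes a hypothetical `RelSizes` struct that stores the block count of each fork. The real code asks for the size of the requested fork, presumably via RelationGetNumberOfBlocksInFork as mentioned above. The point is only that the block number is validated against the fork being read, not against the main fork.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fork identifiers and per-fork sizes (in blocks). */
typedef enum { MAIN_FORK, FSM_FORK, VM_FORK, NUM_FORKS } ForkKind;

typedef struct
{
    uint32_t nblocks[NUM_FORKS];
} RelSizes;

/*
 * Validate blkno against the length of the fork actually being read.
 * Checking against nblocks[MAIN_FORK] instead would reject valid FSM
 * pages of a very small table whose FSM is longer than its heap.
 */
static bool
block_in_range(const RelSizes *rel, ForkKind fork, uint32_t blkno)
{
    return blkno < rel->nblocks[fork];
}
```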
With OpenLDAP versions 2.4.24 through 2.4.31, inclusive, PostgreSQL
backends can crash at exit. Raise a warning during "configure" based on
the compile-time OpenLDAP version number, and test the crash scenario in
the dblink test suite. Back-patch to 9.0 (all supported versions).
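The version test itself is simple range arithmetic. The sketch below assumes OpenLDAP encodes its version as major * 10000 + minor * 100 + patch, the form we believe its `LDAP_VENDOR_VERSION` macro uses. The helper names are hypothetical, and the actual warning is raised from configure rather than from code like this.

```c
#include <stdbool.h>

/* Assumed encoding: major * 10000 + minor * 100 + patch (2.4.24 -> 20424). */
static int
ldap_version_code(int major, int minor, int patch)
{
    return major * 10000 + minor * 100 + patch;
}

/* True for the affected releases, 2.4.24 through 2.4.31 inclusive. */
static bool
ldap_version_crashes_at_exit(int vendor_version)
{
    return vendor_version >= ldap_version_code(2, 4, 24) &&
           vendor_version <= ldap_version_code(2, 4, 31);
}
```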
In commit 631dc390f4, we started to handle
simple numeric timezone offsets via the zic library instead of the old
CTimeZone/HasCTZSet kluge. However, we overlooked the fact that the zic
code will reject UTC offsets exceeding a week (which seems a bit arbitrary,
but not because it's too tight ...). This led to possibly setting
session_timezone to NULL, which results in crashes in most timezone-related
operations as of 9.4, and crashes in a small number of places even before
that. So check for NULL return from pg_tzset_offset() and report an
appropriate error message. Per bug #11014 from Duncan Gillis.
Back-patch to all supported branches, like the previous patch.
(Unfortunately, as of today that no longer includes 8.4.)
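The sketch below shows the defensive pattern, using hypothetical names. `tz_from_offset()` stands in for pg_tzset_offset() and returns NULL for offsets beyond one week. The exact boundary the zic code enforces is assumed here. The caller reports an error instead of storing the NULL as the session zone.

```c
#include <stddef.h>
#include <stdlib.h>

#define SECS_PER_WEEK (7L * 24 * 60 * 60)

typedef struct { long gmtoffset; } FakeTz;  /* hypothetical zone object */

/* Hypothetical stand-in for pg_tzset_offset(): NULL if out of range. */
static FakeTz *
tz_from_offset(long gmtoffset)
{
    static FakeTz tz;

    if (labs(gmtoffset) > SECS_PER_WEEK)   /* assumed boundary */
        return NULL;
    tz.gmtoffset = gmtoffset;
    return &tz;
}

/*
 * Returns 0 and updates *session_tz on success; returns -1 (where the
 * server would report an error) and leaves *session_tz unchanged if the
 * lookup failed, instead of setting the session zone to NULL.
 */
static int
set_session_offset(long gmtoffset, FakeTz **session_tz)
{
    FakeTz *tz = tz_from_offset(gmtoffset);

    if (tz == NULL)
        return -1;
    *session_tz = tz;
    return 0;
}
```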
In commit a61daa14d5, we fixed pg_upgrade so
that it would install sane relminmxid and datminmxid values, but that does
not cure the problem for installations that were already pg_upgraded to
9.3; they'll initially have "1" in those fields. This is not a big problem
so long as 1 is "in the past" compared to the current nextMultiXact
counter. But if an installation were more than halfway to the MXID wrap
point at the time of upgrade, 1 would appear to be "in the future" and
that would effectively disable tracking of oldest MXIDs in those
tables/databases, until such time as the counter wrapped around.
While in itself this isn't worse than the situation pre-9.3, where we did
not manage MXID wraparound risk at all, the consequences of premature
truncation of pg_multixact are worse now; so we ought to make some effort
to cope with this. We discussed advising users to fix the tracking values
manually, but that seems both very tedious and very error-prone.
Instead, this patch adopts two amelioration rules. First, a relminmxid
value that is "in the future" is allowed to be overwritten with a
full-table VACUUM's actual freeze cutoff, ignoring the normal rule that
relminmxid should never go backwards. (This essentially assumes that we
have enough defenses in place that wraparound can never occur anymore,
and thus that a value "in the future" must be corrupt.) Second, if we see
any "in the future" values then we refrain from truncating pg_clog and
pg_multixact. This prevents loss of clog data until we have cleaned up
all the broken tracking data. In the worst case that could result in
considerable clog bloat, but in practice we expect that relfrozenxid-driven
freezing will happen soon enough to fix the problem before clog bloat
becomes intolerable. (Users could do manual VACUUM FREEZEs if not.)
Note that this mechanism cannot save us if there are already-wrapped or
already-truncated-away MXIDs in the table; it's only capable of dealing
with corrupt tracking values. But that's the situation we have with the
pg_upgrade bug.
For consistency, apply the same rules to relfrozenxid/datfrozenxid. There
are no known mechanisms for these to get messed up, but if they were, the
same tactics seem appropriate for fixing them.
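The "in the future" test depends on modulo-2^32 comparison of 32-bit IDs. The sketch below uses a hypothetical `MultiXactId32` type and hypothetical helper names. The comparison is the usual signed-difference trick for wrapping counters, and the last two helpers encode the two amelioration rules described above in simplified form.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId32;     /* hypothetical 32-bit ID type */

/* Circular comparison: a precedes b if (a - b), read as signed, is negative. */
static bool
mxid_precedes(MultiXactId32 a, MultiXactId32 b)
{
    return (int32_t) (a - b) < 0;
}

/* A tracking value that follows the next-to-be-assigned ID is "in the future". */
static bool
mxid_is_in_future(MultiXactId32 tracked, MultiXactId32 next_mxid)
{
    return mxid_precedes(next_mxid, tracked);
}

/*
 * Rule 1: a full-table VACUUM normally only advances relminmxid, but a
 * value "in the future" is presumed corrupt and overwritten with the cutoff.
 */
static MultiXactId32
new_relminmxid(MultiXactId32 old, MultiXactId32 cutoff, MultiXactId32 next_mxid)
{
    if (mxid_is_in_future(old, next_mxid))
        return cutoff;
    return mxid_precedes(old, cutoff) ? cutoff : old;
}

/* Rule 2: refuse to truncate while any tracking value is "in the future". */
static bool
can_truncate(const MultiXactId32 *tracked, int n, MultiXactId32 next_mxid)
{
    for (int i = 0; i < n; i++)
        if (mxid_is_in_future(tracked[i], next_mxid))
            return false;
    return true;
}
```

For example, with the counter at 0x90000000 (past the halfway point), the post-upgrade value 1 tests as "in the future", which is exactly the broken case described above.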
When a view has a function-returning-composite in FROM, and there are
some dropped columns in the underlying composite type, ruleutils.c
printed junk in the column alias list for the reconstructed FROM entry.
Before 9.3, this was prevented by doing get_rte_attribute_is_dropped
tests while printing the column alias list; but that solution is not
currently available to us for reasons I'll explain below. Instead,
check for empty-string entries in the alias list, which can only exist
if that column position had been dropped at the time the view was made.
(The parser fills in empty strings to preserve the invariant that the
aliases correspond to physical column positions.)
While this is sufficient to handle the case of columns dropped before
the view was made, we have still got issues with columns dropped after
the view was made. In particular, the view could contain Vars that
explicitly reference such columns! The dependency machinery really
ought to refuse the column drop attempt in such cases, as it would do
when trying to drop a table column that's explicitly referenced in
views. However, we currently neglect to store dependencies on columns
of composite types, and fixing that is likely to be too big to be
back-patchable (not to mention that existing views in existing databases
would not have the needed pg_depend entries anyway). So I'll leave that
for a separate patch.
Pre-9.3, ruleutils would print such Vars normally (with their original
column names) even though it suppressed their entries in the RTE's
column alias list. This is certainly bogus, since the printed view
definition would fail to reload, but at least it didn't crash. However,
as of 9.3 the printed column alias list is tightly tied to the names
printed for Vars; so we can't treat columns as dropped for one purpose
and not dropped for the other. This is why we can't just put back the
get_rte_attribute_is_dropped test: it results in an assertion failure
if the view in fact contains any Vars referencing the dropped column.
Once we've got dependencies preventing such cases, we'll probably want
to do it that way instead of relying on the empty-string test used here.
This fix turned up a very ancient bug in outfuncs/readfuncs, namely
that T_String nodes containing empty strings were not dumped/reloaded
correctly: the node was printed as "<>" which is read as a string
value of <>. Since (per SQL) we disallow empty-string identifiers,
such nodes don't occur normally, which is why we'd not noticed.
(Such nodes aren't used for literal constants, just identifiers.)
Per report from Marc Schablewski. Back-patch to 9.3 which is where
the rule printing behavior changed. The dangling-variable case is
broken all the way back, but that's not what his complaint is about.
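This simplified sketch illustrates the serialization problem. It uses hypothetical function names and does no escaping of special characters. If an empty string is written as the bare token <>, it cannot be told apart on reload from the literal string "<>". Wrapping every string value in double quotes lets an empty string round-trip as "". This shows the problem only and is not necessarily the exact encoding outfuncs/readfuncs now use.

```c
#include <stdio.h>
#include <string.h>

/* Old-style writer (the bug): an empty string becomes the token <>,
 * which a reader takes to be the two-character string "<>". */
static void
write_string_old(char *buf, size_t n, const char *s)
{
    snprintf(buf, n, "%s", s[0] == '\0' ? "<>" : s);
}

/* Quoted writer: always wrap the value in double quotes. */
static void
write_string_quoted(char *buf, size_t n, const char *s)
{
    snprintf(buf, n, "\"%s\"", s);
}

/* Reader for the quoted form: strip the surrounding quotes. */
static void
read_string_quoted(const char *tok, char *out, size_t n)
{
    size_t len = strlen(tok);

    if (len < 2)
    {
        snprintf(out, n, "%s", tok);    /* malformed: copy as-is */
        return;
    }
    snprintf(out, n, "%.*s", (int) (len - 2), tok + 1);
}
```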
~/.pgpass is a sound choice everywhere, and "peer" authentication is
safe on every platform it supports. Cease to recommend "trust"
authentication, the safety of which is deeply configuration-specific.
Back-patch to 9.0, where pg_upgrade was introduced.
If pg_regcomp failed after having invoked markst/cleanst, it would leak any
"struct subre" nodes it had created. (We've already detected all regex
syntax errors at that point, so the only likely causes of later failure
would be query cancel or out-of-memory.) To fix, make sure freesrnode
knows the difference between the pre-cleanst and post-cleanst cleanup
procedures. Add some documentation of this less-than-obvious point.
Also, newlacon did the wrong thing with an out-of-memory failure from
realloc(), so that the previously allocated array would be leaked.
Both of these are pretty low-probability scenarios, but a bug is a bug,
so patch all the way back.
Per bug #10976 from Arthur O'Dwyer.
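The newlacon() problem is the classic realloc() pitfall. If the result of realloc() is assigned straight back to the only pointer to the array, that array is lost when realloc() returns NULL. The sketch below shows the safe pattern, using a hypothetical `LaconArray` type rather than the regex library's real structures.

```c
#include <stdlib.h>

typedef struct { int begin, end; } LaconEntry;  /* hypothetical element */

typedef struct
{
    LaconEntry *items;
    size_t      n;
} LaconArray;

/*
 * Append one entry; returns 0 on success, -1 on out-of-memory.
 * The result of realloc() goes into a temporary first, so on failure
 * arr->items still points at the old array and can be freed by the caller.
 */
static int
lacon_append(LaconArray *arr, LaconEntry e)
{
    LaconEntry *grown = realloc(arr->items, (arr->n + 1) * sizeof(LaconEntry));

    if (grown == NULL)
        return -1;
    arr->items = grown;
    arr->items[arr->n++] = e;
    return 0;
}
```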
The consistent function contained several bugs:
* The "if (which2) { ... }" block was broken. It compared the argument's
lower bound against centroid's upper bound, while it was supposed to compare
the argument's upper bound against the centroid's lower bound (the comment
was correct, code was wrong). Also, it cleared bits in the "which1"
variable, while it was supposed to clear bits in "which2".
* If the argument's upper bound was equal to the centroid's lower bound, we
descended to both halves (= all quadrants). That's unnecessary, searching
the right quadrants is sufficient. This didn't lead to incorrect query
results, but was clearly wrong, and slowed down queries unnecessarily.
* In the case that the argument's lower bound is adjacent to the centroid's
upper bound, we also don't need to visit all quadrants, by reasoning similar
to the previous point.
* The code where we compare the previous centroid with the current centroid
should match the code where we compare the current centroid with the
argument. The point of that code is to redo the calculation done in the
previous level, to see if we were supposed to traverse left or right (or up
or down), and if we actually did. If we moved in a different direction,
then we know there are no matches for bound.
Refactor the code and add comments to make it more readable and easier to
reason about.
Backpatch to 9.3 where SP-GiST support for range types was introduced.
Trying to reassign objects owned by a user that had text search
dictionaries or configurations used to fail with:
ERROR: unexpected classid 3600
or
ERROR: unexpected classid 3602
Fix by adding cases for those object types in a switch in pg_shdepend.c.
Both REASSIGN OWNED and text search objects go back all the way to 8.1,
so backpatch to all supported branches. In 9.3 the alter-owner code was
made generic, so the required change in recent branches is pretty
simple; however, for 9.2 and older ones we need some additional
reshuffling to enable specifying objects by OID rather than name.
Text search templates and parsers are not owned objects, so there's no
change required for them.
Per bug #9749 reported by Michal Novotný
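The sketch below shows the general shape of the fix, with hypothetical names: a dispatcher that used to fall into its default branch for these catalogs now has explicit cases for them. The constants 3600 and 3602 are the OIDs of pg_ts_dict and pg_ts_config, which match the classids in the error messages above. Everything else is illustrative.

```c
#include <stdbool.h>

/* Catalog OIDs from the error messages above. */
enum
{
    TS_DICTIONARY_CLASSID = 3600,   /* pg_ts_dict */
    TS_CONFIG_CLASSID = 3602        /* pg_ts_config */
};

/* Hypothetical dispatcher: true if ownership of objects in 'classid' can change. */
static bool
reassign_supported(unsigned int classid)
{
    switch (classid)
    {
        case TS_DICTIONARY_CLASSID:   /* these two used to hit "default" */
        case TS_CONFIG_CLASSID:
            return true;
            /* ... cases for the other owned object types ... */
        default:
            return false;   /* caller reports "unexpected classid %u" */
    }
}
```

Text search parsers and templates are not owned objects, so they intentionally get no case here.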
ExecEvalWholeRowVar incorrectly supposed that it could "bless" the source
TupleTableSlot just once per query. But if the input is coming from an
Append (or, perhaps, other cases?) more than one slot might be returned
over the query run. This led to "record type has not been registered"
errors when a composite datum was extracted from a non-blessed slot.
This bug has been there a long time; I guess it escaped notice because when
dealing with subqueries the planner tends to expand whole-row Vars into
RowExprs, which don't have the same problem. It is possible to trigger
the problem in all active branches, though, as illustrated by the added
regression test.
While the x output of "select x from t group by x" can be presumed unique,
this does not hold for "select x, generate_series(1,10) from t group by x",
because we may expand the set-returning function after the grouping step.
(Perhaps that should be re-thought; but considering all the other oddities
involved with SRFs in targetlists, it seems unlikely we'll change it.)
Put a check in query_is_distinct_for() so it's not fooled by such cases.
Back-patch to all supported branches.
David Rowley
Previously, when calculations on the need for toast tables changed,
pg_upgrade could not handle cases where the new cluster needed a TOAST
table and the old cluster did not. (It already handled the opposite
case.) This fixes the "OID mismatch" error typically generated in this
case.
Backpatch through 9.2
This function wasn't originally thought to be really user-facing,
because converting a table to a view isn't something we expect people
to do manually. So not all that much effort was spent on the error
messages; in particular, while the code will complain that you got
the column types wrong it won't say exactly what they are. But since
we repurposed the code to also check compatibility of rule RETURNING
lists, it's definitely user-facing. It now seems worthwhile to add
errdetail messages showing exactly what the conflict is when there's
a mismatch of column names or types. This is prompted by bug #10836
from Matthias Raffelsieper, which might have been forestalled if the
error message had reported the wrong column type as being "record".
Per Alvaro's advice, back-patch to branches before 9.4, but resist
the temptation to rephrase any existing strings there. Adding new
strings is not really a translation degradation; anyway having the
info presented in English is better than not having it at all.
The output buffer size in unaccent_lexize() was calculated as input string
length times pg_database_encoding_max_length(), which effectively assumes
that replacement strings aren't more than one character. While that was
all that we previously documented it to support, the code actually has
always allowed replacement strings of arbitrary length; so if you tried
to make use of longer strings, you were at risk of buffer overrun. To fix,
use an expansible StringInfo buffer instead of trying to determine the
maximum space needed a-priori.
This would be a security issue if unaccent rules files could be installed
by unprivileged users; but fortunately they can't, so in the back branches
the problem can be labeled as improper configuration by a superuser.
Nonetheless, a memory stomp isn't a nice way of reacting to improper
configuration, so let's back-patch the fix.
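The sketch below illustrates the expansible-buffer approach. `GrowBuf` and its functions are hypothetical stand-ins loosely modeled on StringInfo. The single rule (`&` becomes `and`) is invented for illustration. The point is that the output can exceed any fixed multiple of the input length, and the buffer simply grows to fit.

```c
#include <stdlib.h>
#include <string.h>

typedef struct
{
    char   *data;
    size_t  len;
    size_t  cap;
} GrowBuf;                      /* hypothetical, StringInfo-like */

static int
growbuf_init(GrowBuf *b)
{
    b->cap = 16;
    b->len = 0;
    b->data = malloc(b->cap);
    if (b->data == NULL)
        return -1;
    b->data[0] = '\0';
    return 0;
}

/* Append n bytes, doubling capacity as needed; keeps data NUL-terminated. */
static int
growbuf_append(GrowBuf *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap)
    {
        size_t newcap = b->cap;
        char  *p;

        while (b->len + n + 1 > newcap)
            newcap *= 2;
        p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* Hypothetical rule table: a replacement may be longer than one character. */
static const char *
replacement_for(char c)
{
    return c == '&' ? "and" : NULL;
}

/* Apply the rules to 'in'; returns a malloc'd string, or NULL on OOM. */
static char *
apply_rules(const char *in)
{
    GrowBuf b;

    if (growbuf_init(&b) != 0)
        return NULL;
    for (const char *p = in; *p; p++)
    {
        const char *rep = replacement_for(*p);
        int rc = rep ? growbuf_append(&b, rep, strlen(rep))
                     : growbuf_append(&b, p, 1);

        if (rc != 0)
        {
            free(b.data);
            return NULL;
        }
    }
    return b.data;
}
```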
This function continued to use it after heap_endscan() freed it. In
passing, don't explicitly create a strategy here. Instead, use the one
created by heap_beginscan_strat(), if any. Back-patch to 9.2, where use
of a BufferAccessStrategy here was introduced.
Instead of truncating pg_multixact at vacuum time, do it only at
checkpoint time. The reason for doing it this way is twofold: first, we
want it to delete only segments that we're certain will not be required
if there's a crash immediately after the removal; and second, we want to
do it relatively often so that older files are not left behind if
there's an untimely crash.
Per my proposal in
http://www.postgresql.org/message-id/20140626044519.GJ7340@eldon.alvh.no-ip.org
we now execute the truncation in the checkpointer process rather than as
part of vacuum. Vacuum is only in charge of maintaining in shared
memory the value to which it's possible to truncate the files; that
value is stored as part of checkpoints also, and so upon recovery we can
reuse the same value to re-execute truncate and reset the
oldest-value-still-safe-to-use to one known to remain after truncation.
Per bug reported by Jeff Janes in the course of his tests involving
bug #8673.
While at it, update some comments that hadn't been updated since
multixacts were changed.
Backpatch to 9.3, where persistence of pg_multixact files was
introduced by commit 0ac5ad5134.
We were allowing a table's pg_class.relminmxid value to move backwards
when heaps were swapped by VACUUM FULL or CLUSTER. There is a
similar protection against relfrozenxid going backwards, which we
neglected to clone when the multixact stuff was rejiggered by commit
0ac5ad5134.
Backpatch to 9.3, where relminmxid was introduced.
As reported by Heikki in
http://www.postgresql.org/message-id/52401AEA.9000608@vmware.com
Don't assert MultiXactIdIsRunning if the multi came from a tuple that
had been share-locked and later copied over to the new cluster by
pg_upgrade. Doing that causes an error to be raised unnecessarily:
MultiXactIdIsRunning is not open to the possibility that its argument
came from a pg_upgraded tuple, and all its other callers are already
checking; but such multis cannot, obviously, have transactions still
running, so the assert is pointless.
Noticed while investigating the bogus pg_multixact/offsets/0000 file
left over by pg_upgrade, as reported by Andres Freund in
http://www.postgresql.org/message-id/20140530121631.GE25431@alap3.anarazel.de
Backpatch to 9.3, where the commit that introduced the buglet first appeared.
When we committed a87c729153, we somehow
failed to notice that it didn't merely improve plan quality for expression
indexes; there were very closely related cases that failed outright with
"could not find pathkey item to sort". The failing cases seem to be those
where the planner was already capable of selecting a MergeAppend plan,
and there was inheritance involved: the lack of appropriate eclass child
members would prevent prepare_sort_from_pathkeys() from succeeding on the
MergeAppend's child plan nodes for inheritance child tables.
Accordingly, back-patch into 9.1 through 9.3, along with an extra
regression test case covering the problem.
Per trouble report from Michael Glaesemann.
Commit 7380b63 changed log_filename so that the epoch is not appended to it
when no format specifier is given. However, the log_filename documentation
still showed an example CSV log file name with the epoch appended. This
commit removes that obsolete example.
This commit also documents the defaults of log_directory and
log_filename.
Backpatch to all supported versions.
Christoph Berg
populate_recordset_object_start() improperly created a new hash table
(overwriting the link to the existing one) if called at nest levels
greater than one. This resulted in previous fields not appearing in
the final output, as reported by Matti Hameister in bug #10728.
In 9.4 the problem also affects json_to_recordset.
This perhaps missed detection earlier because the default behavior is to
throw an error for nested objects: you have to pass use_json_as_text = true
to see the problem.
In addition, fix query-lifespan leakage of the hashtable created by
json_populate_record(). This is pretty much the same problem recently
fixed in dblink: creating an intended-to-be-temporary context underneath
the executor's per-tuple context isn't enough to make it go away at the
end of the tuple cycle, because MemoryContextReset is not
MemoryContextResetAndDeleteChildren.
Michael Paquier and Tom Lane
This fixes a bug that caused vacuum to fail when the '0000' files left
by initdb were accessed as part of vacuum's cleanup of old pg_multixact
files.
Backpatch through 9.3
The syntax doesn't let you specify "WITH OIDS" for foreign tables, but it
was still possible with default_with_oids=true. But the rest of the system,
including pg_dump, isn't prepared to handle foreign tables with OIDs
properly.
Backpatch down to 9.1, where foreign tables were introduced. It's possible
that there are databases out there that already have foreign tables with
OIDs. There isn't much we can do about that, but at least we can prevent
them from being created in the future.
Patch by Etsuro Fujita, reviewed by Hadi Moshayedi.
By using curly braces, the template had specified that one of
"NOT DEFERRABLE", "INITIALLY IMMEDIATE", or "INITIALLY DEFERRED"
was required on any CREATE TRIGGER statement, which is not
accurate. Changing to square brackets makes the clause optional.
Backpatch to 9.1, where the error was introduced.
dblink uses a short-lived data conversion memory context. However it
was not deleted when no longer needed, leading to a noticeable memory
leak under some circumstances. Plug the hole, along with minor
refactoring. Backpatch to 9.2 where the leak was introduced.
Report and initial patch by MauMau. Reviewed/modified slightly by
Tom Lane and me.
Since fdf9e21196 lazy_vacuum_page() rechecks the all-visible status
of pages in the second pass over the heap. It does so inside a
critical section, but both visibilitymap_test() and
heap_page_is_all_visible() perform operations that should not happen
inside one. The former potentially performs IO and both potentially do
memory allocations.
To fix, simply move all the all-visible handling outside the critical
section. Doing so means that the PD_ALL_VISIBLE on the page won't be
included in the full page image of the HEAP2_CLEAN record anymore. But
that's fine, the flag will be set by the HEAP2_VISIBLE logged later.
Backpatch to 9.3 where the problem was introduced. The bug only came
to light due to the assertion added in 4a170ee9 and isn't likely to
cause problems in production scenarios. The worst outcome is an
avoidable PANIC restart.
This also gets rid of the difference in the order of operations
between master and standby mentioned in 2a8e1ac5.
Per reports from David Leverton and Keith Fiske in bug #10533.
ExecMakeTableFunctionResult evaluated the arguments for a function-in-FROM
in the query-lifespan memory context. This is insignificant in simple
cases where the function relation is scanned only once; but if the function
is in a sub-SELECT or is on the inside of a nested loop, any memory
consumed during argument evaluation can add up quickly. (The potential for
trouble here had been foreseen long ago, per existing comments; but we'd
not previously seen a complaint from the field about it.) To fix, create
an additional temporary context just for this purpose.
Per an example from MauMau. Back-patch to all active branches.
Any OS user able to access the socket can connect as the bootstrap
superuser and proceed to execute arbitrary code as the OS user running
the test. Protect against that by placing the socket in a temporary,
mode-0700 subdirectory of /tmp. The pg_regress-based test suites and
the pg_upgrade test suite were vulnerable; the $(prove_check)-based test
suites were already secure. Back-patch to 8.4 (all supported versions).
The hazard remains wherever the temporary cluster accepts TCP
connections, notably on Windows.
As a convenient side effect, this lets testing proceed smoothly in
builds that override DEFAULT_PGSOCKET_DIR. Popular non-default values
like /var/run/postgresql are often unwritable to the build user.
Security: CVE-2014-0067
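Here is a sketch of creating such a directory on a POSIX system. mkdtemp() creates the directory with permissions 0700, so a socket placed inside it cannot be reached by other OS users. The `pg_regress-` name prefix is a hypothetical choice and not necessarily the one pg_regress uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a private, mode-0700 directory under /tmp for the test server's
 * socket.  On success returns 0 and leaves the created path in 'path';
 * returns -1 if the buffer is too small or mkdtemp() fails.
 */
static int
make_private_socket_dir(char *path, size_t pathsize)
{
    if (snprintf(path, pathsize, "/tmp/pg_regress-XXXXXX") >= (int) pathsize)
        return -1;
    if (mkdtemp(path) == NULL)  /* replaces the XXXXXX and creates the dir */
        return -1;
    return 0;
}
```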
This function is pervasive on free software operating systems; import
NetBSD's implementation. Back-patch to 8.4, like the commit that will
harness it.
Prior to 9.0, pg_dump handled comments on large objects by dumping a bunch
of COMMENT commands into a single BLOB COMMENTS archive object. With
sufficiently many such comments, some of the commands would likely get
split across bufferloads when restoring, causing failures in
direct-to-database restores (though no problem would be evident in text
output). This is the same type of issue we have with table data dumped as
INSERT commands, and it can be fixed in the same way, by using a mini SQL
lexer to figure out where the command boundaries are. Fortunately, the
COMMENT commands are no more complex to lex than INSERTs, so we can just
re-use the existing lexer for INSERTs.
Per bug #10611 from Jacek Zalewski. Back-patch to all active branches.
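The key to reusing the INSERT lexer is that lexer state persists across bufferloads, so a command split between two chunks is still seen as a single command. The much-simplified sketch below uses hypothetical names. It tracks only single-quoted literals and semicolons. A doubled '' toggles the state twice and so leaves it unchanged. The real lexer must handle more of SQL's quoting rules.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lexer state carried from one buffer load to the next. */
typedef struct
{
    bool in_quote;
    int  commands_done;
} MiniLexer;

/* Feed one chunk; returns the number of complete commands seen so far. */
static int
minilex_feed(MiniLexer *lx, const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        char c = buf[i];

        if (c == '\'')
            lx->in_quote = !lx->in_quote;
        else if (c == ';' && !lx->in_quote)
            lx->commands_done++;    /* end of a command outside any literal */
    }
    return lx->commands_done;
}
```

For example, if a chunk ends in the middle of `'it''s; fine'`, the semicolon inside the literal is not counted, and the command is finished only when the next chunk supplies the closing quote and `;`.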
Robert Frost is no longer with us, but his copyrights still are, so
let's stop using "Stopping by Woods on a Snowy Evening" as test data
before somebody decides to sue us. Wordsworth is more safely dead.
When we grabbed this file off the Snowball project's website, we mistakenly
supposed that it was in LATIN1 encoding, but evidently it was actually in
LATIN2. This resulted in ő (o-double-acute, U+0151, which is code 0xF5 in
LATIN2) being misconverted into õ (o-tilde, U+00F5), as complained of in
bug #10589 from Zoltán Sörös. We'd have messed up u-double-acute too,
but there aren't any of those in the file. Other characters used in the
file have the same codes in LATIN1 and LATIN2, which no doubt helped hide
the problem for so long.
The error is not only ours: the Snowball project also was confused about
which encoding is required for Hungarian. But dealing with that will
require source-code changes that I'm not at all sure we'll wish to
back-patch. Fixing the stopword file seems reasonably safe to back-patch
however.
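This small C illustration shows the misconversion. LATIN1 maps each byte to the code point of the same value, so 0xF5 decodes as U+00F5, while LATIN2 maps 0xF5 to U+0151. The LATIN2 function below covers only ASCII and the Hungarian lowercase letters. It is an excerpt, not a full table, and it returns U+FFFD for any other byte.

```c
#include <stdint.h>

/* ISO-8859-1: every byte maps to the code point with the same value. */
static uint32_t
latin1_to_unicode(unsigned char b)
{
    return b;
}

/* Excerpt of ISO-8859-2: ASCII plus Hungarian lowercase letters only. */
static uint32_t
latin2_to_unicode(unsigned char b)
{
    if (b < 0x80)
        return b;
    switch (b)
    {
        case 0xE1: return 0x00E1;   /* a-acute, same as LATIN1 */
        case 0xE9: return 0x00E9;   /* e-acute, same as LATIN1 */
        case 0xED: return 0x00ED;   /* i-acute, same as LATIN1 */
        case 0xF3: return 0x00F3;   /* o-acute, same as LATIN1 */
        case 0xF6: return 0x00F6;   /* o-diaeresis, same as LATIN1 */
        case 0xFA: return 0x00FA;   /* u-acute, same as LATIN1 */
        case 0xFC: return 0x00FC;   /* u-diaeresis, same as LATIN1 */
        case 0xF5: return 0x0151;   /* o-double-acute (LATIN1: o-tilde) */
        case 0xFB: return 0x0171;   /* u-double-acute (LATIN1: u-circumflex) */
        default:   return 0xFFFD;   /* not covered by this excerpt */
    }
}
```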
Although this bug is already fixed in post-9.2 branches, the case
triggering it is quite different from what was under consideration
at the time. It seems worth memorializing this example in HEAD
just to make sure it doesn't get broken again in future.
Extracted from commit 187ae17300.