My "fix" for bugs #7578 and #6116 on DROP OWNED at fe3b5eb08a not only
misstated that it applied to REASSIGN OWNED (which it did not affect),
but it also failed to fix the problems fully, because I didn't test the
case of owned shared objects. Thus I created a new bug, reported by
Thomas Kellerer as #7748, which would cause DROP OWNED to fail with a
not-for-user-consumption error message. The code would attempt to drop
the database, which not only fails to work because the underlying code
does not support that, but is a pretty dangerous and undesirable thing
to be doing as well.
This patch fixes that bug by having DROP OWNED only attempt to process
shared objects when grants on them are found, ignoring ownership.
Backpatch to 8.3, which is as far as the previous bug was backpatched.
If a PL/Python function raises an SPIError (or one if its subclasses)
directly with python's raise statement, treat it the same as an SPIError
generated internally. In particular, if the user sets the sqlstate
attribute, preserve that.
Oskari Saarenmaa and Jan Urbański, reviewed by Karl O. Pinc.
The SQL standard does not have general functions-in-FROM, but it does
allow UNNEST() there (see the <collection derived table> production),
and the semantics of that are defined to include lateral references.
So spec compliance requires allowing lateral references within UNNEST()
even without an explicit LATERAL keyword. Rather than making UNNEST()
a special case, it seems best to extend this flexibility to any
function-in-FROM. We'll still allow LATERAL to be written explicitly
for clarity's sake, but it's now a noise word in this context.
In theory this change could result in a change in behavior of existing
queries, by allowing what had been an outer reference in a function-in-FROM
to be captured by an earlier FROM-item at the same level. However, all
pre-9.3 PG releases have a bug that causes them to match variable
references to earlier FROM-items in preference to outer references (and
then throw an error). So no previously-working query could contain the
type of ambiguity that would risk a change of behavior.
Per a suggestion from Andrew Gierth, though I didn't use his patch.
DROP IF EXISTS with a missing schema in commit
7e2322dff3 applies not only to tables, but
to DROP IF EXISTS with missing schemas for indexes, views, sequences,
and foreign tables. Yeah!
Previously non-honored FREEZE mode was ignored. This also issues an
appropriate error message based on the cause of the failure, per
suggestion from Tom. Additional regression test case added.
Previously, CREATE TABLE IF EXIST threw an error if the schema was
nonexistent. This was done by passing 'missing_ok' to the function that
looks up the schema oid.
plpython tried to use a single cache entry for a trigger function, but it
needs a separate cache entry for each table the trigger is applied to,
because there is table-dependent data in there. This was done correctly
before 9.1, but commit 46211da1b8 broke it
by simplifying the lookup key from "function OID and triggered table OID"
to "function OID and is-trigger boolean". Go back to using both OIDs
as the lookup key. Per bug report from Sandro Santilli.
Andres Freund
In the initial implementation of plan caching, we saved the active
search_path when a plan was first cached, then reinstalled that path
anytime we needed to reparse or replan. The idea of that was to try to
reselect the same referenced objects, in somewhat the same way that views
continue to refer to the same objects in the face of schema or name
changes. Of course, that analogy doesn't bear close inspection, since
holding the search_path fixed doesn't cope with object drops or renames.
Moreover sticking with the old path seems to create more surprises than
it avoids. So instead of doing that, consider that the cached plan depends
on search_path, and force reparse/replan if the active search_path is
different than it was when we last saved the plan.
This gets us fairly close to having "transparency" of plan caching, in the
sense that the cached statement acts the same as if you'd just resubmitted
the original query text for another execution. There are still some corner
cases where this fails though: a new object added in the search path
schema(s) might capture a reference in the query text, but we'd not realize
that and force a reparse. We might try to fix that in the future, but for
the moment it looks too expensive and complicated.
When descending the tree for an insert, and there are multiple equally good
pages we could insert to, make the choice in random. Previously, we would
always choose the tuple with lowest offset number. That meant that when two
non-leaf pages overlap - in the extreme case they might have exactly the same
key - all but the first such page went unused. That wasn't optimal for space
usage; if you deleted some tuples from the non-first pages, the space would
never be reused.
With this patch, the other pages are sometimes chosen too, although there's
still a heavy bias towards low-offset tuples, so that we don't lose cache
locality when doing a lot of inserts with similar keys.
Original idea by Alexander Korotkov, although this patch version was written
by me and copy-edited by Tom Lane.
Previously, the VARIADIC labeling was effectively ignored, but now these
functions act as though the array elements had all been given as separate
arguments.
Pavel Stehule
Since 9.0, the count parameter has only limited the number of tuples
actually returned by the executor. It doesn't affect the behavior of
INSERT/UPDATE/DELETE unless RETURNING is specified, because without
RETURNING, the ModifyTable plan node doesn't return control to execMain.c
for each tuple. And we only check the limit at the top level.
While this behavioral change was unintentional at the time, discussion of
bug #6572 led us to the conclusion that we prefer the new behavior anyway,
and so we should just adjust the docs to match rather than change the code.
Accordingly, do that. Back-patch as far as 9.0 so that the docs match the
code in each branch.
This ensures that mapping of non-ascii prompts
to the correct code page occurs.
Bug report and original patch from Alexander Law,
reviewed and reworked by Noah Misch.
Backpatch to all live branches.
Tuples marked SELECT FOR UPDATE in a cluster that's later processed by
pg_upgrade would have a different infomask bit pattern than those
produced by 9.3dev; that bit pattern was being seen as "dead" by HEAD
(because they would fail the "is this tuple locked" test, and so the
visibility rules would thing they're updated, even though there's no
HEAP_UPDATED version of them). In other words, some rows could silently
disappear after pg_upgrade.
With this new definition, those tuples become visible again.
This is breakage resulting from my commit 0ac5ad5134.
This makes 9.3 -> 9.3 upgrades work when they cross the commit that
added persistent multixacts; early 9.3 pg_controldata did not have the
required oldestMultiXact line, and so would fail to upgrade.
per Bruce Momjian
The machinery around XLOG_HEAP2_CLEANUP_INFO failed
to correctly pass through the necessary information
on latestRemovedXid, avoiding cancellations in some
infrequent concurrent update/cleanup scenarios.
Backpatchable fix to 9.0
Detailed bug report and fix by Noah Misch,
backpatchable version by me.
When pg_upgrade can't find required pg_controldata information, report
_which_ cluster is failing, with this message:
The %s cluster lacks some required control information:
When we eliminated "unnecessary" wakeups of the syslogger process, we
broke size-based logfile rotation on Windows, because on that platform
data transfer is done in a separate thread. While non-Windows platforms
would recheck the output file size after every log message, Windows only
did so when the control thread woke up for some other reason, which might
be quite infrequent. Per bug #7814 from Tsunezumi. Back-patch to 9.2
where the problem was introduced.
Jeff Janes
This patch introduces two additional lock modes for tuples: "SELECT FOR
KEY SHARE" and "SELECT FOR NO KEY UPDATE". These don't block each
other, in contrast with already existing "SELECT FOR SHARE" and "SELECT
FOR UPDATE". UPDATE commands that do not modify the values stored in
the columns that are part of the key of the tuple now grab a SELECT FOR
NO KEY UPDATE lock on the tuple, allowing them to proceed concurrently
with tuple locks of the FOR KEY SHARE variety.
Foreign key triggers now use FOR KEY SHARE instead of FOR SHARE; this
means the concurrency improvement applies to them, which is the whole
point of this patch.
The added tuple lock semantics require some rejiggering of the multixact
module, so that the locking level that each transaction is holding can
be stored alongside its Xid. Also, multixacts now need to persist
across server restarts and crashes, because they can now represent not
only tuple locks, but also tuple updates. This means we need more
careful tracking of lifetime of pg_multixact SLRU files; since they now
persist longer, we require more infrastructure to figure out when they
can be removed. pg_upgrade also needs to be careful to copy
pg_multixact files over from the old server to the new, or at least part
of multixact.c state, depending on the versions of the old and new
servers.
Tuple time qualification rules (HeapTupleSatisfies routines) need to be
careful not to consider tuples with the "is multi" infomask bit set as
being only locked; they might need to look up MultiXact values (i.e.
possibly do pg_multixact I/O) to find out the Xid that updated a tuple,
whereas they previously were assured to only use information readily
available from the tuple header. This is considered acceptable, because
the extra I/O would involve cases that would previously cause some
commands to block waiting for concurrent transactions to finish.
Another important change is the fact that locking tuples that have
previously been updated causes the future versions to be marked as
locked, too; this is essential for correctness of foreign key checks.
This causes additional WAL-logging, also (there was previously a single
WAL record for a locked tuple; now there are as many as updated copies
of the tuple there exist.)
With all this in place, contention related to tuples being checked by
foreign key rules should be much reduced.
As a bonus, the old behavior that a subtransaction grabbing a stronger
tuple lock than the parent (sub)transaction held on a given tuple and
later aborting caused the weaker lock to be lost, has been fixed.
Many new spec files were added for isolation tester framework, to ensure
overall behavior is sane. There's probably room for several more tests.
There were several reviewers of this patch; in particular, Noah Misch
and Andres Freund spent considerable time in it. Original idea for the
patch came from Simon Riggs, after a problem report by Joel Jacobson.
Most code is from me, with contributions from Marti Raudsepp, Alexander
Shulgin, Noah Misch and Andres Freund.
This patch was discussed in several pgsql-hackers threads; the most
important start at the following message-ids:
AANLkTimo9XVcEzfiBR-ut3KVNDkjm2Vxh+t8kAmWjPuv@mail.gmail.com1290721684-sup-3951@alvh.no-ip.org1294953201-sup-2099@alvh.no-ip.org1320343602-sup-2290@alvh.no-ip.org1339690386-sup-8927@alvh.no-ip.org4FE5FF020200002500048A3D@gw.wicourts.gov4FEAB90A0200002500048B7D@gw.wicourts.gov
When a standby server follows the master using WAL archive, and it chooses
a new timeline (recovery_target_timeline='latest'), it only fetches the
timeline history file for the chosen target timeline, not any other history
files that might be missing from pg_xlog. For example, if the current
timeline is 2, and we choose 4 as the new recovery target timeline, the
history file for timeline 3 is not fetched, even if it's part of this
server's history. That's enough for the standby itself - the history file
for timeline 4 includes timeline 3 as well - but if a cascading standby
server wants to recover to timeline 3, it needs the history file. To fix,
when a new recovery target timeline is chosen, try to copy any missing
history files from the archive to pg_xlog between the old and new target
timeline.
A second similar issue was with the WAL files. When a standby recovers from
archive, and it reaches a segment that contains a switch to a new timeline,
recovery fetches only the WAL file labelled with the new timeline's ID. The
file from the new timeline contains a copy of the WAL from the old timeline
up to the point where the switch happened, and recovery recovers it from the
new file. But in streaming replication, walsender only tries to read it
from the old timeline's file. To fix, change walsender to read it from the
new file, so that it behaves the same as recovery in that sense, and doesn't
try to open the possibly nonexistent file with the old timeline's ID.
With AtEOXact applied, --single-transaction makes pg_restore slower, and
has the potential to require lock table configuration, so remove the
argument.
Per suggestion from Tom.