Don't insist on pg_dumpall and psql being present in the old cluster,
since they are not needed. Do insist on pg_resetxlog being present
(in both old and new), since we need it. Also check for pg_config,
but only in the new cluster. Remove the useless attempt to call
pg_config in the old cluster; we don't need to know the old value of
--pkglibdir. (In the case of a stripped-down migration installation
there might be nothing there to look at anyway, so any future change
that might reintroduce that need would have to be considered carefully.)
Per my attempts to build a minimal previous-version installation to support
pg_upgrade.
The "date" type supports a wider range of dates than int64 timestamps do.
However, there is pre-int64-timestamp code in the planner that assumes that
all date values can be converted to timestamp with impunity. Fortunately,
what we really need out of the conversion is always a double (float8)
value; so even when the date is out of timestamp's range it's possible to
produce a sane answer. All we need is a code path that doesn't try to
force the result into int64. Per trouble report from David Rericha.
Back-patch to all supported versions. Although this is surely a corner
case, there's not much point in advertising a date range wider than
timestamp's if we will choke on such values in unexpected places.
eval_const_expressions() can replace CaseTestExprs with constants when
the surrounding CASE's test expression is a constant. This confuses
ruleutils.c's heuristic for deparsing simple-form CASEs, leading to
Assert failures or "unexpected CASE WHEN clause" errors. I had put in
a hack solution for that years ago (see commit
514ce7a331 of 2006-10-01), but bug #5794
from Peter Speck shows that that solution failed to cover all cases.
Fortunately, there's a much better way, which came to me upon reflecting
that Peter's "CASE TRUE WHEN" seemed pretty redundant: we can "simplify"
the simple-form CASE to the general form of CASE, by simply omitting the
constant test expression from the rebuilt CASE construct. This is
intuitively valid because there is no need for the executor to evaluate
the test expression at runtime; it will never be referenced, because any
CaseTestExprs that would have referenced it are now replaced by constants.
This won't save a whole lot of cycles, since evaluating a Const is pretty
cheap, but a cycle saved is a cycle earned. In any case it beats kluging
ruleutils.c still further. So this patch improves const-simplification
and reverts the previous change in ruleutils.c.
Back-patch to all supported branches. The bug exists in 8.1 too, but it's
out of warranty.
After parsing a parenthesized subexpression, we must pop all pending
ANDs and NOTs off the stack, just like the case for a simple operand.
Per bug #5793.
Also fix clones of this routine in contrib/intarray and contrib/ltree,
where input of types query_int and ltxtquery had the same problem.
Back-patch to all supported versions.
We don't actually need optreset, because we can easily fix the code to
ensure that it's cleanly restartable after having completed a scan over the
argv array; which is the only case we need to restart in. Getting rid of
it avoids a class of interactions with the system libraries and allows
reversion of my change of yesterday in postmaster.c and postgres.c.
Back-patch to 8.4. Before that the getopt code was a bit different anyway.
The mingw people don't appear to care about compatibility with non-GNU
versions of getopt, so force use of our own copy of getopt on Windows.
Also, ensure that we make use of optreset when using our own copy.
Per report from Andrew Dunstan. Back-patch to all versions supported
on Windows.
Fix the same size_alpha versus size_beta typo that was recently fixed
in contrib/cube. Noted by Alexander Korotkov.
Back-patch to all supported branches (there is a more invasive fix in
HEAD).
The original coding in tuplestore_trim() was only meant to work efficiently
in cases where each trim call deleted most of the tuples in the store.
Which, in fact, was the pattern of the original usage with a Material node
supporting mark/restore operations underneath a MergeJoin. However,
WindowAgg now uses tuplestores and it has considerably less friendly
trimming behavior. In particular it can attempt to trim one tuple at a
time off a large tuplestore. tuplestore_trim() had O(N^2) runtime in this
situation because of repeatedly shifting its tuple pointer array. Fix by
avoiding shifting the array until a reasonably large number of tuples have
been deleted. This can waste some pointer space, but we do still reclaim
the tuples themselves, so the percentage wastage should be pretty small.
Per Jie Li's report of slow percent_rank() evaluation. cume_dist() and
ntile() would certainly be affected as well, along with any other window
function that has a moving frame start and requires reading substantially
ahead of the current row.
Back-patch to 8.4, where window functions were introduced. There's no
need to tweak it before that.
Hot Standby conflicts only with tuples that were visible at
some point. So ignore tuples from aborted transactions or for
tuples updated/deleted during the inserting transaction when
generating the conflict transaction ids.
Following detailed analysis and test case by Noah Misch.
Original report covered btree delete records, correctly observed
by Heikki Linnakangas that this applies to other cases also.
Fix covers all sources of cleanup records via common code.
Includes additional fix compared to commit on HEAD
With hundreds of thousands of TOC entries, the repeated searches in
reduce_dependencies() become the dominant cost. Get rid of that searching
by constructing reverse-dependency lists, which we can do in O(N) time
during the fix_dependencies() preprocessing. I chose to store the reverse
dependencies as DumpId arrays for consistency with the forward-dependency
representation, and keep the previously-transient tocsByDumpId[] array
around to locate actual TOC entry structs quickly from dump IDs.
While this fixes the slow case reported by Vlad Arkhipov, there is still
a potential for O(N^2) behavior with sufficiently many tables:
fix_dependencies itself, as well as mark_create_done and
inhibit_data_for_failed_table, are doing repeated searches to deal with
table-to-table-data dependencies. Possibly this work could be extended
to deal with that, although the latter two functions are also used in
non-parallel restore where we currently don't run fix_dependencies.
Another TODO is that we fail to parallelize restore of multiple blobs
at all. This appears to require changes in the archive format to fix.
Back-patch to 9.0 where the problem was reported. 8.4 has potential issues
as well; but since it doesn't create a separate TOC entry for each blob,
it's at much less risk of having enough TOC entries to cause real problems.
Recent versions of the Linux system header files cause xlogdefs.h to
believe that open_datasync should be the default sync method, whereas
formerly fdatasync was the default on Linux. open_datasync is a bad
choice, first because it doesn't actually outperform fdatasync (in fact
the reverse), and second because we try to use O_DIRECT with it, causing
failures on certain filesystems (e.g., ext4 with data=journal option).
This part of the patch is largely per a proposal from Marti Raudsepp.
More extensive changes are likely to follow in HEAD, but this is as much
change as we want to back-patch.
Also clean up confusing code and incorrect documentation surrounding the
fsync_writethrough option. Those changes shouldn't result in any actual
behavioral change, but I chose to back-patch them anyway to keep the
branches looking similar in this area.
In 9.0 and HEAD, also do some copy-editing on the WAL Reliability
documentation section.
Back-patch to all supported branches, since any of them might get used
on modern Linux versions.
an old transaction running in the master, and a lot of transactions have
started and finished since, and a WAL-record is written in the gap between
the creating the running-xacts snapshot and WAL-logging it, recovery will fail
with "too many KnownAssignedXids" error. This bug was reported by
Joachim Wieland on Nov 19th.
In the same scenario, when fewer transactions have started so that all the
xids fit in KnownAssignedXids despite the first bug, a more serious bug
arises. We incorrectly initialize the clog code with the oldest still running
transaction, and when we see the WAL record belonging to a transaction with
an XID larger than one that committed already before the checkpoint we're
recovering from, we zero the clog page containing the already committed
transaction, leading to data loss.
In hindsight, trying to track xids in the known-assigned-xids array before
seeing the running-xacts record was too complicated. To fix that, hold
XidGenLock while the running-xacts snapshot is taken and WAL-logged. That
ensures that no transaction can begin or end in that gap, so that in recvoery
we know that the snapshot contains all transactions running at that point in
WAL.
There are some code paths, such as SPI_execute(), where we invoke
copyObject() on raw parse trees before doing parse analysis on them. Since
the bison grammar is capable of building heavily nested parsetrees while
itself using only minimal stack depth, this means that copyObject() can be
the front-line function that hits stack overflow before anything else does.
Accordingly, it had better have a check_stack_depth() call. I did a bit of
performance testing and found that this slows down copyObject() by only a
few percent, so the hit ought to be negligible in the context of complete
processing of a query.
Per off-list report from Toshihide Katayama. Back-patch to all supported
branches.
There were corner cases in which the planner would attempt to inline such
a function, which would result in a failure at runtime due to loss of
information about exactly what the result record type is. Fix by disabling
inlining when the function's recorded result type is RECORD. There might
be some sub-cases where inlining could still be allowed, but this is a
simple and backpatchable fix, so leave refinements for another day.
Per bug #5777 from Nate Carson.
Back-patch to all supported branches. 8.1 happens to avoid a core-dump
here, but it still does the wrong thing.
removing an infrequently occurring race condition in Hot Standby.
An xid must be assigned before a lock appears in shared memory,
rather than immediately after, else GetRunningTransactionLocks()
may see InvalidTransactionId, causing assertion failures during
lock processing on standby.
Bug report and diagnosis by Fujii Masao, fix by me.
Most of the functions that execute XPath queries leaked the data structures
created by libxml2. This memory would not be recovered until end of
session, so it mounts up pretty quickly in any serious use of the feature.
Per report from Pavel Stehule, though this isn't his patch.
Back-patch to all supported branches.
When using default autovacuum_vac_cost_limit, autovac_balance_cost relied
on VacuumCostLimit to contain the correct global value ... but after the
first time through in a particular worker process, it didn't, because we'd
trashed it in previous iterations. Depending on the state of other autovac
workers, this could result in a steady reduction of the effective
cost_limit setting as a particular worker processed more and more tables,
causing it to go slower and slower. Spotted by Simon Poole (bug #5759).
Fix by saving and restoring the GUC variables in the loop in do_autovacuum.
In passing, improve a few comments.
Back-patch to 8.3 ... the cost rebalancing code has been buggy since it was
put in.
Given a column reference foo.bar, where there is a composite plpgsql
variable foo but it doesn't contain a column bar, the pre-9.0 coding would
immediately throw a "record foo has no field bar" error. In 9.0 the parser
hook instead falls through to let the core parser see if it can resolve the
reference. If not, you get a complaint about "missing FROM-clause entry
for table foo", which while in some sense correct isn't terribly helpful.
Complicate things a bit so that we can throw the old error message if
neither the core parser nor the hook are able to resolve the column
reference, while not changing the behavior in any other case.
Per bug #5757 from Andrey Galkin.
The handle to the shared memory segment containing startup
parameters was sent as 32-bit even on 64-bit systems. Since
HANDLEs appear to be allocated sequentially this shouldn't
be a problem until we reach 2^32 open handles in the postmaster,
but a 64-bit value should be sent across as 64-bit, and not
zero out the top 32 bits.
Noted by Tom Lane.
temporary indexes are not WAL-logged. We used a constant LSN for temporary
indexes, on the assumption that we don't need to worry about concurrent page
splits in temporary indexes because they're only visible to the current
session. But that assumption is wrong, it's possible to insert rows and
split pages in the same session, while a scan is in progress. For example,
by opening a cursor and fetching some rows, and INSERTing new rows before
fetching some more.
Fix by generating fake increasing LSNs, used in place of real LSNs in
temporary GiST indexes.
We must stay in the function's SPI context until done calling the iterator
that returns the set result. Otherwise, any attempt to invoke SPI features
in the python code called by the iterator will malfunction. Diagnosis and
patch by Jan Urbanski, per bug report from Jean-Baptiste Quenot.
Back-patch to 8.2; there was no support for SRFs in previous versions of
plpython.
Similar conflicts were already avoided for related record types.
Massive over-caution resulted in a usability bug. Clear theoretical
basis for doing this is now confirmed by me.
Request to remove from Heikki (twice), over-caution by me.
We must not return any "okay to proceed" result code without having checked
for too many children, else we might fail later on when trying to add the
new child to one of the per-child state arrays. It's not clear whether
this oversight explains Stefan Kaltenbrunner's recent report, but it could
certainly produce a similar symptom.
Back-patch to 8.4; the logic was not broken before that.
This is needed to support debug_print_parse, per report from Jon Nelson.
Cursory testing via the regression tests suggests we aren't missing
anything else.
Once we have found a non-null constant argument, there is no need to
examine additional arguments of the COALESCE. The previous coding got it
right only if the constant was in the first argument position; otherwise
it tried to simplify following arguments too, leading to unexpected
behavior like this:
regression=# select coalesce(f1, 42, 1/0) from int4_tbl;
ERROR: division by zero
It's a minor corner case, but a bug is a bug, so back-patch all the way.
belonging to a user at DROP OWNED BY. Foreign data wrappers and servers
don't do anything useful yet, which is why no-one has noticed, but since we
have them, seems prudent to fix this. Per report from Chetan Suttraway.
Backpatch to 9.0, 8.4 has the same problem but this patch didn't apply
there so I'm not going to bother.
location read from backup label file can be found: wasShutdown was set
incorrectly when a backup label file was found.
Jeff Davis, with a little tweaking by me.
This code was just plain wrong: what you got was not a line through the
given point but a line almost indistinguishable from the Y-axis, although
not truly vertical. The only caller that tries to use this function with
m == DBL_MAX is dist_ps_internal for the case where the lseg is horizontal;
it would end up producing the distance from the given point to the place
where the lseg's line crosses the Y-axis. That function is used by other
operators too, so there are several operators that could compute wrong
distances from a line segment to something else. Per bug #5745 from
jindiax.
Back-patch to all supported branches.