recover from elog(ERROR). Problem was created by introduction of hash seq
search tracking awhile back, and affects all branches that have bgwriter;
in HEAD the disease has snuck into autovacuum and walwriter too. (Not sure
that the latter two use hash_seq_search at the moment, but surely they might
someday.) Per report from Sergey Koposov.
an exclusive lock on the table at this point, which we want to release as soon
as possible. This is called in the phase of lazy vacuum where we truncate the
empty pages at the end of the table.
An alternative solution would be to lower the vacuum delay settings before
starting the truncating phase, but this doesn't work very well in autovacuum
due to the autobalancing code (which can cause other processes to change our
cost delay settings). This case could be considered in the balancing code, but
it is simpler this way.
big misalignement, then it tries to split page basing on distribution
of boxe's centers.
Per report from Dolafi, Tom <dolafit@janelia.hhmi.org>
Backpatch is needed, changes doesn't affect on-disk storage.
the number of rows likely to be produced by a query such as
SELECT * FROM t1 LEFT JOIN t2 USING (key) WHERE t2.key IS NULL;
What this is doing is selecting for t1 rows with no match in t2, and thus
it may produce a significant number of rows even if the t2.key table column
contains no nulls at all. 8.2 thinks the table column's null fraction is
relevant and thus may estimate no rows out, which results in terrible plans
if there are more joins above this one. A proper fix for this will involve
passing much more information about the context of a clause to the selectivity
estimator functions than we ever have. There's no time left to write such a
patch for 8.3, and it wouldn't be back-patchable into 8.2 anyway. Instead,
put in an ad-hoc test to defeat the normal table-stats-based estimation when
an IS NULL test is evaluated at an outer join, and just use a constant
estimate instead --- I went with 0.5 for lack of a better idea. This won't
catch every case but it will catch the typical ways of writing such queries,
and it seems unlikely to make things worse for other queries.
generating the tuples has resjunk output columns. This is not possible for
simple table scans but can happen when evaluating a whole-row Var for a view.
Per example from Patryk Kordylewski. The problem exists back to 8.0 but
I'm not going to risk back-patching further than 8.2 because of the many
changes in this area.
sets for outer joins, in the light of bug #3588 and additional thought and
experimentation. The original methodology was fatally flawed for nests of
more than two outer joins: it got the relationships between adjacent joins
right, but didn't always come to the right conclusions about whether a join
could be interchanged with one two or more levels below it. This was largely
caused by a mistaken idea that we should use the min_lefthand + min_righthand
sets of a sub-join as the minimum left or right input set of an upper join
when we conclude that the sub-join can't commute with the upper one. If
there's a still-lower join that the sub-join *can* commute with, this method
led us to think that that one could commute with the topmost join; which it
can't. Another problem (not directly connected to bug #3588) was that
make_outerjoininfo's processing-order-dependent method for enforcing outer
join identity #3 didn't work right: if we decided that join A could safely
commute with lower join B, we dropped all information about sub-joins under B
that join A could perhaps not safely commute with, because we removed B's
entire min_righthand from A's.
To fix, make an explicit computation of all inner join combinations that occur
below an outer join, and add to that the full syntactic relsets of any lower
outer joins that we determine it can't commute with. This method gives much
more direct enforcement of the outer join rearrangement identities, and it
turns out not to cost a lot of additional bookkeeping.
Thanks to Richard Harris for the bug report and test case.
read from the temp file didn't match the file length reported by ftello(),
the wrong variable's value was printed, and so the message made no sense.
Clean up a couple other coding infelicities while at it.
relcache entry after having heap_close'd it. This could lead to misbehavior
if a relcache flush wiped out the cache entry meanwhile. In 8.2 there is a
very real risk of CREATE INDEX CONCURRENTLY using the wrong relid for locking
and waiting purposes. I think the bug is only cosmetic in 8.0 and 8.1,
because their transgression is limited to using RelationGetRelationName(rel)
in an ereport message immediately after heap_close, and there's no way (except
with special debugging options) for a cache flush to occur in that interval.
Not quite sure that it's cosmetic in 7.4, but seems best to patch anyway.
Found by trying to run the regression tests with CLOBBER_CACHE_ALWAYS enabled.
Maybe we should try to do that on a regular basis --- it's awfully slow,
but perhaps some fast buildfarm machine could do it once in awhile.
padded encryption scheme. Formerly it would try to access res[(unsigned) -1],
which resulted in core dumps on 64-bit machines, and was certainly trouble
waiting to happen on 32-bit machines (though in at least the known case
it was harmless because that byte would be overwritten after return).
Per report from Ken Colson; fix by Marko Kreen.
byte after the last full byte of the bit array, regardless of whether that
byte was part of the valid data or not. Found by buildfarm testing.
Thanks to Stefan Kaltenbrunner for nailing down the cause.
row within one query: we were firing check triggers before all the updates
were done, leading to bogus failures. Fix by making the triggers queued by
an RI update go at the end of the outer query's trigger event list, thereby
effectively making the processing "breadth-first". This was indeed how it
worked pre-8.0, so the bug does not occur in the 7.x branches.
Per report from Pavel Stehule.
hash table is allocated in a child context of the agg node's memory
context, MemoryContextReset() will reset but *not* delete the child
context. Since ExecReScanAgg() proceeds to build a new hash table
from scratch (in a new sub-context), this results in leaking the
header for the previous memory context. Therefore, use
MemoryContextResetAndDeleteChildren() instead.
Credit: My colleague Sailesh Krishnamurthy at Truviso for isolating
the cause of the leak.
on Windows. This is yet another manifestation of the problem that Windows
returns time zone names that may be in a different encoding than we are using.
I've put a better solution in HEAD, but the back branches need a simple patch.
Per report from Hiroshi Saito.
clauses in which one side or the other references both sides of the join
cannot be removed as redundant, because that expression won't have been
constrained below the join. Per report from Sergey Burladyan.
checking whether an IS NULL/IS NOT NULL clause is implied or refuted by
a strict function. Per example from Dawid Kuroczko.
Backpatch to 8.2 since this is arguably a performance bug.
log_min_error_statement is active and there is some problem in logging the
current query string; for example, that it's too long to include in the log
message without running out of memory. This problem has existed since the
log_min_error_statement feature was introduced. No doubt the reason it
wasn't detected long ago is that 8.2 is the first release that defaults
log_min_error_statement to less than PANIC level.
Per report from Bill Moran.
truncated relation was deleted later in the WAL sequence. Since replay
normally auto-creates a relation upon its first reference by a WAL log entry,
failure is seen only if the truncate entry happens to be the first reference
after the checkpoint we're restarting from; which is a pretty unusual case but
of course not impossible. Fix by making truncate entries auto-create like
the other ones do. Per report and test case from Dharmendra Goyal.
when handed an invalidly-encoded pattern. The previous coding could get
into an infinite loop if pg_mb2wchar_with_len() returned a zero-length
string after we'd tested for nonempty pattern; which is exactly what it
will do if the string consists only of an incomplete multibyte character.
This led to either an out-of-memory error or a backend crash depending
on platform. Per report from Wiktor Wodecki.
a MIN or MAX aggregate call into an indexscan: the initplan is being made at
the current query nesting level and so we shouldn't increment query_level.
Though usually harmless, this mistake could lead to bogus "plan should not
reference subplan's variable" failures on complex queries. Per bug report
from David Sanchez i Gregori.
referencing table does not change the tuple's FK column(s), we don't bother
to check the PK table since the constraint was presumably already valid.
However, the check is still necessary if the tuple was inserted by our own
transaction, since in that case the INSERT trigger will conclude it need not
make the check (since its version of the tuple has been deleted). We got this
right for simple cases, but not when the insert and update are in different
subtransactions of the current top-level transaction; in such cases the FK
check would never be made at all. (Hence, problem dates back to 8.0 when
subtransactions were added --- it's actually the subtransaction version of a
bug fixed in 7.3.5.) Fix, and add regression test cases. Report and fix by
Affan Salman.
been broken since forever, but was not noticed because people seldom look
at raw parse trees. AFAIK, no impact on users except that debug_print_parse
might fail; but patch it all the way back anyway. Per report from Jeff Ross.
to prevent possible escalation of privilege. Provide new SECURITY
DEFINER functions with old behavior, but initially REVOKE ALL
from public for these functions. Per list discussion and design
proposed by Tom Lane.
we don't know at that point which relation OID to tell pgstat to forget.
The code was passing the relfilenode, which is incorrect, and could possibly
cause some other relation's stats to be zeroed out. While we could try to
clean this up, it seems much simpler and more reliable to let the next
invocation of pgstat_vacuum_tabstat() fix things; which indeed is how it
worked before I introduced the buggy code into 8.1.3 and later :-(.
Problem noticed by Itagaki Takahiro, fix is per subsequent discussion.
This is a Linux kernel bug that apparently exists in every extant kernel
version: sometimes shmctl() will fail with EIDRM when EINVAL is correct.
We were assuming that EIDRM indicates a possible conflict with pre-existing
backends, and refusing to start the postmaster when this happens. Fortunately,
there does not seem to be any case where Linux can legitimately return EIDRM
(it doesn't track shmem segments in a way that would allow that), so we can
get away with just assuming that EIDRM means EINVAL on this platform.
Per reports from Michael Fuhr and Jon Lapham --- it's a bit surprising
we have not seen more reports, actually.