The general design of memory management in Postgres is that intermediate
results computed by an expression are not freed until the end of the tuple
cycle. For expression indexes, ANALYZE has to re-evaluate each expression
for each of its sample rows, and it wasn't bothering to free intermediate
results until the end of processing of that index. This could lead to very
substantial leakage if the intermediate results were large, as in a recent
example from Jakub Ouhrabka. Fix by doing ResetExprContext for each sample
row. This necessitates adding a datumCopy step to ensure that the final
expression value isn't recycled too. Some quick testing suggests that this
change adds at worst about 10% to the time needed to analyze a table with
an expression index; which is annoying, but seems a tolerable price to pay
to avoid unexpected out-of-memory problems.
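In outline, the per-sample-row loop now looks something like this (a sketch
only, written against the current three-argument ExecEvalExpr; the slot
array, result context, and typbyval/typlen variables are illustrative, not
the committed analyze.c code):

    for (rowno = 0; rowno < numrows; rowno++)
    {
        Datum   val;
        bool    isnull;

        /* free the previous row's intermediate results */
        ResetExprContext(econtext);
        econtext->ecxt_scantuple = slots[rowno];

        val = ExecEvalExpr(exprstate, econtext, &isnull);

        if (!isnull)
        {
            /*
             * The result still lives in the per-tuple context that the next
             * ResetExprContext will wipe, so copy it somewhere longer-lived.
             */
            MemoryContext oldcxt = MemoryContextSwitchTo(results_cxt);

            val = datumCopy(val, typbyval, typlen);
            MemoryContextSwitchTo(oldcxt);
        }
        /* feed val/isnull into the statistics computation here */
    }
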
Back-patch to all supported branches.
length stored in the line pointer the same way it's calculated in the normal
heap_insert() codepath. As noted by Jeff Davis, the length stored by
raw_heap_insert() included padding but the one stored by the normal codepath
did not. While the mismatch seems to be harmless, inconsistency isn't good,
and the normal codepath has received a lot more testing over the years.
Backpatch to 8.3 where the heap rewrite code was introduced.
The original coding in FileClose() reset the file-is-temp flag before
unlinking the file, so that if control came back through due to an error,
it wouldn't try to unlink the file twice. This was correct when written,
but when the log_temp_files feature was added, the logging action was put
in between those two steps. An error occurring during the logging action
--- such as a query cancel --- would result in the unlink not getting done
at all, as in a recent report from Michael Glaesemann.
To fix this, make sure that we do both the stat and the unlink before doing
anything that could conceivably CHECK_FOR_INTERRUPTS. There is a judgment
call here, namely which log message to emit first: if you can see only
one, which should it be? I chose to log unlink failure at the risk of
losing the log_temp_files log message --- after all, if the unlink does
fail, the temp file is still there for you to see.
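The safe ordering, schematically (a loose sketch of fd.c, not the committed
code):

    if (vfdP->fdstate & FD_TEMPORARY)
    {
        struct stat filestats;
        int         stat_errno = 0;

        /* stat and unlink first: nothing here does CHECK_FOR_INTERRUPTS */
        if (stat(vfdP->fileName, &filestats) != 0)
            stat_errno = errno;

        if (unlink(vfdP->fileName) != 0)
            elog(LOG, "could not unlink file \"%s\": %m", vfdP->fileName);
        else if (stat_errno == 0 && log_temp_files >= 0)
        {
            /* ereport can service a pending query cancel, so it comes last */
            ereport(LOG,
                    (errmsg("temporary file: path \"%s\", size %lu",
                            vfdP->fileName,
                            (unsigned long) filestats.st_size)));
        }
    }
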
Back-patch to all versions that have log_temp_files. The code was OK
before that.
get_database_list was uselessly allocating its output data, along with
some data created along the way, in a permanent memory context. This didn't
matter when autovacuum was a single, short-lived process, but now that
the launcher is permanent, it shows up as a permanent leak.
To fix, make get_database_list allocate its output data in the caller's
context, which is in charge of freeing it when appropriate; the memory
leaked by heap_beginscan et al is now allocated in a throwaway
transaction context.
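In outline, the fix looks like this (a sketch that collects just the
database OIDs for brevity; the real autovacuum.c code copies more fields
into the result list):

    List *
    get_database_list(void)
    {
        List         *dblist = NIL;
        MemoryContext resultcxt = CurrentMemoryContext;  /* caller's context */
        Relation      rel;
        HeapScanDesc  scan;
        HeapTuple     tup;

        /* scan machinery allocates in the short-lived transaction context */
        StartTransactionCommand();

        rel = heap_open(DatabaseRelationId, AccessShareLock);
        scan = heap_beginscan(rel, SnapshotNow, 0, NULL);

        while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
        {
            /* switch so the returned data lives in the caller's context */
            MemoryContext oldcxt = MemoryContextSwitchTo(resultcxt);

            dblist = lappend_oid(dblist, HeapTupleGetOid(tup));
            MemoryContextSwitchTo(oldcxt);
        }

        heap_endscan(scan);
        heap_close(rel, AccessShareLock);

        /* committing frees whatever heap_beginscan et al leaked */
        CommitTransactionCommand();

        return dblist;
    }
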
Per recent investigation, the register stack can grow faster than the
regular stack depending on compiler and choice of options. To avoid
crashes we must check both stacks in check_stack_depth().
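Schematically, the check becomes (a sketch; ia64_get_bsp() stands in for
however the register stack pointer is read on that platform):

    void
    check_stack_depth(void)
    {
        char    stack_top_loc;
        long    stack_depth;

        /* the usual check against the ordinary call stack */
        stack_depth = (long) (stack_base_ptr - &stack_top_loc);
        if (stack_depth < 0)
            stack_depth = -stack_depth;
        if (stack_depth > max_stack_depth_bytes && stack_base_ptr != NULL)
            ereport(ERROR,
                    (errcode(ERRCODE_STATEMENT_TOO_COMPLEX),
                     errmsg("stack depth limit exceeded")));

    #if defined(__ia64__) || defined(__ia64)
        /* the register backing store grows upward from its own base */
        stack_depth = (long) (ia64_get_bsp() - register_stack_base_ptr);
        if (stack_depth > max_stack_depth_bytes &&
            register_stack_base_ptr != NULL)
            ereport(ERROR,
                    (errcode(ERRCODE_STATEMENT_TOO_COMPLEX),
                     errmsg("stack depth limit exceeded")));
    #endif
    }
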
Back-patch to all supported versions.
Some buildfarm members fail the test with the original depth of 10 levels,
apparently because they are running at the minimum max_stack_depth setting
of 100kB and using ~ 10k per recursion level. While it might be
interesting to try to figure out why they're eating so much stack, it isn't
likely that any fix for that would be back-patchable. So just change the
test to recurse only 5 levels. The extra levels don't prove anything
correctness-wise anyway.
It was reporting that these were fully indexed (hence cheap), when of
course they're the exact opposite of that. I'm not certain if the case
would arise in practice, since a clauseless semijoin is hard to produce
in SQL, but if it did happen we'd make some dumb decisions.
We failed to record any dependency on the underlying table for an index
declared like "create index i on t (foo(t.*))". This would create trouble
if the table were dropped without previously dropping the index. To fix,
simplify some overly-cute code in index_create(), accepting the possibility
that sometimes the whole-table dependency will be redundant. Also document
this hazard in dependency.c. Per report from Kevin Grittner.
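The added dependency is of this general shape (illustrative only;
indexRelationId and heapRelationId stand for the index's and table's OIDs):

    ObjectAddress myself,
                  referenced;

    myself.classId = RelationRelationId;
    myself.objectId = indexRelationId;      /* the new index */
    myself.objectSubId = 0;

    /* whole-table dependency: objectSubId 0 refers to the table as a whole */
    referenced.classId = RelationRelationId;
    referenced.objectId = heapRelationId;   /* the indexed table */
    referenced.objectSubId = 0;

    recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
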
In passing, prevent a core dump in pg_get_indexdef() if the index's table
can't be found. I came across this while experimenting with Kevin's
example. Not sure it's a real issue when the catalogs aren't corrupt, but
might as well be cautious.
Back-patch to all supported versions.
rather than 0/0, so that we can safely use 0/0 as an invalid value. This is a
more future-proof fix for the corner-case bug in streaming replication that
was fixed yesterday. We had a similar corner-case bug with log/seg 0/0 back in
February as well. Avoiding 0/0 as a valid value should prevent bugs like that
in the future. Per Tom Lane's idea.
Back-patch to 9.0. Since this only affects bootstrapping, it makes no
difference to existing installations. We don't need to worry about the
bug in existing installations, because if you've managed to get past the
initial base backup already, you won't hit the bug in the future either.
streaming replication. We used log/seg 0/0 to indicate that no WAL segments
have been removed since startup, but 0/0 is a valid value for the very first
WAL segment after initdb. To make that unambiguous, store
(latest removed WAL segment + 1) in the global variable.
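The bookkeeping amounts to something like this (a sketch using the old
log/seg segment representation; the XLogCtl field names are illustrative):

    /* remember the latest removed WAL segment, stored as (removed + 1) */
    static void
    UpdateLastRemovedPtr(uint32 log, uint32 seg)
    {
        /* advance to the following segment so 0/0 can mean "none removed" */
        NextLogSeg(log, seg);

        SpinLockAcquire(&XLogCtl->info_lck);
        if (log > XLogCtl->lastRemovedLog ||
            (log == XLogCtl->lastRemovedLog && seg > XLogCtl->lastRemovedSeg))
        {
            XLogCtl->lastRemovedLog = log;
            XLogCtl->lastRemovedSeg = seg;
        }
        SpinLockRelease(&XLogCtl->info_lck);
    }
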
Per report from Matt Chesler, also reproduced by Greg Smith.
In general, expression execution state trees aren't re-entrantly usable,
since functions can store private state information in them.
For efficiency reasons, plpgsql tries to cache and reuse state trees for
"simple" expressions. It can get away with that most of the time, but it
can fail if the state tree is dirty from a previous failed execution (as
in an example from Alvaro) or is being used recursively (as noted by me).
Fix by tracking whether a state tree is in use, and falling back to the
"non-simple" code path if so. This results in a pretty considerable speed
hit when the non-simple path is taken, but the available alternatives seem
even more unpleasant because they add overhead in the simple path. Per
idea from Heikki.
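Conceptually the guard is just this (a sketch; the field names follow
plpgsql's style but this is not the exact committed code):

    /* take the fast path only if the cached state tree is free and clean */
    if (expr->expr_simple_expr != NULL &&
        !(expr->expr_simple_in_use &&
          expr->expr_simple_lxid == MyProc->lxid))
    {
        expr->expr_simple_in_use = true;
        expr->expr_simple_lxid = MyProc->lxid;

        /* evaluate via the cached ExprState (fast path) ... */

        expr->expr_simple_in_use = false;   /* only reached on success */
    }
    else
    {
        /*
         * Recursion, or a tree left dirty by a failed execution:
         * fall back to full SPI execution of the expression.
         */
    }
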
Back-patch to all supported branches.
Original patch failed to include new exclusive states in a switch that
needed to include them; and also was guilty of very fuzzy thinking
about how to handle error cases. Per bug #5729 from Alan Choi.
WAL-logged. Make the notice about the lack of WAL-logging more visible by
making it a <caution>. Also remove the false statement from the hot standby
caveats section that hash indexes are not used during hot standby.
that WAL file containing the checkpoint redo-location can be found. This
avoids making the cluster irrecoverable if the redo location is in an earlier
WAL file than the checkpoint record.
Report, analysis and patch by Jeff Davis, with small changes by me.
This avoids a possible crash when inlining a SRF whose argument list
contains a reference to an inline-able user function. The crash is quite
reproducible with CLOBBER_FREED_MEMORY enabled, but would be less certain
in a production build. Problem introduced in 9.0 by the named-arguments
patch, which requires invoking eval_const_expressions() before we can try
to inline a SRF. Per report from Brendan Jurd.
as a variable or column name, and it's not reserved in recent versions of
the SQL spec either. This became particularly annoying in 9.0: before that,
PL/pgSQL replaced variable names in queries with parameter markers, so
it was possible to use OFF and many other backend parser keywords as
variable names. Because of that, backpatch to 9.0.
outside a transaction.
This repairs brain fade in my patch of 2009-08-30: the reason we had been
storing oldest-database name, not OID, in ShmemVariableCache was of course
to avoid having to do a catalog lookup at times when it might be unsafe.
This error explains why Aleksandr Dushein is having trouble getting out of
an XID wraparound state in bug #5718, though not how he got into that state
in the first place. I suspect pg_upgrade is at fault there.
The trick is to not try to build executables directly from .c files,
but to always build the intermediate .o files. For obscure reasons,
Darwin's version of gcc will leave debug cruft behind in the first
case but not the second. Per complaint from Robert Haas.
A couple of places in the planner need to generate whole-row Vars, and were
cutting corners by setting vartype = RECORDOID in the Vars, even in cases
where there's an identifiable named composite type for the RTE being
referenced. While we mostly got away with this, it failed when there was
also a parser-generated whole-row reference to the same RTE, because the
two Vars weren't equal() due to the difference in vartype. Fix by
providing a subroutine the planner can call to generate whole-row Vars
the same way the parser does.
Per bug #5716 from Andrew Tipton. Back-patch to 9.0 where one of the bogus
calls was introduced (the other one is new in HEAD).
Look only at the non-localized part of the output from "vcbuild /?",
which is used to determine the version of Visual Studio in use. Different
languages seem to localize different amounts of the string, but we assume
the part "Microsoft Visual C++" won't be modified.
Corrupt RADIUS responses were treated as errors and not ignored
(which RFC 2865 states they should be). This meant that a
user with unfiltered access to the network of the PostgreSQL
or RADIUS server could send a spoofed RADIUS response
to the PostgreSQL server causing it to reject a valid login,
provided the attacker could also guess (or brute-force) the
correct port number.
Fix is to simply retry the receive in a loop until the timeout
has expired or a valid (signed by the correct RADIUS server)
packet arrives.
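In outline (a sketch, not the committed auth.c code; remaining_time() and
radius_response_is_valid() are stand-in helpers):

    /* keep listening until a correctly signed response arrives or we time out */
    gettimeofday(&endtime, NULL);
    endtime.tv_sec += RADIUS_TIMEOUT;

    for (;;)
    {
        struct timeval timeout;
        fd_set         fds;
        int            r;

        if (remaining_time(&endtime, &timeout) <= 0)
            return STATUS_ERROR;            /* timed out */

        FD_ZERO(&fds);
        FD_SET(sock, &fds);
        r = select(sock + 1, &fds, NULL, NULL, &timeout);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            return STATUS_ERROR;            /* select error or timeout */

        if (recv(sock, packet, sizeof(packet), 0) < 0)
            continue;                       /* transient receive failure */

        /* ignore anything not properly signed by our RADIUS server */
        if (!radius_response_is_valid(packet, expected_id, secret))
            continue;

        return STATUS_OK;
    }
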
Reported by Alan DeKok in bug #5687.
The GRANT reference page failed to mention that the USAGE privilege
allows modifying associated user mappings, although this was already
documented on the CREATE/ALTER/DROP USER MAPPING pages.
The original coding was quite sloppy about handling the case where
XLogReadBuffer fails (because the page has since been deleted). This
would result in either "bad buffer id: 0" or an Assert failure during
replay, if indeed the page were no longer there. In a couple of places
it also neglected to check whether the change had already been applied,
which would probably result in corrupted index contents. I believe that
bug #5703 is an instance of the first problem. These issues could show up
without replication, but only if you were unfortunate enough to crash
between modification of a GIN index and the next checkpoint.
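The defensive pattern for each redo routine looks roughly like this (a
sketch; the real GIN redo functions differ in detail):

    buffer = XLogReadBuffer(data->node, data->blkno, false);
    if (!BufferIsValid(buffer))
        return;                     /* page was deleted later; nothing to do */

    page = (Page) BufferGetPage(buffer);

    if (XLByteLE(lsn, PageGetLSN(page)))
    {
        /* change already applied to this page */
        UnlockReleaseBuffer(buffer);
        return;
    }

    /* apply the logged change to the page here */

    PageSetLSN(page, lsn);
    MarkBufferDirty(buffer);
    UnlockReleaseBuffer(buffer);
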
Back-patch to 8.2, which is as far back as GIN has WAL support.
In particular, we are now more explicit about the fact that you may need
wal_sync_method=fsync_writethrough for crash-safety on some platforms,
including Mac OS X. There's also now an explicit caution against assuming
that the default setting of wal_sync_method is either crash-safe or best
for performance.
In versions 8.2 and up, the grammar allows attaching ORDER BY, LIMIT,
FOR UPDATE, or WITH to VALUES, and hence to INSERT ... VALUES. But the
special-case code for VALUES in transformInsertStmt() wasn't expecting any
of those, and just ignored them, leading to unexpected results. Rather
than complicate the special-case path, just ensure that the presence of any
of those clauses makes us treat the query as if it had a general SELECT.
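The test amounts to roughly this (a sketch of the parse-analysis check;
the field names are those of SelectStmt):

    /* use the general-SELECT path unless this is a bare VALUES list */
    isGeneralSelect = (selectStmt &&
                       (selectStmt->valuesLists == NIL ||
                        selectStmt->sortClause != NIL ||
                        selectStmt->limitOffset != NULL ||
                        selectStmt->limitCount != NULL ||
                        selectStmt->lockingClause != NIL ||
                        selectStmt->withClause != NULL));
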
Per report from Hitoshi Harada.