There was a race condition where the receiving pipe could be closed by the
child thread if the main thread was pre-empted before it got a chance to
create a new one, and the dispatch thread ran to completion during that time.
One symptom of this is that rows in pg_listener could be dropped under
heavy load.
Analysis and original patch by Radu Ilie, with some small
modifications by Magnus Hagander.
Since all current and foreseeable future command tags will be pure ASCII,
there is no need to do conversion on them. This saves a few cycles and also
avoids polluting otherwise-pristine subtransaction memory contexts, which
is the cause of the backend memory leak exhibited in bug #5302. (Someday
we'll probably want to have a better method of determining whether
subtransaction contexts need to be kept around, but today is not that day.)
Backpatch to 8.0. The cycle-shaving aspect of this would work in 7.4
too, but without subtransactions the memory-leak aspect doesn't apply,
so it doesn't seem worth touching 7.4.
You might think this is unnecessary since that interpreter is never used
to run code --- but it turns out that's wrong. As of Tcl 8.5, the "clock"
command (alone among builtin Tcl commands) is partially implemented by
loaded-on-demand Tcl code, which means that it fails if there's not
unknown-command support, and also that it's impossible to run it directly
in a safe interpreter. The way they get around the latter is that
Tcl_CreateSlave() automatically sets up an alias command that forwards any
execution of "clock" in a safe slave interpreter to its parent interpreter.
Thus, when attempting to execute "clock" in trusted pltcl, the command
actually executes in the "hold" interpreter, where it will fail if
unknown-command support hasn't been introduced by sourcing the standard
init.tcl script, which is done by Tcl_Init(). (This is a pretty dubious
design decision on the Tcl boys' part, if you ask me ... but they didn't.)
Back-patch all the way. It's not clear that anyone would try to use ancient
versions of pltcl with a recent Tcl, but it's not clear they wouldn't, either.
Also add a regression test using "clock", in branches that have regression
test support for pltcl.
Per recent trouble report from Kyle Bateman.
AbortTransaction or AbortSubTransaction, when trying to clean up after an
error that prevented (sub)transaction start from completing:
* access to TopTransactionResourceOwner that might not exist
* assert failure in AtEOXact_GUC, if AtStart_GUC not called yet
* assert failure or core dump in AfterTriggerEndSubXact, if
AfterTriggerBeginSubXact not called yet
Per testing by injecting elog(ERROR) at successive steps in StartTransaction
and StartSubTransaction. It's not clear whether all of these cases could
really occur in the field, but at least one of them is easily exposed by
simple stress testing, as per my accidental discovery yesterday.
the various disk-size-reporting functions will respond to query cancel
reasonably promptly even in very large databases. Per report from
Kevin Grittner.
after it's released its reference count for the cached plan. There are
code paths that might try to examine the plan list before noticing that
the portal is already in aborted state. Report and diagnosis by Tatsuo
Ishii, though this isn't exactly his proposed patch.
underlying catalog not only the index itself. Otherwise, if the cache
load process touches the catalog (which will happen for many though not
all of these indexes), we are locking index before parent table, which can
result in a deadlock against processes that are trying to lock them in the
normal order. Per today's failure on buildfarm member gothic_moth; it's
surprising the problem hadn't been identified before.
Back-patch to 8.2. Earlier releases didn't have the issue because they
didn't try to lock these indexes during load (instead assuming that they
couldn't change schema at all during multiuser operation).
occurring during a reload, such as query-cancel. Instead of zeroing out
an existing relcache entry and rebuilding it in place, build a new relcache
entry, then swap its contents with the old one, then free the new entry.
This avoids problems with code believing that a previously obtained pointer
to a cache entry must still reference a valid entry, as seen in recent
failures on buildfarm member jaguar. (jaguar is using CLOBBER_CACHE_ALWAYS
which raises the probability of failure substantially, but the problem
could occur in the field without that.) The previous design was okay
when it was made, but subtransactions and the ResourceOwner mechanism
make it unsafe now.
Also, make more use of the already existing rd_isvalid flag, so that we
remember that the entry requires rebuilding even if the first attempt fails.
Back-patch as far as 8.2. Prior versions have enough issues around relcache
reload anyway (due to inadequate locking) that fixing this one doesn't seem
worthwhile.
of the string". The previous coding treated only -1 that way, and would
produce an invalid result value for other negative values.
We ought to fix it so that 2-parameter bit substring() is a different C
function and the 3-parameter form throws error for negative length, but
that takes a pg_proc change which is impractical in the back branches;
and in any case somebody might be relying on -1 working this way.
So just do this as a back-patchable fix.
at least in some Windows versions, these functions are capable of returning
a failure indication without setting errno. That puts us into an infinite
loop if the previous value happened to be EINTR. Per report from Brendan
Hill.
Back-patch to 8.2. We could take it further back, but since this is only
known to be an issue on Windows and we don't support Windows before 8.2,
it does not seem worth the trouble.
PL/pgSQL function within an exception handler. Make sure we use the right
resource owner when we create the tuplestore to hold returned tuples.
Simplify tuplestore API so that the caller doesn't need to be in the right
memory context when calling tuplestore_put* functions. tuplestore.c
automatically switches to the memory context used when the tuplestore was
created. Tuplesort was already modified like this earlier. This patch also
removes the now useless MemoryContextSwitch calls from callers.
Report by Aleksei on pgsql-bugs on Dec 22 2009. Backpatch to 8.1, like
the previous patch that broke this.
index page split. This would result in index corruption, or even more likely
an error during WAL replay, if we were unlucky enough to crash during
end-of-recovery cleanup after having completed an incomplete GIST insertion.
Yoichi Hirai
correctly when the output bit width is wider than the given integer by
something other than a multiple of 8 bits.
This has been wrong since I first wrote that code for 8.0 :-(. Kudos to
Roman Kononov for being the first to notice, though I didn't use his
patch. Per bug #5237.
an allegedly immutable index function. It was previously recognized that
we had to prevent such a function from executing SET/RESET ROLE/SESSION
AUTHORIZATION, or it could trivially obtain the privileges of the session
user. However, since there is in general no privilege checking for changes
of session-local state, it is also possible for such a function to change
settings in a way that might subvert later operations in the same session.
Examples include changing search_path to cause an unexpected function to
be called, or replacing an existing prepared statement with another one
that will execute a function of the attacker's choosing.
The present patch secures VACUUM, ANALYZE, and CREATE INDEX/REINDEX against
these threats, which are the same places previously deemed to need protection
against the SET ROLE issue. GUC changes are still allowed, since there are
many useful cases for that, but we prevent security problems by forcing a
rollback of any GUC change after completing the operation. Other cases are
handled by throwing an error if any change is attempted; these include temp
table creation, closing a cursor, and creating or deleting a prepared
statement. (In 7.4, the infrastructure to roll back GUC changes doesn't
exist, so we settle for rejecting changes of "search_path" in these contexts.)
Original report and patch by Gurjeet Singh, additional analysis by
Tom Lane.
Security: CVE-2009-4136
attacks where an attacker would put <attack>\0<propername> in the field and
trick the validation code that the certificate was for <attack>.
This is a very low risk attack since it reuqires the attacker to trick the
CA into issuing a certificate with an incorrect field, and the common
PostgreSQL deployments are with private CAs, and not external ones. Also,
default mode in 8.4 does not do any name validation, and is thus also not
vulnerable - but the higher security modes are.
Backpatch all the way. Even though versions 8.3.x and before didn't have
certificate name validation support, they still exposed this field for
the user to perform the validation in the application code, and there
is no way to detect this problem through that API.
Security: CVE-2009-4034
in a subtransaction stays open even if the subtransaction is aborted, so
any temporary files related to it must stay alive as well. With the patch,
we use ResourceOwners to track open temporary files and don't automatically
close them at subtransaction end (though in the normal case temporary files
are registered with the subtransaction resource owner and will therefore be
closed).
At end of top transaction, we still check that there's no temporary files
marked as close-at-end-of-transaction open, but that's now just a debugging
cross-check as the resource owner cleanup should've closed them already.
This avoids a useless connection retry and complaint in the postmaster log
when receiving a connection from 8.5 or later libpq.
Backpatch in all supported branches, but of course *not* HEAD.
we have to tell Perl it can release its compiled copy of the function
text. Noted by Alexey Klyukin.
Back-patch to 8.2 --- the problem exists further back, but this patch
won't work without modification, and it's probably not worth the trouble.
be part of multixacts, so allocate a slot for each prepared transaction in
the "oldest member" array in multixact.c. On PREPARE TRANSACTION, transfer
the oldest member value from the current backends slot to the prepared xact
slot. Also save and recover the value from the 2pc state file.
The symptom of the bug was that after a transaction prepared, a shared lock
still held by the prepared transaction was sometimes ignored by other
transactions.
Fix back to 8.1, where both 2PC and multixact were introduced.
output filename if CSV logging was enabled and only one of the two possible
output files got rotated during a particular call (which would, in fact,
typically be the case during a size-based rotation). This would amount to
about MAXPGPATH (1KB) per rotation, and it's been there since the CSV
code was put in, so it's surprising that nobody noticed it before.
Per bug #5196 from Thomas Poindessous.
mainloop.c. This ensures that postgres_fe.h is read before including
any system headers, which is necessary to avoid problems on some platforms
where we make nondefault selections of feature macros for stdio.h or
other headers. We have had this policy for flex modules in the backend
for many years, but for some reason it was not applied to psql.
Per trouble report from Alexandra Roy and diagnosis by Albe Laurenz.
In VACUUM FULL, an interrupt after the initial transaction has been recorded
as committed can cause postmaster to restart with the following error message:
PANIC: cannot abort transaction NNNN, it was already committed
This problem has been reported many times.
In lazy VACUUM, an interrupt after the table has been truncated by
lazy_truncate_heap causes other backends' relcache to still point to the
removed pages; this can cause future INSERT and UPDATE queries to error out
with the following error message:
could not read block XX of relation 1663/NNN/MMMM: read only 0 of 8192 bytes
The window to this race condition is extremely narrow, but it has been seen in
the wild involving a cancelled autovacuum process.
The solution for both problems is to inhibit interrupts in both operations
until after the respective transactions have been committed. It's not a
complete solution, because the transaction could theoretically be aborted by
some other error, but at least fixes the most common causes of both problems.
Windows doesn't do signal processing like other platforms do. It never
really worked, but recent changes to the signal handling made it crash.
This fixes bug #4961. Patch by Fujii Masao.
In PLy_output(), when the elog() call in the TRY branch throws an exception
(this can happen when a statement timeout kicks in, for example), the
PyErr_SetString() call in the CATCH branch can cause a segfault, because the
Py_XDECREF(so) call before it releases memory that is still used by the sv
variable that PyErr_SetString() uses as argument, because sv points into
memory owned by so.
Backpatched back to 8.0, where this code was introduced.
I also threw in a couple of volatile declarations for variables that are used
before and after the TRY. I don't think they caused the crash that I
observed, but they could become issues.
The original coding ensured nbuckets and nbatch didn't exceed INT_MAX,
which while not insane on its own terms did nothing to protect subsequent
code like "palloc(nbatch * sizeof(BufFile *))". Since enormous join size
estimates might well be planner error rather than reality, it seems best
to constrain the initial sizes to be not more than work_mem/sizeof(pointer),
thus ensuring the allocated arrays don't exceed work_mem. We will allow
nbatch to get bigger than that during subsequent ExecHashIncreaseNumBatches
calls, but we should still guard against integer overflow in those palloc
requests. Per bug #5145 from Bernt Marius Johnsen.
Although the given test case only seems to fail back to 8.2, previous
releases have variants of this issue, so patch all supported branches.
that it's called within an AfterTriggerBeginQuery/AfterTriggerEndQuery pair.
The RI cascade triggers suppress that overhead on the assumption that they
are always run non-deferred, so it's possible to violate the condition if
someone mistakenly changes pg_trigger to mark such a trigger deferred.
We don't really care about supporting that, but throwing an error instead
of crashing seems desirable. Per report from Marcelo Costa.
pam_message array contains exactly one PAM_PROMPT_ECHO_OFF message.
Instead, deal with however many messages there are, and don't throw error
for PAM_ERROR_MSG and PAM_TEXT_INFO messages. This logic is borrowed from
openssh 5.2p1, which hopefully has seen more real-world PAM usage than we
have. Per bug #5121 from Ryan Douglas, which turned out to be caused by
the conv_proc being called with zero messages. Apparently that is normal
behavior given the combination of Linux pam_krb5 with MS Active Directory
as the domain controller.
Patch all the way back, since this code has been essentially untouched
since 7.4. (Surprising we've not heard complaints before.)
and SSPI athentication methods. While the old 2000 byte limit was more than
enough for Unix Kerberos implementations, tickets issued by Windows Domain
Controllers can be much larger.
Ian Turner
in CREATE OR REPLACE FUNCTION. The original code would update pg_shdepend
as if a new function was being created, even if it wasn't, with two bad
consequences: pg_shdepend might record the wrong owner for the function,
and any dependencies for roles mentioned in the function's ACL would be lost.
The fix is very easy: just don't touch pg_shdepend at all when doing a
function replacement.
Also update the CREATE FUNCTION reference page, which never explained
exactly what changes and doesn't change in a function replacement.
In passing, fix the CREATE VIEW reference page similarly; there's no
code bug there, but the docs didn't say what happens.
The original coding correctly noted that these aren't just redundancies
(they're effectively X IS NOT NULL, assuming = is strict). However, they
got treated that way if X happened to be in a single-member EquivalenceClass
already, which could happen if there was an ORDER BY X clause, for instance.
The simplest and most reliable solution seems to be to not try to process
such clauses through the EquivalenceClass machinery; just throw them back
for traditional processing. The amount of work that'd be needed to be
smarter than that seems out of proportion to the benefit.
Per bug #5084 from Bernt Marius Johnsen, and analysis by Andrew Gierth.
possibility of shared-inval messages causing a relcache flush while it tries
to fill in missing data in preloaded relcache entries. There are actually
two distinct failure modes here:
1. The flush could delete the next-to-be-processed cache entry, causing
the subsequent hash_seq_search calls to go off into the weeds. This is
the problem reported by Michael Brown, and I believe it also accounts
for bug #5074. The simplest fix is to restart the hashtable scan after
we've read any new data from the catalogs. It appears that pre-8.4
branches have not suffered from this failure, because by chance there were
no other catalogs sharing the same hash chains with the catalogs that
RelationCacheInitializePhase2 had work to do for. However that's obviously
pretty fragile, and it seems possible that derivative versions with
additional system catalogs might be vulnerable, so I'm back-patching this
part of the fix anyway.
2. The flush could delete the *current* cache entry, in which case the
pointer to the newly-loaded data would end up being stored into an
already-deleted Relation struct. As long as it was still deleted, the only
consequence would be some leaked space in CacheMemoryContext. But it seems
possible that the Relation struct could already have been recycled, in
which case this represents a hard-to-reproduce clobber of cached data
structures, with unforeseeable consequences. The fix here is to pin the
entry while we work on it.
In passing, also change RelationCacheInitializePhase2 to Assert that
formrdesc() set up the relation's cached TupleDesc (rd_att) with the
correct type OID and hasoids values. This is more appropriate than
silently updating the values, because the original tupdesc might already
have been copied into the catcache. However this part of the patch is
not in HEAD because it fails due to some questionable recent changes in
formrdesc :-(. That will be cleaned up in a subsequent patch.
of checkpoint. Although the checkpoint has been written to WAL at that point
already, so that all data is safe, and we'll retry removing the WAL segment at
the next checkpoint, if such a failure persists we won't be able to remove any
other old WAL segments either and will eventually run out of disk space. It's
better to treat the failure as non-fatal, and move on to clean any other WAL
segment and continue with any other end-of-checkpoint cleanup.
We don't normally expect any such failures, but on Windows it can happen with
some anti-virus or backup software that lock files without FILE_SHARE_DELETE
flag.
Also, the loop in pgrename() to retry when the file is locked was broken. If a
file is locked on Windows, you get ERROR_SHARE_VIOLATION, not
ERROR_ACCESS_DENIED, at least on modern versions. Fix that, although I left
the check for ERROR_ACCESS_DENIED in there as well (presumably it was correct
in some environment), and added ERROR_LOCK_VIOLATION to be consistent with
similar checks in pgwin32_open(). Reduce the timeout on the loop from 30s to
10s, on the grounds that since it's been broken, we've effectively had a
timeout of 0s and no-one has complained, so a smaller timeout is actually
closer to the old behavior. A longer timeout would mean that if recycling a
WAL file fails because it's locked for some reason, InstallXLogFileSegment()
will hold ControlFileLock for longer, potentially blocking other backends, so
a long timeout isn't totally harmless.
While we're at it, set errno correctly in pgrename().
Backpatch to 8.2, which is the oldest version supported on Windows. The xlog.c
changes would make sense on other platforms and thus on older versions as
well, but since there's no such locking issues on other platforms, it's not
worth it.
file handle on it, the file goes into "pending deletion" state where it
still shows up in directory listing, but isn't accessible otherwise. That
confuses RemoveOldXLogFiles(), making it think that the file hasn't been
archived yet, while it actually was, and it was deleted along with the .done
file.
Fix that by renaming the file with ".deleted" extension before deleting it.
Also check the return value of rename() and unlink(), so that if the removal
fails for any reason (e.g another process is holding the file locked), we
don't delete the .done file until the WAL file is really gone.
Backpatch to 8.2, which is the oldest version supported on Windows.
It seems the flex developers have decided to change yyleng from int to size_t.
This has already happened in the latest release of OS X, and will start
happening elsewhere once the next release of flex appears. Rather than trying
to divine how it's declared in any particular build, let's just remove the one
existing not-very-necessary external usage.
Back-patch to all supported branches; not so much because users in the field
are likely to care about building old branches with cutting-edge flex, as
to keep OSX-based buildfarm members from having problems with old branches.
to the Default timezone abbreviation set.
Back-port the the current file set to all branches that contain tznames.
This includes adding SGT to the Default set in pre-8.4 releases.
Joachim Wieland