I made multiple errors in commit 97532f7c29,
stemming mostly from failure to think about the available frequency data
as being element frequencies not value frequencies (so that occurrences of
different elements are not mutually exclusive). This led to sillinesses
such as estimating that "word" would match more rows than "word:*".
The choice to clamp to a minimum estimate of DEFAULT_TS_MATCH_SEL also
seems pretty ill-considered in hindsight, as it would frequently result in
an estimate much larger than the available data suggests. We do need some
sort of clamp, since a pattern not matching any of the MCELEMs probably
still needs a selectivity estimate of more than zero. I chose instead to
clamp to at least what a non-MCELEM word would be estimated as, preserving
the property that "word:*" doesn't get an estimate less than plain "word",
whether or not the word appears in MCELEM.
Per investigation of a gripe from Bill Martin, though I suspect that his
example case actually isn't even reaching the erroneous code.
Back-patch to 9.1 where this code was introduced.
validate_plperl_function() supposed that it could free an old
plperl_proc_desc struct immediately upon detecting that it was stale.
However, if a plperl function is called recursively, this could result
in deleting the struct out from under an outer invocation, leading to
misbehavior or crashes. Add a simple reference-count mechanism to
ensure that such structs are freed only when the last reference goes
away.
Per investigation of bug #7516 from Marko Tiikkaja. I am not certain
that this error explains his report, because he says he didn't have
any recursive calls --- but it's hard to see how else it could have
crashed right there. In any case, this definitely fixes some problems
in the area.
Back-patch to all active branches.
Investigation shows that some intermittent build failures in ecpg are the
result of a gmake bug that was reported quite some time ago:
http://savannah.gnu.org/bugs/?30653
Preventing parallel builds of the ecpg subdirectories seems to dodge the
bug. Per yesterday's pgsql-hackers discussion, there are some other things
in the subdirectory makefiles that seem rather unsafe for parallel builds
too, but there's little point in fixing them as long as we have to work
around a make bug.
Back-patch to 9.1; parallel builds weren't very well supported before
that anyway.
Commit 2cfb1c6f77 fixed some issues caused
by Python 3.3 choosing to iterate through dict entries in a different order
than before. But here's another one: the test cases adjusted here made two
bad entries in a dict and expected the one complained of would always be
the same.
Possibly this should be back-patched further than 9.2, but there seems
little point unless the earlier fix is too.
Create an internal function pqDropConnection that does the physical socket
close and cleans up closely-associated state. This removes a bunch of ad
hoc, not always consistent closure code. The ulterior motive is to have a
single place to wait for a spawned child backend to exit, but this seems
like good cleanup even if that never happens.
I went back and forth on whether to include "conn->status = CONNECTION_BAD"
in pqDropConnection's actions, but for the moment decided not to. Only a
minority of the call sites actually want that, and in any case it's
arguable that conn->status is slightly higher-level state, and thus not
part of this function's purview.
This fix removes an unnecessary incompatibility with the old behavior of
the unix_socket_directory parameter. Since pathnames with embedded spaces
are fairly popular on some platforms, the incompatibility could be
significant in practice. We'll still strip unquoted leading/trailing
spaces, however.
No docs update since the documentation already implied that it worked
like this.
Per bug #7514 from Murray Cumming.
If we call pg_ctl stop, the server might continue and thus
hold a log file for a short time after it has deleted its pid file,
(which is when pg_ctl will exit), and so a subsequent attempt to
open the log file might fail.
We therefore try to open it a few times, sleeping one second between
tries, to give the server time to exit.
This corrects an error that was observed on the buildfarm.
Backpatched to 9.2,
When the startup process restores a WAL file from the archive, it deletes
any old file with the same name and renames the new file in its place. On
Windows, however, when a file is deleted, it still lingers as long as a
process holds a file handle open on it. With cascading replication, a
walsender process can hold the old file open, so the rename() in the startup
process would fail. To fix that, rename the old file to a temporary name, to
make the original file name available for reuse, before deleting the old
file.
Give the correct name of the GUC parameter being complained of.
Also, emit a more suitable SQLSTATE (INVALID_PARAMETER_VALUE,
not the default INTERNAL_ERROR).
Gurjeet Singh, errcode adjustment by me
Call pg_dumpall using -f switch instead of redirection, to avoid
writing the output in text mode and generating spurious carriage
returns. Remove to carriage return ignoring hack introduced by
commit e442b0f0c6.
Backpatch to 9.2.
pg_upgrade opened the output from pg_dumpall in text mode and
wrote the split files in text mode. This caused unwanted eating
of intended carriage returns on input and production of spurious
carriage returns on output. To avoid this, open all these files
in binary mode. On non-Windows platforms, this change has no
effect.
Backpatch to 9.0. On 9.0 and 9.1, we also switch from redirecting
pg_dumpall's output to using pg_dumpall's -f switch, for the same
reason.
Perl, for some unaccountable reason, believes it's a good idea to reset
SIGFPE handling to SIG_IGN. Which wouldn't be a good idea even if it
worked; but on some platforms (Linux at least) it doesn't work at all,
instead resulting in forced process termination if the signal occurs.
Given the lack of other complaints, it seems safe to assume that Perl
never actually provokes SIGFPE and so there is no value in the setting
anyway. Hence, reset it to our normal handler after initializing Perl.
Report, analysis and patch by Andres Freund.
The planner previously assumed that parameter Vars having the same absolute
query level, varno, and varattno could safely be assigned the same runtime
PARAM_EXEC slot, even though they might be different Vars appearing in
different subqueries. This was (probably) safe before the introduction of
CTEs, but the lazy-evalution mechanism used for CTEs means that a CTE can
be executed during execution of some other subquery, causing the lifespan
of Params at the same syntactic nesting level as the CTE to overlap with
use of the same slots inside the CTE. In 9.1 we created additional hazards
by using the same parameter-assignment technology for nestloop inner scan
parameters, but it was broken before that, as illustrated by the added
regression test.
To fix, restructure the planner's management of PlannerParamItems so that
items having different semantic lifespans are kept rigorously separated.
This will probably result in complex queries using more runtime PARAM_EXEC
slots than before, but the slots are cheap enough that this hardly matters.
Also, stop generating PlannerParamItems containing Params for subquery
outputs: all we really need to do is reserve the PARAM_EXEC slot number,
and that now only takes incrementing a counter. The planning code is
simpler and probably faster than before, as well as being more correct.
Per report from Vik Reykja.
These changes will mostly also need to be made in the back branches, but
I'm going to hold off on that until after 9.2.0 wraps.
The cascading replication code assumed that the current RecoveryTargetTLI
never changes, but that's not true with recovery_target_timeline='latest'.
The obvious upshot of that is that RecoveryTargetTLI in shared memory needs
to be protected by a lock. A less obvious consequence is that when a
cascading standby is connected, and the standby switches to a new target
timeline after scanning the archive, it will continue to stream WAL to the
cascading standby, but from a wrong file, ie. the file of the previous
timeline. For example, if the standby is currently streaming from the middle
of file 000000010000000000000005, and the timeline changes, the standby
will continue to stream from that file. However, the WAL on the new
timeline is in file 000000020000000000000005, so the standby sends garbage
from 000000010000000000000005 to the cascading standby, instead of the
correct WAL from file 000000020000000000000005.
This also fixes a related bug where a partial WAL segment is restored from
the archive and streamed to a cascading standby. The code assumed that when
a WAL segment is copied from the archive, it can immediately be fully
streamed to a cascading standby. However, if the segment is only partially
filled, ie. has the right size, but only N first bytes contain valid WAL,
that's not safe. That can happen if a partial WAL segment is manually copied
to the archive, or if a partial WAL segment is archived because a server is
started up on a new timeline within that segment. The cascading standby will
get confused if the WAL it received is not valid, and will get stuck until
it's restarted. This patch fixes that problem by not allowing WAL restored
from the archive to be streamed to a cascading standby until it's been
replayed, and thus validated.
Serializable Snapshot Isolation used for serializable transactions
depends on acquiring SIRead locks on all heap relation tuples which
are used to generate the query result, so that a later delete or
update of any of the tuples can flag a read-write conflict between
transactions. This is normally handled in heapam.c, with tuple level
locking. Since an index-only scan avoids heap access in many cases,
building the result from the index tuple, the necessary predicate
locks were not being acquired for all tuples in an index-only scan.
To prevent problems with tuple IDs which are vacuumed and re-used
while the transaction still matters, the xmin of the tuple is part of
the tag for the tuple lock. Since xmin is not available to the
index-only scan for result rows generated from the index tuples, it
is not possible to acquire a tuple-level predicate lock in such
cases, in spite of having the tid. If we went to the heap to get the
xmin value, it would no longer be an index-only scan. Rather than
prohibit index-only scans under serializable transaction isolation,
we acquire an SIRead lock on the page containing the tuple, when it
was not necessary to visit the heap for other reasons.
Backpatch to 9.2.
Kevin Grittner and Tom Lane
Each setup block is run as a single PQexec submission, and some
statements such as VACUUM cannot be combined with others in such a
block.
Backpatch to 9.2.
Kevin Grittner and Tom Lane
The same message is used in both pg_restore and pg_dump, and it's
confusing to output "restoring data for table xyz" when the user
is just doing a pg_dump.
socket location. Also, prevent putting the socket in the current
directory for pre-9.1 servers in live check and non-live check mode,
because pre-9.1 pg_ctl -w can't handle it.
Backpatch to 9.2.
pg_upgrade produces a platform-specific script to remove the old
directory, but on Windows it has not been making sure that the
paths it writes as arguments for rmdir and del use the backslash
path separator, which will cause these scripts to fail.
The fix is backpatched to Release 9.0.
This gets rid of a dangerous-looking use of the not-volatile XLogCtl
pointer in a couple of spinlock-protected sections, where the normal
coding rule is that you should only access shared memory through a
pointer-to-volatile. I think the risk is only hypothetical not actual,
since for there to be a bug the compiler would have to move the spinlock
acquire or release across the memcpy() call, which one sincerely hopes
it will not. Still, it looks cleaner this way.
Per comment from Daniel Farina and subsequent discussion.
When starting either an old or new postmaster, force it to place its Unix
socket in the current directory. This makes it even harder for accidental
connections to occur during pg_upgrade, and also works around some
scenarios where the default socket location isn't usable. (For example,
if the default location is something other than "/tmp", it might not exist
during "make check".)
When checking an already-running old postmaster, find out its actual socket
directory location from postmaster.pid, if possible. This dodges problems
with an old postmaster having a configured location different from the
default built into pg_upgrade's libpq. We can't find that out if the old
postmaster is pre-9.1, so also document how to cope with such scenarios
manually.
In support of this, centralize handling of the connection-related command
line options passed to pg_upgrade's subsidiary programs, such as pg_dump.
This should make future changes easier.
Bruce Momjian and Tom Lane
Formerly it would only show them for relkinds 'r' and 'f' (plain tables
and foreign tables). However, as of 9.2, views can also have reloptions,
namely security_barrier. The relkind restriction seems pointless and
not at all future-proof, so just print reloptions whenever there are any.
In passing, make some cosmetic improvements to the code that pulls the
"tableinfo" fields out of the PGresult.
Noted and patched by Dean Rasheed, with adjustment for all relkinds by me.
We can detect whether the planner top level is going to care at all about
cheap startup cost (it will only do so if query_planner's tuple_fraction
argument is greater than zero). If it isn't, we might as well discard
paths immediately whose only advantage over others is cheap startup cost.
This turns out to get rid of quite a lot of paths in complex queries ---
I saw planner runtime reduction of more than a third on one large query.
Since add_path isn't currently passed the PlannerInfo "root", the easiest
way to tell it whether to do this was to add a bool flag to RelOptInfo.
That's a bit redundant, since all relations in a given query level will
have the same setting. But in the future it's possible that we'd refine
the control decision to work on a per-relation basis, so this seems like
a good arrangement anyway.
Per my suggestion of a few months ago.
If a PlaceHolderVar contains a pulled-up LATERAL reference, its minimum
possible evaluation level might be higher in the join tree than its
original syntactic location. That in turn affects the ph_needed level for
any contained PlaceHolderVars (that is, those PHVs had better propagate up
the join tree at least to the evaluation level of the outer PHV). We got
this mostly right, but mark_placeholder_maybe_needed() failed to account
for the effect, and in consequence could leave the inner PHVs with
ph_may_need less than what their ultimate ph_needed value will be. That's
bad because it could lead to failure to select a join order that will allow
evaluation of the inner PHV at a valid location. Fix that, and add an
Assert that checks that we don't ever set ph_needed to more than
ph_may_need.