Per the C standard, the routine should be passed an int, with a value that's
representable as an unsigned char or EOF. Passing a signed char is wrong,
because a negative value is not representable as an unsigned char.
Unfortunately no compiler warns about that.
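For illustration (not part of the patch text), the safe idiom is to cast
through unsigned char before calling the <ctype.h> routines:

    #include <ctype.h>

    /* Wrong if *s is a signed char holding a negative value: */
    if (isspace(*s))
        handle_space();

    /* Correct: the value passed as int is then representable as an
     * unsigned char, as the standard requires. */
    if (isspace((unsigned char) *s))
        handle_space();

(handle_space() is just a placeholder here.)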
If walsender doesn't hear from the client for the time specified by
wal_sender_timeout, it will conclude the connection or client is dead, and
disconnect. When half of wal_sender_timeout has elapsed, it sends a ping
to the client, leaving it the remaining half of wal_sender_timeout to
respond. However, it only checked if half of wal_sender_timeout had elapsed
when it was about to sleep, so if it was busy sending WAL to the client for
long enough, it would not send the ping request in time. Then the client
would not know it needs to send a reply, and the walsender would disconnect
even though the client is still alive. Fix that.
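As a rough sketch of the intended timing (variable and helper names here
are illustrative, not the actual walsender code), the half-timeout check
must also run inside the busy send loop, not only before sleeping:

    /* Illustrative sketch, using real timestamp helpers but made-up state. */
    TimestampTz now = GetCurrentTimestamp();
    TimestampTz ping_time = TimestampTzPlusMilliseconds(last_reply_timestamp,
                                                        wal_sender_timeout / 2);

    if (!ping_sent && now >= ping_time)
    {
        request_client_reply();     /* hypothetical helper: sends the ping */
        ping_sent = true;
    }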
Andres Freund, reviewed by Robert Haas, and some further changes by me.
Backpatch to 9.3. Earlier versions relied on the client to send the
keepalives on its own, and hence didn't have this problem.
We should allow this so that matviews can be referenced in UPDATE/DELETE
statements in READ COMMITTED isolation level. The requirement for that
is that a re-fetch by TID will see the same row version the query saw
earlier, which is true of matviews, so there's no reason for the
restriction. Per bug #9398.
Michael Paquier, after a suggestion by me
Explicitly reject infinity/NaN inputs, rather than just assuming that
something else will do it for us. Per buildfarm.
While at it, make some over-parenthesized and under-legible code
more readable.
This was already documented a few lines further down, but the comment
just beside the field declaration could be misleading. Per gripe
from Kyotaro Horiguchi.
Commit 6f37c08057 removed whitespace
from the SQL file but not the expected-output file, and commit
7e8db2dc42 changed the error message
without updating the expected outputs.
We were unlinking the permanent file, not the non-permanent one. But
since the stat collector already unlinks all permanent files on startup,
there was nothing for it to unlink. The non-permanent file remained in
place, and was copied to the permanent directory on shutdown, so in
effect no file was ever dropped.
Backpatch to 9.3, where the issue was introduced by commit 187492b6c2.
Before that, there were no per-database files and thus no file to drop
on DROP DATABASE.
Per report from Thom Brown.
Author: Tomáš Vondra
Instead of having read_post_opts() depend on the memory allocated for
the config file (which is now getting freed), use pg_strdup() for
post_opts and exec_path (similar to how it's done elsewhere).
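A simplified sketch of the idea (not the exact pg_ctl code): take copies
of the strings rather than keeping pointers into the buffer that is about
to be freed.

    /* optline points into the config-file buffer read by readfile() */
    post_opts = pg_strdup(optline);     /* own copy, survives the free */
    exec_path = pg_strdup(path_value);  /* likewise; path_value is illustrative */
    free_readfile(optlines);            /* now safe to release the buffer */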
Noted by Thom Brown.
CheckRequiredParameterValues() should perform the checks if archive recovery
was requested, even if we are going to perform crash recovery first.
Reported by Kyotaro HORIGUCHI. Backpatch to 9.2, like the crash-then-archive
recovery mode.
When entering crash recovery followed by archive recovery, and the latest
checkpoint is a shutdown checkpoint, and there are no more WAL records to
replay before transitioning from crash to archive recovery, we would not
immediately allow read-only connections in hot standby mode even if we
could. That's because when starting from a shutdown checkpoint, we set
lastReplayedEndRecPtr incorrectly to the record before the checkpoint
record, instead of the checkpoint record itself. We don't run the redo
routine of the shutdown checkpoint record, but starting recovery from it
goes through the same motions, so it should be considered as replayed.
Reported by Kyotaro HORIGUCHI. All versions with hot standby are affected,
so backpatch to 9.0.
The new, small free_readfile() managed to have a bug in it which could
cause it to try to free something it shouldn't. Also fix the case
where it was being called with an invalid pointer, leading to a
segfault.
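For illustration, a NULL-safe cleanup in the spirit of the fix (a sketch,
not necessarily the committed code), assuming readfile() returns a
NULL-terminated array of malloc'd lines:

    static void
    free_readfile(char **optlines)
    {
        int     i = 0;

        if (optlines == NULL)       /* guard against being handed nothing */
            return;

        while (optlines[i] != NULL)
            free(optlines[i++]);    /* free each line exactly once */

        free(optlines);             /* then the array itself */
    }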
Noted by Bruce, issues introduced and fixed by me.
This forces an input field containing the quoted null string to be
returned as a NULL. Without this option, only unquoted null strings
behave this way. This helps where some CSV producers insist on quoting
every field, whether or not it is needed. The option takes a list of
fields, and only applies to those columns. There is an equivalent
column-level option added to file_fdw.
Ian Barwick, with some tweaking by Andrew Dunstan, reviewed by Payal
Singh.
Author: Pavel Stěhule, editorialized somewhat by Álvaro Herrera
Reviewed-by: Tomáš Vondra, Marko Tiikkaja
With input from Fabrízio de Royes Mello, Jim Nasby
pg_class is a special case for CLUSTER and VACUUM FULL, so although
commit 3cff1879f8 caused these
operations to advance relfrozenxid and relminmxid for all other
tables, it did not provide the same benefit for pg_class. This
plugs that gap.
Andres Freund
I changed the loop in 9.3 to use "goto send_failure" instead of "break" on
errors, but I missed this one case. It was a relatively harmless bug: if
the flush fails once it will most likely fail again as soon as we try to
flush the output again. But it's a bug nevertheless.
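In sketch form (the label and flush routine are named in walsender; the
rest is illustrative), every error inside the loop must jump to the
cleanup label rather than merely break out of the loop:

    for (;;)
    {
        /* ... assemble and send a batch of WAL ... */
        if (pq_flush_if_writable() != 0)
            goto send_failure;      /* was "break", which skipped the
                                     * error handling after the loop */
    }

    send_failure:
        /* ... log the failure and close the connection ... */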
Report and fix by Andres Freund.
This feature, building on previous commits, allows the write-ahead log
stream to be decoded into a series of logical changes; that is,
inserts, updates, and deletes, and the transactions that contain them.
It is capable of handling decoding even across changes to the schema
of the affected tables. The output format is controlled by a
so-called "output plugin"; an example is included. To make use of
this in a real replication system, the output plugin will need to be
modified to produce output in the format appropriate to that system,
and to perform filtering.
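As a rough sketch of what an output plugin looks like (the callback names
approximate the new API; begin/commit callbacks and error handling are
omitted for brevity):

    #include "postgres.h"
    #include "replication/output_plugin.h"
    #include "replication/logical.h"

    void        _PG_output_plugin_init(OutputPluginCallbacks *cb);

    /* Called once per decoded row change; emits a line of text. */
    static void
    sketch_change_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                     Relation relation, ReorderBufferChange *change)
    {
        OutputPluginPrepareWrite(ctx, true);
        appendStringInfoString(ctx->out, "row change");
        OutputPluginWrite(ctx, true);
    }

    void
    _PG_output_plugin_init(OutputPluginCallbacks *cb)
    {
        cb->change_cb = sketch_change_cb;
    }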
Currently, information can be extracted from the logical decoding
system only via SQL; future commits will add the ability to stream
changes via walsender.
Andres Freund, with review and other contributions from many other
people, including Álvaro Herrera, Abhijit Menon-Sen, Peter Geoghegan,
Kevin Grittner, Robert Haas, Heikki Linnakangas, Fujii Masao,
Michael Paquier, Simon Riggs, Craig Ringer, and Steve Singer.
This option makes pg_dump, pg_dumpall and pg_restore inject an IF EXISTS
clause to each DROP command they emit. (In pg_dumpall, the clause is
not added to individual object drops, but rather to the CREATE DATABASE
commands, as well as CREATE ROLE and CREATE TABLESPACE.)
This allows for a better user experience when using --clean, in case
some objects do not exist. Per bug #7873 by Dave Rolsky.
Author: Pavel Stěhule
Reviewed-by: Jeevan Chalke, Álvaro Herrera, Josh Kupershmidt
Because of the new SLOT clause in the START_REPLICATION command, it's
possible for the command to end up too long for the old maximum buffer
length.
Andres Freund
Additional non-security issues/improvements spotted by Coverity.
In backend/libpq, no sense trying to protect against port->hba being
NULL after we've already dereferenced it in the switch() statement.
Protect against possible overflow due to 32-bit arithmetic in
basebackup throttling (not yet released, so no security concern).
Remove nonsensical check of an array pointer against NULL in
procarray.c; it looks to be a holdover from 9.1 and earlier, when
pointers were used, but now it's just an array.
Remove pointer check-against-NULL in tsearch/spell.c as we had already
dereferenced it above (in the strcmp()).
Remove dead code from adt/orderedsetaggs.c; isnull is checked
immediately after each tuplesort_getdatum() call, and if true we
return, so there's no point in checking it again at the bottom.
Remove recently added minor error-condition memory leak in pg_regress.
A number of issues were identified by the Coverity scanner and are
addressed in this patch. None of these appear to be security issues
and many are mostly cosmetic changes.
Short comments for each of the changes follow.
Correct the semicolon placement in be-secure.c regarding SSL retries.
Remove a useless comparison-to-NULL in proc.c (value is dereferenced
prior to this check and therefore can't be NULL).
Add checking of chmod() return values to initdb.
Fix a couple minor memory leaks in initdb.
Fix memory leak in pg_ctl; this involves freeing the config file contents.
Use an int to capture fgetc() return instead of an enum in pg_dump.
Fix minor memory leaks in pg_dump.
(note minor change to convertOperatorReference()'s API)
Check fclose()/remove() return codes in psql.
Check fstat(), find_my_exec() return codes in psql.
Various ECPG memory leak fixes.
Check find_my_exec() return in ECPG.
Explicitly ignore pqFlush return in libpq error-path.
Change PQfnumber() to avoid doing a strdup() when no changes are required.
Remove a few useless checks against NULL (value dereferenced beforehand).
Check rmtree(), malloc() results in pg_regress.
Also check get_alternative_expectfile() return in pg_regress.
The regex code didn't have any provision for query cancel, which is
unsurprising given its non-Postgres origin, but still problematic since
some operations can take a long time. Introduce a callback function to
check for a pending query cancel or session termination request, and
call it in a couple of strategic spots where we can make the regex code
exit with an error indicator.
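A minimal sketch of such a callback on the Postgres side (this is
essentially the shape of the check; exact names may differ):

    /* Returns nonzero if the regex engine should give up and exit. */
    static int
    rcancelrequested(void)
    {
        return InterruptPending && (QueryCancelPending || ProcDiePending);
    }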
If we ever actually split out the regex code as a standalone library,
some additional work will be needed to let the cancel callback function
be specified externally to the library. But that's straightforward
(certainly so by comparison to putting the locale-dependent character
classification logic on a similar arms-length basis), and there seems
no need to do it right now.
A bigger issue is that there may be more places than these two where
we need to check for cancels. We can always add more checks later,
now that the infrastructure is in place.
Since there are known examples of not-terribly-long regexes that can
lock up a backend for a long time, back-patch to all supported branches.
I have hopes of fixing the known performance problems later, but adding
query cancel ability seems like a good idea even if they were all fixed.
Commit abf5c5c9a4 added a bogus while-
statement after the for(;;)-loop. It went unnoticed in testing, because
it was dead code.
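Illustratively (not the actual code), this is the shape of the mistake:
the loop is only ever left via goto or return, so a statement placed
after it can never be reached.

    for (;;)
    {
        if (done())             /* done() is a stand-in for the real test */
            goto out;           /* the only way out of the loop */
        send_more();
    }
    while (more_to_do());       /* bogus and unreachable, hence unnoticed */

    out:
        cleanup();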
Report by KONDO Mitsumasa. Backpatch to 9.3. The commit that introduced
this was also applied to 9.2, but not the bogus while-loop part, because
the code in 9.2 looks quite different.
A new MAX_RATE option allows imposing a limit on the network transfer
rate from the server side. This is useful to limit the stress that
taking a base backup puts on the server.
pg_basebackup can now pass this value to the server, too.
Author: Antonin Houska
Patch reviewed by Stefan Radomski, Andres Freund, Zoltán Böszörményi,
Fujii Masao, and Álvaro Herrera.
Testing convert_to(..., 'ISO-8859-1') fails if there isn't a conversion
function available from the database encoding to ISO-8859-1. This has
been broken since day one, but the breakage was hidden by
pg_do_encoding_conversion's failure to complain, up till commit
49c817eab7.
Since the data being converted in this test is plain ASCII, no actual
conversion need happen (and if it did, it would prove little about citext
anyway). So that we still have some code coverage of the convert() family
of functions, let's switch to using convert_from, with SQL_ASCII as the
specified source encoding. Per buildfarm.
We were resetting the tuple's HEAP_HOT_UPDATED flag as well as t_ctid on
WAL replay of a tuple-lock operation, which is incorrect when the tuple
is already updated.
Back-patch to 9.3. The clearing of both header elements was there
previously, but since no update could be present on a tuple that was
being locked, it was harmless.
Bug reported by Peter Geoghegan and Greg Stark in
CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=O67w0cSswPvV7XfUcU5g@mail.gmail.com and
CAM-w4HPTOeMT4KP0OJK+mGgzgcTOtLRTvFZyvD0O4aH-7dxo3Q@mail.gmail.com
respectively; diagnosis by Andres Freund.
json_to_record() depends on get_call_result_type() for the tuple
descriptor of the record that should be returned, but in some cases
that cannot be determined. Add a guard to check if the tuple
descriptor has been properly resolved, similar to other callers of
get_call_result_type().
Also add guard for two other callers of get_call_result_type() in
jsonfuncs.c. Although json_to_record() is the only actual bug, it's a
good idea to follow convention.
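The conventional guard looks roughly like this (the same pattern other
callers of get_call_result_type() already use):

    TupleDesc   tupdesc;

    if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("function returning record called in context "
                        "that cannot accept type record")));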
If there are lots of uncommitted tuples at the end of the index range,
get_actual_variable_range() ends up fetching each one and doing an MVCC
visibility check on it, until it finally hits a visible tuple. This is
bad enough in isolation, considering that we don't need an exact answer,
only an approximate one. But because the tuples are not yet committed,
each visibility check does a TransactionIdIsInProgress() test, which
involves scanning the ProcArray. When multiple sessions do this
concurrently, the ensuing contention results in horrid performance loss.
20X overall throughput loss on not-too-complicated queries is easy to
demonstrate in the back branches (though someone's made it noticeably
less bad in HEAD).
We can dodge the problem fairly effectively by using SnapshotDirty rather
than a normal MVCC snapshot. This will cause the index probe to take
uncommitted tuples as good, so that we incur only one tuple fetch and test
even if there are many such tuples. The extent to which this degrades the
estimate is debatable: it's possible the result is actually a more accurate
prediction than before, if the endmost tuple has become committed by the
time we actually execute the query being planned. In any case, it's not
very likely that it makes the estimate a lot worse.
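In sketch form, the probe switches to a dirty snapshot along these lines
(simplified from the shape of the actual code):

    SnapshotData    SnapshotDirty;
    IndexScanDesc   index_scan;

    InitDirtySnapshot(SnapshotDirty);
    index_scan = index_beginscan(heapRel, indexRel, &SnapshotDirty, 1, 0);
    /* uncommitted tuples now satisfy the snapshot, so the first
     * fetched index entry settles the endpoint estimate */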
SnapshotDirty will still reject tuples that are known committed dead, so
we won't give bogus answers if an invalid outlier has been deleted but not
yet vacuumed from the index. (Because btrees know how to mark such tuples
dead in the index, we shouldn't have a big performance problem in the case
that there are many of them at the end of the range.) This consideration
motivates not using SnapshotAny, which was also considered as a fix.
Note: the back branches were using SnapshotNow instead of an MVCC snapshot,
but the problem and solution are the same.
Per performance complaints from Bartlomiej Romanski, Josh Berkus, and
others. Back-patch to 9.0, where the issue was introduced (by commit
40608e7f94).
- Write HIGH:MEDIUM instead of DEFAULT:!LOW:!EXP for clarity.
- Order 3DES last to work around inappropriate OpenSSL default.
- Remove !MD5 and @STRENGTH, because they are irrelevant.
- Add clarifying documentation.
Effectively, the new default is almost the same as the old one, but it
is arguably easier to understand and modify.
Author: Marko Kreen <markokr@gmail.com>