This fixes a problem which is not new, but with the advent of direct
foreign table modification in 0bf3ae88af,
it's somewhat more likely to be annoying than previously. So,
arrange for a local query cancelation to propagate to the remote side.
Michael Paquier, reviewed by Etsuro Fujita. Original report by
Thom Brown.
If there's a filter condition on either side of a full outer join,
it is neither correct to attach it to the join's ON clause nor to
throw it into the toplevel WHERE clause. Just don't push down the
join in that case.
To maximize the number of cases where we can still push down full
joins, push inner join conditions into the ON clause at the first
opportunity rather than postponing them to the top-level WHERE
clause. This produces nicer SQL, anyway.
This bug was introduced in e4106b2528.
Ashutosh Bapat, per report from Rajkumar Raghuwanshi.
The reverted changes were intended to force a choice of whether any
newly-added BufferGetPage() calls needed to be accompanied by a
test of the snapshot age, to support the "snapshot too old"
feature. Such an accompanying test is needed in about 7% of the
cases, where the page is being used as part of a scan rather than
positioning for other purposes (such as DML or vacuuming). The
additional effort required for back-patching, and the doubt whether
the intended benefit would really be there, have indicated it is
best just to rely on developers to do the right thing based on
comments and existing usage, as we do with many other conventions.
This change should have little or no effect on generated executable
code.
Motivated by the back-patching pain of Tom Lane and Robert Haas.
Previously, querying the xmin column of a single postgres_fdw foreign
table fetched the tuple length, xmax the typmod, and cmin or cmax the
composite type OID of the tuple. However, when you queried several
such tables and the join got shipped to the remote side, these columns
ended up containing the remote values of the corresponding columns.
Both behaviors are rather unprincipled, the former for obvious reasons
and the latter because the remote values of these columns don't have
any local significance; our transaction IDs are in a different space
than those of the remote machine. Clean this up by setting all of
these fields to 0 in both cases. Also fix the handling of tableoid
to be sane.
Robert Haas and Ashutosh Bapat, reviewed by Etsuro Fujita.
When re-reading an update involving both an old tuple and a new tuple from
disk, reorderbuffer.c was careless about whether the new tuple is suitably
aligned for direct access --- in general, it isn't. We'd missed seeing
this in the buildfarm because the contrib/test_decoding tests exercise this
code path only a few times, and by chance all of those cases have old
tuples with length a multiple of 4, which is usually enough to make the
access to the new tuple's t_len safe. For some still-not-entirely-clear
reason, however, Debian's sparc build gets a bus error, as reported by
Christoph Berg; perhaps it's assuming 8-byte alignment of the pointer?
The lack of previous field reports is probably because you need all of
these conditions to trigger a crash: an alignment-picky platform (not
Intel), a transaction large enough to spill to disk, an update within
that xact that changes a primary-key field and has an odd-length old tuple,
and of course logical decoding tracing the transaction.
Avoid the alignment assumption by using memcpy instead of fetching t_len
directly, and add a test case that exposes the crash on picky platforms.
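The memcpy approach can be sketched as follows. This is a minimal
illustration, not the reorderbuffer.c code: the helper names and the
buffer layout are hypothetical, and only the technique (copying the
length field out rather than dereferencing a possibly misaligned
pointer) is taken from the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a 32-bit length stored at an arbitrary byte
 * offset.  Casting buf + offset to uint32_t * and dereferencing it would
 * assume 4-byte alignment; memcpy makes no such assumption.
 */
static uint32_t
read_len_unaligned(const char *buf, size_t offset)
{
    uint32_t    len;

    assert(buf != NULL);
    memcpy(&len, buf + offset, sizeof(len));
    return len;
}

/* Hypothetical counterpart that stores a length at an arbitrary offset. */
static void
write_len_unaligned(char *buf, size_t offset, uint32_t len)
{
    assert(buf != NULL);
    memcpy(buf + offset, &len, sizeof(len));
}
```

On platforms that tolerate misaligned loads the compiler typically
turns the memcpy into a plain load, so the safe form should cost
little or nothing there.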
Back-patch to 9.4 where the bug was introduced.
Discussion: <20160413094117.GC21485@msg.credativ.de>
Decoding of logical messages, added in 3fe3511d05, failed to filter
messages emitted in other databases and messages emitted "under" a
replication origin the output plugin isn't interested in.
Add tests to verify that both types of filtering actually work. While
touching message.sql, remove a hunk obsoleted by d25379e.
Bump XLOG_PAGE_MAGIC because xl_logical_message changed and because
3fe3511d05 had omitted doing so. 3fe3511d05 additionally didn't bump
catversion, but 7a542700d has done so since.
Author: Petr Jelinek
Reported-By: Andres Freund
Discussion: 20160406142513.wotqy3ba3kanr423@alap3.anarazel.de
Rename this function to GenericXLogRegisterBuffer() to make it clearer
what it does, and leave room for other sorts of "register" actions in
future. Also, replace its "bool isNew" argument with an integer flags
argument, so as to allow adding more flags in future without an API
break.
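The reason a flags argument avoids future API breaks can be sketched
like this. All names below are hypothetical and chosen only for
illustration; they are not the generic WAL API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
#define REG_FULL_IMAGE   (1 << 0)   /* plays the role of the old "isNew" */
#define REG_FUTURE_FLAG  (1 << 1)   /* a later addition: same signature */
#define REG_ALL_FLAGS    (REG_FULL_IMAGE | REG_FUTURE_FLAG)

typedef struct RegisteredBuffer
{
    int         buffer_id;
    int         flags;
} RegisteredBuffer;

/* Register a buffer; new behaviors are requested by OR-ing in more bits. */
static RegisteredBuffer
register_buffer(int buffer_id, int flags)
{
    RegisteredBuffer rb;

    assert((flags & ~REG_ALL_FLAGS) == 0);  /* reject unknown bits */
    rb.buffer_id = buffer_id;
    rb.flags = flags;
    return rb;
}

static bool
wants_full_image(const RegisteredBuffer *rb)
{
    return (rb->flags & REG_FULL_IMAGE) != 0;
}
```

With a "bool isNew" parameter, adding a second option would have
required a new parameter and a change at every call site; with flags,
existing callers keep compiling unchanged.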
Alexander Korotkov, adjusted slightly by me
Added to ensure that bloom index pages can be distinguished from other pages
by pg_filedump. Because there weren't any public/production versions before,
it doesn't pay attention to any compatibility issues.
Per notice from Tom Lane
Pinning/Unpinning a buffer is a very frequent operation; especially in
read-mostly cache resident workloads. Benchmarking shows that in various
scenarios the spinlock protecting a buffer header's state becomes a
significant bottleneck. The problem can be reproduced with pgbench -S on
larger machines, but can be considerably worse for queries which touch
the same buffers over and over at a high frequency (e.g. nested loops
over a small inner table).
To allow atomic operations to be used, cram BufferDesc's flags,
usage_count, buf_hdr_lock, refcount into a single 32bit atomic variable;
that allows manipulating them together using 32bit compare-and-swap
operations. This requires reducing MAX_BACKENDS to 2^18-1 (which could
be lifted by using a 64bit field, but it's not a realistic configuration
atm).
As not all operations can easily be implemented in a lockfree manner,
implement the previous buf_hdr_lock via a flag bit in the atomic
variable. That way we can continue to lock the header in places where
it's needed, but can get away without acquiring it in the more frequent
hot-paths. There are some additional operations which can be done without
the lock, but aren't in this patch; but the most important places are
covered.
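The packing and the lock-as-flag-bit idea described above can be
sketched with C11 atomics. This is only an illustration under assumed
names and an assumed bit layout (18 bits of refcount, a 4-bit usage
count, one lock bit); it is not the bufmgr.c code, which uses
PostgreSQL's own atomics API and backs off with a spin delay.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define REFCOUNT_BITS  18
#define REFCOUNT_ONE   1u
#define REFCOUNT_MASK  ((1u << REFCOUNT_BITS) - 1)
#define USAGE_SHIFT    REFCOUNT_BITS
#define USAGE_ONE      (1u << USAGE_SHIFT)
#define USAGE_MASK     (0xFu << USAGE_SHIFT)
#define MAX_USAGE      5u
#define HDR_LOCKED     (1u << 22)   /* the former spinlock, as a flag bit */

typedef struct BufHdrSketch
{
    _Atomic uint32_t state;
} BufHdrSketch;

/* Pin without taking the header lock: bump refcount and usage count. */
static void
pin_buffer(BufHdrSketch *buf)
{
    uint32_t    old = atomic_load(&buf->state);

    for (;;)
    {
        uint32_t    new_state;

        if (old & HDR_LOCKED)
        {
            /* header is locked; reread (real code would back off) */
            old = atomic_load(&buf->state);
            continue;
        }
        new_state = old + REFCOUNT_ONE;
        if (((old & USAGE_MASK) >> USAGE_SHIFT) < MAX_USAGE)
            new_state += USAGE_ONE;
        /* on failure, "old" is refreshed with the current value */
        if (atomic_compare_exchange_weak(&buf->state, &old, new_state))
            break;
    }
}

static void
unpin_buffer(BufHdrSketch *buf)
{
    uint32_t    old = atomic_fetch_sub(&buf->state, REFCOUNT_ONE);

    assert((old & REFCOUNT_MASK) > 0);
    (void) old;                 /* silence unused warning under NDEBUG */
}

/* Slow path: take the header lock by setting the flag bit. */
static void
lock_header(BufHdrSketch *buf)
{
    while (atomic_fetch_or(&buf->state, HDR_LOCKED) & HDR_LOCKED)
        ;                       /* busy-wait; real code would spin-delay */
}

static void
unlock_header(BufHdrSketch *buf)
{
    atomic_fetch_and(&buf->state, ~HDR_LOCKED);
}

static uint32_t
refcount(BufHdrSketch *buf)
{
    return atomic_load(&buf->state) & REFCOUNT_MASK;
}

static uint32_t
usage_count(BufHdrSketch *buf)
{
    return (atomic_load(&buf->state) & USAGE_MASK) >> USAGE_SHIFT;
}
```

In this sketch the 18-bit refcount field is what would bound the
number of simultaneous pins, which is the kind of constraint behind
the MAX_BACKENDS reduction mentioned above.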
As bufmgr.c now essentially re-implements spinlocks, abstract the delay
logic from s_lock.c into something more generic. It already has two
users, and more are coming up; there's a follow-up patch for lwlock.c at
least.
This patch is based on a proof-of-concept written by me, which Alexander
Korotkov made into a fully working patch; the committed version is again
revised by me. Benchmarking and testing has, amongst others, been
provided by Dilip Kumar, Alexander Korotkov, Robert Haas.
On a large x86 system, improvements of a factor of 8 have been observed
for readonly pgbench with a high client count.
Author: Alexander Korotkov and Andres Freund
Discussion: 2400449.GjM57CE0Yg@dinodell
Originally, this test created a 100000-row test table, which made it
run rather slowly compared to other contrib tests. Investigation with
gcov showed that we got no further improvement in code coverage after
the first 700 or so rows, making the large table 99% a waste of time.
Cut it back to 2000 rows to fix the runtime problem and still leave
some headroom for testing behaviors that may appear later.
A closer look at the gcov results showed that the main coverage
omissions in contrib/bloom occurred because the test never filled more
than one entry in the notFullPage array; which is unsurprising because
it exercised index cleanup only in the scenario of complete table
deletion, allowing every page in the index to become deleted rather
than not-full. Add testing that allows the not-full path to be
exercised as well.
Also, test the amvalidate function, because blvalidate.c had zero
coverage without that, and besides it's a good idea to check for
mistakes in the bloom opclass definitions.
That routine is dangerous, and unnecessary once we get rid of this
one caller.
In passing, fix failure to clean up temp memory context, or switch
back to caller's context, during slowest exit path.
This feature is controlled by a new old_snapshot_threshold GUC. A
value of -1 disables the feature, and that is the default. The
value of 0 is just intended for testing. Above that it is the
number of minutes a snapshot can reach before pruning and vacuum
are allowed to remove dead tuples which the snapshot would
otherwise protect. The xmin associated with a transaction ID does
still protect dead tuples. A connection which is using an "old"
snapshot does not get an error unless it accesses a page modified
recently enough that it might not be able to produce accurate
results.
This is similar to the Oracle feature, and we use the same SQLSTATE
and error message for compatibility.
This patch is a no-op patch which is intended to reduce the chances
of failures of omission once the functional part of the "snapshot
too old" patch goes in. It adds parameters for snapshot, relation,
and an enum to specify whether the snapshot age check needs to be
done for the page at this point. This initial patch passes NULL
for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the
third. The follow-on patch will change the places where the test
needs to be made.
Now indexes (but only B-tree for now) can contain "extra" column(s) which
don't participate in the index structure; they are just stored in leaf
tuples. This allows an index-only scan to use a single index instead
of two or more indexes.
Author: Anastasia Lubennikova with minor editorializing by me
Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
This patch introduces a new text search operator (<-> or <DISTANCE>) into tsquery.
The on-disk and binary in/out formats of tsquery are backward compatible.
It has two side effects:
- the sort order of tsquery changes, so users who have a btree index over
tsquery should reindex it
- fewer parentheses appear in tsquery output, so tsquery becomes more
readable
Authors: Teodor Sigaev, Oleg Bartunov, Dmitry Ivanov
Reviewers: Alexander Korotkov, Artur Zakirov
API and mechanism to allow generic messages to be inserted into WAL that are
intended to be read by logical decoding plugins. This commit adds an optional
new callback to the logical decoding API.
Messages are either text or bytea. Messages can be transactional, or not, and
are identified by a prefix to allow multiple concurrent decoding plugins.
(Not to be confused with Generic WAL records, which are intended to allow crash
recovery of extensible objects.)
Author: Petr Jelinek and Andres Freund
Reviewers: Artur Zakirov, Tomas Vondra, Simon Riggs
Discussion: 5685F999.6010202@2ndquadrant.com
The restore() function assumed that the result of sprintf() with %e format
would necessarily contain an 'e', which is false: what if the supplied
number is an infinity or NaN? If that did happen, we'd get a
null-pointer-dereference core dump. The case appears impossible currently,
because seg_in() does not accept such values, and there are no seg-creating
functions that would create one. But it seems unwise to rely on it never
happening in future.
Quite aside from that, the code was pretty ugly: it relied on modifying a
static format string when it could use a "*" precision argument, and it
used strtok() entirely gratuitously, and it stripped off trailing spaces
by hand instead of just not asking for them to begin with.
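The two points above (a "*" precision argument instead of a modified
format string, and not assuming the output contains an 'e') can be
sketched as follows. The helper name and signature are hypothetical;
this is not the seg module's restore() function.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: format val with n significant digits in %e style
 * and extract the decimal exponent.  Returns 1 on success, or 0 if the
 * output has no exponent part (as for an infinity or NaN).
 */
static int
format_sci(char *out, size_t outlen, double val, int n, int *exponent)
{
    const char *e;

    assert(n >= 1);
    /* "*" takes the precision from the argument list */
    snprintf(out, outlen, "%.*e", n - 1, val);

    e = strchr(out, 'e');
    if (e == NULL)
        return 0;               /* no exponent: do not dereference */
    *exponent = atoi(e + 1);
    return 1;
}
```

Checking the strchr() result is what prevents the null-pointer
dereference in the infinity and NaN cases.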
Coverity noticed the potential null pointer dereference (though I wonder
why it didn't complain years ago, since this code is ancient).
Since this is just code cleanup and forestalling a hypothetical future
bug, there seems no need for back-patching.
The code was supposing that rd_amcache wouldn't disappear from under it
during a scan; which is wrong. Copy the data out of the relcache rather
than trying to reference it there.
Coverity complained about implicit sign-extension in the
BloomPageGetFreeSpace macro, probably because sizeOfBloomTuple isn't wide
enough for size calculations. No overflow is really possible as long as
maxoff and sizeOfBloomTuple are small enough to represent a realistic
situation, but it seems like a good idea to declare sizeOfBloomTuple as
Size not int32.
Add missing check on BloomPageAddItem() result, again from Coverity.
Avoid core dump due to not allocating so->sign array when
scan->numberOfKeys is zero. Also thanks to Coverity.
Use FLEXIBLE_ARRAY_MEMBER rather than declaring an array as size 1
when it isn't necessarily.
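As a rough illustration of the flexible-array point, here is a sketch
in plain C99, where the trailing array is declared with empty brackets.
The struct and function names are hypothetical and are not the bloom
module's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct SigTuple
{
    uint32_t    nwords;
    uint16_t    sign[];         /* flexible array member: length is set
                                 * at allocation time, not fixed at 1 */
} SigTuple;

/* Allocate a zeroed tuple with room for nwords signature words. */
static SigTuple *
sig_tuple_alloc(uint32_t nwords)
{
    SigTuple   *t;

    t = calloc(1, offsetof(SigTuple, sign) + nwords * sizeof(uint16_t));
    if (t != NULL)
        t->nwords = nwords;
    return t;
}

/* Bounds-checked store into the variable-length part. */
static void
sig_tuple_set(SigTuple *t, uint32_t i, uint16_t word)
{
    assert(i < t->nwords);
    t->sign[i] = word;
}
```

Declaring the array as size 1 tells the compiler that only index 0 is
valid, which is why an honest flexible-array declaration is safer when
the real length varies.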
Very minor beautification of related code.
Unfortunately, none of the Coverity-detected mistakes look like they
could account for the remaining buildfarm unhappiness with this
module. It's barely possible that the FLEXIBLE_ARRAY_MEMBER mistake
does account for that, if it's enabling bogus compiler optimizations;
but I'm not terribly optimistic. We probably still have bugs to
find here.
Looking at the results from buildfarm member jaguarundi, it seems to me that
BloomOptions is sometimes not initialized, but I don't yet see how that's
possible. Nevertheless, a check of the signature length is missing, so add
a limit on it. Also add a missing GenericXLogAbort() call for the case of an
already deleted page in vacuum, plus minor code refactoring.
This module provides a new access method. It is actually a simple Bloom filter
implemented as a pgsql index. It could give some benefit for searches
over a large number of columns.
The module is also the only way to test the generic WAL interface committed earlier.
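The Bloom-filter idea behind such an index can be sketched in a few
lines. This is a toy illustration with assumed parameters (a 64-bit
signature, 3 bits per value) and a made-up hash function; it does not
reflect the module's actual signature layout or hashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIG_BITS        64      /* assumed signature width */
#define BITS_PER_VALUE  3       /* assumed bits set per column value */

/* Simple integer mixer, for illustration only. */
static uint32_t
mix(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x45d9f3bu;
    x ^= x >> 16;
    x *= 0x45d9f3bu;
    x ^= x >> 16;
    return x;
}

/* Fold one column value into a row signature. */
static uint64_t
sig_add(uint64_t sig, uint32_t value)
{
    uint32_t    h = mix(value);

    for (int i = 0; i < BITS_PER_VALUE; i++)
    {
        sig |= (uint64_t) 1 << (h % SIG_BITS);
        h = mix(h + (uint32_t) i + 1);
    }
    return sig;
}

/*
 * True if the row might contain the value.  False positives are possible
 * and would be rechecked against the heap; false negatives are not.
 */
static bool
sig_may_contain(uint64_t row_sig, uint32_t value)
{
    uint64_t    query = sig_add(0, value);

    assert(query != 0);         /* every value sets at least one bit */
    return (row_sig & query) == query;
}
```

Because every column's value is folded into one signature per row, a
single scan of the signatures can serve a query on any subset of the
indexed columns.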
Author: Teodor Sigaev, Alexander Korotkov
Reviewers: Aleksander Alekseev, Michael Paquier, Jim Nasby
Commit fbe5a3fb73 accidentally changed
this behavior; put things back the way they were, and add some
regression tests.
Report by Andres Freund; patch by Ashutosh Bapat, with a bit of
kibitzing by me.
brin_page_type() and brin_metapage_info() did not enforce being called
by superuser, like other pageinspect functions that take bytea do.
Since they don't verify the passed page thoroughly, it is possible to
use them to read the server memory with a carefully crafted bytea value,
up to a few kilobytes from where the input bytea is located.
Have them throw errors if called by a non-superuser.
Report and initial patch: Andreas Seltenreich
Security: CVE-2016-3065
A join clause might mention multiple relations on either side, so it
need not be the case that a given joinrel's constituent relations are
all on one side of the join clause or all on the other.
Report by Rajkumar Raghuwanshi. Analysis and fix by Michael Paquier
and Ashutosh Bapat.
The two get_tle_by_resno() calls introduced by this commit lacked any
check for a NULL return, unlike any other calls of that function anywhere
in our tree. Coverity quite properly complained about it. Also fix a
misindented line in process_query_params(), which Coverity also complained
about on the grounds that the bad indentation suggested possible programmer
misinterpretation.
postgres_fdw can now send an UPDATE or DELETE statement directly to
the foreign server in simple cases, rather than sending a SELECT FOR
UPDATE statement and then updating or deleting rows one-by-one.
Etsuro Fujita, reviewed by Rushabh Lathia, Shigeru Hanada, Kyotaro
Horiguchi, Albe Laurenz, Thom Brown, and me.
The deprecated set_limit() is modified to use SetConfigOption() to set
similarity_threshold, which is actually an instance of the
pg_trgm.similarity_threshold GUC variable. The previous coding set
similarity_threshold directly, which could cause an inconsistency between
the state of the actual variable and its GUC representation.
Per gripe from Tom Lane
This patch introduces a concept of similarity between one string and just a
word from another string.
The extension version is not changed because 1.2 was already introduced in the
9.6 release cycle, so there hasn't been a public version of it.
Author: Alexander Korotkov, Artur Zakirov
Use the GUC variable pg_trgm.similarity_threshold instead of
set_limit()/show_limit(), which were introduced when modules could not
define GUC variables.
Author: Artur Zakirov
There's no reason for this function to do this for every other
attribute number and omit it for CTID, especially since
conversion_error_callback has code to handle that case. This seems
to be an oversight in commit e690b95150.
Etsuro Fujita