Commit 3151f16e18 was intended to be
a commit of a patch from Ashutosh Bapat, but instead I mistakenly
committed an earlier version from Michael Paquier (because both
patches were submitted with the same filename, and I confused them).
Michael's patch fixes the crash but doesn't actually implement the
correct test.
Repair the incorrect logic, and also expand the comments considerably
so that this is all more clear.
Ashutosh Bapat and Robert Haas
First, even if we cancel a query, we still have to roll back the
containing transaction; otherwise, the session will be left in a
failed transaction state.
Second, we need to support canceling queries whe aborting a
subtransaction as well as when aborting a toplevel transaction.
Etsuro Fujita, reviewed by Michael Paquier
This fixes a problem which is not new, but with the advent of direct
foreign table modification in 0bf3ae88af,
it's somewhat more likely to be annoying than previously. So,
arrange for a local query cancelation to propagate to the remote side.
Michael Paquier, reviewed by Etsuro Fujita. Original report by
Thom Brown.
If there's a filter condition on either side of a full outer join,
it is neither correct to attach it to the join's ON clause nor to
throw it into the toplevel WHERE clause. Just don't push down the
join in that case.
To maximize the number of cases where we can still push down full
joins, push inner join conditions into the ON clause at the first
opportunity rather than postponing them to the top-level WHERE
clause. This produces nicer SQL, anyway.
This bug was introduced in e4106b2528.
Ashutosh Bapat, per report from Rajkumar Raghuwanshi.
Previously, querying the xmin column of a single postgres_fdw foreign
table fetched the tuple length, xmax the typmod, and cmin or cmax the
composite type OID of the tuple. However, when you queried several
such tables and the join got shipped to the remote side, these columns
ended up containing the remote values of the corresponding columns.
Both behaviors are rather unprincipled, the former for obvious reasons
and the latter because the remote values of these columns don't have
any local significance; our transaction IDs are in a different space
than those of the remote machine. Clean this up by setting all of
these fields to 0 in both cases. Also fix the handling of tableoid
to be sane.
Robert Haas and Ashutosh Bapat, reviewed by Etsuro Fujita.
Commit fbe5a3fb73 accidentally changed
this behavior; put things back the way they were, and add some
regression tests.
Report by Andres Freund; patch by Ashutosh Bapat, with a bit of
kibitzing by me.
A join clause might mention multiple relations on either side, so it
need not be the case that a given joinrel's constituent relations are
all on one side of the join clause or all on the other.
Report by Rajkumar Raghuwanshi. Analysis and fix by Michael Paquier
and Ashutosh Bapat.
The two get_tle_by_resno() calls introduced by this commit lacked any
check for a NULL return, unlike any other calls of that function anywhere
in our tree. Coverity quite properly complained about it. Also fix a
misindented line in process_query_params(), which Coverity also complained
about on the grounds that the bad indentation suggested possible programmer
misinterpretation.
postgres_fdw can now sent an UPDATE or DELETE statement directly to
the foreign server in simple cases, rather than sending a SELECT FOR
UPDATE statement and then updating or deleting rows one-by-one.
Etsuro Fujita, reviewed by Rushabh Lathia, Shigeru Hanada, Kyotaro
Horiguchi, Albe Laurenz, Thom Brown, and me.
There's no reason for this function to do this for every other
attribute number and omit it for CTID, especially since
conversion_error_callback has code to handle that case. This seems
to be an oversight in commit e690b95150.
Etsuro Fujita
Although the default choice of rel->reltarget should typically be
sufficient for scan or join paths, it's not at all sufficient for the
purposes PathTargets were invented for; in particular not for
upper-relation Paths. So break API compatibility by adding a PathTarget
argument to create_foreignscan_path(). To ease updating of existing
code, accept a NULL value of the argument as selecting rel->reltarget.
In commit 19a541143a I did not make PathTarget a subtype of Node,
and embedded a RelOptInfo's reltarget directly into it rather than having
a separately-allocated Node. In hindsight that was misguided
micro-optimization, enabled by the fact that at that point we didn't have
any Paths with custom PathTargets. Now that PathTarget processing has
been fleshed out some more, it's easier to see that it's better to have
PathTarget as an indepedent Node type, even if it does cost us one more
palloc to create a RelOptInfo. So change it while we still can.
This commit just changes the representation, without doing anything more
interesting than that.
In commit 1d97c19a0f and later c1d9579dd8, we extended
pull_var_clause's API by adding enum-type arguments. That's sort of a pain
to maintain, though, because it means every time we add a new behavior we
must touch every last one of the call sites, even if there's a reasonable
default behavior that most of them could use. Let's switch over to using a
bitmask of flags, instead; that seems more maintainable and might save a
nanosecond or two as well. This commit changes no behavior in itself,
though I'm going to follow it up with one that does add a new behavior.
In passing, remove flatten_tlist(), which has not been used since 9.1
and would otherwise need the same API changes.
Removing these enums means that optimizer/tlist.h no longer needs to
depend on optimizer/var.h. Changing that caused a number of C files to
need addition of #include "optimizer/var.h" (probably we can thank old
runs of pgrminclude for that); but on balance it seems like a good change
anyway.
Commit ccd8f97922 gave us the ability to
request that the remote side sort the data, and, later, commit
e4106b2528 gave us the ability to
request that the remote side perform the join for us rather than doing
it locally. But we could not do both things at the same time: a
remote SQL query that had an ORDER BY clause would never be a join.
This commit adds that capability.
Ashutosh Bapat, reviewed by me.
Previously, we included NULLS FIRST when appropriate but relied on the
default behavior to be NULLS LAST. This is, however, not true for a
sort in descending order and seems like a fragile assumption anyway.
Report by Rajkumar Raghuwanshi. Patch by Ashutosh Bapat. Review
comments from Michael Paquier and Tom Lane.
list_concat(list_concat(a, b), c) destructively changes both a and b;
to avoid such perils, copy lists of remote_conds before incorporating
them into larger lists via list_concat().
Ashutosh Bapat, per a report from Etsuro Fujita
Up to now, there's been an assumption that all Paths for a given relation
compute the same output column set (targetlist). However, there are good
reasons to remove that assumption. For example, an indexscan on an
expression index might be able to return the value of an expensive function
"for free". While we have the ability to generate such a plan today in
simple cases, we don't have a way to model that it's cheaper than a plan
that computes the function from scratch, nor a way to create such a plan
in join cases (where the function computation would normally happen at
the topmost join node). Also, we need this so that we can have Paths
representing post-scan/join steps, where the targetlist may well change
from one step to the next. Therefore, invent a "struct PathTarget"
representing the columns we expect a plan step to emit. It's convenient
to include the output tuple width and tlist evaluation cost in this struct,
and there will likely be additional fields in future.
While Path nodes that actually do have custom outputs will need their own
PathTargets, it will still be true that most Paths for a given relation
will compute the same tlist. To reduce the overhead added by this patch,
keep a "default PathTarget" in RelOptInfo, and allow Paths that compute
that column set to just point to their parent RelOptInfo's reltarget.
(In the patch as committed, actually every Path is like that, since we
do not yet have any cases of custom PathTargets.)
I took this opportunity to provide some more-honest costing of
PlaceHolderVar evaluation. Up to now, the assumption that "scan/join
reltargetlists have cost zero" was applied not only to Vars, where it's
reasonable, but also PlaceHolderVars where it isn't. Now, we add the eval
cost of a PlaceHolderVar's expression to the first plan level where it can
be computed, by including it in the PathTarget cost field and adding that
to the cost estimates for Paths. This isn't perfect yet but it's much
better than before, and there is a way forward to improve it more. This
costing change affects the join order chosen for a couple of the regression
tests, changing expected row ordering.
If we've got a relatively straightforward join between two tables,
this pushes that join down to the remote server instead of fetching
the rows for each table and performing the join locally. Some cases
are not handled yet, such as SEMI and ANTI joins. Also, we don't
yet attempt to create presorted join paths or parameterized join
paths even though these options do get tried for a base relation
scan. Nevertheless, this seems likely to be a very significant win
in many practical cases.
Shigeru Hanada and Ashutosh Bapat, reviewed by Robert Haas, with
additional review at various points by Tom Lane, Etsuro Fujita,
KaiGai Kohei, and Jeevan Chalke.
deparseReturningList ended up adding up RETURNING NULL to the code, but
code elsewhere saw an empty list of attributes and concluded that it
should not expect tuples from the remote side.
Etsuro Fujita and Robert Haas, reviewed by Thom Brown
Remove duplicate assignment. This part by Ashutosh Bapat.
Remove now-obsolete comment. This part by me, although the pending
join pushdown patch does something similar, and for the same reason:
there's no reason to keep two lists of the things in the fdw_private
structure that have to be kept in sync with each other.
The default fetch size of 100 rows might not be right in every
environment, so allow users to configure it.
Corey Huinker, reviewed by Kyotaro Horiguchi, Andres Freund, and me.
The code that generates a complete SQL query for a given foreign relation
was repeated in two places, and they didn't quite agree: the EXPLAIN case
left out the locking clause. Centralize the code so we get the same
behavior everywhere, and adjust calling conventions and which functions
are static vs. extern accordingly . Centralize the code so we get the same
behavior everywhere, and adjust calling conventions and which functions
are static vs. extern accordingly.
Ashutosh Bapat, reviewed and slightly adjusted by me.
listForeignTables' invocation of processSQLNamePattern did not match up
with the other ones that handle potentially-schema-qualified names; it
failed to make use of pg_table_is_visible() and also passed the name
arguments in the wrong order. Bug seems to have been aboriginal in commit
0d692a0dc9. It accidentally sort of worked as long as you didn't
inquire too closely into the behavior, although the silliness was later
exposed by inconsistencies in the test queries added by 59efda3e50
(which I probably should have questioned at the time, but didn't).
Per bug #13899 from Reece Hart. Patch by Reece Hart and Tom Lane.
Back-patch to all affected branches.
The upcoming patch to allow join pushdown in postgres_fdw needs to use
this code multiple times, which requires moving it to deparse.c. That
seems like a good idea anyway, so do that now both on general principle
and to simplify the future patch.
Inspired by a patch by Shigeru Hanada and Ashutosh Bapat, but I did
it a little differently than what that patch did.
Previously, postgres_fdw's connection cache was keyed by user OID and
server OID, but this can lead to multiple connections when it's not
really necessary. In particular, if all relevant users are mapped to
the public user mapping, then their connection options are certainly
the same, so one connection can be used for all of them.
While we're cleaning things up here, drop the "server" argument to
GetConnection(), which isn't really needed. This saves a few cycles
because callers no longer have to look this up; the function itself
does, but only when establishing a new connection, not when reusing
an existing one.
Ashutosh Bapat, with a few small changes by me.
postgresImportForeignSchema pays attention to IMPORT's EXCEPT and LIMIT TO
options, but only as an efficiency hack, not for correctness' sake. The
FDW documentation does explain that, but someone using postgres_fdw.c
as a coding guide might not remember it, so let's add a comment here.
Per question from Regina Obe.
When use_remote_estimate is enabled, consider adding ORDER BY to the
query we sending to the remote server so that we can use that ordered
data for a merge join. Commit f18c944b61
arranges to push down the query pathkeys, which seems like the case
mostly likely to be a win, but testing shows this can sometimes win,
too.
For a regular table, we know which indexes are present and therefore
test whether the ordering provided by each such index is useful. Here,
we take the opposite approach: guess what orderings would be useful if
they could be generated cheaply, and then ask the remote side what those
will cost.
Ashutosh Bapat, with very substantial cosmetic revisions by me. Also
reviewed by Rushabh Lathia.
Commit e7cb7ee145 provided basic
infrastructure for allowing a foreign data wrapper or custom scan
provider to replace a join of one or more tables with a scan.
However, this infrastructure failed to take into account the need
for possible EvalPlanQual rechecks, and ExecScanFetch would fail
an assertion (or just overwrite memory) if such a check was attempted
for a plan containing a pushed-down join. To fix, adjust the EPQ
machinery to skip some processing steps when scanrelid == 0, making
those the responsibility of scan's recheck method, which also has
the responsibility in this case of correctly populating the relevant
slot.
To allow foreign scans to gain control in the right place to make
use of this new facility, add a new, optional RecheckForeignScan
method. Also, allow a foreign scan to have a child plan, which can
be used to correctly populate the slot (or perhaps for something
else, but this is the only use currently envisioned).
KaiGai Kohei, reviewed by Robert Haas, Etsuro Fujita, and Kyotaro
Horiguchi.
Rather than relying on other extensions to be available for installation,
let's just add some test objects to the postgres_fdw extension itself
within the regression script.
The user can whitelist specified extension(s) in the foreign server's
options, whereupon we will treat immutable functions and operators of those
extensions as candidates to be sent for remote execution.
Whitelisting an extension in this way basically promises that the extension
exists on the remote server and behaves compatibly with the local instance.
We have no way to prove that formally, so we have to rely on the user to
get it right. But this seems like something that people can usually get
right in practice.
We might in future allow functions and operators to be whitelisted
individually, but extension granularity is a very convenient special case,
so it got done first.
The patch as-committed lacks any regression tests, which is unfortunate,
but introducing dependencies on other extensions for testing purposes
would break "make installcheck" scenarios, which is worse. I have some
ideas about klugy ways around that, but it seems like material for a
separate patch. For the moment, leave the problem open.
Paul Ramsey, hacked up a bit more by me
If the join problem's entire ORDER BY clause can be pushed to the
remote server, consider a path that adds this ORDER BY clause. If
use_remote_estimate is on, we cost this path using an additional
remote EXPLAIN. If not, we just estimate that the path costs 20%
more, which is intended to be large enough that we won't request a
remote sort when it's not helpful, but small enough that we'll have
the remote side do the sort when in doubt. In some cases, the remote
sort might actually be free, because the remote query plan might
happen to produce output that is ordered the way we need, but without
remote estimates we have no way of knowing that.
It might also be useful to request sorted output from the remote side
if it enables an efficient merge join, but this patch doesn't attempt
to handle that case.
Ashutosh Bapat with revisions by me. Also reviewed by Fabrízio de Royes
Mello and Jeevan Chalke.
This fixes a long-standing bug which was discovered while investigating
the interaction between the new join pushdown code and the EvalPlanQual
machinery: if a ForeignScan appears on the inner side of a paramaterized
nestloop, an EPQ recheck would re-return the original tuple even if
it no longer satisfied the pushed-down quals due to changed parameter
values.
This fix adds a new member to ForeignScan and ForeignScanState and a
new argument to make_foreignscan, and requires changes to FDWs which
push down quals to populate that new argument with a list of quals they
have chosen to push down. Therefore, I'm only back-patching to 9.5,
even though the bug is not new in 9.5.
Etsuro Fujita, reviewed by me and by Kyotaro Horiguchi.
If we have a local Var of say varchar type with default collation, and
we apply a RelabelType to convert that to text with default collation, we
don't want to consider that as creating an FDW_COLLATE_UNSAFE situation.
It should be okay to compare that to a remote Var, so long as the remote
Var determines the comparison collation. (When we actually ship such an
expression to the remote side, the local Var would become a Param with
default collation, meaning the remote Var would in fact control the
comparison collation, because non-default implicit collation overrides
default implicit collation in parse_collate.c.) To fix, be more precise
about what FDW_COLLATE_NONE means: it applies either to a noncollatable
data type or to a collatable type with default collation, if that collation
can't be traced to a remote Var. (When it can, FDW_COLLATE_SAFE is
appropriate.) We were essentially using that interpretation already at
the Var/Const/Param level, but we weren't bubbling it up properly.
An alternative fix would be to introduce a separate FDW_COLLATE_DEFAULT
value to describe the second situation, but that would add more code
without changing the actual behavior, so it didn't seem worthwhile.
Also, since we're clarifying the rule to be that we care about whether
operator/function input collations match, there seems no need to fail
immediately upon seeing a Const/Param/non-foreign-Var with nondefault
collation. We only have to reject if it appears in a collation-sensitive
context (for example, "var IS NOT NULL" is perfectly safe from a collation
standpoint, whatever collation the var has). So just set the state to
UNSAFE rather than failing immediately.
Per report from Jeevan Chalke. This essentially corrects some sloppy
thinking in commit ed3ddf918b, so back-patch
to 9.3 where that logic appeared.
Patch by David Rowley. Backpatch to 9.5, as some of the calls were new in
9.5, and keeping the code in sync with master makes future backpatching
easier.
Add a TABLESAMPLE clause to SELECT statements that allows
user to specify random BERNOULLI sampling or block level
SYSTEM sampling. Implementation allows for extensible
sampling functions to be written, using a standard API.
Basic version follows SQLStandard exactly. Usable
concrete use cases for the sampling API follow in later
commits.
Petr Jelinek
Reviewed by Michael Paquier and Simon Riggs
If a postgres_fdw foreign table is a non-locked source relation in an
UPDATE, DELETE, or SELECT FOR UPDATE/SHARE, and the query selects its
ctid column, the wrong value would be returned if an EvalPlanQual
recheck occurred. This happened because the foreign table's result row
was copied via the ROW_MARK_COPY code path, and EvalPlanQualFetchRowMarks
just unconditionally set the reconstructed tuple's t_self to "invalid".
To fix that, we can have EvalPlanQualFetchRowMarks copy the composite
datum's t_ctid field, and be sure to initialize that along with t_self
when postgres_fdw constructs a tuple to return.
If we just did that much then EvalPlanQualFetchRowMarks would start
returning "(0,0)" as ctid for all other ROW_MARK_COPY cases, which perhaps
does not matter much, but then again maybe it might. The cause of that is
that heap_form_tuple, which is the ultimate source of all composite datums,
simply leaves t_ctid as zeroes in newly constructed tuples. That seems
like a bad idea on general principles: a field that's really not been
initialized shouldn't appear to have a valid value. So let's eat the
trivial additional overhead of doing "ItemPointerSetInvalid(&(td->t_ctid))"
in heap_form_tuple.
This closes out our handling of Etsuro Fujita's report that tableoid and
ctid weren't correctly set in postgres_fdw EvalPlanQual cases. Along the
way we did a great deal of work to improve FDWs' ability to control row
locking behavior; which was not wasted effort by any means, but it didn't
end up being a fix for this problem because that feature would be too
expensive for postgres_fdw to use all the time.
Although the fix for the tableoid misbehavior was back-patched, I'm
hesitant to do so here; it seems far less likely that people would care
about remote ctid than tableoid, and even such a minor behavioral change
as this in heap_form_tuple is perhaps best not back-patched. So commit
to HEAD only, at least for the moment.
Etsuro Fujita, with some adjustments by me