diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 83e8d7ec16..a867cd885e 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -7,7 +7,7 @@ actual output plan, the /path code generates all possible ways to join the
 tables, and /prep handles special cases like inheritance. /util is utility
 stuff. /geqo is the separate "genetic optimization" planner --- it does
 a semi-random search through the join tree space, rather than exhaustively
-considering all possible join trees. (But each join considered by geqo
+considering all possible join trees. (But each join considered by /geqo
 is given to /path to create paths for, so we consider all possible
 implementation paths for each specific join even in GEQO mode.)
 
@@ -40,7 +40,7 @@ the WHERE clause "tab1.col1 = tab2.col1" generates a JoinInfo for tab1
 listing tab2 as an unjoined relation, and also one for tab2 showing tab1
 as an unjoined relation.
 
-If we have only a single base relation in the query, we are done here.
+If we have only a single base relation in the query, we are done now.
 Otherwise we have to figure out how to join the base relations into a
 single join relation.
 
@@ -225,5 +225,185 @@ way, the next level up will have the maximum freedom to build mergejoins
 without sorting, since it can pick from any of the paths retained for
 its inputs.
 
-See path/pathkeys.c for an explanation of the PathKeys data structure that
-represents what is known about the sort order of a particular Path.
+
+PathKeys
+--------
+
+The PathKeys data structure represents what is known about the sort order
+of a particular Path.
+
+Path.pathkeys is a List of Lists of PathKeyItem nodes that represent
+the sort order of the result generated by the Path. The n'th sublist
+represents the n'th sort key of the result.
+
+In single/base relation RelOptInfo's, the Paths represent various ways
+of scanning the relation and the resulting ordering of the tuples.
+Sequential scan Paths have NIL pathkeys, indicating no known ordering.
+Index scans have Path.pathkeys that represent the chosen index's ordering,
+if any. A single-key index would create a pathkey with a single sublist,
+e.g. ( (tab1.indexkey1/sortop1) ). A multi-key index generates a sublist
+per key, e.g. ( (tab1.indexkey1/sortop1) (tab1.indexkey2/sortop2) ) which
+shows major sort by indexkey1 (ordering by sortop1) and minor sort by
+indexkey2 with sortop2.
+
+Note that a multi-pass indexscan (OR clause scan) has NIL pathkeys since
+we can say nothing about the overall order of its result. Also, an
+indexscan on an unordered type of index generates NIL pathkeys. However,
+we can always create a pathkey by doing an explicit sort. The pathkeys
+for a Sort plan's output just represent the sort key fields and the
+ordering operators used.
+
+Things get more interesting when we consider joins. Suppose we do a
+mergejoin between A and B using the mergeclause A.X = B.Y. The output
+of the mergejoin is sorted by X --- but it is also sorted by Y. We
+represent this fact by listing both keys in a single pathkey sublist:
+( (A.X/xsortop B.Y/ysortop) ). This pathkey asserts that the major
+sort order of the Path can be taken to be *either* A.X or B.Y.
+They are equal, so they are both primary sort keys. By doing this,
+we allow future joins to use either var as a pre-sorted key, so upper
+Mergejoins may be able to avoid having to re-sort the Path. This is
+why pathkeys is a List of Lists.
+
+We keep a sortop associated with each PathKeyItem because cross-data-type
+mergejoins are possible; for example int4 = int8 is mergejoinable.
+In this case we need to remember that the left var is ordered by int4lt
+while the right var is ordered by int8lt. So the different members of
+each sublist could have different sortops.
+
+Note that while the order of the top list is meaningful (primary vs.
+secondary sort key), the order of each sublist is arbitrary. Each sublist
+should be regarded as a set of equivalent keys, with no significance
+to the list order.
+
+With a little further thought, it becomes apparent that pathkeys for
+joins need not only come from mergejoins. For example, if we do a
+nestloop join between outer relation A and inner relation B, then any
+pathkeys relevant to A are still valid for the join result: we have
+not altered the order of the tuples from A. Even more interesting,
+if there was a mergeclause (more formally, an "equijoin clause") A.X=B.Y,
+and A.X was a pathkey for the outer relation A, then we can assert that
+B.Y is a pathkey for the join result; X was ordered before and still is,
+and the joined values of Y are equal to the joined values of X, so Y
+must now be ordered too. This is true even though we used neither an
+explicit sort nor a mergejoin on Y.
+
+More generally, whenever we have an equijoin clause A.X = B.Y and a
+pathkey A.X, we can add B.Y to that pathkey if B is part of the joined
+relation the pathkey is for, *no matter how we formed the join*. It works
+as long as the clause has been applied at some point while forming the
+join relation. (In the current implementation, we always apply qual
+clauses as soon as possible, ie, as far down in the plan tree as possible.
+So we can always make this deduction. If we postponed filtering by qual
+clauses then we'd not be able to assume pathkey equivalence until after
+the equality check(s) had been applied.)
+
+In short, then: when producing the pathkeys for a merge or nestloop join,
+we can keep all of the keys of the outer path, since the ordering of the
+outer path will be preserved in the result. Furthermore, we can add to
+each pathkey sublist any inner vars that are equijoined to any of the
+outer vars in the sublist; this works regardless of whether we are
+implementing the join using that equijoin clause as a mergeclause,
+or merely enforcing the clause after-the-fact as a qpqual filter.
+
+Although Hashjoins also work only with equijoin operators, it is *not*
+safe to consider the output of a Hashjoin to be sorted in any particular
+order --- not even the outer path's order. This is true because the
+executor might have to split the join into multiple batches. Therefore
+a Hashjoin is always given NIL pathkeys. (Also, we need to use only
+mergejoinable operators when deducing which inner vars are now sorted,
+because a mergejoin operator tells us which left- and right-datatype
+sortops can be considered equivalent, whereas a hashjoin operator
+doesn't imply anything about sort order.)
+
+Pathkeys are also useful to represent an ordering that we wish to achieve,
+since they are easily compared to the pathkeys of a potential candidate
+path. So, SortClause lists are turned into pathkeys lists for use inside
+the optimizer.
+
+OK, now for how it *really* works:
+
+We did implement pathkeys just as described above, and found that the
+planner spent a huge amount of time comparing pathkeys, because the
+representation of pathkeys as unordered lists made it expensive to decide
+whether two were equal or not. So, we've modified the representation
+as described next.
+
+If we scan the WHERE clause for equijoin clauses (mergejoinable clauses)
+during planner startup, we can construct lists of equivalent pathkey items
+for the query. There could be more than two items per equivalence set;
+for example, WHERE A.X = B.Y AND B.Y = C.Z AND D.R = E.S creates the
+equivalence sets { A.X B.Y C.Z } and { D.R E.S } (plus associated sortops).
+Any pathkey item that belongs to an equivalence set implies that all the
+other items in its set apply to the relation too, or at least all the ones
+that are for fields present in the relation. (Some of the items in the
+set might be for as-yet-unjoined relations.) Furthermore, any multi-item
+pathkey sublist that appears at any stage of planning the query *must* be
+a subset of one or another of these equivalence sets; there's no way we'd
+have put two items in the same pathkey sublist unless they were equijoined
+in WHERE.
+
+Now suppose that we allow a pathkey sublist to contain pathkey items for
+vars that are not yet part of the pathkey's relation. This introduces
+no logical difficulty, because such items can easily be seen to be
+irrelevant; we just mandate that they be ignored. But having allowed
+this, we can declare (by fiat) that any multiple-item pathkey sublist
+must be "equal()" to the appropriate equivalence set. In effect,
+whenever we make a pathkey sublist that mentions any var appearing in an
+equivalence set, we instantly add all the other vars equivalenced to it,
+whether they appear yet in the pathkey's relation or not. And we also
+mandate that the pathkey sublist appear in the same order as the
+equivalence set it comes from. (In practice, we simply return a pointer
+to the relevant equivalence set without building any new sublist at all.
+Each equivalence set becomes a "canonical pathkey" for all its members.)
+This makes comparing pathkeys very simple and fast, and saves a lot of
+work and memory space for pathkey construction as well.
+
+Note that pathkey sublists having just one item still exist, and are
+not expected to be equal() to any equivalence set. This occurs when
+we describe a sort order that involves a var that's not mentioned in
+any equijoin clause of the WHERE. We could add singleton sets containing
+such vars to the query's list of equivalence sets, but there's little
+point in doing so.
+
+By the way, it's OK and even useful for us to build equivalence sets
+that mention multiple vars from the same relation. For example, if
+we have WHERE A.X = A.Y and we are scanning A using an index on X,
+we can legitimately conclude that the path is sorted by Y as well;
+and this could be handy if Y is the variable used in other join clauses
+or ORDER BY. So, any WHERE clause with a mergejoinable operator can
+contribute to an equivalence set, even if it's not a join clause.
+
+As sketched so far, equijoin operators allow us to conclude that
+A.X = B.Y and B.Y = C.Z together imply A.X = C.Z, even when different
+datatypes are involved. What is not immediately obvious is that to use
+the "canonical pathkey" representation, we *must* make this deduction.
+An example (from a real bug in Postgres 7.0) is a mergejoin for a query
+like
+    SELECT * FROM t1, t2 WHERE t1.f2 = t2.f3 AND t1.f1 = t2.f3;
+The canonical-pathkey mechanism is able to deduce that t1.f1 = t1.f2
+(ie, both appear in the same canonical pathkey set). If we sort t1
+and then apply a mergejoin, we *must* filter the t1 tuples using the
+implied qualification f1 = f2, because otherwise the output of the sort
+will be ordered by f1 or f2 (whichever we sort on) but not both. The
+merge will then fail since (depending on which qual clause it applies
+first) it's expecting either ORDER BY f1,f2 or ORDER BY f2,f1, but the
+actual output of the sort has neither of these orderings. The best fix
+for this is to generate all the implied equality constraints for each
+equijoin set and add these clauses to the query's qualification list.
+In other words, we *explicitly* deduce f1 = f2 and add this to the WHERE
+clause. The constraint will be applied as a qpqual to the output of the
+scan on t1, resulting in sort output that is indeed ordered by both vars.
+This approach provides more information to the selectivity estimation
+code than it would otherwise have, and reduces the number of tuples
+processed in join stages, so it's a win to make these deductions even
+if we weren't forced to.
+
+Yet another implication of all this is that mergejoinable operators
+must form closed equivalence sets. For example, if "int2 = int4"
+and "int4 = int8" are both marked mergejoinable, then there had better
+be a mergejoinable "int2 = int8" operator as well. Otherwise, when
+we're given WHERE int2var = int4var AND int4var = int8var, we'll fail
+while trying to create a representation of the implied clause
+int2var = int8var.
+
+-- bjm & tgl
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index e51daddd04..e9906bfef2 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -3,12 +3,15 @@
  * pathkeys.c
  *    Utilities for matching and building path keys
  *
+ * See src/backend/optimizer/README for a great deal of information about
+ * the nature and use of path keys.
+ *
+ *
  * Portions Copyright (c) 1996-2000, PostgreSQL, Inc
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- *
  * IDENTIFICATION
- *    $Header: /cvsroot/pgsql/src/backend/optimizer/path/pathkeys.c,v 1.22 2000/05/30 00:49:47 momjian Exp $
+ *    $Header: /cvsroot/pgsql/src/backend/optimizer/path/pathkeys.c,v 1.23 2000/07/24 03:10:56 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -18,156 +21,17 @@
 #include "optimizer/clauses.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
+#include "optimizer/planmain.h"
 #include "optimizer/tlist.h"
 #include "parser/parsetree.h"
 #include "parser/parse_func.h"
 #include "utils/lsyscache.h"
 
+
 static PathKeyItem *makePathKeyItem(Node *key, Oid sortop);
 static List *make_canonical_pathkey(Query *root, PathKeyItem *item);
 static Var *find_indexkey_var(Query *root, RelOptInfo *rel,
-                  AttrNumber varattno);
-
-
-/*--------------------
- * Explanation of Path.pathkeys
- *
- * Path.pathkeys is a List of Lists of PathKeyItem nodes that represent
- * the sort order of the result generated by the Path. The n'th sublist
- * represents the n'th sort key of the result.
- *
- * In single/base relation RelOptInfo's, the Paths represent various ways
- * of scanning the relation and the resulting ordering of the tuples.
- * Sequential scan Paths have NIL pathkeys, indicating no known ordering.
- * Index scans have Path.pathkeys that represent the chosen index's ordering,
- * if any. A single-key index would create a pathkey with a single sublist,
- * e.g. ( (tab1.indexkey1/sortop1) ). A multi-key index generates a sublist
- * per key, e.g. ( (tab1.indexkey1/sortop1) (tab1.indexkey2/sortop2) ) which
- * shows major sort by indexkey1 (ordering by sortop1) and minor sort by
- * indexkey2 with sortop2.
- *
- * Note that a multi-pass indexscan (OR clause scan) has NIL pathkeys since
- * we can say nothing about the overall order of its result. Also, an
- * indexscan on an unordered type of index generates NIL pathkeys. However,
- * we can always create a pathkey by doing an explicit sort. The pathkeys
- * for a sort plan's output just represent the sort key fields and the
- * ordering operators used.
- *
- * Things get more interesting when we consider joins. Suppose we do a
- * mergejoin between A and B using the mergeclause A.X = B.Y. The output
- * of the mergejoin is sorted by X --- but it is also sorted by Y. We
- * represent this fact by listing both keys in a single pathkey sublist:
- * ( (A.X/xsortop B.Y/ysortop) ). This pathkey asserts that the major
- * sort order of the Path can be taken to be *either* A.X or B.Y.
- * They are equal, so they are both primary sort keys. By doing this,
- * we allow future joins to use either var as a pre-sorted key, so upper
- * Mergejoins may be able to avoid having to re-sort the Path. This is
- * why pathkeys is a List of Lists.
- *
- * We keep a sortop associated with each PathKeyItem because cross-data-type
- * mergejoins are possible; for example int4 = int8 is mergejoinable.
- * In this case we need to remember that the left var is ordered by int4lt
- * while the right var is ordered by int8lt. So the different members of
- * each sublist could have different sortops.
- *
- * Note that while the order of the top list is meaningful (primary vs.
- * secondary sort key), the order of each sublist is arbitrary. Each sublist
- * should be regarded as a set of equivalent keys, with no significance
- * to the list order.
- *
- * With a little further thought, it becomes apparent that pathkeys for
- * joins need not only come from mergejoins. For example, if we do a
- * nestloop join between outer relation A and inner relation B, then any
- * pathkeys relevant to A are still valid for the join result: we have
- * not altered the order of the tuples from A. Even more interesting,
- * if there was a mergeclause (more formally, an "equijoin clause") A.X=B.Y,
- * and A.X was a pathkey for the outer relation A, then we can assert that
- * B.Y is a pathkey for the join result; X was ordered before and still is,
- * and the joined values of Y are equal to the joined values of X, so Y
- * must now be ordered too. This is true even though we used no mergejoin.
- *
- * More generally, whenever we have an equijoin clause A.X = B.Y and a
- * pathkey A.X, we can add B.Y to that pathkey if B is part of the joined
- * relation the pathkey is for, *no matter how we formed the join*.
- *
- * In short, then: when producing the pathkeys for a merge or nestloop join,
- * we can keep all of the keys of the outer path, since the ordering of the
- * outer path will be preserved in the result. Furthermore, we can add to
- * each pathkey sublist any inner vars that are equijoined to any of the
- * outer vars in the sublist; this works regardless of whether we are
- * implementing the join using that equijoin clause as a mergeclause,
- * or merely enforcing the clause after-the-fact as a qpqual filter.
- *
- * Although Hashjoins also work only with equijoin operators, it is *not*
- * safe to consider the output of a Hashjoin to be sorted in any particular
- * order --- not even the outer path's order. This is true because the
- * executor might have to split the join into multiple batches. Therefore
- * a Hashjoin is always given NIL pathkeys. (Also, we need to use only
- * mergejoinable operators when deducing which inner vars are now sorted,
- * because a mergejoin operator tells us which left- and right-datatype
- * sortops can be considered equivalent, whereas a hashjoin operator
- * doesn't imply anything about sort order.)
- *
- * Pathkeys are also useful to represent an ordering that we wish to achieve,
- * since they are easily compared to the pathkeys of a potential candidate
- * path. So, SortClause lists are turned into pathkeys lists for use inside
- * the optimizer.
- *
- * OK, now for how it *really* works:
- *
- * We did implement pathkeys just as described above, and found that the
- * planner spent a huge amount of time comparing pathkeys, because the
- * representation of pathkeys as unordered lists made it expensive to decide
- * whether two were equal or not. So, we've modified the representation
- * as described next.
- *
- * If we scan the WHERE clause for equijoin clauses (mergejoinable clauses)
- * during planner startup, we can construct lists of equivalent pathkey items
- * for the query. There could be more than two items per equivalence set;
- * for example, WHERE A.X = B.Y AND B.Y = C.Z AND D.R = E.S creates the
- * equivalence sets { A.X B.Y C.Z } and { D.R E.S } (plus associated sortops).
- * Any pathkey item that belongs to an equivalence set implies that all the
- * other items in its set apply to the relation too, or at least all the ones
- * that are for fields present in the relation. (Some of the items in the
- * set might be for as-yet-unjoined relations.) Furthermore, any multi-item
- * pathkey sublist that appears at any stage of planning the query *must* be
- * a subset of one or another of these equivalence sets; there's no way we'd
- * have put two items in the same pathkey sublist unless they were equijoined
- * in WHERE.
- *
- * Now suppose that we allow a pathkey sublist to contain pathkey items for
- * vars that are not yet part of the pathkey's relation. This introduces
- * no logical difficulty, because such items can easily be seen to be
- * irrelevant; we just mandate that they be ignored. But having allowed
- * this, we can declare (by fiat) that any multiple-item pathkey sublist
- * must be equal() to the appropriate equivalence set. In effect, whenever
- * we make a pathkey sublist that mentions any var appearing in an
- * equivalence set, we instantly add all the other vars equivalenced to it,
- * whether they appear yet in the pathkey's relation or not. And we also
- * mandate that the pathkey sublist appear in the same order as the
- * equivalence set it comes from. (In practice, we simply return a pointer
- * to the relevant equivalence set without building any new sublist at all.)
- * This makes comparing pathkeys very simple and fast, and saves a lot of
- * work and memory space for pathkey construction as well.
- *
- * Note that pathkey sublists having just one item still exist, and are
- * not expected to be equal() to any equivalence set. This occurs when
- * we describe a sort order that involves a var that's not mentioned in
- * any equijoin clause of the WHERE. We could add singleton sets containing
- * such vars to the query's list of equivalence sets, but there's little
- * point in doing so.
- *
- * By the way, it's OK and even useful for us to build equivalence sets
- * that mention multiple vars from the same relation. For example, if
- * we have WHERE A.X = A.Y and we are scanning A using an index on X,
- * we can legitimately conclude that the path is sorted by Y as well;
- * and this could be handy if Y is the variable used in other join clauses
- * or ORDER BY. So, any WHERE clause with a mergejoinable operator can
- * contribute to an equivalence set, even if it's not a join clause.
- *
- * -- bjm & tgl
- *--------------------
- */
+                  AttrNumber varattno);
 
 
 /*
@@ -225,35 +89,107 @@ add_equijoined_keys(Query *root, RestrictInfo *restrictinfo)
      * into our new set. When done, we add the new set to the front of
      * equi_key_list.
      *
+     * It may well be that the two items we're given are already known to
+     * be equijoin-equivalent, in which case we don't need to change our
+     * data structure. If we find both of them in the same equivalence
+     * set to start with, we can quit immediately.
+     *
      * This is a standard UNION-FIND problem, for which there exist better
      * data structures than simple lists. If this code ever proves to be
      * a bottleneck then it could be sped up --- but for now, simple is
      * beautiful.
      */
-    newset = lcons(item1, lcons(item2, NIL));
+    newset = NIL;
 
     foreach(cursetlink, root->equi_key_list)
     {
         List *curset = lfirst(cursetlink);
+        bool item1here = member(item1, curset);
+        bool item2here = member(item2, curset);
 
-        if (member(item1, curset) || member(item2, curset))
+        if (item1here || item2here)
         {
+            /* If find both in same equivalence set, no need to do any more */
+            if (item1here && item2here)
+            {
+                /* Better not have seen only one in an earlier set... */
+                Assert(newset == NIL);
+                return;
+            }
+
+            /* Build the new set only when we know we must */
+            if (newset == NIL)
+                newset = lcons(item1, lcons(item2, NIL));
+
             /* Found a set to merge into our new set */
             newset = LispUnion(newset, curset);
 
             /*
              * Remove old set from equi_key_list. NOTE this does not
-             * change lnext(cursetlink), so the outer foreach doesn't
-             * break.
+             * change lnext(cursetlink), so the foreach loop doesn't break.
              */
             root->equi_key_list = lremove(curset, root->equi_key_list);
             freeList(curset); /* might as well recycle old cons cells */
         }
     }
 
+    /* Build the new set only when we know we must */
+    if (newset == NIL)
+        newset = lcons(item1, lcons(item2, NIL));
+
     root->equi_key_list = lcons(newset, root->equi_key_list);
 }
 
+/*
+ * generate_implied_equalities
+ *    Scan the completed equi_key_list for the query, and generate explicit
+ *    qualifications (WHERE clauses) for all the pairwise equalities not
+ *    already mentioned in the quals. This is useful because the additional
+ *    clauses help the selectivity-estimation code, and in fact it's
+ *    *necessary* to ensure that sort keys we think are equivalent really
+ *    are (see src/backend/optimizer/README for more info).
+ *
+ * This routine just walks the equi_key_list to find all pairwise equalities.
+ * We call process_implied_equality (in plan/initsplan.c) to determine whether
+ * each is already known and add it to the proper restrictinfo list if not.
+ */
+void
+generate_implied_equalities(Query *root)
+{
+    List *cursetlink;
+
+    foreach(cursetlink, root->equi_key_list)
+    {
+        List *curset = lfirst(cursetlink);
+        List *ptr1;
+
+        /*
+         * A set containing only two items cannot imply any equalities
+         * beyond the one that created the set, so we can skip it.
+         */
+        if (length(curset) < 3)
+            continue;
+
+        /*
+         * Match each item in the set with all that appear after it
+         * (it's sufficient to generate A=B, need not process B=A too).
+         */
+        foreach(ptr1, curset)
+        {
+            PathKeyItem *item1 = (PathKeyItem *) lfirst(ptr1);
+            List *ptr2;
+
+            foreach(ptr2, lnext(ptr1))
+            {
+                PathKeyItem *item2 = (PathKeyItem *) lfirst(ptr2);
+
+                process_implied_equality(root, item1->key, item2->key,
+                                         item1->sortop, item2->sortop);
+            }
+        }
+    }
+}
+
 /*
  * make_canonical_pathkey
  *    Given a PathKeyItem, find the equi_key_list subset it is a member of,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 207981b527..8b95deca58 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -8,13 +8,14 @@
  *
  *
  * IDENTIFICATION
- *    $Header: /cvsroot/pgsql/src/backend/optimizer/plan/initsplan.c,v 1.46 2000/04/12 17:15:21 momjian Exp $
+ *    $Header: /cvsroot/pgsql/src/backend/optimizer/plan/initsplan.c,v 1.47 2000/07/24 03:11:01 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
 #include
 
 #include "postgres.h"
+#include "catalog/pg_operator.h"
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
 #include "optimizer/clauses.h"
@@ -25,6 +26,9 @@
 #include "optimizer/planmain.h"
 #include "optimizer/tlist.h"
 #include "optimizer/var.h"
+#include "parser/parse_expr.h"
+#include "parser/parse_oper.h"
+#include "parser/parse_type.h"
 #include "utils/lsyscache.h"
 
 
@@ -122,6 +126,7 @@ add_missing_rels_to_query(Query *root)
     }
 }
 
+
 /*****************************************************************************
  *
  *   QUALIFICATIONS
@@ -129,7 +134,6 @@ add_missing_rels_to_query(Query *root)
  *****************************************************************************/
 
 
-
 /*
  * add_restrict_and_join_to_rels
  *    Fill RestrictInfo and JoinInfo lists of relation entries for all
@@ -280,6 +284,113 @@ add_join_info_to_rels(Query *root, RestrictInfo *restrictinfo,
     }
 }
 
+/*
+ * process_implied_equality
+ *    Check to see whether we already have a restrictinfo item that says
+ *    item1 = item2, and create one if not. This is a consequence of
+ *    transitivity of mergejoin equality: if we have mergejoinable
+ *    clauses A = B and B = C, we can deduce A = C (where = is an
+ *    appropriate mergejoinable operator).
+ */
+void
+process_implied_equality(Query *root, Node *item1, Node *item2,
+                         Oid sortop1, Oid sortop2)
+{
+    Index irel1;
+    Index irel2;
+    RelOptInfo *rel1;
+    List *restrictlist;
+    List *itm;
+    Oid ltype,
+        rtype;
+    Operator eq_operator;
+    Form_pg_operator pgopform;
+    Expr *clause;
+
+    /*
+     * Currently, since check_mergejoinable only accepts Var = Var clauses,
+     * we should only see Var nodes here. Would have to work a little
+     * harder to locate the right rel(s) if more-general mergejoin clauses
+     * were accepted.
+     */
+    Assert(IsA(item1, Var));
+    irel1 = ((Var *) item1)->varno;
+    Assert(IsA(item2, Var));
+    irel2 = ((Var *) item2)->varno;
+    /*
+     * If both vars belong to same rel, we need to look at that rel's
+     * baserestrictinfo list. If different rels, each will have a
+     * joininfo node for the other, and we can scan either list.
+     */
+    rel1 = get_base_rel(root, irel1);
+    if (irel1 == irel2)
+        restrictlist = rel1->baserestrictinfo;
+    else
+    {
+        JoinInfo *joininfo = find_joininfo_node(rel1,
+                                                lconsi(irel2, NIL));
+
+        restrictlist = joininfo->jinfo_restrictinfo;
+    }
+    /*
+     * Scan to see if equality is already known.
+     */
+    foreach(itm, restrictlist)
+    {
+        RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(itm);
+        Node *left,
+             *right;
+
+        if (restrictinfo->mergejoinoperator == InvalidOid)
+            continue; /* ignore non-mergejoinable clauses */
+        /* We now know the restrictinfo clause is a binary opclause */
+        left = (Node *) get_leftop(restrictinfo->clause);
+        right = (Node *) get_rightop(restrictinfo->clause);
+        if ((equal(item1, left) && equal(item2, right)) ||
+            (equal(item2, left) && equal(item1, right)))
+            return; /* found a matching clause */
+    }
+    /*
+     * This equality is new information, so construct a clause
+     * representing it to add to the query data structures.
+     */
+    ltype = exprType(item1);
+    rtype = exprType(item2);
+    eq_operator = oper("=", ltype, rtype, true);
+    if (!HeapTupleIsValid(eq_operator))
+    {
+        /*
+         * Would it be safe to just not add the equality to the query if
+         * we have no suitable equality operator for the combination of
+         * datatypes? NO, because sortkey selection may screw up anyway.
+         */
+        elog(ERROR, "Unable to identify an equality operator for types '%s' and '%s'",
+             typeidTypeName(ltype), typeidTypeName(rtype));
+    }
+    pgopform = (Form_pg_operator) GETSTRUCT(eq_operator);
+    /*
+     * Let's just make sure this appears to be a compatible operator.
+     */
+    if (pgopform->oprlsortop != sortop1 ||
+        pgopform->oprrsortop != sortop2 ||
+        pgopform->oprresult != BOOLOID)
+        elog(ERROR, "Equality operator for types '%s' and '%s' should be mergejoinable, but isn't",
+             typeidTypeName(ltype), typeidTypeName(rtype));
+
+    clause = makeNode(Expr);
+    clause->typeOid = BOOLOID;
+    clause->opType = OP_EXPR;
+    clause->oper = (Node *) makeOper(oprid(eq_operator), /* opno */
+                                     InvalidOid, /* opid */
+                                     BOOLOID, /* operator result type */
+                                     0,
+                                     NULL);
+    clause->args = lcons(item1, lcons(item2, NIL));
+
+    add_restrict_and_join_to_rel(root, (Node *) clause);
+}
+
+
 /*****************************************************************************
  *
  *   CHECKS FOR MERGEJOINABLE AND HASHJOINABLE CLAUSES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 0e05c94538..a33d068a38 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -14,7 +14,7 @@
  *
  *
  * IDENTIFICATION
- *    $Header: /cvsroot/pgsql/src/backend/optimizer/plan/planmain.c,v 1.55 2000/04/12 17:15:22 momjian Exp $
+ *    $Header: /cvsroot/pgsql/src/backend/optimizer/plan/planmain.c,v 1.56 2000/07/24 03:11:01 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -184,7 +184,7 @@ subplanner(Query *root,
      * base_rel_list as relation references are found (e.g., in the
      * qualification, the targetlist, etc.). Restrict and join clauses
      * are added to appropriate lists belonging to the mentioned
-     * relations, and we also build lists of equijoined keys for pathkey
+     * relations. We also build lists of equijoined keys for pathkey
      * construction.
      */
     root->base_rel_list = NIL;
@@ -193,8 +193,18 @@ subplanner(Query *root,
 
     make_var_only_tlist(root, flat_tlist);
     add_restrict_and_join_to_rels(root, qual);
+
+    /*
+     * Make sure we have RelOptInfo nodes for all relations used.
+     */
     add_missing_rels_to_query(root);
 
+    /*
+     * Use the completed lists of equijoined keys to deduce any implied
+     * but unstated equalities (for example, A=B and B=C imply A=C).
+     */
+    generate_implied_equalities(root);
+
     /*
      * We should now have all the pathkey equivalence sets built, so it's
      * now possible to convert the requested query_pathkeys to canonical
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index ff26523418..66520d6a89 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -8,7 +8,7 @@
  * Portions Copyright (c) 1996-2000, PostgreSQL, Inc
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * $Id: paths.h,v 1.45 2000/05/31 00:28:38 petere Exp $
+ * $Id: paths.h,v 1.46 2000/07/24 03:10:54 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -90,6 +90,7 @@ typedef enum
 } PathKeysComparison;
 
 extern void add_equijoined_keys(Query *root, RestrictInfo *restrictinfo);
+extern void generate_implied_equalities(Query *root);
 extern List *canonicalize_pathkeys(Query *root, List *pathkeys);
 extern PathKeysComparison compare_pathkeys(List *keys1, List *keys2);
 extern bool pathkeys_contained_in(List *keys1, List *keys2);
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 6c16364f67..723543c437 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -7,7 +7,7 @@
  * Portions Copyright (c) 1996-2000, PostgreSQL, Inc
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * $Id: planmain.h,v 1.42 2000/06/18 22:44:33 tgl Exp $
+ * $Id: planmain.h,v 1.43 2000/07/24 03:10:54 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -43,6 +43,8 @@ extern Result *make_result(List *tlist, Node *resconstantqual, Plan *subplan);
 extern void make_var_only_tlist(Query *root, List *tlist);
 extern void add_restrict_and_join_to_rels(Query *root, List *clauses);
 extern void add_missing_rels_to_query(Query *root);
+extern void process_implied_equality(Query *root, Node *item1, Node *item2,
+                                     Oid sortop1, Oid sortop2);
 
 /*
  * prototypes for plan/setrefs.c
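
Note on the patch: the two mechanisms it relies on, merging equijoined keys into equivalence sets and then emitting every pairwise equality that the query did not state, can be tried in isolation with the rough sketch below. It is not the PostgreSQL code. Sort keys are plain integers rather than PathKeyItem nodes, sortops and the operator lookup are left out, and EquiKeys, Pair, equi_init, equi_add_clause, clause_stated and implied_equalities are hypothetical names invented for this illustration.

```c
#include <assert.h>

#define MAX_KEYS 16

/* Hypothetical stand-in for root->equi_key_list: set_of[k] is the id of the
 * equivalence set that sort key k currently belongs to. */
typedef struct EquiKeys
{
    int nkeys;
    int set_of[MAX_KEYS];
} EquiKeys;

/* One equality "a = b" between two keys (hypothetical helper type). */
typedef struct Pair
{
    int a;
    int b;
} Pair;

/* Start with every key in its own singleton set. */
void
equi_init(EquiKeys *ek, int nkeys)
{
    assert(nkeys >= 0 && nkeys <= MAX_KEYS);
    ek->nkeys = nkeys;
    for (int k = 0; k < nkeys; k++)
        ek->set_of[k] = k;
}

/* Record a mergejoinable clause a = b by merging the sets holding a and b. */
void
equi_add_clause(EquiKeys *ek, int a, int b)
{
    int keep = ek->set_of[a];
    int drop = ek->set_of[b];

    if (keep == drop)
        return;                 /* already known equivalent, nothing to do */
    for (int k = 0; k < ek->nkeys; k++)
        if (ek->set_of[k] == drop)
            ek->set_of[k] = keep;
}

/* Was a = b (in either order) written explicitly in the query? */
int
clause_stated(const Pair *stated, int nstated, int a, int b)
{
    for (int i = 0; i < nstated; i++)
        if ((stated[i].a == a && stated[i].b == b) ||
            (stated[i].a == b && stated[i].b == a))
            return 1;
    return 0;
}

/*
 * Store in out[] each pair a < b that shares a set but was not stated,
 * and return the number of such pairs (at most max_out are stored).
 */
int
implied_equalities(const EquiKeys *ek, const Pair *stated, int nstated,
                   Pair *out, int max_out)
{
    int n = 0;

    for (int a = 0; a < ek->nkeys; a++)
        for (int b = a + 1; b < ek->nkeys; b++)
            if (ek->set_of[a] == ek->set_of[b] &&
                !clause_stated(stated, nstated, a, b))
            {
                if (n < max_out)
                    out[n] = (Pair){a, b};
                n++;
            }
    return n;
}
```

Numbering the keys of the README example as A.X=0, B.Y=1, C.Z=2, D.R=3, E.S=4 and feeding in the clauses A.X = B.Y, B.Y = C.Z and D.R = E.S yields exactly one implied pair, A.X = C.Z. The two-member set { D.R E.S } contributes nothing, which matches the length(curset) < 3 shortcut in generate_implied_equalities.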