author | Tom Lane <tgl@sss.pgh.pa.us> | 2022-10-27 14:42:18 -0400 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2022-10-27 14:42:18 -0400 |
commit | a5fc46414deb7cbcd4cec1275efac69b9ac10500 (patch) | |
tree | d17ba92470e28a3646c7316f178f7dd54188891b /src/backend/optimizer/path | |
parent | 4ab8c81bd90ae442dbd092df04a12dbb7e68f562 (diff) | |
Avoid making commutatively-duplicate clauses in EquivalenceClasses.

When we decide we need to make a derived clause equating a.x and
b.y, we already will re-use a previously-made clause "a.x = b.y".
But we might instead have "b.y = a.x", which is perfectly usable
because equivclass.c has never promised anything about the
operand order in clauses it builds. Saving construction of a
new RestrictInfo doesn't matter all that much in itself --- but
because we cache selectivity estimates and so on per-RestrictInfo,
there's a possibility of saving a fair amount of duplicative
effort downstream.

Hence, check for commutative matches as well as direct ones when
seeing if we have a pre-existing clause. This changes the visible
clause order in several regression test cases, but they're all
clearly-insignificant changes.

Checking for the reverse operand order is simple enough, but
if we wanted to check for operator OID match we'd need to call
get_commutator here, which is not so cheap. I concluded that
we don't really need the operator check anyway, so I just
removed it. It's unlikely that an opfamily contains more than
one applicable operator for a given pair of operand datatypes;
and if it does they had better give the same answers, so there
seems little need to insist that we use exactly the one
select_equality_operator chose.
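
The lookup described above can be sketched in isolation. This is a minimal illustration, not the planner's code: Member, Clause, and find_clause are hypothetical, simplified stand-ins for EquivalenceMember, RestrictInfo, and the search loops in create_join_clause, and a plain array stands in for the ec_sources and ec_derives lists. It accepts a clause in either operand order and compares no operator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an EquivalenceMember. */
typedef struct Member
{
	const char *name;
} Member;

/* Hypothetical stand-in for the RestrictInfo fields the search looks at. */
typedef struct Clause
{
	const Member *left_em;
	const Member *right_em;
	const void *parent_ec;
} Clause;

/*
 * Return an existing clause relating leftem and rightem, in either operand
 * order, or NULL if there is none. No operator comparison is made.
 */
const Clause *
find_clause(const Clause *clauses, size_t n,
			const Member *leftem, const Member *rightem,
			const void *parent_ec)
{
	assert(clauses != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		const Clause *c = &clauses[i];

		if (c->parent_ec != parent_ec)
			continue;
		if (c->left_em == leftem && c->right_em == rightem)
			return c;			/* direct match */
		if (c->left_em == rightem && c->right_em == leftem)
			return c;			/* commutative match */
	}
	return NULL;
}
```

With a single stored clause "b.y = a.x", a request for (a.x, b.y) returns that clause instead of reporting a miss, which is the case the old direct-only check failed to catch.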

Using the current core regression suite as a test case, I see
this change reducing the number of new join clauses built by
create_join_clause from 9673 to 5142 (out of 26652 calls).
So not quite 50% savings, but pretty close to it.
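
For reference, the arithmetic behind those figures can be checked with a small helper. The function names percent_of and percent_saved are hypothetical and exist only for this illustration; the inputs are the counts quoted above.

```c
#include <assert.h>

/* Percentage that part represents of whole; whole must be positive. */
double
percent_of(long part, long whole)
{
	assert(whole > 0);
	return 100.0 * (double) part / (double) whole;
}

/* Relative reduction, in percent, when a count drops from before to after. */
double
percent_saved(long before, long after)
{
	return percent_of(before - after, before);
}
```

percent_saved(9673, 5142) comes to roughly 46.8, and the share of calls that build a new clause falls from about 36.3% (9673 of 26652) to about 19.3% (5142 of 26652).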

Discussion: https://postgr.es/m/78062.1666735746@sss.pgh.pa.us
Diffstat (limited to 'src/backend/optimizer/path')
-rw-r--r-- | src/backend/optimizer/path/equivclass.c | 28 |
1 files changed, 20 insertions, 8 deletions
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index f962ff82ad..e65b967b1f 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1382,7 +1382,9 @@ generate_base_implied_equalities_broken(PlannerInfo *root,
  * whenever we select a particular pair of EquivalenceMembers to join,
  * we check to see if the pair matches any original clause (in ec_sources)
  * or previously-built clause (in ec_derives). This saves memory and allows
- * re-use of information cached in RestrictInfos.
+ * re-use of information cached in RestrictInfos. We also avoid generating
+ * commutative duplicates, i.e. if the algorithm selects "a.x = b.y" but
+ * we already have "b.y = a.x", we return the existing clause.
  *
  * join_relids should always equal bms_union(outer_relids, inner_rel->relids).
  * We could simplify this function's API by computing it internally, but in
@@ -1790,7 +1792,8 @@ select_equality_operator(EquivalenceClass *ec, Oid lefttype, Oid righttype)
 /*
  * create_join_clause
  *	  Find or make a RestrictInfo comparing the two given EC members
- *	  with the given operator.
+ *	  with the given operator (or, possibly, its commutator, because
+ *	  the ordering of the operands in the result is not guaranteed).
  *
  * parent_ec is either equal to ec (if the clause is a potentially-redundant
  * join clause) or NULL (if not). We have to treat this as part of the
@@ -1811,16 +1814,22 @@ create_join_clause(PlannerInfo *root,
 	/*
 	 * Search to see if we already built a RestrictInfo for this pair of
 	 * EquivalenceMembers. We can use either original source clauses or
-	 * previously-derived clauses. The check on opno is probably redundant,
-	 * but be safe ...
+	 * previously-derived clauses, and a commutator clause is acceptable.
+	 *
+	 * We used to verify that opno matches, but that seems redundant: even if
+	 * it's not identical, it'd better have the same effects, or the operator
+	 * families we're using are broken.
 	 */
 	foreach(lc, ec->ec_sources)
 	{
 		rinfo = (RestrictInfo *) lfirst(lc);
 		if (rinfo->left_em == leftem &&
 			rinfo->right_em == rightem &&
-			rinfo->parent_ec == parent_ec &&
-			opno == ((OpExpr *) rinfo->clause)->opno)
+			rinfo->parent_ec == parent_ec)
+			return rinfo;
+		if (rinfo->left_em == rightem &&
+			rinfo->right_em == leftem &&
+			rinfo->parent_ec == parent_ec)
 			return rinfo;
 	}
 
@@ -1829,8 +1838,11 @@ create_join_clause(PlannerInfo *root,
 		rinfo = (RestrictInfo *) lfirst(lc);
 		if (rinfo->left_em == leftem &&
 			rinfo->right_em == rightem &&
-			rinfo->parent_ec == parent_ec &&
-			opno == ((OpExpr *) rinfo->clause)->opno)
+			rinfo->parent_ec == parent_ec)
+			return rinfo;
+		if (rinfo->left_em == rightem &&
+			rinfo->right_em == leftem &&
+			rinfo->parent_ec == parent_ec)
 			return rinfo;
 	}
 