[PATCH] Keeps tracking the uniqueness with UniqueKey

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
77 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

David Rowley
On Thu, 16 Apr 2020 at 14:17, Andy Fan <[hidden email]> wrote:
> V6 also includes:
> 1.  Fix the comment misleading you mentioned above.
> 2.  Fixed a concern case for `relation_has_uniquekeys_for` function.

Over on [1], Richard highlights a problem in the current join removals
lack of ability to remove left joins unless the min_righthand side of
the join is a singleton rel. It's my understanding that the reason the
code checks for this is down to the fact that join removals used
unique indexed to prove the uniqueness of the relation and obviously,
those can only exist on base relations.  I wondered if you might want
to look into a 0003 patch which removes that restriction? I think this
can be done now since we no longer look at unique indexes to provide
the proves that the join to be removed won't duplicate outer side
rows.

David

[1] https://www.postgresql.org/message-id/CAMbWs4-THacv3DdMpiTrvg5ZY7sNViFF1pTU=kOKmtPBrE9-0Q@...


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan


On Wed, Apr 29, 2020 at 8:29 AM David Rowley <[hidden email]> wrote:
On Thu, 16 Apr 2020 at 14:17, Andy Fan <[hidden email]> wrote:
> V6 also includes:
> 1.  Fix the comment misleading you mentioned above.
> 2.  Fixed a concern case for `relation_has_uniquekeys_for` function.

Over on [1], Richard highlights a problem in the current join removals
lack of ability to remove left joins unless the min_righthand side of
the join is a singleton rel. It's my understanding that the reason the
code checks for this is down to the fact that join removals used
unique indexed to prove the uniqueness of the relation and obviously,
those can only exist on base relations.  I wondered if you might want
to look into a 0003 patch which removes that restriction? I think this
can be done now since we no longer look at unique indexes to provide
the proves that the join to be removed won't duplicate outer side
rows.

Yes, I think that would be another benefit of UniqueKey,  but it doesn't happen
until now.  I will take a look of it today and fix it in a separated commit. 
 
Best Regards
Andy Fan
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan


On Wed, Apr 29, 2020 at 8:34 AM Andy Fan <[hidden email]> wrote:


On Wed, Apr 29, 2020 at 8:29 AM David Rowley <[hidden email]> wrote:
On Thu, 16 Apr 2020 at 14:17, Andy Fan <[hidden email]> wrote:
> V6 also includes:
> 1.  Fix the comment misleading you mentioned above.
> 2.  Fixed a concern case for `relation_has_uniquekeys_for` function.

Over on [1], Richard highlights a problem in the current join removals
lack of ability to remove left joins unless the min_righthand side of
the join is a singleton rel. It's my understanding that the reason the
code checks for this is down to the fact that join removals used
unique indexed to prove the uniqueness of the relation and obviously,
those can only exist on base relations.  I wondered if you might want
to look into a 0003 patch which removes that restriction? I think this
can be done now since we no longer look at unique indexes to provide
the proves that the join to be removed won't duplicate outer side
rows.

Yes, I think that would be another benefit of UniqueKey,  but it doesn't happen
until now.  I will take a look of it today and fix it in a separated commit. 
 
I have make it work locally,  the basic idea is to postpone the join removal at
build_join_rel stage where the uniquekey info is well maintained.   I will test 
more to send a product-ready-target patch tomorrow. 

# explain (costs off) select a.i from a left join b on a.i = b.i and
    b.j in (select j from c);
  QUERY PLAN
---------------
 Seq Scan on a
(1 row)

 Best Regard
Andy Fan
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan
I just uploaded the v7 version and split it into smaller commits for easier
review/merge. I also maintain a  up-to-date README.uniquekey
document since something may changed during discussion or later code. 

Here is the simple introduction of each commit.

====
1. v7-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch

This commit adds the notnullattrs to RelOptInfo,  which grabs the information
from both catalog and user's query.


2. v7-0002-Introuduce-RelOptInfo.uniquekeys-attribute.patch

This commit just add the uniquekeys to RelOptInfo and maintain it at every
stage. However the upper level code is not changed due to this.

Some changes of this part in v7:
1). Removed the UniqueKey.positions attribute. In the past it is used in
    convert_subquery_uniquekeys, however we don't need it actually (And I
    maintained it wrong in the past). Now I build the relationship between the
    outer var to subuqery's TargetList with outrel.subquery.processed_tlist.
2). onerow UniqueKey(exprs = NIL) need to be converted to normal uniquekey(exprs
   != NIL) if it is not one-row any more. This may happen on some outer join.


3. v7-0003-Refactor-existing-uniqueness-related-code-to-use-.patch

Refactor the existing functions like innerrel_is_unique/res_is_distinct_for to
use UniqueKey, and postpone the call of remove_useless_join and 
reduce_unique_semijoins to use the new implementation.

4. v7-0004-Remove-distinct-node-AggNode-if-the-input-is-uniq.patch

Remove the distinct node if the result is distinct already.  Remove the aggnode
if the group by clause is unique already AND there is no aggregation function in
query.

5. v7-0005-If-the-group-by-clause-is-unique-and-we-have-aggr.patch

If the group by clause is unique and query has aggregation function, we use
the AGG_SORT strategy but without really sort since it has only one row in each
group. 


6. v7-0006-Join-removal-at-run-time-with-UniqueKey.patch

This commit run join removal at build_join_rel.  At that time, it can fully uses
unique key. It can handle some more cases, I added some new test cases to
join.sql. However it can be a replacement of the current one. There are some
cases the new strategy can work run well but the current one can.  Like

SELECT a.* FROM a LEFT JOIN (b left join c on b.c_id = c.id) ON (a.b_id = b.id);

during the join a & b, the join can't be removed since b.id is still useful in
future. However in the future, we know the b.id can be removed as well, but
it is too late to remove the previous join.

At the implementation part,  the main idea is if the join_canbe_removed. we
will copy the pathlist from outerrel to joinrel. There are several items need to
handle.

1. To make sure the overall join_search_one_level, we have to keep the joinrel
   even the innerrel is removed (rather than discard the joinrel).
2. If the innerrel can be removed, we don't need to build pathlist for joinrel,
   we just reuse the pathlist from outerrel. However there are many places where
   use assert rel->pathlist[*]->parent == rel. so I copied the pathlist, we
   have to change the parent to joinrel.
3. During create plan for some path on RTE_RELATION, it needs to know the
   relation Oid with path->parent->relid. so we have to use the outerrel->relid
   to overwrite the joinrel->relid which is 0 before.
4. Almost same paths as item 3, it usually assert best_path->parent->rtekind ==
   RTE_RELATION; now the path may appeared in joinrel, so I used
   outerrel->rtekind to overwrite joinrel->rtekind.
5. I guess there are some dependencies between path->pathtarget and
   rel->reltarget. since we reuse the pathlist of outerrel, so I used the
   outer->reltarget as well. If the join can be removed, I guess the length of
   list_length(outrel->reltarget->exprs) >= (joinrel->reltarget->exprs). we can
   rely on the ProjectionPath to reduce the tlist.

My patches is based on the current latest commit fb544735f1.

Best Regards
Andy Fan

v7-0005-If-the-group-by-clause-is-unique-and-we-have-aggr.patch (18K) Download Attachment
v7-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch (3K) Download Attachment
v7-0004-Remove-distinct-node-AggNode-if-the-input-is-uniq.patch (32K) Download Attachment
v7-0003-Refactor-existing-uniqueness-related-code-to-use-.patch (26K) Download Attachment
v7-0002-Introuduce-RelOptInfo.uniquekeys-attribute.patch (65K) Download Attachment
v7-0006-Join-removal-at-run-time-with-UniqueKey.patch (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Ashutosh Bapat-2
Hi Andy,
Sorry for delay in review. Your earlier patches are very large and it
requires some time to review those. I didn't finish reviewing those
but here are whatever comments I have till now on the previous set of
patches. Please see if any of those are useful to the new set.


+/*
+ * Return true iff there is an equal member in target for every
+ * member in members

Suggest reword: return true iff every entry in "members" list is also present
in the "target" list. This function doesn't care about multi-sets, so please
mention that in the prologue clearly.

+
+    if (root->parse->hasTargetSRFs)
+        return;

Why? A relation's uniqueness may be useful well before we work on SRFs.

+
+    if (baserel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+        /*
+         * Set UniqueKey on member rel is useless, we have to recompute it at
+         * upper level, see populate_partitionedrel_uniquekeys for reference
+         */
+        return;

Handling these here might help in bottom up approach. We annotate each
partition here and then annotate partitioned table based on the individual
partitions. Same approach can be used for partitioned join produced by
partitionwise join.

+        /*
+         * If the unique index doesn't contain partkey, then it is unique
+         * on this partition only, so it is useless for us.
+         */

Not really. It could help partitionwise join.

+
+    /* Now we have the unique index list which as exactly same on all
childrels,
+     * Set the UniqueIndex just like it is non-partition table
+     */

I think it's better to annotate each partition with whatever unique index it
has whether or not global. That will help partitionwise join, partitionwise
aggregate/group etc.

+    /* A Normal group by without grouping set */
+    if (parse->groupClause)
+        add_uniquekey_from_sortgroups(root,
+                                      grouprel,
+                                      root->parse->groupClause);

Those keys which are part of groupClause and also form unique keys in the input
relation, should be recorded as unique keys in group rel. Knowing the minimal
set of keys allows more optimizations.

+
+    foreach(lc,  unionrel->reltarget->exprs)
+    {
+        exprs = lappend(exprs, lfirst(lc));
+        colnos = lappend_int(colnos, i);
+        i++;
+    }

This should be only possible when it's not UNION ALL. We should add some assert
or protection for that.

+
+    /* Fast path */
+    if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+        return;
+
+    outer_is_onerow = relation_is_onerow(outerrel);
+    inner_is_onerow = relation_is_onerow(innerrel);
+
+    outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
outerrel);
+    innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
innerrel);
+
+    clause_list = gather_mergeable_joinclauses(joinrel, outerrel, innerrel,
+                                               restrictlist, jointype);

Something similar happens in select_mergejoin_clauses(). At least from the
first reading of this patch, I get an impression that all these functions which
are going through clause lists and index lists should be merged into other
functions which go through these lists hunting for some information useful to
the optimizer.

+
+
+    if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+    {
+        foreach(lc, innerrel_ukey_ctx)
+        {
+            UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+            if (!list_is_subset(ctx->uniquekey->exprs,
joinrel->reltarget->exprs))
+            {
+                /* The UniqueKey on baserel is not useful on the joinrel */

A joining relation need not be a base rel always, it could be a join rel as
well.

+                ctx->useful = false;
+                continue;
+            }
+            if ((jointype == JOIN_LEFT || jointype == JOIN_FULL) &&
!ctx->uniquekey->multi_nullvals)
+            {
+                /* Change the multi_nullvals to true at this case */

Need a comment explaining this. Generally, I feel, this and other functions in
this file need good comments explaining the logic esp. "why" instead of "what".

+            else if (inner_is_onerow)
+            {
+                /* Since rows in innerrel can't be duplicated AND if
innerrel is onerow,
+                 * the join result will be onerow also as well. Note:
onerow implies
+                 * multi_nullvals = false.

I don't understand this comment. Why is there only one row in the other
relation which can join to this row?

+    }
+    /*
+     * Calculate max_colno in subquery. In fact we can check this with
+     * list_length(sub_final_rel->reltarget->exprs), However, reltarget
+     * is not set on UPPERREL_FINAL relation, so do it this way
+     */


Should/can we use the same logic to convert an expression in the subquery into
a Var of the outer query as in convert_subquery_pathkeys(). If the subquery
doesn't have a reltarget set, we should be able to use reltarget of any of its
paths since all of those should match the positions across subquery and the
outer query.

Will continue reviewing your new set of patches as time permits.

On Thu, May 7, 2020 at 7:02 AM Andy Fan <[hidden email]> wrote:

>
> I just uploaded the v7 version and split it into smaller commits for easier
> review/merge. I also maintain a  up-to-date README.uniquekey
> document since something may changed during discussion or later code.
>
> Here is the simple introduction of each commit.
>
> ====
> 1. v7-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch
>
> This commit adds the notnullattrs to RelOptInfo,  which grabs the information
> from both catalog and user's query.
>
>
> 2. v7-0002-Introuduce-RelOptInfo.uniquekeys-attribute.patch
>
> This commit just add the uniquekeys to RelOptInfo and maintain it at every
> stage. However the upper level code is not changed due to this.
>
> Some changes of this part in v7:
> 1). Removed the UniqueKey.positions attribute. In the past it is used in
>     convert_subquery_uniquekeys, however we don't need it actually (And I
>     maintained it wrong in the past). Now I build the relationship between the
>     outer var to subuqery's TargetList with outrel.subquery.processed_tlist.
> 2). onerow UniqueKey(exprs = NIL) need to be converted to normal uniquekey(exprs
>    != NIL) if it is not one-row any more. This may happen on some outer join.
>
>
> 3. v7-0003-Refactor-existing-uniqueness-related-code-to-use-.patch
>
> Refactor the existing functions like innerrel_is_unique/res_is_distinct_for to
> use UniqueKey, and postpone the call of remove_useless_join and
> reduce_unique_semijoins to use the new implementation.
>
> 4. v7-0004-Remove-distinct-node-AggNode-if-the-input-is-uniq.patch
>
> Remove the distinct node if the result is distinct already.  Remove the aggnode
> if the group by clause is unique already AND there is no aggregation function in
> query.
>
> 5. v7-0005-If-the-group-by-clause-is-unique-and-we-have-aggr.patch
>
> If the group by clause is unique and query has aggregation function, we use
> the AGG_SORT strategy but without really sort since it has only one row in each
> group.
>
>
> 6. v7-0006-Join-removal-at-run-time-with-UniqueKey.patch
>
> This commit run join removal at build_join_rel.  At that time, it can fully uses
> unique key. It can handle some more cases, I added some new test cases to
> join.sql. However it can be a replacement of the current one. There are some
> cases the new strategy can work run well but the current one can.  Like
>
> SELECT a.* FROM a LEFT JOIN (b left join c on b.c_id = c.id) ON (a.b_id = b.id);
>
> during the join a & b, the join can't be removed since b.id is still useful in
> future. However in the future, we know the b.id can be removed as well, but
> it is too late to remove the previous join.
>
> At the implementation part,  the main idea is if the join_canbe_removed. we
> will copy the pathlist from outerrel to joinrel. There are several items need to
> handle.
>
> 1. To make sure the overall join_search_one_level, we have to keep the joinrel
>    even the innerrel is removed (rather than discard the joinrel).
> 2. If the innerrel can be removed, we don't need to build pathlist for joinrel,
>    we just reuse the pathlist from outerrel. However there are many places where
>    use assert rel->pathlist[*]->parent == rel. so I copied the pathlist, we
>    have to change the parent to joinrel.
> 3. During create plan for some path on RTE_RELATION, it needs to know the
>    relation Oid with path->parent->relid. so we have to use the outerrel->relid
>    to overwrite the joinrel->relid which is 0 before.
> 4. Almost same paths as item 3, it usually assert best_path->parent->rtekind ==
>    RTE_RELATION; now the path may appeared in joinrel, so I used
>    outerrel->rtekind to overwrite joinrel->rtekind.
> 5. I guess there are some dependencies between path->pathtarget and
>    rel->reltarget. since we reuse the pathlist of outerrel, so I used the
>    outer->reltarget as well. If the join can be removed, I guess the length of
>    list_length(outrel->reltarget->exprs) >= (joinrel->reltarget->exprs). we can
>    rely on the ProjectionPath to reduce the tlist.
>
> My patches is based on the current latest commit fb544735f1.
>
> Best Regards
> Andy Fan



--
Best Wishes,
Ashutosh Bapat


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan
Hi Ashutosh:

Appreciate for your comments!

On Thu, May 7, 2020 at 7:26 PM Ashutosh Bapat <[hidden email]> wrote:
Hi Andy,
Sorry for delay in review.

I  understand no one has obligation to do that,  and it must take reviewer's time 
and more, so really appreciated for it!  Hope I can provide more kindly help like
this in future as well. 
 
Your earlier patches are very large and it
requires some time to review those. I didn't finish reviewing those
but here are whatever comments I have till now on the previous set of
patches. Please see if any of those are useful to the new set.

Yes, I just realized the size as well, so I split them to smaller commit and 
each commit and be build and run make check successfully.  

All of your comments still valid except the last one (covert_subquery_uniquekeys) 
which has been fixed v7 version. 
 

+/*
+ * Return true iff there is an equal member in target for every
+ * member in members

Suggest reword: return true iff every entry in "members" list is also present
in the "target" list.

Will do, thanks!
 
This function doesn't care about multi-sets, so please
mention that in the prologue clearly.

+
+    if (root->parse->hasTargetSRFs)
+        return;

Why? A relation's uniqueness may be useful well before we work on SRFs.


Looks I misunderstand when the SRF function is executed.  I will fix this in v8. 

+
+    if (baserel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+        /*
+         * Set UniqueKey on member rel is useless, we have to recompute it at
+         * upper level, see populate_partitionedrel_uniquekeys for reference
+         */
+        return;

Handling these here might help in bottom up approach. We annotate each
partition here and then annotate partitioned table based on the individual
partitions. Same approach can be used for partitioned join produced by
partitionwise join.

+        /*
+         * If the unique index doesn't contain partkey, then it is unique
+         * on this partition only, so it is useless for us.
+         */

Not really. It could help partitionwise join.

+
+    /* Now we have the unique index list which as exactly same on all
childrels,
+     * Set the UniqueIndex just like it is non-partition table
+     */

I think it's better to annotate each partition with whatever unique index it
has whether or not global. That will help partitionwise join, partitionwise
aggregate/group etc.

Excellent catch! All the 3 items above is partitionwise join related, I need some time
to check how to handle that. 
 

+    /* A Normal group by without grouping set */
+    if (parse->groupClause)
+        add_uniquekey_from_sortgroups(root,
+                                      grouprel,
+                                      root->parse->groupClause);

Those keys which are part of groupClause and also form unique keys in the input
relation, should be recorded as unique keys in group rel. Knowing the minimal
set of keys allows more optimizations.

This is a very valid point now. I ignored this because I wanted to remove the AggNode
totally if the part of groupClause is unique,  However it doesn't happen later if there is
aggregation call in this query.


+
+    foreach(lc,  unionrel->reltarget->exprs)
+    {
+        exprs = lappend(exprs, lfirst(lc));
+        colnos = lappend_int(colnos, i);
+        i++;
+    }

This should be only possible when it's not UNION ALL. We should add some assert
or protection for that.

OK, actually I called this function in generate_union_paths. which handle
UNION case only.  I will add the Assert anyway. 
 

+
+    /* Fast path */
+    if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+        return;
+
+    outer_is_onerow = relation_is_onerow(outerrel);
+    inner_is_onerow = relation_is_onerow(innerrel);
+
+    outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
outerrel);
+    innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
innerrel);
+
+    clause_list = gather_mergeable_joinclauses(joinrel, outerrel, innerrel,
+                                               restrictlist, jointype);

Something similar happens in select_mergejoin_clauses().

I didn't realized this before.  I will refactor this code accordingly if necessary
after reading that. 
 
At least from the
first reading of this patch, I get an impression that all these functions which
are going through clause lists and index lists should be merged into other
functions which go through these lists hunting for some information useful to
the optimizer. 
+
+
+    if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+    {
+        foreach(lc, innerrel_ukey_ctx)
+        {
+            UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+            if (!list_is_subset(ctx->uniquekey->exprs,
joinrel->reltarget->exprs))
+            {
+                /* The UniqueKey on baserel is not useful on the joinrel */

A joining relation need not be a base rel always, it could be a join rel as
well.

good catch. 
 

+                ctx->useful = false;
+                continue;
+            }
+            if ((jointype == JOIN_LEFT || jointype == JOIN_FULL) &&
!ctx->uniquekey->multi_nullvals)
+            {
+                /* Change the multi_nullvals to true at this case */

Need a comment explaining this. Generally, I feel, this and other functions in
this file need good comments explaining the logic esp. "why" instead of "what".
 
Exactly.  
 
+            else if (inner_is_onerow)
+            {
+                /* Since rows in innerrel can't be duplicated AND if
innerrel is onerow,
+                 * the join result will be onerow also as well. Note:
onerow implies
+                 * multi_nullvals = false.

I don't understand this comment. Why is there only one row in the other
relation which can join to this row?

I guess you may miss the onerow special case if I understand correctly. 
inner_is_onerow means something like "SELECT xx FROM t1 where uk = 1".
innerrel can't be duplicated means:  t1.y = t2.pk;  so the finally result is onerow 
as well.  One of the overall query is  SELECT .. FROM t1, t2 where t2.y = t2.pk


I explained more about onerow in the v7 README.unqiuekey document, just copy 
it here. 

===
1. What is UniqueKey?
.... 
onerow is also a kind of UniqueKey which means the RelOptInfo will have 1 row at
most. it has a stronger semantic than others. like SELECT uk FROM t; uk is
normal unique key and may have different values.
SELECT colx FROM t WHERE uk = const.  colx is unique AND we have only 1 value. This
field can used for innerrel_is_unique. and also be used as an optimization for
this case. We don't need to maintain multi UniqueKey, we just maintain one with
onerow = true and exprs = NIL.

onerow is set to true only for 2 cases right now. 1) SELECT .. FROM t WHERE uk =
1; 2). SELECT aggref(xx) from t; // Without group by.
===

===
2. How is it maintained?
....
More considerations about onerow:
1. If relation with one row and it can't be duplicated, it is still possible
   contains mulit_nullvas after outer join.
2. If the either UniqueKey can be duplicated after join, the can get one row
   only when both side is one row AND there is no outer join.
3. Whenever the onerow UniqueKey is not a valid any more, we need to convert one
   row UniqueKey to normal unique key since we don't store exprs for one-row
   relation. get_exprs_from_uniquekeys will be used here.
===

and 3. in the v7 implementation,  the code struct is more clearer:)  

 

+    }
+    /*
+     * Calculate max_colno in subquery. In fact we can check this with
+     * list_length(sub_final_rel->reltarget->exprs), However, reltarget
+     * is not set on UPPERREL_FINAL relation, so do it this way
+     */


Should/can we use the same logic to convert an expression in the subquery into
a Var of the outer query as in convert_subquery_pathkeys().

Yes,  my previous implementation is actually wrong. and should be  fixed it in v7. 
 
If the subquery doesn't have a reltarget set, we should be able to use reltarget 
of any of its paths since all of those should match the positions across subquery 
and the outer query.

but I think it should be rel->subroot->processed_tlist rather than reltarget?  Actually I still
a bit of uneasy about  rel->subroot->processed_tlist for some DML case, which the 
processed_tlist is different and I still not figure out its impact. 


Will continue reviewing your new set of patches as time permits.

Thank you!  Actually there is no big difference between v6 and v7 regarding the
 UniqueKey part except 2 bug fix.  However v7 has some more documents, 
comments improvement and code refactor/split, which may be helpful
for review. You may try v7 next time if v8 has not come yet:) 

Best Regards
Andy Fan 
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan
The attached is the v8-patches.  The main improvements are based on Ashutosh's
review (reduce the SRF impact and partition level UniqueKey). I also update the
README.uniquekey based on the discussion. So anyone who don't want to go
through the long email can read the README.uniquekey first.

===
Just copy some content from the README for easy discussion. 

As for inherit table, we maintain the UnqiueKey on childrel as usual. But for
partitioned table we need to maintain 2 different kinds of UnqiueKey. 
1). UniqueKey on the parent relation 2). UniqueKey on child relation for 
partition wise query.

Example:
CREATE TABLE p (a int not null, b int not null) partition by list (a);
CREATE TABLE p0 partition of p for values in (1);
CREATE TABLE p1 partition of p for values in (2);

create unique index p0_b on p0(b);
create unique index p1_b on p1(b);

Now b is unique on partition level only, so the distinct can't be removed on
the following cases. SELECT DISTINCT b FROM p; However for query 
SELECT DISTINCT b FROM p WHERE a = 1; where only one
partition is chosen, the UniqueKey on child relation is same as the UniqueKey
on parent relation. The distinct can be removed.

Another usage of UniqueKey on partition level is it be helpful for
partition-wise join.

As for the UniqueKey on parent table level, it comes with 2 different ways.

1). the UniqueKey is also derived in Unique index, but the index must be same
in all the related children relations and the unique index must contains
Partition Key in it. Example:

CREATE UNIQUE INDEX p_ab ON p(a, b);  -- where a is the partition key.

-- Query
SELECT a, b FROM p;     -- the (a, b) is a UniqueKey of p.

 2). If the parent relation has only one childrel, the UniqueKey on childrel is
 the UniqueKey on parent as well.

The patch structure is not changed,  you can see [1] for reference. The patches is 
based on latest commit ac3a4866c0. 


Best Regards
Andy Fan

v8-0004-Remove-distinct-node-AggNode-if-the-input-is-uniq.patch (38K) Download Attachment
v8-0003-Refactor-existing-uniqueness-related-code-to-use-.patch (26K) Download Attachment
v8-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch (6K) Download Attachment
v8-0005-If-the-group-by-clause-is-unique-and-we-have-aggr.patch (18K) Download Attachment
v8-0002-Introuduce-RelOptInfo.uniquekeys-attribute.patch (73K) Download Attachment
v8-0006-Join-removal-at-run-time-with-UniqueKey.patch (17K) Download Attachment
v8-0007-Renamed-adjust_inherited_tlist-List-tlist-AppendR.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Ashutosh Bapat-2
In reply to this post by Andy Fan
On Fri, May 8, 2020 at 7:27 AM Andy Fan <[hidden email]> wrote:

>> +            else if (inner_is_onerow)
>> +            {
>> +                /* Since rows in innerrel can't be duplicated AND if
>> innerrel is onerow,
>> +                 * the join result will be onerow also as well. Note:
>> onerow implies
>> +                 * multi_nullvals = false.
>>
>> I don't understand this comment. Why is there only one row in the other
>> relation which can join to this row?
>
>
> I guess you may miss the onerow special case if I understand correctly.
> inner_is_onerow means something like "SELECT xx FROM t1 where uk = 1".
> innerrel can't be duplicated means:  t1.y = t2.pk;  so the finally result is onerow
> as well.  One of the overall query is  SELECT .. FROM t1, t2 where t2.y = t2.pk;
>
>
> I explained more about onerow in the v7 README.unqiuekey document, just copy
> it here.
For some reason this mail remained in my drafts without being sent.
Sending it now. Sorry.

My impression about the one row stuff, is that there is too much
special casing around it. We should somehow structure the UniqueKey
data so that one row unique keys come naturally rather than special
cased. E.g every column in such a case is unique in the result so
create as many UniqueKeys are the number of columns or create one
unique key with no column as you have done but handle it more
gracefully rather than spreading it all over the place.

Also, the amount of code that these patches changes seems to be much
larger than the feature's worth arguably. But it indicates that we are
modifying/adding more code than necessary. Some of that code can be
merged into existing code which does similar things as I have pointed
out in my previous comment.

Thanks for working on the expanded scope of the initial feature you
proposed. But it makes the feature more useful, I think.

--
Best Wishes,
Ashutosh Bapat


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan


On Wed, May 13, 2020 at 8:04 PM Ashutosh Bapat <[hidden email]> wrote:
On Fri, May 8, 2020 at 7:27 AM Andy Fan <[hidden email]> wrote:

>> +            else if (inner_is_onerow)
>> +            {
>> +                /* Since rows in innerrel can't be duplicated AND if
>> innerrel is onerow,
>> +                 * the join result will be onerow also as well. Note:
>> onerow implies
>> +                 * multi_nullvals = false.
>>
>> I don't understand this comment. Why is there only one row in the other
>> relation which can join to this row?
>
>
> I guess you may miss the onerow special case if I understand correctly.
> inner_is_onerow means something like "SELECT xx FROM t1 where uk = 1".
> innerrel can't be duplicated means:  t1.y = t2.pk;  so the finally result is onerow
> as well.  One of the overall query is  SELECT .. FROM t1, t2 where t2.y = t2.pk;
>
>
> I explained more about onerow in the v7 README.unqiuekey document, just copy
> it here.
For some reason this mail remained in my drafts without being sent.
Sending it now. Sorry.

My impression about the one row stuff, is that there is too much
special casing around it. We should somehow structure the UniqueKey
data so that one row unique keys come naturally rather than special
cased. E.g every column in such a case is unique in the result so
create as many UniqueKeys are the number of columns

This is the beginning state of the UniqueKey,  later David suggested
this as an optimization[1], I buy-in the idea and later I found it mean
more than the original one [2], so I think onerow is needed actually.  
 
or create one
unique key with no column as you have done but handle it more
gracefully rather than spreading it all over the place.

I think this is what I do now, but it is possible that I spread it more than
necessary, if so, please let me know.  I maintained the README.uniquekey
carefully since v7 and improved a lot in v8, it may be a good place to check
it. 

 
Also, the amount of code that these patches changes seems to be much
larger than the feature's worth arguably. But it indicates that we are
modifying/adding more code than necessary. Some of that code can be
merged into existing code which does similar things as I have pointed
out in my previous comment.


I have reused the code select_mergejoin_clause rather than maintaining my
own copies in v8. Thanks a lot about that suggestion.  This happened mainly
because I didn't read enough code.  I will go through more to see if I have similar
issues. 
 
Thanks for working on the expanded scope of the initial feature you
proposed. But it makes the feature more useful, I think.

That's mainly because your suggestions are always insightful which makes me
willing to continue to work on it, so thank you all!

===
In fact, I was hesitated that how to reply an email when I send an new version
of the patch.  One idea is I should explain clear what is the difference between Vn
and Vn-1.  The other one is not many people read the Vn-1, so I would like to keep
the email simplified and keep the README clearly to save the reviewer's time. 
Actually there are more changes in v8 than I stated above. for example:  for the
UniqueKey on baserelation,  we will reduce the expr from the UniqueKey if the 
expr is a const.   Unique on (A, B).   query is SELECT b FROM t WHERE a = 1;   
in v7, the UniqueKey is (a, b).  In v8, the UniqueKey is (b) only.  But since most
people still not start to read it, so I add such  information to README rather than 
echo the same in email thread.  I will try more to understand how to communicate more
smooth.  But any suggestion on this part is appreciated. 

Best Regards
Andy Fan

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan
In reply to this post by Andy Fan
Hi Ashutosh:

All your suggestions are followed except the UNION ALL one. I replied the reason
below.  For the suggestions about partitioned table,  looks lot of cases to handle, so
I summarize/enrich your idea in README and email thread,  we can continue
to talk about that.  



+
+    foreach(lc,  unionrel->reltarget->exprs)
+    {
+        exprs = lappend(exprs, lfirst(lc));
+        colnos = lappend_int(colnos, i);
+        i++;
+    }

This should be only possible when it's not UNION ALL. We should add some assert
or protection for that.

OK, actually I called this function in generate_union_paths. which handle
UNION case only.  I will add the Assert anyway. 
 
 
Finally I found I have to add one more parameter to populate_unionrel_uniquekeys, and
the only usage of that parameter is used to Assert, so I didn't do that at last. 
 

+
+    /* Fast path */
+    if (innerrel->uniquekeys == NIL || outerrel->uniquekeys == NIL)
+        return;
+
+    outer_is_onerow = relation_is_onerow(outerrel);
+    inner_is_onerow = relation_is_onerow(innerrel);
+
+    outerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
outerrel);
+    innerrel_ukey_ctx = initililze_uniquecontext_for_joinrel(joinrel,
innerrel);
+
+    clause_list = gather_mergeable_joinclauses(joinrel, outerrel, innerrel,
+                                               restrictlist, jointype);

Something similar happens in select_mergejoin_clauses().

I didn't realized this before.  I will refactor this code accordingly if necessary
after reading that. 
 

 I reused select_mergejoin_clauses and removed the duplicated code in uniquekeys.c
in v8. 

At least from the
first reading of this patch, I get an impression that all these functions which
are going through clause lists and index lists should be merged into other
functions which go through these lists hunting for some information useful to
the optimizer. 
+
+
+    if (innerrel_keeps_unique(root, outerrel, innerrel, clause_list, false))
+    {
+        foreach(lc, innerrel_ukey_ctx)
+        {
+            UniqueKeyContext ctx = (UniqueKeyContext)lfirst(lc);
+            if (!list_is_subset(ctx->uniquekey->exprs,
joinrel->reltarget->exprs))
+            {
+                /* The UniqueKey on baserel is not useful on the joinrel */

A joining relation need not be a base rel always, it could be a join rel as
well.

good catch. 

Fixed. 
 
 

+                ctx->useful = false;
+                continue;
+            }
+            if ((jointype == JOIN_LEFT || jointype == JOIN_FULL) &&
!ctx->uniquekey->multi_nullvals)
+            {
+                /* Change the multi_nullvals to true at this case */

Need a comment explaining this. Generally, I feel, this and other functions in
this file need good comments explaining the logic esp. "why" instead of "what".
 
Exactly. 

Done in v8. 


Will continue reviewing your new set of patches as time permits.

Thank you!  Actually there is no big difference between v6 and v7 regarding the
 UniqueKey part except 2 bug fix.  However v7 has some more documents, 
comments improvement and code refactor/split, which may be helpful
for review. You may try v7 next time if v8 has not come yet:) 


v8 has come :) 

Best Regards
Andy Fan
 
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

David Rowley
In reply to this post by Andy Fan
On Thu, 14 May 2020 at 03:48, Andy Fan <[hidden email]> wrote:

> On Wed, May 13, 2020 at 8:04 PM Ashutosh Bapat <[hidden email]> wrote:
>> My impression about the one row stuff, is that there is too much
>> special casing around it. We should somehow structure the UniqueKey
>> data so that one row unique keys come naturally rather than special
>> cased. E.g every column in such a case is unique in the result so
>> create as many UniqueKeys are the number of columns
>
>
> This is the beginning state of the UniqueKey,  later David suggested
> this as an optimization[1], I buy-in the idea and later I found it mean
> more than the original one [2], so I think onerow is needed actually.

Having the "onerow" flag was not how I intended it to work.

Here's an example of how I thought it should work:

Assume t1 has UniqueKeys on {a}

SELECT DISTINCT a,b FROM t1;

Here the DISTINCT can be a no-op due to "a" being unique within t1. Or
more basically, {a} is a subset of {a,b}.

The code which does this is relation_has_uniquekeys_for(), which
contains the code:

+ if (list_is_subset(ukey->exprs, exprs))
+ return true;

In this case, ukey->exprs is {a} and exprs is {a,b}. So, if the
UniqueKey's exprs are a subset of, in this case, the DISTINCT exprs
then relation_has_uniquekeys_for() returns true. Basically
list_is_subset({a}, {a,b}), Answer: "Yes".

For the onerow stuff, if we can prove the relation returns only a
single row, e.g an aggregate without a GROUP BY, or there are
EquivalenceClasses with ec_has_const == true for each key of a unique
index, then why can't set just set the UniqueKeys to {}?  That would
mean the code to determine if we can avoid performing an explicit
DISTINCT operation would be called with list_is_subset({}, {a,b}),
which is also true, in fact, an empty set is a subset of any set. Why
is there a need to special case that fact?

In light of those thoughts, can you explain why you think we need to
keep the onerow flag?

David


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan


On Thu, May 14, 2020 at 6:20 AM David Rowley <[hidden email]> wrote:
On Thu, 14 May 2020 at 03:48, Andy Fan <[hidden email]> wrote:
> On Wed, May 13, 2020 at 8:04 PM Ashutosh Bapat <[hidden email]> wrote:
>> My impression about the one row stuff, is that there is too much
>> special casing around it. We should somehow structure the UniqueKey
>> data so that one row unique keys come naturally rather than special
>> cased. E.g every column in such a case is unique in the result so
>> create as many UniqueKeys are the number of columns
>
>
> This is the beginning state of the UniqueKey,  later David suggested
> this as an optimization[1], I buy-in the idea and later I found it mean
> more than the original one [2], so I think onerow is needed actually.

Having the "onerow" flag was not how I intended it to work.

Thanks for the detailed explanation.  So I think we do need to handle onerow
specially, (It means more things than adding each column as a UniqueKey). 
but we don't need the onerow flag since we can tell it by ukey->exprs == NIL.  

During the developer of this feature,  I added some Asserts as double checking
for onerow and exprs.  it helps me to find some special cases. like 
SELECT FROM multirows  union SELECT  FROM multirows; where targetlist is NIL.  
(I find the above case returns onerow as well just now).  so onerow flag allows us
check this special things with more attention. Even this is not the original intention
but looks it is the one purpose now. 

However I am feeling that removing onerow flag doesn't require much of code
changes. Almost all the special cases which are needed before are still needed
after that and all the functions based on that like relation_is_onerow
/add_uniquekey_onerow is still valid, we just need change the implementation. 
so do you think we need to remove onerow flag totally? 

Best Regards
Andy Fan
 
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan
In reply to this post by Andy Fan


On Wed, May 13, 2020 at 11:48 PM Andy Fan <[hidden email]> wrote:

My impression about the one row stuff, is that there is too much
special casing around it. We should somehow structure the UniqueKey
data so that one row unique keys come naturally rather than special
cased. E.g every column in such a case is unique in the result so
create as many UniqueKeys are the number of columns

This is the beginning state of the UniqueKey,  later David suggested
this as an optimization[1], I buy-in the idea and later I found it mean
more than the original one [2], so I think onerow is needed actually.  


I just found I forget the links yesterday. Here is it.

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Dmitry Dolgov
In reply to this post by David Rowley
> On Tue, Apr 14, 2020 at 09:09:31PM +1200, David Rowley wrote:
>
> The infrastructure (knowing the unique properties of a RelOptInfo), as
> provided by the patch Andy has been working on, which is based on my
> rough prototype version, I believe should be used for the skip scans
> patch as well.

Hi,

Following our agreement about making skip scan patch to use UniqueKeys
implementation from this thread I've rebased index skip scan on first
two patches from v8 series [1] (if I understand correctly those two are
introducing the concept, and others are just using it). I would like to
clarify couple of things:

* It seems populate_baserel_uniquekeys, which actually sets uniquekeys,
  is called after create_index_paths, where index skip scan already
  needs to use them. Is it possible to call it earlier?

* Do I understand correctly, that a UniqueKey would be created in a
  simplest case only when an index is unique? This makes it different
  from what was implemented for index skip scan, since there UniqueKeys
  also represents potential to use non-unique index to facilitate search
  for unique values via skipping.

[1]: https://www.postgresql.org/message-id/CAKU4AWpOM3_J-B%3DwQtCeU1TGr89MhpJBBkv2he1tAeQz6i4XNw%40mail.gmail.com


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

David Rowley
On Fri, 22 May 2020 at 07:49, Dmitry Dolgov <[hidden email]> wrote:
> * It seems populate_baserel_uniquekeys, which actually sets uniquekeys,
>   is called after create_index_paths, where index skip scan already
>   needs to use them. Is it possible to call it earlier?

Seems reasonable. I originally put it after check_index_predicates().
We need to wait until at least then so we can properly build
UniqueKeys for partial indexes.

> * Do I understand correctly, that a UniqueKey would be created in a
>   simplest case only when an index is unique? This makes it different
>   from what was implemented for index skip scan, since there UniqueKeys
>   also represents potential to use non-unique index to facilitate search
>   for unique values via skipping.

The way I picture the two working together is that the core UniqueKey
patch adds UniqueKeys to RelOptInfos to allow us to have knowledge
about what they're unique on based on the base relation's unique
indexes.

For Skipscans, that patch must expand on UniqueKeys to teach Paths
about them. I imagine we'll set some required UniqueKeys during
standard_qp_callback() and then we'll try to create some Skip Scan
paths (which are marked with UniqueKeys) if the base relation does not
already have UniqueKeys that satisfy the required UniqueKeys that were
set during standard_qp_callback().  In the future, there may be other
reasons to create Skip Scan paths for certain rels, e.g if they're on
the inner side of a SEMI/ANTI join, it might be useful to try those
when planning joins.

David


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

David Rowley
In reply to this post by Andy Fan
On Thu, 14 May 2020 at 14:39, Andy Fan <[hidden email]> wrote:

>
> On Thu, May 14, 2020 at 6:20 AM David Rowley <[hidden email]> wrote:
>> Having the "onerow" flag was not how I intended it to work.
>>
> Thanks for the detailed explanation.  So I think we do need to handle onerow
> specially, (It means more things than adding each column as a UniqueKey).
> but we don't need the onerow flag since we can tell it by ukey->exprs == NIL.
>
> During the developer of this feature,  I added some Asserts as double checking
> for onerow and exprs.  it helps me to find some special cases. like
> SELECT FROM multirows  union SELECT  FROM multirows; where targetlist is NIL.
> (I find the above case returns onerow as well just now).  so onerow flag allows us
> check this special things with more attention. Even this is not the original intention
> but looks it is the one purpose now.

But surely that special case should just go in
populate_unionrel_uniquekeys(). If the targetlist is empty, then add a
UniqueKey with an empty set of exprs.

> However I am feeling that removing onerow flag doesn't require much of code
> changes. Almost all the special cases which are needed before are still needed
> after that and all the functions based on that like relation_is_onerow
> /add_uniquekey_onerow is still valid, we just need change the implementation.
> so do you think we need to remove onerow flag totally?

Well, at the moment I'm not quite understanding why it's needed. If
it's not needed then we should remove it. If it turns out there is
some valid reason for it, then we should keep it.

David


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan
In reply to this post by David Rowley


On Fri, May 22, 2020 at 4:40 AM David Rowley <[hidden email]> wrote:
On Fri, 22 May 2020 at 07:49, Dmitry Dolgov <[hidden email]> wrote:
> * It seems populate_baserel_uniquekeys, which actually sets uniquekeys,
>   is called after create_index_paths, where index skip scan already
>   needs to use them. Is it possible to call it earlier?

Seems reasonable. I originally put it after check_index_predicates().
We need to wait until at least then so we can properly build
UniqueKeys for partial indexes.


Looks a very valid reason,  I will add this in the next version. 
 
> * Do I understand correctly, that a UniqueKey would be created in a
>   simplest case only when an index is unique? This makes it different
>   from what was implemented for index skip scan, since there UniqueKeys
>   also represents potential to use non-unique index to facilitate search
>   for unique values via skipping.

The way I picture the two working together is that the core UniqueKey
patch adds UniqueKeys to RelOptInfos to allow us to have knowledge
about what they're unique on based on the base relation's unique
indexes. 
For Skipscans, that patch must expand on UniqueKeys to teach Paths
about them. I imagine we'll set some required UniqueKeys during
standard_qp_callback() and then we'll try to create some Skip Scan
paths (which are marked with UniqueKeys) if the base relation does not
already have UniqueKeys that satisfy the required UniqueKeys that were
set during standard_qp_callback().  In the future, there may be other
reasons to create Skip Scan paths for certain rels, e.g if they're on
the inner side of a SEMI/ANTI join, it might be useful to try those
when planning joins.


Yes,  In current implementation, we also add UniqueKey during create_xxx_paths,
xxx may be grouping/union.  after the index skipscan patch, we can do the similar
things in create_indexskip_path. 

--
Best Regards
Andy Fan
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan
In reply to this post by Dmitry Dolgov



if I understand correctly those two are introducing the concept, and 
others are just using it

You are understand it correctly.  
 
--
Best Regards
Andy Fan
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan
In reply to this post by David Rowley


On Fri, May 22, 2020 at 4:52 AM David Rowley <[hidden email]> wrote:
On Thu, 14 May 2020 at 14:39, Andy Fan <[hidden email]> wrote:
>
> On Thu, May 14, 2020 at 6:20 AM David Rowley <[hidden email]> wrote:
>> Having the "onerow" flag was not how I intended it to work.
>>
> Thanks for the detailed explanation.  So I think we do need to handle onerow
> specially, (It means more things than adding each column as a UniqueKey).
> but we don't need the onerow flag since we can tell it by ukey->exprs == NIL.
>
> During the developer of this feature,  I added some Asserts as double checking
> for onerow and exprs.  it helps me to find some special cases. like
> SELECT FROM multirows  union SELECT  FROM multirows; where targetlist is NIL.
> (I find the above case returns onerow as well just now).  so onerow flag allows us
> check this special things with more attention. Even this is not the original intention
> but looks it is the one purpose now.

But surely that special case should just go in
populate_unionrel_uniquekeys(). If the targetlist is empty, then add a
UniqueKey with an empty set of exprs.

This is correct on this special case.  

> However I am feeling that removing onerow flag doesn't require much of code
> changes. Almost all the special cases which are needed before are still needed
> after that and all the functions based on that like relation_is_onerow
> /add_uniquekey_onerow is still valid, we just need change the implementation.
> so do you think we need to remove onerow flag totally?

Well, at the moment I'm not quite understanding why it's needed. If
it's not needed then we should remove it. If it turns out there is
some valid reason for it, then we should keep it.

Currently I uses it to detect more special case which we can't image at first, we can 
understand it as it used to  debug/Assert purpose only.   After the mainly code is 
reviewed,  that can be removed (based on the change is tiny). 

--
Best Regards
Andy Fan
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Dmitry Dolgov
In reply to this post by David Rowley
> On Fri, May 22, 2020 at 08:40:17AM +1200, David Rowley wrote:
>
> The way I picture the two working together is that the core UniqueKey
> patch adds UniqueKeys to RelOptInfos to allow us to have knowledge
> about what they're unique on based on the base relation's unique
> indexes.

Yep, I'm onboard with the idea.

> For Skipscans, that patch must expand on UniqueKeys to teach Paths
> about them.

That's already there.

> I imagine we'll set some required UniqueKeys during
> standard_qp_callback()

In standard_qp_callback, because pathkeys are computed at this point I
guess?

> and then we'll try to create some Skip Scan
> paths (which are marked with UniqueKeys) if the base relation does not
> already have UniqueKeys that satisfy the required UniqueKeys that were
> set during standard_qp_callback().

For a simple distinct query those UniqueKeys would be set based on
distinct clause. If I understand correctly, the very same is implemented
right now in create_distinct_paths, just after building all index paths,
so wouldn't it be just a duplication?

In general UniqueKeys in the skip scan patch were created from
distinctClause in build_index_paths (looks similar to what you've
described) and then based on them created index skip scan paths. So my
expectations were that the patch from this thread would work similar.


1234