[PATCH] Keeps tracking the uniqueness with UniqueKey

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
77 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

David Rowley
On Sun, 24 May 2020 at 04:14, Dmitry Dolgov <[hidden email]> wrote:
>
> > On Fri, May 22, 2020 at 08:40:17AM +1200, David Rowley wrote:
> > I imagine we'll set some required UniqueKeys during
> > standard_qp_callback()
>
> In standard_qp_callback, because pathkeys are computed at this point I
> guess?

Yes. In particular, we set the pathkeys for DISTINCT clauses there.

> > and then we'll try to create some Skip Scan
> > paths (which are marked with UniqueKeys) if the base relation does not
> > already have UniqueKeys that satisfy the required UniqueKeys that were
> > set during standard_qp_callback().
>
> For a simple distinct query those UniqueKeys would be set based on
> distinct clause. If I understand correctly, the very same is implemented
> right now in create_distinct_paths, just after building all index paths,
> so wouldn't it be just a duplication?

I think we need to create the skip scan paths when we create the other
paths for base relations. We shouldn't be adjusting existing index
paths during create_distinct_paths().  The last code I saw for the
skip scans patch did something like if (IsA(path, IndexScanPath)) in
create_distinct_paths(), but that's only ever going to work when the
query is to a single relation. You'll never see IndexScanPaths in the
upper planner's paths when there are joins. You'd see join type paths
instead.  It is possible to make use of skip scans for DISTINCT when
the query has joins. We'd just need to ensure the join does not
duplicate the unique rows from the skip scanned relation.

> In general UniqueKeys in the skip scan patch were created from
> distinctClause in build_index_paths (looks similar to what you've
> described) and then based on them created index skip scan paths. So my
> expectations were that the patch from this thread would work similar.

The difference will be that you'd be setting some distinct_uniquekeys
in standard_qp_callback() to explicitly request that some skip scan
paths be created for the uniquekeys, whereas the patch here just does
not bother doing DISTINCT if the upper relation already has unique
keys that state that the DISTINCT is not required. The skip scans
patch should check if the RelOptInfo for the uniquekeys set in
standard_qp_callback() are already mentioned in the RelOptInfo's
uniquekeys.  If they are then there's no point in skip scanning as the
rel is already unique for the distinct_uniquekeys.

David


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Dmitry Dolgov
> On Mon, May 25, 2020 at 06:34:30AM +1200, David Rowley wrote:
>
> > For a simple distinct query those UniqueKeys would be set based on
> > distinct clause. If I understand correctly, the very same is implemented
> > right now in create_distinct_paths, just after building all index paths,
> > so wouldn't it be just a duplication?
>
> I think we need to create the skip scan paths when we create the other
> paths for base relations. We shouldn't be adjusting existing index
> paths during create_distinct_paths().  The last code I saw for the
> skip scans patch did something like if (IsA(path, IndexScanPath)) in
> create_distinct_paths()

It's not the case since the late March.

> > In general UniqueKeys in the skip scan patch were created from
> > distinctClause in build_index_paths (looks similar to what you've
> > described) and then based on them created index skip scan paths. So my
> > expectations were that the patch from this thread would work similar.
>
> The difference will be that you'd be setting some distinct_uniquekeys
> in standard_qp_callback() to explicitly request that some skip scan
> paths be created for the uniquekeys, whereas the patch here just does
> not bother doing DISTINCT if the upper relation already has unique
> keys that state that the DISTINCT is not required. The skip scans
> patch should check if the RelOptInfo for the uniquekeys set in
> standard_qp_callback() are already mentioned in the RelOptInfo's
> uniquekeys.  If they are then there's no point in skip scanning as the
> rel is already unique for the distinct_uniquekeys.

It sounds like it makes semantics of UniqueKey a bit more confusing,
isn't it? At the moment it says:

     Represents the unique properties held by a RelOptInfo.

With the proposed changes it would be "unique properties, that are held"
and "unique properties, that are requested", which are partially
duplicated, but stored in some different fields. From the skip scan
patch perspective it's probably doesn't make any difference, seems like
the implementation would be almost the same, just created UniqueKeys
would be of different type. But I'm afraid potentiall future users of
UniqueKeys could be easily confused.


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

David Rowley
On Mon, 25 May 2020 at 19:14, Dmitry Dolgov <[hidden email]> wrote:

>
> > On Mon, May 25, 2020 at 06:34:30AM +1200, David Rowley wrote:
> > The difference will be that you'd be setting some distinct_uniquekeys
> > in standard_qp_callback() to explicitly request that some skip scan
> > paths be created for the uniquekeys, whereas the patch here just does
> > not bother doing DISTINCT if the upper relation already has unique
> > keys that state that the DISTINCT is not required. The skip scans
> > patch should check if the RelOptInfo for the uniquekeys set in
> > standard_qp_callback() are already mentioned in the RelOptInfo's
> > uniquekeys.  If they are then there's no point in skip scanning as the
> > rel is already unique for the distinct_uniquekeys.
>
> It sounds like it makes semantics of UniqueKey a bit more confusing,
> isn't it? At the moment it says:
>
>      Represents the unique properties held by a RelOptInfo.
>
> With the proposed changes it would be "unique properties, that are held"
> and "unique properties, that are requested", which are partially
> duplicated, but stored in some different fields. From the skip scan
> patch perspective it's probably doesn't make any difference, seems like
> the implementation would be almost the same, just created UniqueKeys
> would be of different type. But I'm afraid potentiall future users of
> UniqueKeys could be easily confused.

If there's some comment that says UniqueKeys are for RelOptInfos, then
perhaps that comment just needs to be expanded to mention the Path
uniqueness when we add the uniquekeys field to Path.

I think the main point of basing skip scans on top of this uniquekeys
patch is to ensure it's the right thing for the job. I don't think
it's realistic to be maintaining two different sets of infrastructure
which serve a very similar purpose. It's important we make UniqueKeys
general purpose enough to support future useful forms of optimisation.
Basing skip scans on it seems like a good exercise towards that. I'm
not expecting that we need to make zero changes here to allow it to
work well with skip scans.

David


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan
In reply to this post by David Rowley


On Mon, May 25, 2020 at 2:34 AM David Rowley <[hidden email]> wrote:
On Sun, 24 May 2020 at 04:14, Dmitry Dolgov <[hidden email]> wrote:
>
> > On Fri, May 22, 2020 at 08:40:17AM +1200, David Rowley wrote:
> > I imagine we'll set some required UniqueKeys during
> > standard_qp_callback()
>
> In standard_qp_callback, because pathkeys are computed at this point I
> guess?

Yes. In particular, we set the pathkeys for DISTINCT clauses there.


Actually I have some issues to understand from here,  then try to read index
skip scan patch to fully understand what is the requirement, but that doesn't
get it so far[1].  So what  is the "UniqueKeys" in "UniqueKeys during 
standard_qp_callback()" and what is the "pathkeys" in "pathkeys are computed
at this point” means?  I tried to think it as root->distinct_pathkeys,  however I 
didn't fully understand where root->distinct_pathkeys is used for as well.  

--
Best Regards
Andy Fan
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

David Rowley
On Fri, 5 Jun 2020 at 14:36, Andy Fan <[hidden email]> wrote:

> On Mon, May 25, 2020 at 2:34 AM David Rowley <[hidden email]> wrote:
>>
>> On Sun, 24 May 2020 at 04:14, Dmitry Dolgov <[hidden email]> wrote:
>> >
>> > > On Fri, May 22, 2020 at 08:40:17AM +1200, David Rowley wrote:
>> > > I imagine we'll set some required UniqueKeys during
>> > > standard_qp_callback()
>> >
>> > In standard_qp_callback, because pathkeys are computed at this point I
>> > guess?
>>
>> Yes. In particular, we set the pathkeys for DISTINCT clauses there.
>>
>
> Actually I have some issues to understand from here,  then try to read index
> skip scan patch to fully understand what is the requirement, but that doesn't
> get it so far[1].  So what  is the "UniqueKeys" in "UniqueKeys during
> standard_qp_callback()" and what is the "pathkeys" in "pathkeys are computed
> at this point” means?  I tried to think it as root->distinct_pathkeys,  however I
> didn't fully understand where root->distinct_pathkeys is used for as well.

In standard_qp_callback(), what we'll do with uniquekeys is pretty
much what we already do with pathkeys there. Basically pathkeys are
set there to have the planner attempt to produce a plan that satisfies
those pathkeys.  Notice at the end of standard_qp_callback() we set
the pathkeys according to the first upper planner operation that'll
need to make use of those pathkeys.  e.g, If there's a GROUP BY and a
DISTINCT in the query, then use the pathkeys for GROUP BY, since that
must occur before DISTINCT.  Likely uniquekeys will want to follow the
same rules there for the operations that can make use of paths with
uniquekeys, which in this case, I believe, will be the same as the
example I just mentioned for pathkeys, except we'll only be able to
support GROUP BY without any aggregate functions.

David

> [1] https://www.postgresql.org/message-id/CAKU4AWq%3DwWkAo-CDOQ5Ea6UwYvZCgb501w6iqU0rtnTT-zg6bQ%40mail.gmail.com


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan


On Fri, Jun 5, 2020 at 10:57 AM David Rowley <[hidden email]> wrote:
On Fri, 5 Jun 2020 at 14:36, Andy Fan <[hidden email]> wrote:
> On Mon, May 25, 2020 at 2:34 AM David Rowley <[hidden email]> wrote:
>>
>> On Sun, 24 May 2020 at 04:14, Dmitry Dolgov <[hidden email]> wrote:
>> >
>> > > On Fri, May 22, 2020 at 08:40:17AM +1200, David Rowley wrote:
>> > > I imagine we'll set some required UniqueKeys during
>> > > standard_qp_callback()
>> >
>> > In standard_qp_callback, because pathkeys are computed at this point I
>> > guess?
>>
>> Yes. In particular, we set the pathkeys for DISTINCT clauses there.
>>
>
> Actually I have some issues to understand from here,  then try to read index
> skip scan patch to fully understand what is the requirement, but that doesn't
> get it so far[1].  So what  is the "UniqueKeys" in "UniqueKeys during
> standard_qp_callback()" and what is the "pathkeys" in "pathkeys are computed
> at this point” means?  I tried to think it as root->distinct_pathkeys,  however I
> didn't fully understand where root->distinct_pathkeys is used for as well.

In standard_qp_callback(), what we'll do with uniquekeys is pretty
much what we already do with pathkeys there. Basically pathkeys are
set there to have the planner attempt to produce a plan that satisfies
those pathkeys.  Notice at the end of standard_qp_callback() we set
the pathkeys according to the first upper planner operation that'll
need to make use of those pathkeys.  e.g, If there's a GROUP BY and a
DISTINCT in the query, then use the pathkeys for GROUP BY, since that
must occur before DISTINCT. 

Thanks for your explanation.  Looks I understand now based on your comments.
Take root->group_pathkeys for example,  the similar information also available in 
root->parse->groupClauses but we do use of root->group_pathkeys  with 
pathkeys_count_contained_in function in many places, that is mainly because 
the content between between the 2 is different some times, like the case in
pathkey_is_redundant. 

Likely uniquekeys will want to follow the
same rules there for the operations that can make use of paths with
uniquekeys, which in this case, I believe, will be the same as the
example I just mentioned for pathkeys, except we'll only be able to
support GROUP BY without any aggregate functions.


All the places I want to use UniqueKey so far (like distinct, group by and others)
have an input_relation (RelOptInfo),  and the UniqueKey information can be get
there.  at the same time,  all the pathkey in PlannerInfo is used for Upper planner
but UniqueKey may be used in current planner some time, like reduce_semianti_joins/
remove_useless_join, I am not sure if we must maintain uniquekey in PlannerInfo. 

--
Best Regards
Andy Fan
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Dmitry Dolgov
In reply to this post by David Rowley
> On Fri, Jun 05, 2020 at 12:26:15PM +1200, David Rowley wrote:
> On Mon, 25 May 2020 at 19:14, Dmitry Dolgov <[hidden email]> wrote:
> >
> > > On Mon, May 25, 2020 at 06:34:30AM +1200, David Rowley wrote:
> > > The difference will be that you'd be setting some distinct_uniquekeys
> > > in standard_qp_callback() to explicitly request that some skip scan
> > > paths be created for the uniquekeys, whereas the patch here just does
> > > not bother doing DISTINCT if the upper relation already has unique
> > > keys that state that the DISTINCT is not required. The skip scans
> > > patch should check if the RelOptInfo for the uniquekeys set in
> > > standard_qp_callback() are already mentioned in the RelOptInfo's
> > > uniquekeys.  If they are then there's no point in skip scanning as the
> > > rel is already unique for the distinct_uniquekeys.
> >
> > It sounds like it makes semantics of UniqueKey a bit more confusing,
> > isn't it? At the moment it says:
> >
> >      Represents the unique properties held by a RelOptInfo.
> >
> > With the proposed changes it would be "unique properties, that are held"
> > and "unique properties, that are requested", which are partially
> > duplicated, but stored in some different fields. From the skip scan
> > patch perspective it's probably doesn't make any difference, seems like
> > the implementation would be almost the same, just created UniqueKeys
> > would be of different type. But I'm afraid potentiall future users of
> > UniqueKeys could be easily confused.
>
> If there's some comment that says UniqueKeys are for RelOptInfos, then
> perhaps that comment just needs to be expanded to mention the Path
> uniqueness when we add the uniquekeys field to Path.

My concerns are more about having two different sets of distinct
uniquekeys:

* one prepared in standard_qp_callback for skip scan (I guess those
  should be added to PlannerInfo?)

* one in create_distinct_paths as per current implementation

with what seems to be similar content.

> I think the main point of basing skip scans on top of this uniquekeys
> patch is to ensure it's the right thing for the job. I don't think
> it's realistic to be maintaining two different sets of infrastructure
> which serve a very similar purpose. It's important we make UniqueKeys
> general purpose enough to support future useful forms of optimisation.
> Basing skip scans on it seems like a good exercise towards that. I'm
> not expecting that we need to make zero changes here to allow it to
> work well with skip scans.

Sure, no one suggests to have two ways of saying "this thing is unique".
I'm just trying to figure out how to make skip scan and uniquekeys play
together without having rough edges.


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

David Rowley
On Sat, 6 Jun 2020 at 21:15, Dmitry Dolgov <[hidden email]> wrote:
> My concerns are more about having two different sets of distinct
> uniquekeys:
>
> * one prepared in standard_qp_callback for skip scan (I guess those
>   should be added to PlannerInfo?)

Yes. Those must be set so that we know if and what we should try to
create Skip Scan Index paths for. Just like we'll create index paths
for PlannerInfo.query_pathkeys.

> * one in create_distinct_paths as per current implementation
>
> with what seems to be similar content.

I think we need to have UniqueKeys in RelOptInfo so we can describe
what a relation is unique by.  There's no point for example in
creating skip scan paths for a relation that's already unique on
whatever we might try to skip scan on. e.g someone does:

SELECT DISTINCT unique_and_indexed_column FROM tab;

Since there's a unique index on unique_and_indexed_column then we
needn't try to create a skipscan path for it.

However, the advantages of having UniqueKeys on the RelOptInfo goes a
little deeper than that.  We can make use of it anywhere where we
currently do relation_has_unique_index_for() for. Plus we get what
Andy wants and can skip useless DISTINCT operations when the result is
already unique on the distinct clause.  Sure we could carry all the
relation's unique properties around in Paths, but that's not the right
place. It's logically a property of the relation, not the path
specifically.  RelOptInfo is a good place to store the properties of
relations.

The idea of the meaning of uniquekeys within a path is that the path
is specifically making those keys unique.  We're not duplicating the
RelOptInfo's uniquekeys there.

If we have a table like:

CREATE TABLE tab (
   a INT PRIMARY KEY,
   b INT NOT NULL
);

CREATE INDEX tab_b_idx ON tab (b);

Then I'd expect a query such as: SELECT DISTINCT b FROM tab; to have
the uniquekeys for tab's RelOptInfo set to {a}, and the seqscan and
index scan paths uniquekey properties set to NULL, but the skipscan
index path uniquekeys for tab_b_idx set to {b}.  Then when we go
create the distinct paths Andy's work will see that there's no
RelOptInfo uniquekeys for the distinct clause, but the skip scan work
will loop over the unique_pathlist and find that we have a skipscan
path with the required uniquekeys, a.k.a {b}.

Does that make sense?

David


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Dmitry Dolgov
> On Sun, Jun 07, 2020 at 06:51:22PM +1200, David Rowley wrote:
>
> > * one in create_distinct_paths as per current implementation
> >
> > with what seems to be similar content.
>
> I think we need to have UniqueKeys in RelOptInfo so we can describe
> what a relation is unique by.  There's no point for example in
> creating skip scan paths for a relation that's already unique on
> whatever we might try to skip scan on. e.g someone does:
>
> SELECT DISTINCT unique_and_indexed_column FROM tab;
>
> Since there's a unique index on unique_and_indexed_column then we
> needn't try to create a skipscan path for it.
>
> However, the advantages of having UniqueKeys on the RelOptInfo goes a
> little deeper than that.  We can make use of it anywhere where we
> currently do relation_has_unique_index_for() for. Plus we get what
> Andy wants and can skip useless DISTINCT operations when the result is
> already unique on the distinct clause.  Sure we could carry all the
> relation's unique properties around in Paths, but that's not the right
> place. It's logically a property of the relation, not the path
> specifically.  RelOptInfo is a good place to store the properties of
> relations.
>
> The idea of the meaning of uniquekeys within a path is that the path
> is specifically making those keys unique.  We're not duplicating the
> RelOptInfo's uniquekeys there.
>
> If we have a table like:
>
> CREATE TABLE tab (
>    a INT PRIMARY KEY,
>    b INT NOT NULL
> );
>
> CREATE INDEX tab_b_idx ON tab (b);
>
> Then I'd expect a query such as: SELECT DISTINCT b FROM tab; to have
> the uniquekeys for tab's RelOptInfo set to {a}, and the seqscan and
> index scan paths uniquekey properties set to NULL, but the skipscan
> index path uniquekeys for tab_b_idx set to {b}.  Then when we go
> create the distinct paths Andy's work will see that there's no
> RelOptInfo uniquekeys for the distinct clause, but the skip scan work
> will loop over the unique_pathlist and find that we have a skipscan
> path with the required uniquekeys, a.k.a {b}.
>
> Does that make sense?

Yes, from this point of view it makes sense. I've already posted the
first version of index skip scan based on this implementation [1]. There
could be rought edges, but overall I hope we're on the same page.

[1]: https://www.postgresql.org/message-id/flat/20200609102247.jdlatmfyeecg52fi%40localhost


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan
I just did another self-review about this patch and took some suggestions based
on the discussion above.   The attached is the v9 version. When you check the
uniquekey patch, README.uniquekey should be a good place to start with.

Main changes in v9 includes:

1.  called populate_baserel_uniquekeys after check_index_predicates.
2.  removed the UniqueKey->onerow flag since we can tell it by exprs == NIL.
3.  expression index code improvement.
4.  code & comments refactoring. 

As for the Index Skip Scan,  I still have not merged the changes in the Index
Skip Scan patch[1].   We may need some addition for that, but probably not 
need to modify the existing code.  After we can finalize it, we can add it in 
that patch. I will keep a close eye on it as well. 

--
Best Regards
Andy Fan

v9-0005-Treat-the-input-as-sorted-if-the-group-by-clause-.patch (18K) Download Attachment
v9-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch (6K) Download Attachment
v9-0003-Refactoring-existing-uniqueness-related-code-to-u.patch (26K) Download Attachment
v9-0004-Remove-distinct-node-AggNode-if-the-input-is-uniq.patch (35K) Download Attachment
v9-0002-Introduce-UniqueKey-attributes-on-RelOptInfo-stru.patch (80K) Download Attachment
v9-0006-Join-removal-at-run-time-with-UniqueKey.patch (19K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan
Fixed a test case in v10. 

--
Best Regards
Andy Fan

v10-0002-Introduce-UniqueKey-attributes-on-RelOptInfo-str.patch (80K) Download Attachment
v10-0005-Treat-the-input-as-sorted-if-the-group-by-clause.patch (18K) Download Attachment
v10-0003-Refactoring-existing-uniqueness-related-code-to-.patch (26K) Download Attachment
v10-0004-Remove-distinct-node-AggNode-if-the-input-is-uni.patch (35K) Download Attachment
v10-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch (6K) Download Attachment
v10-0006-Join-removal-at-run-time-with-UniqueKey.patch (19K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: [PATCH] Keeps tracking the uniqueness with UniqueKey

Floris Van Nee

Hi Andy,

 

A small thing I found:

 

+static List *

+get_exprs_from_uniqueindex(IndexOptInfo *unique_index,

+                                                                                                List *const_exprs,

+                                                                                                List *const_expr_opfamilies,

+                                                                                                Bitmapset *used_varattrs,

+                                                                                                bool *useful,

+                                                                                                bool *multi_nullvals)

+             indexpr_item = list_head(unique_index->indexprs);

+             for(c = 0; c < unique_index->ncolumns; c++)

+             {

 

I believe the for loop must be over unique_index->nkeycolumns, rather than columns. It shouldn’t include the extra non-key columns. This can currently lead to invalid memory accesses as well a few lines later when it does an array access of unique_index->opfamily[c] – this array only has nkeycolumns entries.

 

-Floris

 

 

From: Andy Fan <[hidden email]>
Sent: Sunday 19 July 2020 5:03 AM
To: Dmitry Dolgov <[hidden email]>
Cc: David Rowley <[hidden email]>; PostgreSQL Hackers <[hidden email]>; Tom Lane <[hidden email]>; Ashutosh Bapat <[hidden email]>; [hidden email]
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey [External]

 

Fixed a test case in v10. 

 

--

Best Regards

Andy Fan

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan
Hi Floris:

On Thu, Jul 23, 2020 at 3:22 AM Floris Van Nee <[hidden email]> wrote:

Hi Andy,

 

A small thing I found:

 

+static List *

+get_exprs_from_uniqueindex(IndexOptInfo *unique_index,

+                                                                                                List *const_exprs,

+                                                                                                List *const_expr_opfamilies,

+                                                                                                Bitmapset *used_varattrs,

+                                                                                                bool *useful,

+                                                                                                bool *multi_nullvals)

+             indexpr_item = list_head(unique_index->indexprs);

+             for(c = 0; c < unique_index->ncolumns; c++)

+             {

 

I believe the for loop must be over unique_index->nkeycolumns, rather than columns. It shouldn’t include the extra non-key columns. This can currently lead to invalid memory accesses as well a few lines later when it does an array access of unique_index->opfamily[c] – this array only has nkeycolumns entries.


You are correct, I would include this in the next version patch, Thank you
for this checking!

--
Andy Fan
Best Regards

 

 

From: Andy Fan <[hidden email]>
Sent: Sunday 19 July 2020 5:03 AM
To: Dmitry Dolgov <[hidden email]>
Cc: David Rowley <[hidden email]>; PostgreSQL Hackers <[hidden email]>; Tom Lane <[hidden email]>; Ashutosh Bapat <[hidden email]>; [hidden email]
Subject: Re: [PATCH] Keeps tracking the uniqueness with UniqueKey [External]

 

Fixed a test case in v10. 

 

--

Best Regards

Andy Fan



--
Best Regards
Andy Fan
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Michael Paquier-2
On Tue, Aug 04, 2020 at 06:59:50AM +0800, Andy Fan wrote:
> You are correct, I would include this in the next version patch, Thank you
> for this checking!

Regression tests are failing with this patch set applied.  The CF bot
says so, and I can reproduce that locally as well.  Could you look at
that please?  I have switched the patch to "waiting on author".
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan


On Mon, Sep 7, 2020 at 3:22 PM Michael Paquier <[hidden email]> wrote:
On Tue, Aug 04, 2020 at 06:59:50AM +0800, Andy Fan wrote:
> You are correct, I would include this in the next version patch, Thank you
> for this checking!

Regression tests are failing with this patch set applied.  The CF bot
says so, and I can reproduce that locally as well.  Could you look at
that please?  I have switched the patch to "waiting on author".
--
Michael
 
Thank you Michael for checking it, I can reproduce the same locally after 
rebasing to the latest master. The attached v11 has fixed it and includes 
the fix Floris found.

The status of this patch is we are still in discussion about which data type should
UniqueKey->expr use.  Both David [1] and I [2] shared some thinking about
EquivalenceClasses, but neither of us have decided on it. So I still didn't change
anything about that  now.   I can change it once we have decided on it.

--
Best Regards
Andy Fan

v11-0002-Introduce-UniqueKey-attributes-on-RelOptInfo-str.patch (80K) Download Attachment
v11-0005-Treat-the-input-as-sorted-if-the-group-by-clause.patch (18K) Download Attachment
v11-0003-Refactoring-existing-uniqueness-related-code-to-.patch (26K) Download Attachment
v11-0001-Introduce-RelOptInfo-notnullattrs-attribute.patch (6K) Download Attachment
v11-0004-Remove-distinct-node-AggNode-if-the-input-is-uni.patch (35K) Download Attachment
v11-0006-Join-removal-at-run-time-with-UniqueKey.patch (19K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Dmitry Dolgov
> On Wed, Sep 09, 2020 at 07:51:12AM +0800, Andy Fan wrote:
>
> Thank you Michael for checking it, I can reproduce the same locally after
> rebasing to the latest master. The attached v11 has fixed it and includes
> the fix Floris found.
>
> The status of this patch is we are still in discussion about which data
> type should
> UniqueKey->expr use.  Both David [1] and I [2] shared some thinking about
> EquivalenceClasses, but neither of us have decided on it. So I still didn't
> change
> anything about that  now.   I can change it once we have decided on it.
>
> [1]
> https://www.postgresql.org/message-id/CAApHDvoDMyw%3DhTuW-258yqNK4bhW6CpguJU_GZBh4x%2Brnoem3w%40mail.gmail.com
>
> [2]
> https://www.postgresql.org/message-id/CAKU4AWqy3Uv67%3DPR8RXG6LVoO-cMEwfW_LMwTxHdGrnu%2Bcf%2BdA%40mail.gmail.com

Hi,

In the Index Skip Scan thread Peter mentioned couple of issues that I
believe need to be addressed here. In fact one about valgrind errors was
already fixed as far as I see (nkeycolumns instead of ncolumns), another
one was:

/code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c:
In function ‘populate_baserel_uniquekeys’:
/code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c:797:13:
warning: ‘expr’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
  797 |   else if (!list_member(unique_index->rel->reltarget->exprs, expr))
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Other than that I wanted to ask what are the plans to proceed with this
patch? It's been a while since the question was raised in which format
to keep unique key expressions, and as far as I can see no detailed
suggestions or patch changes were proposed as a follow up. Obviously I
would love to see the first two preparation patches committed to avoid
dependencies between patches, and want to suggest an incremental
approach with simple format for start (what we have right now) with the
idea how to extend it in the future to cover more cases.


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan


On Wed, Oct 7, 2020 at 9:55 PM Dmitry Dolgov <[hidden email]> wrote:
> On Wed, Sep 09, 2020 at 07:51:12AM +0800, Andy Fan wrote:
>
> Thank you Michael for checking it, I can reproduce the same locally after
> rebasing to the latest master. The attached v11 has fixed it and includes
> the fix Floris found.
>
> The status of this patch is we are still in discussion about which data
> type should
> UniqueKey->expr use.  Both David [1] and I [2] shared some thinking about
> EquivalenceClasses, but neither of us have decided on it. So I still didn't
> change
> anything about that  now.   I can change it once we have decided on it.
>
> [1]
> https://www.postgresql.org/message-id/CAApHDvoDMyw%3DhTuW-258yqNK4bhW6CpguJU_GZBh4x%2Brnoem3w%40mail.gmail.com
>
> [2]
> https://www.postgresql.org/message-id/CAKU4AWqy3Uv67%3DPR8RXG6LVoO-cMEwfW_LMwTxHdGrnu%2Bcf%2BdA%40mail.gmail.com

Hi,

In the Index Skip Scan thread Peter mentioned couple of issues that I
believe need to be addressed here. In fact one about valgrind errors was
already fixed as far as I see (nkeycolumns instead of ncolumns), another
one was:

/code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c:
In function ‘populate_baserel_uniquekeys’:
/code/postgresql/patch/build/../source/src/backend/optimizer/path/uniquekeys.c:797:13:
warning: ‘expr’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
  797 |   else if (!list_member(unique_index->rel->reltarget->exprs, expr))
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


I can fix this warning in the next version,  thanks for reporting it.  It can be
fixed like below or just adjust the if-elseif-else pattern.

--- a/src/backend/optimizer/path/uniquekeys.c
+++ b/src/backend/optimizer/path/uniquekeys.c
@@ -760,6 +760,7 @@ get_exprs_from_uniqueindex(IndexOptInfo *unique_index,
                {
                        /* Index on system column is not supported */
                        Assert(false);
+                       expr = NULL; /* make compiler happy */
                }

 
Other than that I wanted to ask what are the plans to proceed with this
patch? It's been a while since the question was raised in which format
to keep unique key expressions, and as far as I can see no detailed
suggestions or patch changes were proposed as a follow up. Obviously I
would love to see the first two preparation patches committed to avoid
dependencies between patches, and want to suggest an incremental
approach with simple format for start (what we have right now) with the
idea how to extend it in the future to cover more cases.

I think the hardest part of this series is commit 2,  it probably needs lots of
dedicated time to review which would be the hardest part for the reviewers.
I don't have a good suggestion, however. 

--
Best Regards
Andy Fan
Reply | Threaded
Open this post in threaded view
|

RE: [PATCH] Keeps tracking the uniqueness with UniqueKey

Hou, Zhijie
Hi

I have a look over this patch and find some typos in 0002.

1.Some typos about unique:
There are some spelling mistakes about "unique" in code comments and README.
Such as: "+However we define the UnqiueKey as below."

2.function name about initililze_uniquecontext_for_joinrel:
May be it should be initialize_ uniquecontext_for_joinrel.

3.some typos in comment:
+ * baserelation's basicrestrictinfo. so it must be in ON clauses.

I think it shoule be " basicrestrictinfo " => "baserestrictinfo".


Besides, I think list_copy can be used to simplify the following code.
(But It seems the type of expr is still in discussion, so this may has no impact )
+ List *exprs = NIL;
...
+ foreach(lc, unionrel->reltarget->exprs)
+ {
+ exprs = lappend(exprs, lfirst(lc));
+ }

Best regards,


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Dmitry Dolgov
In reply to this post by Andy Fan
> On Thu, Oct 08, 2020 at 09:34:51AM +0800, Andy Fan wrote:
>
> > Other than that I wanted to ask what are the plans to proceed with this
> > patch? It's been a while since the question was raised in which format
> > to keep unique key expressions, and as far as I can see no detailed
> > suggestions or patch changes were proposed as a follow up. Obviously I
> > would love to see the first two preparation patches committed to avoid
> > dependencies between patches, and want to suggest an incremental
> > approach with simple format for start (what we have right now) with the
> > idea how to extend it in the future to cover more cases.
> >
>
> I think the hardest part of this series is commit 2,  it probably needs
> lots of
> dedicated time to review which would be the hardest part for the reviewers.
> I don't have a good suggestion, however.

Sure, and I would review the patch as well. But as far as I understand
the main issue is "how to store uniquekey expressions", and as long as
it is not decided, no additional review will move the patch forward I
guess.


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey

Andy Fan


On Thu, Oct 8, 2020 at 6:39 PM Dmitry Dolgov <[hidden email]> wrote:
> On Thu, Oct 08, 2020 at 09:34:51AM +0800, Andy Fan wrote:
>
> > Other than that I wanted to ask what are the plans to proceed with this
> > patch? It's been a while since the question was raised in which format
> > to keep unique key expressions, and as far as I can see no detailed
> > suggestions or patch changes were proposed as a follow up. Obviously I
> > would love to see the first two preparation patches committed to avoid
> > dependencies between patches, and want to suggest an incremental
> > approach with simple format for start (what we have right now) with the
> > idea how to extend it in the future to cover more cases.
> >
>
> I think the hardest part of this series is commit 2,  it probably needs
> lots of
> dedicated time to review which would be the hardest part for the reviewers.
> I don't have a good suggestion, however.

Sure, and I would review the patch as well.

Thank you very much!
 
But as far as I understand
the main issue is "how to store uniquekey expressions", and as long as
it is not decided, no additional review will move the patch forward I
guess.

I don't think so:)  The patch may have other issues as well.  For example, 
logic error or duplicated code or cases needing improvement and so on. 

--
Best Regards
Andy Fan
1234