UniqueKey on Partitioned table.

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

UniqueKey on Partitioned table.

Andy Fan
Suppose we have partitioned table defined as below:

P(a, b, c, d)  partition by (a)
  p1  (a=1)
  p2  (a=2)
  p3  (a=3)

Since in PG, we can define different indexes among partitions, so each child may
have different UniqueKeys, and some of them might be invalidated in parent's
level. For example, 1). we only define unique index p1(c), so (c) would be an
uniquekey of p1 only.  so it is invalidate on appendrel level.  2). We define
unique index p_n(c) on each childrel,  so every childrel has UniqueKey
(c). However it is invalid on appendrel as well.  3). We define unique index
p_n(a, c), since a is the partition key, so (a, c) would be valid for both
child rel and parent rel.


In my v1 implementation[1] before, I maintained the child rel exactly the same as
non-partitioned table. But when calculating the UniqueKey for partitioned
table, I first introduce a global_unique_indexlist which handles the above 3
cases. The indexes for case 1 and case 2 will not be in global_unique_indexlist
but the index in case 3 will be even if they are only built in child level. After
we have build the global_unique_indexlist on appendrel, we will build the
UnqiueKey exactly same as non partitioned table. With this way,  I'm not happy
with the above method now is because 1). the global_unique_indexlist is build in
a hard way. 2). I have to totally ignored the UniqueKey on child level and
re-compute it on appendrel level. 3). The 3 cases should rarely happen in real
life, I guess.

When I re-implement the UniqueKey with EquivalenceClass, I re-think about how to
handle the above stuff. Now my preferred idea is just not handle it. When building the
uniquekey on parent rel, we just handle 2 cases. If the appendrel only have 1
child, we just copy (and modified if needed due to col-order-mismatch case) the
uniquekey.  2). Only handle the Unique index defined in top level, for this case
it would yield the below situation.

create unique index on p(a, b); --> (A, B) will be the UniqueKey of p.
create unique index on p_nn(a, b); --> (A, B) will not be the UniqueKey of p
even we create it on ALL the child rel. The result is not perfect but I think
it is practical.  Any suggestions?

The attached is a UnqiueKey with EquivalenceClass patch, I just complete the
single relation part and may have bugs. I just attached it here for design
review only. and the not-null-attrs is just v1 which we can continue discussing on
the original thread[2].



--
Best Regards

v1-0001-Introduce-notnullattrs-field-in-RelOptInfo-to-ind.patch (6K) Download Attachment
v1-0002-UniqueKey-with-EquivalenceClass-for-single-rel-on.patch (32K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: UniqueKey on Partitioned table.

Dmitry Dolgov
> On Sat, Feb 20, 2021 at 10:25:59AM +0800, Andy Fan wrote:
>
> The attached is a UnqiueKey with EquivalenceClass patch, I just complete the
> single relation part and may have bugs. I just attached it here for design
> review only. and the not-null-attrs is just v1 which we can continue
> discussing on the original thread[2].

Thanks for the patch. After a short look through it I'm a bit confused
and wanted to clarify, now uniquekeys list could contain both Expr and
EquivalenceClass?


Reply | Threaded
Open this post in threaded view
|

Re: UniqueKey on Partitioned table.

Andy Fan


On Sat, Mar 27, 2021 at 3:07 AM Dmitry Dolgov <[hidden email]> wrote:
> On Sat, Feb 20, 2021 at 10:25:59AM +0800, Andy Fan wrote:
>
> The attached is a UnqiueKey with EquivalenceClass patch, I just complete the
> single relation part and may have bugs. I just attached it here for design
> review only. and the not-null-attrs is just v1 which we can continue
> discussing on the original thread[2].

Thanks for the patch. After a short look through it I'm a bit confused
and wanted to clarify, now uniquekeys list could contain both Expr and
EquivalenceClass?

Yes,  That's because I don't want to create a new EquivalenceClass (which
would make the PlannerInfo->eq_classes longer) if we don't have
one , then I just used one Expr instead for this case.  However during the
test, I found some EquivalenceClass with only 1 EquivalenceMember
unexpectedly. 

--
Best Regards
Reply | Threaded
Open this post in threaded view
|

Re: UniqueKey on Partitioned table.

Ashutosh Bapat-2
On Sat, Mar 27, 2021 at 11:44 AM Andy Fan <[hidden email]> wrote:

>
>
>
> On Sat, Mar 27, 2021 at 3:07 AM Dmitry Dolgov <[hidden email]> wrote:
>>
>> > On Sat, Feb 20, 2021 at 10:25:59AM +0800, Andy Fan wrote:
>> >
>> > The attached is a UnqiueKey with EquivalenceClass patch, I just complete the
>> > single relation part and may have bugs. I just attached it here for design
>> > review only. and the not-null-attrs is just v1 which we can continue
>> > discussing on the original thread[2].
>>
>> Thanks for the patch. After a short look through it I'm a bit confused
>> and wanted to clarify, now uniquekeys list could contain both Expr and
>> EquivalenceClass?
>
>
> Yes,  That's because I don't want to create a new EquivalenceClass (which
> would make the PlannerInfo->eq_classes longer) if we don't have
> one , then I just used one Expr instead for this case.
> However during the
> test, I found some EquivalenceClass with only 1 EquivalenceMember
> unexpectedly.
>

Pathkeys may induce single member ECs. Why UniqueKeys are an exception?

--
Best Wishes,
Ashutosh Bapat


Reply | Threaded
Open this post in threaded view
|

Re: UniqueKey on Partitioned table.

David Rowley
On Tue, 30 Mar 2021 at 02:27, Ashutosh Bapat
<[hidden email]> wrote:

>
> On Sat, Mar 27, 2021 at 11:44 AM Andy Fan <[hidden email]> wrote:
> >
> > On Sat, Mar 27, 2021 at 3:07 AM Dmitry Dolgov <[hidden email]> wrote:
> >> Thanks for the patch. After a short look through it I'm a bit confused
> >> and wanted to clarify, now uniquekeys list could contain both Expr and
> >> EquivalenceClass?
> >
> >
> > Yes,  That's because I don't want to create a new EquivalenceClass (which
> > would make the PlannerInfo->eq_classes longer) if we don't have
> > one , then I just used one Expr instead for this case.
> > However during the
> > test, I found some EquivalenceClass with only 1 EquivalenceMember
> > unexpectedly.
> >
>
> Pathkeys may induce single member ECs. Why UniqueKeys are an exception?

I doubt that it should be. get_eclass_for_sort_expr() makes
single-member ECs for sorts.  I imagine the UniqueKey stuff should
copy that... However, get_eclass_for_sort_expr() can often dominate
the planning effort in queries to partitioned tables with a large
number of partitions when the query has an ORDER BY. Perhaps Andy is
trying to sidestep that issue?

I mentioned a few things in [1] on what I think about this.

David

[1] https://www.postgresql.org/message-id/CAApHDvoDMyw=hTuW-258yqNK4bhW6CpguJU_GZBh4x+rnoem3w@...


Reply | Threaded
Open this post in threaded view
|

Re: UniqueKey on Partitioned table.

Andy Fan


On Tue, Mar 30, 2021 at 4:16 AM David Rowley <[hidden email]> wrote:
On Tue, 30 Mar 2021 at 02:27, Ashutosh Bapat
<[hidden email]> wrote:
>
> On Sat, Mar 27, 2021 at 11:44 AM Andy Fan <[hidden email]> wrote:
> >
> > On Sat, Mar 27, 2021 at 3:07 AM Dmitry Dolgov <[hidden email]> wrote:
> >> Thanks for the patch. After a short look through it I'm a bit confused
> >> and wanted to clarify, now uniquekeys list could contain both Expr and
> >> EquivalenceClass?
> >
> >
> > Yes,  That's because I don't want to create a new EquivalenceClass (which
> > would make the PlannerInfo->eq_classes longer) if we don't have
> > one , then I just used one Expr instead for this case.
> > However during the
> > test, I found some EquivalenceClass with only 1 EquivalenceMember
> > unexpectedly.
> >
>
> Pathkeys may induce single member ECs. Why UniqueKeys are an exception?


When working with UniqueKey, I do want to make PlannerInfo.eq_classes short,
so I don't want to create a new EC for UniqueKey only.  After I realized we have
so single-member ECs, I doubt if the "Expr in UniqueKey" will be executed in real.
I still didn't get enough time to do more research about this. 

I doubt that it should be. get_eclass_for_sort_expr() makes
single-member ECs for sorts. 

Thanks for this hint.  I can check more cases like this. 
 
I imagine the UniqueKey stuff should
copy that... However, get_eclass_for_sort_expr() can often dominate
the planning effort in queries to partitioned tables with a large
number of partitions when the query has an ORDER BY. Perhaps Andy is
trying to sidestep that issue?

Yes.  a long PlannerInfo.eq_classes may make some finding slow, and in
my UniqueKey patch,  I am trying to not make it longer. 
 
I mentioned a few things in [1] on what I think about this.

David

[1] https://www.postgresql.org/message-id/CAApHDvoDMyw=hTuW-258yqNK4bhW6CpguJU_GZBh4x+rnoem3w@...

I appreciate all of the people who helped on this patch and others.  I would
like to share more of my planning.  As for the UniqueKey patch,  there are some
design decisions that need to be made.  In my mind, the order would be:  
a). How to present the notnullattrs probably in [1]  b).  How to present the element
in UniqueKey.  Prue EquivalenceClasses or Mix of Expr and EquivalenceClass as
we just talked about.  c). How to maintain the UniqueKey Partitioned table in the
beginning of this thread.  As for a) & c).  I have my current proposal for discussion.
as for b) I think I need more thinking about this.  Based on the idea above, I am 
not willing to move too fast on the following issue unless the previous issue 
can be addressed.  Any feedback/suggestion about my current planning is welcome.


--
Best Regards
Reply | Threaded
Open this post in threaded view
|

Re: UniqueKey on Partitioned table.

Ashutosh Bapat-2
> b).  How to present the element
> in UniqueKey.  Prue EquivalenceClasses or Mix of Expr and EquivalenceClass as
> we just talked about.
I think the reason we add ECs for sort expressions is to use
transitive relationship. The EC may start with a single member but
later in the planning that member might find partners which are all
equivalent. Result ordered by one is also ordered by the other. The
same logic applies to UniqueKey as well, isn't it. In a result if a
set of columns makes a row unique, the set of columns represented by
the other EC member should be unique. Though a key will start as a
singleton it might EC partners later and thus thus unique key will
transition to all the members. With that logic UniqueKey should use
just ECs instead of bare expressions.

--
Best Wishes,
Ashutosh Bapat


Reply | Threaded
Open this post in threaded view
|

Re: UniqueKey on Partitioned table.

Andy Fan


On Wed, Mar 31, 2021 at 9:12 PM Ashutosh Bapat <[hidden email]> wrote:
> b).  How to present the element
> in UniqueKey.  Prue EquivalenceClasses or Mix of Expr and EquivalenceClass as
> we just talked about.
I think the reason we add ECs for sort expressions is to use
transitive relationship. The EC may start with a single member but
later in the planning that member might find partners which are all
equivalent. Result ordered by one is also ordered by the other. The
same logic applies to UniqueKey as well, isn't it. In a result if a
set of columns makes a row unique, the set of columns represented by
the other EC member should be unique. Though a key will start as a
singleton it might EC partners later and thus thus unique key will
transition to all the members. With that logic UniqueKey should use
just ECs instead of bare expressions.

TBH, I haven't thought about this too hard, but I think when we build the
UniqueKey, all the ECs have been built already.  So can you think out an
case we start with an EC with a single member at the beginning and
have more members later for UniqueKey cases? 

--
Best Regards
Reply | Threaded
Open this post in threaded view
|

Re: UniqueKey on Partitioned table.

David Rowley
On Tue, 6 Apr 2021 at 22:31, Andy Fan <[hidden email]> wrote:

> On Wed, Mar 31, 2021 at 9:12 PM Ashutosh Bapat <[hidden email]> wrote:
>> I think the reason we add ECs for sort expressions is to use
>> transitive relationship. The EC may start with a single member but
>> later in the planning that member might find partners which are all
>> equivalent. Result ordered by one is also ordered by the other. The
>> same logic applies to UniqueKey as well, isn't it. In a result if a
>> set of columns makes a row unique, the set of columns represented by
>> the other EC member should be unique. Though a key will start as a
>> singleton it might EC partners later and thus thus unique key will
>> transition to all the members. With that logic UniqueKey should use
>> just ECs instead of bare expressions.
>
>
> TBH, I haven't thought about this too hard, but I think when we build the
> UniqueKey, all the ECs have been built already.  So can you think out an
> case we start with an EC with a single member at the beginning and
> have more members later for UniqueKey cases?

I don't really know if it matters which order things happen. We still
end up with a single EC containing {a,b} whether we process ORDER BY b
or WHERE a=b first.

In any case, the reason we want PathKeys to be ECs rather than just
Exprs is to allow cases such as the following to use an index to
perform the sort.

# create table ab (a int, b int);
# create index on ab(a);
# set enable_seqscan=0;
# explain select * from ab where a=b order by b;
                             QUERY PLAN
---------------------------------------------------------------------
 Index Scan using ab_a_idx on ab  (cost=0.15..83.70 rows=11 width=8)
   Filter: (a = b)
(2 rows)

Of course, we couldn't use this index to provide pre-sorted results if
"where a=b" hadn't been there.

David