A problem about partitionwise join

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

A problem about partitionwise join

Richard Guo-2
Hi All,

To generate partitionwise join, we need to make sure there exists an
equi-join condition for each pair of partition keys, which is performed
by have_partkey_equi_join(). This makes sense and works well.

But if, let's say, one certain pair of partition keys (foo.k = bar.k)
has formed an equivalence class containing consts, no join clause would
be generated for it, since we have already generated 'foo.k = const' and
'bar.k = const' and pushed them into the proper restrictions earlier.

This will make partitionwise join fail to be planned if there are
multiple partition keys and the pushed-down restrictions 'xxx = const'
fail to prune away any partitions.

Consider the examples below:

create table p (k1 int, k2 int, val int) partition by range(k1,k2);
create table p_1 partition of p for values from (1,1) to (10,100);
create table p_2 partition of p for values from (10,100) to (20,200);


If we are joining on each pair of partition keys, we can generate
partitionwise join:

# explain (costs off)
select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 = bar.k2;
                              QUERY PLAN
----------------------------------------------------------------------
 Append
   ->  Hash Join
         Hash Cond: ((foo.k1 = bar.k1) AND (foo.k2 = bar.k2))
         ->  Seq Scan on p_1 foo
         ->  Hash
               ->  Seq Scan on p_1 bar
   ->  Hash Join
         Hash Cond: ((foo_1.k1 = bar_1.k1) AND (foo_1.k2 = bar_1.k2))
         ->  Seq Scan on p_2 foo_1
         ->  Hash
               ->  Seq Scan on p_2 bar_1
(11 rows)


But if we add another qual 'foo.k2 = const', we will be unable to
generate partitionwise join any more, because have_partkey_equi_join()
thinks not every partition key has an equi-join condition.

# explain (costs off)
select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 = bar.k2 and foo.k2 = 16;
               QUERY PLAN
-----------------------------------------
 Hash Join
   Hash Cond: (foo.k1 = bar.k1)
   ->  Append
         ->  Seq Scan on p_1 foo
               Filter: (k2 = 16)
         ->  Seq Scan on p_2 foo_1
               Filter: (k2 = 16)
   ->  Hash
         ->  Append
               ->  Seq Scan on p_1 bar
                     Filter: (k2 = 16)
               ->  Seq Scan on p_2 bar_1
                     Filter: (k2 = 16)
(13 rows)


Is this a problem?

Thanks
Richard
Reply | Threaded
Open this post in threaded view
|

Re: A problem about partitionwise join

Amit Langote
Hi Richard,

On Mon, Aug 26, 2019 at 6:33 PM Richard Guo <[hidden email]> wrote:

>
> Hi All,
>
> To generate partitionwise join, we need to make sure there exists an
> equi-join condition for each pair of partition keys, which is performed
> by have_partkey_equi_join(). This makes sense and works well.
>
> But if, let's say, one certain pair of partition keys (foo.k = bar.k)
> has formed an equivalence class containing consts, no join clause would
> be generated for it, since we have already generated 'foo.k = const' and
> 'bar.k = const' and pushed them into the proper restrictions earlier.
>
> This will make partitionwise join fail to be planned if there are
> multiple partition keys and the pushed-down restrictions 'xxx = const'
> fail to prune away any partitions.
>
> Consider the examples below:
>
> create table p (k1 int, k2 int, val int) partition by range(k1,k2);
> create table p_1 partition of p for values from (1,1) to (10,100);
> create table p_2 partition of p for values from (10,100) to (20,200);
>
> If we are joining on each pair of partition keys, we can generate
> partitionwise join:
>
> # explain (costs off)
> select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 = bar.k2;
>                               QUERY PLAN
> ----------------------------------------------------------------------
>  Append
>    ->  Hash Join
>          Hash Cond: ((foo.k1 = bar.k1) AND (foo.k2 = bar.k2))
>          ->  Seq Scan on p_1 foo
>          ->  Hash
>                ->  Seq Scan on p_1 bar
>    ->  Hash Join
>          Hash Cond: ((foo_1.k1 = bar_1.k1) AND (foo_1.k2 = bar_1.k2))
>          ->  Seq Scan on p_2 foo_1
>          ->  Hash
>                ->  Seq Scan on p_2 bar_1
> (11 rows)
>
> But if we add another qual 'foo.k2 = const', we will be unable to
> generate partitionwise join any more, because have_partkey_equi_join()
> thinks not every partition key has an equi-join condition.
>
> # explain (costs off)
> select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 = bar.k2 and foo.k2 = 16;
>                QUERY PLAN
> -----------------------------------------
>  Hash Join
>    Hash Cond: (foo.k1 = bar.k1)
>    ->  Append
>          ->  Seq Scan on p_1 foo
>                Filter: (k2 = 16)
>          ->  Seq Scan on p_2 foo_1
>                Filter: (k2 = 16)
>    ->  Hash
>          ->  Append
>                ->  Seq Scan on p_1 bar
>                      Filter: (k2 = 16)
>                ->  Seq Scan on p_2 bar_1
>                      Filter: (k2 = 16)
> (13 rows)
>
> Is this a problem?

Perhaps.  Maybe it has to do with the way have_partkey_equi_join() has
been coded.  If it was coded such that it figured out on its own that
the equivalence (foo.k2, bar.k2, ...) does exist, then that would
allow partitionwise join to occur, which I think would be OK to do.
But maybe I'm missing something.

Thanks,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: A problem about partitionwise join

Richard Guo-2

On Tue, Aug 27, 2019 at 8:51 AM Amit Langote <[hidden email]> wrote:
Hi Richard,

On Mon, Aug 26, 2019 at 6:33 PM Richard Guo <[hidden email]> wrote:
>
> Hi All,
>
> To generate partitionwise join, we need to make sure there exists an
> equi-join condition for each pair of partition keys, which is performed
> by have_partkey_equi_join(). This makes sense and works well.
>
> But if, let's say, one certain pair of partition keys (foo.k = bar.k)
> has formed an equivalence class containing consts, no join clause would
> be generated for it, since we have already generated 'foo.k = const' and
> 'bar.k = const' and pushed them into the proper restrictions earlier.
>
> This will make partitionwise join fail to be planned if there are
> multiple partition keys and the pushed-down restrictions 'xxx = const'
> fail to prune away any partitions.
>
> Consider the examples below:
>
> create table p (k1 int, k2 int, val int) partition by range(k1,k2);
> create table p_1 partition of p for values from (1,1) to (10,100);
> create table p_2 partition of p for values from (10,100) to (20,200);
>
> If we are joining on each pair of partition keys, we can generate
> partitionwise join:
>
> # explain (costs off)
> select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 = bar.k2;
>                               QUERY PLAN
> ----------------------------------------------------------------------
>  Append
>    ->  Hash Join
>          Hash Cond: ((foo.k1 = bar.k1) AND (foo.k2 = bar.k2))
>          ->  Seq Scan on p_1 foo
>          ->  Hash
>                ->  Seq Scan on p_1 bar
>    ->  Hash Join
>          Hash Cond: ((foo_1.k1 = bar_1.k1) AND (foo_1.k2 = bar_1.k2))
>          ->  Seq Scan on p_2 foo_1
>          ->  Hash
>                ->  Seq Scan on p_2 bar_1
> (11 rows)
>
> But if we add another qual 'foo.k2 = const', we will be unable to
> generate partitionwise join any more, because have_partkey_equi_join()
> thinks not every partition key has an equi-join condition.
>
> # explain (costs off)
> select * from p as foo join p as bar on foo.k1 = bar.k1 and foo.k2 = bar.k2 and foo.k2 = 16;
>                QUERY PLAN
> -----------------------------------------
>  Hash Join
>    Hash Cond: (foo.k1 = bar.k1)
>    ->  Append
>          ->  Seq Scan on p_1 foo
>                Filter: (k2 = 16)
>          ->  Seq Scan on p_2 foo_1
>                Filter: (k2 = 16)
>    ->  Hash
>          ->  Append
>                ->  Seq Scan on p_1 bar
>                      Filter: (k2 = 16)
>                ->  Seq Scan on p_2 bar_1
>                      Filter: (k2 = 16)
> (13 rows)
>
> Is this a problem?

Perhaps.  Maybe it has to do with the way have_partkey_equi_join() has
been coded.  If it was coded such that it figured out on its own that
the equivalence (foo.k2, bar.k2, ...) does exist, then that would
allow partitionwise join to occur, which I think would be OK to do.
But maybe I'm missing something.


This should be caused by how we deduce join clauses from equivalence
classes. ECs containing consts will not be considered so we cannot
generate (foo.k2 = bar.k2) for the query above.

In addition, when generating join clauses from equivalence classes, we
only select the joinclause with the 'best score', or the first
joinclause with a score of 3. This may make us miss some joinclause on
partition keys.

Check the query below as a more illustrative example:

create table p (k int, val int) partition by range(k);
create table p_1 partition of p for values from (1) to (10);
create table p_2 partition of p for values from (10) to (100);


If we use quals 'foo.k = bar.k and foo.k = bar.val', we can generate
partitionwise join:

# explain (costs off)
select * from p as foo join p as bar on foo.k = bar.k and foo.k = bar.val;
               QUERY PLAN
-----------------------------------------
 Append
   ->  Hash Join
         Hash Cond: (foo.k = bar.k)
         ->  Seq Scan on p_1 foo
         ->  Hash
               ->  Seq Scan on p_1 bar
                     Filter: (k = val)
   ->  Hash Join
         Hash Cond: (foo_1.k = bar_1.k)
         ->  Seq Scan on p_2 foo_1
         ->  Hash
               ->  Seq Scan on p_2 bar_1
                     Filter: (k = val)
(13 rows)


But if we exchange the order of the two quals to 'foo.k = bar.val and
foo.k = bar.k', then partitionwise join cannot be generated any more,
because we only have joinclause 'foo.k = bar.val' as it first reached
score of 3. We have missed the joinclause on the partition key although
it does exist.

# explain (costs off)
select * from p as foo join p as bar on foo.k = bar.val and foo.k = bar.k;
               QUERY PLAN
-----------------------------------------
 Hash Join
   Hash Cond: (foo.k = bar.val)
   ->  Append
         ->  Seq Scan on p_1 foo
         ->  Seq Scan on p_2 foo_1
   ->  Hash
         ->  Append
               ->  Seq Scan on p_1 bar
                     Filter: (val = k)
               ->  Seq Scan on p_2 bar_1
                     Filter: (val = k)
(11 rows) 

Thanks
Richard
Reply | Threaded
Open this post in threaded view
|

Re: A problem about partitionwise join

Etsuro Fujita-2
Hi,

On Tue, Aug 27, 2019 at 4:57 PM Richard Guo <[hidden email]> wrote:

> Check the query below as a more illustrative example:
>
> create table p (k int, val int) partition by range(k);
> create table p_1 partition of p for values from (1) to (10);
> create table p_2 partition of p for values from (10) to (100);
>
> If we use quals 'foo.k = bar.k and foo.k = bar.val', we can generate
> partitionwise join:
>
> # explain (costs off)
> select * from p as foo join p as bar on foo.k = bar.k and foo.k = bar.val;
>                QUERY PLAN
> -----------------------------------------
>  Append
>    ->  Hash Join
>          Hash Cond: (foo.k = bar.k)
>          ->  Seq Scan on p_1 foo
>          ->  Hash
>                ->  Seq Scan on p_1 bar
>                      Filter: (k = val)
>    ->  Hash Join
>          Hash Cond: (foo_1.k = bar_1.k)
>          ->  Seq Scan on p_2 foo_1
>          ->  Hash
>                ->  Seq Scan on p_2 bar_1
>                      Filter: (k = val)
> (13 rows)
>
> But if we exchange the order of the two quals to 'foo.k = bar.val and
> foo.k = bar.k', then partitionwise join cannot be generated any more,
> because we only have joinclause 'foo.k = bar.val' as it first reached
> score of 3. We have missed the joinclause on the partition key although
> it does exist.
>
> # explain (costs off)
> select * from p as foo join p as bar on foo.k = bar.val and foo.k = bar.k;
>                QUERY PLAN
> -----------------------------------------
>  Hash Join
>    Hash Cond: (foo.k = bar.val)
>    ->  Append
>          ->  Seq Scan on p_1 foo
>          ->  Seq Scan on p_2 foo_1
>    ->  Hash
>          ->  Append
>                ->  Seq Scan on p_1 bar
>                      Filter: (val = k)
>                ->  Seq Scan on p_2 bar_1
>                      Filter: (val = k)
> (11 rows)

I think it would be nice if we can address this issue.

Best regards,
Etsuro Fujita


Reply | Threaded
Open this post in threaded view
|

Re: A problem about partitionwise join

Richard Guo-2

On Wed, Aug 28, 2019 at 6:49 PM Etsuro Fujita <[hidden email]> wrote:
Hi,

On Tue, Aug 27, 2019 at 4:57 PM Richard Guo <[hidden email]> wrote:
> Check the query below as a more illustrative example:
>
> create table p (k int, val int) partition by range(k);
> create table p_1 partition of p for values from (1) to (10);
> create table p_2 partition of p for values from (10) to (100);
>
> If we use quals 'foo.k = bar.k and foo.k = bar.val', we can generate
> partitionwise join:
>
> # explain (costs off)
> select * from p as foo join p as bar on foo.k = bar.k and foo.k = bar.val;
>                QUERY PLAN
> -----------------------------------------
>  Append
>    ->  Hash Join
>          Hash Cond: (foo.k = bar.k)
>          ->  Seq Scan on p_1 foo
>          ->  Hash
>                ->  Seq Scan on p_1 bar
>                      Filter: (k = val)
>    ->  Hash Join
>          Hash Cond: (foo_1.k = bar_1.k)
>          ->  Seq Scan on p_2 foo_1
>          ->  Hash
>                ->  Seq Scan on p_2 bar_1
>                      Filter: (k = val)
> (13 rows)
>
> But if we exchange the order of the two quals to 'foo.k = bar.val and
> foo.k = bar.k', then partitionwise join cannot be generated any more,
> because we only have joinclause 'foo.k = bar.val' as it first reached
> score of 3. We have missed the joinclause on the partition key although
> it does exist.
>
> # explain (costs off)
> select * from p as foo join p as bar on foo.k = bar.val and foo.k = bar.k;
>                QUERY PLAN
> -----------------------------------------
>  Hash Join
>    Hash Cond: (foo.k = bar.val)
>    ->  Append
>          ->  Seq Scan on p_1 foo
>          ->  Seq Scan on p_2 foo_1
>    ->  Hash
>          ->  Append
>                ->  Seq Scan on p_1 bar
>                      Filter: (val = k)
>                ->  Seq Scan on p_2 bar_1
>                      Filter: (val = k)
> (11 rows)

I think it would be nice if we can address this issue.

Thank you.

Attached is a patch as an attempt to address this issue. The idea is
quite straightforward. When building partition info for joinrel, we
generate any possible EC-derived joinclauses of form 'outer_em =
inner_em', which will be used together with the original restrictlist to
check if there exists an equi-join condition for each pair of partition
keys.

Any comments are welcome!

Thanks
Richard 

v1-0001-Fix-up-partitionwise-join.patch (13K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: A problem about partitionwise join

Etsuro Fujita-2
On Thu, Aug 29, 2019 at 6:45 PM Richard Guo <[hidden email]> wrote:

> On Wed, Aug 28, 2019 at 6:49 PM Etsuro Fujita <[hidden email]> wrote:
>> On Tue, Aug 27, 2019 at 4:57 PM Richard Guo <[hidden email]> wrote:
>> > Check the query below as a more illustrative example:
>> >
>> > create table p (k int, val int) partition by range(k);
>> > create table p_1 partition of p for values from (1) to (10);
>> > create table p_2 partition of p for values from (10) to (100);
>> >
>> > If we use quals 'foo.k = bar.k and foo.k = bar.val', we can generate
>> > partitionwise join:
>> >
>> > # explain (costs off)
>> > select * from p as foo join p as bar on foo.k = bar.k and foo.k = bar.val;
>> >                QUERY PLAN
>> > -----------------------------------------
>> >  Append
>> >    ->  Hash Join
>> >          Hash Cond: (foo.k = bar.k)
>> >          ->  Seq Scan on p_1 foo
>> >          ->  Hash
>> >                ->  Seq Scan on p_1 bar
>> >                      Filter: (k = val)
>> >    ->  Hash Join
>> >          Hash Cond: (foo_1.k = bar_1.k)
>> >          ->  Seq Scan on p_2 foo_1
>> >          ->  Hash
>> >                ->  Seq Scan on p_2 bar_1
>> >                      Filter: (k = val)
>> > (13 rows)
>> >
>> > But if we exchange the order of the two quals to 'foo.k = bar.val and
>> > foo.k = bar.k', then partitionwise join cannot be generated any more,
>> > because we only have joinclause 'foo.k = bar.val' as it first reached
>> > score of 3. We have missed the joinclause on the partition key although
>> > it does exist.
>> >
>> > # explain (costs off)
>> > select * from p as foo join p as bar on foo.k = bar.val and foo.k = bar.k;
>> >                QUERY PLAN
>> > -----------------------------------------
>> >  Hash Join
>> >    Hash Cond: (foo.k = bar.val)
>> >    ->  Append
>> >          ->  Seq Scan on p_1 foo
>> >          ->  Seq Scan on p_2 foo_1
>> >    ->  Hash
>> >          ->  Append
>> >                ->  Seq Scan on p_1 bar
>> >                      Filter: (val = k)
>> >                ->  Seq Scan on p_2 bar_1
>> >                      Filter: (val = k)
>> > (11 rows)
>>
>> I think it would be nice if we can address this issue.

> Attached is a patch as an attempt to address this issue. The idea is
> quite straightforward. When building partition info for joinrel, we
> generate any possible EC-derived joinclauses of form 'outer_em =
> inner_em', which will be used together with the original restrictlist to
> check if there exists an equi-join condition for each pair of partition
> keys.

Thank you for the patch!  Will review.  Could you add the patch to the
upcoming CF so that it doesn’t get lost?

Best regards,
Etsuro Fujita


Reply | Threaded
Open this post in threaded view
|

Re: A problem about partitionwise join

Richard Guo-2

On Fri, Aug 30, 2019 at 2:08 AM Etsuro Fujita <[hidden email]> wrote:
On Thu, Aug 29, 2019 at 6:45 PM Richard Guo <[hidden email]> wrote:
> On Wed, Aug 28, 2019 at 6:49 PM Etsuro Fujita <[hidden email]> wrote:
>> On Tue, Aug 27, 2019 at 4:57 PM Richard Guo <[hidden email]> wrote:
>> > Check the query below as a more illustrative example:
>> >
>> > create table p (k int, val int) partition by range(k);
>> > create table p_1 partition of p for values from (1) to (10);
>> > create table p_2 partition of p for values from (10) to (100);
>> >
>> > If we use quals 'foo.k = bar.k and foo.k = bar.val', we can generate
>> > partitionwise join:
>> >
>> > # explain (costs off)
>> > select * from p as foo join p as bar on foo.k = bar.k and foo.k = bar.val;
>> >                QUERY PLAN
>> > -----------------------------------------
>> >  Append
>> >    ->  Hash Join
>> >          Hash Cond: (foo.k = bar.k)
>> >          ->  Seq Scan on p_1 foo
>> >          ->  Hash
>> >                ->  Seq Scan on p_1 bar
>> >                      Filter: (k = val)
>> >    ->  Hash Join
>> >          Hash Cond: (foo_1.k = bar_1.k)
>> >          ->  Seq Scan on p_2 foo_1
>> >          ->  Hash
>> >                ->  Seq Scan on p_2 bar_1
>> >                      Filter: (k = val)
>> > (13 rows)
>> >
>> > But if we exchange the order of the two quals to 'foo.k = bar.val and
>> > foo.k = bar.k', then partitionwise join cannot be generated any more,
>> > because we only have joinclause 'foo.k = bar.val' as it first reached
>> > score of 3. We have missed the joinclause on the partition key although
>> > it does exist.
>> >
>> > # explain (costs off)
>> > select * from p as foo join p as bar on foo.k = bar.val and foo.k = bar.k;
>> >                QUERY PLAN
>> > -----------------------------------------
>> >  Hash Join
>> >    Hash Cond: (foo.k = bar.val)
>> >    ->  Append
>> >          ->  Seq Scan on p_1 foo
>> >          ->  Seq Scan on p_2 foo_1
>> >    ->  Hash
>> >          ->  Append
>> >                ->  Seq Scan on p_1 bar
>> >                      Filter: (val = k)
>> >                ->  Seq Scan on p_2 bar_1
>> >                      Filter: (val = k)
>> > (11 rows)
>>
>> I think it would be nice if we can address this issue.

> Attached is a patch as an attempt to address this issue. The idea is
> quite straightforward. When building partition info for joinrel, we
> generate any possible EC-derived joinclauses of form 'outer_em =
> inner_em', which will be used together with the original restrictlist to
> check if there exists an equi-join condition for each pair of partition
> keys.

Thank you for the patch!  Will review.  Could you add the patch to the
upcoming CF so that it doesn’t get lost?


Thanks
Richard 
Reply | Threaded
Open this post in threaded view
|

Re: A problem about partitionwise join

Etsuro Fujita-2
On Fri, Aug 30, 2019 at 12:15 PM Richard Guo <[hidden email]> wrote:
> On Fri, Aug 30, 2019 at 2:08 AM Etsuro Fujita <[hidden email]> wrote:
>> On Thu, Aug 29, 2019 at 6:45 PM Richard Guo <[hidden email]> wrote:
>> > Attached is a patch as an attempt to address this issue. The idea is
>> > quite straightforward. When building partition info for joinrel, we
>> > generate any possible EC-derived joinclauses of form 'outer_em =
>> > inner_em', which will be used together with the original restrictlist to
>> > check if there exists an equi-join condition for each pair of partition
>> > keys.

>> Could you add the patch to the
>> upcoming CF so that it doesn’t get lost?

> Added this patch: https://commitfest.postgresql.org/24/2266/

Thanks!

Best regards,
Etsuro Fujita


Reply | Threaded
Open this post in threaded view
|

Re: A problem about partitionwise join

Álvaro Herrera
In reply to this post by Richard Guo-2
So in this patch, the input restrictlist is modified to include the
clauses generated by generate_join_implied_equalities_for_all.  That
doesn't seem okay -- doesn't it affect downstream usage of the
restrictlist in the caller of set_joinrel_size_estimates?

I wonder if it's possible to do this by using the ECs directly in
have_partkey_equi_join instead of using them to create fake join
clauses.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: A problem about partitionwise join

Richard Guo-2
Hi Alvaro,

Thank you for reviewing this patch.

On Wed, Sep 11, 2019 at 4:48 AM Alvaro Herrera from 2ndQuadrant <[hidden email]> wrote:
So in this patch, the input restrictlist is modified to include the
clauses generated by generate_join_implied_equalities_for_all.  That
doesn't seem okay -- doesn't it affect downstream usage of the
restrictlist in the caller of set_joinrel_size_estimates?

Actually the joinclauses generated by
generate_join_implied_equalities_for_all only affects the restrictlist
used in have_partkey_equi_join to check equi-join conditions for
partition keys.  The input restrictlist would not be altered.
 

I wonder if it's possible to do this by using the ECs directly in
have_partkey_equi_join instead of using them to create fake join
clauses.

Hmm.. I thought about this option and at last figured that what we need
to do in have_partkey_equi_join with the ECs is actually the same as in
generate_join_implied_equalities_for_all. Maybe I didn't think it
correctly.

Thanks
Richard