Strange runtime partition pruning behaviour with 11.4

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Strange runtime partition pruning behaviour with 11.4

Thomas Kellerer
I stumbled across this question on SO: https://stackoverflow.com/questions/56517852

Disregarding the part about Postgres 9.3, the example for Postgres 11 looks a bit confusing.

There is a script to setup test data in that question:

==== start of script ====

    create table foo (
        foo_id integer not null,
        foo_name varchar(10),
        constraint foo_pkey primary key (foo_id)
    );      

    insert into foo
      (foo_id, foo_name)
    values
      (1, 'eeny'),
      (2, 'meeny'),
      (3, 'miny'),
      (4, 'moe'),
      (5, 'tiger'),
      (6, 'toe');

    create table foo_bar_baz (
        foo_id integer not null,
        bar_id integer not null,
        baz    integer not null,
        constraint foo_bar_baz_pkey primary key (foo_id, bar_id, baz),
        constraint foo_bar_baz_fkey1 foreign key (foo_id)
            references foo (foo_id)
    ) partition by range (foo_id)
    ;

    create table if not exists foo_bar_baz_0 partition of foo_bar_baz for values from (0) to (1);
    create table if not exists foo_bar_baz_1 partition of foo_bar_baz for values from (1) to (2);
    create table if not exists foo_bar_baz_2 partition of foo_bar_baz for values from (2) to (3);
    create table if not exists foo_bar_baz_3 partition of foo_bar_baz for values from (3) to (4);
    create table if not exists foo_bar_baz_4 partition of foo_bar_baz for values from (4) to (5);
    create table if not exists foo_bar_baz_5 partition of foo_bar_baz for values from (5) to (6);

    with foos_and_bars as (
        select ((random() * 4) + 1)::int as foo_id, bar_id::int
        from generate_series(0, 1499) as t(bar_id)
    ), bazzes as (
        select baz::int
        from generate_series(1, 1500) as t(baz)
    )
    insert into foo_bar_baz (foo_id, bar_id, baz)
    select foo_id, bar_id, baz
    from bazzes as bz
      join foos_and_bars as fab on mod(bz.baz, fab.foo_id) = 0;

==== end of script ====

I see the some strange behaviour similar to to what is reported in the comments to that question:

When I run the test query immediately after populating the tables with the sample data:

    explain analyze
    select count(*)
    from foo_bar_baz as fbb
      join foo on fbb.foo_id = foo.foo_id
    where foo.foo_name = 'eeny'

I do see an "Index Only Scan .... (never executed)" in the plan for the irrelevant partitions:

  https://explain.depesz.com/s/AqlE

However once I run "analyze foo_bar_baz" (or "vacuum analyze"), Postgres chooses to do a "Parallel Seq Scan" for each partition:

  https://explain.depesz.com/s/WwxE

Why does updating the statistics mess up (runtime) partition pruning?


I played around with random_page_cost and that didn't change anything.
I tried to create extended statistics on "foo(id, name)" so that the planner would no, that there is only one name per id. No change.

I saw the above behaviour when running this on Windows 10 (my Laptop) or CentOS 7 (a test environment on a VM)

On the CentOS server default_statistics_target is set to 100, on my laptop it is set to 1000

In both cases the Postgres version was 11.4

Any ideas?

Thomas


Reply | Threaded
Open this post in threaded view
|

Re: Strange runtime partition pruning behaviour with 11.4

Michaeldba@sqlexec.com
I too am a bit perplexed by why runtime partition pruning does not seem
to work with this example.  Anybody got any ideas of this?

Regards,
Michael Vitale

Thomas Kellerer wrote on 8/2/2019 9:58 AM:

> I stumbled across this question on SO: https://stackoverflow.com/questions/56517852
>
> Disregarding the part about Postgres 9.3, the example for Postgres 11 looks a bit confusing.
>
> There is a script to setup test data in that question:
>
> ==== start of script ====
>
>      create table foo (
>          foo_id integer not null,
>          foo_name varchar(10),
>          constraint foo_pkey primary key (foo_id)
>      );
>
>      insert into foo
>        (foo_id, foo_name)
>      values
>        (1, 'eeny'),
>        (2, 'meeny'),
>        (3, 'miny'),
>        (4, 'moe'),
>        (5, 'tiger'),
>        (6, 'toe');
>
>      create table foo_bar_baz (
>          foo_id integer not null,
>          bar_id integer not null,
>          baz    integer not null,
>          constraint foo_bar_baz_pkey primary key (foo_id, bar_id, baz),
>          constraint foo_bar_baz_fkey1 foreign key (foo_id)
>              references foo (foo_id)
>      ) partition by range (foo_id)
>      ;
>
>      create table if not exists foo_bar_baz_0 partition of foo_bar_baz for values from (0) to (1);
>      create table if not exists foo_bar_baz_1 partition of foo_bar_baz for values from (1) to (2);
>      create table if not exists foo_bar_baz_2 partition of foo_bar_baz for values from (2) to (3);
>      create table if not exists foo_bar_baz_3 partition of foo_bar_baz for values from (3) to (4);
>      create table if not exists foo_bar_baz_4 partition of foo_bar_baz for values from (4) to (5);
>      create table if not exists foo_bar_baz_5 partition of foo_bar_baz for values from (5) to (6);
>
>      with foos_and_bars as (
>          select ((random() * 4) + 1)::int as foo_id, bar_id::int
>          from generate_series(0, 1499) as t(bar_id)
>      ), bazzes as (
>          select baz::int
>          from generate_series(1, 1500) as t(baz)
>      )
>      insert into foo_bar_baz (foo_id, bar_id, baz)
>      select foo_id, bar_id, baz
>      from bazzes as bz
>        join foos_and_bars as fab on mod(bz.baz, fab.foo_id) = 0;
>
> ==== end of script ====
>
> I see the some strange behaviour similar to to what is reported in the comments to that question:
>
> When I run the test query immediately after populating the tables with the sample data:
>
>      explain analyze
>      select count(*)
>      from foo_bar_baz as fbb
>        join foo on fbb.foo_id = foo.foo_id
>      where foo.foo_name = 'eeny'
>
> I do see an "Index Only Scan .... (never executed)" in the plan for the irrelevant partitions:
>
>    https://explain.depesz.com/s/AqlE
>
> However once I run "analyze foo_bar_baz" (or "vacuum analyze"), Postgres chooses to do a "Parallel Seq Scan" for each partition:
>
>    https://explain.depesz.com/s/WwxE
>
> Why does updating the statistics mess up (runtime) partition pruning?
>
>
> I played around with random_page_cost and that didn't change anything.
> I tried to create extended statistics on "foo(id, name)" so that the planner would no, that there is only one name per id. No change.
>
> I saw the above behaviour when running this on Windows 10 (my Laptop) or CentOS 7 (a test environment on a VM)
>
> On the CentOS server default_statistics_target is set to 100, on my laptop it is set to 1000
>
> In both cases the Postgres version was 11.4
>
> Any ideas?
>
> Thomas
>
>



Reply | Threaded
Open this post in threaded view
|

Re: Strange runtime partition pruning behaviour with 11.4

Andreas Kretschmer-3
Hi,


Am 03.08.19 um 15:16 schrieb MichaelDBA:
> I too am a bit perplexed by why runtime partition pruning does not
> seem to work with this example.  Anybody got any ideas of this?


please don't top-posting.

it's posible to rewrite the query to:


test=# explain analyse select count(*) from foo_bar_baz as fbb where
foo_id = (select foo_id from foo where foo_name = 'eeny');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
  Finalize Aggregate  (cost=15880.63..15880.64 rows=1 width=8) (actual
time=48.447..48.448 rows=1 loops=1)
    InitPlan 1 (returns $0)
      ->  Seq Scan on foo  (cost=0.00..24.50 rows=6 width=4) (actual
time=0.243..0.246 rows=1 loops=1)
            Filter: ((foo_name)::text = 'eeny'::text)
            Rows Removed by Filter: 5
    ->  Gather  (cost=15855.92..15856.13 rows=2 width=8) (actual
time=48.376..51.468 rows=3 loops=1)
          Workers Planned: 2
          Params Evaluated: $0
          Workers Launched: 2
          ->  Partial Aggregate  (cost=14855.92..14855.93 rows=1
width=8) (actual time=42.600..42.600 rows=1 loops=3)
                ->  Parallel Append  (cost=0.00..13883.01 rows=389162
width=0) (actual time=0.139..34.914 rows=83500 loops=3)
                      ->  Parallel Bitmap Heap Scan on foo_bar_baz_0
fbb  (cost=4.23..14.73 rows=6 width=0) (never executed)
                            Recheck Cond: (foo_id = $0)
                            ->  Bitmap Index Scan on foo_bar_baz_0_pkey 
(cost=0.00..4.23 rows=10 width=0) (never executed)
                                  Index Cond: (foo_id = $0)
                      ->  Parallel Seq Scan on foo_bar_baz_2 fbb_2 
(cost=0.00..3865.72 rows=178218 width=0) (never executed)
                            Filter: (foo_id = $0)
                      ->  Parallel Seq Scan on foo_bar_baz_1 fbb_1 
(cost=0.00..3195.62 rows=147250 width=0) (actual time=0.129..24.735
rows=83500 loops=3)
                            Filter: (foo_id = $0)
                      ->  Parallel Seq Scan on foo_bar_baz_3 fbb_3 
(cost=0.00..2334.49 rows=107559 width=0) (never executed)
                            Filter: (foo_id = $0)
                      ->  Parallel Seq Scan on foo_bar_baz_4 fbb_4 
(cost=0.00..1860.95 rows=85756 width=0) (never executed)
                            Filter: (foo_id = $0)
                      ->  Parallel Seq Scan on foo_bar_baz_5 fbb_5 
(cost=0.00..665.69 rows=30615 width=0) (never executed)
                            Filter: (foo_id = $0)
  Planning Time: 12.648 ms
  Execution Time: 52.621 ms
(27 rows)

test=*#


I know, that's not a solution, but a workaround. :-(

(pg 12beta2 and also with PostgreSQL 11.4 (2ndQPG 11.4r1.6.7))



Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



Reply | Threaded
Open this post in threaded view
|

Re: Strange runtime partition pruning behaviour with 11.4

Thomas Kellerer
> it's posible to rewrite the query to:
>
>
> test=# explain analyse select count(*) from foo_bar_baz as fbb where foo_id = (select foo_id from foo where foo_name = 'eeny');
>
> I know, that's not a solution, but a workaround. :-(

Yes, I discovered that as well.

But I'm more confused (or concerned) by the fact that the (original) query works correctly *without* statistics.

Thomas





Reply | Threaded
Open this post in threaded view
|

Re: Strange runtime partition pruning behaviour with 11.4

Andreas Kretschmer-3


Am 03.08.19 um 16:06 schrieb Thomas Kellerer:

>> it's posible to rewrite the query to:
>>
>>
>> test=# explain analyse select count(*) from foo_bar_baz as fbb where
>> foo_id = (select foo_id from foo where foo_name = 'eeny');
>>
>> I know, that's not a solution, but a workaround. :-(
>
> Yes, I discovered that as well.
>
> But I'm more confused (or concerned) by the fact that the (original)
> query works correctly *without* statistics.
>
> Thomas
>
>

can't reproduce that :-(  (PG 11.4 Community)

(all in a file and executed the explain immediately)


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



Reply | Threaded
Open this post in threaded view
|

Re: Strange runtime partition pruning behaviour with 11.4

Tom Lane-2
Andreas Kretschmer <[hidden email]> writes:
> Am 03.08.19 um 16:06 schrieb Thomas Kellerer:
>> But I'm more confused (or concerned) by the fact that the (original)
>> query works correctly *without* statistics.

> can't reproduce that :-(  (PG 11.4 Community)

Yeah, I get the same plan with or without ANALYZE, too.  In this example,
having the ANALYZE stats barely moves the rowcount estimates for
foo_bar_baz at all, so it's not surprising that the plan doesn't change.
(I do wonder how Thomas got a different outcome...)

Given the shape of the preferred plan:

 Finalize Aggregate  (cost=15779.59..15779.60 rows=1 width=8) (actual time=160.329..160.330 rows=1 loops=1)
   ->  Gather  (cost=15779.38..15779.59 rows=2 width=8) (actual time=160.011..161.712 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=14779.38..14779.39 rows=1 width=8) (actual time=154.675..154.675 rows=1 loops=3)
               ->  Hash Join  (cost=1.09..14612.90 rows=66590 width=0) (actual time=86.814..144.793 rows=100500 loops=3)
                     Hash Cond: (fbb_1.foo_id = foo.foo_id)
                     ->  Parallel Append  (cost=0.00..12822.21 rows=399537 width=4) (actual time=0.019..95.644 rows=318950 loops=3)
                           ->  Parallel Seq Scan on foo_bar_baz_1 fbb_1  (cost=0.00..3403.53 rows=177353 width=4) (actual time=0.012..18.881 rows=100500 loops=3)
                           ->  Parallel Seq Scan on foo_bar_baz_2 fbb_2  (cost=0.00..3115.53 rows=162353 width=4) (actual time=0.018..51.716 rows=276000 loops=1)
                           ->  Parallel Seq Scan on foo_bar_baz_3 fbb_3  (cost=0.00..2031.82 rows=105882 width=4) (actual time=0.011..16.854 rows=90000 loops=2)
                           ->  Parallel Seq Scan on foo_bar_baz_4 fbb_4  (cost=0.00..1584.00 rows=82500 width=4) (actual time=0.011..26.950 rows=140250 loops=1)
                           ->  Parallel Seq Scan on foo_bar_baz_5 fbb_5  (cost=0.00..667.65 rows=34765 width=4) (actual time=0.014..11.896 rows=59100 loops=1)
                           ->  Parallel Seq Scan on foo_bar_baz_0 fbb  (cost=0.00..22.00 rows=1200 width=4) (actual time=0.001..0.001 rows=0 loops=1)
                     ->  Hash  (cost=1.07..1.07 rows=1 width=4) (actual time=0.038..0.038 rows=1 loops=3)
                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
                           ->  Seq Scan on foo  (cost=0.00..1.07 rows=1 width=4) (actual time=0.021..0.023 rows=1 loops=3)
                                 Filter: ((foo_name)::text = 'eeny'::text)
                                 Rows Removed by Filter: 5

it's obvious that no pruning can happen, run-time or otherwise,
because the partitioned table is being scanned on the outside
of the join --- so the target value of foo_id isn't available.

We can force the planner to its second best choice with
set enable_hashjoin to 0;

and then we get

 Aggregate  (cost=31954.09..31954.10 rows=1 width=8) (actual time=420.158..420.158 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..31554.55 rows=159815 width=0) (actual time=0.058..389.974 rows=301500 loops=1)
         Join Filter: (fbb.foo_id = foo.foo_id)
         Rows Removed by Join Filter: 655350
         ->  Seq Scan on foo  (cost=0.00..1.07 rows=1 width=4) (actual time=0.025..0.028 rows=1 loops=1)
               Filter: ((foo_name)::text = 'eeny'::text)
               Rows Removed by Filter: 5
         ->  Append  (cost=0.00..19567.35 rows=958890 width=4) (actual time=0.026..280.510 rows=956850 loops=1)
               ->  Seq Scan on foo_bar_baz_0 fbb  (cost=0.00..30.40 rows=2040 width=4) (actual time=0.003..0.003 rows=0 loops=1)
               ->  Seq Scan on foo_bar_baz_1 fbb_1  (cost=0.00..4645.00 rows=301500 width=4) (actual time=0.022..57.836 rows=301500 loops=1)
               ->  Seq Scan on foo_bar_baz_2 fbb_2  (cost=0.00..4252.00 rows=276000 width=4) (actual time=0.019..51.834 rows=276000 loops=1)
               ->  Seq Scan on foo_bar_baz_3 fbb_3  (cost=0.00..2773.00 rows=180000 width=4) (actual time=0.016..31.951 rows=180000 loops=1)
               ->  Seq Scan on foo_bar_baz_4 fbb_4  (cost=0.00..2161.50 rows=140250 width=4) (actual time=0.015..24.392 rows=140250 loops=1)
               ->  Seq Scan on foo_bar_baz_5 fbb_5  (cost=0.00..911.00 rows=59100 width=4) (actual time=0.012..10.252 rows=59100 loops=1)

This is a good deal slower, and the planner correctly estimates that it's
a good deal slower, so that's why it didn't get picked.

But ... why didn't any run-time pruning happen?  Because the shape of the
plan is still wrong: the join condition is being applied at the nestloop
node.  If we'd pushed down the foo_id condition to the foo_bar_baz scan
then there'd be hope of pruning.

I think the reason that that isn't happening is that the planner has
not been taught that run-time pruning is a thing, so it's not giving
any cost preference to doing things in a way that would enable that.
It's not entirely clear what the cost estimate adjustments should be,
but obviously somebody had better work on that.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Strange runtime partition pruning behaviour with 11.4

Michaeldba@sqlexec.com
In reply to this post by Thomas Kellerer
I too got the same plan (non runtime partition pruning plan) with or
without the statistics.  So it looks like the workaround until this is
fixed is to re-arrange the query to do a subselect to force the runtime
partition pruning as Andreas suggested, which I tested and indeed does
work for me too!

Regards,
Michael Vitale

Thomas Kellerer wrote on 8/3/2019 10:06 AM:

>> it's posible to rewrite the query to:
>>
>>
>> test=# explain analyse select count(*) from foo_bar_baz as fbb where
>> foo_id = (select foo_id from foo where foo_name = 'eeny');
>>
>> I know, that's not a solution, but a workaround. :-(
>
> Yes, I discovered that as well.
>
> But I'm more confused (or concerned) by the fact that the (original)
> query works correctly *without* statistics.
>
> Thomas
>
>
>
>
>



Reply | Threaded
Open this post in threaded view
|

Re: Strange runtime partition pruning behaviour with 11.4

Thomas Kellerer
In reply to this post by Tom Lane-2
Tom Lane schrieb am 03.08.2019 um 18:05:
> Yeah, I get the same plan with or without ANALYZE, too.  In this example,
> having the ANALYZE stats barely moves the rowcount estimates for
> foo_bar_baz at all, so it's not surprising that the plan doesn't change.
> (I do wonder how Thomas got a different outcome...)

I don't know why either ;)

I am using a JDBC based SQL tool to run that - I don't know if that matters.

I just tried this script with Postgres 12 beta2 and there I do not get
the initial plan with "never executed" (so the same behaviour as everybody
else seems to have).

If the reason why my initial plan is different than the "analyzed" plan
lies in the configuration, I am happy to share my postgresql.conf if
that is of any interest.

Thomas




Reply | Threaded
Open this post in threaded view
|

Re: Strange runtime partition pruning behaviour with 11.4

Sverre Boschman
In reply to this post by Thomas Kellerer
On Tue, Aug 13, 2019 at 8:46 AM Thomas Kellerer <[hidden email]> wrote:
I stumbled across this question on SO: https://stackoverflow.com/questions/56517852

Disregarding the part about Postgres 9.3, the example for Postgres 11 looks a bit confusing.

There is a script to setup test data in that question:

==== start of script ====

    create table foo (
        foo_id integer not null,
        foo_name varchar(10),
        constraint foo_pkey primary key (foo_id)
    );     

    insert into foo
      (foo_id, foo_name)
    values
      (1, 'eeny'),
      (2, 'meeny'),
      (3, 'miny'),
      (4, 'moe'),
      (5, 'tiger'),
      (6, 'toe');

    create table foo_bar_baz (
        foo_id integer not null,
        bar_id integer not null,
        baz    integer not null,
        constraint foo_bar_baz_pkey primary key (foo_id, bar_id, baz),
        constraint foo_bar_baz_fkey1 foreign key (foo_id)
            references foo (foo_id)
    ) partition by range (foo_id)
    ;

    create table if not exists foo_bar_baz_0 partition of foo_bar_baz for values from (0) to (1);
    create table if not exists foo_bar_baz_1 partition of foo_bar_baz for values from (1) to (2);
    create table if not exists foo_bar_baz_2 partition of foo_bar_baz for values from (2) to (3);
    create table if not exists foo_bar_baz_3 partition of foo_bar_baz for values from (3) to (4);
    create table if not exists foo_bar_baz_4 partition of foo_bar_baz for values from (4) to (5);
    create table if not exists foo_bar_baz_5 partition of foo_bar_baz for values from (5) to (6);

    with foos_and_bars as (
        select ((random() * 4) + 1)::int as foo_id, bar_id::int
        from generate_series(0, 1499) as t(bar_id)
    ), bazzes as (
        select baz::int
        from generate_series(1, 1500) as t(baz)
    )
    insert into foo_bar_baz (foo_id, bar_id, baz)
    select foo_id, bar_id, baz
    from bazzes as bz
      join foos_and_bars as fab on mod(bz.baz, fab.foo_id) = 0;

==== end of script ====

I see the some strange behaviour similar to to what is reported in the comments to that question:

When I run the test query immediately after populating the tables with the sample data:

    explain analyze
    select count(*)
    from foo_bar_baz as fbb
      join foo on fbb.foo_id = foo.foo_id
    where foo.foo_name = 'eeny'

I do see an "Index Only Scan .... (never executed)" in the plan for the irrelevant partitions:

  https://explain.depesz.com/s/AqlE

However once I run "analyze foo_bar_baz" (or "vacuum analyze"), Postgres chooses to do a "Parallel Seq Scan" for each partition:

  https://explain.depesz.com/s/WwxE

Why does updating the statistics mess up (runtime) partition pruning?


I played around with random_page_cost and that didn't change anything.
I tried to create extended statistics on "foo(id, name)" so that the planner would no, that there is only one name per id. No change.

I saw the above behaviour when running this on Windows 10 (my Laptop) or CentOS 7 (a test environment on a VM)

On the CentOS server default_statistics_target is set to 100, on my laptop it is set to 1000

In both cases the Postgres version was 11.4

Any ideas?

Thomas


Ran into the same behaviour of the planner. The amount of rows in the partitions influence the statistics being generated and the statistics in turn influence the plan chosen.

I managed to force the "correct" plan by manually setting the n_distinct statistics for the partitioned table.
E.g.: alter table foo_bar_baz alter column foo_id set ( n_distinct=-1, n_distinct_inherited=-1);

With a certain number of rows in the partitions the analyser sets the n_distinct value for the partitioned table to the number of unique partition keys and the n_distinct value
for the individual partitions to number of unique partition keys in that partition. Unfortunately this causes the planner to pick a plan that doesn't allow for execution pruning,
resulting in very slow execution times.

Regards,
Sverre