Planner performance in partitions

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Planner performance in partitions

Piotr Włodarczyk
Hello,

We have partitioned tables in two levels. Both stages are partitioned in
ranges method. We see that planner and executor time was 10 time slower when
we asked main table rather than partitioned. My question is did planner and
executor are working optimal? I have doubts about it. Let's consider that
situation, because I think that in some ways (I think in more cases) planner
shoudl be more optimal. When we have table partitioned by key which is in
"where" clause we have guarantee that all rows we can find just in ONE
partition: contained in range or in default if exists. Planner show that
query should be executed just in one partition, but it takes a lot of time.
So my direct question is if planner is stopping searching for another
partition when he found one correct? Are partition ranges stored sorted and
searching method is optimal and is stopped after first (and only) hit?

I've noticed that increasing numbers of partition proportionally increase
planner time. Additionally having index on column that you are searching for
is adding extra time (one index add +/- 100% time for table). It's
understandable but optimizing planner and executor by ideas I wrote on first
paragraph automatically decrease time for searching indexes.

Reproduction:
1. ADD MAIN TABLE

-- Table: public.book

--DROP TABLE public.book;

CREATE TABLE public.book
(
    id bigserial,
    id_owner bigint NOT NULL,
    added date NOT NULL
) PARTITION BY RANGE (id_owner)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;

ALTER TABLE public.book
    OWNER to postgres;

2. ADD PARTITIONS (run first "a" variant, then drop table book, reconstruct
and run "variant"):
    a. 1200 partitions:
https://gist.github.com/piotrwlodarczyk/4faa05729d1bdd3b5f5738a2a3faabc0 
    b. 6000 partitions:
https://gist.github.com/piotrwlodarczyk/2747e0984f521768f5d36ab2b382ea36 
   
3. ANALYZE ON MAIN TABLE:
    EXPLAIN ANALYZE SELECT * FROM public.book WHERE id_owner = 4;
    a. My result for 1200 partitions:
https://gist.github.com/piotrwlodarczyk/500f20a0b6e2cac6d36ab88d4fea2c00 
    b. My result for 6000 partitions:
https://gist.github.com/piotrwlodarczyk/277687b21201340377116a18a3dd8be8

4. ANALYZE ON PARTITIONED TABLE (only on first level):
  EXPLAIN ANALYZE SELECT * FROM public.book WHERE id_owner = 4;
  a. My result for 1200:
https://gist.github.com/piotrwlodarczyk/4285907c68b34b486cbf39eb8ae5cf92
  b. My result for 6000:
https://gist.github.com/piotrwlodarczyk/c157cc9321b6e1a1d0f900310f14f1cc

4. CONCLUSIONS
    Planner time for select on public.book (main table) 1200 was 469.416 ms,
for 6000 was 2530.179 ms. It looks like time is linear to partition count.
That makes me sure that all partitions are checked instead of searching for
first that equals. Intentionally I've searching id_owner = 4 to make sure
that in both cases first partition should by marked as correct and planer
time should be constant. What is intereting too that real execution time was
+/- equal in both cases. Is executor working better than planner?
  When we're asking on first level partition directly - time for planner
1200 is 58.736 ms, for 6000: 60.555 ms. We can say it's equal. Why? Because
planner don't have to search for another matching partitions because first
found can match. It's guaranteed by rule that say ranges in partitions
cannot override. Execution time in this case is 50 times faster!




smime.p7s (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Planner performance in partitions

Michael Lewis
"It is also important to consider the overhead of partitioning during query planning and execution. The query planner is generally able to handle partition hierarchies with up to a few hundred partitions fairly well, provided that typical queries allow the query planner to prune all but a small number of partitions. Planning times become longer and memory consumption becomes higher as more partitions are added." (emphasis added)
Reply | Threaded
Open this post in threaded view
|

Re: Planner performance in partitions

Michaeldba@sqlexec.com
Queries against tables with a lot of partitions (> 1000) start to incur an increasing planning time duration even with the current version, V11.  V12 purportedly has fixed this problem, allowing thousands of partitioned tables without a heavy planning cost.  Can't seem to find the threads on this topic, but there are out there.  I personally noted a gigantic increase in planning time once I got past 1500 partitioned tables in V11.

On another note, hopefully they have fixed runtime partition pruning in V12 since V11 introduced it but some query plans don't use it, so you have to reconstruct some queries to sub queries to make it work correctly.

Regards,
Michael Vitale


Michael Lewis wrote on 8/12/2019 3:05 PM:
"It is also important to consider the overhead of partitioning during query planning and execution. The query planner is generally able to handle partition hierarchies with up to a few hundred partitions fairly well, provided that typical queries allow the query planner to prune all but a small number of partitions. Planning times become longer and memory consumption becomes higher as more partitions are added." (emphasis added)

Reply | Threaded
Open this post in threaded view
|

ODP: Planner performance in partitions

piotrwlodarczyk89

@Michael Lewis: I know documentation. I'm just considerations about possible performance tricks in current production version. I've tested this on V12 on another computer and I can say that I'm impressed. I've checked on 1200 partitions and times are:

 

PostgreSQL11.5:

  • select on main partition (public.book): planner: 60ms, execution: 5ms
  • select on partitioned table (public.book_1-1000): planner: 2.7 ms, execution: 2,4 ms

PostgreSQL 12B3:

  • select on main partition (public.book): planner: 2,5ms , execution: 1,2ms
  • select on partitioned table (public.book_1-1000): planner: 2.5 ms, execution: 1,2 ms

 

So looking at above results we have two options:

  • Wait for 12.0 stable version 😉
  • Wait for patches to 11 – PostgreSQL Team: can You do this? 😊

 

Pozdrawiam,

Piotr Włodarczyk

 

Od: [hidden email]
Wysłano: poniedziałek, 12 sierpnia 2019 21:25
Do: [hidden email]
DW: [hidden email]; [hidden email]
Temat: Re: Planner performance in partitions

 

Queries against tables with a lot of partitions (> 1000) start to incur an increasing planning time duration even with the current version, V11.  V12 purportedly has fixed this problem, allowing thousands of partitioned tables without a heavy planning cost.  Can't seem to find the threads on this topic, but there are out there.  I personally noted a gigantic increase in planning time once I got past 1500 partitioned tables in V11.

On another note, hopefully they have fixed runtime partition pruning in V12 since V11 introduced it but some query plans don't use it, so you have to reconstruct some queries to sub queries to make it work correctly.

Regards,
Michael Vitale


Michael Lewis wrote on 8/12/2019 3:05 PM:

"It is also important to consider the overhead of partitioning during query planning and execution. The query planner is generally able to handle partition hierarchies with up to a few hundred partitions fairly well, provided that typical queries allow the query planner to prune all but a small number of partitions. Planning times become longer and memory consumption becomes higher as more partitions are added." (emphasis added)

 

 

Reply | Threaded
Open this post in threaded view
|

Re: Planner performance in partitions

Michael Lewis
Thanks for clarifying your position and sharing the results you have seen. That is impressive indeed.

It seems likely that waiting for v12 is needed since feature are not back patched. Perhaps one of the contributors will confirm, but that is my expectation.
Reply | Threaded
Open this post in threaded view
|

Re: Planner performance in partitions

David Rowley-3
In reply to this post by piotrwlodarczyk89
On Tue, 13 Aug 2019 at 08:03, Piotr Włodarczyk
<[hidden email]> wrote:

> PostgreSQL11.5:
>
> select on main partition (public.book): planner: 60ms, execution: 5ms
> select on partitioned table (public.book_1-1000): planner: 2.7 ms, execution: 2,4 ms
>
> PostgreSQL 12B3:
>
> select on main partition (public.book): planner: 2,5ms , execution: 1,2ms
> select on partitioned table (public.book_1-1000): planner: 2.5 ms, execution: 1,2 ms
>
> So looking at above results we have two options:
>
> Wait for 12.0 stable version
> Wait for patches to 11 – PostgreSQL Team: can You do this?

You'll need to either reduce the number of partitions down to
something realistic or wait for 12.0.

The work done to speed up the planner with partitioned tables for v12
won't be going into v11.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Planner performance in partitions

Michael Lewis
Was there a reason to exceed 100-500 partitions in real life that pushed you to do this test? Is there some issue you see when using 100 partitions that is solved or reduced in severity by increasing to 1200 or 6000 partitions?
Reply | Threaded
Open this post in threaded view
|

ODP: Planner performance in partitions

piotrwlodarczyk89

As you wrote we have about 400/500 partitions in real life. So time problem is much smaller, but still it is and in one place of aur application we have decided to help DB and we're indicating in query exact partition we need. What pushed me to do this test? Just curiosity I think. After I saw in pg_locks that all partitions which was selected in uncommitted transaction have ACCESS SHARED i've started thinking about efficiency. And that way here we are. Why we need some hundred partitions? It’s because our main table (public.book) have hundreds of millions records. It’s not maintainable. VACUUM never ends, space on device is huge and we cannot take database down for longer that 2-3 hours, what is too short to maintain them manually. So we've partitioned them on two levels. First on id_owner (which is in every query) and the second level based on date. It’ll help as detach partitions with old data we no longer need.

 

Pozdrawiam,

Piotr Włodarczyk

 

Od: [hidden email]
Wysłano: wtorek, 13 sierpnia 2019 00:37
Do: [hidden email]
DW: [hidden email]; [hidden email]; [hidden email]; [hidden email]
Temat: Re: Planner performance in partitions

 

Was there a reason to exceed 100-500 partitions in real life that pushed you to do this test? Is there some issue you see when using 100 partitions that is solved or reduced in severity by increasing to 1200 or 6000 partitions?