Very long query planning times for database with lots of partitions


Very long query planning times for database with lots of partitions

Mickael van der Beek
Hey everyone,

I have a PostgreSQL 10 database that contains two tables, both of which have two levels of list partitioning (each level on a single value), meaning that a partitioned table gets partitioned again.

The data in my use case is stored in 5K to 10K partitions (children and grand-children of the two tables mentioned above), depending on usage levels.

Three indexes are defined on each grand-child partition; the partitioning columns are not covered by them.
(I don't believe it is necessary to index partition columns, no?)
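For reference, a setup like the one I describe can be sketched with PostgreSQL 10's declarative partitioning. All table and column names below are illustrative, not my actual schema:

```sql
-- Hypothetical sketch of two-level list partitioning (PostgreSQL 10+ syntax).
CREATE TABLE table_a (
    a bigint,
    b bigint,
    partition_level_1_column text NOT NULL,
    partition_level_2_column text NOT NULL
) PARTITION BY LIST (partition_level_1_column);

-- Level 1: one child per first-level value, itself partitioned again.
CREATE TABLE table_a_foo PARTITION OF table_a
    FOR VALUES IN ('foo')
    PARTITION BY LIST (partition_level_2_column);

-- Level 2: the grand-child leaf that actually stores the rows.
CREATE TABLE table_a_foo_bar PARTITION OF table_a_foo
    FOR VALUES IN ('bar');

-- Indexes live on the leaves; the partition key columns need no index,
-- since each leaf only ever holds a single value of them.
CREATE INDEX ON table_a_foo_bar (b);
```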

With this setup, I experience queries that have very slow planning times but fast execution times, even for simple queries where only a couple of partitions are searched and the partition values are hard-coded.

Researching the issue, I thought that the linear search used by PostgreSQL 10 to find the partition table metadata was the cause.

cf: https://blog.2ndquadrant.com/partition-elimination-postgresql-11/

So I decided to try out PostgreSQL 11, which includes the two aforementioned fixes:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=499be013de65242235ebdde06adb08db887f0ea5
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9fdb675fc5d2de825414e05939727de8b120ae81

I ran an `ANALYZE` before doing my tests so I believe that the statistics are calculated and fresh.

Now, I know that PostgreSQL doesn't like having lots of partitions, but I would still like to understand why the query planner is so slow in both PostgreSQL 10 and PostgreSQL 11.
(I was also wondering what counts as "a lot" of partitions in PostgreSQL? Looking at use cases of extensions like TimescaleDB, I would expect that 5K to 10K partitions wouldn't be a whole lot.)
 
An example of a simple query that I run on both PostgreSQL versions would be:

EXPLAIN ANALYZE
SELECT
    table_a.a,
    table_b.a
FROM
    (
        SELECT
            a,
            b
        FROM
            table_a
        WHERE
            partition_level_1_column = 'foo'
            AND
            partition_level_2_column = 'bar'
    )
        AS table_a
INNER JOIN
    (
        SELECT
            a,
            b
        FROM
            table_b
        WHERE
            partition_level_1_column = 'baz'
            AND
            partition_level_2_column = 'bat'
    )
        AS table_b
        ON table_b.b = table_a.b
LIMIT
    10;

Running this query on my database with 5K partitions (split roughly 2/3rds of the partitions for table_b and 1/3rd for table_a) returns:

- Planning Time: 7155.647 ms
- Execution Time: 2.827 ms

Thank you in advance for your help!

Mickael

Re: Very long query planning times for database with lots of partitions

Justin Pryzby
On Tue, Jan 22, 2019 at 02:44:29PM +0100, Mickael van der Beek wrote:

> Hey everyone,
>
> I have a PostgreSQL 10 database that contains two tables which both have
> two levels of partitioning (by list and using a single value). Meaning that
> a partitioned table gets repartitioned again.
>
> The data in my use case is stored on 5K to 10K partitioned tables (children
> and grand-children of the two tables mentioned above) depending on usage
> levels.
>
> Three indexes are set on the grand-child partition. The partitioning
> columns are not covered by them.
> (I don't believe that it is needed to index partition columns no?)
>
> With this setup, I experience queries that have very slow planning times
> but fast execution times.
> Even for simple queries where only a couple partitions are searched on and
> the partition values are hard-coded.
>
> Researching the issue, I thought that the linear search in use by
> PostgreSQL 10 to find the partition table metadata was the cause.
>
> cf: https://blog.2ndquadrant.com/partition-elimination-postgresql-11/
>
> So I decided to try ou PostgreSQL 11 which included the two aforementioned
> fixes:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=499be013de65242235ebdde06adb08db887f0ea5
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9fdb675fc5d2de825414e05939727de8b120ae81

Those reduce the CPU time needed, but that's not the most significant issue.

For postgres up through 11, including for relkind=p, planning requires 1)
stat()ing every 1GB file for every partition, even those partitions which are
eventually excluded by constraints or partition bounds ; AND, 2) open()ing
every index on every partition, even if it's excluded later.

Postgres 12 is expected to resolve this and allow "many" (perhaps 10k)
partitions: https://commitfest.postgresql.org/20/1778/

I think postgres through 11 would consider 1000 partitions to be "too many".

You *might* be able to mitigate the high cost of stat()ing tables by ensuring
that the table metadata stays in OS cache, by running something like:
 find /var/lib/pgsql /tablespace -ls

You *might* be able to mitigate the high cost of open()ing the indices by
keeping their first page in cache (preferably in the postgres buffer cache),
either by running a cronjob that runs EXPLAIN, or perhaps by using something
like pg_prewarm on the indices.  (I did something like this for our largest
customers to improve performance as a stopgap.)
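A minimal sketch of the pg_prewarm variant, assuming the indices live in the `public` schema (adjust the schema filter to taste) and that loading just the first block of each index is enough:

```sql
-- Requires the pg_prewarm contrib extension.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load block 0 of every index in the schema into the postgres buffer cache,
-- so planning doesn't pay a cold open() for each one.
SELECT pg_prewarm(i.indexrelid::regclass, 'buffer', 'main', 0, 0)
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public';  -- narrow this to where your partitions live
```

Run it from a cronjob at whatever interval keeps the pages resident.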

Justin


RE: Very long query planning times for database with lots of partitions

Steven Winfield
In reply to this post by Mickael van der Beek

Do you have constraint_exclusion set correctly (i.e. ‘on’ or ‘partition’)?

If so, does the EXPLAIN output mention all of your parent partitions, or are some being successfully pruned?

Planning times can be sped up significantly if the planner can exclude parent partitions, without ever having to examine the constraints of the child (and grandchild) partitions. If this is not the case, take another look at your query and try to figure out why the planner might believe a parent partition cannot be outright disregarded from the query – does the query contain a filter on the parent partitions’ partition key, for example?
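A quick way to check both points, assuming the (illustrative) schema from the original post:

```sql
-- Should report 'partition' (the default) or 'on'.
SHOW constraint_exclusion;
SET constraint_exclusion = 'partition';

-- With pruning working, only the matching grand-child partition
-- should appear in the plan, not every partition of table_a.
EXPLAIN
SELECT a, b
FROM table_a
WHERE partition_level_1_column = 'foo'
  AND partition_level_2_column = 'bar';
```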

 

I believe Timescaledb has its own query planner optimisations for discarding partitions early at planning time.

 

Good luck,

Steve.





This email is confidential. If you are not the intended recipient, please advise us immediately and delete this message. The registered name of Cantab- part of GAM Systematic is Cantab Capital Partners LLP. See - http://www.gam.com/en/Legal/Email+disclosures+EU for further information on confidentiality, the risks of non-secure electronic communication, and certain disclosures which we are required to make in accordance with applicable legislation and regulations. If you cannot access this link, please notify us by reply message and we will send the contents to you.

GAM Holding AG and its subsidiaries (Cantab – GAM Systematic) will collect and use information about you in the course of your interactions with us. Full details about the data types we collect and what we use this for and your related rights is set out in our online privacy policy at https://www.gam.com/en/legal/privacy-policy. Please familiarise yourself with this policy and check it from time to time for updates as it supplements this notice

Re: Very long query planning times for database with lots of partitions

Mickael van der Beek
Thank you both for your quick answers,

@Justin Based on your answer, it would seem that partitioning, or at least partitioning this much, is not the correct direction to take.
The reason I originally wanted to use partitioning was that I'm storing a multi-tenant graph and that as the data grew, so did the indexes and once they were larger than the available RAM, query performance went down the drain.
The two levels of partitioning let me create one level for the tenant-level partitioning and one level for the business logic where I could further partition the tables into the different types of nodes and edges I was storing.
(These are the table_a and table_b in my example query. There is also a table_c which connects table_a and table_b, but I wanted to keep it simple.)
Another reason was that we do regular, automated cleanups of the data, and dropping all the data (hundreds of thousands of rows) for a tenant is very fast with a DROP TABLE of a partition and rather slow with a regular DELETE query (even if indexed).
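The fast cleanup path looks roughly like this, reusing the hypothetical names from my example query (a sketch, not my actual schema):

```sql
-- Detach the tenant's level-1 partition first, so the DROP itself
-- no longer touches the parent table.
ALTER TABLE table_a DETACH PARTITION table_a_foo;

-- Dropping the detached (still partitioned) table also drops all of its
-- grand-child partitions; no row-by-row DELETE and no index maintenance.
DROP TABLE table_a_foo;
```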
With the redesign of the database schema (that included the partitioning changes), I also dramatically reduced the amounts and size of data per row on the nodes and edges by storing the large and numerous metadata fields on separate tables that are not part of the graph traversal process.
Based on the usage numbers I see, I would expect around 12K tenants in the medium term, which means that even partitioning per tenant on only those two tables would lead to 24K partitions, way above your approximate limit of 1K partitions.
Queries are always limited to one tenant's data which was one of the motivations behind partitioning in the first place.
Not sure what you would advise in this case for a multi-tenant graph?

@Steven, yes, constraint_exclusion is set to the default value of 'partition'.
The EXPLAIN ANALYZE output also shows that the partitions are pruned correctly.
So the query plan looks sound and the query execution confirms this.
But reaching that point is really what the issue is for me.

 
