Postgres 11.0 Partitioned Table Query Performance

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Postgres 11.0 Partitioned Table Query Performance

Paul Schaap
Hi,

I have an issue, and a partial workaround, with a query outlined below. What I am hoping to get to is a Parallel Index Only Scan on my partition indexes as theoretically that should be the fastest, but can only get either a Parallel Seq Scan on each partition which is very slow, or a non parallel Index Only Scan which is faster.

If I express my query this way:

EXPLAIN SELECT trl.*, tr.trans_id
FROM transactions_raw_load trl
  LEFT OUTER JOIN transactions_raw tr ON tr.trans_id = trl.trans_id;

Note there is an index on tr.trans_id, and no indexes on transactions_raw_load.

I get the following poor performing query plan (I got bored and gave up after an hour):

Gather  (cost=4621066.16..5902873.25 rows=5966773 width=101)
  Workers Planned: 2
  ->  Parallel Hash Left Join  (cost=4620066.16..5305195.95 rows=2486155 width=101)
        Hash Cond: (trl.trans_id = tr_33.trans_id)
        ->  Parallel Seq Scan on transactions_raw_load trl  (cost=0.00..113010.55 rows=2486155 width=93)
        ->  Parallel Hash  (cost=3493307.51..3493307.51 rows=68678612 width=8)
              ->  Parallel Append  (cost=0.00..3493307.51 rows=68678612 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201810 tr_33  (cost=0.00..153214.79 rows=2491579 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201801 tr_24  (cost=0.00..109644.39 rows=2417739 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201809 tr_32  (cost=0.00..108290.88 rows=2384788 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201803 tr_26  (cost=0.00..107932.82 rows=2371382 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201808 tr_31  (cost=0.00..106060.08 rows=2326508 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201711 tr_22  (cost=0.00..104153.00 rows=2296600 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201807 tr_30  (cost=0.00..104067.08 rows=2277508 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201712 tr_23  (cost=0.00..103835.12 rows=2292412 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201806 tr_29  (cost=0.00..101345.35 rows=2219635 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201804 tr_27  (cost=0.00..100914.89 rows=2211989 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201805 tr_28  (cost=0.00..100822.62 rows=2208362 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201710 tr_21  (cost=0.00..98046.89 rows=2165089 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201802 tr_25  (cost=0.00..94385.00 rows=2082000 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201709 tr_20  (cost=0.00..93871.78 rows=2073678 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201708 tr_19  (cost=0.00..90885.16 rows=2009316 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201703 tr_14  (cost=0.00..90467.97 rows=2008797 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201605 tr_4  (cost=0.00..90442.77 rows=2015178 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201603 tr_2  (cost=0.00..88116.27 rows=1960828 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201701 tr_12  (cost=0.00..87520.99 rows=1943799 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201604 tr_3  (cost=0.00..85274.36 rows=1897736 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201606 tr_5  (cost=0.00..84541.97 rows=1883897 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201612 tr_11  (cost=0.00..83267.58 rows=1853858 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201704 tr_15  (cost=0.00..83224.67 rows=1843667 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201601 tr  (cost=0.00..83147.67 rows=1850067 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201707 tr_18  (cost=0.00..82300.82 rows=1817682 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201602 tr_1  (cost=0.00..81858.09 rows=1821909 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201702 tr_13  (cost=0.00..80157.20 rows=1780320 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201705 tr_16  (cost=0.00..80028.58 rows=1770758 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201706 tr_17  (cost=0.00..79477.53 rows=1756953 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201611 tr_10  (cost=0.00..76224.72 rows=1700772 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201610 tr_9  (cost=0.00..73696.76 rows=1644276 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201608 tr_7  (cost=0.00..73237.05 rows=1633905 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201609 tr_8  (cost=0.00..71263.61 rows=1590161 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201607 tr_6  (cost=0.00..70902.52 rows=1580452 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201811 tr_34  (cost=0.00..27282.28 rows=494928 width=8)
                    ->  Parallel Seq Scan on transactions_raw_201812 tr_35  (cost=0.00..11.18 rows=118 width=8)

It surprises me that this doesn't Parallel Index Only Scan when I deliberately only used tr.trans_id. so it wouldn't need to go to the data page.

The partial workaround I have found is to express my query this way:

EXPLAIN SELECT *, (
    SELECT tre.trans_id
    FROM transactions_raw tre
    WHERE tre.trans_id = trl.trans_id
) transactions_raw_exists
FROM transactions_raw_load trl;

Which results in the query plan below, and much better performance (it takes about 30 mins) :

Seq Scan on transactions_raw_load trl  (cost=0.00..1814494316.70 rows=5966773 width=101)
  SubPlan 1
    ->  Append  (cost=0.43..304.07 rows=36 width=8)
          ->  Index Only Scan using transactions_raw_201601_trans_id_idx on transactions_raw_201601 tre  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201602_trans_id_idx on transactions_raw_201602 tre_1  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201603_trans_id_idx on transactions_raw_201603 tre_2  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201604_trans_id_idx on transactions_raw_201604 tre_3  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201605_trans_id_idx on transactions_raw_201605 tre_4  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201606_trans_id_idx on transactions_raw_201606 tre_5  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201607_trans_id_idx on transactions_raw_201607 tre_6  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201608_trans_id_idx on transactions_raw_201608 tre_7  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201609_trans_id_idx on transactions_raw_201609 tre_8  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201610_trans_id_idx on transactions_raw_201610 tre_9  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201611_trans_id_idx on transactions_raw_201611 tre_10  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201612_trans_id_idx on transactions_raw_201612 tre_11  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201701_trans_id_idx on transactions_raw_201701 tre_12  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201702_trans_id_idx on transactions_raw_201702 tre_13  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201703_trans_id_idx on transactions_raw_201703 tre_14  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201704_trans_id_idx on transactions_raw_201704 tre_15  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201705_trans_id_idx on transactions_raw_201705 tre_16  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201706_trans_id_idx on transactions_raw_201706 tre_17  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201707_trans_id_idx on transactions_raw_201707 tre_18  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201708_trans_id_idx on transactions_raw_201708 tre_19  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201709_trans_id_idx on transactions_raw_201709 tre_20  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201710_trans_id_idx on transactions_raw_201710 tre_21  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201711_trans_id_idx on transactions_raw_201711 tre_22  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201712_trans_id_idx on transactions_raw_201712 tre_23  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201801_trans_id_idx on transactions_raw_201801 tre_24  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201802_trans_id_idx on transactions_raw_201802 tre_25  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201803_trans_id_idx on transactions_raw_201803 tre_26  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201804_trans_id_idx on transactions_raw_201804 tre_27  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201805_trans_id_idx on transactions_raw_201805 tre_28  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201806_trans_id_idx on transactions_raw_201806 tre_29  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201807_trans_id_idx on transactions_raw_201807 tre_30  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201808_trans_id_idx on transactions_raw_201808 tre_31  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201809_trans_id_idx on transactions_raw_201809 tre_32  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201810_trans_id_idx on transactions_raw_201810 tre_33  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201811_trans_id_idx on transactions_raw_201811 tre_34  (cost=0.43..8.45 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)
          ->  Index Only Scan using transactions_raw_201812_trans_id_idx on transactions_raw_201812 tre_35  (cost=0.14..8.16 rows=1 width=8)
                Index Cond: (trans_id = trl.trans_id)

But obviously the nature of this query means it cannot be parallel.

Platform tested on is OSX 10.14.1 using Postgres.app v2.2 PostgreSQL 11.0 from https://postgresapp.com/

Any help/pointers would be appreciated.

CHEERS
Paul


PRIVACY & CONFIDENTIALITY NOTICE

The information contained in this email is intended for the named recipients  only.  It may contain privileged and confidential information and if you are  not the intended recipient, you should not copy it or disclose its contents  to any other person, or take any action in reliance on it.  If you have  received this email in error, please notify us immediately by return email.

Reply | Threaded
Open this post in threaded view
|

Re: Postgres 11.0 Partitioned Table Query Performance

David Rowley-3
On 8 November 2018 at 15:28, Paul Schaap <[hidden email]> wrote:

> I have an issue, and a partial workaround, with a query outlined below. What
> I am hoping to get to is a Parallel Index Only Scan on my partition indexes
> as theoretically that should be the fastest, but can only get either a
> Parallel Seq Scan on each partition which is very slow, or a non parallel
> Index Only Scan which is faster.
>
> If I express my query this way:
>
> EXPLAIN SELECT trl.*, tr.trans_id
> FROM transactions_raw_load trl
>   LEFT OUTER JOIN transactions_raw tr ON tr.trans_id = trl.trans_id;
>
> Note there is an index on tr.trans_id, and no indexes on
> transactions_raw_load.
>
> I get the following poor performing query plan (I got bored and gave up
> after an hour):

Parallel nodes cannot be on the inside of a nested loop join, and
you've mentioned that the other table has no indexes so I guess you
didn't mean on the outside.

You may find that a serial nested loop plan with a parameterised inner
index only scan to be faster than the hash join. If you're finding
that subquery scan is better, then you may want to consider dropping
random_page_cost a bit or increasing effective_cache_size.  This will
lower the estimated cost of random IO for indexes scans.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services