Performance regression related to parallel execution


Jinho Jung-2

Hello,

We noticed that the following SQL queries run roughly three times slower on the latest version of PostgreSQL. Here are the times taken to execute them on the older (v9.5.16) and newer (v11.2) versions, in milliseconds:

+-----------------------+--------+---------+---------+-----------+
|                       | scale1 | scale10 | scale50 | scale300  |
+-----------------------+--------+---------+---------+-----------+
| Query 1 (v9.5.16)     |     88 |     937 |    4721 |   27241   |
| Query 1 (v11.2)       |    288 |    2822 |   13838 |   85081   |
+-----------------------+--------+---------+---------+-----------+
| Query 2 (v9.5.16)     |     39 |       X |       X |      X    |
| Query 2 (v11.2)       |     80 |       X |       X |      X    |
+-----------------------+--------+---------+---------+-----------+
(X: the query did not run to completion within a day at that scale factor.)

For each regression, we share:
1) the associated query,
2) the commit that activated it,
3) our high-level analysis, and
4) query execution plans in old and new versions of PostgreSQL.

All of these regressions were observed when comparing the latest version (v11.2) against v9.5.16.

We found several other regressions related to this commit:
77cd477 (Enable parallel query by default.)

* You can download the queries at:

* You can reproduce the results by using the same setup that we described
before (as Andrew mentioned before, we increased the default work_mem to 128MB):
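For reference, a minimal sketch of applying that work_mem setting cluster-wide (this assumes superuser access; the actual setup may instead edit postgresql.conf directly):

```sql
-- Sketch: raise work_mem for the whole cluster, then reload the config.
ALTER SYSTEM SET work_mem = '128MB';
SELECT pg_reload_conf();
-- Verify the active value in a new session:
SHOW work_mem;
```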


###### Query 1:

select
  ref_0.ol_delivery_d as c1
from
  public.order_line as ref_0
where EXISTS (
  select
    ref_1.i_im_id as c0
  from
    public.item as ref_1
  where ref_0.ol_d_id <=  ref_1.i_im_id
)

- Commit: 77cd477 (Enable parallel query by default.)

- Our analysis: We believe that this regression is due to parallel queries being enabled by default. Surprisingly, we found that even on a larger TPC-C database (scale factor of 50, roughly 4GB), the parallel scan is still slower than the non-parallel one in the old version when the query is not returning any tuples.
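To confirm that the slowdown comes from the parallel plan rather than some other planner change between the two versions, one can disable parallelism for the session on v11.2 and re-run the query. This is a sketch, not the exact commands we ran:

```sql
-- Sketch: force a non-parallel plan in v11 and compare timings.
SET max_parallel_workers_per_gather = 0;
EXPLAIN ANALYZE
select
  ref_0.ol_delivery_d as c1
from
  public.order_line as ref_0
where EXISTS (
  select ref_1.i_im_id as c0
  from public.item as ref_1
  where ref_0.ol_d_id <= ref_1.i_im_id
);
-- Restore the default afterwards:
RESET max_parallel_workers_per_gather;
```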

- Query execution plans:

[Old version]
Nested Loop Semi Join  (cost=0.00..90020417940.08 rows=30005835 width=8) (actual time=0.034..24981.895 rows=90017507 loops=1)
  Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)
  ->  Seq Scan on order_line ref_0  (cost=0.00..2011503.04 rows=90017504 width=12) (actual time=0.022..7145.811 rows=90017507 loops=1)
  ->  Materialize  (cost=0.00..2771.00 rows=100000 width=4) (actual time=0.000..0.000 rows=1 loops=90017507)
        ->  Seq Scan on item ref_1  (cost=0.00..2271.00 rows=100000 width=4) (actual time=0.006..0.006 rows=1 loops=1)

Planning time: 0.290 ms
Execution time: 27241.239 ms

[New version]
Gather  (cost=1000.00..88047487498.82 rows=30005835 width=8) (actual time=0.265..82355.289 rows=90017507 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Nested Loop Semi Join  (cost=0.00..88044485915.32 rows=12502431 width=8) (actual time=0.033..68529.259 rows=30005836 loops=3)
        Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)
        ->  Parallel Seq Scan on order_line ref_0  (cost=0.00..1486400.93 rows=37507293 width=12) (actual time=0.023..2789.901 rows=30005836 loops=3)
        ->  Seq Scan on item ref_1  (cost=0.00..2271.00 rows=100000 width=4) (actual time=0.001..0.001 rows=1 loops=90017507)

Planning Time: 0.319 ms
Execution Time: 85081.158 ms



###### Query 2:

select
  ref_0.c_id as c0
from
  public.customer as ref_0
where EXISTS (
  select
      ref_0.c_city as c0
  from
    public.order_line as ref_1
    left join public.new_order as ref_2
    on (ref_1.ol_supply_w_id = ref_2.no_w_id)
  where (ref_1.ol_delivery_d > ref_0.c_since)
)

- Our analysis: There seems to be a problem with the parallel execution logic. It is unclear why only one worker is launched, since the default behavior of PostgreSQL is to plan at least two workers. We also found that this query does not run to completion within a day when we increase the size of the DB (e.g., from scale factor 1 to scale factor 10/50/300).
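One way to probe the single-worker behavior (a sketch; the GUC names are standard, but the values in our setup may differ): the planner scales the planned worker count with the driving table's size, so a small customer table can cap the count below max_parallel_workers_per_gather.

```sql
-- Sketch: inspect the settings that bound the planned worker count.
SHOW max_parallel_workers_per_gather;  -- 2 by default in v11
SHOW min_parallel_table_scan_size;     -- 8MB by default
SELECT pg_size_pretty(pg_relation_size('public.customer'));
-- A per-table override to request more workers, for experimentation:
ALTER TABLE public.customer SET (parallel_workers = 2);
```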

- Commit: 16be2fd (Make dsa_allocate interface more like MemoryContextAlloc)

- Query execution plans:

[Old version]
Nested Loop Semi Join  (cost=224.43..910152608046.08 rows=10000 width=4) (actual time=3.681..37.842 rows=30000 loops=1)
  Join Filter: (ref_1.ol_delivery_d > ref_0.c_since)
  ->  Seq Scan on customer ref_0  (cost=0.00..2569.00 rows=30000 width=12) (actual time=0.020..22.582 rows=30000 loops=1)
  ->  Materialize  (cost=224.43..48521498.97 rows=2406886812 width=8) (actual time=0.000..0.000 rows=1 loops=30000)
        ->  Hash Left Join  (cost=224.43..27085162.91 rows=2406886812 width=8) (actual time=3.652..3.652 rows=1 loops=1)
              Hash Cond: (ref_1.ol_supply_w_id = ref_2.no_w_id)
              ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=12) (actual time=0.016..0.016 rows=1 loops=1)
              ->  Hash  (cost=124.19..124.19 rows=8019 width=4) (actual time=3.569..3.569 rows=8019 loops=1)
                    Buckets: 8192  Batches: 1  Memory Usage: 346kB
                    ->  Seq Scan on new_order ref_2  (cost=0.00..124.19 rows=8019 width=4) (actual time=0.020..1.770 rows=8019 loops=1)
Planning time: 0.927 ms
Execution time: 39.166 ms

[New version]
Gather  (cost=1224.43..672617098015.10 rows=10000 width=4) (actual time=3.792..78.702 rows=30000 loops=1)
  Workers Planned: 1
  Workers Launched: 1
  ->  Nested Loop Semi Join  (cost=224.43..672617096015.10 rows=5882 width=4) (actual time=4.383..65.544 rows=15000 loops=2)
        Join Filter: (ref_1.ol_delivery_d > ref_0.c_since)
        ->  Parallel Seq Scan on customer ref_0  (cost=0.00..2445.47 rows=17647 width=12) (actual time=0.029..13.054 rows=15000 loops=2)
        ->  Hash Left Join  (cost=224.43..27085162.91 rows=2406886812 width=8) (actual time=0.002..0.002 rows=1 loops=30000)
              Hash Cond: (ref_1.ol_supply_w_id = ref_2.no_w_id)
              ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=12) (actual time=0.002..0.002 rows=1 loops=30000)
              ->  Hash  (cost=124.19..124.19 rows=8019 width=4) (actual time=4.094..4.094 rows=8019 loops=2)
                    Buckets: 8192  Batches: 1  Memory Usage: 346kB
                    ->  Seq Scan on new_order ref_2  (cost=0.00..124.19 rows=8019 width=4) (actual time=0.025..1.995 rows=8019 loops=2)
Planning Time: 1.154 ms
Execution Time: 80.217 ms