Performance regressions found using sqlfuzz

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Performance regressions found using sqlfuzz

Jinho Jung-2

Hello, 

Thanks for the feedback so far! Continue with the previous report, we sharing another four interesting cases that SQLFuzz discovered. 

Here’s the time taken to execute four SQL queries on old (v9.5.16) and newer version (v11.2) of PostgreSQL (in milliseconds):

+----------------------+--------+---------+---------+---------+
|                      | scale1 | scale10 | scale50 | scale300|
+----------------------+--------+---------+---------+---------+
| Case-5 (v9.5.16)     |     39 |     183 |    1459 |   11125 |
| Case-5 (v11.2)       |     73 |     620 |    4818 |   16956 |
+----------------------+--------+---------+---------+---------+
| Case-6 (v9.5.16)     |     46 |     329 |   15096 |   10721 |
| Case-6 (v11.2)       |     81 |     671 |   64808 |   26919 |
+----------------------+--------+---------+---------+---------+
| Case-7 (v9.5.16)     |     19 |       X |       X |       X |
| Case-7 (v11.2)       |     46 |       X |       X |       X |
+----------------------+--------+---------+---------+---------+
| Case-8 (v9.5.16)     |    215 |    2108 |   10460 |   64959 |
| Case-8 (v11.2)       |    449 |    3997 |   20246 |  130595 |
+----------------------+--------+---------+---------+---------+

For each regression, we share:
1) the associated query,
2) the commit that activated it,
3) our high-level analysis, and
4) query execution plans in old and new versions of PostgreSQL.

All these regressions are observed on the latest version. (v11.2 and v9.5.16)

* You can download the queries at:

* You can reproduce the result by using the same setup that we described before:

 - As Andrew mentioned before, we increased default work_mem to 128MB

Best regards,
Jinho Jung

#### QUERY 5

EXPLAIN ANALYZE
select
  ref_0.c_zip as c0
from
  public.customer as ref_0
where EXISTS (
  select
    ref_1.ol_d_id as c10
  from 
    public.order_line as ref_1
  where (ref_1.ol_o_id <> ref_0.c_d_id)
)

- Commit : 7ca25b7

- Our analysis: We believe newer version is slow when the number of rows in the filter is small.

- Query execution plans:

[Old version]
Nested Loop Semi Join  (cost=0.00..15317063263550.52 rows=1 width=10) (actual time=0.019..10888.266 rows=9000000 loops=1)
  Join Filter: (ref_1.ol_o_id <> ref_0.c_d_id)
  Rows Removed by Join Filter: 11700000
  ->  Seq Scan on customer ref_0  (cost=0.00..770327.00 rows=9000000 width=14) (actual time=0.008..7541.944 rows=9000000 loops=1)
  ->  Materialize  (cost=0.00..2813223.52 rows=90017568 width=4) (actual time=0.000..0.000 rows=2 loops=9000000)
        ->  Seq Scan on order_line ref_1  (cost=0.00..2011503.68 rows=90017568 width=4) (actual time=0.005..0.007 rows=14 loops=1)
Planning time: 0.401 ms
Execution time: 11125.538 ms


[New version]
Nested Loop Semi Join  (cost=0.00..3409260.89 rows=9000000 width=10) (actual time=0.033..16732.988 rows=9000000 loops=1)
  Join Filter: (ref_1.ol_o_id <> ref_0.c_d_id)
  Rows Removed by Join Filter: 11700000
  ->  Seq Scan on customer ref_0  (cost=0.00..770327.00 rows=9000000 width=14) (actual time=0.017..2113.336 rows=9000000 loops=1)
  ->  Seq Scan on order_line ref_1  (cost=0.00..2011503.68 rows=90017568 width=4) (actual time=0.001..0.001 rows=2 loops=9000000)
Planning Time: 0.615 ms
Execution Time: 16956.115 ms


##### QUERY 6

select distinct 
  ref_0.h_data as c0, 
  ref_0.h_c_id as c1  
from 
  public.history as ref_0
    left join public.item as ref_1
    on (ref_1.i_im_id < -1)
where ref_1.i_price is NULL

- Our analysis: We think that the 'merge sort' makes slow execution. We are wondering why newer version applies external merge sort in this case. 

- Commit: 3fc6e2d (big patch)

- Query execution plans:

[Old version]
HashAggregate  (cost=1312274.26..1312274.27 rows=1 width=21) (actual time=7288.727..10443.586 rows=9000000 loops=1)
  Group Key: ref_0.h_data, ref_0.h_c_id
  ->  Nested Loop Left Join  (cost=0.00..1312274.26 rows=1 width=21) (actual time=26.965..2463.231 rows=9000000 loops=1)
        Filter: (ref_1.i_price IS NULL)
        ->  Seq Scan on history ref_0  (cost=0.00..184795.61 rows=8999661 width=21) (actual time=0.347..795.936 rows=9000000 loops=1)
        ->  Materialize  (cost=0.00..2521.05 rows=10 width=6) (actual time=0.000..0.000 rows=0 loops=9000000)
              ->  Seq Scan on item ref_1  (cost=0.00..2521.00 rows=10 width=6) (actual time=26.610..26.610 rows=0 loops=1)
                    Filter: (i_im_id < '-1'::integer)
                    Rows Removed by Filter: 100000
Planning time: 1.538 ms
Execution time: 10721.259 ms


[New version]
Unique  (cost=1312334.34..1312334.35 rows=1 width=21) (actual time=21444.459..26651.868 rows=9000000 loops=1)
  ->  Sort  (cost=1312334.34..1312334.35 rows=1 width=21) (actual time=21444.457..25629.389 rows=9000000 loops=1)
        Sort Key: ref_0.h_data, ref_0.h_c_id
        Sort Method: external merge  Disk: 285384kB
        ->  Nested Loop Left Join  (cost=0.00..1312334.33 rows=1 width=21) (actual time=21.328..2409.302 rows=9000000 loops=1)
              Filter: (ref_1.i_price IS NULL)
              ->  Seq Scan on history ref_0  (cost=0.00..184800.06 rows=9000106 width=21) (actual time=0.320..734.376 rows=9000000 loops=1)
              ->  Materialize  (cost=0.00..2521.05 rows=10 width=6) (actual time=0.000..0.000 rows=0 loops=9000000)
                    ->  Seq Scan on item ref_1  (cost=0.00..2521.00 rows=10 width=6) (actual time=21.000..21.001 rows=0 loops=1)
                          Filter: (i_im_id < '-1'::integer)
                          Rows Removed by Filter: 100000
Planning Time: 1.426 ms
Execution Time: 26919.635 ms


##### QUERY 7 

select  
  ref_0.c_id as c0  
from 
  public.customer as ref_0
where EXISTS (
  select  
      ref_0.c_city as c0
  from 
    public.order_line as ref_1
    left join public.new_order as ref_2
    on (ref_1.ol_supply_w_id = ref_2.no_w_id)
  where (ref_1.ol_delivery_d > ref_0.c_since)
)

- Our analysis : Parallel execution seems a problem. We also want to ask whether only one worker is intended behavior of Postgres because we think parallel execution with less than two workers is not parallel.

- Another interesting finding: this query cannot be finished within one day if we incrase the size of DB (e.g., from scale factor 1 to scale factor 10/50/300).

- Commit: 16be2fd

- Query execution plans:

[Old version]
Nested Loop Semi Join  (cost=224.43..910152608046.08 rows=10000 width=4) (actual time=2.619..18.042 rows=30000 loops=1)
  Join Filter: (ref_1.ol_delivery_d > ref_0.c_since)
  ->  Seq Scan on customer ref_0  (cost=0.00..2569.00 rows=30000 width=12) (actual time=0.003..4.530 rows=30000 loops=1)
  ->  Materialize  (cost=224.43..48521498.97 rows=2406886812 width=8) (actual time=0.000..0.000 rows=1 loops=30000)
        ->  Hash Left Join  (cost=224.43..27085162.91 rows=2406886812 width=8) (actual time=2.612..2.612 rows=1 loops=1)
              Hash Cond: (ref_1.ol_supply_w_id = ref_2.no_w_id)
              ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=12) (actual time=0.002..0.002 rows=1 loops=1)
              ->  Hash  (cost=124.19..124.19 rows=8019 width=4) (actual time=2.571..2.571 rows=8019 loops=1)
                    Buckets: 8192  Batches: 1  Memory Usage: 346kB
                    ->  Seq Scan on new_order ref_2  (cost=0.00..124.19 rows=8019 width=4) (actual time=0.174..1.539 rows=8019 loops=1)
Planning time: 0.605 ms
Execution time: 19.045 ms

[New version]
Gather  (cost=1224.43..672617098015.10 rows=10000 width=4) (actual time=3.099..45.077 rows=30000 loops=1)
  Workers Planned: 1
  Workers Launched: 1
  ->  Nested Loop Semi Join  (cost=224.43..672617096015.10 rows=5882 width=4) (actual time=2.307..38.258 rows=15000 loops=2)
        Join Filter: (ref_1.ol_delivery_d > ref_0.c_since)
        ->  Parallel Seq Scan on customer ref_0  (cost=0.00..2445.47 rows=17647 width=12) (actual time=0.003..2.668 rows=15000 loops=2)
        ->  Hash Left Join  (cost=224.43..27085162.91 rows=2406886812 width=8) (actual time=0.002..0.002 rows=1 loops=30000)
              Hash Cond: (ref_1.ol_supply_w_id = ref_2.no_w_id)
              ->  Seq Scan on order_line ref_1  (cost=0.00..6711.48 rows=300148 width=12) (actual time=0.001..0.001 rows=1 loops=30000)
              ->  Hash  (cost=124.19..124.19 rows=8019 width=4) (actual time=2.211..2.211 rows=8019 loops=2)
                    Buckets: 8192  Batches: 1  Memory Usage: 346kB
                    ->  Seq Scan on new_order ref_2  (cost=0.00..124.19 rows=8019 width=4) (actual time=0.086..1.179 rows=8019 loops=2)
Planning Time: 0.611 ms
Execution Time: 46.195 ms


##### QUERY 8

select  
  ref_0.ol_d_id as c0  
from 
  public.order_line as ref_0
  left join (
    select                
      ref_1.ol_supply_w_id as c0,      
      ref_1.ol_d_id as c2      
    from 
      public.order_line as ref_1 
    where ref_1.ol_o_id < 1
  ) as subq_0  
  on (subq_0.c2 = ref_0.ol_o_id)
where EXISTS (
  select  
    ref_2.o_ol_cnt as c0      
  from 
    public.oorder as ref_2
  where 
    nullif(ref_2.o_d_id, subq_0.c0) is not NULL
);

- Commit: 0c2070c

- Our analysis : We are not sure about the root cause of this regression. This might have to do with parallel execution.

- Query execution plans:

[Old version]
Nested Loop Semi Join  (cost=3488355.85..210034755085.67 rows=245615461 width=4) (actual time=32830.372..62745.966 rows=90017507 loops=1)
  Join Filter: (NULLIF(ref_2.o_d_id, ref_1.ol_supply_w_id) IS NOT NULL)
  ->  Hash Right Join  (cost=3488355.85..9115461.16 rows=246849710 width=8) (actual time=32829.980..48232.002 rows=90017507 loops=1)
        Hash Cond: (ref_1.ol_d_id = ref_0.ol_o_id)
        ->  Index Scan using order_line_pkey on order_line ref_1  (cost=0.57..2076988.89 rows=8461 width=8) (actual time=5346.547..5346.547 rows=0 loops=1)
              Index Cond: (ol_o_id < 1)
        ->  Hash  (cost=2011503.68..2011503.68 rows=90017568 width=8) (actual time=27466.438..27466.438 rows=90017507 loops=1)
              Buckets: 4194304  Batches: 64  Memory Usage: 80798kB
              ->  Seq Scan on order_line ref_0  (cost=0.00..2011503.68 rows=90017568 width=8) (actual time=0.006..16821.719 rows=90017507 loops=1)
  ->  Materialize  (cost=0.00..245157.00 rows=9000000 width=4) (actual time=0.000..0.000 rows=1 loops=90017507)
        ->  Seq Scan on oorder ref_2  (cost=0.00..165000.00 rows=9000000 width=4) (actual time=0.377..0.377 rows=1 loops=1)
Planning time: 3.933 ms
Execution time: 64959.231 ms

[New version]
Gather  (cost=8797140.37..153823591859.11 rows=264667229 width=4) (actual time=27917.270..128050.059 rows=90017507 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Nested Loop Semi Join  (cost=8796140.37..153797124136.21 rows=110278012 width=4) (actual time=27911.016..109588.362 rows=30005836 loops=3)
        Join Filter: (NULLIF(ref_2.o_d_id, ref_1.ol_supply_w_id) IS NOT NULL)
        ->  Merge Left Join  (cost=8796140.37..10646159.56 rows=110832173 width=8) (actual time=27910.620..33777.444 rows=30005836 loops=3)
              Merge Cond: (ref_0.ol_o_id = ref_1.ol_d_id)
              ->  Sort  (cost=6717742.99..6811511.29 rows=37507320 width=8) (actual time=22427.383..25862.373 rows=30005836 loops=3)
                    Sort Key: ref_0.ol_o_id
                    Sort Method: external merge  Disk: 534352kB
                    Worker 0:  Sort Method: external merge  Disk: 509888kB
                    Worker 1:  Sort Method: external merge  Disk: 541512kB
                    ->  Parallel Seq Scan on order_line ref_0  (cost=0.00..1486401.20 rows=37507320 width=8) (actual time=0.025..14349.178 rows=30005836 loops=3)
              ->  Sort  (cost=2078397.38..2078419.66 rows=8912 width=8) (actual time=5483.221..5483.221 rows=0 loops=3)
                    Sort Key: ref_1.ol_d_id
                    Sort Method: quicksort  Memory: 25kB
                    Worker 0:  Sort Method: quicksort  Memory: 25kB
                    Worker 1:  Sort Method: quicksort  Memory: 25kB
                    ->  Index Scan using order_line_pkey on order_line ref_1  (cost=0.57..2077812.68 rows=8912 width=8) (actual time=5483.203..5483.203 rows=0 loops=3)
                          Index Cond: (ol_o_id < 1)
        ->  Seq Scan on oorder ref_2  (cost=0.00..165000.00 rows=9000000 width=4) (actual time=0.002..0.002 rows=1 loops=90017507)
Planning Time: 3.952 ms
Execution Time: 130595.467 ms


===================================
Following up with previous question
===================================

on-going discussion:


Hello Andres,

Could you please share your thoughts on QUERY 3?

The performance impact of this regression increases *linearly* on larger databases. We concur with Andrew in that this is related to the lack of a Materialize node and mis-costing of the Nested Loop Anti-Join. 

We found more than 20 regressions related to this commit. We have shared two illustrative examples (QUERIES 3A and 3B) below.

- Commit: 77cd477 (Enable parallel query by default.)

- Summary: Execution Time (milliseconds)

When we increased the scale-factor of TPC-C to 300 (~30 GB), this query ran three times slower on v11 (24 seconds) in comparison to v9.5 (7 seconds). We also found more than 15 regressions related to the same commit and share a couple of them below.

+-----------------------+--------+---------+---------+-----------+
|                       | scale1 | scale10 | scale50 | scale 300 |
+-----------------------+--------+---------+---------+-----------+
| Query 3 (v9.5)        |     28 |     248 |    1231 |      7265 |
| Query 3 (v11)         |     74 |     677 |    3345 |     24581 |
+-----------------------+--------+---------+---------+-----------+
| Query 3A (v9.5)       |     88 |     937 |    4721 |     27241 |
| Query 3A (v11)        |    288 |    2822 |   13838 |     85081 |
+-----------------------+--------+---------+---------+-----------+
| Query 3B (v9.5)       |    101 |     934 |    4824 |     29363 |
| Query 3B (v11)        |    200 |    2331 |   12327 |     74110 |
+-----------------------+--------+---------+---------+-----------+


###### QUERY 3:

select
     cast(ref_1.ol_i_id as int4) as c0
from
     public.stock as ref_0
       left join public.order_line as ref_1
       on (ref_1.ol_number is not null)
where ref_1.ol_number is null


###### QUERY 3A:

select
  ref_0.ol_delivery_d as c1
from
  public.order_line as ref_0
where EXISTS (
  select
    ref_1.i_im_id as c0
    from
    public.item as ref_1
    where ref_0.ol_d_id <=  ref_1.i_im_id
  )

 Execution plan:

[OLD version]
Nested Loop Semi Join  (cost=0.00..90020417940.08 rows=30005835 width=8) (actual time=0.034..24981.895 rows=90017507 loops=1)
  Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)
  ->  Seq Scan on order_line ref_0  (cost=0.00..2011503.04 rows=90017504 width=12) (actual time=0.022..7145.811 rows=90017507 loops=1)
  ->  Materialize  (cost=0.00..2771.00 rows=100000 width=4) (actual time=0.000..0.000 rows=1 loops=90017507)
        ->  Seq Scan on item ref_1  (cost=0.00..2271.00 rows=100000 width=4) (actual time=0.006..0.006 rows=1 loops=1)

Planning time: 0.290 ms
Execution time: 27241.239 ms

[NEW version]
Gather  (cost=1000.00..88047487498.82 rows=30005835 width=8) (actual time=0.265..82355.289 rows=90017507 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Nested Loop Semi Join  (cost=0.00..88044485915.32 rows=12502431 width=8) (actual time=0.033..68529.259 rows=30005836 loops=3)
        Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id)
        ->  Parallel Seq Scan on order_line ref_0  (cost=0.00..1486400.93 rows=37507293 width=12) (actual time=0.023..2789.901 rows=30005836 loops=3)
        ->  Seq Scan on item ref_1  (cost=0.00..2271.00 rows=100000 width=4) (actual time=0.001..0.001 rows=1 loops=90017507)

Planning Time: 0.319 ms
Execution Time: 85081.158 ms


###### QUERY 3B:


select
  ref_0.ol_i_id as c0
from
  public.order_line as ref_0
where EXISTS (
  select
    ref_0.ol_delivery_d as c0
    from
    public.order_line as ref_1
    where ref_1.ol_d_id <= cast(nullif(ref_1.ol_o_id, ref_0.ol_i_id) as int4))

Execution plan:

[OLD version]
Nested Loop Semi Join  (cost=0.00..115638730740936.53 rows=30005835 width=4) (actual time=0.017..27009.302 rows=90017507 loops=1)
  Join Filter: (ref_1.ol_d_id <= NULLIF(ref_1.ol_o_id, ref_0.ol_i_id))
  Rows Removed by Join Filter: 11557
  ->  Seq Scan on order_line ref_0  (cost=0.00..2011503.04 rows=90017504 width=4) (actual time=0.009..7199.540 rows=90017507 loops=1)
  ->  Materialize  (cost=0.00..2813221.56 rows=90017504 width=8) (actual time=0.000..0.000 rows=1 loops=90017507)
        ->  Seq Scan on order_line ref_1  (cost=0.00..2011503.04 rows=90017504 width=8) (actual time=0.001..0.002 rows=14 loops=1)

Planning time: 0.252 ms
Execution time: 29363.737 ms

[NEW version]
Gather  (cost=1000.00..84060490326155.39 rows=30005835 width=4) (actual time=0.272..71712.491 rows=90017507 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Nested Loop Semi Join  (cost=0.00..84060487324571.89 rows=12502431 width=4) (actual time=0.046..60153.472 rows=30005836 loops=3)
        Join Filter: (ref_1.ol_d_id <= NULLIF(ref_1.ol_o_id, ref_0.ol_i_id))
        Rows Removed by Join Filter: 1717
        ->  Parallel Seq Scan on order_line ref_0  (cost=0.00..1486400.93 rows=37507293 width=4) (actual time=0.023..2819.361 rows=30005836 loops=3)
        ->  Seq Scan on order_line ref_1  (cost=0.00..2011503.04 rows=90017504 width=8) (actual time=0.001..0.001 rows=1 loops=90017507)

Planning Time: 0.334 ms
Execution Time: 74110.942 ms