BUG #16888: Reoccurring group by clauses makes system slower

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

BUG #16888: Reoccurring group by clauses makes system slower

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      16888
Logged by:          XINYU LIU
Email address:      [hidden email]
PostgreSQL version: 12.3
Operating system:   Ubuntu 20.04
Description:        

Hello,

We have 2 TPC-H queries which fetch the same tuples but have significant
query execution time differences (3.8 times).

We are sharing a pair of TPC-H queries that exhibit this performance
difference:

First query:
SELECT   "t13"."c_acctbal"
FROM     (
                    SELECT     "t11"."o_comment",
                               "t12"."c_nationkey",
                               "t12"."c_acctbal"
                    FROM       (
                                      SELECT *
                                      FROM   "orders"
                                      WHERE  "o_orderstatus" = 'F') AS "t11"

                    INNER JOIN
                               (
                                      SELECT *
                                      FROM   "customer"
                                      WHERE  "c_nationkey" <= 4
                                      AND    "c_name" =
'Customer#000000005') AS "t12"
                    ON         "t11"."o_custkey" > "t12"."c_custkey"
                    GROUP BY   "t11"."o_comment",
                               "t12"."c_nationkey",
                               "t12"."c_acctbal") AS "t13"
GROUP BY "t13"."c_acctbal" offset 8 rowsFETCH next 18 rows only

Second query:
SELECT     "t23"."c_acctbal"
FROM       (
                  SELECT *
                  FROM   "orders"
                  WHERE  "o_orderstatus" = 'F') AS "t22"
INNER JOIN
           (
                  SELECT *
                  FROM   "customer"
                  WHERE  "c_nationkey" <= 4
                  AND    "c_name" = 'Customer#000000005') AS "t23"
ON         "t22"."o_custkey" > "t23"."c_custkey"
GROUP BY   "t23"."c_acctbal" offset 8 rowsFETCH next 18 rows only

Actual Behavior
We executed both queries on the TPC-H benchmark of scale factor 5: the first
query takes over 17 seconds, while the second query only takes 4.5 seconds.
We think the time difference results from different plans selected.
Specifically, in the first (slow) query, the DBMS performs both the inner
GROUP BY operation and the outer GROUP BY operation.

Query Execution Plan
First query:
                                                       QUERY PLAN          
                                                               
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=505911.54..505911.72 rows=18 width=6) (actual
time=17102.666..17102.666 rows=0 loops=1)
   ->  HashAggregate  (cost=505911.46..505913.46 rows=200 width=6) (actual
time=17102.661..17102.664 rows=1 loops=1)
         Group Key: customer.c_acctbal
         ->  Group  (cost=360749.03..490676.64 rows=1218786 width=59)
(actual time=9164.577..15427.569 rows=3568877 loops=1)
               Group Key: orders.o_comment, customer.c_nationkey,
customer.c_acctbal
               ->  Gather Merge  (cost=360749.03..483059.22 rows=1015656
width=59) (actual time=9164.573..13459.724 rows=3568877 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Group  (cost=359749.01..364827.29 rows=507828
width=59) (actual time=3119.972..4402.277 rows=1189626 loops=3)
                           Group Key: orders.o_comment,
customer.c_nationkey, customer.c_acctbal
                           ->  Sort  (cost=359749.01..361018.58 rows=507828
width=59) (actual time=3119.968..3676.916 rows=1218159 loops=3)
                                 Sort Key: orders.o_comment,
customer.c_nationkey, customer.c_acctbal
                                 Sort Method: external merge  Disk:
257528kB
                                 Worker 0:  Sort Method: quicksort  Memory:
25kB
                                 Worker 1:  Sort Method: quicksort  Memory:
25kB
                                 ->  Nested Loop  (cost=0.00..292526.20
rows=507828 width=59) (actual time=65.037..1141.994 rows=1218159 loops=3)
                                       Join Filter: (orders.o_custkey >
customer.c_custkey)
                                       Rows Removed by Join Filter: 8
                                       ->  Parallel Seq Scan on customer
(cost=0.00..22602.50 rows=1 width=14) (actual time=65.029..65.037 rows=0
loops=3)
                                             Filter: ((c_nationkey <= 4) AND
((c_name)::text = 'Customer#000000005'::text))
                                             Rows Removed by Filter:
250000
                                       ->  Seq Scan on orders
(cost=0.00..224219.21 rows=3656359 width=53) (actual time=0.018..2108.302
rows=3654501 loops=1)
                                             Filter: (o_orderstatus =
'F'::bpchar)
                                             Rows Removed by Filter:
3845499
 Planning Time: 0.750 ms
 Execution Time: 17146.464 ms
(26 rows)






Second query:
                                                                      QUERY
PLAN                                                                      

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=294796.05..294796.30 rows=1 width=6) (actual
time=4486.879..4486.879 rows=0 loops=1)
   ->  Group  (cost=294795.81..294796.05 rows=1 width=6) (actual
time=4486.876..4486.878 rows=1 loops=1)
         Group Key: customer.c_acctbal
         ->  Gather Merge  (cost=294795.81..294796.05 rows=2 width=6)
(actual time=4486.873..4487.007 rows=1 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Sort  (cost=293795.79..293795.80 rows=1 width=6) (actual
time=1559.177..1559.177 rows=0 loops=3)
                     Sort Key: customer.c_acctbal
                     Sort Method: quicksort  Memory: 25kB
                     Worker 0:  Sort Method: quicksort  Memory: 25kB
                     Worker 1:  Sort Method: quicksort  Memory: 25kB
                     ->  Partial HashAggregate  (cost=293795.77..293795.78
rows=1 width=6) (actual time=1559.140..1559.140 rows=0 loops=3)
                           Group Key: customer.c_acctbal
                           ->  Nested Loop  (cost=0.00..292526.20
rows=507828 width=6) (actual time=63.752..1014.437 rows=1218159 loops=3)
                                 Join Filter: (orders.o_custkey >
customer.c_custkey)
                                 Rows Removed by Join Filter: 8
                                 ->  Parallel Seq Scan on customer
(cost=0.00..22602.50 rows=1 width=10) (actual time=63.734..63.738 rows=0
loops=3)
                                       Filter: ((c_nationkey <= 4) AND
((c_name)::text = 'Customer#000000005'::text))
                                       Rows Removed by Filter: 250000
                                 ->  Seq Scan on orders
(cost=0.00..224219.21 rows=3656359 width=4) (actual time=0.045..2108.431
rows=3654501 loops=1)
                                       Filter: (o_orderstatus =
'F'::bpchar)
                                       Rows Removed by Filter: 3845499
 Planning Time: 0.418 ms
 Execution Time: 4487.097 ms
(24 rows)




Expected Behavior
Since these two queries are semantically equivalent, we were hoping that
PostgreSQL will evaluate them in roughly the same amount of time. It looks
to me that the query optimizer does not realize the inner GROUP BY clause is
unnecessary given the outer GROUP BY clause.

Test Environment
Ubuntu 20.04 machine "Linux panda 5.4.0-40-generic #44-Ubuntu SMP Tue Jun 23
00:01:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux"
PostgreSQL v12.3
Database: TPC-H benchmark (with scale factor 5)



Here are the steps for reproducing our observations:
       
1. Download the dataset from the link:
https://drive.google.com/file/d/13rFa1BNDi4e2RmXBn-yEQkcqt6lsBu1c/view?usp=sharing
2. Set up TPC-H benchmark
tar xzvf tpch5_postgresql.tar.gz
cd tpch5_postgresql
db=tpch5
createdb $db
psql -d $db < dss.ddl
for i in `ls *.tbl`
do
    echo $i
    name=`echo $i|cut -d'.' -f1`
    psql -d $db -c "COPY $name FROM '`pwd`/$i' DELIMITER '|' ENCODING
'LATIN1';"
done
psql -d $db < dss_postgres.ri
3. Execute the queries

Reply | Threaded
Open this post in threaded view
|

Re: BUG #16888: Reoccurring group by clauses makes system slower

Tom Lane-2
PG Bug reporting form <[hidden email]> writes:
> We have 2 TPC-H queries which fetch the same tuples but have significant
> query execution time differences (3.8 times).
> ...
> Since these two queries are semantically equivalent, we were hoping that
> PostgreSQL will evaluate them in roughly the same amount of time. It looks
> to me that the query optimizer does not realize the inner GROUP BY clause is
> unnecessary given the outer GROUP BY clause.

TBH, you are wasting your time reporting these sorts of things as bugs.
They are not bugs, unless equivalent queries give different answers.
At most, they represent optimizations nobody has bothered to put in.

In cases where the problem is really a visibly-badly-written query
(e.g., redundant GROUP BY clauses, as here), it's unclear that we'd
accept such an optimization if it were proposed, because adding checks
for such cases would necessarily penalize well-written queries to benefit
badly-written ones.  The overhead would have to be extremely small, or
else the change would need to have some other benefit that accrues to
well-written queries as well as badly-written ones.

                        regards, tom lane