Postgresql Sort cost Poor performance?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Postgresql Sort cost Poor performance?

tank.zhang
1、postgresql version

qis3_dp2=> select * from version();
                                                 version                                                
---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)

qis3_dp2=>

2、postgresql work_mem


qis3_dp2=> SHOW work_mem;
 work_mem
----------
 2GB
(1 row)

qis3_dp2=> SHOW  shared_buffers;
 shared_buffers
----------------
 4028MB
(1 row)

qis3_dp2=>

3、Table count

qis3_dp2=> select count(*) from  QIS_CARPASSEDSTATION;
  count  
----------
 11453079
(1 row)

qis3_dp2=>

4、table desc

qis3_dp2=> \dS QIS_CARPASSEDSTATION;
                   Table "qis_schema.qis_carpassedstation"
    Column    |            Type             | Collation | Nullable | Default
--------------+-----------------------------+-----------+----------+---------
 iid          | integer                     |           | not null |
 scartypecd   | character varying(50)       |           |          |
 svin         | character varying(20)       |           |          |
 sstationcd   | character varying(50)       |           |          |
 dpassedtime  | timestamp(6) with time zone |           |          |
 dworkdate    | date                        |           |          |
 iworkyear    | integer                     |           |          |
 iworkmonth   | integer                     |           |          |
 iweek        | integer                     |           |          |
 sinputteamcd | character varying(20)       |           |          |
 sinputdutycd | character varying(20)       |           |          |
 smtoc        | character varying(50)       |           |          |
 slineno      | character varying(18)       |           |          |
Indexes:
    "qis_carpassedstation_pkey" PRIMARY KEY, btree (iid)
    "q_carp_dworkdate" btree (dworkdate)
    "q_carp_smtoc" btree (smtoc)

qis3_dp2=>

5、Execute SQL:
qis3_dp2=> EXPLAIN (analyze true,buffers true)   SELECT COUNT(DISTINCT SVIN)
AS CHECKCARNUM ,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD
= 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD') group by  SMTOC
;
                                                                                                                         
QUERY PLAN                                                                              
                                             
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------
 GroupAggregate  (cost=697738.61..714224.02 rows=372 width=30) (actual
time=5908.786..32420.412 rows=410 loops=1)
   Group Key: smtoc
   Buffers: shared hit=401 read=184983
   I/O Timings: read=1377.762
   ->  Sort  (cost=697738.61..703232.51 rows=2197559 width=40) (actual
time=5907.791..6139.351 rows=2142215 loops=1)
         Sort Key: smtoc
         Sort Method: quicksort  Memory: 265665kB
         Buffers: shared hit=401 read=184983
         I/O Timings: read=1377.762
         ->  Gather  (cost=1000.00..466253.56 rows=2197559 width=40) (actual
time=0.641..1934.614 rows=2142215 loops=1)
               Workers Planned: 5
               Workers Launched: 5
               Buffers: shared hit=401 read=184983
               I/O Timings: read=1377.762
               ->  Parallel Seq Scan on qis_carpassedstation a
(cost=0.00..245497.66 rows=439512 width=40) (actual time=0.245..1940.527
rows=357036 loops=6)
                     Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND
((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >=
to_date('2017-02-11'::text, 'YYYY-MM-DD'::text)) AND (dworkdate <= to_da
te('2019-03-11'::text, 'YYYY-MM-DD'::text)))
                     Rows Removed by Filter: 1551811
                     Buffers: shared hit=401 read=184983
                     I/O Timings: read=1377.762
 Planning Time: 0.393 ms
 Execution Time: 32439.704 ms
(21 rows)

qis3_dp2=>


6、Why does sort take a long time to execute and how can you optimize it?
Thanks!!!






--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html


Reply | Threaded
Open this post in threaded view
|

Re: Postgresql Sort cost Poor performance?

Andrew Gierth
>>>>> "tank" == tank zhang <[hidden email]> writes:

 tank>  smtoc        | character varying(50)       |           |          |
 tank>          Sort Key: smtoc

What is the output of  SHOW lc_collate;

One of the most common reasons for slow sorting is that you're sorting a
text/varchar field in a locale other than C. The slowdown for using
other locales varies according to the data, the locale, and the
operating system, but 8-20x slowdowns are very common, 50-100x slowdowns
are not unusual, and there have been reports of even worse cases with
unusual script combinations.

--
Andrew (irc:RhodiumToad)


Reply | Threaded
Open this post in threaded view
|

Re: Postgresql Sort cost Poor performance?

tank.zhang
Thank you for your reply.

qis3_dp2=>  SHOW lc_collate;
 lc_collate  
-------------
 en_US.UTF-8
(1 row)

Time: 0.311 ms
qis3_dp2=>

qis3_dp2=> SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM ,SMTOC FROM
QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND
A.SLINYYY-MM-DD') AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD')
group by  SMTOC
;
 checkcarnum |         smtoc        
-------------+-----------------------
          90 | HT6LHD700 NH731P    A
         690 | HT6LHD700 NH788P    A
          90 | HT6LHD700 R550P     A
          30 | HT6LHD700 YR615M    A
        1141 | HT6MHB700 NH731P    A



Is there any possibility of optimization?





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html


Reply | Threaded
Open this post in threaded view
|

Re: Postgresql Sort cost Poor performance?

Gavin Flower-2
In reply to this post by Andrew Gierth
On 01/04/2019 23:20, Andrew Gierth wrote:

>>>>>> "tank" == tank zhang <[hidden email]> writes:
>   tank>  smtoc        | character varying(50)       |           |          |
>   tank>          Sort Key: smtoc
>
> What is the output of  SHOW lc_collate;
>
> One of the most common reasons for slow sorting is that you're sorting a
> text/varchar field in a locale other than C. The slowdown for using
> other locales varies according to the data, the locale, and the
> operating system, but 8-20x slowdowns are very common, 50-100x slowdowns
> are not unusual, and there have been reports of even worse cases with
> unusual script combinations.
>
Just wondering...

Would it be possible to optionally enable the system to create a hidden
system column for the text field to be sorted, the new column would be
the original column preprocessed to sort correctly & efficiently.  This
would seem to lead to a massive improvement in performance.

Depending relative tradeoffs disk storage vs processing:
(A) create hidden system column for each sort invocation
(B) at table creation
(C) other possibilities

(A) could be done automatically, and possibly controlled via a GUC parameter
(B) might require a change to the CREATE TABLE syntax

Anyhow, just some thoughts...


Cheers,
Gavin



Reply | Threaded
Open this post in threaded view
|

Re: Postgresql Sort cost Poor performance?

Christoph Berg-2
Re: Gavin Flower 2019-04-01 <[hidden email]>
> Would it be possible to optionally enable the system to create a hidden
> system column for the text field to be sorted, the new column would be the
> original column preprocessed to sort correctly & efficiently.

That's the idea behind the strxfrm(3) optimization that ultimately got
disabled again for non-C locales because glibc fails to implement it
correctly.

Christoph


Reply | Threaded
Open this post in threaded view
|

Re: Postgresql Sort cost Poor performance?

Ramón Bastidas
In reply to this post by tank.zhang
Hi, 

If your problem is the sort, try creating an index on the Field that you consider thst could be needed (you can  star with smtoc that is the one you are grouping and sorting) 

Another thing that i noticed is your work_mem, I thing is too high for a global config (if you think 2gb can hel for this operation you can set it before execute the query but only for that session), but generally this value most be smaller depending on the commons query every sub query uses that amount of mem (i.e if you have a query that have 3 subqueries and each one with a sort operation and a grouping operation, you can be using 12 gb of mem in that only big query, and it doesn't mean it will be faster).. try to monitor the uses of ram by pgsql maybe you can be suffering paging problems because os the size of you work_mem and that make the dbms slow too



On Mon, Apr 1, 2019, 6:45 AM tank.zhang <[hidden email]> wrote:
1、postgresql version

qis3_dp2=> select * from version();
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-28), 64-bit
(1 row)

qis3_dp2=>

2、postgresql work_mem


qis3_dp2=> SHOW work_mem;
 work_mem
----------
 2GB
(1 row)

qis3_dp2=> SHOW  shared_buffers;
 shared_buffers
----------------
 4028MB
(1 row)

qis3_dp2=>

3、Table count

qis3_dp2=> select count(*) from  QIS_CARPASSEDSTATION;
  count   
----------
 11453079
(1 row)

qis3_dp2=>

4、table desc

qis3_dp2=> \dS QIS_CARPASSEDSTATION;
                   Table "qis_schema.qis_carpassedstation"
    Column    |            Type             | Collation | Nullable | Default
--------------+-----------------------------+-----------+----------+---------
 iid          | integer                     |           | not null |
 scartypecd   | character varying(50)       |           |          |
 svin         | character varying(20)       |           |          |
 sstationcd   | character varying(50)       |           |          |
 dpassedtime  | timestamp(6) with time zone |           |          |
 dworkdate    | date                        |           |          |
 iworkyear    | integer                     |           |          |
 iworkmonth   | integer                     |           |          |
 iweek        | integer                     |           |          |
 sinputteamcd | character varying(20)       |           |          |
 sinputdutycd | character varying(20)       |           |          |
 smtoc        | character varying(50)       |           |          |
 slineno      | character varying(18)       |           |          |
Indexes:
    "qis_carpassedstation_pkey" PRIMARY KEY, btree (iid)
    "q_carp_dworkdate" btree (dworkdate)
    "q_carp_smtoc" btree (smtoc)

qis3_dp2=>

5、Execute SQL:
qis3_dp2=> EXPLAIN (analyze true,buffers true)   SELECT COUNT(DISTINCT SVIN)
AS CHECKCARNUM ,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD
= 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD') group by  SMTOC
;

QUERY PLAN                                                                             

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------
 GroupAggregate  (cost=697738.61..714224.02 rows=372 width=30) (actual
time=5908.786..32420.412 rows=410 loops=1)
   Group Key: smtoc
   Buffers: shared hit=401 read=184983
   I/O Timings: read=1377.762
   ->  Sort  (cost=697738.61..703232.51 rows=2197559 width=40) (actual
time=5907.791..6139.351 rows=2142215 loops=1)
         Sort Key: smtoc
         Sort Method: quicksort  Memory: 265665kB
         Buffers: shared hit=401 read=184983
         I/O Timings: read=1377.762
         ->  Gather  (cost=1000.00..466253.56 rows=2197559 width=40) (actual
time=0.641..1934.614 rows=2142215 loops=1)
               Workers Planned: 5
               Workers Launched: 5
               Buffers: shared hit=401 read=184983
               I/O Timings: read=1377.762
               ->  Parallel Seq Scan on qis_carpassedstation a
(cost=0.00..245497.66 rows=439512 width=40) (actual time=0.245..1940.527
rows=357036 loops=6)
                     Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND
((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >=
to_date('2017-02-11'::text, 'YYYY-MM-DD'::text)) AND (dworkdate <= to_da
te('2019-03-11'::text, 'YYYY-MM-DD'::text)))
                     Rows Removed by Filter: 1551811
                     Buffers: shared hit=401 read=184983
                     I/O Timings: read=1377.762
 Planning Time: 0.393 ms
 Execution Time: 32439.704 ms
(21 rows)

qis3_dp2=>


6、Why does sort take a long time to execute and how can you optimize it?
Thanks!!!






--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html


Reply | Threaded
Open this post in threaded view
|

Re: Postgresql Sort cost Poor performance?

tank.zhang
1、DISTINCT response time is fast without being added
 
qis3_dp2=# SELECT COUNT(*) AS CHECKCARNUM FROM QIS_CARPASSEDSTATION A WHERE
1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND
A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD');
 checkcarnum
-------------
     2142215
(1 row)

*Time: 2237.970 ms (00:02.238)*
qis3_dp2=#

2、 Adding a DISTINCT response time was very slow

qis3_dp2=# SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM FROM
QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO
IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD');
 checkcarnum
-------------
     1071367
(1 row)

*Time: 38979.246 ms (00:38.979)*





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html


Reply | Threaded
Open this post in threaded view
|

Re: Postgresql Sort cost Poor performance?

David Rowley-3
On Tue, 2 Apr 2019 at 20:00, tank.zhang <[hidden email]> wrote:

> 2、 Adding a DISTINCT response time was very slow
>
> qis3_dp2=# SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM FROM
> QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO
> IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
> TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
> TO_DATE('2019-03-11','YYYY-MM-DD');
>  checkcarnum
> -------------
>      1071367
> (1 row)

That's because of how DISTINCT is implemented within an aggregate
function in PostgreSQL. Internally within the aggregate code in the
executor, a sort is performed on the entire input to the aggregate
node.  The planner is currently unable to make use of any indexes that
provide pre-sorted input.

One way to work around this would be to perform the DISTINCT and
COUNT(*) in separate stages using a subquery.

From your original query, something like:

SELECT COUNT(SVIN) AS CHECKCARNUM,SMTOC
FROM (
SELECT SMTOC,SVIN
FROM QIS_CARPASSEDSTATION A
WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG'
AND A.SLINENO IN ( '1F' , '2F' , '3F' )
AND A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD')
AND A.DWORKDATE <= TO_DATE('2019-03-11','YYYY-MM-DD')
GROUP BY  SMTOC,SVIN
) A GROUP BY SMTOC;

An index something like:
CREATE INDEX ON QIS_CARPASSEDSTATION (SMTOC, SVIN, SSTATIONCD, DWORKDATE);

Should help speed up the subquery and provide pre-sorted input to the
outer aggregate.  If you like, you could add SLINENO to the end of the
index to allow an index-only scan which may result in further
performance improvements.

Without the index, you're forced to sort, but at least it's just one
sort instead of two.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Postgresql Sort cost Poor performance?

tank.zhang
Thank you  replay!

I tried to use the TMP table is very fast .    thank you


qis3_dp2=# explain analyze SELECT COUNT(*),SMTOC FROM ( SELECT
DISTINCT(SVIN) AS CHECKCARNUM,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1
AND A.SSTATIONCD = 'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND
A.DWORKDATE >= TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD')) AS TEMP  group by SMTOC;
                                                                                                                         
QUERY PLAN                                                                              
                                             
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------
 HashAggregate  (cost=691386.41..691388.41 rows=200 width=30) (actual
time=4090.951..4091.027 rows=410 loops=1)
   Group Key: a.smtoc
   ->  HashAggregate  (cost=666561.44..676491.43 rows=992999 width=40)
(actual time=3481.712..3794.213 rows=1071367 loops=1)
         Group Key: a.svin, a.smtoc
         ->  Gather  (cost=1000.00..656098.93 rows=2092501 width=40) (actual
time=0.657..1722.814 rows=2142215 loops=1)
               Workers Planned: 4
               Workers Launched: 4
               ->  Parallel Seq Scan on qis_carpassedstation a
(cost=0.00..445848.83 rows=523125 width=40) (actual time=65.187..2287.739
rows=428443 loops=5)
                     Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND
((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >=
to_date('2017-02-11'::text, 'YYYY-MM-DD'::text)) AND (dworkdate <= to_da
te('2019-03-11'::text, 'YYYY-MM-DD'::text)))
                     Rows Removed by Filter: 1862173
 Planning Time: 0.513 ms
 Execution Time: 4147.542 ms
(12 rows)

Time: 4148.852 ms (00:04.149)
qis3_dp2=#


qis3_dp2=# SELECT COUNT(*),SMTOC FROM ( SELECT DISTINCT(SVIN) AS
CHECKCARNUM,SMTOC FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD =
'VQ3_LYG' AND A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD')) AS TEMP  group by SMTOC;

**Time: 3223.935 ms (00:03.224)**


2、 Before

qis3_dp2=# explain analyze SELECT COUNT(DISTINCT SVIN) AS CHECKCARNUM ,SMTOC
FROM QIS_CARPASSEDSTATION A WHERE 1=1 AND A.SSTATIONCD = 'VQ3_LYG' AND
A.SLINENO IN ( '1F' , '2F' , '3F' ) AND A.DWORKDATE >=
TO_DATE('2017-02-11','YYYY-MM-DD') AND A.DWORKDATE <=
TO_DATE('2019-03-11','YYYY-MM-DD') group by  SMTOC;
                                                                                                                         
QUERY PLAN                                                                              
                                             
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------
 GroupAggregate  (cost=875778.02..891475.55 rows=377 width=30) (actual
time=6400.991..33314.132 rows=410 loops=1)
   Group Key: smtoc
   ->  Sort  (cost=875778.02..881009.28 rows=2092501 width=40) (actual
time=6399.993..6626.151 rows=2142215 loops=1)
         Sort Key: smtoc
         Sort Method: quicksort  Memory: 265665kB
         ->  Gather  (cost=1000.00..656098.93 rows=2092501 width=40) (actual
time=0.557..2467.778 rows=2142215 loops=1)
               Workers Planned: 4
               Workers Launched: 4
               ->  Parallel Seq Scan on qis_carpassedstation a
(cost=0.00..445848.83 rows=523125 width=40) (actual time=66.908..2428.397
rows=428443 loops=5)
                     Filter: (((sstationcd)::text = 'VQ3_LYG'::text) AND
((slineno)::text = ANY ('{1F,2F,3F}'::text[])) AND (dworkdate >=
to_date('2017-02-11'::text, 'YYYY-MM-DD'::text)) AND (dworkdate <= to_da
te('2019-03-11'::text, 'YYYY-MM-DD'::text)))
                     Rows Removed by Filter: 1862173
 Planning Time: 0.457 ms
 Execution Time: 33335.429 ms
(13 rows)
*
Time: 33336.720 ms (00:33.337)*
qis3_dp2=#











--
Sent from: http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html