Aggregate and many LEFT JOIN

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

Aggregate and many LEFT JOIN

kimaidou
Hi all,

I need to optimize the following query
I use it to create  a materialized view, but I think there is room for optimization.
I tried to
SET join_collapse_limit TO 15;
with to real difference.

Explain shows that the GROUP AGGREGATE and needed sort kill the performance.
Do you have any hint how to optimize this ?

Regards
Michaël
Reply | Threaded
Open this post in threaded view
|

Re: Aggregate and many LEFT JOIN

Justin Pryzby
On Fri, Feb 22, 2019 at 04:14:05PM +0100, kimaidou wrote:
> Explain shows that the GROUP AGGREGATE and needed sort kill the performance.
> Do you have any hint how to optimize this ?
> https://explain.depesz.com/s/6nf

This is writing 2GB tempfile, perhaps the query would benefit from larger
work_mem:
|Sort (cost=3,014,498.66..3,016,923.15 rows=969,796 width=1,818) (actual time=21,745.193..22,446.561 rows=1,212,419 loops=1)
|   Sort Method: external sort Disk: 1782200kB
|   Buffers: shared hit=5882951, temp read=230958 written=230958

This is apparently joining without indices:
|Nested Loop Left Join (cost=1.76..360,977.37 rows=321,583 width=1,404) (actual time=0.080..1,953.007 rows=321,849 loops=1)
|   Join Filter: (tgc1.groupe_nom = t.group1_inpn)
|   Rows Removed by Join Filter: 965547
|   Buffers: shared hit=1486327

This perhaps should have an index on tgc2.groupe_type ?
|Index Scan using t_group_categorie_pkey on taxon.t_group_categorie tgc2 (cost=0.14..0.42 rows=1 width=28) (actual time=0.002..0.002 rows=1 loops=321,849)
|   Index Cond: (tgc2.groupe_nom = t.group2_inpn)
|   Filter: (tgc2.groupe_type = 'group2_inpn'::text)
|   Buffers: shared hit=643687

This would perhaps benefit from an index on tv.cd_ref ?
|Index Scan using taxref_consolide_non_filtre_cd_nom_idx on taxon.taxref_consolide_non_filtre tv (cost=0.42..0.63 rows=1 width=94) (actual time=0.002..0.002 rows=1 loops=690,785)
|   Index Cond: (tv.cd_nom = t.cd_ref)
|   Filter: (tv.cd_nom = tv.cd_ref)
|   Buffers: shared hit=2764875

I don't think it's causing a significant fraction of the issue, but for some
reason this is overestimating rowcount by 2000.  Do you need to VACUUM ANALYZE
the table ?
|Seq Scan on occtax.personne p_1 (cost=0.00..78.04 ROWS=2,204 width=56) (actual time=0.011..0.011 ROWS=1 loops=1)

Justin

Reply | Threaded
Open this post in threaded view
|

Re: Aggregate and many LEFT JOIN

Michael Lewis
Curious- Is geqo_threshold still set to 12? Is increasing join_collapse_limit to be higher than geqo_threshold going to have a noticeable impact?

The disk sorts are the killer as Justin says. I wonder how it performs with that increased significantly. Is the storage SSD or traditional hard disks?

Michael Lewis

On Fri, Feb 22, 2019 at 8:54 AM Justin Pryzby <[hidden email]> wrote:
On Fri, Feb 22, 2019 at 04:14:05PM +0100, kimaidou wrote:
> Explain shows that the GROUP AGGREGATE and needed sort kill the performance.
> Do you have any hint how to optimize this ?
> https://explain.depesz.com/s/6nf

This is writing 2GB tempfile, perhaps the query would benefit from larger
work_mem:
|Sort (cost=3,014,498.66..3,016,923.15 rows=969,796 width=1,818) (actual time=21,745.193..22,446.561 rows=1,212,419 loops=1)
|   Sort Method: external sort Disk: 1782200kB
|   Buffers: shared hit=5882951, temp read=230958 written=230958

This is apparently joining without indices:
|Nested Loop Left Join (cost=1.76..360,977.37 rows=321,583 width=1,404) (actual time=0.080..1,953.007 rows=321,849 loops=1)
|   Join Filter: (tgc1.groupe_nom = t.group1_inpn)
|   Rows Removed by Join Filter: 965547
|   Buffers: shared hit=1486327

This perhaps should have an index on tgc2.groupe_type ?
|Index Scan using t_group_categorie_pkey on taxon.t_group_categorie tgc2 (cost=0.14..0.42 rows=1 width=28) (actual time=0.002..0.002 rows=1 loops=321,849)
|   Index Cond: (tgc2.groupe_nom = t.group2_inpn)
|   Filter: (tgc2.groupe_type = 'group2_inpn'::text)
|   Buffers: shared hit=643687

This would perhaps benefit from an index on tv.cd_ref ?
|Index Scan using taxref_consolide_non_filtre_cd_nom_idx on taxon.taxref_consolide_non_filtre tv (cost=0.42..0.63 rows=1 width=94) (actual time=0.002..0.002 rows=1 loops=690,785)
|   Index Cond: (tv.cd_nom = t.cd_ref)
|   Filter: (tv.cd_nom = tv.cd_ref)
|   Buffers: shared hit=2764875

I don't think it's causing a significant fraction of the issue, but for some
reason this is overestimating rowcount by 2000.  Do you need to VACUUM ANALYZE
the table ?
|Seq Scan on occtax.personne p_1 (cost=0.00..78.04 ROWS=2,204 width=56) (actual time=0.011..0.011 ROWS=1 loops=1)

Justin

Reply | Threaded
Open this post in threaded view
|

Re: Aggregate and many LEFT JOIN

kimaidou
Thanks for your answers.

I have tried via

--show work_mem; "10485kB"  -> initial work_mem for my first post
-- set session work_mem='100000kB';
-- set session geqo_threshold = 12;
-- set session join_collapse_limit = 15;

I have a small machine, with SSD disk and 8GB RAM. I cannot really increase work_mem up to 2GB (or more). There are only 300 000 data in occtax.observation, which will increase (and possibly go up to 3 millions...)
I am running PostgreSQL 9.6. I should probably test it against PostgreSQL 11 as many improvements has been made.

I even tried to remove all non aggregated columns and keep only o.cle_obs (the primary key) to have a
GROUP BY o.cle_obs
AND the query plan does not show a HASH AGGREGATE, but only a GROUP AGGREGATE.

Obviously I have already tried to VACUUM ANALYSE

My current PostgreSQL settings
max_connections = 100
shared_buffers = 2GB
effective_cache_size = 6GB
work_mem = 10485kB
maintenance_work_mem = 512MB
min_wal_size = 1GB
max_wal_size = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100

Reply | Threaded
Open this post in threaded view
|

Re: Aggregate and many LEFT JOIN

Michael Lewis
Does the plan change significantly with this-

set session work_mem='250MB';
set session geqo_threshold = 20;
set session join_collapse_limit = 20;

With that expensive sort spilling to disk and then aggregating after that, it would seem like the work_mem being significantly increased is going to make the critical difference. Unless it could fetch the data sorted via an index, but that doesn't seem likely.

I would suggest increase default_statistics_target, but you have good estimates already for the most part. Hopefully someone else will chime in with more.

Michael Lewis
Reply | Threaded
Open this post in threaded view
|

Re: Aggregate and many LEFT JOIN

Tom Lane-2
Michael Lewis <[hidden email]> writes:
> Does the plan change significantly with this-
> set session work_mem='250MB';
> set session geqo_threshold = 20;
> set session join_collapse_limit = 20;

Yeah ... by my count there are 16 tables in this query, so raising
join_collapse_limit to 15 is not enough to ensure that the planner
considers all join orders.  Whether use of GEQO is a big problem
is harder to say, but it might be.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Aggregate and many LEFT JOIN

kimaidou
Thanks for your answers. I tried with
> set session work_mem='250MB';
> set session geqo_threshold = 20;
> set session join_collapse_limit = 20;

It seems to have no real impact :

Indeed an index cannot really be used for sorting here, based on the complexity of the returned fields.
Wich strikes me is that if I try to simplify it a lot, removing all data but the main table (occtax.observation) primary key cd_nom and aggregate, the query plan should be able tu use the cd_nom index for sorting and provide better query plan (hash aggregate), but it does not seems so :

* EXPLAIN : https://explain.depesz.com/s/FR3h -> a group aggregate is used, which : GroupAggregate     1     10,639.313 ms     72.6 %

It is better, but I think 10s for such a query seems bad perf for me.

Regards
Michaël

Le ven. 22 févr. 2019 à 19:06, Tom Lane <[hidden email]> a écrit :
Michael Lewis <[hidden email]> writes:
> Does the plan change significantly with this-
> set session work_mem='250MB';
> set session geqo_threshold = 20;
> set session join_collapse_limit = 20;

Yeah ... by my count there are 16 tables in this query, so raising
join_collapse_limit to 15 is not enough to ensure that the planner
considers all join orders.  Whether use of GEQO is a big problem
is harder to say, but it might be.

                        regards, tom lane
Reply | Threaded
Open this post in threaded view
|

Re: Aggregate and many LEFT JOIN

kimaidou
I have better results with this version. Basically, I run a first query only made for aggregation, and then do a JOIN to get other needed data.


Not really "fast", but I gained 30%

Le lun. 25 févr. 2019 à 09:54, kimaidou <[hidden email]> a écrit :
Thanks for your answers. I tried with
> set session work_mem='250MB';
> set session geqo_threshold = 20;
> set session join_collapse_limit = 20;

It seems to have no real impact :

Indeed an index cannot really be used for sorting here, based on the complexity of the returned fields.
Wich strikes me is that if I try to simplify it a lot, removing all data but the main table (occtax.observation) primary key cd_nom and aggregate, the query plan should be able tu use the cd_nom index for sorting and provide better query plan (hash aggregate), but it does not seems so :

* EXPLAIN : https://explain.depesz.com/s/FR3h -> a group aggregate is used, which : GroupAggregate     1     10,639.313 ms     72.6 %

It is better, but I think 10s for such a query seems bad perf for me.

Regards
Michaël

Le ven. 22 févr. 2019 à 19:06, Tom Lane <[hidden email]> a écrit :
Michael Lewis <[hidden email]> writes:
> Does the plan change significantly with this-
> set session work_mem='250MB';
> set session geqo_threshold = 20;
> set session join_collapse_limit = 20;

Yeah ... by my count there are 16 tables in this query, so raising
join_collapse_limit to 15 is not enough to ensure that the planner
considers all join orders.  Whether use of GEQO is a big problem
is harder to say, but it might be.

                        regards, tom lane
Reply | Threaded
Open this post in threaded view
|

Re: Aggregate and many LEFT JOIN

Michael Lewis


On Mon, Feb 25, 2019 at 2:44 AM kimaidou <[hidden email]> wrote:
I have better results with this version. Basically, I run a first query only made for aggregation, and then do a JOIN to get other needed data.


Not really "fast", but I gained 30%
 

It still seems that disk sort and everything after that is where the query plan dies. It seems odd that it went to disk if work_mem was already 250MB. Can you allocate more as a test? As an alternative, if this is a frequently needed data, can you aggregate this data and keep a summarized copy updated periodically?
Reply | Threaded
Open this post in threaded view
|

Re: Aggregate and many LEFT JOIN

kimaidou
I manage to avoid the disk sort after performing a VACUUM ANALYSE;
And with a session work_mem = '250MB'


It stills spent 16s
It seems this kind of query will need better hardware to scale...

Thanks for your help


Le lun. 25 févr. 2019 à 19:30, Michael Lewis <[hidden email]> a écrit :


On Mon, Feb 25, 2019 at 2:44 AM kimaidou <[hidden email]> wrote:
I have better results with this version. Basically, I run a first query only made for aggregation, and then do a JOIN to get other needed data.


Not really "fast", but I gained 30%
 

It still seems that disk sort and everything after that is where the query plan dies. It seems odd that it went to disk if work_mem was already 250MB. Can you allocate more as a test? As an alternative, if this is a frequently needed data, can you aggregate this data and keep a summarized copy updated periodically?
Reply | Threaded
Open this post in threaded view
|

Re: Aggregate and many LEFT JOIN

Justin Pryzby
On Tue, Feb 26, 2019 at 01:54:00PM +0100, kimaidou wrote:
> I manage to avoid the disk sort after performing a VACUUM ANALYSE;
> And with a session work_mem = '250MB'
>
> * SQL http://paste.debian.net/1070207/
> * EXPLAIN https://explain.depesz.com/s/nJ2y
>
> It stills spent 16s
> It seems this kind of query will need better hardware to scale...

Once you've exhausted other ideas, you could consider making that a TEMPORARY
TABLE, and creating an index on it (and analyzing it) and then aggregating.
It'd be several separate queries.

Justin

Reply | Threaded
Open this post in threaded view
|

Re: Aggregate and many LEFT JOIN

Jeff Janes
In reply to this post by kimaidou
On Mon, Feb 25, 2019 at 3:54 AM kimaidou <[hidden email]> wrote:
 
Wich strikes me is that if I try to simplify it a lot, removing all data but the main table (occtax.observation) primary key cd_nom and aggregate, the query plan should be able tu use the cd_nom index for sorting and provide better query plan (hash aggregate), but it does not seems so :

HashAggregate doesn't support aggregates with DISTINCT.  I don't think there is any reason it can't, it is just that no one has gotten around to it.

Aggregates with DISTINCT also kill your ability to get parallel queries.

Cheers,

Jeff