Slow statement using parallelism after 9.6>11 upgrade

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Slow statement using parallelism after 9.6>11 upgrade

Arnaud L.
Hi list

We have upgraded our database from 9.6 to 11 (and updated PostGIS from
2.3 to 2.5 as well).
We are using it among other things to store an OpenStreetMap database,
populated by the osm2pgsql program and updated on a daily basis.

The process used to take ~1h30 minutes before the upgrade, but after the
upgrade it looks like it will never run to completion in a 24h time frame.

The problem is apparently that the planner choses to run some statements
in parallel, but these take a lot longer than when run without it.
Here are the to explain analyze of the same statement, with parallelism
on and off :

Statement :
PREPARE mark_ways_by_node(int8) AS select id from planet_osm_ways WHERE
nodes && ARRAY[$1];
EXPLAIN EXECUTE mark_ways_by_node(1);


max_parallel_workers_per_gather = 2 :

Gather  (cost=12545.61..2357352.25 rows=1420982 width=8) (actual
time=2.577..64.028 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Bitmap Heap Scan on planet_osm_ways
(cost=11545.61..2214254.05 rows=592076 width=8) (actual
time=0.213..0.213 rows=0 loops=3)
         Recheck Cond: (nodes && '{1}'::bigint[])
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on planet_osm_ways_nodes_idx
(cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.173..0.174
rows=1 loops=1)
               Index Cond: (nodes && '{1}'::bigint[])
Planning Time: 8.596 ms
Execution Time: 64.135 ms


max_parallel_workers_per_gather = 0 :

Bitmap Heap Scan on planet_osm_ways  (cost=11545.61..3462154.46
rows=1420982 width=8) (actual time=0.677..0.679 rows=1 loops=1)
   Recheck Cond: (nodes && '{1}'::bigint[])
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on planet_osm_ways_nodes_idx
(cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.268..0.268
rows=1 loops=1)
         Index Cond: (nodes && '{1}'::bigint[])
Planning Time: 0.193 ms
Execution Time: 0.773 ms


So this Parallel Bitmap Heap Scan seems to be quite problematic here.
What runtime setting should I change to fix this, without losing the
benefit of parallelism ?

Thanks a lot !

Cheers
--
Arnaud


Reply | Threaded
Open this post in threaded view
|

Re: Slow statement using parallelism after 9.6>11 upgrade

Tom Lane-2
"Arnaud L." <[hidden email]> writes:
> We have upgraded our database from 9.6 to 11 (and updated PostGIS from
> 2.3 to 2.5 as well).
> ...

Have you re-ANALYZEd the database?  The problem with this query
seems to be the spectacularly awful rowcount estimate here:

>    ->  Bitmap Index Scan on planet_osm_ways_nodes_idx
> (cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.268..0.268
> rows=1 loops=1)
>          Index Cond: (nodes && '{1}'::bigint[])

The planner should be able to do better than that, given up-to-date
statistics on the "nodes" column.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Slow statement using parallelism after 9.6>11 upgrade

Arnaud L.
Le 03/09/2019 à 15:43, Tom Lane a écrit :
> "Arnaud L." <[hidden email]> writes:
>> We have upgraded our database from 9.6 to 11 (and updated PostGIS from
>> 2.3 to 2.5 as well).
>> ...
>
> Have you re-ANALYZEd the database?  The problem with this query
> seems to be the spectacularly awful rowcount estimate here:

You mean after the upgrade process ? Yes I have.
I've juste re-run "ANALYZE table" to rule this out, estimates are the same.

Maybe some statistic target problem ? Estimated number of rows is
284.196.352
Also, this is a GIN index on a bigint[] column.

I've setup parallel_tuple_cost to 1.0 parallel_setup_cost to 5000.0 for
the time being which solves this specific problem. These value don't
look very sensible though, they are very high compared to the default ones.

Cheers
--
Arnaud


Reply | Threaded
Open this post in threaded view
|

Re: Slow statement using parallelism after 9.6>11 upgrade

Paul Ramsey


> On Sep 3, 2019, at 7:06 AM, Arnaud L. <[hidden email]> wrote:
>
> Le 03/09/2019 à 15:43, Tom Lane a écrit :
>> "Arnaud L." <[hidden email]> writes:
>>> We have upgraded our database from 9.6 to 11 (and updated PostGIS from 2.3 to 2.5 as well).
>>> ...
>> Have you re-ANALYZEd the database?  The problem with this query
>> seems to be the spectacularly awful rowcount estimate here:
>
> You mean after the upgrade process ? Yes I have.
> I've juste re-run "ANALYZE table" to rule this out, estimates are the same.
>
> Maybe some statistic target problem ? Estimated number of rows is 284.196.352
> Also, this is a GIN index on a bigint[] column.
>
> I've setup parallel_tuple_cost to 1.0 parallel_setup_cost to 5000.0 for the time being which solves this specific problem. These value don't look very sensible though, they are very high compared to the default ones.

You can also leave that setting unchanged and just change the behaviour on your one table:

ALTER TABLE nodes SET ( parallel_workers = 0);

P.

>
> Cheers
> --
> Arnaud
>
>



Reply | Threaded
Open this post in threaded view
|

Re: Slow statement using parallelism after 9.6>11 upgrade

Imre Samu
In reply to this post by Arnaud L.
> We have upgraded our database from 9.6 to 11 

This is now the latest PG ?   PG11.5?
( for example - in PG11.5 fixed:  "Fix failure to resolve deadlocks involving multiple parallel worker processes"
> populated by the osm2pgsql program and updated on a daily basis.
> What runtime setting should I change to fix this, without losing the  benefit of parallelism ?

- the osm2pgsql has an own parallelizations ...  ( osm2pgsql  --number-processes .. ) 
   so be careful to add more parallelisation to the PG side with the same osm2pgsql parameters !   ( check the memory usages! )
- check the benchmarks and the tunnings:  https://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks
- you can ask help on the  : "osm dev mail list" (="dev OpenStreetMap developer discusssion" https://lists.openstreetmap.org/listinfo/dev )

Imre




Arnaud L. <[hidden email]> ezt írta (időpont: 2019. szept. 3., K, 14:11):
Hi list

We have upgraded our database from 9.6 to 11 (and updated PostGIS from
2.3 to 2.5 as well).
We are using it among other things to store an OpenStreetMap database,
populated by the osm2pgsql program and updated on a daily basis.

The process used to take ~1h30 minutes before the upgrade, but after the
upgrade it looks like it will never run to completion in a 24h time frame.

The problem is apparently that the planner choses to run some statements
in parallel, but these take a lot longer than when run without it.
Here are the to explain analyze of the same statement, with parallelism
on and off :

Statement :
PREPARE mark_ways_by_node(int8) AS select id from planet_osm_ways WHERE
nodes && ARRAY[$1];
EXPLAIN EXECUTE mark_ways_by_node(1);


max_parallel_workers_per_gather = 2 :

Gather  (cost=12545.61..2357352.25 rows=1420982 width=8) (actual
time=2.577..64.028 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Bitmap Heap Scan on planet_osm_ways
(cost=11545.61..2214254.05 rows=592076 width=8) (actual
time=0.213..0.213 rows=0 loops=3)
         Recheck Cond: (nodes && '{1}'::bigint[])
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on planet_osm_ways_nodes_idx
(cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.173..0.174
rows=1 loops=1)
               Index Cond: (nodes && '{1}'::bigint[])
Planning Time: 8.596 ms
Execution Time: 64.135 ms


max_parallel_workers_per_gather = 0 :

Bitmap Heap Scan on planet_osm_ways  (cost=11545.61..3462154.46
rows=1420982 width=8) (actual time=0.677..0.679 rows=1 loops=1)
   Recheck Cond: (nodes && '{1}'::bigint[])
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on planet_osm_ways_nodes_idx
(cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.268..0.268
rows=1 loops=1)
         Index Cond: (nodes && '{1}'::bigint[])
Planning Time: 0.193 ms
Execution Time: 0.773 ms


So this Parallel Bitmap Heap Scan seems to be quite problematic here.
What runtime setting should I change to fix this, without losing the
benefit of parallelism ?

Thanks a lot !

Cheers
--
Arnaud


Reply | Threaded
Open this post in threaded view
|

Re: Slow statement using parallelism after 9.6>11 upgrade

Arnaud L.
In reply to this post by Paul Ramsey
Le 03/09/2019 à 16:39, Paul Ramsey a écrit :
>> On Sep 3, 2019, at 7:06 AM, Arnaud L. <[hidden email]> wrote:
>> I've setup parallel_tuple_cost to 1.0 parallel_setup_cost to 5000.0 for the time being which solves this specific problem. These value don't look very sensible though, they are very high compared to the default ones.
>
> You can also leave that setting unchanged and just change the behaviour on your one table:
>
> ALTER TABLE nodes SET ( parallel_workers = 0);

Nice, this will fix this precise query and leave parallelism in place
for the other ones. Thanks !

--
Arnaud


Reply | Threaded
Open this post in threaded view
|

Re: Slow statement using parallelism after 9.6>11 upgrade

Arnaud L.
In reply to this post by Imre Samu
Le 03/09/2019 à 17:02, Imre Samu a écrit :
>  > We have upgraded our database from 9.6 to 11
>
> This is now the latest PG ?   PG11.5?

Yes, PostgreSQL 11.5, compiled by Visual C++ build 1914, 64-bit.


> - the osm2pgsql has an own parallelizations ...  ( osm2pgsql  
> --number-processes .. )
>     so be careful to add more parallelisation to the PG side with the
> same osm2pgsql parameters !   ( check the memory usages! )
> - check the benchmarks and the tunnings:
> https://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks
> - you can ask help on the  : "osm dev mail list" (="dev OpenStreetMap
> developer discusssion" https://lists.openstreetmap.org/listinfo/dev )


Yes, I know. It as been fine tuned over some time now to fit our
server's capacity.
Disabling parallelisme has allowed the update to run in ~5000s, which is
consistent with what we could witness before the upgrade.

I had started writing an issue on the osm2pgsql github before realizing
it was probably a PG-only issue.


Cheers
--
Arnaud



Reply | Threaded
Open this post in threaded view
|

Re: Slow statement using parallelism after 9.6>11 upgrade

Arnaud L.
In reply to this post by Tom Lane-2
Le 03/09/2019 à 15:43, Tom Lane a écrit :

> "Arnaud L." <[hidden email]> writes:
>> We have upgraded our database from 9.6 to 11 (and updated PostGIS from
>> 2.3 to 2.5 as well).
>> ...
>
> Have you re-ANALYZEd the database?  The problem with this query
> seems to be the spectacularly awful rowcount estimate here:
>
>>    ->  Bitmap Index Scan on planet_osm_ways_nodes_idx
>> (cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.268..0.268
>> rows=1 loops=1)
>>          Index Cond: (nodes && '{1}'::bigint[])
>
> The planner should be able to do better than that, given up-to-date
> statistics on the "nodes" column.


Tom, I can confirm that with up to date statistics the planner is still
lost.
I did a REINDEX to rule out a broken index and the estimate is still in
the 100k+ range.


Regards
--
Arnaud


Reply | Threaded
Open this post in threaded view
|

Re: Slow statement using parallelism after 9.6>11 upgrade

Arnaud L.
Le 04/09/2019 à 09:04, Arnaud L. a écrit :
> Tom, I can confirm that with up to date statistics the planner is still
> lost.
> I did a REINDEX to rule out a broken index and the estimate is still in
> the 100k+ range.

Sorry, I meant 1M+ range.

EXPLAIN ANALYZE select id from planet_osm_ways WHERE nodes &&
ARRAY[123456789::bigint];

(parallel_workers = 0 on the table as per Paul's recommandation) :

Bitmap Heap Scan on planet_osm_ways  (cost=11582.45..3535447.30
rows=1419000 width=8) (actual time=0.198..0.199 rows=1 loops=1)
   Recheck Cond: (nodes && '{123456789}'::bigint[])
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on planet_osm_ways_nodes_idx
(cost=0.00..11227.70 rows=1419000 width=0) (actual time=0.151..0.151
rows=1 loops=1)
         Index Cond: (nodes && '{123456789}'::bigint[])
Planning Time: 0.260 ms
Execution Time: 0.249 ms


Regards
--
Arnaud


Reply | Threaded
Open this post in threaded view
|

Re: Slow statement using parallelism after 9.6>11 upgrade

Arnaud L.
In reply to this post by Arnaud L.
Le 04/09/2019 à 09:04, Arnaud L. a écrit :
> Le 03/09/2019 à 15:43, Tom Lane a écrit :
>> The planner should be able to do better than that, given up-to-date
>> statistics on the "nodes" column.
>
> Tom, I can confirm that with up to date statistics the planner is still
> lost.
> I did a REINDEX to rule out a broken index and the estimate is still in
> the 100k+ range.


Setting STATISTICS target to 10.000 on the nodes column and re-analyzing
did not help either. Estimates are still ~1.4M rows when result is 1 row.

Regards
--
Arnaud


Reply | Threaded
Open this post in threaded view
|

Bad estimates on GIN bigint[] index

Arnaud L.
In reply to this post by Tom Lane-2
Le 03/09/2019 à 15:43, Tom Lane a écrit :
> "Arnaud L." <[hidden email]> writes:
>>    ->  Bitmap Index Scan on planet_osm_ways_nodes_idx
>> (cost=0.00..11190.36 rows=1420982 width=0) (actual time=0.268..0.268
>> rows=1 loops=1)
>>          Index Cond: (nodes && '{1}'::bigint[])
>
> The planner should be able to do better than that, given up-to-date
> statistics on the "nodes" column.


Sorry to up this thread, but is there anything I can do to help the
planner in this particular case ?
REINDEXing did not help, nor did ANALYZEing with different STATISTICS
target for this specific column.


Regards
--
Arnaud