Planning performance problem (67626.278ms)


Planning performance problem (67626.278ms)

Krzysztof Plocharz
Hi

We have a very strange query planning problem. Long story short, it
takes 67626.278ms just to plan. Query execution takes 12ms.

The query has 7 joins and 2 subselects.
The issue does not appear to be deterministic; sometimes it takes only a
few ms to plan the query.

One of the tables has 550,485,942 live tuples and 743,504,012 dead
tuples. Running ANALYZE on that table solves the problem only temporarily.

The question is: how can we debug what is going on?

Best Regards,
Krzysztof Płocharz



RE: Planning performance problem (67626.278ms)

Igor Neyman

-----Original Message-----
From: Krzysztof Plocharz [mailto:[hidden email]]
Sent: Monday, April 08, 2019 10:11 AM
To: [hidden email]
Subject: Planning performance problem (67626.278ms)

Hi

We have a very strange query planning problem. Long story short, it takes 67626.278ms just to plan. Query execution takes 12ms.

The query has 7 joins and 2 subselects.
The issue does not appear to be deterministic; sometimes it takes only a few ms to plan the query.

One of the tables has 550,485,942 live tuples and 743,504,012 dead tuples. Running ANALYZE on that table solves the problem only temporarily.

The question is: how can we debug what is going on?

Best Regards,
Krzysztof Płocharz

_______________________________________________________________________________________________

Why do you have to run ANALYZE? Did you turn off autovacuum?

Regards,
Igor Neyman
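If autovacuum is enabled but not keeping up, the statistics views should show it. A quick check (a sketch; `your_table` stands in for the bloated table):

```sql
-- When did (auto)vacuum and (auto)analyze last run on the table,
-- and how many dead tuples have piled up since?
SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'your_table';
```

A stale last_autovacuum combined with hundreds of millions of dead tuples would suggest autovacuum is being starved or repeatedly cancelled rather than disabled.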

Re: Planning performance problem (67626.278ms)

Krzysztof Plocharz
No, Autovacuum is running.

On 2019/04/08 16:18, Igor Neyman wrote:

>
> -----Original Message-----
> From: Krzysztof Plocharz [mailto:[hidden email]]
> Sent: Monday, April 08, 2019 10:11 AM
> To: [hidden email]
> Subject: Planning performance problem (67626.278ms)
>
> Hi
>
> We have some very strange query planning problem. Long story short it takes 67626.278ms just to plan. Query execution takes 12ms.
>
> Query has 7 joins and 2 subselects.
> The issue does not appear to be deterministic; sometimes it takes only a few ms to plan the query.
>
> One of the tables has 550,485,942 live tuples and 743,504,012 dead tuples. Running ANALYZE on that table solves the problem only temporarily.
>
> The question is: how can we debug what is going on?
>
> Best Regards,
> Krzysztof Płocharz
>
> _______________________________________________________________________________________________
>
> Why do you have to run Analyze? Did you turn off Autovacuum?
>
> Regards,
> Igor Neyman
>



Re: Planning performance problem (67626.278ms)

Pavel Stehule
In reply to this post by Krzysztof Plocharz


On Mon, Apr 8, 2019 at 16:11, Krzysztof Plocharz <[hidden email]> wrote:
Hi

We have some very strange query planning problem. Long story short it
takes 67626.278ms just to plan. Query execution takes 12ms.

Query has 7 joins and 2 subselects.
The issue does not appear to be deterministic; sometimes it takes only a
few ms to plan the query.

One of the tables has 550,485,942 live tuples and 743,504,012 dead
tuples. Running ANALYZE on that table solves the problem only temporarily.

The question is: how can we debug what is going on?

Please check your indexes for bloat. The planner gets the min and max values from the indexes, and this operation is slow on bloated indexes.
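One way to measure index bloat directly is the `pgstattuple` extension, which ships with PostgreSQL as a contrib module (a sketch; the index name is a placeholder):

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- For a B-tree index: a low avg_leaf_density and a high
-- leaf_fragmentation are signs of a bloated index.
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('your_index_name');
```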

But 67 seconds is really slow - it could be some other problem. Is it a real computer or a virtual machine?

 

Best Regards,
Krzysztof Płocharz



Re: Planning performance problem (67626.278ms)

Justin Pryzby
On Mon, Apr 08, 2019 at 04:33:34PM +0200, Pavel Stehule wrote:

> On Mon, Apr 8, 2019 at 16:11, Krzysztof Plocharz <[hidden email]> wrote:
>
> > We have a very strange query planning problem. Long story short, it
> > takes 67626.278ms just to plan. Query execution takes 12ms.
> >
> > The query has 7 joins and 2 subselects.
> > The issue does not appear to be deterministic; sometimes it takes only
> > a few ms to plan the query.
> >
> > One of the tables has 550,485,942 live tuples and 743,504,012 dead
> > tuples. Running ANALYZE on that table solves the problem only temporarily.
> >
> > The question is: how can we debug what is going on?
>
> Please check your indexes for bloat. The planner gets the min and max
> values from the indexes, and this operation is slow on bloated indexes.

I think that's from get_actual_variable_range(), right ?

If it's due to bloating, I think the first step would be to 1) vacuum right
now; and 2) set more aggressive autovacuum, like ALTER TABLE t SET
(autovacuum_vacuum_scale_factor = 0.005).
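Spelled out, the two steps might look like this (a sketch; `t` is a placeholder for the bloated table):

```sql
-- 1) Clean up dead tuples now; plain VACUUM does not block normal
--    reads and writes.
VACUUM (VERBOSE) t;

-- 2) Trigger autovacuum when ~0.5% of the table has changed, instead
--    of the 20% default, so dead tuples never accumulate this far again.
ALTER TABLE t SET (autovacuum_vacuum_scale_factor = 0.005);
```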

What version is the postgres server?

Justin



Re: Planning performance problem (67626.278ms)

Krzysztof Plocharz


On 2019/04/08 16:42, Justin Pryzby wrote:

> On Mon, Apr 08, 2019 at 04:33:34PM +0200, Pavel Stehule wrote:
>> On Mon, Apr 8, 2019 at 16:11, Krzysztof Plocharz <[hidden email]> wrote:
>>
>>> We have some very strange query planning problem. Long story short it
>>> takes 67626.278ms just to plan. Query execution takes 12ms.
>>>
>>> Query has 7 joins and 2 subselects.
>>> The issue does not appear to be deterministic; sometimes it takes only
>>> a few ms to plan the query.
>>>
>>> One of the tables has 550,485,942 live tuples and 743,504,012 dead
>>> tuples. Running ANALYZE on that table solves the problem only temporarily.
>>>
>>> The question is: how can we debug what is going on?
>>
>> Please check your indexes for bloat. The planner gets the min and max
>> values from the indexes, and this operation is slow on bloated indexes.

Yes, we thought about this; there are over 700,000,000 dead tuples. But
as you said, that should not result in 67-second planning...

>
> I think that's from get_actual_variable_range(), right ?
>
> If it's due to bloating, I think the first step would be to 1) vacuum right
> now; and, 2) set more aggressive auto-vacuum, like ALTER TABLE t SET
> (AUTOVACUUM_VACUUM_SCALE_FACTOR=0.005).
>

We ran pg_repack and it did help, but is it possible for
get_actual_variable_range to take over 60 seconds?
Is there any workaround for this other than pg_repack/VACUUM?

Any way to actually debug this?

> What version postgres server ?
>
> Justin
>
>






Re: Planning performance problem (67626.278ms)

Krzysztof Plocharz
In reply to this post by Pavel Stehule


On 2019/04/08 16:33, Pavel Stehule wrote:

>
>
> On Mon, Apr 8, 2019 at 16:11, Krzysztof Plocharz <[hidden email]> wrote:
>
>     Hi
>
>     We have some very strange query planning problem. Long story short it
>     takes 67626.278ms just to plan. Query execution takes 12ms.
>
>     Query has 7 joins and 2 subselects.
>     The issue does not appear to be deterministic; sometimes it takes
>     only a few ms to plan the query.
>
>     One of the tables has 550,485,942 live tuples and 743,504,012 dead
>     tuples. Running ANALYZE on that table solves the problem only
>     temporarily.
>
>     The question is: how can we debug what is going on?
>
>
> Please check your indexes for bloat. The planner gets the min and max
> values from the indexes, and this operation is slow on bloated indexes.
>

Yes, we thought about this; there are over 700,000,000 dead tuples. But
as you said, that should not result in 67-second planning...

> But 67 seconds is really slow - it could be some other problem. Is it a
> real computer or a virtual machine?
>

Real, with pretty good specs: NVMe drives, a six-core AMD Opteron, and
64GB of RAM. During testing the system was mostly idle.

>
>     Best Regards,
>     Krzysztof Płocharz
>
>



Re: Planning performance problem (67626.278ms)

Pavel Stehule
In reply to this post by Krzysztof Plocharz


On Mon, Apr 8, 2019 at 16:55, Krzysztof Plocharz <[hidden email]> wrote:


On 2019/04/08 16:42, Justin Pryzby wrote:
> On Mon, Apr 08, 2019 at 04:33:34PM +0200, Pavel Stehule wrote:
>> On Mon, Apr 8, 2019 at 16:11, Krzysztof Plocharz <[hidden email]> wrote:
>>
>>> We have some very strange query planning problem. Long story short it
>>> takes 67626.278ms just to plan. Query execution takes 12ms.
>>>
>>> Query has 7 joins and 2 subselects.
>>> The issue does not appear to be deterministic; sometimes it takes only
>>> a few ms to plan the query.
>>>
>>> One of the tables has 550,485,942 live tuples and 743,504,012 dead
>>> tuples. Running ANALYZE on that table solves the problem only temporarily.
>>>
>>> The question is: how can we debug what is going on?
>>
>> Please check your indexes for bloat. The planner gets the min and max
>> values from the indexes, and this operation is slow on bloated indexes.

Yes, we thought about this; there are over 700,000,000 dead tuples. But
as you said, that should not result in 67-second planning...

>
> I think that's from get_actual_variable_range(), right ?
>
> If it's due to bloating, I think the first step would be to 1) vacuum right
> now; and, 2) set more aggressive auto-vacuum, like ALTER TABLE t SET
> (AUTOVACUUM_VACUUM_SCALE_FACTOR=0.005).
>

We ran pg_repack and it did help, but is it possible for
get_actual_variable_range to take over 60 seconds?
Is there any workaround for this other than pg_repack/VACUUM?

Any way to actually debug this?

You can use perf to get a profile.
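A minimal perf session against the backend doing the slow planning might look like this (a sketch; the backend PID is a placeholder you would get from `pg_backend_pid()` in the affected session, or from `pg_stat_activity`):

```shell
# Attach to the backend and sample stacks while the slow planning runs.
perf record --call-graph dwarf -p <backend_pid> -- sleep 30

# Then browse the profile; the hottest planner functions should stand out.
perf report
```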







Re: Planning performance problem (67626.278ms)

Justin Pryzby
In reply to this post by Krzysztof Plocharz
On Mon, Apr 08, 2019 at 04:55:36PM +0200, Krzysztof Plocharz wrote:
> We ran pg_repack and it did help, but is it possible for
> get_actual_variable_range to take over 60 seconds?

You have many tables being joined, perhaps with an exhaustive search, so
maybe that function is being called many times.
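For context (this detail is not stated in the thread): with default settings the planner searches join orders exhaustively for up to join_collapse_limit (8) relations, only switching to the genetic optimizer at geqo_threshold (12), so a 7-join query sits squarely in the exhaustive-search regime. The current values can be checked with:

```sql
SHOW join_collapse_limit;  -- default 8
SHOW geqo_threshold;       -- default 12
```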

What version is the postgres server?

Justin



Re: Planning performance problem (67626.278ms)

Jeremy Schneider
In reply to this post by Justin Pryzby
On 4/8/19 07:42, Justin Pryzby wrote:

> On Mon, Apr 08, 2019 at 04:33:34PM +0200, Pavel Stehule wrote:
>> On Mon, Apr 8, 2019 at 16:11, Krzysztof Plocharz <[hidden email]> wrote:
>>
>>> We have some very strange query planning problem. Long story short it
>>> takes 67626.278ms just to plan. Query execution takes 12ms.
>>>
>>> Query has 7 joins and 2 subselects.
>>> The issue does not appear to be deterministic; sometimes it takes only
>>> a few ms to plan the query.
>>>
>>> One of the tables has 550,485,942 live tuples and 743,504,012 dead
>>> tuples. Running ANALYZE on that table solves the problem only temporarily.
>>>
>>> The question is: how can we debug what is going on?
>>
>> Please check your indexes for bloat. The planner gets the min and max
>> values from the indexes, and this operation is slow on bloated indexes.
>
> I think that's from get_actual_variable_range(), right ?

For what it's worth, I have seen a similar issue on Aurora PostgreSQL 9.6,
where query planning took a very long time (multiple minutes). In that
particular case, there wasn't anything Aurora-specific about the call to
get_actual_variable_range. We weren't able to definitively identify the
root cause or build a reproducible test case, but we suspect that an
inefficiency might exist in community PostgreSQL code.

For debugging, a few ideas:

1) capture a stack with pstack or perf record --call-graph

2) capture the execution plan of the SQL with slow planning

3) capture detailed stats for all relations and objects involved

4) capture the usual info for bug reporting (preface section in docs)

A reproducible test case is the gold standard; I'm keeping my eyes open
for another case too.
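For item 2, note that EXPLAIN ANALYZE already separates planning from execution time, which is how a discrepancy like the one reported here can be captured (a sketch; the query is a placeholder):

```sql
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- the affected query

-- The output ends with lines like:
--   Planning time: 67626.278 ms
--   Execution time: 12.0 ms
```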

For the slow planning case that I saw, the slow process was almost
entirely in this call stack (captured with perf record --call-graph):
...
index_fetch_heap
index_getnext
get_actual_variable_range
ineq_histogram_selectivity
scalarineqsel
mergejoinscansel
initial_cost_mergejoin
try_mergejoin_path
add_paths_to_joinrel
make_join_rel
join_search_one_level
standard_join_search
make_one_rel
query_planner
...

-Jeremy


--
Jeremy Schneider
Database Engineer
Amazon Web Services



Re: Planning performance problem (67626.278ms)

Andres Freund
Hi,

On 2019-04-08 16:10:17 -0700, Jeremy Schneider wrote:

> On 4/8/19 07:42, Justin Pryzby wrote:
> > On Mon, Apr 08, 2019 at 04:33:34PM +0200, Pavel Stehule wrote:
> >> On Mon, Apr 8, 2019 at 16:11, Krzysztof Plocharz <[hidden email]> wrote:
> >>
> >>> We have some very strange query planning problem. Long story short it
> >>> takes 67626.278ms just to plan. Query execution takes 12ms.
> >>>
> >>> Query has 7 joins and 2 subselects.
> >>> The issue does not appear to be deterministic; sometimes it takes only
> >>> a few ms to plan the query.
> >>>
> >>> One of the tables has 550,485,942 live tuples and 743,504,012 dead
> >>> tuples. Running ANALYZE on that table solves the problem only temporarily.
> >>>
> >>> The question is: how can we debug what is going on?
> >>
> >> Please check your indexes for bloat. The planner gets the min and max
> >> values from the indexes, and this operation is slow on bloated indexes.
> >
> > I think that's from get_actual_variable_range(), right ?
>
> For what it's worth, I have seen a similar issue on Aurora PG 9.6 where
> query planning took a very long time (multiple minutes). In this
> particular case, there wasn't anything Aurora-specific about the call to
> get_actual_variable_range. We weren't able to distinctly identify the
> root cause or build a reproducible test case -- but we suspect that an
> inefficiency might exist in community PostgreSQL code.
>
> For debugging, a few ideas:
>
> 1) capture a stack with pstack or perf record --call-graph
>
> 2) capture the execution plan of the SQL w slow planning
>
> 3) capture detailed stats for all relations and objects involved
>
> 4) capture the usual info for bug reporting (preface section in docs)
>
> A reproducible test case is the gold standard; I'm keeping my eyes open
> for another case too.
>
> For the slow planning case that I saw, the slow process was almost
> entirely in this call stack (captured with perf record --call-graph):
> ...
> index_fetch_heap
> index_getnext
> get_actual_variable_range
> ineq_histogram_selectivity
> scalarineqsel
> mergejoinscansel
> initial_cost_mergejoin
> try_mergejoin_path
> add_paths_to_joinrel
> make_join_rel
> join_search_one_level
> standard_join_search
> make_one_rel
> query_planner
> ...

I suspect some of this might be related to versions before 11 not having
the following commit:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3ca930fc39ccf987c1c22fd04a1e7463b5dd0dfd

Greetings,

Andres Freund