[PATCH] Add extra statistics to explain for Nested Loop

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH] Add extra statistics to explain for Nested Loop

e.sokolova
Hi, hackers.
For some distributions of data in tables, different loops in nested loop
joins can take different time and process different amounts of entries.
It makes average statistics returned by explain analyze not very useful
for DBA.
To fix it, here is the patch that add printing of min and max statistics
for time and rows across all loops in Nested Loop to EXPLAIN ANALYSE.
Please don't hesitate to share any thoughts on this topic!
--
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

extra_statistics_v0.patch (44K) Download Attachment
example_v0.sql (551K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Add extra statistics to explain for Nested Loop

Pavel Stehule


pá 16. 10. 2020 v 9:43 odesílatel <[hidden email]> napsal:
Hi, hackers.
For some distributions of data in tables, different loops in nested loop
joins can take different time and process different amounts of entries.
It makes average statistics returned by explain analyze not very useful
for DBA.
To fix it, here is the patch that add printing of min and max statistics
for time and rows across all loops in Nested Loop to EXPLAIN ANALYSE.
Please don't hesitate to share any thoughts on this topic!

+1

This is great feature - sometimes it can be pretty messy current limited format

Pavel

--
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Add extra statistics to explain for Nested Loop

Julien Rouhaud
Le ven. 16 oct. 2020 à 16:12, Pavel Stehule <[hidden email]> a écrit :


pá 16. 10. 2020 v 9:43 odesílatel <[hidden email]> napsal:
Hi, hackers.
For some distributions of data in tables, different loops in nested loop
joins can take different time and process different amounts of entries.
It makes average statistics returned by explain analyze not very useful
for DBA.
To fix it, here is the patch that add printing of min and max statistics
for time and rows across all loops in Nested Loop to EXPLAIN ANALYSE.
Please don't hesitate to share any thoughts on this topic!

+1

This is great feature - sometimes it can be pretty messy current limited format

+1, this can be very handy! 
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Add extra statistics to explain for Nested Loop

Anastasia Lubennikova
On 16.10.2020 12:07, Julien Rouhaud wrote:
Le ven. 16 oct. 2020 à 16:12, Pavel Stehule <[hidden email]> a écrit :


pá 16. 10. 2020 v 9:43 odesílatel <[hidden email]> napsal:
Hi, hackers.
For some distributions of data in tables, different loops in nested loop
joins can take different time and process different amounts of entries.
It makes average statistics returned by explain analyze not very useful
for DBA.
To fix it, here is the patch that add printing of min and max statistics
for time and rows across all loops in Nested Loop to EXPLAIN ANALYSE.
Please don't hesitate to share any thoughts on this topic!

+1

This is great feature - sometimes it can be pretty messy current limited format

+1, this can be very handy! 

Cool.
I have added your patch to the commitfest, so it won't get lost.
https://commitfest.postgresql.org/30/2765/

I will review the code next week.  Unfortunately, I cannot give any feedback about usability of this feature.

User visible change is:

-               ->  Nested Loop (actual rows=N loops=N)
+              ->  Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2)

Pavel, Julien, could you please say if it looks good?

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Add extra statistics to explain for Nested Loop

Pavel Stehule


so 17. 10. 2020 v 0:11 odesílatel Anastasia Lubennikova <[hidden email]> napsal:
On 16.10.2020 12:07, Julien Rouhaud wrote:
Le ven. 16 oct. 2020 à 16:12, Pavel Stehule <[hidden email]> a écrit :


pá 16. 10. 2020 v 9:43 odesílatel <[hidden email]> napsal:
Hi, hackers.
For some distributions of data in tables, different loops in nested loop
joins can take different time and process different amounts of entries.
It makes average statistics returned by explain analyze not very useful
for DBA.
To fix it, here is the patch that add printing of min and max statistics
for time and rows across all loops in Nested Loop to EXPLAIN ANALYSE.
Please don't hesitate to share any thoughts on this topic!

+1

This is great feature - sometimes it can be pretty messy current limited format

+1, this can be very handy! 

Cool.
I have added your patch to the commitfest, so it won't get lost.
https://commitfest.postgresql.org/30/2765/

I will review the code next week.  Unfortunately, I cannot give any feedback about usability of this feature.

User visible change is:

-               ->  Nested Loop (actual rows=N loops=N)
+              ->  Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2)


This interface is ok - there is not too much space for creativity. I can imagine displaying variance or average - but I am afraid about very bad performance impacts.

Regards

Pavel


Pavel, Julien, could you please say if it looks good?

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Add extra statistics to explain for Nested Loop

Julien Rouhaud
On Sat, Oct 17, 2020 at 12:15 PM Pavel Stehule <[hidden email]> wrote:

>
> so 17. 10. 2020 v 0:11 odesílatel Anastasia Lubennikova <[hidden email]> napsal:
>>
>> On 16.10.2020 12:07, Julien Rouhaud wrote:
>>
>> Le ven. 16 oct. 2020 à 16:12, Pavel Stehule <[hidden email]> a écrit :
>>>
>>>
>>>
>>> pá 16. 10. 2020 v 9:43 odesílatel <[hidden email]> napsal:
>>>>
>>>> Hi, hackers.
>>>> For some distributions of data in tables, different loops in nested loop
>>>> joins can take different time and process different amounts of entries.
>>>> It makes average statistics returned by explain analyze not very useful
>>>> for DBA.
>>>> To fix it, here is the patch that add printing of min and max statistics
>>>> for time and rows across all loops in Nested Loop to EXPLAIN ANALYSE.
>>>> Please don't hesitate to share any thoughts on this topic!
>>>
>>>
>>> +1
>>>
>>> This is great feature - sometimes it can be pretty messy current limited format
>>
>>
>> +1, this can be very handy!
>>
>> Cool.
>> I have added your patch to the commitfest, so it won't get lost.

Thanks!  I'll also try to review it next week.

>> https://commitfest.postgresql.org/30/2765/
>>
>> I will review the code next week.  Unfortunately, I cannot give any feedback about usability of this feature.
>>
>> User visible change is:
>>
>> -               ->  Nested Loop (actual rows=N loops=N)
>> +              ->  Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2)
>
>
> This interface is ok - there is not too much space for creativity.

Yes I also think it's ok. We should also consider usability for tools
like explain.depesz.com, I don't know if the current output is best.
I'm adding Depesz and Pierre which are both working on this kind of
tool for additional input.

> I can imagine displaying variance or average - but I am afraid about very bad performance impacts.

The original counter (rows here) is already an average right?
Variance could be nice too.  Instrumentation will already spam
gettimeofday() calls for nested loops, I don't think that computing
variance would add that much overhead?


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Add extra statistics to explain for Nested Loop

Pavel Stehule


so 17. 10. 2020 v 6:26 odesílatel Julien Rouhaud <[hidden email]> napsal:
On Sat, Oct 17, 2020 at 12:15 PM Pavel Stehule <[hidden email]> wrote:
>
> so 17. 10. 2020 v 0:11 odesílatel Anastasia Lubennikova <[hidden email]> napsal:
>>
>> On 16.10.2020 12:07, Julien Rouhaud wrote:
>>
>> Le ven. 16 oct. 2020 à 16:12, Pavel Stehule <[hidden email]> a écrit :
>>>
>>>
>>>
>>> pá 16. 10. 2020 v 9:43 odesílatel <[hidden email]> napsal:
>>>>
>>>> Hi, hackers.
>>>> For some distributions of data in tables, different loops in nested loop
>>>> joins can take different time and process different amounts of entries.
>>>> It makes average statistics returned by explain analyze not very useful
>>>> for DBA.
>>>> To fix it, here is the patch that add printing of min and max statistics
>>>> for time and rows across all loops in Nested Loop to EXPLAIN ANALYSE.
>>>> Please don't hesitate to share any thoughts on this topic!
>>>
>>>
>>> +1
>>>
>>> This is great feature - sometimes it can be pretty messy current limited format
>>
>>
>> +1, this can be very handy!
>>
>> Cool.
>> I have added your patch to the commitfest, so it won't get lost.

Thanks!  I'll also try to review it next week.

>> https://commitfest.postgresql.org/30/2765/
>>
>> I will review the code next week.  Unfortunately, I cannot give any feedback about usability of this feature.
>>
>> User visible change is:
>>
>> -               ->  Nested Loop (actual rows=N loops=N)
>> +              ->  Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2)
>
>
> This interface is ok - there is not too much space for creativity.

Yes I also think it's ok. We should also consider usability for tools
like explain.depesz.com, I don't know if the current output is best.
I'm adding Depesz and Pierre which are both working on this kind of
tool for additional input.

> I can imagine displaying variance or average - but I am afraid about very bad performance impacts.

The original counter (rows here) is already an average right?
Variance could be nice too.  Instrumentation will already spam
gettimeofday() calls for nested loops, I don't think that computing
variance would add that much overhead?

There is not any problem to write benchmark for worst case and test it


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Add extra statistics to explain for Nested Loop

hubert depesz lubaczewski-2
In reply to this post by Julien Rouhaud
On Sat, Oct 17, 2020 at 12:26:08PM +0800, Julien Rouhaud wrote:
> >> -               ->  Nested Loop (actual rows=N loops=N)
> >> +              ->  Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2)
> > This interface is ok - there is not too much space for creativity.
> Yes I also think it's ok. We should also consider usability for tools
> like explain.depesz.com, I don't know if the current output is best.
> I'm adding Depesz and Pierre which are both working on this kind of
> tool for additional input.

Thanks for heads up. This definitely will need some changes on my side,
but should be easy to handle.

Best regards,

depesz



Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Add extra statistics to explain for Nested Loop

David G Johnston
In reply to this post by Anastasia Lubennikova
On Fri, Oct 16, 2020 at 3:11 PM Anastasia Lubennikova <[hidden email]> wrote:
User visible change is:


-               ->  Nested Loop (actual rows=N loops=N)
+              ->  Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2)

I'd be inclined to append both new rows to the end.

(actual rows=N loops=N min_rows=N max_rows=N)

rows * loops is still an important calculation.

Why not just add total_rows while we are at it - last in the listing?

(actual rows=N loops=N min_rows=N max_rows=N total_rows=N) 
 
David J.

Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Add extra statistics to explain for Nested Loop

Julien Rouhaud
In reply to this post by Anastasia Lubennikova
On Sat, Oct 17, 2020 at 6:11 AM Anastasia Lubennikova
<[hidden email]> wrote:

>
> On 16.10.2020 12:07, Julien Rouhaud wrote:
>
> Le ven. 16 oct. 2020 à 16:12, Pavel Stehule <[hidden email]> a écrit :
>>
>>
>>
>> pá 16. 10. 2020 v 9:43 odesílatel <[hidden email]> napsal:
>>>
>>> Hi, hackers.
>>> For some distributions of data in tables, different loops in nested loop
>>> joins can take different time and process different amounts of entries.
>>> It makes average statistics returned by explain analyze not very useful
>>> for DBA.
>>> To fix it, here is the patch that add printing of min and max statistics
>>> for time and rows across all loops in Nested Loop to EXPLAIN ANALYSE.
>>> Please don't hesitate to share any thoughts on this topic!
>>
>>
>> +1
>>
>> This is great feature - sometimes it can be pretty messy current limited format
>
>
> +1, this can be very handy!
>
> Cool.
> I have added your patch to the commitfest, so it won't get lost.
> https://commitfest.postgresql.org/30/2765/
>
> I will review the code next week.  Unfortunately, I cannot give any feedback about usability of this feature.
>
> User visible change is:
>
> -               ->  Nested Loop (actual rows=N loops=N)
> +              ->  Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2)

Thanks for working on this feature!  Here are some comments on the patch.

First, cosmetic issues.  There are a lot of whitespace issues, the new
code is space indented while it should be tab indented.  Also there
are 3 patches included with some fixups, could you instead push a
single patch?

It also misses some modification in the regression tests.  For instance:

diff --git a/src/test/regress/expected/partition_prune.out
b/src/test/regress/expected/partition_prune.out
index 50d2a7e4b9..db0b167ef4 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -2065,7 +2065,7 @@ select explain_parallel_append('select avg(ab.a)
from ab inner join lprt_a a on
          Workers Planned: 1
          Workers Launched: N
          ->  Partial Aggregate (actual rows=N loops=N)
-               ->  Nested Loop (actual rows=N loops=N)
+               ->  Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2)
                      ->  Parallel Seq Scan on lprt_a a (actual rows=N loops=N)

You should update the explain_parallel_append() plpgsql function
created in that test file to make sure that both "rows" and the two
new counters are changed to "N".  There might be other similar changes
needed.


Now, for the changes themselves.  For the min/max time, you're
aggregating "totaltime - instr->firsttuple".  Why removing the startup
time from the loop execution time?  I think this should be kept.
Also, in explain.c you display the counters in the "Nested loop" node,
but this should be done in the inner plan node instead, as this is the
one being looped on.  So the condition should probably be "nloops > 1"
rather than testing if it's a NestLoop.

I'm switching the patch to WoA.


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Add extra statistics to explain for Nested Loop

Pierre Giraud
In reply to this post by Julien Rouhaud


Le 17/10/2020 à 06:26, Julien Rouhaud a écrit :

> On Sat, Oct 17, 2020 at 12:15 PM Pavel Stehule <[hidden email]> wrote:
>>
>> so 17. 10. 2020 v 0:11 odesílatel Anastasia Lubennikova <[hidden email]> napsal:
>>>
>>> On 16.10.2020 12:07, Julien Rouhaud wrote:
>>>
>>> Le ven. 16 oct. 2020 à 16:12, Pavel Stehule <[hidden email]> a écrit :
>>>>
>>>>
>>>>
>>>> pá 16. 10. 2020 v 9:43 odesílatel <[hidden email]> napsal:
>>>>>
>>>>> Hi, hackers.
>>>>> For some distributions of data in tables, different loops in nested loop
>>>>> joins can take different time and process different amounts of entries.
>>>>> It makes average statistics returned by explain analyze not very useful
>>>>> for DBA.
>>>>> To fix it, here is the patch that add printing of min and max statistics
>>>>> for time and rows across all loops in Nested Loop to EXPLAIN ANALYSE.
>>>>> Please don't hesitate to share any thoughts on this topic!
>>>>
>>>>
>>>> +1
>>>>
>>>> This is great feature - sometimes it can be pretty messy current limited format
>>>
>>>
>>> +1, this can be very handy!
>>>
>>> Cool.
>>> I have added your patch to the commitfest, so it won't get lost.
>
> Thanks!  I'll also try to review it next week.
>
>>> https://commitfest.postgresql.org/30/2765/
>>>
>>> I will review the code next week.  Unfortunately, I cannot give any feedback about usability of this feature.
>>>
>>> User visible change is:
>>>
>>> -               ->  Nested Loop (actual rows=N loops=N)
>>> +              ->  Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2)
>>
>>
>> This interface is ok - there is not too much space for creativity.
>
> Yes I also think it's ok. We should also consider usability for tools
> like explain.depesz.com, I don't know if the current output is best.
> I'm adding Depesz and Pierre which are both working on this kind of
> tool for additional input.

Same for me and PEV2. It should be fairly easy to parse this new format.

>
>> I can imagine displaying variance or average - but I am afraid about very bad performance impacts.
>
> The original counter (rows here) is already an average right?
> Variance could be nice too.  Instrumentation will already spam
> gettimeofday() calls for nested loops, I don't think that computing
> variance would add that much overhead?

Thus, it's an average value. And to be mentioned: a rounded one! Which I
found a bit tricky to figure out.


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Add extra statistics to explain for Nested Loop

Andres Freund
In reply to this post by e.sokolova
Hi,

On 2020-10-16 10:42:43 +0300, [hidden email] wrote:
> For some distributions of data in tables, different loops in nested loop
> joins can take different time and process different amounts of entries. It
> makes average statistics returned by explain analyze not very useful for
> DBA.
> To fix it, here is the patch that add printing of min and max statistics for
> time and rows across all loops in Nested Loop to EXPLAIN ANALYSE.
> Please don't hesitate to share any thoughts on this topic!

Interesting idea!

I'm a bit worried that further increasing the size of struct
Instrumentation will increase the overhead of EXPLAIN ANALYZE further -
in some workloads we spend a fair bit of time in code handling that. It
would be good to try to find a few bad cases, and see what the overhead is.

Unfortunately your patch is pretty hard to look at - you seem to have
included your incremental hacking efforts?

> From 7871ac1afe7837a6dc0676a6c9819cc68a5c0f07 Mon Sep 17 00:00:00 2001
> From: "e.sokolova" <[hidden email]>
> Date: Fri, 4 Sep 2020 18:00:47 +0300
> Subject: Add min and max statistics without case of
>  parallel workers. Tags: commitfest_hotfix.

> From ebdfe117e4074d268e3e7c480b98d375d1d6f62b Mon Sep 17 00:00:00 2001
> From: "e.sokolova" <[hidden email]>
> Date: Fri, 11 Sep 2020 23:04:34 +0300
> Subject: Add case of parallel workers. Tags:
>  commitfest_hotfix.

> From ecbf04d519e17b8968103364e89169ab965b41d7 Mon Sep 17 00:00:00 2001
> From: "e.sokolova" <[hidden email]>
> Date: Fri, 18 Sep 2020 13:35:19 +0300
> Subject: Fix bugs. Tags: commitfest_hotfix.

> From 7566a98bbc33a24052e1334b0afe2cf341c0818f Mon Sep 17 00:00:00 2001
> From: "e.sokolova" <[hidden email]>
> Date: Fri, 25 Sep 2020 20:09:22 +0300
> Subject: Fix tests. Tags: commitfest_hotfix.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH] Add extra statistics to explain for Nested Loop

e.sokolova
In reply to this post by Julien Rouhaud
<[hidden email]> wrote:
> You should update the explain_parallel_append() plpgsql function
> created in that test file to make sure that both "rows" and the two
> new counters are changed to "N".  There might be other similar changes
> needed.

Thank you for watching this issue. I made the necessary changes in tests
following your advice.

> Now, for the changes themselves.  For the min/max time, you're
> aggregating "totaltime - instr->firsttuple".  Why removing the startup
> time from the loop execution time?  I think this should be kept.

I think it's right remark. I fixed it.

> Also, in explain.c you display the counters in the "Nested loop" node,
> but this should be done in the inner plan node instead, as this is the
> one being looped on.  So the condition should probably be "nloops > 1"
> rather than testing if it's a NestLoop.

Condition "nloops > 1" is not the same as checking if it's NestLoop.
This condition will lead to printing extra statistics for nodes with
different types of join, not only Nested Loop Join. If this statistic is
useful for other plan nodes, it makes sense to change the condition.

<[hidden email]> wrote:
> I'm a bit worried that further increasing the size of struct
> Instrumentation will increase the overhead of EXPLAIN ANALYZE further -
> in some workloads we spend a fair bit of time in code handling that. It
> would be good to try to find a few bad cases, and see what the overhead
> is.

Thank you for this comment, I will try to figure it out. Do you have
some examples of large overhead dependent on this struct? I think I need
some sample to know which way to think.

Thank you all for the feedback. I hope the new version of my patch will
be more correct and useful.
Please don't hesitate to share any thoughts on this topic!
--
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

extra_statistics_v1.patch (28K) Download Attachment