Open Item: Should non-text EXPLAIN always show properties?

David Rowley
Over on [1] Justin mentions that the non-text EXPLAIN ANALYZE should
always show the "Disk Usage" and "HashAgg Batches" properties.  I
agree with this. show_wal_usage() is a good example of how we normally
do things.  We try to keep the text format as humanly readable as
possible but don't really expect humans to be commonly reading the
other supported formats, so we care less about including additional
details there.
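
A minimal sketch of that convention, written in Python for brevity rather than the C of explain.c. The function name and the simplified "drop zero counters in text mode" rule are assumptions made for illustration; only the two property labels come from this thread.

```python
from typing import List, Tuple


def spill_properties(fmt: str, batches: int, disk_kb: int) -> List[Tuple[str, int]]:
    """Return the (label, value) pairs to report for hash aggregate spilling.

    Simplified model only: fmt is "text" or a structured format name such
    as "json". This is not the logic used in explain.c.
    """
    props = [("HashAgg Batches", batches), ("Disk Usage", disk_kb)]
    if fmt == "text":
        # Text output is read by people, so leave out counters that stayed at zero.
        return [(label, value) for label, value in props if value > 0]
    # Structured output is read by programs, so every property is always present.
    return props
```

Under this model a text-format call with no spilling reports nothing, while a structured-format call reports both properties with a value of zero.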

There's also an open item regarding this for Incremental Sort, so I've
CC'd James and Tomas here. This seems like a good place to discuss
both.

I've attached a small patch that changes the Hash Aggregate behaviour
to always show these properties for non-text formats.

Does anyone object to this?

David

[1] https://www.postgresql.org/message-id/20200619040624.GA17995%40telsasoft.com

Attachment: more_hash_agg_explain_fixes.patch (1K)

Re: Open Item: Should non-text EXPLAIN always show properties?

James Coleman
On Thu, Jun 25, 2020 at 5:15 AM David Rowley <[hidden email]> wrote:

>
> Over on [1] Justin mentions that the non-text EXPLAIN ANALYZE should
> always show the "Disk Usage" and "HashAgg Batches" properties.  I
> agree with this. show_wal_usage() is a good example of how we normally
> do things.  We try to keep the text format as humanly readable as
> possible but don't really expect humans to be commonly reading the
> other supported formats, so we care less about including additional
> details there.
>
> There's also an open item regarding this for Incremental Sort, so I've
> CC'd James and Tomas here. This seems like a good place to discuss
> both.

Yesterday I'd replied [1] to Justin's proposal for this WRT
incremental sort and expressed my opinion that including both
unnecessarily (i.e., including disk when an in-memory sort was used)
is undesirable and confusing and leads to shortcuts I believe to be
bad habits when using the data programmatically.

On a somewhat related note, memory can be 0 but that doesn't mean no
memory was used: it's a result of how tuplesort.c doesn't properly
track memory usage when it switches to disk sort. The same isn't true
in reverse (we don't have 0 disk when disk was used), but I do think
it does show the idea that showing "empty" data isn't an inherent
good.

If there's a clear established pattern and/or most others seem to
prefer Justin's proposed approach, then I'm not going to fight it
hard. I just don't think it's the best approach.

James

[1] https://www.postgresql.org/message-id/CAAaqYe-LswZFUL4k5Dr6%3DEN6MJG1HurggcH4QzUs6UFqBbnQzQ%40mail.gmail.com


Re: Open Item: Should non-text EXPLAIN always show properties?

Robert Haas
On Thu, Jun 25, 2020 at 8:42 AM James Coleman <[hidden email]> wrote:
> Yesterday I'd replied [1] to Justin's proposal for this WRT
> incremental sort and expressed my opinion that including both
> unnecessarily (i.e., including disk when an in-memory sort was used)
> is undesirable and confusing and leads to shortcuts I believe to be
> bad habits when using the data programmatically.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Open Item: Should non-text EXPLAIN always show properties?

Tom Lane
Robert Haas <[hidden email]> writes:
> On Thu, Jun 25, 2020 at 8:42 AM James Coleman <[hidden email]> wrote:
>> Yesterday I'd replied [1] to Justin's proposal for this WRT
>> incremental sort and expressed my opinion that including both
>> unnecessarily (i.e., including disk when an in-memory sort was used)
>> is undesirable and confusing and leads to shortcuts I believe to be
>> bad habits when using the data programmatically.

> +1.

I think the policy about non-text output formats is "all applicable
fields should be included automatically".  But the key word there is
"applicable".  Are disk-sort numbers applicable when no disk sort
happened?

I think the right way to think about this is that we are building
an output data structure according to a schema that should be fixed
for any particular plan shape.  If event X happened zero times in
a given execution, but it could have happened in a different execution
of the same plan, then we should print X with a zero count.  If X
could not happen period in this plan, we should omit X's entry.

So the real question here is whether the disk vs memory decision is
plan time vs run time.  AFAIK it's run time, which leads me to think
we ought to print the zeroes.
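
The rule can be sketched roughly as follows. This is an illustrative Python model, not PostgreSQL code: `node_properties`, `possible`, and `observed` are hypothetical names, with `possible` standing for what the plan shape allows (known at plan time) and `observed` for what a particular execution counted (known at run time).

```python
from typing import Dict


def node_properties(possible: Dict[str, bool], observed: Dict[str, int]) -> Dict[str, int]:
    """Build the structured output for one plan node.

    possible[name] says whether event `name` can occur in this plan shape;
    observed holds the counts from one execution.
    """
    out = {}
    for name, can_happen in possible.items():
        if can_happen:
            # Could have happened in some execution of this plan: always
            # print it, using zero when it did not happen this time.
            out[name] = observed.get(name, 0)
        # Could never happen in this plan: omit the entry entirely.
    return out
```

With this shape, the set of keys depends only on `possible`, so two executions of the same plan always produce the same keys and differ only in their values.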

                        regards, tom lane


Re: Open Item: Should non-text EXPLAIN always show properties?

James Coleman
On Thu, Jun 25, 2020 at 12:33 PM Tom Lane <[hidden email]> wrote:

>
> Robert Haas <[hidden email]> writes:
> > On Thu, Jun 25, 2020 at 8:42 AM James Coleman <[hidden email]> wrote:
> >> Yesterday I'd replied [1] to Justin's proposal for this WRT
> >> incremental sort and expressed my opinion that including both
> >> unnecessarily (i.e., including disk when an in-memory sort was used)
> >> is undesirable and confusing and leads to shortcuts I believe to be
> >> bad habits when using the data programmatically.
>
> > +1.
>
> I think the policy about non-text output formats is "all applicable
> fields should be included automatically".  But the key word there is
> "applicable".  Are disk-sort numbers applicable when no disk sort
> happened?
>
> I think the right way to think about this is that we are building
> an output data structure according to a schema that should be fixed
> for any particular plan shape.  If event X happened zero times in
> a given execution, but it could have happened in a different execution
> of the same plan, then we should print X with a zero count.  If X
> could not happen period in this plan, we should omit X's entry.
>
> So the real question here is whether the disk vs memory decision is
> plan time vs run time.  AFAIK it's run time, which leads me to think
> we ought to print the zeroes.

Do we print zeroes for memory usage when all sorts ended up spilling
to disk then? That might be the current behavior; I'd have to check.
Because that's a lie, but we don't have any better information
currently (which is unfortunate, but hardly in scope for fixing here.)

James


Re: Open Item: Should non-text EXPLAIN always show properties?

Tom Lane
James Coleman <[hidden email]> writes:
> On Thu, Jun 25, 2020 at 12:33 PM Tom Lane <[hidden email]> wrote:
>> I think the right way to think about this is that we are building
>> an output data structure according to a schema that should be fixed
>> for any particular plan shape.  If event X happened zero times in
>> a given execution, but it could have happened in a different execution
>> of the same plan, then we should print X with a zero count.  If X
>> could not happen period in this plan, we should omit X's entry.

> Do we print zeroes for memory usage when all sorts ended up spilling
> to disk then?

I did not claim that the pre-existing code adheres to this model
completely faithfully ;-).  But we ought to have a clear mental
picture of what it is we're trying to achieve.  If you don't like
the above design, propose a different one.

                        regards, tom lane


Re: Open Item: Should non-text EXPLAIN always show properties?

David Rowley
In reply to this post by Tom Lane
On Fri, 26 Jun 2020 at 04:33, Tom Lane <[hidden email]> wrote:

>
> Robert Haas <[hidden email]> writes:
> > On Thu, Jun 25, 2020 at 8:42 AM James Coleman <[hidden email]> wrote:
> >> Yesterday I'd replied [1] to Justin's proposal for this WRT
> >> incremental sort and expressed my opinion that including both
> >> unnecessarily (i.e., including disk when an in-memory sort was used)
> >> is undesirable and confusing and leads to shortcuts I believe to be
> >> bad habits when using the data programmatically.
>
> > +1.
>
> I think the policy about non-text output formats is "all applicable
> fields should be included automatically".  But the key word there is
> "applicable".  Are disk-sort numbers applicable when no disk sort
> happened?
>
> I think the right way to think about this is that we are building
> an output data structure according to a schema that should be fixed
> for any particular plan shape.  If event X happened zero times in
> a given execution, but it could have happened in a different execution
> of the same plan, then we should print X with a zero count.  If X
> could not happen period in this plan, we should omit X's entry.
>
> So the real question here is whether the disk vs memory decision is
> plan time vs run time.  AFAIK it's run time, which leads me to think
> we ought to print the zeroes.

I think that's a pretty good way of thinking about it.

For the HashAgg case, the plan could end up spilling, so based on what
you've said, we should be printing those zeros as some other execution
of the same plan could spill.
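
A rough sketch of what that means for a program reading the JSON format. The sample document below is hand-written for illustration, not captured from a server, and `spill_status` is a hypothetical helper; it assumes only the "HashAgg Batches" and "Disk Usage" labels discussed here plus the usual "Plan" and "Plans" nesting of the JSON output.

```python
import json


def spill_status(node: dict) -> str:
    """Classify one plan node by its hash aggregate spill information."""
    if "HashAgg Batches" not in node:
        # Key absent: this node does not report spill information at all.
        return "not applicable"
    # Key present: zero means this execution stayed in memory.
    return "spilled" if node["HashAgg Batches"] > 0 else "in memory"


# Hand-written sample, not real EXPLAIN output.
sample = json.loads("""
[{"Plan": {"Node Type": "Aggregate",
           "HashAgg Batches": 0,
           "Disk Usage": 0,
           "Plans": [{"Node Type": "Seq Scan"}]}}]
""")
```

Because the zeros are printed, the consumer can tell "could have spilled but did not" apart from "does not report spilling" without guessing from a missing key.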

If nobody objects to that very soon, then I'll go ahead and push the
changes for HashAgg's non-text EXPLAIN ANALYZE.

David


Re: Open Item: Should non-text EXPLAIN always show properties?

David Rowley
In reply to this post by David Rowley
On Thu, 25 Jun 2020 at 21:15, David Rowley <[hidden email]> wrote:
> I've attached a small patch that changes the Hash Aggregate behaviour
> to always show these properties for non-text formats.

I've pushed this change for HashAgg only and marked the open item as
completed for hash agg.  I'll leave it up to Justin, Tomas and James
to decide what to do with the incremental sort EXPLAIN open item.

David