Duplicate Workers entries in some EXPLAIN plans

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Duplicate Workers entries in some EXPLAIN plans

Maciek Sakrejda-2
Hello,

I originally reported this in pgsql-bugs [0], but there wasn't much feedback there, so moving the discussion here. When using JSON, YAML, or XML-format EXPLAIN on a plan that uses a parallelized sort, the Sort nodes list two different entries for "Workers", one for the sort-related info, and one for general worker info. This is what this looks like in JSON (some details elided):

{
  "Node Type": "Sort",
  ...
  "Workers": [
    {
      "Worker Number": 0,
      "Sort Method": "external merge",
      "Sort Space Used": 20128,
      "Sort Space Type": "Disk"
    },
    {
      "Worker Number": 1,
      "Sort Method": "external merge",
      "Sort Space Used": 20128,
      "Sort Space Type": "Disk"
    }
  ],
  ...
  "Workers": [
    {
      "Worker Number": 0,
      "Actual Startup Time": 309.726,
      "Actual Total Time": 310.179,
      "Actual Rows": 4128,
      "Actual Loops": 1,
      "Shared Hit Blocks": 2872,
      "Shared Read Blocks": 7584,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 490,
      "Temp Written Blocks": 2529
    },
    {
      "Worker Number": 1,
      "Actual Startup Time": 306.523,
      "Actual Total Time": 307.001,
      "Actual Rows": 4128,
      "Actual Loops": 1,
      "Shared Hit Blocks": 3356,
      "Shared Read Blocks": 7100,
      "Shared Dirtied Blocks": 0,
      "Shared Written Blocks": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Temp Read Blocks": 490,
      "Temp Written Blocks": 2529
    }
  ],
  "Plans:" ...
}

This is technically valid JSON, but it's extremely difficult to work with, since default JSON parsing in Ruby, node, Python, Go, and even Postgres' own jsonb only keep the latest key--the sort information is discarded (other languages probably don't fare much better; this is what I had on hand). As Tom Lane pointed out in my pgsql-bugs thread, this has been reported before [1] and in that earlier thread, Andrew Dunstan suggested that perhaps the simplest solution is to just rename the sort-related Workers node. Tom expressed some concerns about a breaking change here, though I think the current behavior means vanishingly few users are parsing this data correctly. Thoughts?

Thanks,
Maciek

[0]: https://www.postgresql.org/message-id/CADXhmgSr807j2Pc9aUjW2JOzOBe3FeYnQBe_f9U%2B-Mm4b1HRUw%40mail.gmail.com
Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Workers entries in some EXPLAIN plans

Andres Freund
Hi,

On 2019-10-22 11:58:35 -0700, Maciek Sakrejda wrote:

> I originally reported this in pgsql-bugs [0], but there wasn't much
> feedback there, so moving the discussion here. When using JSON, YAML, or
> XML-format EXPLAIN on a plan that uses a parallelized sort, the Sort nodes
> list two different entries for "Workers", one for the sort-related info,
> and one for general worker info. This is what this looks like in JSON (some
> details elided):
>
> {
>   "Node Type": "Sort",
>   ...
>   "Workers": [
>     {
>       "Worker Number": 0,
>       "Sort Method": "external merge",
>       "Sort Space Used": 20128,
>       "Sort Space Type": "Disk"
>     },
>     {
>       "Worker Number": 1,
>       "Sort Method": "external merge",
>       "Sort Space Used": 20128,
>       "Sort Space Type": "Disk"
>     }
>   ],
>   ...
>   "Workers": [
>     {

> This is technically valid JSON, but it's extremely difficult to work with,
> since default JSON parsing in Ruby, node, Python, Go, and even Postgres'
> own jsonb only keep the latest key

It's also quite confusing.


> As Tom Lane pointed out in my pgsql-bugs thread, this has been
> reported before [1] and in that earlier thread, Andrew Dunstan suggested
> that perhaps the simplest solution is to just rename the sort-related
> Workers node.  Thoughts?

Yea, I think we should fix this. The current output basically makes no
sense.


> Tom expressed some concerns about a breaking change here,
> though I think the current behavior means vanishingly few users are parsing
> this data correctly.

Well, in a lot of the cases there's no parallel output for the sort, and
in other cases BUFFERS is not specified. In either case the 'duplicate
key' problem won't exist then.


While Tom said:

On 2019-10-16 09:16:56 +0200, Tom Lane wrote:
> I think the text-mode output is intentional, but the other formats
> need more work.

  Sort Method: external merge  Disk: 4920kB
  Worker 0:  Sort Method: external merge  Disk: 5880kB
  Worker 1:  Sort Method: external merge  Disk: 5920kB
  Buffers: shared hit=682 read=10188, temp read=1415 written=2101
  Worker 0: actual time=130.058..130.324 rows=1324 loops=1
    Buffers: shared hit=337 read=3489, temp read=505 written=739
  Worker 1: actual time=130.273..130.512 rows=1297 loops=1
    Buffers: shared hit=345 read=3507, temp read=505 written=744

I don't think this is close to being good enough to be worth
preserving. I think it's worth avoiding unnecessary breakage of explain
output, but we also shouldn't endlessly carry forward confusing output,
just because of that.

It clearly seems like it'd be better if this instead were

  Sort Method: external merge  Disk: 4920kB
  Buffers: shared hit=682 read=10188, temp read=1415 written=2101
  Worker 0: actual time=130.058..130.324 rows=1324 loops=1
    Sort Method: external merge  Disk: 5880kB
    Buffers: shared hit=337 read=3489, temp read=505 written=739
  Worker 1: actual time=130.273..130.512 rows=1297 loops=1
    Buffers: shared hit=345 read=3507, temp read=505 written=744
    Sort Method: external merge  Disk: 5920kB

I think the way this information was added in bf11e7ee2e36 and
33001fd7a707, contrasting to the output added in b287df70e408, is just
not right. If we add similar instrumentation reporting to more nodes,
we'll end up with duplicated information all over.  Additionally the
per-worker part of show_sort_info() basically just duplicated the rest
of the function.  I then also did something similar (although luckily
with a different key...), with the ExplainPrintJIT() call for Gather
nodes.

Unfortunately I think the fix isn't all that trivial, due to the way we
output the per-worker information at the end of ExplainNode(), by just
dumping things into a string.  It seems to me that a step in the right
direction would be for ExplainNode() to create
planstate->worker_instrument StringInfos, which can be handed to
routines like show_sort_info(), which would print the per-node
information into that, rather than directly dumping into
es->output. Most of the current "Show worker detail" would be done
earlier in ExplainNode(), at the place where we current display the
"actual rows" bit.

ISTM that should include removing the duplication fo the the contents of
show_sort_info(), and probably also for the Gather, GatherMerge blocks
(I've apparently skipped adding the JIT information to the latter, not
sure if we ought to fix that in the stable branches).

Any chance you want to take a stab at that?

I don't think we'll fix it soon, but damn, all this string appending
around just isn't a great way to reliably build nested data formats.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Workers entries in some EXPLAIN plans

Maciek Sakrejda-2
On Thu, Oct 24, 2019 at 6:48 PM Andres Freund <[hidden email]> wrote:

> Unfortunately I think the fix isn't all that trivial, due to the way we
> output the per-worker information at the end of ExplainNode(), by just
> dumping things into a string.  It seems to me that a step in the right
> direction would be for ExplainNode() to create
> planstate->worker_instrument StringInfos, which can be handed to
> routines like show_sort_info(), which would print the per-node
> information into that, rather than directly dumping into
> es->output. Most of the current "Show worker detail" would be done
> earlier in ExplainNode(), at the place where we current display the
> "actual rows" bit.
>
> ISTM that should include removing the duplication fo the the contents of
> show_sort_info(), and probably also for the Gather, GatherMerge blocks
> (I've apparently skipped adding the JIT information to the latter, not
> sure if we ought to fix that in the stable branches).
>
> Any chance you want to take a stab at that?
It took me a while, but I did take a stab at it (thanks for your
off-list help). Attached is my patch that changes the structured
formats to merge sort worker output in with costs/timing/buffers
worker output. I have not touched any other worker output yet, since
it's not under a Workers group as far as I can tell (so it does not
exhibit the problem I originally reported). I can try to make further
changes here if the approach is deemed sound. I also have not touched
text output; above you had proposed

>  Sort Method: external merge  Disk: 4920kB
>  Buffers: shared hit=682 read=10188, temp read=1415 written=2101
>  Worker 0: actual time=130.058..130.324 rows=1324 loops=1
>    Sort Method: external merge  Disk: 5880kB
>    Buffers: shared hit=337 read=3489, temp read=505 written=739
>  Worker 1: actual time=130.273..130.512 rows=1297 loops=1
>    Buffers: shared hit=345 read=3507, temp read=505 written=744
>    Sort Method: external merge  Disk: 5920kB

which makes sense to me, but I'd like to confirm this is the approach
we want before I add it to the patch.

This is my first C in close to a decade (and I was never much of a C
programmer to begin with), so please be gentle.

As Andres suggested off-list, I also changed the worker output to
order fields that also occur in the parent node in the same way as the
parent node.

I've also added a test for the patch, and because this is really an
EXPLAIN issue rather than a query feature issue, I added a
src/test/regress/sql/explain.sql for the test. I added a couple of
utility functions for munging json-formatted EXPLAIN plans into
something we can repeatably verify in regression tests (the functions
use json rather than jsonb to preserve field order). I have not added
this for YAML or XML (even though they should behave the same way),
since I'm not familiar with the the functions to manipulate those data
types in a similar way (if they exist). My hunch is due to the
similarity of structured formats, just testing JSON is enough, but I
can expand/adjust tests as necessary.

merge-explain-worker-output.patch (25K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Workers entries in some EXPLAIN plans

Maciek Sakrejda-2
I wanted to follow up on this patch since I received no feedback. What should my next steps be (besides rebasing, though I want to confirm there's interest before I do that)?
Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Workers entries in some EXPLAIN plans

Julien Rouhaud
On Fri, Dec 27, 2019 at 12:31 AM Maciek Sakrejda <[hidden email]> wrote:
>
> I wanted to follow up on this patch since I received no feedback. What should my next steps be (besides rebasing, though I want to confirm there's interest before I do that)?

Given Andres' answer I'd say that there's interest in this patch.  You
should register this patch in the next commitfest
(https://commitfest.postgresql.org/26/) to make sure that it's not
forgotten, which unfortunately is probably what happened here .


Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Workers entries in some EXPLAIN plans

Maciek Sakrejda-2
Done! Thanks!
Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Workers entries in some EXPLAIN plans

Georgios Kokolatos
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

This is a high level review only. However, seeing that there is a conflict and does not merge cleanly after commit <b925a00f4ef>, I return to Author.

To be fair the resolution seems quite straight forward and I took the liberty of applying the necessary changes to include the new element of ExplainState introduced in the above commit, namely hide_workers. However since the author might have a different idea on how to incorporate this change I leave it up to him.

Another very high level comment is the introduction of a new test file, namely explain. Seeing `explain.sql` in the tests suits, personally and very opinion based, I would have expected the whole spectrum of the explain properties to be tested. However only a slight fraction of the functionality is tested. Since this is a bit more of a personal opinion, I don't expect any changes unless the author happens to agree.

Other than these minor nitpick, the code seems clear, concise and does what it says on the box. It follows the comments in the discussion thread(s) and solves a real issue.

Please have a look on how commit <b925a00f4ef> affects this patch and rebase.

The new status of this patch is: Waiting on Author
Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Workers entries in some EXPLAIN plans

Maciek Sakrejda-2
Thanks for the review! I looked at <b925a00f4ef> and rebased the patch
on current master, ac5bdf6.

I introduced a new test file because this bug is specifically about
EXPLAIN output (as opposed to query execution or planning
functionality), and it didn't seem like a test would fit in any of the
other files. I focused on testing just the behavior around this
specific bug (and fix). I think eventually we should probably test
other more fundamental EXPLAIN features (and I'm happy to contribute
to that) in that file, but that seems outside of the scope of this
patch.

Any thoughts on what we should do with text mode output (which is
untouched right now)? The output Andres proposed above makes sense to
me, but I'd like to get more input.

merge-explain-worker-output-v2.patch (25K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Workers entries in some EXPLAIN plans

Georgios Kokolatos
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

The current version of the patch (v2) applies cleanly and does what it says on the box.

> Any thoughts on what we should do with text mode output (which is
untouched right now)? The output Andres proposed above makes sense to
me, but I'd like to get more input.

Nothing to add beyond what is stated in the thread.

>  Sort Method: external merge  Disk: 4920kB
>  Buffers: shared hit=682 read=10188, temp read=1415 written=2101
>  Worker 0: actual time=130.058..130.324 rows=1324 loops=1
>    Sort Method: external merge  Disk: 5880kB
>    Buffers: shared hit=337 read=3489, temp read=505 written=739
>  Worker 1: actual time=130.273..130.512 rows=1297 loops=1
>    Buffers: shared hit=345 read=3507, temp read=505 written=744
>    Sort Method: external merge  Disk: 5920kB

This proposal seems like a fitting approach. Awaiting for v3 which
will include the text version. May I suggest a format yaml test case?
Just to make certain that no regressions occur in the future.

Thanks,
Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Workers entries in some EXPLAIN plans

Maciek Sakrejda-2
Sounds good, I'll try that format. Any idea how to test YAML? With the
JSON format, I was able to rely on Postgres' own JSON-manipulating
functions to strip or canonicalize fields that can vary across
executions--I can't really do that with YAML. Or should I run EXPLAIN
with COSTS OFF, TIMING OFF, SUMMARY OFF and assume that for simple
queries the BUFFERS output (and other fields I can't turn off like
Sort Space Used) *is* going to be stable?


Reply | Threaded
Open this post in threaded view
|

Re: Duplicate Workers entries in some EXPLAIN plans

Georgios Kokolatos
> Sounds good, I'll try that format. Any idea how to test YAML? With the
> JSON format, I was able to rely on Postgres' own JSON-manipulating
> functions to strip or canonicalize fields that can vary across
> executions--I can't really do that with YAML.

Yes, this approach was clear in the patch and works great with Json. Also
you are correct, this can not be done with YAML. I spend a bit of time to
look around and I could not find any tests really on yaml format.

> Or should I run EXPLAIN
> with COSTS OFF, TIMING OFF, SUMMARY OFF and assume that for simple
> queries the BUFFERS output (and other fields I can't turn off like
> Sort Space Used) *is* going to be stable?

I have to admit with the current diff tool used in pg_regress, this is not possible.
I am pretty certain that it *is not* going to be stable. Not for long anyways.
I withdraw my suggestion for YAML and currently awaiting for TEXT format only.