A reloption for partitioned tables - parallel_workers

A reloption for partitioned tables - parallel_workers

seamusabshere
hi,

It turns out parallel_workers may be a useful reloption for certain uses of partitioned tables, at least if they're made up of fancy column store partitions (see https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).

Would somebody tell me what I'm doing wrong? I would love to submit a patch but I'm stuck:

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..f1ade035ac 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3751,6 +3751,7 @@ compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
         * If the user has set the parallel_workers reloption, use that; otherwise
         * select a default number of workers.
         */
+       // I want to affect this
        if (rel->rel_parallel_workers != -1)
                parallel_workers = rel->rel_parallel_workers;
        else

so I do this

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c687d3ee9e..597b209bfb 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1961,13 +1961,15 @@ build_local_reloptions(local_relopts *relopts, Datum options, bool validate)
 bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
-       /*
-        * There are no options for partitioned tables yet, but this is able to do
-        * some validation.
-        */
+       static const relopt_parse_elt tab[] = {
+               {"parallel_workers", RELOPT_TYPE_INT,
+               offsetof(StdRdOptions, parallel_workers)},
+       };
+
        return (bytea *) build_reloptions(reloptions, validate,
                                          RELOPT_KIND_PARTITIONED,
-                                         0, NULL, 0);
+                                         sizeof(StdRdOptions),
+                                         tab, lengthof(tab));
 }

That "works":

postgres=# alter table test_3pd_cstore_partitioned set (parallel_workers = 33);
ALTER TABLE
postgres=# select relname, relkind, reloptions from pg_class where relname = 'test_3pd_cstore_partitioned';
           relname           | relkind |      reloptions      
-----------------------------+---------+-----------------------
 test_3pd_cstore_partitioned | p       | {parallel_workers=33}
(1 row)

But it seems to be ignored:

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..c68835ce38 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3751,6 +3751,8 @@ compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
         * If the user has set the parallel_workers reloption, use that; otherwise
         * select a default number of workers.
         */
+       // I want to affect this, but this assertion always passes
+       Assert(rel->rel_parallel_workers == -1);
        if (rel->rel_parallel_workers != -1)
                parallel_workers = rel->rel_parallel_workers;
        else

Thanks, and please forgive my code-pasting etiquette; this is my first post to pgsql-hackers and I'm not quite sure what the right format is.

Thank you,
Seamus


Reply | Threaded
Open this post in threaded view
|

Re: A reloption for partitioned tables - parallel_workers

Amit Langote
Hi Seamus,

On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere <[hidden email]> wrote:

> It turns out parallel_workers may be a useful reloption for certain uses of partitioned tables, at least if they're made up of fancy column store partitions (see https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).
>
> Would somebody tell me what I'm doing wrong? I would love to submit a patch but I'm stuck:
>
> diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
> index cd3fdd259c..f1ade035ac 100644
> --- a/src/backend/optimizer/path/allpaths.c
> +++ b/src/backend/optimizer/path/allpaths.c
> @@ -3751,6 +3751,7 @@ compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
>          * If the user has set the parallel_workers reloption, use that; otherwise
>          * select a default number of workers.
>          */
> +       // I want to affect this
>         if (rel->rel_parallel_workers != -1)
>                 parallel_workers = rel->rel_parallel_workers;
>         else
>
> so I do this
>
> diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
> index c687d3ee9e..597b209bfb 100644
> --- a/src/backend/access/common/reloptions.c
> +++ b/src/backend/access/common/reloptions.c
> @@ -1961,13 +1961,15 @@ build_local_reloptions(local_relopts *relopts, Datum options, bool validate)
>  bytea *
>  partitioned_table_reloptions(Datum reloptions, bool validate)
>  {
> -       /*
> -        * There are no options for partitioned tables yet, but this is able to do
> -        * some validation.
> -        */
> +       static const relopt_parse_elt tab[] = {
> +               {"parallel_workers", RELOPT_TYPE_INT,
> +               offsetof(StdRdOptions, parallel_workers)},
> +       };
> +
>         return (bytea *) build_reloptions(reloptions, validate,
>                                                                           RELOPT_KIND_PARTITIONED,
> -                                                                         0, NULL, 0);
> +                                                                         sizeof(StdRdOptions),
> +                                                                         tab, lengthof(tab));
>  }
>
> That "works":
>
> postgres=# alter table test_3pd_cstore_partitioned set (parallel_workers = 33);
> ALTER TABLE
> postgres=# select relname, relkind, reloptions from pg_class where relname = 'test_3pd_cstore_partitioned';
>            relname           | relkind |      reloptions
> -----------------------------+---------+-----------------------
>  test_3pd_cstore_partitioned | p       | {parallel_workers=33}
> (1 row)
>
> But it seems to be ignored:
>
> diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
> index cd3fdd259c..c68835ce38 100644
> --- a/src/backend/optimizer/path/allpaths.c
> +++ b/src/backend/optimizer/path/allpaths.c
> @@ -3751,6 +3751,8 @@ compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
>          * If the user has set the parallel_workers reloption, use that; otherwise
>          * select a default number of workers.
>          */
> +       // I want to affect this, but this assertion always passes
> +       Assert(rel->rel_parallel_workers == -1)
>         if (rel->rel_parallel_workers != -1)
>                 parallel_workers = rel->rel_parallel_workers;
>         else

You may see by inspecting the callers of compute_parallel_worker()
that it never gets called on a partitioned table, only its leaf
partitions.  Maybe you could try calling compute_parallel_worker()
somewhere in add_paths_to_append_rel(), which has this code to figure
out parallel_workers to use for a parallel Append path for a given
partitioned table:

        /* Find the highest number of workers requested for any subpath. */
        foreach(lc, partial_subpaths)
        {
            Path       *path = lfirst(lc);

            parallel_workers = Max(parallel_workers, path->parallel_workers);
        }
        Assert(parallel_workers > 0);

        /*
         * If the use of parallel append is permitted, always request at least
         * log2(# of children) workers.  We assume it can be useful to have
         * extra workers in this case because they will be spread out across
         * the children.  The precise formula is just a guess, but we don't
         * want to end up with a radically different answer for a table with N
         * partitions vs. an unpartitioned table with the same data, so the
         * use of some kind of log-scaling here seems to make some sense.
         */
        if (enable_parallel_append)
        {
            parallel_workers = Max(parallel_workers,
                                   fls(list_length(live_childrels)));
            parallel_workers = Min(parallel_workers,
                                   max_parallel_workers_per_gather);
        }
        Assert(parallel_workers > 0);

        /* Generate a partial append path. */
        appendpath = create_append_path(root, rel, NIL, partial_subpaths,
                                        NIL, NULL, parallel_workers,
                                        enable_parallel_append,
                                        -1);

Note that the 'rel' in this code refers to the partitioned table for
which an Append path is being considered, so compute_parallel_worker()
using that 'rel' would use the partitioned table's
rel_parallel_workers as you are trying to do.
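
For illustration, a minimal sketch of that suggestion (untested, and the
exact guard conditions are left open here):

        /*
         * Hypothetical: seed parallel_workers from the parent partitioned
         * table before looking at the subpaths.  With heap_pages and
         * index_pages passed as -1, compute_parallel_worker() returns
         * rel_parallel_workers (clamped to the given maximum) when the
         * reloption is set, and 0 otherwise.
         */
        parallel_workers = compute_parallel_worker(rel, -1, -1,
                                                   max_parallel_workers_per_gather);

        /* Then fold in the per-subpath maxima as the existing code does. */
        foreach(lc, partial_subpaths)
        {
            Path       *path = lfirst(lc);

            parallel_workers = Max(parallel_workers, path->parallel_workers);
        }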

--
Amit Langote
EDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: A reloption for partitioned tables - parallel_workers

seamusabshere
hi,

Here we go, my first patch... solves https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@...

Best,
Seamus

On Mon, Feb 15, 2021, at 3:53 AM, Amit Langote wrote:

> Hi Seamus,
>
> On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere <[hidden email]> wrote:
> > It turns out parallel_workers may be a useful reloption for certain uses of partitioned tables, at least if they're made up of fancy column store partitions (see https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).
> >
> > Would somebody tell me what I'm doing wrong? I would love to submit a patch but I'm stuck:
> >
> > diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
> > index cd3fdd259c..f1ade035ac 100644
> > --- a/src/backend/optimizer/path/allpaths.c
> > +++ b/src/backend/optimizer/path/allpaths.c
> > @@ -3751,6 +3751,7 @@ compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
> >          * If the user has set the parallel_workers reloption, use that; otherwise
> >          * select a default number of workers.
> >          */
> > +       // I want to affect this
> >         if (rel->rel_parallel_workers != -1)
> >                 parallel_workers = rel->rel_parallel_workers;
> >         else
> >
> > so I do this
> >
> > diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
> > index c687d3ee9e..597b209bfb 100644
> > --- a/src/backend/access/common/reloptions.c
> > +++ b/src/backend/access/common/reloptions.c
> > @@ -1961,13 +1961,15 @@ build_local_reloptions(local_relopts *relopts, Datum options, bool validate)
> >  bytea *
> >  partitioned_table_reloptions(Datum reloptions, bool validate)
> >  {
> > -       /*
> > -        * There are no options for partitioned tables yet, but this is able to do
> > -        * some validation.
> > -        */
> > +       static const relopt_parse_elt tab[] = {
> > +               {"parallel_workers", RELOPT_TYPE_INT,
> > +               offsetof(StdRdOptions, parallel_workers)},
> > +       };
> > +
> >         return (bytea *) build_reloptions(reloptions, validate,
> >                                                                           RELOPT_KIND_PARTITIONED,
> > -                                                                         0, NULL, 0);
> > +                                                                         sizeof(StdRdOptions),
> > +                                                                         tab, lengthof(tab));
> >  }
> >
> > That "works":
> >
> > postgres=# alter table test_3pd_cstore_partitioned set (parallel_workers = 33);
> > ALTER TABLE
> > postgres=# select relname, relkind, reloptions from pg_class where relname = 'test_3pd_cstore_partitioned';
> >            relname           | relkind |      reloptions
> > -----------------------------+---------+-----------------------
> >  test_3pd_cstore_partitioned | p       | {parallel_workers=33}
> > (1 row)
> >
> > But it seems to be ignored:
> >
> > diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
> > index cd3fdd259c..c68835ce38 100644
> > --- a/src/backend/optimizer/path/allpaths.c
> > +++ b/src/backend/optimizer/path/allpaths.c
> > @@ -3751,6 +3751,8 @@ compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
> >          * If the user has set the parallel_workers reloption, use that; otherwise
> >          * select a default number of workers.
> >          */
> > +       // I want to affect this, but this assertion always passes
> > +       Assert(rel->rel_parallel_workers == -1)
> >         if (rel->rel_parallel_workers != -1)
> >                 parallel_workers = rel->rel_parallel_workers;
> >         else
>
> You may see by inspecting the callers of compute_parallel_worker()
> that it never gets called on a partitioned table, only its leaf
> partitions.  Maybe you could try calling compute_parallel_worker()
> somewhere in add_paths_to_append_rel(), which has this code to figure
> out parallel_workers to use for a parallel Append path for a given
> partitioned table:
>
>         /* Find the highest number of workers requested for any subpath. */
>         foreach(lc, partial_subpaths)
>         {
>             Path       *path = lfirst(lc);
>
>             parallel_workers = Max(parallel_workers, path->parallel_workers);
>         }
>         Assert(parallel_workers > 0);
>
>         /*
>          * If the use of parallel append is permitted, always request at least
>          * log2(# of children) workers.  We assume it can be useful to have
>          * extra workers in this case because they will be spread out across
>          * the children.  The precise formula is just a guess, but we don't
>          * want to end up with a radically different answer for a table with N
>          * partitions vs. an unpartitioned table with the same data, so the
>          * use of some kind of log-scaling here seems to make some sense.
>          */
>         if (enable_parallel_append)
>         {
>             parallel_workers = Max(parallel_workers,
>                                    fls(list_length(live_childrels)));
>             parallel_workers = Min(parallel_workers,
>                                    max_parallel_workers_per_gather);
>         }
>         Assert(parallel_workers > 0);
>
>         /* Generate a partial append path. */
>         appendpath = create_append_path(root, rel, NIL, partial_subpaths,
>                                         NIL, NULL, parallel_workers,
>                                         enable_parallel_append,
>                                         -1);
>
> Note that the 'rel' in this code refers to the partitioned table for
> which an Append path is being considered, so compute_parallel_worker()
> using that 'rel' would use the partitioned table's
> rel_parallel_workers as you are trying to do.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com
>

[Attachment: 0001-Allow-setting-parallel_workers-on-partitioned-tables.patch (3K)]

Re: A reloption for partitioned tables - parallel_workers

Laurenz Albe
In reply to this post by Amit Langote
On Mon, 2021-02-15 at 17:53 +0900, Amit Langote wrote:

> On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere <[hidden email]> wrote:
> > It turns out parallel_workers may be a useful reloption for certain uses of partitioned tables,
> >  at least if they're made up of fancy column store partitions (see
> >  https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).
> > Would somebody tell me what I'm doing wrong? I would love to submit a patch but I'm stuck:
>
> You may see by inspecting the callers of compute_parallel_worker()
> that it never gets called on a partitioned table, only its leaf
> partitions.  Maybe you could try calling compute_parallel_worker()
> somewhere in add_paths_to_append_rel(), which has this code to figure
> out parallel_workers to use for a parallel Append path for a given
> partitioned table:
>
>         /* Find the highest number of workers requested for any subpath. */
>         foreach(lc, partial_subpaths)
>         {
>             Path       *path = lfirst(lc);
>
>             parallel_workers = Max(parallel_workers, path->parallel_workers);
>         }
>         Assert(parallel_workers > 0);
>
>         /*
>          * If the use of parallel append is permitted, always request at least
>          * log2(# of children) workers.  We assume it can be useful to have
>          * extra workers in this case because they will be spread out across
>          * the children.  The precise formula is just a guess, but we don't
>          * want to end up with a radically different answer for a table with N
>          * partitions vs. an unpartitioned table with the same data, so the
>          * use of some kind of log-scaling here seems to make some sense.
>          */
>         if (enable_parallel_append)
>         {
>             parallel_workers = Max(parallel_workers,
>                                    fls(list_length(live_childrels)));
>             parallel_workers = Min(parallel_workers,
>                                    max_parallel_workers_per_gather);
>         }
>         Assert(parallel_workers > 0);
>
> Note that the 'rel' in this code refers to the partitioned table for
> which an Append path is being considered, so compute_parallel_worker()
> using that 'rel' would use the partitioned table's
> rel_parallel_workers as you are trying to do.

Note that there is a second chunk of code quite like that one a few
lines down from there that would also have to be modified.

I am +1 on allowing the degree of parallelism on a parallel append to
be overridden.  If "parallel_workers" on the partitioned table is an option
for that, it might be a simple solution.  On the other hand, perhaps it would
be less confusing to have a different storage parameter name rather than
having "parallel_workers" do double duty.

Also, since there is a design rule that storage parameters can only be used
on partitions, we would have to change that - is that a problem for anybody?


There is another consideration that doesn't need to be addressed
by this patch, but that is somewhat related: if the executor prunes some
partitions, the degree of parallelism is unaffected, right?
So if the planner decides to use 24 workers for 25 partitions, and the
executor discards all but one of these partition scans, we would end up
with 24 workers scanning a single partition.

I am not sure how that could be improved.

Yours,
Laurenz Albe




Re: A reloption for partitioned tables - parallel_workers

Amit Langote
In reply to this post by seamusabshere
Hi Seamus,

Please see my reply below.

On Tue, Feb 16, 2021 at 1:35 AM Seamus Abshere <[hidden email]> wrote:

> On Mon, Feb 15, 2021, at 3:53 AM, Amit Langote wrote:
> > On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere <[hidden email]> wrote:
> > > It turns out parallel_workers may be a useful reloption for certain uses of partitioned tables, at least if they're made up of fancy column store partitions (see https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).
> > >
> > > Would somebody tell me what I'm doing wrong? I would love to submit a patch but I'm stuck:
>
> > You may see by inspecting the callers of compute_parallel_worker()
> > that it never gets called on a partitioned table, only its leaf
> > partitions.  Maybe you could try calling compute_parallel_worker()
> > somewhere in add_paths_to_append_rel(), which has this code to figure
> > out parallel_workers to use for a parallel Append path for a given
> > partitioned table:
> >
> >         /* Find the highest number of workers requested for any subpath. */
> >         foreach(lc, partial_subpaths)
> >         {
> >             Path       *path = lfirst(lc);
> >
> >             parallel_workers = Max(parallel_workers, path->parallel_workers);
> >         }
> >         Assert(parallel_workers > 0);
> >
> >         /*
> >          * If the use of parallel append is permitted, always request at least
> >          * log2(# of children) workers.  We assume it can be useful to have
> >          * extra workers in this case because they will be spread out across
> >          * the children.  The precise formula is just a guess, but we don't
> >          * want to end up with a radically different answer for a table with N
> >          * partitions vs. an unpartitioned table with the same data, so the
> >          * use of some kind of log-scaling here seems to make some sense.
> >          */
> >         if (enable_parallel_append)
> >         {
> >             parallel_workers = Max(parallel_workers,
> >                                    fls(list_length(live_childrels)));
> >             parallel_workers = Min(parallel_workers,
> >                                    max_parallel_workers_per_gather);
> >         }
> >         Assert(parallel_workers > 0);
> >
> >         /* Generate a partial append path. */
> >         appendpath = create_append_path(root, rel, NIL, partial_subpaths,
> >                                         NIL, NULL, parallel_workers,
> >                                         enable_parallel_append,
> >                                         -1);
> >
> > Note that the 'rel' in this code refers to the partitioned table for
> > which an Append path is being considered, so compute_parallel_worker()
> > using that 'rel' would use the partitioned table's
> > rel_parallel_workers as you are trying to do.
>
> Here we go, my first patch... solves https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@...

Thanks for sending the patch here.

It seems you haven't made enough changes for the reloptions code to
recognize parallel_workers as valid for partitioned tables, because
even with the patch applied, I get this:

create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (minvalue) to (0);
create table rp2 partition of rp for values from (0) to (maxvalue);
alter table rp set (parallel_workers = 1);
ERROR:  unrecognized parameter "parallel_workers"

You need this:

diff --git a/src/backend/access/common/reloptions.c
b/src/backend/access/common/reloptions.c
index 029a73325e..9eb8a0c10d 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
        {
            "parallel_workers",
            "Number of parallel processes that can be used per
executor node for this relation.",
-           RELOPT_KIND_HEAP,
+           RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
            ShareUpdateExclusiveLock
        },
        -1, 0, 1024

which tells reloptions parsing code that parallel_workers is
acceptable for both heap and partitioned relations.

Other comments on the patch:

1. This function calling style, where the first argument is not placed
on the same line as the function name, is not very common in
Postgres:

+       /* First see if there is a root-level setting for parallel_workers */
+       parallel_workers = compute_parallel_worker(
+           rel,
+           -1,
+           -1,
+           max_parallel_workers_per_gather
+       );

This makes the new code look very different from the rest of the
codebase.  Better to stick to existing styles.

2. It might be a good idea to use this opportunity to add a function,
say compute_append_parallel_workers(), encapsulating the code that does
what the function name says.  Then the patch will simply add the new
compute_parallel_worker() call at the top of that function.

3. I think we should consider the Append parent relation's
parallel_workers ONLY if it is a partitioned relation, because it
doesn't make a lot of sense for other types of parent relations.  So
the new code should look like this:

if (IS_PARTITIONED_REL(rel))
    parallel_workers = compute_parallel_worker(rel, -1, -1,
max_parallel_workers_per_gather);

4. Maybe it also doesn't make sense to consider the parent relation's
parallel_workers if Parallel Append is disabled
(enable_parallel_append = off).  That's because with a simple
(non-parallel) Append running under Gather, all launched parallel
workers process the same partition before moving to the next one.
OTOH, one's intention of setting parallel_workers on the parent
partitioned table would most likely be to distribute workers across
partitions, which is only possible with parallel Append
(enable_parallel_append = on).  So, the new code should look like
this:

if (IS_PARTITIONED_REL(rel) && enable_parallel_append)
    parallel_workers = compute_parallel_worker(rel, -1, -1,
max_parallel_workers_per_gather);
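
Putting 3 and 4 together, the top of the new function suggested in 2
might look like this (just a sketch, with the existing heuristics
elided):

static int
compute_append_parallel_workers(RelOptInfo *rel, List *subpaths,
                                int num_live_children,
                                bool parallel_append)
{
    int         parallel_workers = 0;

    /*
     * Consult the parent's reloption only for partitioned rels, and
     * only when a Parallel Append plan is being considered.
     */
    if (IS_PARTITIONED_REL(rel) && parallel_append)
        parallel_workers = compute_parallel_worker(rel, -1, -1,
                                                   max_parallel_workers_per_gather);

    /* ... the existing subpath-max and log2 logic would follow here ... */

    return parallel_workers;
}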

BTW, please consider bottom-posting like I've done in this reply,
because that makes it easier to follow discussions involving patch
reviews that can potentially take many emails to reach conclusions.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: A reloption for partitioned tables - parallel_workers

Amit Langote
In reply to this post by Laurenz Albe
Hi,

On Tue, Feb 16, 2021 at 1:06 AM Laurenz Albe <[hidden email]> wrote:

> On Mon, 2021-02-15 at 17:53 +0900, Amit Langote wrote:
> > On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere <[hidden email]> wrote:
> > > It turns out parallel_workers may be a useful reloption for certain uses of partitioned tables,
> > >  at least if they're made up of fancy column store partitions (see
> > >  https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com).
> > > Would somebody tell me what I'm doing wrong? I would love to submit a patch but I'm stuck:
> >
> > You may see by inspecting the callers of compute_parallel_worker()
> > that it never gets called on a partitioned table, only its leaf
> > partitions.  Maybe you could try calling compute_parallel_worker()
> > somewhere in add_paths_to_append_rel(), which has this code to figure
> > out parallel_workers to use for a parallel Append path for a given
> > partitioned table:
> >
> >         /* Find the highest number of workers requested for any subpath. */
> >         foreach(lc, partial_subpaths)
> >         {
> >             Path       *path = lfirst(lc);
> >
> >             parallel_workers = Max(parallel_workers, path->parallel_workers);
> >         }
> >         Assert(parallel_workers > 0);
> >
> >         /*
> >          * If the use of parallel append is permitted, always request at least
> >          * log2(# of children) workers.  We assume it can be useful to have
> >          * extra workers in this case because they will be spread out across
> >          * the children.  The precise formula is just a guess, but we don't
> >          * want to end up with a radically different answer for a table with N
> >          * partitions vs. an unpartitioned table with the same data, so the
> >          * use of some kind of log-scaling here seems to make some sense.
> >          */
> >         if (enable_parallel_append)
> >         {
> >             parallel_workers = Max(parallel_workers,
> >                                    fls(list_length(live_childrels)));
> >             parallel_workers = Min(parallel_workers,
> >                                    max_parallel_workers_per_gather);
> >         }
> >         Assert(parallel_workers > 0);
> >
> > Note that the 'rel' in this code refers to the partitioned table for
> > which an Append path is being considered, so compute_parallel_worker()
> > using that 'rel' would use the partitioned table's
> > rel_parallel_workers as you are trying to do.
>
> Note that there is a second chunk of code quite like that one a few
> lines down from there that would also have to be modified.
>
> I am +1 on allowing to override the degree of parallelism on a parallel
> append.  If "parallel_workers" on the partitioned table is an option for
> that, it might be a simple solution.  On the other hand, perhaps it would
> be less confusing to have a different storage parameter name rather than
> having "parallel_workers" do double duty.
>
> Also, since there is a design rule that storage parameters can only be used
> on partitions, we would have to change that - is that a problem for anybody?

I am not aware of a rule that suggests that parallel_workers is always
interpreted using storage-level considerations.  If that is indeed a
popular interpretation at this point, then yes, we should be open to
considering a new name for the parameter that this patch wants to add.

Maybe parallel_append_workers?  Perhaps not a bad idea in this patch's
case, but considering that we may want to expand the support of
cross-partition parallelism to operations other than querying, maybe
something else?

This reminds me of something I forgot to mention in my review of the
patch -- it should also update the documentation of parallel_workers
on the CREATE TABLE page to mention that it will be interpreted a bit
differently for partitioned tables than for regular storage-bearing
relations.  Workers specified for partitioned tables would be
distributed by the executor over its partitions, unlike with
storage-bearing relations, where the distribution of specified workers
is controlled by the AM using storage-level considerations.

> There is another related consideration that doesn't need to be addressed
> by this patch, but that is somewhat related: if the executor prunes some
> partitions, the degree of parallelism is unaffected, right?

That's correct.  Launched workers could be less than planned, but that
would not have anything to do with executor pruning.

> So if the planner decides to use 24 workers for 25 partitions, and the
> executor discards all but one of these partition scans, we would end up
> with 24 workers scanning a single partition.

I do remember pondering this when testing my patches to improve the
performance of executing a generic plan to scan a partitioned table
where runtime pruning is possible.  Here is an example:

create table hp (a int) partition by hash (a);
select 'create table hp' || i || ' partition of hp for values with
(modulus 100, remainder ' || i || ');' from generate_series(0, 99) i;
\gexec
insert into hp select generate_series(0, 1000000);
alter table hp set (parallel_workers = 16);
set plan_cache_mode to force_generic_plan;
set max_parallel_workers_per_gather to 16;
prepare q as select * from hp where a = $1;

explain (analyze, verbose) execute q (1);
                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..14426.50 rows=5675 width=4) (actual time=2.370..25.002 rows=1 loops=1)
   Output: hp.a
   Workers Planned: 16
   Workers Launched: 7
   ->  Parallel Append  (cost=0.00..12859.00 rows=400 width=4) (actual time=0.006..0.384 rows=0 loops=8)
         Worker 0:  actual time=0.001..0.001 rows=0 loops=1
         Worker 1:  actual time=0.001..0.001 rows=0 loops=1
         Worker 2:  actual time=0.001..0.001 rows=0 loops=1
         Worker 3:  actual time=0.001..0.001 rows=0 loops=1
         Worker 4:  actual time=0.001..0.001 rows=0 loops=1
         Worker 5:  actual time=0.001..0.001 rows=0 loops=1
         Worker 6:  actual time=0.001..0.001 rows=0 loops=1
         Subplans Removed: 99
         ->  Parallel Seq Scan on public.hp40 hp_1  (cost=0.00..126.50 rows=33 width=4) (actual time=0.041..3.060 rows=1 loops=1)
               Output: hp_1.a
               Filter: (hp_1.a = $1)
               Rows Removed by Filter: 9813
 Planning Time: 5.543 ms
 Execution Time: 25.139 ms
(19 rows)

deallocate q;
set max_parallel_workers_per_gather to 0;
prepare q as select * from hp where a = $1;
explain (analyze, verbose) execute q (1);
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..18754.88 rows=5675 width=4) (actual time=0.029..2.474 rows=1 loops=1)
   Subplans Removed: 99
   ->  Seq Scan on public.hp40 hp_1  (cost=0.00..184.25 rows=56 width=4) (actual time=0.028..2.471 rows=1 loops=1)
         Output: hp_1.a
         Filter: (hp_1.a = $1)
         Rows Removed by Filter: 9813
 Planning Time: 2.231 ms
 Execution Time: 2.535 ms
(8 rows)

Comparing the Execution Times above, it's clear that Gather and
workers are pure overhead in this case.

Although in cases where one expects runtime pruning to be useful, the
plan itself is very unlikely to be parallelized; for example, when the
individual partition scans are Index Scans:

deallocate q;
create index on hp (a);
alter table hp set (parallel_workers = 16);
analyze;
set max_parallel_workers_per_gather to 16;
prepare q as select * from hp where a = $1;
explain (analyze, verbose) execute q (1);
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.29..430.75 rows=100 width=4) (actual time=0.043..0.046 rows=1 loops=1)
   Subplans Removed: 99
   ->  Index Only Scan using hp40_a_idx on public.hp40 hp_1  (cost=0.29..4.30 rows=1 width=4) (actual time=0.042..0.044 rows=1 loops=1)
         Output: hp_1.a
         Index Cond: (hp_1.a = $1)
         Heap Fetches: 0
 Planning Time: 13.769 ms
 Execution Time: 0.115 ms
(8 rows)

> I am not sure how that could be improved.

The planner currently ignores the runtime pruning optimization when
assigning costs to the Append path, so fixing that would be a good
start.  There are efforts underway for that, such as [1].

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://commitfest.postgresql.org/32/2829/



Re: A reloption for partitioned tables - parallel_workers

Laurenz Albe
On Tue, 2021-02-16 at 16:29 +0900, Amit Langote wrote:

> > I am +1 on allowing to override the degree of parallelism on a parallel
> > append.  If "parallel_workers" on the partitioned table is an option for
> > that, it might be a simple solution.  On the other hand, perhaps it would
> > be less confusing to have a different storage parameter name rather than
> > having "parallel_workers" do double duty.
> > Also, since there is a design rule that storage parameters can only be used
> > on partitions, we would have to change that - is that a problem for anybody?
>
> I am not aware of a rule that suggests that parallel_workers is always
> interpreted using storage-level considerations.  If that is indeed a
> popular interpretation at this point, then yes, we should be open to
> considering a new name for the parameter that this patch wants to add.

Well, https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
says:

 "Specifying these parameters for partitioned tables is not supported,
  but you may specify them for individual leaf partitions."

If we re-purpose "parallel_workers" like this, we'd have to change this.

Then for a normal table, "parallel_workers" would mean how many workers
work on a parallel table scan.  For a partitioned table, it determines
how many workers work on a parallel append.

Perhaps that is similar enough that it is not confusing.

Yours,
Laurenz Albe




Re: A reloption for partitioned tables - parallel_workers

Amit Langote
On Tue, Feb 16, 2021 at 11:02 PM Laurenz Albe <[hidden email]> wrote:

> On Tue, 2021-02-16 at 16:29 +0900, Amit Langote wrote:
> > > I am +1 on allowing to override the degree of parallelism on a parallel
> > > append.  If "parallel_workers" on the partitioned table is an option for
> > > that, it might be a simple solution.  On the other hand, perhaps it would
> > > be less confusing to have a different storage parameter name rather than
> > > having "parallel_workers" do double duty.
> > > Also, since there is a design rule that storage parameters can only be used
> > > on partitions, we would have to change that - is that a problem for anybody?
> >
> > I am not aware of a rule that suggests that parallel_workers is always
> > interpreted using storage-level considerations.  If that is indeed a
> > popular interpretation at this point, then yes, we should be open to
> > considering a new name for the parameter that this patch wants to add.
>
> Well, https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
> says:
>
>  "Specifying these parameters for partitioned tables is not supported,
>   but you may specify them for individual leaf partitions."
>
> If we re-purpose "parallel_workers" like this, we'd have to change this.

Right, as I mentioned in my reply, the patch will need to update the
documentation.

> Then for a normal table, "parallel_workers" would mean how many workers
> work on a parallel table scan.  For a partitioned table, it determines
> how many workers work on a parallel append.
>
> Perhaps that is similar enough that it is not confusing.

I tend to agree with that.

--
Amit Langote
EDB: http://www.enterprisedb.com



RE: A reloption for partitioned tables - parallel_workers

Hou, Zhijie
In reply to this post by seamusabshere
> hi,
>
> Here we go, my first patch... solves
> https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55%40www.fastmail.com
>

Hi,

 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
+       static const relopt_parse_elt tab[] = {
+               {"parallel_workers", RELOPT_TYPE_INT,
+               offsetof(StdRdOptions, parallel_workers)},
+       };
+
        return (bytea *) build_reloptions(reloptions, validate,
                                          RELOPT_KIND_PARTITIONED,
-                                         0, NULL, 0);
+                                         sizeof(StdRdOptions),
+                                         tab, lengthof(tab));
 }

I noticed that you plan to store parallel_workers in the same struct
(StdRdOptions) as for normal heap relations.
It seems better to store it in a separate struct.

And as commit 1bbd608 said:
----
> Splitting things has the advantage to
> make the information stored in rd_options include only the necessary
> information, reducing the amount of memory used for a relcache entry
> with partitioned tables if new reloptions are introduced at this level.
----

What do you think?

Best regards,
Houzj


Re: A reloption for partitioned tables - parallel_workers

Amit Langote
On Thu, Feb 18, 2021 at 6:06 PM Hou, Zhijie <[hidden email]> wrote:

> > Here we go, my first patch... solves
> > https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520
> > [hidden email]
> >
>
> Hi,
>
> partitioned_table_reloptions(Datum reloptions, bool validate)
>  {
> +       static const relopt_parse_elt tab[] = {
> +               {"parallel_workers", RELOPT_TYPE_INT,
> +               offsetof(StdRdOptions, parallel_workers)},
> +       };
> +
>         return (bytea *) build_reloptions(reloptions, validate,
>                                                                           RELOPT_KIND_PARTITIONED,
> -                                                                         0, NULL, 0);
> +                                                                         sizeof(StdRdOptions),
> +                                                                         tab, lengthof(tab));
>  }
>
> I noticed that you plan to store the parallel_workers in the same struct(StdRdOptions) as normal heap relation.
> It seems better to store it in a separate struct.
>
> And as commit 1bbd608 said:
> ----
> > Splitting things has the advantage to
> > make the information stored in rd_options include only the necessary
> > information, reducing the amount of memory used for a relcache entry
> > with partitioned tables if new reloptions are introduced at this level.
> ----
>
> What do you think?

That may be a good idea.  So instead of referring to the
parallel_workers in StdRdOptions, define a new one, say,
PartitionedTableRdOptions as follows and refer to it in
partitioned_table_reloptions():

typedef struct PartitionedTableRdOptions
{
    int32       vl_len_;        /* varlena header (do not touch directly!) */
    int         parallel_workers;   /* max number of parallel workers */
} PartitionedTableRdOptions;
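
Callers would then need a way to read the value out of rd_options; a
sketch by analogy with the heap-side RelationGetParallelWorkers()
macro in rel.h (the macro name below is made up):

#define PartitionedTableGetParallelWorkers(relation, defaultpw) \
    ((relation)->rd_options ? \
     ((PartitionedTableRdOptions *) (relation)->rd_options)->parallel_workers : \
     (defaultpw))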

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: A reloption for partitioned tables - parallel_workers

Amit Langote
In reply to this post by Amit Langote
On Tue, Feb 16, 2021 at 3:05 PM Amit Langote <[hidden email]> wrote:

> On Tue, Feb 16, 2021 at 1:35 AM Seamus Abshere <[hidden email]> wrote:
> > Here we go, my first patch... solves https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@...
>
> Thanks for sending the patch here.
>
> It seems you haven't made enough changes for reloptions code to
> recognize parallel_workers as valid for partitioned tables, because
> even with the patch applied, I get this:
>
> create table rp (a int) partition by range (a);
> create table rp1 partition of rp for values from (minvalue) to (0);
> create table rp2 partition of rp for values from (0) to (maxvalue);
> alter table rp set (parallel_workers = 1);
> ERROR:  unrecognized parameter "parallel_workers"
>
> You need this:
>
> diff --git a/src/backend/access/common/reloptions.c
> b/src/backend/access/common/reloptions.c
> index 029a73325e..9eb8a0c10d 100644
> --- a/src/backend/access/common/reloptions.c
> +++ b/src/backend/access/common/reloptions.c
> @@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
>         {
>             "parallel_workers",
>             "Number of parallel processes that can be used per executor node for this relation.",
> -           RELOPT_KIND_HEAP,
> +           RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
>             ShareUpdateExclusiveLock
>         },
>         -1, 0, 1024
>
> which tells reloptions parsing code that parallel_workers is
> acceptable for both heap and partitioned relations.
>
> Other comments on the patch:
>
> * This function calling style, where the first argument is not placed
> on the same line as the function itself, is not very common in
> Postgres:
>
> +       /* First see if there is a root-level setting for parallel_workers */
> +       parallel_workers = compute_parallel_worker(
> +           rel,
> +           -1,
> +           -1,
> +           max_parallel_workers_per_gather
> +
>
> This makes the new code look very different from the rest of the
> codebase.  Better to stick to existing styles.
>
> 2. It might be a good idea to use this opportunity to add a function,
> say compute_append_parallel_workers(), for the code that does what the
> function name says.  Then the patch will simply add the new
> compute_parallel_worker() call at the top of that function.
>
> 3. I think we should consider the Append parent relation's
> parallel_workers ONLY if it is a partitioned relation, because it
> doesn't make a lot of sense for other types of parent relations.  So
> the new code should look like this:
>
> if (IS_PARTITIONED_REL(rel))
>     parallel_workers = compute_parallel_worker(rel, -1, -1,
> max_parallel_workers_per_gather);
>
> 4. Maybe it also doesn't make sense to consider the parent relation's
> parallel_workers if Parallel Append is disabled
> (enable_parallel_append = off).  That's because with a simple
> (non-parallel) Append running under Gather, all launched parallel
> workers process the same partition before moving to the next one.
> OTOH, one's intention of setting parallel_workers on the parent
> partitioned table would most likely be to distribute workers across
> partitions, which is only possible with parallel Append
> (enable_parallel_append = on).  So, the new code should look like
> this:
>
> if (IS_PARTITIONED_REL(rel) && enable_parallel_append)
>     parallel_workers = compute_parallel_worker(rel, -1, -1,
> max_parallel_workers_per_gather);
Here is an updated version of Seamus' patch that takes into
account these and other comments received on this thread so far.
It probably warrants adding some tests too, but I haven't done that yet.

Seamus, please register this patch in the next commit-fest:
https://commitfest.postgresql.org/32/

If you haven't already, you will need to create a community account to
use that site.

--
Amit Langote
EDB: http://www.enterprisedb.com

[Attachment: v2-0001-Allow-setting-parallel_workers-on-partitioned-tab.patch (12K)]

Re: A reloption for partitioned tables - parallel_workers

seamusabshere
hi Amit,

Thanks so much for doing this. I had created

https://commitfest.postgresql.org/32/2987/

and it looks like it now shows your patch as the one to use. Let me know if I should do anything else.

Best,
Seamus

On Fri, Feb 19, 2021, at 2:30 AM, Amit Langote wrote:

> On Tue, Feb 16, 2021 at 3:05 PM Amit Langote <[hidden email]> wrote:
> > On Tue, Feb 16, 2021 at 1:35 AM Seamus Abshere <[hidden email]> wrote:
> > > Here we go, my first patch... solves https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@...
> >
> > Thanks for sending the patch here.
> >
> > It seems you haven't made enough changes for reloptions code to
> > recognize parallel_workers as valid for partitioned tables, because
> > even with the patch applied, I get this:
> >
> > create table rp (a int) partition by range (a);
> > create table rp1 partition of rp for values from (minvalue) to (0);
> > create table rp2 partition of rp for values from (0) to (maxvalue);
> > alter table rp set (parallel_workers = 1);
> > ERROR:  unrecognized parameter "parallel_workers"
> >
> > You need this:
> >
> > diff --git a/src/backend/access/common/reloptions.c
> > b/src/backend/access/common/reloptions.c
> > index 029a73325e..9eb8a0c10d 100644
> > --- a/src/backend/access/common/reloptions.c
> > +++ b/src/backend/access/common/reloptions.c
> > @@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
> >         {
> >             "parallel_workers",
> >             "Number of parallel processes that can be used per executor node for this relation.",
> > -           RELOPT_KIND_HEAP,
> > +           RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
> >             ShareUpdateExclusiveLock
> >         },
> >         -1, 0, 1024
> >
> > which tells reloptions parsing code that parallel_workers is
> > acceptable for both heap and partitioned relations.
> >
> > Other comments on the patch:
> >
> > * This function calling style, where the first argument is not placed
> > on the same line as the function itself, is not very common in
> > Postgres:
> >
> > +       /* First see if there is a root-level setting for parallel_workers */
> > +       parallel_workers = compute_parallel_worker(
> > +           rel,
> > +           -1,
> > +           -1,
> > +           max_parallel_workers_per_gather
> > +
> >
> > This makes the new code look very different from the rest of the
> > codebase.  Better to stick to existing styles.
> >
> > 2. It might be a good idea to use this opportunity to add a function,
> > say compute_append_parallel_workers(), for the code that does what the
> > function name says.  Then the patch will simply add the new
> > compute_parallel_worker() call at the top of that function.
> >
> > 3. I think we should consider the Append parent relation's
> > parallel_workers ONLY if it is a partitioned relation, because it
> > doesn't make a lot of sense for other types of parent relations.  So
> > the new code should look like this:
> >
> > if (IS_PARTITIONED_REL(rel))
> >     parallel_workers = compute_parallel_worker(rel, -1, -1,
> > max_parallel_workers_per_gather);
> >
> > 4. Maybe it also doesn't make sense to consider the parent relation's
> > parallel_workers if Parallel Append is disabled
> > (enable_parallel_append = off).  That's because with a simple
> > (non-parallel) Append running under Gather, all launched parallel
> > workers process the same partition before moving to the next one.
> > OTOH, one's intention of setting parallel_workers on the parent
> > partitioned table would most likely be to distribute workers across
> > partitions, which is only possible with parallel Append
> > (enable_parallel_append = on).  So, the new code should look like
> > this:
> >
> > if (IS_PARTITIONED_REL(rel) && enable_parallel_append)
> >     parallel_workers = compute_parallel_worker(rel, -1, -1,
> > max_parallel_workers_per_gather);
>
> Here is an updated version of the Seamus' patch that takes into
> account these and other comments received on this thread so far.
> Maybe warrants adding some tests too but I haven't.
>
> Seamus, please register this patch in the next commit-fest:
> https://commitfest.postgresql.org/32/
>
> If you haven't already, you will need to create a community account to
> use that site.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com
>
> Attachments:
> * v2-0001-Allow-setting-parallel_workers-on-partitioned-tab.patch



Re: A reloption for partitioned tables - parallel_workers

Amit Langote
On Fri, Feb 19, 2021 at 11:54 PM Seamus Abshere <[hidden email]> wrote:

> On Fri, Feb 19, 2021, at 2:30 AM, Amit Langote wrote:
> > Here is an updated version of the Seamus' patch that takes into
> > account these and other comments received on this thread so far.
> > Maybe warrants adding some tests too but I haven't.
> >
> > Seamus, please register this patch in the next commit-fest:
> > https://commitfest.postgresql.org/32/
> >
> > If you haven't already, you will need to create a community account to
> > use that site.
>
> hi Amit,
>
> Thanks so much for doing this. I had created
>
> https://commitfest.postgresql.org/32/2987/

Ah, sorry, I had not checked.  I updated the entry to add you as the author.

> and it looks like it now shows your patch as the one to use. Let me know if I should do anything else.

You could take a look at the latest patch and see if you find any
problems with my or others' suggestions that I implemented in the v2
patch.  Also, please add regression tests for the new feature in
src/test/regress/sql/select_parallel.sql.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: A reloption for partitioned tables - parallel_workers

Laurenz Albe
In reply to this post by Amit Langote
On Fri, 2021-02-19 at 16:30 +0900, Amit Langote wrote:
> On Tue, Feb 16, 2021 at 1:35 AM Seamus Abshere <[hidden email]> wrote:
> > > Here we go, my first patch... solves https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@...
>
> Here is an updated version of the Seamus' patch that takes into
> account these and other comments received on this thread so far.
> Maybe warrants adding some tests too but I haven't.

Yes, there should be regression tests.

I gave the patch a spin, and it allows raising the number of workers for
a parallel Append as advertised.

--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1337,8 +1337,9 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     If a table parameter value is set and the
     equivalent <literal>toast.</literal> parameter is not, the TOAST table
     will use the table's parameter value.
-    Specifying these parameters for partitioned tables is not supported,
-    but you may specify them for individual leaf partitions.
+    Specifying most of these parameters for partitioned tables is not
+    supported, but you may specify them for individual leaf partitions;
+    refer to the description of individual parameters for more details.
    </para>

This doesn't make me happy.  Since the options themselves do not say if they
are supported on partitioned tables or not, the reader is left in the dark.

Perhaps:

  These options, with the exception of <literal>parallel_workers</literal>,
  are not supported on partitioned tables, but you may specify them for individual
  leaf partitions.

@@ -1401,9 +1402,12 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
      <para>
       This sets the number of workers that should be used to assist a parallel
       scan of this table.  If not set, the system will determine a value based
-      on the relation size.  The actual number of workers chosen by the planner
-      or by utility statements that use parallel scans may be less, for example
-      due to the setting of <xref linkend="guc-max-worker-processes"/>.
+      on the relation size.  When set on a partitioned table, the specified
+      number of workers will work on distinct partitions, so the number of
+      partitions affected by the parallel operation should be taken into
+      account.  The actual number of workers chosen by the planner or by
+      utility statements that use parallel scans may be less, for example due
+      to the setting of <xref linkend="guc-max-worker-processes"/>.
      </para>
     </listitem>
    </varlistentry>

The reader is left to believe that the default number of workers depends on the
size of the partitioned table, which is not entirely true.

Perhaps:

  If not set, the system will determine a value based on the relation size and
  the number of scanned partitions.

--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1268,6 +1268,59 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
    add_paths_to_append_rel(root, rel, live_childrels);
 }
 
+/*
+ * compute_append_parallel_workers
+ *         Computes the number of workers to assign to scan the subpaths appended
+ *         by a given Append path
+ */
+static int
+compute_append_parallel_workers(RelOptInfo *rel, List *subpaths,
+                               int num_live_children,
+                               bool parallel_append)

The new function should have a prototype.

+{
+   ListCell   *lc;
+   int         parallel_workers = 0;
+
+   /*
+    * For partitioned rels, first see if there is a root-level setting for
+    * parallel_workers.  But only consider if a Parallel Append plan is
+    * to be considered.
+    */
+   if (IS_PARTITIONED_REL(rel) && parallel_append)
+       parallel_workers =
+           compute_parallel_worker(rel, -1, -1,
+                                   max_parallel_workers_per_gather);
+
+   /* Find the highest number of workers requested for any subpath. */
+   foreach(lc, subpaths)
+   {
+       Path       *path = lfirst(lc);
+
+       parallel_workers = Max(parallel_workers, path->parallel_workers);
+   }
+   Assert(parallel_workers > 0 || subpaths == NIL);
+
+   /*
+    * If the use of parallel append is permitted, always request at least
+    * log2(# of children) workers.  We assume it can be useful to have
+    * extra workers in this case because they will be spread out across
+    * the children.  The precise formula is just a guess, but we don't
+    * want to end up with a radically different answer for a table with N
+    * partitions vs. an unpartitioned table with the same data, so the
+    * use of some kind of log-scaling here seems to make some sense.
+    */
+   if (parallel_append)
+   {
+       parallel_workers = Max(parallel_workers,
+                              fls(num_live_children));
+       parallel_workers = Min(parallel_workers,
+                              max_parallel_workers_per_gather);
+   }
+   Assert(parallel_workers > 0);
+
+   return parallel_workers;
+}

That means that it is not possible to *lower* the number of parallel workers
with this reloption, which seems to me a valid use case.

I think that if the option is set, it should override the number of workers
inherited from the partitions, and it should override the log2 default.
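
One way to get that behavior would be an early exit at the top of the
new function, e.g. (a sketch, assuming rel_parallel_workers is -1
whenever the reloption is unset):

    /*
     * An explicit parallel_workers setting on the partitioned table
     * overrides both the subpath maximum and the log2 default, so it
     * can lower the worker count as well as raise it.
     */
    if (IS_PARTITIONED_REL(rel) && parallel_append &&
        rel->rel_parallel_workers >= 0)
        return Min(rel->rel_parallel_workers,
                   max_parallel_workers_per_gather);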

Yours,
Laurenz Albe




RE: A reloption for partitioned tables - parallel_workers

houzj.fnst@fujitsu.com
In reply to this post by Amit Langote
> > 4. Maybe it also doesn't make sense to consider the parent relation's
> > parallel_workers if Parallel Append is disabled
> > (enable_parallel_append = off).  That's because with a simple
> > (non-parallel) Append running under Gather, all launched parallel
> > workers process the same partition before moving to the next one.
> > OTOH, one's intention of setting parallel_workers on the parent
> > partitioned table would most likely be to distribute workers across
> > partitions, which is only possible with parallel Append
> > (enable_parallel_append = on).  So, the new code should look like
> > this:
> >
> > if (IS_PARTITIONED_REL(rel) && enable_parallel_append)
> >     parallel_workers = compute_parallel_worker(rel, -1, -1,
> > max_parallel_workers_per_gather);
>
> Here is an updated version of Seamus' patch that takes into account these
> and other comments received on this thread so far.
> Maybe it warrants adding some tests too, but I haven't done that yet.
>
> Seamus, please register this patch in the next commit-fest:
> https://commitfest.postgresql.org/32/
>
> If you haven't already, you will need to create a community account to use that
> site.

It seems the patch does not include the code that gets parallel_workers from the new struct "PartitionedTableRdOptions".
Did I miss something?

Best regards,
houzj


Re: A reloption for partitioned tables - parallel_workers

Amit Langote
Hi,

On Tue, Feb 23, 2021 at 3:12 PM [hidden email]
<[hidden email]> wrote:

> > Here is an updated version of Seamus' patch that takes into account these
> > and other comments received on this thread so far.
> > Maybe it warrants adding some tests too, but I haven't done that yet.
> >
> > Seamus, please register this patch in the next commit-fest:
> > https://commitfest.postgresql.org/32/
> >
> > If you haven't already, you will need to create a community account to use that
> > site.
>
> It seems the patch does not include the code that gets parallel_workers from the new struct "PartitionedTableRdOptions".
> Did I miss something?

Aren't the following hunks in the v2 patch what you meant?

diff --git a/src/backend/access/common/reloptions.c
b/src/backend/access/common/reloptions.c
index c687d3ee9e..f8443d2361 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
  {
  "parallel_workers",
  "Number of parallel processes that can be used per executor node for
this relation.",
- RELOPT_KIND_HEAP,
+ RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
  ShareUpdateExclusiveLock
  },
  -1, 0, 1024
@@ -1962,12 +1962,18 @@ bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
  /*
- * There are no options for partitioned tables yet, but this is able to do
- * some validation.
+ * Currently the only setting known to be useful for partitioned tables
+ * is parallel_workers.
  */
+ static const relopt_parse_elt tab[] = {
+ {"parallel_workers", RELOPT_TYPE_INT,
+ offsetof(PartitionedTableRdOptions, parallel_workers)},
+ };
+
  return (bytea *) build_reloptions(reloptions, validate,
    RELOPT_KIND_PARTITIONED,
-   0, NULL, 0);
+   sizeof(PartitionedTableRdOptions),
+   tab, lengthof(tab));
 }

 /*

diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 10b63982c0..fe114e0856 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -308,6 +308,16 @@ typedef struct StdRdOptions
  bool vacuum_truncate; /* enables vacuum to truncate a relation */
 } StdRdOptions;

+/*
+ * PartitionedTableRdOptions
+ * Contents of rd_options for partitioned tables
+ */
+typedef struct PartitionedTableRdOptions
+{
+ int32 vl_len_; /* varlena header (do not touch directly!) */
+ int parallel_workers; /* max number of parallel workers */
+} PartitionedTableRdOptions;
+
 #define HEAP_MIN_FILLFACTOR 10
 #define HEAP_DEFAULT_FILLFACTOR 100

--
Amit Langote
EDB: http://www.enterprisedb.com



RE: A reloption for partitioned tables - parallel_workers

houzj.fnst@fujitsu.com
> > It seems the patch does not include the code that gets
> > parallel_workers from the new struct "PartitionedTableRdOptions".
> > Did I miss something?
>
> Aren't the following hunks in the v2 patch what you meant?
>
> diff --git a/src/backend/access/common/reloptions.c
> b/src/backend/access/common/reloptions.c
> index c687d3ee9e..f8443d2361 100644
> --- a/src/backend/access/common/reloptions.c
> +++ b/src/backend/access/common/reloptions.c
> @@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
>   {
>   "parallel_workers",
>   "Number of parallel processes that can be used per executor node for this
> relation.",
> - RELOPT_KIND_HEAP,
> + RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
>   ShareUpdateExclusiveLock
>   },
>   -1, 0, 1024
> @@ -1962,12 +1962,18 @@ bytea *
>  partitioned_table_reloptions(Datum reloptions, bool validate)
>  {
>   /*
> - * There are no options for partitioned tables yet, but this is able to do
> - * some validation.
> + * Currently the only setting known to be useful for partitioned tables
> + * is parallel_workers.
>   */
> + static const relopt_parse_elt tab[] = {
> + {"parallel_workers", RELOPT_TYPE_INT,
> + offsetof(PartitionedTableRdOptions, parallel_workers)},
> + };
> +
>   return (bytea *) build_reloptions(reloptions, validate,
>     RELOPT_KIND_PARTITIONED,
> -   0, NULL, 0);
> +   sizeof(PartitionedTableRdOptions),
> +   tab, lengthof(tab));
>  }
>
>  /*
>
> diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
> index 10b63982c0..fe114e0856 100644
> --- a/src/include/utils/rel.h
> +++ b/src/include/utils/rel.h
> @@ -308,6 +308,16 @@ typedef struct StdRdOptions
>   bool vacuum_truncate; /* enables vacuum to truncate a relation */
>  } StdRdOptions;
>
> +/*
> + * PartitionedTableRdOptions
> + * Contents of rd_options for partitioned tables
> + */
> +typedef struct PartitionedTableRdOptions
> +{
> + int32 vl_len_; /* varlena header (do not touch directly!) */
> + int parallel_workers; /* max number of parallel workers */
> +} PartitionedTableRdOptions;
> +
>  #define HEAP_MIN_FILLFACTOR 10
>  #define HEAP_DEFAULT_FILLFACTOR 100
Hi,

I am not sure.
IMO, for a normal table, we use the following macro to get parallel_workers:
----------------------
/*
 * RelationGetParallelWorkers
 * Returns the relation's parallel_workers reloption setting.
 * Note multiple eval of argument!
 */
#define RelationGetParallelWorkers(relation, defaultpw) \
        ((relation)->rd_options ? \
         ((StdRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))
----------------------

Since we added the new struct "PartitionedTableRdOptions", it seems we need to get parallel_workers in a different way.
Do we need a similar macro to get a partitioned table's parallel_workers?

Like:
#define PartitionedTableGetParallelWorkers(relation, defaultpw) \
xxx
(PartitionedTableRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))

Best regards,
houzj


Re: A reloption for partitioned tables - parallel_workers

Amit Langote
On Tue, Feb 23, 2021 at 7:24 PM [hidden email]
<[hidden email]> wrote:

> > > It seems the patch does not include the code that gets
> > > parallel_workers from the new struct "PartitionedTableRdOptions".
> > > Did I miss something?
> >
> > Aren't the following hunks in the v2 patch what you meant?
> >
> > diff --git a/src/backend/access/common/reloptions.c
> > b/src/backend/access/common/reloptions.c
> > index c687d3ee9e..f8443d2361 100644
> > --- a/src/backend/access/common/reloptions.c
> > +++ b/src/backend/access/common/reloptions.c
> > @@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
> >   {
> >   "parallel_workers",
> >   "Number of parallel processes that can be used per executor node for this
> > relation.",
> > - RELOPT_KIND_HEAP,
> > + RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
> >   ShareUpdateExclusiveLock
> >   },
> >   -1, 0, 1024
> > @@ -1962,12 +1962,18 @@ bytea *
> >  partitioned_table_reloptions(Datum reloptions, bool validate)
> >  {
> >   /*
> > - * There are no options for partitioned tables yet, but this is able to do
> > - * some validation.
> > + * Currently the only setting known to be useful for partitioned tables
> > + * is parallel_workers.
> >   */
> > + static const relopt_parse_elt tab[] = {
> > + {"parallel_workers", RELOPT_TYPE_INT,
> > + offsetof(PartitionedTableRdOptions, parallel_workers)},
> > + };
> > +
> >   return (bytea *) build_reloptions(reloptions, validate,
> >     RELOPT_KIND_PARTITIONED,
> > -   0, NULL, 0);
> > +   sizeof(PartitionedTableRdOptions),
> > +   tab, lengthof(tab));
> >  }
> >
> >  /*
> >
> > diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
> > index 10b63982c0..fe114e0856 100644
> > --- a/src/include/utils/rel.h
> > +++ b/src/include/utils/rel.h
> > @@ -308,6 +308,16 @@ typedef struct StdRdOptions
> >   bool vacuum_truncate; /* enables vacuum to truncate a relation */
> >  } StdRdOptions;
> >
> > +/*
> > + * PartitionedTableRdOptions
> > + * Contents of rd_options for partitioned tables
> > + */
> > +typedef struct PartitionedTableRdOptions
> > +{
> > + int32 vl_len_; /* varlena header (do not touch directly!) */
> > + int parallel_workers; /* max number of parallel workers */
> > +} PartitionedTableRdOptions;
> > +
> >  #define HEAP_MIN_FILLFACTOR 10
> >  #define HEAP_DEFAULT_FILLFACTOR 100
> Hi,
>
> I am not sure.
> IMO, for a normal table, we use the following macro to get parallel_workers:
> ----------------------
> /*
>  * RelationGetParallelWorkers
>  *              Returns the relation's parallel_workers reloption setting.
>  *              Note multiple eval of argument!
>  */
> #define RelationGetParallelWorkers(relation, defaultpw) \
>         ((relation)->rd_options ? \
>          ((StdRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))
> ----------------------
>
> Since we added the new struct "PartitionedTableRdOptions", it seems we need to get parallel_workers in a different way.
> Do we need a similar macro to get a partitioned table's parallel_workers?

Oh, you're right.  The parallel_workers setting of a relation is only
accessed through this macro, even for partitioned tables, and I can
see that it is actually wrong to access a partitioned table's
parallel_workers through this macro as-is.  I hadn't tested that,
though, so thanks for pointing it out.

> Like:
> #define PartitionedTableGetParallelWorkers(relation, defaultpw) \
> xxx
> (PartitionedTableRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))

I'm thinking it would be better to just modify the existing macro to
check relkind to decide which struct pointer type to cast the value in
rd_options to.
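
Something along these lines, perhaps (untested sketch, not the final patch):

#define RelationGetParallelWorkers(relation, defaultpw) \
    ((relation)->rd_options == NULL ? (defaultpw) : \
     ((relation)->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ? \
      ((PartitionedTableRdOptions *) (relation)->rd_options)->parallel_workers : \
      ((StdRdOptions *) (relation)->rd_options)->parallel_workers))

Note it would still evaluate its argument multiple times, like the existing
macro.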

I will post an updated patch later.

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: A reloption for partitioned tables - parallel_workers

Laurenz Albe
On Mon, 2021-03-01 at 17:39 +0900, Amit Langote wrote:

> > I am not sure.
> > IMO, for a normal table, we use the following macro to get parallel_workers:
> > ----------------------
> > /*
> >   * RelationGetParallelWorkers
> >   *              Returns the relation's parallel_workers reloption setting.
> >   *              Note multiple eval of argument!
> >   */
> > #define RelationGetParallelWorkers(relation, defaultpw) \
> >          ((relation)->rd_options ? \
> >           ((StdRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))
> > ----------------------
> > Since we added the new struct "PartitionedTableRdOptions", it seems we need to get parallel_workers in a different way.
> > Do we need a similar macro to get a partitioned table's parallel_workers?
>
> Oh, you're right.  The parallel_workers setting of a relation is only
> accessed through this macro, even for partitioned tables, and I can
> see that it is actually wrong to access a partitioned table's
> parallel_workers through this macro as-is.  I hadn't tested that,
> though, so thanks for pointing it out.
>
> > Like:
> > #define PartitionedTableGetParallelWorkers(relation, defaultpw) \
> > xxx
> > (PartitionedTableRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))
>
> I'm thinking it would be better to just modify the existing macro to
> check relkind to decide which struct pointer type to cast the value in
> rd_options to.
Here is an updated patch with this fix.

I added regression tests and adapted the documentation a bit.

I also added support for lowering the number of parallel workers.

Yours,
Laurenz Albe

Attachment: v3-0001-Allow-setting-parallel_workers-on-partitioned-tab.patch (21K)

Re: A reloption for partitioned tables - parallel_workers

Amit Langote
On Tue, Mar 2, 2021 at 12:10 AM Laurenz Albe <[hidden email]> wrote:
> Here is an updated patch with this fix.

Thanks for updating the patch.  I was about to post an updated version
myself but you beat me to it.

> I added regression tests and adapted the documentation a bit.
>
> I also added support for lowering the number of parallel workers.

+ALTER TABLE pagg_tab_ml SET (parallel_workers = 0);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+                    QUERY PLAN
+---------------------------------------------------
+ Append
+   ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+         Filter: (b = 42)
+(7 rows)

I got the same result with my implementation, but I am wondering
whether setting parallel_workers=0 on the parent table should really
disable a regular (non-parallel-aware) Append running under Gather,
even though it rightly disables Parallel Append (parallel-aware).  So
in this test case, there should have been a Gather atop Append, with
individual partitions scanned using Parallel Seq Scan where
applicable.
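
That is, a plan shaped roughly like this (hand-drawn for illustration, not
actual EXPLAIN output):

 Gather
   ->  Append
         ->  Parallel Seq Scan on pagg_tab_ml_p1
               Filter: (b = 42)
         ->  Parallel Seq Scan on pagg_tab_ml_p2_s1
               Filter: (b = 42)
         ->  Parallel Seq Scan on pagg_tab_ml_p2_s2
               Filter: (b = 42)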


--
Amit Langote
EDB: http://www.enterprisedb.com

