Parallel INSERT (INTO ... SELECT ...)


Re: Parallel INSERT (INTO ... SELECT ...)

Bharath Rupireddy
On Thu, Oct 8, 2020 at 1:42 PM Greg Nancarrow <[hidden email]> wrote:

>
> On Wed, Oct 7, 2020 at 7:25 PM Greg Nancarrow <[hidden email]> wrote:
> >
> > On Wed, Oct 7, 2020 at 12:40 AM Bharath Rupireddy
> > <[hidden email]> wrote:
> > > In parallel, we are not doing anything (due to the same reason
> > > explained in the above comment) to find whether there is a foreign
> > > partition or not while deciding to go with parallel/non-parallel copy,
> > > we are just throwing an error during the first tuple insertion into
> > > the partition.
> > >
> > > errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD
> > > OF triggers, or if the partition is foreign partition"),
> > >                             errhint("Try COPY without PARALLEL option")));
> > >
> >
> > I'm wondering whether code similar to the following can safely be used
> > to detect a foreign partition:
> >
> >     if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> >     {
> >         int i;
> >         PartitionDesc pd = RelationGetPartitionDesc(rel);
> >         for (i = 0; i < pd->nparts; i++)
> >         {
> >             if (get_rel_relkind(pd->oids[i]) == RELKIND_FOREIGN_TABLE)
> >             {
> >                 table_close(rel, NoLock);
> >                 return false;
> >             }
> >         }
> >     }
> >
>
> Actually, the addition of this kind of check is still not good enough.
> Partitions can have their own constraints, triggers, column default
> expressions etc. and a partition itself can be partitioned.
> I've written code to recursively walk the partitions and do all the
> various checks for parallel-insert-safety as before, but it's doing a
> fair bit of work.
> Any other idea of dealing with this? Seems it can't be avoided if you
> want to support partitioned tables and partitions.
>

IMHO, it's good not to do all of this recursive checking right now,
which may complicate the code or may restrict the performance gain.
Having said that, in the future we may have to do something about it.

Others may have better opinions on this point.
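For reference, the recursive walk Greg describes might look roughly like the following standalone sketch. The toy structures here stand in for PartitionDesc and the relkind catalog lookups (the real code would use RelationGetPartitionDesc() and get_rel_relkind()); this is purely illustrative, not the actual PostgreSQL implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for relkind values; illustrative only. */
typedef enum { REL_PLAIN, REL_PARTITIONED, REL_FOREIGN } ToyRelKind;

typedef struct ToyRel
{
    ToyRelKind      relkind;
    bool            has_before_triggers;    /* a parallel-unsafe feature */
    int             nparts;
    struct ToyRel **parts;
} ToyRel;

/* Recursively check a table and all of its partitions (which may
 * themselves be partitioned) for parallel-insert safety. */
static bool
rel_parallel_insert_safe(const ToyRel *rel)
{
    if (rel->relkind == REL_FOREIGN || rel->has_before_triggers)
        return false;

    if (rel->relkind == REL_PARTITIONED)
    {
        for (int i = 0; i < rel->nparts; i++)
        {
            if (!rel_parallel_insert_safe(rel->parts[i]))
                return false;
        }
    }
    return true;
}
```

As the thread notes, the real check has to cover more than foreign partitions (constraints, triggers, column defaults), so each leaf test would be correspondingly more involved.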

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel INSERT (INTO ... SELECT ...)

Thomas Munro-5
In reply to this post by Greg Nancarrow
On Tue, Oct 6, 2020 at 10:38 PM Greg Nancarrow <[hidden email]> wrote:
+            if (estate->es_plannedstmt->commandType == CMD_INSERT)
...
+    if ((XactReadOnly || (IsInParallelMode() &&
queryDesc->plannedstmt->commandType != CMD_INSERT)) &&
...
+        isParallelInsertLeader = nodeModifyTableState->operation == CMD_INSERT;
...

One thing I noticed is that you have logic, variable names and
assertions all over the tree that assume that we can only do parallel
*inserts*.  I agree 100% with your plan to make Parallel Insert work
first, it is an excellent goal and if we get it in it'll be a headline
feature of PG14 (along with COPY etc).  That said, I wonder if it
would make sense to use more general naming (isParallelModifyLeader?),
be more liberal where you really mean "is it DML", and find a way to
centralise the logic about which DML command types are currently
allowed (ie insert only for now) for assertions and error checks etc,
so that in future we don't have to go around and change all these
places and rename things again and again.

While contemplating that, I couldn't resist taking a swing at the main
(?) show stopper for Parallel Update and Parallel Delete, judging by
various clues left in code comments by Robert: combo command IDs
created by other processes.  Here's a rapid prototype to make that
work (though perhaps not as efficiently as we'd want, not sure).  With
that in place, I wonder what else we'd need to extend your patch to
cover all three operations... it can't be much!  Of course I don't
want to derail your work on Parallel Insert, I'm just providing some
motivation for my comments on the (IMHO) shortsightedness of some of
the coding.

PS Why not use git format-patch to create patches?

[Attachment: 0001-Coordinate-combo-command-IDs-with-parallel-workers.patch (32K)]

Re: Parallel INSERT (INTO ... SELECT ...)

Greg Nancarrow
On Fri, Oct 9, 2020 at 8:41 AM Thomas Munro <[hidden email]> wrote:

> One thing I noticed is that you have logic, variable names and
> assertions all over the tree that assume that we can only do parallel
> *inserts*.  I agree 100% with your plan to make Parallel Insert work
> first, it is an excellent goal and if we get it in it'll be a headline
> feature of PG14 (along with COPY etc).  That said, I wonder if it
> would make sense to use more general naming (isParallelModifyLeader?),
> be more liberal where you really mean "is it DML", and find a way to
> centralise the logic about which DML command types are currently
> allowed (ie insert only for now) for assertions and error checks etc,
> so that in future we don't have to go around and change all these
> places and rename things again and again.
>

Fair points.
I agree, it would make more sense to generalise the naming and
centralise the DML-command-type checks, rather than everything being
insert-specific.
It was getting a bit ugly. I'll work on that.

> While contemplating that, I couldn't resist taking a swing at the main
> (?) show stopper for Parallel Update and Parallel Delete, judging by
> various clues left in code comments by Robert: combo command IDs
> created by other processes.  Here's a rapid prototype to make that
> work (though perhaps not as efficiently as we'd want, not sure).  With
> that in place, I wonder what else we'd need to extend your patch to
> cover all three operations... it can't be much!  Of course I don't
> want to derail your work on Parallel Insert, I'm just providing some
> motivation for my comments on the (IMHO) shortsightedness of some of
> the coding.
>

Thanks for your prototype code for coordination of combo command IDs
with the workers.
It does give me the incentive to look beyond that issue and see
whether parallel Update and parallel Delete are indeed possible. I'll
be sure to give it a go!

> PS Why not use git format-patch to create patches?

Guess I was being a bit lazy - will use git format-patch in future.


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

Thomas Munro-5
On Fri, Oct 9, 2020 at 3:48 PM Greg Nancarrow <[hidden email]> wrote:
> It does give me the incentive to look beyond that issue and see
> whether parallel Update and parallel Delete are indeed possible. I'll
> be sure to give it a go!

Cool!

A couple more observations:

+       pathnode->path.parallel_aware = parallel_workers > 0 ? true : false;

Hmm, I think this may be bogus window dressing only affecting EXPLAIN.
If you change it to assign false always, it works just the same,
except EXPLAIN says:

 Gather  (cost=15428.00..16101.14 rows=1000000 width=4)
   Workers Planned: 2
   ->  Insert on s  (cost=15428.00..16101.14 rows=208334 width=4)
         ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=4)

... instead of:

 Gather  (cost=15428.00..16101.14 rows=1000000 width=4)
   Workers Planned: 2
   ->  Parallel Insert on s  (cost=15428.00..16101.14 rows=208334 width=4)
         ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=4)

AFAICS it's not parallel-aware; it just happens to be running in
parallel with a partial input, a partial output (and, in this case, a
partial effect in terms of writes).  Parallel-aware is our term for nodes that
actually know they are running in parallel and do some special
coordination with their twins in other processes.

The estimated row count also looks wrong; at a guess, the parallel
divisor is applied twice.  Let me try that with
parallel_leader_participation=off (which disables some funky maths in
the row estimation and makes it straight division by number of
processes):

 Gather  (cost=17629.00..18645.50 rows=1000000 width=4)
   Workers Planned: 2
   ->  Insert on s  (cost=17629.00..18645.50 rows=250000 width=4)
         ->  Parallel Hash Join  (cost=17629.00..37291.00 rows=500000 width=4)
[more nodes omitted]

Yeah, that was a join that spat out a million rows, and we correctly
estimated 500k per process, and then Insert (still with my hack to
turn off the bogus "Parallel" display in this case, but it doesn't
affect the estimation) estimated 250k per process, which is wrong.
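The double division Thomas diagnoses can be checked with a quick arithmetic sketch, using the numbers from the EXPLAIN output above (1M-row join, 2 workers, parallel_leader_participation=off so the divisor is just the worker count):

```c
#include <assert.h>

/* With parallel_leader_participation = off, the row estimate for a
 * partial path is simply total_rows / nworkers. */
static double
per_process_rows(double total_rows, int nworkers)
{
    return total_rows / nworkers;
}

/* The Insert node should just inherit its child's per-process
 * estimate unchanged... */
static double
insert_estimate_correct(double join_total, int nworkers)
{
    return per_process_rows(join_total, nworkers);
}

/* ...but the suspected bug divides the already-divided estimate by
 * the worker count a second time. */
static double
insert_estimate_buggy(double join_total, int nworkers)
{
    return per_process_rows(per_process_rows(join_total, nworkers), nworkers);
}
```

With 1,000,000 rows and 2 processes, the correct per-process estimate is 500,000, and dividing again yields the 250,000 shown under the Insert node.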



Re: Parallel INSERT (INTO ... SELECT ...)

Greg Nancarrow
On Fri, Oct 9, 2020 at 6:31 PM Thomas Munro <[hidden email]> wrote:

>
> A couple more observations:
>
> +       pathnode->path.parallel_aware = parallel_workers > 0 ? true : false;
>
> Hmm, I think this may be bogus window dressing only affecting EXPLAIN.
> If you change it to assign false always, it works just the same,
> except EXPLAIN says:
>
>  Gather  (cost=15428.00..16101.14 rows=1000000 width=4)
>    Workers Planned: 2
>    ->  Insert on s  (cost=15428.00..16101.14 rows=208334 width=4)
>          ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=4)
>
> ... instead of:
>
>  Gather  (cost=15428.00..16101.14 rows=1000000 width=4)
>    Workers Planned: 2
>    ->  Parallel Insert on s  (cost=15428.00..16101.14 rows=208334 width=4)
>          ->  Parallel Hash Join  (cost=15428.00..32202.28 rows=416667 width=4)
>
> AFAICS it's not parallel-aware, it just happens to be running in
> parallel with a partial input and partial output (and in this case,
> effect in terms of writes).  Parallel-aware is our term for nodes that
> actually know they are running in parallel and do some special
> coordination with their twins in other processes.
>

Ah, thanks, I see the distinction now. I'll fix that, to restore
parallel_aware=false for the ModifyTable node.

> The estimated row count also looks wrong; at a guess, the parallel
> divisor is applied twice.  Let me try that with
> parallel_leader_participation=off (which disables some funky maths in
> the row estimation and makes it straight division by number of
> processes):
>
>  Gather  (cost=17629.00..18645.50 rows=1000000 width=4)
>    Workers Planned: 2
>    ->  Insert on s  (cost=17629.00..18645.50 rows=250000 width=4)
>          ->  Parallel Hash Join  (cost=17629.00..37291.00 rows=500000 width=4)
> [more nodes omitted]
>
> Yeah, that was a join that spat out a million rows, and we correctly
> estimated 500k per process, and then Insert (still with my hack to
> turn off the bogus "Parallel" display in this case, but it doesn't
> affect the estimation) estimated 250k per process, which is wrong.

Thanks, I did suspect the current costing was wrong for ModifyTable
(workers>0 case), as I'd thrown it in (moving current costing code
into costsize.c) without a lot of checking or great thought, and was
on my TODO list of things to check. At least I created a placeholder
for it. Looks like I've applied a parallel-divisor again (not allowing
for that of the underlying query), as you said.
Speaking of costing, I'm not sure I really agree with the current
costing of a Gather node. Just considering a simple Parallel SeqScan
case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of
Gather cost always completely drowns out any other path costs when a
large number of rows are involved (at least with default
parallel-related GUC values), such that Parallel SeqScan would never
be the cheapest path. This linear relationship in the costing based on
the rows and a parallel_tuple_cost doesn't make sense to me. Surely
after a certain amount of rows, the overhead of launching workers will
be outweighed by the benefit of their parallel work, such that the
more rows, the more likely a Parallel SeqScan will benefit. That seems
to suggest something like a logarithmic formula (or similar) would
better match reality than what we have now. Am I wrong on this? Every
time I use default GUC values, the planner doesn't want to generate a
parallel plan. Lowering parallel-related GUCs like parallel_tuple_cost
(which I normally do for testing) influences it of course, but the
linear relationship still seems wrong.
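To put rough numbers on that, here is a sketch of the term quoted above, assuming the default GUC values parallel_setup_cost = 1000 and parallel_tuple_cost = 0.1 (a simplification of cost_gather(), not the full costing):

```c
#include <assert.h>

/* Sketch of the Gather charge: a one-off worker-startup cost plus a
 * per-tuple transfer cost that grows linearly with the number of
 * rows returned through the Gather node.  Assumed defaults:
 * parallel_setup_cost = 1000, parallel_tuple_cost = 0.1. */
static double
gather_overhead(double rows)
{
    const double parallel_setup_cost = 1000.0;
    const double parallel_tuple_cost = 0.1;

    return parallel_setup_cost + parallel_tuple_cost * rows;
}
```

At 10 million returned rows the linear transfer term alone contributes around a million cost units, which is what swamps any per-row savings inside the parallel subtree.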

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

akapila
In reply to this post by Greg Nancarrow
On Tue, Oct 6, 2020 at 3:08 PM Greg Nancarrow <[hidden email]> wrote:
>
> On Mon, Oct 5, 2020 at 10:36 PM Dilip Kumar <[hidden email]> wrote:
>
> Also, I have attached a separate patch (requested by Andres Freund)
> that just allows the underlying SELECT part of "INSERT INTO ... SELECT
> ..." to be parallel.
>

It might be a good idea to first just get this patch committed, if
possible. So, I have reviewed the latest version of this patch:

0001-InsertParallelSelect
1.
ParallelContext *pcxt;

+ /*
+ * We need to avoid an attempt on INSERT to assign a
+ * FullTransactionId whilst in parallel mode (which is in
+ * effect due to the underlying parallel query) - so the
+ * FullTransactionId is assigned here. Parallel mode must
+ * be temporarily escaped in order for this to be possible.
+ * The FullTransactionId will be included in the transaction
+ * state that is serialized in the parallel DSM.
+ */
+ if (estate->es_plannedstmt->commandType == CMD_INSERT)
+ {
+ Assert(IsInParallelMode());
+ ExitParallelMode();
+ GetCurrentFullTransactionId();
+ EnterParallelMode();
+ }
+

This looks like a hack to me. I think you are doing this to avoid the
parallel mode checks in GetNewTransactionId(), right? If so, I have
already mentioned above [1] that we can change it so that we disallow
assigning xids for parallel workers only. The same is true for the
check in ExecGatherMerge. Do you see any problem with that suggestion?

2.
@@ -337,7 +337,7 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
  */
  if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
  IsUnderPostmaster &&
- parse->commandType == CMD_SELECT &&
+ (parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) &&
  !parse->hasModifyingCTE &&
  max_parallel_workers_per_gather > 0 &&
  !IsParallelWorker())

I think the comments above this need to be updated, especially the part
where it says: "Note that we do allow CREATE TABLE AS, SELECT INTO, and
CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only
the leader backend writes into a completely new table.". Don't we need
to include Insert also?

3.
@@ -371,6 +371,7 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
  * parallel-unsafe, or else the query planner itself has a bug.
  */
  glob->parallelModeNeeded = glob->parallelModeOK &&
+ (parse->commandType == CMD_SELECT) &&
  (force_parallel_mode != FORCE_PARALLEL_OFF);

Why do you need this change? The comments above this code should be
updated to reflect it. The code below seems to have been modified for
the same reason, but I don't understand the reason for that change
either; it would be better to update the comments for it as well.

@@ -425,7 +426,7 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
  * Optionally add a Gather node for testing purposes, provided this is
  * actually a safe thing to do.
  */
- if (force_parallel_mode != FORCE_PARALLEL_OFF && top_plan->parallel_safe)
+ if (force_parallel_mode != FORCE_PARALLEL_OFF && parse->commandType
== CMD_SELECT && top_plan->parallel_safe)
  {
  Gather    *gather = makeNode(Gather);

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BE-pM0U6qw7EOF0yO0giTxdErxoJV9xTqN%2BLo9zdotFQ%40mail.gmail.com


--
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

akapila
In reply to this post by Greg Nancarrow
On Fri, Oct 9, 2020 at 2:37 PM Greg Nancarrow <[hidden email]> wrote:

>
> Speaking of costing, I'm not sure I really agree with the current
> costing of a Gather node. Just considering a simple Parallel SeqScan
> case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of
> Gather cost always completely drowns out any other path costs when a
> large number of rows are involved (at least with default
> parallel-related GUC values), such that Parallel SeqScan would never
> be the cheapest path. This linear relationship in the costing based on
> the rows and a parallel_tuple_cost doesn't make sense to me. Surely
> after a certain amount of rows, the overhead of launching workers will
> be outweighed by the benefit of their parallel work, such that the
> more rows, the more likely a Parallel SeqScan will benefit.
>

That will be true for the number of rows/pages we need to scan, not
for the number of tuples we need to return as a result. The formula
here considers the number of rows the parallel scan will return: the
more rows each parallel node needs to pass via shared memory to the
Gather node, the more costly it will be.

We do consider the total pages we need to scan in
compute_parallel_worker() where we use a logarithmic formula to
determine the number of workers.
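That logarithmic scaling can be sketched roughly as follows: one worker once the table reaches min_parallel_table_scan_size (default 1024 blocks assumed), and one more worker each time the size triples. This is an approximation of the page-based part of compute_parallel_worker(); caps such as max_parallel_workers_per_gather are left out for brevity.

```c
#include <assert.h>

/* Workers grow with log base 3 of the page count: the threshold for
 * adding each additional worker triples. */
static int
workers_for_pages(long heap_pages, long min_scan_size)
{
    int  workers = 0;
    long threshold = min_scan_size;

    while (heap_pages >= threshold)
    {
        workers++;
        threshold *= 3;
    }
    return workers;
}
```

So a 4000-page table gets 2 workers with the assumed defaults, and it takes roughly a tripling of the table size to justify each additional worker.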

--
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

Greg Nancarrow
In reply to this post by akapila
On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <[hidden email]> wrote:

>
> 0001-InsertParallelSelect
> 1.
> ParallelContext *pcxt;
>
> + /*
> + * We need to avoid an attempt on INSERT to assign a
> + * FullTransactionId whilst in parallel mode (which is in
> + * effect due to the underlying parallel query) - so the
> + * FullTransactionId is assigned here. Parallel mode must
> + * be temporarily escaped in order for this to be possible.
> + * The FullTransactionId will be included in the transaction
> + * state that is serialized in the parallel DSM.
> + */
> + if (estate->es_plannedstmt->commandType == CMD_INSERT)
> + {
> + Assert(IsInParallelMode());
> + ExitParallelMode();
> + GetCurrentFullTransactionId();
> + EnterParallelMode();
> + }
> +
>
> This looks like a hack to me. I think you are doing this to avoid the
> parallel mode checks in GetNewTransactionId(), right?

Yes, agreed, it is a hack to avoid that (mind you, it's not exactly
great that ExecutePlan() sets parallel mode for the entire plan
execution). Also, I did not expect it to necessarily remain in a final patch.

>If so, I have
> already mentioned above [1] that we can change it so that we disallow
> assigning xids for parallel workers only. The same is true for the
> check in ExecGatherMerge. Do you see any problem with that suggestion?
>

No, should be OK I guess, but will update and test to be sure.

> 2.
> @@ -337,7 +337,7 @@ standard_planner(Query *parse, const char
> *query_string, int cursorOptions,
>   */
>   if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
>   IsUnderPostmaster &&
> - parse->commandType == CMD_SELECT &&
> + (parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) &&
>   !parse->hasModifyingCTE &&
>   max_parallel_workers_per_gather > 0 &&
>   !IsParallelWorker())
>
> I think the comments above this need to be updated, especially the part
> where it says: "Note that we do allow CREATE TABLE AS, SELECT INTO, and
> CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only
> the leader backend writes into a completely new table.". Don't we need
> to include Insert also?

Yes, Insert needs to be mentioned somewhere there.

>
> 3.
> @@ -371,6 +371,7 @@ standard_planner(Query *parse, const char
> *query_string, int cursorOptions,
>   * parallel-unsafe, or else the query planner itself has a bug.
>   */
>   glob->parallelModeNeeded = glob->parallelModeOK &&
> + (parse->commandType == CMD_SELECT) &&
>   (force_parallel_mode != FORCE_PARALLEL_OFF);
>
> Why do you need this change? The comments above this code should be
> updated to reflect this change. I think for the same reason the below
> code seems to be modified but I don't understand the reason for the
> below change as well, also it is better to update the comments for
> this as well.
>

OK, I will update the comments for this.
Basically, up to now, the "force_parallel_mode" has only ever operated
on a SELECT.
But since we are now allowing CMD_INSERT to be assessed for parallel
mode too, we need to prevent the force_parallel_mode logic from
sticking a Gather node over the top of arbitrary INSERTs and causing
them to be run in parallel. Not all INSERTs are suitable for parallel
operation, and there are further parallel-safety considerations for
INSERTs compared to SELECTs. INSERTs can also trigger UPDATEs.
If we need to support force_parallel_mode for INSERT, more work will
need to be done.

> @@ -425,7 +426,7 @@ standard_planner(Query *parse, const char
> *query_string, int cursorOptions,
>   * Optionally add a Gather node for testing purposes, provided this is
>   * actually a safe thing to do.
>   */
> - if (force_parallel_mode != FORCE_PARALLEL_OFF && top_plan->parallel_safe)
> + if (force_parallel_mode != FORCE_PARALLEL_OFF && parse->commandType
> == CMD_SELECT && top_plan->parallel_safe)
>   {
>   Gather    *gather = makeNode(Gather);
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1%2BE-pM0U6qw7EOF0yO0giTxdErxoJV9xTqN%2BLo9zdotFQ%40mail.gmail.com
>

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

Greg Nancarrow
In reply to this post by akapila
On Fri, Oct 9, 2020 at 8:41 PM Amit Kapila <[hidden email]> wrote:

>
> On Fri, Oct 9, 2020 at 2:37 PM Greg Nancarrow <[hidden email]> wrote:
> >
> > Speaking of costing, I'm not sure I really agree with the current
> > costing of a Gather node. Just considering a simple Parallel SeqScan
> > case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of
> > Gather cost always completely drowns out any other path costs when a
> > large number of rows are involved (at least with default
> > parallel-related GUC values), such that Parallel SeqScan would never
> > be the cheapest path. This linear relationship in the costing based on
> > the rows and a parallel_tuple_cost doesn't make sense to me. Surely
> > after a certain amount of rows, the overhead of launching workers will
> > be outweighed by the benefit of their parallel work, such that the
> > more rows, the more likely a Parallel SeqScan will benefit.
> >
>
> That will be true for the number of rows/pages we need to scan not for
> the number of tuples we need to return as a result. The formula here
> considers the number of rows the parallel scan will return and the
> more the number of rows each parallel node needs to pass via shared
> memory to gather node the more costly it will be.
>
> We do consider the total pages we need to scan in
> compute_parallel_worker() where we use a logarithmic formula to
> determine the number of workers.
>

Despite the best intentions, the current costing seems geared towards
selecting a non-parallel plan over a parallel plan as the number of
rows in the table grows. Yet the more rows there are in the table, the
better a parallel plan appears to perform compared to a non-parallel plan.
This doesn't seem right to me. Is there a rationale behind this costing model?
I have pointed out the part of the parallel_tuple_cost calculation
that seems to drown out all other costs (causing the cost value to be
huge) as the number of rows in the table grows.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

akapila
On Fri, Oct 9, 2020 at 4:28 PM Greg Nancarrow <[hidden email]> wrote:

>
> On Fri, Oct 9, 2020 at 8:41 PM Amit Kapila <[hidden email]> wrote:
> >
> > On Fri, Oct 9, 2020 at 2:37 PM Greg Nancarrow <[hidden email]> wrote:
> > >
> > > Speaking of costing, I'm not sure I really agree with the current
> > > costing of a Gather node. Just considering a simple Parallel SeqScan
> > > case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of
> > > Gather cost always completely drowns out any other path costs when a
> > > large number of rows are involved (at least with default
> > > parallel-related GUC values), such that Parallel SeqScan would never
> > > be the cheapest path. This linear relationship in the costing based on
> > > the rows and a parallel_tuple_cost doesn't make sense to me. Surely
> > > after a certain amount of rows, the overhead of launching workers will
> > > be outweighed by the benefit of their parallel work, such that the
> > > more rows, the more likely a Parallel SeqScan will benefit.
> > >
> >
> > That will be true for the number of rows/pages we need to scan not for
> > the number of tuples we need to return as a result. The formula here
> > considers the number of rows the parallel scan will return and the
> > more the number of rows each parallel node needs to pass via shared
> > memory to gather node the more costly it will be.
> >
> > We do consider the total pages we need to scan in
> > compute_parallel_worker() where we use a logarithmic formula to
> > determine the number of workers.
> >
>
> Despite all the best intentions, the current costings seem to be
> geared towards selection of a non-parallel plan over a parallel plan,
> the more rows there are in the table. Yet the performance of a
> parallel plan appears to be better than non-parallel-plan the more
> rows there are in the table.
> This doesn't seem right to me. Is there a rationale behind this costing model?
>

Yes, AFAIK, there is no proof that we can get much gain by dividing
the I/O among workers. It is primarily the CPU effort that gives the
benefit. So parallel plans show the greatest benefit when we have to
scan a large table and then project far fewer rows.

--
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

Greg Nancarrow
In reply to this post by akapila
On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <[hidden email]> wrote:

>
> + /*
> + * We need to avoid an attempt on INSERT to assign a
> + * FullTransactionId whilst in parallel mode (which is in
> + * effect due to the underlying parallel query) - so the
> + * FullTransactionId is assigned here. Parallel mode must
> + * be temporarily escaped in order for this to be possible.
> + * The FullTransactionId will be included in the transaction
> + * state that is serialized in the parallel DSM.
> + */
> + if (estate->es_plannedstmt->commandType == CMD_INSERT)
> + {
> + Assert(IsInParallelMode());
> + ExitParallelMode();
> + GetCurrentFullTransactionId();
> + EnterParallelMode();
> + }
> +
>
> This looks like a hack to me. I think you are doing this to avoid the
> parallel mode checks in GetNewTransactionId(), right? If so, I have
> already mentioned above [1] that we can change it so that we disallow
> assigning xids for parallel workers only. The same is true for the
> check in ExecGatherMerge. Do you see any problem with that suggestion?
>

Actually, there is a problem.
If I remove that "hack" and change the code in GetNewTransactionId()
to disallow xid assignment for parallel workers only, then similar
code in AssignTransactionId() also gets called. If I change that code
too, in the same way, then on a parallel INSERT that code gets called
by a parallel worker (from GetCurrentTransactionId()) and the ERROR
"cannot assign XIDs in a parallel worker" results.
GetCurrentFullTransactionId() must be called in the leader, somewhere
(and will be included in the transaction state that is serialized in
the parallel DSM).
If not done here, then where?

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

akapila
On Fri, Oct 9, 2020 at 5:54 PM Greg Nancarrow <[hidden email]> wrote:

>
> On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <[hidden email]> wrote:
> >
> > + /*
> > + * We need to avoid an attempt on INSERT to assign a
> > + * FullTransactionId whilst in parallel mode (which is in
> > + * effect due to the underlying parallel query) - so the
> > + * FullTransactionId is assigned here. Parallel mode must
> > + * be temporarily escaped in order for this to be possible.
> > + * The FullTransactionId will be included in the transaction
> > + * state that is serialized in the parallel DSM.
> > + */
> > + if (estate->es_plannedstmt->commandType == CMD_INSERT)
> > + {
> > + Assert(IsInParallelMode());
> > + ExitParallelMode();
> > + GetCurrentFullTransactionId();
> > + EnterParallelMode();
> > + }
> > +
> >
> > This looks like a hack to me. I think you are doing this to avoid the
> > parallel mode checks in GetNewTransactionId(), right? If so, I have
> > already mentioned above [1] that we can change it so that we disallow
> > assigning xids for parallel workers only. The same is true for the
> > check in ExecGatherMerge. Do you see any problem with that suggestion?
> >
>
> Actually, there is a problem.
> If I remove that "hack", and change the code in GetNewTransactionId()
> to disallow xid assignment for parallel workers only, then there is
> also similar code in AssignTransactionId() which gets called.
>

I don't think workers need to call AssignTransactionId(); before
that, the transaction id passed from the leader should be set in
CurrentTransactionState. Why would
GetCurrentTransactionId()/GetCurrentFullTransactionId() need to
call AssignTransactionId() when called from a worker?

> GetCurrentFullTransactionId() must be called in the leader, somewhere
> (and will be included in the transaction state that is serialized in
> the parallel DSM).
>

Yes, it should be done in the leader, and then it should be set in
the workers via StartParallelWorkerTransaction before we do any
actual operation. If that happens, GetCurrentTransactionId() won't
need to call AssignTransactionId().
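As a toy model of that flow (illustrative only; the real code serializes far more transaction state through the parallel DSM, and the names below are stand-ins, not PostgreSQL functions):

```c
#include <assert.h>

/* The leader assigns the xid before entering parallel mode; the value
 * is carried to workers inside the serialized transaction state (here
 * just a struct).  A worker never assigns a new xid: finding none
 * pre-assigned corresponds to the "cannot assign XIDs in a parallel
 * worker" error path, signalled here by returning 0. */
typedef struct { unsigned int xid; } SerializedXactState;

static unsigned int next_xid = 100;     /* stand-in for the xid counter */

static unsigned int
leader_assign_xid(SerializedXactState *state)
{
    state->xid = next_xid++;            /* "GetCurrentFullTransactionId()" */
    return state->xid;
}

static unsigned int
worker_get_xid(const SerializedXactState *state)
{
    return state->xid;                  /* inherit only, never assign */
}
```

The point of the model: as long as the leader assigns before workers start, the worker-side lookup always succeeds without ever touching the assignment path.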

--
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

akapila
In reply to this post by Greg Nancarrow
On Fri, Oct 9, 2020 at 3:51 PM Greg Nancarrow <[hidden email]> wrote:

>
> On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <[hidden email]> wrote:
> >
> > 0001-InsertParallelSelect
> > 1.
> > ParallelContext *pcxt;
> >
> > + /*
> > + * We need to avoid an attempt on INSERT to assign a
> > + * FullTransactionId whilst in parallel mode (which is in
> > + * effect due to the underlying parallel query) - so the
> > + * FullTransactionId is assigned here. Parallel mode must
> > + * be temporarily escaped in order for this to be possible.
> > + * The FullTransactionId will be included in the transaction
> > + * state that is serialized in the parallel DSM.
> > + */
> > + if (estate->es_plannedstmt->commandType == CMD_INSERT)
> > + {
> > + Assert(IsInParallelMode());
> > + ExitParallelMode();
> > + GetCurrentFullTransactionId();
> > + EnterParallelMode();
> > + }
> > +
> >
> > This looks like a hack to me. I think you are doing this to avoid the
> > parallel mode checks in GetNewTransactionId(), right?
>
> Yes, agreed, it is a hack to avoid that (mind you, it's not exactly
> great that ExecutePlan() sets parallel mode for the entire plan
> execution). Also, I did not expect it to necessarily remain in a final patch.
>
> >If so, I have
> > already mentioned above [1] that we can change it so that we disallow
> > assigning xids for parallel workers only. The same is true for the
> > check in ExecGatherMerge. Do you see any problem with that suggestion?
> >
>
> No, should be OK I guess, but will update and test to be sure.
>
> > 2.
> > @@ -337,7 +337,7 @@ standard_planner(Query *parse, const char
> > *query_string, int cursorOptions,
> >   */
> >   if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> >   IsUnderPostmaster &&
> > - parse->commandType == CMD_SELECT &&
> > + (parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) &&
> >   !parse->hasModifyingCTE &&
> >   max_parallel_workers_per_gather > 0 &&
> >   !IsParallelWorker())
> >
> > I think the comments above this need to be updated especially the part
> > where we says:"Note that we do allow CREATE TABLE AS, SELECT INTO, and
> > CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only
> > the leader backend writes into a completely new table.". Don't we need
> > to include Insert also?
>
> Yes, Insert needs to be mentioned somewhere there.
>
> >
> > 3.
> > @@ -371,6 +371,7 @@ standard_planner(Query *parse, const char
> > *query_string, int cursorOptions,
> >   * parallel-unsafe, or else the query planner itself has a bug.
> >   */
> >   glob->parallelModeNeeded = glob->parallelModeOK &&
> > + (parse->commandType == CMD_SELECT) &&
> >   (force_parallel_mode != FORCE_PARALLEL_OFF);
> >
> > Why do you need this change? The comments above this code should be
> > updated to reflect this change. I think for the same reason the below
> > code seems to be modified but I don't understand the reason for the
> > below change as well, also it is better to update the comments for
> > this as well.
> >
>
> OK, I will update the comments for this.
> Basically, up to now, the "force_parallel_mode" has only ever operated
> on a SELECT.
> But since we are now allowing CMD_INSERT to be assessed for parallel
> mode too, we need to prevent the force_parallel_mode logic from
> sticking a Gather node over the top of arbitrary INSERTs and causing
> them to be run in parallel. Not all INSERTs are suitable for parallel
> operation, and also there are further considerations for
> parallel-safety for INSERTs compared to SELECT. INSERTs can also
> trigger UPDATEs.
>

Sure but in that case 'top_plan->parallel_safe' should be false and it
should stick Gather node atop Insert node. For the purpose of this
patch, the scan beneath Insert should be considered as parallel_safe.

--
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

akapila
On Fri, Oct 9, 2020 at 6:26 PM Amit Kapila <[hidden email]> wrote:

>
> On Fri, Oct 9, 2020 at 3:51 PM Greg Nancarrow <[hidden email]> wrote:
> >
> > On Fri, Oct 9, 2020 at 8:09 PM Amit Kapila <[hidden email]> wrote:
> >
> > OK, I will update the comments for this.
> > Basically, up to now, the "force_parallel_mode" has only ever operated
> > on a SELECT.
> > But since we are now allowing CMD_INSERT to be assessed for parallel
> > mode too, we need to prevent the force_parallel_mode logic from
> > sticking a Gather node over the top of arbitrary INSERTs and causing
> > them to be run in parallel. Not all INSERTs are suitable for parallel
> > operation, and also there are further considerations for
> > parallel-safety for INSERTs compared to SELECT. INSERTs can also
> > trigger UPDATEs.
> >
>
> Sure but in that case 'top_plan->parallel_safe' should be false and it
> should stick Gather node atop Insert node.
>

/should/should not.

--
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

Greg Nancarrow
On Fri, Oct 9, 2020 at 11:57 PM Amit Kapila <[hidden email]> wrote:
>
> >
> > Sure but in that case 'top_plan->parallel_safe' should be false and it
> > should stick Gather node atop Insert node.
> >
>
> /should/should not.
>

OK, for the minimal patch, just allowing INSERT with parallel SELECT,
you're right, neither of those additional "commandType == CMD_SELECT"
checks are needed, so I'll remove them. (In the main patch, I think
the first check can be removed, once the XID handling is fixed; the
second check is definitely needed though).

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel INSERT (INTO ... SELECT ...)

Thomas Munro-5
In reply to this post by Greg Nancarrow
On Fri, Oct 9, 2020 at 11:58 PM Greg Nancarrow <[hidden email]> wrote:

> On Fri, Oct 9, 2020 at 8:41 PM Amit Kapila <[hidden email]> wrote:
> > That will be true for the number of rows/pages we need to scan not for
> > the number of tuples we need to return as a result. The formula here
> > considers the number of rows the parallel scan will return and the
> > more the number of rows each parallel node needs to pass via shared
> > memory to gather node the more costly it will be.
> >
> > We do consider the total pages we need to scan in
> > compute_parallel_worker() where we use a logarithmic formula to
> > determine the number of workers.
>
> Despite all the best intentions, the current costings seem to be
> geared towards selecting a non-parallel plan over a parallel plan,
> the more rows there are in the table. Yet the performance of a
> parallel plan appears to be better than a non-parallel plan the more
> rows there are in the table.

Right, but as Amit said, we still have to account for the cost of
schlepping tuples between processes.  Hmm... could the problem be that
we're incorrectly estimating that Insert (without RETURNING) will send
a bazillion tuples, even though that isn't true?  I didn't look at the
code but that's what the plan seems to imply when it says stuff like
"Gather  (cost=15428.00..16101.14 rows=1000000 width=4)".  I suppose
the row estimates for ModifyTable paths are based on what they write,
not what they emit, and in the past that distinction didn't matter
much because it wasn't something that was used for comparing
alternative plans.  Now it is.



Re: Parallel INSERT (INTO ... SELECT ...)

akapila
In reply to this post by Greg Nancarrow
On Fri, Oct 9, 2020 at 7:32 PM Greg Nancarrow <[hidden email]> wrote:

>
> On Fri, Oct 9, 2020 at 11:57 PM Amit Kapila <[hidden email]> wrote:
> >
> > >
> > > Sure but in that case 'top_plan->parallel_safe' should be false and it
> > > should stick Gather node atop Insert node.
> > >
> >
> > /should/should not.
> >
>
> OK, for the minimal patch, just allowing INSERT with parallel SELECT,
> you're right, neither of those additional "commandType == CMD_SELECT"
> checks are needed, so I'll remove them.
>

Okay, that makes sense.

> (In the main patch, I think
> the first check can be removed, once the XID handling is fixed; the
> second check is definitely needed though).
>

Okay, then move that check but please do add some comments to state the reason.

--
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

akapila
In reply to this post by akapila
On Fri, Oct 9, 2020 at 2:39 PM Amit Kapila <[hidden email]> wrote:

>
> On Tue, Oct 6, 2020 at 3:08 PM Greg Nancarrow <[hidden email]> wrote:
> >
> > On Mon, Oct 5, 2020 at 10:36 PM Dilip Kumar <[hidden email]> wrote:
> >
> > Also, I have attached a separate patch (requested by Andres Freund)
> > that just allows the underlying SELECT part of "INSERT INTO ... SELECT
> > ..." to be parallel.
> >
>
> It might be a good idea to first just get this patch committed, if
> possible. So, I have reviewed the latest version of this patch:
>

Few initial comments on 0004-ParallelInsertSelect:

1.
@@ -2049,11 +2049,6 @@ heap_prepare_insert(Relation relation,
HeapTuple tup, TransactionId xid,
  * inserts in general except for the cases where inserts generate a new
  * CommandId (eg. inserts into a table having a foreign key column).
  */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-

I have speculated above [1] to see if we can change this Assert
condition instead of just removing it? Have you considered that
suggestion?

2.
@@ -764,12 +778,13 @@ GetCurrentCommandId(bool used)
  if (used)
  {
  /*
- * Forbid setting currentCommandIdUsed in a parallel worker, because
- * we have no provision for communicating this back to the leader.  We
- * could relax this restriction when currentCommandIdUsed was already
- * true at the start of the parallel operation.
+ * If in a parallel worker, only allow setting currentCommandIdUsed
+ * if currentCommandIdUsed was already true at the start of the
+ * parallel operation (by way of SetCurrentCommandIdUsed()), otherwise
+ * forbid setting currentCommandIdUsed because we have no provision
+ * for communicating this back to the leader.
  */
- Assert(!IsParallelWorker());
+ Assert(!(IsParallelWorker() && !currentCommandIdUsed));
  currentCommandIdUsed = true;
  }

Once we allow this, won't the next CommandCounterIncrement() in the
worker increment the commandId, leading to the worker and leader using
different commandIds? Is that prevented in some way, and if so, how?
Can we document the same?
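
The relaxed assertion being reviewed can be modelled in isolation (hypothetical struct and function names, not PostgreSQL source): a worker may mark the commandId "used" only if the leader had already marked it used before the parallel operation started, since the worker has no way to report the flag back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-backend command-id state. */
typedef struct CmdIdState
{
    unsigned current_command_id;
    bool     current_command_id_used;   /* as inherited at worker start */
    bool     is_parallel_worker;
} CmdIdState;

/* GetCurrentCommandId() analogue.  Returns false (instead of tripping
 * an Assert) where the patched code would hit
 * Assert(!(IsParallelWorker() && !currentCommandIdUsed)). */
bool
get_current_command_id(CmdIdState *s, bool used, unsigned *cid)
{
    if (used)
    {
        if (s->is_parallel_worker && !s->current_command_id_used)
            return false;   /* worker can't report "used" to the leader */
        s->current_command_id_used = true;
    }
    *cid = s->current_command_id;
    return true;
}
```

A worker that inherited currentCommandIdUsed = true may keep using the inherited commandId, while a worker that would have to set the flag itself is still rejected, as in the pre-patch behaviour.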

3.
@@ -173,7 +173,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
  * against performing unsafe operations in parallel mode, but this gives a
  * more user-friendly error message.
  */
- if ((XactReadOnly || IsInParallelMode()) &&
+ if ((XactReadOnly || (IsInParallelMode() &&
queryDesc->plannedstmt->commandType != CMD_INSERT)) &&
  !(eflags & EXEC_FLAG_EXPLAIN_ONLY))
  ExecCheckXactReadOnly(queryDesc->plannedstmt);

I don't think above change is correct. We need to extend the below
check in ExecCheckXactReadOnly() because otherwise, it can allow
Insert operations even when XactReadOnly is set which we don't want.

ExecCheckXactReadOnly()
{
..
if (plannedstmt->commandType != CMD_SELECT || plannedstmt->hasModifyingCTE)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
..
}

4.
@@ -173,18 +175,20 @@ ExecSerializePlan(Plan *plan, EState *estate)
  * PlannedStmt to start the executor.
- pstmt->hasReturning = false;
- pstmt->hasModifyingCTE = false;
+ pstmt->hasReturning = estate->es_plannedstmt->hasReturning;
+ pstmt->hasModifyingCTE = estate->es_plannedstmt->hasModifyingCTE;

Why change hasModifyingCTE?

5.
+ if (isParallelInsertLeader)
+ {
+ /* For Parallel INSERT, if there are BEFORE STATEMENT triggers,
+ * these must be fired by the leader, not the parallel workers.
+ */

The multi-line comment should start from the second line. I see a
similar problem at other places in the patch as well.

6.
@@ -178,6 +214,25 @@ ExecGather(PlanState *pstate)
  node->pei,
  gather->initParam);

+ if (isParallelInsertLeader)
+ {
+ /* For Parallel INSERT, if there are BEFORE STATEMENT triggers,
+ * these must be fired by the leader, not the parallel workers.
+ */
+ if (nodeModifyTableState->fireBSTriggers)
+ {
+ fireBSTriggers(nodeModifyTableState);
+ nodeModifyTableState->fireBSTriggers = false;
+
+ /*
+ * Disable firing of AFTER STATEMENT triggers by local
+ * plan execution (ModifyTable processing). These will be
+ * fired at end of Gather processing.
+ */
+ nodeModifyTableState->fireASTriggers = false;
+ }
+ }

Can we encapsulate this in a separate function? It seems a bit odd to
directly do this ExecGather.

7.
@@ -418,14 +476,25 @@ ExecShutdownGatherWorkers(GatherState *node)
 void
 ExecShutdownGather(GatherState *node)
 {
- ExecShutdownGatherWorkers(node);
+ if (node->pei == NULL)
+ return;

- /* Now destroy the parallel context. */
- if (node->pei != NULL)

So after this patch, if "node->pei == NULL" then we won't shut down the
workers here? Why so?

8. You have made changes related to trigger execution for Gather node,
don't we need similar changes for GatherMerge node?

9.
@@ -383,7 +444,21 @@ cost_gather(GatherPath *path, PlannerInfo *root,

  /* Parallel setup and communication cost. */
  startup_cost += parallel_setup_cost;
- run_cost += parallel_tuple_cost * path->path.rows;
+
+ /*
+ * For Parallel INSERT, provided no tuples are returned from workers
+ * to gather/leader node, don't add a cost-per-row, as each worker
+ * parallelly inserts the tuples that result from its chunk of plan
+ * execution. This change may make the parallel plan cheap among all
+ * other plans, and influence the planner to consider this parallel
+ * plan.
+ */
+ if (!(IsA(path->subpath, ModifyTablePath) &&
+ castNode(ModifyTablePath, path->subpath)->operation == CMD_INSERT &&
+ castNode(ModifyTablePath, path->subpath)->returningLists != NULL))
+ {
+ run_cost += parallel_tuple_cost * path->path.rows;
+ }

Isn't the last condition in above check "castNode(ModifyTablePath,
path->subpath)->returningLists != NULL" should be
"castNode(ModifyTablePath, path->subpath)->returningLists == NULL"
instead? Because otherwise when there is returning list it won't count
the cost for passing tuples via Gather node. This might be reason of
what Thomas has seen in his recent email [2].

10. Don't we need a change similar to cost_gather in
cost_gather_merge? It seems you have made only partial changes for
GatherMerge node.
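
The costing condition under discussion in point 9 can be sketched as a small model (constants and field names are illustrative, not PostgreSQL's): the per-tuple transfer cost should be skipped only when the subpath is a parallel INSERT with *no* RETURNING list, because only then do the workers insert their tuples themselves without passing anything up to the Gather node — i.e. the corrected `returningLists == NULL` sense.

```c
#include <stdbool.h>

#define PARALLEL_SETUP_COST 1000.0   /* illustrative defaults */
#define PARALLEL_TUPLE_COST 0.1

/* Hypothetical summary of what cost_gather() knows about its subpath. */
typedef struct GatherCostInput
{
    bool   subpath_is_insert;    /* ModifyTablePath with CMD_INSERT? */
    bool   has_returning_list;   /* returningLists != NULL? */
    double rows;                 /* estimated rows from the subpath */
} GatherCostInput;

double
gather_run_cost(const GatherCostInput *in)
{
    double run_cost = 0.0;

    /* Charge the per-row transfer cost unless no tuples flow from the
     * workers to the leader (INSERT without RETURNING). */
    if (!(in->subpath_is_insert && !in->has_returning_list))
        run_cost += PARALLEL_TUPLE_COST * in->rows;
    return run_cost;
}
```

With the condition written this way, an INSERT ... RETURNING still pays the tuple-transfer cost, which is the case the `!= NULL` version of the check would have exempted by mistake.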


[1] - https://www.postgresql.org/message-id/CAA4eK1KyftVDgovvRQmdV1b%3DnN0R-KqdWZqiu7jZ1GYQ7SO9OA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CA%2BhUKGLZB%3D1Q%2BAQQEEmffr3bUMAh%2BJD%2BJ%2B7axv%2BK10Kea0U9TQ%40mail.gmail.com

--
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

akapila
On Sat, Oct 10, 2020 at 5:25 PM Amit Kapila <[hidden email]> wrote:
>
> 8. You have made changes related to trigger execution for Gather node,
> don't we need similar changes for GatherMerge node?
>
..
>
> 10. Don't we need a change similar to cost_gather in
> cost_gather_merge? It seems you have made only partial changes for
> GatherMerge node.
>

Please ignore these two comments, as we can't push the Insert down to
workers when GatherMerge is involved: the leader backend performs the
final phase (merging the results produced by the workers). So, we can
only push down the Select part of the statement.

--
With Regards,
Amit Kapila.



Re: Parallel INSERT (INTO ... SELECT ...)

Thomas Munro-5
In reply to this post by akapila
On Sun, Oct 11, 2020 at 12:55 AM Amit Kapila <[hidden email]> wrote:

> + /*
> + * For Parallel INSERT, provided no tuples are returned from workers
> + * to gather/leader node, don't add a cost-per-row, as each worker
> + * parallelly inserts the tuples that result from its chunk of plan
> + * execution. This change may make the parallel plan cheap among all
> + * other plans, and influence the planner to consider this parallel
> + * plan.
> + */
> + if (!(IsA(path->subpath, ModifyTablePath) &&
> + castNode(ModifyTablePath, path->subpath)->operation == CMD_INSERT &&
> + castNode(ModifyTablePath, path->subpath)->returningLists != NULL))
> + {
> + run_cost += parallel_tuple_cost * path->path.rows;
> + }
>
> Isn't the last condition in above check "castNode(ModifyTablePath,
> path->subpath)->returningLists != NULL" should be
> "castNode(ModifyTablePath, path->subpath)->returningLists == NULL"
> instead? Because otherwise when there is returning list it won't count
> the cost for passing tuples via Gather node. This might be reason of
> what Thomas has seen in his recent email [2].

Yeah, I think this is trying to fix the problem too late.  Instead, we
should fix the incorrect row estimates so we don't have to fudge it
later like that.  For example, this should be estimating rows=0:

postgres=# explain analyze insert into s select * from t t1 join t t2 using (i);
...
 Insert on s  (cost=30839.08..70744.45 rows=1000226 width=4) (actual
time=2940.560..2940.562 rows=0 loops=1)

I think that should be done with something like this:

--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -3583,16 +3583,11 @@ create_modifytable_path(PlannerInfo *root,
RelOptInfo *rel,
                if (lc == list_head(subpaths))  /* first node? */
                        pathnode->path.startup_cost = subpath->startup_cost;
                pathnode->path.total_cost += subpath->total_cost;
-               pathnode->path.rows += subpath->rows;
+               if (returningLists != NIL)
+                       pathnode->path.rows += subpath->rows;
                total_size += subpath->pathtarget->width * subpath->rows;
        }

-       /*
-        * Set width to the average width of the subpath outputs.  XXX this is
-        * totally wrong: we should report zero if no RETURNING, else an average
-        * of the RETURNING tlist widths.  But it's what happened historically,
-        * and improving it is a task for another day.
-        */
        if (pathnode->path.rows > 0)
                total_size /= pathnode->path.rows;
        pathnode->path.pathtarget->width = rint(total_size);

