pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
37 messages Options
12
Reply | Threaded
Open this post in threaded view
|

pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

Amit Kapila-3
Add a new GUC and a reloption to enable inserts in parallel-mode.

Commit 05c8482f7f added the implementation of parallel SELECT for
"INSERT INTO ... SELECT ..." which may incur non-negligible overhead in
the additional parallel-safety checks that it performs, even when, in the
end, those checks determine that parallelism can't be used. This is
normally only ever a problem in the case of when the target table has a
large number of partitions.

A new GUC option "enable_parallel_insert" is added, to allow insert in
parallel-mode. The default is on.

In addition to the GUC option, the user may want a mechanism to allow
inserts in parallel-mode with finer granularity at table level. The new
table option "parallel_insert_enabled" allows this. The default is true.

Author: "Hou, Zhijie"
Reviewed-by: Greg Nancarrow, Amit Langote, Takayuki Tsunakawa, Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAJcOf-cXnB5cnMKqWEp2E2z7Mvcd04iLVmV=qpFJrR3AcrTS3g@...

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/c8f78b616167bf8e24bc5dc69112c37755ed3058

Modified Files
--------------
doc/src/sgml/config.sgml                      | 23 +++++++++++
doc/src/sgml/ref/alter_table.sgml             |  3 +-
doc/src/sgml/ref/create_table.sgml            | 26 +++++++++++++
src/backend/access/common/reloptions.c        | 25 +++++++++---
src/backend/optimizer/path/costsize.c         |  2 +
src/backend/optimizer/util/clauses.c          | 34 ++++++++++++++--
src/backend/utils/misc/guc.c                  | 10 +++++
src/backend/utils/misc/postgresql.conf.sample |  1 +
src/bin/psql/tab-complete.c                   |  1 +
src/include/optimizer/cost.h                  |  1 +
src/include/utils/rel.h                       | 25 ++++++++++++
src/test/regress/expected/insert_parallel.out | 56 ++++++++++++++++++++++++++-
src/test/regress/expected/sysviews.out        |  3 +-
src/test/regress/sql/insert_parallel.sql      | 44 ++++++++++++++++++++-
src/tools/pgindent/typedefs.list              |  1 +
15 files changed, 240 insertions(+), 15 deletions(-)

Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

Robert Haas
On Wed, Mar 17, 2021 at 10:14 PM Amit Kapila <[hidden email]> wrote:

> Add a new GUC and a reloption to enable inserts in parallel-mode.
>
> Commit 05c8482f7f added the implementation of parallel SELECT for
> "INSERT INTO ... SELECT ..." which may incur non-negligible overhead in
> the additional parallel-safety checks that it performs, even when, in the
> end, those checks determine that parallelism can't be used. This is
> normally only ever a problem in the case of when the target table has a
> large number of partitions.
>
> A new GUC option "enable_parallel_insert" is added, to allow insert in
> parallel-mode. The default is on.
>
> In addition to the GUC option, the user may want a mechanism to allow
> inserts in parallel-mode with finer granularity at table level. The new
> table option "parallel_insert_enabled" allows this. The default is true.

I find this fairly ugly. If you can't make the cost of checking
whether parallelism is safe low enough that you don't need a setting
for this, then I think perhaps you shouldn't have the feature at all.
In other words, I propose that you revert both this and 05c8482f7f and
come back when you have a better design that doesn't introduce so much
overhead.

--
Robert Haas
EDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

Tom Lane-2
Robert Haas <[hidden email]> writes:
> I find this fairly ugly. If you can't make the cost of checking
> whether parallelism is safe low enough that you don't need a setting
> for this, then I think perhaps you shouldn't have the feature at all.
> In other words, I propose that you revert both this and 05c8482f7f and
> come back when you have a better design that doesn't introduce so much
> overhead.

I'm +1 on that idea for a completely independent reason: 05c8482f7f
is currently the easy winner for "scariest patch of the v14 cycle".
I don't have any faith in it, and so I'm very concerned that it went
in so late in the dev cycle.  It'd be better to push some successor
design early in the v15 cycle, when we'll have more time to catch
problems.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

akapila
On Mon, Mar 22, 2021 at 8:03 PM Tom Lane <[hidden email]> wrote:

>
> Robert Haas <[hidden email]> writes:
> > I find this fairly ugly. If you can't make the cost of checking
> > whether parallelism is safe low enough that you don't need a setting
> > for this, then I think perhaps you shouldn't have the feature at all.
> > In other words, I propose that you revert both this and 05c8482f7f and
> > come back when you have a better design that doesn't introduce so much
> > overhead.
>
> I'm +1 on that idea for a completely independent reason: 05c8482f7f
> is currently the easy winner for "scariest patch of the v14 cycle".
> I don't have any faith in it, and so I'm very concerned that it went
> in so late in the dev cycle.  It'd be better to push some successor
> design early in the v15 cycle, when we'll have more time to catch
> problems.
>

Okay, I can revert this work to avoid that risk but it would be great
if you can share your thoughts on what alternative design do you have
in mind, and or how it should be better implemented? Regarding
performance overhead, it is for partitioned tables with a large number
of partitions, and that too when the data to insert is not that much
or there is parallel-unsafe clause on one of the partitions. Now, how
else can we determine the parallel-safety without checking each of the
partitions? We do have other partition-related gucs
(enable_partition*) to avoid the partitions-related overhead so why is
it so bad to have guc here (maybe the naming of guc/reloption is not
good)?

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

RE: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

tsunakawa.takay@fujitsu.com
From: Amit Kapila <[hidden email]>

> Okay, I can revert this work to avoid that risk but it would be great
> if you can share your thoughts on what alternative design do you have
> in mind, and or how it should be better implemented? Regarding
> performance overhead, it is for partitioned tables with a large number
> of partitions, and that too when the data to insert is not that much
> or there is parallel-unsafe clause on one of the partitions. Now, how
> else can we determine the parallel-safety without checking each of the
> partitions? We do have other partition-related gucs
> (enable_partition*) to avoid the partitions-related overhead so why is
> it so bad to have guc here (maybe the naming of guc/reloption is not
> good)?
>

I'd love to hear even rough ideas from Robert-san.  I guess something like...:

* Basically, we can assume or hope that all partitions have the same parallel safety as the root partitioned table.  That is, it'd be very rare that child partitions have different indexes, constraints, and triggers.  So, we can just check the root table during planning to decide if we want to run parallel processing.

* When the executor opens a target child partition, it checks its parallel safety only once for it.  If any target partition turns out to be parallel unsafe, have the parallel leader do the insertion shomehow.

TBH, I don't think the parameter is so ugly.  At least, it's not worse than Oracle or SQL Server.





        Regards
Takayuki Tsunakawa
                                               

Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

Andres Freund
In reply to this post by Robert Haas
Hi,

On 2021-03-22 08:47:47 -0400, Robert Haas wrote:
> I find this fairly ugly. If you can't make the cost of checking
> whether parallelism is safe low enough that you don't need a setting
> for this, then I think perhaps you shouldn't have the feature at all.
> In other words, I propose that you revert both this and 05c8482f7f and
> come back when you have a better design that doesn't introduce so much
> overhead.

I started out wondering whether some of the caching David Rowley has been
working on to reduce the overhead of the result cache planning shows a
path for how to make parts of this cheaper.
https://postgr.es/m/CAApHDvpw5qG4vTb%2BhwhmAJRid6QF%2BR9vvtOhX2HN2Yy9%2Bw20sw%40mail.gmail.com

But looking more, several of the checks just seem wrong to me.

target_rel_index_max_parallel_hazard() deparses index expressions from
scratch? With code like

+        index_rel = index_open(index_oid, lockmode);
...
+    index_oid_list = RelationGetIndexList(rel);
+    foreach(lc, index_oid_list)
...
+        ii_Expressions = RelationGetIndexExpressions(index_rel);
...

+                    Assert(index_expr_item != NULL);
+                    if (index_expr_item == NULL)    /* shouldn't happen */
+                    {
+                        elog(WARNING, "too few entries in indexprs list");
+                        context->max_hazard = PROPARALLEL_UNSAFE;
+                        found_max_hazard = true;
+                        break;
+                    }

Brrr.

Shouldn't we have this in IndexOptInfo already? But also, why on earth
is that WARNING branch a good idea?


+static bool
+target_rel_domain_max_parallel_hazard(Oid typid, max_parallel_hazard_context *context)
...
+    scan = systable_beginscan(con_rel, ConstraintTypidIndexId, true,
+                              NULL, 1, key);
+
+    while (HeapTupleIsValid((tup = systable_getnext(scan))))

There's plenty other code in the planner that needs to know about
domains. This stuff is precisely why the typecache exists.


The pattern to rebuild information that we already have cached elsewhere
seems to repeat all over this patch.


This seems not even close to committable.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

Andres Freund
In reply to this post by akapila
Hi,

On 2021-03-23 09:01:13 +0530, Amit Kapila wrote:
> Okay, I can revert this work to avoid that risk but it would be great
> if you can share your thoughts on what alternative design do you have
> in mind, and or how it should be better implemented? Regarding
> performance overhead, it is for partitioned tables with a large number
> of partitions, and that too when the data to insert is not that much
> or there is parallel-unsafe clause on one of the partitions. Now, how
> else can we determine the parallel-safety without checking each of the
> partitions?

You cache it. And actually use information that is already
cached. There's a huge difference between doing expensive stuff once
every now and then, and doing it over and over and over. That's why we
have relcache, syscache, typecache etc.


> We do have other partition-related gucs (enable_partition*) to avoid
> the partitions-related overhead so why is it so bad to have guc here
> (maybe the naming of guc/reloption is not good)?

Most of those seem more about risk-reduction. And they don't have
reloptions - which seems to be the only way this feature can
realistically be used.  The one halfway comparable GUC is
enable_partitionwise_join - while it'd be great to not need it (or at
least not default to off), it implies some legitimately computationally
hard work that can't easily be cached.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

Robert Haas
On Tue, Mar 23, 2021 at 3:13 PM Andres Freund <[hidden email]> wrote:
> You cache it.

Yeah, exactly. I don't think it's super-easy to understand exactly how
to make that work well for something like this. It would be easy
enough to set a flag in the relcache whose value is computed the first
time we need it and is then consulted every time after that, and you
just invalidate it based on sinval messages. But, if you go with that
design, you've got a big problem: now an insert has to lock all the
tables in the partitioning hierarchy to decide whether it can run in
parallel or not, and we do not want that. We want to be able to just
lock the partitions we need, so really, we want to be able to test for
parallel-safety without requiring a relation lock, or only requiring
it on the partitioned table itself and not all the partitions.
However, that introduces a race condition, because if you ever check
the value of the flag without a lock on the relation, then you can't
rely on sinval to blow away the cached state. I don't have a good
solution to that problem in mind right now, because I haven't really
devoted much time to thinking about it, but I think it might be
possible to solve it with more thought.

But if I had thought about it and had not come up with anything better
than what you committed here, I would have committed nothing, and I
think that's what you should have done, too. This patch is full of
grotty hacks. Just to take one example, consider the code that forces
a transaction ID assignment prior to the operation. You *could* have
solved this problem by introducing new machinery to make it safe to
assign an XID in parallel mode. Then, we'd have a fundamental new
capability that we currently lack. Instead, you just force-assigned an
XID before entering parallel mode. That's not building any new
capability; that's just hacking around the lack of a capability to
make something work, while ignoring the disadvantages of doing it that
way, namely that sometimes an XID will be assigned for no purpose.

Likewise, the XXX comment you added to max_parallel_hazard_walker
claims that some of the code introduced there is to compensate for an
unspecified bug in the rewriter. I'm a bit skeptical that the comment
is correct, and there's no way to find out because the comment doesn't
say what the bug supposedly is, but let's just say for the sake of
argument that it's true. Well, you *could* have fixed the bug, but
instead you hacked around it, and in a relatively expensive way that
affects every query with a CTE in it whether it can benefit from this
patch or not. That's not a responsible way of maintaining the core
PostgreSQL code.

I also agree with Andres's criticism of the code in
target_rel_index_max_parallel_hazard entirely. It's completely
unacceptable to be doing index_open() here. If you don't understand
the design of the planner well enough to know why that's not OK, then
you shouldn't be committing patches like this.

--
Robert Haas
EDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

Álvaro Herrera
On 2021-Mar-23, Robert Haas wrote:

> Likewise, the XXX comment you added to max_parallel_hazard_walker
> claims that some of the code introduced there is to compensate for an
> unspecified bug in the rewriter. I'm a bit skeptical that the comment
> is correct, and there's no way to find out because the comment doesn't
> say what the bug supposedly is, but let's just say for the sake of
> argument that it's true. Well, you *could* have fixed the bug, but
> instead you hacked around it, and in a relatively expensive way that
> affects every query with a CTE in it whether it can benefit from this
> patch or not. That's not a responsible way of maintaining the core
> PostgreSQL code.

I think the CTE bug is this one:

https://www.postgresql.org/message-id/flat/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@...

while I can't disagree with the overall conclusion that it seems safer
to revert parallel INSERT/SELECT given the number of alleged problems,
it is true that this bug exists, and has gone unfixed.

--
Álvaro Herrera       Valdivia, Chile


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

Tom Lane-2
Alvaro Herrera <[hidden email]> writes:
> On 2021-Mar-23, Robert Haas wrote:
>> Likewise, the XXX comment you added to max_parallel_hazard_walker
>> claims that some of the code introduced there is to compensate for an
>> unspecified bug in the rewriter.

> I think the CTE bug is this one:
> https://www.postgresql.org/message-id/flat/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@...
> while I can't disagree with the overall conclusion that it seems safer
> to revert parallel INSERT/SELECT given the number of alleged problems,
> it is true that this bug exists, and has gone unfixed.

Yeah, because it's not totally clear whether we want to fix it by
disallowing the case, or by significantly changing the way the
rewriter works.  It'd be good if some of the folks on this thread
weighed in on that choice.

(Having said that, another complaint about this particular comment
is that it doesn't mention how the planner should be changed once
the rewriter is fixed.  Is it sufficient to delete the stanza below
the comment?  Just looking at it, I wonder whether
max_parallel_hazard_walker is now being invoked on portions of the
Query tree that we'd not have to traverse at all given a rewriter fix.)

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

akapila
In reply to this post by Álvaro Herrera
On Wed, Mar 24, 2021 at 2:00 AM Alvaro Herrera <[hidden email]> wrote:

>
> On 2021-Mar-23, Robert Haas wrote:
>
> > Likewise, the XXX comment you added to max_parallel_hazard_walker
> > claims that some of the code introduced there is to compensate for an
> > unspecified bug in the rewriter. I'm a bit skeptical that the comment
> > is correct, and there's no way to find out because the comment doesn't
> > say what the bug supposedly is, but let's just say for the sake of
> > argument that it's true. Well, you *could* have fixed the bug, but
> > instead you hacked around it, and in a relatively expensive way that
> > affects every query with a CTE in it whether it can benefit from this
> > patch or not. That's not a responsible way of maintaining the core
> > PostgreSQL code.
>
> I think the CTE bug is this one:
>
> https://www.postgresql.org/message-id/flat/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@...
>
> while I can't disagree with the overall conclusion that it seems safer
> to revert parallel INSERT/SELECT given the number of alleged problems,
>

Agreed. I'll revert the patch and respond to some of the points here
with my thoughts.

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

akapila
In reply to this post by Tom Lane-2
On Wed, Mar 24, 2021 at 2:42 AM Tom Lane <[hidden email]> wrote:

>
> Alvaro Herrera <[hidden email]> writes:
> > On 2021-Mar-23, Robert Haas wrote:
> >> Likewise, the XXX comment you added to max_parallel_hazard_walker
> >> claims that some of the code introduced there is to compensate for an
> >> unspecified bug in the rewriter.
>
> > I think the CTE bug is this one:
> > https://www.postgresql.org/message-id/flat/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@...
> > while I can't disagree with the overall conclusion that it seems safer
> > to revert parallel INSERT/SELECT given the number of alleged problems,
> > it is true that this bug exists, and has gone unfixed.
>
> Yeah, because it's not totally clear whether we want to fix it by
> disallowing the case, or by significantly changing the way the
> rewriter works.  It'd be good if some of the folks on this thread
> weighed in on that choice.
>
> (Having said that, another complaint about this particular comment
> is that it doesn't mention how the planner should be changed once
> the rewriter is fixed.  Is it sufficient to delete the stanza below
> the comment?
>

Yes.

> Just looking at it, I wonder whether
> max_parallel_hazard_walker is now being invoked on portions of the
> Query tree that we'd not have to traverse at all given a rewriter fix.)
>

I remember that I have debugged that part and information was
available, so decided to add a check based on that. I just went with
your analysis in the CTE bug discussion that it is not worth and OTOH
here that information was available, so I thought that it might be a
good compromise but still, I or someone else could have weighed in on
that choice.

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

RE: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

tsunakawa.takay@fujitsu.com
In reply to this post by Robert Haas
From: Robert Haas <[hidden email]>

> Yeah, exactly. I don't think it's super-easy to understand exactly how
> to make that work well for something like this. It would be easy
> enough to set a flag in the relcache whose value is computed the first
> time we need it and is then consulted every time after that, and you
> just invalidate it based on sinval messages. But, if you go with that
> design, you've got a big problem: now an insert has to lock all the
> tables in the partitioning hierarchy to decide whether it can run in
> parallel or not, and we do not want that. We want to be able to just
> lock the partitions we need, so really, we want to be able to test for
> parallel-safety without requiring a relation lock, or only requiring
> it on the partitioned table itself and not all the partitions.
> However, that introduces a race condition, because if you ever check
> the value of the flag without a lock on the relation, then you can't
> rely on sinval to blow away the cached state. I don't have a good
> solution to that problem in mind right now, because I haven't really
> devoted much time to thinking about it, but I think it might be
> possible to solve it with more thought.

One problem with caching the result is that the first access in each session has to experience the slow processing.  Some severe customers of our proprietary database, which is not based on Postgres, have requested to eliminate even the overhead associated with the first access, and we have provided features for them.  As for the data file, users can use pg_prewam.  But what can we recommend users to do in this case?  Maybe the logon trigger feature, which is ready for committer in PG 14, can be used to allow users to execute possible queries at session start (or establishing a connection pool), but I feel it's inconvenient.


> But if I had thought about it and had not come up with anything better
> than what you committed here, I would have committed nothing, and I
> think that's what you should have done, too. This patch is full of
> grotty hacks. Just to take one example, consider the code that forces
> a transaction ID assignment prior to the operation. You *could* have
> solved this problem by introducing new machinery to make it safe to
> assign an XID in parallel mode. Then, we'd have a fundamental new
> capability that we currently lack. Instead, you just force-assigned an
> XID before entering parallel mode. That's not building any new
> capability; that's just hacking around the lack of a capability to
> make something work, while ignoring the disadvantages of doing it that
> way, namely that sometimes an XID will be assigned for no purpose.

Regarding the picked xid assignment, I didn't think it's so grotty.  Yes, in fact, I felt it's a bit unclean.  But it's only a single line of code.  With a single line of code, we can provide great value to users.  Why don't we go for it?  As discussed in the thread, the xid is wasted only when the source data is empty, which is impractical provided that the user wants to load much data probably for ETL.

(I'm afraid "grotty" may be too strong a word considering the CoC statement "We encourage thoughtful, constructive discussion of the software and this community, their current state, and possible directions for development. The focus of our discussions should be the code and related technology, community projects, and infrastructure.")


> Likewise, the XXX comment you added to max_parallel_hazard_walker
> claims that some of the code introduced there is to compensate for an
> unspecified bug in the rewriter. I'm a bit skeptical that the comment
> is correct, and there's no way to find out because the comment doesn't
> say what the bug supposedly is, but let's just say for the sake of
> argument that it's true. Well, you *could* have fixed the bug, but
> instead you hacked around it, and in a relatively expensive way that
> affects every query with a CTE in it whether it can benefit from this
> patch or not. That's not a responsible way of maintaining the core
> PostgreSQL code.

It'd be too sad if we have to be bothered by an existing bug and give up an attractive feature.  Adding more explanation in the comment is OK?  Anyway, I think we can separate this issue.


> I also agree with Andres's criticism of the code in
> target_rel_index_max_parallel_hazard entirely. It's completely
> unacceptable to be doing index_open() here. If you don't understand
> the design of the planner well enough to know why that's not OK, then
> you shouldn't be committing patches like this.

This sounds like something to address.  I have to learn...


        Regards
Takayuki Tsunakawa
                                               

Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

Greg Nancarrow
In reply to this post by Andres Freund
On Wed, Mar 24, 2021 at 5:44 AM Andres Freund <[hidden email]> wrote:

>
>
> But looking more, several of the checks just seem wrong to me.
>
> target_rel_index_max_parallel_hazard() deparses index expressions from
> scratch? With code like
>
> +        index_rel = index_open(index_oid, lockmode);
> ...
> +    index_oid_list = RelationGetIndexList(rel);
> +    foreach(lc, index_oid_list)
> ...
> +        ii_Expressions = RelationGetIndexExpressions(index_rel);
> ...
>
> +                    Assert(index_expr_item != NULL);
> +                    if (index_expr_item == NULL)    /* shouldn't happen */
> +                    {
> +                        elog(WARNING, "too few entries in indexprs list");
> +                        context->max_hazard = PROPARALLEL_UNSAFE;
> +                        found_max_hazard = true;
> +                        break;
> +                    }
>
> Brrr.
>
> Shouldn't we have this in IndexOptInfo already?

The additional parallel-safety checks are (at least currently) invoked
as part of max_parallel_hazard(), which is called early on in the
planner, so I don't believe that IndexOptInfo/RelOptInfo has been
built at this point.

> But also, why on earth
> is that WARNING branch a good idea?
>

I think there are about half a dozen other places in the Postgres code
where the same check is done, in which case it calls elog(ERROR,...).
Is it a better strategy to immediately exit the backend with an error
in this case, like the other cases do?
My take on it was that if this "should never happen" condition is ever
encountered, let it back out of the parallel-safety checks and
error-out in the place it normally (currently) would.

>
> +static bool
> +target_rel_domain_max_parallel_hazard(Oid typid, max_parallel_hazard_context *context)
> ...
> +    scan = systable_beginscan(con_rel, ConstraintTypidIndexId, true,
> +                              NULL, 1, key);
> +
> +    while (HeapTupleIsValid((tup = systable_getnext(scan))))
>
> There's plenty other code in the planner that needs to know about
> domains. This stuff is precisely why the typecache exists.
>
>

OK, fair comment.


Regards,
Greg Nancarrow
Fujitsu Australia


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

Andres Freund
Hi,

On 2021-03-24 14:42:44 +1100, Greg Nancarrow wrote:
> On Wed, Mar 24, 2021 at 5:44 AM Andres Freund <[hidden email]> wrote:
> > Shouldn't we have this in IndexOptInfo already?
>
> The additional parallel-safety checks are (at least currently) invoked
> as part of max_parallel_hazard(), which is called early on in the
> planner, so I don't believe that IndexOptInfo/RelOptInfo has been
> built at this point.

Then that's something you need to redesign, not duplicate the effort.


> > But also, why on earth
> > is that WARNING branch a good idea?

> I think there are about half a dozen other places in the Postgres code
> where the same check is done, in which case it calls elog(ERROR,...).
> Is it a better strategy to immediately exit the backend with an error
> in this case, like the other cases do?

Yes.


> My take on it was that if this "should never happen" condition is ever
> encountered, let it back out of the parallel-safety checks and
> error-out in the place it normally (currently) would.

What advantage does that have? You'll get a bunch of WARNINGs before the
ERROR later in optimize, differences between assert-non/assert builds,
more complicated code flow, longer untested code.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

Tom Lane-2
In reply to this post by Robert Haas
Robert Haas <[hidden email]> writes:
> On Tue, Mar 23, 2021 at 3:13 PM Andres Freund <[hidden email]> wrote:
>> You cache it.

> Yeah, exactly. I don't think it's super-easy to understand exactly how
> to make that work well for something like this. It would be easy
> enough to set a flag in the relcache whose value is computed the first
> time we need it and is then consulted every time after that, and you
> just invalidate it based on sinval messages. But, if you go with that
> design, you've got a big problem: now an insert has to lock all the
> tables in the partitioning hierarchy to decide whether it can run in
> parallel or not, and we do not want that.

Possibly-crazy late-night idea ahead:

IIUC, we need to know a global property of a partitioning hierarchy:
is every trigger, CHECK constraint, etc that might be run by an INSERT
parallel-safe?  What we see here is that reverse-engineering that
property every time we need to know it is just too expensive, even
with use of our available caching methods.

How about a declarative approach instead?  That is, if a user would
like parallelized inserts into a partitioned table, she must declare
the table parallel-safe with some suitable annotation.  Then, checking
the property during DML is next door to free, and instead we have to think
about whether and how to enforce that the marking is valid during DDL.

I don't honestly see a real cheap way to enforce such a property.
For instance, if someone does ALTER FUNCTION to remove a function's
parallel-safe marking, we can't really run around and verify that the
function is not used in any CHECK constraint.  (Aside from the cost,
there would be race conditions.)

But maybe we don't have to enforce it exactly.  It could be on the
user's head that the marking is accurate.  We could prevent any
really bad misbehavior by having parallel workers error out if they
see they've been asked to execute a non-parallel-safe function.

Or there are probably other ways to slice it up.  But I think some
outside-the-box thinking might be helpful here.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

akapila
In reply to this post by akapila
On Wed, Mar 24, 2021 at 8:28 AM Amit Kapila <[hidden email]> wrote:

>
> On Wed, Mar 24, 2021 at 2:00 AM Alvaro Herrera <[hidden email]> wrote:
> >
> > On 2021-Mar-23, Robert Haas wrote:
> >
> > > Likewise, the XXX comment you added to max_parallel_hazard_walker
> > > claims that some of the code introduced there is to compensate for an
> > > unspecified bug in the rewriter. I'm a bit skeptical that the comment
> > > is correct, and there's no way to find out because the comment doesn't
> > > say what the bug supposedly is, but let's just say for the sake of
> > > argument that it's true. Well, you *could* have fixed the bug, but
> > > instead you hacked around it, and in a relatively expensive way that
> > > affects every query with a CTE in it whether it can benefit from this
> > > patch or not. That's not a responsible way of maintaining the core
> > > PostgreSQL code.
> >
> > I think the CTE bug is this one:
> >
> > https://www.postgresql.org/message-id/flat/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@...
> >
> > while I can't disagree with the overall conclusion that it seems safer
> > to revert parallel INSERT/SELECT given the number of alleged problems,
> >
>
> Agreed. I'll revert the patch and respond to some of the points here
> with my thoughts.
>

BTW, I think the revert needs catversion bump as commit
c5be48f092016b1caf597b2e21d588b56c88a23e has changed catalog. I hope
that is fine but if not do let me know?

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

Andres Freund
In reply to this post by Robert Haas
Hi,

On 2021-03-23 16:04:59 -0400, Robert Haas wrote:

> It would be easy enough to set a flag in the relcache whose value is
> computed the first time we need it and is then consulted every time
> after that, and you just invalidate it based on sinval messages. But,
> if you go with that design, you've got a big problem: now an insert
> has to lock all the tables in the partitioning hierarchy to decide
> whether it can run in parallel or not, and we do not want that. We
> want to be able to just lock the partitions we need, so really, we
> want to be able to test for parallel-safety without requiring a
> relation lock, or only requiring it on the partitioned table itself
> and not all the partitions.

That seems like an argument for a pg_class attribute about parallel
safety of the current relation *and* its children. It'd definitely mean
recursing higher in the partition tree during DDL if the action on a
child partition causes the safety to flip.


> But if I had thought about it and had not come up with anything better
> than what you committed here, I would have committed nothing, and I
> think that's what you should have done, too.

I agree with that.


> Just to take one example, consider the code that forces a transaction
> ID assignment prior to the operation. You *could* have solved this
> problem by introducing new machinery to make it safe to assign an XID
> in parallel mode. Then, we'd have a fundamental new capability that we
> currently lack. Instead, you just force-assigned an XID before
> entering parallel mode. That's not building any new capability; that's
> just hacking around the lack of a capability to make something work,
> while ignoring the disadvantages of doing it that way, namely that
> sometimes an XID will be assigned for no purpose.

Although this specific hack doesn't seem too terrible to me. If you
execute a parallel insert the likelihood to end up not needing an xid is
pretty low. Implementing it concurrently does seem like it'd end up
needing another lwlock nested around xid assignment, or some more
complicated scheme with already holding XidGenLock or retries. But maybe
I'm missing an easy solution here.

Greetings,

Andres Freund


Reply | Threaded
Open this post in threaded view
|

Re: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

Tom Lane-2
In reply to this post by akapila
Amit Kapila <[hidden email]> writes:
> BTW, I think the revert needs catversion bump as commit
> c5be48f092016b1caf597b2e21d588b56c88a23e has changed catalog. I hope
> that is fine but if not do let me know?

Yeah, just advance catversion another time.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

RE: pgsql: Add a new GUC and a reloption to enable inserts in parallel-mode

tsunakawa.takay@fujitsu.com
In reply to this post by Tom Lane-2
From: Tom Lane <[hidden email]>

> Possibly-crazy late-night idea ahead:
>
> IIUC, we need to know a global property of a partitioning hierarchy:
> is every trigger, CHECK constraint, etc that might be run by an INSERT
> parallel-safe?  What we see here is that reverse-engineering that
> property every time we need to know it is just too expensive, even
> with use of our available caching methods.
>
> How about a declarative approach instead?  That is, if a user would
> like parallelized inserts into a partitioned table, she must declare
> the table parallel-safe with some suitable annotation.  Then, checking
> the property during DML is next door to free, and instead we have to think
> about whether and how to enforce that the marking is valid during DDL.
>
> I don't honestly see a real cheap way to enforce such a property.
> For instance, if someone does ALTER FUNCTION to remove a function's
> parallel-safe marking, we can't really run around and verify that the
> function is not used in any CHECK constraint.  (Aside from the cost,
> there would be race conditions.)

I thought of a similar idea as below, which I was most reluctant to adopt.


https://www.postgresql.org/message-id/TYAPR01MB29907AE025B60A1C2CA5F08DFEA70%40TYAPR01MB2990.jpnprd01.prod.outlook.com
--------------------------------------------------
(3) Record the parallel safety in system catalog
Add a column like relparallel in pg_class that indicates the parallel safety of the relation.  planner just checks the value instead of doing heavy work for every SQL statement.  That column's value is modified whenever a relation alteration is made that affects the parallel safety, such as adding a domain column and CHECK constraint.  In case of a partitioned relation, the parallel safety attributes of all its descendant relations are merged.  For example, if a partition becomes parallel-unsafe, the ascendant partitioned tables also become parallel-unsafe.



But... developing such code would be burdonsome and bug-prone?
--------------------------------------------------


> But maybe we don't have to enforce it exactly.  It could be on the
> user's head that the marking is accurate.  We could prevent any
> really bad misbehavior by having parallel workers error out if they
> see they've been asked to execute a non-parallel-safe function.

I'm wondering if we can do so as I mentioned yesterday; the parallel worker delegates the work to the parallel leader when the target relation or related functions is not parallel-safe.


        Regards
Takayuki Tsunakawa
                                               



12