ModifyTable overheads in generic plans

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

ModifyTable overheads in generic plans

Amit Langote
Hi,

I would like to discuss a refactoring patch that builds on top of the
patches at [1] to address $subject.  To get an idea for what
eliminating these overheads looks like, take a look at the following
benchmarking results.

Note 1: I've forced the use of generic plan by setting plan_cache_mode
to 'force_generic_plan'

Note 2: The individual TPS figures as measured are quite noisy, though
I just want to show the rough trend with increasing number of
partitions.

pgbench -i -s 10 --partitions={0, 10, 100, 1000}
pgbench -T120 -f test.sql -M prepared

test.sql:
\set aid random(1, 1000000)
update pgbench_accounts set abalance = abalance + 1 where aid = :aid;

Without any of the patches:

0       tps = 13045.485121 (excluding connections establishing)
10      tps = 9358.157433 (excluding connections establishing)
100     tps = 1878.274500 (excluding connections establishing)
1000    tps = 84.684695 (excluding connections establishing)

The slowdown as the partition count increases can be explained by the
fact that UPDATE and DELETE can't currently use runtime partition
pruning.  So, even if any given transaction is only updating a single
tuple in a single partition, the plans for *all* partitions are being
initialized and also the ResultRelInfos.  That is, a lot of useless
work being done in InitPlan() and ExecInitModifyTable().

With the patches at [1] (latest 0001+0002 posted there), whereby the
generic plan for UPDATE can now perform runtime pruning, numbers can
be seen to improve, slightly:

0       tps = 12743.487196 (excluding connections establishing)
10      tps = 12644.240748 (excluding connections establishing)
100     tps = 4158.123345 (excluding connections establishing)
1000    tps = 391.248067 (excluding connections establishing)

So even though runtime pruning enabled by those patches ensures that
the useless plans are left untouched by the executor, the
ResultRelInfos are still being made assuming *all* result relations
will be processed.  With the attached patches (0001+0002+0003) that I
want to discuss here in this thread, numbers are further improved:

0       tps = 13419.283168 (excluding connections establishing)
10      tps = 12588.016095 (excluding connections establishing)
100     tps = 8560.824225 (excluding connections establishing)
1000    tps = 1926.553901 (excluding connections establishing)

0001 and 0002 are preparatory patches.  0003 teaches nodeModifyTable.c
to make the ResultRelInfo for a given result relation lazily, that is,
when the plan producing tuples to be updated/deleted actually produces
one that belongs to that relation.  So, if a transaction only updates
one tuple, then only one ResultRelInfo would be made.  For larger
partition counts, that saves significant amount of work.

However, there's one new loop in ExecInitModifyTable() added by the
patches at [1] that loops over all partitions, which I haven't been
able to eliminate so far and I'm seeing it cause significant
bottleneck at higher partition counts.  The loop is meant to create a
hash table that maps result relation OIDs to their offsets in the
PlannedStmt.resultRelations list.  We need this mapping, because the
ResultRelInfos are accessed from the query-global array using that
offset.  One approach that was mentioned by David Rowley at [1] to not
have do this mapping is to make the result relation's scan node's
targetlist emit the relation's RT index or ordinal position to begin
with, instead of the table OID, but I haven't figured out a way to do
that.

Having taken care of the ModifyTable overheads (except the one
mentioned in the last paragraph), a few more bottlenecks are seen to
pop up at higher partition counts.  Basically, they result from doing
some pre-execution actions on relations contained in the plan by
traversing the flat range table in whole.

1. AcquireExecutorLocks(): locks *all* partitions before executing the
plan tree but runtime pruning allows to skip scanning all but one

2. ExecCheckRTPerms(): checks permissions of *all* partitions before
executing the plan tree, but maybe it's okay to check only the ones
that will be accessed

Problem 1 has been discussed before and David Rowley even developed a
patch that was discussed at [2].  The approach taken in the patch was
to delay locking of the partitions contained in a generic plan that
are potentially runtime pruneable, although as also described in the
linked thread, that approach has a race condition whereby a concurrent
session may invalidate the generic plan by altering a partition in the
window between when AcquireExecutorLocks() runs on the plan and the
plan is executed.

Another solution suggested to me by Robert Haas in an off-list
discussion is to teach AcquireExecutorLocks() or the nearby code to
perform EXTERN parameter based pruning before passing the plan tree to
the executor and lock partitions that survive that pruning.  It's
perhaps doable if we refactor the ExecFindInitialMatchingSubPlans() to
not require a full-blown execution context.  Or maybe we could do
something more invasive by rewriting AcquireExecutorLocks() to walk
the plan tree instead of the flat range table, looking for scan nodes
and nodes that support runtime pruning to lock the appropriate
relations.

Regarding problem 2, I wonder if we shouldn't simply move the
permission check to ExecGetRangeTableRelation(), which will be
performed the first time a given relation is accessed during
execution.

Anyway, applying David's aforementioned patch gives the following numbers:

0       tps = 12325.890487 (excluding connections establishing)
10      tps = 12146.420443 (excluding connections establishing)
100     tps = 12807.850709 (excluding connections establishing)
1000    tps = 7578.652893 (excluding connections establishing)

which suggests that it might be worthwhile try to find a solution for this.

Finally, there's one more place that shows up in perf profiles at
higher partition counts and that is LockReleaseAll().  David,
Tsunakawa-san had worked on a patch [3], which still applies and can
be shown to be quite beneficial when generic plans are involved.  I
couldn't get it to show major improvement over the above numbers in
this case (for UPDATE that is), but maybe that's because the loop in
ExecInitModifyTable() mentioned above is still in the way.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

[1] https://commitfest.postgresql.org/28/2575/
[2] https://www.postgresql.org/message-id/flat/CAKJS1f_kfRQ3ZpjQyHC7%3DPK9vrhxiHBQFZ%2Bhc0JCwwnRKkF3hg%40mail.gmail.com
[3] https://www.postgresql.org/message-id/flat/CAKJS1f-7T9F1xLw5PqgOApcV6YX3WYC4XJHHCpxh8hzcZsA-xA%40mail.gmail.com#c57f2f592484bca76310f4c551d4de15

v1-0002-Do-not-set-rootResultRelIndex-unnecessarily.patch (1K) Download Attachment
v1-0001-Revise-how-some-FDW-executor-APIs-obtain-ResultRe.patch (14K) Download Attachment
v1-0003-Delay-initializing-UPDATE-DELETE-ResultRelInfos.patch (91K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: ModifyTable overheads in generic plans

Amit Langote
On Fri, Jun 26, 2020 at 9:36 PM Amit Langote <[hidden email]> wrote:
> I would like to discuss a refactoring patch that builds on top of the
> patches at [1] to address $subject.

I've added this to the next CF: https://commitfest.postgresql.org/28/2621/

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: ModifyTable overheads in generic plans

David Rowley
In reply to this post by Amit Langote
On Sat, 27 Jun 2020 at 00:36, Amit Langote <[hidden email]> wrote:
> 2. ExecCheckRTPerms(): checks permissions of *all* partitions before
> executing the plan tree, but maybe it's okay to check only the ones
> that will be accessed

I don't think it needs to be quite as complex as that.
expand_single_inheritance_child will set the
RangeTblEntry.requiredPerms to 0, so we never need to check
permissions on a partition.  The overhead of permission checking when
there are many partitions is just down to the fact that
ExecCheckRTPerms() loops over the entire rangetable and calls
ExecCheckRTEPerms for each one.  ExecCheckRTEPerms() does have very
little work to do when requiredPerms is 0, but the loop itself and the
function call overhead show up when you remove the other bottlenecks.

I have a patch somewhere that just had the planner add the RTindexes
with a non-zero requiredPerms and set that in the plan so that
ExecCheckRTPerms could just look at the ones that actually needed
something checked.   There's a slight disadvantage there that for
queries to non-partitioned tables that we need to build a Bitmapset
that has all items from the rangetable.  That's likely a small
overhead, but not free, so perhaps there is a better way.

David


Reply | Threaded
Open this post in threaded view
|

Re: ModifyTable overheads in generic plans

Amit Langote
On Mon, Jun 29, 2020 at 10:39 AM David Rowley <[hidden email]> wrote:

>
> On Sat, 27 Jun 2020 at 00:36, Amit Langote <[hidden email]> wrote:
> > 2. ExecCheckRTPerms(): checks permissions of *all* partitions before
> > executing the plan tree, but maybe it's okay to check only the ones
> > that will be accessed
>
> I don't think it needs to be quite as complex as that.
> expand_single_inheritance_child will set the
> RangeTblEntry.requiredPerms to 0, so we never need to check
> permissions on a partition.  The overhead of permission checking when
> there are many partitions is just down to the fact that
> ExecCheckRTPerms() loops over the entire rangetable and calls
> ExecCheckRTEPerms for each one.  ExecCheckRTEPerms() does have very
> little work to do when requiredPerms is 0, but the loop itself and the
> function call overhead show up when you remove the other bottlenecks.

I had forgotten that we set requiredPerms to 0 for the inheritance child tables.

> I have a patch somewhere that just had the planner add the RTindexes
> with a non-zero requiredPerms and set that in the plan so that
> ExecCheckRTPerms could just look at the ones that actually needed
> something checked.   There's a slight disadvantage there that for
> queries to non-partitioned tables that we need to build a Bitmapset
> that has all items from the rangetable.  That's likely a small
> overhead, but not free, so perhaps there is a better way.

I can't think of anything for this that doesn't involve having one
more list of RTEs or bitmapset of RT indexes in PlannedStmt.



--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: ModifyTable overheads in generic plans

Amit Langote
In reply to this post by Amit Langote
On Fri, Jun 26, 2020 at 9:36 PM Amit Langote <[hidden email]> wrote:
> I would like to discuss a refactoring patch that builds on top of the
> patches at [1] to address $subject.

I forgot to update a place in postgres_fdw causing one of its tests to crash.

Fixed in the attached updated version.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

v2-0003-Delay-initializing-UPDATE-DELETE-ResultRelInfos.patch (91K) Download Attachment
v2-0002-Do-not-set-rootResultRelIndex-unnecessarily.patch (1K) Download Attachment
v2-0001-Revise-how-some-FDW-executor-APIs-obtain-ResultRe.patch (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: ModifyTable overheads in generic plans

Daniel Gustafsson
> On 1 Jul 2020, at 08:30, Amit Langote <[hidden email]> wrote:
>
> On Fri, Jun 26, 2020 at 9:36 PM Amit Langote <[hidden email]> wrote:
>> I would like to discuss a refactoring patch that builds on top of the
>> patches at [1] to address $subject.
>
> I forgot to update a place in postgres_fdw causing one of its tests to crash.
>
> Fixed in the attached updated version.

The attached 0003 fails to apply to current HEAD, please submit another rebased
version.  Marking the entry as Waiting on Author in the meantime.

cheers ./daniel

Reply | Threaded
Open this post in threaded view
|

Re: ModifyTable overheads in generic plans

Amit Langote
Hi Daniel,

On Wed, Jul 1, 2020 at 6:50 PM Daniel Gustafsson <[hidden email]> wrote:

> > On 1 Jul 2020, at 08:30, Amit Langote <[hidden email]> wrote:
> >
> > On Fri, Jun 26, 2020 at 9:36 PM Amit Langote <[hidden email]> wrote:
> >> I would like to discuss a refactoring patch that builds on top of the
> >> patches at [1] to address $subject.
> >
> > I forgot to update a place in postgres_fdw causing one of its tests to crash.
> >
> > Fixed in the attached updated version.
>
> The attached 0003 fails to apply to current HEAD, please submit another rebased
> version.  Marking the entry as Waiting on Author in the meantime.
Thank you for the heads up.

Actually, as I noted in the first email, the patches here are to be
applied on top of patches of another thread that I chose not to post
here.  But I can see how that is inconvenient both for the CF bot and
other humans, so I'm attaching all of the patches.

Another thing I could do is decouple the patches to discuss here from
the patches of the other thread, which should be possible and might be
good to avoid back and forth between the two threads.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

v2-0005-Delay-initializing-UPDATE-DELETE-ResultRelInfos.patch (91K) Download Attachment
v2-0003-Revise-how-some-FDW-executor-APIs-obtain-ResultRe.patch (14K) Download Attachment
v2-0004-Do-not-set-rootResultRelIndex-unnecessarily.patch (1K) Download Attachment
v2-0001-Overhaul-UPDATE-s-targetlist-processing.patch (101K) Download Attachment
v2-0002-Overhaul-how-inherited-update-delete-are-handled.patch (254K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: ModifyTable overheads in generic plans

Daniel Gustafsson
> On 1 Jul 2020, at 15:38, Amit Langote <[hidden email]> wrote:

> Another thing I could do is decouple the patches to discuss here from
> the patches of the other thread, which should be possible and might be
> good to avoid back and forth between the two threads.

It sounds like it would make it easier for reviewers, so if it's possible with
a reasonable effort it might be worth it.  I've moved this entry to the next CF
for now.

cheers ./daniel

Reply | Threaded
Open this post in threaded view
|

Re: ModifyTable overheads in generic plans

Robert Haas
In reply to this post by Amit Langote
On Fri, Jun 26, 2020 at 8:36 AM Amit Langote <[hidden email]> wrote:
> 0001 and 0002 are preparatory patches.

I read through these patches a bit but it's really unclear what the
point of them is. I think they need better commit messages, or better
comments, or both.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: ModifyTable overheads in generic plans

Amit Langote
On Sat, Aug 1, 2020 at 4:46 AM Robert Haas <[hidden email]> wrote:
> On Fri, Jun 26, 2020 at 8:36 AM Amit Langote <[hidden email]> wrote:
> > 0001 and 0002 are preparatory patches.
>
> I read through these patches a bit but it's really unclear what the
> point of them is. I think they need better commit messages, or better
> comments, or both.

Thanks for taking a look.  Sorry about the lack of good commentary,
which I have tried to address in the attached updated version. I
extracted one more part as preparatory from the earlier 0003 patch, so
there are 4 patches now.

Also as discussed with Daniel, I have changed the patches so that they
can be applied on plain HEAD instead of having to first apply the
patches at [1].  Without runtime pruning for UPDATE/DELETE proposed in
[1], optimizing ResultRelInfo creation by itself does not improve the
performance/scalability by that much, but the benefit of lazily
creating ResultRelInfos seems clear so I think maybe it's okay to
pursue this independently.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CA%2BHiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ%40mail.gmail.com

v3-0003-Revise-child-to-root-tuple-conversion-map-managem.patch (28K) Download Attachment
v3-0004-Initialize-result-relation-information-lazily.patch (80K) Download Attachment
v3-0001-Revise-how-FDWs-obtain-result-relation-informatio.patch (16K) Download Attachment
v3-0002-Don-t-make-root-ResultRelInfo-for-insert-queries.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: ModifyTable overheads in generic plans

Amit Langote
On Tue, Aug 4, 2020 at 3:15 PM Amit Langote <[hidden email]> wrote:

> On Sat, Aug 1, 2020 at 4:46 AM Robert Haas <[hidden email]> wrote:
> > On Fri, Jun 26, 2020 at 8:36 AM Amit Langote <[hidden email]> wrote:
> > > 0001 and 0002 are preparatory patches.
> >
> > I read through these patches a bit but it's really unclear what the
> > point of them is. I think they need better commit messages, or better
> > comments, or both.
>
> Thanks for taking a look.  Sorry about the lack of good commentary,
> which I have tried to address in the attached updated version. I
> extracted one more part as preparatory from the earlier 0003 patch, so
> there are 4 patches now.
>
> Also as discussed with Daniel, I have changed the patches so that they
> can be applied on plain HEAD instead of having to first apply the
> patches at [1].  Without runtime pruning for UPDATE/DELETE proposed in
> [1], optimizing ResultRelInfo creation by itself does not improve the
> performance/scalability by that much, but the benefit of lazily
> creating ResultRelInfos seems clear so I think maybe it's okay to
> pursue this independently.
Per cfbot's automatic patch tester, there were some issues in the 0004 patch:

nodeModifyTable.c: In function ‘ExecModifyTable’:
1529nodeModifyTable.c:2484:24: error: ‘junkfilter’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
1530              junkfilter->jf_junkAttNo,
1531                        ^
1532nodeModifyTable.c:2309:14: note: ‘junkfilter’ was declared here
1533  JunkFilter *junkfilter;
1534              ^
1535cc1: all warnings being treated as errors
1536<builtin>: recipe for target 'nodeModifyTable.o' failed
1537make[3]: *** [nodeModifyTable.o] Error 1

Fixed in the attached updated version

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

v4-0003-Revise-child-to-root-tuple-conversion-map-managem.patch (28K) Download Attachment
v4-0002-Don-t-make-root-ResultRelInfo-for-insert-queries.patch (10K) Download Attachment
v4-0004-Initialize-result-relation-information-lazily.patch (80K) Download Attachment
v4-0001-Revise-how-FDWs-obtain-result-relation-informatio.patch (16K) Download Attachment