speeding up planning with partitions

classic Classic list List threaded Threaded
201 messages Options
123456 ... 11
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
David, Imai-san,

Thanks for reviewing.  I've included both replies in this email so that I
can attach the latest patch as well.

On 2018/11/10 20:59, David Rowley wrote:

> I've started looking at these two, but only so far made it through
> 0001 and 0002.
>
> Here's what I noted down during the review.
>
> 0001:
>
> 1. Why do we need the new field that this patch adds? I see in 0002
> it's used like:
>
> + if (childrel->inh_root_parent > 0 &&
> + childrel->inh_root_parent == root->parse->resultRelation)
>
> Would something like...
> int relid;
> if (childrel->part_schema == NULL &&
> bms_get_singleton_member(childrel->top_parent_relids, &relid) &&
> relid == root->parse->resultRelation)
>
> ...not do the trick?
Actually, that's one way and earlier patches relied on that, but it gets a
bit ugly given that it's not always the top_parent_relid we're looking for
in the partitioning/inheritance specific code.  Consider this:

select *
from (select a from parted_table p union all
      select a from inherited_table i) s
where s.a = 1;

top_parent_relids refers to the union all parent subquery 's RT index,
which makes the partitioning/inheritance code scramble through a chain of
parent relation relids to figure out the RT index of the table in the query.

So, inh_root_parent is useful to distinguish the inheritance root parent
from the Append relation root parent, without much code.

That said, in the latest version, I've modified the UPDATE/DELETE planning
patch such that it doesn't use inh_root_parent (or the new code doesn't
need to refer to root parent where it previously did), so I've moved the
patch that adds inh_root_parent to the 2nd in the series.

> 0002:
>
> 2. What's wrong with childrel->relids?
>
> + child_relids = bms_make_singleton(appinfo->child_relid);

I've modified the code to not have to use child_relids.

> 3. Why not use childrel->top_parent_relids?
>
> + top_parent_relids = bms_make_singleton(childrel->inh_root_parent);

Ditto.

> 4. The following comment in inheritance_make_rel_from_joinlist()
> implies that the function will not be called for SELECT, but the
> comment above the function does not mention that.
>
> /*
> * For UPDATE/DELETE queries, the top parent can only ever be a table.
> * As a contrast, it could be a UNION ALL subquery in the case of SELECT.
> */
> Assert(planner_rt_fetch(top_parent_relid, root)->rtekind == RTE_RELATION);

Fixed the function's header comment to be clear.

> 5. Should the following comment talk about "Sub-partitioned tables"
> rather than "sub-partitions"?
>
> + * Sub-partitions have to be processed recursively, because
> + * AppendRelInfos link sub-partitions to their immediate parents, not
> + * the root partitioned table.

Okay, done.

> 6. Can't you just pass childrel->relids and
> childrel->top_parent_relids instead of making new ones?
>
> + child_relids = bms_make_singleton(appinfo->child_relid);
> + Assert(childrel->inh_root_parent > 0);
> + top_parent_relids = bms_make_singleton(childrel->inh_root_parent);
> + translated_joinlist = (List *)
> + adjust_appendrel_attrs_multilevel(root,
> +   (Node *) joinlist,
> +   child_relids,
> +   top_parent_relids);
The new code uses adjust_appendrel_attrs, so those Relids variables are
not needed.

> 7. I'm just wondering what your idea is here?
>
> + /* Reset join planning specific data structures. */
> + root->join_rel_list = NIL;
> + root->join_rel_hash = NULL;
>
> Is there a reason to nullify these? You're not releasing any memory
> and the new structures that will be built won't overlap with the ones
> built last round. I don't mean to imply that the code is wrong, it
> just does not appear to be particularly right.
In initial versions of the patch, the same top-level PlannerInfo was used
when calling make_rel_from_joinlist for all child tables.  So,
join_rel_hash built for a given child would be assumed to be valid for the
next which isn't true, because joinrels would be different across children.

In the latest patch, inheritance_make_rel_from_joinlist uses different
PlannerInfos for different child target relations, so the above problem
doesn't exist.  IOW, you won't see above two lines in the latest patch.

> 8. In regards to:
>
> + * NB: Do we need to change the child EC members to be marked
> + * as non-child somehow?
> + */
> + childrel->reloptkind = RELOPT_BASEREL;
>
> I know we talked a bit about this before, but this time I put together
> a crude patch that runs some code each time we skip an em_is_child ==
> true EquivalenceMember. The code checks if any of the em_relids are
> RELOPT_BASEREL. What I'm looking for here are places where we
> erroneously skip the member when we shouldn't.  Running the regression
> tests with this patch in place shows a number of problems.  Likely I
> should only trigger the warning when bms_membership(em->em_relids) ==
> BMS_SINGLETON, but it never-the-less appears to highlight various
> possible issues. Applying the same on master only appears to show the
> cases where em->em_relids isn't a singleton set.  I've attached the
> patch to let you see what I mean.
Thanks for this.  I've been thinking about what to do about it, but
haven't decided what's that yet.  Please let me spend some more time
thinking on it.  AFAICT, dealing with this will ensure that join planning
against target child relations can use EC-based optimizations, but it's
not incorrect as is per se.

On 2018/11/12 13:35, Imai, Yoshikazu wrote:
> adjust_appendrel_attrs_multilevel for leaf1: root -> sub1 -> leaf1
> adjust_appendrel_attrs_multilevel for leaf2: root -> sub1 -> leaf2

Ah, I see what you mean.

The root -> sub1 translation will be repeated for each leaf partition if
done via adjust_appendrel_attrs_multilevel.  On the other hand, if we
could do the root to sub1 translation once and pass it to the recursive
call using sub1 as the parent.

I've changed the patch use adjust_appendrel_attrs.

> Since it is difficult to explain my thoughts with words, I will show the
> performance degration case.
>
> Partition tables are below two sets.

[ ... ]

> Create a generic plan of updation or deletion.
>
> [create a delete generic plan]
> set plan_cache_mode = 'force_generic_plan';
> prepare delete_stmt(int) as delete from rt where b = $1;
> execute delete_stmt(1);

[ ... ]

> How amount of memory is used with above tests is...
>
> without v5 patches, Set1: 242MB
> without v5 patches, Set2: 247MB
> with v5 patches, Set1: 420MB
> with v5 patches, Set2: 820MB

Although I didn't aim to fix planning for the generic plan case where no
pruning occurs, the above result is not acceptable.  That is, the new
implementation of inheritance update/delete planning shouldn't consume
more memory than the previous.  In fact, it should've consumed less,
because the patch claims that it gets rid of redundant processing per
partition.

I understood why update/delete planning consumed more memory with the
patch.  It was due to a problem with the patch that modifies inheritance
update/delete planning.  The exact problem was that the query tree would
be translated (hence copied) *twice* for every partition!  First during
query planning where the query tree would be translated to figure out a
targetlist for partitions and then again before calling grouping_planner.
Also, the adjust_appendrel_attrs_multilevel made it worse for multi-level
partitioning case, because of repeated copying for root to intermediate
partitioned tables, as Imai-san pointed out.

I've fixed that making sure that query tree is translated only once and
saved for later steps to use.  Imai-san, please check the memory
consumption with the latest patch.

Attached updated patches.  Significant revisions are as follows (note that
I reversed the order of 0001 and 0002):

v6-0001-Overhaul-inheritance-update-delete-planning.patch

The major changes is fixing the problem above.

v6-0002-Store-inheritance-root-parent-index-in-otherrel-s.patch

No change.

v6-0003-Lazy-creation-of-RTEs-for-inheritance-children.patch

Made inheritance expansion a separate step of make_one_rel whereas
previously it would be invoked at the beginning of set_append_rel_size.
Now, it runs just before set_base_rel_sizes.  The same step also
recursively expands (and performs pruning for) any child partitioned
tables that were added by the expansion of partitioned tables originally
mentioned in the query.  With this change, we don't need to worry about
the range table changing as set_base_rel_size is executing, which could
lead to problems.

v6-0004-Move-append-expansion-code-into-its-own-file.patch
v6-0005-Teach-planner-to-only-process-unpruned-partitions.patch
v6-0006-Do-not-lock-all-partitions-at-the-beginning.patch

No change.

Thanks,
Amit

v6-0001-Overhaul-inheritance-update-delete-planning.patch (70K) Download Attachment
v6-0002-Store-inheritance-root-parent-index-in-otherrel-s.patch (3K) Download Attachment
v6-0003-Lazy-creation-of-RTEs-for-inheritance-children.patch (117K) Download Attachment
v6-0004-Move-append-expansion-code-into-its-own-file.patch (153K) Download Attachment
v6-0005-Teach-planner-to-only-process-unpruned-partitions.patch (9K) Download Attachment
v6-0006-Do-not-lock-all-partitions-at-the-beginning.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
On 2018/11/14 19:28, Amit Langote wrote:

> Attached updated patches.  Significant revisions are as follows (note that
> I reversed the order of 0001 and 0002):
>
> v6-0001-Overhaul-inheritance-update-delete-planning.patch
>
> The major changes is fixing the problem above.
>
> v6-0002-Store-inheritance-root-parent-index-in-otherrel-s.patch
>
> No change.
>
> v6-0003-Lazy-creation-of-RTEs-for-inheritance-children.patch
>
> Made inheritance expansion a separate step of make_one_rel whereas
> previously it would be invoked at the beginning of set_append_rel_size.
> Now, it runs just before set_base_rel_sizes.  The same step also
> recursively expands (and performs pruning for) any child partitioned
> tables that were added by the expansion of partitioned tables originally
> mentioned in the query.  With this change, we don't need to worry about
> the range table changing as set_base_rel_size is executing, which could
> lead to problems.
>
> v6-0004-Move-append-expansion-code-into-its-own-file.patch
> v6-0005-Teach-planner-to-only-process-unpruned-partitions.patch
> v6-0006-Do-not-lock-all-partitions-at-the-beginning.patch
This went out sooner than it should have.  I hadn't waited for make
check-world to finish which showed that a file_fdw test exercising
inheritance crashed with 0001 patch due to a bogus Assert.

Fixed it in the attached version.

Thanks,
Amit

v7-0001-Overhaul-inheritance-update-delete-planning.patch (70K) Download Attachment
v7-0002-Store-inheritance-root-parent-index-in-otherrel-s.patch (3K) Download Attachment
v7-0003-Lazy-creation-of-RTEs-for-inheritance-children.patch (117K) Download Attachment
v7-0004-Move-append-expansion-code-into-its-own-file.patch (153K) Download Attachment
v7-0005-Teach-planner-to-only-process-unpruned-partitions.patch (9K) Download Attachment
v7-0006-Do-not-lock-all-partitions-at-the-beginning.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: speeding up planning with partitions

Imai, Yoshikazu
Hi Amit,

On Tue, Nov 13, 2018 at 10:29 PM, Amit Langote wrote:

> On 2018/11/12 13:35, Imai, Yoshikazu wrote:
> > adjust_appendrel_attrs_multilevel for leaf1: root -> sub1 -> leaf1
> > adjust_appendrel_attrs_multilevel for leaf2: root -> sub1 -> leaf2
>
> Ah, I see what you mean.
>
> The root -> sub1 translation will be repeated for each leaf partition
> if done via adjust_appendrel_attrs_multilevel.  On the other hand, if
> we could do the root to sub1 translation once and pass it to the recursive
> call using sub1 as the parent.
>
> I've changed the patch use adjust_appendrel_attrs.
>
> > Since it is difficult to explain my thoughts with words, I will show
> > the performance degration case.
> >
> > Partition tables are below two sets.
>
> [ ... ]
>
> > Create a generic plan of updation or deletion.
> >
> > [create a delete generic plan]
> > set plan_cache_mode = 'force_generic_plan'; prepare delete_stmt(int)
> > as delete from rt where b = $1; execute delete_stmt(1);
>
> [ ... ]
>
> > How amount of memory is used with above tests is...
> >
> > without v5 patches, Set1: 242MB
> > without v5 patches, Set2: 247MB
> > with v5 patches, Set1: 420MB
> > with v5 patches, Set2: 820MB
>
> Although I didn't aim to fix planning for the generic plan case where
> no pruning occurs, the above result is not acceptable.  That is, the new
> implementation of inheritance update/delete planning shouldn't consume
> more memory than the previous.  In fact, it should've consumed less,
> because the patch claims that it gets rid of redundant processing per
> partition.
>
> I understood why update/delete planning consumed more memory with the
> patch.  It was due to a problem with the patch that modifies inheritance
> update/delete planning.  The exact problem was that the query tree would
> be translated (hence copied) *twice* for every partition!  First during
> query planning where the query tree would be translated to figure out
> a targetlist for partitions and then again before calling
> grouping_planner.
> Also, the adjust_appendrel_attrs_multilevel made it worse for
> multi-level partitioning case, because of repeated copying for root to
> intermediate partitioned tables, as Imai-san pointed out.
>
> I've fixed that making sure that query tree is translated only once and
> saved for later steps to use.  Imai-san, please check the memory
> consumption with the latest patch.

Thanks for fixing!
Now, memory consumption is lower than the previous.

with v7 patches, Set1: 223MB
with v7 patches, Set2: 226MB

Thanks,

--
Yoshikazu Imai
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
On 2018/11/15 10:19, Imai, Yoshikazu wrote:

> On Tue, Nov 13, 2018 at 10:29 PM, Amit Langote wrote:
>> On 2018/11/12 13:35, Imai, Yoshikazu wrote:
>>> How amount of memory is used with above tests is...
>>>
>>> without v5 patches, Set1: 242MB
>>> without v5 patches, Set2: 247MB
>>> with v5 patches, Set1: 420MB
>>> with v5 patches, Set2: 820MB
>>
>> I understood why update/delete planning consumed more memory with the
>> patch.  It was due to a problem with the patch that modifies inheritance
>> update/delete planning.  The exact problem was that the query tree would
>> be translated (hence copied) *twice* for every partition!  First during
>> query planning where the query tree would be translated to figure out
>> a targetlist for partitions and then again before calling
>> grouping_planner.
>> Also, the adjust_appendrel_attrs_multilevel made it worse for
>> multi-level partitioning case, because of repeated copying for root to
>> intermediate partitioned tables, as Imai-san pointed out.
>>
>> I've fixed that making sure that query tree is translated only once and
>> saved for later steps to use.  Imai-san, please check the memory
>> consumption with the latest patch.
>
> Thanks for fixing!
> Now, memory consumption is lower than the previous.
>
> with v7 patches, Set1: 223MB
> with v7 patches, Set2: 226MB

Thanks for checking.  So at least we no longer have any memory
over-allocation bug with the patch, but perhaps other bugs are still
lurking. :)

Regards,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
In reply to this post by Amit Langote-2
On 2018/11/14 19:28, Amit Langote wrote:

> On 2018/11/10 20:59, David Rowley wrote:
>> 8. In regards to:
>>
>> + * NB: Do we need to change the child EC members to be marked
>> + * as non-child somehow?
>> + */
>> + childrel->reloptkind = RELOPT_BASEREL;
>>
>> I know we talked a bit about this before, but this time I put together
>> a crude patch that runs some code each time we skip an em_is_child ==
>> true EquivalenceMember. The code checks if any of the em_relids are
>> RELOPT_BASEREL. What I'm looking for here are places where we
>> erroneously skip the member when we shouldn't.  Running the regression
>> tests with this patch in place shows a number of problems.  Likely I
>> should only trigger the warning when bms_membership(em->em_relids) ==
>> BMS_SINGLETON, but it never-the-less appears to highlight various
>> possible issues. Applying the same on master only appears to show the
>> cases where em->em_relids isn't a singleton set.  I've attached the
>> patch to let you see what I mean.
>
> Thanks for this.  I've been thinking about what to do about it, but
> haven't decided what's that yet.  Please let me spend some more time
> thinking on it.  AFAICT, dealing with this will ensure that join planning
> against target child relations can use EC-based optimizations, but it's
> not incorrect as is per se.
I've been considered this a bit more and have some observations to share.
I found that the new inheritance_planner causes regression when the query
involves equivalence classes referencing the target relation, such as in
the following example:

create table ht (a int, b int) partition by hash (a);
create table ht1 partition of ht for values with (modulus 1024, remainder 0);
...
create table ht1024 partition of ht for values with (modulus 1024,
remainder 1023);
create table foo (a int, b int);
update ht set a = foo.a from foo where ht.b = foo.b;

For the above query, an EC containing ht.b and foo.b would be built.  With
the new approach this EC will need to be expanded to add em_is_child EC
members for all un-pruned child tables, whereas with the previous approach
there would be no child members because the EC would be built for the
child as the query's target relation to begin with.  So, with the old
approach there will be {ht1.b, foo.b} for query with ht1 as target
relation, {ht2.b, foo.b} for query with ht2 as target relation and so on.
Whereas with the new approach there will be just one query_planner run and
resulting EC will be {foo.b, ht.b, ht1.b, ht2.b, ...}.  So, the planning
steps that manipulate ECs now have to iterate through many members and
become a bottleneck if there are many un-pruned children.  To my surprise,
those bottlenecks are worse than having to rerun query_planner for each
child table.

So with master, I get the following planning time for the above update
query which btw doesn't prune (3 repeated runs)

Planning Time: 688.830 ms
Planning Time: 690.950 ms
Planning Time: 704.702 ms

And with the previous v7 patch:

Planning Time: 1373.398 ms
Planning Time: 1360.685 ms
Planning Time: 1356.313 ms

I've fixed that in the attached by utilizing the fact that now we build
the child PlannerInfo before we add child EC members.  By modifying
add_child_rel_equivalences such that it *replaces* the parent EC member
with the corresponding child member instead of appending it to the list,
if the child is the target relation.  That happens inside the child
target's private PlannerInfo, so it's harmless.  Also, it is no longer
marked as em_is_child=true, so as a whole, this more or less restores the
original behavior wrt to ECs (also proved by the fact that I now get the
same regression.diffs by applying David's verify_em_child.diff patch [1]
to the patched tree as with applying it to master modulo the varno
differences due to the patched).

With the attached updated patch (again this is with 0 partitions pruned):

Planning Time: 332.503 ms
Planning Time: 334.003 ms
Planning Time: 334.212 ms

If I add an additional condition so that only 1 partition is joined with
foo and the rest pruned, such as the following query (the case we're
trying to optimize):

update ht set a = foo.a from foo where foo.a = ht.b and foo.a = 1

I get the following numbers with master (no change from 0 pruned case):

Planning Time: 727.473 ms
Planning Time: 726.145 ms
Planning Time: 734.458 ms

But with the patches:

Planning Time: 0.797 ms
Planning Time: 0.751 ms
Planning Time: 0.801 ms

Attached v8 patches.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAKJS1f8g9_BzE678BLBm-eoMMEYUUXhDABSpqtAHRUUTrm_vFA%40mail.gmail.com

v8-0001-Overhaul-inheritance-update-delete-planning.patch (77K) Download Attachment
v8-0002-Store-inheritance-root-parent-index-in-otherrel-s.patch (3K) Download Attachment
v8-0003-Lazy-creation-of-RTEs-for-inheritance-children.patch (116K) Download Attachment
v8-0004-Move-append-expansion-code-into-its-own-file.patch (153K) Download Attachment
v8-0005-Teach-planner-to-only-process-unpruned-partitions.patch (9K) Download Attachment
v8-0006-Do-not-lock-all-partitions-at-the-beginning.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: speeding up planning with partitions

Imai, Yoshikazu
Hi Amit,

On Tue, Nov 20, 2018 at 10:24 PM, Amit Langote wrote:
> Attached v8 patches.

Thanks for the patch. I took a look 0003, 0005, 0006 of v8 patch.

1.
0003: line 267-268
+ * Child relation may have be marked dummy if build_append_child_rel
+ * found self-contradictory quals.

/s/have be marked/have been marked/

2.
0003: around line 1077
In append.c(or prepunion.c)
228      * Check that there's at least one descendant, else treat as no-child
229      * case.  This could happen despite above has_subclass() check, if table
230      * once had a child but no longer does.

has_subclass() check is moved to subquery_planner from above this code,
so the comments need to be modified like below.

s/above has_subclass() check/has_subclass() check in subquery_planner/

3.
0003: line 1241-1244
0006: line ?

In comments of expand_partitioned_rtentry:
+ * Note: even though only the unpruned partitions will be added to the
+ * resulting plan, this still locks *all* partitions via find_all_inheritors
+ * in order to avoid partitions being locked in a different order than other
+ * places in the backend that may lock partitions.

This comments is no more needed if 0006 patch is applied because
find_all_inheritors is removed in the 0006 patch.

4.
0003: line 1541-1544

+ * Add the RelOptInfo.  Even though we may not really scan this relation
+ * for reasons such as contradictory quals, we still need need to create
+ * one, because for every RTE in the query's range table, there must be an
+ * accompanying RelOptInfo.

s/need need/need/

5.
0003: line 1620-1621

+ * After creating the RelOptInfo for the given child RT index, it goes on to
+ * initialize some of its fields base on the parent RelOptInfo.

s/fields base on/fields based on/

6.
parsenodes.h
 906  *    inh is true for relation references that should be expanded to include
 907  *    inheritance children, if the rel has any.  This *must* be false for
 908  *    RTEs other than RTE_RELATION entries.

I think inh can become true now even if RTEKind equals RTE_SUBQUERY, so latter
sentence need to be modified.

7.
0005: line 109-115
+ /*
+ * If partition is excluded by constraints, remove it from
+ * live_parts, too.
+ */
+ if (IS_DUMMY_REL(childrel))
+ parentrel->live_parts = bms_del_member(parentrel->live_parts, i);
+

When I read this comment, I imagined that relation_excluded_by_constraints()
would be called before this code. childrel is marked dummy if
build_append_child_rel found self-contradictory quals, so comments can be
modified more correctly like another comments in your patch as below.

In 0003: line 267-271
+ * Child relation may have be marked dummy if build_append_child_rel
+ * found self-contradictory quals.
+ */
+ if (IS_DUMMY_REL(childrel))
+ continue;

8.
0003: line 705-711
+ * Query planning may have added some columns to the top-level tlist,
+ * which happens when there are row marks applied to inheritance
+ * parent relations (additional junk columns needed for applying row
+ * marks are added after expanding inheritance.)
+ */
+ if (list_length(tlist) < list_length(root->processed_tlist))
+ tlist = root->processed_tlist;

In grouping_planner():

  if (planned_rel == NULL)
  {
      ...
      root->processed_tlist = tlist;
  }
  else
      tlist = root->processed_tlist;
  ...
  if (current_rel == NULL)
      current_rel = query_planner(root, tlist,
                                  standard_qp_callback, &qp_extra);
  ...
  /*
   * Query planning may have added some columns to the top-level tlist,
   * which happens when there are row marks applied to inheritance
   * parent relations (additional junk columns needed for applying row
   * marks are added after expanding inheritance.)
   */
  if (list_length(tlist) < list_length(root->processed_tlist))
      tlist = root->processed_tlist;


Are there any case tlist points to an address different from
root->processed_tlist after calling query_planner? Junk columns are possibly
added to root->processed_tlist after expanding inheritance, but that adding
process don't change the root->processed_tlist's pointer address.
I checked "Assert(tlist == root->processed_tlist)" after calling query_planner
passes "make check".

9.
0003: line 1722-1763
In build_append_child_rel():

+ /*
+ * In addition to the quals inherited from the parent, we might
+ * have securityQuals associated with this particular child node.
+ * (Currently this can only happen in appendrels originating from
+ * UNION ALL; inheritance child tables don't have their own
+ * securityQuals.) Pull any such securityQuals up into the
...
+ foreach(lc, childRTE->securityQuals)
+ {
...
+ }
+ Assert(security_level <= root->qual_security_level);
+ }

This foreach loop loops only once in the current regression tests. I checked
"Assert(childRTE->securityQuals->length == 1)" passes "make check".
I think there are no need to change codes, I state this fact only for sharing.


--
Yoshikazu Imai

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
Imai-san,

Thanks for the review again.

On 2018/12/05 11:29, Imai, Yoshikazu wrote:

> On Tue, Nov 20, 2018 at 10:24 PM, Amit Langote wrote:
>> Attached v8 patches.
>
> Thanks for the patch. I took a look 0003, 0005, 0006 of v8 patch.
>
> 1.
> 0003: line 267-268
> + * Child relation may have be marked dummy if build_append_child_rel
> + * found self-contradictory quals.
>
> /s/have be marked/have been marked/
>
> 2.
> 0003: around line 1077
> In append.c(or prepunion.c)
> 228      * Check that there's at least one descendant, else treat as no-child
> 229      * case.  This could happen despite above has_subclass() check, if table
> 230      * once had a child but no longer does.
>
> has_subclass() check is moved to subquery_planner from above this code,
> so the comments need to be modified like below.
>
> s/above has_subclass() check/has_subclass() check in subquery_planner/
>
> 3.
> 0003: line 1241-1244
> 0006: line ?
>
> In comments of expand_partitioned_rtentry:
> + * Note: even though only the unpruned partitions will be added to the
> + * resulting plan, this still locks *all* partitions via find_all_inheritors
> + * in order to avoid partitions being locked in a different order than other
> + * places in the backend that may lock partitions.
>
> This comments is no more needed if 0006 patch is applied because
> find_all_inheritors is removed in the 0006 patch.
>
> 4.
> 0003: line 1541-1544
>
> + * Add the RelOptInfo.  Even though we may not really scan this relation
> + * for reasons such as contradictory quals, we still need need to create
> + * one, because for every RTE in the query's range table, there must be an
> + * accompanying RelOptInfo.
>
> s/need need/need/
>
> 5.
> 0003: line 1620-1621
>
> + * After creating the RelOptInfo for the given child RT index, it goes on to
> + * initialize some of its fields base on the parent RelOptInfo.
>
> s/fields base on/fields based on/
Fixed all of 1-5.

> 6.
> parsenodes.h
>  906  *    inh is true for relation references that should be expanded to include
>  907  *    inheritance children, if the rel has any.  This *must* be false for
>  908  *    RTEs other than RTE_RELATION entries.
>
> I think inh can become true now even if RTEKind equals RTE_SUBQUERY, so latter
> sentence need to be modified.

Seems like an existing comment bug.  Why don't you send a patch as you
discovered it? :)

> 7.
> 0005: line 109-115
> + /*
> + * If partition is excluded by constraints, remove it from
> + * live_parts, too.
> + */
> + if (IS_DUMMY_REL(childrel))
> + parentrel->live_parts = bms_del_member(parentrel->live_parts, i);
> +
>
> When I read this comment, I imagined that relation_excluded_by_constraints()
> would be called before this code. childrel is marked dummy if
> build_append_child_rel found self-contradictory quals, so comments can be
> modified more correctly like another comments in your patch as below.
I realized that bms_del_member statement is unnecessary.  I've revised the
comment describing live_parts to say that it contains indexes into
part_rels array of the non-NULL RelOptInfos contained in it, that is,
RelOptInfos of un-pruned partitions (rest of the entries are NULL.)
Un-pruned partitions may become dummy due to contradictory constraints or
constraint exclusion using normal CHECK constraints later and whether it's
dummy is checked properly by functions that iterate over live_parts.

> 8.
> 0003: line 705-711
> + * Query planning may have added some columns to the top-level tlist,
> + * which happens when there are row marks applied to inheritance
> + * parent relations (additional junk columns needed for applying row
> + * marks are added after expanding inheritance.)
> + */
> + if (list_length(tlist) < list_length(root->processed_tlist))
> + tlist = root->processed_tlist;
>
> In grouping_planner():
>
>   if (planned_rel == NULL)
>   {
>       ...
>       root->processed_tlist = tlist;
>   }
>   else
>       tlist = root->processed_tlist;
>   ...
>   if (current_rel == NULL)
>       current_rel = query_planner(root, tlist,
>                                   standard_qp_callback, &qp_extra);
>   ...
>   /*
>    * Query planning may have added some columns to the top-level tlist,
>    * which happens when there are row marks applied to inheritance
>    * parent relations (additional junk columns needed for applying row
>    * marks are added after expanding inheritance.)
>    */
>   if (list_length(tlist) < list_length(root->processed_tlist))
>       tlist = root->processed_tlist;
>
>
> Are there any case tlist points to an address different from
> root->processed_tlist after calling query_planner? Junk columns are possibly
> added to root->processed_tlist after expanding inheritance, but that adding
> process don't change the root->processed_tlist's pointer address.
> I checked "Assert(tlist == root->processed_tlist)" after calling query_planner
> passes "make check".
You're right.  I think that may not have been true in some version of the
patch that I didn't share on the list, but it is with the latest patch.
I've removed that block of code and adjusted nearby comments to mention
that targetlist may change during query_planner.

> 9.
> 0003: line 1722-1763
> In build_append_child_rel():
>
> + /*
> + * In addition to the quals inherited from the parent, we might
> + * have securityQuals associated with this particular child node.
> + * (Currently this can only happen in appendrels originating from
> + * UNION ALL; inheritance child tables don't have their own
> + * securityQuals.) Pull any such securityQuals up into the
> ...
> + foreach(lc, childRTE->securityQuals)
> + {
> ...
> + }
> + Assert(security_level <= root->qual_security_level);
> + }
>
> This foreach loop loops only once in the current regression tests. I checked
> "Assert(childRTE->securityQuals->length == 1)" passes "make check".
> I think there are no need to change codes, I state this fact only for sharing.
Thanks for the information.  There aren't any changes to the code itself
due to this patch, just moved from one place to another.

Attached updated patches.  I have a few other changes in mind to make to
0001 such that the range table in each child's version of Query contains
only that child table in place of the original target relation, instead of
*all* child tables which is the current behavior.  The current behavior
makes range_table_mutator a big bottleneck when the number of un-pruned
target children is large.  But I'm saving it for the next week so that I
can prepare for the PGConf.ASIA that's starting on Monday next week.  See
you there. :)

Thanks,
Amit

v9-0001-Overhaul-inheritance-update-delete-planning.patch (77K) Download Attachment
v9-0002-Store-inheritance-root-parent-index-in-otherrel-s.patch (3K) Download Attachment
v9-0003-Lazy-creation-of-RTEs-for-inheritance-children.patch (116K) Download Attachment
v9-0004-Move-append-expansion-code-into-its-own-file.patch (153K) Download Attachment
v9-0005-Teach-planner-to-only-process-unpruned-partitions.patch (9K) Download Attachment
v9-0006-Do-not-lock-all-partitions-at-the-beginning.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: speeding up planning with partitions

Imai, Yoshikazu
Hi, Amit

Thanks for the quick modification.

On Wed, Dec 5, 2018 at 8:26 PM, Amit Langote wrote:
> > 1.
...
> > 5.
> > 0003: line 1620-1621
> >
> > + * After creating the RelOptInfo for the given child RT index, it goes on to
> > + * initialize some of its fields base on the parent RelOptInfo.
> >
> > s/fields base on/fields based on/
>
> Fixed all of 1-5.

Thanks for fixing.

> > 6.
> > parsenodes.h
> >  906  *    inh is true for relation references that should be expanded to include
> >  907  *    inheritance children, if the rel has any.  This *must* be false for
> >  908  *    RTEs other than RTE_RELATION entries.
> >
> > I think inh can become true now even if RTEKind equals RTE_SUBQUERY, so latter
> > sentence need to be modified.
>
>
>
> Seems like an existing comment bug.  Why don't you send a patch as you
> discovered it? :)

Thanks, I am pleased with your proposal. I'll post it as a small fix of the comment.

> > 7.
> > 0005: line 109-115
...
> Un-pruned partitions may become dummy due to contradictory constraints or
> constraint exclusion using normal CHECK constraints later and whether it's
> dummy is checked properly by functions that iterate over live_parts.

Ah, I understand partitions are eliminated contradictory constraints or
constraint exclusion, both using constraints.

> Attached updated patches.  I have a few other changes in mind to make to
> 0001 such that the range table in each child's version of Query contains
> only that child table in place of the original target relation, instead of
> *all* child tables which is the current behavior.  The current behavior
> makes range_table_mutator a big bottleneck when the number of un-pruned
> target children is large.  But I'm saving it for the next week so that I

OK. I will continue the review of 0001 before/after your supplying of next
patch with keeping those in mind.

> can prepare for the PGConf.ASIA that's starting on Monday next week.  See
> you there. :)

Yeah, see you there. :)


--
Yoshikazu Imai

Reply | Threaded
Open this post in threaded view
|

RE: speeding up planning with partitions

Imai, Yoshikazu
Hi, Amit,

On Fri, Dec 7, 2018 at 0:57 AM, Imai, Yoshikazu wrote:
> OK. I will continue the review of 0001 before/after your supplying of
> next patch with keeping those in mind.

Here's the continuation of the review. Almost all of below comments are
little fixes.

---
0001: line 76-77
In commit message:
  exclusion for target child relation, which is no longer
  is no longer needed.  Constraint exclusion runs during query_planner

s/which is no longer is no longer needed/which is no longer needed/

---
0001: line 464
+ if (IS_DUMMY_REL(find_base_rel(root, resultRelation )))

s/resultRelation )))/resultRelation)))/
(There is an extra space.)

---
0001: line 395-398
+ * Reset inh_target_child_roots to not be same as parent root's so that
+ * the subroots for this child's own children (if any) don't end up in
+ * root parent's list.  We'll eventually merge all entries into one list,
+ * but that's now now.

s/that's now now/that's not now/

---
0001: line 794
+ * are put into  a list that will be controlled by a single ModifyTable

s/are put into  a list/are put into a list/
(There are two spaces between "into" and "a".)

---
0001: line 241-242, 253-254, 291-294 (In set_append_rel_size())

+ if (appinfo->parent_relid == root->parse->resultRelation)
+ subroot = adjust_inherit_target_child(root, childrel, appinfo);

+ add_child_rel_equivalences(subroot, appinfo, rel, childrel,
+   root != subroot);

+ if (subroot != root)
+ {
+ root->inh_target_child_roots =
+ lappend(root->inh_target_child_roots, subroot);

A boolean value of "appinfo->parent_relid == root->parse->resultRelation" is
same with "subroot != root"(because of line 241-242), so we can use bool
variable here like
child_is_target = (appinfo->parent_relid == root->parse->resultRelation).
The name of child_is_target is brought from arguments of
add_child_rel_equivalences() in your patch.

I attached a little diff as "v9-0001-delta.diff".

---
0001: line 313-431

adjust_inherit_target_child() is in allpaths.c in your patch, but it has the
code like below ones which are in master's(not patch applied) planner.c or
planmain.c, so could it be possible in planner.c(or planmain.c)?
This point is less important, but I was just wondering whether
adjust_inherit_target_child() should be in allpaths.c, planner.c or planmain.c.

+ /* Translate the original query's expressions to this child. */
+ subroot = makeNode(PlannerInfo);
+ memcpy(subroot, root, sizeof(PlannerInfo));

+ root->parse->targetList = root->unexpanded_tlist;
+ subroot->parse = (Query *) adjust_appendrel_attrs(root,
+  (Node *) root->parse,
+  1, &appinfo);

+ tlist = preprocess_targetlist(subroot);
+ subroot->processed_tlist = tlist;
+ build_base_rel_tlists(subroot, tlist);

---
0001: line 57-70

In commit message:
  This removes some existing code in inheritance_planner that dealt
  with any subquery RTEs in the query.  The rationale of that code
  was that the subquery RTEs may change during each iteration of
  planning (that is, for different children), so different iterations
  better use different copies of those RTEs.  
  ...
  Since with the new code
  we perform planning just once, I think we don't need this special
  handling.

0001: line 772-782
- * controlled by a single ModifyTable node.  All the instances share the
- * same rangetable, but each instance must have its own set of subquery
- * RTEs within the finished rangetable because (1) they are likely to get
- * scribbled on during planning, and (2) it's not inconceivable that
- * subqueries could get planned differently in different cases.  We need
- * not create duplicate copies of other RTE kinds, in particular not the
- * target relations, because they don't have either of those issues.  Not
- * having to duplicate the target relations is important because doing so
- * (1) would result in a rangetable of length O(N^2) for N targets, with
- * at least O(N^3) work expended here; and (2) would greatly complicate
- * management of the rowMarks list.

I used considerable time to confirm there are no needs copying subquery RTEs in
the new code, but so far I couldn't. If copied RTEs are only used when planning,
it might not need to copy RTEs in the new code because we perform planning just
once, so I tried to detect when copied RTEs are used or overwritten, but I gave
up.

Of course, there are comments about this,

- * same rangetable, but each instance must have its own set of subquery
- * RTEs within the finished rangetable because (1) they are likely to get
- * scribbled on during planning, and (2) it's not inconceivable that

so copied RTEs might be used when planning, but I couldn't find the actual codes.
I also checked commits[1, 2] related to these codes. I'll check these for more
time but it would be better there are other's review and I also want a help here.

---
Maybe I checked all the way of the v9 patch excluding the codes about
EquivalenceClass codes(0001: line 567-638).
I'll consider whether there are any performance degration case, but I have
no idea for now. Do you have any points you concerns? If there, I'll check it.


[1] https://github.com/postgres/postgres/commit/b3aaf9081a1a95c245fd605dcf02c91b3a5c3a29
[2] https://github.com/postgres/postgres/commit/c03ad5602f529787968fa3201b35c119bbc6d782

--
Yoshikazu Imai


v9-0001-delta.diff (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
Thank you Imai-san.

On 2018/12/25 16:47, Imai, Yoshikazu wrote:

> Here's the continuation of the review. Almost all of below comments are
> little fixes.
>
> ---
> 0001: line 76-77
> In commit message:
>   exclusion for target child relation, which is no longer
>   is no longer needed.  Constraint exclusion runs during query_planner
>
> s/which is no longer is no longer needed/which is no longer needed/
>
> ---
> 0001: line 464
> + if (IS_DUMMY_REL(find_base_rel(root, resultRelation )))
>
> s/resultRelation )))/resultRelation)))/
> (There is an extra space.)
>
> ---
> 0001: line 395-398
> + * Reset inh_target_child_roots to not be same as parent root's so that
> + * the subroots for this child's own children (if any) don't end up in
> + * root parent's list.  We'll eventually merge all entries into one list,
> + * but that's now now.
>
> s/that's now now/that's not now/
>
> ---
> 0001: line 794
> + * are put into  a list that will be controlled by a single ModifyTable
>
> s/are put into  a list/are put into a list/
> (There are two spaces between "into" and "a".)

All fixed in my local repository.

> ---
> 0001: line 241-242, 253-254, 291-294 (In set_append_rel_size())
>
> + if (appinfo->parent_relid == root->parse->resultRelation)
> + subroot = adjust_inherit_target_child(root, childrel, appinfo);
>
> + add_child_rel_equivalences(subroot, appinfo, rel, childrel,
> +   root != subroot);
>
> + if (subroot != root)
> + {
> + root->inh_target_child_roots =
> + lappend(root->inh_target_child_roots, subroot);
>
> A boolean value of "appinfo->parent_relid == root->parse->resultRelation" is
> same with "subroot != root"(because of line 241-242), so we can use bool
> variable here like
> child_is_target = (appinfo->parent_relid == root->parse->resultRelation).
> The name of child_is_target is brought from arguments of
> add_child_rel_equivalences() in your patch.
>
> I attached a little diff as "v9-0001-delta.diff".

Sounds like a good idea for clarifying the code, so done.

> ---
> 0001: line 313-431
>
> adjust_inherit_target_child() is in allpaths.c in your patch, but it has the
> code like below ones which are in master's(not patch applied) planner.c or
> planmain.c, so could it be possible in planner.c(or planmain.c)?
> This point is less important, but I was just wondering whether
> adjust_inherit_target_child() should be in allpaths.c, planner.c or planmain.c.
>
> + /* Translate the original query's expressions to this child. */
> + subroot = makeNode(PlannerInfo);
> + memcpy(subroot, root, sizeof(PlannerInfo));
>
> + root->parse->targetList = root->unexpanded_tlist;
> + subroot->parse = (Query *) adjust_appendrel_attrs(root,
> +  (Node *) root->parse,
> +  1, &appinfo);
>
> + tlist = preprocess_targetlist(subroot);
> + subroot->processed_tlist = tlist;
> + build_base_rel_tlists(subroot, tlist);

I'm thinking of changing where adjust_inherit_target_child gets called
from.  In the current patch, it's in the middle of set_rel_size which I'm
starting to think is a not a good place for it.  Maybe, I'll place the
call call near to where inheritance is expanded.

> ---
> 0001: line 57-70
>
> In commit message:
>   This removes some existing code in inheritance_planner that dealt
>   with any subquery RTEs in the query.  The rationale of that code
>   was that the subquery RTEs may change during each iteration of
>   planning (that is, for different children), so different iterations
>   better use different copies of those RTEs.  
>   ...
>   Since with the new code
>   we perform planning just once, I think we don't need this special
>   handling.
>
> 0001: line 772-782
> - * controlled by a single ModifyTable node.  All the instances share the
> - * same rangetable, but each instance must have its own set of subquery
> - * RTEs within the finished rangetable because (1) they are likely to get
> - * scribbled on during planning, and (2) it's not inconceivable that
> - * subqueries could get planned differently in different cases.  We need
> - * not create duplicate copies of other RTE kinds, in particular not the
> - * target relations, because they don't have either of those issues.  Not
> - * having to duplicate the target relations is important because doing so
> - * (1) would result in a rangetable of length O(N^2) for N targets, with
> - * at least O(N^3) work expended here; and (2) would greatly complicate
> - * management of the rowMarks list.
>
> I used considerable time to confirm there are no needs copying subquery RTEs in
> the new code, but so far I couldn't. If copied RTEs are only used when planning,
> it might not need to copy RTEs in the new code because we perform planning just
> once, so I tried to detect when copied RTEs are used or overwritten, but I gave
> up.
>
> Of course, there are comments about this,
>
> - * same rangetable, but each instance must have its own set of subquery
> - * RTEs within the finished rangetable because (1) they are likely to get
> - * scribbled on during planning, and (2) it's not inconceivable that
>
> so copied RTEs might be used when planning, but I couldn't find the actual codes.
> I also checked commits[1, 2] related to these codes. I'll check these for more
> time but it would be better there are other's review and I also want a help here.
>
> [1]
https://github.com/postgres/postgres/commit/b3aaf9081a1a95c245fd605dcf02c91b3a5c3a29
> [2]
https://github.com/postgres/postgres/commit/c03ad5602f529787968fa3201b35c119bbc6d782

Thank you very much for spending time on that.  I'd really like someone
else to consider this as well.

Here's the summary of what I'm proposing to change with respect to the
above: because we perform scan-level planning only once for any given
relation including for sub-queries with the patch applied, we no longer
need to make copies of sub-query RTEs that are currently needed due to
repeated planning, with one copy per child target relation.  Since there
are no new copies, there is no need to process various nodes to change the
RT index being used to refer to the sub-query.  I have removed the code
that does to copying of subquery RTEs and the code that does translation
due new RT index being assigned every time a new copy was being made.

> ---
> Maybe I checked all the way of the v9 patch excluding the codes about
> EquivalenceClass codes(0001: line 567-638).
> I'll consider whether there are any performance degration case, but I have
> no idea for now. Do you have any points you concerns? If there, I'll check it.

I will send an updated patch hopefully before my new year vacation that
starts on Friday this week.

Thanks,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
Hi,

On 2018/12/26 13:28, Amit Langote wrote:

> On 2018/12/25 16:47, Imai, Yoshikazu wrote:
>> adjust_inherit_target_child() is in allpaths.c in your patch, but it has the
>> code like below ones which are in master's(not patch applied) planner.c or
>> planmain.c, so could it be possible in planner.c(or planmain.c)?
>> This point is less important, but I was just wondering whether
>> adjust_inherit_target_child() should be in allpaths.c, planner.c or planmain.c.
>>
>> + /* Translate the original query's expressions to this child. */
>> + subroot = makeNode(PlannerInfo);
>> + memcpy(subroot, root, sizeof(PlannerInfo));
>>
>> + root->parse->targetList = root->unexpanded_tlist;
>> + subroot->parse = (Query *) adjust_appendrel_attrs(root,
>> +  (Node *) root->parse,
>> +  1, &appinfo);
>>
>> + tlist = preprocess_targetlist(subroot);
>> + subroot->processed_tlist = tlist;
>> + build_base_rel_tlists(subroot, tlist);
>
> I'm thinking of changing where adjust_inherit_target_child gets called
> from.  In the current patch, it's in the middle of set_rel_size which I'm
> starting to think is a not a good place for it.  Maybe, I'll place the
> call call near to where inheritance is expanded.
So what I did here is that I added a new step to query_planner() itself
that adds the PlannerInfos for inheritance child target relations and
hence moved the function adjust_inherit_target_child() to planmain.c and
renamed it to add_inherit_target_child_root().  I've removed its
RelOptInfo argument and decided to modify the child RelOptInfo's
targetlist in allpaths.c.  So, add_inherit_target_child_root() only builds
the child PlannerInfo now.  Child PlannerInfo's are now stored in an array
of simple_rel_array_size elements, instead of a list.  It's filled with
NULLs except for the slots corresponding to child target relations.

In allpaths.c, I've added set_inherit_target_rel_sizes and
set_inherit_target_rel_pathlists, which look like set_append_rel_size and
set_append_rel_pathlist, respectively.  The reason to add separate
functions for the target relation case is that we don't want to combine
the outputs of children to form an "append relation" in that case.  So,
this patch no longer modifies set_append_rel_size for handling the case
where the parent relation is query's target relation.

> I will send an updated patch hopefully before my new year vacation that
> starts on Friday this week.

Here's the v10 of the patch.  I didn't get chance to do more changes than
those described above and address Imai-san's review comments yesterday.

Have a wonderful new year!  I'll be back on January 7 to reply on this thread.

Thanks,
Amit

v10-0001-Overhaul-inheritance-update-delete-planning.patch (97K) Download Attachment
v10-0002-Store-inheritance-root-parent-index-in-otherrel-.patch (3K) Download Attachment
v10-0003-Lazy-creation-of-RTEs-for-inheritance-children.patch (125K) Download Attachment
v10-0004-Move-append-expansion-code-into-its-own-file.patch (154K) Download Attachment
v10-0005-Teach-planner-to-only-process-unpruned-partition.patch (9K) Download Attachment
v10-0006-Do-not-lock-all-partitions-at-the-beginning.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Justin Pryzby
Hi,

I don't think I can't help with code review, but I loaded our largest
customer's schema into pg12dev and tested with this patch.  It's working well -
thanks for your work.

Details follow.

We have tooo many tables+indices so this vastly improves planning speed.  Our
large customers have ~300 parent tables and total ~20k child tables with total
~80k indices.  Our largest tables heirarchies have ~1200 child tables, which
may have as many as 6-8 indices each.  And 5-10 of the largest heirarchies are
unioned together in a view.

Running pg11.1, explaining query for the largest view with condition eliminates
all but today's tables can take several minutes with a cold cache, due to not
only stat()ing every file in every table in a partition heirarchy, before
pruning, but also actually open()ing all their indices.

Running 11dev with your v10 patch applied, this takes 2244ms with empty buffer
cache after postmaster restarted on a totally untuned instance (and a new
backend, with no cached opened files).

I was curious why it took even 2sec, and why it did so many opens() (but not
20k of them that PG11 does):

[pryzbyj@database postgresql]$ cut -d'(' -f1 /tmp/strace-12dev-explain |sort |uniq -c |sort -nr
   2544 lseek
   1263 open
   ...

It turns out 1050 open()s are due to historic data which is no longer being
loaded and therefor never converted to relkind=p (but hasn't exceeded the
retention interval so not yet DROPped, either).

Cheers,
Justin

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

David Rowley-3
On Fri, 4 Jan 2019 at 04:39, Justin Pryzby <[hidden email]> wrote:
> Running 11dev with your v10 patch applied, this takes 2244ms with empty buffer
> cache after postmaster restarted on a totally untuned instance (and a new
> backend, with no cached opened files).
>
> I was curious why it took even 2sec, and why it did so many opens() (but not
> 20k of them that PG11 does):

It would be pretty hard to know that without seeing the query plan.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
Thanks Justin for reporting the results of your testing.

On 2019/01/07 17:40, David Rowley wrote:
> On Fri, 4 Jan 2019 at 04:39, Justin Pryzby <[hidden email]> wrote:
>> Running 11dev with your v10 patch applied, this takes 2244ms with empty buffer
>> cache after postmaster restarted on a totally untuned instance (and a new
>> backend, with no cached opened files).
>>
>> I was curious why it took even 2sec, and why it did so many opens() (but not
>> 20k of them that PG11 does):
>
> It would be pretty hard to know that without seeing the query plan.

Yeah, I too would be curious to see if the resulting plan really needs to
do those open()s.  If all the open()s being seen here are accounted for by
un-pruned partitions (across the UNION ALL) contained in the plan and
their indexes, then the patch has done enough to help.  If the open()s can
be traced to pruned partitions, then there's something wrong with the patch.

Thanks,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Justin Pryzby
In reply to this post by David Rowley-3
On Mon, Jan 07, 2019 at 09:40:50PM +1300, David Rowley wrote:
> On Fri, 4 Jan 2019 at 04:39, Justin Pryzby <[hidden email]> wrote:
> > Running 11dev with your v10 patch applied, this takes 2244ms with empty buffer
> > cache after postmaster restarted on a totally untuned instance (and a new
> > backend, with no cached opened files).
> >
> > I was curious why it took even 2sec, and why it did so many opens() (but not
> > 20k of them that PG11 does):
>
> It would be pretty hard to know that without seeing the query plan.

The issue was this:
> > It turns out 1050 open()s are due to historic data which is no longer being
> > loaded and therefor never converted to relkind=p (but hasn't exceeded the
> > retention interval so not yet DROPped, either).

So there's no evidence of any issue with the patch.

Justin

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
On 2019/01/07 23:13, Justin Pryzby wrote:
> The issue was this:
>>> It turns out 1050 open()s are due to historic data which is no longer being
>>> loaded and therefor never converted to relkind=p (but hasn't exceeded the
>>> retention interval so not yet DROPped, either).
>
> So there's no evidence of any issue with the patch.

Ah, so by this you had meant that the historic data is still a old-style
(9.6-style) inheritance hierarchy, which gets folded under the UNION ALL
whose other children are new-style partitioned tables.

Thanks,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
In reply to this post by Amit Langote-2
On 2018/12/27 20:25, Amit Langote wrote:
> Here's the v10 of the patch.  I didn't get chance to do more changes than
> those described above and address Imai-san's review comments yesterday.
>
> Have a wonderful new year!  I'll be back on January 7 to reply on this thread.

Rebased due to changed copyright year in prepunion.c.  Also realized that
the new files added by patch 0004 still had 2018 in them.

Thanks,
Amit

v11-0001-Overhaul-inheritance-update-delete-planning.patch (97K) Download Attachment
v11-0002-Store-inheritance-root-parent-index-in-otherrel-.patch (3K) Download Attachment
v11-0003-Lazy-creation-of-RTEs-for-inheritance-children.patch (125K) Download Attachment
v11-0004-Move-inheritance-expansion-code-into-its-own-fil.patch (154K) Download Attachment
v11-0005-Teach-planner-to-only-process-unpruned-partition.patch (9K) Download Attachment
v11-0006-Do-not-lock-all-partitions-at-the-beginning.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

David Rowley-3
On Tue, 8 Jan 2019 at 19:30, Amit Langote <[hidden email]> wrote:
> Rebased due to changed copyright year in prepunion.c.  Also realized that
> the new files added by patch 0004 still had 2018 in them.

I've made a pass over 0001. There's probably enough for you to look at
while I look at 0002 and the others.

0001

1. In your doc changes, just below a paragraph that you removed,
there's a paragraph starting "Both of these behaviors are likely to be
changed in a future release". This needs to be fixed since you've
removed the first of the two reasons.

2. This part should be left alone.

-    technique similar to partition pruning.  While it is primarily used
-    for partitioning implemented using the legacy inheritance method, it can be
-    used for other purposes, including with declarative partitioning.
+    technique similar to partition pruning.  It is primarily used
+    for partitioning implemented using the legacy inheritance method.

Looking at set_inherit_target_rel_sizes(), constraint exclusion still
is applied to partitions, it's just applied after pruning, according
to:

if (did_pruning && !bms_is_member(appinfo->child_relid, live_children))
{
/* This partition was pruned; skip it. */
set_dummy_rel_pathlist(childrel);
continue;
}

if (relation_excluded_by_constraints(root, childrel, childRTE))

3. add_child_rel_equivalences().  You're replacing parent EMs with
their child equivalent, but only when the eclass has no volatile
functions. Is this really what you want? I think this would misbehave
if we ever allowed: UPDATE ... SET .. ORDER BY, of which there's a
legitimate use case of wanting to reduce the chances of deadlocks
caused by non-deterministic UPDATE order.  Or if you think skipping
the volatile eclasses is fine today, then I think the comment you've
added to add_child_rel_equivalences should mention that.

4. Do you think it's okay that add_child_rel_equivalences() does not
update the ec_relids when removing the member?

UPDATE: I see you're likely leaving this alone since you're only doing
a shallow copy of the eclasses in
adjust_inherited_target_child_root(). It seems like a pretty bad idea
to do a shallow copy there.

5. What's CE?

+ /* CE failed, so finish copying/modifying join quals. */

6. Typo:

+ * ass dummy.  We must do this in this phase so that the rel's

ass -> as

7. There's no accumulation going on here:

+ /*
+ * Accumulate size information from each live child.
+ */
+ Assert(childrel->rows > 0);

8. Any point in this? We're about to loop again anyway...

+ /*
+ * If child is dummy, ignore it.
+ */
+ if (IS_DUMMY_REL(childrel))
+ continue;
+ }

9. It's a bit confusing to mention SELECT in this comment. The Assert
ensures it's an UPDATE or DELETE.

+ /*
+ * For UPDATE/DELETE queries, the top parent can only ever be a table.
+ * As a contrast, it could be a UNION ALL subquery in the case of SELECT.
+ */
+ Assert(root->parse->commandType == CMD_UPDATE ||
+    root->parse->commandType == CMD_DELETE);

10. I'd say the subroot assignment can go after the IS_DUMMY_REL
check. Keeping that loop as tight as possible for pruned rels seems
like a good idea.

+ subroot = root->inh_target_child_roots[appinfo->child_relid];
+ Assert(subroot->parse->resultRelation > 0);
+ childrel = find_base_rel(root, appinfo->child_relid);
+
+ /* Ignore excluded/pruned children. */
+ if (IS_DUMMY_REL(childrel))
+ continue;

11. I don't think you should reuse the childrel variable here:

+ childrel->reloptkind = RELOPT_BASEREL;
+
+ Assert(subroot->join_rel_list == NIL);
+ Assert(subroot->join_rel_hash == NULL);
+
+ /* Perform join planning and save the resulting RelOptInfo. */
+ childrel = make_rel_from_joinlist(subroot, translated_joinlist);
+
+ /*
+ * Remember this child target rel.  inheritance_planner will perform
+ * the remaining steps of planning for each child relation separately.
+ * Specifically, it will call grouping_planner on every
+ * RelOptInfo contained in the inh_target_child_rels list, each of
+ * which represents the source of tuples to be modified for a given
+ * target child rel.
+ */
+ root->inh_target_child_joinrels =
+ lappend(root->inh_target_child_joinrels, childrel);

12. The following comment makes less sense now that you've modified
the previous paragraph:

+ * Fortunately, the UPDATE/DELETE target can never be the nullable side of an
+ * outer join, so it's OK to generate the plan this way.

This text used to refer to:

but target inheritance has to be expanded at
 * the top.  The reason is that for UPDATE, each target relation needs a
 * different targetlist matching its own column set.  Fortunately,
 * the UPDATE/DELETE target can never be the nullable side of an outer join,
 * so it's OK to generate the plan this way.

you no longer describe plan as being expanded from the top rather than
at the bottom, which IMO is what "this way" refers to.

13. "tree is" -> "tree are" (references is plural)

+ * references in the join tree to the original target relation that's the
+ * root parent of the inheritance tree is replaced by each of its

14. Any reason to move this line from its original location?

  Assert(parse->commandType != CMD_INSERT);
+ parent_rte = planner_rt_fetch(top_parentRTindex, root);

Previously it was assigned just before it was needed and there's a
fast path out after where you moved it to and where it was.

15. relation_excluded_by_constraints(), the switch
(constraint_exclusion), you could consider turning that into

if (constraint_exclusion == CONSTRAINT_EXCLUSION_OFF)
    return false;
/*
 * When constraint_exclusion is set to 'partition' we only handle
 * OTHER_MEMBER_RELs.
 */
else if (constraint_exclusion == CONSTRAINT_EXCLUSION_PARTITION &&
         rel->reloptkind != RELOPT_OTHER_MEMBER_REL)
    return false;

When I wrote that code I was trying my best to make the complex rules
as simple as possible by separating them out.  The rules have become
quite simple after your change, so it probably does not warrant having
the switch.

16. I think the following comment needs to explain how large this
array is and what indexes it.  The current comment would have you
think there are only enough elements to store PlannerInfos for child
rels and leaves you guessing about what they're indexed by.

+ /*
+ * PlannerInfos corresponding to each inheritance child targets.
+ * Content of each PlannerInfo is same as the parent PlannerInfo, except
+ * for the parse tree which is a translated copy of the parent's parse
+ * tree.
+ */
+ struct PlannerInfo **inh_target_child_roots;

17. I'm getting an assert failure in add_paths_to_append_rel() for
Assert(parallel_workers > 0) during the partition_join tests.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

RE: speeding up planning with partitions

Imai, Yoshikazu
In reply to this post by Amit Langote-2
Hi,

I ran the performance tests for no prepared query and for prepared query with
plan_cache_mode='auto' and plan_cache_mode='force_custom_plan'. I also changed
number of partitions as 256 or 4096. I ran the tests on master and v9-patched.

[settings]
plan_cache_mode = 'auto' or 'force_custom_plan'
max_parallel_workers = 0
max_parallel_workers_per_gather = 0
max_locks_per_transaction = 4096

[partitioning table definitions(with 4096 partitions)]
create table rt (a int, b int, c int) partition by range (a);

\o /dev/null
select 'create table rt' || x::text || ' partition of rt for values from (' ||
 (x)::text || ') to (' || (x+1)::text || ');' from generate_series(1, 4096) x;
\gexec
\o

[pgbench(with 4096 partitions)]
no prepared: pgbench -n -f select4096.sql -T 60
prepared:    pgbench -n -f select4096.sql -T 60 -M prepared

[select4096.sql]
\set a random(1, 4096)
select a from rt where a = :a;

[results]
master:
  part_num  no-prepared   auto   force_custom_plan  (1-auto/force_custom_plan)
       256          604    571                 576                        0.01
      4096         17.5   17.5                15.1                       -0.16
               
patched:
  part_num  no-prepared   auto   force_custom_plan
       256         8614   9446                9384                      -0.006
      4096         7158   7165                7864                       0.089


There are almost no difference between auto and force_custom_plan with 256
partitions, but there are difference between auto and force_custom_plan with
4096 partitions. While auto is faster than force_custom_plan on master,
force_custom_plan is faster than auto on patched.

I wonder why force_custom_plan is faster than auto after applied the patch.

When we use PREPARE-EXECUTE, a generic plan is created and used if its cost is
cheaper than creating and using a custom plan with plan_cache_mode='auto',
while a custom plan is always created and used with plan_cache_mode='force_custom_plan'.
So one can think the difference in above results is because of creating or
using a generic plan.

I checked how many times a generic plan is created during executing pgbench and
found a generic plan is created only once and custom plans are created at other
times with plan_cache_mode='auto'. I also checked the time of creating a
generic plan, but it didn't take so much(250ms or so with 4096 partitions). So
the time of creating a generic plan does not affect the performance.

Currently, a generic plan is created at sixth time of executing EXECUTE query.
I changed it to more later (ex. at 400,000th time of executing EXECUTE query on
master with 4096 partitions, because 7000TPS x 60sec=420,0000 transactions are
run while executing pgbench.), then there are almost no difference between auto
and force_custom_plan. I think that creation of a generic plan affects the time
of executing queries which are ordered after creating generic plan.

If my assumption is right, we can expect some additional process is occurred at
executing queries ordered after creating a generic plan, which results in auto is
slower than force_custom_plan because of additional process. But looking at
above results, on master with 4096 partitions, auto is faster than force_custom_plan.
So now I am confused.

Do you have any ideas what does affect the performance?

--
Yoshikazu Imai
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Alvaro Herrera-9
In reply to this post by Amit Langote-2
> From 3b86331dd5a2368adc39c9fef92f3dd09d817a08 Mon Sep 17 00:00:00 2001
> From: amit <[hidden email]>
> Date: Wed, 7 Nov 2018 16:51:31 +0900
> Subject: [PATCH v11 4/6] Move inheritance expansion code into its own file

I wonder if it would make sense to commit 0004 ahead of the rest?  To
avoid carrying this huge one over and over ...

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

123456 ... 11