speeding up planning with partitions

classic Classic list List threaded Threaded
201 messages Options
12345 ... 11
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Michael Paquier-2
On Sat, Sep 29, 2018 at 07:00:02PM +0530, Dilip Kumar wrote:
> I think we can delay allocating memory for rel->part_rels?  And we can
> allocate in add_rel_partitions_to_query only
> for those partitions which survive pruning.

This last review set has not been answered, and as it is recent I am
moving the patch to next CF, waiting on author.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
On 2018/10/02 10:20, Michael Paquier wrote:
> On Sat, Sep 29, 2018 at 07:00:02PM +0530, Dilip Kumar wrote:
>> I think we can delay allocating memory for rel->part_rels?  And we can
>> allocate in add_rel_partitions_to_query only
>> for those partitions which survive pruning.
>
> This last review set has not been answered, and as it is recent I am
> moving the patch to next CF, waiting on author.

I was thinking of doing that myself today, thanks for taking care of that.

Will get back to this by the end of this week.

Thanks,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Thomas Munro-3
In reply to this post by Amit Langote-2
On Fri, Sep 14, 2018 at 10:29 PM Amit Langote
<[hidden email]> wrote:
> Attached is what I have at the moment.

I realise this is a WIP but FYI the docs don't build (you removed a
<note> element that is still needed, when removing a paragraph).

--
Thomas Munro
http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
On 2018/10/03 8:31, Thomas Munro wrote:
> On Fri, Sep 14, 2018 at 10:29 PM Amit Langote
> <[hidden email]> wrote:
>> Attached is what I have at the moment.
>
> I realise this is a WIP but FYI the docs don't build (you removed a
> <note> element that is still needed, when removing a paragraph).

Thanks Thomas for the heads up, will fix.

Regards,
Amit


Reply | Threaded
Open this post in threaded view
|

RE: speeding up planning with partitions

Imai, Yoshikazu
In reply to this post by Amit Langote-2
Hi, Amit!

On Thu, Sept 13, 2018 at 10:29 PM, Amit Langote wrote:
> Attached is what I have at the moment.

I also do the code review of the patch.
I could only see a v3-0001.patch so far, so below are all about v3-0001.patch.

I am new to inheritance/partitioning codes and code review, so my review below might have some mistakes. If there are mistakes, please point out that kindly :)


v3-0001:
1. Is there any reason inheritance_make_rel_from_joinlist returns "parent" that is passed to it? I think we can refer parent in the caller even if inheritance_make_rel_from_joinlist returns nothing.

+static RelOptInfo *
+inheritance_make_rel_from_joinlist(PlannerInfo *root,
+   RelOptInfo *parent,
+   List *joinlist)
+{
    ...
+ return parent;
+}

2. Is there the possible case that IS_DUMMY_REL(child_joinrel) is true in inheritance_adjust_scanjoin_target()?

+inheritance_adjust_scanjoin_target(PlannerInfo *root,
        ...
+{
        ...
+ foreach(lc, root->inh_target_child_rels)
+ {
                ...
+ /*
+ * If the child was pruned, its corresponding joinrel will be empty,
+ * which we can ignore safely.
+ */
+ if (IS_DUMMY_REL(child_joinrel))
+ continue;

I think inheritance_make_rel_from_joinlist() doesn't make dummy joinrel for the child that was pruned.

+inheritance_make_rel_from_joinlist(PlannerInfo *root,
...
+{
        ...
+ foreach(lc, root->append_rel_list)
+ {
+ RelOptInfo *childrel;
                ...
+ /* Ignore excluded/pruned children. */
+ if (IS_DUMMY_REL(childrel))
+ continue;
                ...
+ /*
+ * Save child joinrel to be processed later in
+ * inheritance_adjust_scanjoin_target, which adjusts its paths to
+ * be able to emit expressions in query's top-level target list.
+ */
+ root->inh_target_child_rels = lappend(root->inh_target_child_rels,
+  childrel);
+ }
+}

3.
Is the "root->parse->commandType != CMD_INSERT" required in:

@@ -2018,13 +1514,45 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
...
+ /*
+ * For UPDATE/DELETE on an inheritance parent, we must generate and
+ * apply scanjoin target based on targetlist computed using each
+ * of the child relations.
+ */
+ if (parse->commandType != CMD_INSERT &&
+ current_rel->reloptkind == RELOPT_BASEREL &&
+ current_rel->relid == root->parse->resultRelation &&
+ root->simple_rte_array[current_rel->relid]->inh)
...

@@ -2137,92 +1665,123 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
  final_rel->fdwroutine = current_rel->fdwroutine;
 
...
- foreach(lc, current_rel->pathlist)
+ if (current_rel->reloptkind == RELOPT_BASEREL &&
+ current_rel->relid == root->parse->resultRelation &&
+ !IS_DUMMY_REL(current_rel) &&
+ root->simple_rte_array[current_rel->relid]->inh &&
+ parse->commandType != CMD_INSERT)


I think if a condition would be "current_rel->relid == root->parse->resultRelation && parse->commandType != CMD_INSERT" at the above if clause, elog() is called in the query_planner and planner don't reach above if clause.
Of course there is the case that query_planner returns the dummy joinrel and elog is not called, but in that case, current_rel->relid may be 0(current_rel is dummy joinrel) and root->parse->resultRelation may be not 0(a query is INSERT).

4. Can't we use define value(IS_PARTITIONED or something like IS_INHERITANCE?) to identify inheritance and partitioned table in below code? It was little confusing to me that which code is related to inheritance/partitioned when looking codes.

In make_one_rel():
+ if (root->parse->resultRelation &&
+ root->simple_rte_array[root->parse->resultRelation]->inh)
+ {
...
+ rel = inheritance_make_rel_from_joinlist(root, targetrel, joinlist);

In inheritance_make_rel_from_joinlist():
+ if (childrel->part_scheme != NULL)
+ childrel =
+ inheritance_make_rel_from_joinlist(root, childrel,
+   translated_joinlist);


I can't review inheritance_adjust_scanjoin_target() deeply, because it is difficult to me to understand fully codes about join processing.

--
Yoshikazu Imai

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
Imai-san,

On 2018/10/04 17:11, Imai, Yoshikazu wrote:
> Hi, Amit!
>
> On Thu, Sept 13, 2018 at 10:29 PM, Amit Langote wrote:
>> Attached is what I have at the moment.
>
> I also do the code review of the patch.

Thanks a lot for your review.  I'm working on updating the patches as
mentioned upthread and should be able to post a new version sometime next
week.  I will consider your comments along with David's and Dilip's before
posting the new patch.

Regards,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
In reply to this post by Dilip Kumar-2
Hi Dilip,

Thanks for your review comments.  Sorry it took me a while replying to them.

On 2018/09/29 22:30, Dilip Kumar wrote:

> I was going through your patch (v3-0002) and I have some suggestions
>
> 1.
> - if (nparts > 0)
> + /*
> + * For partitioned tables, we just allocate space for RelOptInfo's.
> + * pointers for all partitions and copy the partition OIDs from the
> + * relcache.  Actual RelOptInfo is built for a partition only if it is
> + * not pruned.
> + */
> + if (rte->relkind == RELKIND_PARTITIONED_TABLE)
> + {
>   rel->part_rels = (RelOptInfo **)
> - palloc(sizeof(RelOptInfo *) * nparts);
> + palloc0(sizeof(RelOptInfo *) * rel->nparts);
> + return rel;
> + }
>
> I think we can delay allocating memory for rel->part_rels?  And we can
> allocate in add_rel_partitions_to_query only
> for those partitions which survive pruning.

Unfortunately, we can't do that.  The part_rels array is designed such
that there is one entry in it for each partition of a partitioned table
that physically exists.  Since the pruning code returns a set of indexes
into the array of *all* partitions, the planner's array must also be able
to hold *all* partitions, even if most of them would be NULL in the common
case.

Also, partition wise join code depends on finding a valid (even if dummy)
RelOptInfo for *all* partitions of a partitioned table to support outer
join planning.  Maybe, we can change partition wise join code to not
depend on such dummy-marked RelOptInfos on the nullable side, but until
then we'll have to leave part_rels like this.

> 2.
> add_rel_partitions_to_query
> ...
> + /* Expand the PlannerInfo arrays to hold new partition objects. */
> + num_added_parts = scan_all_parts ? rel->nparts :
> + bms_num_members(partindexes);
> + new_size = root->simple_rel_array_size + num_added_parts;
> + root->simple_rte_array = (RangeTblEntry **)
> + repalloc(root->simple_rte_array,
> + sizeof(RangeTblEntry *) * new_size);
> + root->simple_rel_array = (RelOptInfo **)
> + repalloc(root->simple_rel_array,
> + sizeof(RelOptInfo *) * new_size);
> + if (root->append_rel_array)
> + root->append_rel_array = (AppendRelInfo **)
> + repalloc(root->append_rel_array,
> + sizeof(AppendRelInfo *) * new_size);
> + else
> + root->append_rel_array = (AppendRelInfo **)
> + palloc0(sizeof(AppendRelInfo *) *
> + new_size);
>
> Instead of re-pallocing for every partitioned relation can't we first
> count the total number of surviving partitions and
> repalloc at once.

Hmm, doing this seems a bit hard too.  Since the function to perform
partition pruning (prune_append_rel_partitions), which determines the
number of partitions that will be added, currently contains a RelOptInfo
argument for using the partitioning info created by
set_relation_partition_info, we cannot call it on a partitioned table
unless we've created a RelOptInfo for it.  So, we cannot delay creating
partition RelOptInfos to a point where we've figured out all the children,
because some of them might be partitioned tables themselves.  IOW, we must
create them as we process each partitioned table (and its partitioned
children recursively) and expand the PlannerInfo arrays as we go.

I've thought about the alternative(s) that will allow us to do what you
suggest, but it cannot be implemented without breaking how we initialize
partitioning info in RelOptInfo.  For example, we could refactor
prune_append_rel_array's interface to accept a Relation pointer instead of
RelOptInfo, but we don't have a Relation pointer handy at the point where
we can do pruning without re-opening it, which is something to avoid.
Actually, that's not the only refactoring needed as I've come to know when
trying to implement it.

On a related note, with the latest patch, I've also delayed regular
inheritance expansion as a whole, to avoid making the partitioning
expansion a special case.   That means we'll expand the planner arrays
every time a inheritance parent is encountered in the range table.  The
only aspect where partitioning becomes a special case in the new code is
that we can call the pruning code even before we've expanded the planner
arrays and the pruning limits the size to which the arrays must be
expanded to.  Regular inheritance requires that the planner arrays be
expanded by the amount given by
list_length(find_all_inheritors(root_parent_oid)).

> 3.
> + /*
> + * And do prunin.  Note that this adds AppendRelInfo's of only the
> + * partitions that are not pruned.
> + */
>
> prunin/pruning

I've since rewritten this comment and fixed the misspelling.

I'm still working on some other comments.

Thanks,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
In reply to this post by David Rowley-3
Hi David,

I've managed to get back to the rest of your comments.  Sorry that it took
me a while.

On 2018/09/11 8:23, David Rowley wrote:

> On 30 August 2018 at 21:29, Amit Langote <[hidden email]> wrote:
>> Attached updated patches, with 0002 containing the changes mentioned above.
>
> Here's my first pass review on this:
>
> 0002:
>
> 10. I think moving the PlannerInfo->total_table_pages calculation
> needs to be done in its own patch. This is a behavioural change where
> we no longer count pruned relations in the calculation which can
> result in plan changes. I posted a patch in [1] to fix this as a bug
> fix as I think the current code is incorrect. My patch also updates
> the first paragraph of the comment. You've not done that.

As mentioned on the other thread, I've included your patch in my own series.

> 11. "pruning"
>
> + * And do prunin.  Note that this adds AppendRelInfo's of only the

Fixed.

> 12. It's much more efficient just to do bms_add_range() outside the loop in:
>
> + for (i = 0; i < rel->nparts; i++)
> + {
> + rel->part_rels[i] = build_partition_rel(root, rel,
> + rel->part_oids[i]);
> + rel->live_parts = bms_add_member(rel->live_parts, i);
> + }

You're right, I almost forgot about bms_add_range that we just recently added.

> 13. In set_append_rel_size() the foreach(l, root->append_rel_list)
> loop could be made to loop over RelOptInfo->live_parts instead which
> would save having to skip over AppendRelInfos that don't belong to
> this parent. You'd need to make live_parts more general purpose and
> also use it to mark children of inheritance parents.

Hmm, I've thought about it, but live_parts is a set of indexes into the
part_rels array, which is meaningful only for partitioned tables.

I'm not really sure if we should generalize part_rels to child_rels and
set it even for regular inheritance, if only for the efficiency of
set_append_rel_size and set_append_rel_pathlists.

In the updated patches, I've managed to make this whole effort applicable
to regular inheritance in general (as I said I would last month), where
the only special case code needed for partitioning is to utilize the
functionality of partprune.c, but I wasn't generalize everything.  We can,
as a follow on patch maybe.

> 14. I think you can skip the following if both are NULL. You could
> likely add more smarts for different join types, but I think you
> should at least skip when both are NULL. Perhaps the loop could be
> driven off of bms_intersec of the two Rel's live_parts?
>
> + if (child_rel1 == NULL)
> + child_rel1 = build_dummy_partition_rel(root, rel1, cnt_parts);
> + if (child_rel2 == NULL)
> + child_rel2 = build_dummy_partition_rel(root, rel2, cnt_parts);

Actually, not needing to build the dummy partition rel here is something
we could get back to doing someday, but not in this patch.  The logic
involved to do so is too much related to join planning, so I thought we
could pursue it later.

> 15. The following is not required when append_rel_array was previously NULL.
>
> + MemSet(root->append_rel_array + root->simple_rel_array_size,
> +    0, sizeof(AppendRelInfo *) * num_added_parts);

Yeah, I've fixed that.  If previously NULL, it's palloc'd.

> 16. I don't think scan_all_parts is worth the extra code.  The cost of
> doing bms_num_members is going to be pretty small in comparison to
> building paths for and maybe doing a join search for all partitions.
>
> + num_added_parts = scan_all_parts ? rel->nparts :
> + bms_num_members(partindexes);
>
> In any case, you've got a bug in prune_append_rel_partitions() where
> you're setting scan_all_parts to true instead of returning when
> contradictory == true.

I too started disliking it, so got rid of scan_all_parts.

> 17. lockmode be of type LOCKMODE, not int.
>
> +    Oid childOID, int lockmode,

This might no longer be the same code as you saw when reviewing, but I've
changed all lockmode variables in the patch to use LOCKMODE.

> 18. Comment contradicts the code.
>
> + /* Open rel if needed; we already have required locks */
> + if (childOID != parentOID)
> + {
> + childrel = heap_open(childOID, lockmode);
>
> I think you should be using NoLock here.
>
> 19. Comment contradicts the code.
>
> + /* Close child relations, but keep locks */
> + if (childOID != parentOID)
> + {
> + Assert(childrel != NULL);
> + heap_close(childrel, lockmode);
> + }

These two blocks, too.  There is no such code with the latest patch.
Code's been moved such that there is no need for such special handling.

> 20. I assume translated_vars can now be NIL due to
> build_dummy_partition_rel() not setting it.
>
> - if (var->varlevelsup == 0 && appinfo)
> + if (var->varlevelsup == 0 && appinfo && appinfo->translated_vars)
>
> It might be worth writing a comment to explain that, otherwise it's
> not quite clear why you're doing this.

Regarding this and and one more comment below, I've replied below.

> 21. Unrelated change;
>
>   Assert(relid > 0 && relid < root->simple_rel_array_size);
> +

Couldn't find it in the latest patch, so must be gone.

> 22. The following comment mentions that Oids are copied, but that does
> not happen in the code.
>
> + /*
> + * For partitioned tables, we just allocate space for RelOptInfo's.
> + * pointers for all partitions and copy the partition OIDs from the
> + * relcache.  Actual RelOptInfo is built for a partition only if it is
> + * not pruned.
> + */
>
> The Oid copying already happened during get_relation_info().

Couldn't find this comment in the new code.

> 23. Traditionally translated_vars populated with a sequence of Vars in
> the same order to mean no translation. Here you're changing how that
> works:
>
> + /* leaving translated_vars to NIL to mean no translation needed */
>
> This seems to be about the extent of your documentation on this, which
> is not good enough.

I've changed the other code such that only the existing meaning of no-op
translation list is preserved, that is, the one you wrote above.  I
modified build_dummy_partition_rel such that a no-op translated vars is
built based on parent's properties, instead of leaving it NIL.  I've also
removed the special case code in adjust_appendrel_attrs_mutator which
checked translated_vars for NIL.

> 24. "each" -> "reach"? ... Actually, I don't understand the comment.
> In a partitioned hierarchy, how can the one before the top-level
> partitioned table not be a partitioned table?
>
> /*
> * Keep moving up until we each the parent rel that's not a
> * partitioned table.  The one before that one would be the root
> * parent.
> */

This comment and the code has since been rewritten, but to clarify, this
is the new comment:

     * Figuring out if we're the root partitioned table is a bit involved,
     * because the root table mentioned in the query itself might be a
     * child of UNION ALL subquery.
     */

> 25. "already"
>
> + * expand_inherited_rtentry alreay locked all partitions, so pass

No longer appears in the latest patch.

> 26. Your changes to make_partitionedrel_pruneinfo() look a bit broken.
> You're wrongly assuming live_parts is the same as present_parts.  If a
> CHECK constraint eliminated the partition then those will be present
> in live_parts but won't be part of the Append/MergeAppend subplans.
> You might be able to maintain some of this optimisation by checking
> for dummy rels inside the loop, but you're going to need to put back
> the code that sets present_parts.
>
> + present_parts = bms_copy(subpart->live_parts);

You're right.  The new code structure is such that it allows to delete the
partition index of a partition that's excluded by constraints, so I fixed
it so that live_parts no longer contains the partitions that are excluded
by constraints.

The existing code that sets present parts goes through all partitions by
looping over nparts members of part_rels, which is a pattern I think we
should avoid, as I'd think you'd agree.

> 27. Comment contradicts the code:
>
> + Bitmapset  *live_parts; /* unpruned parts; NULL if all are live */
>
> in add_rel_partitions_to_query() you're doing:
>
> + if (scan_all_parts)
> + {
> + for (i = 0; i < rel->nparts; i++)
> + {
> + rel->part_rels[i] = build_partition_rel(root, rel,
> + rel->part_oids[i]);
> + rel->live_parts = bms_add_member(rel->live_parts, i);
> + }
> + }
>
> so the NULL part seems untrue.

I've rewritten the comment.

>
> 28. Missing comments:
>
> + TupleDesc tupdesc;
> + Oid reltype;
>
> 29. The comment for prune_append_rel_partitions claims it "Returns RT
> indexes", but that's not the case, per:
>
> -Relids
> -prune_append_rel_partitions(RelOptInfo *rel)
> +void
> +prune_append_rel_partitions(PlannerInfo *root, RelOptInfo *rel)

Has changed in the latest patch and now it returns a bitmapset of
partition indexes, same as what get_matching_partitions does.

> 0003:
>
> 30. 2nd test can be tested inside the first test to remove redundant
> partition check.
>
> - inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
> + if (rte->relkind != RELKIND_PARTITIONED_TABLE)
> + inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
>
>   /*
>   * Check that there's at least one descendant, else treat as no-child
>   * case.  This could happen despite above has_subclass() check, if table
>   * once had a child but no longer does.
>   */
> - if (list_length(inhOIDs) < 2)
> + if (rte->relkind != RELKIND_PARTITIONED_TABLE && list_length(inhOIDs) < 2)
>   {

This code has changed quite a bit in the latest patch, so the comment no
longer applies.

> 31. The following code is wrong:
>
> + /* Determine the correct lockmode to use. */
> + if (rootRTindex == root->parse->resultRelation)
> + lockmode = RowExclusiveLock;
> + else if (rootrc && RowMarkRequiresRowShareLock(rootrc->markType))
> + lockmode = RowShareLock;
> + else
> + lockmode = AccessShareLock;
>
> rootRTIndex remains at 0 if there are no row marks and resultRelation
> will be 0 for SELECT queries, this means you'll use a RowExclusiveLock
> for a SELECT instead of an AccessShareLock.
>
> Why not just check the lockmode of the parent and use that?

No such logic exists anymore due to recent developments (addition of
rellockmode to RTE).


I'll post the latest patches in this week.

Thanks,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
In reply to this post by Imai, Yoshikazu
Imai-san,

Thank you for reviewing.

On 2018/10/04 17:11, Imai, Yoshikazu wrote:

> Hi, Amit!
>
> On Thu, Sept 13, 2018 at 10:29 PM, Amit Langote wrote:
>> Attached is what I have at the moment.
>
> I also do the code review of the patch.
> I could only see a v3-0001.patch so far, so below are all about v3-0001.patch.
>
> I am new to inheritance/partitioning codes and code review, so my review below might have some mistakes. If there are mistakes, please point out that kindly :)
>
>
> v3-0001:
> 1. Is there any reason inheritance_make_rel_from_joinlist returns "parent" that is passed to it? I think we can refer parent in the caller even if inheritance_make_rel_from_joinlist returns nothing.
>
> +static RelOptInfo *
> +inheritance_make_rel_from_joinlist(PlannerInfo *root,
> +   RelOptInfo *parent,
> +   List *joinlist)
> +{
>     ...
> + return parent;
> +}

There used to be a reason to do that in previous versions, but seems it
doesn't hold anymore.  I've changed it to not return any value.

> 2. Is there the possible case that IS_DUMMY_REL(child_joinrel) is true in inheritance_adjust_scanjoin_target()?
>
> +inheritance_adjust_scanjoin_target(PlannerInfo *root,
> ...
> +{
> ...
> + foreach(lc, root->inh_target_child_rels)
> + {
> ...
> + /*
> + * If the child was pruned, its corresponding joinrel will be empty,
> + * which we can ignore safely.
> + */
> + if (IS_DUMMY_REL(child_joinrel))
> + continue;
>
> I think inheritance_make_rel_from_joinlist() doesn't make dummy joinrel for the child that was pruned.
>
> +inheritance_make_rel_from_joinlist(PlannerInfo *root,
> ...
> +{
> ...
> + foreach(lc, root->append_rel_list)
> + {
> + RelOptInfo *childrel;
> ...
> + /* Ignore excluded/pruned children. */
> + if (IS_DUMMY_REL(childrel))
> + continue;
> ...
> + /*
> + * Save child joinrel to be processed later in
> + * inheritance_adjust_scanjoin_target, which adjusts its paths to
> + * be able to emit expressions in query's top-level target list.
> + */
> + root->inh_target_child_rels = lappend(root->inh_target_child_rels,
> +  childrel);
> + }
> +}

You're right.  Checking that in inheritance_adjust_scanjoin_target was
redundant.

> 3.
> Is the "root->parse->commandType != CMD_INSERT" required in:
>
> @@ -2018,13 +1514,45 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
> ...
> + /*
> + * For UPDATE/DELETE on an inheritance parent, we must generate and
> + * apply scanjoin target based on targetlist computed using each
> + * of the child relations.
> + */
> + if (parse->commandType != CMD_INSERT &&
> + current_rel->reloptkind == RELOPT_BASEREL &&
> + current_rel->relid == root->parse->resultRelation &&
> + root->simple_rte_array[current_rel->relid]->inh)
> ...
>
> @@ -2137,92 +1665,123 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
>   final_rel->fdwroutine = current_rel->fdwroutine;
>  
> ...
> - foreach(lc, current_rel->pathlist)
> + if (current_rel->reloptkind == RELOPT_BASEREL &&
> + current_rel->relid == root->parse->resultRelation &&
> + !IS_DUMMY_REL(current_rel) &&
> + root->simple_rte_array[current_rel->relid]->inh &&
> + parse->commandType != CMD_INSERT)
>
>
> I think if a condition would be "current_rel->relid == root->parse->resultRelation && parse->commandType != CMD_INSERT" at the above if clause, elog() is called in the query_planner and planner don't reach above if clause.
> Of course there is the case that query_planner returns the dummy joinrel and elog is not called, but in that case, current_rel->relid may be 0(current_rel is dummy joinrel) and root->parse->resultRelation may be not 0(a query is INSERT).

Yeah, I realized that we can actually Assert(parse->commandType !=
CMD_INSERT) if the inh flag of the target/resultRelation RTE is true.  So,
checking it explicitly is redundant.

> 4. Can't we use define value(IS_PARTITIONED or something like IS_INHERITANCE?) to identify inheritance and partitioned table in below code? It was little confusing to me that which code is related to inheritance/partitioned when looking codes.
>
> In make_one_rel():
> + if (root->parse->resultRelation &&
> + root->simple_rte_array[root->parse->resultRelation]->inh)
> + {
> ...
> + rel = inheritance_make_rel_from_joinlist(root, targetrel, joinlist);
>
> In inheritance_make_rel_from_joinlist():
> + if (childrel->part_scheme != NULL)
> + childrel =
> + inheritance_make_rel_from_joinlist(root, childrel,
> +   translated_joinlist);

As it might have been clear, the value of inh flag is checked to detect
whether the operation uses inheritance, because it requires special
handling -- the operation must be applied to every child relation that
satisfies the conditions of the query.

part_scheme is checked to detect partitioning which has some special
behavior with respect to how the AppendRelInfos are set up.  For
partitioning, AppendRelInfos map partitions to their immediate parent, not
the top-most root parent, so the current code uses recursion to apply
certain transformations.  That's not required for non-partitioned case,
because all children are mapped to the root inheritance parent.

I'm trying to unify the two so that the partitioning case doesn't need any
special handling.

> I can't review inheritance_adjust_scanjoin_target() deeply, because it is difficult to me to understand fully codes about join processing.

Thanks for your comments, they are valuable.

Regards,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
And here is the latest set of patches.  Sorry it took me a while.

* To reiterate, this set of patches is intended to delay adding partitions
to the query's range table and PlannerInfo, which leads to significantly
reduced overheads in the common case where queries accessing partitioned
tables need to scan only 1 or few partitions.  As I mentioned before on
this thread [1], I wanted to change the patches such that the new approach
is adopted for all inheritance tables, not just partitioned tables,
because most of the code and data structures are shared and it no longer
made sense to me to create a diversion between the two cases by making the
partitioning a special case in a number of places in the planner code.  I
think I've managed to get to that point with the latest patches.  The new
code that performs inheritance expansion is common to partitioning,
inheritance, and also flattened UNION ALL to some degree, because all of
these cases use more or less the same code.  For partitioning, child
relations are opened and RTEs are added for them *after* performing
pruning, whereas for inheritance that's not possible, because pruning
using constraint exclusion cannot be performed without opening the
children.  This consideration is the only main thing that differs between
the handling of the two case, among other minor details that also differ.

* With the unification described above, inheritance expansion is no longer
part planner's prep phase, so I've located the new expansion code in a new
file under optimizer/utils named append.c, because it manages creation of
an appendrel's child relations.  That means a lot of helper code that once
was in prepunion.c is now in the new file.  Actually, I've moved *some* of
the helper routines such as adjust_appendre_* that do the mapping of
expressions between appendrel parent and children to a different file
appendinfo.c.

* I've also rewritten the patch to change how inherited update/delete
planning is done, so that it doesn't end up messing too much with
grouping_planner.  Instead of removing inheritance_planner altogether as
the earlier patches did, I've rewritten it such that the CPU and memory
intensive query_planner portion is run only once at the beginning to
create scan/join paths for non-excluded/unpruned child target relations,
followed by running grouping_planner to apply top-level targetlist
suitable for each child target relation.  grouping_planner is modified
slightly such that when it runs for every child target relation, it
doesn't again need to perform query planning.  It instead receives the
RelOptInfos that contain the scan/join paths of child target relations
that were generated during the initial query_planner run.  Beside this
main change, I've removed quite a bit of code in inheritance_planner that
relied on the existing way of redoing planning for each child target
relation.  The changes are described in the commit message of patch 0002.

* A few notes on the patches:

0001 is a separate patch, because it might be useful in some other
projects like [2].

0003 is David's patch that he posted in [3].


I didn't get time today to repeat the performance tests, but I'm planning
to next week.  In the meantime, please feel free to test them and provide
comments on the code changes.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/8097576f-f795-fc42-3c00-073f68bfb0b4%40lab.ntt.co.jp

[2]
https://www.postgresql.org/message-id/de957e5b-faa0-6fb9-c0ab-0b389d71cf5a%40lab.ntt.co.jp

[3] Re: Calculate total_table_pages after set_base_rel_sizes()
https://www.postgresql.org/message-id/CAKJS1f9NiQXO9KCv_cGgBShwqwT78wmArOht-5kWL%2BBt0v-AnQ%40mail.gmail.com

0001-Store-inheritance-root-parent-index-in-otherrel-s-Re.patch (3K) Download Attachment
0002-Overhaul-inheritance-update-delete-planning.patch (61K) Download Attachment
0003-Calculate-total_table_pages-after-set_base_rel_sizes.patch (6K) Download Attachment
0004-Lazy-creation-of-RTEs-for-inheritance-children.patch (192K) Download Attachment
0005-Teach-planner-to-only-process-unpruned-partitions.patch (9K) Download Attachment
0006-Do-not-lock-all-partitions-at-the-beginning.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: speeding up planning with partitions

Imai, Yoshikazu
Hi,

On Thu, Oct 25, 2018 at 10:38 PM, Amit Langote wrote:
> And here is the latest set of patches.  Sorry it took me a while.

Thanks for revising the patch!

> I didn't get time today to repeat the performance tests, but I'm planning
> to next week.  In the meantime, please feel free to test them and provide
> comments on the code changes.

Since it takes me a lot of time to see all of the patches, I will post comments
little by little from easy parts like typo check.


1.
0002:
+ * inheritance_make_rel_from_joinlist
+ * Perform join planning for all non-dummy leaf inheritance chilren
+ * in their role as query's target relation

"inheritance chilren" -> "inheritance children"


2.
0002:
+ /*
+ * Sub-partitions have to be processed recursively, because
+ * AppendRelInfoss link sub-partitions to their immediate parents, not
+ * the root partitioned table.
+ */

AppendRelInfoss -> AppendRelInfos (?)


3.
0002:
+ /* Reset interal join planning data structures. */

interal -> internal


4.
0004:
-static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte,
- Index rti);

Comments referring to expand_inherited_rtentry() is left.

backend/optimizer/plan/planner.c:1310:
           * Because of the way expand_inherited_rtentry works, that should be
backend/optimizer/plan/planner.c:1317:
           * Instead the duplicate child RTE created by expand_inherited_rtentry
backend/optimizer/util/plancat.c:118:
    * the rewriter or when expand_inherited_rtentry() added it to the query's
backend/optimizer/util/plancat.c:640:
    * the rewriter or when expand_inherited_rtentry() added it to the query's

About the last two comments in the above,
"the rewriter or when expand_inherited_rtentry() added it to the query's"
would be
"the rewriter or when add_inheritance_child_rel() added it to the query's".

I don't know how to correct the first two comments.


5.
0004:
-static void expand_partitioned_rtentry(PlannerInfo *root,
-   RangeTblEntry *parentrte,
-   Index parentRTindex, Relation parentrel,
-   PlanRowMark *top_parentrc, LOCKMODE lockmode,
-   List **appinfos);

Comments referring to expand_partitioned_rtentry() is also left.

backend/executor/execPartition.c:941:
 /*
  * get_partition_dispatch_recurse
  *      Recursively expand partition tree rooted at rel
  *
  * As the partition tree is expanded in a depth-first manner, we maintain two
  * global lists: of PartitionDispatch objects corresponding to partitioned
  * tables in *pds and of the leaf partition OIDs in *leaf_part_oids.
  *
  * Note that the order of OIDs of leaf partitions in leaf_part_oids matches
  * the order in which the planner's expand_partitioned_rtentry() processes
  * them.  It's not necessarily the case that the offsets match up exactly,
  * because constraint exclusion might prune away some partitions on the
  * planner side, whereas we'll always have the complete list; but unpruned
  * partitions will appear in the same order in the plan as they are returned
  * here.
  */

I think the second paragraph of the comments is no longer correct.
expand_partitioned_rtentry() expands the partition tree in a depth-first
manner, whereas expand_append_rel() doesn't neither in a depth-first manner
nor a breadth-first manner as below.

partitioned table tree image:
pt
  sub1
    sub1_1
      leaf0
    leaf1
  sub2
    leaf2
  leaf3

append_rel_list(expanded by expand_partitioned_rtentry):
  [sub1, sub1_1, leaf0, leaf1, sub2, leaf2, leaf3]

append_rel_list(expanded by expand_append_rel):
  [sub1, sub2, leaf3, sub1_1, leaf1, leaf0, leaf2]


6.
0004
+ /*
+ * A partitioned child table with 0 children is a dummy rel, so don't
+ * bother creating planner objects for it.
+ */
+ if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ {
+ PartitionDesc partdesc = RelationGetPartitionDesc(childrel);
+
+ Assert(!RELATION_IS_OTHER_TEMP(childrel));
+ if (partdesc->nparts == 0)
+ {
+ heap_close(childrel, NoLock);
+ return NULL;
+ }
+ }
+
+ /*
+ * If childrel doesn't belong to this session, skip it, also relinquishing
+ * the lock.
+ */
+ if (RELATION_IS_OTHER_TEMP(childrel))
+ {
+ heap_close(childrel, lockmode);
+ return NULL;
+ }

If we process the latter if block before the former one, Assert can be excluded
from the code.



It might be difficult to me to examine the codes whether there exists any logical
wrongness along with significant planner code changes, but I will try to look it.
Since it passes make check-world successfully, I think as of now there might be
not any wrong points.

If there's time, I will also do the performance tests.


--
Yoshikazu Imai

Reply | Threaded
Open this post in threaded view
|

RE: speeding up planning with partitions

Imai, Yoshikazu
About inheritance_make_rel_from_joinlist(), I considered how it processes
joins for sub-partitioned-table.

sub-partitioned-table image:
part
  sub1
    leaf1
    leaf2

inheritance_make_rel_from_joinlist() translates join_list and join_info_list
for each leafs(leaf1, leaf2 in above image). To translate those two lists for
leaf1, inheritance_make_rel_from_joinlist() translates lists from part to sub1
and nextly from sub1 to leaf1. For leaf2, inheritance_make_rel_from_joinlist()
translates lists from part to sub1 and from sub1 to leaf2. There are same
translation from part to sub1, and I think it is better if we can eliminate it.
I attached 0002-delta.patch.

In the patch, inheritance_make_rel_from_joinlist() translates lists not only for
leafs but for mid-nodes, in a depth-first manner, so it can use lists of
mid-nodes for translating lists from mid-node to leafs, which eliminates same
translation.

I think it might be better if we can apply same logic to inheritance_planner(),
which once implemented the same logic as below.


    foreach(lc, root->append_rel_list)
    {
        AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
        ...
        /*
         * expand_inherited_rtentry() always processes a parent before any of
         * that parent's children, so the parent_root for this relation should
         * already be available.
         */
        parent_root = parent_roots[appinfo->parent_relid];
        Assert(parent_root != NULL);
        parent_parse = parent_root->parse;
        ...
        subroot->parse = (Query *)
            adjust_appendrel_attrs(parent_root,
                                   (Node *) parent_parse,
                                   1, &appinfo);


--
Yoshikazu Imai


0002-delta.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Alvaro Herrera-9
In reply to this post by Amit Langote-2
ISTM patch 0004 is impossible to review just because of size -- I
suppose the bulk of it is just code that moves from one file to another,
but that there are also code changes in it.  Maybe it would be better to
split it so that one patch moves the code to the new files without
changing it, then another patch does the actual functional changes?  If
git shows the first half as a rename, it's easier to be confident about
it.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
Thanks for looking.

On 2018/11/07 12:32, Alvaro Herrera wrote:
> ISTM patch 0004 is impossible to review just because of size -- I
> suppose the bulk of it is just code that moves from one file to another,
> but that there are also code changes in it.  Maybe it would be better to
> split it so that one patch moves the code to the new files without
> changing it, then another patch does the actual functional changes?  If
> git shows the first half as a rename, it's easier to be confident about
> it.

Okay, will post a new version structured that way shortly.

Thanks,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
In reply to this post by Imai, Yoshikazu
Imai-san,

Thank you for reviewing.

On 2018/11/05 17:28, Imai, Yoshikazu wrote:
> Since it takes me a lot of time to see all of the patches, I will post comments
> little by little from easy parts like typo check.

I've fixed the typos you pointed out.

> 4.
> 0004:
> -static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte,
> - Index rti);
>
> Comments referring to expand_inherited_rtentry() is left.
>
> backend/optimizer/plan/planner.c:1310:
>            * Because of the way expand_inherited_rtentry works, that should be
> backend/optimizer/plan/planner.c:1317:
>            * Instead the duplicate child RTE created by expand_inherited_rtentry
> backend/optimizer/util/plancat.c:118:
>     * the rewriter or when expand_inherited_rtentry() added it to the query's
> backend/optimizer/util/plancat.c:640:
>     * the rewriter or when expand_inherited_rtentry() added it to the query's
>
> About the last two comments in the above,
> "the rewriter or when expand_inherited_rtentry() added it to the query's"
> would be
> "the rewriter or when add_inheritance_child_rel() added it to the query's".
>
> I don't know how to correct the first two comments.

I've since modified the patch to preserve expand_inherited_rtentry name,
although with a new implementation.

> 5.
> 0004:
> -static void expand_partitioned_rtentry(PlannerInfo *root,
> -   RangeTblEntry *parentrte,
> -   Index parentRTindex, Relation parentrel,
> -   PlanRowMark *top_parentrc, LOCKMODE lockmode,
> -   List **appinfos);
>
> Comments referring to expand_partitioned_rtentry() is also left.

I've preserved this name as well.

> backend/executor/execPartition.c:941:
>  /*
>   * get_partition_dispatch_recurse
>   *      Recursively expand partition tree rooted at rel
>   *
>   * As the partition tree is expanded in a depth-first manner, we maintain two
>   * global lists: of PartitionDispatch objects corresponding to partitioned
>   * tables in *pds and of the leaf partition OIDs in *leaf_part_oids.
>   *
>   * Note that the order of OIDs of leaf partitions in leaf_part_oids matches
>   * the order in which the planner's expand_partitioned_rtentry() processes
>   * them.  It's not necessarily the case that the offsets match up exactly,
>   * because constraint exclusion might prune away some partitions on the
>   * planner side, whereas we'll always have the complete list; but unpruned
>   * partitions will appear in the same order in the plan as they are returned
>   * here.
>   */
>
> I think the second paragraph of the comments is no longer correct.
> expand_partitioned_rtentry() expands the partition tree in a depth-first
> manner, whereas expand_append_rel() doesn't neither in a depth-first manner
> nor a breadth-first manner as below.

Well, expand_append_rel doesn't process a whole partitioning tree on its
own, only one partitioned table at a time.  As set_append_rel_size()
traverses the root->append_rel_list, leaf partitions get added to the
resulting plan in what's effectively a depth-first order.

I've updated this comment as far this patch is concerned, although the
patch in the other thread about removal of executor overhead in
partitioning is trying to remove this function
(get_partition_dispatch_recurse) altogether anyway.

> 6.
> 0004
> + /*
> + * A partitioned child table with 0 children is a dummy rel, so don't
> + * bother creating planner objects for it.
> + */
> + if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> + {
> + PartitionDesc partdesc = RelationGetPartitionDesc(childrel);
> +
> + Assert(!RELATION_IS_OTHER_TEMP(childrel));
> + if (partdesc->nparts == 0)
> + {
> + heap_close(childrel, NoLock);
> + return NULL;
> + }
> + }
> +
> + /*
> + * If childrel doesn't belong to this session, skip it, also relinquishing
> + * the lock.
> + */
> + if (RELATION_IS_OTHER_TEMP(childrel))
> + {
> + heap_close(childrel, lockmode);
> + return NULL;
> + }
>
> If we process the latter if block before the former one, Assert can be excluded
> from the code.

I've since moved the partitioning-related code to a partitioning specific
function, so this no longer applies.

Thanks,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
In reply to this post by Imai, Yoshikazu
Imai-san,

On 2018/11/07 10:00, Imai, Yoshikazu wrote:

> About inheritance_make_rel_from_joinlist(), I considered how it processes
> joins for sub-partitioned-table.
>
> sub-partitioned-table image:
> part
>   sub1
>     leaf1
>     leaf2
>
> inheritance_make_rel_from_joinlist() translates join_list and join_info_list
> for each leafs(leaf1, leaf2 in above image). To translate those two lists for
> leaf1, inheritance_make_rel_from_joinlist() translates lists from part to sub1
> and nextly from sub1 to leaf1. For leaf2, inheritance_make_rel_from_joinlist()
> translates lists from part to sub1 and from sub1 to leaf2. There are same
> translation from part to sub1, and I think it is better if we can eliminate it.
> I attached 0002-delta.patch.
>
> In the patch, inheritance_make_rel_from_joinlist() translates lists not only for
> leafs but for mid-nodes, in a depth-first manner, so it can use lists of
> mid-nodes for translating lists from mid-node to leafs, which eliminates same
> translation.

I don't think the translation happens twice for the same leaf partitions.

Applying adjust_appendrel_attrs_*multilevel* for only leaf nodes, as
happens with the current code, is same as first translating using
adjust_appendrel_attrs from part to sub1 and then from sub1 to leaf1 and
leaf2 during recursion with sub1 as the parent.

> I think it might be better if we can apply same logic to inheritance_planner(),
> which once implemented the same logic as below.
>
>
>     foreach(lc, root->append_rel_list)
>     {
>         AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
>         ...
>         /*
>          * expand_inherited_rtentry() always processes a parent before any of
>          * that parent's children, so the parent_root for this relation should
>          * already be available.
>          */
>         parent_root = parent_roots[appinfo->parent_relid];
>         Assert(parent_root != NULL);
>         parent_parse = parent_root->parse;
>         ...
>         subroot->parse = (Query *)
>             adjust_appendrel_attrs(parent_root,
>                                    (Node *) parent_parse,
>                                    1, &appinfo);

Actually, inheritance_planner is also using
adjust_appendrel_attrs_multilevel.  I'm not sure if you're looking at the
latest (10/26) patch.

Thanks,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
In reply to this post by Amit Langote-2
On 2018/11/07 12:44, Amit Langote wrote:

> Thanks for looking.
>
> On 2018/11/07 12:32, Alvaro Herrera wrote:
>> ISTM patch 0004 is impossible to review just because of size -- I
>> suppose the bulk of it is just code that moves from one file to another,
>> but that there are also code changes in it.  Maybe it would be better to
>> split it so that one patch moves the code to the new files without
>> changing it, then another patch does the actual functional changes?  If
>> git shows the first half as a rename, it's easier to be confident about
>> it.
>
> Okay, will post a new version structured that way shortly.
And here are patches structured that way.  I've addressed some of the
comments posted by Imai-san upthread.  Also, since David's patch to
postpone PlannerInfo.total_pages calculation went into the tree earlier
this week, it's no longer included in this set.

With this breakdown, patches are as follows:

v5-0001-Store-inheritance-root-parent-index-in-otherrel-s.patch

  Adds a inh_root_parent field that's set in inheritance child otherrel
  RelOptInfos to store the RT index of their root parent

v5-0002-Overhaul-inheritance-update-delete-planning.patch

  Patch that adjusts planner so that inheritance_planner can use partition
  pruning.

v5-0003-Lazy-creation-of-RTEs-for-inheritance-children.patch

  Patch that adjusts planner such that inheritance expansion step is
  postponed from early subquery_planner to the beginning of
  set_append_rel_size, so that child tables are added to the Query and
  PlannerInfo after partition pruning.  While the move only benefits
  partitioning, because non-partition children cannot be pruned until
  after they're fully opened, the new code handles both partitioned tables
  and regular inheritance parents.

v5-0004-Move-append-expansion-code-into-its-own-file.patch

  With patch 0003, inheritance expansion is no longer a part of the prep
  phase of planning, so it seems odd that inheritance expansion code is in
  optimizer/prep/prepunion.c.  This patch moves the code to a new file
  optimizer/utils/append.c.  Also, some of the appendrel translation
  subroutines are moved over to optimizer/utils/appendinfo.c.

  No functional changes with this patch.

v5-0005-Teach-planner-to-only-process-unpruned-partitions.patch

  Patch that adds a live_parts field to RelOptInfo which is set in
  partitioned rels to contain partition indexes of only non-dummy children
  replace the loops of the following form:

    for (i = 0; i < rel->nparts; i++)
    {
        RelOptInfo *partrel = rel->part_rels[i];

        ...some processing
    }

  at various points within the planner with:

    i = -1
    while ((i = bms_get_next(rel->live_parts)) >= 0)
    {
        RelOptInfo *partrel = rel->part_rels[i];

        ...some processing
    }

v5-0006-Do-not-lock-all-partitions-at-the-beginning.patch

  A simple patch that removes the find_all_inheritors called for
  partitioned root parent only to lock *all* partitions, in favor of
  locking only the unpruned partitions.

Thanks,
Amit

v5-0001-Store-inheritance-root-parent-index-in-otherrel-s.patch (3K) Download Attachment
v5-0002-Overhaul-inheritance-update-delete-planning.patch (61K) Download Attachment
v5-0003-Lazy-creation-of-RTEs-for-inheritance-children.patch (116K) Download Attachment
v5-0004-Move-append-expansion-code-into-its-own-file.patch (149K) Download Attachment
v5-0005-Teach-planner-to-only-process-unpruned-partitions.patch (9K) Download Attachment
v5-0006-Do-not-lock-all-partitions-at-the-beginning.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Jesper Pedersen
Hi Amit,

On 11/9/18 3:55 AM, Amit Langote wrote:
> And here are patches structured that way.  I've addressed some of the
> comments posted by Imai-san upthread.  Also, since David's patch to
> postpone PlannerInfo.total_pages calculation went into the tree earlier
> this week, it's no longer included in this set.
>

Thanks for doing the split this way. The patch passes check-world.

I ran a SELECT test using hash partitions, and got

        Master    v5
64:    6k        59k
1024:  283       59k

The non-partitioned case gives 77k. The difference in TPS between the
partition case vs. the non-partitioned case comes down to
set_plain_rel_size() vs. set_append_rel_size() under
set_base_rel_sizes(); flamegraphs for this sent off-list.

Best regards,
  Jesper

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

David Rowley-3
In reply to this post by Amit Langote-2
On 9 November 2018 at 21:55, Amit Langote <[hidden email]> wrote:
> v5-0001-Store-inheritance-root-parent-index-in-otherrel-s.patch
>
>   Adds a inh_root_parent field that's set in inheritance child otherrel
>   RelOptInfos to store the RT index of their root parent
>
> v5-0002-Overhaul-inheritance-update-delete-planning.patch
>
>   Patch that adjusts planner so that inheritance_planner can use partition
>   pruning.

I've started looking at these two, but only so far made it through
0001 and 0002.

Here's what I noted down during the review.

0001:

1. Why do we need the new field that this patch adds? I see in 0002
it's used like:

+ if (childrel->inh_root_parent > 0 &&
+ childrel->inh_root_parent == root->parse->resultRelation)

Would something like...
int relid;
if (childrel->part_schema == NULL &&
bms_get_singleton_member(childrel->top_parent_relids, &relid) &&
relid == root->parse->resultRelation)

...not do the trick?

0002:

2. What's wrong with childrel->relids?

+ child_relids = bms_make_singleton(appinfo->child_relid);

3. Why not use childrel->top_parent_relids?

+ top_parent_relids = bms_make_singleton(childrel->inh_root_parent);

4. The following comment in inheritance_make_rel_from_joinlist()
implies that the function will not be called for SELECT, but the
comment above the function does not mention that.

/*
* For UPDATE/DELETE queries, the top parent can only ever be a table.
* As a contrast, it could be a UNION ALL subquery in the case of SELECT.
*/
Assert(planner_rt_fetch(top_parent_relid, root)->rtekind == RTE_RELATION);

5. Should the following comment talk about "Sub-partitioned tables"
rather than "sub-partitions"?

+ * Sub-partitions have to be processed recursively, because
+ * AppendRelInfos link sub-partitions to their immediate parents, not
+ * the root partitioned table.

6. Can't you just pass childrel->relids and
childrel->top_parent_relids instead of making new ones?

+ child_relids = bms_make_singleton(appinfo->child_relid);
+ Assert(childrel->inh_root_parent > 0);
+ top_parent_relids = bms_make_singleton(childrel->inh_root_parent);
+ translated_joinlist = (List *)
+ adjust_appendrel_attrs_multilevel(root,
+   (Node *) joinlist,
+   child_relids,
+   top_parent_relids);

7. I'm just wondering what your idea is here?

+ /* Reset join planning specific data structures. */
+ root->join_rel_list = NIL;
+ root->join_rel_hash = NULL;

Is there a reason to nullify these? You're not releasing any memory
and the new structures that will be built won't overlap with the ones
built last round. I don't mean to imply that the code is wrong, it
just does not appear to be particularly right.

8. In regards to:

+ * NB: Do we need to change the child EC members to be marked
+ * as non-child somehow?
+ */
+ childrel->reloptkind = RELOPT_BASEREL;

I know we talked a bit about this before, but this time I put together
a crude patch that runs some code each time we skip an em_is_child ==
true EquivalenceMember. The code checks if any of the em_relids are
RELOPT_BASEREL. What I'm looking for here are places where we
erroneously skip the member when we shouldn't.  Running the regression
tests with this patch in place shows a number of problems.  Likely I
should only trigger the warning when bms_membership(em->em_relids) ==
BMS_SINGLETON, but it never-the-less appears to highlight various
possible issues. Applying the same on master only appears to show the
cases where em->em_relids isn't a singleton set.  I've attached the
patch to let you see what I mean.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

verify_em_child.diff (15K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: speeding up planning with partitions

Imai, Yoshikazu
In reply to this post by Amit Langote-2
Hi Amit,

On Thu, Nov 8, 2018 at 8:26 PM, Amit Langote wrote:

> On 2018/11/07 10:00, Imai, Yoshikazu wrote:
> > About inheritance_make_rel_from_joinlist(), I considered how it processes
> > joins for sub-partitioned-table.
> >
> > sub-partitioned-table image:
> > part
> >   sub1
> >     leaf1
> >     leaf2
> >
> > inheritance_make_rel_from_joinlist() translates join_list and join_info_list
> > for each leafs(leaf1, leaf2 in above image). To translate those two lists for
> > leaf1, inheritance_make_rel_from_joinlist() translates lists from part to sub1
> > and nextly from sub1 to leaf1. For leaf2, inheritance_make_rel_from_joinlist()
> > translates lists from part to sub1 and from sub1 to leaf2. There are same
> > translation from part to sub1, and I think it is better if we can eliminate it.
> > I attached 0002-delta.patch.
> >
> > In the patch, inheritance_make_rel_from_joinlist() translates lists not only for
> > leafs but for mid-nodes, in a depth-first manner, so it can use lists of
> > mid-nodes for translating lists from mid-node to leafs, which eliminates same
> > translation.
>
> I don't think the translation happens twice for the same leaf partitions.
>
> Applying adjust_appendrel_attrs_*multilevel* for only leaf nodes, as
> happens with the current code, is same as first translating using
> adjust_appendrel_attrs from part to sub1 and then from sub1 to leaf1 and
> leaf2 during recursion with sub1 as the parent.

Thanks for replying.
I interpreted your thoughts about translation as below.

adjust_appendrel_attrs_multilevel for leaf1: root -> sub1 -> leaf1
adjust_appendrel_attrs_multilevel for leaf2: sub1(produced at above) -> leaf2

But I wonder translation of leaf2 actually reuses the results of sub1 which is
produced at leaf1 translation. ISTM translation for leaf1, leaf2 are executed
as below.

adjust_appendrel_attrs_multilevel for leaf1: root -> sub1 -> leaf1
adjust_appendrel_attrs_multilevel for leaf2: root -> sub1 -> leaf2


> > I think it might be better if we can apply same logic to inheritance_planner(),
> > which once implemented the same logic as below.
> >
> >
> >     foreach(lc, root->append_rel_list)
> >     {
> >         AppendRelInfo *appinfo = lfirst_node(AppendRelInfo, lc);
> >         ...
> >         /*
> >          * expand_inherited_rtentry() always processes a parent before any of
> >          * that parent's children, so the parent_root for this relation should
> >          * already be available.
> >          */
> >         parent_root = parent_roots[appinfo->parent_relid];
> >         Assert(parent_root != NULL);
> >         parent_parse = parent_root->parse;
> >         ...
> >         subroot->parse = (Query *)
> >             adjust_appendrel_attrs(parent_root,
> >                                    (Node *) parent_parse,
> >                                    1, &appinfo);
>
> Actually, inheritance_planner is also using
> adjust_appendrel_attrs_multilevel.  I'm not sure if you're looking at the
> latest (10/26) patch.

Sorry for my poor explanation. I described the above code as old code which is
not patch applied.


Since it is difficult to explain my thoughts with words, I will show the
performance degration case.

Partition tables are below two sets.

Set1:
[create 800 partitions directly under root]
CREATE TABLE rt (a int, b int) PARTITION BY RANGE (a);
\o /dev/null
SELECT 'CREATE TABLE leaf' || x::text || ' PARTITION OF rt FOR VALUES FROM ('
|| (x)::text || ') TO (' || (x+1)::text || ');' FROM generate_series(0, 800) x;
\gexec
\o

Set2:
[create 800 partitions under a partitioned table which is under root]
CREATE TABLE rt (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE sub1 PARTITION OF rt FOR VALUES FROM (0) TO (100) PARTITION BY
RANGE (b);
\o /dev/null
SELECT 'CREATE TABLE leaf' || x::text || ' PARTITION OF sub1 FOR VALUES FROM ('
|| (x)::text || ') TO (' || (x+1)::text || ');' FROM generate_series(0, 800) x;
\gexec
\o


Create a generic plan of updation or deletion.

[create a delete generic plan]
set plan_cache_mode = 'force_generic_plan';
prepare delete_stmt(int) as delete from rt where b = $1;
execute delete_stmt(1);


In creating generic plan, paths/plans for all partitions are created because
we don't know which plan is used before "EXECUTE" command happens.
In creating paths in inheritance_planner(),
adjust_appendrel_attrs()/adjust_appendrel_attrs_multilevel() is executed many
times and it allocates a lot of memory in total if there are many partitions.

How amount of memory is used with above tests is...

without v5 patches, Set1: 242MB
without v5 patches, Set2: 247MB
with v5 patches, Set1: 420MB
with v5 patches, Set2: 820MB
# Thanks for supplying v5 patches :)

Without sub-partition(Set1), memory allocated by adjust_appendrel_attrs() with
v5 patches is 1.7x larger than without v5 patches. With sub-partition(Set2),
memory allocated by adjust_appendrel_attrs() with v5 patches is 3.3x larger
than without v5 patches.
I think why memory usage in "with v5 patches, Set2" is almost 2 times than
"with v5 patches, Set1" is that adjust_appendrel_attrs() from root to sub1 is
occurred at each translation for each partitions in "with v5 patches, Set2".

Currently, a query to a partition table is processed faster by a custom plan
than by a generic plan, so we would not use a generic plan that I don't know
whether we should solve this large memory allocation problem.

--
Yoshikazu Imai

12345 ... 11