speeding up planning with partitions

classic Classic list List threaded Threaded
201 messages Options
1234567 ... 11
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote
On Wed, Jan 9, 2019 at 11:41 PM Alvaro Herrera <[hidden email]> wrote:
> > From 3b86331dd5a2368adc39c9fef92f3dd09d817a08 Mon Sep 17 00:00:00 2001
> > From: amit <[hidden email]>
> > Date: Wed, 7 Nov 2018 16:51:31 +0900
> > Subject: [PATCH v11 4/6] Move inheritance expansion code into its own file
>
> I wonder if it would make sense to commit 0004 ahead of the rest?  To
> avoid carrying this huge one over and over ...

Maybe a good idea.  I will rearrange the patches that  way tomorrow.

Thanks,
Amit

Reply | Threaded
Open this post in threaded view
|

RE: speeding up planning with partitions

Imai, Yoshikazu
In reply to this post by Amit Langote-2
Hi Amit,

On Mon, Jan 7, 2019 at 6:30 PM, Amit Langote wrote:

> On 2018/12/27 20:25, Amit Langote wrote:
> > Here's the v10 of the patch.  I didn't get chance to do more changes
> > than those described above and address Imai-san's review comments
> yesterday.
> >
> > Have a wonderful new year!  I'll be back on January 7 to reply on this
> thread.
>
> Rebased due to changed copyright year in prepunion.c.  Also realized that
> the new files added by patch 0004 still had 2018 in them.

Thank you for new patches.


I also have some comments on 0001, set_inherit_target_rel_sizes().

In set_inherit_target_rel_sizes():

Some codes are placed not the same order as set_append_rel_size().

0001: at line 325-326,
+ ListCell   *l;
+ bool has_live_children;

In set_append_rel_size(), "has_live_children" is above of the "ListCell *l";

0001: at line 582-603
+ if (IS_DUMMY_REL(childrel))
+ continue;
+
...
+ Assert(childrel->rows > 0);
+
+ /* We have at least one live child. */
+ has_live_children = true;

In set_append_rel_size(),
+ /* We have at least one live child. */
+ has_live_children = true;
is directly under of
+ if (IS_DUMMY_REL(childrel))
+ continue;

0001: at line 606-622,
+ if (!has_live_children)
+ {
+ /*
+ * All children were excluded by constraints, so mark the relation
+ * ass dummy.  We must do this in this phase so that the rel's
+ * dummy-ness is visible when we generate paths for other rels.
+ */
+ set_dummy_rel_pathlist(rel);
+ }
+ else
+ /*
+ * Set a non-zero value here to cope with the caller's requirement
+ * that non-dummy relations are actually not empty.  We don't try to
+ * be accurate here, because we're not going to create a path that
+ * combines the children outputs.
+ */
+ rel->rows = 1;

In set_append_rel_size(), a condition of if clause is not !has_live_children but
has_live_children.

I also noticed there isn't else block while there is if block.

--
Yoshikazu Imai

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
In reply to this post by Amit Langote
On 2019/01/10 0:16, Amit Langote wrote:

> On Wed, Jan 9, 2019 at 11:41 PM Alvaro Herrera <[hidden email]> wrote:
>>> From 3b86331dd5a2368adc39c9fef92f3dd09d817a08 Mon Sep 17 00:00:00 2001
>>> From: amit <[hidden email]>
>>> Date: Wed, 7 Nov 2018 16:51:31 +0900
>>> Subject: [PATCH v11 4/6] Move inheritance expansion code into its own file
>>
>> I wonder if it would make sense to commit 0004 ahead of the rest?  To
>> avoid carrying this huge one over and over ...
>
> Maybe a good idea.  I will rearrange the patches that  way tomorrow.
Here's v12, which is more or less same as v11 but with the order of
patches switched so that the code movement patch is first.  Note that the
attached v12-0001 contains no functional changes (but there's tiny note in
the commit message mentioning the addition of a tiny function which is
just old code).

In the v13 that I will try to post tomorrow, I will have hopefully
addressed David's and Imai-san's review comments.  Thank you both!

Regards,
Amit

v12-0001-Move-inheritance-expansion-code-into-its-own-fil.patch (121K) Download Attachment
v12-0002-Overhaul-inheritance-update-delete-planning.patch (97K) Download Attachment
v12-0003-Store-inheritance-root-parent-index-in-otherrel-.patch (3K) Download Attachment
v12-0004-Lazy-creation-of-RTEs-for-inheritance-children.patch (125K) Download Attachment
v12-0005-Teach-planner-to-only-process-unpruned-partition.patch (9K) Download Attachment
v12-0006-Do-not-lock-all-partitions-at-the-beginning.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

David Rowley-3
On Thu, 10 Jan 2019 at 21:41, Amit Langote
<[hidden email]> wrote:
> In the v13 that I will try to post tomorrow, I will have hopefully
> addressed David's and Imai-san's review comments.  Thank you both!

I'd been looking at v11's 0002 and started on 0003 too. I'll include
my notes so far if you're about to send a v13.


v11 0002

18. There's a missing case in the following code. I understand that
makeNode() will 0 the member here, but that does not stop the various
other initialisations that set the default value for the field.  Below
there's a missing case where parent != NULL && parent->rtekind !=
RTE_RELATION. You might be better just always zeroing the field below
"rel->partitioned_child_rels = NIL;"

+
+ /*
+ * For inheritance child relations, we also set inh_root_parent.
+ * Note that 'parent' might itself be a child (a sub-partitioned
+ * partition), in which case we simply use its value of
+ * inh_root_parent.
+ */
+ if (parent->rtekind == RTE_RELATION)
+ rel->inh_root_parent = parent->inh_root_parent > 0 ?
+ parent->inh_root_parent :
+ parent->relid;
  }
  else
+ {
  rel->top_parent_relids = NULL;
+ rel->inh_root_parent = 0;
+ }

19. Seems strange to have a patch that adds a new field that is
unused. I also don't quite understand yet why top_parent_relids can't
be used instead. I think I recall being confused about that before, so
maybe it's worth writing a comment to mention why it cannot be used.

v11 0003

20. This code looks wrong:

+ /*
+ * expand_inherited_tables may have proved that the relation is empty, so
+ * check if it's so.
+ */
+ else if (rte->inh && !IS_DUMMY_REL(rel))


Likely you'll want:

else if rte->inh)
{
if (IS_DUMMY_REL(rel))
return;
// other stuff
}

otherwise, you'll end up in the else clause when you shouldn't be.

21. is -> was

+ * The child rel's RelOptInfo is created during
+ * expand_inherited_tables().
  */
  childrel = find_base_rel(root, childRTindex);

since you're talking about something that already happened.

I'll continue looking at v12.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Alvaro Herrera-9
In reply to this post by Amit Langote-2
On 2019-Jan-10, Amit Langote wrote:

> Here's v12, which is more or less same as v11 but with the order of
> patches switched so that the code movement patch is first.  Note that the
> attached v12-0001 contains no functional changes (but there's tiny note in
> the commit message mentioning the addition of a tiny function which is
> just old code).

Pushed 0001 with some minor tweaks, thanks.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

David Rowley-3
On Fri, 11 Jan 2019 at 06:56, Alvaro Herrera <[hidden email]> wrote:
> Pushed 0001 with some minor tweaks, thanks.

Thanks for pushing.  I had looked at 0001 last night and there wasn't
much to report, just:

v12 0001

1. I see you've moved translate_col_privs() out of prepunion.c into
appendinfo.c. This required making it an external function, but it's
only use is in inherit.c, so would it not be better to put it there
and keep it static?

2. The following two lines I think need to swap their order.

+#include "utils/rel.h"
+#include "utils/lsyscache.h"

Both are pretty minor details but thought I'd post them anyway.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
In reply to this post by Alvaro Herrera-9
On 2019/01/11 2:56, Alvaro Herrera wrote:
> On 2019-Jan-10, Amit Langote wrote:
>
>> Here's v12, which is more or less same as v11 but with the order of
>> patches switched so that the code movement patch is first.  Note that the
>> attached v12-0001 contains no functional changes (but there's tiny note in
>> the commit message mentioning the addition of a tiny function which is
>> just old code).
>
> Pushed 0001 with some minor tweaks, thanks.

Thank you for the tweaks and committing.

Regards,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
In reply to this post by David Rowley-3
Sorry, I hadn't read this email before sending my earlier "thank you for
committing" email.

On 2019/01/11 6:47, David Rowley wrote:

> On Fri, 11 Jan 2019 at 06:56, Alvaro Herrera <[hidden email]> wrote:
>> Pushed 0001 with some minor tweaks, thanks.
>
> Thanks for pushing.  I had looked at 0001 last night and there wasn't
> much to report, just:
>
> v12 0001
>
> 1. I see you've moved translate_col_privs() out of prepunion.c into
> appendinfo.c. This required making it an external function, but it's
> only use is in inherit.c, so would it not be better to put it there
> and keep it static?
Actually, I *was* a bit puzzled where to put it.  I tend to agree with you
now that it might be define it locally within inherit.c as it might not be
needed elsewhere.  Let's hear what Alvaro thinks.  I'm attaching a patch
anyway.

> 2. The following two lines I think need to swap their order.
>
> +#include "utils/rel.h"
> +#include "utils/lsyscache.h"

Oops, fixed.

> Both are pretty minor details but thought I'd post them anyway.

Thank you for reporting.

Attached find the patch.

Regards,
Amit

b60c397599-fixups.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

David Rowley-3
In reply to this post by Amit Langote-2
On Thu, 10 Jan 2019 at 21:41, Amit Langote
<[hidden email]> wrote:
> Here's v12, which is more or less same as v11 but with the order of
> patches switched so that the code movement patch is first.  Note that the
> attached v12-0001 contains no functional changes (but there's tiny note in
> the commit message mentioning the addition of a tiny function which is
> just old code).

A few more comments based on reading v12.

v12 0002:

1. Missing braces around the else clause. (Should be consistent with
the "if" above)

+ if (!has_live_children)
+ {
+ /*
+ * All children were excluded by constraints, so mark the relation
+ * ass dummy.  We must do this in this phase so that the rel's
+ * dummy-ness is visible when we generate paths for other rels.
+ */
+ set_dummy_rel_pathlist(rel);
+ }
+ else
+ /*
+ * Set a non-zero value here to cope with the caller's requirement
+ * that non-dummy relations are actually not empty.  We don't try to
+ * be accurate here, because we're not going to create a path that
+ * combines the children outputs.
+ */
+ rel->rows = 1;

v12 0004:

2. I wonder if there's a better way, instead of doing this:

+ if (child_rel1 == NULL)
+ child_rel1 = build_dummy_partition_rel(root, rel1, cnt_parts);
+ if (child_rel2 == NULL)
+ child_rel2 = build_dummy_partition_rel(root, rel2, cnt_parts);

maybe add some logic in populate_joinrel_with_paths() to allow NULL
rels to mean dummy rels.  There's a bit of a problem there as the
joinrel has already been created by that time, but perhaps
populate_joinrel_with_paths is a better place to decide if the dummy
rel needs to be built or not.

3. I wonder if there's a better way to handle what
build_dummy_partition_rel() does. I see the child's relid to the
parent's relid and makes up a fake AppendRelInfo and puts it in the
parent's slot.  What's going to happen when the parent is not the
top-level parent? It'll already have a AppendRelInfo set.

I had thought something like the following could break this, but of
course, it does not since we build the dummy rel for the pruned
sub_parent2, so we don't hit the NULL relation case when doing the
next level. i.e we only make dummies for the top-level, never dummys
of joinrels.

Does that not mean that the if (parent->top_parent_relids) will always
be false in build_dummy_partition_rel() and it'll only ever get
rtekind == RTE_RELATION?

drop table if exists parent;
create table parent (id int, a int, b text, c float) partition by range (a);
create table sub_parent1 (b text, c float, a int, id int) partition by
range (a);
create table sub_parent2 (c float, b text, id int, a int) partition by
range (a);
alter table parent attach partition sub_parent1 for values from (0) to (10);
alter table parent attach partition sub_parent2 for values from (10) to (20);

create table child11 (id int, b text, c float, a int);
create table child12 (id int, b text, c float, a int);
create table child21 (id int, b text, c float, a int);
create table child22 (id int, b text, c float, a int);

alter table sub_parent1 attach partition child11 for values from (0) to (5);
alter table sub_parent1 attach partition child12 for values from (5) to (10);
alter table sub_parent2 attach partition child21 for values from (10) to (15);
alter table sub_parent2 attach partition child22 for values from (15) to (20);

insert into parent values(0,1,'test',100.0);

select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id < 10;

4. How are dummy rels handled in grouping_planner()?

I see you have this:

- if (IS_DUMMY_REL(planned_rel))
+ if (!parent_rte->inh || IS_DUMMY_REL(planned_rel))
  {
  grouping_planner(root, false, planned_rel, 0.0);
  return;

With the above example I tried to see how it was handled by doing:

postgres=# update parent set c = c where a = 333;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

I didn't look into what's causing the crash.

5. Wondering why you removed the if (childOID != parentOID) from:

- if (childOID != parentOID && RELATION_IS_OTHER_TEMP(newrelation))
- {
- heap_close(newrelation, lockmode);
- continue;
- }

Isn't that releasing the only lock we hold on the rel defined in the query?

I tested with:

-- session 1
create temp table a1(a int);
create temp table a2(a int) inherits(a1);

-- session 2
select oid::regclass from pg_class where relname = 'a1';
     oid
--------------
 pg_temp_3.a1
(1 row)

explain select * from pg_temp_3.a1;
WARNING:  you don't own a lock of type AccessShareLock
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=4)
   One-Time Filter: false
(2 rows)

6. expand_planner_arrays() zeros a portion of the append_rel_array
even if it just palloc0'd the array. While it's not a bug, it is
repeat work.  It should be okay to move the Memset() up to the
repalloc().

7. I see get_relation_info() grew an extra parameter.  Would it now be
better just to pass rte instead of doing;

get_relation_info(root, rte->relid, rte->inh, rte->updatedCols,
  rel);

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

RE: speeding up planning with partitions

Imai, Yoshikazu
Hi David,

On Thu, Jan 10, 2019 at 4:02 PM, David Rowley wrote:
> 3. I wonder if there's a better way to handle what
> build_dummy_partition_rel() does. I see the child's relid to the
> parent's relid and makes up a fake AppendRelInfo and puts it in the
> parent's slot.  What's going to happen when the parent is not the
> top-level parent? It'll already have a AppendRelInfo set.
...
>
> select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id < 10;

I think there is a mistake in the select SQL.
"p1.id < 10" doesn't prune any partition because tables are partitioned by
column "a" in your definition. Isn't it?

> Does that not mean that the if (parent->top_parent_relids) will always
> be false in build_dummy_partition_rel() and it'll only ever get
> rtekind == RTE_RELATION?

At least, I checked if (parent->top_parent_relids) can be true if I execute
below SQL.

select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id < 15;

I couldn't check other points you mentioned, but I also think
build_dummy_partition_rel() needs more consideration because I felt it has
complicated logic when I was checking around here.


Amit,
I also realized there are some mistakes in the comments around this function.

+ * build_dummy_partition_rel
+ * Build a RelOptInfo and AppendRelInfo for a pruned partition
s/and AppendRelInfo/and an AppendRelInfo/

+ * Now we'll need a (no-op) AppendRelInfo for parent, because we're
+ * setting the dummy partition's relid to be same as the parent's.
s/a \(no-op\) AppendRelInfo/an \(no-op\) AppendRelInfo/

--
Yoshikazu Imai

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
In reply to this post by Amit Langote-2
On 2019/01/11 11:07, Amit Langote wrote:

> On 2019/01/11 6:47, David Rowley wrote:
>> On Fri, 11 Jan 2019 at 06:56, Alvaro Herrera <[hidden email]> wrote:
>>> Pushed 0001 with some minor tweaks, thanks.
>>
>> Thanks for pushing.  I had looked at 0001 last night and there wasn't
>> much to report, just:
>>
>> v12 0001
>>
>> 1. I see you've moved translate_col_privs() out of prepunion.c into
>> appendinfo.c. This required making it an external function, but it's
>> only use is in inherit.c, so would it not be better to put it there
>> and keep it static?
>
> Actually, I *was* a bit puzzled where to put it.  I tend to agree with you
> now that it might be define it locally within inherit.c as it might not be
> needed elsewhere.  Let's hear what Alvaro thinks.  I'm attaching a patch
> anyway.
>
>> 2. The following two lines I think need to swap their order.
>>
>> +#include "utils/rel.h"
>> +#include "utils/lsyscache.h"
>
> Oops, fixed.
>
>> Both are pretty minor details but thought I'd post them anyway.
>
> Thank you for reporting.
>
> Attached find the patch.
Looking around a bit more, I started thinking even build_child_join_sjinfo
doesn't belong in appendinfo.c (just to be clear, it was defined in
prepunion.c before this commit), so maybe we should move it to joinrels.c
and instead export adjust_child_relids that's required by it from
appendinfo.c.  There's already adjust_child_relids_multilevel in
appendinfo.h, so having adjust_child_relids next to it isn't too bad.  At
least not as bad as appendinfo.c exporting build_child_join_sjinfo for
joinrels.c to use.

I have updated the patch.

Thanks,
Amit

b60c397599-fixups-v2.patch (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: speeding up planning with partitions

Imai, Yoshikazu
In reply to this post by Imai, Yoshikazu
On Thu, Jan 10, 2019 at 6:10 PM, Imai, Yoshikazu wrote:
> > Does that not mean that the if (parent->top_parent_relids) will always
> > be false in build_dummy_partition_rel() and it'll only ever get
> > rtekind == RTE_RELATION?
>
> At least, I checked if (parent->top_parent_relids) can be true if I
> execute below SQL.
>
> select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id < 15;

Sorry, I also made mistake. I was executed:
select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.a < 15;

--
Yoshikazu Imai


> -----Original Message-----
> From: Imai, Yoshikazu [mailto:[hidden email]]
> Sent: Friday, January 11, 2019 3:10 PM
> To: 'David Rowley' <[hidden email]>; Amit Langote
> <[hidden email]>
> Cc: Amit Langote <[hidden email]>; Alvaro Herrera
> <[hidden email]>; Pg Hackers <[hidden email]>
> Subject: RE: speeding up planning with partitions
>
> Hi David,
>
> On Thu, Jan 10, 2019 at 4:02 PM, David Rowley wrote:
> > 3. I wonder if there's a better way to handle what
> > build_dummy_partition_rel() does. I see the child's relid to the
> > parent's relid and makes up a fake AppendRelInfo and puts it in the
> > parent's slot.  What's going to happen when the parent is not the
> > top-level parent? It'll already have a AppendRelInfo set.
> ...
> >
> > select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id
> > < 10;
>
> I think there is a mistake in the select SQL.
> "p1.id < 10" doesn't prune any partition because tables are partitioned
> by column "a" in your definition. Isn't it?
>
> > Does that not mean that the if (parent->top_parent_relids) will always
> > be false in build_dummy_partition_rel() and it'll only ever get
> > rtekind == RTE_RELATION?
>
> At least, I checked if (parent->top_parent_relids) can be true if I
> execute below SQL.
>
> select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id
> < 15;
>
> I couldn't check other points you mentioned, but I also think
> build_dummy_partition_rel() needs more consideration because I felt it
> has complicated logic when I was checking around here.
>
>
> Amit,
> I also realized there are some mistakes in the comments around this
> function.
>
> + * build_dummy_partition_rel
> + * Build a RelOptInfo and AppendRelInfo for a pruned
> partition
> s/and AppendRelInfo/and an AppendRelInfo/
>
> + * Now we'll need a (no-op) AppendRelInfo for parent, because
> we're
> + * setting the dummy partition's relid to be same as the parent's.
> s/a \(no-op\) AppendRelInfo/an \(no-op\) AppendRelInfo/
>
> --
> Yoshikazu Imai

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
In reply to this post by David Rowley-3
Thanks for reviewing, David, Imai-san.  Replying to all reviews (up to and
including David's comments earlier today) with this one email so that I
can attach the finished patch here.

On 2019/01/09 9:09, David Rowley wrote:

> On Tue, 8 Jan 2019 at 19:30, Amit Langote <[hidden email]> wrote:
>> Rebased due to changed copyright year in prepunion.c.  Also realized that
>> the new files added by patch 0004 still had 2018 in them.
>
> I've made a pass over 0001. There's probably enough for you to look at
> while I look at 0002 and the others.
>
> 0001
>
> 1. In your doc changes, just below a paragraph that you removed,
> there's a paragraph starting "Both of these behaviors are likely to be
> changed in a future release". This needs to be fixed since you've
> removed the first of the two reasons.
OK, I've fixed the sentence and moved it to the previous paragraph.

> 2. This part should be left alone.
>
> -    technique similar to partition pruning.  While it is primarily used
> -    for partitioning implemented using the legacy inheritance method, it can be
> -    used for other purposes, including with declarative partitioning.
> +    technique similar to partition pruning.  It is primarily used
> +    for partitioning implemented using the legacy inheritance method.
>
> Looking at set_inherit_target_rel_sizes(), constraint exclusion still
> is applied to partitions, it's just applied after pruning, according
> to:
OK, I've restored the sentence.

> 3. add_child_rel_equivalences().  You're replacing parent EMs with
> their child equivalent, but only when the eclass has no volatile
> functions. Is this really what you want? I think this would misbehave
> if we ever allowed: UPDATE ... SET .. ORDER BY, of which there's a
> legitimate use case of wanting to reduce the chances of deadlocks
> caused by non-deterministic UPDATE order.  Or if you think skipping
> the volatile eclasses is fine today, then I think the comment you've
> added to add_child_rel_equivalences should mention that.

To be honest, I hadn't considered this aspect before.

I think it would be OK to create a new copy of the EC even if it's a
volatile one as we're creating an entirely new one for the child's
planning.  Maybe, this will also ensure that someone who will work in the
future on implementing UPDATE SET ORDER BY, they won't have to fiddle with
this code.

> 4. Do you think it's okay that add_child_rel_equivalences() does not
> update the ec_relids when removing the member?

That's an oversight.  Fixed by making add_child_rel_equivalences do this:

+            cur_ec->ec_relids = bms_difference(cur_ec->ec_relids,
+                                               parent_rel->relids);
+            cur_ec->ec_relids = bms_add_members(cur_ec->ec_relids,
+                                                child_rel->relids);

>
> UPDATE: I see you're likely leaving this alone since you're only doing
> a shallow copy of the eclasses in
> adjust_inherited_target_child_root(). It seems like a pretty bad idea
> to do a shallow copy there.

So, you're talking about this code:

    /*
     * Child root should get its own copy of ECs, because they'll be modified
     * to replace parent EC expressions by child expressions in
     * add_child_rel_equivalences.
     */
    subroot->eq_classes = NIL;
    foreach(lc, root->eq_classes)
    {
        EquivalenceClass *ec = lfirst(lc);
        EquivalenceClass *new_ec = makeNode(EquivalenceClass);

        memcpy(new_ec, ec, sizeof(EquivalenceClass));
        new_ec->ec_members = list_copy(ec->ec_members);
        subroot->eq_classes = lappend(subroot->eq_classes, new_ec);
    }

Can you say what you think is wrong with this way of making a copy of the ECs?

>
> 5. What's CE?
>
> + /* CE failed, so finish copying/modifying join quals. */

Constraint exclusion.  It seems I needed to fix comments around here.

> 6. Typo:
>
> + * ass dummy.  We must do this in this phase so that the rel's
>
> ass -> as

Oops!  Fixed.

>
> 7. There's no accumulation going on here:
>
> + /*
> + * Accumulate size information from each live child.
> + */
> + Assert(childrel->rows > 0);

Removed the comment.

>
> 8. Any point in this? We're about to loop again anyway...
>
> + /*
> + * If child is dummy, ignore it.
> + */
> + if (IS_DUMMY_REL(childrel))
> + continue;
> + }

Removed this code.

> 9. It's a bit confusing to mention SELECT in this comment. The Assert
> ensures it's an UPDATE or DELETE.
>
> + /*
> + * For UPDATE/DELETE queries, the top parent can only ever be a table.
> + * As a contrast, it could be a UNION ALL subquery in the case of SELECT.
> + */
> + Assert(root->parse->commandType == CMD_UPDATE ||
> +    root->parse->commandType == CMD_DELETE);

I guess we don't need the 2nd sentence.  Removed.

> 10. I'd say the subroot assignment can go after the IS_DUMMY_REL
> check. Keeping that loop as tight as possible for pruned rels seems
> like a good idea.
>
> + subroot = root->inh_target_child_roots[appinfo->child_relid];
> + Assert(subroot->parse->resultRelation > 0);
> + childrel = find_base_rel(root, appinfo->child_relid);
> +
> + /* Ignore excluded/pruned children. */
> + if (IS_DUMMY_REL(childrel))
> + continue;
Agreed, done.

> 11. I don't think you should reuse the childrel variable here:
>
> + childrel->reloptkind = RELOPT_BASEREL;
> +
> + Assert(subroot->join_rel_list == NIL);
> + Assert(subroot->join_rel_hash == NULL);
> +
> + /* Perform join planning and save the resulting RelOptInfo. */
> + childrel = make_rel_from_joinlist(subroot, translated_joinlist);
> +
> + /*
> + * Remember this child target rel.  inheritance_planner will perform
> + * the remaining steps of planning for each child relation separately.
> + * Specifically, it will call grouping_planner on every
> + * RelOptInfo contained in the inh_target_child_rels list, each of
> + * which represents the source of tuples to be modified for a given
> + * target child rel.
> + */
> + root->inh_target_child_joinrels =
> + lappend(root->inh_target_child_joinrels, childrel);
OK, I've added a new variable called childjoinrel.

> 12. The following comment makes less sense now that you've modified
> the previous paragraph:
>
> + * Fortunately, the UPDATE/DELETE target can never be the nullable side of an
> + * outer join, so it's OK to generate the plan this way.
>
> This text used to refer to:
>
> but target inheritance has to be expanded at
>  * the top.  The reason is that for UPDATE, each target relation needs a
>  * different targetlist matching its own column set.  Fortunately,
>  * the UPDATE/DELETE target can never be the nullable side of an outer join,
>  * so it's OK to generate the plan this way.
>
> you no longer describe plan as being expanded from the top rather than
> at the bottom, which IMO is what "this way" refers to.
To be honest, I never quite understood what that line really means, which
is why I kept it around.  What I do know though is that, even with the
patched, we still generate subpaths for each child whose targetlist
patches the child, so I think the part about "this way" that prompted
someone to write that line still remains.  Does that make sense to you?

> 13. "tree is" -> "tree are" (references is plural)
>
> + * references in the join tree to the original target relation that's the
> + * root parent of the inheritance tree is replaced by each of its

Fixed.

> 14. Any reason to move this line from its original location?
>
>   Assert(parse->commandType != CMD_INSERT);
> + parent_rte = planner_rt_fetch(top_parentRTindex, root);
>
> Previously it was assigned just before it was needed and there's a
> fast path out after where you moved it to and where it was.

Next patch in the series needs it moved, but no reason for this patch to
move it.  Put it back where it was.

> 15. relation_excluded_by_constraints(), the switch
> (constraint_exclusion), you could consider turning that into
>
> if (constraint_exclusion == CONSTRAINT_EXCLUSION_OFF)
>     return false;
> /*
>  * When constraint_exclusion is set to 'partition' we only handle
>  * OTHER_MEMBER_RELs.
>  */
> else if (constraint_exclusion == CONSTRAINT_EXCLUSION_PARTITION &&
>          rel->reloptkind != RELOPT_OTHER_MEMBER_REL)
>     return false;
>
> When I wrote that code I was trying my best to make the complex rules
> as simple as possible by separating them out.  The rules have become
> quite simple after your change, so it probably does not warrant having
> the switch.
OK, done.

> 16. I think the following comment needs to explain how large this
> array is and what indexes it.  The current comment would have you
> think there are only enough elements to store PlannerInfos for child
> rels and leaves you guessing about what they're indexed by.
>
> + /*
> + * PlannerInfos corresponding to each inheritance child targets.
> + * Content of each PlannerInfo is same as the parent PlannerInfo, except
> + * for the parse tree which is a translated copy of the parent's parse
> + * tree.
> + */
> + struct PlannerInfo **inh_target_child_roots;
I've added those details to the comment.

>
> 17. I'm getting an assert failure in add_paths_to_append_rel() for
> Assert(parallel_workers > 0) during the partition_join tests.

I guess that was due to not using the correct root in
inherit_target_rel_pathlists.  Fixed.

On 2019/01/10 18:12, David Rowley wrote:>

> I'd been looking at v11's 0002 and started on 0003 too. I'll include
> my notes so far if you're about to send a v13.
>
>
> v11 0002
>
> 18. There's a missing case in the following code. I understand that
> makeNode() will 0 the member here, but that does not stop the various
> other initialisations that set the default value for the field.  Below
> there's a missing case where parent != NULL && parent->rtekind !=
> RTE_RELATION. You might be better just always zeroing the field below
> "rel->partitioned_child_rels = NIL;"
>
> +
> + /*
> + * For inheritance child relations, we also set inh_root_parent.
> + * Note that 'parent' might itself be a child (a sub-partitioned
> + * partition), in which case we simply use its value of
> + * inh_root_parent.
> + */
> + if (parent->rtekind == RTE_RELATION)
> + rel->inh_root_parent = parent->inh_root_parent > 0 ?
> + parent->inh_root_parent :
> + parent->relid;
>   }
>   else
> + {
>   rel->top_parent_relids = NULL;
> + rel->inh_root_parent = 0;
> + }
Okay, done.  Actually, also did the same for top_parent_relids.

> 19. Seems strange to have a patch that adds a new field that is
> unused. I also don't quite understand yet why top_parent_relids can't
> be used instead. I think I recall being confused about that before, so
> maybe it's worth writing a comment to mention why it cannot be used.

See if this updated comment makes it any clearer:

    /*
     * For inheritance children, this is the RT index of inheritance table
     * mentioned in the query from which this relation originated.
     * top_parent_relids cannot be used for this, because if the inheritance
     * root table is itself under UNION ALL, top_parent_relids contains the
     * RT index of UNION ALL parent subquery.
     */

This is its own patch, because it was thought it could be useful to
another thread which has since stalled:

https://www.postgresql.org/message-id/be766794-eb16-b798-52ec-1f786b26b61b%40lab.ntt.co.jp

> v11 0003
>
> 20. This code looks wrong:
>
> + /*
> + * expand_inherited_tables may have proved that the relation is empty, so
> + * check if it's so.
> + */
> + else if (rte->inh && !IS_DUMMY_REL(rel))
>
>
> Likely you'll want:
>
> else if rte->inh)
> {
> if (IS_DUMMY_REL(rel))
> return;
> // other stuff
> }
>
> otherwise, you'll end up in the else clause when you shouldn't be.
OK, done that way.

> 21. is -> was
>
> + * The child rel's RelOptInfo is created during
> + * expand_inherited_tables().
>   */
>   childrel = find_base_rel(root, childRTindex);
>
> since you're talking about something that already happened.

OK, fixed.

On 2019/01/11 13:01, David Rowley wrote:>

> A few more comments based on reading v12.
>
> v12 0002:
>
> 1. Missing braces around the else clause. (Should be consistent with
> the "if" above)
>
> + if (!has_live_children)
> + {
> + /*
> + * All children were excluded by constraints, so mark the relation
> + * ass dummy.  We must do this in this phase so that the rel's
> + * dummy-ness is visible when we generate paths for other rels.
> + */
> + set_dummy_rel_pathlist(rel);
> + }
> + else
> + /*
> + * Set a non-zero value here to cope with the caller's requirement
> + * that non-dummy relations are actually not empty.  We don't try to
> + * be accurate here, because we're not going to create a path that
> + * combines the children outputs.
> + */
> + rel->rows = 1;
Agreed, done.

> v12 0004:
>
> 2. I wonder if there's a better way, instead of doing this:
>
> + if (child_rel1 == NULL)
> + child_rel1 = build_dummy_partition_rel(root, rel1, cnt_parts);
> + if (child_rel2 == NULL)
> + child_rel2 = build_dummy_partition_rel(root, rel2, cnt_parts);
>
> maybe add some logic in populate_joinrel_with_paths() to allow NULL
> rels to mean dummy rels.  There's a bit of a problem there as the
> joinrel has already been created by that time, but perhaps
> populate_joinrel_with_paths is a better place to decide if the dummy
> rel needs to be built or not.
Hmm, I'd thought about that, but concluded that I shouldn't mix that work
with this refactoring project.  We can try to hack the join planning code
later, but until then build_dummy_partition_rel can keep things working.
Once we have a working solution that works without having to create this
dummy RelOptInfo, we can remove build_dummy_partition_rel.

> 3. I wonder if there's a better way to handle what
> build_dummy_partition_rel() does. I see the child's relid to the
> parent's relid and makes up a fake AppendRelInfo and puts it in the
> parent's slot.  What's going to happen when the parent is not the
> top-level parent? It'll already have a AppendRelInfo set.

Yeah, the parent's AppendRelInfo would already be present and it won't be
replaced:

    if (root->append_rel_array[parent->relid] == NULL)
    {
        AppendRelInfo *appinfo = make_append_rel_info(parent, parentrte,
                                                      parent->tupdesc,
                                                      parentrte->relid,
                                                      parent->reltype,
                                                      parent->relid);

        root->append_rel_array[parent->relid] = appinfo;
    }

Now you'll say why the discrepancy? For the top-level parent's dummy
children, appinfo generated is actually no-op, because it's generated
using the above code.  But for an intermediate parent's dummy parent's
there already exists a "real" appinfo.  It doesn't make much difference as
far as I can tell, because the appinfo is not used for anything significant.

> I had thought something like the following could break this, but of
> course, it does not since we build the dummy rel for the pruned
> sub_parent2, so we don't hit the NULL relation case when doing the
> next level. i.e we only make dummies for the top-level, never dummys
> of joinrels.
>
> Does that not mean that the if (parent->top_parent_relids) will always
> be false in build_dummy_partition_rel() and it'll only ever get
> rtekind == RTE_RELATION?

Well, parentrte->rtekind should always be RTE_RELATION in
build_dummy_partition_rel, because partitionwise join is considered only
for partitioned tables and joinrels resulting from joining partitioned tables.

parent->top_parent_relids might be non-NULL if it's an intermediate parent.

> 4. How are dummy rels handled in grouping_planner()?
>
> I see you have this:
>
> - if (IS_DUMMY_REL(planned_rel))
> + if (!parent_rte->inh || IS_DUMMY_REL(planned_rel))
>   {
>   grouping_planner(root, false, planned_rel, 0.0);
>   return;
>
> With the above example I tried to see how it was handled by doing:
>
> postgres=# update parent set c = c where a = 333;
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
>
> I didn't look into what's causing the crash.
I tried your example, but it didn't crash for me:

explain update parent set c = c where a = 333;
                     QUERY PLAN
────────────────────────────────────────────────────
 Update on parent  (cost=0.00..0.00 rows=0 width=0)
   ->  Result  (cost=0.00..0.00 rows=0 width=54)
         One-Time Filter: false
(3 rows)


> 5. Wondering why you removed the if (childOID != parentOID) from:
>
> - if (childOID != parentOID && RELATION_IS_OTHER_TEMP(newrelation))
> - {
> - heap_close(newrelation, lockmode);
> - continue;
> - }
>
> Isn't that releasing the only lock we hold on the rel defined in the query?

I think I mistakenly removed it.  Added it back.

> 6. expand_planner_arrays() zeros a portion of the append_rel_array
> even if it just palloc0'd the array. While it's not a bug, it is
> repeat work.  It should be okay to move the Memset() up to the
> repalloc().

Done.  Also moved other MemSets to their respective repallocs.

> 7. I see get_relation_info() grew an extra parameter.  Would it now be
> better just to pass rte instead of doing;
>
> get_relation_info(root, rte->relid, rte->inh, rte->updatedCols,
>   rel);
>

OK, done.

On 2019/01/10 15:37, Imai, Yoshikazu wrote:>

> I also have some comments on 0001, set_inherit_target_rel_sizes().
>
> In set_inherit_target_rel_sizes():
>
> Some codes are placed not the same order as set_append_rel_size().
>
> 0001: at line 325-326,
> + ListCell   *l;
> + bool has_live_children;
>
> In set_append_rel_size(), "has_live_children" is above of the "ListCell *l";
OK, moved to look like set_append_rel_size.

> 0001: at line 582-603
> + if (IS_DUMMY_REL(childrel))
> + continue;
> +
> ...
> + Assert(childrel->rows > 0);
> +
> + /* We have at least one live child. */
> + has_live_children = true;
>
> In set_append_rel_size(),
> + /* We have at least one live child. */
> + has_live_children = true;
> is directly under of
> + if (IS_DUMMY_REL(childrel))
> + continue;
OK, done.

> 0001: at line 606-622,
> + if (!has_live_children)
> + {
> + /*
> + * All children were excluded by constraints, so mark the relation
> + * ass dummy.  We must do this in this phase so that the rel's
> + * dummy-ness is visible when we generate paths for other rels.
> + */
> + set_dummy_rel_pathlist(rel);
> + }
> + else
> + /*
> + * Set a non-zero value here to cope with the caller's requirement
> + * that non-dummy relations are actually not empty.  We don't try to
> + * be accurate here, because we're not going to create a path that
> + * combines the children outputs.
> + */
> + rel->rows = 1;
>
> In set_append_rel_size(), a condition of if clause is not
> !has_live_children but
> has_live_children.
>
> I also noticed there isn't else block while there is if block.
Things that set_inherit_target_rel_sizes and set_append_rel_size need to
do, respectively, are different, so the code looks different.  Anyway,
I've switched the above if else blocks' code so that it looks like this:

if (has_live_children)
    rel->rows = 1;
else
    set_dummy_rel_pathlist(rel);

Here are the updated patches.  I've also attached the patch I posted
earlier today here as 0001-Some-fixups-for-b60c397599-v2.patch.

Thanks you again.

Regards,
Amit

0001-Some-fixups-for-b60c397599-v2.patch (15K) Download Attachment
v13-0002-Overhaul-inheritance-update-delete-planning.patch (99K) Download Attachment
v13-0003-Store-inheritance-root-parent-index-in-otherrel-.patch (4K) Download Attachment
v13-0004-Lazy-creation-of-RTEs-for-inheritance-children.patch (128K) Download Attachment
v13-0005-Teach-planner-to-only-process-unpruned-partition.patch (9K) Download Attachment
v13-0006-Do-not-lock-all-partitions-at-the-beginning.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

David Rowley-3
On Sat, 12 Jan 2019 at 02:00, Amit Langote
<[hidden email]> wrote:

> That's an oversight.  Fixed by making add_child_rel_equivalences do this:
>
> +            cur_ec->ec_relids = bms_difference(cur_ec->ec_relids,
> +                                               parent_rel->relids);
> +            cur_ec->ec_relids = bms_add_members(cur_ec->ec_relids,
> +                                                child_rel->relids);
>
> >
> > UPDATE: I see you're likely leaving this alone since you're only doing
> > a shallow copy of the eclasses in
> > adjust_inherited_target_child_root(). It seems like a pretty bad idea
> > to do a shallow copy there.
>
> So, you're talking about this code:
>
>     /*
>      * Child root should get its own copy of ECs, because they'll be modified
>      * to replace parent EC expressions by child expressions in
>      * add_child_rel_equivalences.
>      */
>     subroot->eq_classes = NIL;
>     foreach(lc, root->eq_classes)
>     {
>         EquivalenceClass *ec = lfirst(lc);
>         EquivalenceClass *new_ec = makeNode(EquivalenceClass);
>
>         memcpy(new_ec, ec, sizeof(EquivalenceClass));
>         new_ec->ec_members = list_copy(ec->ec_members);
>         subroot->eq_classes = lappend(subroot->eq_classes, new_ec);
>     }
>
> Can you say what you think is wrong with this way of making a copy of the ECs?

If you shallow copy with memcpy(new_ec, ec,
sizeof(EquivalenceClass));, then fields such as ec_relids will just
point to the same memory as the parent PlannerInfo's
EquivalenceClasses, so when you do your adjustment (as above) on the
child eclasses, you'll modify memory belonging to the parent. I'd
assume you'd not want to do that since you need to keep the parent
intact at least to make copies for other child rels.  You'd have
gotten away with it before since you performed a list_copy() and only
were deleting the parent's EMs with list_delete_cell() which was
modifying the copy, not the original list. Since you were missing the
alteration to ec_relids, then you might not have seen issues with the
shallow copy, but now that you are changing that field, are you not
modifying the ec_relids field that still set in the parent's
PlannerInfo?  In this instance, you've sidestepped that issue by using
bms_difference() which creates a copy of the Bitmapset and modifies
that. I think you'd probably see issues if you tried to use
bms_del_members().  I think not doing the deep copy is going to give
other people making changes in this are a hard time in the future.

> > 12. The following comment makes less sense now that you've modified
> > the previous paragraph:
> >
> > + * Fortunately, the UPDATE/DELETE target can never be the nullable side of an
> > + * outer join, so it's OK to generate the plan this way.
> >
> > This text used to refer to:
> >
> > but target inheritance has to be expanded at
> >  * the top.  The reason is that for UPDATE, each target relation needs a
> >  * different targetlist matching its own column set.  Fortunately,
> >  * the UPDATE/DELETE target can never be the nullable side of an outer join,
> >  * so it's OK to generate the plan this way.
> >
> > you no longer describe plan as being expanded from the top rather than
> > at the bottom, which IMO is what "this way" refers to.
>
> To be honest, I never quite understood what that line really means, which
> is why I kept it around.  What I do know though is that, even with the
> patched, we still generate subpaths for each child whose targetlist
> patches the child, so I think the part about "this way" that prompted
> someone to write that line still remains.  Does that make sense to you?

Not quite.  I thought that "OK to generate the plan this way" is
talking about expanding the children at the top of the plan, e.g.

explain (costs off) update listp set b = b + 1 from t where listp.a = t.a;
              QUERY PLAN
--------------------------------------
 Update on listp
   Update on listp1
   Update on listp2
   ->  Merge Join
         Merge Cond: (listp1.a = t.a)
         ->  Sort
               Sort Key: listp1.a
               ->  Seq Scan on listp1
         ->  Sort
               Sort Key: t.a
               ->  Seq Scan on t
   ->  Merge Join
         Merge Cond: (listp2.a = t.a)
         ->  Sort
               Sort Key: listp2.a
               ->  Seq Scan on listp2
         ->  Sort
               Sort Key: t.a
               ->  Seq Scan on t

i.e each non-pruned partition gets joined to the other tables
(expanded from the top of the plan tree). The other tables appear once
for each child target relation.

instead of:

postgres=# explain (costs off) select * from listp inner join t on
listp.a = t.a;
              QUERY PLAN
--------------------------------------
 Merge Join
   Merge Cond: (t.a = listp1.a)
   ->  Sort
         Sort Key: t.a
         ->  Seq Scan on t
   ->  Sort
         Sort Key: listp1.a
         ->  Append
               ->  Seq Scan on listp1
               ->  Seq Scan on listp2

where the children are expanded from the bottom (or at the leaf side
of the plan tree), since we always plant our trees up-side-down, in
computer science.

In my view, since you removed the part about where the children are
expanded then it does not quite make sense anymore.

Of course, I might not be reading the comment as it was intended.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

David Rowley-3
On Sat, 12 Jan 2019 at 21:49, David Rowley <[hidden email]> wrote:

> > Can you say what you think is wrong with this way of making a copy of the ECs?
>
> If you shallow copy with memcpy(new_ec, ec,
> sizeof(EquivalenceClass));, then fields such as ec_relids will just
> point to the same memory as the parent PlannerInfo's
> EquivalenceClasses, so when you do your adjustment (as above) on the
> child eclasses, you'll modify memory belonging to the parent. I'd
> assume you'd not want to do that since you need to keep the parent
> intact at least to make copies for other child rels.  You'd have
> gotten away with it before since you performed a list_copy() and only
> were deleting the parent's EMs with list_delete_cell() which was
> modifying the copy, not the original list. Since you were missing the
> alteration to ec_relids, then you might not have seen issues with the
> shallow copy, but now that you are changing that field, are you not
> modifying the ec_relids field that still set in the parent's
> PlannerInfo?  In this instance, you've sidestepped that issue by using
> bms_difference() which creates a copy of the Bitmapset and modifies
> that. I think you'd probably see issues if you tried to use
> bms_del_members().  I think not doing the deep copy is going to give
> other people making changes in this are a hard time in the future.

Setting to waiting on author pending clarification about shallow vs
deep copying of ECs.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
In reply to this post by David Rowley-3
Thanks for the comments.

On 2019/01/12 17:49, David Rowley wrote:

> On Sat, 12 Jan 2019 at 02:00, Amit Langote
> <[hidden email]> wrote:
>> That's an oversight.  Fixed by making add_child_rel_equivalences do this:
>>
>> +            cur_ec->ec_relids = bms_difference(cur_ec->ec_relids,
>> +                                               parent_rel->relids);
>> +            cur_ec->ec_relids = bms_add_members(cur_ec->ec_relids,
>> +                                                child_rel->relids);
>>
>>>
>>> UPDATE: I see you're likely leaving this alone since you're only doing
>>> a shallow copy of the eclasses in
>>> adjust_inherited_target_child_root(). It seems like a pretty bad idea
>>> to do a shallow copy there.
>>
>> So, you're talking about this code:
>>
>>     /*
>>      * Child root should get its own copy of ECs, because they'll be modified
>>      * to replace parent EC expressions by child expressions in
>>      * add_child_rel_equivalences.
>>      */
>>     subroot->eq_classes = NIL;
>>     foreach(lc, root->eq_classes)
>>     {
>>         EquivalenceClass *ec = lfirst(lc);
>>         EquivalenceClass *new_ec = makeNode(EquivalenceClass);
>>
>>         memcpy(new_ec, ec, sizeof(EquivalenceClass));
>>         new_ec->ec_members = list_copy(ec->ec_members);
>>         subroot->eq_classes = lappend(subroot->eq_classes, new_ec);
>>     }
>>
>> Can you say what you think is wrong with this way of making a copy of the ECs?
>
> If you shallow copy with memcpy(new_ec, ec,
> sizeof(EquivalenceClass));, then fields such as ec_relids will just
> point to the same memory as the parent PlannerInfo's
> EquivalenceClasses, so when you do your adjustment (as above) on the
> child eclasses, you'll modify memory belonging to the parent. I'd
> assume you'd not want to do that since you need to keep the parent
> intact at least to make copies for other child rels.  You'd have
> gotten away with it before since you performed a list_copy() and only
> were deleting the parent's EMs with list_delete_cell() which was
> modifying the copy, not the original list. Since you were missing the
> alteration to ec_relids, then you might not have seen issues with the
> shallow copy, but now that you are changing that field, are you not
> modifying the ec_relids field that still set in the parent's
> PlannerInfo?
Yep.  add_eq_member, when adding the child member to the child's copy of
EC, will end up modifying ec_relids that it shares with the parent's copy,
because it uses bms_add_member which modifies the input bitmapset in-place.

> In this instance, you've sidestepped that issue by using
> bms_difference() which creates a copy of the Bitmapset and modifies
> that. I think you'd probably see issues if you tried to use
> bms_del_members().  I think not doing the deep copy is going to give
> other people making changes in this are a hard time in the future.

I thought of inventing a _copyEquivalenceClass but noticed this in
_copyPathKey:

    /* EquivalenceClasses are never moved, so just shallow-copy the pointer */
    COPY_SCALAR_FIELD(pk_eclass);

I think that won't be a problem in our case, because this comment seems to
be talking about identical copies of a given EC.  In our case, we're
talking about making copies that are essentially different because of
parent-to-child translation.  However, providing a function in copyfuncs.c
to copy ECs would make it sound like making identical copies of ECs is OK
(which it apparently isn't), so I added the copying code in the place
where the patch wants to use it.  The new code takes care to make copies
of all the fields that might change, not just ec_members.

Also, realizing another problem with where the copying code is placed now
in the patch (adjust_inherited_target_child_root), I have moved it to
adjust_inherit_target_rel_sizes so that translation from parent-to-child
of EC members works correctly.  Concretely, the problem is that a level 2
partition's EC members won't be created with the patch's current way of
initializing eq_classes in the partition subroots -- all initially get the
copy of root parent's EC members, whereas translation can only occur
between a partition and its immediate parent.  Without EC members, a level
2 leaf partition will not be able to merge join.  So, for example:

explain update p set a = foo.a+1 from foo where p.a = foo.a;
                                QUERY PLAN
──────────────────────────────────────────────────────────────────────────
 Update on p  (cost=0.00..98556.15 rows=6535012 width=16)
   Update on p11
   Update on p2
   ->  Nested Loop  (cost=0.00..97614.88 rows=6502500 width=16)
         ->  Seq Scan on p11  (cost=0.00..35.50 rows=2550 width=10)
         ->  Materialize  (cost=0.00..48.25 rows=2550 width=10)
               ->  Seq Scan on foo  (cost=0.00..35.50 rows=2550 width=10)
   ->  Merge Join  (cost=359.57..941.28 rows=32512 width=16)
         Merge Cond: (p2.a = foo.a)
         ->  Sort  (cost=179.78..186.16 rows=2550 width=10)
               Sort Key: p2.a
               ->  Seq Scan on p2  (cost=0.00..35.50 rows=2550 width=10)
         ->  Sort  (cost=179.78..186.16 rows=2550 width=10)
               Sort Key: foo.a
               ->  Seq Scan on foo  (cost=0.00..35.50 rows=2550 width=10)

vs.

explain update p set a = foo.a+1 from foo where p.a = foo.a;
                                QUERY PLAN
──────────────────────────────────────────────────────────────────────────
 Update on p  (cost=359.57..1882.55 rows=65024 width=16)
   Update on p11
   Update on p2
   ->  Merge Join  (cost=359.57..941.28 rows=32512 width=16)
         Merge Cond: (p11.a = foo.a)
         ->  Sort  (cost=179.78..186.16 rows=2550 width=10)
               Sort Key: p11.a
               ->  Seq Scan on p11  (cost=0.00..35.50 rows=2550 width=10)
         ->  Sort  (cost=179.78..186.16 rows=2550 width=10)
               Sort Key: foo.a
               ->  Seq Scan on foo  (cost=0.00..35.50 rows=2550 width=10)
   ->  Merge Join  (cost=359.57..941.28 rows=32512 width=16)
         Merge Cond: (p2.a = foo.a)
         ->  Sort  (cost=179.78..186.16 rows=2550 width=10)
               Sort Key: p2.a
               ->  Seq Scan on p2  (cost=0.00..35.50 rows=2550 width=10)
         ->  Sort  (cost=179.78..186.16 rows=2550 width=10)
               Sort Key: foo.a
               ->  Seq Scan on foo  (cost=0.00..35.50 rows=2550 width=10)
(19 rows)

after fixing.  Note that p11 is a partition of p1 which is partition of p.
 p2 is partition of p.

>>> 12. The following comment makes less sense now that you've modified
>>> the previous paragraph:
>>>
>>> + * Fortunately, the UPDATE/DELETE target can never be the nullable side of an
>>> + * outer join, so it's OK to generate the plan this way.
>>>
>>> This text used to refer to:
>>>
>>> but target inheritance has to be expanded at
>>>  * the top.  The reason is that for UPDATE, each target relation needs a
>>>  * different targetlist matching its own column set.  Fortunately,
>>>  * the UPDATE/DELETE target can never be the nullable side of an outer join,
>>>  * so it's OK to generate the plan this way.
>>>
>>> you no longer describe plan as being expanded from the top rather than
>>> at the bottom, which IMO is what "this way" refers to.
>>
>> To be honest, I never quite understood what that line really means, which
>> is why I kept it around.  What I do know though is that, even with the
>> patched, we still generate subpaths for each child whose targetlist
>> patches the child, so I think the part about "this way" that prompted
>> someone to write that line still remains.  Does that make sense to you?
>
> Not quite.  I thought that "OK to generate the plan this way" is
> talking about expanding the children at the top of the plan, e.g.
>
> explain (costs off) update listp set b = b + 1 from t where listp.a = t.a;
>               QUERY PLAN
> --------------------------------------
>  Update on listp
>    Update on listp1
>    Update on listp2
>    ->  Merge Join
>          Merge Cond: (listp1.a = t.a)
>          ->  Sort
>                Sort Key: listp1.a
>                ->  Seq Scan on listp1
>          ->  Sort
>                Sort Key: t.a
>                ->  Seq Scan on t
>    ->  Merge Join
>          Merge Cond: (listp2.a = t.a)
>          ->  Sort
>                Sort Key: listp2.a
>                ->  Seq Scan on listp2
>          ->  Sort
>                Sort Key: t.a
>                ->  Seq Scan on t
>
> i.e each non-pruned partition gets joined to the other tables
> (expanded from the top of the plan tree). The other tables appear once
> for each child target relation.
>
> instead of:
>
> postgres=# explain (costs off) select * from listp inner join t on
> listp.a = t.a;
>               QUERY PLAN
> --------------------------------------
>  Merge Join
>    Merge Cond: (t.a = listp1.a)
>    ->  Sort
>          Sort Key: t.a
>          ->  Seq Scan on t
>    ->  Sort
>          Sort Key: listp1.a
>          ->  Append
>                ->  Seq Scan on listp1
>                ->  Seq Scan on listp2
>
> where the children are expanded from the bottom (or at the leaf side
> of the plan tree), since we always plant our trees up-side-down, in
> computer science.
>
> In my view, since you removed the part about where the children are
> expanded then it does not quite make sense anymore.
Thanks for explaining.  I think my previous reply was confusing -- I
wanted to say that we *do* still expand the children at the top, in the
sense that the resulting plan shape hasn't changed due to the patch.   I
suspect that it's the resulting plan shape that "generating plan this way"
refers to, which hasn't changed with the patch.  It's true that the patch
optimizes away repeated query_planner calls but only because it makes
other arrangements to produce the same output as before.

Attached updated patches with mainly fixes around EC copying as described
above and other cosmetic fixes like comment updates.  I've also moved
around the code a bit, putting the new functions
add_inherit_target_child_roots, etc. in inherit.c instead of planmain.c.

Thanks,
Amit

0001-Some-fixups-for-b60c397599-v2.patch (15K) Download Attachment
v14-0002-Overhaul-inheritance-update-delete-planning.patch (102K) Download Attachment
v14-0003-Store-inheritance-root-parent-index-in-otherrel-.patch (4K) Download Attachment
v14-0004-Lazy-creation-of-RTEs-for-inheritance-children.patch (130K) Download Attachment
v14-0005-Teach-planner-to-only-process-unpruned-partition.patch (9K) Download Attachment
v14-0006-Do-not-lock-all-partitions-at-the-beginning.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Alvaro Herrera-9
In reply to this post by Amit Langote-2
On 2019-Jan-11, Amit Langote wrote:

> Looking around a bit more, I started thinking even build_child_join_sjinfo
> doesn't belong in appendinfo.c (just to be clear, it was defined in
> prepunion.c before this commit), so maybe we should move it to joinrels.c
> and instead export adjust_child_relids that's required by it from
> appendinfo.c.  There's already adjust_child_relids_multilevel in
> appendinfo.h, so having adjust_child_relids next to it isn't too bad.  At
> least not as bad as appendinfo.c exporting build_child_join_sjinfo for
> joinrels.c to use.

OK, pushed, thanks.  I may have caused merge conflicts with the rest of
the series, because I reordered some functions -- sorry about that.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
On 2019/01/17 4:32, Alvaro Herrera wrote:

> On 2019-Jan-11, Amit Langote wrote:
>
>> Looking around a bit more, I started thinking even build_child_join_sjinfo
>> doesn't belong in appendinfo.c (just to be clear, it was defined in
>> prepunion.c before this commit), so maybe we should move it to joinrels.c
>> and instead export adjust_child_relids that's required by it from
>> appendinfo.c.  There's already adjust_child_relids_multilevel in
>> appendinfo.h, so having adjust_child_relids next to it isn't too bad.  At
>> least not as bad as appendinfo.c exporting build_child_join_sjinfo for
>> joinrels.c to use.
>
> OK, pushed, thanks.
Thank you.

> I may have caused merge conflicts with the rest of
> the series, because I reordered some functions -- sorry about that.

No problem.

I have rebased the patches.  Other than rebasing, I've squashed the patch
to add inh_root_parent field to RelOptInfo which contained no other code
changes to use that field into the patch that contains changes to start
using it.  Updated the commit message of lazy-partition-initialization
patch to be accurate (it contained old details that have changed since I
first wrote that commit message).

Thanks,
Amit

v15-0001-Overhaul-inheritance-update-delete-planning.patch (102K) Download Attachment
v15-0002-Lazy-creation-of-RTEs-for-inheritance-children.patch (133K) Download Attachment
v15-0003-Teach-planner-to-only-process-unpruned-partition.patch (9K) Download Attachment
v15-0004-Do-not-lock-all-partitions-at-the-beginning.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: speeding up planning with partitions

Amit Langote-2
In reply to this post by Imai, Yoshikazu
Thank you Imai-san for testing.  Sorry it totally slipped my mind to reply
to this email.

On 2019/01/09 11:08, Imai, Yoshikazu wrote:

> I wonder why force_custom_plan is faster than auto after applied the patch.
>
> When we use PREPARE-EXECUTE, a generic plan is created and used if its cost is
> cheaper than creating and using a custom plan with plan_cache_mode='auto',
> while a custom plan is always created and used with plan_cache_mode='force_custom_plan'.
> So one can think the difference in above results is because of creating or
> using a generic plan.
>
> I checked how many times a generic plan is created during executing pgbench and
> found a generic plan is created only once and custom plans are created at other
> times with plan_cache_mode='auto'. I also checked the time of creating a
> generic plan, but it didn't take so much(250ms or so with 4096 partitions). So
> the time of creating a generic plan does not affect the performance.
>
> Currently, a generic plan is created at sixth time of executing EXECUTE query.
> I changed it to more later (ex. at 400,000th time of executing EXECUTE query on
> master with 4096 partitions, because 7000TPS x 60sec=420,0000 transactions are
> run while executing pgbench.), then there are almost no difference between auto
> and force_custom_plan. I think that creation of a generic plan affects the time
> of executing queries which are ordered after creating generic plan.
>
> If my assumption is right, we can expect some additional process is occurred at
> executing queries ordered after creating a generic plan, which results in auto is
> slower than force_custom_plan because of additional process. But looking at
> above results, on master with 4096 partitions, auto is faster than force_custom_plan.
> So now I am confused.
>
> Do you have any ideas what does affect the performance?

Are you saying that, when using auto mode, all executions of the query
starting from 7th are slower than the first 5 executions?  That is, the
latency of creating and using a custom plan increases *after* a generic
plan is created and discarded on the 6th execution of the query?  If so,
that is inexplicable to me.

Thanks,
Amit


Reply | Threaded
Open this post in threaded view
|

RE: speeding up planning with partitions

Tsunakawa, Takayuki
Imai-san,

From: Amit Langote [mailto:[hidden email]]

> On 2019/01/09 11:08, Imai, Yoshikazu wrote:
> > I wonder why force_custom_plan is faster than auto after applied the patch.
> >
> > When we use PREPARE-EXECUTE, a generic plan is created and used if its
> cost is
> > cheaper than creating and using a custom plan with plan_cache_mode='auto',
> > while a custom plan is always created and used with
> plan_cache_mode='force_custom_plan'.
> > So one can think the difference in above results is because of creating
> or
> > using a generic plan.
> >
> > I checked how many times a generic plan is created during executing pgbench
> and
> > found a generic plan is created only once and custom plans are created
> at other
> > times with plan_cache_mode='auto'. I also checked the time of creating
> a
> > generic plan, but it didn't take so much(250ms or so with 4096 partitions).
> So
> > the time of creating a generic plan does not affect the performance.
> >
> > Currently, a generic plan is created at sixth time of executing EXECUTE
> query.
> > I changed it to more later (ex. at 400,000th time of executing EXECUTE
> query on
> > master with 4096 partitions, because 7000TPS x 60sec=420,0000
> transactions are
> > run while executing pgbench.), then there are almost no difference between
> auto
> > and force_custom_plan. I think that creation of a generic plan affects
> the time
> > of executing queries which are ordered after creating generic plan.
> >
> > If my assumption is right, we can expect some additional process is
> occurred at
> > executing queries ordered after creating a generic plan, which results
> in auto is
> > slower than force_custom_plan because of additional process. But looking
> at
> > above results, on master with 4096 partitions, auto is faster than
> force_custom_plan.
> > So now I am confused.
> >
> > Do you have any ideas what does affect the performance?
>
> Are you saying that, when using auto mode, all executions of the query
> starting from 7th are slower than the first 5 executions?  That is, the
> latency of creating and using a custom plan increases *after* a generic
> plan is created and discarded on the 6th execution of the query?  If so,
> that is inexplicable to me.

Isn't CheckCachedPlan() (and AcquireExecutorLocks() therein) called in every EXECUTE after 6th one due to some unknow issue?
Does choose_custom_plan() always return true after 6th EXECUTE?

Regards
Takayuki Tsunakawa


1234567 ... 11