ON CONFLICT DO UPDATE for partitioned tables

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
53 messages Options
123
Reply | Threaded
Open this post in threaded view
|

ON CONFLICT DO UPDATE for partitioned tables

Alvaro Herrera-9
I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1].
Following the lead of edd44738bc88 ("Be lazier about partition tuple
routing.") this incarnation only does the necessary push-ups for the
specific partition that needs it, at execution time.  As far as I can
tell, it works as intended.

I chose to refuse the case where the DO UPDATE clause causes the tuple
to move to another partition (i.e. you're updating the partition key of
the tuple).  While it's probably possible to implement that, it doesn't
seem a very productive use of time.

However, there is a shortcoming in the design: it fails if there are
multiple levels of partitioning, because there is no easy (efficient)
way to map the index OID more than one level.  I had already mentioned
this shortcoming to Amit's patch.  So this case (which I added in the
regression tests) fails unexpectedly:

-- multiple-layered partitioning
create table parted_conflict_test (a int primary key, b text) partition by range (a);
create table parted_conflict_test_1 partition of parted_conflict_test
  for values from (0) to (10000) partition by range (a);
create table parted_conflict_test_1_1 partition of parted_conflict_test_1
  for values from (0) to (100);
insert into parted_conflict_test values ('10', 'ten');
insert into parted_conflict_test values ('10', 'ten two')
  on conflict (a) do update set b = excluded.b;
ERROR:  invalid ON CONFLICT DO UPDATE specification
DETAIL:  An inferred index was not found in partition "parted_conflict_test_1_1".

So the problem here is that my MapPartitionIndexList() implementation is
too stupid.  I think it would be smarter to use the ResultRelInfo
instead of bare Relation, for one.  But that still doesn't answer how to
find a "path" from root to leaf partition, which is what I'd need to
verify that there are valid pg_inherits relationships for the partition
indexes.  I'm probably missing something about the partitionDesc or
maybe the partitioned_rels lists that helps me do it efficiently, but I
hope figure out soon.

One idea I was toying with is to add RelationData->rd_children as a list
of OIDs of children relations.  So it should be possible to walk the
list from the root to the desired descendant, without having to scan
pg_inherits repeatedly ... although that would probably require doing
relation_open() for every index, which sounds undesirable.

(ISTM that having RelationData->rd_children might be a good idea in
general anyway -- I mean to speed up some code that currently scans
pg_inherits via find_inheritance_children.  However, since the partition
descriptor itself is already in relcache, maybe this doesn't matter too
much.)

Another idea is to abandon the notion that we need to find a path from
parent index to descendant index, and just run the inference algorithm
again on the partition.  I'm not sure how much I like this idea, yet.

Anyway, while this is still WIP, I think it works correctly for the case
where there is a single partition level.


[1] https://postgr.es/m/c1651d5b-7bd6-b7e7-e1cc-16ecfe2c0da5@...

--
Álvaro Herrer

v1-0001-fix-ON-CONFLICT-DO-UPDATE-for-partitioned-tables.patch (19K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Amit Langote-2
On 2018/02/28 9:46, Alvaro Herrera wrote:
> I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1].
> Following the lead of edd44738bc88 ("Be lazier about partition tuple
> routing.") this incarnation only does the necessary push-ups for the
> specific partition that needs it, at execution time.  As far as I can
> tell, it works as intended.

Thanks.  I too have been meaning to send an updated (nay, significantly
rewritten) version of the patch, but you beat me to it.

> I chose to refuse the case where the DO UPDATE clause causes the tuple
> to move to another partition (i.e. you're updating the partition key of
> the tuple).  While it's probably possible to implement that, it doesn't
> seem a very productive use of time.

Probably a good idea to save it for later.

> However, there is a shortcoming in the design: it fails if there are
> multiple levels of partitioning, because there is no easy (efficient)
> way to map the index OID more than one level.  I had already mentioned
> this shortcoming to Amit's patch.  So this case (which I added in the
> regression tests) fails unexpectedly:
>
> -- multiple-layered partitioning
> create table parted_conflict_test (a int primary key, b text) partition by range (a);
> create table parted_conflict_test_1 partition of parted_conflict_test
>   for values from (0) to (10000) partition by range (a);
> create table parted_conflict_test_1_1 partition of parted_conflict_test_1
>   for values from (0) to (100);
> insert into parted_conflict_test values ('10', 'ten');
> insert into parted_conflict_test values ('10', 'ten two')
>   on conflict (a) do update set b = excluded.b;
> ERROR:  invalid ON CONFLICT DO UPDATE specification
> DETAIL:  An inferred index was not found in partition "parted_conflict_test_1_1".
>
> So the problem here is that my MapPartitionIndexList() implementation is
> too stupid.  I think it would be smarter to use the ResultRelInfo
> instead of bare Relation, for one.  But that still doesn't answer how to
> find a "path" from root to leaf partition, which is what I'd need to
> verify that there are valid pg_inherits relationships for the partition
> indexes.  I'm probably missing something about the partitionDesc or
> maybe the partitioned_rels lists that helps me do it efficiently, but I
> hope figure out soon.
>
> One idea I was toying with is to add RelationData->rd_children as a list
> of OIDs of children relations.  So it should be possible to walk the
> list from the root to the desired descendant, without having to scan
> pg_inherits repeatedly ... although that would probably require doing
> relation_open() for every index, which sounds undesirable.
>
> (ISTM that having RelationData->rd_children might be a good idea in
> general anyway -- I mean to speed up some code that currently scans
> pg_inherits via find_inheritance_children.  However, since the partition
> descriptor itself is already in relcache, maybe this doesn't matter too
> much.)
>
> Another idea is to abandon the notion that we need to find a path from
> parent index to descendant index, and just run the inference algorithm
> again on the partition.  I'm not sure how much I like this idea, yet.
>
> Anyway, while this is still WIP, I think it works correctly for the case
> where there is a single partition level.
Actually, after your comment on my original patch [1], I did make it work
for multiple levels by teaching the partition initialization code to find
a given partition's indexes that are inherited from the root table (that
is the table mentioned in command).  So, after a tuple is routed to a
partition, we switch from the original arbiterIndexes list to the one we
created for the partition, which must contain OIDs corresponding to those
in the original list.  After all, for each of the parent's indexes that
the planner put into the original arbiterIndexes list, there must exist an
index in each of the leaf partitions.

I had also observed when working on the patch that various TupleTableSlots
used by the ON CONFLICT DO UPDATE code must be based on TupleDesc of the
inheritance-translated target list (DO UPDATE SET target list).  In fact,
that has to take into account also the dropped columns; we may have
dropped columns either in parent or in a partition or in both at same or
different attnum positions.  That means, simple map_partition_varattnos()
translation doesn't help in this case.

For example, with your patch (sorry, I know you said it's a WIP patch), I
see either a crash or errors when dealing with such differing attribute
numbers:

drop table p;
create table p (a int, b text) partition by list (a);
create table p12 (b text, a int);
alter table p attach partition p12 for values in (1, 2);
alter table p drop b, add b text;
create table p4 partition of p for values in (4);
create unique index on p (a);

insert into p values (1, 'a') on conflict (a) do update set b = excluded.b;

insert into p values (1, 'b') on conflict (a) do update set b = excluded.b;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

insert into p values (4, 'a') on conflict (a) do update set b = excluded.b;

postgres=# insert into p values (4, 'b') on conflict (a) do update set b =
excluded.b;
ERROR:  attribute number 3 exceeds number of columns 2

I attach my patch here for your reference, which I polished this morning
after seeing your email and the patch.  It works for most of the cases, as
you can see in the updated tests in insert_conflict.sql.  Since I agree
with you that we should, for now, error out if DO UPDATE causes row
movement, I adopted the code from your patch for that.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/20171227225031.osh6vunpuhsath25%40alvherre.pgsql

v1-0001-Fix-ON-CONFLICT-to-work-with-partitioned-tables.patch (32K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Robert Haas
In reply to this post by Alvaro Herrera-9
On Tue, Feb 27, 2018 at 7:46 PM, Alvaro Herrera
<[hidden email]> wrote:

> I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1].
> Following the lead of edd44738bc88 ("Be lazier about partition tuple
> routing.") this incarnation only does the necessary push-ups for the
> specific partition that needs it, at execution time.  As far as I can
> tell, it works as intended.
>
> I chose to refuse the case where the DO UPDATE clause causes the tuple
> to move to another partition (i.e. you're updating the partition key of
> the tuple).  While it's probably possible to implement that, it doesn't
> seem a very productive use of time.

I would have thought that to be the only case we could support with
the current infrastructure.  Doesn't a correct implementation for any
other case require a global index?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Amit Langote-2
On 2018/03/01 1:03, Robert Haas wrote:

> On Tue, Feb 27, 2018 at 7:46 PM, Alvaro Herrera
> <[hidden email]> wrote:
>> I updated Amit Langote's patch for INSERT ON CONFLICT DO UPDATE[1].
>> Following the lead of edd44738bc88 ("Be lazier about partition tuple
>> routing.") this incarnation only does the necessary push-ups for the
>> specific partition that needs it, at execution time.  As far as I can
>> tell, it works as intended.
>>
>> I chose to refuse the case where the DO UPDATE clause causes the tuple
>> to move to another partition (i.e. you're updating the partition key of
>> the tuple).  While it's probably possible to implement that, it doesn't
>> seem a very productive use of time.
>
> I would have thought that to be the only case we could support with
> the current infrastructure.  Doesn't a correct implementation for any
> other case require a global index?

I'm thinking that Alvaro is talking here about the DO UPDATE action part,
not the conflict checking part.  The latter will definitely require global
indexes if conflict were to be checked on columns not containing the
partition key.

The case Alvaro mentions arises after checking the conflict, presumably
using an inherited unique index whose keys must include the partition
keys. If the conflict action is DO UPDATE and its SET clause changes
partition key columns such that the row will have to change the partition,
then the current patch will result in an error.  I think that's because
making update row movement work in this case will require some adjustments
to what 2f178441044 (Allow UPDATE to move rows between partitions)
implemented.  We wouldn't have things set up in the ModifyTableState that
the current row-movement code depends on being set up; for example, there
wouldn't be per-subplan ResultRelInfo's in the ON CONFLICT DO UPDATE case.
 The following Assert in ExecUpdate() will fail for instance:

            map_index = resultRelInfo - mtstate->resultRelInfo;
            Assert(map_index >= 0 && map_index < mtstate->mt_nplans);

Thanks,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Alvaro Herrera-9
In reply to this post by Amit Langote-2
Amit Langote wrote:

> Actually, after your comment on my original patch [1], I did make it work
> for multiple levels by teaching the partition initialization code to find
> a given partition's indexes that are inherited from the root table (that
> is the table mentioned in command).  So, after a tuple is routed to a
> partition, we switch from the original arbiterIndexes list to the one we
> created for the partition, which must contain OIDs corresponding to those
> in the original list.  After all, for each of the parent's indexes that
> the planner put into the original arbiterIndexes list, there must exist an
> index in each of the leaf partitions.

Oh, your solution for this seems simple enough.  Silly me, I was trying
to implement it in a quite roundabout way.  Thanks.  (I do wonder if we
should save the "root" reloid in the relcache).

> I had also observed when working on the patch that various TupleTableSlots
> used by the ON CONFLICT DO UPDATE code must be based on TupleDesc of the
> inheritance-translated target list (DO UPDATE SET target list).  In fact,
> that has to take into account also the dropped columns; we may have
> dropped columns either in parent or in a partition or in both at same or
> different attnum positions.  That means, simple map_partition_varattnos()
> translation doesn't help in this case.

Yeah, I was aware these corner cases could become a problem though I
hadn't gotten around to testing them yet.  Thanks for all your work on
this.

The usage of the few optimizer/prep/ functions that are currently static
doesn't fill me with joy.  These functions have weird APIs because
they're static so we don't rally care, but once we export them we should
strive to be more careful.  I'd rather stay away from just exporting
them all, so I chose to encapsulate those calls in a single function and
export only expand_targetlist from preptlist.c, keeping the others
static in prepunion.c.  In the attached patch set, I put an API change
(work on tupdescs rather than full-blown relations) for a couple of
those functions as 0001, then your patch as 0002, then a few fixups of
my own.  (0002 is not bit-by-bit identical to yours; I think I had to
fix some merge conflict with 0001, but should be pretty much the same).

But looking further, I think there is much cruft that has accumulated in
those functions (because they've gotten simplified over time), and we
could do some additional cleanup surgery.  For example, there is no
reason to pass a list pointer to make_inh_translation_list(); we could
just return it.  And then we don't have to cons up a fake AppendRelInfo
with all dummy values that adjust_inherited_tlist doesn't even care
about.  I think there was a point for all these contortions back at some
point (visible by checking git history of this code), but it all seems
useless now.

Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
expand_targetlist() runs *after*, not before, so how could it have
affected the result?  I'm obviously confused about what
expand_targetlist call this comment is talking about.  Anyway I wanted
to make it use resjunk entries instead, but that broke some other case
that I didn't have time to research yesterday.  I'll get back to this
soon, but in the meantime, here's what I have.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Alvaro Herrera-9
In reply to this post by Amit Langote-2
Amit Langote wrote:

> Actually, after your comment on my original patch [1], I did make it work
> for multiple levels by teaching the partition initialization code to find
> a given partition's indexes that are inherited from the root table (that
> is the table mentioned in command).  So, after a tuple is routed to a
> partition, we switch from the original arbiterIndexes list to the one we
> created for the partition, which must contain OIDs corresponding to those
> in the original list.  After all, for each of the parent's indexes that
> the planner put into the original arbiterIndexes list, there must exist an
> index in each of the leaf partitions.

Oh, your solution for this seems simple enough.  Silly me, I was trying
to implement it in a quite roundabout way.  Thanks.  (I do wonder if we
should save the "root" reloid in the relcache).

> I had also observed when working on the patch that various TupleTableSlots
> used by the ON CONFLICT DO UPDATE code must be based on TupleDesc of the
> inheritance-translated target list (DO UPDATE SET target list).  In fact,
> that has to take into account also the dropped columns; we may have
> dropped columns either in parent or in a partition or in both at same or
> different attnum positions.  That means, simple map_partition_varattnos()
> translation doesn't help in this case.

Yeah, I was aware these corner cases could become a problem though I
hadn't gotten around to testing them yet.  Thanks for all your work on
this.

The usage of the few optimizer/prep/ functions that are currently static
doesn't fill me with joy.  These functions have weird APIs because
they're static so we don't rally care, but once we export them we should
strive to be more careful.  I'd rather stay away from just exporting
them all, so I chose to encapsulate those calls in a single function and
export only expand_targetlist from preptlist.c, keeping the others
static in prepunion.c.  In the attached patch set, I put an API change
(work on tupdescs rather than full-blown relations) for a couple of
those functions as 0001, then your patch as 0002, then a few fixups of
my own.  (0002 is not bit-by-bit identical to yours; I think I had to
fix some merge conflict with 0001, but should be pretty much the same).

But looking further, I think there is much cruft that has accumulated in
those functions (because they've gotten simplified over time), and we
could do some additional cleanup surgery.  For example, there is no
reason to pass a list pointer to make_inh_translation_list(); we could
just return it.  And then we don't have to cons up a fake AppendRelInfo
with all dummy values that adjust_inherited_tlist doesn't even care
about.  I think there was a point for all these contortions back at some
point (visible by checking git history of this code), but it all seems
useless now.

Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
expand_targetlist() runs *after*, not before, so how could it have
affected the result?  I'm obviously confused about what
expand_targetlist call this comment is talking about.  Anyway I wanted
to make it use resjunk entries instead, but that broke some other case
that I didn't have time to research yesterday.  I'll get back to this
soon, but in the meantime, here's what I have.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v3-0001-Make-some-static-functions-work-on-TupleDesc-rath.patch (6K) Download Attachment
v3-0002-Fix-ON-CONFLICT-to-work-with-partitioned-tables.patch (31K) Download Attachment
v3-0003-fixups.patch (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Amit Langote-2
On 2018/03/03 0:36, Alvaro Herrera wrote:

> Amit Langote wrote:
>
>> Actually, after your comment on my original patch [1], I did make it work
>> for multiple levels by teaching the partition initialization code to find
>> a given partition's indexes that are inherited from the root table (that
>> is the table mentioned in command).  So, after a tuple is routed to a
>> partition, we switch from the original arbiterIndexes list to the one we
>> created for the partition, which must contain OIDs corresponding to those
>> in the original list.  After all, for each of the parent's indexes that
>> the planner put into the original arbiterIndexes list, there must exist an
>> index in each of the leaf partitions.
>
> Oh, your solution for this seems simple enough.  Silly me, I was trying
> to implement it in a quite roundabout way.  Thanks.  (I do wonder if we
> should save the "root" reloid in the relcache).

Do you mean store the root reloid for "any" partition (table or index)?

>> I had also observed when working on the patch that various TupleTableSlots
>> used by the ON CONFLICT DO UPDATE code must be based on TupleDesc of the
>> inheritance-translated target list (DO UPDATE SET target list).  In fact,
>> that has to take into account also the dropped columns; we may have
>> dropped columns either in parent or in a partition or in both at same or
>> different attnum positions.  That means, simple map_partition_varattnos()
>> translation doesn't help in this case.
>
> Yeah, I was aware these corner cases could become a problem though I
> hadn't gotten around to testing them yet.  Thanks for all your work on
> this.
>
> The usage of the few optimizer/prep/ functions that are currently static
> doesn't fill me with joy.  These functions have weird APIs because
> they're static so we don't rally care, but once we export them we should
> strive to be more careful.  I'd rather stay away from just exporting
> them all, so I chose to encapsulate those calls in a single function and
> export only expand_targetlist from preptlist.c, keeping the others
> static in prepunion.c.  In the attached patch set, I put an API change
> (work on tupdescs rather than full-blown relations) for a couple of
> those functions as 0001, then your patch as 0002, then a few fixups of
> my own.  (0002 is not bit-by-bit identical to yours; I think I had to
> fix some merge conflict with 0001, but should be pretty much the same).
>
> But looking further, I think there is much cruft that has accumulated in
> those functions (because they've gotten simplified over time), and we
> could do some additional cleanup surgery.  For example, there is no
> reason to pass a list pointer to make_inh_translation_list(); we could
> just return it.  And then we don't have to cons up a fake AppendRelInfo
> with all dummy values that adjust_inherited_tlist doesn't even care
> about.  I think there was a point for all these contortions back at some
> point (visible by checking git history of this code), but it all seems
> useless now.

Yeah, I think it might as well work to fix up these interfaces the way
you've done.

> Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
> expand_targetlist() runs *after*, not before, so how could it have
> affected the result?  I'm obviously confused about what
> expand_targetlist call this comment is talking about.  Anyway I wanted
> to make it use resjunk entries instead, but that broke some other case
> that I didn't have time to research yesterday.  I'll get back to this
> soon, but in the meantime, here's what I have.

Hmm.  I can imagine how the newly added comments in
adjust_inherited_tlist() may be confusing.  With this patch, we're now
calling adjust_inherited_tlist() from the executor, whereas it was
designed only to be run in the planner prep phase.  What we're passing to
it from the executor is the DO UPDATE SET's targetlist that has undergone
the expand_targetlist() treatment by the planner.  Maybe, we need to
update the adjust_inherited_tlist() comments to reflect the expansion of
its scope due to this patch.

Some comments on 0003:

+     * Fist, fix the target entries' resnos, by using inheritance
translation.

First

+    appinfo.parent_reltype = InvalidOid; // parentRel->rd_rel->reltype;

I guess you won't retain that comment. :)

+    result_tl = expand_targetlist(result_tl, CMD_UPDATE,

Should we add a CmdType argument to adjust_and_expand_partition_tlist()
and use it instead of hard-coding CMD_UPDATE here?

Thanks,
Amit


Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Pavan Deolasee
In reply to this post by Alvaro Herrera-9


On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera <[hidden email]> wrote:


Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
expand_targetlist() runs *after*, not before, so how could it have
affected the result? 

If I understand correctly, planner must have called expand_targetlist() once for the parent relation's descriptor and added any dropped columns from the parent relation. So we may not find mapped attributes for those dropped columns in the parent. I haven't actually tested this case though.

I wonder if it makes sense to actually avoid expanding on-conflict-set targetlist in case the target is a partition table and deal with it during execution, like we are doing now.
 
I'm obviously confused about what
expand_targetlist call this comment is talking about.  Anyway I wanted
to make it use resjunk entries instead, but that broke some other case
that I didn't have time to research yesterday.  I'll get back to this
soon, but in the meantime, here's what I have.

+           conv_setproj =
+               adjust_and_expand_partition_tlist(RelationGetDescr(firstResultRel),
+                                                 RelationGetDescr(partrel),
+                                                 RelationGetRelationName(partrel),
+                                                 firstVarno,
+                                                 conv_setproj);

Aren't we are adding back Vars referring to the parent relation, after having converted the existing ones to use partition relation's varno? May be that works because missing attributes are already added during planning and expand_targetlist() here only adds dropped columns, which are just NULL constants.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Pavan Deolasee
In reply to this post by Alvaro Herrera-9


On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera <[hidden email]> wrote:


@@ -106,6 +120,9 @@ typedef struct PartitionTupleRouting
    int         num_subplan_partition_offsets;
    TupleTableSlot *partition_tuple_slot;
    TupleTableSlot *root_tuple_slot;
+   List      **partition_arbiter_indexes;
+   TupleTableSlot **partition_conflproj_slots;
+   TupleTableSlot **partition_existing_slots;
 } PartitionTupleRouting;


I am curious why you decided to add these members to PartitionTupleRouting structure. Wouldn't ResultRelationInfo be a better place to track these or is there some rule that we follow?

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Etsuro Fujita
(2018/03/16 19:43), Pavan Deolasee wrote:
> On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera <[hidden email]
> <mailto:[hidden email]>> wrote:

> @@ -106,6 +120,9 @@ typedef struct PartitionTupleRouting
>      int         num_subplan_partition_offsets;
>      TupleTableSlot *partition_tuple_slot;
>      TupleTableSlot *root_tuple_slot;
> +   List      **partition_arbiter_indexes;
> +   TupleTableSlot **partition_conflproj_slots;
> +   TupleTableSlot **partition_existing_slots;
>   } PartitionTupleRouting;

> I am curious why you decided to add these members to
> PartitionTupleRouting structure. Wouldn't ResultRelationInfo be a better
> place to track these or is there some rule that we follow?

I just started reviewing the patch, so maybe I'm missing something, but
I think it would be a good idea to have these in that structure, not in
ResultRelInfo, because these would be required only for partitions
chosen via tuple routing.

Best regards,
Etsuro Fujita

Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Pavan Deolasee


On Fri, Mar 16, 2018 at 5:13 PM, Etsuro Fujita <[hidden email]> wrote:
(2018/03/16 19:43), Pavan Deolasee wrote:
On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera <[hidden email]
<mailto:[hidden email]>> wrote:

@@ -106,6 +120,9 @@ typedef struct PartitionTupleRouting
     int         num_subplan_partition_offsets;
     TupleTableSlot *partition_tuple_slot;
     TupleTableSlot *root_tuple_slot;
+   List      **partition_arbiter_indexes;
+   TupleTableSlot **partition_conflproj_slots;
+   TupleTableSlot **partition_existing_slots;
  } PartitionTupleRouting;

I am curious why you decided to add these members to
PartitionTupleRouting structure. Wouldn't ResultRelationInfo be a better
place to track these or is there some rule that we follow?

I just started reviewing the patch, so maybe I'm missing something, but I think it would be a good idea to have these in that structure, not in ResultRelInfo, because these would be required only for partitions chosen via tuple routing.

Hmm, yeah, probably you're right. The reason I got confused is because the patch uses ri_onConflictSetProj/ri_onConflictSetWhere to store the converted projection info/where clause for a given partition in its ResultRelationInfo. So I was wondering if we can move mt_arbiterindexes, mt_existing and mt_conflproj to ResultRelInfo and then just use that per-partition structures too.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Álvaro Herrera
Pavan Deolasee wrote:

> Hmm, yeah, probably you're right. The reason I got confused is because the
> patch uses ri_onConflictSetProj/ri_onConflictSetWhere to store the
> converted projection info/where clause for a given partition in its
> ResultRelationInfo. So I was wondering if we can
> move mt_arbiterindexes, mt_existing and mt_conflproj to ResultRelInfo and
> then just use that per-partition structures too.

I wonder if the proposed structure is good actually.

Some notes as I go along.

1. ModifyTableState->mt_arbiterindexes is just a copy of
ModifyTable->arbiterIndexes.  So why do we need it?  For an
unpartitioned relation we can just use
ModifyTableState.ps->arbiterIndexes.  Now, for each partition we need to
map these indexes onto the partition indexes.  Not sure where to put
these; I'm tempted to say ResultRelInfo is the place.  Maybe the list
should always be in ResultRelInfo instead of the state/plan node?  Not
sure.

2. We don't need mt_existing to be per-relation; a single tuple slot can
do for all partitions; we just need to ExecSlotSetDescriptor to the
partition's descriptor whenever the slot is going to be used.  (This
means the slot no longer has a fixed tupdesc.  That seems OK).

3. ModifyTableState->mt_conflproj is more or less the same thing; the
same TTS can be reused by all the different projections, as long as we
set the descriptor before projecting.  So we don't
need PartitionTupleRouting->partition_conflproj_slots, but we do need a
descriptor to be used when needed.  Let's say
  PartitionTupleRouting->partition_confl_tupdesc

I'll experiment with these ideas and see where that leads me.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Álvaro Herrera
So ExecInsert receives the ModifyTableState, and separately it receives
arbiterIndexes and the OnConflictAction, both of which are members of
the passed ModifyTableState.  I wonder why does it do that; wouldn't it
be simpler to extract those members from the node?

With the patch proposed upthread, we receive arbiterIndexes as a
parameter and if the table is a partition we summarily ignore those and
use the list as extracted from the PartitionRoutingInfo.  This is
confusing and pointless.  It seems to me that the logic ought to be "if
partition then use the list in PartitionRoutingInfo; if not partition
use it from ModifyTableState".  This requires changing as per above,
i.e. make the arbiter index list not part of the ExecInsert's API.

The OnConflictAction doesn't matter much; not passing it is merely a
matter of cleanliness.

Or is there another reason to pass the index list?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Peter Geoghegan-4
On Fri, Mar 16, 2018 at 11:21 AM, Alvaro Herrera
<[hidden email]> wrote:
> So ExecInsert receives the ModifyTableState, and separately it receives
> arbiterIndexes and the OnConflictAction, both of which are members of
> the passed ModifyTableState.  I wonder why does it do that; wouldn't it
> be simpler to extract those members from the node?

> Or is there another reason to pass the index list?

It works that way pretty much by accident, as far as I can tell.
Removing the two extra arguments sounds like a good idea.

--
Peter Geoghegan

Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Álvaro Herrera
Peter Geoghegan wrote:

> On Fri, Mar 16, 2018 at 11:21 AM, Alvaro Herrera
> <[hidden email]> wrote:
> > So ExecInsert receives the ModifyTableState, and separately it receives
> > arbiterIndexes and the OnConflictAction, both of which are members of
> > the passed ModifyTableState.  I wonder why does it do that; wouldn't it
> > be simpler to extract those members from the node?
>
> > Or is there another reason to pass the index list?
>
> It works that way pretty much by accident, as far as I can tell.
> Removing the two extra arguments sounds like a good idea.

Great, thanks.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Álvaro Herrera
In reply to this post by Álvaro Herrera
Another thing I noticed is that the split of the ON CONFLICT slot and
its corresponding projection is pretty weird.  The projection is in
ResultRelInfo, but the slot itself is in ModifyTableState.  You can't
make the projection work without a corresponding slot initialized with
the correct descriptor, so splitting it this way doesn't make a lot of
sense to me.

(Now, TBH the split between resultRelInfo->ri_projectReturning and
ModifyTableState->ps.ps_ResultTupleSlot, which is the slot that the
returning project uses, doesn't make a lot of sense to me either; so
maybe there some reason that I'm just not seeing.  But I digress.)

So I want to propose that we move the slot to be together with the
projection node that it serves, ie. we put the slot in ResultRelInfo:

typedef struct ResultRelInfo
{
...
    /* for computing ON CONFLICT DO UPDATE SET */
    TupleTableSlot *ri_onConflictProjSlot;
    ProjectionInfo *ri_onConflictSetProj;

and with this the structure makes more sense.  So ExecInitModifyTable
does this

        /* create target slot for UPDATE SET projection */
        tupDesc = ExecTypeFromTL((List *) node->onConflictSet,
                                 relationDesc->tdhasoid);
        resultRelInfo->ri_onConflictProjSlot =
            ExecInitExtraTupleSlot(mtstate->ps.state, tupDesc);

        /* build UPDATE SET projection state */
        resultRelInfo->ri_onConflictSetProj =
            ExecBuildProjectionInfo(node->onConflictSet, econtext,
                                    resultRelInfo->ri_onConflictProjSlot,
                                    &mtstate->ps, relationDesc);

and then ExecOnConflictUpdate can simply do this:

    /* Project the new tuple version */
    ExecProject(resultRelInfo->ri_onConflictSetProj);

    /* Execute UPDATE with projection */
    *returning = ExecUpdate(mtstate, &tuple.t_self, NULL,
                            resultRelInfo->ri_onConflictProjSlot, planSlot,
                            &mtstate->mt_epqstate, mtstate->ps.state,
                            canSetTag);

Now, maybe there is some reason I'm missing for the on conflict slot for
the projection to be in ModifyTableState rather than resultRelInfo.  But
this code passes all current tests, so I don't know what that reason
would be.


Overall, the resulting code looks simpler to me than the previous
arrangements.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Andres Freund
Hi,

On 2018-03-16 18:23:44 -0300, Alvaro Herrera wrote:

> Another thing I noticed is that the split of the ON CONFLICT slot and
> its corresponding projection is pretty weird.  The projection is in
> ResultRelInfo, but the slot itself is in ModifyTableState.  You can't
> make the projection work without a corresponding slot initialized with
> the correct descriptor, so splitting it this way doesn't make a lot of
> sense to me.
>
> (Now, TBH the split between resultRelInfo->ri_projectReturning and
> ModifyTableState->ps.ps_ResultTupleSlot, which is the slot that the
> returning project uses, doesn't make a lot of sense to me either; so
> maybe there some reason that I'm just not seeing.  But I digress.)

The projections for different child tables / child plans can look
different, therefore it can't be stored in ModifyTableState itself. No?

The slot's descriptor is changed to be "appropriate" when necessary:
                        if (slot->tts_tupleDescriptor != RelationGetDescr(resultRelationDesc))
                                ExecSetSlotDescriptor(slot, RelationGetDescr(resultRelationDesc));


> So I want to propose that we move the slot to be together with the
> projection node that it serves, ie. we put the slot in ResultRelInfo:

That'll mean we have a good number of additional slots in some cases. I
don't think the overhead of that is going to break the bank, but it's
worth considering.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Álvaro Herrera
Andres Freund wrote:

> Hi,
>
> On 2018-03-16 18:23:44 -0300, Alvaro Herrera wrote:
> > Another thing I noticed is that the split of the ON CONFLICT slot and
> > its corresponding projection is pretty weird.  The projection is in
> > ResultRelInfo, but the slot itself is in ModifyTableState.  You can't
> > make the projection work without a corresponding slot initialized with
> > the correct descriptor, so splitting it this way doesn't make a lot of
> > sense to me.
> >
> > (Now, TBH the split between resultRelInfo->ri_projectReturning and
> > ModifyTableState->ps.ps_ResultTupleSlot, which is the slot that the
> > returning project uses, doesn't make a lot of sense to me either; so
> > maybe there some reason that I'm just not seeing.  But I digress.)
>
> The projections for different child tables / child plans can look
> different, therefore it can't be stored in ModifyTableState itself. No?
>
> The slot's descriptor is changed to be "appropriate" when necessary:
> if (slot->tts_tupleDescriptor != RelationGetDescr(resultRelationDesc))
> ExecSetSlotDescriptor(slot, RelationGetDescr(resultRelationDesc));

Grumble.  This stuff looks like full of cheats to me, but I won't touch
it anyway.

> > So I want to propose that we move the slot to be together with the
> > projection node that it serves, ie. we put the slot in ResultRelInfo:
>
> That'll mean we have a good number of additional slots in some cases. I
> don't think the overhead of that is going to break the bank, but it's
> worth considering.

Good point.

I think what I should be doing is the same as the returning stuff: keep
a tupdesc around, and use a single slot, whose descriptor is changed
just before the projection.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Álvaro Herrera
Alvaro Herrera wrote:

> I think what I should be doing is the same as the returning stuff: keep
> a tupdesc around, and use a single slot, whose descriptor is changed
> just before the projection.

Yes, this works, though it's ugly.  Not any uglier than what's already
there, though, so I think it's okay.

The only thing that I remain unhappy about this patch is the whole
adjust_and_expand_partition_tlist() thing.  I fear we may be doing
redundant and/or misplaced work.  I'll look into it next week.

0001 should be pretty much ready to push -- adjustments to ExecInsert
  and ModifyTableState I already mentioned.

0002 is stuff I would like to get rid of completely -- changes to
  planner code so that it better supports functionality we need for
  0003.

0003 is the main patch.  Compared to the previous version, this one
  reuses slots by switching them to different tupdescs as needed.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v4-0001-Simplify-ExecInsert-API-re.-ON-CONFLICT-data.patch (4K) Download Attachment
v4-0002-Make-some-static-functions-work-on-TupleDesc-rath.patch (7K) Download Attachment
v4-0003-Fix-ON-CONFLICT-to-work-with-partitioned-tables.patch (38K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: ON CONFLICT DO UPDATE for partitioned tables

Amit Langote-2
In reply to this post by Pavan Deolasee
On 2018/03/05 18:04, Pavan Deolasee wrote:

> On Fri, Mar 2, 2018 at 9:06 PM, Alvaro Herrera <[hidden email]>
> wrote:
>> Re. the "ugly hack" comments in adjust_inherited_tlist(), I'm confused:
>> expand_targetlist() runs *after*, not before, so how could it have
>> affected the result?
>>
>
> If I understand correctly, planner must have called expand_targetlist()
> once for the parent relation's descriptor and added any dropped columns
> from the parent relation. So we may not find mapped attributes for those
> dropped columns in the parent. I haven't actually tested this case though.
>
> I wonder if it makes sense to actually avoid expanding on-conflict-set
> targetlist in case the target is a partition table and deal with it during
> execution, like we are doing now.
>> I'm obviously confused about what
>> expand_targetlist call this comment is talking about.  Anyway I wanted
>> to make it use resjunk entries instead, but that broke some other case
>> that I didn't have time to research yesterday.  I'll get back to this
>> soon, but in the meantime, here's what I have.
>>
>
> +           conv_setproj =
> +
>  adjust_and_expand_partition_tlist(RelationGetDescr(firstResultRel),
> +                                                 RelationGetDescr(partrel),
> +
>  RelationGetRelationName(partrel),
> +                                                 firstVarno,
> +                                                 conv_setproj);
>
> Aren't we are adding back Vars referring to the parent relation, after
> having converted the existing ones to use partition relation's varno? May
> be that works because missing attributes are already added during planning
> and expand_targetlist() here only adds dropped columns, which are just NULL
> constants.

I think this suggestion to defer onConflictSet target list expansion  to
execution time is a good one.  So, in preprocess_targetlist(), we'd only
perform exapand_targetlist on the onConflictSet list if the table is not a
partitioned table.  For partitioned tables, we don't know which
partition(s) will be affected, so it's useless to do the expansion there.
Instead it's better to expand in ExecInitPartitionInfo(), possibly after
changing original TargetEntry nodes to have correct resnos due to
attribute number differences (calling adjust_inherited_tlist(), etc.).

And then since we're not expanding the target list in the planner anymore,
we don't need to install those hacks in adjust_inherited_tlist() like the
patch currently does to ignore entries for dropped columns in the parent
the plan-time expansion adds.

Thanks,
Amit


123