Restrict concurrent update/delete with UPDATE of partition key


Restrict concurrent update/delete with UPDATE of partition key

amul sul
Hi All,

Attaching a POC patch that throws an error in the case of a concurrent update
to an already deleted tuple due to an UPDATE of the partition key[1].

In a normal update, the new tuple is linked to the old one via its ctid,
forming a chain of tuple versions, but an UPDATE of the partition key[1]
moves the tuple from one partition to another partition table, which breaks
that chain.

Consider the following[2] concurrent update case, where one session tries to
update a row that is locked by another session whose concurrent update moves
the tuple to another partition.

create table foo (a int2, b text) partition by list (a);
create table foo1 partition of foo for values IN (1);
create table foo2 partition of foo for values IN (2);
insert into foo values(1, 'ABC');

 ----------- session 1 -----------
postgres=# begin;
BEGIN
postgres=# update foo set a=2 where a=1;
UPDATE 1


 ----------- session 2  -----------
postgres=# update foo set b='EFG' where a=1;

 ….. wait state ……

----------- session 1 -----------
postgres=# commit;
COMMIT

 ----------- session 2  -----------
UPDATE 0

This UPDATE 0 is the problematic part; see Greg Stark's update[3], which
explains why we need an error here.

To throw an error we need an indicator that the targeted row has already been
moved to another partition, and for that, setting ctid.ip_blkid to
InvalidBlockNumber looks like a viable option for now.

The attached patch incorporates the following logic suggested by Amit
Kapila[4]:

"We can pass a flag say row_moved (or require_row_movement) to heap_delete
which will in turn set InvalidBlockId in ctid instead of setting it to
self. Then the
ExecUpdate needs to check for the same and return an error when heap_update is
not successful (result != HeapTupleMayBeUpdated)."
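
To illustrate, here is a minimal C sketch of both halves of that logic (the
flag name row_moved comes from the quoted suggestion; the exact placement
inside heap_delete()/ExecUpdate() is only my assumption here, not the final
patch):

    /* In heap_delete(): mark the old tuple version as "moved elsewhere" */
    if (row_moved)
        BlockIdSet(&((tp.t_data->t_ctid).ip_blkid), InvalidBlockNumber);
    else
        tp.t_data->t_ctid = tp.t_self;  /* normal delete: no forward chain link */

    /* In ExecUpdate()/ExecDelete(), HeapTupleUpdated case: detect the marker */
    if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("tuple to be updated was already moved to another partition due to concurrent update")));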

1] https://postgr.es/m/CAJ3gD9do9o2ccQ7j7%2BtSgiE1REY65XRiMb%3DyJO3u3QhyP8EEPQ%40mail.gmail.com
2] With https://postgr.es/m/CAJ3gD9fzD4jBpv+zXqZYnW=h9JXUFG9E7NGdA9gR_JJbOj=Q5A@...
patch applied.
3] https://postgr.es/m/CAM-w4HPis7rbnwi%2BoXjnouqMSRAC5DeVcMdxEXTMfDos1kaYPQ%40mail.gmail.com
4] https://postgr.es/m/CAA4eK1KEZQ%2BCyXbBzfn1jFHoEfa_OemDLhLyy7xfD1QUZLo1DQ%40mail.gmail.com

Regards,
Amul



0001-POC-Invalidate-ip_blkid-v1.patch (10K) Download Attachment

Re: Restrict concurrent update/delete with UPDATE of partition key

Robert Haas
On Wed, Sep 27, 2017 at 7:07 AM, amul sul <[hidden email]> wrote:
> Attaching POC patch that throws an error in the case of a concurrent update
> to an already deleted tuple due to UPDATE of partition key[1].
>
> In a normal update new tuple is linked to the old one via ctid forming
> a chain of tuple versions but UPDATE of partition key[1] move tuple
> from one partition to an another partition table which breaks that chain.

This patch needs a rebase.  It has one whitespace-only hunk that
should possibly be excluded.

I think the basic idea of this is sound.  Either you or Amit need to
document the behavior in the user-facing documentation, and it needs
tests that hit every single one of the new error checks you've added
(obviously, the tests will only work in combination with Amit's
patch).  The isolation tester should be sufficient to write such tests.

It needs some more extensive comments as well.  The fact that we're
assigning a meaning to ip_blkid -> InvalidBlockNumber is a big deal,
and should at least be documented in itemptr.h in the comments for the
ItemPointerData structure.
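
For reference, the structure in question (simplified from
src/include/storage/itemptr.h; the note below is only a sketch of the sort
of comment I mean, not proposed wording):

    typedef struct ItemPointerData
    {
        BlockIdData     ip_blkid;   /* physical block number */
        OffsetNumber    ip_posid;   /* line pointer offset within the block */
    } ItemPointerData;              /* (packing attributes omitted here) */

    /*
     * Sketch of the note: a dead tuple's t_ctid normally points to its
     * successor version (or to itself).  If the row was moved to another
     * partition by an UPDATE of the partition key, ip_blkid is instead set
     * to InvalidBlockNumber, meaning the next version lives in some other
     * relation and the chain cannot be followed from here.
     */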

I suspect ExecOnConflictUpdate needs an update for the
HeapTupleUpdated case similar to what you've done elsewhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Restrict concurrent update/delete with UPDATE of partition key

amul sul
On Sat, Nov 11, 2017 at 1:05 AM, Robert Haas <[hidden email]> wrote:

> On Wed, Sep 27, 2017 at 7:07 AM, amul sul <[hidden email]> wrote:
>> Attaching POC patch that throws an error in the case of a concurrent update
>> to an already deleted tuple due to UPDATE of partition key[1].
>>
>> In a normal update new tuple is linked to the old one via ctid forming
>> a chain of tuple versions but UPDATE of partition key[1] move tuple
>> from one partition to an another partition table which breaks that chain.
>
> This patch needs a rebase.  It has one whitespace-only hunk that
> should possibly be excluded.
>
Thanks for looking at the patch.

> I think the basic idea of this is sound.  Either you or Amit need to
> document the behavior in the user-facing documentation, and it needs
> tests that hit every single one of the new error checks you've added
> (obviously, the tests will only work in combination with Amit's
> patch).  The isolation should be sufficient to write such tests.
>
> It needs some more extensive comments as well.  The fact that we're
> assigning a meaning to ip_blkid -> InvalidBlockNumber is a big deal,
> and should at least be documented in itemptr.h in the comments for the
> ItemPointerData structure.
>
> I suspect ExecOnConflictUpdate needs an update for the
> HeapTupleUpdated case similar to what you've done elsewhere.
>
The UPDATE of partition key v25 patch[1] documents this concurrent scenario;
I need to rework that documentation paragraph to include this behaviour, and
will discuss it with Amit.

The attached 0001 patch includes error checks for 8 functions; out of those 8,
I was able to build isolation tests for 4 -- ExecUpdate, ExecDelete,
GetTupleForTrigger & ExecLockRows.

The remaining ones are EvalPlanQualFetch, ExecOnConflictUpdate,
RelationFindReplTupleByIndex & RelationFindReplTupleSeq.  Note that the check
in RelationFindReplTupleByIndex & RelationFindReplTupleSeq emits a LOG, not
an ERROR.

Any help/suggestions on building tests for these functions would be much
appreciated.


1] http://postgr.es/m/CAJ3gD9f4Um99sOJmVSEPj783VWw5GbXkgU3OWcYZJv=ipjTkAw@...


Regards,
Amul

0001-POC-Invalidate-ip_blkid-v2.patch (15K) Download Attachment
0002-isolation-tests-v1.patch (11K) Download Attachment

Re: Restrict concurrent update/delete with UPDATE of partition key

Robert Haas
On Thu, Nov 23, 2017 at 6:48 AM, amul sul <[hidden email]> wrote:
> And remaining are EvalPlanQualFetch, ExecOnConflictUpdate,
> RelationFindReplTupleByIndex & RelationFindReplTupleSeq.  Note that check in
> RelationFindReplTupleByIndex & RelationFindReplTupleSeq will have LOG not an
> ERROR.

The first one is going to come up when you have, for example, two
concurrent updates targeting the same row, and the second one when you
have an ON CONFLICT UPDATE clause.  I guess the latter two are
probably related to logical replication, and maybe not easy to test
via an automated regression test.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Restrict concurrent update/delete with UPDATE of partition key

Álvaro Herrera
In reply to this post by amul sul
A typo in all the messages the patch adds:
"to an another" -> "to another"

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Restrict concurrent update/delete with UPDATE of partition key

akapila
In reply to this post by amul sul
On Thu, Nov 23, 2017 at 5:18 PM, amul sul <[hidden email]> wrote:

> On Sat, Nov 11, 2017 at 1:05 AM, Robert Haas <[hidden email]> wrote:
>> On Wed, Sep 27, 2017 at 7:07 AM, amul sul <[hidden email]> wrote:
>>> Attaching POC patch that throws an error in the case of a concurrent update
>>> to an already deleted tuple due to UPDATE of partition key[1].
>>>
>>> In a normal update new tuple is linked to the old one via ctid forming
>>> a chain of tuple versions but UPDATE of partition key[1] move tuple
>>> from one partition to an another partition table which breaks that chain.
>>
>> This patch needs a rebase.  It has one whitespace-only hunk that
>> should possibly be excluded.
>>
> Thanks for looking at the patch.
>
>> I think the basic idea of this is sound.  Either you or Amit need to
>> document the behavior in the user-facing documentation, and it needs
>> tests that hit every single one of the new error checks you've added
>> (obviously, the tests will only work in combination with Amit's
>> patch).  The isolation should be sufficient to write such tests.
>>
>> It needs some more extensive comments as well.  The fact that we're
>> assigning a meaning to ip_blkid -> InvalidBlockNumber is a big deal,
>> and should at least be documented in itemptr.h in the comments for the
>> ItemPointerData structure.
>>
>> I suspect ExecOnConflictUpdate needs an update for the
>> HeapTupleUpdated case similar to what you've done elsewhere.
>>
>
> UPDATE of partition key v25[1] has documented this concurrent scenario,
> I need to rework on that document paragraph to include this behaviour, will
> discuss with Amit.
>
> Attached 0001 patch includes error check for 8 functions, out of 8 I am able
> to build isolation test for 4 functions -- ExecUpdate,ExecDelete,
> GetTupleForTrigger & ExecLockRows.
>

Few comments:

1.
@@ -1480,6 +1493,10 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
  ereport(ERROR,
  (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
  errmsg("could not serialize access due to concurrent update")));
+ if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("tuple to be updated was already moved to an another
partition due to concurrent update")));

Why do you think we need this check in the OnConflictUpdate path?  I
think we don't need it here because we are going to relinquish this version
of the tuple and will start again and might fetch some other row
version.  Also, we don't support Insert .. On Conflict Update with
partitioned tables, see[1], which is also an indication that at the
very least we don't need it now.

2.
@@ -2709,6 +2709,10 @@ EvalPlanQualFetch(EState *estate, Relation
relation, int lockmode,
  ereport(ERROR,
  (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
  errmsg("could not serialize access due to concurrent update")));
+ if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("tuple to be updated was already moved to an another
partition due to concurrent update")));

..
..
+++ b/src/backend/executor/nodeLockRows.c
@@ -218,6 +218,11 @@ lnext:
  ereport(ERROR,
  (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
  errmsg("could not serialize access due to concurrent update")));
+ if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("tuple to be locked was already moved to an another partition
due to concurrent update")));
+

At some places after heap_lock_tuple the error message says "tuple to
be updated .." and other places it says "tuple to be locked ..".  Can
we use the same message consistently?  I think it would be better to
use the second one.

3.
}
+
  /* tuple already deleted; nothing to do */
  return NULL;

Spurious whitespace.

4.  There is no need to use *POC* in the name of the patch.  I think
this is no longer a POC patch.

[1] - https://www.postgresql.org/message-id/7ff1e8ec-dc39-96b1-7f47-ff5965dceeac%40lab.ntt.co.jp

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Restrict concurrent update/delete with UPDATE of partition key

amul sul
On Sat, Nov 25, 2017 at 11:39 AM, Amit Kapila <[hidden email]> wrote:
> On Thu, Nov 23, 2017 at 5:18 PM, amul sul <[hidden email]> wrote:
>> On Sat, Nov 11, 2017 at 1:05 AM, Robert Haas <[hidden email]> wrote:
>>> On Wed, Sep 27, 2017 at 7:07 AM, amul sul <[hidden email]> wrote:
>>>>
[...]
> Few comments:
>
Thanks for looking at the patch, please find my comments inline:

> 1.
> @@ -1480,6 +1493,10 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
>   ereport(ERROR,
>   (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>   errmsg("could not serialize access due to concurrent update")));
> + if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("tuple to be updated was already moved to an another
> partition due to concurrent update")));
>
> Why do you think we need this check in the OnConflictUpdate path?  I
> think we don't it here because we are going to relinquish this version
> of the tuple and will start again and might fetch some other row
> version.  Also, we don't support Insert .. On Conflict Update with
> partitioned tables, see[1], which is also an indication that at the
> very least we don't need it now.
>
Agreed; even though this case is not going to arise anytime soon, shouldn't
we still have a check for an invalid block id?  IMHO, we should have this
check and either an error report or an assert -- thoughts?

> 2.
> @@ -2709,6 +2709,10 @@ EvalPlanQualFetch(EState *estate, Relation
> relation, int lockmode,
>   ereport(ERROR,
>   (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>   errmsg("could not serialize access due to concurrent update")));
> + if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("tuple to be updated was already moved to an another
> partition due to concurrent update")));
>
> ..
> ..
> +++ b/src/backend/executor/nodeLockRows.c
> @@ -218,6 +218,11 @@ lnext:
>   ereport(ERROR,
>   (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>   errmsg("could not serialize access due to concurrent update")));
> + if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("tuple to be locked was already moved to an another partition
> due to concurrent update")));
> +
>
> At some places after heap_lock_tuple the error message says "tuple to
> be updated .." and other places it says "tuple to be locked ..".  Can
> we use the same message consistently?  I think it would be better to
> use the second one.
>
Okay, will use "tuple to be locked"

> 3.
> }
> +
>   /* tuple already deleted; nothing to do */
>   return NULL;
>
> Spurious whitespace.
>
Sorry about that, will fix this.

> 4.  There is no need to use *POC* in the name of the patch.  I think
> this is no more a POC patch.
>
Understood.

Regards,
Amul


Re: Restrict concurrent update/delete with UPDATE of partition key

amul sul
In reply to this post by Álvaro Herrera
On Fri, Nov 24, 2017 at 9:37 PM, Alvaro Herrera <[hidden email]> wrote:
> A typo in all the messages the patch adds:
> "to an another" -> "to another"
>

Thanks for looking into the patch; will fix in the next version.

Regards,
Amul


Re: Restrict concurrent update/delete with UPDATE of partition key

akapila
In reply to this post by amul sul
On Tue, Nov 28, 2017 at 5:58 PM, amul sul <[hidden email]> wrote:

> On Sat, Nov 25, 2017 at 11:39 AM, Amit Kapila <[hidden email]> wrote:
>> On Thu, Nov 23, 2017 at 5:18 PM, amul sul <[hidden email]> wrote:
>>> On Sat, Nov 11, 2017 at 1:05 AM, Robert Haas <[hidden email]> wrote:
>>>> On Wed, Sep 27, 2017 at 7:07 AM, amul sul <[hidden email]> wrote:
>>>>>
> [...]
>> Few comments:
>>
> Thanks for looking at the patch, please find my comments inline:
>
>> 1.
>> @@ -1480,6 +1493,10 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
>>   ereport(ERROR,
>>   (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>>   errmsg("could not serialize access due to concurrent update")));
>> + if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
>> + ereport(ERROR,
>> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> + errmsg("tuple to be updated was already moved to an another
>> partition due to concurrent update")));
>>
>> Why do you think we need this check in the OnConflictUpdate path?  I
>> think we don't it here because we are going to relinquish this version
>> of the tuple and will start again and might fetch some other row
>> version.  Also, we don't support Insert .. On Conflict Update with
>> partitioned tables, see[1], which is also an indication that at the
>> very least we don't need it now.
>>
> Agreed, even though this case will never going to be anytime soon
> shouldn't we have a check for invalid block id? IMHO, we should have
> this check and error report or assert, thoughts?
>

I feel adding code which can't be hit (even if it is error handling)
is not a good idea.  I think having an Assert should be okay, but
please write comments to explain the reason for adding an Assert.
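
Something like the following is what I have in mind for
ExecOnConflictUpdate()'s HeapTupleUpdated case (only a sketch -- the Assert
and its comment are illustrative, and the surrounding lines are roughly the
existing executor code):

    case HeapTupleUpdated:
        if (IsolationUsesXactSnapshot())
            ereport(ERROR,
                    (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
                     errmsg("could not serialize access due to concurrent update")));

        /*
         * INSERT .. ON CONFLICT DO UPDATE is not supported on partitioned
         * tables yet, so the conflicting tuple cannot have been moved to
         * another partition; assert that rather than adding an error path
         * that can never be reached.
         */
        Assert(BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))));

        /* tell caller to try again from the very start */
        ExecClearTuple(existing);
        return false;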

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Restrict concurrent update/delete with UPDATE of partition key

amul sul
On Wed, Nov 29, 2017 at 7:51 AM, Amit Kapila <[hidden email]> wrote:

> On Tue, Nov 28, 2017 at 5:58 PM, amul sul <[hidden email]> wrote:
>> On Sat, Nov 25, 2017 at 11:39 AM, Amit Kapila <[hidden email]> wrote:
>>> On Thu, Nov 23, 2017 at 5:18 PM, amul sul <[hidden email]> wrote:
>>>> On Sat, Nov 11, 2017 at 1:05 AM, Robert Haas <[hidden email]> wrote:
>>>>> On Wed, Sep 27, 2017 at 7:07 AM, amul sul <[hidden email]> wrote:
>>>>>>
>> [...]
>>> Few comments:
>>>
>> Thanks for looking at the patch, please find my comments inline:
>>
>>> 1.
>>> @@ -1480,6 +1493,10 @@ ExecOnConflictUpdate(ModifyTableState *mtstate,
>>>   ereport(ERROR,
>>>   (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>>>   errmsg("could not serialize access due to concurrent update")));
>>> + if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>> + errmsg("tuple to be updated was already moved to an another
>>> partition due to concurrent update")));
>>>
>>> Why do you think we need this check in the OnConflictUpdate path?  I
>>> think we don't it here because we are going to relinquish this version
>>> of the tuple and will start again and might fetch some other row
>>> version.  Also, we don't support Insert .. On Conflict Update with
>>> partitioned tables, see[1], which is also an indication that at the
>>> very least we don't need it now.
>>>
>> Agreed, even though this case will never going to be anytime soon
>> shouldn't we have a check for invalid block id? IMHO, we should have
>> this check and error report or assert, thoughts?
>>
>
> I feel adding code which can't be hit (even if it is error handling)
> is not a good idea.  I think having an Assert should be okay, but
> please write comments to explain the reason for adding an Assert.
>
Agreed; updated in the attached patch.  Patch 0001 also includes your
previous review comment[1] and the typo correction suggested by Alvaro[2].

Patch 0002 is still missing tests for the EvalPlanQualFetch() function.  I
think we could skip that, because EvalPlanQualFetch()'s direct/indirect
callers -- GetTupleForTrigger, ExecDelete, ExecUpdate & ExecLockRows -- get
the required test coverage in the attached patch.

1] https://postgr.es/m/CAA4eK1LQS6TmsGaEwR9HgF-9TZTHxrdAELuX6wOZBDbbjOfDjQ@...
2] https://postgr.es/m/20171124160756.eyljpmpfzwd6jmnr@...

Regards,
Amul

0002-isolation-tests-v2.patch (12K) Download Attachment
0001-Invalidate-ip_blkid-v3.patch (15K) Download Attachment

Re: Restrict concurrent update/delete with UPDATE of partition key

Stephen Frost
Amul,

* amul sul ([hidden email]) wrote:
> Agree, updated in the attached patch.  Patch 0001 also includes your
> previous review comment[1] and typo correction suggested by Alvaro[2].

Looks like this needs to be rebased (though the failures aren't too bad,
from what I'm seeing), so going to mark this back to Waiting For Author.
Hopefully this also helps to wake this thread up a bit and get another
review of it.

Thanks!

Stephen

signature.asc (836 bytes) Download Attachment

Re: Restrict concurrent update/delete with UPDATE of partition key

amul sul
On Thu, Jan 11, 2018 at 8:06 PM, Stephen Frost <[hidden email]> wrote:

> Amul,
>
> * amul sul ([hidden email]) wrote:
>> Agree, updated in the attached patch.  Patch 0001 also includes your
>> previous review comment[1] and typo correction suggested by Alvaro[2].
>
> Looks like this needs to be rebased (though the failures aren't too bad,
> from what I'm seeing), so going to mark this back to Waiting For Author.
> Hopefully this also helps to wake this thread up a bit and get another
> review of it.
>
Thanks for looking at this thread; attached herewith is an updated patch
rebased on the 'UPDATE of partition key v35' patch[1].


Regards,
Amul


1] https://postgr.es/m/CAJ3gD9dixkkMzNnnP1CaZ1H17-U17ok_sVbjZZo+wnB=rJH6yg@...

0001-Invalidate-ip_blkid-v4.patch (15K) Download Attachment
0002-isolation-tests-v3.patch (12K) Download Attachment

Re: Restrict concurrent update/delete with UPDATE of partition key

akapila
On Fri, Jan 12, 2018 at 11:43 AM, amul sul <[hidden email]> wrote:
>
> Thanks for looking at this thread, attached herewith an updated patch rebase on
> 'UPDATE of partition key v35' patch[1].
>

  ExecDelete(mtstate, tupleid, oldtuple, planSlot, epqstate, estate,
-   &tuple_deleted, false, false);
+   &tuple_deleted, false, false, true);

  /*
  * For some reason if DELETE didn't happen (e.g. trigger prevented
@@ -1292,6 +1299,11 @@ lreplace:;
  ereport(ERROR,
  (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
  errmsg("could not serialize access due to concurrent update")));
+ if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("tuple to be locked was already moved to another partition
due to concurrent update")));

I had asked you to change the message "tuple to be updated .." after the
heap_lock_tuple call, not in nodeModifyTable.c, so please revert the
message in nodeModifyTable.c.

Have you verified the changes in execReplication.c in some way?  It is
not clear to me how you ensure that the special value (InvalidBlockNumber)
is set in the CTID for a delete operation done via logical replication.
The logical replication worker uses the function ExecSimpleRelationDelete
to perform the delete, and there is no way it can pass the correct value of
row_moved to heap_delete.  Am I missing something due to which we don't
need to do this?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Restrict concurrent update/delete with UPDATE of partition key

amul sul
On Tue, Jan 23, 2018 at 7:01 PM, Amit Kapila <[hidden email]> wrote:
> On Fri, Jan 12, 2018 at 11:43 AM, amul sul <[hidden email]> wrote:
[....]
> I have asked to change the message "tuple to be updated .." after
> heap_lock_tuple call not in nodeModifyTable.c, so please revert the
> message in nodeModifyTable.c.
>

Understood, fixed in the attached version; sorry I'd missed it.

> Have you verified the changes in execReplication.c in some way? It is
> not clear to me how do you ensure to set the special value
> (InvalidBlockNumber) in CTID for delete operation via logical
> replication?  The logical replication worker uses function
> ExecSimpleRelationDelete to perform Delete and there is no way it can
> pass the correct value of row_moved in heap_delete.  Am I missing
> something due to which we don't need to do this?
>

You are correct: from ExecSimpleRelationDelete, heap_delete will always
receive row_moved = false, and InvalidBlockNumber will never be set.

I didn't find any test case that hits the changes in execReplication.c.  I am
not sure what we are supposed to do here, or whether having a LOG there is
much worse.

What do you think -- should we add an assert here like in EvalPlanQualFetch(),
or is the current LOG message fine?

Thanks for the review.

Regards,
Amul Sul

0001-Invalidate-ip_blkid-v5-wip.patch (16K) Download Attachment
0002-isolation-tests-v3.patch (12K) Download Attachment

Re: Restrict concurrent update/delete with UPDATE of partition key

akapila
On Wed, Jan 24, 2018 at 12:44 PM, amul sul <[hidden email]> wrote:

> On Tue, Jan 23, 2018 at 7:01 PM, Amit Kapila <[hidden email]> wrote:
>> On Fri, Jan 12, 2018 at 11:43 AM, amul sul <[hidden email]> wrote:
> [....]
>> I have asked to change the message "tuple to be updated .." after
>> heap_lock_tuple call not in nodeModifyTable.c, so please revert the
>> message in nodeModifyTable.c.
>>
>
> Understood, fixed in the attached version, sorry I'd missed it.
>
>> Have you verified the changes in execReplication.c in some way? It is
>> not clear to me how do you ensure to set the special value
>> (InvalidBlockNumber) in CTID for delete operation via logical
>> replication?  The logical replication worker uses function
>> ExecSimpleRelationDelete to perform Delete and there is no way it can
>> pass the correct value of row_moved in heap_delete.  Am I missing
>> something due to which we don't need to do this?
>>
>
> You are correct, from ExecSimpleRelationDelete, heap_delete will always
> receive row_moved = false and InvalidBlockNumber will never set.
>

So, this means that in the case of logical replication, it won't generate
the error this patch is trying to introduce.  I think if we want to
handle this, we need some changes in WAL and logical decoding as well.

Robert, others, what do you think?  I am not very comfortable leaving
this unaddressed; if we don't want to do anything about it, at least
we should document it.

> I didn't found any test case to hit changes in execReplication.c.  I am not sure
> what should we suppose do here, and having LOG is how much worse either.
>

I think you can manually (via debugger) hit this by using
PUBLICATION/SUBSCRIPTION syntax for logical replication.  I think what
you need to do is in node-1, create a partitioned table and subscribe
it on node-2.  Now, perform an Update on node-1, then stop the logical
replication worker before it calls heap_lock_tuple.  Now, in node-2,
update the same row such that it moves the row.  Now, continue the
logical replication worker.  I think it should hit your new code, if
not then we need to think of some other way.

> What do you think, should we add an assert like EvalPlanQualFetch() here or
> the current LOG message is fine?
>

I think first let's try to hit this case.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Restrict concurrent update/delete with UPDATE of partition key

amul sul
Hi Amit,

Sorry for the delayed response.

On Fri, Jan 26, 2018 at 11:58 AM, Amit Kapila <[hidden email]> wrote:
> On Wed, Jan 24, 2018 at 12:44 PM, amul sul <[hidden email]> wrote:
>> On Tue, Jan 23, 2018 at 7:01 PM, Amit Kapila <[hidden email]> wrote:
>>> On Fri, Jan 12, 2018 at 11:43 AM, amul sul <[hidden email]> wrote:
[....]
> I think you can manually (via debugger) hit this by using
> PUBLICATION/SUBSCRIPTION syntax for logical replication.  I think what
> you need to do is in node-1, create a partitioned table and subscribe
> it on node-2.  Now, perform an Update on node-1, then stop the logical
> replication worker before it calls heap_lock_tuple.  Now, in node-2,
> update the same row such that it moves the row.  Now, continue the
> logical replication worker.  I think it should hit your new code, if
> not then we need to think of some other way.
>

I am able to hit the new LOG message using the above steps.  Thanks a lot for
the step-by-step guide; I really needed that.

One strange behaviour I found in logical replication, which is reproducible
without the attached patch as well: when I updated on node2 while holding a
breakpoint before the heap_lock_tuple call in the replication worker, I can
see that a duplicate row was inserted on node2, see this:

== NODE 1 ==

postgres=# insert into foo values(1, 'initial insert');
INSERT 0 1

postgres=# select tableoid::regclass, * from foo;
 tableoid | a |       b
----------+---+----------------
 foo1     | 1 | initial insert
(1 row)


=== NODE 2 ==

postgres=# select tableoid::regclass, * from foo;
 tableoid | a |       b
----------+---+----------------
 foo1     | 1 | initial insert
(1 row)


== NODE 1 ==

postgres=# update foo set a=2, b='node1_update' where a=1;
UPDATE 1

<---- BREAK POINT BEFORE heap_lock_tuple IN replication worker  --->


== NODE 2 ==

postgres=#  update foo set a=2, b='node2_update' where a=1;

<---- RELEASE BREAK POINT --->

postgres=# 2018-02-02 12:35:45.050 IST [91786] LOG:  tuple to be
locked was already moved to another partition due to concurrent
update, retrying

postgres=# select tableoid::regclass, * from foo;
 tableoid | a |      b
----------+---+--------------
 foo2     | 2 | node2_update
 foo2     | 2 | node1_update
(2 rows)


== NODE 1 ==

postgres=# select tableoid::regclass, * from foo;
 tableoid | a |      b
----------+---+--------------
 foo2     | 2 | node1_update
(1 row)

I am thinking of reporting this in a separate thread, but I am not sure
whether this is already known behaviour or not.

== schema to reproduce above case ==
-- node1
create table foo (a int2, b text) partition by list (a);
create table foo1 partition of foo for values IN (1);
create table foo2 partition of foo for values IN (2);
insert into foo values(1, 'initial insert');
CREATE PUBLICATION update_row_mov_pub FOR ALL TABLES;
ALTER TABLE foo REPLICA IDENTITY FULL;
ALTER TABLE foo1 REPLICA IDENTITY FULL;
ALTER TABLE foo2 REPLICA IDENTITY FULL;

-- node2
create table foo (a int2, b text) partition by list (a);
create table foo1 partition of foo for values IN (1);
create table foo2 partition of foo for values IN (2);
CREATE SUBSCRIPTION update_row_mov_sub CONNECTION 'host=localhost
dbname=postgres' PUBLICATION update_row_mov_pub;
== END==

Updated patch attached -- with corrected changes in execReplication.c.
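
For reference, the relevant execReplication.c handling after heap_lock_tuple
now looks roughly like this (a sketch of the RelationFindReplTupleByIndex
case; RelationFindReplTupleSeq gets the same treatment -- see the attached
patch for the exact hunks):

    switch (res)
    {
        case HeapTupleMayBeUpdated:
            break;
        case HeapTupleUpdated:
            /* XXX: Improve handling here */
            if (!BlockNumberIsValid(BlockIdGetBlockNumber(&((hufd.ctid).ip_blkid))))
                ereport(LOG,
                        (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                         errmsg("tuple to be locked was already moved to another partition due to concurrent update, retrying")));
            else
                ereport(LOG,
                        (errmsg("concurrent update, retrying")));
            goto retry;
        case HeapTupleInvisible:
            elog(ERROR, "attempted to lock invisible tuple");
        default:
            elog(ERROR, "unexpected heap_lock_tuple status: %u", res);
            break;
    }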

Regards,
Amul Sul

0001-Invalidate-ip_blkid-v5-wip2.patch (17K) Download Attachment
0002-isolation-tests-v3.patch (12K) Download Attachment

Re: Restrict concurrent update/delete with UPDATE of partition key

Robert Haas
In reply to this post by akapila
On Fri, Jan 26, 2018 at 1:28 AM, Amit Kapila <[hidden email]> wrote:
> So, this means that in case of logical replication, it won't generate
> the error this patch is trying to introduce.  I think if we want to
> handle this we need some changes in WAL and logical decoding as well.
>
> Robert, others, what do you think?  I am not very comfortable leaving
> this unaddressed, if we don't want to do anything about it, at least
> we should document it.

As I said on the other thread, I'm not sure how reasonable it really
is to try to do anything about this.  For both this issue and the one you
raised there, I think we'd need to introduce a new WAL record type that
represents a delete from one table and an insert into another that
should be considered as a single operation.  I'm not keen on that idea,
but you can make an argument that it's the Right Thing To Do.  I would
be more inclined, at least for v11, to just document that the
delete+insert will be replayed separately on replicas.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Restrict concurrent update/delete with UPDATE of partition key

akapila
In reply to this post by amul sul
On Fri, Feb 2, 2018 at 2:11 PM, amul sul <[hidden email]> wrote:

> On Fri, Jan 26, 2018 at 11:58 AM, Amit Kapila <[hidden email]> wrote:
> [....]
>> I think you can manually (via debugger) hit this by using
>> PUBLICATION/SUBSCRIPTION syntax for logical replication.  I think what
>> you need to do is in node-1, create a partitioned table and subscribe
>> it on node-2.  Now, perform an Update on node-1, then stop the logical
>> replication worker before it calls heap_lock_tuple.  Now, in node-2,
>> update the same row such that it moves the row.  Now, continue the
>> logical replication worker.  I think it should hit your new code, if
>> not then we need to think of some other way.
>>
>
> I am able to hit the change log using above steps. Thanks a lot for the
> step by step guide, I really needed that.
>
> One strange behavior I found in the logical replication which is reproducible
> without attached patch as well -- when I have updated on node2 by keeping
> breakpoint before the heap_lock_tuple call in replication worker, I can see
> a duplicate row was inserted on the node2, see this:
>
..
>
> I am thinking to report this in a separate thread, but not sure if
> this is already known behaviour or not.
>

I think it is worth discussing this behavior in a separate thread.
However, if possible, try to reproduce it without partitioning and
then report it.

>
> Updated patch attached -- correct changes in execReplication.c.
>

Your changes look correct to me.

I wonder what the behavior of this patch will be with
wal_consistency_checking [1].  I think it will generate a failure, as
there is nothing in the WAL to replay it.  Can you try it once?  If we see
a failure with the WAL consistency checker, then we need to think about
whether (a) we want to deal with it by logging this information, (b) we
want to mask it, or (c) something else.


[1] -  https://www.postgresql.org/docs/devel/static/runtime-config-developer.html


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Restrict concurrent update/delete with UPDATE of partition key

amul sul
On Sun, Feb 4, 2018 at 10:47 AM, Amit Kapila <[hidden email]> wrote:

> On Fri, Feb 2, 2018 at 2:11 PM, amul sul <[hidden email]> wrote:
>> On Fri, Jan 26, 2018 at 11:58 AM, Amit Kapila <[hidden email]> wrote:
>> [....]
>>> I think you can manually (via debugger) hit this by using
>>> PUBLICATION/SUBSCRIPTION syntax for logical replication.  I think what
>>> you need to do is in node-1, create a partitioned table and subscribe
>>> it on node-2.  Now, perform an Update on node-1, then stop the logical
>>> replication worker before it calls heap_lock_tuple.  Now, in node-2,
>>> update the same row such that it moves the row.  Now, continue the
>>> logical replication worker.  I think it should hit your new code, if
>>> not then we need to think of some other way.
>>>
>>
>> I am able to hit the change log using above steps. Thanks a lot for the
>> step by step guide, I really needed that.
>>
>> One strange behavior I found in the logical replication which is reproducible
>> without attached patch as well -- when I have updated on node2 by keeping
>> breakpoint before the heap_lock_tuple call in replication worker, I can see
>> a duplicate row was inserted on the node2, see this:
>>
> ..
>>
>> I am thinking to report this in a separate thread, but not sure if
>> this is already known behaviour or not.
>>
>
> I think it is worth to discuss this behavior in a separate thread.
> However, if possible, try to reproduce it without partitioning and
> then report it.
>
Logical replication behaviour for a normal table is as expected; this happens
only with a partitioned table.  I will start a new thread for this on -hackers.

>>
>> Updated patch attached -- correct changes in execReplication.c.
>>
>
> Your changes look correct to me.
>
> I wonder what will be the behavior of this patch with
> wal_consistency_checking [1].  I think it will generate a failure as
> there is nothing in WAL to replay it.  Can you once try it?  If we see
> a failure with wal consistency checker, then we need to think whether
> (a) we want to deal with it by logging this information, or (b) do we
> want to mask it or (c) something else?
>
>
> [1] -  https://www.postgresql.org/docs/devel/static/runtime-config-developer.html
>
Yes, you are correct; the standby stopped with the following error:

 FATAL:  inconsistent page found, rel 1663/13260/16390, forknum 0, blkno 0
 CONTEXT:  WAL redo at 0/3002510 for Heap/DELETE: off 6 KEYS_UPDATED
 LOG:  startup process (PID 22791) exited with exit code 1
 LOG:  terminating any other active server processes
 LOG:  database system is shut down

I have tested a warm standby replication setup using the attached script.
Without the wal_consistency_checking setting it works fine, and data from
master to standby is replicated as expected.  If this guarantee is enough,
then I think we could skip this error in the WAL consistency check for such
deleted tuples (I guess option (b) that you suggested) -- thoughts?
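
If we go with masking (option b), I imagine something along these lines
inside heap_mask()'s per-tuple loop in heapam.c -- purely a sketch, under the
assumption that the un-WAL-logged ctid change is what trips the checker;
heap_xlog_delete() on the standby sets t_ctid back to the tuple's own TID, so
normalizing the master's marker to the self-pointer should make the masked
pages compare equal:

    if (ItemIdIsNormal(iid))
    {
        HeapTupleHeader page_htup = (HeapTupleHeader) PageGetItem(page, iid);

        /* ... existing hint-bit / cid / speculative-token masking ... */

        /*
         * The old tuple of a row moved to another partition carries
         * InvalidBlockNumber in t_ctid on the master, while redo of the
         * DELETE record leaves t_ctid pointing at the tuple itself.  Mask
         * the difference by restoring the self-pointer before comparison.
         */
        if (!BlockNumberIsValid(BlockIdGetBlockNumber(&(page_htup->t_ctid.ip_blkid))))
            ItemPointerSet(&page_htup->t_ctid, blkno, off);
    }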

test.sh (6K) Download Attachment

Re: Restrict concurrent update/delete with UPDATE of partition key

amul sul
On Tue, Feb 6, 2018 at 7:05 PM, amul sul <[hidden email]> wrote:

> On Sun, Feb 4, 2018 at 10:47 AM, Amit Kapila <[hidden email]> wrote:
>> On Fri, Feb 2, 2018 at 2:11 PM, amul sul <[hidden email]> wrote:
>>> On Fri, Jan 26, 2018 at 11:58 AM, Amit Kapila <[hidden email]> wrote:
>>> [....]
>>>> I think you can manually (via debugger) hit this by using
>>>> PUBLICATION/SUBSCRIPTION syntax for logical replication.  I think what
>>>> you need to do is in node-1, create a partitioned table and subscribe
>>>> it on node-2.  Now, perform an Update on node-1, then stop the logical
>>>> replication worker before it calls heap_lock_tuple.  Now, in node-2,
>>>> update the same row such that it moves the row.  Now, continue the
>>>> logical replication worker.  I think it should hit your new code, if
>>>> not then we need to think of some other way.
>>>>
>>>
>>> I am able to hit the change log using above steps. Thanks a lot for the
>>> step by step guide, I really needed that.
>>>
>>> One strange behavior I found in the logical replication which is reproducible
>>> without attached patch as well -- when I have updated on node2 by keeping
>>> breakpoint before the heap_lock_tuple call in replication worker, I can see
>>> a duplicate row was inserted on the node2, see this:
>>>
>> ..
>>>
>>> I am thinking to report this in a separate thread, but not sure if
>>> this is already known behaviour or not.
>>>
>>
>> I think it is worth to discuss this behavior in a separate thread.
>> However, if possible, try to reproduce it without partitioning and
>> then report it.
>>
> Logical replication behavior for the normal table is as expected, this happens
> only with partition table, will start a new thread for this on hacker.
>
Posted on hackers:
https://postgr.es/m/CAAJ_b94bYxLsX0erZXVH-anQPbWqcYUPWX4xVRa1YJY=Ph60ZQ@...

Regards,
Amul Sul
