Logical replication can be broken by domain constraint with NOT VALID option

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Logical replication can be broken by domain constraint with NOT VALID option

Andrey Lepikhov
Hi,

During patch development I ran into a small problem (see attachment,
fail_replication.sh):
1. We have a table with logical replication to another node.
2. On the master and replica add such "NOT VALID" domain constraint on
the table  that some tuples  violates the constraint.
3. UPDATE the table: set value of the tuple that violates constraint to
correct value.
4. That's all!

The reason for this problem is that on UPDATE walsender sends old tuple
value (that violates the constraint) with new version (satisfied the
constraint).
Replication worker at replica node restores slot from transfer
representation. During this process domain checking constraint and
returns an ERROR.
Because we can't apply WAL record of the UPDATE command, logical
replication will be stopped at all.
As I understand, this problem can be reproduced in all postgres versions
with logical replication feature.
This problem can be solved by many ways and approaches. I wrote the
patch to solve this problem (see in attachment) by the shortest way.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

fail_replication.sh (1K) Download Attachment
0001-Fix-the-problem-of-logical-replication-with-domain-N.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Logical replication can be broken by domain constraint with NOT VALID option

Tom Lane-2
Andrey Lepikhov <[hidden email]> writes:
> During patch development I ran into a small problem (see attachment,
> fail_replication.sh):
> 1. We have a table with logical replication to another node.
> 2. On the master and replica add such "NOT VALID" domain constraint on
> the table  that some tuples  violates the constraint.
> 3. UPDATE the table: set value of the tuple that violates constraint to
> correct value.
> 4. That's all!

> The reason for this problem is that on UPDATE walsender sends old tuple
> value (that violates the constraint) with new version (satisfied the
> constraint).
> Replication worker at replica node restores slot from transfer
> representation. During this process domain checking constraint and
> returns an ERROR.

I'm not sure this is something we should attempt to fix.  There are
an infinite number of ways you can break logical replication by
presenting it with inconsistent data, and that's really what you've
done here.

> This problem can be solved by many ways and approaches. I wrote the
> patch to solve this problem (see in attachment) by the shortest way.

That patch is certainly utterly unacceptable.  It'd allow the
receipient to accept data that violates the domain constraint.

The situation you're describing would probably best be handled by
not adding the constraint on the replica side until all the
bad data has been corrected (and replicated).

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Logical replication can be broken by domain constraint with NOT VALID option

Andrey Lepikhov


On 03/11/2019 20:42, Tom Lane wrote:

> Andrey Lepikhov <[hidden email]> writes:
>> The reason for this problem is that on UPDATE walsender sends old tuple
>> value (that violates the constraint) with new version (satisfied the
>> constraint).
>> Replication worker at replica node restores slot from transfer
>> representation. During this process domain checking constraint and
>> returns an ERROR.
>
> I'm not sure this is something we should attempt to fix.  There are
> an infinite number of ways you can break logical replication by
> presenting it with inconsistent data, and that's really what you've
> done here.
This problem reproduced by standard way from the documentation. I assume
this inconsistency option is allowed by SQL standard because it has a
practical usage.

>
>> This problem can be solved by many ways and approaches. I wrote the
>> patch to solve this problem (see in attachment) by the shortest way.
>
> That patch is certainly utterly unacceptable.  It'd allow the
> receipient to accept data that violates the domain constraint.

If this is the only reason, I propose a new version of the patch (see in
attachment). It is satisfy the "Paranoid safety" rule.
>
> The situation you're describing would probably best be handled by
> not adding the constraint on the replica side until all the
> bad data has been corrected (and replicated).

On any PostgreSQL-based multimaster system, this will be a problem.

--
regards,
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



v2-0001-Fix-the-problem-of-logical-replication-with-domain-N.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Logical replication can be broken by domain constraint with NOT VALID option

Euler Taveira
Em dom., 3 de nov. de 2019 às 23:33, Andrey Lepikhov
<[hidden email]> escreveu:

>
> On 03/11/2019 20:42, Tom Lane wrote:
> > Andrey Lepikhov <[hidden email]> writes:
> >> The reason for this problem is that on UPDATE walsender sends old tuple
> >> value (that violates the constraint) with new version (satisfied the
> >> constraint).
> >> Replication worker at replica node restores slot from transfer
> >> representation. During this process domain checking constraint and
> >> returns an ERROR.
> >
> > I'm not sure this is something we should attempt to fix.  There are
> > an infinite number of ways you can break logical replication by
> > presenting it with inconsistent data, and that's really what you've
> > done here.
>
> This problem reproduced by standard way from the documentation. I assume
> this inconsistency option is allowed by SQL standard because it has a
> practical usage.
>
Could you point out the problem in the documentation?

> >
> >> This problem can be solved by many ways and approaches. I wrote the
> >> patch to solve this problem (see in attachment) by the shortest way.
> >
> > That patch is certainly utterly unacceptable.  It'd allow the
> > receipient to accept data that violates the domain constraint.
>
> If this is the only reason, I propose a new version of the patch (see in
> attachment). It is satisfy the "Paranoid safety" rule.
> >
I don't think that is acceptable either. If you have different schemas
(even for a small period of time), you should handle it dropping and
recreating the constraints. Logical replication is far from a complete
feature. There should be cases that someone wants to enforce even the
FK constraints in the subscriber. I certainly wouldn't like to open
that can of worms. Relaxing constraints could lead to inconsistent
datasets across nodes. If you want to accept constraint violation,
drop the constraints.

> > The situation you're describing would probably best be handled by
> > not adding the constraint on the replica side until all the
> > bad data has been corrected (and replicated).
>
> On any PostgreSQL-based multimaster system, this will be a problem.
>
... if you do not replicate DDLs in the same order it occurs or if you
have different schemas.


--
   Euler Taveira                                   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Reply | Threaded
Open this post in threaded view
|

Re: Logical replication can be broken by domain constraint with NOT VALID option

Andrey Lepikhov


On 05/11/2019 20:21, Euler Taveira wrote:
> Em dom., 3 de nov. de 2019 às 23:33, Andrey Lepikhov
> <[hidden email]> escreveu:
>> If this is the only reason, I propose a new version of the patch (see in
>> attachment). It is satisfy the "Paranoid safety" rule.
> I don't think that is acceptable either. If you have different schemas
> (even for a small period of time), you should handle it dropping and
> recreating the constraints.

Changing schema is a big deal. But adding a constraint with "not valid"
option can be used frequently. May be for change phone numbers format,
for example.

> Logical replication is far from a complete
> feature. There should be cases that someone wants to enforce even the
> FK constraints in the subscriber. I certainly wouldn't like to open
> that can of worms. Relaxing constraints could lead to inconsistent
> datasets across nodes. If you want to accept constraint violation,
> drop the constraints.

May be logical replication is incomplete. But it is no argument to not
fix an errors that we found.
In v2 version of the patch constraints are suppressed only for old
version of the tuple that used for search in the heap and can't be
applied. In this sense we do not relaxing any constraints.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company