locking [user] catalog tables vs 2pc vs logical rep

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
55 messages Options
123
Reply | Threaded
Open this post in threaded view
|

locking [user] catalog tables vs 2pc vs logical rep

Andres Freund
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

akapila
On Tue, Feb 23, 2021 at 3:58 AM Andres Freund <[hidden email]> wrote:
>
>
> At first this seems to be a significant issue. But on the other hand, if
> you were to shut the cluster down in this situation (or disconnect all
> sessions), you have broken cluster on your hand - without logical
> decoding being involved. As it turns out, we need to read pg_class to
> log in...  And I can't remember this being reported to be a problem?
>

I don't remember seeing such a report but I think that is a reason
enough (leaving aside logical decoding of 2PC) to either disallow
locking catalog tables or at least document it in some way.

>
> Perhaps all that we need to do is to disallow 2PC prepare if [user]
> catalog tables have been locked exclusively?
>

Right, and we have discussed this during development [1][2]. We
thought either we disallow this operation or will document it. I
thought of doing this along with a core-implementation of Prepare
waiting to get it logically replicated. But at this stage, I think if
the user wants he can do a similar thing in his application where
after prepare it can wait for the transaction to get logically
replicated (if they have their own replication solution based on
logical decoding) and then decide whether to rollback or commit. So,
maybe we should either disallow this operation or at least document
it. What do you think?

> Similar to how we're
> disallowing preparing tables with temp table access.
>

Yeah, we disallow other things like pg_export_snapshot as well in a
Prepared transaction, so we can probably disallow this operation as
well.

[1] - https://www.postgresql.org/message-id/CAA4eK1JeeXOwD6rYnhSOYk5YN-fUTmxe1GkTpN2-BvgnKN6gZg%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAMGcDxf83P5SGnGH52=_0wRP9pO6uRWCMRwAA0nxKtZvir2_vQ@...

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

Andres Freund
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

akapila
On Tue, Feb 23, 2021 at 9:09 AM Andres Freund <[hidden email]> wrote:

>
> On 2021-02-23 08:56:39 +0530, Amit Kapila wrote:
> > On Tue, Feb 23, 2021 at 3:58 AM Andres Freund <[hidden email]> wrote:
> > > Perhaps all that we need to do is to disallow 2PC prepare if [user]
> > > catalog tables have been locked exclusively?
>
> > Right, and we have discussed this during development [1][2].
>
> I remember bringing it up before as well... Issues like this really need
> to be mentioned as explicit caveats at least somewhere in the code and
> commit message.
>

Okay, so is it sufficient to add comments in code, or do we want to
add something in docs? I am not completely sure if we need to add in
docs till we have core-implementation of prepare waiting to get
logically replicated.

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

Andres Freund
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

akapila
On Tue, Feb 23, 2021 at 9:33 AM Andres Freund <[hidden email]> wrote:

>
> On 2021-02-23 09:24:18 +0530, Amit Kapila wrote:
> > Okay, so is it sufficient to add comments in code, or do we want to
> > add something in docs? I am not completely sure if we need to add in
> > docs till we have core-implementation of prepare waiting to get
> > logically replicated.
>
> There's plenty users of logical decoding that aren't going through the
> normal replication mechanism - so they can hit this. So I think it needs
> to be documented somewhere.
>
As per discussion, the attached patch updates both docs and comments
in the code.

--
With Regards,
Amit Kapila.

v1-0001-Update-the-docs-and-comments-for-decoding-of-prep.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

akapila
On Tue, Feb 23, 2021 at 12:00 PM Amit Kapila <[hidden email]> wrote:

>
> On Tue, Feb 23, 2021 at 9:33 AM Andres Freund <[hidden email]> wrote:
> >
> > On 2021-02-23 09:24:18 +0530, Amit Kapila wrote:
> > > Okay, so is it sufficient to add comments in code, or do we want to
> > > add something in docs? I am not completely sure if we need to add in
> > > docs till we have core-implementation of prepare waiting to get
> > > logically replicated.
> >
> > There's plenty users of logical decoding that aren't going through the
> > normal replication mechanism - so they can hit this. So I think it needs
> > to be documented somewhere.
> >
>
> As per discussion, the attached patch updates both docs and comments
> in the code.
>

I have pushed this patch.

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

vignesh C
In reply to this post by Andres Freund
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

Ajin Cherian
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

vignesh C
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

vignesh C
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

akapila
On Tue, Apr 20, 2021 at 9:57 AM vignesh C <[hidden email]> wrote:
>
> This similar problem exists in case of synchronous replication setup
> having synchronous_standby_names referring to the subscriber, when we
> do the steps "begin;lock pg_class; insert into test1 values(10);
> commit". In this case while decoding of commit, the commit will wait
> while trying to acquire a lock on pg_class relation,
>

So, this appears to be an existing caveat of synchronous replication.
If that is the case, I am not sure if it is a good idea to just block
such ops for the prepared transaction. Also, what about other
operations which acquire an exclusive lock on [user]_catalog_tables
like:
cluster pg_trigger using pg_class_oid_index, similarly cluster on any
user_catalog_table, then the other problematic operation could
truncate of user_catalog_table as is discussed in another thread [1].
I think all such operations can block even with synchronous
replication. I am not sure if we can create examples for all cases
because for ex. we don't have use of user_catalog_tables in-core but
maybe for others, we can try to create examples and see what happens?

If all such operations can block for synchronous replication and
prepared transactions replication then we might want to document them
as caveats at page:
https://www.postgresql.org/docs/devel/logicaldecoding-synchronous.html
and then also give the reference for these caveats at prepared
transactions page:https://www.postgresql.org/docs/devel/logicaldecoding-two-phase-commits.html

What do you think?

As this appears to be an existing caveat of logical replication, I
have added the Petr and Peter E in this email.

[1] - https://www.postgresql.org/message-
id/OSBPR01MB4888314C70DA6B112E32DD6AED2B9%40OSBPR01MB4888.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

Michael Paquier-2
In reply to this post by Andres Freund
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

Michael Paquier-2
In reply to this post by akapila
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

RE: locking [user] catalog tables vs 2pc vs logical rep

osumi.takamichi@fujitsu.com
In reply to this post by akapila
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

akapila
On Tue, May 25, 2021 at 1:43 PM [hidden email]
<[hidden email]> wrote:

>
> On Monday, May 24, 2021 1:33 PM Amit Kapila <[hidden email]> wrote:
> > On Tue, Apr 20, 2021 at 9:57 AM vignesh C <[hidden email]> wrote:
> > >
> > > This similar problem exists in case of synchronous replication setup
> > > having synchronous_standby_names referring to the subscriber, when we
> > > do the steps "begin;lock pg_class; insert into test1 values(10);
> > > commit". In this case while decoding of commit, the commit will wait
> > > while trying to acquire a lock on pg_class relation,
> > >
> >
> > So, this appears to be an existing caveat of synchronous replication.
> > If that is the case, I am not sure if it is a good idea to just block such ops for the
> > prepared transaction. Also, what about other operations which acquire an
> > exclusive lock on [user]_catalog_tables
> > like:
> > cluster pg_trigger using pg_class_oid_index, similarly cluster on any
> > user_catalog_table, then the other problematic operation could truncate of
> > user_catalog_table as is discussed in another thread [1].
> > I think all such operations can block even with synchronous replication. I am not
> > sure if we can create examples for all cases because for ex. we don't have use
> > of user_catalog_tables in-core but maybe for others, we can try to create
> > examples and see what happens?
> >
> > If all such operations can block for synchronous replication and prepared
> > transactions replication then we might want to document them as caveats at
> > page:
> > https://www.postgresql.org/docs/devel/logicaldecoding-synchronous.html
> > and then also give the reference for these caveats at prepared transactions
> > page:https://www.postgresql.org/docs/devel/logicaldecoding-two-phase-com
> > mits.html
> >
> > What do you think?
> I've checked the behavior of CLUSTER command
> in synchronous mode, one of the examples above, as well.
>
> IIUC, you meant pg_class, and
> the deadlock happens when I run cluster commands on pg_class using its index in synchronous mode.
> The command I used is "BEGIN; CLUSTER pg_class USING pg_class_oid_index; END;".
> This deadlock comes from 2 processes, the backend to wait synchronization of the standby
> and the walsender process which wants to take a lock on pg_class.
>

Have you tried to prepare this transaction? That won't be allowed. I
wanted to see if we can generate some scenarios where it is blocked
for prepared xacts decoding and for synchronous replication.

> Therefore, I think we need to do something, at least documentation fix,
> as you mentioned.
>

Yes, I think that is true.

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

akapila
In reply to this post by Michael Paquier-2
On Tue, May 25, 2021 at 12:40 PM Michael Paquier <[hidden email]> wrote:

>
> On Mon, May 24, 2021 at 10:03:01AM +0530, Amit Kapila wrote:
> > So, this appears to be an existing caveat of synchronous replication.
> > If that is the case, I am not sure if it is a good idea to just block
> > such ops for the prepared transaction. Also, what about other
> > operations which acquire an exclusive lock on [user]_catalog_tables
> > like:
> > cluster pg_trigger using pg_class_oid_index, similarly cluster on any
> > user_catalog_table, then the other problematic operation could
> > truncate of user_catalog_table as is discussed in another thread [1].
> > I think all such operations can block even with synchronous
> > replication. I am not sure if we can create examples for all cases
> > because for ex. we don't have use of user_catalog_tables in-core but
> > maybe for others, we can try to create examples and see what happens?
> >
> > If all such operations can block for synchronous replication and
> > prepared transactions replication then we might want to document them
> > as caveats at page:
> > https://www.postgresql.org/docs/devel/logicaldecoding-synchronous.html
> > and then also give the reference for these caveats at prepared
> > transactions page:https://www.postgresql.org/docs/devel/logicaldecoding-two-phase-commits.html
> >
> > What do you think?
>
> It seems to me that the 2PC issues on catalog tables and the issues
> related to logical replication in synchonous mode are two distinct
> things that need to be fixed separately.
>

Fair enough. But the way we were looking at them as they will also
block (lead to deadlock) for logical replication of prepared
transactions and also logical replication in synchonous mode without
prepared transactions. Now, if we want to deal with the 2PC issues
separately that should be fine as well. However, for that we need to
see which all operations we want to block on [user]_catalog_tables.
The first one is lock command, then there are other operations like
Cluster which take exclusive lock on system catalog tables and we
allow them to be part of prepared transactions (example Cluster
pg_trigger using pg_trigger_oid_index;), another kind of operation is
Truncate on user_catalog_tables. Now, some of these might not allow
connecting after restart so we might need to think whether we want to
prohibit all such operations or only some of them.

>
> The second issue with logical replication is still disruptive, but it
> looks to me more like a don't-do-it issue, and documenting the caveats
> sounds fine enough.
>

Right, that is what I also think.

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

Tom Lane-2
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

Petr Jelinek-5
In reply to this post by akapila
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: locking [user] catalog tables vs 2pc vs logical rep

vignesh C
In reply to this post by Michael Paquier-2
CONTENTS DELETED
The author has deleted this message.
123