Question about StartLogicalReplication() error path

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

Question about StartLogicalReplication() error path

Jeff Davis-8
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

akapila
On Fri, Jun 11, 2021 at 7:38 AM Jeff Davis <[hidden email]> wrote:

>
> StartLogicalReplication() calls CreateDecodingContext(), which says:
>
>   else if (start_lsn < slot->data.confirmed_flush)
>   {
>     /*
>      * It might seem like we should error out in this case, but it's
>      * pretty common for a client to acknowledge a LSN it doesn't
> have to
>      * do anything for, and thus didn't store persistently, because the
>      * xlog records didn't result in anything relevant for logical
>      * decoding. Clients have to be able to do that to support
> synchronous
>      * replication.
>      */
>      ...
>      start_lsn = slot->data.confirmed_flush;
>   }
>
..
..
>
> I'm not sure I understand the comment overall. Why would the client
> request something that it has already acknowledged,
>

Because sometimes clients don't have to do anything for xlog records.
One example is WAL for DDL where logical decoding didn't produce
anything for the client but later with keepalive we send the LSN of
WAL where DDL has finished and the client just responds with the
position sent by the server as it doesn't have any other pending
transactions.

> and why would the
> server override that and just advance to the confirmed_lsn?
>

I think because there is no need to process the WAL that has been
confirmed by the client. Do you see any problems with this scheme?

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

Jeff Davis-8
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

Robert Haas
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

Andres Freund
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

Jeff Davis-8
In reply to this post by Robert Haas
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

Jeff Davis-8
In reply to this post by Andres Freund
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

Andres Freund
In reply to this post by Jeff Davis-8
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

Jeff Davis-8
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

Robert Haas
In reply to this post by Jeff Davis-8
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

Andres Freund
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

Andres Freund
In reply to this post by Jeff Davis-8
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

akapila
In reply to this post by Jeff Davis-8
On Fri, Jun 11, 2021 at 11:52 AM Jeff Davis <[hidden email]> wrote:

>
> On Fri, 2021-06-11 at 10:13 +0530, Amit Kapila wrote:
> > I think because there is no need to process the WAL that has been
> > confirmed by the client. Do you see any problems with this scheme?
>
> Several:
>
> * Replication setups are complex, and it can be easy to misconfigure
> something or have a bug in some control code. An error is valuable to
> detect the problem closer to the source.
>
> * There are plausible configurations where things could go badly wrong.
> For instance, if you are storing the decoded data in another postgres
> server with syncrhonous_commit=off, and acknowledging LSNs before they
> are durable. A crash of the destination system would be consistent, but
> it would be missing some data earlier than the confirmed_flush_lsn. The
> client would then request the data starting at its stored lsn progress
> value, but the server would skip ahead to the confirmed_flush_lsn;
> silently missing data.
>

AFAIU, currently, in such a case, the subscriber (client) won't
advance the flush location (confirmed_flush_lsn). So, we won't lose
any data.

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

Jeff Davis-8
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

Robert Haas
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

Jeff Davis-8
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

Kyotaro Horiguchi-4
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

akapila
In reply to this post by Jeff Davis-8
On Tue, Jun 15, 2021 at 3:21 AM Jeff Davis <[hidden email]> wrote:
>
> On Mon, 2021-06-14 at 13:13 -0400, Robert Haas wrote:
> > I'm happy to hear other opinions, but I think I would be inclined to
> > vote in favor of #1 and/or #2 but against #3.
>
> What about upgrading it to, say, LOG? It seems like it would happen
> pretty infrequently, and in the event something strange happens, might
> rule out some possibilities.
>

I don't see any problem with changing it to LOG if that helps
especially because it won't happen too often.

--
With Regards,
Amit Kapila.


Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

Jeff Davis-8
In reply to this post by Kyotaro Horiguchi-4
CONTENTS DELETED
The author has deleted this message.
Reply | Threaded
Open this post in threaded view
|

Re: Question about StartLogicalReplication() error path

akapila
On Thu, Jun 17, 2021 at 4:25 AM Jeff Davis <[hidden email]> wrote:

>
> On Tue, 2021-06-15 at 15:19 +0900, Kyotaro Horiguchi wrote:
> > I don't think the message is neded, but I don't oppose it as far as
> > the level is LOG and the messages were changed as something like
> > this:
> >
> >
> > -               elog(DEBUG1, "cannot stream from %X/%X, minimum is
> > %X/%X, forwarding",
> > +               elog(LOG, "%X/%X has been already streamed,
> > forwarding to %X/%X",
> >
> > FWIW, I most prefer #1. I see #2 as optional. and see #3 as the
> > above.
>
> Attached.
>

LGTM.

--
With Regards,
Amit Kapila.