> StartLogicalReplication() calls CreateDecodingContext(), which says:
>
>     else if (start_lsn < slot->data.confirmed_flush)
>          * It might seem like we should error out in this case, but it's
>          * pretty common for a client to acknowledge a LSN it doesn't have to
>          * do anything for, and thus didn't store persistently, because the
>          * xlog records didn't result in anything relevant for logical
>          * decoding. Clients have to be able to do that to support
>          * replication.
>         start_lsn = slot->data.confirmed_flush;
> I'm not sure I understand the comment overall. Why would the client
> request something that it has already acknowledged,
Because sometimes clients don't have to do anything for certain xlog
records. One example is WAL for DDL: logical decoding produces nothing
for the client, but later, via a keepalive, we send the LSN of the WAL
where the DDL finished, and the client just responds with the position
sent by the server as it doesn't have anything else pending.
> and why would the
> server override that and just advance to the confirmed_lsn?
I think because there is no need to process the WAL that has been
confirmed by the client. Do you see any problems with this scheme?
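To make the behavior under discussion concrete, the clamp in CreateDecodingContext() can be sketched roughly as follows (a simplified model; clamp_start_lsn is a hypothetical helper, not an actual PostgreSQL function):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Simplified model of the clamp in CreateDecodingContext(): if the
 * client asks to start decoding below what it has already confirmed,
 * the server silently skips forward to confirmed_flush.
 */
static XLogRecPtr
clamp_start_lsn(XLogRecPtr start_lsn, XLogRecPtr confirmed_flush)
{
    if (start_lsn < confirmed_flush)
        return confirmed_flush;   /* already-confirmed WAL is skipped */
    return start_lsn;
}
```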
> On Fri, 2021-06-11 at 10:13 +0530, Amit Kapila wrote:
> > I think because there is no need to process the WAL that has been
> > confirmed by the client. Do you see any problems with this scheme?
> * Replication setups are complex, and it can be easy to misconfigure
> something or have a bug in some control code. An error is valuable to
> detect the problem closer to the source.
> * There are plausible configurations where things could go badly wrong.
> For instance, if you are storing the decoded data in another postgres
> server with synchronous_commit=off, and acknowledging LSNs before they
> are durable. A crash of the destination system would be consistent, but
> it would be missing some data earlier than the confirmed_flush_lsn. The
> client would then request the data starting at its stored lsn progress
> value, but the server would skip ahead to the confirmed_flush_lsn;
> silently missing data.
AFAIU, currently, in such a case, the subscriber (client) won't
advance the flush location (confirmed_flush_lsn). So, we won't lose
any data.
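The failure scenario above hinges on the client's feedback rule: a safe client only acknowledges positions it has actually made durable. A minimal sketch of that rule (choose_flush_position is a hypothetical helper, not part of any real subscriber code):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Hypothetical sketch of a safe client feedback rule: the flush
 * position reported back to the server must never exceed what the
 * client has made durable, even if more WAL has been received.
 */
static XLogRecPtr
choose_flush_position(XLogRecPtr durable_lsn, XLogRecPtr received_lsn)
{
    return (durable_lsn < received_lsn) ? durable_lsn : received_lsn;
}
```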
On Tue, Jun 15, 2021 at 3:21 AM Jeff Davis <[hidden email]> wrote:
> On Mon, 2021-06-14 at 13:13 -0400, Robert Haas wrote:
> > I'm happy to hear other opinions, but I think I would be inclined to
> > vote in favor of #1 and/or #2 but against #3.
> What about upgrading it to, say, LOG? It seems like it would happen
> pretty infrequently, and in the event something strange happens, might
> rule out some possibilities.
I don't see any problem with changing it to LOG if that helps,
especially because it won't happen too often.
> On Tue, 2021-06-15 at 15:19 +0900, Kyotaro Horiguchi wrote:
> > I don't think the message is needed, but I don't oppose it as far as
> > the level is LOG and the message is changed to something like this:
> >
> > - elog(DEBUG1, "cannot stream from %X/%X, minimum is %X/%X, forwarding",
> > + elog(LOG, "%X/%X has been already streamed, forwarding to %X/%X",
> >
> > FWIW, I most prefer #1. I see #2 as optional, and see #3 only in the
> > form above.
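For reference, the message text proposed above would render like this, using the usual %X/%X LSN formatting (format_forward_message is a hypothetical stand-in; in the server this would simply be the elog(LOG, ...) call itself):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t XLogRecPtr;

/*
 * Build the proposed LOG message, splitting the 64-bit LSN into the
 * conventional high/low %X/%X pair. Hypothetical helper for
 * illustration only.
 */
static void
format_forward_message(char *buf, size_t len,
                       XLogRecPtr start_lsn, XLogRecPtr confirmed_flush)
{
    snprintf(buf, len,
             "%X/%X has been already streamed, forwarding to %X/%X",
             (uint32_t) (start_lsn >> 32), (uint32_t) start_lsn,
             (uint32_t) (confirmed_flush >> 32), (uint32_t) confirmed_flush);
}
```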