Unnecessary delay in streaming replication due to replay lag

Asim R P
Hi

The standby does not start the walreceiver process until the startup
process finishes WAL replay.  The more WAL there is to replay, the
longer the delay in starting streaming replication.  If the replication
connection is temporarily disconnected, this delay becomes a major
problem, and we are proposing a solution to avoid it.

WAL replay is likely to fall behind when master is processing
write-heavy workload, because WAL is generated by concurrently running
backends on master while only one startup process on standby replays WAL
records in sequence as new WAL is received from master.

Replication connection between walsender and walreceiver may break due
to reasons such as transient network issue, standby going through
restart, etc.  The delay in resuming replication connection leads to
lack of high availability - only one copy of WAL is available during
this period.

The problem worsens when the replication is configured to be
synchronous.  Commits on master must wait until WAL replay is
finished on standby, walreceiver is then started, and it confirms flush
of WAL up to the commit LSN.  If the synchronous_commit GUC is set to
remote_write, this behavior is equivalent to tacitly changing it to
remote_apply until the replication connection is re-established!
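
To get a feel for the scale of the stall described above, here is a rough back-of-the-envelope sketch in Python.  The function name, the backlog size and the replay rate are made-up illustrative values, not measurements.  It only assumes that walreceiver starts once the local backlog has been replayed at a constant rate.

```python
def commit_stall_seconds(backlog_bytes, replay_bytes_per_sec):
    """Rough time synchronous commits stay blocked after a reconnect.

    Assumes (hypothetically) that walreceiver is started only once the
    startup process has replayed the whole local backlog at a constant rate.
    """
    if replay_bytes_per_sec <= 0:
        raise ValueError("replay rate must be positive")
    return backlog_bytes / replay_bytes_per_sec


# Illustrative numbers only: 64 GiB of unreplayed WAL, replayed at 50 MiB/s.
stall = commit_stall_seconds(64 * 1024**3, 50 * 1024**2)
print(f"commits blocked for roughly {stall / 60:.1f} minutes")
```

With these assumed numbers the stall is on the order of twenty minutes, during which synchronous commits on the master would wait.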

Has anyone encountered such a problem with streaming replication?

We propose to address this by starting walreceiver without waiting for
startup process to finish replay of WAL.  Please see attached
patchset.  It can be summarized as follows:

    0001 - TAP test to demonstrate the problem.

    0002 - The standby startup sequence is changed such that
           walreceiver is started by startup process before it begins
           to replay WAL.

    0003 - Postmaster starts walreceiver if it finds that a
           walreceiver process is no longer running and the state
           indicates that it is operating as a standby.

This is a POC; we are looking for early feedback on whether the
problem is worth solving and whether it makes sense to solve it along
this route.

Hao and Asim

0001-Test-that-replay-of-WAL-logs-on-standby-does-not-aff.patch (12K)
0003-Start-WAL-receiver-when-it-is-found-not-running.patch (8K)
0002-Start-WAL-receiver-before-startup-process-replays-ex.patch (15K)

Re: Unnecessary delay in streaming replication due to replay lag

Michael Paquier-2
On Fri, Jan 17, 2020 at 09:34:05AM +0530, Asim R P wrote:
> Standby does not start walreceiver process until startup process
> finishes WAL replay.  The more WAL there is to replay, longer is the
> delay in starting streaming replication.  If replication connection is
> temporarily disconnected, this delay becomes a major problem and we
> are proposing a solution to avoid the delay.

Yeah, that's documented:
https://www.postgresql.org/message-id/20190910062325.GD11737@...

> We propose to address this by starting walreceiver without waiting for
> startup process to finish replay of WAL.  Please see attached
> patchset.  It can be summarized as follows:
>
>     0001 - TAP test to demonstrate the problem.

There is no real need for debug_replay_delay because we already have
recovery_min_apply_delay, no?  That would count only after consistency
has been reached, and only for COMMIT records, but your test would be
enough with that.

>     0002 - The standby startup sequence is changed such that
>            walreceiver is started by startup process before it begins
>            to replay WAL.

See below.

>     0003 - Postmaster starts walreceiver if it finds that a
>            walreceiver process is no longer running and the state
>            indicates that it is operating as a standby.

I have not checked in detail, but I smell some race conditions
between the postmaster and the startup process here.

> This is a POC, we are looking for early feedback on whether the
> problem is worth solving and if it makes sense to solve it along this
> route.

You are not the first person interested in this problem, we have a
patch registered in this CF to control the timing when a WAL receiver
is started at recovery:
https://commitfest.postgresql.org/26/1995/
https://www.postgresql.org/message-id/b271715f-f945-35b0-d1f5-c9de3e56f65e@...

I am pretty sure that we should not change the default behavior, which
is to start the WAL receiver after replaying everything from the
archives so as to avoid copying some WAL segments for nothing, so being
able to use a GUC switch should be the way to go, and Konstantin's
latest patch was using this approach.  Your patch 0002 visibly adds a
third mode, start immediately, on top of the two already proposed:
- Start after replaying all WAL available locally and in the
archives.
- Start after reaching a consistent point.
--
Michael


Re: Unnecessary delay in streaming replication due to replay lag

Asim R P
On Fri, Jan 17, 2020 at 11:08 AM Michael Paquier <[hidden email]> wrote:

>
> On Fri, Jan 17, 2020 at 09:34:05AM +0530, Asim R P wrote:
> >
> >     0001 - TAP test to demonstrate the problem.
>
> There is no real need for debug_replay_delay because we already have
> recovery_min_apply_delay, no?  That would count only after consistency
> has been reached, and only for COMMIT records, but your test would be
> enough with that.
>

Indeed, we didn't know about recovery_min_apply_delay.  Thank you for
the suggestion, the updated test is attached.

>
> > This is a POC, we are looking for early feedback on whether the
> > problem is worth solving and if it makes sense to solve it along this
> > route.
>
> You are not the first person interested in this problem, we have a
> patch registered in this CF to control the timing when a WAL receiver
> is started at recovery:
> https://commitfest.postgresql.org/26/1995/
> https://www.postgresql.org/message-id/b271715f-f945-35b0-d1f5-c9de3e56f65e@...
>

Great to know about this patch and the discussion.  The test case and
the part that saves next start point in control file from our patch
can be combined with Konstantin's patch to solve this problem.  Let me
work on that.

> I am pretty sure that we should not change the default behavior to
> start the WAL receiver after replaying everything from the archives to
> avoid copying some WAL segments for nothing, so being able to use a
> GUC switch should be the way to go, and Konstantin's latest patch was
> using this approach.  Your patch 0002 adds visibly a third mode: start
> immediately on top of the two ones already proposed:
> - Start after replaying all WAL available locally and in the
> archives.
> - Start after reaching a consistent point.

Consistent point should be reached fairly quickly, in spite of large
replay lag.  Min recovery point is updated during XLOG flush and that
happens when a commit record is replayed.  Commits should occur
frequently in the WAL stream.  So I do not see much value in starting
WAL receiver immediately as compared to starting it after reaching a
consistent point.  Does that make sense?

That said, is there anything obviously wrong with starting WAL receiver
immediately, even before reaching consistent state?  A consequence is
that WAL receiver may overwrite a WAL segment while startup process is
reading and replaying WAL from it.  But that doesn't appear to be a
problem because the overwrite should happen with identical content as
before.

Asim

v1-0001-Test-that-replay-of-WAL-logs-on-standby-does-not-.patch (10K)
v1-0003-Start-WAL-receiver-when-it-is-found-not-running.patch (9K)
v1-0002-Start-WAL-receiver-before-startup-process-replays.patch (14K)

Re: Unnecessary delay in streaming replication due to replay lag

Asim Praveen
I would like to revive this thread by submitting a rebased patch to start streaming replication without waiting for the startup process to finish replaying all WAL.  The start LSN for streaming is determined to be the LSN that points to the beginning of the most recently flushed WAL segment.

The patch passes tests under src/test/recovery and top level “make check”.


v2-0001-Start-WAL-receiver-before-startup-process-replays.patch (23K)

Re: Unnecessary delay in streaming replication due to replay lag

Michael Paquier-2
On Sun, Aug 09, 2020 at 05:54:32AM +0000, Asim Praveen wrote:
> I would like to revive this thread by submitting a rebased patch to
> start streaming replication without waiting for startup process to
> finish replaying all WAL.  The start LSN for streaming is determined
> to be the LSN that points to the beginning of the most recently
> flushed WAL segment.
>
> The patch passes tests under src/test/recovery and top level “make check”.

I have not really looked at the proposed patch, but it would be good
to have some documentation.
--
Michael


Re: Unnecessary delay in streaming replication due to replay lag

Asim Praveen


> On 09-Aug-2020, at 2:11 PM, Michael Paquier <[hidden email]> wrote:
>
> I have not really looked at the proposed patch, but it would be good
> to have some documentation.
>

Ah, right.  The basic idea is to reuse the logic that allows read-only connections so that it also starts WAL streaming.  The patch borrows a new GUC “wal_receiver_start_condition” introduced by another patch alluded to upthread.  It controls when the WAL receiver process is started on a standby.  By default, the GUC is set to “replay”, which means no change in current behavior: the WAL receiver is started only after replaying all WAL already available in pg_wal.  When set to “consistency”, the WAL receiver process is started earlier, as soon as a consistent state is reached during WAL replay.
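
As a rough illustration of the two settings described above, here is a toy decision rule in Python.  The GUC name and its values come from the proposed patch (they are not part of released PostgreSQL); the function and its boolean inputs are hypothetical names used only for this sketch, and falling back to "local WAL exhausted" under “consistency” is an assumption.

```python
def should_start_walreceiver(start_condition, reached_consistency,
                             local_wal_exhausted):
    """Toy model of the proposed wal_receiver_start_condition GUC.

    "replay"      -> current behavior: wait until all WAL in pg_wal is replayed.
    "consistency" -> start as soon as a consistent state has been reached.
    """
    if start_condition == "replay":
        return local_wal_exhausted
    if start_condition == "consistency":
        # Assumption: running out of local WAL still triggers a start.
        return reached_consistency or local_wal_exhausted
    raise ValueError(f"unknown start condition: {start_condition!r}")


# A standby that is consistent but still has a large local backlog to replay:
print(should_start_walreceiver("replay", True, False))
print(should_start_walreceiver("consistency", True, False))
```

The only case where the two settings differ is a standby that has reached consistency but still has local WAL left to replay, which is exactly the replay-lag situation this thread is about.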

The LSN from which to start streaming is determined to be the LSN that points at the beginning of the WAL segment file that was most recently flushed in pg_wal.  To find the most recently flushed WAL segment, the first blocks of all WAL segment files in pg_wal, starting from the segment that contains the currently replayed record, are inspected.  The search stops when a first page with no valid header is found.
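
The search described above could be sketched roughly as follows.  This is a simplified Python model, not the patch's code: `first_page_valid` is a hypothetical callback standing in for reading a segment's first page and validating its header, and the 16 MiB segment size is PostgreSQL's default rather than something read from the cluster.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # PostgreSQL default; set at initdb time


def find_streaming_start(replay_segno, first_page_valid,
                         segsize=WAL_SEGMENT_SIZE):
    """Return the LSN (as a byte position) at which to start streaming.

    replay_segno     -- number of the segment holding the record being replayed
    first_page_valid -- hypothetical callback: segno -> True if that segment
                        exists in pg_wal and its first page header is valid;
                        it must eventually return False
    """
    segno = replay_segno
    # Walk forward until the next segment's first page fails validation.
    while first_page_valid(segno + 1):
        segno += 1
    # An LSN is a byte offset into the WAL stream, so the segment starts here.
    return segno * segsize


def format_lsn(lsn):
    """Render an LSN as PostgreSQL prints it: high and low 32 bits in hex."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


# Example: replay is in segment 5 and segments up to 9 look valid.
valid_segments = {5, 6, 7, 8, 9}
start = find_streaming_start(5, lambda segno: segno in valid_segments)
print(format_lsn(start))  # 0/9000000
```

Starting at a segment boundary means the walreceiver may re-fetch part of a segment that is already on disk, which is the overwrite-with-identical-content case discussed earlier in the thread.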

The benefits of starting WAL receiver early are mentioned upthread but allow me to reiterate: as WAL streaming starts, any commits that are waiting for synchronous replication on the master are unblocked.  The benefit of this is apparent in situations where significant replay lag has been built up and the replication is configured to be synchronous.

Asim

Re: Unnecessary delay in streaming replication due to replay lag

Masahiko Sawada-2
In reply to this post by Asim Praveen
On Sun, 9 Aug 2020 at 14:54, Asim Praveen <[hidden email]> wrote:
>
> I would like to revive this thread by submitting a rebased patch to start streaming replication without waiting for the startup process to finish replaying all WAL.  The start LSN for streaming is determined to be the LSN that points to the beginning of the most recently flushed WAL segment.
>
> The patch passes tests under src/test/recovery and top level “make check”.
>

The patch can be applied cleanly to the current HEAD but I got an
error when building the code with this patch:

xlog.c: In function 'StartupXLOG':
xlog.c:7315:6: error: too few arguments to function 'RequestXLogStreaming'
 7315 |      RequestXLogStreaming(ThisTimeLineID,
      |      ^~~~~~~~~~~~~~~~~~~~
In file included from xlog.c:59:
../../../../src/include/replication/walreceiver.h:463:13: note: declared here
  463 | extern void RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr,
      |             ^~~~~~~~~~~~~~~~~~~~

cfbot also complains about this.

Could you please update the patch?

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unnecessary delay in streaming replication due to replay lag

Asim Praveen


> On 10-Aug-2020, at 12:27 PM, Masahiko Sawada <[hidden email]> wrote:
>
> The patch can be applied cleanly to the current HEAD but I got an
> error when building the code with this patch:
>
> xlog.c: In function 'StartupXLOG':
> xlog.c:7315:6: error: too few arguments to function 'RequestXLogStreaming'
> 7315 |      RequestXLogStreaming(ThisTimeLineID,
>      |      ^~~~~~~~~~~~~~~~~~~~
> In file included from xlog.c:59:
> ../../../../src/include/replication/walreceiver.h:463:13: note: declared here
>  463 | extern void RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr,
>      |             ^~~~~~~~~~~~~~~~~~~~
>
> cfbot also complains about this.
>
> Could you please update the patch?
>
Thank you for trying the patch and apologies for the compiler error.  I missed adding a hunk earlier; it should be fixed in the version attached here.


v3-0001-Start-WAL-receiver-before-startup-process-replays.patch (23K)

Re: Unnecessary delay in streaming replication due to replay lag

lchch1990@sina.cn
Hello

I read the code and tested the patch.  It runs well on my side, but I have several issues
with the patch.

1. When calling RequestXLogStreaming() during replay, you pick the timeline directly from the
control file.  Do you think it should pick the timeline from the timeline history file instead?

2. In archive recovery mode, which will never switch to streaming mode, I think the current
code will call RequestXLogStreaming() too, which can be avoided.

3. I found two 018_xxxxx.pl files when I ran make check; maybe rename the new one?



Regards,
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca
EMAIL: mailto:movead(dot)li(at)highgo(dot)ca

Re: Unnecessary delay in streaming replication due to replay lag

Michael Paquier-2
On Tue, Sep 15, 2020 at 05:30:22PM +0800, [hidden email] wrote:
> I read the code and tested the patch.  It runs well on my side, but I have several issues
> with the patch.

+                   RequestXLogStreaming(ThisTimeLineID,
+                                        startpoint,
+                                        PrimaryConnInfo,
+                                        PrimarySlotName,
+                                        wal_receiver_create_temp_slot);

This patch thinks that it is fine to request streaming even if
PrimaryConnInfo is not set, but that's not fine.
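
The kind of guard being asked for here can be sketched as a small predicate.  This is a hypothetical Python model, not PostgreSQL code: `may_request_streaming` and its parameters are illustrative names, with `primary_conninfo` standing in for PrimaryConnInfo and `standby_mode` for whether the server is running as a standby rather than doing plain archive recovery.

```python
def may_request_streaming(standby_mode, primary_conninfo):
    """Only ask for streaming when there is a primary to connect to.

    Both conditions are assumptions of this sketch: plain archive recovery
    never streams, and an unset or blank connection string gives the
    walreceiver nothing to connect to.
    """
    has_conninfo = bool(primary_conninfo and primary_conninfo.strip())
    return bool(standby_mode) and has_conninfo


print(may_request_streaming(True, "host=primary port=5432"))
print(may_request_streaming(True, ""))
print(may_request_streaming(False, "host=primary port=5432"))
```

Only the first call satisfies both conditions; the other two model an unset connection string and plain archive recovery.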

Anyway, I don't quite understand what you are trying to achieve here.
"startpoint" is used to request the beginning of streaming.  It is
roughly the consistency LSN + some alpha with some checks on WAL
pages (those WAL page checks are not acceptable as they make
maintenance harder).  What about the case where consistency is
reached but there are many segments still ahead that need to be
replayed?  Your patch would cause streaming to begin too early, and
a manual copy of segments is not a rare thing as in some environments
a bulk copy of segments can make the catchup of a standby faster than
streaming.

It seems to me that what you are looking for here is some kind of
pre-processing before entering the redo loop to determine the LSN
that could be reused for the fast streaming start, which should match
the end of the WAL present locally.  In short, you would need an
XLogReaderState that begins a scan of WAL from the redo point until it
cannot find anything more, and use the last LSN found as a base to
begin requesting streaming.  The question of timeline jumps can also
be very tricky, but it could also be possible to not allow this option
if a timeline jump happens while attempting to guess the end of WAL
ahead of time.  Another thing: could it be useful to have an extra
mode to begin streaming without waiting for consistency to finish?
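
The pre-processing pass suggested above might look roughly like this.  It is a toy Python model under stated assumptions: `Record` and `guess_end_of_local_wal` are hypothetical names, the list of records stands in for what an XLogReaderState would return when reading forward from the redo point, and returning None on a timeline change models the idea of not allowing the option in that case.

```python
from dataclasses import dataclass


@dataclass
class Record:
    end_lsn: int        # LSN just past this record
    timeline: int       # timeline the record belongs to
    valid: bool = True  # False models a record that cannot be read or verified


def guess_end_of_local_wal(records, redo_lsn, start_timeline):
    """Scan forward from the redo point and return the LSN to stream from.

    Returns None when a timeline change is seen, so the caller can fall
    back to the default behavior instead of guessing.
    """
    last_lsn = redo_lsn
    for rec in records:
        if not rec.valid:
            break  # nothing more can be read: end of the WAL present locally
        if rec.timeline != start_timeline:
            return None  # timeline jump: do not attempt the early start
        last_lsn = rec.end_lsn
    return last_lsn


# Three readable records on timeline 1, then one that fails validation.
wal = [Record(100, 1), Record(200, 1), Record(300, 1), Record(400, 1, valid=False)]
print(guess_end_of_local_wal(wal, redo_lsn=50, start_timeline=1))  # 300
```

Compared with inspecting only the first page of each segment, a record-level scan like this reads more WAL up front but does not depend on ad-hoc page checks.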
--
Michael
