Possible missing segments in archiving on standby

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Possible missing segments in archiving on standby

Kyotaro Horiguchi-4
Hello.

While looking a patch, I found that a standby with archive_mode=always
fails to archive segments under certain conditions.

A. Walreceiver is gracefully terminated just after a segment is
   finished.

B. Walreceiver is gracefully terminated while receiving filling chunks
  for a segment switch.

The two above are reprodusible (without distinction between the two)
using a helper patch. See below.

There's one more issue here.

C. Standby doesn't archive a segment until walreceiver receives any
  data for the next segment.

I'm not sure wehther we assume C as an issue.

The first attached patch fixes A and B. A side-effect of that is that
standby archives the previous segment of the streaming start
location. Concretely 00..0100..2 gets to be archived in the above case
(recovery starts at 0/3000000).  That behavior doesn't seem to be a
proble since the segment is a part of the standby's data anyway.

The second attached patch fixes all of A to C, but seems somewhat
redundant.

Any opnions and/or suggestions are welcome.


The attached files are:

1. v1-0001-Make-sure-standby-archives-all-segments.patch:
  Fix for A and B.

2. v1-0001-Make-sure-standby-archives-all-segments-immediate.patch:
  Fix for A, B and C.

3. repro.sh
  The reproducer shell script used below.

4. repro_helper.patch
  Helper patch for repro.sh for master and patch 1 above.

5. repro_helper2.patch
  Helper patch for repro.sh for patch 2 above.

=====
** REPRODUCER

The failure is reproducible with some code tweak.

1. Create a primary server with archive_mode=always then start it.
2. Create and start a standby.
3. touch /tmp/hoge

4. psql -c "create table t(); drop table t; select pg_switch_wal(); select pg_sleep(1); create table t(); drop table t; select pg_switch_wal();"

5. look into the archive directory of the standby.
   If no missing segments found in archive, repeat from 3.

The third attached shell script is a reproducer for the problem,
needing the aid of the fourth patch attached.

$ mkdir testdir
$ cd testdir
$ bash ..../repro.sh
....
After test 2:
Primary location: 0/8000310
Standby location: 0/8000310
# primary archive
000000010000000000000003
000000010000000000000004
000000010000000000000005
000000010000000000000006
000000010000000000000007
000000010000000000000008
# standby archive
000000010000000000000003
000000010000000000000005
000000010000000000000006
000000010000000000000008

The segment 4 is skipped by the issue A and 7 is skipped by the issue
B.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


From afa907bca7db8ea6335d47bd02761f567591d553 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <[hidden email]>
Date: Tue, 30 Jun 2020 14:21:30 +0900
Subject: [PATCH v1] Make sure standby archives all segments

Standby fails to archive a segment if standby is stopped just after a
segment is finished or stopped just after a segment swtich. Make sure
that walreceiver archives all segments by rechecking at start.
---
 src/backend/replication/walreceiver.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..680154365d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -938,6 +938,23 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
  else
  XLogArchiveNotify(xlogfname);
  }
+ else if (XLogArchiveMode == ARCHIVE_MODE_ALWAYS)
+ {
+ /*
+ * If we are starting streaming at the beginning of a segment,
+ * there may be casees where the previous segment have not been
+ * archived yet.  Make sure it is archived.
+ */
+ char xlogfname[MAXFNAMELEN];
+ XLogSegNo prevseg;
+
+ XLByteToPrevSeg(recptr, prevseg, wal_segment_size);
+ XLogFileName(xlogfname, ThisTimeLineID, prevseg,
+ wal_segment_size);
+
+ /* Mark as ".ready" of not yet */
+ XLogArchiveCheckDone(xlogfname);
+ }
  recvFile = -1;
 
  /* Create/use new log file */
--
2.18.4


From 7af716134dceb3bafce421dfeaffebf1e1e3e17d Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <[hidden email]>
Date: Mon, 29 Jun 2020 16:12:01 +0900
Subject: [PATCH v1] Make sure standby archives all segments immediately

Standby may get a bit being late in archive, since walsender doesn't
archive a segment until it receives any data for the next segment, Fix
that by archiving just after a segment is finished.

Also, standby fails to archive a segment if standby is stopped just
after a segment is finished or stopped just after a segment
swtich. Make sure that walreceiver archives all segments by rechecking
at start.
---
 src/backend/replication/walreceiver.c | 82 ++++++++++++++++++---------
 1 file changed, 54 insertions(+), 28 deletions(-)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..831718c859 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -902,49 +902,34 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
  {
  int segbytes;
 
- if (recvFile < 0 || !XLByteInSeg(recptr, recvSegNo, wal_segment_size))
+ /* Open the segment if not yet */
+ if (recvFile < 0)
  {
  bool use_existent;
 
+ recvFileTLI = ThisTimeLineID;
+
  /*
- * fsync() and close current file before we switch to next one. We
- * would otherwise have to reopen this file to fsync it later
+ * If we are starting streaming at the beginning of a segment,
+ * there may be the case where the previous segment have not been
+ * archived yet.  Make sure it is archived.
  */
- if (recvFile >= 0)
+ if (XLogArchiveMode == ARCHIVE_MODE_ALWAYS && recvSegNo == 0)
  {
  char xlogfname[MAXFNAMELEN];
+ XLogSegNo prevseg;
 
- XLogWalRcvFlush(false);
+ XLByteToPrevSeg(recptr, prevseg, wal_segment_size);
+ XLogFileName(xlogfname, recvFileTLI, prevseg, wal_segment_size);
 
- XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
-
- /*
- * XLOG segment files will be re-read by recovery in startup
- * process soon, so we don't advise the OS to release cache
- * pages associated with the file like XLogFileClose() does.
- */
- if (close(recvFile) != 0)
- ereport(PANIC,
- (errcode_for_file_access(),
- errmsg("could not close log segment %s: %m",
- xlogfname)));
-
- /*
- * Create .done file forcibly to prevent the streamed segment
- * from being archived later.
- */
- if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
- XLogArchiveForceDone(xlogfname);
- else
- XLogArchiveNotify(xlogfname);
+ /* Mark as ".ready" of not yet */
+ XLogArchiveCheckDone(xlogfname);
  }
- recvFile = -1;
 
  /* Create/use new log file */
  XLByteToSeg(recptr, recvSegNo, wal_segment_size);
  use_existent = true;
  recvFile = XLogFileInit(recvSegNo, &use_existent, true);
- recvFileTLI = ThisTimeLineID;
  }
 
  /* Calculate the start offset of the received logs */
@@ -985,6 +970,47 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
  buf += byteswritten;
 
  LogstreamResult.Write = recptr;
+
+ /*
+ * Close the current WAL segment if it is completed then let the file
+ * be archived if needed.
+ */
+ if (!XLByteInSeg(recptr, recvSegNo, wal_segment_size))
+ {
+ char xlogfname[MAXFNAMELEN];
+
+ Assert (recvFile >= 0);
+
+ /*
+ * fsync() and close current file before we switch to next one. We
+ * would otherwise have to reopen this file to fsync it later
+ */
+ XLogWalRcvFlush(false);
+
+ XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
+
+ /*
+ * XLOG segment files will be re-read by recovery in startup
+ * process soon, so we don't advise the OS to release cache
+ * pages associated with the file like XLogFileClose() does.
+ */
+ if (close(recvFile) != 0)
+ ereport(PANIC,
+ (errcode_for_file_access(),
+ errmsg("could not close log segment %s: %m",
+ xlogfname)));
+
+ /*
+ * Create .done file forcibly to prevent the streamed segment
+ * from being archived later.
+ */
+ if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
+ XLogArchiveForceDone(xlogfname);
+ else
+ XLogArchiveNotify(xlogfname);
+
+ recvFile = -1;
+ }
  }
 
  /* Update shared-memory status */
--
2.18.4


#! /bin/bash

ROOT=`pwd`
LOGFILE="repro.log"
PGPORT1=15432
PGPORT2=15433

PGDATA1=$ROOT/reprodata1
ARCHDIR1=$ROOT/reproarc1
PGDATA2=$ROOT/reprodata2
ARCHDIR2=$ROOT/reproarc2


function cleanup {
        echo -n "Killing servers..."
        pg_ctl -D $PGDATA1 -m i stop
        pg_ctl -D $PGDATA2 -m i stop
        echo "done."
        exit 1
}

rm -r $PGDATA1 $PGDATA2 $ARCHDIR1 $ARCHDIR2
mkdir $ARCHDIR1 $ARCHDIR2

# Create primary
echo "# Creating primary"
initdb -D $PGDATA1 &>$LOGFILE
cat >> $PGDATA1/postgresql.conf <<EOF
wal_keep_segments=10
archive_mode=always
archive_command='cp %p $ARCHDIR1/%f'
EOF

# Start primary
echo "# Starting primary"
pg_ctl -D $PGDATA1 -o"-p $PGPORT1" start &>>$LOGFILE

# Create standby
echo "# Creating standby"
pg_basebackup -D $PGDATA2 -h /tmp -p $PGPORT1 &>>$LOGFILE
cat >> $PGDATA2/postgresql.conf <<EOF
archive_command='cp %p $ARCHDIR2/%f'
primary_conninfo='host=/tmp port=$PGPORT1'
EOF
touch $PGDATA2/standby.signal

trap cleanup ERR 2 3 15

# Start primary
echo "# Starting standby"
pg_ctl -D $PGDATA2 -o"-p $PGPORT2" start &>>$LOGFILE
sleep 3

echo "Start:"
echo -n "Primary location: "
psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()"
echo -n "Standby location: "
psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()"

# Delocate from boundary..
psql -p $PGPORT1 -c "create table t(); drop table t" &>>$LOGFILE
sleep 1

# TEST 1: walreceiver stops just after a segment is completed
echo "# test 1" >> $LOGFILE
touch /tmp/hoge1
psql -p $PGPORT1 -c "create table t(a int); insert into t (select a from generate_series(0, 260000) a); drop table t;" &>>$LOGFILE
echo "# test 1 end" >> $LOGFILE

psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
sleep 2

echo "After test 1:"
echo -n "Primary location: "
psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()"
echo -n "Standby location: "
psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()"

psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
sleep 2

# TEST 2: walreceiver stops while receiving filling chunks after a wal switch.
echo "# test 2" >> $LOGFILE
touch /tmp/hoge2
psql -p $PGPORT1 -c "create table t(); drop table t; select pg_switch_wal()" &>>$LOGFILE
echo "# test 2 end" >> $LOGFILE

sleep 2

echo "After test 2:"
echo -n "Primary location: "
psql -tAp $PGPORT1 -c "select pg_current_wal_lsn()"
echo -n "Standby location: "
psql -tAp $PGPORT2 -c "select pg_last_wal_receive_lsn()"

# stop servers
pg_ctl -D $PGDATA1 stop &>>$LOGFILE
pg_ctl -D $PGDATA2 stop &>>$LOGFILE

#show last three archived segments
echo "# primary archive"
ls $ARCHDIR1 | egrep '[3-9]$'
echo "# standby archive"
ls $ARCHDIR2 | egrep '[3-9]$'

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index d1ad75da87..dcccef151d 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -985,6 +985,29 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
  buf += byteswritten;
 
  LogstreamResult.Write = recptr;
+
+ {
+ /* fake oneshot SIGTERM just at segment end */
+ struct stat b;
+ char *sigfile1 = "/tmp/hoge1";
+ char *sigfile2 = "/tmp/hoge2";
+
+ if (LogstreamResult.Write % wal_segment_size == 0 &&
+ stat(sigfile1, &b) == 0)
+ {
+ unlink(sigfile1);
+ got_SIGTERM = true;
+ ereport(LOG,(errmsg("STOP BY trig1@%lX", LogstreamResult.Write)));
+ }
+
+ if (LogstreamResult.Write % wal_segment_size == 0x500000 &&
+ stat(sigfile2, &b) == 0)
+ {
+ unlink(sigfile2);
+ got_SIGTERM = true;
+ ereport(LOG,(errmsg("STOP BY trig2@%lX", LogstreamResult.Write)));
+ }
+ }
  }
 
  /* Update shared-memory status */

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 831718c859..b9a7c73ed7 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1011,6 +1011,29 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 
  recvFile = -1;
  }
+
+ {
+ /* fake oneshot SIGTERM just at segment end */
+ struct stat b;
+ char *sigfile1 = "/tmp/hoge1";
+ char *sigfile2 = "/tmp/hoge2";
+
+ if (LogstreamResult.Write % wal_segment_size == 0 &&
+ stat(sigfile1, &b) == 0)
+ {
+ unlink(sigfile1);
+ got_SIGTERM = true;
+ ereport(LOG,(errmsg("STOP BY trig1@%lX", LogstreamResult.Write)));
+ }
+
+ if (LogstreamResult.Write % wal_segment_size == 0x500000 &&
+ stat(sigfile2, &b) == 0)
+ {
+ unlink(sigfile2);
+ got_SIGTERM = true;
+ ereport(LOG,(errmsg("STOP BY trig2@%lX", LogstreamResult.Write)));
+ }
+ }
  }
 
  /* Update shared-memory status */