Problem while setting the fpw with SIGHUP

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
63 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Problem while setting the fpw with SIGHUP

Dilip Kumar-2
While setting the full_page_write with SIGHUP I hit an assert in checkpoint process. And, that is because inside a CRITICAL section we are calling RecoveryInProgress which intern allocates memory.  So I have moved RecoveryInProgress call out of the CRITICAL section and the problem got solved.

command executed: killall -SIGHUP postgres
Crash call stack:
#0  0x00007fa19560d5d7 in raise () from /lib64/libc.so.6
#1  0x00007fa19560ecc8 in abort () from /lib64/libc.so.6
#2  0x00000000009fc991 in ExceptionalCondition (conditionName=0xc5ab1c "!(CritSectionCount == 0)", errorType=0xc5a739 "FailedAssertion", 
    fileName=0xc5a8a5 "mcxt.c", lineNumber=635) at assert.c:54
#3  0x0000000000a34e56 in MemoryContextCreate (node=0x192edc0, tag=T_AllocSetContext, size=216, nameoffset=216, methods=0xc58620 <AllocSetMethods>, 
    parent=0x18fe1b0, name=0xac1137 "WAL record construction", flags=0) at mcxt.c:635
#4  0x0000000000a2aaa1 in AllocSetContextCreateExtended (parent=0x18fe1b0, name=0xac1137 "WAL record construction", flags=0, minContextSize=0, 
    initBlockSize=8192, maxBlockSize=8388608) at aset.c:463
#5  0x000000000055983c in InitXLogInsert () at xloginsert.c:1033
#6  0x000000000054e4e5 in InitXLOGAccess () at xlog.c:8183
#7  0x000000000054df71 in RecoveryInProgress () at xlog.c:7952
#8  0x00000000005507f6 in UpdateFullPageWrites () at xlog.c:9566
#9  0x00000000007ea821 in UpdateSharedMemoryConfig () at checkpointer.c:1366
#10 0x00000000007e95a1 in CheckpointerMain () at checkpointer.c:383

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

fix_fpwupdate.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Michael Paquier-2
On Fri, Mar 09, 2018 at 01:42:04PM +0530, Dilip Kumar wrote:
> While setting the full_page_write with SIGHUP I hit an assert in checkpoint
> process. And, that is because inside a CRITICAL section we are calling
> RecoveryInProgress which intern allocates memory.  So I have moved
> RecoveryInProgress call out of the CRITICAL section and the problem got
> solved.

Indeed, I can see how this is possible.

If you apply the following you can also have way more fun, but that's
overdoing it:
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7918,6 +7918,8 @@ CheckRecoveryConsistency(void)
bool
RecoveryInProgress(void)
{
+   Assert(CritSectionCount == 0);

Anyway, it seems to me that you are not taking care of all possible race
conditions here.  RecoveryInProgress() could as well be called in
XLogFlush(), and that's a code path taken during redo.

Instead of doing what you are suggesting, why not moving
InitXLogInsert() out of InitXLOGAccess() and change InitPostgres() so as
the allocations for WAL inserts is done unconditionally?  This has
the cost of also making this allocation even for backends which are
started during recovery, still we are talking about allocating a couple
of bytes in exchange of addressing completely all race conditions in
this area.  InitXLogInsert() does not depend on any post-recovery data
like ThisTimeLineId, so a split is possible.

Thoughts?
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Michael Paquier-2
On Tue, Mar 13, 2018 at 05:04:04PM +0900, Michael Paquier wrote:
> Instead of doing what you are suggesting, why not moving
> InitXLogInsert() out of InitXLOGAccess() and change InitPostgres() so as
> the allocations for WAL inserts is done unconditionally?  This has
> the cost of also making this allocation even for backends which are
> started during recovery, still we are talking about allocating a couple
> of bytes in exchange of addressing completely all race conditions in
> this area.  InitXLogInsert() does not depend on any post-recovery data
> like ThisTimeLineId, so a split is possible.

I have been hacking things this way, and it seems to me that it takes
care of all this class of problems.  CreateCheckPoint() actually
mentions that InitXLogInsert() cannot be called in a critical section,
so the comments don't match the code.  I also think that we still want
to be able to use RecoveryInProgress() in critical sections to do
decision-making for the generation of WAL records.
--
Michael

xlog-insert-critical-v1.patch (2K) Download Attachment
signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Dilip Kumar-2
On Fri, Mar 16, 2018 at 10:53 AM, Michael Paquier <[hidden email]> wrote:
On Tue, Mar 13, 2018 at 05:04:04PM +0900, Michael Paquier wrote:
> Instead of doing what you are suggesting, why not moving
> InitXLogInsert() out of InitXLOGAccess() and change InitPostgres() so as
> the allocations for WAL inserts is done unconditionally?  This has
> the cost of also making this allocation even for backends which are
> started during recovery, still we are talking about allocating a couple
> of bytes in exchange of addressing completely all race conditions in
> this area.  InitXLogInsert() does not depend on any post-recovery data
> like ThisTimeLineId, so a split is possible.

I have been hacking things this way, and it seems to me that it takes
care of all this class of problems.  CreateCheckPoint() actually
mentions that InitXLogInsert() cannot be called in a critical section,
so the comments don't match the code.  I also think that we still want
to be able to use RecoveryInProgress() in critical sections to do
decision-making for the generation of WAL records

Thanks for the patch, the idea looks good to me.  Please find some comments and updated patch.

I think like WALWriterProcess, we need to call InitXLogInsert for the CheckpointerProcess as well as for the BgWriterProcess
because earlier they were calling InitXLogInsert while check RecoveryInProgress before inserting the WAL.

see below crash:
#0  0x00007f89273a65d7 in raise () from /lib64/libc.so.6
#1  0x00007f89273a7cc8 in abort () from /lib64/libc.so.6
#2  0x00000000009fd24e in errfinish (dummy=0) at elog.c:556
#3  0x00000000009ff70b in elog_finish (elevel=20, fmt=0xac0d1d "too much WAL data") at elog.c:1378
#4  0x0000000000558766 in XLogRegisterData (data=0xf3efac <fullPageWrites> "\001", len=1) at xloginsert.c:330
#5  0x000000000055080e in UpdateFullPageWrites () at xlog.c:9569
#6  0x00000000007ea831 in UpdateSharedMemoryConfig () at checkpointer.c:1360
#7  0x00000000007e95b1 in CheckpointerMain () at checkpointer.c:370
#8  0x0000000000561680 in AuxiliaryProcessMain (argc=2, argv=0x7fffcfd4bec0) at bootstrap.c:447

I have modified you patch and called InitXLogInsert for CheckpointerProcess and BgWriterProcess also. After that the
issue is solved and fpw is getting set properly.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

xlog-insert-critical-v2.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Michael Paquier-2
On Tue, Mar 20, 2018 at 10:43:55AM +0530, Dilip Kumar wrote:
> I think like WALWriterProcess, we need to call InitXLogInsert for the
> CheckpointerProcess as well as for the BgWriterProcess
> because earlier they were calling InitXLogInsert while check
> RecoveryInProgress before inserting the WAL.

/* don't set signals, bgwriter has its own agenda */
+                       InitXLOGAccess();
+                       InitXLogInsert()

This is wrong, as the checkpointer is started as well on standbys, and
that InitXLOGAccess initializes things for WAL generation like
ThisTimeLineID.  So you should just call InitXLogInsert(), and a comment
would be welcome for both the bgwriter and the checkpointer.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Dilip Kumar-2
On Tue, Mar 20, 2018 at 11:26 AM, Michael Paquier <[hidden email]> wrote:
On Tue, Mar 20, 2018 at 10:43:55AM +0530, Dilip Kumar wrote:
> I think like WALWriterProcess, we need to call InitXLogInsert for the
> CheckpointerProcess as well as for the BgWriterProcess
> because earlier they were calling InitXLogInsert while check
> RecoveryInProgress before inserting the WAL.

/* don't set signals, bgwriter has its own agenda */
+                       InitXLOGAccess();
+                       InitXLogInsert()

This is wrong, as the checkpointer is started as well on standbys, and
that InitXLOGAccess initializes things for WAL generation like
ThisTimeLineID.  So you should just call InitXLogInsert(), and a comment
would be welcome for both the bgwriter and the checkpointer.

Yeah, you are right.  Fixed.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

xlog-insert-critical-v3.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Michael Paquier-2
On Tue, Mar 20, 2018 at 12:00:47PM +0530, Dilip Kumar wrote:
> Yeah, you are right.  Fixed.

So I have been spending a couple of hours playing with your patch, and
tested various configurations manually, like switch the fpw switch to on
and off while running in parallel pgbench.  I have also tested
promotions, etc.

While the patch does its job, it is possible to simplify a bit more the
calls to InitXLogInsert().  Particularly, the one in CreateCheckpoint()
is basically useless for the checkpointer, still it is useful for the
startup process when trigerring an end-in-recovery checkpoint.  So one
additional cleanup would be to move the call in CreateCheckpoint() to
bootstrap.c for the startup process.  In order to test that, please make
sure to create fallback_promote at the root of the data folder before
sending SIGUSR2 to the startup process which would trigger the pre-9.3
promotion where the end-of-recovery checkpoint is triggered by the
startup process itself.

+   /* Initialize the working areas for constructing WAL records. */
+   InitXLogInsert();
Instead of having the same comment for each process calling
InitXLogInsert() multiple times, I think that it would be better to
complete the comment in bootstrap.c where is written "XLOG operations".

Here is a suggestion:
/*
 * Initialize WAL-related operations and enter the main loop of each
 * process.  InitXLogInsert is called for each process which can
 * generate WAL.  While this is wasteful for processes started on
 * a standby, this gives the guarantee that initialization of WAL
 * insertion areas is able to happen in a consistent way and out of
 * any critical sections so as the facility is usable when a promotion
 * is triggered.
 */

What do you think?
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Dilip Kumar-2
On Fri, Mar 23, 2018 at 1:19 PM, Michael Paquier <[hidden email]> wrote:
On Tue, Mar 20, 2018 at 12:00:47PM +0530, Dilip Kumar wrote:
> Yeah, you are right.  Fixed.

So I have been spending a couple of hours playing with your patch, and
tested various configurations manually, like switch the fpw switch to on
and off while running in parallel pgbench.  I have also tested
promotions, etc.

While the patch does its job, it is possible to simplify a bit more the
calls to InitXLogInsert().  Particularly, the one in CreateCheckpoint()
is basically useless for the checkpointer, still it is useful for the
startup process when trigerring an end-in-recovery checkpoint.  So one
additional cleanup would be to move the call in CreateCheckpoint() to
bootstrap.c for the startup process.

In StarupXLOG, just before the CreateCheckPoint() call,  we are calling LocalSetXLogInsertAllowed().  So, I am thinking can we just remove InitXLogInsert from CreateCheckpoint. And, we don't need to move it to bootstrap.c.  Or am I missing something?
 
In order to test that, please make
sure to create fallback_promote at the root of the data folder before
sending SIGUSR2 to the startup process which would trigger the pre-9.3
promotion where the end-of-recovery checkpoint is triggered by the
startup process itself.

+   /* Initialize the working areas for constructing WAL records. */
+   InitXLogInsert();
Instead of having the same comment for each process calling
InitXLogInsert() multiple times, I think that it would be better to
complete the comment in bootstrap.c where is written "XLOG operations".

Here is a suggestion:
/*
 * Initialize WAL-related operations and enter the main loop of each
 * process.  InitXLogInsert is called for each process which can
 * generate WAL.  While this is wasteful for processes started on
 * a standby, this gives the guarantee that initialization of WAL
 * insertion areas is able to happen in a consistent way and out of
 * any critical sections so as the facility is usable when a promotion
 * is triggered.
 */

What do you think?
 
Looks good to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Michael Paquier-2
On Mon, Mar 26, 2018 at 02:32:22PM +0530, Dilip Kumar wrote:
> On Fri, Mar 23, 2018 at 1:19 PM, Michael Paquier <[hidden email]>
> wrote:
> In StartupXLOG, just before the CreateCheckPoint() call,  we are calling
> LocalSetXLogInsertAllowed().  So, I am thinking can we just remove
> InitXLogInsert from CreateCheckpoint. And, we don't need to move it to
> bootstrap.c.  Or am I missing something?

I have finally been able to spend more time on this issue, and checked
for a couple of hours all the calls to RecoveryInProgress() that could
be triggered within a critical section to see if the move I suggested
previously is worth it ot not as this would cost some memory for
standbys all the time, which would suck for many read-only sessions.

There are a couple of calls potentially happening in critical sections,
however except for the one in UpdateFullPageWrites() those are used for
sanity  checks in code paths that should never trigger it, take
XLogInsertBegin() for example.  So with assertions this would trigger
a failure before the real elog(ERROR) message shows up.

Hence, I am changing opinion still I think that instead of the patch you
proposed first we could just do a call to InitXLogInsert() before
entering the critical section.  This is also more consistent with what
CreateCheckpoint() does.  That's also less risky for a backpatch as your
patch increases the window between the beginning of the critical section
and the real moment where the check for RecoveryInProgress is needed.  A
comment explaining why the initialization needs to happen is also
essential.

All in all, this would give the simple patch attached.

Thoughts?
--
Michael

0001-Fix-WAL-insert-when-updating-full_page_writes-for-ch.patch (1K) Download Attachment
signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Kyotaro HORIGUCHI-2
At Tue, 27 Mar 2018 16:46:30 +0900, Michael Paquier <[hidden email]> wrote in <[hidden email]>

> I have finally been able to spend more time on this issue, and checked
> for a couple of hours all the calls to RecoveryInProgress() that could
> be triggered within a critical section to see if the move I suggested
> previously is worth it ot not as this would cost some memory for
> standbys all the time, which would suck for many read-only sessions.
>
> There are a couple of calls potentially happening in critical sections,
> however except for the one in UpdateFullPageWrites() those are used for
> sanity  checks in code paths that should never trigger it, take
> XLogInsertBegin() for example.  So with assertions this would trigger
> a failure before the real elog(ERROR) message shows up.
>
> Hence, I am changing opinion still I think that instead of the patch you
> proposed first we could just do a call to InitXLogInsert() before
> entering the critical section.  This is also more consistent with what
> CreateCheckpoint() does.  That's also less risky for a backpatch as your
> patch increases the window between the beginning of the critical section
> and the real moment where the check for RecoveryInProgress is needed.  A
> comment explaining why the initialization needs to happen is also
> essential.
>
> All in all, this would give the simple patch attached.
>
> Thoughts?
At the first look I was uneasy that the patch initializes xlog
working area earlier than required.

The current UpdateFullPageWrites is safe on standby and promotion
so what we should consider is only the non-standby case. I think
what we should do is just calling RecoveryInProgress() at the
beginning of CheckPointerMain, which is just the same thing with
InitPostgres, but before setting up signal handler to avoid
processing SIGHUP before being ready to insert xlog.


regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 4b452e7cee..c446e81299 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -197,6 +197,13 @@ CheckpointerMain(void)
 
  CheckpointerShmem->checkpointer_pid = MyProcPid;
 
+ /*
+ * We need to call InitXLOGAccess(), if the system isn't in hot-standby
+ * mode.  This is handled by calling RecoveryInProgress and ignoring the
+ * result. This needs to be done before SIGHUP handler is set up.
+ */
+ (void) RecoveryInProgress();
+
  /*
  * Properly accept or ignore signals the postmaster might send us
  *
Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Michael Paquier-2
On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote:
> The current UpdateFullPageWrites is safe on standby and promotion
> so what we should consider is only the non-standby case. I think
> what we should do is just calling RecoveryInProgress() at the
> beginning of CheckPointerMain, which is just the same thing with
> InitPostgres, but before setting up signal handler to avoid
> processing SIGHUP before being ready to insert xlog.

Your proposal does not fix the issue for a checkpointer process started
on a standby.  After a promotion, if SIGHUP is issued with a change in
full_page_writes, then the initialization of InitXLogInsert() would
happen again in the critical section of UpdateFullPageWrites().  The
window is rather small for normal promotions as the startup process
requests a checkpoint which would do the initialization, and much larger
for fallback_promote where the startup process is in charge of doing the
end-of-recovery checkpoint.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Kyotaro HORIGUCHI-2
At Tue, 27 Mar 2018 22:02:26 +0900, Michael Paquier <[hidden email]> wrote in <[hidden email]>

> On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote:
> > The current UpdateFullPageWrites is safe on standby and promotion
> > so what we should consider is only the non-standby case. I think
> > what we should do is just calling RecoveryInProgress() at the
> > beginning of CheckPointerMain, which is just the same thing with
> > InitPostgres, but before setting up signal handler to avoid
> > processing SIGHUP before being ready to insert xlog.
>
> Your proposal does not fix the issue for a checkpointer process started
> on a standby.  After a promotion, if SIGHUP is issued with a change in
> full_page_writes, then the initialization of InitXLogInsert() would
> happen again in the critical section of UpdateFullPageWrites().  The
> window is rather small for normal promotions as the startup process
> requests a checkpoint which would do the initialization, and much larger
> for fallback_promote where the startup process is in charge of doing the
> end-of-recovery checkpoint.

Yeah. I realized that after sending the mail.

I looked closer and found several problems there.

- On standby, StartupXLOG calls UpdateFullPageWrites and
  checkpointer can call the same function simultaneously, but it
  doesn't assume concurrent call.

- StartupXLOG can make a concurrent write to
  Insert->fullPageWrite so it needs to be locked.

- At the time of the very end of recovery, the startup process
  ignores possible change of full_page_writes GUC. It sticks with
  the startup value. It leads to loss of
  XLOG_CHANGE_FPW. (StartXLOG is not considering GUC changes by
  reload)

- If checkpointer calls UpdateFullPageWrites concurrently with
  change of SharedRecoveryInProgress to false in StartupXLOG the
  change may lose corresponding XLOG_CHANGE_FPW.

So, if we don't accept the current behavior, what I think we
should do are all of the follows.

A. In StartupXLOG, protect write to Insert->fullPageWrites with
  wal insert exlusive lock. Do the same thing for read in
  UpdateFullPageWrites.

B. Surround the whole UpdateFullPageWrites with any kind of lock
  to exclude concurrent calls. The attached uses ControlFileLock.
  This also exludes the function with chaging of
  SharedRecoveryInProgress to avoid loss of XLOG_CHANGE_FPW.

C. After exiting recovery mode, call UpdateFullPageWrites from
  StartupXLOG if shared fullPageWrites is found changed from the
  last known value. If checkponiter did the same thing at the
  same time, one of them completes the work.

D. Call RecoveryInProgress to set up xlog working area.

The attached does that. I don't like that it uses ControlFileLock
to exlucde concurrent UpdateFullPageWrites and StartupXLOG but
WALInsertLock cannot be used since UpdateFullPageWrites may take
the same lock.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Michael Paquier-2
On Wed, Mar 28, 2018 at 03:40:59PM +0900, Kyotaro HORIGUCHI wrote:
> The attached does that. I don't like that it uses ControlFileLock
> to exlucde concurrent UpdateFullPageWrites and StartupXLOG but
> WALInsertLock cannot be used since UpdateFullPageWrites may take
> the same lock.

You visibly forgot your patch.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Kyotaro HORIGUCHI-2
At Wed, 28 Mar 2018 15:59:48 +0900, Michael Paquier <[hidden email]> wrote in <[hidden email]>
> On Wed, Mar 28, 2018 at 03:40:59PM +0900, Kyotaro HORIGUCHI wrote:
> > The attached does that. I don't like that it uses ControlFileLock
> > to exlucde concurrent UpdateFullPageWrites and StartupXLOG but
> > WALInsertLock cannot be used since UpdateFullPageWrites may take
> > the same lock.
>
> You visibly forgot your patch.

Mmm, someone must have eaten that. I'm sure it is attached this
time.

I don't like UpdateFullPageWrites is using ControlFileLock to
exclusion..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cb9c2a29cb..d2bb5e16ac 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7558,9 +7558,11 @@ StartupXLOG(void)
  /*
  * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
  * record before resource manager writes cleanup WAL records or checkpoint
- * record is written.
+ * record is written, ignoreing the change of full_page_write GUC so far.
  */
+ WALInsertLockAcquireExclusive();
  Insert->fullPageWrites = lastFullPageWrites;
+ WALInsertLockRelease();
  LocalSetXLogInsertAllowed();
  UpdateFullPageWrites();
  LocalXLogInsertAllowed = -1;
@@ -7783,6 +7785,9 @@ StartupXLOG(void)
  * itself, we use the info_lck here to ensure that there are no race
  * conditions concerning visibility of other recent updates to shared
  * memory.
+ *
+ * ControlFileLock excludes concurrent update of shared fullPageWrites in
+ * addition to its ordinary use.
  */
  LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
  ControlFile->state = DB_IN_PRODUCTION;
@@ -7790,11 +7795,25 @@ StartupXLOG(void)
 
  SpinLockAcquire(&XLogCtl->info_lck);
  XLogCtl->SharedRecoveryInProgress = false;
+ lastFullPageWrites = Insert->fullPageWrites;
  SpinLockRelease(&XLogCtl->info_lck);
 
  UpdateControlFile();
  LWLockRelease(ControlFileLock);
 
+ /*
+ * Shared fullPageWrites at the end of recovery differs to our last known
+ * value. The change has been made by checkpointer but we haven't made
+ * corresponding XLOG_FPW_CHANGE record. We reread GUC change if any and
+ * try update shared fullPageWrites by myself. It ends with doing nothing
+ * if checkpointer already did the same thing.
+ */
+ if (lastFullPageWrites != fullPageWrites)
+ {
+ HandleStartupProcInterrupts();
+ UpdateFullPageWrites();
+ }
+
  /*
  * If there were cascading standby servers connected to us, nudge any wal
  * sender processes to notice that we've been promoted.
@@ -9525,8 +9544,10 @@ XLogReportParameters(void)
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
- * Note: this function assumes there is no other process running
- * concurrently that could update it.
+ * This function assumes called concurrently from multiple processes and
+ * called concurrently with changing of shared fullPageWrites. See
+ * StartupXLOG().
+ *
  */
 void
 UpdateFullPageWrites(void)
@@ -9536,13 +9557,25 @@ UpdateFullPageWrites(void)
  /*
  * Do nothing if full_page_writes has not been changed.
  *
- * It's safe to check the shared full_page_writes without the lock,
- * because we assume that there is no concurrently running process which
- * can update it.
+ * We acquire ControlFileLock to exclude other concurrent call and change
+ * of shared fullPageWrites.
  */
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+ WALInsertLockAcquireExclusive();
  if (fullPageWrites == Insert->fullPageWrites)
+ {
+ WALInsertLockRelease();
+ LWLockRelease(ControlFileLock);
  return;
+ }
+ WALInsertLockRelease();
 
+ /*
+ * Need to set up XLogInsert working area before entering critical section
+ * if we are not in recovery mode.
+ */
+ (void) RecoveryInProgress();
+
  START_CRIT_SECTION();
 
  /*
@@ -9578,6 +9611,8 @@ UpdateFullPageWrites(void)
  WALInsertLockRelease();
  }
  END_CRIT_SECTION();
+
+ LWLockRelease(ControlFileLock);
 }
 
 /*
Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

akapila
In reply to this post by Kyotaro HORIGUCHI-2
On Wed, Mar 28, 2018 at 12:10 PM, Kyotaro HORIGUCHI
<[hidden email]> wrote:

> At Tue, 27 Mar 2018 22:02:26 +0900, Michael Paquier <[hidden email]> wrote in <[hidden email]>
>> On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote:
>> > The current UpdateFullPageWrites is safe on standby and promotion
>> > so what we should consider is only the non-standby case. I think
>> > what we should do is just calling RecoveryInProgress() at the
>> > beginning of CheckPointerMain, which is just the same thing with
>> > InitPostgres, but before setting up signal handler to avoid
>> > processing SIGHUP before being ready to insert xlog.
>>
>> Your proposal does not fix the issue for a checkpointer process started
>> on a standby.  After a promotion, if SIGHUP is issued with a change in
>> full_page_writes, then the initialization of InitXLogInsert() would
>> happen again in the critical section of UpdateFullPageWrites().  The
>> window is rather small for normal promotions as the startup process
>> requests a checkpoint which would do the initialization, and much larger
>> for fallback_promote where the startup process is in charge of doing the
>> end-of-recovery checkpoint.
>
> Yeah. I realized that after sending the mail.
>
> I looked closer and found several problems there.
>
> - On standby, StartupXLOG calls UpdateFullPageWrites and
>   checkpointer can call the same function simultaneously, but it
>   doesn't assume concurrent call.
>
> - StartupXLOG can make a concurrent write to
>   Insert->fullPageWrite so it needs to be locked.
>
> - At the time of the very end of recovery, the startup process
>   ignores possible change of full_page_writes GUC. It sticks with
>   the startup value. It leads to loss of
>   XLOG_CHANGE_FPW. (StartXLOG is not considering GUC changes by
>   reload)
>

Won't this be covered by checkpointer process?  Basically, the next
time checkpointer sees that it has received the sighup, it will call
UpdateFullPageWrites which will log the record if required.

In general, I was wondering why in the first place this variable
(full_page_writes) is a SIGHUP variable?  I think if the user tries to
switch it to 'on' from 'off', it won't guarantee the recovery from
torn pages.  Yeah, one can turn it to 'off' from 'on' without any
problem, but as the reverse doesn't guarantee anything, it can confuse
users. What do you think?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Kyotaro HORIGUCHI-2
At Sat, 31 Mar 2018 17:43:58 +0530, Amit Kapila <[hidden email]> wrote in <[hidden email]>

> On Wed, Mar 28, 2018 at 12:10 PM, Kyotaro HORIGUCHI
> <[hidden email]> wrote:
> > At Tue, 27 Mar 2018 22:02:26 +0900, Michael Paquier <[hidden email]> wrote in <[hidden email]>
> >> On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote:
> >> > The current UpdateFullPageWrites is safe on standby and promotion
> >> > so what we should consider is only the non-standby case. I think
> >> > what we should do is just calling RecoveryInProgress() at the
> >> > beginning of CheckPointerMain, which is just the same thing with
> >> > InitPostgres, but before setting up signal handler to avoid
> >> > processing SIGHUP before being ready to insert xlog.
> >>
> >> Your proposal does not fix the issue for a checkpointer process started
> >> on a standby.  After a promotion, if SIGHUP is issued with a change in
> >> full_page_writes, then the initialization of InitXLogInsert() would
> >> happen again in the critical section of UpdateFullPageWrites().  The
> >> window is rather small for normal promotions as the startup process
> >> requests a checkpoint which would do the initialization, and much larger
> >> for fallback_promote where the startup process is in charge of doing the
> >> end-of-recovery checkpoint.
> >
> > Yeah. I realized that after sending the mail.
> >
> > I looked closer and found several problems there.
> >
> > - On standby, StartupXLOG calls UpdateFullPageWrites and
> >   checkpointer can call the same function simultaneously, but it
> >   doesn't assume concurrent call.
> >
> > - StartupXLOG can make a concurrent write to
> >   Insert->fullPageWrite so it needs to be locked.
> >
> > - At the time of the very end of recovery, the startup process
> >   ignores possible change of full_page_writes GUC. It sticks with
> >   the startup value. It leads to loss of
> >   XLOG_CHANGE_FPW. (StartXLOG is not considering GUC changes by
> >   reload)
> >
>
> Won't this be covered by checkpointer process?  Basically, the next
> time checkpointer sees that it has received the sighup, it will call
> UpdateFullPageWrites which will log the record if required.

Right. Checkpointer is doing the right thing, but without
writing XLOG_FPW_CHANGE records during recovery.

The problem is in StartupXLOG. It doesn't read shared FPW flag
during recovery and updates local flag according to WAL
records. Then it tries to issue XLOG_FPW_CHANGE if its local
status and shared flag are different. This is correct.

But after that, checkpointer still cannot write XLOG (before
SharedRecovoeryInProgress becomes false) but checkpointer can
change shared fullPagesWrites without writing WAL and the WAL
record is eventually lost.

> In general, I was wondering why in the first place this variable
> (full_page_writes) is a SIGHUP variable?  I think if the user tries to
> switch it to 'on' from 'off', it won't guarantee the recovery from
> torn pages.  Yeah, one can turn it to 'off' from 'on' without any
> problem, but as the reverse doesn't guarantee anything, it can confuse
> users. What do you think?

I tend to agree with you. It works as expected after the next
checkpoint. So define the variable as "it can be changed any time
but has an effect at the next checkpoint time", then remove
XLOG_FPW_CHANGE record. And that eliminates the problem of
concurrent calls since the checkpointer becomes the only modifier
of the variable. And the problematic function
UpdateFullPageWrites also no longer needs to write a WAL
record. The information is conveyed only by checkpoint records.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Kyotaro HORIGUCHI-2
Hello.

At Wed, 04 Apr 2018 17:26:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <[hidden email]> wrote in <[hidden email]>

> > In general, I was wondering why in the first place this variable
> > (full_page_writes) is a SIGHUP variable?  I think if the user tries to
> > switch it to 'on' from 'off', it won't guarantee the recovery from
> > torn pages.  Yeah, one can turn it to 'off' from 'on' without any
> > problem, but as the reverse doesn't guarantee anything, it can confuse
> > users. What do you think?
>
> I tend to agree with you. It works as expected after the next
> checkpoint. So define the variable as "it can be changed any time
> but has an effect at the next checkpoint time", then remove
> XLOG_FPW_CHANGE record. And that eliminates the problem of
> concurrent calls since the checkpointer becomes the only modifier
> of the variable. And the problematic function
> UpdateFullPageWrites also no longer needs to write a WAL
> record. The information is conveyed only by checkpoint records.
I noticed that XLOG_FPW_CHANGE(fpw=false) is still required by
pg_start/stop_backup to know FPW's turning-off without waiting
for the next checkpoint record. But XLOG_FPW_CHANGE(true) is not
required since no one uses the information. It seems even harmful
when it is written at the incorrect place.

In the attached patch, shared fullPageWrites is updated only at
REDO point and XLOG_FPW_CHANGE is written only for FPW's turning
off before REDO point.

Disabling FPW at any time makes sense when we need to slow down
the speed of WAL urgently, but...

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From 10f6865cf1e96e8d883ed7e03e1fafb6e73ec935 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <[hidden email]>
Date: Fri, 6 Apr 2018 13:57:48 +0900
Subject: [PATCH] Change FPW handling

The GUC full_pages_writes currently has an effect immediately. That
makes a race condition between config reload on checkpointer and
StartupXLOG. But since full page images are meaningful only when they
are attached to all WAL records covers a checkpoint, there is no
problem if we update the shared FPW only at REDO point.  On the other
hand, online backup mechanism on standby requires to know if FPW is
turned off before the next checkpoint record comes.

As the result, with this patch, changing of full_page_writes takes
effect at REDO point and additional XLOG_FPW_CHANGE is written only
for turning-off. These are sufficient for standby-backup to work
properly, reduces complexity and prevent the race condition.
---
 src/backend/access/transam/xlog.c     | 114 +++++++++-------------------------
 src/backend/postmaster/checkpointer.c |   6 --
 src/include/access/xlog.h             |   1 -
 src/include/catalog/pg_control.h      |   2 +-
 4 files changed, 31 insertions(+), 92 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b4fd8395b7..0a81650c26 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -202,14 +202,6 @@ static XLogRecPtr LastRec;
 static XLogRecPtr receivedUpto = 0;
 static TimeLineID receiveTLI = 0;
 
-/*
- * During recovery, lastFullPageWrites keeps track of full_page_writes that
- * the replayed WAL records indicate. It's initialized with full_page_writes
- * that the recovery starting checkpoint record indicates, and then updated
- * each time XLOG_FPW_CHANGE record is replayed.
- */
-static bool lastFullPageWrites;
-
 /*
  * Local copy of SharedRecoveryInProgress variable. True actually means "not
  * known, need to check the shared state".
@@ -6776,11 +6768,7 @@ StartupXLOG(void)
  */
  restoreTwoPhaseData();
 
- lastFullPageWrites = checkPoint.fullPageWrites;
-
  RedoRecPtr = XLogCtl->RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
- doPageWrites = lastFullPageWrites;
-
  if (RecPtr < checkPoint.redo)
  ereport(PANIC,
  (errmsg("invalid redo in checkpoint record")));
@@ -7575,16 +7563,6 @@ StartupXLOG(void)
  /* Pre-scan prepared transactions to find out the range of XIDs present */
  oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
 
- /*
- * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
- * record before resource manager writes cleanup WAL records or checkpoint
- * record is written.
- */
- Insert->fullPageWrites = lastFullPageWrites;
- LocalSetXLogInsertAllowed();
- UpdateFullPageWrites();
- LocalXLogInsertAllowed = -1;
-
  if (InRecovery)
  {
  /*
@@ -7808,6 +7786,13 @@ StartupXLOG(void)
  ControlFile->state = DB_IN_PRODUCTION;
  ControlFile->time = (pg_time_t) time(NULL);
 
+ /*
+ * Set the initial value of shared fullPageWrites. Once
+ * SharedRecoveryInProgress is turned false, checkpointer will update this
+ * value.
+ */
+ XLogCtl->Insert.fullPageWrites = checkPoint.fullPageWrites;
+
  SpinLockAcquire(&XLogCtl->info_lck);
  XLogCtl->SharedRecoveryInProgress = false;
  SpinLockRelease(&XLogCtl->info_lck);
@@ -8669,6 +8654,20 @@ CreateCheckPoint(int flags)
  */
  last_important_lsn = GetLastImportantRecPtr();
 
+ /*
+ * If full_page_writes has been turned off, issue XLOG_FPW_CHANGE before
+ * the flag actually takes effect. No lock is required since checkpointer
+ * is the only updator of shared fullPageWrites after recovery is
+ * finished. Both shared and local fullPageWrites do not change before the
+ * next reading below.
+ */
+ if (Insert->fullPageWrites && !fullPageWrites)
+ {
+ XLogBeginInsert();
+ XLogRegisterData((char *) (&fullPageWrites), sizeof(bool));
+ XLogInsert(RM_XLOG_ID, XLOG_FPW_CHANGE);
+ }
+
  /*
  * We must block concurrent insertions while examining insert state to
  * determine the checkpoint REDO pointer.
@@ -8710,6 +8709,15 @@ CreateCheckPoint(int flags)
  else
  checkPoint.PrevTimeLineID = ThisTimeLineID;
 
+ /*
+ * Update shared flag of fullPageWrites. WALInsertLock ensures that this
+ * affects all WAL records exactly from REDO point. As the result a
+ * checkpoint marked as fpw=true is ensured that all WAL records have full
+ * page image.
+ */
+ if (fullPageWrites != Insert->fullPageWrites)
+ Insert->fullPageWrites = fullPageWrites;
+
  checkPoint.fullPageWrites = Insert->fullPageWrites;
 
  /*
@@ -9541,65 +9549,6 @@ XLogReportParameters(void)
  }
 }
 
-/*
- * Update full_page_writes in shared memory, and write an
- * XLOG_FPW_CHANGE record if necessary.
- *
- * Note: this function assumes there is no other process running
- * concurrently that could update it.
- */
-void
-UpdateFullPageWrites(void)
-{
- XLogCtlInsert *Insert = &XLogCtl->Insert;
-
- /*
- * Do nothing if full_page_writes has not been changed.
- *
- * It's safe to check the shared full_page_writes without the lock,
- * because we assume that there is no concurrently running process which
- * can update it.
- */
- if (fullPageWrites == Insert->fullPageWrites)
- return;
-
- START_CRIT_SECTION();
-
- /*
- * It's always safe to take full page images, even when not strictly
- * required, but not the other round. So if we're setting full_page_writes
- * to true, first set it true and then write the WAL record. If we're
- * setting it to false, first write the WAL record and then set the global
- * flag.
- */
- if (fullPageWrites)
- {
- WALInsertLockAcquireExclusive();
- Insert->fullPageWrites = true;
- WALInsertLockRelease();
- }
-
- /*
- * Write an XLOG_FPW_CHANGE record. This allows us to keep track of
- * full_page_writes during archive recovery, if required.
- */
- if (XLogStandbyInfoActive() && !RecoveryInProgress())
- {
- XLogBeginInsert();
- XLogRegisterData((char *) (&fullPageWrites), sizeof(bool));
-
- XLogInsert(RM_XLOG_ID, XLOG_FPW_CHANGE);
- }
-
- if (!fullPageWrites)
- {
- WALInsertLockAcquireExclusive();
- Insert->fullPageWrites = false;
- WALInsertLockRelease();
- }
- END_CRIT_SECTION();
-}
-
 /*
  * Check that it's OK to switch to new timeline during recovery.
  *
@@ -9965,9 +9914,6 @@ xlog_redo(XLogReaderState *record)
  XLogCtl->lastFpwDisableRecPtr = ReadRecPtr;
  SpinLockRelease(&XLogCtl->info_lck);
  }
-
- /* Keep track of full_page_writes */
- lastFullPageWrites = fpw;
  }
 }
 
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 4b452e7cee..8b87d139d0 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -1359,12 +1359,6 @@ UpdateSharedMemoryConfig(void)
  /* update global shmem state for sync rep */
  SyncRepUpdateSyncStandbysDefined();
 
- /*
- * If full_page_writes has been changed by SIGHUP, we update it in shared
- * memory and write an XLOG_FPW_CHANGE record.
- */
- UpdateFullPageWrites();
-
  elog(DEBUG2, "checkpointer updated shared memory configuration values");
 }
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 421ba6d775..bc23574694 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -270,7 +270,6 @@ extern void CreateCheckPoint(int flags);
 extern bool CreateRestartPoint(int flags);
 extern void XLogPutNextOid(Oid nextOid);
 extern XLogRecPtr XLogRestorePoint(const char *rpName);
-extern void UpdateFullPageWrites(void);
 extern void GetFullPageWriteInfo(XLogRecPtr *RedoRecPtr_p, bool *doPageWrites_p);
 extern XLogRecPtr GetRedoRecPtr(void);
 extern XLogRecPtr GetInsertRecPtr(void);
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 773d9e6eba..cb911fd5a9 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -38,7 +38,7 @@ typedef struct CheckPoint
  TimeLineID ThisTimeLineID; /* current TLI */
  TimeLineID PrevTimeLineID; /* previous TLI, if this record begins a new
  * timeline (equals ThisTimeLineID otherwise) */
- bool fullPageWrites; /* current full_page_writes */
+ bool fullPageWrites; /* true if all covering WALs are having FPI */
  uint32 nextXidEpoch; /* higher-order bits of nextXid */
  TransactionId nextXid; /* next free XID */
  Oid nextOid; /* next free OID */
--
2.16.3

Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

akapila
On Fri, Apr 6, 2018 at 1:50 PM, Kyotaro HORIGUCHI
<[hidden email]> wrote:

> Hello.
>
> At Wed, 04 Apr 2018 17:26:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <[hidden email]> wrote in <[hidden email]>
>> > In general, I was wondering why in the first place this variable
>> > (full_page_writes) is a SIGHUP variable?  I think if the user tries to
>> > switch it to 'on' from 'off', it won't guarantee the recovery from
>> > torn pages.  Yeah, one can turn it to 'off' from 'on' without any
>> > problem, but as the reverse doesn't guarantee anything, it can confuse
>> > users. What do you think?
>>
>> I tend to agree with you. It works as expected after the next
>> checkpoint. So define the variable as "it can be changed any time
>> but has an effect at the next checkpoint time", then remove
>> XLOG_FPW_CHANGE record. And that eliminates the problem of
>> concurrent calls since the checkpointer becomes the only modifier
>> of the variable. And the problematic function
>> UpdateFullPageWrites also no longer needs to write a WAL
>> record. The information is conveyed only by checkpoint records.
>
> I noticed that XLOG_FPW_CHANGE(fpw=false) is still required by
> pg_start/stop_backup to know FPW's turning-off without waiting
> for the next checkpoint record. But XLOG_FPW_CHANGE(true) is not
> required since no one uses the information. It seems even harmful
> when it is written at the incorrect place.
>
> In the attached patch, shared fullPageWrites is updated only at
> REDO point
>

I am not completely sure if that is the right option because this
would mean that we are defining the new scope for a GUC variable.  I
guess we should take the input of others as well.  I am not sure what
is the right way to do that, but maybe we can start a new thread with
a proper subject and description rather than discussing this under
some related bug fix patch discussion.  I guess we can try that after
CF unless some other people pitch in and share their feedback.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Kyotaro HORIGUCHI-2

At Fri, 6 Apr 2018 17:59:58 +0530, Amit Kapila <[hidden email]> wrote in <[hidden email]>

> On Fri, Apr 6, 2018 at 1:50 PM, Kyotaro HORIGUCHI
> <[hidden email]> wrote:
> > Hello.
> >
> > At Wed, 04 Apr 2018 17:26:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <[hidden email]> wrote in <[hidden email]>
> >> > In general, I was wondering why in the first place this variable
> >> > (full_page_writes) is a SIGHUP variable?  I think if the user tries to
> >> > switch it to 'on' from 'off', it won't guarantee the recovery from
> >> > torn pages.  Yeah, one can turn it to 'off' from 'on' without any
> >> > problem, but as the reverse doesn't guarantee anything, it can confuse
> >> > users. What do you think?
> >>
> >> I tend to agree with you. It works as expected after the next
> >> checkpoint. So define the variable as "it can be changed any time
> >> but has an effect at the next checkpoint time", then remove
> >> XLOG_FPW_CHANGE record. And that eliminates the problem of
> >> concurrent calls since the checkpointer becomes the only modifier
> >> of the variable. And the problematic function
> >> UpdateFullPageWrites also no longer needs to write a WAL
> >> record. The information is conveyed only by checkpoint records.
> >
> > I noticed that XLOG_FPW_CHANGE(fpw=false) is still required by
> > pg_start/stop_backup to know FPW's turning-off without waiting
> > for the next checkpoint record. But XLOG_FPW_CHANGE(true) is not
> > required since no one uses the information. It seems even harmful
> > when it is written at the incorrect place.
> >
> > In the attached patch, shared fullPageWrites is updated only at
> > REDO point
> >
>
> I am not completely sure if that is the right option because this
> would mean that we are defining the new scope for a GUC variable.  I
> guess we should take the input of others as well.  I am not sure what
> is the right way to do that, but maybe we can start a new thread with
> a proper subject and description rather than discussing this under
> some related bug fix patch discussion.  I guess we can try that after
> CF unless some other people pitch in and share their feedback.
I'd like to refrain from making a new thread since this topic is
registered as an open item (in Old Bugs section). Or making a new
thread then relinking it from the page is preferable?

I'm surprised a bit that this got confilcted so soon. On the way
rebasing, for anyone's information, I considered comment and
documentation fix but all comments and documentation can be read
as both old and new behavior. That being said, the patch contains
a small addtion about the new behavior.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From fe0401335e0ac1a9ae36dbdb5c2db82f9081a1a3 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <[hidden email]>
Date: Fri, 6 Apr 2018 13:57:48 +0900
Subject: [PATCH] Change FPW handling

The GUC full_pages_writes currently has an effect immediately. That
makes a race condition between config reload on checkpointer and
StartupXLOG. But since full page images are meaningful only when they
are attached to all WAL records covers a checkpoint, there is no
problem if we update the shared FPW only at REDO point.  On the other
hand, online backup mechanism on standby requires to know if FPW is
turned off before the next checkpoint record comes.

As the result, with this patch, changing of full_page_writes takes
effect at REDO point and additional XLOG_FPW_CHANGE is written only
for turning-off. These are sufficient for standby-backup to work
properly, reduces complexity and prevent the race condition.
---
 doc/src/sgml/config.sgml              |   4 +-
 src/backend/access/transam/xlog.c     | 114 +++++++++-------------------------
 src/backend/postmaster/checkpointer.c |   6 --
 src/include/access/xlog.h             |   1 -
 src/include/catalog/pg_control.h      |   2 +-
 5 files changed, 34 insertions(+), 93 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5d5f2d23c4..7ea42c25e2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2598,7 +2598,9 @@ include_dir 'conf.d'
        <para>
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.
-        The default is <literal>on</literal>.
+
+        The default is <literal>on</literal>. The change of the parmeter takes
+        effect at the next checkpoint time.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4a47395174..ba9cf07d38 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -202,14 +202,6 @@ static XLogRecPtr LastRec;
 static XLogRecPtr receivedUpto = 0;
 static TimeLineID receiveTLI = 0;
 
-/*
- * During recovery, lastFullPageWrites keeps track of full_page_writes that
- * the replayed WAL records indicate. It's initialized with full_page_writes
- * that the recovery starting checkpoint record indicates, and then updated
- * each time XLOG_FPW_CHANGE record is replayed.
- */
-static bool lastFullPageWrites;
-
 /*
  * Local copy of SharedRecoveryInProgress variable. True actually means "not
  * known, need to check the shared state".
@@ -6851,11 +6843,7 @@ StartupXLOG(void)
  */
  restoreTwoPhaseData();
 
- lastFullPageWrites = checkPoint.fullPageWrites;
-
  RedoRecPtr = XLogCtl->RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
- doPageWrites = lastFullPageWrites;
-
  if (RecPtr < checkPoint.redo)
  ereport(PANIC,
  (errmsg("invalid redo in checkpoint record")));
@@ -7650,16 +7638,6 @@ StartupXLOG(void)
  /* Pre-scan prepared transactions to find out the range of XIDs present */
  oldestActiveXID = PrescanPreparedTransactions(NULL, NULL);
 
- /*
- * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
- * record before resource manager writes cleanup WAL records or checkpoint
- * record is written.
- */
- Insert->fullPageWrites = lastFullPageWrites;
- LocalSetXLogInsertAllowed();
- UpdateFullPageWrites();
- LocalXLogInsertAllowed = -1;
-
  if (InRecovery)
  {
  /*
@@ -7893,6 +7871,13 @@ StartupXLOG(void)
  ControlFile->state = DB_IN_PRODUCTION;
  ControlFile->time = (pg_time_t) time(NULL);
 
+ /*
+ * Set the initial value of shared fullPageWrites. Once
+ * SharedRecoveryInProgress is turned false, checkpointer will update this
+ * value.
+ */
+ XLogCtl->Insert.fullPageWrites = checkPoint.fullPageWrites;
+
  SpinLockAcquire(&XLogCtl->info_lck);
  XLogCtl->SharedRecoveryInProgress = false;
  SpinLockRelease(&XLogCtl->info_lck);
@@ -8754,6 +8739,20 @@ CreateCheckPoint(int flags)
  */
  last_important_lsn = GetLastImportantRecPtr();
 
+ /*
+ * If full_page_writes has been turned off, issue XLOG_FPW_CHANGE before
+ * the flag actually takes effect. No lock is required since checkpointer
+ * is the only updator of shared fullPageWrites after recovery is
+ * finished. Both shared and local fullPageWrites do not change before the
+ * next reading below.
+ */
+ if (Insert->fullPageWrites && !fullPageWrites)
+ {
+ XLogBeginInsert();
+ XLogRegisterData((char *) (&fullPageWrites), sizeof(bool));
+ XLogInsert(RM_XLOG_ID, XLOG_FPW_CHANGE);
+ }
+
  /*
  * We must block concurrent insertions while examining insert state to
  * determine the checkpoint REDO pointer.
@@ -8795,6 +8794,15 @@ CreateCheckPoint(int flags)
  else
  checkPoint.PrevTimeLineID = ThisTimeLineID;
 
+ /*
+ * Update shared flag of fullPageWrites. WALInsertLock ensures that this
+ * affects all WAL records exactly from REDO point. As the result a
+ * checkpoint marked as fpw=true is ensured that all WAL records have full
+ * page image.
+ */
+ if (fullPageWrites != Insert->fullPageWrites)
+ Insert->fullPageWrites = fullPageWrites;
+
  checkPoint.fullPageWrites = Insert->fullPageWrites;
 
  /*
@@ -9642,65 +9650,6 @@ XlogChecksums(ChecksumType new_type)
  XLogInsert(RM_XLOG_ID, XLOG_CHECKSUMS);
 }
 
-/*
- * Update full_page_writes in shared memory, and write an
- * XLOG_FPW_CHANGE record if necessary.
- *
- * Note: this function assumes there is no other process running
- * concurrently that could update it.
- */
-void
-UpdateFullPageWrites(void)
-{
- XLogCtlInsert *Insert = &XLogCtl->Insert;
-
- /*
- * Do nothing if full_page_writes has not been changed.
- *
- * It's safe to check the shared full_page_writes without the lock,
- * because we assume that there is no concurrently running process which
- * can update it.
- */
- if (fullPageWrites == Insert->fullPageWrites)
- return;
-
- START_CRIT_SECTION();
-
- /*
- * It's always safe to take full page images, even when not strictly
- * required, but not the other round. So if we're setting full_page_writes
- * to true, first set it true and then write the WAL record. If we're
- * setting it to false, first write the WAL record and then set the global
- * flag.
- */
- if (fullPageWrites)
- {
- WALInsertLockAcquireExclusive();
- Insert->fullPageWrites = true;
- WALInsertLockRelease();
- }
-
- /*
- * Write an XLOG_FPW_CHANGE record. This allows us to keep track of
- * full_page_writes during archive recovery, if required.
- */
- if (XLogStandbyInfoActive() && !RecoveryInProgress())
- {
- XLogBeginInsert();
- XLogRegisterData((char *) (&fullPageWrites), sizeof(bool));
-
- XLogInsert(RM_XLOG_ID, XLOG_FPW_CHANGE);
- }
-
- if (!fullPageWrites)
- {
- WALInsertLockAcquireExclusive();
- Insert->fullPageWrites = false;
- WALInsertLockRelease();
- }
- END_CRIT_SECTION();
-}
-
 /*
  * Check that it's OK to switch to new timeline during recovery.
  *
@@ -10066,9 +10015,6 @@ xlog_redo(XLogReaderState *record)
  XLogCtl->lastFpwDisableRecPtr = ReadRecPtr;
  SpinLockRelease(&XLogCtl->info_lck);
  }
-
- /* Keep track of full_page_writes */
- lastFullPageWrites = fpw;
  }
  else if (info == XLOG_CHECKSUMS)
  {
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 4b452e7cee..8b87d139d0 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -1359,12 +1359,6 @@ UpdateSharedMemoryConfig(void)
  /* update global shmem state for sync rep */
  SyncRepUpdateSyncStandbysDefined();
 
- /*
- * If full_page_writes has been changed by SIGHUP, we update it in shared
- * memory and write an XLOG_FPW_CHANGE record.
- */
- UpdateFullPageWrites();
-
  elog(DEBUG2, "checkpointer updated shared memory configuration values");
 }
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index f21870c644..6e4648e94b 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -276,7 +276,6 @@ extern void CreateCheckPoint(int flags);
 extern bool CreateRestartPoint(int flags);
 extern void XLogPutNextOid(Oid nextOid);
 extern XLogRecPtr XLogRestorePoint(const char *rpName);
-extern void UpdateFullPageWrites(void);
 extern void GetFullPageWriteInfo(XLogRecPtr *RedoRecPtr_p, bool *doPageWrites_p);
 extern XLogRecPtr GetRedoRecPtr(void);
 extern XLogRecPtr GetInsertRecPtr(void);
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index 33c59f9a63..1710b8ce1e 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -38,7 +38,7 @@ typedef struct CheckPoint
  TimeLineID ThisTimeLineID; /* current TLI */
  TimeLineID PrevTimeLineID; /* previous TLI, if this record begins a new
  * timeline (equals ThisTimeLineID otherwise) */
- bool fullPageWrites; /* current full_page_writes */
+ bool fullPageWrites; /* true if all covering WALs are having FPI */
  uint32 nextXidEpoch; /* higher-order bits of nextXid */
  TransactionId nextXid; /* next free XID */
  Oid nextOid; /* next free OID */
--
2.16.3

Reply | Threaded
Open this post in threaded view
|

Re: Problem while setting the fpw with SIGHUP

Heikki Linnakangas
On 09/04/18 11:13, Kyotaro HORIGUCHI wrote:

>
> At Fri, 6 Apr 2018 17:59:58 +0530, Amit Kapila <[hidden email]> wrote in <[hidden email]>
>> On Fri, Apr 6, 2018 at 1:50 PM, Kyotaro HORIGUCHI
>> <[hidden email]> wrote:
>>> Hello.
>>>
>>> At Wed, 04 Apr 2018 17:26:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <[hidden email]> wrote in <[hidden email]>
>>>>> In general, I was wondering why in the first place this variable
>>>>> (full_page_writes) is a SIGHUP variable?  I think if the user tries to
>>>>> switch it to 'on' from 'off', it won't guarantee the recovery from
>>>>> torn pages.  Yeah, one can turn it to 'off' from 'on' without any
>>>>> problem, but as the reverse doesn't guarantee anything, it can confuse
>>>>> users. What do you think?
>>>>
>>>> I tend to agree with you. It works as expected after the next
>>>> checkpoint. So define the variable as "it can be changed any time
>>>> but has an effect at the next checkpoint time", then remove
>>>> XLOG_FPW_CHANGE record. And that eliminates the problem of
>>>> concurrent calls since the checkpointer becomes the only modifier
>>>> of the variable. And the problematic function
>>>> UpdateFullPageWrites also no longer needs to write a WAL
>>>> record. The information is conveyed only by checkpoint records.
>>>
>>> I noticed that XLOG_FPW_CHANGE(fpw=false) is still required by
>>> pg_start/stop_backup to know FPW's turning-off without waiting
>>> for the next checkpoint record. But XLOG_FPW_CHANGE(true) is not
>>> required since no one uses the information. It seems even harmful
>>> when it is written at the incorrect place.
>>>
>>> In the attached patch, shared fullPageWrites is updated only at
>>> REDO point
>>
>> I am not completely sure if that is the right option because this
>> would mean that we are defining the new scope for a GUC variable.  I
>> guess we should take the input of others as well.  I am not sure what
>> is the right way to do that, but maybe we can start a new thread with
>> a proper subject and description rather than discussing this under
>> some related bug fix patch discussion.  I guess we can try that after
>> CF unless some other people pitch in and share their feedback.

I think the new behavior where the GUC only takes effect at next
checkpoint is OK. It seems quite intuitive.

> [rebased patch version]

Looks good at a quick glance. Assuming no objections from others, I'll
take a closer look and commit tomorrow. Thanks!

- Heikki

1234