Is Recovery actually paused?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
52 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Is Recovery actually paused?

Dilip Kumar-2
Hello,

We have an interface to pause the WAL replay (pg_wal_replay_pause) and
to know whether the WAL replay pause is requested
(pg_is_wal_replay_paused).  But there is no way to know whether the
recovery is actually paused or not.  Actually, the recovery process
might process an extra WAL before pausing the recovery.  So does it
make sense to provide a new interface to tell whether the recovery is
actually paused or not?

One solution could be that we convert the XLogCtlData->recoveryPause
from bool to tri-state variable (0-> recovery not paused 1-> pause
requested 2-> actually paused).

Any opinion on this?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Simon Riggs
On Mon, 19 Oct 2020 at 15:11, Dilip Kumar <[hidden email]> wrote:

> We have an interface to pause the WAL replay (pg_wal_replay_pause) and
> to know whether the WAL replay pause is requested
> (pg_is_wal_replay_paused).  But there is no way to know whether the
> recovery is actually paused or not.  Actually, the recovery process
> might process an extra WAL before pausing the recovery.  So does it
> make sense to provide a new interface to tell whether the recovery is
> actually paused or not?
>
> One solution could be that we convert the XLogCtlData->recoveryPause
> from bool to tri-state variable (0-> recovery not paused 1-> pause
> requested 2-> actually paused).
>
> Any opinion on this?

Why would we want this? What problem are you trying to solve?

If we do care, why not fix pg_is_wal_replay_paused() so it responds as you wish?

--
Simon Riggs                http://www.EnterpriseDB.com/


Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Dilip Kumar-2
On Tue, Oct 20, 2020 at 1:11 PM Simon Riggs <[hidden email]> wrote:

>
> On Mon, 19 Oct 2020 at 15:11, Dilip Kumar <[hidden email]> wrote:
>
> > We have an interface to pause the WAL replay (pg_wal_replay_pause) and
> > to know whether the WAL replay pause is requested
> > (pg_is_wal_replay_paused).  But there is no way to know whether the
> > recovery is actually paused or not.  Actually, the recovery process
> > might process an extra WAL before pausing the recovery.  So does it
> > make sense to provide a new interface to tell whether the recovery is
> > actually paused or not?
> >
> > One solution could be that we convert the XLogCtlData->recoveryPause
> > from bool to tri-state variable (0-> recovery not paused 1-> pause
> > requested 2-> actually paused).
> >
> > Any opinion on this?
>
> Why would we want this? What problem are you trying to solve?

The requirement is to know the last replayed WAL on the standby so
unless we can guarantee that the recovery is actually paused we can
never get the safe last_replay_lsn value.

> If we do care, why not fix pg_is_wal_replay_paused() so it responds as you wish?

Maybe we can also do that but pg_is_wal_replay_paused is an existing
API and the behavior is to know whether the recovery paused is
requested or not,  So I am not sure is it a good idea to change the
behavior of the existing API?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Dilip Kumar-2
On Tue, Oct 20, 2020 at 1:22 PM Dilip Kumar <[hidden email]> wrote:

>
> On Tue, Oct 20, 2020 at 1:11 PM Simon Riggs <[hidden email]> wrote:
> >
> > On Mon, 19 Oct 2020 at 15:11, Dilip Kumar <[hidden email]> wrote:
> >
> > > We have an interface to pause the WAL replay (pg_wal_replay_pause) and
> > > to know whether the WAL replay pause is requested
> > > (pg_is_wal_replay_paused).  But there is no way to know whether the
> > > recovery is actually paused or not.  Actually, the recovery process
> > > might process an extra WAL before pausing the recovery.  So does it
> > > make sense to provide a new interface to tell whether the recovery is
> > > actually paused or not?
> > >
> > > One solution could be that we convert the XLogCtlData->recoveryPause
> > > from bool to tri-state variable (0-> recovery not paused 1-> pause
> > > requested 2-> actually paused).
> > >
> > > Any opinion on this?
> >
> > Why would we want this? What problem are you trying to solve?
>
> The requirement is to know the last replayed WAL on the standby so
> unless we can guarantee that the recovery is actually paused we can
> never get the safe last_replay_lsn value.
>
> > If we do care, why not fix pg_is_wal_replay_paused() so it responds as you wish?
>
> Maybe we can also do that but pg_is_wal_replay_paused is an existing
> API and the behavior is to know whether the recovery paused is
> requested or not,  So I am not sure is it a good idea to change the
> behavior of the existing API?
>
Attached is the POC patch to show what I have in mind.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

POC-0001-New-interface-to-know-wal-recovery-actually-paused.patch (8K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Simon Riggs
On Tue, 20 Oct 2020 at 09:50, Dilip Kumar <[hidden email]> wrote:

> > > Why would we want this? What problem are you trying to solve?
> >
> > The requirement is to know the last replayed WAL on the standby so
> > unless we can guarantee that the recovery is actually paused we can
> > never get the safe last_replay_lsn value.
> >
> > > If we do care, why not fix pg_is_wal_replay_paused() so it responds as you wish?
> >
> > Maybe we can also do that but pg_is_wal_replay_paused is an existing
> > API and the behavior is to know whether the recovery paused is
> > requested or not,  So I am not sure is it a good idea to change the
> > behavior of the existing API?
> >
>
> Attached is the POC patch to show what I have in mind.

If you don't like it, I doubt anyone else cares for the exact current
behavior either. Thanks for pointing those issues out.

It would make sense to alter pg_wal_replay_pause() so that it blocks
until paused.

I suggest you add the 3-value state as you suggest, but make
pg_is_wal_replay_paused() respond:
if paused, true
if requested, wait until paused, then return true
else false

That then solves your issues with a smoother interface.

--
Simon Riggs                http://www.EnterpriseDB.com/


Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Dilip Kumar-2
On Tue, Oct 20, 2020 at 3:00 PM Simon Riggs <[hidden email]> wrote:

>
> On Tue, 20 Oct 2020 at 09:50, Dilip Kumar <[hidden email]> wrote:
>
> > > > Why would we want this? What problem are you trying to solve?
> > >
> > > The requirement is to know the last replayed WAL on the standby so
> > > unless we can guarantee that the recovery is actually paused we can
> > > never get the safe last_replay_lsn value.
> > >
> > > > If we do care, why not fix pg_is_wal_replay_paused() so it responds as you wish?
> > >
> > > Maybe we can also do that but pg_is_wal_replay_paused is an existing
> > > API and the behavior is to know whether the recovery paused is
> > > requested or not,  So I am not sure is it a good idea to change the
> > > behavior of the existing API?
> > >
> >
> > Attached is the POC patch to show what I have in mind.
>
> If you don't like it, I doubt anyone else cares for the exact current
> behavior either. Thanks for pointing those issues out.
>
> It would make sense to alter pg_wal_replay_pause() so that it blocks
> until paused.
>
> I suggest you add the 3-value state as you suggest, but make
> pg_is_wal_replay_paused() respond:
> if paused, true
> if requested, wait until paused, then return true
> else false
>
> That then solves your issues with a smoother interface.
>

Make sense to me, I will change as per the suggestion.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Dilip Kumar-2
On Tue, Oct 20, 2020 at 5:59 PM Dilip Kumar <[hidden email]> wrote:

>
> On Tue, Oct 20, 2020 at 3:00 PM Simon Riggs <[hidden email]> wrote:
> >
> > On Tue, 20 Oct 2020 at 09:50, Dilip Kumar <[hidden email]> wrote:
> >
> > > > > Why would we want this? What problem are you trying to solve?
> > > >
> > > > The requirement is to know the last replayed WAL on the standby so
> > > > unless we can guarantee that the recovery is actually paused we can
> > > > never get the safe last_replay_lsn value.
> > > >
> > > > > If we do care, why not fix pg_is_wal_replay_paused() so it responds as you wish?
> > > >
> > > > Maybe we can also do that but pg_is_wal_replay_paused is an existing
> > > > API and the behavior is to know whether the recovery paused is
> > > > requested or not,  So I am not sure is it a good idea to change the
> > > > behavior of the existing API?
> > > >
> > >
> > > Attached is the POC patch to show what I have in mind.
> >
> > If you don't like it, I doubt anyone else cares for the exact current
> > behavior either. Thanks for pointing those issues out.
> >
> > It would make sense to alter pg_wal_replay_pause() so that it blocks
> > until paused.
> >
> > I suggest you add the 3-value state as you suggest, but make
> > pg_is_wal_replay_paused() respond:
> > if paused, true
> > if requested, wait until paused, then return true
> > else false
> >
> > That then solves your issues with a smoother interface.
> >
>
> Make sense to me, I will change as per the suggestion.

I have noticed one more issue, the problem is that if the recovery
process is currently not processing any WAL and just waiting for the
WAL to become available then the pg_is_wal_replay_paused will be stuck
forever.  Having said that there is the same problem even if we design
the new interface which checks whether the recovery is actually paused
or not because until the recovery process gets the next wal it will
not check whether the recovery pause is requested or not so the actual
recovery paused flag will never be set.

One idea could be, if the recovery process is waiting for WAL and a
recovery pause is requested then we can assume that the recovery is
paused because before processing the next wal it will always check
whether the recovery pause is requested or not.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Robert Haas
On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <[hidden email]> wrote:

> I have noticed one more issue, the problem is that if the recovery
> process is currently not processing any WAL and just waiting for the
> WAL to become available then the pg_is_wal_replay_paused will be stuck
> forever.  Having said that there is the same problem even if we design
> the new interface which checks whether the recovery is actually paused
> or not because until the recovery process gets the next wal it will
> not check whether the recovery pause is requested or not so the actual
> recovery paused flag will never be set.
>
> One idea could be, if the recovery process is waiting for WAL and a
> recovery pause is requested then we can assume that the recovery is
> paused because before processing the next wal it will always check
> whether the recovery pause is requested or not.

That seems fine, because the user's question is presumably whether the
pause has taken effect so that no more records will be replayed
barring an un-paused.

However, it might be better to implement this by having the system
absorb the pause immediately when it's in this state, rather than
trying to detect this state and treat it specially.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Simon Riggs
In reply to this post by Dilip Kumar-2
On Wed, 21 Oct 2020 at 12:16, Dilip Kumar <[hidden email]> wrote:

>
> On Tue, Oct 20, 2020 at 5:59 PM Dilip Kumar <[hidden email]> wrote:
> >
> > On Tue, Oct 20, 2020 at 3:00 PM Simon Riggs <[hidden email]> wrote:
> > >
> > > On Tue, 20 Oct 2020 at 09:50, Dilip Kumar <[hidden email]> wrote:
> > >
> > > > > > Why would we want this? What problem are you trying to solve?
> > > > >
> > > > > The requirement is to know the last replayed WAL on the standby so
> > > > > unless we can guarantee that the recovery is actually paused we can
> > > > > never get the safe last_replay_lsn value.
> > > > >
> > > > > > If we do care, why not fix pg_is_wal_replay_paused() so it responds as you wish?
> > > > >
> > > > > Maybe we can also do that but pg_is_wal_replay_paused is an existing
> > > > > API and the behavior is to know whether the recovery paused is
> > > > > requested or not,  So I am not sure is it a good idea to change the
> > > > > behavior of the existing API?
> > > > >
> > > >
> > > > Attached is the POC patch to show what I have in mind.
> > >
> > > If you don't like it, I doubt anyone else cares for the exact current
> > > behavior either. Thanks for pointing those issues out.
> > >
> > > It would make sense to alter pg_wal_replay_pause() so that it blocks
> > > until paused.
> > >
> > > I suggest you add the 3-value state as you suggest, but make
> > > pg_is_wal_replay_paused() respond:
> > > if paused, true
> > > if requested, wait until paused, then return true
> > > else false
> > >
> > > That then solves your issues with a smoother interface.
> > >
> >
> > Make sense to me, I will change as per the suggestion.
>
> I have noticed one more issue, the problem is that if the recovery
> process is currently not processing any WAL and just waiting for the
> WAL to become available then the pg_is_wal_replay_paused will be stuck
> forever.  Having said that there is the same problem even if we design
> the new interface which checks whether the recovery is actually paused
> or not because until the recovery process gets the next wal it will
> not check whether the recovery pause is requested or not so the actual
> recovery paused flag will never be set.
>
> One idea could be, if the recovery process is waiting for WAL and a
> recovery pause is requested then we can assume that the recovery is
> paused because before processing the next wal it will always check
> whether the recovery pause is requested or not.

If ReadRecord() is waiting for WAL (at bottom of recovery loop), then
when it does return it will immediately move to pause (at top of next
loop). Which makes it easy to cover these cases.

It would be easy enough to create another variable that shows "waiting
for WAL", since that is in itself a useful and interesting thing to be
able to report.

pg_is_wal_replay_paused() and pg_wal_replay_pause() would then return
whenever it is either (fully paused || waiting for WAL &&
pause_requested))

We can then create a new function called pg_wal_replay_status() that
returns multiple values: RECOVERING | WAITING_FOR_WAL | PAUSED

--
Simon Riggs                http://www.EnterpriseDB.com/


Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Kyotaro Horiguchi-4
In reply to this post by Robert Haas
At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas <[hidden email]> wrote in
> On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <[hidden email]> wrote:
> > One idea could be, if the recovery process is waiting for WAL and a
> > recovery pause is requested then we can assume that the recovery is
> > paused because before processing the next wal it will always check
> > whether the recovery pause is requested or not.
..
> However, it might be better to implement this by having the system
> absorb the pause immediately when it's in this state, rather than
> trying to detect this state and treat it specially.

The paused state is shown in pg_stat_activity.wait_event and it is
strange that pg_is_wal_replay_paused() is inconsistent with the
column. To make them consistent, we need to call recoveryPausesHere()
at the end of WaitForWALToBecomeAvailable() and let
pg_wal_replay_pause() call WakeupRecovery().

I think we don't need a separate function to find the state.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Dilip Kumar-2
On Thu, Oct 22, 2020 at 6:59 AM Kyotaro Horiguchi
<[hidden email]> wrote:

>
> At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas <[hidden email]> wrote in
> > On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <[hidden email]> wrote:
> > > One idea could be, if the recovery process is waiting for WAL and a
> > > recovery pause is requested then we can assume that the recovery is
> > > paused because before processing the next wal it will always check
> > > whether the recovery pause is requested or not.
> ..
> > However, it might be better to implement this by having the system
> > absorb the pause immediately when it's in this state, rather than
> > trying to detect this state and treat it specially.
>
> The paused state is shown in pg_stat_activity.wait_event and it is
> strange that pg_is_wal_replay_paused() is inconsistent with the
> column.

Right

To make them consistent, we need to call recoveryPausesHere()
> at the end of WaitForWALToBecomeAvailable() and let
> pg_wal_replay_pause() call WakeupRecovery().
>
> I think we don't need a separate function to find the state.

The idea makes sense to me.  I will try to change the patch as per the
suggestion.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Dilip Kumar-2
On Thu, Oct 22, 2020 at 7:50 PM Dilip Kumar <[hidden email]> wrote:

>
> On Thu, Oct 22, 2020 at 6:59 AM Kyotaro Horiguchi
> <[hidden email]> wrote:
> >
> > At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas <[hidden email]> wrote in
> > > On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <[hidden email]> wrote:
> > > > One idea could be, if the recovery process is waiting for WAL and a
> > > > recovery pause is requested then we can assume that the recovery is
> > > > paused because before processing the next wal it will always check
> > > > whether the recovery pause is requested or not.
> > ..
> > > However, it might be better to implement this by having the system
> > > absorb the pause immediately when it's in this state, rather than
> > > trying to detect this state and treat it specially.
> >
> > The paused state is shown in pg_stat_activity.wait_event and it is
> > strange that pg_is_wal_replay_paused() is inconsistent with the
> > column.
>
> Right
>
> To make them consistent, we need to call recoveryPausesHere()
> > at the end of WaitForWALToBecomeAvailable() and let
> > pg_wal_replay_pause() call WakeupRecovery().
> >
> > I think we don't need a separate function to find the state.
>
> The idea makes sense to me.  I will try to change the patch as per the
> suggestion.
Here is the patch based on this idea.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

v1-0001-pg_is_wal_replay_paused-will-wait-for-recovery-to.patch (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Yugo Nagata
On Thu, 22 Oct 2020 20:36:48 +0530
Dilip Kumar <[hidden email]> wrote:

> On Thu, Oct 22, 2020 at 7:50 PM Dilip Kumar <[hidden email]> wrote:
> >
> > On Thu, Oct 22, 2020 at 6:59 AM Kyotaro Horiguchi
> > <[hidden email]> wrote:
> > >
> > > At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas <[hidden email]> wrote in
> > > > On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <[hidden email]> wrote:
> > > > > One idea could be, if the recovery process is waiting for WAL and a
> > > > > recovery pause is requested then we can assume that the recovery is
> > > > > paused because before processing the next wal it will always check
> > > > > whether the recovery pause is requested or not.
> > > ..
> > > > However, it might be better to implement this by having the system
> > > > absorb the pause immediately when it's in this state, rather than
> > > > trying to detect this state and treat it specially.
> > >
> > > The paused state is shown in pg_stat_activity.wait_event and it is
> > > strange that pg_is_wal_replay_paused() is inconsistent with the
> > > column.
> >
> > Right
> >
> > To make them consistent, we need to call recoveryPausesHere()
> > > at the end of WaitForWALToBecomeAvailable() and let
> > > pg_wal_replay_pause() call WakeupRecovery().
> > >
> > > I think we don't need a separate function to find the state.
> >
> > The idea makes sense to me.  I will try to change the patch as per the
> > suggestion.
>
> Here is the patch based on this idea.

I reviewd this patch.

First, I made a recovery conflict situation using a table lock.

Standby:
#= begin;
#= select * from t;

Primary:
#= begin;
#= lock t in ;

After this, WAL of the table lock cannot be replayed due to a lock acquired
in the standby.

Second, during the delay, I executed pg_wal_replay_pause() and
pg_is_wal_replay_paused(). Then, pg_is_wal_replay_paused was blocked until
max_standby_streaming_delay was expired, and eventually returned true.

I can also see the same behaviour by setting recovery_min_apply_delay.

So, pg_is_wal_replay_paused waits for recovery to get paused and this works
successfully as expected.

However, I wonder users don't expect pg_is_wal_replay_paused to wait.
Especially, if max_standby_streaming_delay is -1, this will be blocked forever,
although this setting may not be usual. In addition, some users may set
recovery_min_apply_delay for a large.  If such users call pg_is_wal_replay_paused,
it could wait for a long time.

At least, I think we need some descriptions on document to explain
pg_is_wal_replay_paused could wait while a time.

Also, how about adding a new boolean argument to pg_is_wal_replay_paused to
control whether this waits for recovery to get paused or not? By setting its
default value to true or false, users can use the old format for calling this
and the backward compatibility can be maintained.


As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel
the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop.


+                   errhint("Recovery control functions can only be executed during recovery.")));          

There are a few tabs at the end of this line.

Regards,
Yugo Nagata

--
Yugo NAGATA <[hidden email]>


Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Dilip Kumar-2
On Mon, Nov 30, 2020 at 12:17 PM Yugo NAGATA <[hidden email]> wrote:

Thanks for looking into this.

> On Thu, 22 Oct 2020 20:36:48 +0530
> Dilip Kumar <[hidden email]> wrote:
>
> > On Thu, Oct 22, 2020 at 7:50 PM Dilip Kumar <[hidden email]> wrote:
> > >
> > > On Thu, Oct 22, 2020 at 6:59 AM Kyotaro Horiguchi
> > > <[hidden email]> wrote:
> > > >
> > > > At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas <[hidden email]> wrote in
> > > > > On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <[hidden email]> wrote:
> > > > > > One idea could be, if the recovery process is waiting for WAL and a
> > > > > > recovery pause is requested then we can assume that the recovery is
> > > > > > paused because before processing the next wal it will always check
> > > > > > whether the recovery pause is requested or not.
> > > > ..
> > > > > However, it might be better to implement this by having the system
> > > > > absorb the pause immediately when it's in this state, rather than
> > > > > trying to detect this state and treat it specially.
> > > >
> > > > The paused state is shown in pg_stat_activity.wait_event and it is
> > > > strange that pg_is_wal_replay_paused() is inconsistent with the
> > > > column.
> > >
> > > Right
> > >
> > > To make them consistent, we need to call recoveryPausesHere()
> > > > at the end of WaitForWALToBecomeAvailable() and let
> > > > pg_wal_replay_pause() call WakeupRecovery().
> > > >
> > > > I think we don't need a separate function to find the state.
> > >
> > > The idea makes sense to me.  I will try to change the patch as per the
> > > suggestion.
> >
> > Here is the patch based on this idea.
>
> I reviewd this patch.
>
> First, I made a recovery conflict situation using a table lock.
>
> Standby:
> #= begin;
> #= select * from t;
>
> Primary:
> #= begin;
> #= lock t in ;
>
> After this, WAL of the table lock cannot be replayed due to a lock acquired
> in the standby.
>
> Second, during the delay, I executed pg_wal_replay_pause() and
> pg_is_wal_replay_paused(). Then, pg_is_wal_replay_paused was blocked until
> max_standby_streaming_delay was expired, and eventually returned true.
>
> I can also see the same behaviour by setting recovery_min_apply_delay.
>
> So, pg_is_wal_replay_paused waits for recovery to get paused and this works
> successfully as expected.
>
> However, I wonder users don't expect pg_is_wal_replay_paused to wait.
> Especially, if max_standby_streaming_delay is -1, this will be blocked forever,
> although this setting may not be usual. In addition, some users may set
> recovery_min_apply_delay for a large.  If such users call pg_is_wal_replay_paused,
> it could wait for a long time.
>
> At least, I think we need some descriptions on document to explain
> pg_is_wal_replay_paused could wait while a time.

Ok

> Also, how about adding a new boolean argument to pg_is_wal_replay_paused to
> control whether this waits for recovery to get paused or not? By setting its
> default value to true or false, users can use the old format for calling this
> and the backward compatibility can be maintained.

So basically, if the wait_recovery_pause flag is false then we will
immediately return true if the pause is requested?  I agree that it is
good to have an API to know whether the recovery pause is requested or
not but I am not sure is it good idea to make this API serve both the
purpose?  Anyone else have any thoughts on this?

>
> As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel
> the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop.
>
>
> +                   errhint("Recovery control functions can only be executed during recovery.")));
>
> There are a few tabs at the end of this line.

I will fix.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Dilip Kumar-2
On Mon, Nov 30, 2020 at 2:40 PM Dilip Kumar <[hidden email]> wrote:

>
> On Mon, Nov 30, 2020 at 12:17 PM Yugo NAGATA <[hidden email]> wrote:
>
> Thanks for looking into this.
>
> > On Thu, 22 Oct 2020 20:36:48 +0530
> > Dilip Kumar <[hidden email]> wrote:
> >
> > > On Thu, Oct 22, 2020 at 7:50 PM Dilip Kumar <[hidden email]> wrote:
> > > >
> > > > On Thu, Oct 22, 2020 at 6:59 AM Kyotaro Horiguchi
> > > > <[hidden email]> wrote:
> > > > >
> > > > > At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas <[hidden email]> wrote in
> > > > > > On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <[hidden email]> wrote:
> > > > > > > One idea could be, if the recovery process is waiting for WAL and a
> > > > > > > recovery pause is requested then we can assume that the recovery is
> > > > > > > paused because before processing the next wal it will always check
> > > > > > > whether the recovery pause is requested or not.
> > > > > ..
> > > > > > However, it might be better to implement this by having the system
> > > > > > absorb the pause immediately when it's in this state, rather than
> > > > > > trying to detect this state and treat it specially.
> > > > >
> > > > > The paused state is shown in pg_stat_activity.wait_event and it is
> > > > > strange that pg_is_wal_replay_paused() is inconsistent with the
> > > > > column.
> > > >
> > > > Right
> > > >
> > > > To make them consistent, we need to call recoveryPausesHere()
> > > > > at the end of WaitForWALToBecomeAvailable() and let
> > > > > pg_wal_replay_pause() call WakeupRecovery().
> > > > >
> > > > > I think we don't need a separate function to find the state.
> > > >
> > > > The idea makes sense to me.  I will try to change the patch as per the
> > > > suggestion.
> > >
> > > Here is the patch based on this idea.
> >
> > I reviewd this patch.
> >
> > First, I made a recovery conflict situation using a table lock.
> >
> > Standby:
> > #= begin;
> > #= select * from t;
> >
> > Primary:
> > #= begin;
> > #= lock t in ;
> >
> > After this, WAL of the table lock cannot be replayed due to a lock acquired
> > in the standby.
> >
> > Second, during the delay, I executed pg_wal_replay_pause() and
> > pg_is_wal_replay_paused(). Then, pg_is_wal_replay_paused was blocked until
> > max_standby_streaming_delay was expired, and eventually returned true.
> >
> > I can also see the same behaviour by setting recovery_min_apply_delay.
> >
> > So, pg_is_wal_replay_paused waits for recovery to get paused and this works
> > successfully as expected.
> >
> > However, I wonder users don't expect pg_is_wal_replay_paused to wait.
> > Especially, if max_standby_streaming_delay is -1, this will be blocked forever,
> > although this setting may not be usual. In addition, some users may set
> > recovery_min_apply_delay for a large.  If such users call pg_is_wal_replay_paused,
> > it could wait for a long time.
> >
> > At least, I think we need some descriptions on document to explain
> > pg_is_wal_replay_paused could wait while a time.
>
> Ok
Fixed this, added some comments in .sgml as well as in function header

> > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to
> > control whether this waits for recovery to get paused or not? By setting its
> > default value to true or false, users can use the old format for calling this
> > and the backward compatibility can be maintained.
>
> So basically, if the wait_recovery_pause flag is false then we will
> immediately return true if the pause is requested?  I agree that it is
> good to have an API to know whether the recovery pause is requested or
> not but I am not sure is it good idea to make this API serve both the
> purpose?  Anyone else have any thoughts on this?
>
> >
> > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel
> > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop.
> >
> >
> > +                   errhint("Recovery control functions can only be executed during recovery.")));
> >
> > There are a few tabs at the end of this line.
>
> I will fix.
Fixed this as well.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

v2-0001-pg_is_wal_replay_paused-will-wait-for-recovery-to.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Yugo Nagata
On Thu, 10 Dec 2020 11:25:23 +0530
Dilip Kumar <[hidden email]> wrote:

> > > However, I wonder users don't expect pg_is_wal_replay_paused to wait.
> > > Especially, if max_standby_streaming_delay is -1, this will be blocked forever,
> > > although this setting may not be usual. In addition, some users may set
> > > recovery_min_apply_delay for a large.  If such users call pg_is_wal_replay_paused,
> > > it could wait for a long time.
> > >
> > > At least, I think we need some descriptions on document to explain
> > > pg_is_wal_replay_paused could wait while a time.
> >
> > Ok
>
> Fixed this, added some comments in .sgml as well as in function header

Thank you for fixing this.

Also, is it better to fix the description of pg_wal_replay_pause from
"Pauses recovery." to "Request to pause recovery." in according with
pg_is_wal_replay_paused?

> > > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to
> > > control whether this waits for recovery to get paused or not? By setting its
> > > default value to true or false, users can use the old format for calling this
> > > and the backward compatibility can be maintained.
> >
> > So basically, if the wait_recovery_pause flag is false then we will
> > immediately return true if the pause is requested?  I agree that it is
> > good to have an API to know whether the recovery pause is requested or
> > not but I am not sure is it good idea to make this API serve both the
> > purpose?  Anyone else have any thoughts on this?
> >

I think the current pg_is_wal_replay_paused() already has another purpose;
this waits recovery to actually get paused. If we want to limit this API's
purpose only to return the pause state, it seems better to fix this to return
the actual state at the cost of lacking the backward compatibility. If we want
to know whether pause is requested, we may add a new API like
pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery to actually
get paused, we may add an option to pg_wal_replay_pause() for this purpose.

However, this might be a bikeshedding. If anyone don't care that
pg_is_wal_replay_paused() can make user wait for a long time, I don't care either.

> > > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel
> > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop.

How about this fix? I think users may want to cancel pg_is_wal_replay_paused() during
this is blocking.

Regards,
Yugo Nagata

--
Yugo NAGATA <[hidden email]>


Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Dilip Kumar-2
On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA <[hidden email]> wrote:

>
> On Thu, 10 Dec 2020 11:25:23 +0530
> Dilip Kumar <[hidden email]> wrote:
>
> > > > However, I wonder users don't expect pg_is_wal_replay_paused to wait.
> > > > Especially, if max_standby_streaming_delay is -1, this will be blocked forever,
> > > > although this setting may not be usual. In addition, some users may set
> > > > recovery_min_apply_delay for a large.  If such users call pg_is_wal_replay_paused,
> > > > it could wait for a long time.
> > > >
> > > > At least, I think we need some descriptions on document to explain
> > > > pg_is_wal_replay_paused could wait while a time.
> > >
> > > Ok
> >
> > Fixed this, added some comments in .sgml as well as in function header
>
> Thank you for fixing this.
>
> Also, is it better to fix the description of pg_wal_replay_pause from
> "Pauses recovery." to "Request to pause recovery." in according with
> pg_is_wal_replay_paused?

Okay

>
> > > > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to
> > > > control whether this waits for recovery to get paused or not? By setting its
> > > > default value to true or false, users can use the old format for calling this
> > > > and the backward compatibility can be maintained.
> > >
> > > So basically, if the wait_recovery_pause flag is false then we will
> > > immediately return true if the pause is requested?  I agree that it is
> > > good to have an API to know whether the recovery pause is requested or
> > > not but I am not sure is it good idea to make this API serve both the
> > > purpose?  Anyone else have any thoughts on this?
> > >
>
> I think the current pg_is_wal_replay_paused() already has another purpose;
> this waits recovery to actually get paused. If we want to limit this API's
> purpose only to return the pause state, it seems better to fix this to return
> the actual state at the cost of lacking the backward compatibility. If we want
> to know whether pause is requested, we may add a new API like
> pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery to actually
> get paused, we may add an option to pg_wal_replay_pause() for this purpose.
>
> However, this might be a bikeshedding. If anyone don't care that
> pg_is_wal_replay_paused() can make user wait for a long time, I don't care either.

I don't think that it will be blocked ever, because
pg_wal_replay_pause is sending the WakeupRecovery() which means the
recovery process will not be stuck on waiting for the WAL.

> > > > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel
> > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop.
>
> How about this fix? I think users may want to cancel pg_is_wal_replay_paused() during
> this is blocking.

Yeah, we can do this.  I will send the updated patch after putting
some more thought into these comments.  Thanks again for the feedback.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Dilip Kumar-2
On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar <[hidden email]> wrote:

>
> On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA <[hidden email]> wrote:
> >
> > On Thu, 10 Dec 2020 11:25:23 +0530
> > Dilip Kumar <[hidden email]> wrote:
> >
> > > > > However, I wonder users don't expect pg_is_wal_replay_paused to wait.
> > > > > Especially, if max_standby_streaming_delay is -1, this will be blocked forever,
> > > > > although this setting may not be usual. In addition, some users may set
> > > > > recovery_min_apply_delay for a large.  If such users call pg_is_wal_replay_paused,
> > > > > it could wait for a long time.
> > > > >
> > > > > At least, I think we need some descriptions on document to explain
> > > > > pg_is_wal_replay_paused could wait while a time.
> > > >
> > > > Ok
> > >
> > > Fixed this, added some comments in .sgml as well as in function header
> >
> > Thank you for fixing this.
> >
> > Also, is it better to fix the description of pg_wal_replay_pause from
> > "Pauses recovery." to "Request to pause recovery." in according with
> > pg_is_wal_replay_paused?
>
> Okay
>
> >
> > > > > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to
> > > > > control whether this waits for recovery to get paused or not? By setting its
> > > > > default value to true or false, users can use the old format for calling this
> > > > > and the backward compatibility can be maintained.
> > > >
> > > > So basically, if the wait_recovery_pause flag is false then we will
> > > > immediately return true if the pause is requested?  I agree that it is
> > > > good to have an API to know whether the recovery pause is requested or
> > > > not but I am not sure is it good idea to make this API serve both the
> > > > purpose?  Anyone else have any thoughts on this?
> > > >
> >
> > I think the current pg_is_wal_replay_paused() already has another purpose;
> > this waits recovery to actually get paused. If we want to limit this API's
> > purpose only to return the pause state, it seems better to fix this to return
> > the actual state at the cost of lacking the backward compatibility. If we want
> > to know whether pause is requested, we may add a new API like
> > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery to actually
> > get paused, we may add an option to pg_wal_replay_pause() for this purpose.
> >
> > However, this might be a bikeshedding. If anyone don't care that
> > pg_is_wal_replay_paused() can make user wait for a long time, I don't care either.
>
> I don't think that it will be blocked ever, because
> pg_wal_replay_pause is sending the WakeupRecovery() which means the
> recovery process will not be stuck on waiting for the WAL.
>
> > > > > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel
> > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop.
> >
> > How about this fix? I think users may want to cancel pg_is_wal_replay_paused() during
> > this is blocking.
>
> Yeah, we can do this.  I will send the updated patch after putting
> some more thought into these comments.  Thanks again for the feedback.
>
Please find the updated patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

v3-0001-pg_is_wal_replay_paused-will-wait-for-recovery-to.patch (11K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Yugo Nagata
On Wed, 13 Jan 2021 17:49:43 +0530
Dilip Kumar <[hidden email]> wrote:

> On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar <[hidden email]> wrote:
> >
> > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA <[hidden email]> wrote:
> > >
> > > On Thu, 10 Dec 2020 11:25:23 +0530
> > > Dilip Kumar <[hidden email]> wrote:
> > >
> > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to wait.
> > > > > > Especially, if max_standby_streaming_delay is -1, this will be blocked forever,
> > > > > > although this setting may not be usual. In addition, some users may set
> > > > > > recovery_min_apply_delay for a large.  If such users call pg_is_wal_replay_paused,
> > > > > > it could wait for a long time.
> > > > > >
> > > > > > At least, I think we need some descriptions on document to explain
> > > > > > pg_is_wal_replay_paused could wait while a time.
> > > > >
> > > > > Ok
> > > >
> > > > Fixed this, added some comments in .sgml as well as in function header
> > >
> > > Thank you for fixing this.
> > >
> > > Also, is it better to fix the description of pg_wal_replay_pause from
> > > "Pauses recovery." to "Request to pause recovery." in according with
> > > pg_is_wal_replay_paused?
> >
> > Okay
> >
> > >
> > > > > > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to
> > > > > > control whether this waits for recovery to get paused or not? By setting its
> > > > > > default value to true or false, users can use the old format for calling this
> > > > > > and the backward compatibility can be maintained.
> > > > >
> > > > > So basically, if the wait_recovery_pause flag is false then we will
> > > > > immediately return true if the pause is requested?  I agree that it is
> > > > > good to have an API to know whether the recovery pause is requested or
> > > > > not but I am not sure is it good idea to make this API serve both the
> > > > > purpose?  Anyone else have any thoughts on this?
> > > > >
> > >
> > > I think the current pg_is_wal_replay_paused() already has another purpose;
> > > this waits recovery to actually get paused. If we want to limit this API's
> > > purpose only to return the pause state, it seems better to fix this to return
> > > the actual state at the cost of lacking the backward compatibility. If we want
> > > to know whether pause is requested, we may add a new API like
> > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery to actually
> > > get paused, we may add an option to pg_wal_replay_pause() for this purpose.
> > >
> > > However, this might be a bikeshedding. If anyone don't care that
> > > pg_is_wal_replay_paused() can make user wait for a long time, I don't care either.
> >
> > I don't think that it will be blocked ever, because
> > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> > recovery process will not be stuck on waiting for the WAL.

Yes, there is no stuck on waiting for the WAL. However, it can be stuck during resolving
a recovery conflict. The process could wait for max_standby_streaming_delay or
max_standby_archive_delay at most before recovery get completely paused.

Also, it could wait for recovery_min_apply_delay if it has a valid value. It is possible
that a user set this parameter to a large value, so it could wait for a long time. However,
this will be avoided by calling recoveryPausesHere() or CheckAndSetRecoveryPause() in
recoveryApplyDelay().

> > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel
> > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop.
> > >
> > > How about this fix? I think users may want to cancel pg_is_wal_replay_paused() during
> > > this is blocking.
> >
> > Yeah, we can do this.  I will send the updated patch after putting
> > some more thought into these comments.  Thanks again for the feedback.
> >
>
> Please find the updated patch.

Thanks.  I confirmed that I can cancel pg_is_wal_repaly_paused() during stuck.


Although it is a very trivial comment, I think that the new line before
HandleStartupProcInterrupts() is unnecessary.

@@ -6052,12 +6062,20 @@ recoveryPausesHere(bool endOfRecovery)
  (errmsg("recovery has paused"),
  errhint("Execute pg_wal_replay_resume() to continue.")));
 
- while (RecoveryIsPaused())
+ while (RecoveryPauseRequested())
  {
+
  HandleStartupProcInterrupts();


Regards,
Yugo Nagata

--
Yugo NAGATA <[hidden email]>


Reply | Threaded
Open this post in threaded view
|

Re: Is Recovery actually paused?

Masahiko Sawada
In reply to this post by Dilip Kumar-2
On Wed, Jan 13, 2021 at 9:20 PM Dilip Kumar <[hidden email]> wrote:

>
> On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar <[hidden email]> wrote:
> >
> > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA <[hidden email]> wrote:
> > >
> > > On Thu, 10 Dec 2020 11:25:23 +0530
> > > Dilip Kumar <[hidden email]> wrote:
> > >
> > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to wait.
> > > > > > Especially, if max_standby_streaming_delay is -1, this will be blocked forever,
> > > > > > although this setting may not be usual. In addition, some users may set
> > > > > > recovery_min_apply_delay for a large.  If such users call pg_is_wal_replay_paused,
> > > > > > it could wait for a long time.
> > > > > >
> > > > > > At least, I think we need some descriptions on document to explain
> > > > > > pg_is_wal_replay_paused could wait while a time.
> > > > >
> > > > > Ok
> > > >
> > > > Fixed this, added some comments in .sgml as well as in function header
> > >
> > > Thank you for fixing this.
> > >
> > > Also, is it better to fix the description of pg_wal_replay_pause from
> > > "Pauses recovery." to "Request to pause recovery." in according with
> > > pg_is_wal_replay_paused?
> >
> > Okay
> >
> > >
> > > > > > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to
> > > > > > control whether this waits for recovery to get paused or not? By setting its
> > > > > > default value to true or false, users can use the old format for calling this
> > > > > > and the backward compatibility can be maintained.
> > > > >
> > > > > So basically, if the wait_recovery_pause flag is false then we will
> > > > > immediately return true if the pause is requested?  I agree that it is
> > > > > good to have an API to know whether the recovery pause is requested or
> > > > > not but I am not sure is it good idea to make this API serve both the
> > > > > purpose?  Anyone else have any thoughts on this?
> > > > >
> > >
> > > I think the current pg_is_wal_replay_paused() already has another purpose;
> > > this waits recovery to actually get paused. If we want to limit this API's
> > > purpose only to return the pause state, it seems better to fix this to return
> > > the actual state at the cost of lacking the backward compatibility. If we want
> > > to know whether pause is requested, we may add a new API like
> > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery to actually
> > > get paused, we may add an option to pg_wal_replay_pause() for this purpose.
> > >
> > > However, this might be a bikeshedding. If anyone don't care that
> > > pg_is_wal_replay_paused() can make user wait for a long time, I don't care either.
> >
> > I don't think that it will be blocked ever, because
> > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> > recovery process will not be stuck on waiting for the WAL.
> >
> > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel
> > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop.
> > >
> > > How about this fix? I think users may want to cancel pg_is_wal_replay_paused() during
> > > this is blocking.
> >
> > Yeah, we can do this.  I will send the updated patch after putting
> > some more thought into these comments.  Thanks again for the feedback.
> >
>
> Please find the updated patch.

I've looked at the patch. Here are review comments:

+       /* Recovery pause state */
+       RecoveryPauseState              recoveryPause;

Now that the value can have tri-state, how about renaming it to
recoveryPauseState?

---
 bool
 RecoveryIsPaused(void)
+{
+       bool    recoveryPause;
+
+       SpinLockAcquire(&XLogCtl->info_lck);
+       recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ?
true : false;
+       SpinLockRelease(&XLogCtl->info_lck);
+
+       return recoveryPause;
+}
+
+bool
+RecoveryPauseRequested(void)
 {
        bool            recoveryPause;

        SpinLockAcquire(&XLogCtl->info_lck);
-       recoveryPause = XLogCtl->recoveryPause;
+       recoveryPause = (XLogCtl->recoveryPause !=
RECOVERY_IN_PROGRESS) ? true : false;
        SpinLockRelease(&XLogCtl->info_lck);

        return recoveryPause;
 }

We can write like recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED);

Also, since these functions do the almost same thing, I think we can
have a common function to get XLogCtl->recoveryPause, say
GetRecoveryPauseState() or GetRecoveryPause(), and both
RecoveryIsPaused() and RecoveryPauseRequested() use the returned
value. What do you think?

---
+static void
+CheckAndSetRecoveryPause(void)

Maybe we need to declare the prototype of this function like other
functions in xlog.c.

---
+       /*
+        * If recovery is not in progress anymore then report an error this
+        * could happen if the standby is promoted while we were waiting for
+        * recovery to get paused.
+        */
+       if (!RecoveryInProgress())
+           ereport(ERROR,
+                   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                   errmsg("recovery is not in progress"),
+                   errhint("Recovery control functions can only be
executed during recovery.")));

I think we can improve the error message so that we can tell users the
standby has been promoted during the wait. For example,

                  errmsg("the standby was promoted during waiting for
recovery to be paused")));

---
+       /* test for recovery pause if user has requested the pause */
+       if (((volatile XLogCtlData *) XLogCtl)->recoveryPause)
+           recoveryPausesHere(false);
+
+       now = GetCurrentTimestamp();
+

Hmm, if the recovery pauses here, the wal receiver isn't launched even
when wal_retrieve_retry_interval has passed, which seems not good. I
think we want the recovery to be paused but want the wal receiver to
continue receiving WAL.

And why do we need to set 'now' here?

---
/*
 * Wait until shared recoveryPause flag is cleared.
 *
 * endOfRecovery is true if the recovery target is reached and
 * the paused state starts at the end of recovery because of
 * recovery_target_action=pause, and false otherwise.
 *
 * XXX Could also be done with shared latch, avoiding the pg_usleep loop.
 * Probably not worth the trouble though.  This state shouldn't be one that
 * anyone cares about server power consumption in.
 */
static void
recoveryPausesHere(bool endOfRecovery)

We can improve the first sentence in the above function comment to
"Wait until shared recoveryPause is set to RECOVERY_IN_PROGRESS" or
something.

---
-   PG_RETURN_BOOL(RecoveryIsPaused());
+   if (!RecoveryPauseRequested())
+       PG_RETURN_BOOL(false);
+
+   /* loop until the recovery is actually paused */
+   while(!RecoveryIsPaused())
+   {
+       pg_usleep(10000L);  /* wait for 10 msec */
+
+       /* meanwhile if recovery is resume requested then return false */
+       if (!RecoveryPauseRequested())
+           PG_RETURN_BOOL(false);
+
+       CHECK_FOR_INTERRUPTS();
+
+       /*
+        * If recovery is not in progress anymore then report an error this
+        * could happen if the standby is promoted while we were waiting for
+        * recovery to get paused.
+        */
+       if (!RecoveryInProgress())
+           ereport(ERROR,
+                   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                   errmsg("recovery is not in progress"),
+                   errhint("Recovery control functions can only be
executed during recovery.")));
+   }
+
+   PG_RETURN_BOOL(true);

We have the same !RecoveryPauseRequested() check twice, how about the
following arrangement?

    for (;;)
    {
        if (!RecoveryPauseRequested())
            PG_RETURN_BOOL(false);

        if (RecoveryIsPaused())
            break;

        pg_usleep(10000L);

        CHECK_FOR_INTERRUPTS();

        if (!RecoveryInProgress())
            ereport(...);
    }

PG_RETURN_BOOL(true);

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/


123