Hello,
We have an interface to pause the WAL replay (pg_wal_replay_pause) and to know whether the WAL replay pause is requested (pg_is_wal_replay_paused). But there is no way to know whether the recovery is actually paused or not. Actually, the recovery process might process an extra WAL before pausing the recovery. So does it make sense to provide a new interface to tell whether the recovery is actually paused or not? One solution could be that we convert the XLogCtlData->recoveryPause from bool to tri-state variable (0-> recovery not paused 1-> pause requested 2-> actually paused). Any opinion on this? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com |
On Mon, 19 Oct 2020 at 15:11, Dilip Kumar <[hidden email]> wrote:
> We have an interface to pause the WAL replay (pg_wal_replay_pause) and > to know whether the WAL replay pause is requested > (pg_is_wal_replay_paused). But there is no way to know whether the > recovery is actually paused or not. Actually, the recovery process > might process an extra WAL before pausing the recovery. So does it > make sense to provide a new interface to tell whether the recovery is > actually paused or not? > > One solution could be that we convert the XLogCtlData->recoveryPause > from bool to tri-state variable (0-> recovery not paused 1-> pause > requested 2-> actually paused). > > Any opinion on this? Why would we want this? What problem are you trying to solve? If we do care, why not fix pg_is_wal_replay_paused() so it responds as you wish? -- Simon Riggs http://www.EnterpriseDB.com/ |
On Tue, Oct 20, 2020 at 1:11 PM Simon Riggs <[hidden email]> wrote:
> > On Mon, 19 Oct 2020 at 15:11, Dilip Kumar <[hidden email]> wrote: > > > We have an interface to pause the WAL replay (pg_wal_replay_pause) and > > to know whether the WAL replay pause is requested > > (pg_is_wal_replay_paused). But there is no way to know whether the > > recovery is actually paused or not. Actually, the recovery process > > might process an extra WAL before pausing the recovery. So does it > > make sense to provide a new interface to tell whether the recovery is > > actually paused or not? > > > > One solution could be that we convert the XLogCtlData->recoveryPause > > from bool to tri-state variable (0-> recovery not paused 1-> pause > > requested 2-> actually paused). > > > > Any opinion on this? > > Why would we want this? What problem are you trying to solve? The requirement is to know the last replayed WAL on the standby so unless we can guarantee that the recovery is actually paused we can never get the safe last_replay_lsn value. > If we do care, why not fix pg_is_wal_replay_paused() so it responds as you wish? Maybe we can also do that but pg_is_wal_replay_paused is an existing API and the behavior is to know whether the recovery paused is requested or not, So I am not sure is it a good idea to change the behavior of the existing API? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com |
On Tue, Oct 20, 2020 at 1:22 PM Dilip Kumar <[hidden email]> wrote:
> > On Tue, Oct 20, 2020 at 1:11 PM Simon Riggs <[hidden email]> wrote: > > > > On Mon, 19 Oct 2020 at 15:11, Dilip Kumar <[hidden email]> wrote: > > > > > We have an interface to pause the WAL replay (pg_wal_replay_pause) and > > > to know whether the WAL replay pause is requested > > > (pg_is_wal_replay_paused). But there is no way to know whether the > > > recovery is actually paused or not. Actually, the recovery process > > > might process an extra WAL before pausing the recovery. So does it > > > make sense to provide a new interface to tell whether the recovery is > > > actually paused or not? > > > > > > One solution could be that we convert the XLogCtlData->recoveryPause > > > from bool to tri-state variable (0-> recovery not paused 1-> pause > > > requested 2-> actually paused). > > > > > > Any opinion on this? > > > > Why would we want this? What problem are you trying to solve? > > The requirement is to know the last replayed WAL on the standby so > unless we can guarantee that the recovery is actually paused we can > never get the safe last_replay_lsn value. > > > If we do care, why not fix pg_is_wal_replay_paused() so it responds as you wish? > > Maybe we can also do that but pg_is_wal_replay_paused is an existing > API and the behavior is to know whether the recovery paused is > requested or not, So I am not sure is it a good idea to change the > behavior of the existing API? > -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com |
On Tue, 20 Oct 2020 at 09:50, Dilip Kumar <[hidden email]> wrote:
> > > Why would we want this? What problem are you trying to solve? > > > > The requirement is to know the last replayed WAL on the standby so > > unless we can guarantee that the recovery is actually paused we can > > never get the safe last_replay_lsn value. > > > > > If we do care, why not fix pg_is_wal_replay_paused() so it responds as you wish? > > > > Maybe we can also do that but pg_is_wal_replay_paused is an existing > > API and the behavior is to know whether the recovery paused is > > requested or not, So I am not sure is it a good idea to change the > > behavior of the existing API? > > > > Attached is the POC patch to show what I have in mind. If you don't like it, I doubt anyone else cares for the exact current behavior either. Thanks for pointing those issues out. It would make sense to alter pg_wal_replay_pause() so that it blocks until paused. I suggest you add the 3-value state as you suggest, but make pg_is_wal_replay_paused() respond: if paused, true if requested, wait until paused, then return true else false That then solves your issues with a smoother interface. -- Simon Riggs http://www.EnterpriseDB.com/ |
On Tue, Oct 20, 2020 at 3:00 PM Simon Riggs <[hidden email]> wrote:
> > On Tue, 20 Oct 2020 at 09:50, Dilip Kumar <[hidden email]> wrote: > > > > > Why would we want this? What problem are you trying to solve? > > > > > > The requirement is to know the last replayed WAL on the standby so > > > unless we can guarantee that the recovery is actually paused we can > > > never get the safe last_replay_lsn value. > > > > > > > If we do care, why not fix pg_is_wal_replay_paused() so it responds as you wish? > > > > > > Maybe we can also do that but pg_is_wal_replay_paused is an existing > > > API and the behavior is to know whether the recovery paused is > > > requested or not, So I am not sure is it a good idea to change the > > > behavior of the existing API? > > > > > > > Attached is the POC patch to show what I have in mind. > > If you don't like it, I doubt anyone else cares for the exact current > behavior either. Thanks for pointing those issues out. > > It would make sense to alter pg_wal_replay_pause() so that it blocks > until paused. > > I suggest you add the 3-value state as you suggest, but make > pg_is_wal_replay_paused() respond: > if paused, true > if requested, wait until paused, then return true > else false > > That then solves your issues with a smoother interface. > Make sense to me, I will change as per the suggestion. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com |
On Tue, Oct 20, 2020 at 5:59 PM Dilip Kumar <[hidden email]> wrote:
> > On Tue, Oct 20, 2020 at 3:00 PM Simon Riggs <[hidden email]> wrote: > > > > On Tue, 20 Oct 2020 at 09:50, Dilip Kumar <[hidden email]> wrote: > > > > > > > Why would we want this? What problem are you trying to solve? > > > > > > > > The requirement is to know the last replayed WAL on the standby so > > > > unless we can guarantee that the recovery is actually paused we can > > > > never get the safe last_replay_lsn value. > > > > > > > > > If we do care, why not fix pg_is_wal_replay_paused() so it responds as you wish? > > > > > > > > Maybe we can also do that but pg_is_wal_replay_paused is an existing > > > > API and the behavior is to know whether the recovery paused is > > > > requested or not, So I am not sure is it a good idea to change the > > > > behavior of the existing API? > > > > > > > > > > Attached is the POC patch to show what I have in mind. > > > > If you don't like it, I doubt anyone else cares for the exact current > > behavior either. Thanks for pointing those issues out. > > > > It would make sense to alter pg_wal_replay_pause() so that it blocks > > until paused. > > > > I suggest you add the 3-value state as you suggest, but make > > pg_is_wal_replay_paused() respond: > > if paused, true > > if requested, wait until paused, then return true > > else false > > > > That then solves your issues with a smoother interface. > > > > Make sense to me, I will change as per the suggestion. I have noticed one more issue, the problem is that if the recovery process is currently not processing any WAL and just waiting for the WAL to become available then the pg_is_wal_replay_paused will be stuck forever. Having said that there is the same problem even if we design the new interface which checks whether the recovery is actually paused or not because until the recovery process gets the next wal it will not check whether the recovery pause is requested or not so the actual recovery paused flag will never be set. One idea could be, if the recovery process is waiting for WAL and a recovery pause is requested then we can assume that the recovery is paused because before processing the next wal it will always check whether the recovery pause is requested or not. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com |
On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <[hidden email]> wrote:
> I have noticed one more issue, the problem is that if the recovery > process is currently not processing any WAL and just waiting for the > WAL to become available then the pg_is_wal_replay_paused will be stuck > forever. Having said that there is the same problem even if we design > the new interface which checks whether the recovery is actually paused > or not because until the recovery process gets the next wal it will > not check whether the recovery pause is requested or not so the actual > recovery paused flag will never be set. > > One idea could be, if the recovery process is waiting for WAL and a > recovery pause is requested then we can assume that the recovery is > paused because before processing the next wal it will always check > whether the recovery pause is requested or not. That seems fine, because the user's question is presumably whether the pause has taken effect so that no more records will be replayed barring an un-paused. However, it might be better to implement this by having the system absorb the pause immediately when it's in this state, rather than trying to detect this state and treat it specially. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company |
In reply to this post by Dilip Kumar-2
On Wed, 21 Oct 2020 at 12:16, Dilip Kumar <[hidden email]> wrote:
> > On Tue, Oct 20, 2020 at 5:59 PM Dilip Kumar <[hidden email]> wrote: > > > > On Tue, Oct 20, 2020 at 3:00 PM Simon Riggs <[hidden email]> wrote: > > > > > > On Tue, 20 Oct 2020 at 09:50, Dilip Kumar <[hidden email]> wrote: > > > > > > > > > Why would we want this? What problem are you trying to solve? > > > > > > > > > > The requirement is to know the last replayed WAL on the standby so > > > > > unless we can guarantee that the recovery is actually paused we can > > > > > never get the safe last_replay_lsn value. > > > > > > > > > > > If we do care, why not fix pg_is_wal_replay_paused() so it responds as you wish? > > > > > > > > > > Maybe we can also do that but pg_is_wal_replay_paused is an existing > > > > > API and the behavior is to know whether the recovery paused is > > > > > requested or not, So I am not sure is it a good idea to change the > > > > > behavior of the existing API? > > > > > > > > > > > > > Attached is the POC patch to show what I have in mind. > > > > > > If you don't like it, I doubt anyone else cares for the exact current > > > behavior either. Thanks for pointing those issues out. > > > > > > It would make sense to alter pg_wal_replay_pause() so that it blocks > > > until paused. > > > > > > I suggest you add the 3-value state as you suggest, but make > > > pg_is_wal_replay_paused() respond: > > > if paused, true > > > if requested, wait until paused, then return true > > > else false > > > > > > That then solves your issues with a smoother interface. > > > > > > > Make sense to me, I will change as per the suggestion. > > I have noticed one more issue, the problem is that if the recovery > process is currently not processing any WAL and just waiting for the > WAL to become available then the pg_is_wal_replay_paused will be stuck > forever. Having said that there is the same problem even if we design > the new interface which checks whether the recovery is actually paused > or not because until the recovery process gets the next wal it will > not check whether the recovery pause is requested or not so the actual > recovery paused flag will never be set. > > One idea could be, if the recovery process is waiting for WAL and a > recovery pause is requested then we can assume that the recovery is > paused because before processing the next wal it will always check > whether the recovery pause is requested or not. If ReadRecord() is waiting for WAL (at bottom of recovery loop), then when it does return it will immediately move to pause (at top of next loop). Which makes it easy to cover these cases. It would be easy enough to create another variable that shows "waiting for WAL", since that is in itself a useful and interesting thing to be able to report. pg_is_wal_replay_paused() and pg_wal_replay_pause() would then return whenever it is either (fully paused || waiting for WAL && pause_requested)) We can then create a new function called pg_wal_replay_status() that returns multiple values: RECOVERING | WAITING_FOR_WAL | PAUSED -- Simon Riggs http://www.EnterpriseDB.com/ |
In reply to this post by Robert Haas
At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas <[hidden email]> wrote in
> On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <[hidden email]> wrote: > > One idea could be, if the recovery process is waiting for WAL and a > > recovery pause is requested then we can assume that the recovery is > > paused because before processing the next wal it will always check > > whether the recovery pause is requested or not. .. > However, it might be better to implement this by having the system > absorb the pause immediately when it's in this state, rather than > trying to detect this state and treat it specially. The paused state is shown in pg_stat_activity.wait_event and it is strange that pg_is_wal_replay_paused() is inconsistent with the column. To make them consistent, we need to call recoveryPausesHere() at the end of WaitForWALToBecomeAvailable() and let pg_wal_replay_pause() call WakeupRecovery(). I think we don't need a separate function to find the state. regards. -- Kyotaro Horiguchi NTT Open Source Software Center |
On Thu, Oct 22, 2020 at 6:59 AM Kyotaro Horiguchi
<[hidden email]> wrote: > > At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas <[hidden email]> wrote in > > On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <[hidden email]> wrote: > > > One idea could be, if the recovery process is waiting for WAL and a > > > recovery pause is requested then we can assume that the recovery is > > > paused because before processing the next wal it will always check > > > whether the recovery pause is requested or not. > .. > > However, it might be better to implement this by having the system > > absorb the pause immediately when it's in this state, rather than > > trying to detect this state and treat it specially. > > The paused state is shown in pg_stat_activity.wait_event and it is > strange that pg_is_wal_replay_paused() is inconsistent with the > column. Right To make them consistent, we need to call recoveryPausesHere() > at the end of WaitForWALToBecomeAvailable() and let > pg_wal_replay_pause() call WakeupRecovery(). > > I think we don't need a separate function to find the state. The idea makes sense to me. I will try to change the patch as per the suggestion. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com |
On Thu, Oct 22, 2020 at 7:50 PM Dilip Kumar <[hidden email]> wrote:
> > On Thu, Oct 22, 2020 at 6:59 AM Kyotaro Horiguchi > <[hidden email]> wrote: > > > > At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas <[hidden email]> wrote in > > > On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <[hidden email]> wrote: > > > > One idea could be, if the recovery process is waiting for WAL and a > > > > recovery pause is requested then we can assume that the recovery is > > > > paused because before processing the next wal it will always check > > > > whether the recovery pause is requested or not. > > .. > > > However, it might be better to implement this by having the system > > > absorb the pause immediately when it's in this state, rather than > > > trying to detect this state and treat it specially. > > > > The paused state is shown in pg_stat_activity.wait_event and it is > > strange that pg_is_wal_replay_paused() is inconsistent with the > > column. > > Right > > To make them consistent, we need to call recoveryPausesHere() > > at the end of WaitForWALToBecomeAvailable() and let > > pg_wal_replay_pause() call WakeupRecovery(). > > > > I think we don't need a separate function to find the state. > > The idea makes sense to me. I will try to change the patch as per the > suggestion. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com |
On Thu, 22 Oct 2020 20:36:48 +0530
Dilip Kumar <[hidden email]> wrote: > On Thu, Oct 22, 2020 at 7:50 PM Dilip Kumar <[hidden email]> wrote: > > > > On Thu, Oct 22, 2020 at 6:59 AM Kyotaro Horiguchi > > <[hidden email]> wrote: > > > > > > At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas <[hidden email]> wrote in > > > > On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <[hidden email]> wrote: > > > > > One idea could be, if the recovery process is waiting for WAL and a > > > > > recovery pause is requested then we can assume that the recovery is > > > > > paused because before processing the next wal it will always check > > > > > whether the recovery pause is requested or not. > > > .. > > > > However, it might be better to implement this by having the system > > > > absorb the pause immediately when it's in this state, rather than > > > > trying to detect this state and treat it specially. > > > > > > The paused state is shown in pg_stat_activity.wait_event and it is > > > strange that pg_is_wal_replay_paused() is inconsistent with the > > > column. > > > > Right > > > > To make them consistent, we need to call recoveryPausesHere() > > > at the end of WaitForWALToBecomeAvailable() and let > > > pg_wal_replay_pause() call WakeupRecovery(). > > > > > > I think we don't need a separate function to find the state. > > > > The idea makes sense to me. I will try to change the patch as per the > > suggestion. > > Here is the patch based on this idea. I reviewd this patch. First, I made a recovery conflict situation using a table lock. Standby: #= begin; #= select * from t; Primary: #= begin; #= lock t in ; After this, WAL of the table lock cannot be replayed due to a lock acquired in the standby. Second, during the delay, I executed pg_wal_replay_pause() and pg_is_wal_replay_paused(). Then, pg_is_wal_replay_paused was blocked until max_standby_streaming_delay was expired, and eventually returned true. I can also see the same behaviour by setting recovery_min_apply_delay. So, pg_is_wal_replay_paused waits for recovery to get paused and this works successfully as expected. However, I wonder users don't expect pg_is_wal_replay_paused to wait. Especially, if max_standby_streaming_delay is -1, this will be blocked forever, although this setting may not be usual. In addition, some users may set recovery_min_apply_delay for a large. If such users call pg_is_wal_replay_paused, it could wait for a long time. At least, I think we need some descriptions on document to explain pg_is_wal_replay_paused could wait while a time. Also, how about adding a new boolean argument to pg_is_wal_replay_paused to control whether this waits for recovery to get paused or not? By setting its default value to true or false, users can use the old format for calling this and the backward compatibility can be maintained. As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop. + errhint("Recovery control functions can only be executed during recovery."))); There are a few tabs at the end of this line. Regards, Yugo Nagata -- Yugo NAGATA <[hidden email]> |
On Mon, Nov 30, 2020 at 12:17 PM Yugo NAGATA <[hidden email]> wrote:
Thanks for looking into this. > On Thu, 22 Oct 2020 20:36:48 +0530 > Dilip Kumar <[hidden email]> wrote: > > > On Thu, Oct 22, 2020 at 7:50 PM Dilip Kumar <[hidden email]> wrote: > > > > > > On Thu, Oct 22, 2020 at 6:59 AM Kyotaro Horiguchi > > > <[hidden email]> wrote: > > > > > > > > At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas <[hidden email]> wrote in > > > > > On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <[hidden email]> wrote: > > > > > > One idea could be, if the recovery process is waiting for WAL and a > > > > > > recovery pause is requested then we can assume that the recovery is > > > > > > paused because before processing the next wal it will always check > > > > > > whether the recovery pause is requested or not. > > > > .. > > > > > However, it might be better to implement this by having the system > > > > > absorb the pause immediately when it's in this state, rather than > > > > > trying to detect this state and treat it specially. > > > > > > > > The paused state is shown in pg_stat_activity.wait_event and it is > > > > strange that pg_is_wal_replay_paused() is inconsistent with the > > > > column. > > > > > > Right > > > > > > To make them consistent, we need to call recoveryPausesHere() > > > > at the end of WaitForWALToBecomeAvailable() and let > > > > pg_wal_replay_pause() call WakeupRecovery(). > > > > > > > > I think we don't need a separate function to find the state. > > > > > > The idea makes sense to me. I will try to change the patch as per the > > > suggestion. > > > > Here is the patch based on this idea. > > I reviewd this patch. > > First, I made a recovery conflict situation using a table lock. > > Standby: > #= begin; > #= select * from t; > > Primary: > #= begin; > #= lock t in ; > > After this, WAL of the table lock cannot be replayed due to a lock acquired > in the standby. > > Second, during the delay, I executed pg_wal_replay_pause() and > pg_is_wal_replay_paused(). Then, pg_is_wal_replay_paused was blocked until > max_standby_streaming_delay was expired, and eventually returned true. > > I can also see the same behaviour by setting recovery_min_apply_delay. > > So, pg_is_wal_replay_paused waits for recovery to get paused and this works > successfully as expected. > > However, I wonder users don't expect pg_is_wal_replay_paused to wait. > Especially, if max_standby_streaming_delay is -1, this will be blocked forever, > although this setting may not be usual. In addition, some users may set > recovery_min_apply_delay for a large. If such users call pg_is_wal_replay_paused, > it could wait for a long time. > > At least, I think we need some descriptions on document to explain > pg_is_wal_replay_paused could wait while a time. Ok > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to > control whether this waits for recovery to get paused or not? By setting its > default value to true or false, users can use the old format for calling this > and the backward compatibility can be maintained. So basically, if the wait_recovery_pause flag is false then we will immediately return true if the pause is requested? I agree that it is good to have an API to know whether the recovery pause is requested or not but I am not sure is it good idea to make this API serve both the purpose? Anyone else have any thoughts on this? > > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop. > > > + errhint("Recovery control functions can only be executed during recovery."))); > > There are a few tabs at the end of this line. I will fix. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com |
On Mon, Nov 30, 2020 at 2:40 PM Dilip Kumar <[hidden email]> wrote:
> > On Mon, Nov 30, 2020 at 12:17 PM Yugo NAGATA <[hidden email]> wrote: > > Thanks for looking into this. > > > On Thu, 22 Oct 2020 20:36:48 +0530 > > Dilip Kumar <[hidden email]> wrote: > > > > > On Thu, Oct 22, 2020 at 7:50 PM Dilip Kumar <[hidden email]> wrote: > > > > > > > > On Thu, Oct 22, 2020 at 6:59 AM Kyotaro Horiguchi > > > > <[hidden email]> wrote: > > > > > > > > > > At Wed, 21 Oct 2020 11:14:24 -0400, Robert Haas <[hidden email]> wrote in > > > > > > On Wed, Oct 21, 2020 at 7:16 AM Dilip Kumar <[hidden email]> wrote: > > > > > > > One idea could be, if the recovery process is waiting for WAL and a > > > > > > > recovery pause is requested then we can assume that the recovery is > > > > > > > paused because before processing the next wal it will always check > > > > > > > whether the recovery pause is requested or not. > > > > > .. > > > > > > However, it might be better to implement this by having the system > > > > > > absorb the pause immediately when it's in this state, rather than > > > > > > trying to detect this state and treat it specially. > > > > > > > > > > The paused state is shown in pg_stat_activity.wait_event and it is > > > > > strange that pg_is_wal_replay_paused() is inconsistent with the > > > > > column. > > > > > > > > Right > > > > > > > > To make them consistent, we need to call recoveryPausesHere() > > > > > at the end of WaitForWALToBecomeAvailable() and let > > > > > pg_wal_replay_pause() call WakeupRecovery(). > > > > > > > > > > I think we don't need a separate function to find the state. > > > > > > > > The idea makes sense to me. I will try to change the patch as per the > > > > suggestion. > > > > > > Here is the patch based on this idea. > > > > I reviewd this patch. > > > > First, I made a recovery conflict situation using a table lock. > > > > Standby: > > #= begin; > > #= select * from t; > > > > Primary: > > #= begin; > > #= lock t in ; > > > > After this, WAL of the table lock cannot be replayed due to a lock acquired > > in the standby. > > > > Second, during the delay, I executed pg_wal_replay_pause() and > > pg_is_wal_replay_paused(). Then, pg_is_wal_replay_paused was blocked until > > max_standby_streaming_delay was expired, and eventually returned true. > > > > I can also see the same behaviour by setting recovery_min_apply_delay. > > > > So, pg_is_wal_replay_paused waits for recovery to get paused and this works > > successfully as expected. > > > > However, I wonder users don't expect pg_is_wal_replay_paused to wait. > > Especially, if max_standby_streaming_delay is -1, this will be blocked forever, > > although this setting may not be usual. In addition, some users may set > > recovery_min_apply_delay for a large. If such users call pg_is_wal_replay_paused, > > it could wait for a long time. > > > > At least, I think we need some descriptions on document to explain > > pg_is_wal_replay_paused could wait while a time. > > Ok > > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to > > control whether this waits for recovery to get paused or not? By setting its > > default value to true or false, users can use the old format for calling this > > and the backward compatibility can be maintained. > > So basically, if the wait_recovery_pause flag is false then we will > immediately return true if the pause is requested? I agree that it is > good to have an API to know whether the recovery pause is requested or > not but I am not sure is it good idea to make this API serve both the > purpose? Anyone else have any thoughts on this? > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop. > > > > > > + errhint("Recovery control functions can only be executed during recovery."))); > > > > There are a few tabs at the end of this line. > > I will fix. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com |
On Thu, 10 Dec 2020 11:25:23 +0530
Dilip Kumar <[hidden email]> wrote: > > > However, I wonder users don't expect pg_is_wal_replay_paused to wait. > > > Especially, if max_standby_streaming_delay is -1, this will be blocked forever, > > > although this setting may not be usual. In addition, some users may set > > > recovery_min_apply_delay for a large. If such users call pg_is_wal_replay_paused, > > > it could wait for a long time. > > > > > > At least, I think we need some descriptions on document to explain > > > pg_is_wal_replay_paused could wait while a time. > > > > Ok > > Fixed this, added some comments in .sgml as well as in function header Thank you for fixing this. Also, is it better to fix the description of pg_wal_replay_pause from "Pauses recovery." to "Request to pause recovery." in according with pg_is_wal_replay_paused? > > > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to > > > control whether this waits for recovery to get paused or not? By setting its > > > default value to true or false, users can use the old format for calling this > > > and the backward compatibility can be maintained. > > > > So basically, if the wait_recovery_pause flag is false then we will > > immediately return true if the pause is requested? I agree that it is > > good to have an API to know whether the recovery pause is requested or > > not but I am not sure is it good idea to make this API serve both the > > purpose? Anyone else have any thoughts on this? > > I think the current pg_is_wal_replay_paused() already has another purpose; this waits recovery to actually get paused. If we want to limit this API's purpose only to return the pause state, it seems better to fix this to return the actual state at the cost of lacking the backward compatibility. If we want to know whether pause is requested, we may add a new API like pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery to actually get paused, we may add an option to pg_wal_replay_pause() for this purpose. However, this might be a bikeshedding. If anyone don't care that pg_is_wal_replay_paused() can make user wait for a long time, I don't care either. > > > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop. How about this fix? I think users may want to cancel pg_is_wal_replay_paused() during this is blocking. Regards, Yugo Nagata -- Yugo NAGATA <[hidden email]> |
On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA <[hidden email]> wrote:
> > On Thu, 10 Dec 2020 11:25:23 +0530 > Dilip Kumar <[hidden email]> wrote: > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to wait. > > > > Especially, if max_standby_streaming_delay is -1, this will be blocked forever, > > > > although this setting may not be usual. In addition, some users may set > > > > recovery_min_apply_delay for a large. If such users call pg_is_wal_replay_paused, > > > > it could wait for a long time. > > > > > > > > At least, I think we need some descriptions on document to explain > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > Ok > > > > Fixed this, added some comments in .sgml as well as in function header > > Thank you for fixing this. > > Also, is it better to fix the description of pg_wal_replay_pause from > "Pauses recovery." to "Request to pause recovery." in according with > pg_is_wal_replay_paused? Okay > > > > > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to > > > > control whether this waits for recovery to get paused or not? By setting its > > > > default value to true or false, users can use the old format for calling this > > > > and the backward compatibility can be maintained. > > > > > > So basically, if the wait_recovery_pause flag is false then we will > > > immediately return true if the pause is requested? I agree that it is > > > good to have an API to know whether the recovery pause is requested or > > > not but I am not sure is it good idea to make this API serve both the > > > purpose? Anyone else have any thoughts on this? > > > > > I think the current pg_is_wal_replay_paused() already has another purpose; > this waits recovery to actually get paused. If we want to limit this API's > purpose only to return the pause state, it seems better to fix this to return > the actual state at the cost of lacking the backward compatibility. If we want > to know whether pause is requested, we may add a new API like > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery to actually > get paused, we may add an option to pg_wal_replay_pause() for this purpose. > > However, this might be a bikeshedding. If anyone don't care that > pg_is_wal_replay_paused() can make user wait for a long time, I don't care either. I don't think that it will be blocked ever, because pg_wal_replay_pause is sending the WakeupRecovery() which means the recovery process will not be stuck on waiting for the WAL. > > > > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop. > > How about this fix? I think users may want to cancel pg_is_wal_replay_paused() during > this is blocking. Yeah, we can do this. I will send the updated patch after putting some more thought into these comments. Thanks again for the feedback. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com |
On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar <[hidden email]> wrote:
> > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA <[hidden email]> wrote: > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > Dilip Kumar <[hidden email]> wrote: > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to wait. > > > > > Especially, if max_standby_streaming_delay is -1, this will be blocked forever, > > > > > although this setting may not be usual. In addition, some users may set > > > > > recovery_min_apply_delay for a large. If such users call pg_is_wal_replay_paused, > > > > > it could wait for a long time. > > > > > > > > > > At least, I think we need some descriptions on document to explain > > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > > > Ok > > > > > > Fixed this, added some comments in .sgml as well as in function header > > > > Thank you for fixing this. > > > > Also, is it better to fix the description of pg_wal_replay_pause from > > "Pauses recovery." to "Request to pause recovery." in according with > > pg_is_wal_replay_paused? > > Okay > > > > > > > > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to > > > > > control whether this waits for recovery to get paused or not? By setting its > > > > > default value to true or false, users can use the old format for calling this > > > > > and the backward compatibility can be maintained. > > > > > > > > So basically, if the wait_recovery_pause flag is false then we will > > > > immediately return true if the pause is requested? I agree that it is > > > > good to have an API to know whether the recovery pause is requested or > > > > not but I am not sure is it good idea to make this API serve both the > > > > purpose? Anyone else have any thoughts on this? > > > > > > > > I think the current pg_is_wal_replay_paused() already has another purpose; > > this waits recovery to actually get paused. If we want to limit this API's > > purpose only to return the pause state, it seems better to fix this to return > > the actual state at the cost of lacking the backward compatibility. If we want > > to know whether pause is requested, we may add a new API like > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery to actually > > get paused, we may add an option to pg_wal_replay_pause() for this purpose. > > > > However, this might be a bikeshedding. If anyone don't care that > > pg_is_wal_replay_paused() can make user wait for a long time, I don't care either. > > I don't think that it will be blocked ever, because > pg_wal_replay_pause is sending the WakeupRecovery() which means the > recovery process will not be stuck on waiting for the WAL. > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop. > > > > How about this fix? I think users may want to cancel pg_is_wal_replay_paused() during > > this is blocking. > > Yeah, we can do this. I will send the updated patch after putting > some more thought into these comments. Thanks again for the feedback. > -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com |
On Wed, 13 Jan 2021 17:49:43 +0530
Dilip Kumar <[hidden email]> wrote: > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar <[hidden email]> wrote: > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA <[hidden email]> wrote: > > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > > Dilip Kumar <[hidden email]> wrote: > > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to wait. > > > > > > Especially, if max_standby_streaming_delay is -1, this will be blocked forever, > > > > > > although this setting may not be usual. In addition, some users may set > > > > > > recovery_min_apply_delay for a large. If such users call pg_is_wal_replay_paused, > > > > > > it could wait for a long time. > > > > > > > > > > > > At least, I think we need some descriptions on document to explain > > > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > > > > > Ok > > > > > > > > Fixed this, added some comments in .sgml as well as in function header > > > > > > Thank you for fixing this. > > > > > > Also, is it better to fix the description of pg_wal_replay_pause from > > > "Pauses recovery." to "Request to pause recovery." in according with > > > pg_is_wal_replay_paused? > > > > Okay > > > > > > > > > > > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to > > > > > > control whether this waits for recovery to get paused or not? By setting its > > > > > > default value to true or false, users can use the old format for calling this > > > > > > and the backward compatibility can be maintained. > > > > > > > > > > So basically, if the wait_recovery_pause flag is false then we will > > > > > immediately return true if the pause is requested? I agree that it is > > > > > good to have an API to know whether the recovery pause is requested or > > > > > not but I am not sure is it good idea to make this API serve both the > > > > > purpose? Anyone else have any thoughts on this? > > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has another purpose; > > > this waits recovery to actually get paused. If we want to limit this API's > > > purpose only to return the pause state, it seems better to fix this to return > > > the actual state at the cost of lacking the backward compatibility. If we want > > > to know whether pause is requested, we may add a new API like > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery to actually > > > get paused, we may add an option to pg_wal_replay_pause() for this purpose. > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't care either. > > > > I don't think that it will be blocked ever, because > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > recovery process will not be stuck on waiting for the WAL. Yes, there is no stuck on waiting for the WAL. However, it can be stuck during resolving a recovery conflict. The process could wait for max_standby_streaming_delay or max_standby_archive_delay at most before recovery get completely paused. Also, it could wait for recovery_min_apply_delay if it has a valid value. It is possible that a user set this parameter to a large value, so it could wait for a long time. However, this will be avoided by calling recoveryPausesHere() or CheckAndSetRecoveryPause() in recoveryApplyDelay(). > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop. > > > > > > How about this fix? I think users may want to cancel pg_is_wal_replay_paused() during > > > this is blocking. > > > > Yeah, we can do this. I will send the updated patch after putting > > some more thought into these comments. Thanks again for the feedback. > > > > Please find the updated patch. Thanks. I confirmed that I can cancel pg_is_wal_repaly_paused() during stuck. Although it is a very trivial comment, I think that the new line before HandleStartupProcInterrupts() is unnecessary. @@ -6052,12 +6062,20 @@ recoveryPausesHere(bool endOfRecovery) (errmsg("recovery has paused"), errhint("Execute pg_wal_replay_resume() to continue."))); - while (RecoveryIsPaused()) + while (RecoveryPauseRequested()) { + HandleStartupProcInterrupts(); Regards, Yugo Nagata -- Yugo NAGATA <[hidden email]> |
In reply to this post by Dilip Kumar-2
On Wed, Jan 13, 2021 at 9:20 PM Dilip Kumar <[hidden email]> wrote:
> > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar <[hidden email]> wrote: > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA <[hidden email]> wrote: > > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > > Dilip Kumar <[hidden email]> wrote: > > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to wait. > > > > > > Especially, if max_standby_streaming_delay is -1, this will be blocked forever, > > > > > > although this setting may not be usual. In addition, some users may set > > > > > > recovery_min_apply_delay for a large. If such users call pg_is_wal_replay_paused, > > > > > > it could wait for a long time. > > > > > > > > > > > > At least, I think we need some descriptions on document to explain > > > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > > > > > Ok > > > > > > > > Fixed this, added some comments in .sgml as well as in function header > > > > > > Thank you for fixing this. > > > > > > Also, is it better to fix the description of pg_wal_replay_pause from > > > "Pauses recovery." to "Request to pause recovery." in according with > > > pg_is_wal_replay_paused? > > > > Okay > > > > > > > > > > > Also, how about adding a new boolean argument to pg_is_wal_replay_paused to > > > > > > control whether this waits for recovery to get paused or not? By setting its > > > > > > default value to true or false, users can use the old format for calling this > > > > > > and the backward compatibility can be maintained. > > > > > > > > > > So basically, if the wait_recovery_pause flag is false then we will > > > > > immediately return true if the pause is requested? I agree that it is > > > > > good to have an API to know whether the recovery pause is requested or > > > > > not but I am not sure is it good idea to make this API serve both the > > > > > purpose? Anyone else have any thoughts on this? > > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has another purpose; > > > this waits recovery to actually get paused. If we want to limit this API's > > > purpose only to return the pause state, it seems better to fix this to return > > > the actual state at the cost of lacking the backward compatibility. If we want > > > to know whether pause is requested, we may add a new API like > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery to actually > > > get paused, we may add an option to pg_wal_replay_pause() for this purpose. > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't care either. > > > > I don't think that it will be blocked ever, because > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > recovery process will not be stuck on waiting for the WAL. > > > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I can not cancel > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the waiting loop. > > > > > > How about this fix? I think users may want to cancel pg_is_wal_replay_paused() during > > > this is blocking. > > > > Yeah, we can do this. I will send the updated patch after putting > > some more thought into these comments. Thanks again for the feedback. > > > > Please find the updated patch. I've looked at the patch. Here are review comments: + /* Recovery pause state */ + RecoveryPauseState recoveryPause; Now that the value can have tri-state, how about renaming it to recoveryPauseState? --- bool RecoveryIsPaused(void) +{ + bool recoveryPause; + + SpinLockAcquire(&XLogCtl->info_lck); + recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ? true : false; + SpinLockRelease(&XLogCtl->info_lck); + + return recoveryPause; +} + +bool +RecoveryPauseRequested(void) { bool recoveryPause; SpinLockAcquire(&XLogCtl->info_lck); - recoveryPause = XLogCtl->recoveryPause; + recoveryPause = (XLogCtl->recoveryPause != RECOVERY_IN_PROGRESS) ? true : false; SpinLockRelease(&XLogCtl->info_lck); return recoveryPause; } We can write like recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED); Also, since these functions do the almost same thing, I think we can have a common function to get XLogCtl->recoveryPause, say GetRecoveryPauseState() or GetRecoveryPause(), and both RecoveryIsPaused() and RecoveryPauseRequested() use the returned value. What do you think? --- +static void +CheckAndSetRecoveryPause(void) Maybe we need to declare the prototype of this function like other functions in xlog.c. --- + /* + * If recovery is not in progress anymore then report an error this + * could happen if the standby is promoted while we were waiting for + * recovery to get paused. + */ + if (!RecoveryInProgress()) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("recovery is not in progress"), + errhint("Recovery control functions can only be executed during recovery."))); I think we can improve the error message so that we can tell users the standby has been promoted during the wait. For example, errmsg("the standby was promoted during waiting for recovery to be paused"))); --- + /* test for recovery pause if user has requested the pause */ + if (((volatile XLogCtlData *) XLogCtl)->recoveryPause) + recoveryPausesHere(false); + + now = GetCurrentTimestamp(); + Hmm, if the recovery pauses here, the wal receiver isn't launched even when wal_retrieve_retry_interval has passed, which seems not good. I think we want the recovery to be paused but want the wal receiver to continue receiving WAL. And why do we need to set 'now' here? --- /* * Wait until shared recoveryPause flag is cleared. * * endOfRecovery is true if the recovery target is reached and * the paused state starts at the end of recovery because of * recovery_target_action=pause, and false otherwise. * * XXX Could also be done with shared latch, avoiding the pg_usleep loop. * Probably not worth the trouble though. This state shouldn't be one that * anyone cares about server power consumption in. */ static void recoveryPausesHere(bool endOfRecovery) We can improve the first sentence in the above function comment to "Wait until shared recoveryPause is set to RECOVERY_IN_PROGRESS" or something. --- - PG_RETURN_BOOL(RecoveryIsPaused()); + if (!RecoveryPauseRequested()) + PG_RETURN_BOOL(false); + + /* loop until the recovery is actually paused */ + while(!RecoveryIsPaused()) + { + pg_usleep(10000L); /* wait for 10 msec */ + + /* meanwhile if recovery is resume requested then return false */ + if (!RecoveryPauseRequested()) + PG_RETURN_BOOL(false); + + CHECK_FOR_INTERRUPTS(); + + /* + * If recovery is not in progress anymore then report an error this + * could happen if the standby is promoted while we were waiting for + * recovery to get paused. + */ + if (!RecoveryInProgress()) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("recovery is not in progress"), + errhint("Recovery control functions can only be executed during recovery."))); + } + + PG_RETURN_BOOL(true); We have the same !RecoveryPauseRequested() check twice, how about the following arrangement? for (;;) { if (!RecoveryPauseRequested()) PG_RETURN_BOOL(false); if (RecoveryIsPaused()) break; pg_usleep(10000L); CHECK_FOR_INTERRUPTS(); if (!RecoveryInProgress()) ereport(...); } PG_RETURN_BOOL(true); Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/ |
Free forum by Nabble | Edit this page |