Standby accepts recovery_target_timeline setting?


David Steele
While testing against PG12 I noticed the documentation states that
recovery targets are not valid when standby.signal is present.

But surely the exception is recovery_target_timeline?  My testing
confirms that this works just as in prior versions with standby_mode=on.

Documentation patch is attached.

Regards,
--
-David
[hidden email]

Attachment: standby-timeline-target-v01.patch (1K)

Re: Standby accepts recovery_target_timeline setting?

Fujii Masao-2
On Thu, Sep 26, 2019 at 5:22 AM David Steele <[hidden email]> wrote:
>
> While testing against PG12 I noticed the documentation states that
> recovery targets are not valid when standby.signal is present.

Or that description in the doc is not true? Other recovery target
parameters seem to take effect even when standby.signal exists.

Regards,

--
Fujii Masao



Re: Standby accepts recovery_target_timeline setting?

David Steele
On 9/26/19 5:55 AM, Fujii Masao wrote:
> On Thu, Sep 26, 2019 at 5:22 AM David Steele <[hidden email]> wrote:
>>
>> While testing against PG12 I noticed the documentation states that
>> recovery targets are not valid when standby.signal is present.
>
> Or that description in the doc is not true? Other recovery target
> parameters seem to take effect even when standby.signal exists.

Yes, and this is true for any combination of recovery.signal and
standby.signal as far as I can see.  We have been tracking down some
strange behaviors over the last few days as we have been adding PG12
support to pgBackRest.  Late in the day I know, but we just got the
relevant code migrated to C and we did not fancy coding it twice.

The main thing is that if you set recovery_target_time in
postgresql.auto.conf then recovery will always try to hit that target
with any combination of recovery.signal and standby.signal.  But
recovery_target_action only takes effect when recovery.signal,
standby.signal, or both are present.

All these tests were done on 12rc1.

So given this postgresql.auto.conf:

recovery_target_time = '2019-09-26 14:39:51.280711+00'
restore_command = 'cp /home/vagrant/test/archive/%f "%p"'
recovery_target_timeline = current
recovery_target_action = promote

And these settings added to postgresql.conf:

wal_level = replica
archive_mode = on
archive_command = 'test ! -f /home/vagrant/test/archive/%f && cp %p /home/vagrant/test/archive/%f'

And this backup_label:

START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/2000060
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2019-09-26 14:39:49 UTC
LABEL: pg_basebackup base backup
START TIMELINE: 1

The backup we are recovering contains a table that exists at the target
time but is dropped after that as an additional confirmation.  In all
the recovery scenarios below the table exists after recovery.

Here's what recovery looks like with recovery.signal:

2019-09-26 14:49:52.758 UTC [25353] LOG:  database system was
interrupted; last known up at 2019-09-26 14:39:49 UTC
2019-09-26 14:49:52.824 UTC [25353] LOG:  starting point-in-time
recovery to 2019-09-26 14:39:51.280711+00
2019-09-26 14:49:52.836 UTC [25353] LOG:  restored log file
"000000010000000000000002" from archive
2019-09-26 14:49:52.885 UTC [25353] LOG:  redo starts at 0/2000028
2019-09-26 14:49:52.894 UTC [25353] LOG:  consistent recovery state
reached at 0/2000100
2019-09-26 14:49:52.894 UTC [25352] LOG:  database system is ready to
accept read only connections
2019-09-26 14:49:52.905 UTC [25353] LOG:  restored log file
"000000010000000000000003" from archive
2019-09-26 14:49:52.940 UTC [25353] LOG:  recovery stopping before
commit of transaction 487, time 2019-09-26 14:39:54.981557+00
2019-09-26 14:49:52.940 UTC [25353] LOG:  redo done at 0/30096A0
cp: cannot stat '/home/vagrant/test/archive/00000002.history': No such
file or directory
2019-09-26 14:49:52.943 UTC [25353] LOG:  selected new timeline ID: 2
2019-09-26 14:49:52.998 UTC [25353] LOG:  archive recovery complete
2019-09-26 14:49:52.998 UTC [25353] LOG:  database system is ready to
accept connections

This is completely normal and what you would expect.

Now without recovery.signal from a fresh restore:

2019-09-26 14:52:29.491 UTC [25409] LOG:  database system was
interrupted; last known up at 2019-09-26 14:39:49 UTC
2019-09-26 14:52:29.574 UTC [25409] LOG:  restored log file
"000000010000000000000002" from archive
2019-09-26 14:52:29.622 UTC [25409] LOG:  redo starts at 0/2000028
2019-09-26 14:52:29.631 UTC [25409] LOG:  consistent recovery state
reached at 0/2000100
2019-09-26 14:52:29.642 UTC [25409] LOG:  restored log file
"000000010000000000000003" from archive
2019-09-26 14:52:29.666 UTC [25409] LOG:  recovery stopping before
commit of transaction 487, time 2019-09-26 14:39:54.981557+00
2019-09-26 14:52:29.666 UTC [25409] LOG:  redo done at 0/30096A0
2019-09-26 14:52:29.716 UTC [25408] LOG:  database system is ready to
accept connections

Now there is no "starting point-in-time recovery" message but we are
still stopping in the same place, "recovery stopping before commit of
transaction 487".  There is no promotion, so we are now logging on
timeline 1 (so there are duplicate WAL errors as soon as archive_command
runs).  In PG < 12 you could do this by shutting down, removing
recovery.conf and restarting, but it is now much easier to end up on the
same timeline.

Now with standby.signal only from a fresh restore:

2019-09-26 14:59:36.889 UTC [25522] LOG:  database system was
interrupted; last known up at 2019-09-26 14:39:49 UTC
2019-09-26 14:59:36.983 UTC [25522] LOG:  entering standby mode
2019-09-26 14:59:36.994 UTC [25522] LOG:  restored log file
"000000010000000000000002" from archive
2019-09-26 14:59:37.038 UTC [25522] LOG:  redo starts at 0/2000028
2019-09-26 14:59:37.047 UTC [25522] LOG:  consistent recovery state
reached at 0/2000100
2019-09-26 14:59:37.047 UTC [25521] LOG:  database system is ready to
accept read only connections
2019-09-26 14:59:37.061 UTC [25522] LOG:  restored log file
"000000010000000000000003" from archive
2019-09-26 14:59:37.093 UTC [25522] LOG:  recovery stopping before
commit of transaction 487, time 2019-09-26 14:39:54.981557+00
2019-09-26 14:59:37.093 UTC [25522] LOG:  redo done at 0/30096A0
cp: cannot stat '/home/vagrant/test/archive/00000002.history': No such
file or directory
2019-09-26 14:59:37.097 UTC [25522] LOG:  selected new timeline ID: 2
2019-09-26 14:59:37.270 UTC [25522] LOG:  archive recovery complete
cp: cannot stat '/home/vagrant/test/archive/00000001.history': No such
file or directory
2019-09-26 14:59:37.338 UTC [25521] LOG:  database system is ready to
accept connections

The cluster starts in standby mode, hits the recovery target,
then promotes even though no recovery.signal is present.

And finally with both recovery.signal and standby.signal you get the
same as with standby.signal only.
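
The four signal-file combinations above can be summarized as a small lookup table. What follows is only a rough Python sketch of the outcomes seen in these 12rc1 logs (backup_label present, time target set as shown); the names `Outcome`, `OBSERVED` and `outcome` are invented for illustration and are not PostgreSQL code.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    stops_at_target: bool  # log shows "recovery stopping before commit of transaction 487"
    promotes: bool         # log shows "selected new timeline ID: 2"
    end_timeline: int      # timeline the cluster writes WAL on afterwards


# Keys are (recovery.signal present, standby.signal present).
# Values are transcribed from the logs in this message, not derived from
# the server source, so treat them as observations only.
OBSERVED = {
    (True, False): Outcome(True, True, 2),    # recovery.signal only
    (False, False): Outcome(True, False, 1),  # no signal file, backup_label only
    (False, True): Outcome(True, True, 2),    # standby.signal only
    (True, True): Outcome(True, True, 2),     # both: same as standby.signal only
}


def outcome(recovery_signal: bool, standby_signal: bool) -> Outcome:
    """Look up the observed result for one combination of signal files."""
    return OBSERVED[(recovery_signal, standby_signal)]
```

The table makes the odd case easy to spot: with no signal file the target is still hit, but the cluster stays on timeline 1.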

I was able to get the same results using an xid target:

recovery_target_xid = 487
recovery_target_inclusive = false

All of this is roughly analogous to use cases that were possible
before, but there were fewer permutations then.  You had no standby and
no recovery target without recovery.conf so "recovery.signal" was always
there, more or less.

At the very least, according to the docs, none of the target options are
supposed to be active unless recovery.signal is in place.  Since
outdated entries in postgresql.auto.conf can have effect even in
the absence of recovery.signal, it seems pretty important to make sure
that mechanism is working correctly - or that the caveat is clearly
documented.

I do think this issue needs to be addressed before GA.

Fujii -- I just became aware of your email at [1] so I'll respond to
that as well.

--
-David
[hidden email]

[1]
https://www.postgresql.org/message-id/CAHGQGwEYYg_Ng%2B03FtZczacCpYgJ2Pn%3DB_wPtWF%2BFFLYDgpa1g%40mail.gmail.com



Re: Standby accepts recovery_target_timeline setting?

Peter Eisentraut-6
In reply to this post by David Steele
On 2019-09-25 22:21, David Steele wrote:
> While testing against PG12 I noticed the documentation states that
> recovery targets are not valid when standby.signal is present.
>
> But surely the exception is recovery_target_timeline?  My testing
> confirms that this works just as in prior versions with standby_mode=on.

Or maybe we should move recovery_target_timeline to a different section?
 But which one?

I don't know if recovery_target_timeline is actually useful to change in
standby mode.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Standby accepts recovery_target_timeline setting?

David Steele
On 9/26/19 4:48 PM, Peter Eisentraut wrote:
> On 2019-09-25 22:21, David Steele wrote:
>> While testing against PG12 I noticed the documentation states that
>> recovery targets are not valid when standby.signal is present.
>>
>> But surely the exception is recovery_target_timeline?  My testing
>> confirms that this works just as in prior versions with standby_mode=on.
>
> Or maybe we should move recovery_target_timeline to a different section?
>   But which one?

Not sure.  I think just noting it as an exception is OK, if it is the
only exception.  But currently that does not seem to be the case.

> I don't know if recovery_target_timeline is actually useful to change in
> standby mode.

It is.  I just dealt with a split-brain case that required the standbys
to be rebuilt on a specific timeline (not latest).

Of course, you could do recovery on that timeline, shut down, and then
bring the cluster back up as a standby, but that seems like a lot of
extra work.

But as Fujii noted and I've demonstrated in the follow-up, pretty much
all target options are allowed for standby recovery.  I don't think that
makes sense, personally, but apparently it was allowed in prior versions
so we'll need to think carefully before disallowing it.

--
-David
[hidden email]



Re: Standby accepts recovery_target_timeline setting?

Peter Eisentraut-6
On 2019-09-26 23:02, David Steele wrote:

> On 9/26/19 4:48 PM, Peter Eisentraut wrote:
>> On 2019-09-25 22:21, David Steele wrote:
>>> While testing against PG12 I noticed the documentation states that
>>> recovery targets are not valid when standby.signal is present.
>>>
>>> But surely the exception is recovery_target_timeline?  My testing
>>> confirms that this works just as in prior versions with standby_mode=on.
>>
>> Or maybe we should move recovery_target_timeline to a different section?
>>   But which one?
>
> Not sure.  I think just noting it as an exception is OK, if it is the
> only exception.  But currently that does not seem to be the case.
>
>> I don't know if recovery_target_timeline is actually useful to change in
>> standby mode.

OK, I have committed your original documentation patch.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Standby accepts recovery_target_timeline setting?

David Steele
Hi Peter,

On 9/27/19 10:36 AM, Peter Eisentraut wrote:
> On 2019-09-26 23:02, David Steele wrote:
>> On 9/26/19 4:48 PM, Peter Eisentraut wrote:
>>
>>> I don't know if recovery_target_timeline is actually useful to change in
>>> standby mode.
>
> OK, I have committed your original documentation patch.

Thanks, that's a good start.

As Fujii noticed and I have demonstrated upthread, just about any target
setting can be used in a standby restore.  This matches the behavior of
prior versions so it's not exactly a regression, but the old docs made
no claim that standby_mode disabled targeted restore.

In fact, for both PG12 and before, setting a recovery target in standby
mode causes the cluster to drop out of standby mode.

Also, the presence or absence of recovery.signal does not seem to have
any effect on how targeted recovery proceeds, except as Fujii has
demonstrated in [1].

I'm not sure what the best thing to do is.  The docs are certainly
incorrect, but fixing them would be weird.  What do we say, setting
targets will exit standby mode?  That is certainly what happens, though.

Also, the fact that target settings are being used when recovery.signal
is missing is contrary to the docs, and this is a new behavior in PG12.
 Prior to 12 you could not have target settings without recovery.conf
being present by definition.

I think, at the very least, the fact that targeted recovery proceeds in
the absence of recovery.signal represents a bug.

--
-David
[hidden email]

[1]
https://www.postgresql.org/message-id/CAHGQGwEYYg_Ng%2B03FtZczacCpYgJ2Pn%3DB_wPtWF%2BFFLYDgpa1g%40mail.gmail.com



Re: Standby accepts recovery_target_timeline setting?

Fujii Masao-2
On Sat, Sep 28, 2019 at 12:14 AM David Steele <[hidden email]> wrote:

>
> Hi Peter,
>
> On 9/27/19 10:36 AM, Peter Eisentraut wrote:
> > On 2019-09-26 23:02, David Steele wrote:
> >> On 9/26/19 4:48 PM, Peter Eisentraut wrote:
> >>
> >>> I don't know if recovery_target_timeline is actually useful to change in
> >>> standby mode.
> >
> > OK, I have committed your original documentation patch.
>
> Thanks, that's a good start.
>
> As Fujii noticed and I have demonstrated upthread, just about any target
> setting can be used in a standby restore.  This matches the behavior of
> prior versions so it's not exactly a regression, but the old docs made
> no claim that standby_mode disabled targeted restore.
>
> In fact, for both PG12 and before, setting a recovery target in standby
> mode causes the cluster to drop out of standby mode.
>
> Also, the presence or absence of recovery.signal does not seem to have
> any effect on how targeted recovery proceeds, except as Fujii has
> demonstrated in [1].
>
> I'm not sure what the best thing to do is.  The docs are certainly
> incorrect, but fixing them would be weird.  What do we say, setting
> targets will exit standby mode?  That is certainly what happens, though.
>
> Also, the fact that target settings are being used when recovery.signal
> is missing is contrary to the docs, and this is a new behavior in PG12.
>  Prior to 12 you could not have target settings without recovery.conf
> being present by definition.
>
> I think, at the very least, the fact that targeted recovery proceeds in
> the absence of recovery.signal represents a bug.

Yes, recovery target settings are used even when neither backup_label
nor recovery.signal exist, i.e., just a crash recovery, in v12. This is
completely different behavior from prior versions.

IMO, since v12 is at RC1 now, it's not a good idea to introduce new logic.
So at least for v12, we basically should change the recovery logic so that
it behaves in the same way as prior versions. That is,

- Stop the recovery with an error if any recovery target is set in
   crash recovery
- Use recovery target settings if set, even in standby mode
- Do not enter an archive recovery mode if recovery.signal is missing

Thoughts?

If we want new behavior in recovery, we can change the logic for v13.

Regards,

--
Fujii Masao



Re: Standby accepts recovery_target_timeline setting?

David Steele
On 9/27/19 11:58 AM, Fujii Masao wrote:
> On Sat, Sep 28, 2019 at 12:14 AM David Steele <[hidden email]> wrote:
>>
>> I think, at the very least, the fact that targeted recovery proceeds in
>> the absence of recovery.signal represents a bug.
>
> Yes, recovery target settings are used even when neither backup_label
> nor recovery.signal exist, i.e., just a crash recovery, in v12. This is
> completely different behavior from prior versions.

I'm not able to reproduce this.  I only see recovery settings being used
if backup_label, recovery.signal, or standby.signal is present.

Do you have an example?

> IMO, since v12 is at RC1 now, it's not a good idea to introduce new logic.
> So at least for v12, we basically should change the recovery logic so that
> it behaves in the same way as prior versions. That is,
>
> - Stop the recovery with an error if any recovery target is set in
>    crash recovery

This seems reasonable.  I tried adding a recovery.signal and
restore_command for crash recovery and I just got an error that it
couldn't find 00000002.history in the archive.

> - Use recovery target settings if set, even in standby mode

Yes, this is weird, but it is present in current versions.

> - Do not enter an archive recovery mode if recovery.signal is missing

Agreed.  Perhaps it's OK to use restore_command if a backup_label is
present, but we certainly should not be doing targeted recovery.

> If we want new behavior in recovery, we can change the logic for v13.

Agreed, but it's not at all clear to me how invasive these changes would be.

Regards,
--
-David
[hidden email]



Re: Standby accepts recovery_target_timeline setting?

Fujii Masao-2
On Sat, Sep 28, 2019 at 2:01 AM David Steele <[hidden email]> wrote:

>
> On 9/27/19 11:58 AM, Fujii Masao wrote:
> > On Sat, Sep 28, 2019 at 12:14 AM David Steele <[hidden email]> wrote:
> >>
> >> I think, at the very least, the fact that targeted recovery proceeds in
> >> the absence of recovery.signal represents a bug.
> >
> > Yes, recovery target settings are used even when neither backup_label
> > nor recovery.signal exist, i.e., just a crash recovery, in v12. This is
> > completely different behavior from prior versions.
>
> I'm not able to reproduce this.  I only see recovery settings being used
> if backup_label, recovery.signal, or standby.signal is present.
>
> Do you have an example?

Yes, here is the example:

initdb -D data
pg_ctl -D data start
psql -c "select pg_create_restore_point('hoge')"
psql -c "alter system set recovery_target_name to 'hoge'"
psql -c "create table test as select num from generate_series(1, 100) num"
pg_ctl -D data -m i stop
pg_ctl -D data start

After restarting the server at the above final step, you will see
the following log messages indicating that recovery stops at
recovery_target_name.

2019-09-28 22:42:04.849 JST [16944] LOG:  recovery stopping at restore
point "hoge", time 2019-09-28 22:42:03.86558+09
2019-09-28 22:42:04.849 JST [16944] FATAL:  requested recovery stop
point is before consistent recovery point

> > IMO, since v12 is at RC1 now, it's not a good idea to introduce new logic.
> > So at least for v12, we basically should change the recovery logic so that
> > it behaves in the same way as prior versions. That is,
> >
> > - Stop the recovery with an error if any recovery target is set in
> >    crash recovery
>
> This seems reasonable.  I tried adding a recovery.signal and
> restore_command for crash recovery and I just got an error that it
> couldn't find 00000002.history in the archive.

You added recovery.signal, so it means that you started an archive recovery,
not crash recovery. Right?

Anyway, I'm thinking of applying something like the attached patch, to emit an error
if a recovery target is set in crash recovery.

> > - Use recovery target settings if set, even in standby mode
>
> Yes, this is weird, but it is present in current versions.

Yes, and some users might be using this current behavior.
If we keep this behavior as it is in v12, the documentation
needs to be corrected.

> > - Do not enter an archive recovery mode if recovery.signal is missing
>
> Agreed.  Perhaps it's OK to use restore_command if a backup_label is
> present

Yeah, it's maybe OK, but it's different behavior from the current version.
So, at least for v12, I'm inclined to prevent crash recovery with backup_label
from using restore_command, i.e., only WAL files in pg_wal will be replayed
in this case.

Regards,

--
Fujii Masao

Attachment: error-if-recovery-taget-set-in-crash-recovery.patch (1018 bytes)

Re: Standby accepts recovery_target_timeline setting?

David Steele
On 9/28/19 10:54 AM, Fujii Masao wrote:

> On Sat, Sep 28, 2019 at 2:01 AM David Steele <[hidden email]> wrote:
>> On 9/27/19 11:58 AM, Fujii Masao wrote:
>>>
>>> Yes, recovery target settings are used even when neither backup_label
>>> nor recovery.signal exist, i.e., just a crash recovery, in v12. This is
>>> completely different behavior from prior versions.
>>
>> I'm not able to reproduce this.  I only see recovery settings being used
>> if backup_label, recovery.signal, or standby.signal is present.
>>
>> Do you have an example?
>
> Yes, here is the example:
>
> initdb -D data
> pg_ctl -D data start
> psql -c "select pg_create_restore_point('hoge')"
> psql -c "alter system set recovery_target_name to 'hoge'"
> psql -c "create table test as select num from generate_series(1, 100) num"
> pg_ctl -D data -m i stop
> pg_ctl -D data start
>
> After restarting the server at the above final step, you will see
> the following log messages indicating that recovery stops at
> recovery_target_name.
>
> 2019-09-28 22:42:04.849 JST [16944] LOG:  recovery stopping at restore
> point "hoge", time 2019-09-28 22:42:03.86558+09
> 2019-09-28 22:42:04.849 JST [16944] FATAL:  requested recovery stop
> point is before consistent recovery point

That's definitely not good behavior.

>>> IMO, since v12 is at RC1 now, it's not a good idea to introduce new logic.
>>> So at least for v12, we basically should change the recovery logic so that
>>> it behaves in the same way as prior versions. That is,
>>>
>>> - Stop the recovery with an error if any recovery target is set in
>>>    crash recovery
>>
>> This seems reasonable.  I tried adding a recovery.signal and
>> restore_command for crash recovery and I just got an error that it
>> couldn't find 00000002.history in the archive.
>
> You added recovery.signal, so it means that you started an archive recovery,
> not crash recovery. Right?

Correct.

> Anyway, I'm thinking of applying something like the attached patch, to emit an error
> if a recovery target is set in crash recovery.

The patch looks reasonable.

>>> - Do not enter an archive recovery mode if recovery.signal is missing
>>
>> Agreed.  Perhaps it's OK to use restore_command if a backup_label is
>> present
>
> Yeah, it's maybe OK, but it's different behavior from the current version.
> So, at least for v12, I'm inclined to prevent crash recovery with backup_label
> from using restore_command, i.e., only WAL files in pg_wal will be replayed
> in this case.

Agreed.  Seems like that could be added to the patch above easily
enough.  More checks would be needed to prevent the behaviors I've been
seeing in the other thread, but it should be possible to more or less
mimic the old behavior with sufficient checks.

Regards,
--
-David
[hidden email]



Re: Standby accepts recovery_target_timeline setting?

Fujii Masao-2
On Sun, Sep 29, 2019 at 12:51 AM David Steele <[hidden email]> wrote:

>
> On 9/28/19 10:54 AM, Fujii Masao wrote:
> > On Sat, Sep 28, 2019 at 2:01 AM David Steele <[hidden email]> wrote:
> >> On 9/27/19 11:58 AM, Fujii Masao wrote:
> >>>
> >>> Yes, recovery target settings are used even when neither backup_label
> >>> nor recovery.signal exist, i.e., just a crash recovery, in v12. This is
> >>> completely different behavior from prior versions.
> >>
> >> I'm not able to reproduce this.  I only see recovery settings being used
> >> if backup_label, recovery.signal, or standby.signal is present.
> >>
> >> Do you have an example?
> >
> > Yes, here is the example:
> >
> > initdb -D data
> > pg_ctl -D data start
> > psql -c "select pg_create_restore_point('hoge')"
> > psql -c "alter system set recovery_target_name to 'hoge'"
> > psql -c "create table test as select num from generate_series(1, 100) num"
> > pg_ctl -D data -m i stop
> > pg_ctl -D data start
> >
> > After restarting the server at the above final step, you will see
> > the following log messages indicating that recovery stops at
> > recovery_target_name.
> >
> > 2019-09-28 22:42:04.849 JST [16944] LOG:  recovery stopping at restore
> > point "hoge", time 2019-09-28 22:42:03.86558+09
> > 2019-09-28 22:42:04.849 JST [16944] FATAL:  requested recovery stop
> > point is before consistent recovery point
>
> That's definitely not good behavior.
>
> >>> IMO, since v12 is at RC1 now, it's not a good idea to introduce new logic.
> >>> So at least for v12, we basically should change the recovery logic so that
> >>> it behaves in the same way as prior versions. That is,
> >>>
> >>> - Stop the recovery with an error if any recovery target is set in
> >>>    crash recovery
> >>
> >> This seems reasonable.  I tried adding a recovery.signal and
> >> restore_command for crash recovery and I just got an error that it
> >> couldn't find 00000002.history in the archive.
> >
> > You added recovery.signal, so it means that you started an archive recovery,
> > not crash recovery. Right?
>
> Correct.
>
> > Anyway, I'm thinking of applying something like the attached patch, to emit an error
> > if a recovery target is set in crash recovery.
>
> The patch looks reasonable.
>
> >>> - Do not enter an archive recovery mode if recovery.signal is missing
> >>
> >> Agreed.  Perhaps it's OK to use restore_command if a backup_label is
> >> present
> >
> > Yeah, it's maybe OK, but it's different behavior from the current version.
> > So, at least for v12, I'm inclined to prevent crash recovery with backup_label
> > from using restore_command, i.e., only WAL files in pg_wal will be replayed
> > in this case.
>
> Agreed.  Seems like that could be added to the patch above easily
> enough.  More checks would be needed to prevent the behaviors I've been
> seeing in the other thread, but it should be possible to more or less
> mimic the old behavior with sufficient checks.

Yeah, more checks would be necessary. IMO an easy fix is to forbid not only
recovery target parameters but also any recovery parameters (specified
in recovery.conf in previous versions) in crash recovery.

In v11 or before, any parameters in recovery.conf cannot take effect in
crash recovery because crash recovery always starts without recovery.conf.
But in v12, those parameters are specified in postgresql.conf,
so they may take effect even in crash recovery (i.e., when both
recovery.signal and standby.signal are missing). This would be the root
cause of the problems that we are discussing, I think.

There might be some recovery parameters that we can safely use
even in crash recovery, e.g., maybe recovery_end_command
(now, you can see that recovery_end_command is executed in
crash recovery in v12). But at this stage of v12, it's worth considering
just making crash recovery exit with an error when any recovery
parameter is set. Thoughts?

Or if that change is overkill, alternatively we can make crash recovery
"ignore" any recovery parameters, e.g., by forcibly disabling
the parameters.

Regards,

--
Fujii Masao



Re: Standby accepts recovery_target_timeline setting?

Tom Lane-2
Fujii Masao <[hidden email]> writes:
>> Agreed.  Seems like that could be added to the patch above easily
>> enough.  More checks would be needed to prevent the behaviors I've been
>> seeing in the other thread, but it should be possible to more or less
>> mimic the old behavior with sufficient checks.

> Yeah, more checks would be necessary. IMO an easy fix is to forbid not only
> recovery target parameters but also any recovery parameters (specified
> in recovery.conf in previous versions) in crash recovery.

> In v11 or before, any parameters in recovery.conf cannot take effect in
> crash recovery because crash recovery always starts without recovery.conf.
> But in v12, those parameters are specified in postgresql.conf,
> so they may take effect even in crash recovery (i.e., when both
> recovery.signal and standby.signal are missing). This would be the root
> cause of the problems that we are discussing, I think.

So ... what I'm wondering about here is what happens during *actual* crash
recovery, eg a postmaster-driven restart of the startup process after
a backend crash in hot standby.  The direction you guys are going in
seems likely to cause the startup process to refuse to function until
those parameters are removed from postgresql.conf, which seems quite
user-unfriendly.

Maybe I'm misunderstanding, but I think that rather than adding error
checks that were not there before, the right path to fixing this is
to cause these settings to be ignored if we're doing crash recovery.
Not make the user take them out (and possibly later put them back).

                        regards, tom lane



Re: Standby accepts recovery_target_timeline setting?

David Steele
In reply to this post by Fujii Masao-2
On 9/28/19 1:26 PM, Fujii Masao wrote:

> On Sun, Sep 29, 2019 at 12:51 AM David Steele <[hidden email]> wrote:
>
> Yeah, more checks would be necessary. IMO an easy fix is to forbid not only
> recovery target parameters but also any recovery parameters (specified
> in recovery.conf in previous versions) in crash recovery.
>
> In v11 or before, any parameters in recovery.conf cannot take effect in
> crash recovery because crash recovery always starts without recovery.conf.
> But in v12, those parameters are specified in postgresql.conf,
> so they may take effect even in crash recovery (i.e., when both
> recovery.signal and standby.signal are missing). This would be the root
> cause of the problems that we are discussing, I think.
>
> There might be some recovery parameters that we can safely use
> even in crash recovery, e.g., maybe recovery_end_command
> (now, you can see that recovery_end_command is executed in
> crash recovery in v12). But at this stage of v12, it's worth considering
> just making crash recovery exit with an error when any recovery
> parameter is set. Thoughts?

I dislike the idea of crash recovery throwing fatal errors because there
are recovery settings in postgresql.auto.conf.  Since there is no
defined mechanism for cleaning out old recovery settings we have to
assume that they will persist (and accumulate) more or less forever.

> Or if that change is overkill, alternatively we can make crash recovery
> "ignore" any recovery parameters, e.g., by forcibly disabling
> the parameters.

I'd rather load recovery settings *only* if recovery.signal or
standby.signal is present and do this only after crash recovery is
complete, i.e. in the absence of backup_label.

I think blindly loading recovery settings then trying to ignore them
later is pretty much why we are having these issues in the first place.
 I'd rather not extend that pattern if possible.
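
To make the first half of that idea concrete, here is a minimal Python sketch, assuming the settings arrive as a plain dictionary. The function name `effective_settings` and the `RECOVERY_PARAMS` set are hypothetical, the set lists only the recovery parameters mentioned in this thread, and the part about waiting until crash recovery has completed is not modeled.

```python
# Only the recovery parameters named in this thread; not a complete list.
RECOVERY_PARAMS = frozenset({
    "restore_command",
    "recovery_end_command",
    "recovery_target_time",
    "recovery_target_xid",
    "recovery_target_name",
    "recovery_target_inclusive",
    "recovery_target_timeline",
    "recovery_target_action",
})


def effective_settings(conf: dict, recovery_signal: bool, standby_signal: bool) -> dict:
    """Return the settings that would be loaded under the proposal."""
    if recovery_signal or standby_signal:
        # A signal file is present: recovery settings are loaded as usual.
        return dict(conf)
    # No signal file: stale recovery settings are never loaded, so crash
    # recovery neither errors out on them nor has to ignore them later.
    return {k: v for k, v in conf.items() if k not in RECOVERY_PARAMS}
```

With this shape, a leftover `recovery_target_name` in postgresql.auto.conf would simply be invisible to a plain crash recovery, while non-recovery settings pass through untouched.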

Regards,
--
-David
[hidden email]



Re: Standby accepts recovery_target_timeline setting?

Peter Eisentraut-6
In reply to this post by Tom Lane-2
On 2019-09-28 19:45, Tom Lane wrote:
> Maybe I'm misunderstanding, but I think that rather than adding error
> checks that were not there before, the right path to fixing this is
> to cause these settings to be ignored if we're doing crash recovery.

That makes sense to me.  Something like this (untested)?

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0daab3ff4b..25cae57131 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5618,6 +5618,13 @@ recoveryStopsBefore(XLogReaderState *record)
 	TimestampTz recordXtime = 0;
 	TransactionId recordXid;

+	/*
+	 * Ignore recovery target settings when not in archive recovery (meaning
+	 * we are in crash recovery).
+	 */
+	if (!InArchiveRecovery)
+		return false;
+
 	/* Check if we should stop as soon as reaching consistency */
 	if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE && reachedConsistency)
 	{
@@ -5759,6 +5766,13 @@ recoveryStopsAfter(XLogReaderState *record)
 	uint8 rmid;
 	TimestampTz recordXtime;

+	/*
+	 * Ignore recovery target settings when not in archive recovery (meaning
+	 * we are in crash recovery).
+	 */
+	if (!InArchiveRecovery)
+		return false;
+
 	info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
 	rmid = XLogRecGetRmid(record);



--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Standby accepts recovery_target_timeline setting?

Fujii Masao-2
On Sun, Sep 29, 2019 at 6:08 AM Peter Eisentraut
<[hidden email]> wrote:
>
> On 2019-09-28 19:45, Tom Lane wrote:
> > Maybe I'm misunderstanding, but I think that rather than adding error
> > checks that were not there before, the right path to fixing this is
> > to cause these settings to be ignored if we're doing crash recovery.
>
> That makes sense to me.

+1

> Something like this (untested)?

Yes, but ArchiveRecoveryRequested should be checked instead of
InArchiveRecovery, I think. Otherwise recovery targets would take effect
when recovery.signal is missing but backup_label exists. In this case,
InArchiveRecovery is set to true though ArchiveRecoveryRequested is
false because recovery.signal is missing.
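
The difference between the two flags can be sketched with a small model. This is a simplification under stated assumptions, not the real xlog.c logic: RecoveryFlags, derive_recovery_flags() and recovery_targets_apply() are hypothetical names, and the model assumes a signal file sets both flags while backup_label alone sets only the second.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two xlog.c flags discussed above. */
typedef struct RecoveryFlags
{
	bool		archive_recovery_requested; /* models ArchiveRecoveryRequested */
	bool		in_archive_recovery;		/* models InArchiveRecovery */
} RecoveryFlags;

/*
 * Simplified derivation (an assumption for illustration): a signal file
 * requests archive recovery; backup_label alone still puts the server
 * into archive recovery without it having been requested.
 */
RecoveryFlags
derive_recovery_flags(bool signal_file_present, bool backup_label_present)
{
	RecoveryFlags f;

	f.archive_recovery_requested = signal_file_present;
	f.in_archive_recovery = signal_file_present || backup_label_present;
	return f;
}

/* Recovery targets are honored only when archive recovery was requested. */
bool
recovery_targets_apply(const RecoveryFlags *f)
{
	return f->archive_recovery_requested;
}
```

In this model, backup_label without a signal file gives in_archive_recovery = true and archive_recovery_requested = false, which is exactly the case where testing InArchiveRecovery would let targets take effect by mistake.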

With the attached patch, I checked that the steps that I described
upthread didn't reproduce the issue.

Regards,

--
Fujii Masao

ignore-recovery-targets-in-crash-recovery.patch (1K) Download Attachment

Re: Standby accepts recovery_target_timeline setting?

Peter Eisentraut-6
In reply to this post by David Steele
On 2019-09-27 17:14, David Steele wrote:

> On 9/27/19 10:36 AM, Peter Eisentraut wrote:
>> On 2019-09-26 23:02, David Steele wrote:
>>> On 9/26/19 4:48 PM, Peter Eisentraut wrote:
>>>
>>>> I don't know if recovery_target_timeline is actually useful to change in
>>>> standby mode.
>> OK, I have committed your original documentation patch.
> Thanks, that's a good start.
>
> As Fujii noticed and I have demonstrated upthread, just about any target
> setting can be used in a standby restore.  This matches the behavior of
> prior versions so it's not exactly a regression, but the old docs made
> no claim that standby_mode disabled targeted restore.

I have further fixed the documentation.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Standby accepts recovery_target_timeline setting?

Peter Eisentraut-6
In reply to this post by Fujii Masao-2
On 2019-09-29 18:36, Fujii Masao wrote:
> Yes, but ArchiveRecoveryRequested should be checked instead of
> InArchiveRecovery, I think. Otherwise recovery targets would take effect
> when recovery.signal is missing but backup_label exists. In this case,
> InArchiveRecovery is set to true though ArchiveRecoveryRequested is
> false because recovery.signal is missing.
>
> With the attached patch, I checked that the steps that I described
> upthread didn't reproduce the issue.

Your patch looks correct to me.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Standby accepts recovery_target_timeline setting?

Fujii Masao-2
On Mon, Sep 30, 2019 at 6:59 AM Peter Eisentraut
<[hidden email]> wrote:

>
> On 2019-09-29 18:36, Fujii Masao wrote:
> > Yes, but ArchiveRecoveryRequested should be checked instead of
> > InArchiveRecovery, I think. Otherwise recovery targets would take effect
> > when recovery.signal is missing but backup_label exists. In this case,
> > InArchiveRecovery is set to true though ArchiveRecoveryRequested is
> > false because recovery.signal is missing.
> >
> > With the attached patch, I checked that the steps that I described
> > upthread didn't reproduce the issue.
>
> Your patch looks correct to me.

Thanks! I have committed the patch.

We also need to do the same for other recovery options such as
restore_command. Attached is a patch that makes crash recovery
ignore restore_command and recovery_end_command.
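
Roughly, the guard amounts to something like the following sketch. It is not the attached patch: effective_recovery_command() is a hypothetical helper, and the flag argument stands in for ArchiveRecoveryRequested.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return the command to run, or NULL if it must be ignored.  A configured
 * restore_command or recovery_end_command is honored only when archive
 * recovery was requested; in crash recovery it is treated as unset.
 */
const char *
effective_recovery_command(bool archive_recovery_requested,
						   const char *configured)
{
	if (!archive_recovery_requested)
		return NULL;			/* crash recovery: ignore the setting */
	if (configured == NULL || configured[0] == '\0')
		return NULL;			/* nothing configured */
	return configured;
}
```

Callers would then skip the command whenever the result is NULL, so a leftover value in postgresql.conf has no effect on crash recovery.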

Regards,

--
Fujii Masao

ignore-restore-command-in-crash-recovery.patch (1K) Download Attachment

Re: Standby accepts recovery_target_timeline setting?

Stephen Frost
In reply to this post by David Steele
Greetings,

* David Steele ([hidden email]) wrote:

> On 9/28/19 1:26 PM, Fujii Masao wrote:
> > On Sun, Sep 29, 2019 at 12:51 AM David Steele <[hidden email]> wrote:
> >
> > Yeah, more checks would be necessary. IMO an easy fix is to forbid not only
> > recovery target parameters but also any recovery parameters (specified
> > in recovery.conf in previous versions) in crash recovery.
> >
> > In v11 or before, any parameters in recovery.conf cannot take effect in
> > crash recovery because crash recovery always starts without recovery.conf.
> > But in v12, those parameters are specified in postgresql.conf,
> > so they may take effect even in crash recovery (i.e., when both
> > recovery.signal and standby.signal are missing). This would be the root
> > cause of the problems that we are discussing, I think.
> >
> > There might be some recovery parameters that we can safely use
> > even in crash recovery, e.g., maybe recovery_end_command
> > (now, you can see that recovery_end_command is executed in
> > crash recovery in v12). But at this stage of v12, it's worth considering
> > simply making crash recovery exit with an error when any recovery
> > parameter is set. Thoughts?
>
> I dislike the idea of crash recovery throwing fatal errors because there
> are recovery settings in postgresql.auto.conf.  Since there is no
> defined mechanism for cleaning out old recovery settings we have to
> assume that they will persist (and accumulate) more or less forever.
>
> > Or if that change is overkill, alternatively we can make crash recovery
> > "ignore" any recovery parameters, e.g., by forcibly disabling
> > the parameters.
>
> I'd rather load recovery settings *only* if recovery.signal or
> standby.signal is present and do this only after crash recovery is
> complete, i.e. in the absence of backup_label.
>
> I think blindly loading recovery settings then trying to ignore them
> later is pretty much why we are having these issues in the first place.
> I'd rather not extend that pattern if possible.

Agreed.

Thanks,

Stephen
