pause recovery if pitr target not reached

pause recovery if pitr target not reached

Leif Gunnar Erlandsen
This patch allows PostgreSQL to pause recovery when the available WAL ends
before the PITR target is reached, if recovery_target_time is specified.

Missing WAL files could then be restored from backup and applied on the next restart.

Today PostgreSQL opens the database read/write on a new timeline even when
the PITR target is not reached.

make check was run with this patch, with the result "All 192 tests passed."
The source used is version 12beta4.

For both examples below "recovery_target_time = '2019-09-17 09:24:00'"

_________________________
Log from today's behavior:

[20875] LOG:  starting point-in-time recovery to 2019-09-17 09:24:00+02
[20875] LOG:  restored log file "000000010000000000000002" from archive
[20875] LOG:  redo starts at 0/2000028
[20875] LOG:  consistent recovery state reached at 0/2000100
[20870] LOG:  database system is ready to accept read only connections
[20875] LOG:  restored log file "000000010000000000000003" from archive
[20875] LOG:  restored log file "000000010000000000000004" from archive
cp: cannot stat '/var/lib/pgsql/12/archivedwal/000000010000000000000005': No such file or directory
[20875] LOG:  redo done at 0/40080C8
[20875] LOG:  last completed transaction was at log time 2019-09-17 09:13:10.524645+02
[20875] LOG:  restored log file "000000010000000000000004" from archive
cp: cannot stat '/var/lib/pgsql/12/archivedwal/00000002.history': No such file or directory
[20875] LOG:  selected new timeline ID: 2
[20875] LOG:  archive recovery complete
cp: cannot stat '/var/lib/pgsql/12/archivedwal/00000001.history': No such file or directory
[20870] LOG:  database system is ready to accept connections

________________________
And with patched source:

[20899] LOG:  starting point-in-time recovery to 2019-09-17 09:24:00+02
[20899] LOG:  restored log file "000000010000000000000002" from archive
[20899] LOG:  redo starts at 0/2000028
[20899] LOG:  consistent recovery state reached at 0/20002B0
[20895] LOG:  database system is ready to accept read only connections
[20899] LOG:  restored log file "000000010000000000000003" from archive
[20899] LOG:  restored log file "000000010000000000000004" from archive
cp: cannot stat '/var/lib/pgsql/12m/archivedwal/000000010000000000000005': No such file or directory
[20899] LOG:  Recovery target not reached but next WAL record culd not be read
[20899] LOG:  redo done at 0/4007D40
[20899] LOG:  last completed transaction was at log time 2019-09-17 09:13:10.539546+02
[20899] LOG:  recovery has paused
[20899] HINT:  Execute pg_wal_replay_resume() to continue.


You can restore WAL in several steps, and when the target is reached you get this log:

[21943] LOG:  starting point-in-time recovery to 2019-09-17 09:24:00+02
[21943] LOG:  restored log file "000000010000000000000005" from archive
[21943] LOG:  redo starts at 0/5003C38
[21943] LOG:  consistent recovery state reached at 0/6000000
[21941] LOG:  database system is ready to accept read only connections
[21943] LOG:  restored log file "000000010000000000000006" from archive
[21943] LOG:  recovery stopping before commit of transaction 859, time 2019-09-17 09:24:02.58576+02
[21943] LOG:  recovery has paused
[21943] HINT:  Execute pg_wal_replay_resume() to continue.

Execute pg_wal_replay_resume() as hinted.

[21943] LOG:  redo done at 0/6001830
[21943] LOG:  last completed transaction was at log time 2019-09-17 09:23:57.496945+02
cp: cannot stat '/var/lib/pgsql/12m/archivedwal/00000002.history': No such file or directory
[21943] LOG:  selected new timeline ID: 2
[21943] LOG:  archive recovery complete
cp: cannot stat '/var/lib/pgsql/12m/archivedwal/00000001.history': No such file or directory
[21941] LOG:  database system is ready to accept connections



----------------

Leif Gunnar Erlandsen

0001-pause-recovery-if-pitr-target-not-reached.patch (1K) Download Attachment

Re: pause recovery if pitr target not reached

Peter Eisentraut-6
On 2019-09-17 13:23, Leif Gunnar Erlandsen wrote:
> This patch allows PostgreSQL to pause recovery before PITR target is reached
> if recovery_target_time is specified.
>
> Missing WAL files could then be restored from backup and applied on the next restart.
>
> Today PostgreSQL opens the database read/write on a new timeline even when
> the PITR target is not reached.

I think this idea is worth thinking about.  I don't think this should be
specific to a time-based recovery target.  This could apply for example
to a target xid as well.  Also, there should be a way to get the old
behavior.  Perhaps this whole thing should be a new
recovery_target_action, say, 'pause_unless_reached'.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pause recovery if pitr target not reached

Laurenz Albe
On Sat, 2019-10-19 at 21:45 +0200, Peter Eisentraut wrote:

> On 2019-09-17 13:23, Leif Gunnar Erlandsen wrote:
> > This patch allows PostgreSQL to pause recovery before PITR target is reached
> > if recovery_target_time is specified.
> >
> > Missing WAL files could then be restored from backup and applied on the next restart.
> >
> > Today PostgreSQL opens the database read/write on a new timeline even when
> > the PITR target is not reached.
>
> I think this idea is worth thinking about.  I don't think this should be
> specific to a time-based recovery target.  This could apply for example
> to a target xid as well.  Also, there should be a way to get the old
> behavior.  Perhaps this whole thing should be a new
> recovery_target_action, say, 'pause_unless_reached'.

+1 for pausing if end-of-logs is reached before the recovery target.

I don't think that we need to add a new "recovery_target_action" to
retain the old behavior, because I think that nobody ever wants that.
I'd say that this typically happens in two cases:

1. Someone forgot to archive the WAL segment that contains the target.
   In this case the proposed change will solve the problem.

2. Someone specified the recovery target wrong, e.g. used CET rather
   than CEST in the recovery target time, so that the recovery target
   was later than intended.
   In that case the only solution is to start recovery from scratch.

But perhaps there are use cases I didn't think of.
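
The second case comes down to simple offset arithmetic. A minimal sketch, assuming CET is UTC+1 and CEST is UTC+2; the helper utc_seconds is hypothetical and only illustrates why a CET timestamp lands later than the same wall-clock time in CEST:

```c
/*
 * Seconds since midnight UTC for a wall-clock time given at a fixed
 * UTC offset. Hypothetical helper for illustration only; it is not
 * how PostgreSQL parses recovery_target_time.
 */
long
utc_seconds(int hour, int minute, int second, int utc_offset_hours)
{
    /* Subtracting the offset converts local wall-clock time to UTC. */
    return (hour - utc_offset_hours) * 3600L + minute * 60L + second;
}
```

For the 09:24:00 target used in this thread, utc_seconds(9, 24, 0, 1) is 3600 seconds larger than utc_seconds(9, 24, 0, 2), so the target written in CET is one hour later than the one intended in CEST.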

Yours,
Laurenz Albe




Re: pause recovery if pitr target not reached

Fujii Masao-2
In reply to this post by Peter Eisentraut-6
On Sun, Oct 20, 2019 at 4:46 AM Peter Eisentraut
<[hidden email]> wrote:

>
> On 2019-09-17 13:23, Leif Gunnar Erlandsen wrote:
> > This patch allows PostgreSQL to pause recovery before PITR target is reached
> > if recovery_target_time is specified.
> >
> > Missing WAL files could then be restored from backup and applied on the next restart.
> >
> > Today PostgreSQL opens the database read/write on a new timeline even when
> > the PITR target is not reached.
>
> I think this idea is worth thinking about.  I don't think this should be
> specific to a time-based recovery target.  This could apply for example
> to a target xid as well.  Also, there should be a way to get the old
> behavior.  Perhaps this whole thing should be a new
> recovery_target_action, say, 'pause_unless_reached'.

Probably we can use standby mode + recovery target setting for
almost the same purpose. In this configuration, if end-of-WAL is reached
before recovery target, the startup process keeps waiting for new WAL to
be available. Then, if recovery target is reached, the startup process works
as recovery_target_action indicates.

Regards,

--
Fujii Masao



Re: pause recovery if pitr target not reached

Peter Eisentraut-6
On 2019-10-21 08:44, Fujii Masao wrote:
> Probably we can use standby mode + recovery target setting for
> almost the same purpose. In this configuration, if end-of-WAL is reached
> before recovery target, the startup process keeps waiting for new WAL to
> be available. Then, if recovery target is reached, the startup process works
> as recovery_target_action indicates.

So basically get rid of recovery.signal mode and honor recovery target
parameters in standby mode?  That has some appeal because it would simplify
this whole space significantly, but perhaps it would be too confusing
for end users?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pause recovery if pitr target not reached

Fujii Masao-2
On Fri, Nov 1, 2019 at 9:41 PM Peter Eisentraut
<[hidden email]> wrote:

>
> On 2019-10-21 08:44, Fujii Masao wrote:
> > Probably we can use standby mode + recovery target setting for
> > almost the same purpose. In this configuration, if end-of-WAL is reached
> > before recovery target, the startup process keeps waiting for new WAL to
> > be available. Then, if recovery target is reached, the startup process works
> > as recovery_target_action indicates.
>
> So basically get rid of recovery.signal mode and honor recovery target
> parameters in standby mode?

Yes, currently not only archive recovery mode but also standby mode honors
the recovery target settings.

> That has some appeal because it would simplify
> this whole space significantly, but perhaps it would be too confusing
> for end users?

This looks less confusing than extending archive recovery. But I'd like to
hear more opinions about that.

Regards,

--
Fujii Masao



Re: pause recovery if pitr target not reached

Peter Eisentraut-6
In reply to this post by Leif Gunnar Erlandsen
On 2019-09-17 13:23, Leif Gunnar Erlandsen wrote:
> This patch allows PostgreSQL to pause recovery before PITR target is reached
> if recovery_target_time is specified.

Btw., this discussion/patch seems related:
https://www.postgresql.org/message-id/flat/a3f650f1-fb0f-c913-a000-a4671f12a013%40postgrespro.ru

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pause recovery if pitr target not reached

Leif Gunnar Erlandsen
>"Peter Eisentraut" <[hidden email]> wrote on 6 November 2019 at 08:32:
>
>> Btw., this discussion/patch seems related:
>> https://www.postgresql.org/message-id/flat/a3f650f1-fb0f-c913-a000-a4671f12a013@...

I have read through this other proposal. As far as I can see from the suggested patch, it does not solve the same problem.
It still stops recovery when the recovery process does not find any more WAL.
I would like the process to pause so that the administrator gets the chance to find more WAL to apply.


My patch should probably be extended to include
RECOVERY_TARGET_XID, RECOVERY_TARGET_NAME, RECOVERY_TARGET_LSN as well as RECOVERY_TARGET_TIME.


---
Leif Gunnar Erlandsen



Re: pause recovery if pitr target not reached

Peter Eisentraut-6
In reply to this post by Fujii Masao-2
After studying this a bit more, I think the current behavior is totally
bogus and needs a serious rethink.

If you specify a recovery target and it is reached, recovery pauses
(depending on recovery_target_action).

If you specify a recovery target and it is not reached when the end of
the archive is reached (i.e., restore_command fails), then recovery ends
and the server is promoted, without any further information.  This is
clearly wrong in multiple ways.

I think what we should do is if we specify a recovery target and we
don't reach it, we should ereport(FATAL).  Somewhere around

             /*
              * end of main redo apply loop
              */

in StartupXLOG(), where we already check for other conditions that are
undesirable at the end of recovery.  Then a user can make fixes either
by getting more WAL files to restore or by adjusting the recovery target,
and then start again.  I don't think pausing is the right behavior, but
perhaps an argument could be made to offer it as a nondefault behavior.

There is an interesting overlap with the other thread that wants to make
"end of archive" an explicitly settable recovery target.  The current
behavior, however, is more like "recovery time (say) or end of archive,
whichever happens first", which is not a behavior that is currently
selectable or intended with other methods of recovery target
specification.  Also, if you want the end of the archive as your
recovery target, that currently does not respect the
recovery_target_action setting, but perhaps it should.
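
The difference between the behavior described above and the proposed one can be modelled as a small decision function. This is only a sketch under stated assumptions: the enum EndOfRedoOutcome and both function names are hypothetical labels for illustration, not identifiers from the PostgreSQL source, and the real check would sit in StartupXLOG() and call ereport(FATAL).

```c
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; not PostgreSQL identifiers. */
typedef enum
{
    OUTCOME_PROMOTE,        /* end recovery, open read/write on a new timeline */
    OUTCOME_TARGET_ACTION,  /* do what recovery_target_action says */
    OUTCOME_FATAL           /* refuse to finish recovery */
} EndOfRedoOutcome;

/* Behavior described as current: running out of WAL promotes silently. */
EndOfRedoOutcome
current_outcome(bool target_set, bool reached_stop_point)
{
    if (target_set && reached_stop_point)
        return OUTCOME_TARGET_ACTION;
    return OUTCOME_PROMOTE;
}

/* Proposed behavior: a target that was set but never reached is an error. */
EndOfRedoOutcome
proposed_outcome(bool target_set, bool reached_stop_point)
{
    if (target_set && !reached_stop_point)
        return OUTCOME_FATAL;
    if (target_set)
        return OUTCOME_TARGET_ACTION;
    return OUTCOME_PROMOTE;
}
```

The two functions differ only in the case where a target is set and the archive ends first: current_outcome(true, false) yields OUTCOME_PROMOTE, while proposed_outcome(true, false) yields OUTCOME_FATAL.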

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pause recovery if pitr target not reached

Leif Gunnar Erlandsen
Adding another patch which is not only for recovery_target_time but also for xid, name and lsn.

> After studying this a bit more, I think the current behavior is totally bogus and needs a serious
> rethink.
>
> If you specify a recovery target and it is reached, recovery pauses (depending on
> recovery_target_action).
>
> If you specify a recovery target and it is not reached when the end of the archive is reached
> (i.e., restore_command fails), then recovery ends and the server is promoted, without any further
> information. This is clearly wrong in multiple ways.

Yes, that is why I have created the patch.

>
> I think what we should do is if we specify a recovery target and we don't reach it, we should
> ereport(FATAL). Somewhere around
>
Whether recovery pauses or a FATAL error is reported is not important, as long as it is possible to get some more WAL and continue recovery. Pausing has the benefit of making it possible to inspect tables in the database.

> in StartupXLOG(), where we already check for other conditions that are undesirable at the end of
> recovery. Then a user can make fixes either by getting more WAL files to restore or by adjusting the
> recovery target, and then start again. I don't think pausing is the right behavior, but perhaps an
> argument could be made to offer it as a nondefault behavior.

Pausing was chosen in the patch because pausing is the expected behavior when the target is reached.

And the patch does not interfere with any other functionality as far as I know.

--
Leif Gunnar Erlandsen

0002-pause-recovery-if-pitr-target-not-reached.patch (3K) Download Attachment

Re: pause recovery if pitr target not reached

Kyotaro Horiguchi-4
Hello, Leif, Peter.

At Thu, 21 Nov 2019 12:50:18 +0000, "Leif Gunnar Erlandsen" <[hidden email]> wrote in

> Adding another patch which is not only for recovery_target_time but also for xid, name and lsn.
>
> > After studying this a bit more, I think the current behavior is totally bogus and needs a serious
> > rethink.
> >
> > If you specify a recovery target and it is reached, recovery pauses (depending on
> > recovery_target_action).
> >
> > If you specify a recovery target and it is not reached when the end of the archive is reached
> > (i.e., restore_command fails), then recovery ends and the server is promoted, without any further
> > information. This is clearly wrong in multiple ways.
>
> Yes, that is why I have created the patch.

The current behavior seems to be premised on repeated trial-and-error
recovery by experienced operators. In that usage, I think the target
moves back gradually over the repetitions, so we need to start from a
clean backup for each repetition anyway. Unintended promotion does no
harm in that case.

From this perspective, I don't think the behavior is totally wrong, but
FATAL'ing at EO-WAL before the target seems good to do.

> > I think what we should do is if we specify a recovery target and we don't reach it, we should
> > ereport(FATAL). Somewhere around
> >
> Whether recovery pauses or a FATAL error is reported is not important, as long as it is possible to get some more WAL and continue recovery. Pausing has the benefit of making it possible to inspect tables in the database.
>
> > in StartupXLOG(), where we already check for other conditions that are undesirable at the end of
> > recovery. Then a user can make fixes either by getting more WAL files to restore or by adjusting the
> > recovery target, and then start again. I don't think pausing is the right behavior, but perhaps an
> > argument could be made to offer it as a nondefault behavior.
>
> Pausing was chosen in the patch because pausing is the expected behavior when the target is reached.
>
> And the patch does not interfere with any other functionality as far as I know.

With the current behavior, if the server promotes without stopping as told
by the target_action variables, it is a sign that something's wrong. But
if the server pauses before reaching the target, operators may overlook the
message if they don't know of the behavior. And if the server pauses in that
case, I think there's nothing to do.

So +1 for FATAL.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pause recovery if pitr target not reached

Peter Eisentraut-6
In reply to this post by Leif Gunnar Erlandsen
On 2019-11-21 13:50, Leif Gunnar Erlandsen wrote:
> Pausing was chosen in the patch because pausing is the expected behavior when the target is reached.

Pausing is the expected behavior when the target is reached because that
is the default setting of recovery_target_action.  Your patch does not
take recovery_target_action into account.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pause recovery if pitr target not reached

Leif Gunnar Erlandsen
In reply to this post by Kyotaro Horiguchi-4
"Kyotaro Horiguchi" <[hidden email]> wrote on 22 November 2019 at 05:26:

> Hello, Leif, Peter.
>
> At Thu, 21 Nov 2019 12:50:18 +0000, "Leif Gunnar Erlandsen" <[hidden email]> wrote in
>
>> Adding another patch which is not only for recovery_target_time but also for xid, name and lsn.
>>
>> After studying this a bit more, I think the current behavior is totally bogus and needs a serious
>> rethink.
>>
>> If you specify a recovery target and it is reached, recovery pauses (depending on
>> recovery_target_action).
>>
>> If you specify a recovery target and it is not reached when the end of the archive is reached
>> (i.e., restore_command fails), then recovery ends and the server is promoted, without any further
>> information. This is clearly wrong in multiple ways.
>>
>> Yes, that is why I have created the patch.
>
> The current behavior seems to be premised on repeated trial-and-error
> recovery by experienced operators. In that usage, I think the target
> moves back gradually over the repetitions, so we need to start from a
> clean backup for each repetition anyway. Unintended promotion does no
> harm in that case.
If you are going back in time and gradually recovering less WAL, today's behavior is adequate.
The patch is for circumstances where, for some reason, you do not have all the WAL files ready at once.

>
> From this perspective, I don't think the behavior is totally wrong, but
> FATAL'ing at EO-WAL before the target seems good to do.
>
>> I think what we should do is if we specify a recovery target and we don't reach it, we should
>> ereport(FATAL). Somewhere around
>>
>> Whether recovery pauses or a FATAL error is reported is not important, as long as it is possible to get
>> some more WAL and continue recovery. Pausing has the benefit of making it possible to inspect tables in
>> the database.
>>
>> in StartupXLOG(), where we already check for other conditions that are undesirable at the end of
>> recovery. Then a user can make fixes either by getting more WAL files to restore or by adjusting the
>> recovery target, and then start again. I don't think pausing is the right behavior, but perhaps an
>> argument could be made to offer it as a nondefault behavior.
>>
>> Pausing was chosen in the patch because pausing is the expected behavior when the target is reached.
>>
>> And the patch does not interfere with any other functionality as far as I know.
>
> With the current behavior, if the server promotes without stopping as told
> by the target_action variables, it is a sign that something's wrong. But
> if the server pauses before reaching the target, operators may overlook the
> message if they don't know of the behavior. And if the server pauses in that
> case, I think there's nothing to do.
Yes, that is correct. FATAL might be the correct behavior.
>
> So +1 for FATAL.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center



Re: pause recovery if pitr target not reached

Leif Gunnar Erlandsen
In reply to this post by Peter Eisentraut-6
"Peter Eisentraut" <[hidden email]> wrote on 22 November 2019 at 11:50:

> On 2019-11-21 13:50, Leif Gunnar Erlandsen wrote:
>
>> Pausing was chosen in the patch because pausing is the expected behavior when the target is reached.
>
> Pausing is the expected behavior when the target is reached because that is the default setting of
> recovery_target_action. Your patch does not take recovery_target_action into account.

No, it does not. It works well to demonstrate its purpose, though.
And it might be that stopping with FATAL would be more correct.

>
> -- Peter Eisentraut http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pause recovery if pitr target not reached

Michael Paquier-2
On Fri, Nov 22, 2019 at 11:26:59AM +0000, Leif Gunnar Erlandsen wrote:
> No, it does not. It works well to demonstrate its purpose, though.
> And it might be that stopping with FATAL would be more correct.

This is still under active discussion.  Please note that the latest
patch does not apply, so a rebase would be nice to have.  I have moved
the patch to next CF, waiting on author.
--
Michael


Re: pause recovery if pitr target not reached

Leif Gunnar Erlandsen
Adding patch written for 13dev from git

"Michael Paquier" <[hidden email]> wrote on 1 December 2019 at 03:08:

> On Fri, Nov 22, 2019 at 11:26:59AM +0000, Leif Gunnar Erlandsen wrote:
>
>> No, it does not. It works well to demonstrate its purpose, though.
>> And it might be that stopping with FATAL would be more correct.
>
> This is still under active discussion. Please note that the latest
> patch does not apply, so a rebase would be nice to have. I have moved
> the patch to next CF, waiting on author.
> --
> Michael

0004-pause-recovery-if-pitr-target-not-reached.patch (1K) Download Attachment

Re: pause recovery if pitr target not reached

Peter Eisentraut-6
On 2019-12-11 12:40, Leif Gunnar Erlandsen wrote:

> Adding patch written for 13dev from git
>
> "Michael Paquier" <[hidden email]> wrote on 1 December 2019 at 03:08:
>
>> On Fri, Nov 22, 2019 at 11:26:59AM +0000, Leif Gunnar Erlandsen wrote:
>>
>>> No, it does not. It works well to demonstrate its purpose, though.
>>> And it might be that stopping with FATAL would be more correct.
>>
>> This is still under active discussion. Please note that the latest
>> patch does not apply, so a rebase would be nice to have. I have moved
>> the patch to next CF, waiting on author.
I reworked your patch a bit.  I changed the outcome to be an error, as
was discussed.  I also added tests and documentation.  Please take a look.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v5-0001-Fail-if-recovery-target-is-not-reached.patch (9K) Download Attachment

Re: pause recovery if pitr target not reached

Kyotaro Horiguchi-4
At Tue, 14 Jan 2020 21:13:51 +0100, Peter Eisentraut <[hidden email]> wrote in

> On 2019-12-11 12:40, Leif Gunnar Erlandsen wrote:
> > Adding patch written for 13dev from git
> > "Michael Paquier" <[hidden email]> wrote on 1 December 2019
> > at 03:08:
> >
> >> On Fri, Nov 22, 2019 at 11:26:59AM +0000, Leif Gunnar Erlandsen wrote:
> >>
> >>> No, it does not. It works well to demonstrate its purpose, though.
> >>> And it might be that stopping with FATAL would be more correct.
> >>
> >> This is still under active discussion. Please note that the latest
> >> patch does not apply, so a rebase would be nice to have. I have moved
> >> the patch to next CF, waiting on author.
>
> I reworked your patch a bit.  I changed the outcome to be an error, as
> was discussed.  I also added tests and documentation.  Please take a
> look.

It doesn't show how far the last recovery actually reached. I don't
think activating resource managers does any harm. Shouldn't we check the
not-reached condition *only* after the else block of the "if (record
!= NULL)" statement?

>     /* just have to read next record after CheckPoint */
>     record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
>   }
>
>   if (record != NULL)
>   {
> ...
> }
> else
> {
>      /* there are no WAL records following the checkpoint */
>      ereport(LOG,
>          (errmsg("redo is not required")));
>   }
>
+   if (recoveryTarget != RECOVERY_TARGET_UNSET && !reachedStopPoint)
..


recovery_target_* is not cleared after startup. If a server crashed
just after the last shutdown checkpoint, any recovery_target_* setting
prevents the server from starting regardless of its value.

> LOG:  database system was not properly shut down; automatic recovery in progress
> LOG:  invalid record length at 0/9000420: wanted 24, got 0
(recovery is skipped)
> FATAL:  recovery ended before configured recovery target was reached

I think we should ignore the setting during crash recovery. Targeted
recovery mode is documented as a feature of archive recovery.  Perhaps
ArchiveRecoveryRequested is needed in the condition.

> if (ArchiveRecoveryRequested &&
>     recoveryTarget != RECOVERY_TARGET_UNSET && !reachedStopPoint)
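
The effect of the extra guard can be checked in isolation. A minimal sketch, assuming the condition is evaluated exactly as quoted; the two function names are hypothetical, and the enum is cut down to the two values needed here rather than copied from the PostgreSQL headers:

```c
#include <stdbool.h>

/* Reduced stand-in for the real enum; only the values used in this sketch. */
typedef enum
{
    RECOVERY_TARGET_UNSET,
    RECOVERY_TARGET_TIME
} RecoveryTargetType;

/* Condition without the guard: any leftover target setting can trigger it. */
bool
target_missed_unguarded(RecoveryTargetType recoveryTarget, bool reachedStopPoint)
{
    return recoveryTarget != RECOVERY_TARGET_UNSET && !reachedStopPoint;
}

/* Condition with the guard: only archive recovery can trigger it. */
bool
target_missed_guarded(bool ArchiveRecoveryRequested,
                      RecoveryTargetType recoveryTarget,
                      bool reachedStopPoint)
{
    return ArchiveRecoveryRequested &&
        recoveryTarget != RECOVERY_TARGET_UNSET && !reachedStopPoint;
}
```

In the crash-recovery scenario from the log above (a leftover time target, no archive recovery requested, stop point never reached), the unguarded condition is true and would raise the FATAL, while the guarded one is false and lets the server start.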
         
regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pause recovery if pitr target not reached

Kyotaro Horiguchi-4
FWIW, I restate this (perhaps) more clearly.

At Wed, 15 Jan 2020 11:02:24 +0900 (JST), Kyotaro Horiguchi <[hidden email]> wrote in
> recovery_target_* is not cleared after startup. If a server crashed
> just after the last shutdown checkpoint, any recovery_target_* setting
> prevents the server from starting regardless of its value.

recovery_target_* is not automatically cleared after a successful
archive recovery.  After that, if the server crashed just after the
last shutdown checkpoint, any recovery_target_* setting prevents the
server from starting regardless of its value.

> > LOG:  database system was not properly shut down; automatic recovery in progress
> > LOG:  invalid record length at 0/9000420: wanted 24, got 0
> (recovery is skipped)
> > FATAL:  recovery ended before configured recovery target was reached
>
> I think we should ignore the setting during crash recovery. Targeted
> recovery mode is documented as a feature of archive recovery.  Perhaps
> ArchiveRecoveryRequested is needed in the condition.
>
> > if (ArchiveRecoveryRequested &&
> >     recoveryTarget != RECOVERY_TARGET_UNSET && !reachedStopPoint)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pause recovery if pitr target not reached

Leif Gunnar Erlandsen
In reply to this post by Peter Eisentraut-6
> "Peter Eisentraut" <[hidden email]> wrote on 14 January 2020 at 21:13:
>
> On 2019-12-11 12:40, Leif Gunnar Erlandsen wrote:
>> Adding patch written for 13dev from git
>> "Michael Paquier" <[hidden email]> wrote on 1 December 2019 at 03:08:
>> On Fri, Nov 22, 2019 at 11:26:59AM +0000, Leif Gunnar Erlandsen wrote:
>
>> No, it does not. It works well to demonstrate its purpose, though.
>> And it might be that stopping with FATAL would be more correct.
>
> This is still under active discussion. Please note that the latest
> patch does not apply, so a rebase would be nice to have. I have moved
> the patch to next CF, waiting on author.
>
> I reworked your patch a bit. I changed the outcome to be an error, as was discussed. I also added
> tests and documentation. Please take a look.

Thank you, it was not unexpected for the patch to be a little bit smaller.
Although it would have been nice to log where recovery ended before reporting the fatal error.
And since you use RECOVERY_TARGET_UNSET, RECOVERY_TARGET_IMMEDIATE also gets included; is this correct?
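
To make the question concrete, here is a small sketch of the test as written. It assumes an enum with the six target names mentioned in this thread (their order here is an assumption), and the function name not_reached_check_applies is hypothetical:

```c
#include <stdbool.h>

/* The target kinds named in this thread; the ordering is assumed. */
typedef enum
{
    RECOVERY_TARGET_UNSET,
    RECOVERY_TARGET_XID,
    RECOVERY_TARGET_TIME,
    RECOVERY_TARGET_NAME,
    RECOVERY_TARGET_LSN,
    RECOVERY_TARGET_IMMEDIATE
} RecoveryTargetType;

/* True for every target kind that the "!= RECOVERY_TARGET_UNSET" test covers. */
bool
not_reached_check_applies(RecoveryTargetType recoveryTarget)
{
    return recoveryTarget != RECOVERY_TARGET_UNSET;
}
```

Because the test only excludes RECOVERY_TARGET_UNSET, it returns true for RECOVERY_TARGET_IMMEDIATE just as it does for the xid, time, name and lsn targets. Whether that is the intended behavior is the open question.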


> -- Peter Eisentraut http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services