Stronger safeguard for archive recovery not to miss data

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
55 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Stronger safeguard for archive recovery not to miss data

osumi.takamichi@fujitsu.com
Hello


The attached patch is intended to prevent a scenario that
archive recovery hits WALs which come from wal_level=minimal
and the server continues to work, which was discussed in the thread of [1].
The motivation is to protect that user ends up with both getting replica
that could miss data and getting the server to miss data in targeted recovery mode.

About how to modify this, we reached the consensus in the thread.
It is by changing the ereport's level from WARNING to FATAL in CheckRequiredParameterValues().

In order to test this fix, what I did is
1 - get a base backup during wal_level is replica
2 - stop the server and change the wal_level from replica to minimal
3 - restart the server(to generate XLOG_PARAMETER_CHANGE)
4 - stop the server and make the wal_level back to replica
5 - start the server again
6 - execute archive recoveries in both cases
        (1) by editing the postgresql.conf and
        touching recovery.signal in the base backup from 1th step
        (2) by making a replica with standby.signal
* During wal_level is replica, I enabled archive_mode in this test.

First of all, I confirmed the server started up without this patch.
After applying this safeguard patch, I checked that
the server cannot start up any more in the scenario case.
I checked the log and got the result below with this patch.

2020-11-26 06:49:46.003 UTC [19715] FATAL:  WAL was generated with wal_level=minimal, data may be missing
2020-11-26 06:49:46.003 UTC [19715] HINT:  This happens if you temporarily set wal_level=minimal without taking a new base backup.

Lastly, this should be backpatched.
Any comments ?

[1]
https://www.postgresql.org/message-id/TYAPR01MB29901EBE5A3ACCE55BA99186FE320%40TYAPR01MB2990.jpnprd01.prod.outlook.com


Best,
        Takamichi Osumi

stronger_safeguard_for_archive_recovery.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Stronger safeguard for archive recovery not to miss data

Kyotaro Horiguchi-4
At Thu, 26 Nov 2020 07:18:39 +0000, "[hidden email]" <[hidden email]> wrote in

> Hello
>
>
> The attached patch is intended to prevent a scenario that
> archive recovery hits WALs which come from wal_level=minimal
> and the server continues to work, which was discussed in the thread of [1].
> The motivation is to protect that user ends up with both getting replica
> that could miss data and getting the server to miss data in targeted recovery mode.
>
> About how to modify this, we reached the consensus in the thread.
> It is by changing the ereport's level from WARNING to FATAL in CheckRequiredParameterValues().
>
> In order to test this fix, what I did is
> 1 - get a base backup during wal_level is replica
> 2 - stop the server and change the wal_level from replica to minimal
> 3 - restart the server(to generate XLOG_PARAMETER_CHANGE)
> 4 - stop the server and make the wal_level back to replica
> 5 - start the server again
> 6 - execute archive recoveries in both cases
> (1) by editing the postgresql.conf and
> touching recovery.signal in the base backup from 1th step
> (2) by making a replica with standby.signal
> * During wal_level is replica, I enabled archive_mode in this test.
>
> First of all, I confirmed the server started up without this patch.
> After applying this safeguard patch, I checked that
> the server cannot start up any more in the scenario case.
> I checked the log and got the result below with this patch.
>
> 2020-11-26 06:49:46.003 UTC [19715] FATAL:  WAL was generated with wal_level=minimal, data may be missing
> 2020-11-26 06:49:46.003 UTC [19715] HINT:  This happens if you temporarily set wal_level=minimal without taking a new base backup.
>
> Lastly, this should be backpatched.
> Any comments ?

Perhaps we need the TAP test that conducts the above steps.

> [1]
> https://www.postgresql.org/message-id/TYAPR01MB29901EBE5A3ACCE55BA99186FE320%40TYAPR01MB2990.jpnprd01.prod.outlook.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: Stronger safeguard for archive recovery not to miss data

Sergei Kornilov
In reply to this post by osumi.takamichi@fujitsu.com
Hello

> First of all, I confirmed the server started up without this patch.

It is possible only with manually configured hot_standby = off, right?
We have ERROR in hot standby mode just below in CheckRequiredParameterValues.

regards, Sergei


Reply | Threaded
Open this post in threaded view
|

RE: Stronger safeguard for archive recovery not to miss data

osumi.takamichi@fujitsu.com
In reply to this post by Kyotaro Horiguchi-4
Hello, Horiguchi-San


On Thursday, Nov 26, 2020 4:29 PM Kyotaro Horiguchi <[hidden email]>

> At Thu, 26 Nov 2020 07:18:39 +0000, "[hidden email]"
> <[hidden email]> wrote in
> > In order to test this fix, what I did is
> > 1 - get a base backup during wal_level is replica
> > 2 - stop the server and change the wal_level from replica to minimal
> > 3 - restart the server(to generate XLOG_PARAMETER_CHANGE)
> > 4 - stop the server and make the wal_level back to replica
> > 5 - start the server again
> > 6 - execute archive recoveries in both cases
> > (1) by editing the postgresql.conf and
> > touching recovery.signal in the base backup from 1th step
> > (2) by making a replica with standby.signal
> > * During wal_level is replica, I enabled archive_mode in this test.
> >
> Perhaps we need the TAP test that conducts the above steps.
It really makes sense that it's better to show the procedures
about how to reproduce the flow.
Thanks for your advice.

Best,
        Takamichi Osumi


Reply | Threaded
Open this post in threaded view
|

RE: Stronger safeguard for archive recovery not to miss data

osumi.takamichi@fujitsu.com
In reply to this post by Sergei Kornilov
Hello, Sergei

> It is possible only with manually configured hot_standby = off, right?
> We have ERROR in hot standby mode just below in
> CheckRequiredParameterValues.
Yes, you are right. I turned off the hot_standby in the test in the previous e-mail.
I'm sorry that I missed writing this tip of information.

Best,
        Takamichi Osumi


Reply | Threaded
Open this post in threaded view
|

RE: Stronger safeguard for archive recovery not to miss data

osumi.takamichi@fujitsu.com
In reply to this post by Kyotaro Horiguchi-4
Hi


On Thursday, November 26, 2020 4:29 PM
Kyotaro Horiguchi <[hidden email]> wrote:

> At Thu, 26 Nov 2020 07:18:39 +0000, "[hidden email]"
> <[hidden email]> wrote in
> > The attached patch is intended to prevent a scenario that archive
> > recovery hits WALs which come from wal_level=minimal and the server
> > continues to work, which was discussed in the thread of [1].
> > The motivation is to protect that user ends up with both getting
> > replica that could miss data and getting the server to miss data in targeted
> recovery mode.
> >
> > About how to modify this, we reached the consensus in the thread.
> > It is by changing the ereport's level from WARNING to FATAL in
> CheckRequiredParameterValues().
> >
> > In order to test this fix, what I did is
> > 1 - get a base backup during wal_level is replica
> > 2 - stop the server and change the wal_level from replica to minimal
> > 3 - restart the server(to generate XLOG_PARAMETER_CHANGE)
> > 4 - stop the server and make the wal_level back to replica
> > 5 - start the server again
> > 6 - execute archive recoveries in both cases
> > (1) by editing the postgresql.conf and
> > touching recovery.signal in the base backup from 1th step
> > (2) by making a replica with standby.signal
> > * During wal_level is replica, I enabled archive_mode in this test.
> >
> > First of all, I confirmed the server started up without this patch.
> > After applying this safeguard patch, I checked that the server cannot
> > start up any more in the scenario case.
> > I checked the log and got the result below with this patch.
> >
> > 2020-11-26 06:49:46.003 UTC [19715] FATAL:  WAL was generated with
> > wal_level=minimal, data may be missing
> > 2020-11-26 06:49:46.003 UTC [19715] HINT:  This happens if you
> temporarily set wal_level=minimal without taking a new base backup.
> >
> > Lastly, this should be backpatched.
> > Any comments ?
>
> Perhaps we need the TAP test that conducts the above steps.
I added the TAP tests to reproduce and share the result,
using the case of 6-(1) described above.
Here, I created a new file for it because the purposes of other files in
src/recovery didn't match the purpose of my TAP tests perfectly.
If you are dubious about this idea, please have a look at the comments
in each file.

When the attached patch is applied,
my TAP tests are executed like other ones like below.

t/018_wal_optimize.pl ................ ok
t/019_replslot_limit.pl .............. ok
t/020_archive_status.pl .............. ok
t/021_row_visibility.pl .............. ok
t/022_archive_recovery.pl ............ ok
All tests successful.

Also, I confirmed that there's no regression by make check-world.
Any comments ?

Best,
        Takamichi Osumi


stronger_safeguard_for_archive_recovery_v02.patch (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Stronger safeguard for archive recovery not to miss data

Laurenz Albe
On Tue, 2020-12-08 at 03:08 +0000, [hidden email] wrote:

> On Thursday, November 26, 2020 4:29 PM
> Kyotaro Horiguchi <[hidden email]> wrote:
> > At Thu, 26 Nov 2020 07:18:39 +0000, "[hidden email]"
> > <[hidden email]> wrote in
> > > The attached patch is intended to prevent a scenario that archive
> > > recovery hits WALs which come from wal_level=minimal and the server
> > > continues to work, which was discussed in the thread of [1].
> >
> > Perhaps we need the TAP test that conducts the above steps.
>
> I added the TAP tests to reproduce and share the result,
> using the case of 6-(1) described above.
> Here, I created a new file for it because the purposes of other files in
> src/recovery didn't match the purpose of my TAP tests perfectly.
> If you are dubious about this idea, please have a look at the comments
> in each file.
>
> When the attached patch is applied,
> my TAP tests are executed like other ones like below.
>
> t/018_wal_optimize.pl ................ ok
> t/019_replslot_limit.pl .............. ok
> t/020_archive_status.pl .............. ok
> t/021_row_visibility.pl .............. ok
> t/022_archive_recovery.pl ............ ok
> All tests successful.
>
> Also, I confirmed that there's no regression by make check-world.
> Any comments ?

The patch applies and passes regression tests, as well as the new TAP test.

I think this should be backpatched, since it fixes a bug.

I am not quite happy with the message:

FATAL:  WAL was generated with wal_level=minimal, data may be missing
HINT:  This happens if you temporarily set wal_level=minimal without taking a new base backup.

This sounds too harmless to me and doesn't give the user a clue
what would be the best way to proceed.

Suggestion:

FATAL:  WAL was generated with wal_level=minimal, cannot continue recovering
DETAIL:  This happens if you temporarily set wal_level=minimal on the primary server.
HINT:  Create a new standby from a new base backup after setting wal_level=replica.

Yours,
Laurenz Albe



Reply | Threaded
Open this post in threaded view
|

RE: Stronger safeguard for archive recovery not to miss data

osumi.takamichi@fujitsu.com
Hi, Laurenz


On Friday, January 15, 2021 12:56 AM Laurenz Albe <[hidden email]> wrote:

> On Tue, 2020-12-08 at 03:08 +0000, [hidden email] wrote:
> > On Thursday, November 26, 2020 4:29 PM Kyotaro Horiguchi
> > <[hidden email]> wrote:
> > > At Thu, 26 Nov 2020 07:18:39 +0000, "[hidden email]"
> > > <[hidden email]> wrote in
> > > > The attached patch is intended to prevent a scenario that archive
> > > > recovery hits WALs which come from wal_level=minimal and the
> > > > server continues to work, which was discussed in the thread of [1].
> > >
> > > Perhaps we need the TAP test that conducts the above steps.
> >
> > I added the TAP tests to reproduce and share the result, using the
> > case of 6-(1) described above.
> > Here, I created a new file for it because the purposes of other files
> > in src/recovery didn't match the purpose of my TAP tests perfectly.
> > If you are dubious about this idea, please have a look at the comments
> > in each file.
> >
> > When the attached patch is applied,
> > my TAP tests are executed like other ones like below.
> >
> > t/018_wal_optimize.pl ................ ok t/019_replslot_limit.pl
> > .............. ok t/020_archive_status.pl .............. ok
> > t/021_row_visibility.pl .............. ok t/022_archive_recovery.pl
> > ............ ok All tests successful.
> >
> > Also, I confirmed that there's no regression by make check-world.
> > Any comments ?
>
> The patch applies and passes regression tests, as well as the new TAP test.
Thank you for checking.

> I think this should be backpatched, since it fixes a bug.
Agreed.

> I am not quite happy with the message:
>
> FATAL:  WAL was generated with wal_level=minimal, data may be missing
> HINT:  This happens if you temporarily set wal_level=minimal without taking a
> new base backup.
>
> This sounds too harmless to me and doesn't give the user a clue what would be
> the best way to proceed.
>
> Suggestion:
>
> FATAL:  WAL was generated with wal_level=minimal, cannot continue
> recovering
Adopted.

> DETAIL:  This happens if you temporarily set wal_level=minimal on the primary
> server.
> HINT:  Create a new standby from a new base backup after setting
> wal_level=replica.
Thanks for your suggestion.
I noticed that this message should cover both archive recovery modes,
which means in recovery mode and standby mode. Then, I combined your
suggestion above with this point of view. Have a look at the updated patch.
I also enriched the new tap tests to show this perspective.

Best Regards,
        Takamichi Osumi


stronger_safeguard_for_archive_recovery_v03.patch (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Stronger safeguard for archive recovery not to miss data

Laurenz Albe
On Mon, 2021-01-18 at 07:34 +0000, [hidden email] wrote:
> I noticed that this message should cover both archive recovery modes,
> which means in recovery mode and standby mode. Then, I combined your
> suggestion above with this point of view. Have a look at the updated patch.
> I also enriched the new tap tests to show this perspective.

Looks good, thanks.

I'll mark this patch as "ready for committer".

Yours,
Laurenz Albe



Reply | Threaded
Open this post in threaded view
|

Re: Stronger safeguard for archive recovery not to miss data

Fujii Masao-4


On 2021/01/20 1:05, Laurenz Albe wrote:
> On Mon, 2021-01-18 at 07:34 +0000, [hidden email] wrote:
>> I noticed that this message should cover both archive recovery modes,
>> which means in recovery mode and standby mode. Then, I combined your
>> suggestion above with this point of view. Have a look at the updated patch.
>> I also enriched the new tap tests to show this perspective.
>
> Looks good, thanks.
>
> I'll mark this patch as "ready for committer".

+ errhint("Run recovery again from a new base backup taken after setting wal_level higher than minimal")));

Isn't it impossible to do this in normal archive recovery case? In that case,
since the server crashed and the database got corrupted, probably
we cannot take a new base backup.

Originally even when users accidentally set wal_level to minimal, they could
start the server from the backup by disabling hot_standby and salvage the data.
But with the patch, we lost the way to do that. Right? I was wondering that
WARNING was used intentionally there for that case.


                if (ControlFile->wal_level < WAL_LEVEL_REPLICA)
                        ereport(ERROR,
                                        (errmsg("hot standby is not possible because wal_level was not set to \"replica\" or higher on the primary server"),
                                         errhint("Either set wal_level to \"replica\" on the primary, or turn off hot_standby here.")));

With the patch, we never reach the above code?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: Stronger safeguard for archive recovery not to miss data

Laurenz Albe
On Wed, 2021-01-20 at 13:10 +0900, Fujii Masao wrote:

> +                                errhint("Run recovery again from a new base backup taken after setting wal_level higher than minimal")));
>
> Isn't it impossible to do this in normal archive recovery case? In that case,
> since the server crashed and the database got corrupted, probably
> we cannot take a new base backup.
>
> Originally even when users accidentally set wal_level to minimal, they could
> start the server from the backup by disabling hot_standby and salvage the data.
> But with the patch, we lost the way to do that. Right? I was wondering that
> WARNING was used intentionally there for that case.

I would argue that if you set your "wal_level" to minimal, do some work,
reset it to "replica" and recover past that, two things could happen:

1. Since "archive_mode" was off, you are missing some WAL segments and
   cannot recover past that point anyway.

2. You are missing some relevant WAL records, and your recovered
   database is corrupted.  This is very likely, because you probably
   switched to "minimal" with the intention to generate less WAL.

Now I see your point that a corrupted database may be better than no
database at all, but wouldn't you agree that a warning in the log
(that nobody reads) is too little information?

Normally we don't take such a relaxed attitude towards data corruption.

Perhaps there could be a GUC "recovery_allow_data_corruption" to
override this check, but I'd say that a warning is too little.

>                 if (ControlFile->wal_level < WAL_LEVEL_REPLICA)
>                         ereport(ERROR,
>                                         (errmsg("hot standby is not possible because wal_level was not set to \"replica\" or higher on the primary server"),
>                                          errhint("Either set wal_level to \"replica\" on the primary, or turn off hot_standby here.")));
>
> With the patch, we never reach the above code?

Right, that would have to go.  I didn't notice that.

Yours,
Laurenz Albe



Reply | Threaded
Open this post in threaded view
|

RE: Stronger safeguard for archive recovery not to miss data

osumi.takamichi@fujitsu.com
Hi


On Wed, Jan 20, 2021 9:03 PM Laurenz Albe <[hidden email]> wrote:

>
> On Wed, 2021-01-20 at 13:10 +0900, Fujii Masao wrote:
> > +                                errhint("Run recovery again from a
> > + new base backup taken after setting wal_level higher than
> > + minimal")));
> >
> > Isn't it impossible to do this in normal archive recovery case? In
> > that case, since the server crashed and the database got corrupted,
> > probably we cannot take a new base backup.
> >
> > Originally even when users accidentally set wal_level to minimal, they
> > could start the server from the backup by disabling hot_standby and salvage
> the data.
> > But with the patch, we lost the way to do that. Right? I was wondering
> > that WARNING was used intentionally there for that case.
OK. I understand that this WARNING is necessary to give user a chance
to start up the server again and salvage his/her data,
when hot_standby=off and the server goes through a period of wal_level=minimal
during an archive recovery.


> I would argue that if you set your "wal_level" to minimal, do some work, reset it
> to "replica" and recover past that, two things could happen:
>
> 1. Since "archive_mode" was off, you are missing some WAL segments and
>    cannot recover past that point anyway.
>
> 2. You are missing some relevant WAL records, and your recovered
>    database is corrupted.  This is very likely, because you probably
>    switched to "minimal" with the intention to generate less WAL.
>
> Now I see your point that a corrupted database may be better than no database
> at all, but wouldn't you agree that a warning in the log (that nobody reads) is
> too little information?
>
> Normally we don't take such a relaxed attitude towards data corruption.
Yeah, we thought we needed stronger protection for that reason.


> Perhaps there could be a GUC "recovery_allow_data_corruption" to override this
> check, but I'd say that a warning is too little.
>
> >                 if (ControlFile->wal_level < WAL_LEVEL_REPLICA)
> >                         ereport(ERROR,
> >                                         (errmsg("hot standby is not
> possible because wal_level was not set to \"replica\" or higher on the primary
> server"),
> >                                          errhint("Either set wal_level
> > to \"replica\" on the primary, or turn off hot_standby here.")));
> >
> > With the patch, we never reach the above code?
>
> Right, that would have to go.  I didn't notice that.
Adding a condition to check if "recovery_allow_data_corruption" is 'on' around the end of
CheckRequiredParameterValues() sounds safer for me too, although
implementing a new GUC parameter sounds bigger than what I expected at first.
The default of the value should be 'off' to protect users from getting the corrupted server.
Does everyone agree with this direction ?


Best Regards,
        Takamichi Osumi
Reply | Threaded
Open this post in threaded view
|

Re: Stronger safeguard for archive recovery not to miss data

Laurenz Albe
On Thu, 2021-01-21 at 11:49 +0000, [hidden email] wrote:
> Adding a condition to check if "recovery_allow_data_corruption" is 'on' around the end of
> CheckRequiredParameterValues() sounds safer for me too, although
> implementing a new GUC parameter sounds bigger than what I expected at first.
> The default of the value should be 'off' to protect users from getting the corrupted server.
> Does everyone agree with this direction ?

I'd say that adding such a GUC is material for another patch, if we want it at all.

I think it is very unlikely that people will switch from "wal_level=replica" to
"minimal" and back very soon afterwards and also try to recover past such
a switch, which probably explains why nobody has complained about data corruption
generated that way.  To get the server to start with "wal_level=minimal", you must
set "archive_mode=off" and "max_wal_senders=0", and few people will do that and
still expect recovery to work.

My vote is that we should not have a GUC for such an unlikely event, and that
stopping recovery is good enough.

Yours,
Laurenz Albe



Reply | Threaded
Open this post in threaded view
|

RE: Stronger safeguard for archive recovery not to miss data

osumi.takamichi@fujitsu.com
Hi, Laurenz


On Thursday, January 21, 2021 9:51 PM Laurenz Albe <[hidden email]> wrote:

> On Thu, 2021-01-21 at 11:49 +0000, [hidden email] wrote:
> > Adding a condition to check if "recovery_allow_data_corruption" is
> > 'on' around the end of
> > CheckRequiredParameterValues() sounds safer for me too, although
> > implementing a new GUC parameter sounds bigger than what I expected at
> first.
> > The default of the value should be 'off' to protect users from getting the
> corrupted server.
> > Does everyone agree with this direction ?
>
> I'd say that adding such a GUC is material for another patch, if we want it at all.
OK. You meant another different patch.

> I think it is very unlikely that people will switch from "wal_level=replica" to
> "minimal" and back very soon afterwards and also try to recover past such a
> switch, which probably explains why nobody has complained about data
> corruption generated that way.  To get the server to start with
> "wal_level=minimal", you must set "archive_mode=off" and
> "max_wal_senders=0", and few people will do that and still expect recovery to
> work.
Yeah, the possibility is low of course.

> My vote is that we should not have a GUC for such an unlikely event, and that
> stopping recovery is good enough.
OK. IIUC, my current patch for this fix doesn't need to be changed or withdrawn.
Thank you for your explanation.


Best Regards,
        Takamichi Osumi
Reply | Threaded
Open this post in threaded view
|

Re: Stronger safeguard for archive recovery not to miss data

Laurenz Albe
On Thu, 2021-01-21 at 13:09 +0000, [hidden email] wrote:
> > My vote is that we should not have a GUC for such an unlikely event, and that
> > stopping recovery is good enough.
>
> OK. IIUC, my current patch for this fix doesn't need to be changed or withdrawn.
> Thank you for your explanation.

Well, that's just my opinion.
Fujii Masao seemed to disagree with the patch, and his voice carries weight.

Yours,
Laurenz Albe



Reply | Threaded
Open this post in threaded view
|

Re: Stronger safeguard for archive recovery not to miss data

Laurenz Albe
On Thu, 2021-01-21 at 15:30 +0100, I wrote:

> On Thu, 2021-01-21 at 13:09 +0000, [hidden email] wrote:
>
> > > My vote is that we should not have a GUC for such an unlikely event, and that
> > > stopping recovery is good enough.
> > OK. IIUC, my current patch for this fix doesn't need to be changed or withdrawn.
> > Thank you for your explanation.
>
> Well, that's just my opinion.
>
> Fujii Masao seemed to disagree with the patch, and his voice carries weight.

I think you should pst another patch where the second, now superfluous,
error message is removed.

Yours,
Laurenz Albe



Reply | Threaded
Open this post in threaded view
|

RE: Stronger safeguard for archive recovery not to miss data

osumi.takamichi@fujitsu.com
Hi


On Monday, January 25, 2021 5:13 AM Laurenz Albe <[hidden email]> wrote:

> On Thu, 2021-01-21 at 15:30 +0100, I wrote:
> > On Thu, 2021-01-21 at 13:09 +0000, [hidden email] wrote:
> >
> > > > My vote is that we should not have a GUC for such an unlikely
> > > > event, and that stopping recovery is good enough.
> > > OK. IIUC, my current patch for this fix doesn't need to be changed or
> withdrawn.
> > > Thank you for your explanation.
> >
> > Well, that's just my opinion.
> >
> > Fujii Masao seemed to disagree with the patch, and his voice carries weight.
>
> I think you should pst another patch where the second, now superfluous, error
> message is removed.
Updated. This patch showed no failure during regression tests
and has been aligned by pgindent.

Best Regards,
        Takamichi Osumi


stronger_safeguard_for_archive_recovery_v04.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Stronger safeguard for archive recovery not to miss data

Laurenz Albe
On Mon, 2021-01-25 at 08:19 +0000, [hidden email] wrote:
> > I think you should pst another patch where the second, now superfluous, error
> > message is removed.
>
> Updated. This patch showed no failure during regression tests
> and has been aligned by pgindent.

Looks good to me.
I'll set it to "ready for committer" again.

Yours,
Laurenz Albe



Reply | Threaded
Open this post in threaded view
|

Re: Stronger safeguard for archive recovery not to miss data

David Steele
On 1/25/21 3:55 AM, Laurenz Albe wrote:
> On Mon, 2021-01-25 at 08:19 +0000, [hidden email] wrote:
>>> I think you should pst another patch where the second, now superfluous, error
>>> message is removed.
>>
>> Updated. This patch showed no failure during regression tests
>> and has been aligned by pgindent.
>
> Looks good to me.
> I'll set it to "ready for committer" again.

Fujii, does the new patch in [1] address your concerns?

Regards,
--
-David
[hidden email]

[1]
https://www.postgresql.org/message-id/OSBPR01MB48887EFFCA39FA9B1DBAFB0FEDBD0%40OSBPR01MB4888.jpnprd01.prod.outlook.com


Reply | Threaded
Open this post in threaded view
|

Re: Stronger safeguard for archive recovery not to miss data

Fujii Masao-4


On 2021/03/25 23:21, David Steele wrote:

> On 1/25/21 3:55 AM, Laurenz Albe wrote:
>> On Mon, 2021-01-25 at 08:19 +0000, [hidden email] wrote:
>>>> I think you should pst another patch where the second, now superfluous, error
>>>> message is removed.
>>>
>>> Updated. This patch showed no failure during regression tests
>>> and has been aligned by pgindent.
>>
>> Looks good to me.
>> I'll set it to "ready for committer" again.
>
> Fujii, does the new patch in [1] address your concerns?

No. I'm still not sure if this patch is good idea... I understand
why this safeguard is necessary. OTOH I'm afraid it increases
a bit the risk that users get unstartable database, i.e., lose whole database.
But maybe I'm concerned about rare case and my opinion is minority one.
So I'd like to hear more opinions about this patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


123