[proposal] recovery_target "latest"

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

[proposal] recovery_target "latest"

Grigory Smolkin
Hello, hackers!

I`d like to propose a new argument for recovery_target parameter, which
will stand to recovering until all available WAL segments are applied.

Current PostgreSQL recovery default behavior(when no recovery target is
provided) does exactly that, but there are several shortcomings:
   - without explicit recovery target standing for default behavior,
recovery_target_action is not coming to action at the end of recovery
   - with PG12 changes, the life of all backup tools became very hard,
because now recovery parameters can be set outside of single config
file(recovery.conf), so it is impossible to ensure, that default
recovery behavior, desired in some cases, will not be silently
overwritten by some recovery parameter forgotten by user.

Proposed path is very simple and solves the aforementioned problems by
introducing new argument "latest" for recovery_target parameter.

Old recovery behavior is still available if no recovery target is
provided. I`m not sure, whether it should it be left as it is now, or not.

Another open question is what to do with recovery_target_inclusive if
recovery_target = "latest" is used.

--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-recovery_target_latest.patch (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Kyotaro Horiguchi-4
Hello.

At Mon, 4 Nov 2019 16:03:38 +0300, Grigory Smolkin <[hidden email]> wrote in

> Hello, hackers!
>
> I`d like to propose a new argument for recovery_target parameter,
> which will stand to recovering until all available WAL segments are
> applied.
>
> Current PostgreSQL recovery default behavior(when no recovery target
> is provided) does exactly that, but there are several shortcomings:
>   - without explicit recovery target standing for default behavior,
> recovery_target_action is not coming to action at the end of recovery
>   - with PG12 changes, the life of all backup tools became very hard,
> because now recovery parameters can be set outside of single config
> file(recovery.conf), so it is impossible to ensure, that default
> recovery behavior, desired in some cases, will not be silently
> overwritten by some recovery parameter forgotten by user.
>
> Proposed path is very simple and solves the aforementioned problems by
> introducing new argument "latest" for recovery_target parameter.

Does the tool remove or rename recovery.conf to cancel the settings?
And do you intend that the new option is used to override settings by
appending it at the end of postgresql.conf? If so, the commit
f2cbffc7a6 seems to break the assumption. PG12 rejects to start if it
finds two different kinds of recovery target settings.

> Old recovery behavior is still available if no recovery target is
> provided. I`m not sure, whether it should it be left as it is now, or
> not.
>
> Another open question is what to do with recovery_target_inclusive if
> recovery_target = "latest" is used.

Anyway inclusiveness doesn't affect to "immediate". If we had the
"latest" option, it would behave the same way.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Grigory Smolkin
Thank you for you interest in this topic!

On 11/5/19 10:41 AM, Kyotaro Horiguchi wrote:

> Hello.
>
> At Mon, 4 Nov 2019 16:03:38 +0300, Grigory Smolkin <[hidden email]> wrote in
>> Hello, hackers!
>>
>> I`d like to propose a new argument for recovery_target parameter,
>> which will stand to recovering until all available WAL segments are
>> applied.
>>
>> Current PostgreSQL recovery default behavior(when no recovery target
>> is provided) does exactly that, but there are several shortcomings:
>>    - without explicit recovery target standing for default behavior,
>> recovery_target_action is not coming to action at the end of recovery
>>    - with PG12 changes, the life of all backup tools became very hard,
>> because now recovery parameters can be set outside of single config
>> file(recovery.conf), so it is impossible to ensure, that default
>> recovery behavior, desired in some cases, will not be silently
>> overwritten by some recovery parameter forgotten by user.
>>
>> Proposed path is very simple and solves the aforementioned problems by
>> introducing new argument "latest" for recovery_target parameter.
> Does the tool remove or rename recovery.conf to cancel the settings?
> And do you intend that the new option is used to override settings by
> appending it at the end of postgresql.conf? If so, the commit
> f2cbffc7a6 seems to break the assumption. PG12 rejects to start if it
> finds two different kinds of recovery target settings.
Yes, previously it was possible to remove/rename old recovery.conf, but
not anymore.
My assumption is exactly that PG should reject to start because of
multiple recovery targets.
Failing to start is infinitely better that recovering to the wrong
recovery target.
>
>> Old recovery behavior is still available if no recovery target is
>> provided. I`m not sure, whether it should it be left as it is now, or
>> not.
>>
>> Another open question is what to do with recovery_target_inclusive if
>> recovery_target = "latest" is used.
> Anyway inclusiveness doesn't affect to "immediate". If we had the
> "latest" option, it would behave the same way.
Right, thank you.
>
> regards.
>
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Grigory Smolkin
Attached new version of a patch with TAP test.

On 11/5/19 11:51 AM, Grigory Smolkin wrote:

> Thank you for you interest in this topic!
>
> On 11/5/19 10:41 AM, Kyotaro Horiguchi wrote:
>> Hello.
>>
>> At Mon, 4 Nov 2019 16:03:38 +0300, Grigory Smolkin
>> <[hidden email]> wrote in
>>> Hello, hackers!
>>>
>>> I`d like to propose a new argument for recovery_target parameter,
>>> which will stand to recovering until all available WAL segments are
>>> applied.
>>>
>>> Current PostgreSQL recovery default behavior(when no recovery target
>>> is provided) does exactly that, but there are several shortcomings:
>>>    - without explicit recovery target standing for default behavior,
>>> recovery_target_action is not coming to action at the end of recovery
>>>    - with PG12 changes, the life of all backup tools became very hard,
>>> because now recovery parameters can be set outside of single config
>>> file(recovery.conf), so it is impossible to ensure, that default
>>> recovery behavior, desired in some cases, will not be silently
>>> overwritten by some recovery parameter forgotten by user.
>>>
>>> Proposed path is very simple and solves the aforementioned problems by
>>> introducing new argument "latest" for recovery_target parameter.
>> Does the tool remove or rename recovery.conf to cancel the settings?
>> And do you intend that the new option is used to override settings by
>> appending it at the end of postgresql.conf? If so, the commit
>> f2cbffc7a6 seems to break the assumption. PG12 rejects to start if it
>> finds two different kinds of recovery target settings.
> Yes, previously it was possible to remove/rename old recovery.conf,
> but not anymore.
> My assumption is exactly that PG should reject to start because of
> multiple recovery targets.
> Failing to start is infinitely better that recovering to the wrong
> recovery target.
>>
>>> Old recovery behavior is still available if no recovery target is
>>> provided. I`m not sure, whether it should it be left as it is now, or
>>> not.
>>>
>>> Another open question is what to do with recovery_target_inclusive if
>>> recovery_target = "latest" is used.
>> Anyway inclusiveness doesn't affect to "immediate". If we had the
>> "latest" option, it would behave the same way.
> Right, thank you.
>>
>> regards.
>>
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0002-recovery_target_latest.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Peter Eisentraut-6
This seems to also be related to this discussion:
<https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b653839f5@...>

I like this idea.

I don't like the name "latest".  What does that mean?  Other
documentation talks about the "end of the archive".  What does that
mean?  It means until restore_command errors.  Let's think of a name
that reflects that better.  Maybe "all_archive" or something like that.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Grigory Smolkin

On 11/6/19 10:39 AM, Peter Eisentraut wrote:
> This seems to also be related to this discussion:
> <https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b653839f5@...>

Yes, in a way. Strengthening current lax recovery behavior is a very
good idea.

>
> I like this idea.
>
> I don't like the name "latest".  What does that mean?  Other
> documentation talks about the "end of the archive".  What does that
> mean?  It means until restore_command errors.  Let's think of a name
> that reflects that better.  Maybe "all_archive" or something like that.

As with "immediate", "latest" reflects the latest possible state this
PostgreSQL instance can achieve when using PITR. I think it is simple
and easy to understand for an end user, which sees PITR as a way to go
from one state to another. In my experience, at least, which is, of
course, subjective.

But if we want an argument name to be technically accurate, then, I
think, something like "end-of-available-WAL"/"all-WAL", "end-of-WAL" is
a way to go.


--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Fujii Masao-2
On Wed, Nov 6, 2019 at 6:33 PM Grigory Smolkin <[hidden email]> wrote:

>
>
> On 11/6/19 10:39 AM, Peter Eisentraut wrote:
> > This seems to also be related to this discussion:
> > <https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b653839f5@...>
>
> Yes, in a way. Strengthening current lax recovery behavior is a very
> good idea.
>
> >
> > I like this idea.
> >
> > I don't like the name "latest".  What does that mean?  Other
> > documentation talks about the "end of the archive".  What does that
> > mean?  It means until restore_command errors.  Let's think of a name
> > that reflects that better.  Maybe "all_archive" or something like that.
>
> As with "immediate", "latest" reflects the latest possible state this
> PostgreSQL instance can achieve when using PITR. I think it is simple
> and easy to understand for an end user, which sees PITR as a way to go
> from one state to another. In my experience, at least, which is, of
> course, subjective.
>
> But if we want an argument name to be technically accurate, then, I
> think, something like "end-of-available-WAL"/"all-WAL", "end-of-WAL" is
> a way to go.

What happens if this parameter is set to latest in the standby mode?
Or the combination of those settings should be prohibited?

Regards,

--
Fujii Masao


Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Grigory Smolkin

On 11/6/19 12:56 PM, Fujii Masao wrote:

> On Wed, Nov 6, 2019 at 6:33 PM Grigory Smolkin <[hidden email]> wrote:
>>
>> On 11/6/19 10:39 AM, Peter Eisentraut wrote:
>>> This seems to also be related to this discussion:
>>> <https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b653839f5@...>
>> Yes, in a way. Strengthening current lax recovery behavior is a very
>> good idea.
>>
>>> I like this idea.
>>>
>>> I don't like the name "latest".  What does that mean?  Other
>>> documentation talks about the "end of the archive".  What does that
>>> mean?  It means until restore_command errors.  Let's think of a name
>>> that reflects that better.  Maybe "all_archive" or something like that.
>> As with "immediate", "latest" reflects the latest possible state this
>> PostgreSQL instance can achieve when using PITR. I think it is simple
>> and easy to understand for an end user, which sees PITR as a way to go
>> from one state to another. In my experience, at least, which is, of
>> course, subjective.
>>
>> But if we want an argument name to be technically accurate, then, I
>> think, something like "end-of-available-WAL"/"all-WAL", "end-of-WAL" is
>> a way to go.
> What happens if this parameter is set to latest in the standby mode?
> Or the combination of those settings should be prohibited?


Currently it will behave just like regular standby, so I think, to avoid
confusion and keep things simple, the combination of them should be
prohibited.
Thank you for pointing this out, I will work on it.

The other way around, as I see it, is to define RECOVERY_TARGET_LATEST
as something more complex, for example, the latest possible endptr in
latest WAL segment. But it is tricky, because WAL archive may keeps
growing as recovery is progressing or, in case of standby, master keeps
sending more and more WAL.

>
> Regards,
>
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Grigory Smolkin
On 11/6/19 1:55 PM, Grigory Smolkin wrote:

>
> On 11/6/19 12:56 PM, Fujii Masao wrote:
>> On Wed, Nov 6, 2019 at 6:33 PM Grigory Smolkin
>> <[hidden email]> wrote:
>>>
>>> On 11/6/19 10:39 AM, Peter Eisentraut wrote:
>>>> This seems to also be related to this discussion:
>>>> <https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b653839f5@...>
>>>>
>>> Yes, in a way. Strengthening current lax recovery behavior is a very
>>> good idea.
>>>
>>>> I like this idea.
>>>>
>>>> I don't like the name "latest".  What does that mean?  Other
>>>> documentation talks about the "end of the archive".  What does that
>>>> mean?  It means until restore_command errors.  Let's think of a name
>>>> that reflects that better.  Maybe "all_archive" or something like
>>>> that.
>>> As with "immediate", "latest" reflects the latest possible state this
>>> PostgreSQL instance can achieve when using PITR. I think it is simple
>>> and easy to understand for an end user, which sees PITR as a way to go
>>> from one state to another. In my experience, at least, which is, of
>>> course, subjective.
>>>
>>> But if we want an argument name to be technically accurate, then, I
>>> think, something like "end-of-available-WAL"/"all-WAL", "end-of-WAL" is
>>> a way to go.
>> What happens if this parameter is set to latest in the standby mode?
>> Or the combination of those settings should be prohibited?
>
>
> Currently it will behave just like regular standby, so I think, to
> avoid confusion and keep things simple, the combination of them should
> be prohibited.
> Thank you for pointing this out, I will work on it.
Attached new patch revision, now it is impossible to use recovery_target
'latest' in standby mode.
TAP test is updated to reflect this behavior.


>
> The other way around, as I see it, is to define RECOVERY_TARGET_LATEST
> as something more complex, for example, the latest possible endptr in
> latest WAL segment. But it is tricky, because WAL archive may keeps
> growing as recovery is progressing or, in case of standby, master
> keeps sending more and more WAL.
>
>>
>> Regards,
>>
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0003-recovery_target_latest.patch (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Kyotaro Horiguchi-4
At Thu, 7 Nov 2019 02:28:39 +0300, Grigory Smolkin <[hidden email]> wrote in

> On 11/6/19 1:55 PM, Grigory Smolkin wrote:
> >
> > On 11/6/19 12:56 PM, Fujii Masao wrote:
> >> On Wed, Nov 6, 2019 at 6:33 PM Grigory Smolkin
> >> <[hidden email]> wrote:
> >>>
> >>> On 11/6/19 10:39 AM, Peter Eisentraut wrote:
> >>>> This seems to also be related to this discussion:
> >>>> <https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b653839f5@...>
> >>> Yes, in a way. Strengthening current lax recovery behavior is a very
> >>> good idea.
> >>>
> >>>> I like this idea.
> >>>>
> >>>> I don't like the name "latest".  What does that mean?  Other
> >>>> documentation talks about the "end of the archive".  What does that
> >>>> mean?  It means until restore_command errors.  Let's think of a name
> >>>> that reflects that better.  Maybe "all_archive" or something like
> >>>> that.
> >>> As with "immediate", "latest" reflects the latest possible state this
> >>> PostgreSQL instance can achieve when using PITR. I think it is simple
> >>> and easy to understand for an end user, which sees PITR as a way to go
> >>> from one state to another. In my experience, at least, which is, of
> >>> course, subjective.
> >>>
> >>> But if we want an argument name to be technically accurate, then, I
> >>> think, something like "end-of-available-WAL"/"all-WAL", "end-of-WAL"
> >>> is
> >>> a way to go.
> >> What happens if this parameter is set to latest in the standby mode?
> >> Or the combination of those settings should be prohibited?
> >
> >
> > Currently it will behave just like regular standby, so I think, to
> > avoid confusion and keep things simple, the combination of them should
> > be prohibited.
> > Thank you for pointing this out, I will work on it.
>
> Attached new patch revision, now it is impossible to use
> recovery_target 'latest' in standby mode.
> TAP test is updated to reflect this behavior.

In the first place, latest (or anything it could be named as) is
defined as the explit label for the default behavior. Thus the latest
should work as if nothing is set to recovery_target* following the
definition.  That might seems somewhat strange but I think at least it
is harmless.

recovery_target=immediate + r_t_action=shutdown for a standby works as
commanded. Do we need to inhibit that, too?

> > The other way around, as I see it, is to define RECOVERY_TARGET_LATEST
> > as something more complex, for example, the latest possible endptr in
> > latest WAL segment. But it is tricky, because WAL archive may keeps
> > growing as recovery is progressing or, in case of standby, master
> > keeps sending more and more WAL.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Grigory Smolkin

On 11/7/19 8:36 AM, Kyotaro Horiguchi wrote:

> At Thu, 7 Nov 2019 02:28:39 +0300, Grigory Smolkin <[hidden email]> wrote in
>> On 11/6/19 1:55 PM, Grigory Smolkin wrote:
>>> On 11/6/19 12:56 PM, Fujii Masao wrote:
>>>> On Wed, Nov 6, 2019 at 6:33 PM Grigory Smolkin
>>>> <[hidden email]> wrote:
>>>>> On 11/6/19 10:39 AM, Peter Eisentraut wrote:
>>>>>> This seems to also be related to this discussion:
>>>>>> <https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b653839f5@...>
>>>>> Yes, in a way. Strengthening current lax recovery behavior is a very
>>>>> good idea.
>>>>>
>>>>>> I like this idea.
>>>>>>
>>>>>> I don't like the name "latest".  What does that mean?  Other
>>>>>> documentation talks about the "end of the archive".  What does that
>>>>>> mean?  It means until restore_command errors.  Let's think of a name
>>>>>> that reflects that better.  Maybe "all_archive" or something like
>>>>>> that.
>>>>> As with "immediate", "latest" reflects the latest possible state this
>>>>> PostgreSQL instance can achieve when using PITR. I think it is simple
>>>>> and easy to understand for an end user, which sees PITR as a way to go
>>>>> from one state to another. In my experience, at least, which is, of
>>>>> course, subjective.
>>>>>
>>>>> But if we want an argument name to be technically accurate, then, I
>>>>> think, something like "end-of-available-WAL"/"all-WAL", "end-of-WAL"
>>>>> is
>>>>> a way to go.
>>>> What happens if this parameter is set to latest in the standby mode?
>>>> Or the combination of those settings should be prohibited?
>>>
>>> Currently it will behave just like regular standby, so I think, to
>>> avoid confusion and keep things simple, the combination of them should
>>> be prohibited.
>>> Thank you for pointing this out, I will work on it.
>> Attached new patch revision, now it is impossible to use
>> recovery_target 'latest' in standby mode.
>> TAP test is updated to reflect this behavior.
> In the first place, latest (or anything it could be named as) is
> defined as the explit label for the default behavior. Thus the latest
> should work as if nothing is set to recovery_target* following the
> definition.  That might seems somewhat strange but I think at least it
> is harmless.


Well, it was more about getting default behavior by using some explicit
recovery_target, not the other way around. Because it will break some
3rd party backup and replication applications, that may rely on old
behavior of ignoring recovery_target_action when no recovery_target is
provided.
But you think that it is worth pursuing, I can do that.


> recovery_target=immediate + r_t_action=shutdown for a standby works as
> commanded. Do we need to inhibit that, too?

Why something, that work as expected, should be inhibited?


>
>>> The other way around, as I see it, is to define RECOVERY_TARGET_LATEST
>>> as something more complex, for example, the latest possible endptr in
>>> latest WAL segment. But it is tricky, because WAL archive may keeps
>>> growing as recovery is progressing or, in case of standby, master
>>> keeps sending more and more WAL.
> regards.
>
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Kyotaro Horiguchi-4
At Thu, 7 Nov 2019 12:22:28 +0300, Grigory Smolkin <[hidden email]> wrote in

>
> On 11/7/19 8:36 AM, Kyotaro Horiguchi wrote:
> > At Thu, 7 Nov 2019 02:28:39 +0300, Grigory Smolkin
> > <[hidden email]> wrote in
> >> On 11/6/19 1:55 PM, Grigory Smolkin wrote:
> >>> On 11/6/19 12:56 PM, Fujii Masao wrote:
> >>>> What happens if this parameter is set to latest in the standby mode?
> >>>> Or the combination of those settings should be prohibited?
> >>>
> >>> Currently it will behave just like regular standby, so I think, to
> >>> avoid confusion and keep things simple, the combination of them should
> >>> be prohibited.
> >>> Thank you for pointing this out, I will work on it.
> >> Attached new patch revision, now it is impossible to use
> >> recovery_target 'latest' in standby mode.
> >> TAP test is updated to reflect this behavior.
> > In the first place, latest (or anything it could be named as) is
> > defined as the explit label for the default behavior. Thus the latest
> > should work as if nothing is set to recovery_target* following the
> > definition.  That might seems somewhat strange but I think at least it
> > is harmless.
>
>
> Well, it was more about getting default behavior by using some
> explicit recovery_target, not the other way around. Because it will
> break some 3rd party backup and replication applications, that may
> rely on old behavior of ignoring recovery_target_action when no
> recovery_target is provided.
> But you think that it is worth pursuing, I can do that.

Ah. Sorry for the misleading statement. What I had in my mind was
somewhat the mixture of them.  I thought that recovery_target =''
behaves the same way as now, r_t_action is ignored. And 'latest' just
makes recovery_target_action work as the current non-empty
recovery_target's does. But I'm not confident that it is a good
design.

> > recovery_target=immediate + r_t_action=shutdown for a standby works as
> > commanded. Do we need to inhibit that, too?
>
> Why something, that work as expected, should be inhibited?

To make sure, I don't think we should do that. I meant by the above
that standby mode is already accepting recovery_target_action so
inhibiting that only for 'latest' is not orthogonal and could be more
confusing for users, and complicatig the code. So my opinion is we
shouldn't inhibit 'latest' unless r_t_action harms.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Grigory Smolkin

On 11/7/19 12:56 PM, Kyotaro Horiguchi wrote:

> At Thu, 7 Nov 2019 12:22:28 +0300, Grigory Smolkin <[hidden email]> wrote in
>> On 11/7/19 8:36 AM, Kyotaro Horiguchi wrote:
>>> At Thu, 7 Nov 2019 02:28:39 +0300, Grigory Smolkin
>>> <[hidden email]> wrote in
>>>> On 11/6/19 1:55 PM, Grigory Smolkin wrote:
>>>>> On 11/6/19 12:56 PM, Fujii Masao wrote:
>>>>>> What happens if this parameter is set to latest in the standby mode?
>>>>>> Or the combination of those settings should be prohibited?
>>>>> Currently it will behave just like regular standby, so I think, to
>>>>> avoid confusion and keep things simple, the combination of them should
>>>>> be prohibited.
>>>>> Thank you for pointing this out, I will work on it.
>>>> Attached new patch revision, now it is impossible to use
>>>> recovery_target 'latest' in standby mode.
>>>> TAP test is updated to reflect this behavior.
>>> In the first place, latest (or anything it could be named as) is
>>> defined as the explit label for the default behavior. Thus the latest
>>> should work as if nothing is set to recovery_target* following the
>>> definition.  That might seems somewhat strange but I think at least it
>>> is harmless.
>>
>> Well, it was more about getting default behavior by using some
>> explicit recovery_target, not the other way around. Because it will
>> break some 3rd party backup and replication applications, that may
>> rely on old behavior of ignoring recovery_target_action when no
>> recovery_target is provided.
>> But you think that it is worth pursuing, I can do that.
> Ah. Sorry for the misleading statement. What I had in my mind was
> somewhat the mixture of them.  I thought that recovery_target =''
> behaves the same way as now, r_t_action is ignored. And 'latest' just
> makes recovery_target_action work as the current non-empty
> recovery_target's does. But I'm not confident that it is a good
> design.
>
>>> recovery_target=immediate + r_t_action=shutdown for a standby works as
>>> commanded. Do we need to inhibit that, too?
>> Why something, that work as expected, should be inhibited?
> To make sure, I don't think we should do that. I meant by the above
> that standby mode is already accepting recovery_target_action so
> inhibiting that only for 'latest' is not orthogonal and could be more
> confusing for users, and complicatig the code. So my opinion is we
> shouldn't inhibit 'latest' unless r_t_action harms.

I gave it some thought and now think that prohibiting recovery_target
'latest' and standby was a bad idea.
All recovery_targets follow the same pattern of usage, so
recovery_target "latest" also must be capable of working in standby mode.
All recovery_targets have a clear deterministic 'target' where recovery
should end.
In case of recovery_target "latest" this target is the end of available
WAL, therefore the end of available WAL must be more clearly defined.
I will work on it.

Thank you for a feedback.


>
> regards.
>
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Grigory Smolkin

On 11/7/19 4:36 PM, Grigory Smolkin wrote:

>
> On 11/7/19 12:56 PM, Kyotaro Horiguchi wrote:
>> At Thu, 7 Nov 2019 12:22:28 +0300, Grigory Smolkin
>> <[hidden email]> wrote in
>>> On 11/7/19 8:36 AM, Kyotaro Horiguchi wrote:
>>>> At Thu, 7 Nov 2019 02:28:39 +0300, Grigory Smolkin
>>>> <[hidden email]> wrote in
>>>>> On 11/6/19 1:55 PM, Grigory Smolkin wrote:
>>>>>> On 11/6/19 12:56 PM, Fujii Masao wrote:
>>>>>>> What happens if this parameter is set to latest in the standby
>>>>>>> mode?
>>>>>>> Or the combination of those settings should be prohibited?
>>>>>> Currently it will behave just like regular standby, so I think, to
>>>>>> avoid confusion and keep things simple, the combination of them
>>>>>> should
>>>>>> be prohibited.
>>>>>> Thank you for pointing this out, I will work on it.
>>>>> Attached new patch revision, now it is impossible to use
>>>>> recovery_target 'latest' in standby mode.
>>>>> TAP test is updated to reflect this behavior.
>>>> In the first place, latest (or anything it could be named as) is
>>>> defined as the explit label for the default behavior. Thus the latest
>>>> should work as if nothing is set to recovery_target* following the
>>>> definition.  That might seems somewhat strange but I think at least it
>>>> is harmless.
>>>
>>> Well, it was more about getting default behavior by using some
>>> explicit recovery_target, not the other way around. Because it will
>>> break some 3rd party backup and replication applications, that may
>>> rely on old behavior of ignoring recovery_target_action when no
>>> recovery_target is provided.
>>> But you think that it is worth pursuing, I can do that.
>> Ah. Sorry for the misleading statement. What I had in my mind was
>> somewhat the mixture of them.  I thought that recovery_target =''
>> behaves the same way as now, r_t_action is ignored. And 'latest' just
>> makes recovery_target_action work as the current non-empty
>> recovery_target's does. But I'm not confident that it is a good
>> design.
>>
>>>> recovery_target=immediate + r_t_action=shutdown for a standby works as
>>>> commanded. Do we need to inhibit that, too?
>>> Why something, that work as expected, should be inhibited?
>> To make sure, I don't think we should do that. I meant by the above
>> that standby mode is already accepting recovery_target_action so
>> inhibiting that only for 'latest' is not orthogonal and could be more
>> confusing for users, and complicatig the code. So my opinion is we
>> shouldn't inhibit 'latest' unless r_t_action harms.
>
> I gave it some thought and now think that prohibiting recovery_target
> 'latest' and standby was a bad idea.
> All recovery_targets follow the same pattern of usage, so
> recovery_target "latest" also must be capable of working in standby mode.
> All recovery_targets have a clear deterministic 'target' where
> recovery should end.
> In case of recovery_target "latest" this target is the end of
> available WAL, therefore the end of available WAL must be more clearly
> defined.
> I will work on it.
>
> Thank you for a feedback.

Attached new patch revision, now end of available WAL is defined as the
fact of missing required WAL.
In case of standby, the end of WAL is defined as 2 consecutive switches
of WAL source, that didn`t provided requested record.
In case of streaming standby, each switch of WAL source is forced after
3 failed attempts to get new data from walreceiver.

All constants are taken off the top of my head and serves as proof of
concept.
TAP test is updated.


>
>
>>
>> regards.
>>
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0004-recovery_target_latest.patch (11K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Grigory Smolkin

On 11/8/19 7:00 AM, Grigory Smolkin wrote:

>
> On 11/7/19 4:36 PM, Grigory Smolkin wrote:
>>
>> On 11/7/19 12:56 PM, Kyotaro Horiguchi wrote:
>>> At Thu, 7 Nov 2019 12:22:28 +0300, Grigory Smolkin
>>> <[hidden email]> wrote in
>>>> On 11/7/19 8:36 AM, Kyotaro Horiguchi wrote:
>>>>> At Thu, 7 Nov 2019 02:28:39 +0300, Grigory Smolkin
>>>>> <[hidden email]> wrote in
>>>>>> On 11/6/19 1:55 PM, Grigory Smolkin wrote:
>>>>>>> On 11/6/19 12:56 PM, Fujii Masao wrote:
>>>>>>>> What happens if this parameter is set to latest in the standby
>>>>>>>> mode?
>>>>>>>> Or the combination of those settings should be prohibited?
>>>>>>> Currently it will behave just like regular standby, so I think, to
>>>>>>> avoid confusion and keep things simple, the combination of them
>>>>>>> should
>>>>>>> be prohibited.
>>>>>>> Thank you for pointing this out, I will work on it.
>>>>>> Attached new patch revision, now it is impossible to use
>>>>>> recovery_target 'latest' in standby mode.
>>>>>> TAP test is updated to reflect this behavior.
>>>>> In the first place, latest (or anything it could be named as) is
>>>>> defined as the explit label for the default behavior. Thus the latest
>>>>> should work as if nothing is set to recovery_target* following the
>>>>> definition.  That might seems somewhat strange but I think at
>>>>> least it
>>>>> is harmless.
>>>>
>>>> Well, it was more about getting default behavior by using some
>>>> explicit recovery_target, not the other way around. Because it will
>>>> break some 3rd party backup and replication applications, that may
>>>> rely on old behavior of ignoring recovery_target_action when no
>>>> recovery_target is provided.
>>>> But you think that it is worth pursuing, I can do that.
>>> Ah. Sorry for the misleading statement. What I had in my mind was
>>> somewhat the mixture of them.  I thought that recovery_target =''
>>> behaves the same way as now, r_t_action is ignored. And 'latest' just
>>> makes recovery_target_action work as the current non-empty
>>> recovery_target's does. But I'm not confident that it is a good
>>> design.
>>>
>>>>> recovery_target=immediate + r_t_action=shutdown for a standby
>>>>> works as
>>>>> commanded. Do we need to inhibit that, too?
>>>> Why something, that work as expected, should be inhibited?
>>> To make sure, I don't think we should do that. I meant by the above
>>> that standby mode is already accepting recovery_target_action so
>>> inhibiting that only for 'latest' is not orthogonal and could be more
>>> confusing for users, and complicatig the code. So my opinion is we
>>> shouldn't inhibit 'latest' unless r_t_action harms.
>>
>> I gave it some thought and now think that prohibiting recovery_target
>> 'latest' and standby was a bad idea.
>> All recovery_targets follow the same pattern of usage, so
>> recovery_target "latest" also must be capable of working in standby
>> mode.
>> All recovery_targets have a clear deterministic 'target' where
>> recovery should end.
>> In case of recovery_target "latest" this target is the end of
>> available WAL, therefore the end of available WAL must be more
>> clearly defined.
>> I will work on it.
>>
>> Thank you for a feedback.
>
>
> Attached new patch revision, now end of available WAL is defined as
> the fact of missing required WAL.
> In case of standby, the end of WAL is defined as 2 consecutive
> switches of WAL source, that didn`t provided requested record.
> In case of streaming standby, each switch of WAL source is forced
> after 3 failed attempts to get new data from walreceiver.
>
> All constants are taken off the top of my head and serves as proof of
> concept.
> TAP test is updated.
>
Attached new revision, it contains some minor refactoring.

While working on it, I stumbled upon something strange:

why DisownLatch(&XLogCtl->recoveryWakeupLatch) is called before
ReadRecord(xlogreader, LastRec, PANIC, false) ?

Isn`t this latch may be accessed in WaitForWALToBecomeAvailable() if
streaming standby gets promoted?


>
>
>>
>>
>>>
>>> regards.
>>>
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0005-recovery_target_latest.patch (11K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Kyotaro Horiguchi-4
At Fri, 8 Nov 2019 16:08:47 +0300, Grigory Smolkin <[hidden email]> wrote in
> While working on it, I stumbled upon something strange:
>
> why DisownLatch(&XLogCtl->recoveryWakeupLatch) is called before
> ReadRecord(xlogreader, LastRec, PANIC, false) ?
> Isn`t this latch may be accessed in WaitForWALToBecomeAvailable() if
> streaming standby gets promoted?

The DisownLatch is just for the sake of tidiness and can be placed
anywhere after the ShutdownWalRcv() call but the current place (just
before "StandbyMode = false") seems natural. The ReadRecord call
doesn't launch another wal receiver because we cleard StandbyMode just
before.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Kyotaro Horiguchi-4
In reply to this post by Grigory Smolkin
At Fri, 8 Nov 2019 16:08:47 +0300, Grigory Smolkin <[hidden email]> wrote in

>
> On 11/8/19 7:00 AM, Grigory Smolkin wrote:
> >
> > On 11/7/19 4:36 PM, Grigory Smolkin wrote:
> >> I gave it some thought and now think that prohibiting recovery_target
> >> 'latest' and standby was a bad idea.
> >> All recovery_targets follow the same pattern of usage, so
> >> recovery_target "latest" also must be capable of working in standby
> >> mode.
> >> All recovery_targets have a clear deterministic 'target' where
> >> recovery should end.
> >> In case of recovery_target "latest" this target is the end of
> >> available WAL, therefore the end of available WAL must be more clearly
> >> defined.
> >> I will work on it.
> >>
> >> Thank you for a feedback.
> >
> >
> > Attached new patch revision, now end of available WAL is defined as
> > the fact of missing required WAL.
> > In case of standby, the end of WAL is defined as 2 consecutive
> > switches of WAL source, that didn`t provided requested record.
> > In case of streaming standby, each switch of WAL source is forced
> > after 3 failed attempts to get new data from walreceiver.
> >
> > All constants are taken off the top of my head and serves as proof of
> > concept.
> > TAP test is updated.
> >
> Attached new revision, it contains some minor refactoring.
Thanks for the new patch. I found that it needs more than I thought,
but it seems a bit too complicated and less stable.

As the patch does, WaitForWALToBecomeAvailable needs to exit when
avaiable sources are exhausted. However, we don't need to count
failures to do that. It is enough that the function have two more exit
point. When streaming timeout fires, and when we found that streaming
is not set up when archive/wal failed.

In my opinion, it is better that we have less dependency to global
variables in such deep levels in a call hierachy. Such nformation can
be stored in XLogPageReadPrivate.

I think The doc needs to exiplain on the difference between default
and latest.

Please find the attached, which illustrates the first two points of
the aboves.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3b766e66b9..8c8a1c6bf0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -808,6 +808,7 @@ typedef struct XLogPageReadPrivate
  int emode;
  bool fetching_ckpt; /* are we fetching a checkpoint record? */
  bool randAccess;
+ bool return_on_eow;  /* returns when reaching End of WAL */
 } XLogPageReadPrivate;
 
 /*
@@ -887,7 +888,9 @@ static int XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
 static int XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr,
  int reqLen, XLogRecPtr targetRecPtr, char *readBuf);
 static bool WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
- bool fetching_ckpt, XLogRecPtr tliRecPtr);
+ bool fetching_ckpt,
+ XLogRecPtr tliRecPtr,
+ bool return_on_eow);
 static int emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
 static void PreallocXlogFiles(XLogRecPtr endptr);
@@ -4253,6 +4256,7 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
  private->fetching_ckpt = fetching_ckpt;
  private->emode = emode;
  private->randAccess = (RecPtr != InvalidXLogRecPtr);
+ private->return_on_eow = (recoveryTarget == RECOVERY_TARGET_LATEST);
 
  /* This is the first attempt to read this page. */
  lastSourceFailed = false;
@@ -4371,8 +4375,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,
  continue;
  }
 
- /* In standby mode, loop back to retry. Otherwise, give up. */
- if (StandbyMode && !CheckForStandbyTrigger())
+ /*
+ * In standby mode, loop back to retry. Otherwise, give up.
+ * If we told to return on end of WAL, also give up.
+ */
+ if (StandbyMode && !CheckForStandbyTrigger() &&
+ !private->return_on_eow)
  continue;
  else
  return NULL;
@@ -7271,7 +7279,7 @@ StartupXLOG(void)
  * end of main redo apply loop
  */
 
- if (reachedStopPoint)
+ if (reachedStopPoint || recoveryTarget == RECOVERY_TARGET_LATEST)
  {
  if (!reachedConsistency)
  ereport(FATAL,
@@ -7473,6 +7481,8 @@ StartupXLOG(void)
  recoveryStopName);
  else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
  snprintf(reason, sizeof(reason), "reached consistency");
+ else if (recoveryTarget == RECOVERY_TARGET_LATEST)
+ snprintf(reason, sizeof(reason), "end of WAL");
  else
  snprintf(reason, sizeof(reason), "no recovery target specified");
 
@@ -11605,7 +11615,8 @@ retry:
  if (!WaitForWALToBecomeAvailable(targetPagePtr + reqLen,
  private->randAccess,
  private->fetching_ckpt,
- targetRecPtr))
+ targetRecPtr,
+ private->return_on_eow))
  {
  if (readFile >= 0)
  close(readFile);
@@ -11745,7 +11756,9 @@ next_record_is_invalid:
  *
  * If the record is not immediately available, the function returns false
  * if we're not in standby mode. In standby mode, waits for it to become
- * available.
+ * available looping over all sources if return_on_eow is false. Otherwise the
+ * function returns false if the record is not immediately available on all
+ * sources.
  *
  * When the requested record becomes available, the function opens the file
  * containing it (if not open already), and returns true. When end of standby
@@ -11754,7 +11767,8 @@ next_record_is_invalid:
  */
 static bool
 WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
- bool fetching_ckpt, XLogRecPtr tliRecPtr)
+ bool fetching_ckpt, XLogRecPtr tliRecPtr,
+ bool return_on_eow)
 {
  static TimestampTz last_fail_time = 0;
  TimestampTz now;
@@ -11862,6 +11876,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
  receivedUpto = 0;
  }
 
+ /*
+ * If we don't get into streaming, this is the end of WAL
+ */
+ else if (return_on_eow)
+ return false;
+
  /*
  * Move to XLOG_FROM_STREAM state in either case. We'll
  * get immediate failure if we didn't launch walreceiver,
@@ -12124,6 +12144,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
  WL_EXIT_ON_PM_DEATH,
  5000L, WAIT_EVENT_RECOVERY_WAL_ALL);
  ResetLatch(&XLogCtl->recoveryWakeupLatch);
+
+ /*
+ * This is not exactly the end of WAL, but assume the
+ * primary has no more record to send.
+ */
+ if (return_on_eow)
+ return false;
+
  break;
  }
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 4b3769b8b0..42367a793d 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -11637,9 +11637,11 @@ error_multiple_recovery_targets(void)
 static bool
 check_recovery_target(char **newval, void **extra, GucSource source)
 {
- if (strcmp(*newval, "immediate") != 0 && strcmp(*newval, "") != 0)
+ if (strcmp(*newval, "immediate") != 0 &&
+ strcmp(*newval, "latest") != 0 &&
+ strcmp(*newval, "") != 0)
  {
- GUC_check_errdetail("The only allowed value is \"immediate\".");
+ GUC_check_errdetail("The only allowed value is \"immediate\" and \"latest\".");
  return false;
  }
  return true;
@@ -11649,13 +11651,16 @@ static void
 assign_recovery_target(const char *newval, void *extra)
 {
  if (recoveryTarget != RECOVERY_TARGET_UNSET &&
- recoveryTarget != RECOVERY_TARGET_IMMEDIATE)
+ recoveryTarget != RECOVERY_TARGET_IMMEDIATE &&
+ recoveryTarget != RECOVERY_TARGET_LATEST)
  error_multiple_recovery_targets();
 
- if (newval && strcmp(newval, "") != 0)
+ if (!newval || strcmp(newval, "") == 0)
+ recoveryTarget = RECOVERY_TARGET_UNSET;
+ else if(strcmp(newval, "immediate") == 0)
  recoveryTarget = RECOVERY_TARGET_IMMEDIATE;
  else
- recoveryTarget = RECOVERY_TARGET_UNSET;
+ recoveryTarget = RECOVERY_TARGET_LATEST;
 }
 
 static bool
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index d519252aad..55e7e4bd2c 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -84,7 +84,8 @@ typedef enum
  RECOVERY_TARGET_TIME,
  RECOVERY_TARGET_NAME,
  RECOVERY_TARGET_LSN,
- RECOVERY_TARGET_IMMEDIATE
+ RECOVERY_TARGET_IMMEDIATE,
+ RECOVERY_TARGET_LATEST
 } RecoveryTargetType;
 
 /*
Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Peter Eisentraut-6
In reply to this post by Grigory Smolkin
On 2019-11-08 05:00, Grigory Smolkin wrote:
> Attached new patch revision, now end of available WAL is defined as the
> fact of missing required WAL.
> In case of standby, the end of WAL is defined as 2 consecutive switches
> of WAL source, that didn`t provided requested record.
> In case of streaming standby, each switch of WAL source is forced after
> 3 failed attempts to get new data from walreceiver.
>
> All constants are taken off the top of my head and serves as proof of
> concept.

Well, this is now changing the meaning of the patch quite a bit.  I'm on
board with making the existing default behavior explicit.  (This is
similar to how we added recovery_target_timeline = 'current' in PG12.)
Still not a fan of the name yet, but that's trivial.

If, however, you want to change the default behavior or introduce a new
behavior, as you are suggesting here, that should be a separate discussion.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Grigory Smolkin
On 11/21/19 1:46 PM, Peter Eisentraut wrote:
On 2019-11-08 05:00, Grigory Smolkin wrote:
Attached new patch revision, now end of available WAL is defined as the
fact of missing required WAL.
In case of standby, the end of WAL is defined as 2 consecutive switches
of WAL source, that didn`t provided requested record.
In case of streaming standby, each switch of WAL source is forced after
3 failed attempts to get new data from walreceiver.

All constants are taken off the top of my head and serves as proof of
concept.

Well, this is now changing the meaning of the patch quite a bit.  I'm on board with making the existing default behavior explicit.  (This is similar to how we added recovery_target_timeline = 'current' in PG12.) Still not a fan of the name yet, but that's trivial.

If, however, you want to change the default behavior or introduce a new behavior, as you are suggesting here, that should be a separate discussion.

No, default behavior is not to be changed. As I previously mentioned, it may break the backward compatibility for 3rd party backup and replication applications.

At Fri, 8 Nov 2019 16:08:47 +0300, Grigory Smolkin [hidden email] wrote in 
On 11/8/19 7:00 AM, Grigory Smolkin wrote:
On 11/7/19 4:36 PM, Grigory Smolkin wrote:
I gave it some thought and now think that prohibiting recovery_target
'latest' and standby was a bad idea.
All recovery_targets follow the same pattern of usage, so
recovery_target "latest" also must be capable of working in standby
mode.
All recovery_targets have a clear deterministic 'target' where
recovery should end.
In case of recovery_target "latest" this target is the end of
available WAL, therefore the end of available WAL must be more clearly
defined.
I will work on it.

Thank you for a feedback.
Attached new patch revision, now end of available WAL is defined as
the fact of missing required WAL.
In case of standby, the end of WAL is defined as 2 consecutive
switches of WAL source, that didn`t provided requested record.
In case of streaming standby, each switch of WAL source is forced
after 3 failed attempts to get new data from walreceiver.

All constants are taken off the top of my head and serves as proof of
concept.
TAP test is updated.

Attached new revision, it contains some minor refactoring.
Thanks for the new patch. I found that it needs more than I thought,
but it seems a bit too complicated and less stable.

As the patch does, WaitForWALToBecomeAvailable needs to exit when
avaiable sources are exhausted. However, we don't need to count
failures to do that. It is enough that the function have two more exit
point. When streaming timeout fires, and when we found that streaming
is not set up when archive/wal failed.

In my opinion, it is better that we have less dependency to global
variables in such deep levels in a call hierachy. Such nformation can
be stored in XLogPageReadPrivate.

Many thanks!
It looks much better that my approach with global variables.

I`ve tested it and have some thoughts/concerns:

1. Recovery should report the exact reason why it has been forced to stop. In case of recovering to the end of WAL, standby promotion request received during recovery could be mistaken for reaching the end of WAL and reported as such. To avoid this, I think that reachedEndOfWal variable should be introduced.

In attached patch it is added as a global variable, but maybe something more clever may be devised. I was not sure that reachedEndOfWal could be placed in XLogPageReadPrivate. Because we need to access it at the higher level than ReadRecord(), and I was under impression placing it in XLogPageReadPrivate could violate abstraction level of XLogPageReadPrivate.

2. During the testing, I`ve stumbled upon assertion failure in case of recovering in standby mode to the the end of WAL coupled with recovery_target_action as "promote", caused by the WAL source in state machine not been changed after reaching the recovery target (script to reproduce is attached):

2019-12-07 13:45:54.468 MSK [23067] LOG:  starting PostgreSQL 13devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1), 64-bit
2019-12-07 13:45:54.468 MSK [23067] LOG:  listening on IPv4 address "127.0.0.1", port 15433
2019-12-07 13:45:54.470 MSK [23067] LOG:  listening on Unix socket "/tmp/.s.PGSQL.15433"
2019-12-07 13:45:54.475 MSK [23068] LOG:  database system was interrupted; last known up at 2019-12-07 13:45:49 MSK
cp: cannot stat '/home/gsmol/task/13_devel/archive/00000002.history': No such file or directory
2019-12-07 13:45:54.602 MSK [23068] LOG:  entering standby mode
2019-12-07 13:45:54.614 MSK [23068] LOG:  restored log file "000000010000000000000002" from archive
2019-12-07 13:45:54.679 MSK [23068] LOG:  redo starts at 0/2000028
2019-12-07 13:45:54.682 MSK [23068] LOG:  consistent recovery state reached at 0/2000100
2019-12-07 13:45:54.682 MSK [23067] LOG:  database system is ready to accept read only connections
2019-12-07 13:45:54.711 MSK [23068] LOG:  restored log file "000000010000000000000003" from archive
2019-12-07 13:45:54.891 MSK [23068] LOG:  restored log file "000000010000000000000004" from archive
2019-12-07 13:45:55.046 MSK [23068] LOG:  restored log file "000000010000000000000005" from archive
2019-12-07 13:45:55.210 MSK [23068] LOG:  restored log file "000000010000000000000006" from archive
2019-12-07 13:45:55.377 MSK [23068] LOG:  restored log file "000000010000000000000007" from archive
2019-12-07 13:45:55.566 MSK [23068] LOG:  restored log file "000000010000000000000008" from archive
2019-12-07 13:45:55.737 MSK [23068] LOG:  restored log file "000000010000000000000009" from archive
cp: cannot stat '/home/gsmol/task/13_devel/archive/00000001000000000000000A': No such file or directory
2019-12-07 13:45:56.233 MSK [23083] LOG:  started streaming WAL from primary at 0/A000000 on timeline 1
2019-12-07 13:45:56.365 MSK [23068] LOG:  recovery stopping after reaching the end of available WAL
2019-12-07 13:45:56.365 MSK [23068] LOG:  redo done at 0/9FFC670
2019-12-07 13:45:56.365 MSK [23068] LOG:  last completed transaction was at log time 2019-12-07 13:45:53.627746+03
2019-12-07 13:45:56.365 MSK [23083] FATAL:  terminating walreceiver process due to administrator command
TRAP: FailedAssertion("StandbyMode", File: "xlog.c", Line: 12032)
postgres: startup   waiting for 00000001000000000000000A(ExceptionalCondition+0xa8)[0xa88b55]
postgres: startup   waiting for 00000001000000000000000A[0x573417]
postgres: startup   waiting for 00000001000000000000000A[0x572b68]
postgres: startup   waiting for 00000001000000000000000A[0x579066]
postgres: startup   waiting for 00000001000000000000000A(XLogReadRecord+0xe3)[0x5788ac]
postgres: startup   waiting for 00000001000000000000000A[0x5651f8]
postgres: startup   waiting for 00000001000000000000000A(StartupXLOG+0x23aa)[0x56b26e]
postgres: startup   waiting for 00000001000000000000000A(StartupProcessMain+0xc7)[0x8642a1]
postgres: startup   waiting for 00000001000000000000000A(AuxiliaryProcessMain+0x5b8)[0x5802ad]
postgres: startup   waiting for 00000001000000000000000A[0x863175]
postgres: startup   waiting for 00000001000000000000000A(PostmasterMain+0x1214)[0x85e0a4]
postgres: startup   waiting for 00000001000000000000000A[0x76f247]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7f8867958f33]
postgres: startup   waiting for 00000001000000000000000A(_start+0x2e)[0x47afee]

#0  0x00007f886796ce75 in raise () from /lib64/libc.so.6
#1  0x00007f8867957895 in abort () from /lib64/libc.so.6
#2  0x0000000000a88b82 in ExceptionalCondition (conditionName=0xb24acc "StandbyMode", errorType=0xb208a7 "FailedAssertion",
    fileName=0xb208a0 "xlog.c", lineNumber=12032) at assert.c:67
#3  0x0000000000573417 in WaitForWALToBecomeAvailable (RecPtr=151003136, randAccess=true, fetching_ckpt=false, tliRecPtr=167757424,
    return_on_eow=true) at xlog.c:12032
#4  0x0000000000572b68 in XLogPageRead (xlogreader=0xf08ed8, targetPagePtr=150994944, reqLen=8192, targetRecPtr=167757424,
    readBuf=0xf37d50 "\002\321\005") at xlog.c:11611
#5  0x0000000000579066 in ReadPageInternal (state=0xf08ed8, pageptr=167755776, reqLen=1672) at xlogreader.c:579
#6  0x00000000005788ac in XLogReadRecord (state=0xf08ed8, RecPtr=167757424, errormsg=0x7fff1f5cdeb8) at xlogreader.c:300
#7  0x00000000005651f8 in ReadRecord (xlogreader=0xf08ed8, RecPtr=167757424, emode=22, fetching_ckpt=false) at xlog.c:4271
#8  0x000000000056b26e in StartupXLOG () at xlog.c:7373
#9  0x00000000008642a1 in StartupProcessMain () at startup.c:196
#10 0x00000000005802ad in AuxiliaryProcessMain (argc=2, argv=0x7fff1f5cea80) at bootstrap.c:451
#11 0x0000000000863175 in StartChildProcess (type=StartupProcess) at postmaster.c:5461
#12 0x000000000085e0a4 in PostmasterMain (argc=3, argv=0xf082b0) at postmaster.c:1392
#13 0x000000000076f247 in main (argc=3, argv=0xf082b0) at main.c:210
(gdb) frame 8
#8  0x000000000056b26e in StartupXLOG () at xlog.c:7373
7373            record = ReadRecord(xlogreader, LastRec, PANIC, false);
(gdb) p StandbyMode
$1 = false
(gdb) list
7368
7369            /*
7370             * Re-fetch the last valid or last applied record, so we can identify the
7371             * exact endpoint of what we consider the valid portion of WAL.
7372             */
7373            record = ReadRecord(xlogreader, LastRec, PANIC, false);
7374            EndOfLog = EndRecPtr;
7375
7376            /*
7377             * EndOfLogTLI is the TLI in the filename of the XLOG segment containing


Both issues are fixed in the new patch version.
Any review and thoughts on the matters would be much appreciated.



I think The doc needs to exiplain on the difference between default
and latest.
Sure, I will work on it.

Please find the attached, which illustrates the first two points of
the aboves.

regards.


-- 
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

recovery_target_latest.sh (1K) Download Attachment
target_latest_PoC_v2.patch (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [proposal] recovery_target "latest"

Kyotaro Horiguchi-4
Thanks for the new version.

At Sun, 8 Dec 2019 04:03:01 +0300, Grigory Smolkin <[hidden email]> wrote in

> On 11/21/19 1:46 PM, Peter Eisentraut wrote:
> > On 2019-11-08 05:00, Grigory Smolkin wrote:
> I`ve tested it and have some thoughts/concerns:
>
> 1. Recovery should report the exact reason why it has been forced to
> stop. In case of recovering to the end of WAL, standby promotion
> request received during recovery could be mistaken for reaching the
> end of WAL and reported as such. To avoid this, I think that
> reachedEndOfWal variable should be introduced.
>
> In attached patch it is added as a global variable, but maybe
> something more clever may be devised. I was not sure that
> reachedEndOfWal could be placed in XLogPageReadPrivate. Because we
> need to access it at the higher level than ReadRecord(), and I was
> under impression placing it in XLogPageReadPrivate could violate
> abstraction level of XLogPageReadPrivate.

CheckForStandbyTrigger() always returns true once the trigger is
pulled. We don't care whether end-of-WAL is reached if promote is
already triggered. Thus, we can tell the promote case by asking
CheckForStandbyTrigger() when we exited the redo main loop with
recoveryTarget = RECOVERY_TARGET_LATEST. Is this works as you expect?

> 2. During the testing, I`ve stumbled upon assertion failure in case of
> recovering in standby mode to the the end of WAL coupled with
> recovery_target_action as "promote", caused by the WAL source in state
> machine not been changed after reaching the recovery target (script to
> reproduce is attached):
...
> TRAP: FailedAssertion("StandbyMode", File: "xlog.c", Line: 12032)
...
> #2  0x0000000000a88b82 in ExceptionalCondition (conditionName=0xb24acc
> #"StandbyMode", errorType=0xb208a7 "FailedAssertion",
>     fileName=0xb208a0 "xlog.c", lineNumber=12032) at assert.c:67
> #3  0x0000000000573417 in WaitForWALToBecomeAvailable
> #(RecPtr=151003136, randAccess=true, fetching_ckpt=false,
> #tliRecPtr=167757424,
>     return_on_eow=true) at xlog.c:12032
...
> #7  0x00000000005651f8 in ReadRecord (xlogreader=0xf08ed8,
> #RecPtr=167757424, emode=22, fetching_ckpt=false) at xlog.c:4271
..

ReadRecord is called with currentSource=STREAM after StandbyMode was
turned off. I suppose the fix means the "currentSource =
XLOG_FROM_PG_WAL" line but I don't think it not the right way.

Streaming timeout means failure when return_on_eow is true. Thus the
right thing to do there is setting lastSourceFailed to true. The first
half of WaitForWALToBecomeAvailable handles failure of the current
source thus source transition happens only there. The second half just
reports failure to the first half.

> Both issues are fixed in the new patch version.
> Any review and thoughts on the matters would be much appreciated.
>
>
> >
> > I think The doc needs to exiplain on the difference between default
> > and latest.
> Sure, I will work on it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center