archive_command / pg_stat_archiver & documentation

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

archive_command / pg_stat_archiver & documentation

talk to ben
Hi,

The documentation describes how a return code > 125 on the restore_command would prevent the server from starting [1] :

"
It is important that the command return nonzero exit status on failure. The command will be called requesting files that are not present in the archive; it must return nonzero when so asked. This is not an error condition. An exception is that if the command was terminated by a signal (other than SIGTERM, which is used as part of a database server shutdown) or an error by the shell (such as command not found), then recovery will abort and the server will not start up.
"

But, I dont see such a note on the archive_command side of thing. [2]

It could happend in case the archive command is not checked beforehand  or if the archive command becomes unavailable while PostgreSQL is running. rsync can also return 255 in some cases (bad ssh configuration or typos). In this case a fatal error is emitted, the archiver stops and is restarted by the postmaster.

The view pg_stat_archiver is also not updated in this case. Is it on purpose ? It could be problematic if someone uses it to check the archiver process health.

Should we document this ? (I can make a patch)

regards,
Benoit

Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

Julien Rouhaud
Hi,

On Wed, Feb 24, 2021 at 8:21 PM talk to ben <[hidden email]> wrote:

>
> The documentation describes how a return code > 125 on the restore_command would prevent the server from starting [1] :
>
> "
> It is important that the command return nonzero exit status on failure. The command will be called requesting files that are not present in the archive; it must return nonzero when so asked. This is not an error condition. An exception is that if the command was terminated by a signal (other than SIGTERM, which is used as part of a database server shutdown) or an error by the shell (such as command not found), then recovery will abort and the server will not start up.
> "
>
> But, I dont see such a note on the archive_command side of thing. [2]
>
> It could happend in case the archive command is not checked beforehand  or if the archive command becomes unavailable while PostgreSQL is running. rsync can also return 255 in some cases (bad ssh configuration or typos). In this case a fatal error is emitted, the archiver stops and is restarted by the postmaster.
>
> The view pg_stat_archiver is also not updated in this case. Is it on purpose ? It could be problematic if someone uses it to check the archiver process health.

That's on purpose, see for instance that discussion:
https://www.postgresql.org/message-id/flat/55731BB8.1050605%40dalibo.com

> Should we document this ? (I can make a patch)

I thought that this behavior was documented, especially for the lack
of update of pg_stat_archiver.  If it's not the case then we should
definitely fix that!


Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

Benoit Lobréau
Le mer. 24 févr. 2021 à 14:52, Julien Rouhaud <[hidden email]> a écrit :
Hi,

On Wed, Feb 24, 2021 at 8:21 PM talk to ben <[hidden email]> wrote:
>
> The documentation describes how a return code > 125 on the restore_command would prevent the server from starting [1] :
>
> "
> It is important that the command return nonzero exit status on failure. The command will be called requesting files that are not present in the archive; it must return nonzero when so asked. This is not an error condition. An exception is that if the command was terminated by a signal (other than SIGTERM, which is used as part of a database server shutdown) or an error by the shell (such as command not found), then recovery will abort and the server will not start up.
> "
>
> But, I dont see such a note on the archive_command side of thing. [2]
>
> It could happend in case the archive command is not checked beforehand  or if the archive command becomes unavailable while PostgreSQL is running. rsync can also return 255 in some cases (bad ssh configuration or typos). In this case a fatal error is emitted, the archiver stops and is restarted by the postmaster.
>
> The view pg_stat_archiver is also not updated in this case. Is it on purpose ? It could be problematic if someone uses it to check the archiver process health.

That's on purpose, see for instance that discussion:
https://www.postgresql.org/message-id/flat/55731BB8.1050605%40dalibo.com

Thanks for pointing that out, I should have checked.
 
> Should we document this ? (I can make a patch)

I thought that this behavior was documented, especially for the lack
of update of pg_stat_archiver.  If it's not the case then we should
definitely fix that!

I tried to do it in the attached patch.
Building the doc worked fine on my computer.

0001-Document-archive_command-failures-in-more-details.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

Julien Rouhaud
On Thu, Feb 25, 2021 at 7:25 PM Benoit Lobréau <[hidden email]> wrote:
>
> Le mer. 24 févr. 2021 à 14:52, Julien Rouhaud <[hidden email]> a écrit :
>>
>> I thought that this behavior was documented, especially for the lack
>> of update of pg_stat_archiver.  If it's not the case then we should
>> definitely fix that!
>
> I tried to do it in the attached patch.
> Building the doc worked fine on my computer.

Great, thanks!  Can you register it in the next commitfest to make
sure it won't be forgotten?


Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

Benoit Lobréau

Le jeu. 25 févr. 2021 à 15:34, Julien Rouhaud <[hidden email]> a écrit :
On Thu, Feb 25, 2021 at 7:25 PM Benoit Lobréau <[hidden email]> wrote:
>
> Le mer. 24 févr. 2021 à 14:52, Julien Rouhaud <[hidden email]> a écrit :
>>
>> I thought that this behavior was documented, especially for the lack
>> of update of pg_stat_archiver.  If it's not the case then we should
>> definitely fix that!
>
> I tried to do it in the attached patch.
> Building the doc worked fine on my computer.

Great, thanks!  Can you register it in the next commitfest to make
sure it won't be forgotten?
Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

Michael Paquier-2
On Fri, Feb 26, 2021 at 10:03:05AM +0100, Benoit Lobréau wrote:
> Done here : https://commitfest.postgresql.org/32/3012/

Documenting that properly for the archive command, as already done for
restore_command, sounds good to me.  I am not sure that there is much
point in doing a cross-reference to the archiving section for one
specific field of pg_stat_archiver.

For the second paragraph, I would recommend to move that to a
different <para> to outline this special case, leading to the
attached.

What do you think?
--
Michael

archiver-doc-v2.patch (963 bytes) Download Attachment
signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

Julien Rouhaud
On Mon, Mar 1, 2021 at 3:36 PM Michael Paquier <[hidden email]> wrote:
>
> On Fri, Feb 26, 2021 at 10:03:05AM +0100, Benoit Lobréau wrote:
> > Done here : https://commitfest.postgresql.org/32/3012/
>
> Documenting that properly for the archive command, as already done for
> restore_command, sounds good to me.  I am not sure that there is much
> point in doing a cross-reference to the archiving section for one
> specific field of pg_stat_archiver.

Agreed.

> For the second paragraph, I would recommend to move that to a
> different <para> to outline this special case, leading to the
> attached.

+1

> What do you think?

LGTM!


Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

Benoit Lobréau
In reply to this post by Michael Paquier-2

Le lun. 1 mars 2021 à 08:36, Michael Paquier <[hidden email]> a écrit :
On Fri, Feb 26, 2021 at 10:03:05AM +0100, Benoit Lobréau wrote:
> Done here : https://commitfest.postgresql.org/32/3012/

Documenting that properly for the archive command, as already done for
restore_command, sounds good to me.  I am not sure that there is much
point in doing a cross-reference to the archiving section for one
specific field of pg_stat_archiver.

I wanted to add a warning that using pg_stat_archiver to monitor the good health of the
archiver comes with a caveat in the view documentation itself. But couldn't find a concise
way to do it. So I added a link.

If you think it's unnecessary, that's ok.
 
For the second paragraph, I would recommend to move that to a
different <para> to outline this special case, leading to the
attached.

Good.
Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

Julien Rouhaud
On Mon, Mar 1, 2021 at 4:33 PM Benoit Lobréau <[hidden email]> wrote:

>
> Le lun. 1 mars 2021 à 08:36, Michael Paquier <[hidden email]> a écrit :
>>
>> On Fri, Feb 26, 2021 at 10:03:05AM +0100, Benoit Lobréau wrote:
>> > Done here : https://commitfest.postgresql.org/32/3012/
>>
>> Documenting that properly for the archive command, as already done for
>> restore_command, sounds good to me.  I am not sure that there is much
>> point in doing a cross-reference to the archiving section for one
>> specific field of pg_stat_archiver.
>
>
> I wanted to add a warning that using pg_stat_archiver to monitor the good health of the
> archiver comes with a caveat in the view documentation itself. But couldn't find a concise
> way to do it. So I added a link.
>
> If you think it's unnecessary, that's ok.

Maybe this can be better addressed than with a link in the
documentation.  The final outcome is that it can be difficult to
monitor the archiver state in such case.  That's orthogonal to this
patch but maybe we can add a new "archiver_start" timestamptz column
in pg_stat_archiver, so monitoring tools can detect a problem if it's
too far away from pg_postmaster_start_time() for instance?


Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

Benoit Lobréau
I like the idea !

If it's not too complicated, I'd like to take a stab at it.

Le lun. 1 mars 2021 à 10:16, Julien Rouhaud <[hidden email]> a écrit :
On Mon, Mar 1, 2021 at 4:33 PM Benoit Lobréau <[hidden email]> wrote:
>
> Le lun. 1 mars 2021 à 08:36, Michael Paquier <[hidden email]> a écrit :
>>
>> On Fri, Feb 26, 2021 at 10:03:05AM +0100, Benoit Lobréau wrote:
>> > Done here : https://commitfest.postgresql.org/32/3012/
>>
>> Documenting that properly for the archive command, as already done for
>> restore_command, sounds good to me.  I am not sure that there is much
>> point in doing a cross-reference to the archiving section for one
>> specific field of pg_stat_archiver.
>
>
> I wanted to add a warning that using pg_stat_archiver to monitor the good health of the
> archiver comes with a caveat in the view documentation itself. But couldn't find a concise
> way to do it. So I added a link.
>
> If you think it's unnecessary, that's ok.

Maybe this can be better addressed than with a link in the
documentation.  The final outcome is that it can be difficult to
monitor the archiver state in such case.  That's orthogonal to this
patch but maybe we can add a new "archiver_start" timestamptz column
in pg_stat_archiver, so monitoring tools can detect a problem if it's
too far away from pg_postmaster_start_time() for instance?
Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

Julien Rouhaud
On Mon, Mar 1, 2021 at 5:24 PM Benoit Lobréau <[hidden email]> wrote:
>
> I like the idea !
>
> If it's not too complicated, I'd like to take a stab at it.

Great!  And it shouldn't be too complicated.  Note that unfortunately
this will likely not be included in pg14 as the last commitfest should
begin today.


Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

Michael Paquier-2
In reply to this post by Julien Rouhaud
On Mon, Mar 01, 2021 at 05:17:06PM +0800, Julien Rouhaud wrote:
> Maybe this can be better addressed than with a link in the
> documentation.  The final outcome is that it can be difficult to
> monitor the archiver state in such case.  That's orthogonal to this
> patch but maybe we can add a new "archiver_start" timestamptz column
> in pg_stat_archiver, so monitoring tools can detect a problem if it's
> too far away from pg_postmaster_start_time() for instance?

There may be other solutions as well.  I have applied the doc patch
for now.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

Julien Rouhaud
On Tue, Mar 2, 2021 at 9:29 AM Michael Paquier <[hidden email]> wrote:

>
> On Mon, Mar 01, 2021 at 05:17:06PM +0800, Julien Rouhaud wrote:
> > Maybe this can be better addressed than with a link in the
> > documentation.  The final outcome is that it can be difficult to
> > monitor the archiver state in such case.  That's orthogonal to this
> > patch but maybe we can add a new "archiver_start" timestamptz column
> > in pg_stat_archiver, so monitoring tools can detect a problem if it's
> > too far away from pg_postmaster_start_time() for instance?
>
> There may be other solutions as well.  I have applied the doc patch
> for now.

Thanks!


Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

Benoit Lobréau
Thanks !

Le mar. 2 mars 2021 à 04:10, Julien Rouhaud <[hidden email]> a écrit :
On Tue, Mar 2, 2021 at 9:29 AM Michael Paquier <[hidden email]> wrote:
>
> On Mon, Mar 01, 2021 at 05:17:06PM +0800, Julien Rouhaud wrote:
> > Maybe this can be better addressed than with a link in the
> > documentation.  The final outcome is that it can be difficult to
> > monitor the archiver state in such case.  That's orthogonal to this
> > patch but maybe we can add a new "archiver_start" timestamptz column
> > in pg_stat_archiver, so monitoring tools can detect a problem if it's
> > too far away from pg_postmaster_start_time() for instance?
>
> There may be other solutions as well.  I have applied the doc patch
> for now.

Thanks!
Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

David Steele
In reply to this post by Michael Paquier-2
On 3/1/21 8:29 PM, Michael Paquier wrote:

> On Mon, Mar 01, 2021 at 05:17:06PM +0800, Julien Rouhaud wrote:
>> Maybe this can be better addressed than with a link in the
>> documentation.  The final outcome is that it can be difficult to
>> monitor the archiver state in such case.  That's orthogonal to this
>> patch but maybe we can add a new "archiver_start" timestamptz column
>> in pg_stat_archiver, so monitoring tools can detect a problem if it's
>> too far away from pg_postmaster_start_time() for instance?
>
> There may be other solutions as well.  I have applied the doc patch
> for now.

This was applied (except for a small part). Should we now consider this
committed?

If not, can we get a new patch for the remaining changes?

Regards,
--
-David
[hidden email]


Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

Julien Rouhaud
On Wed, Mar 03, 2021 at 07:37:02AM -0500, David Steele wrote:

> On 3/1/21 8:29 PM, Michael Paquier wrote:
> > On Mon, Mar 01, 2021 at 05:17:06PM +0800, Julien Rouhaud wrote:
> > > Maybe this can be better addressed than with a link in the
> > > documentation.  The final outcome is that it can be difficult to
> > > monitor the archiver state in such case.  That's orthogonal to this
> > > patch but maybe we can add a new "archiver_start" timestamptz column
> > > in pg_stat_archiver, so monitoring tools can detect a problem if it's
> > > too far away from pg_postmaster_start_time() for instance?
> >
> > There may be other solutions as well.  I have applied the doc patch
> > for now.
>
> This was applied (except for a small part). Should we now consider this
> committed?
>

I think that we should consider this as committed.


Reply | Threaded
Open this post in threaded view
|

Re: archive_command / pg_stat_archiver & documentation

Michael Paquier-2
On Wed, Mar 03, 2021 at 09:13:09PM +0800, Julien Rouhaud wrote:
> I think that we should consider this as committed.

It should, so done now.
--
Michael

signature.asc (849 bytes) Download Attachment