outdated references to replication timeout

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

outdated references to replication timeout

John Naylor-3
Hi,

The parameter replication_timeout was retired in commit 6f60fdd701 back in 2012, but some comments and error messages seem to refer to that old setting instead of wal_sender_timeout or wal_receiver_timeout. The attached patch replaces the old language with more specific references.

--

stray-repl-timeout.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: outdated references to replication timeout

Fujii Masao-2
On Wed, Jan 13, 2021 at 5:39 AM John Naylor
<[hidden email]> wrote:
>
> Hi,
>
> The parameter replication_timeout was retired in commit 6f60fdd701 back in 2012, but some comments and error messages seem to refer to that old setting instead of wal_sender_timeout or wal_receiver_timeout. The attached patch replaces the old language with more specific references.

Thanks for the patch! I think this change makes sense.

-                               (errmsg("terminating walsender process
due to replication timeout")));
+                               (errmsg("terminating walsender process
due to WAL sender timeout")));

Isn't it a bit strange to include different expressions "walsender" and
"WAL sender" for the same thing in one message?


This is a bit related, but different topic, though. If we change the above
message about walsender timeout, I also want to change the message about
walreceiver timeout, so as to make them more consistent. For example,

- (errmsg("terminating walreceiver due to timeout")));
+ (errmsg("terminating WAL receiver process due to WAL receiver timeout")));

Regards,

--
Fujii Masao


Reply | Threaded
Open this post in threaded view
|

Re: outdated references to replication timeout

John Naylor-3

On Tue, Jan 12, 2021 at 9:37 PM Fujii Masao <[hidden email]> wrote:

>
> Thanks for the patch! I think this change makes sense.
>
> -                               (errmsg("terminating walsender process
> due to replication timeout")));
> +                               (errmsg("terminating walsender process
> due to WAL sender timeout")));
>
> Isn't it a bit strange to include different expressions "walsender" and
> "WAL sender" for the same thing in one message?

It is strange, now that I think about it. My thinking was that the former wording of "replication timeout" was a less literal reference to the replication_timeout parameter, so I did the same for wal_sender_timeout. A quick look shows we are not consistent in the documentation as far as walsender vs. WAL sender. For the purpose of the patch I agree it should be consistent within a single message. Maybe the parameter should be spelled exactly as is, with underscores? I'll take a broader look and send an updated patch.

--
John Naylor
EDB: http://www.enterprisedb.com
Reply | Threaded
Open this post in threaded view
|

Re: outdated references to replication timeout

Fujii Masao-2
On Wed, Jan 13, 2021 at 10:51 PM John Naylor
<[hidden email]> wrote:

>
>
> On Tue, Jan 12, 2021 at 9:37 PM Fujii Masao <[hidden email]> wrote:
> >
> > Thanks for the patch! I think this change makes sense.
> >
> > -                               (errmsg("terminating walsender process
> > due to replication timeout")));
> > +                               (errmsg("terminating walsender process
> > due to WAL sender timeout")));
> >
> > Isn't it a bit strange to include different expressions "walsender" and
> > "WAL sender" for the same thing in one message?
>
> It is strange, now that I think about it. My thinking was that the former wording of "replication timeout" was a less literal reference to the replication_timeout parameter, so I did the same for wal_sender_timeout. A quick look shows we are not consistent in the documentation as far as walsender vs. WAL sender. For the purpose of the patch I agree it should be consistent within a single message. Maybe the parameter should be spelled exactly as is, with underscores?

I'm ok with that. But there seems no other timeout messages using
the parameter name.

src/backend/replication/logical/worker.c: (errmsg("terminating logical
replication worker due to timeout")));
src/backend/replication/walreceiver.c: (errmsg("terminating
walreceiver due to timeout")));
src/backend/replication/walsender.c: (errmsg("terminating walsender
process due to replication timeout")));
src/backend/tcop/postgres.c: errmsg("canceling authentication due to
timeout")));
src/backend/tcop/postgres.c: errmsg("canceling statement due to lock
timeout")));
src/backend/tcop/postgres.c: errmsg("canceling statement due to
statement timeout")));
src/backend/tcop/postgres.c: errmsg("terminating connection due to
idle-in-transaction timeout")));
src/backend/tcop/postgres.c: errmsg("terminating connection due to
idle-session timeout")));
src/backend/utils/misc/timeout.c: errmsg("cannot add more timeout reasons")));


> I'll take a broader look and send an updated patch.

Thanks!

Regards,

--
Fujii Masao


Reply | Threaded
Open this post in threaded view
|

Re: outdated references to replication timeout

Michael Paquier-2
On Wed, Jan 13, 2021 at 11:28:55PM +0900, Fujii Masao wrote:

> On Wed, Jan 13, 2021 at 10:51 PM John Naylor <[hidden email]> wrote:
>> It is strange, now that I think about it. My thinking was that the
>> former wording of "replication timeout" was a less literal
>> reference to the replication_timeout parameter, so I did the same
>> for wal_sender_timeout. A quick look shows we are not consistent
>> in the documentation as far as walsender vs. WAL sender. For the
>> purpose of the patch I agree it should be consistent within a
>> single message. Maybe the parameter should be spelled exactly as
>> is, with underscores?
>
> I'm ok with that. But there seems no other timeout messages using
> the parameter name.
Could it be that nothing needs to change here because this refers to
timeouts with the replication protocol?  The current state of things
on HEAD does not sound completely incorrect to me either.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: outdated references to replication timeout

John Naylor-3


On Thu, Jan 14, 2021 at 1:55 AM Michael Paquier <[hidden email]> wrote:

>
> On Wed, Jan 13, 2021 at 11:28:55PM +0900, Fujii Masao wrote:
> > On Wed, Jan 13, 2021 at 10:51 PM John Naylor <[hidden email]> wrote:
> >> It is strange, now that I think about it. My thinking was that the
> >> former wording of "replication timeout" was a less literal
> >> reference to the replication_timeout parameter, so I did the same
> >> for wal_sender_timeout. A quick look shows we are not consistent
> >> in the documentation as far as walsender vs. WAL sender. For the
> >> purpose of the patch I agree it should be consistent within a
> >> single message. Maybe the parameter should be spelled exactly as
> >> is, with underscores?
> >
> > I'm ok with that. But there seems no other timeout messages using
> > the parameter name.
>
> Could it be that nothing needs to change here because this refers to
> timeouts with the replication protocol?  The current state of things
> on HEAD does not sound completely incorrect to me either.

It was off enough to cause brief confusion during a customer emergency. To show a bit more context around one of the proposed corrections:

 timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
 wal_sender_timeout);

if (wal_sender_timeout > 0 && last_processing >= timeout)
{
/*
* Since typically expiration of replication timeout means
* communication problem, we don't send the error message to the
* standby.
*/
ereport(COMMERROR,
(errmsg("terminating walsender process due to replication timeout")));

My reading is, this is a case where the message didn't keep up with the change in parameter name.

--
John Naylor
EDB: http://www.enterprisedb.com