Hi,
The parameter replication_timeout was retired in commit 6f60fdd701 back in 2012, but some comments and error messages seem to refer to that old setting instead of wal_sender_timeout or wal_receiver_timeout. The attached patch replaces the old language with more specific references. |
On Wed, Jan 13, 2021 at 5:39 AM John Naylor
<[hidden email]> wrote: > > Hi, > > The parameter replication_timeout was retired in commit 6f60fdd701 back in 2012, but some comments and error messages seem to refer to that old setting instead of wal_sender_timeout or wal_receiver_timeout. The attached patch replaces the old language with more specific references. Thanks for the patch! I think this change makes sense. - (errmsg("terminating walsender process due to replication timeout"))); + (errmsg("terminating walsender process due to WAL sender timeout"))); Isn't it a bit strange to include different expressions "walsender" and "WAL sender" for the same thing in one message? This is a bit related, but different topic, though. If we change the above message about walsender timeout, I also want to change the message about walreceiver timeout, so as to make them more consistent. For example, - (errmsg("terminating walreceiver due to timeout"))); + (errmsg("terminating WAL receiver process due to WAL receiver timeout"))); Regards, -- Fujii Masao |
On Tue, Jan 12, 2021 at 9:37 PM Fujii Masao <[hidden email]> wrote: > > Thanks for the patch! I think this change makes sense. > > - (errmsg("terminating walsender process > due to replication timeout"))); > + (errmsg("terminating walsender process > due to WAL sender timeout"))); > > Isn't it a bit strange to include different expressions "walsender" and > "WAL sender" for the same thing in one message? It is strange, now that I think about it. My thinking was that the former wording of "replication timeout" was a less literal reference to the replication_timeout parameter, so I did the same for wal_sender_timeout. A quick look shows we are not consistent in the documentation as far as walsender vs. WAL sender. For the purpose of the patch I agree it should be consistent within a single message. Maybe the parameter should be spelled exactly as is, with underscores? I'll take a broader look and send an updated patch. -- John Naylor EDB: http://www.enterprisedb.com |
On Wed, Jan 13, 2021 at 10:51 PM John Naylor
<[hidden email]> wrote: > > > On Tue, Jan 12, 2021 at 9:37 PM Fujii Masao <[hidden email]> wrote: > > > > Thanks for the patch! I think this change makes sense. > > > > - (errmsg("terminating walsender process > > due to replication timeout"))); > > + (errmsg("terminating walsender process > > due to WAL sender timeout"))); > > > > Isn't it a bit strange to include different expressions "walsender" and > > "WAL sender" for the same thing in one message? > > It is strange, now that I think about it. My thinking was that the former wording of "replication timeout" was a less literal reference to the replication_timeout parameter, so I did the same for wal_sender_timeout. A quick look shows we are not consistent in the documentation as far as walsender vs. WAL sender. For the purpose of the patch I agree it should be consistent within a single message. Maybe the parameter should be spelled exactly as is, with underscores? I'm ok with that. But there seems no other timeout messages using the parameter name. src/backend/replication/logical/worker.c: (errmsg("terminating logical replication worker due to timeout"))); src/backend/replication/walreceiver.c: (errmsg("terminating walreceiver due to timeout"))); src/backend/replication/walsender.c: (errmsg("terminating walsender process due to replication timeout"))); src/backend/tcop/postgres.c: errmsg("canceling authentication due to timeout"))); src/backend/tcop/postgres.c: errmsg("canceling statement due to lock timeout"))); src/backend/tcop/postgres.c: errmsg("canceling statement due to statement timeout"))); src/backend/tcop/postgres.c: errmsg("terminating connection due to idle-in-transaction timeout"))); src/backend/tcop/postgres.c: errmsg("terminating connection due to idle-session timeout"))); src/backend/utils/misc/timeout.c: errmsg("cannot add more timeout reasons"))); > I'll take a broader look and send an updated patch. Thanks! Regards, -- Fujii Masao |
On Wed, Jan 13, 2021 at 11:28:55PM +0900, Fujii Masao wrote:
> On Wed, Jan 13, 2021 at 10:51 PM John Naylor <[hidden email]> wrote: >> It is strange, now that I think about it. My thinking was that the >> former wording of "replication timeout" was a less literal >> reference to the replication_timeout parameter, so I did the same >> for wal_sender_timeout. A quick look shows we are not consistent >> in the documentation as far as walsender vs. WAL sender. For the >> purpose of the patch I agree it should be consistent within a >> single message. Maybe the parameter should be spelled exactly as >> is, with underscores? > > I'm ok with that. But there seems no other timeout messages using > the parameter name. timeouts with the replication protocol? The current state of things on HEAD does not sound completely incorrect to me either. -- Michael |
On Thu, Jan 14, 2021 at 1:55 AM Michael Paquier <[hidden email]> wrote: > > On Wed, Jan 13, 2021 at 11:28:55PM +0900, Fujii Masao wrote: > > On Wed, Jan 13, 2021 at 10:51 PM John Naylor <[hidden email]> wrote: > >> It is strange, now that I think about it. My thinking was that the > >> former wording of "replication timeout" was a less literal > >> reference to the replication_timeout parameter, so I did the same > >> for wal_sender_timeout. A quick look shows we are not consistent > >> in the documentation as far as walsender vs. WAL sender. For the > >> purpose of the patch I agree it should be consistent within a > >> single message. Maybe the parameter should be spelled exactly as > >> is, with underscores? > > > > I'm ok with that. But there seems no other timeout messages using > > the parameter name. > > Could it be that nothing needs to change here because this refers to > timeouts with the replication protocol? The current state of things > on HEAD does not sound completely incorrect to me either. It was off enough to cause brief confusion during a customer emergency. To show a bit more context around one of the proposed corrections: timeout = TimestampTzPlusMilliseconds(last_reply_timestamp, wal_sender_timeout); if (wal_sender_timeout > 0 && last_processing >= timeout) { /* * Since typically expiration of replication timeout means * communication problem, we don't send the error message to the * standby. */ ereport(COMMERROR, (errmsg("terminating walsender process due to replication timeout"))); My reading is, this is a case where the message didn't keep up with the change in parameter name. -- John Naylor EDB: http://www.enterprisedb.com |
Free forum by Nabble | Edit this page |