Changing the setting of wal_sender_timeout per standby

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
22 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Changing the setting of wal_sender_timeout per standby

Tsunakawa, Takayuki
Hello,

What do you think about changing wal_sender_timeout from PGC_HUP to PGC_BACKEND or PGC_USERSET?

Some customer wants to change the setting per standby, i.e., a shorter timeout for a standby in the same region to enable faster detection failure and failover, and a longer timeout for a standby in the remote region (for disaster recovery) to avoid mis-judging its health.

The current PGC_HUP allows to change the setting by editing postgresql.conf or ALTER SYSTEM and then sending SIGHUP to a specific walsender.  But that's not easy to use.  The user has to do it upon every switchover and failover.

With PGC_BACKEND, the user would be able to tune the timeout as follows:

[recovery.conf]
primary_conninfo = '... options=''-c wal_sender_timeout=60000'' ...'

With PGC_USERSET, the user would be able to use different user accounts for each standby, and tune the setting as follows:

ALTER USER repluser_remote SET wal_sender_timeout = 60000;


FYI
In Oracle Data Guard, the user configures the timeout for each standby in the primary server's configuration file like this:

LOG_ARCHIVE_DEST_1 = "SERVICE=local_conn_info SYNC NET_TIMEOUT=5"
LOG_ARCHIVE_DEST_2 = "SERVICE=remote_conn_info ASYNC NET_TIMEOUT=60"


Regards
Takayuki Tsunakawa



Reply | Threaded
Open this post in threaded view
|

Re: Changing the setting of wal_sender_timeout per standby

Michael Paquier-2
On Thu, Sep 13, 2018 at 01:14:12AM +0000, Tsunakawa, Takayuki wrote:
> Some customer wants to change the setting per standby, i.e., a shorter
> timeout for a standby in the same region to enable faster detection
> failure and failover, and a longer timeout for a standby in the remote
> region (for disaster recovery) to avoid mis-judging its health.

This argument is sensible.

> The current PGC_HUP allows to change the setting by editing
> postgresql.conf or ALTER SYSTEM and then sending SIGHUP to a specific
> walsender.  But that's not easy to use.  The user has to do it upon
> every switchover and failover.
>
> With PGC_BACKEND, the user would be able to tune the timeout as follows:
>
> [recovery.conf]
> primary_conninfo = '... options=''-c wal_sender_timeout=60000'' ...'
>
> With PGC_USERSET, the user would be able to use different user
> accounts for each standby, and tune the setting as follows:
>
> ALTER USER repluser_remote SET wal_sender_timeout = 60000;
It seems to me that switching to PGC_BACKENDwould cover already all the
use-cases you are mentioning, as at the end one would just want to
adjust the WAL sender timeout on a connection basis depending on the
geographical location of the receiver and the latency between primary
and standby.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Changing the setting of wal_sender_timeout per standby

Masahiko Sawada
On Thu, Sep 13, 2018 at 12:32 PM, Michael Paquier <[hidden email]> wrote:

> On Thu, Sep 13, 2018 at 01:14:12AM +0000, Tsunakawa, Takayuki wrote:
>> Some customer wants to change the setting per standby, i.e., a shorter
>> timeout for a standby in the same region to enable faster detection
>> failure and failover, and a longer timeout for a standby in the remote
>> region (for disaster recovery) to avoid mis-judging its health.
>
> This argument is sensible.
>
>> The current PGC_HUP allows to change the setting by editing
>> postgresql.conf or ALTER SYSTEM and then sending SIGHUP to a specific
>> walsender.  But that's not easy to use.  The user has to do it upon
>> every switchover and failover.
>>
>> With PGC_BACKEND, the user would be able to tune the timeout as follows:
>>
>> [recovery.conf]
>> primary_conninfo = '... options=''-c wal_sender_timeout=60000'' ...'
>>
>> With PGC_USERSET, the user would be able to use different user
>> accounts for each standby, and tune the setting as follows:
>>
>> ALTER USER repluser_remote SET wal_sender_timeout = 60000;
>
> It seems to me that switching to PGC_BACKENDwould cover already all the
> use-cases you are mentioning, as at the end one would just want to
> adjust the WAL sender timeout on a connection basis depending on the
> geographical location of the receiver and the latency between primary
> and standby.

+1 for PGC_BACKEND. It looks enough for most use cases.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Reply | Threaded
Open this post in threaded view
|

Re: Changing the setting of wal_sender_timeout per standby

Kyotaro HORIGUCHI-2
At Fri, 14 Sep 2018 18:22:33 +0900, Masahiko Sawada <[hidden email]> wrote in <CAD21AoBr2Y=[hidden email]>

> On Thu, Sep 13, 2018 at 12:32 PM, Michael Paquier <[hidden email]> wrote:
> > On Thu, Sep 13, 2018 at 01:14:12AM +0000, Tsunakawa, Takayuki wrote:
> >> Some customer wants to change the setting per standby, i.e., a shorter
> >> timeout for a standby in the same region to enable faster detection
> >> failure and failover, and a longer timeout for a standby in the remote
> >> region (for disaster recovery) to avoid mis-judging its health.
> >
> > This argument is sensible.
> >
> >> The current PGC_HUP allows to change the setting by editing
> >> postgresql.conf or ALTER SYSTEM and then sending SIGHUP to a specific
> >> walsender.  But that's not easy to use.  The user has to do it upon
> >> every switchover and failover.
> >>
> >> With PGC_BACKEND, the user would be able to tune the timeout as follows:
> >>
> >> [recovery.conf]
> >> primary_conninfo = '... options=''-c wal_sender_timeout=60000'' ...'
> >>
> >> With PGC_USERSET, the user would be able to use different user
> >> accounts for each standby, and tune the setting as follows:
> >>
> >> ALTER USER repluser_remote SET wal_sender_timeout = 60000;
> >
> > It seems to me that switching to PGC_BACKENDwould cover already all the
> > use-cases you are mentioning, as at the end one would just want to
> > adjust the WAL sender timeout on a connection basis depending on the
> > geographical location of the receiver and the latency between primary
> > and standby.
>
> +1 for PGC_BACKEND. It looks enough for most use cases.

+1, and we need a means to see the actual value, in
pg_stat_replication?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: Changing the setting of wal_sender_timeout per standby

Michael Paquier-2
On Tue, Sep 18, 2018 at 11:20:03AM +0900, Kyotaro HORIGUCHI wrote:
> +1, and we need a means to see the actual value, in
> pg_stat_replication?

Well, being able to see what another session is using as settings is
not a trivial problem, perhaps not worth solving, and orthogonal to
what's discussed here...
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: Changing the setting of wal_sender_timeout per standby

Tsunakawa, Takayuki
In reply to this post by Kyotaro HORIGUCHI-2
From: Kyotaro HORIGUCHI [mailto:[hidden email]]

> At Fri, 14 Sep 2018 18:22:33 +0900, Masahiko Sawada <[hidden email]>
> wrote in
> <CAD21AoBr2Y=[hidden email]>
> > On Thu, Sep 13, 2018 at 12:32 PM, Michael Paquier <[hidden email]>
> wrote:
> > > On Thu, Sep 13, 2018 at 01:14:12AM +0000, Tsunakawa, Takayuki wrote:
> > >> Some customer wants to change the setting per standby, i.e., a shorter
> > >> timeout for a standby in the same region to enable faster detection
> > >> failure and failover, and a longer timeout for a standby in the remote
> > >> region (for disaster recovery) to avoid mis-judging its health.
> > >
> > > This argument is sensible.
> > >
> > >> The current PGC_HUP allows to change the setting by editing
> > >> postgresql.conf or ALTER SYSTEM and then sending SIGHUP to a specific
> > >> walsender.  But that's not easy to use.  The user has to do it upon
> > >> every switchover and failover.
> > >>
> > >> With PGC_BACKEND, the user would be able to tune the timeout as follows:
> > >>
> > >> [recovery.conf]
> > >> primary_conninfo = '... options=''-c wal_sender_timeout=60000'' ...'
> > >>
> > >> With PGC_USERSET, the user would be able to use different user
> > >> accounts for each standby, and tune the setting as follows:
> > >>
> > >> ALTER USER repluser_remote SET wal_sender_timeout = 60000;
> > >
> > > It seems to me that switching to PGC_BACKENDwould cover already all
> the
> > > use-cases you are mentioning, as at the end one would just want to
> > > adjust the WAL sender timeout on a connection basis depending on the
> > > geographical location of the receiver and the latency between primary
> > > and standby.
> >
> > +1 for PGC_BACKEND. It looks enough for most use cases.
>
> +1, and we need a means to see the actual value, in
> pg_stat_replication?
Thanks, the patch attached.  I'll add this to the next CF shortly.  As Michael said, I think viewing the configured value would be a separate feature.


Regards
Takayuki Tsunakawa



walsender_timeout_PGC_BACKEND.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Changing the setting of wal_sender_timeout per standby

Masahiko Sawada
On Tue, Sep 18, 2018 at 5:27 PM, Tsunakawa, Takayuki
<[hidden email]> wrote:

> From: Kyotaro HORIGUCHI [mailto:[hidden email]]
>> At Fri, 14 Sep 2018 18:22:33 +0900, Masahiko Sawada <[hidden email]>
>> wrote in
>> <CAD21AoBr2Y=[hidden email]>
>> > On Thu, Sep 13, 2018 at 12:32 PM, Michael Paquier <[hidden email]>
>> wrote:
>> > > On Thu, Sep 13, 2018 at 01:14:12AM +0000, Tsunakawa, Takayuki wrote:
>> > >> Some customer wants to change the setting per standby, i.e., a shorter
>> > >> timeout for a standby in the same region to enable faster detection
>> > >> failure and failover, and a longer timeout for a standby in the remote
>> > >> region (for disaster recovery) to avoid mis-judging its health.
>> > >
>> > > This argument is sensible.
>> > >
>> > >> The current PGC_HUP allows to change the setting by editing
>> > >> postgresql.conf or ALTER SYSTEM and then sending SIGHUP to a specific
>> > >> walsender.  But that's not easy to use.  The user has to do it upon
>> > >> every switchover and failover.
>> > >>
>> > >> With PGC_BACKEND, the user would be able to tune the timeout as follows:
>> > >>
>> > >> [recovery.conf]
>> > >> primary_conninfo = '... options=''-c wal_sender_timeout=60000'' ...'
>> > >>
>> > >> With PGC_USERSET, the user would be able to use different user
>> > >> accounts for each standby, and tune the setting as follows:
>> > >>
>> > >> ALTER USER repluser_remote SET wal_sender_timeout = 60000;
>> > >
>> > > It seems to me that switching to PGC_BACKENDwould cover already all
>> the
>> > > use-cases you are mentioning, as at the end one would just want to
>> > > adjust the WAL sender timeout on a connection basis depending on the
>> > > geographical location of the receiver and the latency between primary
>> > > and standby.
>> >
>> > +1 for PGC_BACKEND. It looks enough for most use cases.
>>
>> +1, and we need a means to see the actual value, in
>> pg_stat_replication?
>
> Thanks, the patch attached.  I'll add this to the next CF shortly.  As Michael said, I think viewing the configured value would be a separate feature.
>

Thank you for the patch.

+       <tip>
+        <para>
+         The <literal>%q</literal> escape is useful when including
information that is
+         You can also set this in recovery.conf as follows.  This
allows you to set a
+         longer timeout for a standby in the remote data center
across the slow WAN.
+<programlisting>
+primary_conninfo = 'host=192.168.1.50 port=5432 user=foo
password=foopass options=''-c wal_sender_timeout=5000'''
+</programlisting>
+        </para>
+       </tip>

I didn't follow the first sentence of the above hunk. Is the
wal_sender_timeout relevant with %q?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Reply | Threaded
Open this post in threaded view
|

RE: Changing the setting of wal_sender_timeout per standby

Tsunakawa, Takayuki
From: Masahiko Sawada [mailto:[hidden email]]
> I didn't follow the first sentence of the above hunk. Is the
> wal_sender_timeout relevant with %q?

Ouch, that's a careless mistake.  I copied the paragraph from another parameter and failed to remove some sentence.  Patch revised.

Regards
Takayuki Tsunakawa



walsender_timeout_PGC_BACKEND_v2.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Changing the setting of wal_sender_timeout per standby

Michael Paquier-2
On Wed, Sep 19, 2018 at 12:14:57AM +0000, Tsunakawa, Takayuki wrote:
> From: Masahiko Sawada [mailto:[hidden email]]
> > I didn't follow the first sentence of the above hunk. Is the
> > wal_sender_timeout relevant with %q?
>
> Ouch, that's a careless mistake.  I copied the paragraph from another
> parameter and failed to remove some sentence.  Patch revised.

-        A value of zero disables the timeout mechanism.  This parameter
-        can only be set in
-        the <filename>postgresql.conf</filename> file or on the server
-        command line.
+        A value of zero disables the timeout mechanism.
+        Only superusers can change this parameter at session start,
+        and it cannot be changed at all within a session.
         The default value is 60 seconds.

Parameters classified as PGC_BACKEND can be updated by any users, and
those marked as PGC_SU_BACKEND can only be updated by superusers.
Replication users are not superusers, which is why PGC_BACKEND is most
adapted.  Your description should just say "This parameter cannot be
changed after session start.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Changing the setting of wal_sender_timeout per standby

Michael Paquier-2
On Wed, Sep 19, 2018 at 02:40:31PM +0900, Michael Paquier wrote:
> Parameters classified as PGC_BACKEND can be updated by any users, and
> those marked as PGC_SU_BACKEND can only be updated by superusers.
> Replication users are not superusers, which is why PGC_BACKEND is most
> adapted.  Your description should just say "This parameter cannot be
> changed after session start.

Actually, now that I look at guc.c, using PGC_BACKEND breaks one case:
such parameter types cannot be changed even with SIGHUP, and this even
if the session relies on the default value.  So you could break
applications relying on reloads.  PGC_USERSET would actually do the work
as if the session uses a non-default value specified by SET or within
the connection string, then SIGHUP updates have no effect.  On the
contrary, if the client relies on the default value, then SIGHUP updates
take effect.  Sorry for the confusion, I should have looked at guc.c
more carefully.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: Changing the setting of wal_sender_timeout per standby

Tsunakawa, Takayuki
In reply to this post by Michael Paquier-2
From: Michael Paquier [mailto:[hidden email]]
> Parameters classified as PGC_BACKEND can be updated by any users, and those
> marked as PGC_SU_BACKEND can only be updated by superusers.
> Replication users are not superusers, which is why PGC_BACKEND is most
> adapted.  Your description should just say "This parameter cannot be
> changed after session start.

How embarrassing...  I'm sorry to cause you trouble to point out a silly mistake like this (I thought I would write as you suggested, but it seems that I was not who I usually am.)  The revised patch attached.


Regards
Takayuki Tsunakawa




walsender_timeout_PGC_BACKEND_v3.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Changing the setting of wal_sender_timeout per standby

Michael Paquier-2
On Wed, Sep 19, 2018 at 06:21:52AM +0000, Tsunakawa, Takayuki wrote:
> How embarrassing...  I'm sorry to cause you trouble to point out a
> silly mistake like this (I thought I would write as you suggested, but
> it seems that I was not who I usually am.)  The revised patch
> attached.

Thanks for the new version.  Per my comments up-thread here, you cannot
actually use PGC_BACKEND:
https://www.postgresql.org/message-id/20180919061303.GB19808@...

This would break the case where this parameter is reloaded when a
session does not use a custom value for wal_sender_timeout.  I have also
looked at all the code paths using wal_sender_timeout, and the change
looks safe to me.  Please find attached an update, simplified, version.
Does that look fine to you?
--
Michael

wal-timeout-userset-michael.patch (1K) Download Attachment
signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: Changing the setting of wal_sender_timeout per standby

Tsunakawa, Takayuki
From: Michael Paquier [mailto:[hidden email]]

> Thanks for the new version.  Per my comments up-thread here, you cannot
> actually use PGC_BACKEND:
> https://www.postgresql.org/message-id/20180919061303.GB19808@...
> yz
>
> This would break the case where this parameter is reloaded when a session
> does not use a custom value for wal_sender_timeout.  I have also looked
> at all the code paths using wal_sender_timeout, and the change looks safe
> to me.  Please find attached an update, simplified, version.
> Does that look fine to you?

Thank you, PGC_USERSET is off course fine to me.  I thought PGC_BACKEND includes the capability of PGC_SIGHUP, because PGC_BACKEND is placed after PGC_SIGHUP in the definition of enum GucContext.  That was a pitfall.  I should have paid more attention.

I find it more user friendly to include a description somewhere that the user can tune the timeout per standby, like I added a tip in the description of wal_sender_timeout.  I'm afraid users won't know whether and how to tune the setting per standby, as libpq's options parameter doesn't seem well-known in my experience.


Regards
Takayuki Tsunakawa




Reply | Threaded
Open this post in threaded view
|

Re: Changing the setting of wal_sender_timeout per standby

Michael Paquier-2
On Fri, Sep 21, 2018 at 02:28:07AM +0000, Tsunakawa, Takayuki wrote:
> I find it more user friendly to include a description somewhere that
> the user can tune the timeout per standby, like I added a tip in the
> description of wal_sender_timeout.  I'm afraid users won't know
> whether and how to tune the setting per standby, as libpq's options
> parameter doesn't seem well-known in my experience.

But that does not apply to this single parameter, no?  I would think
that a section in recovery.conf is more adapted.  I can see that the
patch I proposed up-thread could be more precise though, so why not
adding at an extra paragraph?  Here is an idea:
"For a cluster distributed across multiple geographic locations, using
a different value per location brings more flexibility in the cluster
management.  A smaller value is useful for faster failure detection with
a standby having a low connection latency, and a larger value helps in
judging better the health of a standby if located in a remote location,
with a longer connection latency."
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: Changing the setting of wal_sender_timeout per standby

Tsunakawa, Takayuki
From: Michael Paquier [mailto:[hidden email]]

> But that does not apply to this single parameter, no?  I would think that
> a section in recovery.conf is more adapted.  I can see that the patch I
> proposed up-thread could be more precise though, so why not adding at an
> extra paragraph?  Here is an idea:
> "For a cluster distributed across multiple geographic locations, using a
> different value per location brings more flexibility in the cluster
> management.  A smaller value is useful for faster failure detection with
> a standby having a low connection latency, and a larger value helps
> judging better the health of a standby if located in a remote location,
> with a longer connection latency."

That paragraph looks cool, thanks.  Regarding where the paragraph should be, there are three candidate locations:

(1) 19.6.1. Sending Servers, wal_sender_timeout description
https://www.postgresql.org/docs/devel/static/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-SENDER

(2) 26.2.4. Setting Up a Standby Server
https://www.postgresql.org/docs/devel/static/warm-standby.html#STANDBY-SERVER-SETUP

(3) 27.3. Standby Server Settings, primary_conninfo description
https://www.postgresql.org/docs/devel/static/standby-settings.html

I think all of these are almost equally good.  I chose (1) at first, and you chose (3).  But (2) may be the best, because it's the natural place the user will see when configuring the standby, and it already contains an example of recovery.conf.  We can add "options=''-c wal_sender_timeout=5000''" or something in that example.  I'm OK with anyplace, but I recommend adding how to specify wal_sender_timeout in primary_conninfo, because libpq's options parameter may not be well-konown, and it's a bit difficult to figure out the need to enclose the value with double single-quotes.


Regards
Takayuki Tsunakawa







Reply | Threaded
Open this post in threaded view
|

Re: Changing the setting of wal_sender_timeout per standby

Michael Paquier-2
On Fri, Sep 21, 2018 at 05:37:42AM +0000, Tsunakawa, Takayuki wrote:
> I think all of these are almost equally good.  I chose (1) at first,
> and you chose (3).  But (2) may be the best, because it's the natural
> place the user will see when configuring the standby, and it already
> contains an example of recovery.conf.  We can add "options=''-c
> wal_sender_timeout=5000''" or something in that example.  I'm OK with
> anyplace, but I recommend adding how to specify wal_sender_timeout in
> primary_conninfo, because libpq's options parameter may not be
> well-konown, and it's a bit difficult to figure out the need to
> enclose the value with double single-quotes.

I think that the description of wal_sender_timeout and its properties
should remain where the parameter is defined, so (3) is not a good
option in my opinion.  (2) has a point with the use of quotes actually,
so why not just mention options=''-c wal_sender_timeout=5000'' in the
example of recovery.conf as you suggest and call it a day, but keep the
paragraph I suggested in (1)?
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

RE: Changing the setting of wal_sender_timeout per standby

Tsunakawa, Takayuki
From: Michael Paquier [mailto:[hidden email]]
> I think that the description of wal_sender_timeout and its properties should
> remain where the parameter is defined, so (3) is not a good option in my
> opinion.  (2) has a point with the use of quotes actually, so why not just
> mention options=''-c wal_sender_timeout=5000'' in the example of
> recovery.conf as you suggest and call it a day, but keep the paragraph I
> suggested in (1)?

Agreed.  Sorry to cause you to take this long time for such a tiny patch...


Regards
Takayuki Tsunakawa



Reply | Threaded
Open this post in threaded view
|

Re: Changing the setting of wal_sender_timeout per standby

Michael Paquier-2
On Fri, Sep 21, 2018 at 06:26:19AM +0000, Tsunakawa, Takayuki wrote:
> Agreed.  Sorry to cause you to take this long time for such a tiny
> patch...

Well, that is arguing about how to shape things and agree on those,
which is not wasted time, far from that.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Changing the setting of wal_sender_timeout per standby

Michael Paquier-2
In reply to this post by Tsunakawa, Takayuki
On Fri, Sep 21, 2018 at 06:26:19AM +0000, Tsunakawa, Takayuki wrote:
> Agreed.

Okay, I have pushed the patch with all your suggestions included.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Changing the setting of wal_sender_timeout per standby

Andres Freund
Hi,

On 2018-09-22 15:27:24 +0900, Michael Paquier wrote:
> On Fri, Sep 21, 2018 at 06:26:19AM +0000, Tsunakawa, Takayuki wrote:
> > Agreed.
>
> Okay, I have pushed the patch with all your suggestions included.

Have there been discussions about the security effects of this change?
Previously the server admin could control the timeout, which could
affect things like syncrep, after this it's not possible anymore.  I
*think* that's ok, but it should be discussed.

Greetings,

Andres Freund

12