allow online change primary_conninfo

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
75 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

allow online change primary_conninfo

Sergei Kornilov
Hello all
Now we integrate recovery.conf into GUC system. So i would like to start discussion to make primary_conninfo and primary_slot_name PGC_SIGHUP.
My work-in-progress patch attached.

I think we have no futher reason to have a copy of conninfo and slot name in WalRcvData, so i propose remove these fields from pg_stat_get_wal_receiver() (and pg_stat_wal_receiver view). This data can be accesible now via regular GUC commands.

Thank you for advance!

regards, Sergei

reload_walreceiver_conninfo_v001.patch (23K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Andres Freund
Hi,

On 2018-11-26 00:25:43 +0300, Sergei Kornilov wrote:
> Now we integrate recovery.conf into GUC system. So i would like to start discussion to make primary_conninfo and primary_slot_name PGC_SIGHUP.
> My work-in-progress patch attached.

Cool!


> I think we have no futher reason to have a copy of conninfo and slot name in WalRcvData, so i propose remove these fields from pg_stat_get_wal_receiver() (and pg_stat_wal_receiver view). This data can be accesible now via regular GUC commands.

Is that wise? It seems possible that wal receivers run for a while with
the old connection information, and without this information that seems
hard to debug.


> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index db1a2d4..faa8e17 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -3797,7 +3797,6 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
>            <varname>primary_conninfo</varname> string.
>           </para>
>           <para>
> -          This parameter can only be set at server start.
>            This setting has no effect if the server is not in standby mode.
>           </para>
>          </listitem>

Should probably note something like

"This parameter can only be set in the <filename>postgresql.conf</filename>
file or on the server command line."


> @@ -435,9 +420,33 @@ WalReceiverMain(void)
>  
>   if (got_SIGHUP)
>   {
> + char *conninfo = pstrdup(PrimaryConnInfo);
> + char *slotname = pstrdup(PrimarySlotName);
> +
>   got_SIGHUP = false;
>   ProcessConfigFile(PGC_SIGHUP);
>   XLogWalRcvSendHSFeedback(true);
> +
> + /*
> + * If primary_conninfo has been changed while walreceiver is running,
> + * shut down walreceiver so that a new walreceiver is started and
> + * initiates replication with the new connection information.
> + */
> + if (strcmp(conninfo, PrimaryConnInfo) != 0)
> + ereport(FATAL,
> + (errcode(ERRCODE_ADMIN_SHUTDOWN),
> + errmsg("closing replication connection because primary_conninfo was changed")));
> +
> + /*
> + * And the same for primary_slot_name.
> + */
> + if (strcmp(slotname, PrimarySlotName) != 0)
> + ereport(FATAL,
> + (errcode(ERRCODE_ADMIN_SHUTDOWN),
> + errmsg("closing replication connection because primary_slot_name was changed")));
> +
> + pfree(conninfo);
> + pfree(slotname);

I'd strongly advocate moving this to a separate function.


Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Michael Paquier-2
On Sun, Nov 25, 2018 at 01:43:13PM -0800, Andres Freund wrote:
> On 2018-11-26 00:25:43 +0300, Sergei Kornilov wrote:
>> I think we have no futher reason to have a copy of conninfo and slot
>> name in WalRcvData, so i propose remove these fields from
>> pg_stat_get_wal_receiver() (and pg_stat_wal_receiver view). This data
>> can be accesible now via regular GUC commands.
>
> Is that wise? It seems possible that wal receivers run for a while with
> the old connection information, and without this information that seems
> hard to debug.

No, that's unwise.  One of the reasons why conninfo is around is that we
know to which server a receiver is connected when specifying multiple
host and/or port targets in the configuration.  Please see 9a895462 for
the details.

Removing the slot does not look like an improvement to me either,
following Andres' argument.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
In reply to this post by Andres Freund
Hi

>>  I think we have no futher reason to have a copy of conninfo and slot name in WalRcvData, so i propose remove these fields from pg_stat_get_wal_receiver() (and pg_stat_wal_receiver view). This data can be accesible now via regular GUC commands.
>
> Is that wise? It seems possible that wal receivers run for a while with
> the old connection information, and without this information that seems
> hard to debug.
Hmm... I considered SIGHUP processing was in fast loop and therefore shutdown should be fast. But i recheck code and found a possible long loop without processing SIGHUP (in case we receive new data faster than writes to disk). Ok, i will revert back.
How about write to WalRcvData only clobbered conninfo?

> Should probably note something like
>
> "This parameter can only be set in the <filename>postgresql.conf</filename>
> file or on the server command line."
Seems other PGC_SIGHUP settings have such note, fixed, thank you.

> I'd strongly advocate moving this to a separate function.
Done

regards, Sergei

reload_walreceiver_conninfo_v002.patch (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Andres Freund
Hi,

On 2018-11-26 12:30:06 +0300, Sergei Kornilov wrote:
> Hmm... I considered SIGHUP processing was in fast loop and therefore shutdown should be fast. But i recheck code and found a possible long loop without processing SIGHUP (in case we receive new data faster than writes to disk). Ok, i will revert back.
> How about write to WalRcvData only clobbered conninfo?

I'm not sure I understand what you mean?  The way I'd solve this is that
that only walreceiver, at startup, writes out its conninfo/slot_name,
sourcing the values from the GUCs. That way there's no issue with values
being overwritten early.

- Andres

Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
Hi

>>  Hmm... I considered SIGHUP processing was in fast loop and therefore shutdown should be fast. But i recheck code and found a possible long loop without processing SIGHUP (in case we receive new data faster than writes to disk). Ok, i will revert back.
>>  How about write to WalRcvData only clobbered conninfo?
>
> I'm not sure I understand what you mean?
I am about my initial proposal with remove conninfo wrom WalRcvData - walreceiver may run some time with old conninfo and
> without this information that seems hard to debug.
Earlier i thought walreceiver will shutdown fast on SIGHUP.

> The way I'd solve this is that
> that only walreceiver, at startup, writes out its conninfo/slot_name,
> sourcing the values from the GUCs. That way there's no issue with values
> being overwritten early.
In second patch i follow exactly this logic.

regards, Sergei

Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Michael Paquier-2
In reply to this post by Andres Freund
On Mon, Nov 26, 2018 at 09:46:36AM -0800, Andres Freund wrote:
> I'm not sure I understand what you mean?  The way I'd solve this is that
> that only walreceiver, at startup, writes out its conninfo/slot_name,
> sourcing the values from the GUCs. That way there's no issue with values
> being overwritten early.

WalRcv->conninfo, as stored in the WAL receiver, clobbers some of its
fields and includes a set of default parameters with its values after
establishing the connection so as no sensible data shows up in
pg_stat_wal_receiver so you cannot simply compare what is stored in the
WAL receiver with the GUCs to do the decision-making.  For the password,
you'd want to do again a connection anyway even if the cloberred string
is the same.

What's proposed in the patch to issue an ERROR if the parameter has
changed does not look that bad to me as it depends on the process
context, but I think that this patch should document clearly that if
primary_conninfo or primary_slot_name are changed then the WAL receiver
is stopped immediately.

    /*
-    * replication slot name; is also used for walreceiver to connect with the
-    * primary
+    * replication slot name used by runned walreceiver
     */
Not sure that there is much point in updating those comments, because
it still has the same meaning in the new context.

-RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
-                    const char *slotname)
+RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr)
That's perhaps a matter of taste, but keeping the current API as-is
looks cleaner to me, particularly because those fields get copied into
WalRcv, so I'd rather not change what is done in
WaitForWALToBecomeAvailable and keep the patch at its simplest shape.

+$node_standby_2->reload; # should have effect without restart
This does not wait for the change to be effective, so I think that you
introduce a race condition for slow machines with this test, making it
unstable.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
Hello

thank you for review!

> but I think that this patch should document clearly that if
> primary_conninfo or primary_slot_name are changed then the WAL receiver
> is stopped immediately.
Good idea, will change.

>     /*
> - * replication slot name; is also used for walreceiver to connect with the
> - * primary
> + * replication slot name used by runned walreceiver
>      */
> Not sure that there is much point in updating those comments, because
> it still has the same meaning in the new context.
"is also used" seems outdated for me. With proposed patch this field means only currently active slot, not "also".
Well, depends on RequestXLogStreaming discussion

> -RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
> - const char *slotname)
> +RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr)
> That's perhaps a matter of taste, but keeping the current API as-is
> looks cleaner to me, particularly because those fields get copied into
> WalRcv, so I'd rather not change what is done in
> WaitForWALToBecomeAvailable and keep the patch at its simplest shape.
Unfortunally here is race condition if i leave RequestXLogStreaming infrastructure as-is and walreceiver will use this struct for startup.
This case:
- startup process calls RequestXLogStreaming and fill WalRcv conninfo/slotname
- reload primary_conninnfo by SIGHUP
- walreceiver start with new GUC but trying conninfo/slotname from WalRcv struct. If walreceiver is able to connect - it will not restart till another error or primary_conninfo/slotname will be changed again (SIGHUP without change is not enough).

I already had such failures while testing this patch.

If walreceiver will use primary_conninfo GUC at startup - i see no reason to leave *conninfo in RequestXLogStreaming. These parameters will be misleading, we request them, but not using for anything.

> +$node_standby_2->reload; # should have effect without restart
> This does not wait for the change to be effective, so I think that you
> introduce a race condition for slow machines with this test, making it
> unstable.
No, here is no race condition because just after this line we wait this replication slot in upstream pg_catalog.pg_replication_slots (get_slot_xmins routine).
Before reload we have no replication slot and should reconnect here with replication slot. I can add some comment here.

regards, Sergei

Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
In reply to this post by Michael Paquier-2
Hello

After some thinking i can rewrite this patch in another way.

This is better or worse?

regards, Sergei

Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
oops, forgot attach patch

11.12.2018, 13:14, "Sergei Kornilov" <[hidden email]>:
> Hello
>
> After some thinking i can rewrite this patch in another way.
>
> This is better or worse?
>
> regards, Sergei

reload_walreceiver_conninfo_v003.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Michael Paquier-2
In reply to this post by Sergei Kornilov
On Tue, Dec 11, 2018 at 11:59:08AM +0300, Sergei Kornilov wrote:
>> but I think that this patch should document clearly that if
>> primary_conninfo or primary_slot_name are changed then the WAL receiver
>> is stopped immediately.
>
> Good idea, will change.

+         <para>
+          <application>walreceiver</application> will be restarted after
+          <varname>primary_conninfo</varname> was changed.
+         </para>
Hm.  It may be cleaner to use "WAL receiver" here for clarity.  Perhaps
that's a matter of state but that seems cleaner when referring to the
actual process.  The docs share both grammar, depending on the context.

> If walreceiver will use primary_conninfo GUC at startup - i see no
> reason to leave *conninfo in RequestXLogStreaming. These parameters
> will be misleading, we request them, but not using for anything.

Oh, indeed.  My apologies for being confused here, I can see now your
point.  It seems to me that this is an extra cleanup caused by the
recovery parameters switched to be GUCs, and not something that we
should be changed as an effect to SIGHUP for primary_conninfo and
primary_slot_name.  So let's do this cleanup first.  For this purpose, I
am going to post a new thread with a proper patch, with in CC the folks
who moved the recovery parameters to be GUCs.

>> +$node_standby_2->reload; # should have effect without restart
>> This does not wait for the change to be effective, so I think that you
>> introduce a race condition for slow machines with this test, making it
>> unstable.
>
> No, here is no race condition because just after this line we wait
> this replication slot in upstream pg_catalog.pg_replication_slots
> (get_slot_xmins routine).
> Before reload we have no replication slot and should reconnect here
> with replication slot. I can add some comment here.
Good point, I have forgotten the call to get_slot_xmins() below.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Kyotaro HORIGUCHI-2
In reply to this post by Sergei Kornilov
At Tue, 11 Dec 2018 13:16:23 +0300, Sergei Kornilov <[hidden email]> wrote in <[hidden email]>
sk> oops, forgot attach patch
sk>
sk> 11.12.2018, 13:14, "Sergei Kornilov" <[hidden email]>:
sk> > Hello
sk> >
sk> > After some thinking i can rewrite this patch in another way.
sk> >
sk> > This is better or worse?

As the whole the new version looks better for me.

===
Do we no longer need static version of conninfo/slotname in
walreceiver.c? We can detect a change of the variables without
them (as you did in the previous version.).


===
I don't think it is acceptable that raw conninfo is exposed into
log file.

>  LOG:  parameter "primary_conninfo" changed to "host=/tmp port=5432 password=hoge"


===
> errmsg("closing replication connection because primary_conninfo was changed")));

Maybe it is better being as follows according to similar messages:

> errmsg("terminating walreceiver process due to change of primary_conninfo")));

And it *might* be good to have detail.

> errdetail("In a moment starts streaming WAL with new configuration.");


regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
Hello

> Do we no longer need static version of conninfo/slotname in
> walreceiver.c? We can detect a change of the variables without
> them (as you did in the previous version.).

Depends on this discussion: https://www.postgresql.org/message-id/20181212053042.GK17695@...
If walreceiver will use GUC conninfo for startup - then yes, we can remove such static variables.
If walreceiver will still use conninfo from WalRcv - we have race condition between RequestXLogStreaming from startup, config reload, and walreceiver start. In this case i need save conninfo from WalRcv and compare new GUC value with them.

> I don't think it is acceptable that raw conninfo is exposed into
> log file.
>
>>   LOG: parameter "primary_conninfo" changed to "host=/tmp port=5432 password=hoge"

Hmm. We have infrastructure to hide such values?
I need implement something like flag GUC_HIDE_VALUE_FROM_LOG near GUC_SUPERUSER_ONLY in separate thread? Log message will be changed to "LOG: parameter "primary_conninfo" changed" with this flag.

I also think that this is not a big problem. We may already have secret data in the logs. For example, in query text from application.

>>  errmsg("closing replication connection because primary_conninfo was changed")));
>
> Maybe it is better being as follows according to similar messages:
>
>>  errmsg("terminating walreceiver process due to change of primary_conninfo")));
>
> And it *might* be good to have detail.
>
>>  errdetail("In a moment starts streaming WAL with new configuration.");

Agreed, will change.

regards, Sergei

Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Michael Paquier-2
On Thu, Dec 13, 2018 at 01:51:48PM +0300, Sergei Kornilov wrote:
> Depends on this discussion: https://www.postgresql.org/message-id/20181212053042.GK17695@...
> If walreceiver will use GUC conninfo for startup - then yes, we can
> remove such static variables.  If walreceiver will still use conninfo
> from WalRcv - we have race condition between RequestXLogStreaming from
> startup, config reload, and walreceiver start. In this case I need
> save conninfo from WalRcv and compare new GUC value with them.

I would recommend waiting for the conclusion of other thread before
moving on with this one.  There are arguments for letting the startup
process deciding which connection string the WAL receiver should use as
well as letting the WAL receiver use directly the connection string from
the GUC.

> Hmm. We have infrastructure to hide such values?
> I need implement something like flag GUC_HIDE_VALUE_FROM_LOG near
> GUC_SUPERUSER_ONLY in separate thread? Log message will be changed to
> "LOG: parameter "primary_conninfo" changed" with this flag.

Good point.  Passwords in logs can be considered as a security issue.
Having such an option may be interesting for other things, though I am
not sure if just having an option to hide logs related to a given
parameter are the correct way to shape it.  We could also consider using
the show hook function of a parameter to print its value in such logs.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
Hi

> I would recommend waiting for the conclusion of other thread before
> moving on with this one.
Agreed.
I mark this patch as Waiting on Author. Not quite correct for waiting another discussion, but most suitable.

> We could also consider using
> the show hook function of a parameter to print its value in such logs.
But show hook also affects "show primary_conninfo;" command and pg_settings.value
I already marked primary_conninfo as GUC_SUPERUSER_ONLY. I doubt if we need hide some connection info even from superuser.
Also this hook would be a bit complicated, i found suitable code only in libpqrcv_get_conninfo after establishing connect

regards, Sergei

Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Andres Freund
In reply to this post by Michael Paquier-2
Hi,

On 2018-12-14 16:55:42 +0900, Michael Paquier wrote:

> On Thu, Dec 13, 2018 at 01:51:48PM +0300, Sergei Kornilov wrote:
> > Depends on this discussion: https://www.postgresql.org/message-id/20181212053042.GK17695@...
> > If walreceiver will use GUC conninfo for startup - then yes, we can
> > remove such static variables.  If walreceiver will still use conninfo
> > from WalRcv - we have race condition between RequestXLogStreaming from
> > startup, config reload, and walreceiver start. In this case I need
> > save conninfo from WalRcv and compare new GUC value with them.
>
> I would recommend waiting for the conclusion of other thread before
> moving on with this one.  There are arguments for letting the startup
> process deciding which connection string the WAL receiver should use as
> well as letting the WAL receiver use directly the connection string from
> the GUC.

I suggest continuing the development of the patch without relying on
that refactoring.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
Hello

Yeah, we have no consensus.

Another open question is about logging new primary_conninfo:

> LOG: parameter "primary_conninfo" changed to "host=/tmp port=5432 password=hoge"

I my opinion this is not issue, database logs can have sensitive data. User queries, for example.
If we not want expose such info - it is ok just hide new value from logs with new GUC flag? Or i need implement masked conninfo for this purpose?

regards, Sergei

Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Michael Paquier-2
On Thu, Jan 31, 2019 at 04:13:22PM +0300, Sergei Kornilov wrote:
> I my opinion this is not issue, database logs can have sensitive
> data. User queries, for example.  If we not want expose such info -
> it is ok just hide new value from logs with new GUC flag? Or i need
> implement masked conninfo for this purpose?

You have problems with things in this area for any commands logged and
able to show a connection string or a password, which can go down as
well to CREATE/ALTER ROLE or FDWs.  So for the purpose of what's
discussed on this thread it does not sound like a requirement to be
able to hide that.  Role DDLs can take an already-hashed input to
avoid that, still knowing the MD5 hash is sufficient for connection
(not for SCRAM!).  Now for FDWs..
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Andres Freund
In reply to this post by Sergei Kornilov
Hi,

On 2019-01-31 16:13:22 +0300, Sergei Kornilov wrote:
> Hello
>
> Yeah, we have no consensus.
>

Are you planning to update the patch?  Given there's not been much
progress here, I think we ough tot mark the CF entry as returned with
feedback for now.


> Another open question is about logging new primary_conninfo:

> > LOG: parameter "primary_conninfo" changed to "host=/tmp port=5432 password=hoge"
>
> I my opinion this is not issue, database logs can have sensitive data. User queries, for example.
> If we not want expose such info - it is ok just hide new value from logs with new GUC flag? Or i need implement masked conninfo for this purpose?

I agree that this doesn't need to be solved as part of this patch. Given
the config is in the conf file, I don't think it's meaningful to hide
this from the log. If necessary one can use client certs, service files,
etc.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
Hi

> I agree that this doesn't need to be solved as part of this patch.

Thank you!

> Are you planning to update the patch?

Sorry, i was busy last month. But, well, i already did such update: https://www.postgresql.org/message-id/9653601544523383%40iva8-37fc2ad204cd.qloud-c.yandex.net
v003 version does not change RequestXLogStreaming, not require "Making WAL receiver startup rely on GUC context" patch and does not hide new value from logs.

Here is updated version (from v003 patch) with few wording changes suggested by Kyotaro HORIGUCHI and Michael Paquier

regards, Sergei

reload_walreceiver_conninfo_v004.patch (10K) Download Attachment
1234