allow online change primary_conninfo

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
77 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
Hello

>>  Well, it seems better to move this patch to next commitfest?
>
> What? You want to make wal_receiver_create_temp_slot unchangeable and
> default to off in pg13, and delay the patch that fixes those things to
> pg14? That makes no sense to me.

I want to handle similar things in a similar way.
wal_receiver_create_temp_slot has good design? I will change my patch in same way in this case. But something like that was strongly rejected a year ago.

> Please keep them both here so that we can get things to a usable state.

Yes, of course.

Here I attached 3 patches:
0001 is copy from https://commitfest.postgresql.org/27/2456/ It changes wal_receiver_create_temp_slot to PGC_POSTMASTER, changes the default value to off, and moves the logic to the startup process.
0002 changes primary_conninfo and primary_slot_name to be PGC_SIGHUP
0003 changes wal_receiver_create_temp_slot back to be PGC_SIGHUP. Michael Paquier asks to remove this from 0002, you ask to leave it in this thread. So, I made separate patch on top of 0002.

Thank you

regards, Sergei

0003_v9_reload_wal_receiver_create_temp_slot.patch (5K) Download Attachment
0002_v9_allow_reload_walreceiver_conninfo.patch (18K) Download Attachment
0001-v3-change-wal_receiver_create_temp_slot.patch (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Alvaro Herrera-9
I think the startup sighup handler should be in startup.c, not xlog.c,
which has enough random code already.  We can add an accessor in xlog.c
to let changing the walrcv status flag, to be called from the signal
handler.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Alvaro Herrera-9
On 2020-Mar-24, Alvaro Herrera wrote:

> I think the startup sighup handler should be in startup.c, not xlog.c,
> which has enough random code already.  We can add an accessor in xlog.c
> to let changing the walrcv status flag, to be called from the signal
> handler.

... as in the attached version.

Sergei included LOG messages to indicate which setting has been changed.
I put these in "#if 0" for now, but I'm thinking to remove these
altogether; we already have LOG messages when a setting changes value,
per ProcessConfigFileInternal(); the log sequence looks like this, taken
from tmp_check/log/001_stream_rep_standby_2.log after running the tests:

2020-03-25 20:54:19.413 -03 [17493] LOG:  received SIGHUP, reloading configuration files
2020-03-25 20:54:19.415 -03 [17493] LOG:  parameter "primary_slot_name" changed to "standby_2"
2020-03-25 20:54:19.415 -03 [17493] LOG:  parameter "wal_receiver_status_interval" changed to "1"
2020-03-25 20:54:19.422 -03 [17569] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1
2020-03-25 20:54:19.426 -03 [17494] LOG:  wal receiver process shutdown requested
2020-03-25 20:54:19.426 -03 [17569] FATAL:  terminating walreceiver process due to administrator command
2020-03-25 20:54:19.433 -03 [17572] LOG:  started streaming WAL from primary at 0/3000000 on timeline 1

which looks sufficient.  Maybe we can reword that new message, say "wal
receiver process shutdown forced by parameter change".  Not sure if we
can or should adjust the FATAL line; probably not worth the trouble.

Thoughts welcome.  I'm thinking on getting this applied shortly.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v10-0001-Set-wal_receiver_create_temp_slot-PGC_POSTMASTER.patch (10K) Download Attachment
v10-0002-Allow-changing-primary_conninfo-and-primary_slot.patch (16K) Download Attachment
v10-0003-Set-wal_receiver_create_temp_slot-as-PGC_SIGHUP.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Michael Paquier-2
On Wed, Mar 25, 2020 at 09:08:12PM -0300, Alvaro Herrera wrote:
> ... as in the attached version.

Patch 0001 is one that has already been discussed previously (thanks
for keeping the extra comments about GUCs and WAL receivers).  That
looks fine to me.

> Sergei included LOG messages to indicate which setting has been changed.
> I put these in "#if 0" for now, but I'm thinking to remove these
> altogether; we already have LOG messages when a setting changes value,
> per ProcessConfigFileInternal(); the log sequence looks like this, taken
> from tmp_check/log/001_stream_rep_standby_2.log after running the tests:

Yes, it makes sense to remove any knowledge of GUC updates from
StartupRequestWalReceiverRestart().  I would add a description at the
top of this routine by the way.

+extern void ProcessStartupSigHup(void);
This is declared, but used nowhere in patch 0002.

Wouldn't it be better to be more careful about the NULL-ness of the
string parameters in StartupRereadConfig()?

+   if (!slotnameChanged && strcmp(PrimarySlotName, "") == 0)
+       tempSlotChanged = tempSlot != wal_receiver_create_temp_slot;
I would add parens here for clarity.

> which looks sufficient.  Maybe we can reword that new message, say "wal
> receiver process shutdown forced by parameter change".  Not sure if we
> can or should adjust the FATAL line; probably not worth the trouble.

There is merit to keep the LOG in StartupRequestWalReceiverRestart()
unchanged, as the routine may get called in the future in some other
context.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
In reply to this post by Alvaro Herrera-9
Hello

Thank you! You were ahead of me. I wanted to double-check my changes this morning before posting.

> Sergei included LOG messages to indicate which setting has been changed.
> I put these in "#if 0" for now, but I'm thinking to remove these
> altogether;

No objections.

> Not sure if we can or should adjust the FATAL line; probably not worth the trouble.

I think it's ok. walreceiver is terminated exactly due to administrator command.

regards, Sergei


Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Alvaro Herrera-9
Now, would anyone be too upset if I push these in this other order?  I
realized that the reason the tests broke after Sergei's patch is that
recovery/t/001_stream_rep.pl's get_slot_xmins() is broken for temp
walreceiver slots, since it's using the non-temp name it tries to give
to the slot, rather than the temp name under which it is actually
created.  The workaround proposed by 0002 is to edit standby_1's config
to set walreceiver's slot to be non-temp.

Thanks to Justin Pryzby for offlist typo corrections.

The reason is that I think I would like to get Sergei's patch pushed
right away (0001+0002, as a single commit); but that I think there's
more to attack in the walreceiver temp slot code than 0003 here, and I
don't want to delay the new feature any longer while I figure out the
fix for that.


(The thing is: if I specify primary_slot_name in the config, why is the
temp walreceiver slot code not obeying that name?  I think walreceiver
should create a temp slot, sure, but using the given name rather than
coming up with a random name.)

(The other reason is that I would like to push one patch to fix
walreceiver tmp slot rather than two, setting the thing first this way
and then the opposite way.)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v11-0001-Allow-changing-primary_conninfo-and-primary_slot.patch (16K) Download Attachment
v11-0002-Fix-tests.patch (899 bytes) Download Attachment
v11-0003-Rework-wal_receiver_create_temp_slot-using-new-i.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Michael Paquier-2
On Thu, Mar 26, 2020 at 09:39:17PM -0300, Alvaro Herrera wrote:
> Now, would anyone be too upset if I push these in this other order?  I
> realized that the reason the tests broke after Sergei's patch is that
> recovery/t/001_stream_rep.pl's get_slot_xmins() is broken for temp
> walreceiver slots, since it's using the non-temp name it tries to give
> to the slot, rather than the temp name under which it is actually
> created.  The workaround proposed by 0002 is to edit standby_1's config
> to set walreceiver's slot to be non-temp.

FWIW, I find a bit strange that the change introduced in
001_stream_rep.pl as of patch 0002 is basically undone in 0003 by
changing the default value of wal_receiver_create_temp_slot.

> The reason is that I think I would like to get Sergei's patch pushed
> right away (0001+0002, as a single commit); but that I think there's
> more to attack in the walreceiver temp slot code than 0003 here, and I
> don't want to delay the new feature any longer while I figure out the
> fix for that.

Not sure I agree with this approach.  I'd rather fix all the existing
known problems first, and then introduce the new features on top of
what we consider to be a clean state.  If we lack of time between the
first and second patches, we may have a risk of keeping the code with
the new feature but without the fixes discussed for
wal_receiver_create_temp_slot.

> (The thing is: if I specify primary_slot_name in the config, why is the
> temp walreceiver slot code not obeying that name?  I think walreceiver
> should create a temp slot, sure, but using the given name rather than
> coming up with a random name.)

Good point.  I am not sure either why the current logic has been
chosen.  The discussion related to the original patch is in this area:
https://www.postgresql.org/message-id/4792e456-d75f-8e6a-2d47-34b8f78c266f@...

> (The other reason is that I would like to push one patch to fix
> walreceiver tmp slot rather than two, setting the thing first this way
> and then the opposite way.)

So your problem here is that by applying first 0003 and then 0001-0002
you would finish by switching wal_receiver_create_temp_slot to
PGC_POSTMASTER, and then back to PGC_SIGHUP again?
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
In reply to this post by Alvaro Herrera-9
Hello

> I realized that the reason the tests broke after Sergei's patch is that
> recovery/t/001_stream_rep.pl's get_slot_xmins() is broken for temp
> walreceiver slots, since it's using the non-temp name it tries to give
> to the slot, rather than the temp name under which it is actually
> created. The workaround proposed by 0002 is to edit standby_1's config
> to set walreceiver's slot to be non-temp.

This is bug in behavior, not in tests.
We need walrcv->is_temp_slot = false; somewhere in RequestXLogStreaming to works correctly.

HEAD is not affected since primary_slot_name cannot be changed online.

> (The thing is: if I specify primary_slot_name in the config, why is the
> temp walreceiver slot code not obeying that name? I think walreceiver
> should create a temp slot, sure, but using the given name rather than
> coming up with a random name.)

Hm, interesting idea.

regards, Sergei


Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
Hello

In other words, patches in reverse order:
0001 will change primary_conninfo and primary_slot_name to reloadable
0002 will move wal_receiver_create_temp_slot logic to startup process (without changing to PGC_POSTMASTER)
0003 is new draft patch: wal_receiver_create_temp_slot will use the given name or generate new one.

regards, Sergei

v12-0003-draft-temp-slot-name.patch (5K) Download Attachment
v12-0002-Rework-wal_receiver_create_temp_slot-using-new-i.patch (14K) Download Attachment
v12-0001-Allow-changing-primary_conninfo-and-primary_slot.patch (20K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Alvaro Herrera-9
In reply to this post by Sergei Kornilov
On 2020-Mar-27, Sergei Kornilov wrote:

> Hello
>
> > I realized that the reason the tests broke after Sergei's patch is that
> > recovery/t/001_stream_rep.pl's get_slot_xmins() is broken for temp
> > walreceiver slots, since it's using the non-temp name it tries to give
> > to the slot, rather than the temp name under which it is actually
> > created. The workaround proposed by 0002 is to edit standby_1's config
> > to set walreceiver's slot to be non-temp.
>
> This is bug in behavior, not in tests.
> We need walrcv->is_temp_slot = false; somewhere in RequestXLogStreaming to works correctly.
>
> HEAD is not affected since primary_slot_name cannot be changed online.

I pushed the wal_receiver_create_temp_slot bugfix, because I realized
after looking for long enough at WalReceiverMain() that the code was
beyond saving.  I'll be pushing the rest of this later today.

Thanks,

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Alvaro Herrera-9
On 2020-Mar-27, Alvaro Herrera wrote:

> I pushed the wal_receiver_create_temp_slot bugfix, because I realized
> after looking for long enough at WalReceiverMain() that the code was
> beyond saving.  I'll be pushing the rest of this later today.

So here's the next one.  I still have to go over the doc changes here,
but at least the tests are passing for me.

I think I should set aside your new draft for now, but I think we should
still get it in pg13 to avoid having the change the semantics of the
walreceiver temp slot in the next release.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v13-0001-Allow-changing-primary_conninfo-and-primary_slot.patch (16K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
Hello

Thank you!

> I think I should set aside your new draft for now

I agree, this patch definitely needs a bit more time to review. (currently it applies on top of v13 patch cleanly)

> but I think we should still get it in pg13 to avoid having the change the semantics of the
> walreceiver temp slot in the next release.

Good point.

regards, Sergei


Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Alvaro Herrera-9
In reply to this post by Alvaro Herrera-9
On 2020-Mar-27, Alvaro Herrera wrote:

> On 2020-Mar-27, Alvaro Herrera wrote:
>
> > I pushed the wal_receiver_create_temp_slot bugfix, because I realized
> > after looking for long enough at WalReceiverMain() that the code was
> > beyond saving.  I'll be pushing the rest of this later today.
>
> So here's the next one.  I still have to go over the doc changes here,
> but at least the tests are passing for me.

Pushed with some documentation tweaks.

Many thanks for working on this!

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
Hello

Thank you very much!

I attached updated patch: walreceiver will use configured primary_slot_name as temporary slot name if wal_receiver_create_temp_slot is enabled.

regards, Sergei

v14_primary_slot_name_temporary.patch (7K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Peter Eisentraut-6
On 2020-03-28 11:49, Sergei Kornilov wrote:
> I attached updated patch: walreceiver will use configured primary_slot_name as temporary slot name if wal_receiver_create_temp_slot is enabled.

The original setup worked consistently with pg_basebackup.  In
pg_basebackup, if you specify -S/--slot, then it uses a permanent slot
with the given name.  This is the same behavior as primary_slot_name has
now.  We should be careful that we don't create two sets of
options/settings that look similar but combine in different ways.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Maxim Orlov
In reply to this post by Alvaro Herrera-9
On 2020-03-28 02:39, Alvaro Herrera wrote:

> On 2020-Mar-27, Alvaro Herrera wrote:
>
>> On 2020-Mar-27, Alvaro Herrera wrote:
>>
>> > I pushed the wal_receiver_create_temp_slot bugfix, because I realized
>> > after looking for long enough at WalReceiverMain() that the code was
>> > beyond saving.  I'll be pushing the rest of this later today.
>>
>> So here's the next one.  I still have to go over the doc changes here,
>> but at least the tests are passing for me.
>
> Pushed with some documentation tweaks.
>
> Many thanks for working on this!
Nice work! What do you think of possibility to add restore_command?

Is it a good idea to make a restore_command to be reloadable via SUGHUP
too?

On the one hand it looks reasonable since primary_conninfo got such
ability. On the other hand we don't exactly know whether the actual
process after reload config would use "new" command or "old"  since it
may take a long time to finish running old command (in case of scp or
smth).

Also setting faulty restore_command lead to server termination, which is
rather unexpectedly.
If anyone makes some minor typo in restore_command, say write
restore_command='cp1 /mnt/srv/%f %p' instead of restore_command='cp
/mnt/srv/%f %p' and do SELECT pg_reload_conf() then server will
terminate.

Do you consider this feature useful? Any suggestions?

---
Best regards,
Maxim Orlov.

restore_command-sighup.patch (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: allow online change primary_conninfo

Sergei Kornilov
Hello

Yep, I think it's useful and I already posted patch in this thread: https://www.postgresql.org/message-id/flat/3090621585393698%40myt5-1466095fe4e5.qloud-c.yandex.net#ee6574e93982b5628d140f15ffffcb44
Currently without consensus

regards, Sergei


1234