Improve handling of parameter differences in physical replication

Improve handling of parameter differences in physical replication

Peter Eisentraut-6
When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record.  The standby then checks whether its own settings are at least
as big as the ones on the primary.  If not, the standby shuts down with
a fatal error.

The correspondence of settings between primary and standby is required
because those settings influence certain shared memory sizings that are
required for processing WAL records that the primary might send.  For
example, if the primary sends a prepared transaction, the standby must
have had max_prepared_transactions set appropriately or it won't be able
to process those WAL records.

However, fatally shutting down the standby immediately upon receipt of
the parameter change record might be a bit of an overreaction.  The
resources related to those settings are not required immediately at that
point, and might never be required if the activity on the primary does
not exhaust all those resources.  An extreme example is raising
max_prepared_transactions on the primary but never actually using
prepared transactions.

Where this becomes a serious problem is if you have many standbys and
you do a failover.  If the newly promoted standby happens to have a
higher setting for one of the relevant parameters, all the other
standbys that have followed it then shut down immediately and won't be
able to continue until you change all their settings.

If we didn't do the hard shutdown and just let the standby roll on with
recovery, nothing bad would happen; it would eventually produce an
appropriate error when those resources are required (e.g., "maximum
number of prepared transactions reached").

So I think there are better ways to handle this.  It might be reasonable
to provide options.  The attached patch doesn't do that, but it would be
pretty easy to add.  What the attached patch does is:

Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but
only issue a warning and set a global flag if there is a problem.  Then
when we actually hit the resource issue and the flag was set, we issue
another warning message with relevant information.  Additionally, at
that point we pause recovery instead of shutting down, so a hot standby
remains usable.  (That could certainly be configurable.)
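
In code terms, the idea is roughly the following (a simplified sketch
with illustrative names, not the actual patch; see the attachment for
the real implementation):

    /* set while replaying XLOG_PARAMETER_CHANGE if a standby setting is too low */
    static bool insufficient_parameter_seen = false;

    /* during redo of XLOG_PARAMETER_CHANGE: warn instead of ereport(FATAL, ...) */
    if (max_prepared_xacts < xlrec.max_prepared_xacts)
    {
        ereport(WARNING,
                (errmsg("insufficient setting for parameter max_prepared_transactions"),
                 errhint("Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.")));
        insufficient_parameter_seen = true;
    }

    /* later, when the resource actually runs out during replay */
    if (insufficient_parameter_seen)
    {
        ereport(WARNING,
                (errmsg("recovery paused because of insufficient parameter settings")));
        SetRecoveryPause(true);     /* hot standby stays readable instead of FATAL */
    }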

Btw., I think the current setup is slightly buggy.  The MaxBackends
value that is used to size shared memory is computed as MaxConnections +
autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but
we don't track autovacuum_max_workers in WAL.
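
For reference, that computation looks roughly like this in the sources
(cf. InitializeMaxBackends(); a paraphrase, not an exact quote):

    MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
                  max_worker_processes + max_wal_senders;

so autovacuum_max_workers feeds into the shared memory sizing even
though XLOG_PARAMETER_CHANGE does not carry it.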

(This patch was developed together with Simon Riggs.)

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: v1-0001-Improve-handling-of-parameter-differences-in-phys.patch (13K)

Re: Improve handling of parameter differences in physical replication

Sergei Kornilov
Hello

Thank you for working on this!

> Where this becomes a serious problem is if you have many standbys and you do a failover.

+1
Several times my team has wanted to pause recovery instead of a panic after settings were changed on the primary. (The same applies to create_tablespace_directories replay errors too...)

Is shutting down the standby immediately upon receipt of the parameter change documented anywhere (outside the code)? doc/src/sgml/high-availability.sgml only says it will "refuse to start".

regards, Sergei



Re: Improve handling of parameter differences in physical replication

Fujii Masao-4
In reply to this post by Peter Eisentraut-6


On 2020/02/27 17:23, Peter Eisentraut wrote:

> When certain parameters are changed on a physical replication primary,   this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL record.  The standby then checks whether its own settings are at least as big as the ones on the primary.  If not, the standby shuts down with a fatal error.
>
> The correspondence of settings between primary and standby is required because those settings influence certain shared memory sizings that are required for processing WAL records that the primary might send.  For example, if the primary sends a prepared transaction, the standby must have had max_prepared_transaction set appropriately or it won't be able to process those WAL records.
>
> However, fatally shutting down the standby immediately upon receipt of the parameter change record might be a bit of an overreaction.  The resources related to those settings are not required immediately at that point, and might never be required if the activity on the primary does not exhaust all those resources.  An extreme example is raising max_prepared_transactions on the primary but never actually using prepared transactions.
>
> Where this becomes a serious problem is if you have many standbys and you do a failover.  If the newly promoted standby happens to have a higher setting for one of the relevant parameters, all the other standbys that have followed it then shut down immediately and won't be able to continue until you change all their settings.
>
> If we didn't do the hard shutdown and we just let the standby roll on with recovery, nothing bad will happen and it will eventually produce an appropriate error when those resources are required (e.g., "maximum number of prepared transactions reached").
>
> So I think there are better ways to handle this.  It might be reasonable to provide options.  The attached patch doesn't do that but it would be pretty easy.  What the attached patch does is:
>
> Upon receipt of XLOG_PARAMETER_CHANGE, we still check the settings but only issue a warning and set a global flag if there is a problem.  Then when we actually hit the resource issue and the flag was set, we issue another warning message with relevant information.  Additionally, at that point we pause recovery instead of shutting down, so a hot standby remains usable.  (That could certainly be configurable.)

+1

> Btw., I think the current setup is slightly buggy.  The MaxBackends value that is used to size shared memory is computed as MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but we don't track autovacuum_max_workers in WAL.

Maybe this is because autovacuum doesn't work during recovery?

Regards,

--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: Improve handling of parameter differences in physical replication

Peter Eisentraut-6
On 2020-02-27 11:13, Fujii Masao wrote:
>> Btw., I think the current setup is slightly buggy.  The MaxBackends value that is used to size shared memory is computed as MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but we don't track autovacuum_max_workers in WAL.
> Maybe this is because autovacuum doesn't work during recovery?

Autovacuum on the primary can use locks or xids, and so it's possible
that the standby, when processing WAL, encounters more of those than it
has locally allocated shared memory to handle.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Improve handling of parameter differences in physical replication

Michael Paquier-2
On Thu, Feb 27, 2020 at 02:37:24PM +0100, Peter Eisentraut wrote:

> On 2020-02-27 11:13, Fujii Masao wrote:
>>> Btw., I think the current setup is slightly buggy.  The MaxBackends
>>> value that is used to size shared memory is computed as MaxConnections +
>>> autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but
>>> we don't track autovacuum_max_workers in WAL.
>> Maybe this is because autovacuum doesn't work during recovery?
>
> Autovacuum on the primary can use locks or xids, and so it's possible that
> the standby when processing WAL encounters more of those than it has locally
> allocated shared memory to handle.

Putting aside your patch, because that sounds like a separate issue...
Doesn't this mean that autovacuum_max_workers should be added to the
control file, that we need to record in WAL any updates done to it, and
that CheckRequiredParameterValues() is wrong?
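
For context, the record currently carries roughly these fields (see
xl_parameter_change in the sources; approximate shape, quoted from
memory), and autovacuum_max_workers is indeed absent:

    typedef struct xl_parameter_change
    {
        int         MaxConnections;
        int         max_worker_processes;
        int         max_wal_senders;
        int         max_prepared_xacts;
        int         max_locks_per_xact;
        int         wal_level;
        bool        wal_log_hints;
        bool        track_commit_timestamp;
    } xl_parameter_change;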
--
Michael


Re: Improve handling of parameter differences in physical replication

Peter Eisentraut-6
On 2020-02-28 08:45, Michael Paquier wrote:

> On Thu, Feb 27, 2020 at 02:37:24PM +0100, Peter Eisentraut wrote:
>> On 2020-02-27 11:13, Fujii Masao wrote:
>>>> Btw., I think the current setup is slightly buggy.  The MaxBackends
>>>> value that is used to size shared memory is computed as MaxConnections +
>>>> autovacuum_max_workers + 1 + max_worker_processes + max_wal_senders, but
>>>> we don't track autovacuum_max_workers in WAL.
>>> Maybe this is because autovacuum doesn't work during recovery?
>>
>> Autovacuum on the primary can use locks or xids, and so it's possible that
>> the standby when processing WAL encounters more of those than it has locally
>> allocated shared memory to handle.
>
> Putting aside your patch because that sounds like a separate issue..
> Doesn't this mean that autovacuum_max_workers should be added to the
> control file, that we need to record in WAL any updates done to it and
> that CheckRequiredParameterValues() is wrong?

That would be a direct fix, yes.

Perhaps it might be better to track the combined MaxBackends instead,
however.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Improve handling of parameter differences in physical replication

Michael Paquier-2
On Fri, Feb 28, 2020 at 08:49:08AM +0100, Peter Eisentraut wrote:
> Perhaps it might be better to track the combined MaxBackends instead,
> however.

Not sure about that.  I think that we should keep them separated, as
that's more useful for debugging and more verbose for error reporting.

(Worth noting that max_prepared_xacts is separate because of its dummy
PGPROC entries created by PREPARE TRANSACTION, so it cannot be
included in the set).
--
Michael


Re: Improve handling of parameter differences in physical replication

Alvaro Herrera-9
In reply to this post by Peter Eisentraut-6
On 2020-Feb-27, Peter Eisentraut wrote:

> So this patch relaxes this a bit.  Upon receipt of
> XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
> warning and set a global flag if there is a problem.  Then when we
> actually hit the resource issue and the flag was set, we issue another
> warning message with relevant information.  Additionally, at that
> point we pause recovery, so a hot standby remains usable.

Hmm, so what is the actual end-user behavior?  As I read the code, we
first send the WARNING, then pause recovery until the user resumes
replication; at that point we raise the original error.  Presumably, at
that point the startup process terminates and is relaunched, and replay
continues normally.  Is that it?

I think if the startup process terminates because of the original error,
after it is unpaused, postmaster will get that as a signal to do a
crash-recovery cycle, closing all existing connections.  Is that right?
If so, it would be worth improving that (possibly by adding a
sigsetjmp() block) to avoid the disruption.
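
For illustration, the usual shape of such a block in other auxiliary
processes is something like this (only a sketch of the general pattern,
not a claim about where exactly it would go in the startup process):

    sigjmp_buf  local_sigjmp_buf;

    if (sigsetjmp(local_sigjmp_buf, 1) != 0)
    {
        /* an ERROR was raised below: report and clean up instead of exiting */
        EmitErrorReport();
        FlushErrorState();
        /* ... release resources, then decide whether to pause or retry ... */
    }

    /* arrange for elog(ERROR) to longjmp back here */
    PG_exception_stack = &local_sigjmp_buf;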

Thanks,

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Improve handling of parameter differences in physical replication

Masahiko Sawada-2
On Sat, 29 Feb 2020 at 06:39, Alvaro Herrera <[hidden email]> wrote:

>
> On 2020-Feb-27, Peter Eisentraut wrote:
>
> > So this patch relaxes this a bit.  Upon receipt of
> > XLOG_PARAMETER_CHANGE, we still check the settings but only issue a
> > warning and set a global flag if there is a problem.  Then when we
> > actually hit the resource issue and the flag was set, we issue another
> > warning message with relevant information.  Additionally, at that
> > point we pause recovery, so a hot standby remains usable.
>
> Hmm, so what is the actual end-user behavior?  As I read the code, we
> first send the WARNING, then pause recovery until the user resumes
> replication; at that point we raise the original error.

I think that after recovery is paused, users would be better off
restarting the server rather than resuming recovery. I agree with this
idea, but I'm slightly concerned that users might not realize that
recovery is paused until they look at that line in the server log or at
pg_stat_replication, because the standby server is still functional. So
I think we could periodically send a WARNING to inform users that we're
still waiting for a parameter change and restart.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Improve handling of parameter differences in physical replication

Peter Eisentraut-6
In reply to this post by Alvaro Herrera-9
On 2020-02-28 16:33, Alvaro Herrera wrote:
> Hmm, so what is the actual end-user behavior?  As I read the code, we
> first send the WARNING, then pause recovery until the user resumes
> replication; at that point we raise the original error.  Presumably, at
> that point the startup process terminates and is relaunched, and replay
> continues normally.  Is that it?

No, at that point you get the original, current behavior that the server
instance shuts down with a fatal error.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Improve handling of parameter differences in physical replication

Peter Eisentraut-6
In reply to this post by Masahiko Sawada-2
On 2020-03-09 09:11, Masahiko Sawada wrote:
> I think after recovery is paused users will be better to restart the
> server rather than resume the recovery. I agree with this idea but I'm
> slightly concerned that users might not realize that recovery is
> paused until they look at that line in server log or at
> pg_stat_replication because the standby server is still functional. So
> I think we can periodically send WARNING to inform user that we're
> still waiting for parameter change and restart.

I think that would be annoying, unless you create a system for
configuring those periodic warnings.

I imagine in a case like having set max_prepared_transactions but never
actually using prepared transactions, people will just ignore the
warning until they have their next restart, so it could be months of
periodic warnings.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Improve handling of parameter differences in physical replication

Masahiko Sawada-2
On Mon, 9 Mar 2020 at 18:45, Peter Eisentraut
<[hidden email]> wrote:

>
> On 2020-03-09 09:11, Masahiko Sawada wrote:
> > I think after recovery is paused users will be better to restart the
> > server rather than resume the recovery. I agree with this idea but I'm
> > slightly concerned that users might not realize that recovery is
> > paused until they look at that line in server log or at
> > pg_stat_replication because the standby server is still functional. So
> > I think we can periodically send WARNING to inform user that we're
> > still waiting for parameter change and restart.
>
> I think that would be annoying, unless you create a system for
> configuring those periodic warnings.
>
> I imagine in a case like having set max_prepared_transactions but never
> actually using prepared transactions, people will just ignore the
> warning until they have their next restart, so it could be months of
> periodic warnings.

Well, I meant to periodically send warning messages while waiting for
the parameter change, that is, after exhausting resources and pausing
recovery. In this situation the user needs to notice that as soon as
possible.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Improve handling of parameter differences in physical replication

Kyotaro Horiguchi-4
At Mon, 9 Mar 2020 21:13:38 +0900, Masahiko Sawada <[hidden email]> wrote in

> On Mon, 9 Mar 2020 at 18:45, Peter Eisentraut
> <[hidden email]> wrote:
> >
> > On 2020-03-09 09:11, Masahiko Sawada wrote:
> > > I think after recovery is paused users will be better to restart the
> > > server rather than resume the recovery. I agree with this idea but I'm
> > > slightly concerned that users might not realize that recovery is
> > > paused until they look at that line in server log or at
> > > pg_stat_replication because the standby server is still functional. So
> > > I think we can periodically send WARNING to inform user that we're
> > > still waiting for parameter change and restart.
> >
> > I think that would be annoying, unless you create a system for
> > configuring those periodic warnings.
> >
> > I imagine in a case like having set max_prepared_transactions but never
> > actually using prepared transactions, people will just ignore the
> > warning until they have their next restart, so it could be months of
> > periodic warnings.
>
> Well I meant to periodically send warning messages while waiting for
> parameter change, that is after exhausting resources and stopping
> recovery. In this situation user need to notice that as soon as
> possible.

If we lose the connection, the standby continues to complain about the
lost connection every 5 seconds.  This is a situation of that kind.

By the way, when I reduced max_connections only on the master and then
took exclusive locks until the standby complained about lock exhaustion,
I saw a WARNING that mentions max_locks_per_transaction instead of
max_connections.


WARNING:  insufficient setting for parameter max_connections
DETAIL:  max_connections = 2 is a lower setting than on the master server (where its value was 3).
HINT:  Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.
CONTEXT:  WAL redo at 0/60000A0 for XLOG/PARAMETER_CHANGE: max_connections=3 max_worker_processes=8 max_wal_senders=2 max_prepared_xacts=0 max_locks_per_xact=10 wal_level=replica wal_log_hints=off track_commit_timestamp=off
WARNING:  recovery paused because of insufficient setting of parameter max_locks_per_transaction (currently 10)
DETAIL:  The value must be at least as high as on the primary server.
HINT:  Recovery cannot continue unless the parameter is changed and the server restarted.
CONTEXT:  WAL redo at 0/6004A80 for Standb


regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Improve handling of parameter differences in physical replication

Peter Eisentraut-6
On 2020-03-10 09:57, Kyotaro Horiguchi wrote:
>> Well I meant to periodically send warning messages while waiting for
>> parameter change, that is after exhausting resources and stopping
>> recovery. In this situation user need to notice that as soon as
>> possible.
>
> If we lose connection, standby continues to complain about lost
> connection every 5 seconds.  This is a situation of that kind.

My argument is that it's not really the same.  If a standby is
disconnected for more than a few minutes, it's really not going to be a
good standby anymore after a while.  In this case, however, having
certain parameter discrepancies is really harmless and you can run with
it for a long time.  I'm not strictly opposed to a periodic warning, but
it's unclear to me how we would find a good interval.

> By the way, when I reduced max_connections only on the master and then
> took exclusive locks until the standby complained about lock exhaustion,
> I saw a WARNING that mentions max_locks_per_transaction instead of
> max_connections.
>
>
> WARNING:  insufficient setting for parameter max_connections
> DETAIL:  max_connections = 2 is a lower setting than on the master server (where its value was 3).
> HINT:  Change parameters and restart the server, or there may be resource exhaustion errors sooner or later.
> CONTEXT:  WAL redo at 0/60000A0 for XLOG/PARAMETER_CHANGE: max_connections=3 max_worker_processes=8 max_wal_senders=2 max_prepared_xacts=0 max_locks_per_xact=10 wal_level=replica wal_log_hints=off track_commit_timestamp=off
> WARNING:  recovery paused because of insufficient setting of parameter max_locks_per_transaction (currently 10)
> DETAIL:  The value must be at least as high as on the primary server.
> HINT:  Recovery cannot continue unless the parameter is changed and the server restarted.
> CONTEXT:  WAL redo at 0/6004A80 for Standb

This is all a web of half-truths.  The lock tables are sized based on
max_locks_per_xact * (MaxBackends + max_prepared_xacts).  So if you run
out of lock space, we currently recommend (in the single-server case)
that you raise max_locks_per_xact, but you could also raise
max_prepared_xacts or something else.  So this is now the opposite case,
where the lock table on the master was bigger because of max_connections.
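
To put numbers on that, using the values from the quoted test
(max_connections 3 on the master vs. 2 on the standby, everything else
as shown in the CONTEXT line above, and assuming the default
autovacuum_max_workers = 3), a standalone back-of-the-envelope
calculation:

    /* illustration only; formula: max_locks_per_xact * (MaxBackends + max_prepared_xacts) */
    #include <stdio.h>

    int
    main(void)
    {
        int max_locks_per_xact = 10;
        int autovacuum_workers = 3, worker_processes = 8, wal_senders = 2;
        int prepared_xacts = 0;

        /* MaxBackends = max_connections + autovacuum_max_workers + 1
         *               + max_worker_processes + max_wal_senders */
        int primary_backends = 3 + autovacuum_workers + 1 + worker_processes + wal_senders;
        int standby_backends = 2 + autovacuum_workers + 1 + worker_processes + wal_senders;

        printf("primary lock table slots: %d\n",
               max_locks_per_xact * (primary_backends + prepared_xacts));   /* 170 */
        printf("standby lock table slots: %d\n",
               max_locks_per_xact * (standby_backends + prepared_xacts));   /* 160 */
        return 0;
    }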

We could make the advice less specific and just say, in essence, you
need to make some parameter changes; see earlier for some hints.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Improve handling of parameter differences in physical replication

Kyotaro Horiguchi-4
At Tue, 10 Mar 2020 14:47:47 +0100, Peter Eisentraut <[hidden email]> wrote in

> On 2020-03-10 09:57, Kyotaro Horiguchi wrote:
> >> Well I meant to periodically send warning messages while waiting for
> >> parameter change, that is after exhausting resources and stopping
> >> recovery. In this situation user need to notice that as soon as
> >> possible.
> > If we lose connection, standby continues to complain about lost
> > connection every 5 seconds.  This is a situation of that kind.
>
> My argument is that it's not really the same.  If a standby is
> disconnected for more than a few minutes, it's really not going to be
> a good standby anymore after a while.  In this case, however, having
> certain parameter discrepancies is really harmless and you can run
> with it for a long time.  I'm not strictly opposed to a periodic
> warning, but it's unclear to me how we would find a good interval.

I meant the behavior after streaming is paused.  That situation leads
to loss of WAL or to running out of WAL storage on the master.  Actually,
5 seconds as an interval would be too frequent, but maybe we need at
least one message per WAL segment size?

> > By the way, when I reduced max_connections only on the master and then
> > took exclusive locks until the standby complained about lock exhaustion,
> > I saw a WARNING that mentions max_locks_per_transaction instead of
> > max_connections.
...

> > WARNING: recovery paused because of insufficient setting of parameter
> > max_locks_per_transaction (currently 10)
> > DETAIL:  The value must be at least as high as on the primary server.
> > HINT: Recovery cannot continue unless the parameter is changed and the
> > server restarted.
> > CONTEXT:  WAL redo at 0/6004A80 for Standb
>
> This is all a web of half-truths.  The lock tables are sized based on
> max_locks_per_xact * (MaxBackends + max_prepared_xacts).  So if you
> run out of lock space, we currently recommend (in the single-server
> case), that you raise max_locks_per_xact, but you could also raise
> max_prepared_xacts or something else.  So this is now the opposite
> case where the lock table on the master was bigger because of
> max_connections.

Yeah, I know.  So I'm not sure whether the checks on individual GUC
variables (other than wal_level) make sense.  We might not even need
the WARNING on parameter change.

> We could make the advice less specific and just say, in essence, you
> need to make some parameter changes; see earlier for some hints.

In that sense the direction mentioned above seems sensible.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Improve handling of parameter differences in physical replication

Peter Eisentraut-6
Here is an updated patch that incorporates some of the suggestions.  In
particular, some of the warning messages have been rephrased to be more
accurate (but also less specific), the warning message at recovery pause
now repeats every minute, and the documentation has been updated.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: v2-0001-Improve-handling-of-parameter-differences-in-phys.patch (18K)

Re: Improve handling of parameter differences in physical replication

Masahiko Sawada-2
On Thu, 12 Mar 2020 at 04:34, Peter Eisentraut
<[hidden email]> wrote:
>
> Here is an updated patch that incorporates some of the suggestions.  In
> particular, some of the warning messages have been rephrased to more
> accurate (but also less specific), the warning message at recovery pause
> repeats every 1 minute, and the documentation has been updated.
>

Thank you for updating the patch. I have one comment on the latest
version of the patch:

+   do
+   {
+       TimestampTz now = GetCurrentTimestamp();
+
+       if (TimestampDifferenceExceeds(last_warning, now, 60000))
+       {
+           ereport(WARNING,
+                   (errmsg("recovery paused because of insufficient parameter settings"),
+                    errdetail("See earlier in the log about which settings are insufficient."),
+                    errhint("Recovery cannot continue unless the configuration is changed and the server restarted.")));
+           last_warning = now;
+       }
+
+       pg_usleep(1000000L);    /* 1000 ms */
+       HandleStartupProcInterrupts();
+   }

I think we can set wait event WAIT_EVENT_RECOVERY_PAUSE here.
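
Concretely, I suppose that would mean something like this around the
sleep (just a sketch, mirroring what recoveryPausesHere() already does):

+       pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
+       pg_usleep(1000000L);    /* 1000 ms */
+       pgstat_report_wait_end();
+       HandleStartupProcInterrupts();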

The others look good to me.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Improve handling of parameter differences in physical replication

Sergei Kornilov
Hello

> I think we can set wait event WAIT_EVENT_RECOVERY_PAUSE here.

+1, since we added this in recoveryPausesHere.

PS: do we need to add a prototype for the RecoveryRequiredIntParameter function at the top of xlog.c?

regards, Sergei



Re: Improve handling of parameter differences in physical replication

Peter Eisentraut-6
On 2020-03-27 20:15, Sergei Kornilov wrote:
>> I think we can set wait event WAIT_EVENT_RECOVERY_PAUSE here.
>
> +1, since we added this in recoveryPausesHere.

committed with that addition

> PS: do we need to add a prototype for the RecoveryRequiredIntParameter function in top of xlog.c?

There is no consistent style, I think, but I usually only add prototypes
for static functions if they are required because of the ordering in the
file.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services