Re: max_standby_delay considered harmful


Simon Riggs
On Mon, 2010-05-03 at 15:04 -0700, Josh Berkus wrote:

> I don't see the issue with Tom's approach from a wait perspective.  The
> max wait becomes 1.001X max_standby_delay; there's no way I can think of
> that replay would wait longer than that.  I've yet to see an explanation
> why it would be longer.

Yes, the max wait on any *one* blocker will be max_standby_delay. But if
you wait for two blockers, then the total time by which the standby lags
will now be 2*max_standby_delay. Add a third, fourth etc and the standby
lag keeps rising.

We need to avoid confusing these two measurables:

* standby lag - defined as the total delay from when a WAL record is
written to the time the WAL record is applied. This includes both
transfer time and any delays imposed by Hot Standby.

* standby query delay - defined as the time that recovery will wait for
a query to complete before a cancellation takes place. (We could
complicate this by asking what happens when recovery is blocked twice by
the same query? Would it wait twice, or does it have to track how much
it has waited for each query in total so far?)
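
To illustrate the two measurables above, here is a minimal Python sketch. It assumes recovery meets its blockers one after another and waits at most max_standby_delay on each of them; the function name and the runtimes are hypothetical, and this is a toy model rather than anything taken from the PostgreSQL source.

```python
def per_blocker_wait(blocker_runtimes, max_standby_delay):
    """Model recovery that waits up to max_standby_delay on each blocker.

    blocker_runtimes: seconds each conflicting query would still run
    (hypothetical inputs). Returns (longest single wait, total lag added).
    """
    # Each blocker is waited on until it finishes or the limit is reached.
    waits = [min(runtime, max_standby_delay) for runtime in blocker_runtimes]
    return max(waits, default=0), sum(waits)


# Three blockers in a row, each outliving a 30-second limit.
longest, total = per_blocker_wait([45, 60, 90], max_standby_delay=30)
print(longest, total)  # 30 90
```

The first number corresponds to the standby query delay (bounded per blocker), the second to the contribution to standby lag (which grows with every additional blocker).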

Currently max_standby_delay seeks to constrain the standby lag to a
particular value, as a way of providing a bounded time for failover, and
also to constrain the amount of WAL that needs to be stored as the lag
increases. Currently, there is no guaranteed minimum query delay given
to each query.

If every query is guaranteed its requested query delay then the standby
lag will be unbounded. Fewer cancellations, higher lag. Some people do
want this, though it is not currently available. We can do this with two
new GUCs:

* standby_query_delay - USERSET parameter that allows user to specify a
guaranteed query delay, anywhere from 0 to max_standby_query_delay

* max_standby_query_delay - SIGHUP parameter - parameter exists to
provide DBA with a limit on the USERSET standby_query_delay, though I
can see some would say this is optional

Current behaviour is same as global settings of
standby_query_delay = 0
max_standby_query_delay = 0
max_standby_delay = X

So if people want minimal cancellations they would specify
standby_query_delay = Y (e.g. 30)
max_standby_query_delay = Z (e.g. 300)
max_standby_delay = -1

--
 Simon Riggs           www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: max_standby_delay considered harmful

Simon Riggs
On Mon, 2010-05-03 at 22:45 -0400, Bruce Momjian wrote:

> As I remember, 9.0 has two behaviors:
>
> o  master delays vacuum cleanup
> o  slave delays WAL application
>
> and in 9.1 we will be adding:
>
> o  slave communicates snapshots to master

> How would this figure into what we ultimately want in 9.1?

We would still want all options, since "slave communicates snapshot to
master" doesn't solve the problem it just moves the problem elsewhere.
It's a question of which factors the user wishes to emphasise for their
specific use.

> I understand Simon's point that the two behaviors have different
> benefits.  However, I believe few users will be able to understand when
> to use which.

If users can understand how to set NDISTINCT for a column, they can
understand this. It's not about complexity of UI, it's about solving
problems. When people hit an issue, I don't want to be telling people
"we thought you wouldn't understand it, so we removed the parachute".
They might not understand it *before* they hit a problem, so what? But
users certainly will afterwards and won't say "thanks" if you prevent an
option for them, especially for the stated reason. (My point about
ndistinct: 99% of users have no idea that exists or when to use it, but
it still exists as an option because it solves a known issue, just like
this.)

--
 Simon Riggs           www.2ndQuadrant.com



Re: max_standby_delay considered harmful

Robert Haas
On Tue, May 4, 2010 at 4:37 AM, Simon Riggs <[hidden email]> wrote:
> option for them, especially for the stated reason. (My point about
> ndistinct: 99% of users have no idea that exists or when to use it, but
> it still exists as an option because it solves a known issue, just like
> this.)

Slightly OT, but funnily enough, when I was up in New York a couple of
weeks ago with Bruce and a couple of other folks, I started talking
with a DBA up there about his frustrations with PostgreSQL, and - I'm
not making this up - the first example he gave me of something he
wished he could do in PG to improve query planning was manually
override ndistinct estimates.  He was pleased to hear that we'll have
that in 9.0 and I was pleased to be able to tell him it was my patch.
If you'd asked me what the odds that someone picking a missing feature
would have come up with that one were, I'd have said a billion-to-one
against.  But I'm not making this up.

To be honest, I am far from convinced that the existing behavior is a
good one and I'm in favor of modifying it or ripping it out altogether
if we can think of something better.  But it has to really be better,
of course, not just trading one set of pain points for another.

...Robert


Re: max_standby_delay considered harmful

Simon Riggs
On Tue, 2010-05-04 at 07:13 -0400, Robert Haas wrote:

> On Tue, May 4, 2010 at 4:37 AM, Simon Riggs <[hidden email]> wrote:
> > option for them, especially for the stated reason. (My point about
> > ndistinct: 99% of users have no idea that exists or when to use it, but
> > it still exists as an option because it solves a known issue, just like
> > this.)
>
> Slightly OT, but funnily enough, when I was up in New York a couple of
> weeks ago with Bruce and a couple of other folks, I started talking
> with a DBA up there about his frustrations with PostgreSQL, and - I'm
> not making this up - the first example he gave me of something he
> wished he could do in PG to improve query planning was manually
> override ndistinct estimates.  He was pleased to hear that we'll have
> that in 9.0 and I was pleased to be able to tell him it was my patch.
> If you'd asked me what the odds that someone picking a missing feature
> would have come up with that one were, I'd have said a billion-to-one
> against.  But I'm not making this up.

It matches my experience. I think it's a testament to the expertise of
our users, as well as to the hackers who have done so much to make that
the top of users' lists for change.

> To be honest, I am far from convinced that the existing behavior is a
> good one and I'm in favor of modifying it or ripping it out altogether
> if we can think of something better.  But it has to really be better,
> of course, not just trading one set of pain points for another.

The only way I see as genuinely better rather than just a different mix of
trade-offs is to come up with ways where there are no conflicts. Hannu
came up with one, using filesystem snapshots, but we haven't had time to
implement that yet.

--
 Simon Riggs           www.2ndQuadrant.com



Re: max_standby_delay considered harmful

Stephen Frost
In reply to this post by Simon Riggs
* Simon Riggs ([hidden email]) wrote:
> If recovery waits for max_standby_delay every time something gets in its
> way, it should be clear that if many things get in its way it will
> progressively fall behind. There is no limit to this and it can always
> fall further behind. It does result in fewer cancelled queries and I do
> understand many may like that.

Guess I wasn't very clear in my previous description of what I *think*
the change would be (Tom, please jump in if I've got this wrong..).
Recovery wouldn't wait max_standby_delay every time; I agree, that would
be a big change in behaviour and could make it very difficult for the
slave to keep up.  Rather, recovery would proceed as normal until it
encounters a lock, at which point it would start counting down from
max_standby_delay.  If the lock is released before it hits zero, it
will move on; if another lock is encountered, it would resume counting
down from where it left off last time.  If it hits zero, it'll cancel
the other query, and any other queries that get in the way, until it's
caught up again completely.  Once recovery is fully caught up, the
counter would reset again to max_standby_delay.
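
The countdown described above could be sketched roughly as follows, in Python. This is only a toy model of the behaviour as worded in this message: the class and method names are hypothetical, a "conflict" is reduced to the number of seconds the blocking query would still hold its lock, and nothing here is taken from the actual recovery code.

```python
class StandbyDelayBudget:
    """Shared countdown that is spent across successive conflicts."""

    def __init__(self, max_standby_delay):
        self.max_standby_delay = max_standby_delay
        self.remaining = max_standby_delay

    def on_conflict(self, blocker_runtime):
        """Recovery is blocked for blocker_runtime seconds unless it cancels."""
        if blocker_runtime < self.remaining:
            # Lock released before the countdown hits zero: just move on,
            # keeping whatever budget is left for the next conflict.
            self.remaining -= blocker_runtime
            return "waited"
        # Countdown reached zero: cancel this query, and every later
        # blocker too, until recovery has caught up again.
        self.remaining = 0
        return "cancelled"

    def on_caught_up(self):
        # Fully caught up: the counter resets to max_standby_delay.
        self.remaining = self.max_standby_delay


budget = StandbyDelayBudget(max_standby_delay=30)
print([budget.on_conflict(t) for t in (10, 15, 10, 1)])
# ['waited', 'waited', 'cancelled', 'cancelled']
```

The point of the sketch is that the budget is shared: the total time recovery spends waiting between two catch-ups never exceeds max_standby_delay, however many conflicts occur.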

> That is *significantly* different from how it works now. (Plus: If there
> really was no difference, why not leave it as is?)

Because it's much more complicated the way it is, it doesn't really work
as one would expect in a number of situations, and it's trying to
guarantee something that it probably can't.

> The bottom line is this is about conflict resolution. There is simply no
> way to resolve conflicts without favouring one or other of the
> protagonists. Whatever mechanism you come up with that favours one will
> disfavour the other. I'm happy to give choices, but I'm not happy to
> force just one kind of conflict resolution.

I don't think anyone is trying to get rid of the knob entirely; you're
right, you can't please everyone all the time, so there has to be some
kind of knob there which people can adjust based on their particular use
case and system.  This is about what exactly the knob is and how it's
implemented and documented.

        Thanks,

                Stephen


Re: max_standby_delay considered harmful

Simon Riggs
On Tue, 2010-05-04 at 09:12 -0400, Stephen Frost wrote:

> * Simon Riggs ([hidden email]) wrote:
> > If recovery waits for max_standby_delay every time something gets in its
> > way, it should be clear that if many things get in its way it will
> > progressively fall behind. There is no limit to this and it can always
> > fall further behind. It does result in fewer cancelled queries and I do
> > understand many may like that.
>
> Guess I wasn't very clear in my previous description of what I *think*
> the change would be (Tom, please jump in if I've got this wrong..).
> Recovery wouldn't wait max_standby_delay every time; I agree, that would
> be a big change in behaviour and could make it very difficult for the
> slave to keep up.  Rather, recovery would proceed as normal until it
> encounters a lock, at which point it would start counting down from
> max_standby_delay.  If the lock is released before it hits zero, it
> will move on; if another lock is encountered, it would resume counting
> down from where it left off last time.  If it hits zero, it'll cancel
> the other query, and any other queries that get in the way, until it's
> caught up again completely.  Once recovery is fully caught up, the
> counter would reset again to max_standby_delay.

This new clarification is almost exactly how it works already. Sounds
like the existing docs need some improvement.

The only difference is that max_standby_delay is measured from log
timestamp. Perhaps it should work from WAL receipt timestamp rather than
from log timestamp? That would make some of the problems go away without
significantly changing the definition. I'll look at that.

(And that conflicts are caused by more situations than just locks, but
that detail doesn't alter your point).

> > The bottom line is this is about conflict resolution. There is simply no
> > way to resolve conflicts without favouring one or other of the
> > protagonists. Whatever mechanism you come up with that favours one will
> > disfavour the other. I'm happy to give choices, but I'm not happy to
> > force just one kind of conflict resolution.
>
> I don't think anyone is trying to get rid of the knob entirely; you're
> right, you can't please everyone all the time, so there has to be some
> kind of knob there which people can adjust based on their particular use
> case and system.  This is about what exactly the knob is and how it's
> implemented and documented.

I'm happy with more than one way. It'd be nice if a single parameter,
giving one dimension of tuning, suited all ways people have said they
would like it to behave. I've not found a way of doing that.

I have no problem at all with adding additional parameters or mechanisms
to cater for the multiple dimensions of control people have asked for.
So your original interpretation is also valid for some users.

--
 Simon Riggs           www.2ndQuadrant.com



Re: max_standby_delay considered harmful

Simon Riggs
In reply to this post by Simon Riggs
Downthread, I said..

On Tue, 2010-05-04 at 14:49 +0100, Simon Riggs wrote:

> The only difference is that max_standby_delay is measured from log
> timestamp. Perhaps it should work from WAL receipt timestamp rather than
> from log timestamp? That would make some of the problems go away without
> significantly changing the definition. I'll look at that.

Patch to implement this idea attached: for discussion, not tested yet.
No docs yet.

The attached patch redefines "standby delay" to be the amount of time
elapsed from point of receipt to point of application. The "point of
receipt" is reset every chunk of data when streaming, or every file when
reading file by file. In all cases this new time is later than the
latest log time we would have used previously.
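
As a rough illustration of why the reference point matters, the following Python sketch computes the grace period left to a conflicting query under either definition. The function name and all timestamps are hypothetical; it models only the arithmetic described above, not the attached patch.

```python
def grace_remaining(now, reference_ts, max_standby_delay):
    """Seconds a conflicting standby query may still run before cancel."""
    elapsed = now - reference_ts
    return max(0, max_standby_delay - elapsed)


now = 10_000
log_ts = now - 3_600   # record written an hour ago, replayed from archive
receipt_ts = now - 2   # the same record reached the standby 2 seconds ago

print(grace_remaining(now, log_ts, 30))      # 0
print(grace_remaining(now, receipt_ts, 30))  # 28
```

Measured from the log timestamp, an old record leaves no grace at all; measured from the point of receipt, the same record leaves almost the full max_standby_delay.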

This addresses all of your points, as shown below.

On Mon, 2010-05-03 at 11:37 -0400, Tom Lane wrote:
> There are three really fundamental problems with it:
>
> 1. The timestamps we are reading from the log might be historical,
> if we are replaying from archive rather than reading a live SR stream.
> In the current implementation that means zero grace period for standby
> queries.  Now if your only interest is catching up as fast as possible,
> that could be a sane behavior, but this is clearly not the only possible
> interest --- in fact, if that's all you care about, why did you allow
> standby queries at all?

The delay used is from time of receipt of WAL, no longer from log date.
So this would no longer apply.

> 2. There could be clock skew between the master and slave servers.
> If the master's clock is a minute or so ahead of the slave's, again we
> get into a situation where standby queries have zero grace period, even
> though killing them won't do a darn thing to permit catchup.  If the
> master is behind the slave then we have an artificially inflated grace
> period, which is going to slow down the slave.

The timestamp is from standby, not master, so this would no longer
apply.

> 3. There could be significant propagation delay from master to slave,
> if the WAL stream is being transmitted with pg_standby or some such.
> Again this results in cutting into the standby queries' grace period,
> for no defensible reason.

The timestamp is taken immediately at the point the WAL is ready for
replay, so other timing overheads would not be included.

> In addition to these fundamental problems there's a fatal implementation
> problem: the actual comparison is not to the master's current clock
> reading, but to the latest commit, abort, or checkpoint timestamp read
> from the WAL.  Thus, if the last commit was more than max_standby_delay
> seconds ago, zero grace time.  Now if the master is really idle then
> there aren't going to be any conflicts anyway, but what if it's running
> only long-running queries?  Or what happens when it was idle for awhile
> and then starts new queries?  Zero grace period, that's what.
>
> We could possibly improve matters for the SR case by having walsender
> transmit the master's current clock reading every so often (probably
> once per activity cycle), outside the WAL stream proper.  The receiver
> could subtract off its own clock reading in order to measure the skew,
> and then we could cancel queries if the de-skewed transmission time
> falls too far behind.  However this doesn't do anything to fix the cases
> where we aren't reading (and caught up to) a live SR broadcast.
--
 Simon Riggs           www.2ndQuadrant.com



Attachment: walrcv_timestamp.patch (4K)

Re: max_standby_delay considered harmful

Simon Riggs
In reply to this post by Simon Riggs
On Tue, 2010-05-04 at 14:49 +0100, Simon Riggs wrote:

> The only difference is that max_standby_delay is measured from log
> timestamp. Perhaps it should work from WAL receipt timestamp rather than
> from log timestamp? That would make some of the problems go away without
> significantly changing the definition. I'll look at that.

Patch to implement this idea posted in response to OT, upthread, so I
can respond to the original complaints directly.

--
 Simon Riggs           www.2ndQuadrant.com



Re: max_standby_delay considered harmful

Bruce Momjian
In reply to this post by Simon Riggs
Simon Riggs wrote:

> On Mon, 2010-05-03 at 22:45 -0400, Bruce Momjian wrote:
>
> > As I remember, 9.0 has two behaviors:
> >
> > o  master delays vacuum cleanup
> > o  slave delays WAL application
> >
> > and in 9.1 we will be adding:
> >
> > o  slave communicates snapshots to master
>
> > How would this figure into what we ultimately want in 9.1?
>
> We would still want all options, since "slave communicates snapshot to
> master" doesn't solve the problem it just moves the problem elsewhere.
> It's a question of which factors the user wishes to emphasise for their
> specific use.
>
> > I understand Simon's point that the two behaviors have different
> > benefits.  However, I believe few users will be able to understand when
> > to use which.
>
> If users can understand how to set NDISTINCT for a column, they can
> understand this. It's not about complexity of UI, it's about solving
> problems. When people hit an issue, I don't want to be telling people
> "we thought you wouldn't understand it, so we removed the parachute".
> They might not understand it *before* they hit a problem, so what? But
> users certainly will afterwards and won't say "thanks" if you prevent an
> option for them, especially for the stated reason. (My point about
> ndistinct: 99% of users have no idea that exists or when to use it, but
> it still exists as an option because it solves a known issue, just like
> this.)

Well, this is kind of my point --- that if few people are going to need
a parameter and it is going to take us telling them to use it, it isn't
a good parameter because the other 99.9% are going to stare at the
parameters and not know what it does or how it is different from other
similar parameters.  Adding another parameter might help 0.1% of our
users, but it is going to confuse the other 99.9%.  :-(

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com


Re: max_standby_delay considered harmful

Simon Riggs
On Tue, 2010-05-04 at 13:00 -0400, Bruce Momjian wrote:

> Well, this is kind of my point --- that if few people are going to need
> a parameter and it is going to take us telling them to use it, it isn't
> a good parameter because the other 99.9% are going to stare at the
> parameters and not know what it does or how it is different from other
> similar parameters.  Adding another parameter might help 0.1% of our
> users, but it is going to confuse the other 99.9%.  :-(

You've missed my point. Most users of HS will need these parameters.
There is no need to understand them immediately, nor do I expect them to
do so. People won't understand why they exist until they've understood
the actual behaviour, received some errors and *then* they will
understand them, want them and need them. Just like deadlocks, ndistinct
and loads of other features we provide and support.

The current behaviour of max_standby_delay is designed to favour High
Availability users, not query users. I doubt that users with HA concerns
are only 0.1% of our users. I've accepted that some users may not put
that consideration first and so adding some minor, easy to implement
additional parameters will improve the behaviour for those people.
Forcing just one behaviour will be bad for many people.

--
 Simon Riggs           www.2ndQuadrant.com



Re: max_standby_delay considered harmful

Josh berkus
In reply to this post by Simon Riggs
Simon,

> Yes, the max wait on any *one* blocker will be max_standby_delay. But if
> you wait for two blockers, then the total time by which the standby lags
> will now be 2*max_standby_delay. Add a third, fourth etc and the standby
> lag keeps rising.

I still don't see how that works.  If we're locking for applying log
segments, then any query which came in after the recovery lock would,
presumably, wait.  So you'd have a lot of degraded query performance,
but no more than max_standby_delay of waiting to apply logs.

I'm more interested in your assertion that there's a lot in the
replication stream which doesn't take a lock; if that's the case, then
implementing any part of Tom's proposal is hopeless.

> * standby query delay - defined as the time that recovery will wait for
> a query to complete before a cancellation takes place. (We could
> complicate this by asking what happens when recovery is blocked twice by
> the same query? Would it wait twice, or does it have to track how much
> it has waited for each query in total so far?)

Aha!  Now I see the confusion.  AFAIK, Tom was proposing that the
pending recovery data would wait for max_standby_delay, total, then
cancel *all* queries which conflicted with it.  Now that we've talked
this out, though, I can see that this can still result in "mass cancel"
issues, just like the current max_standby_delay.   The main advantage I
can see to Tom's idea is that (presumably) it can be more discriminating
about which queries it cancels.

I agree that waiting on *each* query for "up to # time" would be a
completely different behavior, and as such, should be an option for DBAs.
We might make it the default option, but we wouldn't make it the only
option.

Speaking of which, was *your* more discriminating query cancel ever applied?

> Currently max_standby_delay seeks to constrain the standby lag to a
> particular value, as a way of providing a bounded time for failover, and
> also to constrain the amount of WAL that needs to be stored as the lag
> increases. Currently, there is no guaranteed minimum query delay given
> to each query.

Yeah, I can just see a lot of combinational issues with this.  For
example, what if the user's network changes in some way to retard
delivery of log segments to the point where the delivery time is longer
than max_standby_delay?  To say nothing about system clock synch, which
isn't perfect even if you have it set up.

I can see DBAs who are very focussed on HA wanting a standby-lag based
control anyway, when HA is far more important than the ability to run
queries on the slave.  But I don't think that is the largest group; I
think that far more people will want to balance the two considerations.

Ultimately, as you say, we would like to have all three knobs:

standby lag: max time measured from master timestamp to slave timestamp

application lag: max time measured from local receipt of WAL records
(via log copy or recovery connection) to their application

query lag: max time any query which is blocking a recovery operation can run

These three, in combination, would let us cover most potential use
cases.  So I think you've assessed that's where we're going in the
9.1-9.2 timeframe.
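
To make the three measurements concrete, here is a small Python sketch. All names and numbers are hypothetical, and "query lag" is interpreted here as the time since a query began blocking recovery, which is an assumption rather than something settled in this thread.

```python
from dataclasses import dataclass


@dataclass
class AppliedRecord:
    master_ts: float    # timestamp the master wrote into the WAL record
    received_ts: float  # standby clock when the record arrived locally
    applied_ts: float   # standby clock when the record was applied


def standby_lag(rec):
    # Master timestamp to slave timestamp; sensitive to clock skew
    # and to transfer time.
    return rec.applied_ts - rec.master_ts


def application_lag(rec):
    # Local receipt to application; uses only the standby's clock.
    return rec.applied_ts - rec.received_ts


def query_lag(blocking_since, now):
    # How long a query has been blocking a recovery operation.
    return now - blocking_since


rec = AppliedRecord(master_ts=100, received_ts=160, applied_ts=175)
print(standby_lag(rec), application_lag(rec))  # 75 15
```

In this example the record took 60 seconds to reach the standby, so the standby lag is dominated by delivery time, while the application lag reflects only what happened after receipt.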

However, I'd say for 9.0 that "application lag" is the least confusing
option and the least dependent on the DBA's server room setup.  So if we
can only have one of these for 9.0 (and I think going out with more than
one might be too complex, especially at this late date) I think that's
the way to go.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com


Re: max_standby_delay considered harmful

Greg Stark-3
In reply to this post by Simon Riggs
On Mon, May 3, 2010 at 4:37 PM, Tom Lane <[hidden email]> wrote:
> 1. The timestamps we are reading from the log might be historical,

> 2. There could be clock skew between the master and slave servers.

> 3. There could be significant propagation delay from master to slave,

So it sounds like what you're expecting is for max_standby_delay to
represent not the maximum lag between server commit and standby commit
but rather the maximum lag introduced by conflicts. Or perhaps maximum
lag introduced relative to the lag present at startup. I think it's
possible to implement either of these and it would solve all three
problems above:

The slave maintains a static measure of how far behind it is from the
master. Every time it executes a recovery operation or waits on a
conflict it adds the time it spent executing or waiting. Every time it
executes a commit record it subtracts the *difference* between this
commit record and the last. I assume we clip at 0 so it never goes
negative which has odd effects but it seems to match what I would
expect to happen.
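
The bookkeeping described above could be sketched roughly as follows, in Python. The class and method names are hypothetical, and the sketch assumes the commit timestamps come from the master's log while the added time is measured on the slave's own clock.

```python
class ConflictLagEstimate:
    """Running estimate of how far the slave has fallen behind."""

    def __init__(self):
        self.delay = 0.0            # assumed delay of 0 at startup
        self.last_commit_ts = None  # timestamp of the previous commit record

    def add_local_time(self, seconds):
        # Time spent executing a recovery operation or waiting on a conflict.
        self.delay += seconds

    def on_commit_record(self, commit_ts):
        # Subtract the gap between this commit record and the last one,
        # clipping at 0 so the estimate never goes negative.
        if self.last_commit_ts is not None:
            gap = commit_ts - self.last_commit_ts
            self.delay = max(0.0, self.delay - gap)
        self.last_commit_ts = commit_ts


est = ConflictLagEstimate()
est.on_commit_record(100)
est.add_local_time(5)      # 5 seconds spent replaying or waiting
est.on_commit_record(102)  # master only advanced 2 seconds
print(est.delay)           # 3.0
```

Only differences between successive commit timestamps are used, so a constant offset between the two clocks cancels out.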

In the face of a standby recovering historical logs, it would
start with an assumed delay of 0. As long as the conflicts don't slow
down execution of the logs so that they run slower than the server
then the measured delay would stay near 0. The only time queries would
be canceled would be if the conflicts are causing problems replaying
the logs.

In the face of clock skew nothing changes as long as the clocks run
at the same speed.

In the face of an environment where the master is idle I think this
scheme has the same problems you described but I think this might be
manageable. Perhaps we need more timestamps in the master's log stream
aside from the commit timestamps. Or perhaps we don't care about
standby delay except when reading a commit record since any other
record isn't actually delayed unless its commit is delayed.

--
greg


Re: max_standby_delay considered harmful

Simon Riggs
In reply to this post by Josh berkus
On Tue, 2010-05-04 at 11:27 -0700, Josh Berkus wrote:

> I still don't see how that works.
...

The good news is we agree by the time we get to the bottom... ;-)

> I'm more interested in your assertion that there's a lot in the
> replication stream which doesn't take a lock; if that's the case, then
> implementing any part of Tom's proposal is hopeless.

(No, still valid, the idea is generic)

> > * standby query delay - defined as the time that recovery will wait for
> > a query to complete before a cancellation takes place. (We could
> > complicate this by asking what happens when recovery is blocked twice by
> > the same query? Would it wait twice, or does it have to track how much
> > it has waited for each query in total so far?)
>
> Aha!  Now I see the confusion.

BTW, Tom's proposal was approx half a sentence long so that is the
source of any confusion.

>   AFAIK, Tom was proposing that the
> pending recovery data would wait for max_standby_delay, total, then
> cancel *all* queries which conflicted with it.  Now that we've talked
> this out, though, I can see that this can still result in "mass cancel"
> issues, just like the current max_standby_delay.   The main advantage I
> can see to Tom's idea is that (presumably) it can be more discriminating
> about which queries it cancels.

As I said to Stephen, this is exactly how it works already and wasn't
what was proposed.

> I agree that waiting on *each* query for "up to # time" would be a
> completely different behavior, and as such, should be an option for DBAs.
> We might make it the default option, but we wouldn't make it the only
> option.

Glad to hear you say that.

> Speaking of which, was *your* more discriminating query cancel ever applied?
>
> > Currently max_standby_delay seeks to constrain the standby lag to a
> > particular value, as a way of providing a bounded time for failover, and
> > also to constrain the amount of WAL that needs to be stored as the lag
> > increases. Currently, there is no guaranteed minimum query delay given
> > to each query.
>
> Yeah, I can just see a lot of combinational issues with this.  For
> example, what if the user's network changes in some way to retard
> delivery of log segments to the point where the delivery time is longer
> than max_standby_delay?  To say nothing about system clock synch, which
> isn't perfect even if you have it set up.
>
> I can see DBAs who are very focussed on HA wanting a standby-lag based
> control anyway, when HA is far more important than the ability to run
> queries on the slave.  But I don't think that is the largest group; I
> think that far more people will want to balance the two considerations.
>
> Ultimately, as you say, we would like to have all three knobs:
>
> standby lag: max time measured from master timestamp to slave timestamp
>
> application lag: max time measured from local receipt of WAL records
> (via log copy or recovery connection) to their application

> query lag: max time any query which is blocking a recovery operation can run
>
> These three, in combination, would let us cover most potential use
> cases.  So I think you've assessed that's where we're going in the
> 9.1-9.2 timeframe.
>
> However, I'd say for 9.0 that "application lag" is the least confusing
> option and the least dependent on the DBA's server room setup.  So if we
> can only have one of these for 9.0 (and I think going out with more than
> one might be too complex, especially at this late date) I think that's
> the way to go.

Before you posted, I submitted a patch on this thread to redefine
max_standby_delay to depend upon the "application lag", as you've newly
defined it here - though obviously I didn't call it that. That solves
Tom's 3 issues. max_apply_delay might be a technically more accurate
term, though it isn't a sufficiently better parameter name to be worth
the change.

That patch doesn't implement Tom's proposal, but that could be done in
addition to it (though IMHO not instead of it). Given that two people have
already misunderstood what Tom proposed, and various people are saying we
need only one, I'm getting less inclined to have that at all.

--
 Simon Riggs           www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: max_standby_delay considered harmful

Josh Berkus

>>   AFAIK, Tom was proposing that the
>> pending recovery data would wait for max_standby_delay, total, then
>> cancel *all* queries which conflicted with it.  Now that we've talked
>> this out, though, I can see that this can still result in "mass cancel"
>> issues, just like the current max_standby_delay.   The main advantage I
>> can see to Tom's idea is that (presumably) it can be more discriminating
>> about which queries it cancels.
>
> As I said to Stephen, this is exactly how it works already and wasn't
> what was proposed.

Well, it's not exactly how it works, as I understand it ... doesn't the
timer running out on the slave currently cancel *all* running queries
with old snapshots, regardless of what relations they touch?

> Before you posted, I submitted a patch on this thread to redefine
> max_standby_delay to depend upon the "application lag", as you've newly
> defined it here - though obviously I didn't call it that. That solves
> Tom's 3 issues. max_apply_delay might be a technically more accurate term,
> though it isn't a sufficiently better parameter name to be worth the
> change.

Yeah, that looks less complicated for admins.  Thanks.

> That patch doesn't implement Tom's proposal, but that could be done in
> addition to it (though IMHO not instead of it). Given that two people have
> already misunderstood what Tom proposed, and various people are saying we
> need only one, I'm getting less inclined to have that at all.

Given your clarification on the whole set of behaviors, I'm highly
dubious about the idea of implementing Tom's proposal when we're already
Beta 1.  It seems like a 9.1 thing.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com


Re: max_standby_delay considered harmful

Tom Lane-2
Josh Berkus <[hidden email]> writes:
> Given your clarification on the whole set of behaviors, I'm highly
> dubious about the idea of implementing Tom's proposal when we're already
> Beta 1.  It seems like a 9.1 thing.

I think you missed the point: "do nothing" is not a viable option.
I was proposing something that seemed simple enough to be safe to
drop into 9.0 at this point.  I'm less convinced that what Simon
is proposing is safe enough.

                        regards, tom lane


Re: max_standby_delay considered harmful

Simon Riggs
On Tue, 2010-05-04 at 18:53 -0400, Tom Lane wrote:

> I think you missed the point: "do nothing" is not a viable option.
> I was proposing something that seemed simple enough to be safe to
> drop into 9.0 at this point.

I've posted a patch that meets your stated objections. If you could
review that, this could be done in an hour.

There are other ways, but you'll need to explain a proposal in enough
detail that we're clear what you actually mean.

> I'm less convinced that what Simon is proposing is safe enough.

Which proposal?

--
 Simon Riggs           www.2ndQuadrant.com



Re: max_standby_delay considered harmful

Greg Smith-21
In reply to this post by Simon Riggs
Tom Lane wrote:
> 1. The timestamps we are reading from the log might be historical,
> if we are replaying from archive rather than reading a live SR stream.
> In the current implementation that means zero grace period for standby
> queries.  Now if your only interest is catching up as fast as possible,
> that could be a sane behavior, but this is clearly not the only possible
> interest --- in fact, if that's all you care about, why did you allow
> standby queries at all?
>  

If the standby is not current, you may not want people to execute
queries against it.  In some situations, returning results against
obsolete data is worse than not letting the query execute at all.  As I
see it, the current max_standby_delay implementation includes the
expectation that the results you are getting are no more than
max_standby_delay behind the master, presuming that new data is still
coming in.  If the standby has really fallen further behind than that,
there are situations where you don't want it doing anything but catching
up until that is no longer the case, and you especially don't want it
returning stale query data.

The fact that tuning in that direction could mean the standby never
actually executes any queries is something you need to monitor for--it
suggests the standby isn't powerful enough, or well enough connected to
the master, to keep up--but that's not necessarily the wrong behavior.
Saying "I
only want the standby to execute queries if it's not too far behind the
master" is the answer to "why did you allow standby queries at all?"
when tuning for that use case.

> 2. There could be clock skew between the master and slave servers.
>  

Not the database's problem to worry about.  Document that time should be
carefully sync'd and move on.  I'll add that.

> 3. There could be significant propagation delay from master to slave,
> if the WAL stream is being transmitted with pg_standby or some such.
> Again this results in cutting into the standby queries' grace period,
> for no defensible reason.
>  

Then people should adjust their max_standby_delay upwards to account for
that.  For high availability purposes, it's vital that the delay number
be referenced to the commit records on the master.  If lag is eating a
portion of that, again it's something people should be monitoring for,
but not something we can correct.  The whole idea here is that
max_standby_delay is an upper bound on how stale the data on the standby
can be, and whether or not lag is a component to that doesn't impact how
the database is being asked to act.
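
To make the arithmetic behind points 1 to 3 concrete, here is a minimal
sketch. It assumes the rule described in this thread: the standby compares
its own clock with the latest timestamp read from the WAL. The function
name is hypothetical and this is not the server's actual code:

```python
def remaining_grace(max_standby_delay: float,
                    wal_timestamp: float,
                    standby_now: float) -> float:
    """Seconds a conflicting query may still run under a WAL-timestamp rule.

    Assumption: the age of the record is measured on the standby's clock
    against a timestamp written by the master.
    """
    age = standby_now - wal_timestamp  # transfer delay + clock skew + replay time
    return max(0.0, max_standby_delay - age)


# Live stream, 2 s transfer delay, clocks in sync: most of a 30 s budget remains.
print(remaining_grace(30.0, wal_timestamp=1000.0, standby_now=1002.0))  # 28.0
# Same record, but the standby clock runs 10 s ahead of the master.
print(remaining_grace(30.0, wal_timestamp=1000.0, standby_now=1012.0))  # 18.0
# Replaying an hour-old record from archive: no grace period at all.
print(remaining_grace(30.0, wal_timestamp=1000.0, standby_now=4600.0))  # 0.0
```

Under this reading, raising max_standby_delay by the expected transfer
delay plus skew restores the intended grace period for live streams, but
does nothing for the archive-replay case.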

> In addition to these fundamental problems there's a fatal implementation
> problem: the actual comparison is not to the master's current clock
> reading, but to the latest commit, abort, or checkpoint timestamp read
> from the WAL.

Right; this has been documented for months at
http://wiki.postgresql.org/wiki/Hot_Standby_TODO and on the list before
that, i.e. "If there's little activity in the master, that can lead to
surprising results."  The suggested long-term fix has been adding
keepalive timestamps into SR, which seems to get reinvented every time
somebody plays with this for a bit.  The HS documentation improvements
I'm working on will suggest that people who expect max_standby_delay to
work reasonably in the face of an idle master make sure this doesn't
happen, by running some sort of keepalive WAL-generating activity on the
master regularly.  It's not ideal, but it's straightforward to work
around in user space.
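
A user-space keepalive of that kind could look roughly like the sketch
below. It assumes any DB-API 2.0 connection to the master; the single-row
table hs_keepalive and its column last_beat are hypothetical names, and
SQLite stands in for the master only so the example runs on its own:

```python
import sqlite3  # used only for the local demonstration below


def keepalive_once(conn) -> None:
    """Commit one tiny write so the master emits a fresh commit record.

    `conn` is any DB-API 2.0 connection; the table `hs_keepalive` and the
    column `last_beat` are hypothetical names chosen for this sketch.
    """
    cur = conn.cursor()
    cur.execute("UPDATE hs_keepalive SET last_beat = CURRENT_TIMESTAMP")
    conn.commit()
    cur.close()


# Demonstration against an in-memory SQLite database standing in for the master.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE hs_keepalive (last_beat TEXT)")
demo.execute("INSERT INTO hs_keepalive VALUES (NULL)")
keepalive_once(demo)
print(demo.execute("SELECT last_beat FROM hs_keepalive").fetchone()[0])
```

On a real master this would be called from cron or a small loop, at an
interval comfortably shorter than max_standby_delay.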

> I'm inclined to think that we should throw away all this logic and just
> have the slave cancel competing queries if the replay process waits
> more than max_standby_delay seconds to acquire a lock.  This is simple,
> understandable, and behaves the same whether we're reading live data or
> not.

I don't consider something that allows queries to execute when not
playing recent "live" data to be necessarily a step forward, from the
perspective of implementations preferring high-availability.  It's
reasonable for some people to request that the last thing a standby
that's not current (current meaning <max_standby_delay behind the master,
based on the last thing received) should be doing is answering any
queries, when it doesn't have current data and should be working on
catchup instead.
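
For comparison, the lock-wait rule Tom describes could be sketched as
below. This is only one reading of the quoted proposal; LockWaitTimer and
its methods are hypothetical names, not anything from a patch:

```python
from typing import Optional


class LockWaitTimer:
    """Tracks how long replay has waited for one lock (hypothetical helper)."""

    def __init__(self, max_standby_delay: float) -> None:
        self.max_standby_delay = max_standby_delay
        self.wait_started: Optional[float] = None

    def start_wait(self, now: float) -> None:
        # Replay has just found a conflicting query holding the lock it needs.
        self.wait_started = now

    def lock_acquired(self) -> None:
        # The conflict went away by itself; the next wait starts from zero.
        self.wait_started = None

    def should_cancel(self, now: float) -> bool:
        # Cancel only once this particular wait exceeds max_standby_delay.
        if self.wait_started is None:
            return False
        return now - self.wait_started > self.max_standby_delay


timer = LockWaitTimer(max_standby_delay=30.0)
timer.start_wait(now=500.0)
print(timer.should_cancel(now=520.0))  # False
print(timer.should_cancel(now=531.0))  # True
```

No WAL timestamp appears anywhere in this rule, so it behaves the same for
live and archived data. The flip side, as Simon notes earlier in the
thread, is that each new wait gets a full budget, so total lag can exceed
max_standby_delay.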

Discussion has obviously wandered past your fundamental objections and
onto implementation trivia, but I didn't think the difference between
what you expected and what's already committed was properly addressed
before that happened.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
[hidden email]   www.2ndQuadrant.us



Re: max_standby_delay considered harmful

Tom Lane-2
Greg Smith <[hidden email]> writes:

> If the standby is not current, you may not want people to execute
> queries against it.  In some situations, returning results against
> obsolete data is worse than not letting the query execute at all.  As I
> see it, the current max_standby_delay implementation includes the
> expectation that the results you are getting are no more than
> max_standby_delay behind the master, presuming that new data is still
> coming in.  If the standby has really fallen further behind than that,
> there are situations where you don't want it doing anything but catching
> up until that is no longer the case, and you especially don't want it
> returning stale query data.

That is very possibly a useful thing to be able to specify, but the
current implementation has *nothing whatsoever* to do with making such a
guarantee.  It will only kill queries that are creating a lock conflict.
I would even argue that it's a bad thing to have a parameter that looks
like it might do that, when it doesn't.

                        regards, tom lane


Re: max_standby_delay considered harmful

Josh Berkus
In reply to this post by Greg Smith-21
On 5/4/10 4:26 PM, Greg Smith wrote:
>
> Not the database's problem to worry about.  Document that time should be
> carefully sync'd and move on.  I'll add that.

Releasing a hot standby which *only* works for users with an operational
ntp implementation is highly unrealistic.   Having built-in replication
in PostgreSQL was supposed to give the *majority* of users a *simple*
option for 2-server failover, not cater only to the high end.  Every
administrative requirement we add to HS/SR eliminates another set of
potential users, as well as adding another set of potential failure
conditions which need to be monitored.

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com


Re: max_standby_delay considered harmful

Joshua Drake-2
On Tue, 2010-05-04 at 16:34 -0700, Josh Berkus wrote:

> On 5/4/10 4:26 PM, Greg Smith wrote:
> >
> > Not the database's problem to worry about.  Document that time should be
> > carefully sync'd and move on.  I'll add that.
>
> Releasing a hot standby which *only* works for users with an operational
> ntp implementation is highly unrealistic.   Having built-in replication
> in PostgreSQL was supposed to give the *majority* of users a *simple*
> option for 2-server failover, not cater only to the high end.  Every
> administrative requirement we add to HS/SR eliminates another set of
> potential users, as well as adding another set of potential failure
> conditions which need to be monitored.

+1

Joshua D. Drake



--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering


