Change default of checkpoint_completion_target

Stephen Frost
Greetings,

* Michael Paquier ([hidden email]) wrote:

> On Sun, Dec 06, 2020 at 10:03:08AM -0500, Stephen Frost wrote:
> > * Alvaro Herrera ([hidden email]) wrote:
> >> You keep making this statement, and I don't necessarily disagree, but if
> >> that is the case, please explain why don't we have
> >> checkpoint_completion_target set to 0.9 by default?  Should we change
> >> that?
> >
> > Yes, I do think we should change that..
>
> Agreed.  FWIW, no idea for others, but it is one of those parameters I
> keep telling to update after a default installation.

Concretely, attached is a patch which changes the default and updates
the documentation accordingly.

Passes regression tests and doc build.  Will register in the January
commitfest as Needs Review.

Thanks,

Stephen

Attachment: cct_def_v1.patch (5K)

Re: Change default of checkpoint_completion_target

Peter Eisentraut-7
On 2020-12-07 18:53, Stephen Frost wrote:

> * Michael Paquier ([hidden email]) wrote:
>> On Sun, Dec 06, 2020 at 10:03:08AM -0500, Stephen Frost wrote:
>>> * Alvaro Herrera ([hidden email]) wrote:
>>>> You keep making this statement, and I don't necessarily disagree, but if
>>>> that is the case, please explain why don't we have
>>>> checkpoint_completion_target set to 0.9 by default?  Should we change
>>>> that?
>>>
>>> Yes, I do think we should change that..
>>
>> Agreed.  FWIW, no idea for others, but it is one of those parameters I
>> keep telling to update after a default installation.
>
> Concretely, attached is a patch which changes the default and updates
> the documentation accordingly.

I agree with considering this change, but I wonder why the value 0.9.
Why not, say, 0.95, 0.99, or 1.0?



Re: Change default of checkpoint_completion_target

Stephen Frost
Greetings,

* Peter Eisentraut ([hidden email]) wrote:

> On 2020-12-07 18:53, Stephen Frost wrote:
> >* Michael Paquier ([hidden email]) wrote:
> >>On Sun, Dec 06, 2020 at 10:03:08AM -0500, Stephen Frost wrote:
> >>>* Alvaro Herrera ([hidden email]) wrote:
> >>>>You keep making this statement, and I don't necessarily disagree, but if
> >>>>that is the case, please explain why don't we have
> >>>>checkpoint_completion_target set to 0.9 by default?  Should we change
> >>>>that?
> >>>
> >>>Yes, I do think we should change that..
> >>
> >>Agreed.  FWIW, no idea for others, but it is one of those parameters I
> >>keep telling to update after a default installation.
> >
> >Concretely, attached is a patch which changes the default and updates
> >the documentation accordingly.
>
> I agree with considering this change, but I wonder why the value 0.9. Why
> not, say, 0.95, 0.99, or 1.0?

The documentation (which my patch updates to match the new default)
covers this pretty well here:

https://www.postgresql.org/docs/current/wal-configuration.html

"Although checkpoint_completion_target can be set as high as 1.0, it is
best to keep it less than that (perhaps 0.9 at most) since checkpoints
include some other activities besides writing dirty buffers. A setting
of 1.0 is quite likely to result in checkpoints not being completed on
time, which would result in performance loss due to unexpected variation
in the number of WAL segments needed."
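
To make the quoted passage concrete, here is a minimal sketch of the arithmetic. It assumes a purely time-driven checkpoint and the stock checkpoint_timeout of 5 minutes (300 s), and it ignores checkpoints triggered by WAL volume. The function name checkpoint_write_window is hypothetical and not part of PostgreSQL.

```python
from typing import Tuple


def checkpoint_write_window(checkpoint_timeout_s: float,
                            completion_target: float) -> Tuple[float, float]:
    """Return (seconds spent writing dirty buffers, seconds of slack).

    Illustrative model only: the checkpointer paces its writes so that they
    finish after completion_target * checkpoint_timeout seconds; whatever
    remains before the next checkpoint is slack for the other checkpoint
    activities the documentation mentions.
    """
    if not 0.0 <= completion_target <= 1.0:
        raise ValueError("completion_target must be between 0.0 and 1.0")
    write_window = checkpoint_timeout_s * completion_target
    slack = checkpoint_timeout_s - write_window
    return write_window, slack


if __name__ == "__main__":
    for target in (0.5, 0.9, 1.0):
        window, slack = checkpoint_write_window(300.0, target)
        print(f"target={target}: write for {window:.0f}s, slack {slack:.0f}s")
```

Under these assumptions a target of 0.9 leaves about 30 seconds of slack per 5-minute cycle, while 1.0 leaves none, which is the situation the documentation warns about.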

Thanks,

Stephen


Re: Change default of checkpoint_completion_target

Bossart, Nathan
On 12/7/20, 9:53 AM, "Stephen Frost" <[hidden email]> wrote:
> Concretely, attached is a patch which changes the default and updates
> the documentation accordingly.

+1 to setting checkpoint_completion_target to 0.9 by default.

Nathan


Re: Change default of checkpoint_completion_target

Tom Lane-2
"Bossart, Nathan" <[hidden email]> writes:
> On 12/7/20, 9:53 AM, "Stephen Frost" <[hidden email]> wrote:
>> Concretely, attached is a patch which changes the default and updates
>> the documentation accordingly.

> +1 to setting checkpoint_completion_target to 0.9 by default.

FWIW, I kind of like the idea of getting rid of it completely.
Is there really ever a good reason to set it to something different
than that?  If not, well, we have too many GUCs already, and each
of them carries nonzero performance, documentation, and maintenance
overhead.

                        regards, tom lane



Re: Change default of checkpoint_completion_target

Magnus Hagander-2


On Tue, Dec 8, 2020 at 6:42 PM Tom Lane <[hidden email]> wrote:
> "Bossart, Nathan" <[hidden email]> writes:
> > On 12/7/20, 9:53 AM, "Stephen Frost" <[hidden email]> wrote:
> >> Concretely, attached is a patch which changes the default and updates
> >> the documentation accordingly.
>
> > +1 to setting checkpoint_completion_target to 0.9 by default.
>
> FWIW, I kind of like the idea of getting rid of it completely.
> Is there really ever a good reason to set it to something different
> than that?  If not, well, we have too many GUCs already, and each
> of them carries nonzero performance, documentation, and maintenance
> overhead.

+1.

I think there are plenty of cases where the value doesn't really matter, but where it does, I'm not sure in what situation something else would actually be better.


Re: Change default of checkpoint_completion_target

Laurenz Albe
In reply to this post by Bossart, Nathan
On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:
> +1 to setting checkpoint_completion_target to 0.9 by default.

+1 for changing the default or getting rid of it, as Tom suggested.

While we are at it, could we change the default of "log_lock_waits" to "on"?

Yours,
Laurenz Albe




Re: Change default of checkpoint_completion_target

Stephen Frost
Greetings,

* Laurenz Albe ([hidden email]) wrote:
> On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:
> > +1 to setting checkpoint_completion_target to 0.9 by default.
>
> +1 for changing the default or getting rid of it, as Tom suggested.

Attached is a patch to change it from a GUC to a compile-time #define
which is set to 0.9, with accompanying documentation updates.

> While we are at it, could we change the default of "log_lock_waits" to "on"?

While I agree that it'd be good to change quite a few of the log_X items
to be 'on' by default, I'm not planning to work on this.

Thanks,

Stephen

Attachment: cct_def_v2.patch (13K)

Re: Change default of checkpoint_completion_target

Álvaro Herrera
Howdy,

On 2020-Dec-10, Stephen Frost wrote:

> * Laurenz Albe ([hidden email]) wrote:
> > On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:
> > > +1 to setting checkpoint_completion_target to 0.9 by default.
> >
> > +1 for changing the default or getting rid of it, as Tom suggested.
>
> Attached is a patch to change it from a GUC to a compile-time #define
> which is set to 0.9, with accompanying documentation updates.

I think we should leave a doc stub or at least an <indexterm>, to let
people know the GUC has been removed rather than just making it
completely invisible.  (Maybe piggyback on the stuff in [1]?)

[1] https://postgr.es/m/CAGRY4nyA=jmBNa4LVwgGO1GyO-RnFmfkesddpT_uO+3=mot8DA@...




Re: Change default of checkpoint_completion_target

Stephen Frost
Greetings,

* Alvaro Herrera ([hidden email]) wrote:

> On 2020-Dec-10, Stephen Frost wrote:
> > * Laurenz Albe ([hidden email]) wrote:
> > > On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:
> > > > +1 to setting checkpoint_completion_target to 0.9 by default.
> > >
> > > +1 for changing the default or getting rid of it, as Tom suggested.
> >
> > Attached is a patch to change it from a GUC to a compile-time #define
> > which is set to 0.9, with accompanying documentation updates.
>
> I think we should leave a doc stub or at least an <indexterm>, to let
> people know the GUC has been removed rather than just making it
> completely invisible.  (Maybe piggyback on the stuff in [1]?)
>
> [1] https://postgr.es/m/CAGRY4nyA=jmBNa4LVwgGO1GyO-RnFmfkesddpT_uO+3=mot8DA@...

Yes, I agree, and am involved in that thread as well; currently awaiting
feedback from others about the proposed approach.

Getting a few more people looking at that thread and commenting on it
would really help us be able to move forward.

Thanks,

Stephen


Re: Change default of checkpoint_completion_target

Stephen Frost
Greetings,

* Stephen Frost ([hidden email]) wrote:

> * Alvaro Herrera ([hidden email]) wrote:
> > On 2020-Dec-10, Stephen Frost wrote:
> > > * Laurenz Albe ([hidden email]) wrote:
> > > > On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:
> > > > > +1 to setting checkpoint_completion_target to 0.9 by default.
> > > >
> > > > +1 for changing the default or getting rid of it, as Tom suggested.
> > >
> > > Attached is a patch to change it from a GUC to a compile-time #define
> > > which is set to 0.9, with accompanying documentation updates.
> >
> > I think we should leave a doc stub or at least an <indexterm>, to let
> > people know the GUC has been removed rather than just making it
> > completely invisible.  (Maybe piggyback on the stuff in [1]?)
> >
> > [1] https://postgr.es/m/CAGRY4nyA=jmBNa4LVwgGO1GyO-RnFmfkesddpT_uO+3=mot8DA@...
>
> Yes, I agree, and am involved in that thread as well; currently awaiting
> feedback from others about the proposed approach.

I've tried to push that forward.  I'm happy to update this patch once
we've got agreement to move forward on that, to wit, adding information
to an 'obsolete' section in the documentation about this particular GUC
and how it's been removed because it is no longer sensible or necessary
to keep.

> Getting a few more people looking at that thread and commenting on it
> would really help us be able to move forward.

This is still the case though..

Thanks!

Stephen


Re: Change default of checkpoint_completion_target

Michael Paquier-2
In reply to this post by Stephen Frost
On Thu, Dec 10, 2020 at 12:16:02PM -0500, Stephen Frost wrote:
> Attached is a patch to change it from a GUC to a compile-time #define
> which is set to 0.9, with accompanying documentation updates.

All the references to checkpoint_completion_target are removed (except
for bgwriter.h as per the patch).

>       This is because it performs a checkpoint, and the I/O
> -     required for the checkpoint will be spread out over a significant
> -     period of time, by default half your inter-checkpoint interval
> -     (see the configuration parameter
> -     <xref linkend="guc-checkpoint-completion-target"/>).  This is
> +     required for the checkpoint will be spread out over the inter-checkpoint
> +     interval (see the configuration parameter
> +     <xref linkend="guc-checkpoint-timeout"/>).  This is

It may be worth mentioning that this is spread across 90% of the last
checkpoint's duration instead.

> -   in about half the time before the next checkpoint starts.  On a system
> -   that's very close to maximum I/O throughput during normal operation,
> -   you might want to increase <varname>checkpoint_completion_target</varname>
> -   to reduce the I/O load from checkpoints.  The disadvantage of this is that
> -   prolonging checkpoints affects recovery time, because more WAL segments
> -   will need to be kept around for possible use in recovery.  Although
> -   <varname>checkpoint_completion_target</varname> can be set as high as 1.0,
> -   it is best to keep it less than that (perhaps 0.9 at most) since
> -   checkpoints include some other activities besides writing dirty buffers.
> -   A setting of 1.0 is quite likely to result in checkpoints not being
> -   completed on time, which would result in performance loss due to
> -   unexpected variation in the number of WAL segments needed.
> +   This spreads out the I/O as much as possible to have the I/O load be consistent
> +   during the checkpoint and generally throughout the operation of the system.  The
> +   disadvantage of this is that prolonging checkpoints affects recovery time,
> +   because more WAL segments will need to be kept around for possible use in recovery.
> +   A user concerned about the amount of time required to recover might wish to reduce
> +   <varname>checkpoint_timeout</varname>, causing checkpoints to happen more
> +   frequently.
>    </para>
>  
>    <para>

Again, this makes the description of the I/O spread more general,
removing the portion where half the time is used by default.  Should
this stuff also mention the spread value of 90% instead?

>   * At a checkpoint, how many WAL segments to recycle as preallocated future
>   * XLOG segments? Returns the highest segment that should be preallocated.
> @@ -8694,7 +8687,7 @@ UpdateCheckPointDistanceEstimate(uint64 nbytes)
>   * CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
>   * CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
>   * CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
> - * ignoring checkpoint_completion_target parameter.
> + * ignoring the CheckPointCompletionTarget.

s/the//?

>   * be a large gap between a checkpoint's redo-pointer and the checkpoint
>   * record itself, and we only start the restartpoint after we've seen the
>   * checkpoint record. (The gap is typically up to CheckPointSegments *
> - * checkpoint_completion_target where checkpoint_completion_target is the
> + * CheckPointCompletionTarget where CheckPointCompletionTarget is the
>   * value that was in effect when the WAL was generated).

The last part of this sentence does not make sense.
CheckPointCompletionTarget becomes a constant with this patch.

>   if (RecoveryInProgress())
> @@ -903,7 +902,7 @@ CheckpointerShmemInit(void)
>   * CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
>   * CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
>   * CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
> - * ignoring checkpoint_completion_target parameter.
> + * ignoring the CheckPointCompletionTarget.

s/the//?

> + * CheckPointCompletionTarget used to be exposed as a GUC named
> + * checkpoint_completion_target, but there's little evidence to suggest that
> + * there's actually a case for it being a different value, so it's no longer
> + * exposed as a GUC to be configured.

I would just remove this paragraph.
--
Michael


Re: Change default of checkpoint_completion_target

Andres Freund
In reply to this post by Tom Lane-2
Hi,

On 2020-12-08 12:41:35 -0500, Tom Lane wrote:
> FWIW, I kind of like the idea of getting rid of it completely.
> Is there really ever a good reason to set it to something different
> than that?  If not, well, we have too many GUCs already, and each
> of them carries nonzero performance, documentation, and maintenance
> overhead.

I like the idea of getting rid of it too, but I think we should consider
evaluating the concrete hard-coded value a bit more carefully than just
going for 0.9 based on some old recommendations in the docs, given that
it would not be changeable afterwards...

I think it might be a good idea to immediately change the default to
0.9, and concurrently try to evaluate whether it's really the best value
(vs 0.95, 1 or ...).

FWIW I have seen a few cases in the past where setting the target to
something very small helped, but I think that was mostly because we
didn't yet tell the kernel to flush dirty data more aggressively.

Greetings,

Andres Freund



Re: Change default of checkpoint_completion_target

Tomas Vondra-6
On 1/15/21 10:51 PM, Andres Freund wrote:

> Hi,
>
> On 2020-12-08 12:41:35 -0500, Tom Lane wrote:
>> FWIW, I kind of like the idea of getting rid of it completely.
>> Is there really ever a good reason to set it to something different
>> than that?  If not, well, we have too many GUCs already, and each
>> of them carries nonzero performance, documentation, and maintenance
>> overhead.
>
> I like the idea of getting rid of it too, but I think we should consider
> evaluating the concrete hard-coded value a bit more carefully than just
> going for 0.9 based on some old recommendations in the docs, given that
> it would not be changeable afterwards...
>
> I think it might be a good idea to immediately change the default to
> 0.9, and concurrently try to evaluate whether it's really the best value
> (vs 0.95, 1 or ...).
>
> FWIW I have seen a few cases in the past where setting the target to
> something very small helped, but I think that was mostly because we
> didn't yet tell the kernel to flush dirty data more aggressively.
>

Yeah. The flushing probably makes that mostly unnecessary, but we still
allow disabling that. I'm not really convinced replacing it with a
compile-time #define is a good idea, exactly because it can't be changed
if needed.

As for the exact value, maybe the right solution is to make it dynamic.
The usual approach is to leave "enough time" for the kernel to flush
dirty data, so we could say 60 seconds and calculate the exact target
depending on the checkpoint_timeout.
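
A rough sketch of what such a dynamic target could look like, assuming the 60-second reserve mentioned above and a time-driven checkpoint. The function dynamic_completion_target and the clamping to 0.0 for very short timeouts are illustrative assumptions, not something taken from any patch.

```python
def dynamic_completion_target(checkpoint_timeout_s: float,
                              reserve_s: float = 60.0) -> float:
    """Derive a completion target that leaves reserve_s seconds unused.

    Hypothetical helper: instead of a fixed fraction, keep a fixed amount
    of time at the end of each checkpoint interval for the kernel to flush
    dirty data, and spread the writes over whatever time remains.
    """
    if checkpoint_timeout_s <= 0:
        raise ValueError("checkpoint_timeout_s must be positive")
    if checkpoint_timeout_s <= reserve_s:
        # Assumption: with no room for the reserve, do not spread at all.
        return 0.0
    return (checkpoint_timeout_s - reserve_s) / checkpoint_timeout_s


if __name__ == "__main__":
    for timeout_s in (300, 900, 3600):
        target = dynamic_completion_target(timeout_s)
        print(f"checkpoint_timeout={timeout_s}s -> target={target:.3f}")
```

With this formula the effective target grows with the timeout: a 5-minute timeout gives 0.8, while longer timeouts approach 1.0.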


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Change default of checkpoint_completion_target

Andres Freund
Hi,

On 2021-01-15 23:05:02 +0100, Tomas Vondra wrote:
> Yeah. The flushing probably makes that mostly unnecessary, but we still
> allow disabling that. I'm not really convinced replacing it with a
> compile-time #define is a good idea, exactly because it can't be changed
> if needed.

It's also not available everywhere...


> As for the exact value, maybe the right solution is to make it dynamic.
> The usual approach is to leave "enough time" for the kernel to flush
> dirty data, so we could say 60 seconds and calculate the exact target
> depending on the checkpoint_timeout.

IME the kernel flushing at some later time is precisely the problem,
because of the latency spikes that happen when it decides to do so. That
commonly starts to happen well before the fsyncs. The reason that
setting a very small checkpoint_completion_target can help is that it
condenses the period of unreliable performance into one short time,
rather than spreading it over the whole checkpoint...
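
The "one short time" versus "whole checkpoint" contrast can be sketched with simple arithmetic. This models only the checkpointer's own write pacing for a fixed number of dirty pages; it does not model kernel writeback or latency spikes, and the function name and the page count are made up for illustration.

```python
def required_write_rate(dirty_pages: int,
                        checkpoint_timeout_s: float,
                        completion_target: float) -> float:
    """Average pages per second the checkpointer must write.

    Simplified model: all dirty_pages are written evenly over a window of
    checkpoint_timeout_s * completion_target seconds.
    """
    window_s = checkpoint_timeout_s * completion_target
    if window_s <= 0:
        raise ValueError("the write window must be positive")
    return dirty_pages / window_s


if __name__ == "__main__":
    pages = 90_000  # hypothetical number of dirty pages per checkpoint
    for target in (0.1, 0.9):
        rate = required_write_rate(pages, 300.0, target)
        print(f"target={target}: {rate:.0f} pages/s for {300.0 * target:.0f}s")
```

In this model a small target produces a short, intense burst, and a large target produces a long, gentle stream; which of the two behaves better in practice depends on the kernel flushing behavior discussed above.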

Greetings,

Andres Freund



Re: Change default of checkpoint_completion_target

Peter Eisentraut-7
In reply to this post by Stephen Frost
On 2021-01-13 23:10, Stephen Frost wrote:
>> Yes, I agree, and am involved in that thread as well- currently waiting
>> feedback from others about the proposed approach.
> I've tried to push that forward.  I'm happy to update this patch once
> we've got agreement to move forward on that, to wit, adding to an
> 'obsolete' section in the documentation information about this
> particular GUC and how it's been removed due to not being sensible or
> necessary to continue to have.

Some discussion a few days ago was arguing that it was still necessary
in some cases as a way to counteract the possible lack of tuning in the
kernel flushing behavior.  I think in light of that we should go with
your first patch that just changes the default, possibly with the
documentation updated a bit.



Re: Change default of checkpoint_completion_target

Stephen Frost
Greetings,

* Peter Eisentraut ([hidden email]) wrote:

> On 2021-01-13 23:10, Stephen Frost wrote:
> >>Yes, I agree, and am involved in that thread as well- currently waiting
> >>feedback from others about the proposed approach.
> >I've tried to push that forward.  I'm happy to update this patch once
> >we've got agreement to move forward on that, to wit, adding to an
> >'obsolete' section in the documentation information about this
> >particular GUC and how it's been removed due to not being sensible or
> >necessary to continue to have.
>
> Some discussion a few days ago was arguing that it was still necessary in
> some cases as a way to counteract the possible lack of tuning in the kernel
> flushing behavior.  I think in light of that we should go with your first
> patch that just changes the default, possibly with the documentation updated
> a bit.

Rebased and updated patch attached which moves back to just changing the
default instead of removing the option, with a more explicit call-out of
the '90%', as suggested by Michael on the other patch.

Any further comments or thoughts on this one?

Thanks,

Stephen

Attachment: cct_def_v3.patch (6K)

Re: Change default of checkpoint_completion_target

Tom Lane-2
Stephen Frost <[hidden email]> writes:
> Any further comments or thoughts on this one?

This:

+        total time between checkpoints. The default is 0.9, which spreads the
+        checkpoint across the entire checkpoint timeout period of time,

is confusing because 0.9 is obviously not 1.0; people will wonder
whether the scale is something strange or the text is just wrong.
They will also wonder why not use 1.0 instead.  So perhaps more like

        ... The default is 0.9, which spreads the checkpoint across almost
        all the available interval, providing fairly consistent I/O load
        while also leaving some slop for checkpoint completion overhead.

The other chunk of text seems accurate, but there's no reason to let
this one be misleading.

                        regards, tom lane



Re: Change default of checkpoint_completion_target

Stephen Frost
Greetings,

* Tom Lane ([hidden email]) wrote:

> Stephen Frost <[hidden email]> writes:
> > Any further comments or thoughts on this one?
>
> This:
>
> +        total time between checkpoints. The default is 0.9, which spreads the
> +        checkpoint across the entire checkpoint timeout period of time,
>
> is confusing because 0.9 is obviously not 1.0; people will wonder
> whether the scale is something strange or the text is just wrong.
> They will also wonder why not use 1.0 instead.  So perhaps more like
>
> ... The default is 0.9, which spreads the checkpoint across almost
> all the available interval, providing fairly consistent I/O load
> while also leaving some slop for checkpoint completion overhead.
>
> The other chunk of text seems accurate, but there's no reason to let
> this one be misleading.

Good point, updated along those lines.

In passing, I noticed that we have a lot of documentation like:

This parameter can only be set in the postgresql.conf file or on the
server command line.

... which hasn't been true since the introduction of ALTER SYSTEM.  I
don't really think it's this patch's job to clean that up but it doesn't
seem quite right that we don't include ALTER SYSTEM in that list either.
If this was C code, maybe we could get away with just changing such
references as we find them, but I don't think we'd want the
documentation to be in an inconsistent state regarding that.

Anyone want to opine about what to do with that?  Should we consider
changing those to mention ALTER SYSTEM?  Or perhaps have a way of saying
"at server start" that then links to "how to set options at server
start", perhaps..

Thanks,

Stephen

Attachment: cct_def_v4.patch (7K)

Re: Change default of checkpoint_completion_target

Tom Lane-2
Stephen Frost <[hidden email]> writes:
> In passing, I noticed that we have a lot of documentation like:

> This parameter can only be set in the postgresql.conf file or on the
> server command line.

> ... which hasn't been true since the introduction of ALTER SYSTEM.

Well, it's still true if you understand "the postgresql.conf file"
to cover whatever's included by postgresql.conf, notably
postgresql.auto.conf (and the include facility existed long before
that, too, so you needed the expanded interpretation even then).
Still, I take your point that it's confusing.

I like your suggestion of shortening all of these to be "can only be set
at server start", or maybe better "cannot be changed after server start".
I'm not sure whether or not we really need new text elsewhere; I think
section 20.1 is pretty long already.

                        regards, tom lane

