Greetings,
* Michael Paquier ([hidden email]) wrote:
> On Sun, Dec 06, 2020 at 10:03:08AM -0500, Stephen Frost wrote:
> > * Alvaro Herrera ([hidden email]) wrote:
> >> You keep making this statement, and I don't necessarily disagree, but if
> >> that is the case, please explain why don't we have
> >> checkpoint_completion_target set to 0.9 by default? Should we change
> >> that?
> >
> > Yes, I do think we should change that..
>
> Agreed. FWIW, no idea for others, but it is one of those parameters I
> keep telling to update after a default installation.

Concretely, attached is a patch which changes the default and updates
the documentation accordingly.

Passes regression tests and doc build. Will register in the January
commitfest as Needs Review.

Thanks,

Stephen

On 2020-12-07 18:53, Stephen Frost wrote:
> * Michael Paquier ([hidden email]) wrote:
>> On Sun, Dec 06, 2020 at 10:03:08AM -0500, Stephen Frost wrote:
>>> * Alvaro Herrera ([hidden email]) wrote:
>>>> You keep making this statement, and I don't necessarily disagree, but if
>>>> that is the case, please explain why don't we have
>>>> checkpoint_completion_target set to 0.9 by default? Should we change
>>>> that?
>>>
>>> Yes, I do think we should change that..
>>
>> Agreed. FWIW, no idea for others, but it is one of those parameters I
>> keep telling to update after a default installation.
>
> Concretely, attached is a patch which changes the default and updates
> the documentation accordingly.

I agree with considering this change, but I wonder why the value 0.9. Why
not, say, 0.95, 0.99, or 1.0?

Greetings,
* Peter Eisentraut ([hidden email]) wrote:
> On 2020-12-07 18:53, Stephen Frost wrote:
> >* Michael Paquier ([hidden email]) wrote:
> >>On Sun, Dec 06, 2020 at 10:03:08AM -0500, Stephen Frost wrote:
> >>>* Alvaro Herrera ([hidden email]) wrote:
> >>>>You keep making this statement, and I don't necessarily disagree, but if
> >>>>that is the case, please explain why don't we have
> >>>>checkpoint_completion_target set to 0.9 by default? Should we change
> >>>>that?
> >>>
> >>>Yes, I do think we should change that..
> >>
> >>Agreed. FWIW, no idea for others, but it is one of those parameters I
> >>keep telling to update after a default installation.
> >
> >Concretely, attached is a patch which changes the default and updates
> >the documentation accordingly.
>
> I agree with considering this change, but I wonder why the value 0.9. Why
> not, say, 0.95, 0.99, or 1.0?

The documentation covers this pretty well here:

https://www.postgresql.org/docs/current/wal-configuration.html

"Although checkpoint_completion_target can be set as high as 1.0, it is
best to keep it less than that (perhaps 0.9 at most) since checkpoints
include some other activities besides writing dirty buffers. A setting
of 1.0 is quite likely to result in checkpoints not being completed on
time, which would result in performance loss due to unexpected variation
in the number of WAL segments needed."

Thanks,

Stephen

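To make the 0.9 versus 1.0 trade-off in the quoted documentation concrete, here is a minimal sketch of the arithmetic for timeout-driven checkpoints. It assumes the write phase is simply checkpoint_timeout multiplied by checkpoint_completion_target; the function name is hypothetical, and the sketch ignores checkpoints triggered by WAL volume rather than by the timeout.

```python
def checkpoint_write_window(checkpoint_timeout_s, completion_target):
    """Return (write_window_s, slack_s) for a timeout-driven checkpoint.

    Simplifying assumption (not taken from the PostgreSQL source): buffer
    writes are paced to finish after timeout * target seconds, and the
    remainder of the interval is slack for the other checkpoint work.
    """
    if not 0.0 <= completion_target <= 1.0:
        raise ValueError("completion_target must be between 0.0 and 1.0")
    write_window_s = checkpoint_timeout_s * completion_target
    slack_s = checkpoint_timeout_s - write_window_s
    return write_window_s, slack_s


if __name__ == "__main__":
    # Compare the old default (0.5), the proposed default (0.9) and 1.0
    # for a five-minute checkpoint_timeout.
    for target in (0.5, 0.9, 1.0):
        window, slack = checkpoint_write_window(300, target)
        print(f"target={target}: write for {window:.0f}s, slack {slack:.0f}s")
```

Under these assumptions a target of 1.0 leaves no slack at all for the "other activities besides writing dirty buffers" that the documentation mentions, which is the reason it gives for staying below 1.0.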
On 12/7/20, 9:53 AM, "Stephen Frost" <[hidden email]> wrote:
> Concretely, attached is a patch which changes the default and updates
> the documentation accordingly.

+1 to setting checkpoint_completion_target to 0.9 by default.

Nathan

"Bossart, Nathan" <[hidden email]> writes:
> On 12/7/20, 9:53 AM, "Stephen Frost" <[hidden email]> wrote:
>> Concretely, attached is a patch which changes the default and updates
>> the documentation accordingly.

> +1 to setting checkpoint_completion_target to 0.9 by default.

FWIW, I kind of like the idea of getting rid of it completely.
Is there really ever a good reason to set it to something different
than that? If not, well, we have too many GUCs already, and each
of them carries nonzero performance, documentation, and maintenance
overhead.

regards, tom lane

On Tue, Dec 8, 2020 at 6:42 PM Tom Lane <[hidden email]> wrote:
> "Bossart, Nathan" <[hidden email]> writes:

+1. I think there are plenty of cases where the value doesn't really
matter, but when it does, I'm not sure what the case would be where
something else would actually be better.

In reply to this post by Bossart, Nathan
On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:
> +1 to setting checkpoint_completion_target to 0.9 by default.

+1 for changing the default or getting rid of it, as Tom suggested.

While we are at it, could we change the default of "log_lock_waits" to "on"?

Yours,
Laurenz Albe

Greetings,
* Laurenz Albe ([hidden email]) wrote:
> On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:
> > +1 to setting checkpoint_completion_target to 0.9 by default.
>
> +1 for changing the default or getting rid of it, as Tom suggested.

Attached is a patch to change it from a GUC to a compile-time #define
which is set to 0.9, with accompanying documentation updates.

> While we are at it, could we change the default of "log_lock_waits" to "on"?

While I agree that it'd be good to change quite a few of the log_X items
to be 'on' by default, I'm not planning to work on this.

Thanks,

Stephen

Howdy,
On 2020-Dec-10, Stephen Frost wrote:

> * Laurenz Albe ([hidden email]) wrote:
> > On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:
> > > +1 to setting checkpoint_completion_target to 0.9 by default.
> >
> > +1 for changing the default or getting rid of it, as Tom suggested.
>
> Attached is a patch to change it from a GUC to a compile-time #define
> which is set to 0.9, with accompanying documentation updates.

I think we should leave a doc stub or at least an <indexterm>, to let
people know the GUC has been removed rather than just making it
completely invisible. (Maybe piggyback on the stuff in [1]?)

[1] https://postgr.es/m/CAGRY4nyA=jmBNa4LVwgGO1GyO-RnFmfkesddpT_uO+3=mot8DA@...

Greetings,
* Alvaro Herrera ([hidden email]) wrote:
> On 2020-Dec-10, Stephen Frost wrote:
> > * Laurenz Albe ([hidden email]) wrote:
> > > On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:
> > > > +1 to setting checkpoint_completion_target to 0.9 by default.
> > >
> > > +1 for changing the default or getting rid of it, as Tom suggested.
> >
> > Attached is a patch to change it from a GUC to a compile-time #define
> > which is set to 0.9, with accompanying documentation updates.
>
> I think we should leave a doc stub or at least an <indexterm>, to let
> people know the GUC has been removed rather than just making it
> completely invisible. (Maybe piggyback on the stuff in [1]?)
>
> [1] https://postgr.es/m/CAGRY4nyA=jmBNa4LVwgGO1GyO-RnFmfkesddpT_uO+3=mot8DA@...

Yes, I agree, and am involved in that thread as well- currently waiting
feedback from others about the proposed approach.

Getting a few more people looking at that thread and commenting on it
would really help us be able to move forward.

Thanks,

Stephen

Greetings,
* Stephen Frost ([hidden email]) wrote:
> * Alvaro Herrera ([hidden email]) wrote:
> > On 2020-Dec-10, Stephen Frost wrote:
> > > * Laurenz Albe ([hidden email]) wrote:
> > > > On Tue, 2020-12-08 at 17:29 +0000, Bossart, Nathan wrote:
> > > > > +1 to setting checkpoint_completion_target to 0.9 by default.
> > > >
> > > > +1 for changing the default or getting rid of it, as Tom suggested.
> > >
> > > Attached is a patch to change it from a GUC to a compile-time #define
> > > which is set to 0.9, with accompanying documentation updates.
> >
> > I think we should leave a doc stub or at least an <indexterm>, to let
> > people know the GUC has been removed rather than just making it
> > completely invisible. (Maybe piggyback on the stuff in [1]?)
> >
> > [1] https://postgr.es/m/CAGRY4nyA=jmBNa4LVwgGO1GyO-RnFmfkesddpT_uO+3=mot8DA@...
>
> Yes, I agree, and am involved in that thread as well- currently waiting
> feedback from others about the proposed approach.

I've tried to push that forward. I'm happy to update this patch once
we've got agreement to move forward on that, to wit, adding to an
'obsolete' section in the documentation information about this
particular GUC and how it's been removed due to not being sensible or
necessary to continue to have.

> Getting a few more people looking at that thread and commenting on it
> would really help us be able to move forward.

This is still the case though..

Thanks!

Stephen

In reply to this post by Stephen Frost
On Thu, Dec 10, 2020 at 12:16:02PM -0500, Stephen Frost wrote:
> Attached is a patch to change it from a GUC to a compile-time #define
> which is set to 0.9, with accompanying documentation updates.

All the references to checkpoint_completion_target are removed (except
for bgwriter.h as per the patch).

>   This is because it performs a checkpoint, and the I/O
> - required for the checkpoint will be spread out over a significant
> - period of time, by default half your inter-checkpoint interval
> - (see the configuration parameter
> - <xref linkend="guc-checkpoint-completion-target"/>). This is
> + required for the checkpoint will be spread out over the inter-checkpoint
> + interval (see the configuration parameter
> + <xref linkend="guc-checkpoint-timeout"/>). This is

It may be worth mentioning that this is spread across 90% of the last
checkpoint's duration instead.

> - in about half the time before the next checkpoint starts. On a system
> - that's very close to maximum I/O throughput during normal operation,
> - you might want to increase <varname>checkpoint_completion_target</varname>
> - to reduce the I/O load from checkpoints. The disadvantage of this is that
> - prolonging checkpoints affects recovery time, because more WAL segments
> - will need to be kept around for possible use in recovery. Although
> - <varname>checkpoint_completion_target</varname> can be set as high as 1.0,
> - it is best to keep it less than that (perhaps 0.9 at most) since
> - checkpoints include some other activities besides writing dirty buffers.
> - A setting of 1.0 is quite likely to result in checkpoints not being
> - completed on time, which would result in performance loss due to
> - unexpected variation in the number of WAL segments needed.
> + This spreads out the I/O as much as possible to have the I/O load be consistent
> + during the checkpoint and generally throughout the operation of the system. The
> + disadvantage of this is that prolonging checkpoints affects recovery time,
> + because more WAL segments will need to be kept around for possible use in recovery.
> + A user concerned about the amount of time required to recover might wish to reduce
> + <varname>checkpoint_timeout</varname>, causing checkpoints to happen more
> + frequently.
>   </para>
>
>   <para>

removing the portion where half the time is used by default. Should
this stuff also mention the spread value of 90% instead?

>   * At a checkpoint, how many WAL segments to recycle as preallocated future
>   * XLOG segments? Returns the highest segment that should be preallocated.
> @@ -8694,7 +8687,7 @@ UpdateCheckPointDistanceEstimate(uint64 nbytes)
>   * CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
>   * CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
>   * CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
> - * ignoring checkpoint_completion_target parameter.
> + * ignoring the CheckPointCompletionTarget.

s/the//?

>   * be a large gap between a checkpoint's redo-pointer and the checkpoint
>   * record itself, and we only start the restartpoint after we've seen the
>   * checkpoint record. (The gap is typically up to CheckPointSegments *
> - * checkpoint_completion_target where checkpoint_completion_target is the
> + * CheckPointCompletionTarget where CheckPointCompletionTarget is the
>   * value that was in effect when the WAL was generated).

The last part of this sentence does not make sense.
CheckPointCompletionTarget becomes a constant with this patch.

>   if (RecoveryInProgress())
> @@ -903,7 +902,7 @@ CheckpointerShmemInit(void)
>   * CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
>   * CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
>   * CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
> - * ignoring checkpoint_completion_target parameter.
> + * ignoring the CheckPointCompletionTarget.

s/the//?

> + * CheckPointCompletionTarget used to be exposed as a GUC named
> + * checkpoint_completion_target, but there's little evidence to suggest that
> + * there's actually a case for it being a different value, so it's no longer
> + * exposed as a GUC to be configured.

I would just remove this paragraph.
--
Michael

In reply to this post by Tom Lane-2
Hi,
On 2020-12-08 12:41:35 -0500, Tom Lane wrote:
> FWIW, I kind of like the idea of getting rid of it completely.
> Is there really ever a good reason to set it to something different
> than that? If not, well, we have too many GUCs already, and each
> of them carries nonzero performance, documentation, and maintenance
> overhead.

I like the idea of getting rid of it too, but I think we should consider
evaluating the concrete hard-coded value a bit more carefully than just
going for 0.9 based on some old recommendations in the docs. It not
being changeable afterwards...

I think it might be a good idea to immediately change the default to
0.9, and concurrently try to evaluate whether it's really the best value
(vs 0.95, 1 or ...).

FWIW I have seen a few cases in the past where setting the target to
something very small helped, but I think that was mostly because we
didn't yet tell the kernel to flush dirty data more aggressively.

Greetings,

Andres Freund

On 1/15/21 10:51 PM, Andres Freund wrote:
> Hi,
>
> On 2020-12-08 12:41:35 -0500, Tom Lane wrote:
>> FWIW, I kind of like the idea of getting rid of it completely.
>> Is there really ever a good reason to set it to something different
>> than that? If not, well, we have too many GUCs already, and each
>> of them carries nonzero performance, documentation, and maintenance
>> overhead.
>
> I like the idea of getting rid of it too, but I think we should consider
> evaluating the concrete hard-coded value a bit more careful than just
> going for 0.9 based on some old recommendations in the docs. It not
> being changeable afterwards...
>
> I think it might be a good idea to immediately change the default to
> 0.9, and concurrently try to evaluate whether it's really the best value
> (vs 0.95, 1 or ...).
>
> FWIW I have seen a few cases in the past where setting the target to
> something very small helped, but I think that was mostly because we
> didn't yet tell the kernel to flush dirty data more aggressively.
>

Yeah. The flushing probably makes that mostly unnecessary, but we still
allow disabling that. I'm not really convinced replacing it with a
compile-time #define is a good idea, exactly because it can't be changed
if needed.

As for the exact value, maybe the right solution is to make it dynamic.
The usual approach is to leave "enough time" for the kernel to flush
dirty data, so we could say 60 seconds and calculate the exact target
depending on the checkpoint_timeout.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

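The "dynamic" idea above can be sketched as follows. This is only an illustration of the arithmetic Tomas describes, not code from any patch: the function name, the clamping to the 0.0 to 1.0 range, and the treatment of the 60 seconds as a fixed reserve are all assumptions.

```python
def dynamic_completion_target(checkpoint_timeout_s, flush_reserve_s=60.0):
    """Derive a completion target that leaves a fixed reserve of time.

    Hypothetical helper: reserve flush_reserve_s seconds at the end of
    each checkpoint interval for the kernel to flush dirty data, and
    spread the writes over the rest. The result is clamped to [0.0, 1.0].
    """
    if checkpoint_timeout_s <= 0:
        raise ValueError("checkpoint_timeout_s must be positive")
    target = (checkpoint_timeout_s - flush_reserve_s) / checkpoint_timeout_s
    return min(1.0, max(0.0, target))


if __name__ == "__main__":
    # A few timeouts, from short to one hour.
    for timeout_s in (30, 300, 900, 3600):
        target = dynamic_completion_target(timeout_s)
        print(f"checkpoint_timeout={timeout_s}s -> target={target:.3f}")
```

With a five-minute timeout this gives 0.8, and the value approaches 1.0 as the timeout grows, so under this scheme the fixed 0.9 would correspond to a ten-minute timeout.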
Hi,
On 2021-01-15 23:05:02 +0100, Tomas Vondra wrote:
> Yeah. The flushing probably makes that mostly unnecessary, but we still
> allow disabling that. I'm not really convinced replacing it with a
> compile-time #define is a good idea, exactly because it can't be changed
> if needed.

It's also not available everywhere...

> As for the exact value, maybe the right solution is to make it dynamic.
> The usual approach is to leave "enough time" for the kernel to flush
> dirty data, so we could say 60 seconds and calculate the exact target
> depending on the checkpoint_timeout.

IME the kernel flushing at some later time precisely is the problem,
because of the latency spikes that happen when it decides to do so. That
commonly starts to happen well before the fsyncs. The reason that
setting a very small checkpoint_completion_target can help is that it
condenses the period of unreliable performance into one short time,
rather than spreading it over the whole checkpoint...

Greetings,

Andres Freund

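A rough way to see what "condensing" means in numbers is to compare the average write rate a checkpoint needs under a small and a large target. This is back-of-the-envelope arithmetic only: the function name and the figures are made up for illustration, and it says nothing about how the kernel actually schedules writeback, which is the real issue Andres raises.

```python
def average_write_rate_mb_s(dirty_mb, checkpoint_timeout_s, completion_target):
    """Average rate needed to write dirty_mb within the checkpoint window.

    Simplifying assumption: the checkpointer paces its writes evenly over
    checkpoint_timeout_s * completion_target seconds.
    """
    window_s = checkpoint_timeout_s * completion_target
    if window_s <= 0:
        raise ValueError("the write window must be positive")
    return dirty_mb / window_s


if __name__ == "__main__":
    dirty_mb = 2700  # illustrative amount of dirty data per checkpoint
    for target in (0.1, 0.5, 0.9):
        rate = average_write_rate_mb_s(dirty_mb, 300, target)
        print(f"target={target}: {300 * target:.0f}s window, ~{rate:.0f} MB/s")
```

A small target gives a short, intense burst followed by a long quiet period; a large target gives a long, gentle stream of writes. Which of the two behaves better in practice depends on the kernel flushing behavior discussed above.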
In reply to this post by Stephen Frost
On 2021-01-13 23:10, Stephen Frost wrote:
>> Yes, I agree, and am involved in that thread as well- currently waiting
>> feedback from others about the proposed approach.
> I've tried to push that forward. I'm happy to update this patch once
> we've got agreement to move forward on that, to wit, adding to an
> 'obsolete' section in the documentation information about this
> particular GUC and how it's been removed due to not being sensible or
> necessary to continue to have.

Some discussion a few days ago was arguing that it was still necessary
in some cases as a way to counteract the possible lack of tuning in the
kernel flushing behavior. I think in light of that we should go with
your first patch that just changes the default, possibly with the
documentation updated a bit.

Greetings,
* Peter Eisentraut ([hidden email]) wrote:
> On 2021-01-13 23:10, Stephen Frost wrote:
> >>Yes, I agree, and am involved in that thread as well- currently waiting
> >>feedback from others about the proposed approach.
> >I've tried to push that forward. I'm happy to update this patch once
> >we've got agreement to move forward on that, to wit, adding to an
> >'obsolete' section in the documentation information about this
> >particular GUC and how it's been removed due to not being sensible or
> >necessary to continue to have.
>
> Some discussion a few days ago was arguing that it was still necessary in
> some cases as a way to counteract the possible lack of tuning in the kernel
> flushing behavior. I think in light of that we should go with your first
> patch that just changes the default, possibly with the documentation updated
> a bit.

default instead of removing the option, with a more explicit call-out of
the '90%', as suggested by Michael on the other patch.

Any further comments or thoughts on this one?

Thanks,

Stephen

Stephen Frost <[hidden email]> writes:
> Any further comments or thoughts on this one?

This:

+ total time between checkpoints. The default is 0.9, which spreads the
+ checkpoint across the entire checkpoint timeout period of time,

is confusing because 0.9 is obviously not 1.0; people will wonder
whether the scale is something strange or the text is just wrong.
They will also wonder why not use 1.0 instead. So perhaps more like

    ... The default is 0.9, which spreads the checkpoint across almost
    all the available interval, providing fairly consistent I/O load
    while also leaving some slop for checkpoint completion overhead.

The other chunk of text seems accurate, but there's no reason to let
this one be misleading.

regards, tom lane

Greetings,
* Tom Lane ([hidden email]) wrote:
> Stephen Frost <[hidden email]> writes:
> > Any further comments or thoughts on this one?
>
> This:
>
> + total time between checkpoints. The default is 0.9, which spreads the
> + checkpoint across the entire checkpoint timeout period of time,
>
> is confusing because 0.9 is obviously not 1.0; people will wonder
> whether the scale is something strange or the text is just wrong.
> They will also wonder why not use 1.0 instead. So perhaps more like
>
> ... The default is 0.9, which spreads the checkpoint across almost
> all the available interval, providing fairly consistent I/O load
> while also leaving some slop for checkpoint completion overhead.
>
> The other chunk of text seems accurate, but there's no reason to let
> this one be misleading.

In passing, I noticed that we have a lot of documentation like:

    This parameter can only be set in the postgresql.conf file or on the
    server command line.

... which hasn't been true since the introduction of ALTER SYSTEM. I
don't really think it's this patch's job to clean that up but it doesn't
seem quite right that we don't include ALTER SYSTEM in that list either.
If this was C code, maybe we could get away with just changing such
references as we find them, but I don't think we'd want the
documentation to be in an inconsistent state regarding that.

Anyone want to opine about what to do with that? Should we consider
changing those to mention ALTER SYSTEM? Or perhaps have a way of saying
"at server start" that then links to "how to set options at server
start", perhaps..

Thanks,

Stephen

Stephen Frost <[hidden email]> writes:
> In passing, I noticed that we have a lot of documentation like:

> This parameter can only be set in the postgresql.conf file or on the
> server command line.

> ... which hasn't been true since the introduction of ALTER SYSTEM.

Well, it's still true if you understand "the postgresql.conf file"
to cover whatever's included by postgresql.conf, notably
postgresql.auto.conf (and the include facility existed long before
that, too, so you needed the expanded interpretation even then).
Still, I take your point that it's confusing.

I like your suggestion of shortening all of these to be "can only be set
at server start", or maybe better "cannot be changed after server start".
I'm not sure whether or not we really need new text elsewhere; I think
section 20.1 is pretty long already.

regards, tom lane