max_wal_size

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

max_wal_size

PG Doc comments form
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/12/runtime-config-wal.html
Description:

About parameter max_wal_size the documentation says that "Maximum size to
let the WAL grow to between automatic WAL checkpoints."
Is it correct?
The size between automatic WAL checkpoints or the size of the whole pg_wal
directory?
Reply | Threaded
Open this post in threaded view
|

Re: max_wal_size

David G Johnston
On Wed, May 27, 2020 at 8:18 AM PG Doc comments form <[hidden email]> wrote:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/12/runtime-config-wal.html
Description:

About parameter max_wal_size the documentation says that "Maximum size to
let the WAL grow to between automatic WAL checkpoints."
Is it correct?
The size between automatic WAL checkpoints or the size of the whole pg_wal
directory?

Reading on... "This is a soft limit; WAL size can exceed max_wal_size under special circumstances"

The database doesn't provide an option that would basically mean "make sure the wal directory doesn't get larger than X even if that means that it will become corrupted should it crash".

This setting is the indirect means to ensure that the WAL directory doesn't get too large by forcing a checkpoint thus allowing the corresponding WAL to be removed.

David J.

Reply | Threaded
Open this post in threaded view
|

Re: max_wal_size

Pavel Luzanov
David,

> This setting is the indirect means to ensure that the WAL directory
> doesn't get too large by forcing a checkpoint thus allowing the
> corresponding WAL to be removed.


This is a soft limit, ok.
But the question is a little different.

Suppose we have: version >= 11, no replication slots, archive_mode =
off.
Checkpoint_timeout is big enough, so checkpoints triggered only by
max_wal_size (1GB).
checkpoint_completion_target = 1.

What size of WAL files will be generated between checkpoints?
1GB or 0.5GB?

As I understand the description of max_wal_size(Maximum size to let the
WAL grow to between automatic WAL checkpoints), the answer is 1GB.
But it seems that the right answer is 0.5GB.

-----
Pavel Luzanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: max_wal_size

David G Johnston
On Wed, May 27, 2020 at 9:17 AM <[hidden email]> wrote:
David,

> This setting is the indirect means to ensure that the WAL directory
> doesn't get too large by forcing a checkpoint thus allowing the
> corresponding WAL to be removed.


This is a soft limit, ok.
But the question is a little different.

Suppose we have: version >= 11, no replication slots, archive_mode =
off.
Checkpoint_timeout is big enough, so checkpoints triggered only by
max_wal_size (1GB).
checkpoint_completion_target = 1.

What size of WAL files will be generated between checkpoints?
1GB or 0.5GB?

As I understand the description of max_wal_size(Maximum size to let the
WAL grow to between automatic WAL checkpoints), the answer is 1GB.
But it seems that the right answer is 0.5GB.


Given how long it took me to come up with the answer I'm not going to claim the documentation shouldn't be improved...or that the following is even correct...especially not having performed tests

I see where you are coming from better now - in your example the system operates under the simultaneous constraints that the directory should not take up more than X amount of space and also that it wants zero wait time between the end of the last checkpoint and the start of the next one - where the next one will start at the X amount mark.  The (unstated) goal is to minimize I/O throughput allocated to WAL.  Thus it should write out half of the maximum data in exactly the same amount of time that it takes for a new half of the maximum data to accumulate.  If it writes any slower it will have to wait at the end.

For 0.5 you get 2/3rds consumption: ( n / ( 1 + 0.5  ) ) = n * 2/3 - though my head is starting to hurt at the moment to fully explain the timing pattern.  Unlike the 1.0 case there is downtime where non-checkpoint induced writing is performed and the rate is chosen, combined with that, so that some time is left at the end of each cycle.

David J.

Reply | Threaded
Open this post in threaded view
|

Re: max_wal_size

Pavel Luzanov
David,

> For 0.5 you get 2/3rds consumption: ( n / ( 1 + 0.5  ) ) = n * 2/3

Exactly, for checkpoint_completion_target=0.5 the distance between
checkpoints is 2/3 of max_wal_size.

But back to the documentation of max_wal_size.
"Maximum size to let the WAL grow to between automatic WAL checkpoints."

For me it looks like the distance between checkpoints is equal to
max_wal_size.

English is not my native language, so I admit that I misunderstand the
exact meaning of this phrase.

P.S. Sorry to post this topic twice.
https://www.postgresql.org/message-id/97af9ef7-d7d9-cfd0-a280-d400cf05c68a%40postgrespro.ru
-----
Pavel Luzanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: max_wal_size

David G Johnston
On Wed, May 27, 2020, 14:12 <[hidden email]> wrote:
David,

> For 0.5 you get 2/3rds consumption: ( n / ( 1 + 0.5  ) ) = n * 2/3

Exactly, for checkpoint_completion_target=0.5 the distance between
checkpoints is 2/3 of max_wal_size.

But back to the documentation of max_wal_size.
"Maximum size to let the WAL grow to between automatic WAL checkpoints."

For me it looks like the distance between checkpoints is equal to
max_wal_size.

English is not my native language, so I admit that I misunderstand the
exact meaning of this phrase.


I think part of the issue is your operating with a mental model of this contrary to the one the documentation was written for.  Specifically if you rely on a size trigger and set completion target to zero I believe you should get something close to the full 10gb you are looking for.  But there is a reason why the wording for completion target uses the word "time" - the mental model is that time is the typical driving factor and that the size component is simply there to handle unusually high peak volume by setting a ceiling that will indeed immediately trigger an automatic WAL checkpoint.

In any case "grow to" means that it doesn't matter whether at the end of the previous checkpoint 1gb or 9gb were in use - once the usage goes "up to" 10gp the automatic checkpoint will trigger.  If the word "to" wasn't there your interpretation would be correct.  Its a fair argument to say such an important distinction shouldn't be placed on the word "to" but it isn't wrong.

David J.
Reply | Threaded
Open this post in threaded view
|

Re: max_wal_size

Pavel Luzanov
David,

I think part of the issue is your operating with a mental model of
this contrary to the one the documentation was written for.

I made this assumption(checkpoints triggered by size, checkpoint_completion_target=1) only for simplifying example.
(Although there is a practical task: calculate the possible approximate size of all WAL files. This is especially important for versions prior to 11, where WAL files were stored for the previous checkpoint.)

In any case "grow to" means that it doesn't matter whether at the end
of the previous checkpoint 1gb or 9gb were in use - once the usage
goes "up to" 10gp the automatic checkpoint will trigger.  If the word
"to" wasn't there your interpretation would be correct.  Its a fair
argument to say such an important distinction shouldn't be placed on
the word "to" but it isn't wrong.

Thank you for an explanation. But give me another chance to explain.

Now we have the glossary. According to the glossary, a checkpoint has two meanings.

The first meaning is that this is a point:
"A point in the WAL sequence at which it is guaranteed that the heap and index data files have been updated with all information from shared memory modified before that checkpoint; a checkpoint record is written and flushed to WAL to mark that point."

Simple illustration:
                A           B           C
WAL |-----------|-----------|-----------|----------->

Points A,B,C are records in the WAL.

The second meaning is that this is an action (performed by checkpointer process or CP):
"A checkpoint is also the act of carrying out all the actions that are necessary to reach a checkpoint as defined above."

Add a process to the illustration. Checkpoint_completion_target here is the default value of 0.5:

                A           B           C
WAL |-----------|-----+-----|-----+-----|-----+----->

                   a           b           c  
CP              +-----+     +-----+     +-----+
                b     e     b     e     b     e

Checkpointer process for point A starts at a(b) and ends at a(e).
At the end of the work, CP makes a WAL record(plus symbol on WAL line).

Back to the max_wal_size description:
"Maximum size to let the WAL grow to between automatic WAL checkpoints."

As I understand now, this is true when we talk about the checkpoint as a process.
"Maximum size to let the WAL grow to between a(b) and b(e), between b(b) and c(e), etc".

But this not true when we talk about the checkpoint as a point.
"Maximum size to let the WAL grow to between A and B".

My concerns that someone reading the description of the parameter may confuse the two meanings of checkpoint: as a point and as a process. After all, this happened to me.

-----
Pavel Luzanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Reply | Threaded
Open this post in threaded view
|

Re: max_wal_size

David G Johnston
On Thu, May 28, 2020 at 2:21 AM <[hidden email]> wrote:

Now we have the glossary. According to the glossary, a checkpoint has two meanings.

The first meaning is that this is a point:
"A point in the WAL sequence at which it is guaranteed that the heap and index data files have been updated with all information from shared memory modified before that checkpoint; a checkpoint record is written and flushed to WAL to mark that point."

The second meaning is that this is an action (performed by checkpointer process or CP):
"A checkpoint is also the act of carrying out all the actions that are necessary to reach a checkpoint as defined above."

As I understand now, this is true when we talk about the checkpoint as a process.
"Maximum size to let the WAL grow to between a(b) and b(e), between b(b) and c(e), etc".

But this not true when we talk about the checkpoint as a point.
"Maximum size to let the WAL grow to between A and B".

My concerns that someone reading the description of the parameter may confuse the two meanings of checkpoint: as a point and as a process. After all, this happened to me.

Illuminating perspective.  The question for the documentation then becomes do we introduce the concept of a "checkpoint point/record" here to try and clear up a possible (but I'd argue rare) misunderstanding at the risk of complicating things for the reader who only has the process meaning in mind.  This is a tuning knob for the system and, as with the other two that reference "time", it almost by necessity deals with a process/verb interpretation of checkpoint, not the point/noun one.  While I'm quite familiar with the presence of the noun meaning it didn't even occur to me to frame the discussion in those terms.  As above, I'm not immediately convinced that that is a bad thing.

I don't know whether a change along those lines to the configuration reference page is useful or not - I wouldn't make one without a comprehensive re-read of the actual background/learning section of the documentation.  My gut instinct is that I just don't think the documentation can do this situation justice and that, as illustrated by this thread, there are better resources available for these less common situations.

That said adding something akin to your diagram and showing how the different parts relate to each other is definitely worth considering - though probably in the main section, not the reference section.

David J.

Reply | Threaded
Open this post in threaded view
|

Re: max_wal_size

Pavel Luzanov
David,

> I don't know whether a change along those lines to the configuration
> reference page is useful or not - I wouldn't make one without a
> comprehensive re-read of the actual background/learning section of the
> documentation.  My gut instinct is that I just don't think the
> documentation can do this situation justice and that, as illustrated
> by this thread, there are better resources available for these less
> common situations.

Hm, After some cool down period, I looked with a fresh look.

I'm sure there is no need to make changes to the configuration reference
page.
And I'm not sure that there is a need to make changes to the main
section.

In any case, it was interesting discussion. Thank you.

-----
Pavel Luzanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: max_wal_size

Bruce Momjian
In reply to this post by Pavel Luzanov
On Thu, May 28, 2020 at 12:21:14PM +0300, [hidden email] wrote:

> WAL |-----------|-----+-----|-----+-----|-----+----->
>
>                    a           b           c  
> CP              +-----+     +-----+     +-----+
>                 b     e     b     e     b     e
>
> Checkpointer process for point A starts at a(b) and ends at a(e).
> At the end of the work, CP makes a WAL record(plus symbol on WAL line).
>
> Back to the max_wal_size description:
> "Maximum size to let the WAL grow to between automatic WAL checkpoints."
>
> As I understand now, this is true when we talk about the checkpoint as a
> process.
> "Maximum size to let the WAL grow to between a(b) and b(e), between b(b) and c
> (e), etc".

What if we say:

        "Maximum size to let the WAL grow during automatic WAL checkpoints."

That highlights the "process" part.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



Reply | Threaded
Open this post in threaded view
|

Re: max_wal_size

Pavel Luzanov
Hello,

On 02.06.2020 22:35, Bruce Momjian wrote:

> On Thu, May 28, 2020 at 12:21:14PM +0300, [hidden email] wrote:
>> WAL |-----------|-----+-----|-----+-----|-----+----->
>>
>>                     a           b           c
>> CP              +-----+     +-----+     +-----+
>>                  b     e     b     e     b     e
>>
>> Checkpointer process for point A starts at a(b) and ends at a(e).
>> At the end of the work, CP makes a WAL record(plus symbol on WAL line).
>>
>> Back to the max_wal_size description:
>> "Maximum size to let the WAL grow to between automatic WAL checkpoints."
>>
>> As I understand now, this is true when we talk about the checkpoint as a
>> process.
>> "Maximum size to let the WAL grow to between a(b) and b(e), between b(b) and c
>> (e), etc".
> What if we say:
>
> "Maximum size to let the WAL grow during automatic WAL checkpoints."
>
> That highlights the "process" part.
>
Really, "during" is associated with the process, while "between" with
points.

If there is no objection, simpe patch attached.

-----
Pavel Luzanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



docs_max_wal_size.patch (616 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: max_wal_size

Bruce Momjian
On Tue, Jun  2, 2020 at 11:42:01PM +0300, Pavel Luzanov wrote:

> Hello,
>
> On 02.06.2020 22:35, Bruce Momjian wrote:
> > On Thu, May 28, 2020 at 12:21:14PM +0300, [hidden email] wrote:
> > > WAL |-----------|-----+-----|-----+-----|-----+----->
> > >
> > >                     a           b           c
> > > CP              +-----+     +-----+     +-----+
> > >                  b     e     b     e     b     e
> > >
> > > Checkpointer process for point A starts at a(b) and ends at a(e).
> > > At the end of the work, CP makes a WAL record(plus symbol on WAL line).
> > >
> > > Back to the max_wal_size description:
> > > "Maximum size to let the WAL grow to between automatic WAL checkpoints."
> > >
> > > As I understand now, this is true when we talk about the checkpoint as a
> > > process.
> > > "Maximum size to let the WAL grow to between a(b) and b(e), between b(b) and c
> > > (e), etc".
> > What if we say:
> >
> > "Maximum size to let the WAL grow during automatic WAL checkpoints."
> >
> > That highlights the "process" part.
> >
> Really, "during" is associated with the process, while "between" with
> points.
>
> If there is no objection, simple patch attached.

Applied to all supported versions, thanks.

--
  Bruce Momjian  <[hidden email]>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee