Berserk Autovacuum (let's save next Mandrill)


Berserk Autovacuum (let's save next Mandrill)

Darafei "Komяpa" Praliaskouski
Hi hackers,

Attached is a sketch of a small patch that fixes several edge cases with autovacuum. Long story short: autovacuum never comes to append-only tables, killing large production systems.

First case, mine.
 https://www.postgresql.org/message-id/CAC8Q8tLBeAxR%2BBXWuKK%2BHP5m8tEVYn270CVrDvKXt%3D0PkJTY9g%40mail.gmail.com

We had a table we were appending to and wanted Index Only Scan to work. For that you need to run VACUUM manually, since VACUUM is the only thing that marks pages all-visible, and autovacuum never comes to append-only tables. We were clever enough to invent a workflow without dead tuples, and it bit us painfully.

Second case, just read in the news.
https://mailchimp.com/what-we-learned-from-the-recent-mandrill-outage/

Mandrill has a 6TB append-only table that autovacuum probably never vacuumed. Then the anti-wraparound vacuum came and production went down. If autovacuum had done its job before that last moment, it would probably have been okay.

Idea: look not at dead tuples but at changes, just like ANALYZE does.
It's my first patch for Postgres; it's probably all wrong, but I hope it helps you get the idea.
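The difference can be sketched like this — a simplified model of the two trigger formulas with their default GUC values (50 + 0.2 × reltuples for vacuum, 50 + 0.1 × reltuples for analyze); the function names are illustrative, not the actual code in autovacuum.c:

```python
# Simplified model of autovacuum's trigger logic (the real code lives in
# relation_needs_vacanalyze in autovacuum.c).  Defaults shown are
# autovacuum_vacuum_threshold/scale_factor and the analyze equivalents.

def needs_autovacuum(dead_tuples, reltuples,
                     threshold=50, scale_factor=0.2):
    """Current rule: only dead tuples count, so an append-only table
    (dead_tuples == 0) is never chosen."""
    return dead_tuples > threshold + scale_factor * reltuples

def needs_autoanalyze(changed_tuples, reltuples,
                      threshold=50, scale_factor=0.1):
    """ANALYZE's rule: inserts, updates and deletes all count."""
    return changed_tuples > threshold + scale_factor * reltuples

# A 10M-row append-only table after 5M further inserts:
print(needs_autovacuum(0, 10_000_000))           # False: never vacuumed
print(needs_autoanalyze(5_000_000, 10_000_000))  # True: analyzed as usual
```

The patch's idea is essentially to feed the vacuum decision the same "changes" counter the analyze decision already uses.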
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa

Attachment: autovacuum_berserk_v1.patch (3K)

Re: Berserk Autovacuum (let's save next Mandrill)

Alvaro Herrera
On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:

> Attached is sketch of small patch that fixes several edge cases with
> autovacuum. Long story short autovacuum never comes to append only tables,
> killing large productions.

Yeah, autovac is not coping with these scenarios (and probably others).
However, rather than taking your patch's idea verbatim, I think we
should have autovacuum use separate actions for those two (wildly
different) scenarios.  For example:

* certain tables would have some sort of partial scan that sets the
  visibility map.  There's no reason to invoke the whole vacuuming
  machinery.  I don't think this is limited to append-only tables, but
  rather those are just the ones that are affected the most.

* tables nearing wraparound danger should use the (yet to be committed)
  option to skip index cleaning, which makes the cleanup action faster.
  Again, no need for complete vacuuming.


--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Berserk Autovacuum (let's save next Mandrill)

Darafei "Komяpa" Praliaskouski
Hi,

On Thu, 28 Mar 2019 at 00:32, Alvaro Herrera <[hidden email]> wrote:
On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:

> Attached is sketch of small patch that fixes several edge cases with
> autovacuum. Long story short autovacuum never comes to append only tables,
> killing large productions.

Yeah, autovac is not coping with these scenarios (and probably others).
However, rather than taking your patch's idea verbatim, I think we
should have autovacuum use separate actions for those two (wildly
different) scenarios.  For example:

* certain tables would have some sort of partial scan that sets the
  visibility map.  There's no reason to invoke the whole vacuuming
  machinery.  I don't think this is limited to append-only tables, but
  rather those are just the ones that are affected the most.

What other machinery runs on a VACUUM invocation that is not wanted there?
Since Postgres 11, index cleanup is already skipped on append-only tables.
 
* tables nearing wraparound danger should use the (yet to be committed)
  option to skip index cleaning, which makes the cleanup action faster.
  Again, no need for complete vacuuming.

"Nearing wraparound" is too late already. On Amazon, reading a table from gp2 after you have exhausted your IOPS burst budget is like reading from a floppy drive; you have to freeze a lot earlier than you hit several terabytes of unfrozen data, or you're dead like Mandrill's Search and Url tables from the link I shared.
 


--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa

Re: Berserk Autovacuum (let's save next Mandrill)

Alvaro Herrera
On 2019-Mar-28, Darafei "Komяpa" Praliaskouski wrote:


> On Thu, 28 Mar 2019 at 00:32, Alvaro Herrera <[hidden email]> wrote:
>
> > On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:

> > * certain tables would have some sort of partial scan that sets the
> >   visibility map.  There's no reason to invoke the whole vacuuming
> >   machinery.  I don't think this is limited to append-only tables, but
> >   rather those are just the ones that are affected the most.
>
> What other machinery runs on VACUUM invocation that is not wanted there?
> Since Postgres 11 index cleanup is already skipped on append-only tables.

Well, I think it would be useful to set all-visible earlier than waiting
for a vacuum to be necessary, even for tables that are not append-only.
So if you think about this just for the append-only table, you leave
money on the table.

> > * tables nearing wraparound danger should use the (yet to be committed)
> >   option to skip index cleaning, which makes the cleanup action faster.
> >   Again, no need for complete vacuuming.
>
> "Nearing wraparound" is too late already. In Amazon, reading table from gp2
> after you exhausted your IOPS burst budget is like reading a floppy drive,
> you have to freeze a lot earlier than you hit several terabytes of unfrozen
> data, or you're dead like Mandrill's Search and Url tables from the link I
> shared.

OK, then start freezing tuples in the cheap mode (skip index updates)
earlier than that.  I suppose a good question is when to start.


I wonder if Mandrill's problem is related to Mailchimp raising the
freeze_max_age to a point where autovac did not have enough time to
react with an emergency vacuum.  If you keep raising that value because
the vacuums cause problems for you (they block DDL), there's something
wrong.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Berserk Autovacuum (let's save next Mandrill)

David Rowley
On Thu, 28 Mar 2019 at 11:01, Alvaro Herrera <[hidden email]> wrote:

>
> On 2019-Mar-28, Darafei "Komяpa" Praliaskouski wrote:
> > "Nearing wraparound" is too late already. In Amazon, reading table from gp2
> > after you exhausted your IOPS burst budget is like reading a floppy drive,
> > you have to freeze a lot earlier than you hit several terabytes of unfrozen
> > data, or you're dead like Mandrill's Search and Url tables from the link I
> > shared.
>
> OK, then start freezing tuples in the cheap mode (skip index updates)
> earlier than that.  I suppose a good question is when to start.

I thought recently that it would be good to have some sort of
pro-active auto-vacuum mode that made use of idle workers.  Probably
there would need to be some mode flag that mentioned which workers
were in proactive mode, so that these could be cancelled when more
pressing work came in.  I don't have an exact idea of what
"pro-active" would actually be defined as, but I know that when
consuming a single transaction ID causes terabytes of tables to
suddenly need an anti-wraparound vacuum, it's not a good situation
to be in. Perhaps getting to some percentage of
autovacuum_freeze_max_age could be classed as pro-active.
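One way to picture that last suggestion — trigger a cheap freeze pass once a table's relfrozenxid age crosses some fraction of autovacuum_freeze_max_age, rather than waiting for the full limit. The fraction and the function name here are hypothetical; only the GUC default (200 million) is real:

```python
# Hypothetical "pro-active" trigger: freeze early, at a fraction of the
# wraparound limit.  AUTOVACUUM_FREEZE_MAX_AGE is the GUC's default value.

AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000

def wants_proactive_freeze(relfrozenxid_age, fraction=0.5):
    # fraction=0.5 is an arbitrary illustrative choice
    return relfrozenxid_age > fraction * AUTOVACUUM_FREEZE_MAX_AGE

print(wants_proactive_freeze(90_000_000))   # False: still young
print(wants_proactive_freeze(120_000_000))  # True: freeze early, cheaply
```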

> I wonder if Mandrill's problem is related to Mailchimp raising the
> freeze_max_age to a point where autovac did not have enough time to
> react with an emergency vacuum.  If you keep raising that value because
> the vacuums cause problems for you (they block DDL), there's something
> wrong.

I have seen some very high autovacuum_freeze_max_age settings
recently. It would be interesting to know what they had theirs set to.
I see they mentioned "Search and Url tables". I can imagine "search"
never needs any UPDATEs, so quite possibly those were append-only, in
which case the anti-wraparound vacuum would have had quite a lot of
work on its hands, since possibly every page needed freezing. A table
receiving regular auto-vacuums from dead tuples would likely get some
pages frozen during those.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Berserk Autovacuum (let's save next Mandrill)

Darafei "Komяpa" Praliaskouski
In reply to this post by Alvaro Herrera


On Thu, 28 Mar 2019 at 01:01, Alvaro Herrera <[hidden email]> wrote:
On 2019-Mar-28, Darafei "Komяpa" Praliaskouski wrote:


> On Thu, 28 Mar 2019 at 00:32, Alvaro Herrera <[hidden email]> wrote:
>
> > On 2019-Mar-27, Darafei "Komяpa" Praliaskouski wrote:

> > * certain tables would have some sort of partial scan that sets the
> >   visibility map.  There's no reason to invoke the whole vacuuming
> >   machinery.  I don't think this is limited to append-only tables, but
> >   rather those are just the ones that are affected the most.
>
> What other machinery runs on VACUUM invocation that is not wanted there?
> Since Postgres 11 index cleanup is already skipped on append-only tables.

Well, I think it would be useful to set all-visible earlier than waiting
for a vacuum to be necessary, even for tables that are not append-only.
So if you think about this just for the append-only table, you leave
money on the table.

Thing is, the problem does not exist for non-append-only tables: they get vacuumed automatically once enough rows are updated (the default threshold is 50 rows plus 20% of the table).
 

> > * tables nearing wraparound danger should use the (yet to be committed)
> >   option to skip index cleaning, which makes the cleanup action faster.
> >   Again, no need for complete vacuuming.
>
> "Nearing wraparound" is too late already. In Amazon, reading table from gp2
> after you exhausted your IOPS burst budget is like reading a floppy drive,
> you have to freeze a lot earlier than you hit several terabytes of unfrozen
> data, or you're dead like Mandrill's Search and Url tables from the link I
> shared.

OK, then start freezing tuples in the cheap mode (skip index updates)
earlier than that.  I suppose a good question is when to start.

The attached code (autovacuum_berserk_v1.patch) achieves that. For append-only tables, since https://commitfest.postgresql.org/16/952/ vacuum skips index cleanup if no updates happened. You just need to trigger it, and it will already be "cheap".

--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa

Re: Berserk Autovacuum (let's save next Mandrill)

Darafei "Komяpa" Praliaskouski
In reply to this post by David Rowley


On Thu, Mar 28, 2019 at 2:36 AM David Rowley <[hidden email]> wrote:
On Thu, 28 Mar 2019 at 11:01, Alvaro Herrera <[hidden email]> wrote:
>
> On 2019-Mar-28, Darafei "Komяpa" Praliaskouski wrote:
> > "Nearing wraparound" is too late already. In Amazon, reading table from gp2
> > after you exhausted your IOPS burst budget is like reading a floppy drive,
> > you have to freeze a lot earlier than you hit several terabytes of unfrozen
> > data, or you're dead like Mandrill's Search and Url tables from the link I
> > shared.
>
> OK, then start freezing tuples in the cheap mode (skip index updates)
> earlier than that.  I suppose a good question is when to start.

I thought recently that it would be good to have some sort of
pro-active auto-vacuum mode that made use of idle workers. 

The problem with "idle" is that it never happens on systems that will reach wraparound within their lifetime. This has to be part of normal database functioning.

Why not select tables with inserts, updates and deletes for autovacuum, just like we do for autoanalyze, instead of only deletes and updates as we do now?


--
Darafei Praliaskouski

Re: Berserk Autovacuum (let's save next Mandrill)

David Rowley
On Thu, 28 Mar 2019 at 22:04, Darafei "Komяpa" Praliaskouski
<[hidden email]> wrote:
>
> On Thu, Mar 28, 2019 at 2:36 AM David Rowley <[hidden email]> wrote:
>> I thought recently that it would be good to have some sort of
>> pro-active auto-vacuum mode that made use of idle workers.
>
> Problem with "idle" is that it never happens on system that are going to wraparound on their lifetime. This has to be a part of normal database functioning.

I'd say auto-vacuum is configured to run too slowly if you never have
an idle worker. The chances that it happens to be running at exactly
the right speed to keep up with demand must be close to nil.

> Why not select a table that has inserts, updates and deletes for autovacuum just like we do for autoanalyze, not only deletes and updates like we do now?

Sounds like a good idea, although I do agree with Alvaro when he
mentions that it would be good to invoke a worker that was only
going to freeze tuples and not look at the indexes. I've not looked at
it, but there's a patch [1] in the current CF for that.  I'd say a
good course of action would be to review that, then write a patch with
a new bool flag in relation_needs_vacanalyze for "freezeonly" and have
auto-vacuum invoke vacuum in this new freeze-only mode if freezeonly
is set and dovacuum is not.

Any patch not in the current CF is already PG13 or beyond. Having at
least a freeze-only vacuum mode may ease some pain, even if it still
needs to be done manually for anyone finding themselves in a similar
situation to Mailchimp.

The idea I was mentioning was more targeted to ease the sudden rush of
auto-vacuum activity when suddenly a bunch of large tables require an
anti-wraparound vacuum all at once.

[1] https://commitfest.postgresql.org/22/1817/

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Berserk Autovacuum (let's save next Mandrill)

Masahiko Sawada
On Thu, Mar 28, 2019 at 6:32 PM David Rowley
<[hidden email]> wrote:

>
> On Thu, 28 Mar 2019 at 22:04, Darafei "Komяpa" Praliaskouski
> <[hidden email]> wrote:
> >
> > On Thu, Mar 28, 2019 at 2:36 AM David Rowley <[hidden email]> wrote:
> >> I thought recently that it would be good to have some sort of
> >> pro-active auto-vacuum mode that made use of idle workers.
> >
> > Problem with "idle" is that it never happens on system that are going to wraparound on their lifetime. This has to be a part of normal database functioning.
>
> I'd say auto-vacuum is configured to run too slowly if you never have
> an idle worker. The chances that it happens to be running at exactly
> the right speed to keep up with demand must be about close to nil.
>
> > Why not select a table that has inserts, updates and deletes for autovacuum just like we do for autoanalyze, not only deletes and updates like we do now?
>
> Sounds like a good idea, although I do agree with Alvaro when he
> mentions that it would be good to only invoke a worker that was only
> going to freeze tuples and not look at the indexes.

Invoking autovacuum on a table based on inserts, not only deletes
and updates, seems like a good idea to me. But in this case, I think we
can not only freeze tuples but also update the visibility map, setting
pages all-visible. Roughly speaking, I think vacuum does the
following operations.

1. heap vacuum
2. HOT pruning
3. freezing tuples
4. updating visibility map (all-visible and all-frozen)
5. index vacuum/cleanup
6. truncation

With the proposed patch[1] we can control to do 5 or not. In addition
to that, another proposed patch[2] allows us to control 6.

For append-only tables (and similar tables), what we periodically want
to do would be 3 and 4 (possibly we can do 2 as well). So maybe we
need both an option for (auto)vacuum to control whether to do 1,
and something like a new autovacuum threshold (or an option) to invoke
a vacuum that disables 1, 5 and 6. A vacuum that does only 2, 3
and 4 would be much cheaper than today's vacuum, and the anti-wraparound
vacuum would be able to skip almost all pages.

[1] https://commitfest.postgresql.org/22/1817/
[2] https://commitfest.postgresql.org/22/1981/
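The decomposition above can be sketched as a phase-selection table — which of the six phases each kind of pass would run. The option names here are hypothetical; only the mapping of patches [1] and [2] to phases 5 and 6 comes from the message:

```python
# Sketch of the six vacuum phases from the list above and which ones a
# hypothetical cheap "freeze-only" pass would keep.

PHASES = {1: "heap vacuum", 2: "HOT pruning", 3: "freeze tuples",
          4: "update visibility map", 5: "index vacuum/cleanup",
          6: "truncation"}

def phases_to_run(skip_index_cleanup=False, skip_truncation=False,
                  freeze_only=False):
    if freeze_only:                 # the cheap pass for append-only tables
        return [2, 3, 4]
    phases = [1, 2, 3, 4, 5, 6]
    if skip_index_cleanup:          # what patch [1] would make optional
        phases.remove(5)
    if skip_truncation:             # what patch [2] would make optional
        phases.remove(6)
    return phases

print(phases_to_run(freeze_only=True))  # [2, 3, 4]
```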

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Berserk Autovacuum (let's save next Mandrill)

Darafei "Komяpa" Praliaskouski
In reply to this post by David Rowley


On Thu, Mar 28, 2019 at 12:32 PM David Rowley <[hidden email]> wrote:
On Thu, 28 Mar 2019 at 22:04, Darafei "Komяpa" Praliaskouski
<[hidden email]> wrote:
>
> On Thu, Mar 28, 2019 at 2:36 AM David Rowley <[hidden email]> wrote:
>> I thought recently that it would be good to have some sort of
>> pro-active auto-vacuum mode that made use of idle workers.
>
> Problem with "idle" is that it never happens on system that are going to wraparound on their lifetime. This has to be a part of normal database functioning.

I'd say auto-vacuum is configured to run too slowly if you never have
an idle worker. The chances that it happens to be running at exactly
the right speed to keep up with demand must be about close to nil.

> Why not select a table that has inserts, updates and deletes for autovacuum just like we do for autoanalyze, not only deletes and updates like we do now?

Sounds like a good idea, although I do agree with Alvaro when he
mentions that it would be good to only invoke a worker that was only
going to freeze tuples and not look at the indexes.

This is already the current behavior of VACUUM on tables without dead tuples. The issue is that nothing triggers this VACUUM apart from a user running VACUUM manually, or the super-late anti-wraparound vacuum.

Any patch not in the current CF is already PG13 or beyond. Having at
least a freeze only vacuum mode main ease some pain, even if it still
needs to be done manually for anyone finding themselves in a similar
situation as mailchimp.

If you're in a wraparound halt with a super-large table on Amazon gp2, nothing will help you; the issue is that there's no way to "rewrite all of it quickly". The burst limit lets the shared drive feel like an SSD for most of your load, but reading and re-writing all of the storage gets throttled, and there's no way to escape this quickly.

The process that freezes and marks pages all-visible has to run in parallel, at the speed of your backend pushing pages to disk — maybe lagging behind a bit, but never to the point of "we need to rescan the whole table".
 

The idea I was mentioning was more targeted to ease the sudden rush of
auto-vacuum activity when suddenly a bunch of large tables require an
anti-wraparound vacuum all at once.

[1] https://commitfest.postgresql.org/22/1817/

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Darafei Praliaskouski

Re: Berserk Autovacuum (let's save next Mandrill)

Darafei "Komяpa" Praliaskouski
In reply to this post by Masahiko Sawada
Hi,

> > Why not select a table that has inserts, updates and deletes for autovacuum just like we do for autoanalyze, not only deletes and updates like we do now?
>
> Sounds like a good idea, although I do agree with Alvaro when he
> mentions that it would be good to only invoke a worker that was only
> going to freeze tuples and not look at the indexes.

The invoking autovacuum on table based on inserts, not only deletes
and updates, seems good idea to me. But in this case, I think that we
can not only freeze tuples but also update visibility map even when
setting all-visible. Roughly speaking  I think vacuum does the
following operations.

1. heap vacuum
2. HOT pruning
Is it worth skipping it if we're writing the page anyway for the sake of hint bits and new xids? This will all be a no-op on append-only tables anyway, and happen only when we actually need something.
 
3. freezing tuples
4. updating visibility map (all-visible and all-frozen)
These two are needed, and the current autovacuum launch process does not take into account that they are also needed for non-dead tuples.
 
5. index vacuum/cleanup
There is a separate patch for that. But since https://commitfest.postgresql.org/16/952/, for almost a year already, Postgres has skipped index cleanup on tables without new dead tuples, so isn't this case taken care of already?
 
6. truncation
This shouldn't be a heavy operation?
 

With the proposed patch[1] we can control to do 5 or not. In addition
to that, another proposed patch[2] allows us to control 6.

For append-only tables (and similar tables), what we periodically want
to do would be 3 and 4 (possibly we can do 2 as well). So maybe we
need to have both an option of (auto)vacuum to control whether to do 1
and something like a new autovacuum threshold (or an option) to invoke
the vacuum that disables 1, 5 and 6. The vacuum that does only 2, 3
and 4 would be much cheaper than today's vacuum and anti-wraparound
vacuum would be able to skip almost pages.

Why would we want to get rid of 1? It's a no-op from a write perspective, and doing it in the same pass saves a scan when it's not a no-op.

Why make it faster in emergency situations, when the situation can be made non-emergency from the very beginning instead?
 

[1] https://commitfest.postgresql.org/22/1817/
[2] https://commitfest.postgresql.org/22/1981/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


--
Darafei Praliaskouski

Re: Berserk Autovacuum (let's save next Mandrill)

Masahiko Sawada
On Thu, Mar 28, 2019 at 8:58 PM Darafei "Komяpa" Praliaskouski
<[hidden email]> wrote:

>
> Hi,
>
> > > Why not select a table that has inserts, updates and deletes for autovacuum just like we do for autoanalyze, not only deletes and updates like we do now?
>>
>> >
>> > Sounds like a good idea, although I do agree with Alvaro when he
>> > mentions that it would be good to only invoke a worker that was only
>> > going to freeze tuples and not look at the indexes.
>>
>> The invoking autovacuum on table based on inserts, not only deletes
>> and updates, seems good idea to me. But in this case, I think that we
>> can not only freeze tuples but also update visibility map even when
>> setting all-visible. Roughly speaking  I think vacuum does the
>> following operations.
>>
>> 1. heap vacuum
>>
>> 2. HOT pruning
>
> Is it worth skipping it if we're writing a page anyway for the sake of hint bits and new xids? This will all be no-op anyway on append-only tables and happen only when we actually need something?
>

Yeah, these operations are required only when the table has actual
garbage. IOW, append-only tables never require them.

>>
>> 3. freezing tuples
>> 4. updating visibility map (all-visible and all-frozen)
>
> These two are needed, and current autovacuum launch process does not take into account that this is also needed for non-dead tuples.
>
>>
>> 5. index vacuum/cleanup
>
> There is a separate patch for that. But, since https://commitfest.postgresql.org/16/952/ for almost a year already Postgres skips index cleanup on tables without new dead tuples, so this case is taken care of already?

I think that's not enough. The "GUC for cleanup index threshold"
feature allows us to skip only index cleanup, and only when there have
been fewer insertions than the given fraction of the total number of
heap tuples since the last index cleanup. Therefore it helps only
append-only tables (and supports only btree indexes for now). We still
have to do index vacuuming even if the table has just a few dead
tuples. The proposed patch [1] helps this situation: vacuum can run
while skipping index vacuuming and index cleanup.
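The skip condition being described can be sketched as follows — a simplified model, not the actual btree vacuum code; only the GUC name and its default of 0.1 are taken as given:

```python
# Simplified model of the PG11 behaviour: btree index cleanup is skipped
# only when tuples inserted since the last cleanup stay below a fraction
# of the heap's total tuples (vacuum_cleanup_index_scale_factor, 0.1).

def skip_btree_cleanup(inserted_since_cleanup, reltuples,
                       cleanup_scale_factor=0.1):
    return inserted_since_cleanup < cleanup_scale_factor * reltuples

print(skip_btree_cleanup(50_000, 1_000_000))   # True: cleanup skipped
print(skip_btree_cleanup(500_000, 1_000_000))  # False: cleanup still runs
```

Note this decides only the cleanup phase; index *vacuuming* still happens whenever there are dead tuples, which is the gap the patch [1] addresses.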

>
>>
>> 6. truncation
>
> This shouldn't be a heavy operation?
>

I don't think so. It could take an AccessExclusiveLock on the table and
take a long time with large shared buffers, as reported on that
thread [2].

>>
>>
>> With the proposed patch[1] we can control to do 5 or not. In addition
>> to that, another proposed patch[2] allows us to control 6.
>>
>> For append-only tables (and similar tables), what we periodically want
>> to do would be 3 and 4 (possibly we can do 2 as well). So maybe we
>> need to have both an option of (auto)vacuum to control whether to do 1
>> and something like a new autovacuum threshold (or an option) to invoke
>> the vacuum that disables 1, 5 and 6. The vacuum that does only 2, 3
>> and 4 would be much cheaper than today's vacuum and anti-wraparound
>> vacuum would be able to skip almost pages.
>
>
> Why will we want to get rid of 1? It's a noop from write perspective and saves a scan to do it if it's not noop.
>

Because that's for tables that have many inserts but also some
updates/deletes. I think this strategy would help not only
append-only tables but also such tables.

> Why make it faster in emergency situations when situation can be made non-emergency from the very beginning instead?
>

I don't understand the meaning of "situation can be made non-emergency
from the very beginning". Could you please elaborate on that?


>> [1] https://commitfest.postgresql.org/22/1817/
>> [2] https://commitfest.postgresql.org/22/1981/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Berserk Autovacuum (let's save next Mandrill)

Michael Banck
In reply to this post by David Rowley
Hi,

On Thu, Mar 28, 2019 at 12:36:24PM +1300, David Rowley wrote:

> On Thu, 28 Mar 2019 at 11:01, Alvaro Herrera <[hidden email]> wrote:
> > I wonder if Mandrill's problem is related to Mailchimp raising the
> > freeze_max_age to a point where autovac did not have enough time to
> > react with an emergency vacuum.  If you keep raising that value because
> > the vacuums cause problems for you (they block DDL), there's something
> > wrong.
>
> I have seen some very high autovacuum_freeze_max_age settings
> recently. It would be interesting to know what they had theirs set to.
> I see they mentioned "Search and Url tables". I can imagine "search"
> never needs any UPDATEs, so quite possibly those were append-only, in
> which case the anti-wraparound vacuum would have had quite a lot of
> work on its hands since possibly every page needed frozen. A table
> receiving regular auto-vacuums from dead tuples would likely get some
> pages frozen during those.

By the way, the Routine Vacuuming chapter of the documentation says:

"The sole disadvantage of increasing autovacuum_freeze_max_age (and
vacuum_freeze_table_age along with it) is that the pg_xact and
pg_commit_ts subdirectories of the database cluster will take more space

[...]

If [pg_xact and pg_commit_ts taking 0.5 and 20 GB, respectively]
is trivial compared to your total database size, setting
autovacuum_freeze_max_age to its maximum allowed value is recommended."

Maybe this should be qualified with "unless you have trouble with your
autovacuum keeping up" or so; or generally reworded?
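For context, the documentation's "0.5 and 20 GB" figures follow directly from the on-disk record sizes: pg_xact stores 2 bits of commit status per transaction, and pg_commit_ts roughly 10 bytes per transaction, so with autovacuum_freeze_max_age near its 2-billion maximum:

```python
# Back-of-the-envelope arithmetic behind the documented space figures.

XIDS = 2_000_000_000                  # autovacuum_freeze_max_age maximum

pg_xact_bytes = XIDS * 2 / 8          # 2 commit-status bits per xact
pg_commit_ts_bytes = XIDS * 10        # ~10 bytes of timestamp data per xact

print(pg_xact_bytes / 1024**3)        # ≈ 0.47 GiB ("about 0.5 GB")
print(pg_commit_ts_bytes / 1024**3)   # ≈ 18.6 GiB ("about 20 GB")
```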


Michael



Re: Berserk Autovacuum (let's save next Mandrill)

Robert Haas
In reply to this post by Alvaro Herrera
On Wed, Mar 27, 2019 at 5:32 PM Alvaro Herrera <[hidden email]> wrote:
> * certain tables would have some sort of partial scan that sets the
>   visibility map.  There's no reason to invoke the whole vacuuming
>   machinery.  I don't think this is limited to append-only tables, but
>   rather those are just the ones that are affected the most.

I think this is a really good idea, but in order for it to work well I
think we would need to have some kind of estimate of autovacuum
pressure.

If we know that we're currently fairly on top of things, and there is
not much for autovacuum to do, periodically vacuuming a chunk of some
table that has a lot of unset visibility-map bits is probably a good
idea.  However, we can't possibly guess how aggressively to do this if
we have no idea how long it's going to be before we need to vacuum
that table for real.  If the number of XIDs remaining until the table
gets a wraparound vacuum is X, and the number of XIDs being consumed
per day is Y, we can estimate that in roughly X/Y days, we're going to
need to do a wraparound vacuum.  That value might be in the range of
months, or in the range of hours.

If it's months, we probably want to limit vacuum to working at a pretty
slow rate, say 1% of the table size per hour or something.  If it's in
hours, we need to be a lot more aggressive.  Right now we have no
information to tell us which of those things is the case, so we'd just
be shooting in the dark.
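The pacing estimate described above can be made concrete — from the XIDs remaining until the wraparound vacuum (X) and the XIDs consumed per day (Y), derive what fraction of the table a background freeze pass must cover per hour to finish in time. The function name and numbers are illustrative:

```python
# Sketch: pace the pro-active freeze work from the X/Y time estimate.

def required_coverage_per_hour(xids_remaining, xids_per_day,
                               table_fraction_unfrozen=1.0):
    days_left = xids_remaining / xids_per_day   # X / Y
    return table_fraction_unfrozen / (days_left * 24)

# Months of headroom (90 days): a gentle trickle suffices.
print(required_coverage_per_hour(180_000_000, 2_000_000) * 100)  # ≈ 0.046 %/hour

# Hours of headroom (half a day): must be far more aggressive.
print(required_coverage_per_hour(1_000_000, 2_000_000) * 100)    # ≈ 8.3 %/hour
```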

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Berserk Autovacuum (let's save next Mandrill)

Amit Langote
On Sun, Mar 31, 2019 at 1:11 AM Robert Haas <[hidden email]> wrote:

>
> On Wed, Mar 27, 2019 at 5:32 PM Alvaro Herrera <[hidden email]> wrote:
> > * certain tables would have some sort of partial scan that sets the
> >   visibility map.  There's no reason to invoke the whole vacuuming
> >   machinery.  I don't think this is limited to append-only tables, but
> >   rather those are just the ones that are affected the most.
>
> I think this is a really good idea, but in order for it to work well I
> think we would need to have some kind of estimate of autovacuum
> pressure.
>
> If we know that we're currently fairly on top of things, and there is
> not much for autovacuum to do, periodically vacuuming a chunk of some
> table that has a lot of unset visibility-map bits is probably a good
> idea.  However, we can't possibly guess how aggressively to do this if
> we have no idea how long it's going to be before we need to vacuum
> that table for real.  If the number of XIDs remaining until the table
> gets a wraparound vacuum is X, and the number of XIDs being consumed
> per day is Y, we can estimate that in roughly X/Y days, we're going to
> need to do a wraparound vacuum.  That value might be in the range of
> months, or in the range of hours.
>
> If it's months, we probably want limit vacuum to working at a pretty
> slow rate, say 1% of the table size per hour or something.  If it's in
> hours, we need to be a lot more aggressive.  Right now we have no
> information to tell us which of those things is the case, so we'd just
> be shooting in the dark.

Sawada-san presented some ideas in his PGCon 2018 talk that may be related.

https://www.pgcon.org/2018/schedule/attachments/488_Vacuum_More_Efficient_Than_Ever

(slide 32~)

Thanks,
Amit



Re: Berserk Autovacuum (let's save next Mandrill)

Vik Fearing
In reply to this post by Darafei "Komяpa" Praliaskouski
On 27/03/2019 21:54, Darafei "Komяpa" Praliaskouski wrote:

> Hi hackers,
>
> Attached is sketch of small patch that fixes several edge cases with
> autovacuum. Long story short autovacuum never comes to append only
> tables, killing large productions.
>
> First case, mine.
>  https://www.postgresql.org/message-id/CAC8Q8tLBeAxR%2BBXWuKK%2BHP5m8tEVYn270CVrDvKXt%3D0PkJTY9g%40mail.gmail.com
>
> We had a table we were appending and wanted Index Only Scan to work. For
> it to work, you need to call VACUUM manually, since VACUUM is the only
> way to mark pages all visible, and autovacuum never comes to append only
> tables. We were clever to invent a workflow without dead tuples and it
> painfully bit us.
>
> Second case, just read in the news.
> https://mailchimp.com/what-we-learned-from-the-recent-mandrill-outage/
>
> Mandrill has 6TB append only table that autovacuum probably never
> vacuumed. Then anti-wraparound came and production went down. If
> autovacuum did its job before that last moment, it would probably be okay.
>
> Idea: look not on dead tuples, but on changes, just like ANALYZE does.
> It's my first patch on Postgres, it's probably all wrong but I hope it
> helps you get the idea.

This was suggested and rejected years ago:
https://www.postgresql.org/message-id/b970f20f-f096-2d3a-6c6d-ee887bd30cfb@...
--
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: Berserk Autovacuum (let's save next Mandrill)

Darafei "Komяpa" Praliaskouski
In reply to this post by Masahiko Sawada


On Thu, Mar 28, 2019 at 6:43 PM Masahiko Sawada <[hidden email]> wrote:
>> 1. heap vacuum
>>
>> 2. HOT pruning
>
> Is it worth skipping it if we're writing a page anyway for the sake of hint bits and new xids? This will all be no-op anyway on append-only tables and happen only when we actually need something?
>

> Yeah, these operations are required only when the table has actual
> garbage. IOW, append-only tables never require them.

>>
>> 3. freezing tuples
>> 4. updating visibility map (all-visible and all-frozen)
>
> These two are needed, and current autovacuum launch process does not take into account that this is also needed for non-dead tuples.
>
>>
>> 5. index vacuum/cleanup
>
> There is a separate patch for that. But, since https://commitfest.postgresql.org/16/952/ for almost a year already Postgres skips index cleanup on tables without new dead tuples, so this case is taken care of already?

> I think that's not enough. The feature "GUC for cleanup index
> threshold" allows us to skip only index cleanup when there are less
> insertion than the fraction of the total number of heap tuples since
> last index cleanup. Therefore it helps only append-only tables (and
> supporting only btree index for now). We still have to do index
> vacuuming even if the table has just a few dead tuple. The proposed
> patch[1] helps this situation; vacuum can run while skipping index
> vacuuming and index cleanup.
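For reference, the skip condition that feature implements looks roughly like this (the scale value here is an assumed setting for illustration, not necessarily the shipped default):

```python
# "GUC for cleanup index threshold": skip the btree index cleanup pass when
# few enough tuples were inserted since the last cleanup, relative to the
# table size. Only applies when vacuum found no dead tuples to reclaim.

SCALE = 0.1  # assumed vacuum_cleanup_index_scale_factor value

def skip_index_cleanup(inserted_since_cleanup: int, reltuples: int) -> bool:
    return inserted_since_cleanup < SCALE * reltuples

print(skip_index_cleanup(5_000, 100_000))   # True: cleanup skipped
print(skip_index_cleanup(20_000, 100_000))  # False: cleanup still runs
```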

So, the patch I posted could technically be applied after https://commitfest.postgresql.org/22/1817/ gets merged?

The change with my patch is that for a table with 49 insertions and one delete:
 - previously, autovacuum would wait for 49 more deletes by default (ignoring the insertions), and only then clean up both the table and the indexes;
 - with the patch, it will freeze the inserted tuples, update the VM, and scan the index.
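As a sketch of the two trigger rules (using the stock formula from the docs with its default values; the changes-based variant is the patch's idea, not existing behavior):

```python
# Stock rule: vacuum when dead tuples exceed
#   autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples.
# Patched idea: drive the same formula with changed tuples (as autoanalyze
# does), so pure inserts eventually trigger a freeze/VM-setting vacuum.

THRESHOLD = 50      # default autovacuum_vacuum_threshold
SCALE_FACTOR = 0.2  # default autovacuum_vacuum_scale_factor

def stock_trigger(dead_tuples: int, reltuples: int) -> bool:
    return dead_tuples > THRESHOLD + SCALE_FACTOR * reltuples

def patched_trigger(changed_tuples: int, reltuples: int) -> bool:
    return changed_tuples > THRESHOLD + SCALE_FACTOR * reltuples

# 49 inserts + 1 delete on a small table:
print(stock_trigger(1, 0))     # False: the 49 inserts are invisible
print(patched_trigger(50, 0))  # False: exactly at the threshold
print(patched_trigger(51, 0))  # True: one more change tips it over
```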

In my experience only a btree index requires a slow full index scan; that's why it alone was covered by the "GUC for cleanup index
threshold" patch. Am I wrong, and do other index types also perform a full index scan on vacuum after the deletion of a single tuple?

 
>> 6. truncation
>
> This shouldn't be a heavy operation?
>

> I don't think so. This could take AccessExclusiveLock on the table and
> take a long time with large shared buffer as per reported on that
> thread[2].

While this can be a useful optimization, I believe it is out of scope for this patch. I want to fix vacuum never coming to append-only tables without breaking other behaviors. Truncation likely implies enough dead tuples to trigger a vacuum via the existing mechanisms.
 
>>
>>
>> With the proposed patch[1] we can control to do 5 or not. In addition
>> to that, another proposed patch[2] allows us to control 6.
>>
>> For append-only tables (and similar tables), what we periodically want
>> to do would be 3 and 4 (possibly we can do 2 as well). So maybe we
>> need to have both an option of (auto)vacuum to control whether to do 1
>> and something like a new autovacuum threshold (or an option) to invoke
>> the vacuum that disables 1, 5 and 6. The vacuum that does only 2, 3
>> and 4 would be much cheaper than today's vacuum and anti-wraparound
>> vacuum would be able to skip almost pages.
>
>
> Why will we want to get rid of 1? It's a noop from write perspective and saves a scan to do it if it's not noop.
>

> Because that's for tables that have many inserts but have some
> updates/deletes. I think that this strategy would help not only
> append-only tables but also such tables.

How much do we save by skipping a heap vacuum on an almost-append-only table, where the number of updates is below the current threshold of 50?
 

> Why make it faster in emergency situations when situation can be made non-emergency from the very beginning instead?
>

> I don't understand the meaning of "situation can be made non-emergency
> from the very beginning". Could you please elaborate on that?

Let's imagine a simple append-only workload on Postgres with current default settings. You create a table and start inserting tuples, one per transaction. Let's say a page fits 50 tuples (my case for taxi movement data), on Amazon gp2 storage that caps you at 1000 IOPS in non-burst mode.
The anti-wraparound autovacuum (we need a drawing of a misreading of this term, with a crossed-out car bent in space) will be triggered after autovacuum_freeze_max_age inserts, 200000000 by default. That converts into 4000000 pages, or around 32 GB. It will be the first vacuum ever on that table, since no other mechanism triggers one, and if it steals all the available IOPS, it will finish in 200000000 / 50 / 1000 = 4000 seconds, killing prod for over an hour.

Telemetry workloads can easily generate 32 GB of data a day (I've seen more, but let's stick to that number). Production going down for an hour a day isn't good, and I consider it an emergency.
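The arithmetic above, as a quick model (the tuples-per-page density and the IOPS cap are the assumptions stated in the scenario):

```python
# First-ever (anti-wraparound) vacuum of an append-only table on gp2 storage.

FREEZE_MAX_AGE = 200_000_000  # default autovacuum_freeze_max_age
TUPLES_PER_PAGE = 50          # observed density (taxi movement data)
PAGE_SIZE = 8192              # default PostgreSQL block size, bytes
IOPS = 1_000                  # gp2 non-burst cap

pages = FREEZE_MAX_AGE // TUPLES_PER_PAGE   # 4,000,000 pages
table_gb = pages * PAGE_SIZE / 1024**3      # ~30.5 GiB, "around 32 GB"
vacuum_seconds = pages / IOPS               # 4,000 s: prod down for > 1 hour

print(pages, round(table_gb, 1), vacuum_seconds)
```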

Now, the two ways to fix it that reading the documentation leads you to while you're sleepily trying to get prod back:
 - raise autovacuum_freeze_max_age so VACUUM keeps sleeping;
 - rewrite code to use batching to insert more tuples at once.

We don't have a better recommendation mechanism for settings, and experience in tuning autovacuum in the right direction comes at the cost of someone's job or company :)

Neither way fixes the problem; both just delay the inevitable. The ratio of "one hour of vacuum per day of operation" holds, you merely postpone it.
Let's say we had the same setup with 1000-record batched inserts, and moved autovacuum_freeze_max_age to the highest possible value. How long will the downtime last?

2**31 (max XID) * 1000 (tuples per transaction) / 50 (tuples per page) / 1000 (pages per second) / 86400 (seconds in a day) ≈ 497 days.

This matches the highest estimation in Mandrill's report, so that might be what happened to them.

None of this would be needed if autovacuum came after every 50 inserted tuples. It would just mark the page all-visible and all-frozen and be gone, while the page is still in memory. That would get rid of the emergency altogether.

Is this elaborate enough disaster scenario? :)


--
Darafei Praliaskouski

Re: Berserk Autovacuum (let's save next Mandrill)

Darafei "Komяpa" Praliaskouski
In reply to this post by Michael Banck-3
> By the way, the Routine Vacuuming chapter of the documentation says:
>
> "The sole disadvantage of increasing autovacuum_freeze_max_age (and
> vacuum_freeze_table_age along with it) is that the pg_xact and
> pg_commit_ts subdirectories of the database cluster will take more space
>
> [...]
>
> If [pg_xact and pg_commit_ts taking 0.5 and 20 GB, respectively]
> is trivial compared to your total database size, setting
> autovacuum_freeze_max_age to its maximum allowed value is recommended."
>
> Maybe this should be qualified with "unless you have trouble with your
> autovacuum keeping up" or so; or generally reworded?

This recommendation is in the mindset of "wraparound never happens".
If your database is large, you have more chances to hit it painfully, and if it's append-only even more so.

An alternative note may be needed: "if your database is super large and actively written, you may want to set autovacuum_freeze_max_age to even smaller values, so that the autovacuum load is spread more evenly over time".



 
--
Darafei Praliaskouski

Re: Berserk Autovacuum (let's save next Mandrill)

Darafei "Komяpa" Praliaskouski
In reply to this post by Robert Haas
> If it's months, we probably want limit vacuum to working at a pretty
> slow rate, say 1% of the table size per hour or something.  If it's in
> hours, we need to be a lot more aggressive.  Right now we have no
> information to tell us which of those things is the case, so we'd just
> be shooting in the dark.

Thing is, you don't need to spread out your vacuum in time if the rate of vacuuming matches the rate of table growth. Can we mark tuples/pages as all-visible and all-frozen, say, the moment they're pushed out of shared_buffers?
 

--
Darafei Praliaskouski

Re: Berserk Autovacuum (let's save next Mandrill)

Darafei "Komяpa" Praliaskouski
In reply to this post by Vik Fearing-4

> Idea: look not on dead tuples, but on changes, just like ANALYZE does.
> It's my first patch on Postgres, it's probably all wrong but I hope it
> helps you get the idea.

> This was suggested and rejected years ago:
> https://www.postgresql.org/message-id/b970f20f-f096-2d3a-6c6d-ee887bd30cfb@...

Thank you for sharing the link. I've read through the thread and see you posted two patches, the first being similar to but different from mine, and the second being about a different matter.

I don't see "rejected" there, just the common distraction of "you should also consider this" and a time-out leading to "returned with feedback" at the end.

Thing is, we now have dead large productions and post-mortems because your patch wasn't pushed back in 2016, so the situation is different. Let's push at least the first of your two patches, or mine.

Which one is better and why?

I believe mine, as it just follows a pattern already established and proven in autoanalyze. If vacuum comes and is unable to harvest some dead tuples, in your case it will come over again, while in mine it will just sleep until it sees new dead tuples. That looks better to me - there's no busy loop in case some dead tuples are stuck forever.
If someone thinks yours is better, should we also consider it for autoanalyze?


--
Darafei Praliaskouski