xid wraparound danger due to INDEX_CLEANUP false


Re: xid wraparound danger due to INDEX_CLEANUP false

Robert Haas
On Wed, Jun 24, 2020 at 4:02 PM Peter Geoghegan <[hidden email]> wrote:
> > Apparently, what we're really doing here is that even when
> > INDEX_CLEANUP is OFF, we're still going to keep all the dead tuples.
> > That seems sad, but if it's what we have to do then it at least needs
> > comments explaining it.
>
> +1. Though I think that AMs should technically have the right to
> consider it advisory.

I'm not really convinced. I agree that from a theoretical point of
view an index can have arbitrary needs and is the arbiter of its own
needs, but when I pull the emergency brake, I want the vehicle to
stop, not think about stopping. There's a fine argument for the idea
that depressing the regular brake pedal entitles the vehicle to
exercise some discretion, and on modern cars it does (think ABS, if
nothing else). But pulling the emergency brake is a statement that I
wish to override any contrary judgement about whether stopping is a
good idea. I think this option is rightly viewed as an emergency
brake, and giving AMs the right to decide that we'll instead pull off
at the next exit doesn't sit well with me. At the end of the day, the
human being should be in charge, not the program.

(Great, now Skynet will be gunning for me...)

> 1. How often is that likely to happen in The Real World™?
>
> 2. If we fail to do cleanup and leak already-deleted pages, how bad is
> that? (Both in general, and in the worst case.)
>
> I'll hazard a guess for 1: I think that it might not come up that
> often. Page deletion is something that we hardly ever need. And,
> unlike some DB systems, we only do it when pages are fully empty
> (which, as it turns out, isn't necessarily better than our simple
> approach [1]). I tend to think it's unlikely to happen in cases where
> INDEX_CLEANUP is used, because those tend to be cases without much
> index churn to begin with.

I don't think I believe this. All you need is one small range-deletion, right?

> Then there's question 2. The intuition behind the approach from
> Sawada-san's patch was that allowing wraparound here feels wrong -- it
> should be in the AM's hands. However, it's not like I can point to
> some ironclad guarantee about not leaking deleted pages that existed
> before the INDEX_CLEANUP feature. We know that the FSM is not crash
> safe, and that's that. Is this really all that different? Maybe it is,
> but it seems like a quantitative difference to me.

I don't think I believe this, either. In the real-world example to
which you linked, the user ran REINDEX afterward to recover from index
bloat, and we could advise other people who use this option that it
may leak space that a subsequent VACUUM may fail to recover, and
therefore they too should consider REINDEX. Bloat sucks and I hate it,
but in the vehicle analogy from up above, it's the equivalent of
getting lost while driving someplace. It is inconvenient and may cause
you many problems, but you will not be dead. Running out of XIDs is a
brick wall. Either the car stops or you hit the wall. Ideally you can
manage to both not get lost and also not hit a brick wall, but in an
emergency situation where you have to choose either to get lost or to
hit a brick wall, there's only one right answer. As bad as bloat is,
and it's really bad, there are users who manage to run incredibly
bloated databases for long periods of time just because the stuff that
gets slow is either stuff that they're not doing at all, or only doing
in batch jobs where it's OK if it runs super-slow and where it may
even be possible to disable the batch job altogether, at least for a
while. The set of users who can survive running out of XIDs is limited
to those who can get by with just read-only queries, and that's
practically nobody. I have yet to encounter a customer who didn't
consider running out of XIDs to be an emergency.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan
In reply to this post by Masahiko Sawada
On Thu, Jun 25, 2020 at 6:59 AM Masahiko Sawada
<[hidden email]> wrote:
> I think that with the approach implemented in my patch, it could be a
> problem that the user cannot easily know in advance whether vacuum
> with INDEX_CLEANUP false will perform index cleanup, even if page
> deletion doesn't happen in most cases.

I was unclear. I agree that the VACUUM command with "INDEX_CLEANUP =
off" is an emergency mechanism that should be fully respected, even
when that means that we'll leak deleted pages.

Perhaps it would make sense to behave differently when the index is on
a table that has "vacuum_index_cleanup = off" set, and the vacuum is
started by autovacuum, and is not an anti-wraparound vacuum. That
doesn't seem all that appealing now that I write it down, though,
because it's a non-obvious behavioral difference among cases that
users probably expect to behave similarly. On the other hand, what
user knows that there is something called an aggressive vacuum, which
isn't exactly the same thing as an anti-wraparound vacuum?

I find it hard to decide what the least-worst thing is for the
backbranches. What do you think?

> I don’t come up with a good solution to keep us safe against XID
> wraparound yet but it seems to me that it’s better to have an option
> that forces index cleanup not to happen.

I don't think that there is a good solution that is suitable for
backpatching. The real solution is to redesign the recycling along the
lines I described.

I don't think that it's terrible that we can leak deleted pages,
especially considering the way that users are expected to use the
INDEX_CLEANUP feature. I would like to be sure that the problem is
well understood, though -- we should at least have a plan for Postgres
v14.

> I thought that btbulkdelete and/or btvacuumcleanup could register an
> autovacuum work item to recycle the pages that get deleted, but it
> might not be able to recycle enough of those pages because the
> autovacuum work items could be taken just after vacuum. And if page
> deletion is relatively rare in practice, we might be able to take an
> optimistic approach where vacuum registers deleted pages in the FSM
> at deletion time, and a process that takes a free page checks whether
> the page is really recyclable. Anyway, I'll try to think more about
> this.

Right -- just putting the pages in the FSM immediately, and making it
a problem that we deal with within _bt_getbuf() is an alternative
approach that might be better.
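As a minimal sketch of that alternative (the function name is
invented, and a real implementation would need some way to avoid
spinning on a block it just re-recorded), loosely based on the FSM
retry loop that _bt_getbuf() already runs:

#include "postgres.h"
#include "access/nbtree.h"
#include "storage/bufmgr.h"
#include "storage/indexfsm.h"

/*
 * Sketch only: grab a free page from the FSM, but verify that it is
 * truly recyclable before reusing it.  Pages recorded "optimistically"
 * at deletion time may still be visible to old snapshots.
 */
static Buffer
bt_get_recyclable_page(Relation rel)
{
	for (;;)
	{
		BlockNumber blkno = GetFreeIndexPage(rel);
		Buffer		buf;

		if (blkno == InvalidBlockNumber)
			return InvalidBuffer;	/* caller must extend the relation */

		buf = ReadBuffer(rel, blkno);
		if (!ConditionalLockBuffer(buf))
		{
			/* Someone else has it locked; just skip it for now */
			ReleaseBuffer(buf);
			continue;
		}

		if (_bt_page_recyclable(BufferGetPage(buf)))
			return buf;			/* safe to reuse; still locked */

		/*
		 * Deleting XID may still be visible to an old snapshot: unlock,
		 * put the page back in the FSM, and try another one.
		 */
		UnlockReleaseBuffer(buf);
		RecordFreeIndexPage(rel, blkno);
	}
}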

--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan
In reply to this post by Robert Haas
On Thu, Jun 25, 2020 at 8:28 AM Robert Haas <[hidden email]> wrote:
> I'm not really convinced. I agree that from a theoretical point of
> view an index can have arbitrary needs and is the arbiter of its own
> needs, but when I pull the emergency brake, I want the vehicle to
> stop, not think about stopping.

Making this theoretical argument in the context of this discussion was
probably a mistake. I agree that this is the emergency brake, and it
needs to work like one.

It might be worth considering some compromise in the event of using
the "vacuum_index_cleanup" reloption (i.e. when the user has set it to
'off'), provided there is good reason to believe that we're not in an
emergency -- I mentioned this to Masahiko just now. I admit that that
isn't very appealing for other reasons, but it's worth considering a
way of ameliorating the problem in back branches. We really ought to
change how recycling works, so that it happens at the tail end of the
same VACUUM operation that deleted the pages -- but that cannot be
backpatched.

It might be that the most appropriate mitigation in the back branches
is a log message that reports on the fact that we've probably leaked
pages due to this issue. Plus some documentation. Though even that
would require calling nbtree to check if that is actually true (by
checking the metapage), so it still requires backpatching something
close to Masahiko's draft patch.
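For what it's worth, a minimal sketch of such a metapage check, using
the btm_oldest_btpo_xact field that the v11
vacuum_cleanup_index_scale_factor work added to the metapage; the
helper name is invented, and error/version handling is omitted:

#include "postgres.h"
#include "access/nbtree.h"
#include "storage/bufmgr.h"

/*
 * Sketch only (hypothetical helper): peek at the nbtree metapage to
 * see whether the index may still contain deleted-but-unrecycled
 * pages, so that a skipped cleanup could LOG a hint about leaks.
 */
static bool
bt_may_have_unrecycled_pages(Relation rel)
{
	Buffer		metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
	BTMetaPageData *metad = BTPageGetMeta(BufferGetPage(metabuf));
	bool		result;

	/* btm_oldest_btpo_xact records the oldest deleting XID, if any */
	result = TransactionIdIsValid(metad->btm_oldest_btpo_xact);

	_bt_relbuf(rel, metabuf);
	return result;
}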

> I don't think I believe this. All you need is one small range-deletion, right?

Right.

> > Then there's question 2. The intuition behind the approach from
> > Sawada-san's patch was that allowing wraparound here feels wrong -- it
> > should be in the AM's hands. However, it's not like I can point to
> > some ironclad guarantee about not leaking deleted pages that existed
> > before the INDEX_CLEANUP feature. We know that the FSM is not crash
> > safe, and that's that. Is this really all that different? Maybe it is,
> > but it seems like a quantitative difference to me.
>
> I don't think I believe this, either. In the real-world example to
> which you linked, the user ran REINDEX afterward to recover from index
> bloat, and we could advise other people who use this option that it
> may leak space that a subsequent VACUUM may fail to recover, and
> therefore they too should consider REINDEX.

I was talking about the intuition behind the design. I did not intend
to suggest that nbtree should ignore "INDEX_CLEANUP = off" regardless
of the consequences.

I am sure about this much: The design embodied by Masahiko's patch is
clearly a better one overall, even if it doesn't fix the problem on
its own. I agree that we cannot allow nbtree to ignore "INDEX_CLEANUP
= off", even if that means leaking pages that could otherwise be
recycled. I'm not sure what we should do about any of this in the back
branches, though. I wish I had a simple idea about what to do there.

--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Robert Haas
On Thu, Jun 25, 2020 at 8:44 PM Peter Geoghegan <[hidden email]> wrote:
> I am sure about this much: The design embodied by Masahiko's patch is
> clearly a better one overall, even if it doesn't fix the problem on
> its own. I agree that we cannot allow nbtree to ignore "INDEX_CLEANUP
> = off", even if that means leaking pages that could otherwise be
> recycled. I'm not sure what we should do about any of this in the back
> branches, though. I wish I had a simple idea about what to do there.

My opinion is that there's no need to change the code in the
back-branches, and that I don't really like the approach in master
either.

I think what we're saying is that there is no worse consequence to
turning off index_cleanup than some bloat that isn't likely to be
recovered unless you REINDEX. If the problem in question were going to
cause data loss or data corruption or something, we'd have to take
stronger action, but I don't think anyone's saying that this is the
case. Therefore, I think we can handle the back-branches by letting
users know about the bloat hazard and suggesting that they avoid this
option unless it's necessary to avoid running out of XIDs.

Now, what about master? I think it's fine to offer the AM a callback
even when index_cleanup = false, for example so that it can freeze
something in its metapage, but I don't agree with passing it the TIDs.
That seems like it's just inviting it to ignore the emergency brake,
and it's also incurring real overhead, because remembering all those
TIDs can use a lot of memory. If that API limitation causes a problem
for some future index AM, that will be a good point to discuss when
the patch for said AM is submitted for review. I entirely agree with
you that the way btree arranges for page recycling is crude, and I
would be delighted if you want to improve it, either for v14 or for
any future release, or if somebody else wants to do so. However, even
if that never happens, so what?

In retrospect, I regret committing this patch without better
understanding the issues in this area. That was a fail on my part. At
the same time, it doesn't really sound like the issues are all that
bad. The potential index bloat does suck, but it can still suck less
than the alternatives, and we have evidence that for at least one
user, it was worth a major version upgrade just to replace the
suckitude they had with the suckitude this patch creates.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan
On Fri, Jun 26, 2020 at 5:39 AM Robert Haas <[hidden email]> wrote:
> My opinion is that there's no need to change the code in the
> back-branches, and that I don't really like the approach in master
> either.

I guess it's hard to see a way that we could fix this in the
backbranches, provided we aren't willing to tolerate a big refactor,
or a cleanup scan of the index (note that I mean btvacuumcleanup(),
not btvacuumscan(), which is quite different).

> I think what we're saying is that there is no worse consequence to
> turning off index_cleanup than some bloat that isn't likely to be
> recovered unless you REINDEX.

That's true.

> Now, what about master? I think it's fine to offer the AM a callback
> even when index_cleanup = false, for example so that it can freeze
> something in its metapage, but I don't agree with passing it the TIDs.
> That seems like it's just inviting it to ignore the emergency brake,
> and it's also incurring real overhead, because remembering all those
> TIDs can use a lot of memory.

You don't have to do anything with the TIDs passed from vacuumlazy.c
to recycle pages that need to be recycled: you only have to go through
btvacuumcleanup() to avoid the problem that we're talking about (you
don't have to call btvacuumscan() to kill TIDs that vacuumlazy.c will
have pruned). Killing TIDs/tuples in the index was never something
that would make sense, even within the confines of the existing flawed
nbtree recycling design. However, you do need to scan the entire index
to do even that much. FWIW, that doesn't seem like it *totally*
violates the spirit of "index_cleanup = false", since you're still not
doing most of the usual nbtree vacuuming work (even though you have to
scan the index, there is still much less work in total).

> If that API limitation causes a problem
> for some future index AM, that will be a good point to discuss when
> the patch for said AM is submitted for review. I entirely agree with
> you that the way btree arranges for page recycling is crude, and I
> would be delighted if you want to improve it, either for v14 or for
> any future release, or if somebody else wants to do so. However, even
> if that never happens, so what?

I think that it's important to be able to describe an ideal (though
still realistic) design, even if it might remain aspirational for a
long time. I suspect that pushing the mechanism down into index AMs
has other non-obvious benefits.

> In retrospect, I regret committing this patch without better
> understanding the issues in this area. That was a fail on my part. At
> the same time, it doesn't really sound like the issues are all that
> bad. The potential index bloat does suck, but it can still suck less
> than the alternatives, and we have evidence that for at least one
> user, it was worth a major version upgrade just to replace the
> suckitude they had with the suckitude this patch creates.

I actually agree -- this is a really important feature, and I'm glad
that we have it. Even in this slightly flawed form. I remember a great
need for the feature back when I was involved in supporting Postgres
in production.

--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Masahiko Sawada
On Sat, 27 Jun 2020 at 08:00, Peter Geoghegan <[hidden email]> wrote:

>
> On Fri, Jun 26, 2020 at 5:39 AM Robert Haas <[hidden email]> wrote:
> > My opinion is that there's no need to change the code in the
> > back-branches, and that I don't really like the approach in master
> > either.
>
> I guess it's hard to see a way that we could fix this in the
> backbranches, provided we aren't willing to tolerate a big refactor,
> or a cleanup scan of the index (note that I mean btvacuumcleanup(),
> not btvacuumscan(), which is quite different).

Agreed.

>
> > I think what we're saying is that there is no worse consequence to
> > turning off index_cleanup than some bloat that isn't likely to be
> > recovered unless you REINDEX.
>
> That's true.

Regarding the extent of the impact, this bug will affect users who
turned vacuum_index_cleanup off, or who manually executed vacuum with
INDEX_CLEANUP off, over a long period after some prior vacuums. On the
other hand, users who use INDEX_CLEANUP off only on the spot, or who
turn vacuum_index_cleanup off for the table from the start, would be
unaffected or less affected.

>
> > In retrospect, I regret committing this patch without better
> > understanding the issues in this area. That was a fail on my part. At
> > the same time, it doesn't really sound like the issues are all that
> > bad. The potential index bloat does suck, but it can still suck less
> > than the alternatives, and we have evidence that for at least one
> > user, it was worth a major version upgrade just to replace the
> > suckitude they had with the suckitude this patch creates.
>
> I actually agree -- this is a really important feature, and I'm glad
> that we have it. Even in this slightly flawed form. I remember a great
> need for the feature back when I was involved in supporting Postgres
> in production.

I apologize for writing this patch without enough consideration. I
should have been more careful, since I had learned about the nbtree
page recycling strategy when discussing the
vacuum_cleanup_index_scale_factor patch.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan
On Fri, Jun 26, 2020 at 10:15 PM Masahiko Sawada
<[hidden email]> wrote:
> Regarding the extent of the impact, this bug will affect users who
> turned vacuum_index_cleanup off, or who manually executed vacuum with
> INDEX_CLEANUP off, over a long period after some prior vacuums. On the
> other hand, users who use INDEX_CLEANUP off only on the spot, or who
> turn vacuum_index_cleanup off for the table from the start, would be
> unaffected or less affected.

I don't think that it's likely to cause too much trouble. It's already
possible to leak deleted pages, if only because the FSM isn't crash
safe. Actually, the nbtree README says this, and has since 2003:

"""
(Note: if we find a deleted page with an extremely old transaction
number, it'd be worthwhile to re-mark it with FrozenTransactionId so that
a later xid wraparound can't cause us to think the page is unreclaimable.
But in more normal situations this would be a waste of a disk write.)
"""

But, uh, isn't the btvacuumcleanup() call supposed to avoid
wraparound? Who knows?!

It doesn't seem like the recycling aspect of page deletion was
rigorously designed, possibly because it's harder to test than page
deletion itself. This is a problem that we should fix.
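To make the hazard concrete, this is roughly the recyclability test as
it stood in nbtpage.c before v14 (comments mine): if nothing ever
re-examines btpo.xact, the stored 32-bit XID can fall behind a
wrapped-around horizon, and the test can keep failing forever, leaving
the page unreclaimable.

#include "postgres.h"
#include "access/nbtree.h"
#include "access/transam.h"
#include "utils/snapmgr.h"

bool
_bt_page_recyclable(Page page)
{
	BTPageOpaque opaque;

	/* A freshly-extended, all-zeroes page can always be reclaimed */
	if (PageIsNew(page))
		return true;

	/*
	 * Otherwise, recycle only if the page is deleted and the deleting
	 * transaction is old enough that no live snapshot could need to
	 * visit the page.  This XID comparison is what wraparound can
	 * confuse when cleanup never runs.
	 */
	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
	if (P_ISDELETED(opaque) &&
		TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
		return true;
	return false;
}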

> I apologize for writing this patch without enough consideration. I
> should have been more careful, since I had learned about the nbtree
> page recycling strategy when discussing the
> vacuum_cleanup_index_scale_factor patch.

While it's unfortunate that this was missed, let's not lose
perspective. Anybody using the INDEX_CLEANUP feature (whether it's
through a direct VACUUM, or by using the reloption) is already asking
for an extreme behavior: skipping regular index vacuuming. I imagine
that the vast majority of users that are in that position just don't
care about the possibility of leaking deleted pages. They care about
avoiding a real disaster from XID wraparound.

--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Masahiko Sawada
On Sun, 28 Jun 2020 at 02:44, Peter Geoghegan <[hidden email]> wrote:

>
> On Fri, Jun 26, 2020 at 10:15 PM Masahiko Sawada
> <[hidden email]> wrote:
> > Regarding the extent of the impact, this bug will affect users who
> > turned vacuum_index_cleanup off, or who manually executed vacuum with
> > INDEX_CLEANUP off, over a long period after some prior vacuums. On the
> > other hand, users who use INDEX_CLEANUP off only on the spot, or who
> > turn vacuum_index_cleanup off for the table from the start, would be
> > unaffected or less affected.
>
> I don't think that it's likely to cause too much trouble. It's already
> possible to leak deleted pages, if only because the FSM isn't crash
> safe. Actually, the nbtree README says this, and has since 2003:
>
> """
> (Note: if we find a deleted page with an extremely old transaction
> number, it'd be worthwhile to re-mark it with FrozenTransactionId so that
> a later xid wraparound can't cause us to think the page is unreclaimable.
> But in more normal situations this would be a waste of a disk write.)
> """
>
> But, uh, isn't the btvacuumcleanup() call supposed to avoid
> wraparound? Who knows?!
>
> It doesn't seem like the recycling aspect of page deletion was
> rigorously designed, possibly because it's harder to test than page
> deletion itself. This is a problem that we should fix.

Agreed.

>
> > I apologize for writing this patch without enough consideration. I
> > should have been more careful, since I had learned about the nbtree
> > page recycling strategy when discussing the
> > vacuum_cleanup_index_scale_factor patch.
>
> While it's unfortunate that this was missed, let's not lose
> perspective. Anybody using the INDEX_CLEANUP feature (whether it's
> through a direct VACUUM, or by using the reloption) is already asking
> for an extreme behavior: skipping regular index vacuuming. I imagine
> that the vast majority of users that are in that position just don't
> care about the possibility of leaking deleted pages. They care about
> avoiding a real disaster from XID wraparound.

For the back branches, I'm considering how to let users know about
this. For safety, we could recommend in the documentation and/or the
release notes that users avoid INDEX_CLEANUP false unless it's
necessary to avoid running out of XIDs. On the other hand, since
leaving recyclable pages behind was already possible, as you
mentioned, I'm concerned that this could confuse users and needlessly
cause alarm. I'm thinking about what we can do for users, in addition
to leaving a summary of this discussion as a source code comment. What
do you think?

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: xid wraparound danger due to INDEX_CLEANUP false

Masahiko Sawada
On Mon, Jun 29, 2020 at 9:51 PM Masahiko Sawada
<[hidden email]> wrote:

> [...]

Several months have passed since this discussion. We decided not to do
anything in the back branches, but IIUC the fundamental issue is not
fixed yet. The point raised by Andres -- that we should leave it to the
index AM to decide whether to do vacuum cleanup when INDEX_CLEANUP is
specified -- is still valid. Is that right?

For HEAD, there was a discussion about changing lazy vacuum and the
bulkdelete and vacuumcleanup APIs so that these APIs are called even
when INDEX_CLEANUP is specified. That is, when INDEX_CLEANUP false is
specified, lazy vacuum still collects dead tuple TIDs into
maintenance_work_mem and passes a flag to the index AMs indicating
whether INDEX_CLEANUP was specified. Each index AM decides whether to
do bulkdelete/vacuumcleanup. A downside of this idea would be that we
end up using maintenance_work_mem even if none of the table's index
AMs do bulkdelete/vacuumcleanup at all.

The second idea I came up with is to add an index AM API (say,
amcanskipindexcleanup = true/false) telling whether index cleanup is
skippable. Lazy vacuum checks this flag for each index on the table
before starting. If index cleanup is skippable for all indexes, it can
choose a one-pass vacuum, meaning there is no need to collect dead
tuple TIDs in maintenance_work_mem. All in-core index AMs would set it
to true. Perhaps it should be true (skippable) by default for backward
compatibility.
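A minimal sketch of what the check could look like on the lazy vacuum
side; the amcanskipindexcleanup field is the hypothetical flag
proposed above, and the helper name is invented, but the rest follows
the existing IndexAmRoutine plumbing:

#include "postgres.h"
#include "access/amapi.h"
#include "utils/rel.h"

/*
 * Sketch only: decide whether a one-pass vacuum is possible.  The
 * amcanskipindexcleanup field does not exist today.
 */
static bool
indexes_allow_skipping_cleanup(Relation *Irel, int nindexes)
{
	for (int i = 0; i < nindexes; i++)
	{
		IndexAmRoutine *amroutine = Irel[i]->rd_indam;

		if (!amroutine->amcanskipindexcleanup)	/* hypothetical flag */
			return false;
	}
	return true;
}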

The in-core AMs, including btree, will work the same as before. This
fix makes the behavior more desirable and may help other AMs that
require vacuumcleanup to be called in all cases. Once we fix it, I
wonder if we can disable index cleanup during autovacuum's
anti-wraparound vacuums.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan
On Thu, Nov 19, 2020 at 5:58 PM Masahiko Sawada <[hidden email]> wrote:
> Several months have passed since this discussion. We decided not to do
> anything in the back branches, but IIUC the fundamental issue is not
> fixed yet. The point raised by Andres -- that we should leave it to the
> index AM to decide whether to do vacuum cleanup when INDEX_CLEANUP is
> specified -- is still valid. Is that right?

I don't remember if Andres actually said that publicly, but I
definitely did. I do remember discussing this with him privately, at
which point he agreed with what I said. Which you just summarized
well.

> For HEAD, there was a discussion about changing lazy vacuum and the
> bulkdelete and vacuumcleanup APIs so that these APIs are called even
> when INDEX_CLEANUP is specified. That is, when INDEX_CLEANUP false is
> specified, lazy vacuum still collects dead tuple TIDs into
> maintenance_work_mem and passes a flag to the index AMs indicating
> whether INDEX_CLEANUP was specified.

Right.

> Each index AM decides whether to do bulkdelete/vacuumcleanup. A
> downside of this idea would be that we end up using
> maintenance_work_mem even if none of the table's index AMs do
> bulkdelete/vacuumcleanup at all.

That is a downside, but I don't think that it's a serious downside.
But it may not matter, because there are lots of reasons to move in
this direction.

> The second idea I came up with is to add an index AM API (say,
> amcanskipindexcleanup = true/false) telling whether index cleanup is
> skippable. Lazy vacuum checks this flag for each index on the table
> before starting. If index cleanup is skippable for all indexes, it can
> choose a one-pass vacuum, meaning there is no need to collect dead
> tuple TIDs in maintenance_work_mem. All in-core index AMs would set it
> to true. Perhaps it should be true (skippable) by default for backward
> compatibility.

(The terminology here is very confusing, because the goal of the
INDEX_CLEANUP feature in v12 is not really to skip a call to
btvacuumcleanup(). The goal is really to skip a call to
btbulkdelete(). I will try to be careful.)

I think that the ideal design here would be a new hybrid of two
existing features:

1.) Your INDEX_CLEANUP feature from Postgres 12.

and:

2.) Your vacuum_cleanup_index_scale_factor feature from Postgres 11.

The INDEX_CLEANUP feature is very useful, because skipping indexes
entirely can be very helpful for many reasons (e.g. faster manual
VACUUM in the event of wraparound-related emergencies). But
INDEX_CLEANUP has 2 problems today:

A. It doesn't interact well with vacuum_cleanup_index_scale_factor.
This is the problem that has been discussed on this thread.

and:

B. It is an "all or nothing" thing. Unlike the
vacuum_cleanup_index_scale_factor feature, it does not care about what
the index AM/individual index wants. But it should.

(**Thinks some more***)

Actually, on second thought, maybe INDEX_CLEANUP only has one problem.
Problem A is actually just a special case of problem B. There are many
interesting opportunities created by solving problem B
comprehensively.

So, what useful enhancements to VACUUM are possible once we have
something like INDEX_CLEANUP, that is sensitive to the needs of
indexes? Well, you already identified one yourself, so obviously
you're thinking about this in a similar way already:

> The in-core AMs, including btree, will work the same as before. This
> fix makes the behavior more desirable and may help other AMs that
> require vacuumcleanup to be called in all cases. Once we fix it, I
> wonder if we can disable index cleanup during autovacuum's
> anti-wraparound vacuums.

Obviously this is a good idea. The fact that anti-wraparound vacuum
isn't really special compared to regular autovacuum is *bad*.
Obviously anti-wraparound is in some sense more important than regular
vacuum. Making it as similar as possible to vacuum simply isn't
helpful. Maybe it is slightly more elegant in theory, but in the real
world it is a poor design. (See also: every single PostgreSQL post
mortem that has ever been written.)

But why stop with this? There are other big advantages to allowing
individual indexes/index AMs influence over the INDEX_CLEANUP behavior.
Especially if they're sensitive to the needs of particular indexes on
a table (not just all of the indexes on the table taken together).

As you may know, my bottom-up index deletion patch can more or less
eliminate index bloat in indexes that don't get "logically changed" by
many non-HOT updates. It's *very* effective with non-HOT updates and
lots of indexes. See this benchmark result for a recent example:

https://postgr.es/m/CAGnEbohYF_K6b0v=2uc289=v67qNhc3n01Ftic8X94zP7kKqtw@...

The feature is effective enough to make it almost unnecessary to
VACUUM certain indexes -- though not necessarily other indexes on the
same table. Of course, in the long term it will eventually be
necessary to really vacuum these indexes. Not because the indexes
themselves care, though -- they really don't (if they don't receive
logical changes from non-HOT updates, and so benefit very well from
the proposed bottom-up index deletion mechanism, they really have no
selfish reason to care if they ever get vacuumed by autovacuum).

The reason we eventually need to call ambulkdelete() with these
indexes (with all indexes, actually) even with these enhancements is
related to the heap. We eventually want to make LP_DEAD line pointers
in the heap LP_UNUSED. But we should be lazy about it, and wait until
it becomes a real problem. Maybe we can only do a traditional VACUUM
(with a second pass of the heap for heap LP_UNUSED stuff) much much
less frequently than today. At the same time, we can set the FSM for
index-only scans much more frequently.

It's also important that we really make index vacuuming a per-index
thing. You can see this in the example benchmark I linked to, which
was posted by Victor: no page splits in one never-logically-modified
index, and some page splits in other indexes that were actually
changed by UPDATEs again and again. Clearly you can have several
different indexes on the same table that have very different needs.

With some indexes we want to be extra lazy (these are indexes that
make good use of bottom-up deletion). But with other indexes on the
same table we want to be eager. Maybe even very eager. If we can make
per-index decisions, then every individual part of the system works
well. Currently, the problem with autovacuum scheduling is that it
probably makes sense for the heap with the defaults (or something like
them), and probably doesn't make any sense for indexes (though it also
varies among indexes). So today we have maybe 7 different things
(maybe 6 indexes + 1 table), and we pretend that they are only one
thing. It's just a fantasy. The reality is that we have 7 things that
have only a very loose and complicated relationship to each other. We
need to stop believing in this fantasy, and start paying attention to
the more complicated reality. The only way to do that is ask each
index directly, while being prepared to get very different answers
from each index on the same table.

Here is what I mean by that: it would also probably be very useful to
do something like a ambulkdelete() call for only a subset of indexes
that really need it. So you aggressively vacuum the one index that
really does get logically modified by an UPDATE, and not the other 6
that don't. (Of course it's still true that we cannot have a second
heap pass to make LP_DEAD line pointers in the heap LP_UNUSED --
obviously that's unsafe unless we're 100% sure that nothing in any
index points to the now-LP_UNUSED line pointer. But many big
improvements are possible without violating this basic invariant.)

If you are able to pursue this project, in whole or in part, I would
definitely be supportive of that. I may be able to commit it. I think
that this project has many benefits, not just one or two. It seems
strategic.

--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Masahiko Sawada
On Fri, Nov 20, 2020 at 12:56 PM Peter Geoghegan <[hidden email]> wrote:

> [...]
>
> The INDEX_CLEANUP feature is very useful, because skipping indexes
> entirely can be very helpful for many reasons (e.g. faster manual
> VACUUM in the event of wraparound-related emergencies). But
> INDEX_CLEANUP has 2 problems today:
>
> A. It doesn't interact well with vacuum_cleanup_index_scale_factor.
> This is the problem that has been discussed on this thread.
>
> and:
>
> B. It is an "all or nothing" thing. Unlike the
> vacuum_cleanup_index_scale_factor feature, it does not care about what
> the index AM/individual index wants. But it should.

Agreed.

> [...]
>
> As you may know, my bottom-up index deletion patch can more or less
> eliminate index bloat in indexes that don't get "logically changed" by
> many non-HOT updates. It's *very* effective with non-HOT updates and
> lots of indexes. See this benchmark result for a recent example:
>
> https://postgr.es/m/CAGnEbohYF_K6b0v=2uc289=v67qNhc3n01Ftic8X94zP7kKqtw@...
>
> [...]
>
> It's also important that we really make index vacuuming a per-index
> thing. You can see this in the example benchmark I linked to, which
> was posted by Victor: no page splits in one never-logically-modified
> index, and some page splits in other indexes that were actually
> changed by UPDATEs again and again. Clearly you can have several
> different indexes on the same table that have very different needs.
>
> [...]
>
> Here is what I mean by that: it would also probably be very useful to
> do something like a ambulkdelete() call for only a subset of indexes
> that really need it. So you aggressively vacuum the one index that
> really does get logically modified by an UPDATE, and not the other 6
> that don't. (Of course it's still true that we cannot have a second
> heap pass to make LP_DEAD line pointers in the heap LP_UNUSED --
> obviously that's unsafe unless we're 100% sure that nothing in any
> index points to the now-LP_UNUSED line pointer. But many big
> improvements are possible without violating this basic invariant.)

I had missed your bottom-up index deletion patch, but it's a promising
improvement. With that patch, the number of dead tuples in individual
indexes may differ. So it's important that we make index vacuuming a
per-index thing.

Given that patch, it seems to me that it would be better to ask each
individual index AM about its need for bulkdelete before calling it.
That is, passing the VACUUM options, the number of collected dead
tuples, etc. to the index AM, we ask it via a new index AM API whether
it wants to do bulkdelete or not. We call bulkdelete only for the
indexes that answered 'yes'. If we got 'no' from any one of the
indexes, we cannot have a second heap pass. INDEX_CLEANUP is not
enforceable. When INDEX_CLEANUP is set to false, we expect index AMs
to return 'no' unless they have a special reason to need bulkdelete.

One possible benefit of this idea, even without the bottom-up index
deletion patch, would be something like a
vacuum_index_cleanup_scale_factor for bulkdelete. For example, in the
case where the number of dead tuples is slightly larger than
maintenance_work_mem can hold, the second call to bulkdelete will be
made with only a small number of dead tuples, which is less efficient.
If an index AM is able to decide not to do bulkdelete by comparing the
number of dead tuples to a threshold, it can avoid such a bulkdelete
call.

Also, as a future extension, once we have a retail index deletion
feature, we might be able to make that API return a ternary value:
'no', 'do_bulkdelete', or 'do_indexscandelete', so that the index AM
can choose the appropriate method of index deletion based on the
statistics.
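A sketch of what that ternary answer might look like as an AM
callback; every name here (IndexVacuumStrategy, the enum values,
amvacuumstrategy) is hypothetical -- only IndexVacuumInfo is an
existing type:

#include "postgres.h"
#include "access/genam.h"		/* IndexVacuumInfo */

/* Hypothetical ternary answer: 'no'/'do_bulkdelete'/'do_indexscandelete' */
typedef enum IndexVacuumStrategy
{
	INDEX_VACUUM_NONE,			/* skip this index entirely this cycle */
	INDEX_VACUUM_BULKDELETE,	/* full ambulkdelete() scan */
	INDEX_VACUUM_RETAIL			/* retail deletion via index lookups */
} IndexVacuumStrategy;

/*
 * Hypothetical callback: the AM weighs what it is given (VACUUM flags,
 * dead-tuple count, its own statistics) and picks a strategy.
 */
typedef IndexVacuumStrategy (*amvacuumstrategy_function) (IndexVacuumInfo *info,
														  double num_dead_tuples);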

But to make index vacuuming a per-index thing, we need to deal with
the problem that we cannot know which indexes of the table still have
index tuples pointing to the collected dead tuples. For example, if an
index always says 'no' (it doesn't need bulkdelete, therefore we need
to keep the dead line pointers), the collected dead tuples might
already be marked LP_DEAD, and the other indexes might no longer have
index tuples pointing to them. In that case we don't want to call
bulkdelete for the other indexes. We could probably keep additional
statistics, such as the number of dead tuples in each individual
index, so that each index can determine its need for bulkdelete. But
it's not a comprehensive solution.

>
> If you are able to pursue this project, in whole or in part, I would
> definitely be supportive of that. I may be able to commit it. I think
> that this project has many benefits, not just one or two. It seems
> strategic.

Thanks, that's really helpful. I'm going to work on that. Since things
were made complicated by these two features that I proposed, I'll do
my best to sort out the problem and improve it in PG14.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



Re: xid wraparound danger due to INDEX_CLEANUP false

Robert Haas
In reply to this post by Masahiko Sawada
On Thu, Nov 19, 2020 at 8:58 PM Masahiko Sawada <[hidden email]> wrote:

> For HEAD, there was a discussion about changing lazy vacuum and the
> bulkdelete and vacuumcleanup APIs so that these APIs are called even
> when INDEX_CLEANUP is specified. That is, when INDEX_CLEANUP false is
> specified, lazy vacuum still collects dead tuple TIDs into
> maintenance_work_mem and passes a flag to the index AMs indicating
> whether INDEX_CLEANUP was specified. Each index AM decides whether to
> do bulkdelete/vacuumcleanup. A downside of this idea would be that we
> end up using maintenance_work_mem even if none of the table's index
> AMs do bulkdelete/vacuumcleanup at all.
>
> The second idea I came up with is to add an index AM API (say,
> amcanskipindexcleanup = true/false) telling whether index cleanup is
> skippable. Lazy vacuum checks this flag for each index on the table
> before starting. If index cleanup is skippable for all indexes, it can
> choose a one-pass vacuum, meaning there is no need to collect dead
> tuple TIDs in maintenance_work_mem. All in-core index AMs would set it
> to true. Perhaps it should be true (skippable) by default for backward
> compatibility.
>
> The in-core AMs, including btree, will work the same as before. This
> fix makes the behavior more desirable and may help other AMs that
> require vacuumcleanup to be called in all cases. Once we fix it, I
> wonder if we can disable index cleanup during autovacuum's
> anti-wraparound vacuums.

It (still) doesn't seem very sane to me to have an index that requires
cleanup in all cases. I mean, VACUUM could error or be killed just
before the index cleanup phase happens anyway, so it's not like an
index AM can licitly depend on getting called just because we visited
the heap. It could, of course, depend on getting called before
relfrozenxid is advanced, or before the heap's dead line pointers are
marked unused, or something like that, but it can't just be like, hey,
you have to call me.

I think this whole discussion is to some extent a product of the
contract between the index AM and the table AM being more than
slightly unclear. Maybe we need to clear up the definitional problems
first.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan
In reply to this post by Masahiko Sawada
On Fri, Nov 20, 2020 at 3:17 AM Masahiko Sawada <[hidden email]> wrote:
> I had missed your bottom-up index deletion patch, but it's a promising
> improvement. With that patch, the number of dead tuples in individual
> indexes may differ. So it's important that we make index vacuuming a
> per-index thing.

Right, but it's important to remember that even bottom-up index
deletion isn't really special in theory. Bottom-up index deletion is
"just" a reliable version of the existing LP_DEAD index deletion thing
(which has been around since Postgres 8.2). In theory it doesn't
change the fundamental nature of the problem. In practice it does,
because it makes it very obvious to pgsql-hackers that indexes on the
same table can have very different needs from VACUUM. And the actual
differences we see are much bigger now. Even still, the fact that you
had certain big differences across indexes on the same table is not a
new thing. (Actually, you can even see this in Victor's bottom-up
deletion benchmark, where the primary key index doesn't grow on the
master branch, even after 8 hours.)

The bottom-up index deletion patch (and the enhancements we're talking
about here, for VACUUM itself) are based on "the generational
hypothesis" that underlies generational garbage collection. The
philosophy is the same. See:

https://plumbr.io/handbook/garbage-collection-in-java/generational-hypothesis

In theory, "most garbage comes from new objects" is "just" an
empirical observation, that may or may not be true with each
workload/Java program/whatever. In practice it is important enough to
be a big part of how every modern garbage collector works -- it's
almost always true, and even when it isn't true it doesn't actually
hurt to make the assumption that it is true and then be wrong. I
believe that we have to take a holistic view of the problem to make
real progress.

Andres said something similar in a recent blog post:

https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462#interlude-removing-the-need-for-recentglobalxminhorizon

"In most workloads the majority of accesses are to live tuples, and
when encountering non-live tuple versions they are either very old, or
very new."

(This was just a coincidence, but it was good to see that he made the
same observation.)

> Given that patch, it seems to me that it would be better to ask each
> individual index AM about its need for bulkdelete before calling it.
> That is, passing the VACUUM options, the number of collected dead
> tuples, etc. to the index AM, we ask it via a new index AM API whether
> it wants to do bulkdelete or not. We call bulkdelete only for the
> indexes that answered 'yes'. If we got 'no' from any one of the
> indexes, we cannot have a second heap pass. INDEX_CLEANUP is not
> enforceable. When INDEX_CLEANUP is set to false, we expect index AMs
> to return 'no' unless they have a special reason to need bulkdelete.

I don't have a very detailed idea of the interface or anything. There
are a few questions that naturally present themselves, that I don't
have good answers to right now. Obviously vacuumlazy.c will only treat
this feedback from each index as an advisory thing. So what happens
when 50% of the indexes say yes and 50% say no? This is a subproblem
that must be solved as part of this work. Ideally it will be solved by
you.  :-)

> One possible benefit of this idea, even without the bottom-up index
> deletion patch, would be something like a
> vacuum_index_cleanup_scale_factor for bulkdelete. For example, in the
> case where the number of dead tuples is slightly larger than
> maintenance_work_mem can hold, the second call to bulkdelete will be
> made with only a small number of dead tuples, which is less efficient.
> If an index AM is able to decide not to do bulkdelete by comparing the
> number of dead tuples to a threshold, it can avoid such a bulkdelete
> call.

I agree. Actually, I thought the same thing myself, even before I
realized that bottom-up index deletion was possible.

> Also, as a future extension, once we have a retail index deletion
> feature, we might be able to make that API return a ternary value:
> 'no', 'do_bulkdelete', or 'do_indexscandelete', so that the index AM
> can choose the appropriate method of index deletion based on the
> statistics.

I agree again!

We may eventually be able to make autovacuum run very frequently
against each table in many important cases, with each VACUUM taking
very little wall-clock time. We don't have to change the fundamental
design to fix most of the current problems. I suspect that the "top
down" nature of VACUUM is sometimes helpful. We just need to
compensate when this design is insufficient. Getting the "best of both
worlds" is possible.

> But to make index vacuuming a per-index thing, we need to deal with
> the problem that we cannot know which indexes of the table still have
> index tuples pointing to the collected dead tuples. For example, if an
> index always says 'no' (it doesn't need bulkdelete, therefore we need
> to keep the dead line pointers), the collected dead tuples might
> already be marked LP_DEAD, and the other indexes might no longer have
> index tuples pointing to them. In that case we don't want to call
> bulkdelete for the other indexes. We could probably keep additional
> statistics, such as the number of dead tuples in each individual
> index, so that each index can determine its need for bulkdelete. But
> it's not a comprehensive solution.

Right. Maybe we don't ask the index AMs for discrete yes/no answers.
Maybe we can ask them for a continuous answer, such as a value between
0.0 and 1.0 that represents the urgency/bloat, or something like that.
And so the final yes/no answer that really does have to be made for
the table as a whole (does VACUUM do a second pass over the heap to
make LP_DEAD items into LP_UNUSED items?) can at least consider the
worst case for each index. And maybe the average case, too.

(I am just making random suggestions to stimulate discussion. Don't
take these specific suggestions about the am interface too seriously.)
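In that spirit, here is a minimal sketch of the aggregation on the
vacuumlazy.c side; index_vacuum_urgency() and the threshold are
hypothetical, invented purely for illustration:

#include "postgres.h"
#include "utils/rel.h"

/* Hypothetical new per-index AM callback: urgency in [0.0, 1.0] */
extern double index_vacuum_urgency(Relation indrel);

/*
 * Sketch only: poll each index for a continuous "vacuum urgency"
 * (0.0 = no benefit, 1.0 = urgent) and decide on the table-level
 * second heap pass (LP_DEAD -> LP_UNUSED) from the worst case.
 */
static bool
second_heap_pass_wanted(Relation *Irel, int nindexes, double threshold)
{
	double		worst = 0.0;

	for (int i = 0; i < nindexes; i++)
		worst = Max(worst, index_vacuum_urgency(Irel[i]));

	return worst >= threshold;
}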

> > If you are able to pursue this project, in whole or in part, I would
> > definitely be supportive of that. I may be able to commit it. I think
> > that this project has many benefits, not just one or two. It seems
> > strategic.
>
> Thanks, that's really helpful. I'm going to work on that. Since things
> were made complicated by these two features that I proposed, I'll do
> my best to sort out the problem and improve it in PG14.

Excellent! Thank you.

--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Robert Haas
On Fri, Nov 20, 2020 at 2:37 PM Peter Geoghegan <[hidden email]> wrote:
> Right. Maybe we don't ask the index AMs for discrete yes/no answers.
> Maybe we can ask them for a continuous answer, such as a value between
> 0.0 and 1.0 that represents the urgency/bloat, or something like that.
> And so the final yes/no answer that really does have to be made for
> the table as a whole (does VACUUM do a second pass over the heap to
> make LP_DEAD items into LP_UNUSED items?) can at least consider the
> worst case for each index. And maybe the average case, too.

That's an interesting idea. We should think about the needs of brin
indexes when designing something better than the current system. They
have the interesting property that the heap deciding to change LP_DEAD
to LP_UNUSED doesn't break anything even if nothing's been done to the
index, because they don't store TIDs anyway. So that's an example of
an index AM that might want to do some work to keep performance up,
but it's not actually required. This might be orthogonal to the
0.0-1.0 scale you were thinking about, but it might be good to factor
it into the thinking somehow.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan-4
On Fri, Nov 20, 2020 at 12:04 PM Robert Haas <[hidden email]> wrote:
> That's an interesting idea. We should think about the needs of brin
> indexes when designing something better than the current system. They
> have the interesting property that the heap deciding to change LP_DEAD
> to LP_UNUSED doesn't break anything even if nothing's been done to the
> index, because they don't store TIDs anyway. So that's an example of
> an index AM that might want to do some work to keep performance up,
> but it's not actually required. This might be orthogonal to the
> 0.0-1.0 scale you were thinking about, but it might be good to factor
> it into the thinking somehow.

I actually made this exact suggestion about BRIN myself, several years ago.

As I've said, it seems like it would be a good idea to ask the exact
same generic question of each index in turn (which is answered using
local smarts added to the index AM). Again, the question is: How
important is it that you get vacuumed now, from your own
narrow/selfish point of view? The way that BRIN answers this question
is not the novel thing about BRIN among other index access methods,
though. (Not that you claimed otherwise -- just framing the discussion
carefully.)

BRIN has no selfish reason to care if the table never gets to have its
LP_DEAD line pointers set to LP_UNUSED -- that's just not something
that it can be expected to understand directly. But all index access
methods should be thought of as not caring about this, because it's
just not their problem. (Especially with bottom-up index deletion, but
even without it.)

The interesting and novel thing about BRIN here is this: vacuumlazy.c
can be taught that a BRIN index alone is no reason to have to do a
second pass over the heap (to make the LP_DEAD/pruned-by-VACUUM line
pointers LP_UNUSED). A BRIN index never gets affected by the usual
considerations about the heapam invariant (the usual thing about TIDs
in an index not pointing to a line pointer that is at risk of being
recycled), which presents us with a unique-to-BRIN opportunity. Which
is exactly what you said.
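
If we wanted to exploit that directly, the vacuumlazy.c check could be
about as simple as the following (the flag is invented; nothing like
it exists today):

    bool        second_heap_pass_needed = false;

    for (int i = 0; i < nindexes; i++)
    {
        /*
         * Invented flag: true for AMs whose index tuples reference
         * heap TIDs (btree, hash, GIN, ...), false for BRIN, which
         * only stores summaries of block ranges.
         */
        if (Irel[i]->rd_indam->amstorestids)
        {
            second_heap_pass_needed = true;
            break;
        }
    }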

(***Thinks some more***)

Actually, now I think that BRIN shouldn't be special to vacuumlazy.c
in any way. It doesn't make sense as part of this future world in
which index vacuuming can be skipped for individual indexes (which is
what I talked to Sawada-san about a little earlier in this thread).
Why should it be useful to exploit the "no-real-TIDs" property of BRIN
in this future world? It can only solve a problem that the main
enhancement is itself expected to solve without any special help from
BRIN (just the generic am callback that asks the same generic question
about index vacuuming urgency).

The only reason we press ahead with a second scan (the
LP_DEAD-to-LP_UNUSED thing) in this ideal world is a heap/table
problem. The bloat eventually gets out of hand *in the table*. We have
now conceptually decoupled the problems experienced in the table/heap
from the problems for each index (mostly), so this actually makes
sense. The theory behind AV scheduling becomes much closer to reality
-- by changing the reality! (The need to "prune the table to VACUUM
any one index" notwithstanding -- that's still necessary, of course,
but we still basically decouple table bloat from index bloat at the
conceptual level.)

Does that make sense?
--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Robert Haas
On Fri, Nov 20, 2020 at 4:21 PM Peter Geoghegan <[hidden email]> wrote:

> Actually, now I think that BRIN shouldn't be special to vacuumlazy.c
> in any way. It doesn't make sense as part of this future world in
> which index vacuuming can be skipped for individual indexes (which is
> what I talked to Sawada-san about a little earlier in this thread).
> Why should it be useful to exploit the "no-real-TIDs" property of BRIN
> in this future world? It can only solve a problem that the main
> enhancement is itself expected to solve without any special help from
> BRIN (just the generic am callback that asks the same generic question
> about index vacuuming urgency).
>
> The only reason we press ahead with a second scan (the
> LP_DEAD-to-LP_UNUSED thing) in this ideal world is a heap/table
> problem. The bloat eventually gets out of hand *in the table*. We have
> now conceptually decoupled the problems experienced in the table/heap
> from the problems for each index (mostly), so this actually makes
> sense. The theory behind AV scheduling becomes much closer to reality
> -- by changing the reality! (The need to "prune the table to VACUUM
> any one index" notwithstanding -- that's still necessary, of course,
> but we still basically decouple table bloat from index bloat at the
> conceptual level.)
>
> Does that make sense?

I *think* so. For me the point is that the index never has a right to
insist on being vacuumed, but it can offer an opinion on how helpful
it would be.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan-4
On Fri, Nov 20, 2020 at 2:17 PM Robert Haas <[hidden email]> wrote:
> > Does that make sense?
>
> I *think* so. For me the point is that the index never has a right to
> insist on being vacuumed, but it can offer an opinion on how helpful
> it would be.

Right, that might be the single most important point. It's a somewhat
more bottom-up direction for VACUUM, though one that is still
fundamentally top-down. Because that's still necessary.

Opportunistic heap pruning is usually very effective, so today we
realistically have these 4-byte line pointers accumulating in heap
pages. The corresponding "bloatum" in index pages is an index tuple +
line pointer (at least 16 bytes + 4 bytes). Meaning that we accumulate
*at least* 20 bytes in indexes for every 4 bytes in the table. And,
indexes care about *where* items go, making the problem even worse. So
in the absence of index tuple LP_DEAD setting/deletion (or bottom-up
index deletion in Postgres 14), the problem in indexes is probably at
least 5x worse.
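
Spelling out the arithmetic:

    heap bloat per dead tuple:   4 bytes (one LP_DEAD line pointer)
    index bloat per dead tuple:  >= 16 bytes (index tuple)
                                    + 4 bytes (line pointer) = 20 bytes
    ratio:                       20 / 4 = 5x, per affected index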

The precise extent to which this is true will vary. It's a mistake to
try to reason about it at a high level, because there is just too much
variation for that approach to work. We should just give index access
methods *some* say. Sometimes this allows index vacuuming to be very
lazy, other times it allows index vacuuming to be very eager. Often
this variation exists among indexes on the same table.

Of course, vacuumlazy.c is still responsible for not letting the
accumulation of LP_DEAD heap line pointers get out of hand (without
allowing index TIDs to point to the wrong thing due to dangerous TID
recycling issues/bugs). The accumulation of LP_DEAD heap line pointers
will often take a very long time to get out of hand. But when it does
finally get out of hand, index access methods don't get to veto being
vacuumed. Because this isn't actually about their needs anymore.

Actually, the index access methods never truly veto anything. They
merely give some abstract signal about how urgent it is to them (like
the 0.0 - 1.0 thing). This difference actually matters. One index
among many on a table saying "meh, I guess I could benefit from some
index vacuuming if it's no real trouble to you vacuumlazy.c" rather
than saying "it's absolutely unnecessary, don't waste CPU cycles
vacuumlazy.c" may actually shift how vacuumlazy.c processes the heap
(at least occasionally). Maybe the high-level VACUUM operation decides
that it is worth taking care of everything all at once: if all the
indexes together either say "meh" or "now would be a good time", and
vacuumlazy.c then notices that the accumulation of LP_DEAD line
pointers is *also* becoming a problem (another "meh" situation), then
it can be *more* ambitious. It can do a traditional VACUUM early.
Which might still make sense.
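
In pseudo-C, the shape of that decision might be as follows.
(Everything here is invented, including the helper and the thresholds
-- the point is only that the table's own signal and the indexes'
signals get *combined*, with neither side holding a veto.)

    double      index_urgency = aggregate_index_urgency(Irel, nindexes);
    double      heap_urgency = (double) lpdead_items / max_tolerable_lpdead;
    bool        do_index_vacuuming;

    if (heap_urgency >= 1.0)
    {
        /* Table bloat is out of hand: vacuum the indexes and do the
         * second heap pass, no matter what the indexes said. */
        do_index_vacuuming = true;
    }
    else if (index_urgency > 0.5 && heap_urgency > 0.5)
    {
        /* Neither side is desperate, but together they justify doing
         * a traditional VACUUM early. */
        do_index_vacuuming = true;
    }
    else
        do_index_vacuuming = false;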

This also means that vacuumlazy.c would ideally think about this as an
optimization problem. It may be lazy or eager for the whole table,
just as it may be lazy or eager for individual indexes. (Though the
eagerness/laziness dynamic is probably much more noticeable with
indexes in practice.)

--
Peter Geoghegan

