xid wraparound danger due to INDEX_CLEANUP false


Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan-4
On Wed, Apr 29, 2020 at 2:04 PM Peter Geoghegan <[hidden email]> wrote:
> As a bonus, we now have an easy correctness cross-check: if some
> random piece of nbtree code lands on a page (having followed a
> downlink or sibling link) that is marked recycled, then clearly
> something is very wrong -- throw a "can't happen" error.

If this doesn't sound that appealing to any of you, then bear this in
mind: nbtree has a terrifying tendency to *not* produce wrong answers
to queries when we land on a concurrently-recycled page.
_bt_moveright() is willing to move right as many times as it takes to
arrive at the correct page, even though in typical cases having to
move right once is rare -- twice is exceptional. I suppose that there
is roughly a 50% chance that we'll end up landing at a point in the
key space to the left of the point where we're supposed to
arrive.  It might take many, many page accesses before
_bt_moveright() finds the correct page, but that often won't be very
noticeable. Or if it is noticed, corruption won't be suspected --
we're still getting a correct answer.
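
To make the cross-check concrete, here is a minimal sketch, assuming a
hypothetical BTP_RECYCLED page flag (nbtree has no such flag today,
which is exactly the point):

    /*
     * Sketch only.  BTP_RECYCLED is hypothetical; the other types and
     * macros are the real ones from access/nbtree.h.
     */
    static void
    _bt_check_not_recycled(Relation rel, Page page)
    {
        BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);

        if (opaque->btpo_flags & BTP_RECYCLED) /* hypothetical flag */
            elog(ERROR, "landed on recycled page in index \"%s\"",
                 RelationGetRelationName(rel));
    }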

--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Andres Freund
In reply to this post by Peter Geoghegan-4
Hi,

On 2020-04-29 13:40:55 -0700, Peter Geoghegan wrote:

> On Wed, Apr 29, 2020 at 12:54 PM Andres Freund <[hidden email]> wrote:
> > > Fundamentally, btvacuumpage() doesn't freeze 32-bit XIDs (from
> > > btpo.xact) when it recycles deleted pages. It simply puts them in the
> > > FSM without changing anything about the page itself. This means
> > > surprisingly little in the context of nbtree: the
> > > _bt_page_recyclable() XID check that takes place in btvacuumpage()
> > > also takes place in _bt_getbuf(), at the point where the page actually
> > > gets recycled by the client. That's not great.
> >
> > I think it's quite foolish for btvacuumpage() to not freeze xids. If we
> > only do so when necessary (i.e. older than a potential new relfrozenxid,
> > and only when the vacuum didn't yet skip pages), the costs are pretty
> > minuscule.
>
> I wonder if we should just bite the bullet and mark pages that are
> recycled by VACUUM as explicitly recycled, with WAL-logging and
> everything (this is like freezing, but stronger). That way, the
> _bt_page_recyclable() call within _bt_getbuf() would only be required
> to check that state (while btvacuumpage() would use something like a
> _bt_page_eligible_for_recycling(), which would do the same thing as
> the current _bt_page_recyclable()).

I'm not sure I see the advantage. Only doing so in the freezing case
seems unlikely to cause additional conflicts, but I'm less sure about
doing it always. btpo.xact will often be quite recent, right? So it'd
likely cause more conflicts.


> I'm not sure how low the costs would be, but at least we'd only have
> to do it once per already-deleted page (i.e. only the first time a
> VACUUM operation found _bt_page_eligible_for_recycling() returned true
> for the page and marked it recycled in a crash safe manner). That
> design would be quite a lot simpler, because it expresses the problem
> in terms that make sense to the nbtree code. _bt_getbuf() should not
> have to make a distinction between "possibly recycled" versus
> "definitely recycled".

I don't really see the problem with the check in _bt_getbuf()? I'd
actually like to be *more* aggressive about putting pages in the FSM (or
whatever), and that'd probably require checks like this. E.g. whenever
we unlink a page, we should put it into the FSM (or something different,
see below). And then do all the necessary checks in _bt_getbuf().

It's pretty sad that one right now basically needs to vacuum twice to
reuse pages in nbtree (once to delete the page, once to record it in the
fsm). Usually the xmin horizon should advance much more quickly than
that, allowing reuse earlier.

As far as I can tell, even just adding them to the FSM when setting
ISDELETED would be advantageous. There's obviously the risk that we'll
cause backends to visit a lot of pages that can't actually be
reused... But if
we did what I suggest below, that danger probably could largely be
avoided.


> It makes sense that the FSM is not crash safe, I suppose, but we're
> talking about relatively large amounts of free space here. Can't we
> just do it properly/reliably?

What do you mean by that? To have the FSM be crash-safe?

It could make sense to just not have the FSM, and have a linked-list
style stack of pages reachable from the meta page.  That'd be especially
advantageous if we kept xids associated with the "about to be
recyclable" pages in the metapage, so it'd be cheap to evaluate.

There's possibly some not-entirely-trivial locking concerns around such
a list. Adding entries seems easy enough, because we currently only
delete pages from within vacuum. But popping entries could be more
complicated, I'm not exactly sure if there are potential lock nesting
issues (nor am I actually sure there aren't existing ones).

Greetings,

Andres Freund



Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan-4
On Wed, Apr 29, 2020 at 2:56 PM Andres Freund <[hidden email]> wrote:
> I'm not sure I see the advantage. Only doing so in the freezing case
> seems unlikely to cause additional conflicts, but I'm less sure about
> doing it always. btpo.xact will often be quite recent, right? So it'd
> likely cause more conflicts.

btpo.xact values come from a call to ReadNewTransactionId(). There
pretty much has to be one call to ReadNewTransactionId() per page
deletion (see comments about it within _bt_unlink_halfdead_page()). So
yes, I'd say that it could be very recent.
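
For reference, the stamping and the later recyclability test look
roughly like this today (paraphrased from nbtree, not copied
verbatim):

    /* in _bt_unlink_halfdead_page(): stamp the deleted page */
    opaque->btpo.xact = ReadNewTransactionId();

    /* in _bt_page_recyclable(): test the stamp against the horizon */
    return P_ISDELETED(opaque) &&
           TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin);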

I don't mind continuing to do the conflicts in _bt_getbuf(), which
would delay it until the point where we really need the page --
especially if we could do that in a way that captures temporal
locality (e.g. your recyclable page chaining idea).

> I don't really see the problem with the check in _bt_getbuf()? I'd
> actually like to be *more* aggressive about putting pages in the FSM (or
> whatever), and that'd probably require checks like this. E.g. whenever
> we unlink a page, we should put it into the FSM (or something different,
> see below). And then do all the necessary checks in _bt_getbuf().

Basically, I would like to have a page state that represents "it
should be impossible for any scan to land on this page, except for
btvacuumscan()". Without relying on 32-bit XIDs, and ideally without
relying on any other state to interpret what it really means. In
principle we can set a deleted page to that state at the earliest
possible point when that becomes true, without it meaning anything
else, or requiring that we do anything else at the same time (e.g.
actually using it for the right half of a page in a page split,
generating recovery conflicts).

> It's pretty sad that one right now basically needs to vacuum twice to
> reuse pages in nbtree (once to delete the page, once to record it in the
> fsm). Usually the xmin horizon should advance much more quickly than
> that, allowing reuse earlier.

Yes, that's definitely bad. I like the general idea of making us more
aggressive with recycling. Marking pages as "recyclable" (not
"recycled") not too long after they first get deleted in VACUUM, and
then separately recycling them in _bt_getbuf() is a cleaner design.
Separation of concerns. That would build confidence in a more
aggressive approach -- we could add lots of cross-checks against
landing on a recyclable page. Note that we have had a bug in this
exact mechanism in the past -- see commit d3abbbebe52.

If there was a bug then we might still land on the page after it gets
fully recycled, in which case the cross-checks won't detect the bug.
But ISTM that we always have a good chance of landing on the page
before that happens, in which case the cross-check complains and we
get a log message, and possibly even a bug report. We don't have to be
truly lucky to see when our approach is buggy when we go on to make
page deletion more aggressive (in whatever way). And we get the same
cross-checks on standbys.

> > It makes sense that the FSM is not crash safe, I suppose, but we're
> > talking about relatively large amounts of free space here. Can't we
> > just do it properly/reliably?
>
> What do you mean by that? To have the FSM be crash-safe?

That, or the equivalent (pretty much your chaining idea) may well be a
good idea. But what I really meant was an explicit "recyclable" page
state. That's all. We may or may not also continue to rely on the FSM
in the same way.

I suppose that we should try to get rid of the FSM in nbtree. I see
the advantages. It's not essential to my proposal, though.

> It could make sense to just not have the FSM, and have a linked-list
> style stack of pages reachable from the meta page.  That'd be especially
> advantageous if we kept xids associated with the "about to be
> recyclable" pages in the metapage, so it'd be cheap to evaluate.

I like that idea. But doesn't that also argue for delaying the
conflicts until we actually recycle a "recyclable" page?

> There's possibly some not-entirely-trivial locking concerns around such
> a list. Adding entries seems easy enough, because we currently only
> delete pages from within vacuum. But popping entries could be more
> complicated, I'm not exactly sure if there are potential lock nesting
> issues (nor am I actually sure there aren't existing ones).

A "recyclable" page state might help with this, too. _bt_getbuf() is a
bag of tricks, even leaving aside generating recovery conflicts.

If we are going to be more eager, then the cost of dirtying the page a
second time to mark it "recyclable" might mostly not matter.
Especially if we chain the pages. That is, maybe VACUUM recomputes
RecentGlobalXmin at the end of its first btvacuumscan() scan (or at
the end of the whole VACUUM operation), when it notices that it is
already possible to mark many pages as "recyclable". Perhaps we won't
write out the page twice much of the time, because it won't have been
that long since VACUUM dirtied the page in order to delete it.

Yeah, we could be a lot more aggressive here, in a bunch of ways. As
I've said quite a few times, it seems like our implementation of "the
drain technique" is way more conservative than it needs to be (i.e. we
use ReadNewTransactionId() without considering any of the specifics of
the index). But if we mess up, we can't expect amcheck to detect the
problems, which would be totally transient. We're talking about
incredibly subtle concurrency bugs. So IMV it's just not going to
happen until the whole thing becomes way less scary.

--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Masahiko Sawada-2
In reply to this post by Robert Haas
On Sat, 18 Apr 2020 at 03:22, Robert Haas <[hidden email]> wrote:

>
> On Thu, Apr 16, 2020 at 6:49 PM Andres Freund <[hidden email]> wrote:
> > Yea. _bt_vacuum_needs_cleanup() needs to check if
> > metad->btm_oldest_btpo_xact is older than the FreezeLimit computed by
> > vacuum_set_xid_limits() and vacuum the index if so even if INDEX_CLEANUP
> > false.
>
> I'm still fairly unclear on what the actual problem is here, and on
> how we propose to fix it. It seems to me that we probably don't have a
> problem in the case where we don't advance relfrozenxid or relminmxid,
> because in that case there's not much difference between the behavior
> created by this patch and a case where we just error out due to an
> interrupt or something before reaching the index cleanup stage. I
> think that the problem is that in the case where we do advance relfrozenxid,
> we might advance it past some XID value stored in the index metadata.
> Is that right?

I think advancing relfrozenxid past oldest_btpo_xact actually cannot
be a problem. If a subsequent vacuum sees that oldest_btpo_xact is an
old xid, we can recycle pages. Before the introduction of
INDEX_CLEANUP = false, we used to invoke either bulkdelete or
vacuumcleanup at least once in each vacuum. And thanks to
relfrozenxid, a table is periodically vacuumed by an anti-wraparound
vacuum. But with this feature, we can unconditionally skip both
bulkdelete and vacuumcleanup. So IIUC the problem is that since we
skip both, oldest_btpo_xact could be seen as a "future" xid during
vacuum, which causes vacuum to miss pages that could actually be
recycled.

I think we can fix this issue by calling the vacuumcleanup callback
during an anti-wraparound vacuum even if INDEX_CLEANUP is false. That
way we can let the index AM decide whether to do index cleanup at
least once before XID wraparound, same as before. Originally the
motivation for disabling INDEX_CLEANUP was to skip the index full scan
during an anti-wraparound vacuum to reduce the execution time. With
this change, we will end up doing an index full scan in some
anti-wraparound vacuum cases, but we can still skip it if there are no
recyclable pages in the index.
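
A rough sketch of the idea in vacuumlazy.c terms (hypothetical, not
the actual patch; "aggressive" and lazy_cleanup_index() are the
existing names there):

    /*
     * Even when INDEX_CLEANUP is false, still run the index cleanup
     * phase when this is an aggressive (anti-wraparound) vacuum, so
     * that an AM such as nbtree can recycle deleted pages before
     * btpo.xact wraps around.
     */
    if (vacrelstats->useindex || aggressive)
    {
        for (i = 0; i < nindexes; i++)
            lazy_cleanup_index(Irel[i], &indstats[i], vacrelstats);
    }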

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan-4
On Tue, May 5, 2020 at 2:52 PM Masahiko Sawada
<[hidden email]> wrote:
> So IIUC the problem is that since we skip both, oldest_btpo_xact
> could be seen as a "future" xid during vacuum, which causes vacuum to
> miss pages that could actually be recycled.

This is also my understanding of the problem.

> I think we can fix this issue by calling the vacuumcleanup callback
> during an anti-wraparound vacuum even if INDEX_CLEANUP is false. That
> way we can let the index AM decide whether to do index cleanup at
> least once before XID wraparound, same as before.

+1

Can you work on a patch?

--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Masahiko Sawada-2
On Wed, 6 May 2020 at 07:14, Peter Geoghegan <[hidden email]> wrote:

>
> On Tue, May 5, 2020 at 2:52 PM Masahiko Sawada
> <[hidden email]> wrote:
> > So IIUC the problem is that since we skip both, oldest_btpo_xact
> > could be seen as a "future" xid during vacuum, which causes vacuum
> > to miss pages that could actually be recycled.
>
> This is also my understanding of the problem.
>
> > I think we can fix this issue by calling the vacuumcleanup callback
> > during an anti-wraparound vacuum even if INDEX_CLEANUP is false.
> > That way we can let the index AM decide whether to do index cleanup
> > at least once before XID wraparound, same as before.
>
> +1
>
> Can you work on a patch?

Yes, I'll submit a bug fix patch.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: xid wraparound danger due to INDEX_CLEANUP false

Masahiko Sawada-2
On Wed, 6 May 2020 at 07:17, Masahiko Sawada
<[hidden email]> wrote:

>
> On Wed, 6 May 2020 at 07:14, Peter Geoghegan <[hidden email]> wrote:
> >
> > On Tue, May 5, 2020 at 2:52 PM Masahiko Sawada
> > <[hidden email]> wrote:
> > > So IIUC the problem is that since we skip both, oldest_btpo_xact
> > > could be seen as a "future" xid during vacuum, which causes
> > > vacuum to miss pages that could actually be recycled.
> >
> > This is also my understanding of the problem.
> >
> > > I think we can fix this issue by calling the vacuumcleanup
> > > callback during an anti-wraparound vacuum even if INDEX_CLEANUP
> > > is false. That way we can let the index AM decide whether to do
> > > index cleanup at least once before XID wraparound, same as before.
> >
> > +1
> >
> > Can you work on a patch?
>
> Yes, I'll submit a bug fix patch.
>
I've attached a patch that fixes this issue.

With this patch, we no longer skip the index cleanup phase when
performing an aggressive vacuum (only the index vacuum phase is
skipped). The reason I skip only the index vacuum phase is that it
can be called multiple times, which takes a very long time. Since the
purpose of this index cleanup is to process recyclable pages, it's
enough to do only the index cleanup phase. However, it also means we
do index cleanup even when the table might have garbage, whereas we
used to call index cleanup only when there was no garbage in the
table. As far as I can tell it's no problem, but perhaps it needs
more research.


Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: fix_index_cleanup.patch (5K)

Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan-4
On Wed, May 6, 2020 at 2:28 AM Masahiko Sawada
<[hidden email]> wrote:
> I've attached a patch that fixes this issue.
>
> With this patch, we no longer skip the index cleanup phase when
> performing an aggressive vacuum (only the index vacuum phase is
> skipped). The reason I skip only the index vacuum phase is that it
> can be called multiple times, which takes a very long time. Since the
> purpose of this index cleanup is to process recyclable pages, it's
> enough to do only the index cleanup phase.

That's only true in nbtree, though. The way that I imagined we'd want
to fix this is by putting control in each index access method. So,
we'd revise the way that amvacuumcleanup() worked -- the
amvacuumcleanup routine for each index AM would sometimes be called in
a mode that means "I don't really want you to do any cleanup, but if
you absolutely have to for your own reasons then you can". This could
be represented using something similar to
IndexVacuumInfo.analyze_only.
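
To illustrate (abridged; the last field is the hypothetical new one,
the rest is the existing struct from src/include/access/genam.h):

    typedef struct IndexVacuumInfo
    {
        Relation index;            /* the index being vacuumed */
        bool     analyze_only;     /* ANALYZE (without any actual vacuum) */
        bool     report_progress;  /* emit progress.h status reports */
        bool     estimated_count;  /* num_heap_tuples is an estimate */
        int      message_level;    /* ereport level for progress messages */
        double   num_heap_tuples;  /* tuples remaining in heap */
        BufferAccessStrategy strategy;  /* access strategy for reads */
        bool     cleanup_requested; /* hypothetical: false means "skip
                                     * cleanup unless the AM must do it
                                     * for its own correctness" */
    } IndexVacuumInfo;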

This approach has an obvious disadvantage: the patch really has to
teach *every* index AM to do something with that state (most will
simply do no work). It seems logical to have the index AM control what
happens, though. This allows the logic to live inside
_bt_vacuum_needs_cleanup() in the case of nbtree, so there is only one
place where we make decisions like this.

Most other AMs don't have this problem. GiST has a similar issue with
recyclable pages, except that it doesn't use 32-bit XIDs so it doesn't
need to care about this stuff at all. Besides, it seems like it might
be a good idea to do other basic maintenance of the index when we're
"skipping" index cleanup. We probably should always do things that
require only a small, fixed amount of work. Things like maintaining
metadata in the metapage.

There may be practical reasons why this approach isn't suitable for
backpatch even if it is a superior approach. What do you think? Also,
what do you think about this, Robert and Andres?

--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan-4
On Wed, May 6, 2020 at 11:28 AM Peter Geoghegan <[hidden email]> wrote:
> This approach has an obvious disadvantage: the patch really has to
> teach *every* index AM to do something with that state (most will
> simply do no work). It seems logical to have the index AM control what
> happens, though. This allows the logic to live inside
> _bt_vacuum_needs_cleanup() in the case of nbtree, so there is only one
> place where we make decisions like this.

Also, do we really want to skip summarization of BRIN indexes? This
cleanup is rather dissimilar to the cleanup that takes place in most
other AMs -- it isn't really that related to garbage collection (BRIN
is rather unique in this respect). I think that BRIN might be an
inappropriate target for "index_cleanup off" VACUUMs for that reason.

See the explanation of how this takes place from the docs:
https://www.postgresql.org/docs/devel/brin-intro.html#BRIN-OPERATION

--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Alvaro Herrera-9
On 2020-May-06, Peter Geoghegan wrote:

> Also, do we really want to skip summarization of BRIN indexes? This
> cleanup is rather dissimilar to the cleanup that takes place in most
> other AMs -- it isn't really that related to garbage collection (BRIN
> is rather unique in this respect). I think that BRIN might be an
> inappropriate target for "index_cleanup off" VACUUMs for that reason.
>
> See the explanation of how this takes place from the docs:
> https://www.postgresql.org/docs/devel/brin-intro.html#BRIN-OPERATION

Good question.  I agree that BRIN summarization is not at all related to
what other index AMs do during the cleanup phase.  However, if the
index_cleanup feature is there to make it faster to process a table
that's nearing wraparound hazards (or at least the warnings), then I
think it makes sense to complete the vacuum as fast as possible -- which
includes not trying to summarize it for brin indexes.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan-4
On Wed, May 6, 2020 at 1:06 PM Alvaro Herrera <[hidden email]> wrote:
> Good question.  I agree that BRIN summarization is not at all related to
> what other index AMs do during the cleanup phase.  However, if the
> index_cleanup feature is there to make it faster to process a table
> that's nearing wraparound hazards (or at least the warnings), then I
> think it makes sense to complete the vacuum as fast as possible -- which
> includes not trying to summarize it for brin indexes.

I forgot about the fact that the AutoVacuumRequestWork() interface
exists at all until just now. That's how "autosummarize = on" makes
sure that autosummarization takes place. These work items are not
affected by the fact that the VACUUM happens to be a "index_cleanup
off" VACUUM. Fortunately, the user is required to explicitly opt-in to
autosummarization (by setting "autosummarize = on") in order for
autovacuum to spend extra time processing work items when it might be
important to advance relfrozenxid ASAP. (My initial assumption was
that the autosummarization business happened within
brinvacuumcleanup(), but I now see that I was mistaken.)
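
For reference, the request is made from brininsert() along these lines
(a lightly abridged sketch based on my reading of brin.c; details may
vary by version):

    /*
     * When a new page range starts and autosummarize is on, queue a
     * work item; an autovacuum worker summarizes the previous range
     * later, independently of any INDEX_CLEANUP setting.
     */
    if (autosummarize &&
        heapBlk > 0 &&
        heapBlk % pagesPerRange == 0)
        AutoVacuumRequestWork(AVW_BRINSummarizeRange,
                              RelationGetRelid(idxRel),
                              lastPageRange);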

There is a separate question (nothing to do with summarization) about
the cleanup steps performed in brinvacuumcleanup(), which are unlike
any of the cleanup/maintenance that we expect for an amvacuumcleanup
routine in general. As I said in my last e-mail, these steps have
nothing to do with garbage tuples. Rather, it's deferred maintenance
that we need to do even with append-only workloads (including when
autosummarization has not been enabled). What about that? Is that
okay?

ISTM that the fundamental issue is that BRIN imagines that it is in
control, which isn't quite true in light of the "index_cleanup off"
stuff -- a call to brinvacuumcleanup() is expected to take place at
fairly consistent intervals to take care of revmap processing, which,
in general, might not happen now. I blame commit a96c41feec6 for this,
not BRIN. ISTM that whatever behavior we deem appropriate, the proper
place to decide on it is within BRIN. Not within vacuumlazy.c.

--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Masahiko Sawada-2
In reply to this post by Peter Geoghegan-4
On Thu, 7 May 2020 at 03:28, Peter Geoghegan <[hidden email]> wrote:

>
> On Wed, May 6, 2020 at 2:28 AM Masahiko Sawada
> <[hidden email]> wrote:
> > I've attached a patch that fixes this issue.
> >
> > With this patch, we no longer skip the index cleanup phase when
> > performing an aggressive vacuum (only the index vacuum phase is
> > skipped). The reason I skip only the index vacuum phase is that it
> > can be called multiple times, which takes a very long time. Since
> > the purpose of this index cleanup is to process recyclable pages,
> > it's enough to do only the index cleanup phase.
>
> That's only true in nbtree, though. The way that I imagined we'd want
> to fix this is by putting control in each index access method. So,
> we'd revise the way that amvacuumcleanup() worked -- the
> amvacuumcleanup routine for each index AM would sometimes be called in
> a mode that means "I don't really want you to do any cleanup, but if
> you absolutely have to for your own reasons then you can". This could
> be represented using something similar to
> IndexVacuumInfo.analyze_only.
>
> This approach has an obvious disadvantage: the patch really has to
> teach *every* index AM to do something with that state (most will
> simply do no work). It seems logical to have the index AM control what
> happens, though. This allows the logic to live inside
> _bt_vacuum_needs_cleanup() in the case of nbtree, so there is only one
> place where we make decisions like this.
>
> Most other AMs don't have this problem. GiST has a similar issue with
> recyclable pages, except that it doesn't use 32-bit XIDs so it doesn't
> need to care about this stuff at all. Besides, it seems like it might
> be a good idea to do other basic maintenance of the index when we're
> "skipping" index cleanup. We probably should always do things that
> require only a small, fixed amount of work. Things like maintaining
> metadata in the metapage.
>
> There may be practical reasons why this approach isn't suitable for
> backpatch even if it is a superior approach. What do you think?

I agree this idea is better. I was thinking of the same approach, but
I was concerned about backpatching. In particular, since I was
thinking of adding one or two fields to IndexVacuumInfo, existing
index AMs might not work with the new IndexVacuumInfo structure.

If we go with this idea, we need to change lazy vacuum so that it
uses the two-pass strategy even if INDEX_CLEANUP is false. Also, in
parallel vacuum, we need to launch workers. But I think these changes
are no big problem.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: xid wraparound danger due to INDEX_CLEANUP false

Masahiko Sawada-2
On Thu, 7 May 2020 at 15:40, Masahiko Sawada
<[hidden email]> wrote:

>
> On Thu, 7 May 2020 at 03:28, Peter Geoghegan <[hidden email]> wrote:
> >
> > On Wed, May 6, 2020 at 2:28 AM Masahiko Sawada
> > <[hidden email]> wrote:
> > > I've attached a patch that fixes this issue.
> > >
> > > With this patch, we no longer skip the index cleanup phase when
> > > performing an aggressive vacuum (only the index vacuum phase is
> > > skipped). The reason I skip only the index vacuum phase is that
> > > it can be called multiple times, which takes a very long time.
> > > Since the purpose of this index cleanup is to process recyclable
> > > pages, it's enough to do only the index cleanup phase.
> >
> > That's only true in nbtree, though. The way that I imagined we'd want
> > to fix this is by putting control in each index access method. So,
> > we'd revise the way that amvacuumcleanup() worked -- the
> > amvacuumcleanup routine for each index AM would sometimes be called in
> > a mode that means "I don't really want you to do any cleanup, but if
> > you absolutely have to for your own reasons then you can". This could
> > be represented using something similar to
> > IndexVacuumInfo.analyze_only.
> >
> > This approach has an obvious disadvantage: the patch really has to
> > teach *every* index AM to do something with that state (most will
> > simply do no work). It seems logical to have the index AM control what
> > happens, though. This allows the logic to live inside
> > _bt_vacuum_needs_cleanup() in the case of nbtree, so there is only one
> > place where we make decisions like this.
> >
> > Most other AMs don't have this problem. GiST has a similar issue with
> > recyclable pages, except that it doesn't use 32-bit XIDs so it doesn't
> > need to care about this stuff at all. Besides, it seems like it might
> > be a good idea to do other basic maintenance of the index when we're
> > "skipping" index cleanup. We probably should always do things that
> > require only a small, fixed amount of work. Things like maintaining
> > metadata in the metapage.
> >
> > There may be practical reasons why this approach isn't suitable for
> > backpatch even if it is a superior approach. What do you think?
>
> I agree this idea is better. I was thinking of the same approach,
> but I was concerned about backpatching. In particular, since I was
> thinking of adding one or two fields to IndexVacuumInfo, existing
> index AMs might not work with the new IndexVacuumInfo structure.

Would it be ok if we added these fields at the end of the IndexVacuumInfo structure?

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: xid wraparound danger due to INDEX_CLEANUP false

Masahiko Sawada-2
On Thu, 7 May 2020 at 16:26, Masahiko Sawada
<[hidden email]> wrote:

>
> On Thu, 7 May 2020 at 15:40, Masahiko Sawada
> <[hidden email]> wrote:
> >
> > On Thu, 7 May 2020 at 03:28, Peter Geoghegan <[hidden email]> wrote:
> > >
> > > On Wed, May 6, 2020 at 2:28 AM Masahiko Sawada
> > > <[hidden email]> wrote:
> > > > I've attached a patch that fixes this issue.
> > > >
> > > > With this patch, we no longer skip the index cleanup phase when
> > > > performing an aggressive vacuum (only the index vacuum phase is
> > > > skipped). The reason I skip only the index vacuum phase is that
> > > > it can be called multiple times, which takes a very long time.
> > > > Since the purpose of this index cleanup is to process recyclable
> > > > pages, it's enough to do only the index cleanup phase.
> > >
> > > That's only true in nbtree, though. The way that I imagined we'd want
> > > to fix this is by putting control in each index access method. So,
> > > we'd revise the way that amvacuumcleanup() worked -- the
> > > amvacuumcleanup routine for each index AM would sometimes be called in
> > > a mode that means "I don't really want you to do any cleanup, but if
> > > you absolutely have to for your own reasons then you can". This could
> > > be represented using something similar to
> > > IndexVacuumInfo.analyze_only.
> > >
> > > This approach has an obvious disadvantage: the patch really has to
> > > teach *every* index AM to do something with that state (most will
> > > simply do no work). It seems logical to have the index AM control what
> > > happens, though. This allows the logic to live inside
> > > _bt_vacuum_needs_cleanup() in the case of nbtree, so there is only one
> > > place where we make decisions like this.
> > >
> > > Most other AMs don't have this problem. GiST has a similar issue with
> > > recyclable pages, except that it doesn't use 32-bit XIDs so it doesn't
> > > need to care about this stuff at all. Besides, it seems like it might
> > > be a good idea to do other basic maintenance of the index when we're
> > > "skipping" index cleanup. We probably should always do things that
> > > require only a small, fixed amount of work. Things like maintaining
> > > metadata in the metapage.
> > >
> > > There may be practical reasons why this approach isn't suitable for
> > > backpatch even if it is a superior approach. What do you think?
> >
> > I agree this idea is better. I was thinking of the same approach,
> > but I was concerned about backpatching. In particular, since I was
> > thinking of adding one or two fields to IndexVacuumInfo, existing
> > index AMs might not work with the new IndexVacuumInfo structure.
>
> Would it be ok if we added these fields at the end of the IndexVacuumInfo structure?
>
I've attached a WIP patch for HEAD. With this patch, the core passes
index_cleanup to the bulkdelete and vacuumcleanup callbacks so that
they can decide whether to run vacuum or not.

If the direction of this patch seems good, I'll change the patch so
that we pass some information to these callbacks indicating whether
this vacuum is an anti-wraparound vacuum. This is necessary because,
as per the discussion so far, it's enough to invoke index cleanup
once before XID wraparound.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: fix_index_cleanup_v2_WIP.patch (20K)

Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan-4
On Mon, May 18, 2020 at 7:32 PM Masahiko Sawada
<[hidden email]> wrote:
> I've attached a WIP patch for HEAD. With this patch, the core passes
> index_cleanup to the bulkdelete and vacuumcleanup callbacks so that
> they can decide whether to run vacuum or not.
>
> If the direction of this patch seems good, I'll change the patch so
> that we pass some information to these callbacks indicating whether
> this vacuum is an anti-wraparound vacuum. This is necessary because,
> as per the discussion so far, it's enough to invoke index cleanup
> once before XID wraparound.

It seems like the right direction to me. Robert?

--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan-4
On Fri, May 22, 2020 at 1:40 PM Peter Geoghegan <[hidden email]> wrote:
> It seems like the right direction to me. Robert?

BTW, this is an interesting report of somebody using the INDEX_CLEANUP
feature when they had to deal with a difficult production issue:

https://www.buildkitestatus.com/incidents/h0vnx4gp7djx

This report is not really relevant to our discussion, but I thought
you might find it interesting.

--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Masahiko Sawada-2
In reply to this post by Masahiko Sawada-2
On Tue, 19 May 2020 at 11:32, Masahiko Sawada
<[hidden email]> wrote:

>
> On Thu, 7 May 2020 at 16:26, Masahiko Sawada
> <[hidden email]> wrote:
> >
> > On Thu, 7 May 2020 at 15:40, Masahiko Sawada
> > <[hidden email]> wrote:
> > >
> > > On Thu, 7 May 2020 at 03:28, Peter Geoghegan <[hidden email]> wrote:
> > > >
> > > > On Wed, May 6, 2020 at 2:28 AM Masahiko Sawada
> > > > <[hidden email]> wrote:
> > > > > I've attached a patch that fixes this issue.
> > > > >
> > > > > With this patch, we no longer skip the index cleanup phase
> > > > > when performing an aggressive vacuum (only the index vacuum
> > > > > phase is skipped). The reason I skip only the index vacuum
> > > > > phase is that it can be called multiple times, which takes a
> > > > > very long time. Since the purpose of this index cleanup is to
> > > > > process recyclable pages, it's enough to do only the index
> > > > > cleanup phase.
> > > >
> > > > That's only true in nbtree, though. The way that I imagined we'd want
> > > > to fix this is by putting control in each index access method. So,
> > > > we'd revise the way that amvacuumcleanup() worked -- the
> > > > amvacuumcleanup routine for each index AM would sometimes be called in
> > > > a mode that means "I don't really want you to do any cleanup, but if
> > > > you absolutely have to for your own reasons then you can". This could
> > > > be represented using something similar to
> > > > IndexVacuumInfo.analyze_only.
> > > >
> > > > This approach has an obvious disadvantage: the patch really has to
> > > > teach *every* index AM to do something with that state (most will
> > > > simply do no work). It seems logical to have the index AM control what
> > > > happens, though. This allows the logic to live inside
> > > > _bt_vacuum_needs_cleanup() in the case of nbtree, so there is only one
> > > > place where we make decisions like this.
> > > >
> > > > Most other AMs don't have this problem. GiST has a similar issue with
> > > > recyclable pages, except that it doesn't use 32-bit XIDs so it doesn't
> > > > need to care about this stuff at all. Besides, it seems like it might
> > > > be a good idea to do other basic maintenance of the index when we're
> > > > "skipping" index cleanup. We probably should always do things that
> > > > require only a small, fixed amount of work. Things like maintaining
> > > > metadata in the metapage.
> > > >
> > > > There may be practical reasons why this approach isn't suitable for
> > > > backpatch even if it is a superior approach. What do you think?
> > >
> > > I agree this idea is better. I was thinking of the same approach,
> > > but I was concerned about backpatching. In particular, since I
> > > was thinking of adding one or two fields to IndexVacuumInfo,
> > > existing index AMs might not work with the new IndexVacuumInfo
> > > structure.
> >
> > Would it be ok if we added these fields at the end of the IndexVacuumInfo structure?
> >
>
> I've attached a WIP patch for HEAD. With this patch, the core passes
> index_cleanup to the bulkdelete and vacuumcleanup callbacks so that
> they can decide whether to run vacuum or not.
>
> If the direction of this patch seems good, I'll change the patch so
> that we pass some information to these callbacks indicating whether
> this vacuum is an anti-wraparound vacuum. This is necessary because,
> as per the discussion so far, it's enough to invoke index cleanup
> once before XID wraparound.
>
I've updated the patch so that vacuum passes the is_wraparound flag to
bulkdelete and vacuumcleanup. I've therefore added two new fields in
total: index_cleanup and is_wraparound. Index AMs can decide whether
or not to skip bulkdelete and index cleanup.
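
For example, on the nbtree side the two flags could combine along
these lines (a sketch assuming the patch's field names):

    /*
     * With INDEX_CLEANUP off and no wraparound pressure, skip all
     * work; during an anti-wraparound vacuum, still check whether
     * deleted pages need to be recycled before btpo.xact wraps
     * around.
     */
    if (!info->index_cleanup)
    {
        if (!info->is_wraparound)
            return NULL;    /* honor the user's request: do nothing */
        if (!_bt_vacuum_needs_cleanup(info))
            return NULL;    /* no recyclable pages at risk */
    }
    /* ... otherwise fall through and perform cleanup ... */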

Also, I've added this item to Older Bugs so as not to forget.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: fix_index_cleanup_v2.patch (26K)

Re: xid wraparound danger due to INDEX_CLEANUP false

Robert Haas
In reply to this post by Peter Geoghegan-4
On Fri, May 22, 2020 at 4:40 PM Peter Geoghegan <[hidden email]> wrote:

> On Mon, May 18, 2020 at 7:32 PM Masahiko Sawada
> <[hidden email]> wrote:
> > I've attached a WIP patch for HEAD. With this patch, the core
> > passes index_cleanup to the bulkdelete and vacuumcleanup callbacks
> > so that they can decide whether to run vacuum or not.
> >
> > If the direction of this patch seems good, I'll change the patch
> > so that we pass some information to these callbacks indicating
> > whether this vacuum is an anti-wraparound vacuum. This is necessary
> > because, as per the discussion so far, it's enough to invoke index
> > cleanup once before XID wraparound.
>
> It seems like the right direction to me. Robert?

Sorry, I'm so far behind on my email. Argh.

I think, especially given the blog post you linked, that we should aim to
have INDEX_CLEANUP OFF mode do the minimum possible amount of work
while still keeping us safe against transaction ID wraparound. So if,
for example, it's desirable but not imperative for BRIN to
resummarize, then it's OK normally but should be skipped with
INDEX_CLEANUP OFF.

I find the patch itself confusing and the comments inadequate,
especially the changes to lazy_scan_heap(). Before the INDEX_CLEANUP
patch went into the tree, LVRelStats had a member hasindex indicating
whether or not the table had any indexes. The patch changed that
member to useindex, indicating whether or not we were going to do
index vacuuming; thus, it would be false if either there were no
indexes or if we were going to ignore them. This patch redefines
useindex to mean whether or not the table has any indexes, but without
renaming the variable. There are also really no comments anywhere in
vacuumlazy.c explaining the overall scheme here; what are we actually
doing? Apparently, what we're really doing here is that even when
INDEX_CLEANUP is OFF, we're still going to keep all the dead tuples.
That seems sad, but if it's what we have to do then it at least needs
comments explaining it.

As for the btree portion of the change, I expect you understand this
better than I do, so I'm reluctant to stick my neck out, but it seems
that what the patch does is force index cleanup to happen even when
INDEX_CLEANUP is OFF provided that the vacuum is for wraparound and
the btree index has at least 1 recyclable page. My first reaction is
to wonder whether that doesn't nerf this feature into oblivion. Isn't
it likely that an index that is being vacuumed for wraparound will
have a recyclable page someplace? If the presence of that 1 recyclable
page causes the "help, I'm about to run out of XIDs, please do the
least possible work" flag to become a no-op, I don't think users are
going to be too happy with that. Maybe I am misunderstanding.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: xid wraparound danger due to INDEX_CLEANUP false

Peter Geoghegan-4
On Wed, Jun 24, 2020 at 10:21 AM Robert Haas <[hidden email]> wrote:
> Sorry, I'm so far behind on my email. Argh.

That's okay.

> I think, especially given the blog post you linked, that we should aim to
> have INDEX_CLEANUP OFF mode do the minimum possible amount of work
> while still keeping us safe against transaction ID wraparound. So if,
> for example, it's desirable but not imperative for BRIN to
> resummarize, then it's OK normally but should be skipped with
> INDEX_CLEANUP OFF.

I agree that that's very important.

> Apparently, what we're really doing here is that even when
> INDEX_CLEANUP is OFF, we're still going to keep all the dead tuples.
> That seems sad, but if it's what we have to do then it at least needs
> comments explaining it.

+1. Though I think that AMs should technically have the right to
consider it advisory.

> As for the btree portion of the change, I expect you understand this
> better than I do, so I'm reluctant to stick my neck out, but it seems
> that what the patch does is force index cleanup to happen even when
> INDEX_CLEANUP is OFF provided that the vacuum is for wraparound and
> the btree index has at least 1 recyclable page. My first reaction is
> to wonder whether that doesn't nerf this feature into oblivion. Isn't
> it likely that an index that is being vacuumed for wraparound will
> have a recyclable page someplace? If the presence of that 1 recyclable
> page causes the "help, I'm about to run out of XIDs, please do the
> least possible work" flag to become a no-op, I don't think users are
> going to be too happy with that. Maybe I am misunderstanding.

I have mixed feelings about it myself. These are valid concerns.

This is a problem to the extent that the user has a table that they'd
like to use INDEX_CLEANUP with, that has indexes that regularly
require cleanup due to page deletion. ISTM, then, that the really
relevant high level design questions for this patch are:

1. How often is that likely to happen in The Real World™?

2. If we fail to do cleanup and leak already-deleted pages, how bad is
that? (Both in general, and in the worst case.)

I'll hazard a guess for 1: I think that it might not come up that
often. Page deletion is often something that we hardly ever need. And,
unlike some DB systems, we only do it when pages are fully empty
(which, as it turns out, isn't necessarily better than our simple
approach [1]). I tend to think it's unlikely to happen in cases where
INDEX_CLEANUP is used, because those are cases that also must not have
that much index churn to begin with.

Then there's question 2. The intuition behind the approach from
Sawada-san's patch was that allowing wraparound here feels wrong -- it
should be in the AM's hands. However, it's not like I can point to
some ironclad guarantee about not leaking deleted pages that existed
before the INDEX_CLEANUP feature. We know that the FSM is not crash
safe, and that's that. Is this really all that different? Maybe it is,
but it seems like a quantitative difference to me.

I'm kind of arguing against myself even as I try to advance my
original argument. If workloads that use INDEX_CLEANUP don't need to
delete and recycle pages in any case, then why should we care that
those same workloads might leak pages on account of the wraparound
hazard? There's nothing to leak! Maybe some compromise is possible, at
least for the backbranches. Perhaps nbtree is told about the
requirements in a bit more detail than we'd imagined. It's not just an
INDEX_CLEANUP boolean. It could be an enum or something. Not sure,
though.

The real reason that I want to push the mechanism down into index
access methods is because that design is clearly better overall; it
just so happens that the specific way in which we currently defer
recycling in nbtree makes very little sense, so it's harder to see the
big picture. The xid-cleanup design that we have was approximately the
easiest way to do it, so that's what we got. We should figure out a
way to recycle the pages at something close to the earliest possible
opportunity, without having to perform a full scan on the index
relation within btvacuumscan(). Maybe we can use the autovacuum work
item mechanism for that. For indexes that get VACUUMed once a week on
average, it makes zero sense to wait another week to recycle the pages
that get deleted, in a staggered fashion. It should be possible to
recycle the pages a minute or two after VACUUM proper finishes, with
extra work that's proportionate to the number of deleted pages. This
is still conservative. I am currently very busy with an unrelated
B-Tree prototype, so I might not get around to it this year. Maybe
Sawada-san can think about this?
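
Concretely, that could be as little as one work item per VACUUM that
deleted pages (a sketch; AVW_BTRecycleDeletedPages is hypothetical --
only AVW_BRINSummarizeRange exists today):

    /*
     * Hypothetical: at the end of btvacuumscan(), ask an autovacuum
     * worker to come back shortly and recycle the pages we deleted,
     * doing work proportional to the number of deleted pages.
     */
    if (stats->pages_deleted > stats->pages_free)
        AutoVacuumRequestWork(AVW_BTRecycleDeletedPages,
                              RelationGetRelid(info->index),
                              InvalidBlockNumber);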

Also, handling of the vacuum_cleanup_index_scale_factor stuff (added
to Postgres 11 by commit 857f9c36) should live next to code for the
confusingly-similar INDEX_CLEANUP stuff (added to Postgres 12 by
commit a96c41feec6), on general principle. I think that that
organization is a lot easier to follow.

[1] https://www.sciencedirect.com/science/article/pii/002200009390020W
--
Peter Geoghegan



Re: xid wraparound danger due to INDEX_CLEANUP false

Masahiko Sawada-2
On Thu, 25 Jun 2020 at 05:02, Peter Geoghegan <[hidden email]> wrote:

>
> On Wed, Jun 24, 2020 at 10:21 AM Robert Haas <[hidden email]> wrote:
> > Sorry, I'm so far behind on my email. Argh.
>
> That's okay.
>
> > I think, especially given the blog post you linked, that we should aim to
> > have INDEX_CLEANUP OFF mode do the minimum possible amount of work
> > while still keeping us safe against transaction ID wraparound. So if,
> > for example, it's desirable but not imperative for BRIN to
> > resummarize, then it's OK normally but should be skipped with
> > INDEX_CLEANUP OFF.
>
> I agree that that's very important.
>
> > Apparently, what we're really doing here is that even when
> > INDEX_CLEANUP is OFF, we're still going to keep all the dead tuples.
> > That seems sad, but if it's what we have to do then it at least needs
> > comments explaining it.
>
> +1. Though I think that AMs should technically have the right to
> consider it advisory.
>
> > As for the btree portion of the change, I expect you understand this
> > better than I do, so I'm reluctant to stick my neck out, but it seems
> > that what the patch does is force index cleanup to happen even when
> > INDEX_CLEANUP is OFF provided that the vacuum is for wraparound and
> > the btree index has at least 1 recyclable page. My first reaction is
> > to wonder whether that doesn't nerf this feature into oblivion. Isn't
> > it likely that an index that is being vacuumed for wraparound will
> > have a recyclable page someplace? If the presence of that 1 recyclable
> > page causes the "help, I'm about to run out of XIDs, please do the
> > least possible work" flag to become a no-op, I don't think users are
> > going to be too happy with that. Maybe I am misunderstanding.
>
> I have mixed feelings about it myself. These are valid concerns.
>
> This is a problem to the extent that the user has a table that they'd
> like to use INDEX_CLEANUP with, that has indexes that regularly
> require cleanup due to page deletion. ISTM, then, that the really
> relevant high level design questions for this patch are:
>
> 1. How often is that likely to happen in The Real World™?
>
> 2. If we fail to do cleanup and leak already-deleted pages, how bad is
> that? (Both in general, and in the worst case.)
>
> I'll hazard a guess for 1: I think that it might not come up that
> often. Page deletion is often something that we hardly ever need. And,
> unlike some DB systems, we only do it when pages are fully empty
> (which, as it turns out, isn't necessarily better than our simple
> approach [1]). I tend to think it's unlikely to happen in cases where
> INDEX_CLEANUP is used, because those are cases that also must not have
> that much index churn to begin with.
>
> Then there's question 2. The intuition behind the approach from
> Sawada-san's patch was that allowing wraparound here feels wrong -- it
> should be in the AM's hands. However, it's not like I can point to
> some ironclad guarantee about not leaking deleted pages that existed
> before the INDEX_CLEANUP feature. We know that the FSM is not crash
> safe, and that's that. Is this really all that different? Maybe it is,
> but it seems like a quantitative difference to me.

I think that with the approach implemented in my patch, it could be a
problem that the user cannot easily know in advance whether a vacuum
with INDEX_CLEANUP false will perform index cleanup, even if page
deletion doesn't happen in most cases. They can check whether the
vacuum will be a wraparound vacuum, but it's relatively hard for users
to check in advance whether there are recyclable pages in the btree
index. This will be a restriction for users, especially those who want
to use the INDEX_CLEANUP feature to speed up an impending XID
wraparound vacuum as described in the blog post that Peter shared.

I haven't come up with a good solution to keep us safe against XID
wraparound yet, but it seems to me that it's better to have an option
that forces index cleanup not to happen.

>
> I'm kind of arguing against myself even as I try to advance my
> original argument. If workloads that use INDEX_CLEANUP don't need to
> delete and recycle pages in any case, then why should we care that
> those same workloads might leak pages on account of the wraparound
> hazard?

I had the same impression at first.

> The real reason that I want to push the mechanism down into index
> access methods is because that design is clearly better overall; it
> just so happens that the specific way in which we currently defer
> recycling in nbtree makes very little sense, so it's harder to see the
> big picture. The xid-cleanup design that we have was approximately the
> easiest way to do it, so that's what we got. We should figure out a
> way to recycle the pages at something close to the earliest possible
> opportunity, without having to perform a full scan on the index
> relation within btvacuumscan().

+1

> Maybe we can use the autovacuum work
> item mechanism for that. For indexes that get VACUUMed once a week on
> average, it makes zero sense to wait another week to recycle the pages
> that get deleted, in a staggered fashion. It should be possible to
> recycle the pages a minute or two after VACUUM proper finishes, with
> extra work that's proportionate to the number of deleted pages. This
> is still conservative. I am currently very busy with an unrelated
> B-Tree prototype, so I might not get around to it this year. Maybe
> Sawada-san can think about this?

I thought that btbulkdelete and/or btvacuumcleanup could register an
autovacuum work item to recycle the pages that get deleted, but that
might not recycle those pages reliably, because the autovacuum work
items could be dropped just after vacuum. And if page deletion is
relatively rare in practice, we might be able to take an optimistic
approach where vacuum registers deleted pages in the FSM at deletion
time, and a process that takes a free page checks whether the page is
really recyclable. Anyway, I'll try to think more about this.
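
The check on the consumer side could look roughly like this (a sketch
modeled on the existing logic in _bt_getbuf()):

    /*
     * Optimistic reuse: take a page from the FSM, then verify that it
     * is really recyclable before using it; otherwise try another.
     */
    for (;;)
    {
        BlockNumber blkno = GetFreeIndexPage(rel);
        Buffer      buf;

        if (blkno == InvalidBlockNumber)
            break;              /* FSM exhausted; extend the relation */

        buf = ReadBuffer(rel, blkno);
        if (ConditionalLockBuffer(buf))
        {
            if (_bt_page_recyclable(BufferGetPage(buf)))
                return buf;     /* verified: safe to reuse */
            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
        }
        ReleaseBuffer(buf);     /* not recyclable yet; try the next one */
    }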

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

