Improve search for missing parent downlinks in amcheck


Improve search for missing parent downlinks in amcheck

Alexander Korotkov
Hi!

Currently, amcheck supports lossy checking for missing parent
downlinks.  It collects a Bloom filter of downlink hashes and uses it to
check the subsequent tree level.  We've experienced some large corrupted
indexes which passed this check because of its lossiness.

However, it seems to me we can implement this check in a non-lossy
manner without making it significantly slower.  We already traverse
downlinks from parent to children in order to verify that high keys
correspond to downlink keys.  We can also traverse from one downlink to
the next using rightlinks.  So, if there are intermediate pages between
them, they are candidates for missing parent downlinks.  The patch is
attached.
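To illustrate the idea, here is a rough sketch (this is not the code
from the attached patch; palloc_btree_page() is amcheck's existing
page-read helper, the other names are made up, and handling of
interrupted splits/deletions and loop safety is omitted):

    /*
     * Rough sketch only: after visiting the child that one downlink points
     * to, follow rightlinks on the child level until we reach the child
     * that the next downlink points to.  Any page passed over in between
     * has no parent downlink.
     */
    static void
    check_between_downlinks(BtreeCheckState *state,
                            BlockNumber prevchild, BlockNumber child)
    {
        BlockNumber blkno = prevchild;

        while (blkno != child)
        {
            Page            page = palloc_btree_page(state, blkno);
            BTPageOpaque    opaque = (BTPageOpaque) PageGetSpecialPointer(page);

            if (blkno != prevchild && !P_IGNORE(opaque))
                ereport(ERROR,
                        (errcode(ERRCODE_INDEX_CORRUPTED),
                         errmsg("block %u has no downlink in parent", blkno)));

            blkno = opaque->btpo_next;
            pfree(page);
        }
    }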

With this patch, amcheck successfully detected corruption for our
customer that unpatched amcheck couldn't find.

Opinions?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment: amcheck-btree-improve-missing-parent-downlinks-check.patch (16K)

Re: Improve search for missing parent downlinks in amcheck

Peter Geoghegan-4
On Mon, Apr 15, 2019 at 7:30 PM Alexander Korotkov
<[hidden email]> wrote:
> Currently, amcheck supports lossy checking for missing parent
> downlinks.  It collects a Bloom filter of downlink hashes and uses it to
> check the subsequent tree level.  We've experienced some large corrupted
> indexes which passed this check because of its lossiness.

Can you be more specific? What was the cause of the corruption? I'm
always very interested in hearing about cases that amcheck could have
detected, but didn't.

Was the issue that the Bloom filter was simply undersized/ineffective?

> However, it seems to me we can implement this check in a non-lossy
> manner without making it significantly slower.  We already traverse
> downlinks from parent to children in order to verify that high keys
> correspond to downlink keys.

Actually, we don't check the high keys in children against the parent
(all other items are checked, though). We probably *should* do
something with the high key when verifying consistency across levels,
but currently we don't. (We only use the high key for the same-page
high key check -- more on this below.)

> We can also traverse from one downlink to
> the next using rightlinks.  So, if there are intermediate pages between
> them, they are candidates for missing parent downlinks.  The patch is
> attached.
>
> With this patch, amcheck successfully detected corruption for our
> customer that unpatched amcheck couldn't find.

Maybe we can be a lot less conservative about sizing the Bloom filter
instead? That would be an easier fix IMV -- we can probably change
that logic to be a lot more aggressive without anybody noticing, since
the Bloom filter is already usually tiny. We are already not very
careful about saving work within bt_index_parent_check(), but with
this patch we follow each downlink twice instead of once. That seems
wasteful.

The reason I used a Bloom filter here is because I would eventually
like the missing downlink check to fingerprint entire tuples, not just
block numbers. In L&Y terms, the check could in the future fingerprint
the separator key and the downlink at the same time, not just the
downlink. That way, we could probe the Bloom filter on the next level
down for its high key (with the right sibling pointer set to be
consistent with the parent) iff we don't see that the page split was
interrupted (i.e. iff P_INCOMPLETE_SPLIT() bit is not set). Obviously
this would be a more effective form of verification, since we would
notice high key values that don't agree with the parent's values for
the same sibling/cousin/child block.
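For illustration, fingerprinting the whole pivot tuple rather than just
its child block number might look something like this (just a sketch
against the existing bloom_add_element() interface; "offset" stands for
the current item's offset, and this is not current or proposed amcheck
code):

    /* Sketch only: fingerprint the entire pivot tuple (separator key plus
     * downlink), not just the child block number. */
    ItemId      itemid = PageGetItemId(state->target, offset);
    IndexTuple  itup = (IndexTuple) PageGetItem(state->target, itemid);

    bloom_add_element(state->downlinkfilter,
                      (unsigned char *) itup,
                      IndexTupleSize(itup));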

I didn't do it that way for v11 because of "minus infinity" items on
internal pages, which don't store the original key (the key remains
the high key of the left sibling page, but we truncate the original to
0 attributes in _bt_pgaddtup()). I think that we should eventually
stop truncating minus infinity items, and actually store a "low key"
at P_FIRSTDATAKEY() within internal pages instead. That would be
useful for other things anyway (e.g. prefix compression).

--
Peter Geoghegan



Re: Improve search for missing parent downlinks in amcheck

Peter Geoghegan-4
On Tue, Apr 16, 2019 at 12:00 PM Peter Geoghegan <[hidden email]> wrote:
> Can you be more specific? What was the cause of the corruption? I'm
> always very interested in hearing about cases that amcheck could have
> detected, but didn't.

FWIW, v4 indexes in Postgres 12 will support the new "rootdescend"
verification option, which isn't lossy, and would certainly have
detected your customer issue in practice. Admittedly the new check is
quite expensive, even compared to the other bt_index_parent_check()
checks, but it is nice that we now have a verification option that is
*extremely* thorough, and uses _bt_search() directly.

--
Peter Geoghegan



Re: Improve search for missing parent downlinks in amcheck

Peter Geoghegan-4
In reply to this post by Peter Geoghegan-4
On Tue, Apr 16, 2019 at 12:00 PM Peter Geoghegan <[hidden email]> wrote:

> On Mon, Apr 15, 2019 at 7:30 PM Alexander Korotkov
> <[hidden email]> wrote:
> > Currently, amcheck supports lossy checking for missing parent
> > downlinks.  It collects a Bloom filter of downlink hashes and uses it to
> > check the subsequent tree level.  We've experienced some large corrupted
> > indexes which passed this check because of its lossiness.
>
> Can you be more specific? What was the cause of the corruption? I'm
> always very interested in hearing about cases that amcheck could have
> detected, but didn't.

Ping?

I am interested in doing better here. The background information seems
very interesting.

--
Peter Geoghegan



Re: Improve search for missing parent downlinks in amcheck

Alexander Korotkov
In reply to this post by Peter Geoghegan-4
On Tue, Apr 16, 2019 at 10:04 PM Peter Geoghegan <[hidden email]> wrote:

>
> On Tue, Apr 16, 2019 at 12:00 PM Peter Geoghegan <[hidden email]> wrote:
> > Can you be more specific? What was the cause of the corruption? I'm
> > always very interested in hearing about cases that amcheck could have
> > detected, but didn't.
>
> FWIW, v4 indexes in Postgres 12 will support the new "rootdescend"
> verification option, which isn't lossy, and would certainly have
> detected your customer issue in practice. Admittedly the new check is
> quite expensive, even compared to the other bt_index_parent_check()
> checks, but it is nice that we now have a verification option that is
> *extremely* thorough, and uses _bt_search() directly.

"rootdescend" is cool type of check.  Thank you for noticing, I wasn't aware of it.
But can it detect the missing downlink in following situation?

        A
     /     \
  B <-> C <-> D

Here A has downlinks to B and D, while the downlink to C is missing,
and B, C and D are correctly connected with leftlinks and rightlinks.
I can see that "rootdescend" calls _bt_search(), which would just step
right from B to C as if it were a concurrent split.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Improve search for missing parent downlinks in amcheck

Peter Geoghegan-4
On Sat, Apr 27, 2019 at 4:57 PM Alexander Korotkov
<[hidden email]> wrote:

> "rootdescend" is cool type of check.  Thank you for noticing, I wasn't aware of it.
> But can it detect the missing downlink in following situation?
>
>         A
>      /     \
>   B <-> C <-> D
>
> Here A has downlinks to B and D, while the downlink to C is missing,
> and B, C and D are correctly connected with leftlinks and rightlinks.
> I can see that "rootdescend" calls _bt_search(), which would just step
> right from B to C as if it were a concurrent split.

There is a comment about this scenario above bt_rootdescend() in amcheck.

You're right -- this is a kind of corruption that even the new
rootdescend verification option would miss. We can imagine a version
of rootdescend verification that tells the core code to only move
right when there was an *interrupted* page split (i.e.
P_INCOMPLETE_SPLIT() flag bit is set), but that isn't what happens
right now.

That said, the lossy downlink check that you want to improve on
*should* already catch this situation. Of course it might not because
it is lossy (uses a Bloom filter), but I think that that's very
unlikely. That's why I would like to understand the problem that you
found with the check.

--
Peter Geoghegan



Re: Improve search for missing parent downlinks in amcheck

Alexander Korotkov
In reply to this post by Peter Geoghegan-4
On Tue, Apr 16, 2019 at 10:00 PM Peter Geoghegan <[hidden email]> wrote:

>
> On Mon, Apr 15, 2019 at 7:30 PM Alexander Korotkov
> <[hidden email]> wrote:
> > Currently, amcheck supports lossy checking for missing parent
> > downlinks.  It collects a Bloom filter of downlink hashes and uses it to
> > check the subsequent tree level.  We've experienced some large corrupted
> > indexes which passed this check because of its lossiness.
>
> Can you be more specific? What was the cause of the corruption? I'm
> always very interested in hearing about cases that amcheck could have
> detected, but didn't.

AFAIR, the cause of the corruption in this case was our (Postgres Pro)
modification, not something present in PostgreSQL core.

>
> Was the issue that the Bloom filter was simply undersized/ineffective?

Yes, increasing the Bloom filter size also helps.  But my intention was
to make the check non-lossy here.

>
> > However, it seems to me we can implement this check in a non-lossy
> > manner without making it significantly slower.  We already traverse
> > downlinks from parent to children in order to verify that high keys
> > correspond to downlink keys.
>
> Actually, we don't check the high keys in children against the parent
> (all other items are checked, though). We probably *should* do
> something with the high key when verifying consistency across levels,
> but currently we don't. (We only use the high key for the same-page
> high key check -- more on this below.)

Nice idea.

> > We can also traverse from one downlink to
> > the next using rightlinks.  So, if there are intermediate pages between
> > them, they are candidates for missing parent downlinks.  The patch is
> > attached.
> >
> > With this patch, amcheck successfully detected corruption for our
> > customer that unpatched amcheck couldn't find.
>
> Maybe we can be a lot less conservative about sizing the Bloom filter
> instead? That would be an easier fix IMV -- we can probably change
> that logic to be a lot more aggressive without anybody noticing, since
> the Bloom filter is already usually tiny. We are already not very
> careful about saving work within bt_index_parent_check(), but with
> this patch we follow each downlink twice instead of once. That seems
> wasteful.

It wouldn't really be wasteful, because the same children were accessed
recently, so they are likely not yet evicted from shared_buffers.  I
think we can fit both checks into one loop over downlinks instead of
two.

> The reason I used a Bloom filter here is because I would eventually
> like the missing downlink check to fingerprint entire tuples, not just
> block numbers. In L&Y terms, the check could in the future fingerprint
> the separator key and the downlink at the same time, not just the
> downlink. That way, we could probe the Bloom filter on the next level
> down for its high key (with the right sibling pointer set to be
> consistent with the parent) iff we don't see that the page split was
> interrupted (i.e. iff P_INCOMPLETE_SPLIT() bit is not set). Obviously
> this would be a more effective form of verification, since we would
> notice high key values that don't agree with the parent's values for
> the same sibling/cousin/child block.
>
> I didn't do it that way for v11 because of "minus infinity" items on
> internal pages, which don't store the original key (the key remains
> the high key of the left sibling page, but we truncate the original to
> 0 attributes in _bt_pgaddtup()). I think that we should eventually
> stop truncating minus infinity items, and actually store a "low key"
> at P_FIRSTDATAKEY() within internal pages instead. That would be
> useful for other things anyway (e.g. prefix compression).

Yes, we can do more checks with a Bloom filter.  But assuming that we
iterate over the children of each non-leaf page anyway, can we do:
1) all the checks that bt_downlink_check() does now,
2) check that there are no missing downlinks,
3) check high keys
in one pass, can't we?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Improve search for missing parent downlinks in amcheck

Peter Geoghegan-4
On Sat, Apr 27, 2019 at 5:13 PM Alexander Korotkov
<[hidden email]> wrote:
> Yes, increasing the Bloom filter size also helps.  But my intention was
> to make the check non-lossy here.

I agree that that might be a good goal, but I am interested in knowing
if there is something naive about how the downlinkfilter Bloom filter
is sized. I think that we could probably do better than this without
much work:

            /*
             * Extra readonly downlink check.
             *
             * In readonly case, we know that there cannot be a concurrent
             * page split or a concurrent page deletion, which gives us the
             * opportunity to verify that every non-ignorable page had a
             * downlink one level up.  We must be tolerant of interrupted page
             * splits and page deletions, though.  This is taken care of in
             * bt_downlink_missing_check().
             */
            total_pages = (int64) state->rel->rd_rel->relpages;
            state->downlinkfilter = bloom_create(total_pages, work_mem, seed);

Maybe we could use "smgrnblocks(index->rd_smgr, MAIN_FORKNUM)"
instead of relpages, for example.
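That is, something along these lines (a sketch only; it mirrors the
snippet above with the sizing source swapped, and
RelationGetNumberOfBlocks() goes through smgrnblocks() internally):

    /* Sketch: size the downlink Bloom filter from the relation's true size
     * in blocks, rather than from the possibly-stale relpages estimate. */
    total_pages = (int64) RelationGetNumberOfBlocks(state->rel);
    state->downlinkfilter = bloom_create(total_pages, work_mem, seed);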

> It wouldn't really be wasteful, because the same children were accessed
> recently, so they are likely not yet evicted from shared_buffers.  I
> think we can fit both checks into one loop over downlinks instead of
> two.

I see your point, but if we're going to treat this as a bug then I
would prefer a simple fix.

> Yes, we can do more checks with a Bloom filter.  But assuming that we
> iterate over the children of each non-leaf page anyway, can we do:
> 1) all the checks that bt_downlink_check() does now,
> 2) check that there are no missing downlinks,
> 3) check high keys
> in one pass, can't we?

We can expect every high key in a page to have a copy contained within
its parent, either as one of the keys, or as the parent's own high key
(assuming no concurrent or interrupted page splits or page deletions).
This is true today, even though we truncate negative infinity items in
internal pages.

I think that the simple answer to your question is yes. It would be
more complicated that way, and the only extra check would be the check
of high keys against their parent, but overall this does seem
possible.

--
Peter Geoghegan



Re: Improve search for missing parent downlinks in amcheck

Peter Geoghegan-4
In reply to this post by Alexander Korotkov
On Sat, Apr 27, 2019 at 5:13 PM Alexander Korotkov
<[hidden email]> wrote:
> Yes, increasing of Bloom filter size also helps.  But my intention was
> to make non-lossy check here.

Why is that your intention? Do you want to do this as a feature for
Postgres 13, or do you want to treat this as a bug that we need to
backpatch a fix for?

Can we avoid the problem you saw with the Bloom filter approach by
using the real size of the index (i.e.
smgrnblocks()/RelationGetNumberOfBlocks()) to size the Bloom filter,
and/or by rethinking the work_mem cap? Maybe we can have a WARNING
that advertises that work_mem is probably too low?

The state->downlinkfilter Bloom filter should be small in almost all
cases, so I still don't fully understand your concern. With a 100GB
index, we'll have ~13 million blocks. We only need a Bloom filter that
is ~250MB to have less than a 1% chance of missing inconsistencies
even with such a large index. I admit that it's unfriendly that users
are not warned about the shortage currently, but that is something we
can probably find a simple (backpatchable) fix for.

--
Peter Geoghegan



Re: Improve search for missing parent downlinks in amcheck

Alexander Korotkov
On Sun, Apr 28, 2019 at 4:36 AM Peter Geoghegan <[hidden email]> wrote:
> On Sat, Apr 27, 2019 at 5:13 PM Alexander Korotkov
> <[hidden email]> wrote:
> > Yes, increasing the Bloom filter size also helps.  But my intention was
> > to make the check non-lossy here.
>
> Why is that your intention? Do you want to do this as a feature for
> Postgres 13, or do you want to treat this as a bug that we need to
> backpatch a fix for?

I think this is definitely not a bug fix.  The Bloom filter was designed
to be lossy; there's no blaming it for that :)

> Can we avoid the problem you saw with the Bloom filter approach by
> using the real size of the index (i.e.
> smgrnblocks()/RelationGetNumberOfBlocks()) to size the Bloom filter,
> and/or by rethinking the work_mem cap? Maybe we can have a WARNING
> that advertises that work_mem is probably too low?
>
> The state->downlinkfilter Bloom filter should be small in almost all
> cases, so I still don't fully understand your concern. With a 100GB
> index, we'll have ~13 million blocks. We only need a Bloom filter that
> is ~250MB to have less than a 1% chance of missing inconsistencies
> even with such a large index. I admit that it's unfriendly that users
> are not warned about the shortage currently, but that is something we
> can probably find a simple (backpatchable) fix for.

Sounds reasonable.  I'll think about proposing a backpatch of something like this.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Improve search for missing parent downlinks in amcheck

Peter Geoghegan-4
On Sun, Apr 28, 2019 at 10:15 AM Alexander Korotkov
<[hidden email]> wrote:
> I think this is definitely not a bug fix.  The Bloom filter was designed
> to be lossy; there's no blaming it for that :)

I will think about a simple fix, but after the upcoming point release.
There is no hurry.


--
Peter Geoghegan



Re: Improve search for missing parent downlinks in amcheck

Thomas Munro-5
On Wed, May 1, 2019 at 12:58 PM Peter Geoghegan <[hidden email]> wrote:
> On Sun, Apr 28, 2019 at 10:15 AM Alexander Korotkov
> <[hidden email]> wrote:
> > I think this is definitely not a bug fix.  The Bloom filter was designed
> > to be lossy; there's no blaming it for that :)
>
> I will think about a simple fix, but after the upcoming point release.
> There is no hurry.

A bureaucratic question:  What should the status be for this CF entry?

--
Thomas Munro
https://enterprisedb.com



Re: Improve search for missing parent downlinks in amcheck

Peter Geoghegan-4
On Sun, Jul 7, 2019 at 7:53 PM Thomas Munro <[hidden email]> wrote:
> On Wed, May 1, 2019 at 12:58 PM Peter Geoghegan <[hidden email]> wrote:
> > I will think about a simple fix, but after the upcoming point release.
> > There is no hurry.
>
> A bureaucratic question:  What should the status be for this CF entry?

I have plans to use RelationGetNumberOfBlocks() to size amcheck's
downlink Bloom filter. I think that will solve the problems with
unreliable estimates of index size, which seemed to be the basis of
Alexander's complaint.

--
Peter Geoghegan



Re: Improve search for missing parent downlinks in amcheck

Peter Geoghegan-4
In reply to this post by Peter Geoghegan-4
On Tue, Apr 30, 2019 at 5:58 PM Peter Geoghegan <[hidden email]> wrote:
> I will think about a simple fix, but after the upcoming point release.
> There is no hurry.

The attached draft patch uses RelationGetNumberOfBlocks() to size each
of the two Bloom filters that may be used by amcheck to perform
verification.

The basic heapallindexed Bloom filter is now sized based on the
conservative assumption that there must be *at least*
"RelationGetNumberOfBlocks() * 50" elements to fingerprint (reltuples
will continue to be used to size the basic heapallindexed Bloom filter
in most cases, though). The patch also uses the same
RelationGetNumberOfBlocks() value to size the downlink Bloom filter.
This second change will fix your problem very non-invasively.
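Roughly speaking, the sizing idea amounts to something like this (a
sketch only, not the exact patch hunk; work_mem, maintenance_work_mem
and seed are used the same way as in the existing bloom_create() calls):

    /* Sketch of the sizing idea only; not the exact patch hunk. */
    int64       total_pages = (int64) RelationGetNumberOfBlocks(state->rel);
    int64       total_elems;

    /* Downlink Bloom filter: one element per block of the index. */
    state->downlinkfilter = bloom_create(total_pages, work_mem, seed);

    /* heapallindexed Bloom filter: keep using reltuples, but assume at
     * least ~50 tuples per block, so that a stale or zero reltuples can't
     * leave the filter hopelessly undersized. */
    total_elems = Max(total_pages * 50,
                      (int64) state->heaprel->rd_rel->reltuples);
    state->filter = bloom_create(total_elems, maintenance_work_mem, seed);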

I intend to backpatch this to v11 in the next few days.

--
Peter Geoghegan

Attachment: 0001-Set-minimum-amcheck-Bloom-filter-size.patch (2K)

Re: Improve search for missing parent downlinks in amcheck

Alexander Korotkov
On Fri, Jul 19, 2019 at 3:21 AM Peter Geoghegan <[hidden email]> wrote:

> On Tue, Apr 30, 2019 at 5:58 PM Peter Geoghegan <[hidden email]> wrote:
> > I will think about a simple fix, but after the upcoming point release.
> > There is no hurry.
>
> The attached draft patch uses RelationGetNumberOfBlocks() to size each
> of the two Bloom filters that may be used by amcheck to perform
> verification.
>
> The basic heapallindexed Bloom filter is now sized based on the
> conservative assumption that there must be *at least*
> "RelationGetNumberOfBlocks() * 50" elements to fingerprint (reltuples
> will continue to be used to size the basic heapallindexed Bloom filter
> in most cases, though). The patch also uses the same
> RelationGetNumberOfBlocks() value to size the downlink Bloom filter.
> This second change will fix your problem very non-invasively.
>
> I intend to backpatch this to v11 in the next few days.
Thank you for backpatching this!

BTW, here is the next revision of the patch, which I'm proposing for v13.

In this revision, the check for missing downlinks is combined with
bt_downlink_check().  So, the patch causes no extra accesses for pages
that bt_downlink_check() already visits.  It only causes the following
additional page accesses:
1) Downlinks corresponding to "negative infinity" keys,
2) Pages of the child level that are not referenced by downlinks.

But I think these two kinds are a small minority, and those accesses
are a fair trade-off for a precise missing-downlink check that doesn't
need a Bloom filter.  What do you think?
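For illustration, combining the two checks in one downlink loop could be
outlined roughly like this (the helper names are made up for the sketch;
this is not the actual patch code):

    /*
     * Hypothetical outline only: do the existing cross-level checks and the
     * missing-downlink walk in a single pass over the target page's
     * downlinks.  check_child_against_downlink() and
     * report_missing_downlinks_between() are stand-ins, not real routines.
     */
    BTPageOpaque topaque = (BTPageOpaque) PageGetSpecialPointer(state->target);
    OffsetNumber maxoff = PageGetMaxOffsetNumber(state->target);
    BlockNumber  prevchild = InvalidBlockNumber;
    OffsetNumber off;

    for (off = P_FIRSTDATAKEY(topaque); off <= maxoff; off = OffsetNumberNext(off))
    {
        ItemId      itemid = PageGetItemId(state->target, off);
        IndexTuple  itup = (IndexTuple) PageGetItem(state->target, itemid);
        BlockNumber childblock = BTreeInnerTupleGetDownLink(itup);

        /* existing checks of the child page against its downlink */
        check_child_against_downlink(state, itup, childblock);

        /* new: any child-level page strictly between prevchild and
         * childblock has no downlink in the target page */
        report_missing_downlinks_between(state, prevchild, childblock);

        prevchild = childblock;
    }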

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment: amcheck-btree-improve-missing-parent-downlinks-check-2.patch (19K)

Re: Improve search for missing parent downlinks in amcheck

Peter Geoghegan-4
On Mon, Aug 12, 2019 at 12:01 PM Alexander Korotkov
<[hidden email]> wrote:
> BTW, there is next revision of patch I'm proposing for v13.

Cool.

> In this revision, the check for missing downlinks is combined with
> bt_downlink_check().  So, the patch causes no extra accesses for pages
> that bt_downlink_check() already visits.  It only causes the following
> additional page accesses:
> 1) Downlinks corresponding to "negative infinity" keys,
> 2) Pages of the child level that are not referenced by downlinks.
>
> But I think these two kinds are a small minority, and those accesses
> are a fair trade-off for a precise missing-downlink check that doesn't
> need a Bloom filter.  What do you think?

I am generally in favor of making the downlink check absolutely
reliable, and am not too worried about the modest additional overhead.
After all, bt_index_parent_check() is supposed to be thorough though
expensive. The only reason that I used a Bloom filter for
fingerprinting downlink blocks was that it seemed important to quickly
get amcheck coverage for subtle multi-level page deletion bugs just
after v11 feature freeze. We can now come up with a better design for
that.

I was confused about how this patch worked at first. But then I
remembered that Lehman and Yao consider downlinks to be distinct
things from separator keys in internal pages. The high key of an
internal page is the final separator key, so you have n downlinks and
n + 1 separator keys per internal page -- two distinct things that
appear in alternating order (the negative infinity item is not
considered to have any separator key here). So, while internal page
items are explicitly "(downlink, separator)" pairs that are grouped
into a single tuple in nbtree, with a separate tuple just for the high
key, Lehman and Yao would find it just as natural to treat them as
"(separator, downlink)" pairs. You have to skip the first downlink on
each internal level to make that work, but this makes our
bt_downlink_check() check have something from the target page (the
child's parent page) that is like the high key in the child.

It's already really confusing that we don't quite mean the same thing
as Lehman and Yao when we say downlink (See also my long "why a
highkey is never truly a copy of another item" comment block within
bt_target_page_check()), and that is not your patch's fault. But maybe
we need to fix that to make your patch easier to understand. (i.e.
maybe we need to go over every use of the word "downlink" in nbtree,
and make it say something more precise, to make everything less
confusing.)

Other feedback:

* Did you miss the opportunity to verify that every high key matches
its right sibling page's downlink tuple in the parent page? We talked
about this already, but you don't seem to match the key (you only
match the downlink block).

* You are decoupling the new check from the bt_index_parent_check()
"heapallindexed" option. That's probably a good thing, but you need to
update the sgml docs.

* Didn't you forget to remove the BtreeCheckState.rightsplit flag?

* You've significantly changed the behavior of bt_downlink_check() --
I would note this in its header comments. This is where ~99% of the
new work happens.

* I don't like that you use the loaded term "target" here -- anything
called "target" should be BtreeCheckState.target:

>  static void
> -bt_downlink_missing_check(BtreeCheckState *state)
> +bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
> +                         BlockNumber targetblock, Page target)
>  {

If it's unclear what I mean, this old comment should make it clearer:

/*
 * State associated with verifying a B-Tree index
 *
 * target is the point of reference for a verification operation.
 *
 * Other B-Tree pages may be allocated, but those are always auxiliary (e.g.,
 * they are current target's child pages).  Conceptually, problems are only
 * ever found in the current target page (or for a particular heap tuple during
 * heapallindexed verification).  Each page found by verification's left/right,
 * top/bottom scan becomes the target exactly once.
 */

--
Peter Geoghegan



Re: Improve search for missing parent downlinks in amcheck

Alexander Korotkov
On Tue, Aug 13, 2019 at 11:44 PM Peter Geoghegan <[hidden email]> wrote:

> > In this revision, the check for missing downlinks is combined with
> > bt_downlink_check().  So, the patch causes no extra accesses for pages
> > that bt_downlink_check() already visits.  It only causes the following
> > additional page accesses:
> > 1) Downlinks corresponding to "negative infinity" keys,
> > 2) Pages of the child level that are not referenced by downlinks.
> >
> > But I think these two kinds are a small minority, and those accesses
> > are a fair trade-off for a precise missing-downlink check that doesn't
> > need a Bloom filter.  What do you think?
>
> I am generally in favor of making the downlink check absolutely
> reliable, and am not too worried about the modest additional overhead.
> After all, bt_index_parent_check() is supposed to be thorough though
> expensive. The only reason that I used a Bloom filter for
> fingerprinting downlink blocks was that it seemed important to quickly
> get amcheck coverage for subtle multi-level page deletion bugs just
> after v11 feature freeze. We can now come up with a better design for
> that.
Great!

> I was confused about how this patch worked at first. But then I
> remembered that Lehman and Yao consider downlinks to be distinct
> things from separator keys in internal pages. The high key of an
> internal page is the final separator key, so you have n downlinks and
> n + 1 separator keys per internal page -- two distinct things that
> appear in alternating order (the negative infinity item is not
> considered to have any separator key here). So, while internal page
> items are explicitly "(downlink, separator)" pairs that are grouped
> into a single tuple in nbtree, with a separate tuple just for the high
> key, Lehman and Yao would find it just as natural to treat them as
> "(separator, downlink)" pairs. You have to skip the first downlink on
> each internal level to make that work, but this makes our
> bt_downlink_check() check have something from the target page (the
> child's parent page) that is like the high key in the child.
>
> It's already really confusing that we don't quite mean the same thing
> as Lehman and Yao when we say downlink (See also my long "why a
> highkey is never truly a copy of another item" comment block within
> bt_target_page_check()), and that is not your patch's fault. But maybe
> we need to fix that to make your patch easier to understand. (i.e.
> maybe we need to go over every use of the word "downlink" in nbtree,
> and make it say something more precise, to make everything less
> confusing.)
I agree that the current terms nbtree uses to describe downlinks and
separator keys may be confusing.  I'll try to fix this and come up
with a patch if I succeed.

> Other feedback:
>
> * Did you miss the opportunity to verify that every high key matches
> its right sibling page's downlink tuple in the parent page? We talked
> about this already, but you don't seem to match the key (you only
> match the downlink block).
>
> * You are decoupling the new check from the bt_index_parent_check()
> "heapallindexed" option. That's probably a good thing, but you need to
> update the sgml docs.
>
> * Didn't you forget to remove the BtreeCheckState.rightsplit flag?
>
> * You've significantly changed the behavior of bt_downlink_check() --
> I would note this in its header comments. This is where ~99% of the
> new work happens.
>
> * I don't like that you use the loaded term "target" here -- anything
> called "target" should be BtreeCheckState.target:
>
> >  static void
> > -bt_downlink_missing_check(BtreeCheckState *state)
> > +bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
> > +                         BlockNumber targetblock, Page target)
> >  {
>
> If it's unclear what I mean, this old comment should make it clearer:
>
> /*
>  * State associated with verifying a B-Tree index
>  *
>  * target is the point of reference for a verification operation.
>  *
>  * Other B-Tree pages may be allocated, but those are always auxiliary (e.g.,
>  * they are current target's child pages).  Conceptually, problems are only
>  * ever found in the current target page (or for a particular heap tuple during
>  * heapallindexed verification).  Each page found by verification's left/right,
>  * top/bottom scan becomes the target exactly once.
>  */
The revised patch seems to fix all of the above.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment: amcheck-btree-improve-missing-parent-downlinks-check-3.patch (31K)