amcheck verification for GiST

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

amcheck verification for GiST

Andrey Borodin-2
Hi, hackers!

Here's the patch with amcheck functionality for GiST.

It basically checks two invariants:
1. Every internal tuple need no adjustment by tuples of referenced page
2. Internal page reference or only leaf pages or only internal pages

We actually cannot check all balanced tree invariants due to concurrency reasons some concurrent splits will be visible as temporary balance violations.

Are there any other invariants that we can check?

I'd be happy to hear any thought about this.

Best regards, Andrey Borodin.

0001-GiST-verification-function-for-amcheck.patch (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Thomas Munro-3
On Sun, Sep 23, 2018 at 10:15 PM Andrey Borodin <[hidden email]> wrote:
> Here's the patch with amcheck functionality for GiST.

Hi Andrey,

Windows doesn't like it[1]:

contrib/amcheck/verify_gist.c(163): error C2121: '#' : invalid
character : possibly the result of a macro expansion
[C:\projects\postgresql\amcheck.vcxproj]

That's:

+    MemoryContext mctx = AllocSetContextCreate(CurrentMemoryContext,
+ "amcheck context",
+#if PG_VERSION_NUM >= 110000
+ ALLOCSET_DEFAULT_SIZES);
+#else
+ ALLOCSET_DEFAULT_MINSIZE,
+ ALLOCSET_DEFAULT_INITSIZE,
+ ALLOCSET_DEFAULT_MAXSIZE);
+#endif

Not sure what's gong on there... perhaps it doesn't like you to do
that in the middle of a function-like-macro invocation
(AllocSetContextCreate)?

[1] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.14056

--
Thomas Munro
http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Andres Freund
On 2018-09-24 15:29:38 +1200, Thomas Munro wrote:

> On Sun, Sep 23, 2018 at 10:15 PM Andrey Borodin <[hidden email]> wrote:
> > Here's the patch with amcheck functionality for GiST.
>
> Hi Andrey,
>
> Windows doesn't like it[1]:
>
> contrib/amcheck/verify_gist.c(163): error C2121: '#' : invalid
> character : possibly the result of a macro expansion
> [C:\projects\postgresql\amcheck.vcxproj]
>
> That's:
>
> +    MemoryContext mctx = AllocSetContextCreate(CurrentMemoryContext,
> + "amcheck context",
> +#if PG_VERSION_NUM >= 110000
> + ALLOCSET_DEFAULT_SIZES);
> +#else
> + ALLOCSET_DEFAULT_MINSIZE,
> + ALLOCSET_DEFAULT_INITSIZE,
> + ALLOCSET_DEFAULT_MAXSIZE);
> +#endif
>
> Not sure what's gong on there... perhaps it doesn't like you to do
> that in the middle of a function-like-macro invocation
> (AllocSetContextCreate)?

But note that the version dependent code shouldn't be present in
/contrib anyway.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Andrey Borodin-2
In reply to this post by Thomas Munro-3
Hi!

> 24 сент. 2018 г., в 8:29, Thomas Munro <[hidden email]> написал(а):
>
> On Sun, Sep 23, 2018 at 10:15 PM Andrey Borodin <[hidden email]> wrote:
>> Here's the patch with amcheck functionality for GiST.
>
> Hi Andrey,
>
> Windows doesn't like it[1]:

Thanks, Thomas! Yes, I've missed that version-dependent macro. Surely, it's redundant.

Best regards, Andrey Borodin.

0001-GiST-verification-function-for-amcheck-v2.patch (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Peter Geoghegan-4
On Sun, Sep 23, 2018 at 10:12 PM Andrey Borodin <[hidden email]> wrote:
> (0001-GiST-verification-function-for-amcheck-v2.patch)

Thanks for working on this. Some feedback:

* You do this:

> +/* Check of an internal page. Hold locks on two pages at a time (parent+child). */

This isn't consistent with what you do within verify_nbtree.c, which
deliberately avoids ever holding more than a single buffer lock at a
time, on general principle. That isn't necessarily a reason why you
have to do the same, but it's not clear why you do things that way.
Why isn't it enough to have a ShareLock on the relation? Maybe this is
a sign that it would be a good idea to always operate on a palloc()'d
copy of the page, by introducing something equivalent to
palloc_btree_page(). (That would also be an opportunity to do very
basic checks on every page.)

* You need to sprinkle a few CHECK_FOR_INTERRUPTS() calls around.
Certainly, there should be one at the top of the main loop.

* Maybe gist_index_check() should be called gist_index_parent_check(),
since it's rather like the existing verification function
bt_index_parent_check().

* Alternatively, you could find a way to make your function only need
an AccessShareLock -- that would make gist_index_check() an
appropriate name. That would probably require careful thought about
VACUUM.

* Why is it okay to do this?:

> +       if (GistTupleIsInvalid(idxtuple))
> +           ereport(LOG,
> +                   (errmsg("index \"%s\" contains an inner tuple marked as invalid",
> +                           RelationGetRelationName(rel)),
> +                    errdetail("This is caused by an incomplete page split at crash recovery before upgrading to PostgreSQL 9.1."),
> +                    errhint("Please REINDEX it.")));

You should probably mention the gistdoinsert() precedent for this.

* Can we check GIST_PAGE_ID somewhere? I try to be as paranoid as
possible, adding almost any check that I can think of, provided it
hasn't got very high overhead. Note that gistcheckpage() doesn't do
this for you.

* Should we be concerned about the memory used by pushStackIfSplited()?

* How about a cross-check between IndexTupleSize() and
ItemIdGetLength(), like the B-Tree code? It's a bit unfortunate that
we have this redundancy, which wastes space, but we do, so we might as
well get some small benefit from it.

--
Peter Geoghegan

Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Andrey Borodin-2
Hi, Peter!

Thank you for the review!

> 7 дек. 2018 г., в 3:59, Peter Geoghegan <[hidden email]> написал(а):
>
> On Sun, Sep 23, 2018 at 10:12 PM Andrey Borodin <[hidden email]> wrote:
> * You do this:
>
>> +/* Check of an internal page. Hold locks on two pages at a time (parent+child). */
>
> This isn't consistent with what you do within verify_nbtree.c, which
> deliberately avoids ever holding more than a single buffer lock at a
> time, on general principle. That isn't necessarily a reason why you
> have to do the same, but it's not clear why you do things that way.
> Why isn't it enough to have a ShareLock on the relation? Maybe this is
> a sign that it would be a good idea to always operate on a palloc()'d
> copy of the page, by introducing something equivalent to
> palloc_btree_page(). (That would also be an opportunity to do very
> basic checks on every page.)
If we unlock parent page before checking child, some insert can adjust tuple on parent, sneak into child and insert new tuple.
This can trigger false positive. I'll think about it more.

> * You need to sprinkle a few CHECK_FOR_INTERRUPTS() calls around.
> Certainly, there should be one at the top of the main loop.
I've added check into main loop of the scan. All deeper paths hold buffer locks.

> * Maybe gist_index_check() should be called gist_index_parent_check(),
> since it's rather like the existing verification function
> bt_index_parent_check().
>
> * Alternatively, you could find a way to make your function only need
> an AccessShareLock -- that would make gist_index_check() an
> appropriate name. That would probably require careful thought about
> VACUUM.

I've changed lock level to AccessShareLock. IMV scan is just as safe as regular GiST index scan.
There is my patch with VACUUM on CF, it can add deleted pages. I'll update one of these two patches accordingly, if other is committed.

>
> * Why is it okay to do this?:
>
>> +       if (GistTupleIsInvalid(idxtuple))
>> +           ereport(LOG,
>> +                   (errmsg("index \"%s\" contains an inner tuple marked as invalid",
>> +                           RelationGetRelationName(rel)),
>> +                    errdetail("This is caused by an incomplete page split at crash recovery before upgrading to PostgreSQL 9.1."),
>> +                    errhint("Please REINDEX it.")));
>
> You should probably mention the gistdoinsert() precedent for this.
This invalid tuple will break inserts, but will not affect select. I do not know, should this be error or warning in amcheck?

> * Can we check GIST_PAGE_ID somewhere? I try to be as paranoid as
> possible, adding almost any check that I can think of, provided it
> hasn't got very high overhead. Note that gistcheckpage() doesn't do
> this for you.
Done. I think that gistcheckpage() should do this too, but I think we should avoid interventions into GiST mechanics here.
>
> * Should we be concerned about the memory used by pushStackIfSplited()?
Memory is pfree()`d as usual. Tree scan code is almost equivalent to VACUUM`s gistbulkdelete.
>
> * How about a cross-check between IndexTupleSize() and
> ItemIdGetLength(), like the B-Tree code? It's a bit unfortunate that
> we have this redundancy, which wastes space, but we do, so we might as
> well get some small benefit from it.
Done. I'm checking it MAXALIGNED, this rounding seems correct.


Please find attached v3.


Best regards, Andrey Borodin.


0001-GiST-verification-function-for-amcheck-v3.patch (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Peter Geoghegan-4
On Tue, Jan 1, 2019 at 2:11 AM Andrey Borodin <[hidden email]> wrote:

> > This isn't consistent with what you do within verify_nbtree.c, which
> > deliberately avoids ever holding more than a single buffer lock at a
> > time, on general principle. That isn't necessarily a reason why you
> > have to do the same, but it's not clear why you do things that way.
> > Why isn't it enough to have a ShareLock on the relation? Maybe this is
> > a sign that it would be a good idea to always operate on a palloc()'d
> > copy of the page, by introducing something equivalent to
> > palloc_btree_page(). (That would also be an opportunity to do very
> > basic checks on every page.)
> If we unlock parent page before checking child, some insert can adjust tuple on parent, sneak into child and insert new tuple.
> This can trigger false positive. I'll think about it more.

I think that holding a buffer lock on an internal pages for as long as
it takes to check all of the child pages is a non-starter. If you
can't think of a way of not doing that that's race free with a
relation-level AccessShareLock, then a relation-level ShareLock (which
will block VACUUM) seems necessary.

I think that you should duplicate some of what's in
bt_index_check_internal() -- lock the heap table as well as the index,
to avoid deadlocks. We might not do this with contrib/pageinspect, but
that's not really intended to be used in production in the same way
this will be.

> > * Maybe gist_index_check() should be called gist_index_parent_check(),
> > since it's rather like the existing verification function
> > bt_index_parent_check().
> >
> > * Alternatively, you could find a way to make your function only need
> > an AccessShareLock -- that would make gist_index_check() an
> > appropriate name. That would probably require careful thought about
> > VACUUM.
>
> I've changed lock level to AccessShareLock. IMV scan is just as safe as regular GiST index scan.
> There is my patch with VACUUM on CF, it can add deleted pages. I'll update one of these two patches accordingly, if other is committed.

Maybe you should have renamed it to gist_index_parent_check() instead,
and kept the ShareLock. I don't think that this business with buffer
locks is acceptable. And it is mostly checking parent/child
relationships. Right?

You're not really able to check GiST tuples against anything other
than their parent, unlike with B-Tree (you can only do very simple
things, like the IndexTupleSize()/lp_len cross-check). Naming the
function gist_index_parent_check() seems like the right way to go,
even if you could get away with an AccessShareLock (which I now tend
to doubt). It was way too optimistic to suppose that there might be a
clever way of locking down race conditions that allowed you to not
couple buffer locks, but also use an AccessShareLock. After all, even
the B-Tree amcheck code doesn't manage to do this when verifying
parent/child relationships (it only does something clever with the
sibling page cross-check).

> > * Why is it okay to do this?:
> >
> >> +       if (GistTupleIsInvalid(idxtuple))
> >> +           ereport(LOG,
> >> +                   (errmsg("index \"%s\" contains an inner tuple marked as invalid",
> >> +                           RelationGetRelationName(rel)),
> >> +                    errdetail("This is caused by an incomplete page split at crash recovery before upgrading to PostgreSQL 9.1."),
> >> +                    errhint("Please REINDEX it.")));
> >
> > You should probably mention the gistdoinsert() precedent for this.
> This invalid tuple will break inserts, but will not affect select. I do not know, should this be error or warning in amcheck?

It should be an error. Breaking inserts is a serious problem.

--
Peter Geoghegan

Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Michael Paquier-2
On Thu, Jan 31, 2019 at 03:58:48PM -0800, Peter Geoghegan wrote:
> I think that holding a buffer lock on an internal pages for as long as
> it takes to check all of the child pages is a non-starter. If you
> can't think of a way of not doing that that's race free with a
> relation-level AccessShareLock, then a relation-level ShareLock (which
> will block VACUUM) seems necessary.

(Please be careful to update the status of the patch in the CF
correctly!)
This review is recent, so I have moved the patch to next CF, waiting
for input from the author.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Andrey Borodin-2
In reply to this post by Peter Geoghegan-4
Hi Peter!

Sorry for the delay. Here's new version.

> 1 февр. 2019 г., в 4:58, Peter Geoghegan <[hidden email]> написал(а):
>
> On Tue, Jan 1, 2019 at 2:11 AM Andrey Borodin <[hidden email]> wrote:
>>> This isn't consistent with what you do within verify_nbtree.c, which
>>> deliberately avoids ever holding more than a single buffer lock at a
>>> time, on general principle. That isn't necessarily a reason why you
>>> have to do the same, but it's not clear why you do things that way.
>>> Why isn't it enough to have a ShareLock on the relation? Maybe this is
>>> a sign that it would be a good idea to always operate on a palloc()'d
>>> copy of the page, by introducing something equivalent to
>>> palloc_btree_page(). (That would also be an opportunity to do very
>>> basic checks on every page.)
>> If we unlock parent page before checking child, some insert can adjust tuple on parent, sneak into child and insert new tuple.
>> This can trigger false positive. I'll think about it more.
>
> I think that holding a buffer lock on an internal pages for as long as
> it takes to check all of the child pages is a non-starter. If you
> can't think of a way of not doing that that's race free with a
> relation-level AccessShareLock, then a relation-level ShareLock (which
> will block VACUUM) seems necessary.
>
> I think that you should duplicate some of what's in
> bt_index_check_internal() -- lock the heap table as well as the index,
> to avoid deadlocks. We might not do this with contrib/pageinspect, but
> that's not really intended to be used in production in the same way
> this will be.
I've extracted functions amcheck_lock_relation() and amcheck_unlock_relation().
With this patch a little bit complicated, but I do not think that code duplication will be OK here.
Instead of lock\unlock functions I can provide one-function-interface receiving index_checkable() and index_check() callbacks.

>
>>> * Maybe gist_index_check() should be called gist_index_parent_check(),
>>> since it's rather like the existing verification function
>>> bt_index_parent_check().
>>>
>>> * Alternatively, you could find a way to make your function only need
>>> an AccessShareLock -- that would make gist_index_check() an
>>> appropriate name. That would probably require careful thought about
>>> VACUUM.
>>
>> I've changed lock level to AccessShareLock. IMV scan is just as safe as regular GiST index scan.
>> There is my patch with VACUUM on CF, it can add deleted pages. I'll update one of these two patches accordingly, if other is committed.
>
> Maybe you should have renamed it to gist_index_parent_check() instead,
> and kept the ShareLock. I don't think that this business with buffer
> locks is acceptable. And it is mostly checking parent/child
> relationships. Right?
That's right. Semantically gist_index_parent_check() is correct name, let's use it.

>
>
> You're not really able to check GiST tuples against anything other
> than their parent, unlike with B-Tree (you can only do very simple
> things, like the IndexTupleSize()/lp_len cross-check). Naming the
> function gist_index_parent_check() seems like the right way to go,
> even if you could get away with an AccessShareLock (which I now tend
> to doubt). It was way too optimistic to suppose that there might be a
> clever way of locking down race conditions that allowed you to not
> couple buffer locks, but also use an AccessShareLock. After all, even
> the B-Tree amcheck code doesn't manage to do this when verifying
> parent/child relationships (it only does something clever with the
> sibling page cross-check).
That's true, we cannot avoid locking parent and child page simultaneously to check correctness of tuples.

Currently, we do not check index tuples against heap. Should we do this or leave this for another patch?


Thanks!

Best regards, Andrey Borodin.


0001-GiST-verification-function-for-amcheck-v4.patch (25K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Peter Geoghegan-4
On Sun, Feb 17, 2019 at 12:55 AM Andrey Borodin <[hidden email]> wrote:
> That's true, we cannot avoid locking parent and child page simultaneously to check correctness of tuples.

Right.

Some further questions/comments:

* I think that this should be an error:

> +       if (GistPageIsLeaf(page))
> +       {
> +           /* should never happen unless it is root */
> +           Assert(stack->blkno == GIST_ROOT_BLKNO);
> +       }

I use assertions in the verify_nbtree.c, but only for things that are
programming errors, and state that amcheck "owns". If they fail, then
it's a bug in amcheck specifically (they're really obvious assertions
about local state). Whereas this case could in theory happen with a
corrupt index, for example due to a page written in the wrong place.
I'm sure that the root block looks very similar to a leaf, so we won't
detect this any other way.

It's good to be paranoid, and to even think adversarially, provided it
doesn't make the design more difficult and has low runtime overhead.
We could debate whether or not this corruption is realistic, but it's
easy to make it an error and the cost is low, so you should just do
it.

* Couldn't leaf-like root pages also get some of the testing that we
do for other pages within check_index_tuple()? Ideally it wouldn't be
too much of a special case, though.

* I think that you need to add an
errcode(ERRCODE_FEATURE_NOT_SUPPORTED) to this:

> +   /*
> +    * Check that it's not a leftover invalid tuple from pre-9.1
> +    * See also gistdoinsert() and gistbulkdelete() handlding of such tuples.
> +    * We do not consider it error here, but warn operator.
> +    */
> +   if (GistTupleIsInvalid(idxtuple))
> +       ereport(ERROR,
> +               (errmsg("index \"%s\" contains an inner tuple marked as invalid",
> +                       RelationGetRelationName(rel)),
> +                errdetail("This is caused by an incomplete page split at crash recovery before upgrading to PostgreSQL 9.1."),
> +                errhint("Please REINDEX it.")));

> Currently, we do not check index tuples against heap. Should we do this or leave this for another patch?

To address this question: I would leave this for now. It can probably
work based on the same principle as nbtree's heapallindexed
verification, with few or no changes, but I don't think that we need
to make that happen in this release.

* You still hold multiple buffer locks at once, starting with the
parent and moving to the child. Only 2 buffer locks. Why is this
necessary, given that you now hold a ShareLock on both heap and index
relations? Couldn't you just copy the parent into local memory, in the
style of verify_nbtree.c?

* Note that this "parent then child" lock order seems to not be
consistent with the general rule for holding concurrent buffer locks
that is described in the GiST README:

"""
Concurrency control
-------------------
As a rule of thumb, if you need to hold a lock on multiple pages at the
same time, the locks should be acquired in the following order: child page
before parent, and left-to-right at the same level. Always acquiring the
locks in the same order avoids deadlocks.
"""

* The main point of entry for GiST verification is
gist_check_parent_keys_consistency(). It would be nice to have
comments that describe what it does, and in what order. I understand
that English is not your first language, which makes it harder, but I
would appreciate it if you made the effort to explain the theory. I
don't want to assume that I understand your intent -- I could be
wrong.

* Suggest that you use function prototypes consistently, even for
static functions. That is the style that we prefer. Also, please make
the comments consistent with project guidelines on indentation and
style.

* It would be nice to know if the code in
gist_check_parent_keys_consistency() finds problems in the parent, or
in the child. The nbtree check has an idea of a "target" page: every
page gets to be the target exactly once, and we only ever find
problems in the current target. Maybe that is arbitrary in some cases,
because the relationship between the two is where the problem actually
is. I still think that it is a good idea to make it work that way if
possible. It makes it easier to describe complicated relationships in
comments.

* Why does it make sense to use range_gist_picksplit()/operator class
"union" support function to verify parent/child relationships? Does it
handle NULL values correctly, given the special rules?

* Why not use the "consistent" support function instead of the "union"
support function? Could you write new code that was based on
gistindex_keytest() to do this?

* The GiST docs ("63.3. Extensibility") say of the "distance" support
function: "For an internal tree node, the distance returned must not
be greater than the distance to any of the child nodes". Is there an
opportunity to test this relationship, too, by making sure that
distance is sane? Perhaps this is a very naive question -- I am not a
GiST expert.

* Is it right that gist_check_internal_page() should both return a
value that says "this internal page has child pages that are
themselves internal", while testing the child pages?

* Do we need to worry about F_DELETED pages? Why or why not?

* Do we need to worry about F_HAS_GARBAGE pages? Why or why not?

--
Peter Geoghegan

Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Andrey Borodin-2
Hi!


Thanks for this detailed review!

>
> * Note that this "parent then child" lock order seems to not be
> consistent with the general rule for holding concurrent buffer locks
> that is described in the GiST README:

This is correct. I've changed locking order.
When we check target internal page, we make a palloc'ed copy and unlock it (but hold the pin).
If we find discrepancy between parent and child keys we lock parent page again.
Then look for correct downlink. And check keys again.
In case when discrepancy still present we report an error.
Otherwise drop parent lock.

>
> * I think that this should be an error:
>
>> +       if (GistPageIsLeaf(page))
>> +       {
>> +           /* should never happen unless it is root */
>> +           Assert(stack->blkno == GIST_ROOT_BLKNO);
>> +       }
>
Done. If this happens, this looks like a programming error or page header flags were corrupted
concurrently. Let's just report.

> * Couldn't leaf-like root pages also get some of the testing that we
> do for other pages within check_index_tuple()? Ideally it wouldn't be
> too much of a special case, though.
Done.

> * I think that you need to add an
> errcode(ERRCODE_FEATURE_NOT_SUPPORTED) to this:

Done.

> * You still hold multiple buffer locks at once, starting with the
> parent and moving to the child. Only 2 buffer locks. Why is this
> necessary, given that you now hold a ShareLock on both heap and index
> relations? Couldn't you just copy the parent into local memory, in the
> style of verify_nbtree.c?
Done.

> * The main point of entry for GiST verification is
> gist_check_parent_keys_consistency(). It would be nice to have
> comments that describe what it does, and in what order. I understand
> that English is not your first language, which makes it harder, but I
> would appreciate it if you made the effort to explain the theory. I
> don't want to assume that I understand your intent -- I could be
> wrong.

I've added about 10 lines of comments.

>
> * Suggest that you use function prototypes consistently, even for
> static functions. That is the style that we prefer. Also, please make
> the comments consistent with project guidelines on indentation and
> style.
Done.

>
> * It would be nice to know if the code in
> gist_check_parent_keys_consistency() finds problems in the parent, or
> in the child. The nbtree check has an idea of a "target" page: every
> page gets to be the target exactly once, and we only ever find
> problems in the current target. Maybe that is arbitrary in some cases,
> because the relationship between the two is where the problem actually
> is. I still think that it is a good idea to make it work that way if
> possible. It makes it easier to describe complicated relationships in
> comments.
the "target" for gist_check_parent_keys_consistency() is tuples on internal
page and their relation with tuples on referenced page. All other checks
are just additional checks, while I expect that this parent-child relationship
may contain some kind of bug: we had observed that GiST could not find some
tuples after CREATE INDEX CONCURRENTLY, but could not reliably reproduce the
problem before it was gone. That's why I've started this work.

> * Why does it make sense to use range_gist_picksplit()/operator class
> "union" support function to verify parent/child relationships? Does it
> handle NULL values correctly, given the special rules?
Yes, union handles NULLs. I do not use range_gist_picksplit().
By definition parent tuple is union of all child tuples.

> * Why not use the "consistent" support function instead of the "union"
> support function? Could you write new code that was based on
> gistindex_keytest() to do this?
Initially, I've used the consistency function. But it answers the question
"Does the downlinked key-space overlap with query". And query may be "not a
given key". Consistency function is controlled by search strategy. Different
operator classes support different set of strategies. But every opclass
should support union to build a GiST. So, I've used "union" instead of
"consistency".

>
> * The GiST docs ("63.3. Extensibility") say of the "distance" support
> function: "For an internal tree node, the distance returned must not
> be greater than the distance to any of the child nodes". Is there an
> opportunity to test this relationship, too, by making sure that
> distance is sane? Perhaps this is a very naive question -- I am not a
> GiST expert.
If parent tuple is correctly adjusted by child tuples, distance between
them must be 0. With this check We will test opclass, not an index structure.

>
> * Is it right that gist_check_internal_page() should both return a
> value that says "this internal page has child pages that are
> themselves internal", while testing the child pages?
Yes.

>
> * Do we need to worry about F_DELETED pages? Why or why not?
Currently, GiST scan does not check for F_DELETED: there simply is no code
to delete a page in GiST (except my patch on commitfest).
I suspect that there may be deleted pages from previous versions.
But encountering this in a search should not be a problem. Thus, I copy
behavior from index scan: do not complain about deleted pages.


> * Do we need to worry about F_HAS_GARBAGE pages? Why or why not?
This flag is for microvacuum and only hints that there may be some vacuumable
tuples on page. It is used during check for neccesary split to allocate some space.

Thanks!

Best regards, Andrey Borodin.

0001-GiST-verification-function-for-amcheck-v5.patch (29K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Heikki Linnakangas
There's a little copy-pasto in gist_check_page_keys():

> + for (o = FirstOffsetNumber; o <= parent_maxoff; o = OffsetNumberNext(i))

Should be "OffsetNumberNext(o)".

I tested this patch with your testing patch from the other thread (after
fixing the above), to leave behind incompletely split pages [1]. It
seems that the amcheck code doesn't expect incomplete splits:

postgres=# SELECT gist_index_parent_check('x_c_idx');
ERROR:  index "x_c_idx" has inconsistent records

[1]
https://www.postgresql.org/message-id/EB87A69B-EE5E-4259-9EEB-DA9DC1F7E265%40yandex-team.ru

- Heikki

Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Heikki Linnakangas
On 04/03/2019 17:53, Heikki Linnakangas wrote:
> I tested this patch with your testing patch from the other thread (after
> fixing the above), to leave behind incompletely split pages [1]. It
> seems that the amcheck code doesn't expect incomplete splits:
>
> postgres=# SELECT gist_index_parent_check('x_c_idx');
> ERROR:  index "x_c_idx" has inconsistent records

On closer look, I think that was because that testing patch to leave
behind incomplete splits really did corrupt the index. It always
inserted the downlink to the parent, but randomly skipped clearing the
FOLLOW_RIGHT flag and updating the NSN in the child. That's not a valid
combination. To test incomplete splits, you need to skip inserting the
downlink to the parent, too.

- Heikki

Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Andrey Borodin-2
Hi!

Here's new version of GiST amcheck, which takes into account recently committed GiST VACUUM.
It tests that deleted pages do not contain any data.

Also, Heikki's fix applied (wrong OffsetNumberNext(i) replaced by OffsetNumberNext(o)).

Thanks!

Best regards, Andrey Borodin.

0001-GiST-verification-function-for-amcheck-v6.patch (30K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Heikki Linnakangas
On 27/03/2019 11:51, Andrey Borodin wrote:
> Hi!
>
> Here's new version of GiST amcheck, which takes into account recently committed GiST VACUUM.
> It tests that deleted pages do not contain any data.

Thanks! I had a look, and refactored it quite a bit.

I found the way the recursion worked confusing. On each internal page,
it looped through all the child nodes, checking the consistency of the
downlinks. And then it looped through the children again, to recurse.
This isn't performance-critical, but visiting every page twice still
seems strange.

In gist_check_page_keys(), if we get into the code to deal with a
concurrent update, we set 'parent' to point to a tuple on a parent page,
then unlock it, and continue to look at remaining tuples, using the
pointer that points to an unlocked buffer.

I came up with the attached, which fixes the above-mentioned things. I
also replaced the check that each node has only internal or leaf
children, with a different check that the tree has the same height in
all branches. That catches more potential problems, and was easier to
implement after the refactoring. This still needs at least a round of
fixing typos and tidying up comments, but it's more straightforward now,
IMHO.

What have you been using to test this?

- Heikki

amcheck-gist-v6-heikki.patch (18K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Peter Geoghegan-4
On Wed, Mar 27, 2019 at 10:29 AM Heikki Linnakangas <[hidden email]> wrote:
> Thanks! I had a look, and refactored it quite a bit.

I'm really happy that other people seem to be driving amcheck in a new
direction, without any prompting from me. It's too important to remain
something that only I have contributed to.

> I found the way the recursion worked confusing. On each internal page,
> it looped through all the child nodes, checking the consistency of the
> downlinks. And then it looped through the children again, to recurse.
> This isn't performance-critical, but visiting every page twice still
> seems strange.

To be fair, that's actually what bt_index_parent_check() does. OTOH,
it has to work in a way that makes it an extension of
bt_index_check(), which is not a problem for
gist_index_parent_check().

> In gist_check_page_keys(), if we get into the code to deal with a
> concurrent update, we set 'parent' to point to a tuple on a parent page,
> then unlock it, and continue to look at remaining tuples, using the
> pointer that points to an unlocked buffer.

Why not just copy the page into a local buffer? See my remarks on this below.

> I came up with the attached, which fixes the above-mentioned things. I
> also replaced the check that each node has only internal or leaf
> children, with a different check that the tree has the same height in
> all branches. That catches more potential problems, and was easier to
> implement after the refactoring. This still needs at least a round of
> fixing typos and tidying up comments, but it's more straightforward now,
> IMHO.

You probably didn't mean to leave this "boom" error behind:

> +   if (GistPageIsDeleted(page))
> +   {
> +       elog(ERROR,"boom");

I see that you have this check for deleted pages:

> +       if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
> +           ereport(ERROR,
> +                   (errcode(ERRCODE_INDEX_CORRUPTED),
> +                   errmsg("index \"%s\" has deleted page with tuples",
> +                           RelationGetRelationName(rel))));
> +   }

Why not have a similar check for non-deleted pages, whose maxoffset
must be <= MaxIndexTuplesPerPage?

I see various errors like this one:

> +           if (MAXALIGN(ItemIdGetLength(iid)) != MAXALIGN(IndexTupleSize(idxtuple)))
> +               ereport(ERROR,
> +                       (errcode(ERRCODE_INDEX_CORRUPTED),
> +                        errmsg("index \"%s\" has inconsistent tuple sizes",
> +                               RelationGetRelationName(rel))));

Can't we say which TID is involved here, so we can find the offending
corrupt tuple afterwards? Or at least the block number? And maybe even
the LSN of the page? I think that that kind of stuff could be added in
a number of places.

I see this stuff that's related to concurrent processes:

> +       /*
> +        * It's possible that the page was split since we looked at the parent,
> +        * so that we didn't missed the downlink of the right sibling when we
> +        * scanned the parent. If so, add the right sibling to the stack now.
> +        */

> +               /*
> +                * There was a  discrepancy between parent and child tuples.
> +                * We need to verify it is not a result of concurrent call
> +                * of gistplacetopage(). So, lock parent and try to find downlink
> +                * for current page. It may be missing due to concurrent page
> +                * split, this is OK.
> +                */

Is this really needed? Isn't the ShareLock on the index sufficient? If so, why?

> +               stack->parenttup = gist_refind_parent(rel, stack->parentblk, stack->blkno, strategy);

If the gistplacetopage() stuff is truly necessary, then is it okay to
call gist_refind_parent() with the original buffer lock still held
like this?

I still suspect that we should have something like palloc_btree_page()
for GiST, so that we're always operating on a copy of the page in
palloc()'d memory.  Maybe it's worthwhile to do something clever with
concurrently holding buffer locks, but if that's what we're doing here
then I would also expect us to have something weaker than ShareLock as
our relation-level heavyweight lock. And, there should be a prominent
explanation of the theory behind it somewhere.

> What have you been using to test this?

pg_hexedit has full support for GiST.     ;-)

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Andrey Borodin-2
In reply to this post by Heikki Linnakangas
Thanks for looking into this!

> 27 марта 2019 г., в 22:29, Heikki Linnakangas <[hidden email]> написал(а):
>
> On 27/03/2019 11:51, Andrey Borodin wrote:
>> Hi!
>> Here's new version of GiST amcheck, which takes into account recently committed GiST VACUUM.
>> It tests that deleted pages do not contain any data.
>
> Thanks! I had a look, and refactored it quite a bit.
Cool! New scan logic is much easier to read.

> I found the way the recursion worked confusing. On each internal page, it looped through all the child nodes, checking the consistency of the downlinks. And then it looped through the children again, to recurse. This isn't performance-critical, but visiting every page twice still seems strange.
>
> In gist_check_page_keys(), if we get into the code to deal with a concurrent update, we set 'parent' to point to a tuple on a parent page, then unlock it, and continue to look at remaining tuples, using the pointer that points to an unlocked buffer.
Uh, that was a tricky bug.

> I came up with the attached, which fixes the above-mentioned things. I also replaced the check that each node has only internal or leaf children, with a different check that the tree has the same height in all branches. That catches more potential problems, and was easier to implement after the refactoring. This still needs at least a round of fixing typos and tidying up comments, but it's more straightforward now, IMHO.
>
> What have you been using to test this?
Please see attached patch with line
//if (false) // THIS LINE IS INTENTIONALLY BROKEN
which breaks GiST consistency. Uncomment it to create logically broken GiST.

Also, I've fixed buffer release and small typo (childblkno->parentblkno).

To test that it does not deadlock with inserts and vacuums I use pgbench the same way is it is used here [0] but also with
SELECT gist_index_parent_check('gist_check_idx');

Also I've added NOTICE when parent is not refound. To test this, I was removing adjust here
if (stack->parenttup && gistgetadjusted(rel, stack->parenttup, idxtuple, state))

> XXX: shouldn't we rather use gist_consistent?
No, consistency test must use scan strategy, which can be absent from opclass.
Parent tuples must be adjusted with every child. We check this simple invariant, it should cover all corner cases.


> 28 марта 2019 г., в 4:57, Peter Geoghegan <[hidden email]> написал(а):
>
>
>> In gist_check_page_keys(), if we get into the code to deal with a
>> concurrent update, we set 'parent' to point to a tuple on a parent page,
>> then unlock it, and continue to look at remaining tuples, using the
>> pointer that points to an unlocked buffer.
>
> Why not just copy the page into a local buffer? See my remarks on this below.
It already was copied, there was a bug of reusing copy from released buffer. In case, when we suspected error, which outed to be not error.

>
> You probably didn't mean to leave this "boom" error behind:
>
>> +   if (GistPageIsDeleted(page))
>> +   {
>> +       elog(ERROR,"boom");
Oops. Sorry.

> I see that you have this check for deleted pages:
>
>> +       if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
>> +           ereport(ERROR,
>> +                   (errcode(ERRCODE_INDEX_CORRUPTED),
>> +                   errmsg("index \"%s\" has deleted page with tuples",
>> +                           RelationGetRelationName(rel))));
>> +   }
>
> Why not have a similar check for non-deleted pages, whose maxoffset
> must be <= MaxIndexTuplesPerPage?
Done.

>
> I see various errors like this one:
>
>> +           if (MAXALIGN(ItemIdGetLength(iid)) != MAXALIGN(IndexTupleSize(idxtuple)))
>> +               ereport(ERROR,
>> +                       (errcode(ERRCODE_INDEX_CORRUPTED),
>> +                        errmsg("index \"%s\" has inconsistent tuple sizes",
>> +                               RelationGetRelationName(rel))));
>
> Can't we say which TID is involved here, so we can find the offending
> corrupt tuple afterwards? Or at least the block number? And maybe even
> the LSN of the page? I think that that kind of stuff could be added in
> a number of places.
I've added block number and offset whenever known. I do not understand point of LSN here...

>
> I see this stuff that's related to concurrent processes:
>
>> +       /*
>> +        * It's possible that the page was split since we looked at the parent,
>> +        * so that we didn't missed the downlink of the right sibling when we
>> +        * scanned the parent. If so, add the right sibling to the stack now.
>> +        */
>
>> +               /*
>> +                * There was a  discrepancy between parent and child tuples.
>> +                * We need to verify it is not a result of concurrent call
>> +                * of gistplacetopage(). So, lock parent and try to find downlink
>> +                * for current page. It may be missing due to concurrent page
>> +                * split, this is OK.
>> +                */
>
> Is this really needed? Isn't the ShareLock on the index sufficient? If so, why?
There may be concurrent inserts? In GiST they can reorder items on page.

>
>> +               stack->parenttup = gist_refind_parent(rel, stack->parentblk, stack->blkno, strategy);
>
> If the gistplacetopage() stuff is truly necessary, then is it okay to
> call gist_refind_parent() with the original buffer lock still held
> like this?
When we call gist_refind_parent() we hold lock for a child and lock parent.
We exclude concurrent VACUUM, thus parent cannot become a child for current child, because it has to be recycled for such coincidence.

>
> I still suspect that we should have something like palloc_btree_page()
> for GiST, so that we're always operating on a copy of the page in
> palloc()'d memory.
That's actually is what we are doing now. GistScanItem->parenttup is always a copy. But we are doing more small copies of each individual tuple, because we have to refresh this copies sometimes.

>  Maybe it's worthwhile to do something clever with
> concurrently holding buffer locks, but if that's what we're doing here
> then I would also expect us to have something weaker than ShareLock as
> our relation-level heavyweight lock. And, there should be a prominent
> explanation of the theory behind it somewhere.
We definitely can run under SharedLock. And I will try to compose up all things that prevent us from using weaker levels in next message...

>
>> What have you been using to test this?
>
> pg_hexedit has full support for GiST.     ;-)
For me it is easier to break GiST in it's code for tests :)

Thanks!

Best regards, Andrey Borodin.

[0]https://www.postgresql.org/message-id/flat/96ec7ebd-42b9-4df5-18a4-42181c8a5a41%40iki.fi#be331694fd0c8f2a575b3e51a4306e72


0001-GiST-verification-function-for-amcheck-v7.patch (28K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Andrey Borodin-2


> 28 марта 2019 г., в 18:35, Andrey Borodin <[hidden email]> написал(а):
>>
>> Is this really needed? Isn't the ShareLock on the index sufficient? If so, why?
> There may be concurrent inserts? In GiST they can reorder items on page.

Looks like I've tried to cope with same problem twice:
v3 of the patch used AccessShareLock and many locks with incorrect order.
We could use one of possible solutions: either use ShareLock, or rewrite scan to correct locking order.
But patches v4-v7 use both.
I think we should use AccessShareLock, as long as we implemented tricky logic with gist_refind_parent().


>>> +               stack->parenttup = gist_refind_parent(rel, stack->parentblk, stack->blkno, strategy);
>>
>> If the gistplacetopage() stuff is truly necessary, then is it okay to
>> call gist_refind_parent() with the original buffer lock still held
>> like this?
> When we call gist_refind_parent() we hold lock for a child and lock parent.
> We exclude concurrent VACUUM, thus parent cannot become a child for current child, because it has to be recycled for such coincidence.
That's merely hard form of paranoia, internal pages are never deleted. gist_index_parent_check() would work just fine with concurrent VACUUM, INSERTs and SELECTs.

Best regards, Andrey Borodin.

Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Peter Geoghegan-4
On Thu, Mar 28, 2019 at 10:08 AM Andrey Borodin <[hidden email]> wrote:
> >> Is this really needed? Isn't the ShareLock on the index sufficient? If so, why?
> > There may be concurrent inserts? In GiST they can reorder items on page.
>
> Looks like I've tried to cope with same problem twice:
> v3 of the patch used AccessShareLock and many locks with incorrect order.
> We could use one of possible solutions: either use ShareLock, or rewrite scan to correct locking order.
> But patches v4-v7 use both.

It definitely has to be one or the other. The combination makes no sense.

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: amcheck verification for GiST

Andrey Borodin-2


> 29 марта 2019 г., в 5:35, Peter Geoghegan <[hidden email]> написал(а):
>
> On Thu, Mar 28, 2019 at 10:08 AM Andrey Borodin <[hidden email]> wrote:
>>>> Is this really needed? Isn't the ShareLock on the index sufficient? If so, why?
>>> There may be concurrent inserts? In GiST they can reorder items on page.
>>
>> Looks like I've tried to cope with same problem twice:
>> v3 of the patch used AccessShareLock and many locks with incorrect order.
>> We could use one of possible solutions: either use ShareLock, or rewrite scan to correct locking order.
>> But patches v4-v7 use both.
>
> It definitely has to be one or the other. The combination makes no sense.
Here's updated patch with AccessShareLock.
I've tried to stress this with combination of random inserts, vaccuums and checks. This process neither failed, nor deadlocked.
The patch intentionally contains one superflous line to make gist logically broken. This triggers regression test of amcheck.


Best regards, Andrey Borodin.

0001-GiST-verification-function-for-amcheck-v8.patch (30K) Download Attachment
12