Do not check unlogged indexes on standby

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Do not check unlogged indexes on standby

Andrey Borodin-2
Hi hackers!

Currently, if we check indexes on standby we often get

man-psbpshn0skhsxynd/xiva_xtable_testing_01 R # select bt_index_check('xiva_loadtest.pk_uid');
ERROR:  58P01: could not open file "base/16453/125407": No such file or directory

I think that we should print warning and that's it. Amcheck should not give false positives.

Or, maybe, there are some design considerations that I miss?


BTW I really want to enable rightlink-leftlink invariant validation on standby..

Thanks!

Best regards, Andrey Borodin.


0001-Do-not-verify-unlogged-indexes-during-amcheck.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Do not check unlogged indexes on standby

Peter Geoghegan-4
On Mon, Aug 12, 2019 at 2:58 AM Andrey Borodin <[hidden email]> wrote:
> Currently, if we check indexes on standby we often get
>
> man-psbpshn0skhsxynd/xiva_xtable_testing_01 R # select bt_index_check('xiva_loadtest.pk_uid');
> ERROR:  58P01: could not open file "base/16453/125407": No such file or directory
>
> I think that we should print warning and that's it. Amcheck should not give false positives.

I agree -- amcheck should just skip over unlogged tables during
recovery, since there is simply nothing to check.

I pushed your patch to all branches that have amcheck just now, so now
we skip over unlogged relations when in recovery, though I made some
revisions.

Your patch didn't handle temp tables/indexes that were created in the
first session correctly -- we must be careful about the distinction
between unlogged tables, and tables that don't require WAL logging
(the later includes temp tables). Also, I thought that it was a good
idea to actively test for the presence of a main fork when we don't
skip (i.e. when the system isn't in recovery and the B-Tree indexes
isn't unlogged) -- we now give a clean report of corruption when that
happens, rather than letting an ambiguous "can't happen" error get
raised by low-level code. This might be possible with system catalog
corruption, for example. Finally, I thought that the WARNING was a bit
strong -- a NOTICE is more appropriate.

Thanks!

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Do not check unlogged indexes on standby

Peter Geoghegan-4
In reply to this post by Andrey Borodin-2
On Mon, Aug 12, 2019 at 2:58 AM Andrey Borodin <[hidden email]> wrote:
> BTW I really want to enable rightlink-leftlink invariant validation on standby..

That seems very hard. My hope was that bt_check_index() can detect the
same problem a different way. The bt_right_page_check_scankey()
cross-page check (which needs nothing more than an AccessShareLock)
will often detect such problems, because the page image itself will be
totally wrong in some way.

I'm guessing that you have direct experience with that *not* being
good enough, though. Can you share further details? I suppose that
bt_right_page_check_scankey() helps with transposed pages, but doesn't
help so much when you have WAL-level inconsistencies.

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Do not check unlogged indexes on standby

Andrey Borodin-2


> 13 авг. 2019 г., в 3:23, Peter Geoghegan <[hidden email]> написал(а):
>
> I pushed your patch to all branches that have amcheck just now, so now
> we skip over unlogged relations when in recovery, though I made some
> revisions.
Oh, cool, thanks!

> Your patch didn't handle temp tables/indexes that were created in the
> first session correctly -- we must be careful about the distinction
> between unlogged tables, and tables that don't require WAL logging
> (the later includes temp tables). Also, I thought that it was a good
> idea to actively test for the presence of a main fork when we don't
> skip (i.e. when the system isn't in recovery and the B-Tree indexes
> isn't unlogged) -- we now give a clean report of corruption when that
> happens, rather than letting an ambiguous "can't happen" error get
> raised by low-level code. This might be possible with system catalog
> corruption, for example. Finally, I thought that the WARNING was a bit
> strong -- a NOTICE is more appropriate.
+1

> 13 авг. 2019 г., в 3:36, Peter Geoghegan <[hidden email]> написал(а):
>
> On Mon, Aug 12, 2019 at 2:58 AM Andrey Borodin <[hidden email]> wrote:
>> BTW I really want to enable rightlink-leftlink invariant validation on standby..
>
> That seems very hard. My hope was that bt_check_index() can detect the
> same problem a different way. The bt_right_page_check_scankey()
> cross-page check (which needs nothing more than an AccessShareLock)
> will often detect such problems, because the page image itself will be
> totally wrong in some way.
>
> I'm guessing that you have direct experience with that *not* being
> good enough, though. Can you share further details? I suppose that
> bt_right_page_check_scankey() helps with transposed pages, but doesn't
> help so much when you have WAL-level inconsistencies.

We have a bunch of internal testing HA clusters that suffered from corruption conditions.
We fixed everything that can be detected with parent-check on primaries or usual check on standbys.
(page updates were lost both on primary and during WAL replay)
But from time to time when clusters switch primary from one availability zone to another we observe
"right sibling's left-link doesn't match: block 32709 links to 37022 instead of expected 40953 in index"

We are going to search for these clusters with this [0] tolerating possible fraction of false positives, we have them anyway.
But I think I could put some effort into making corruption-detection tooling better.
I think if we observe links discrepancy, we can acquire lock of left and right pages and recheck.


Best regards, Andrey Borodin.

[0] https://github.com/x4m/amcheck/commit/894d8bafb3c9a26bbc168ea5f4f33bcd1fc9f495

Reply | Threaded
Open this post in threaded view
|

Re: Do not check unlogged indexes on standby

Peter Geoghegan-4
On Tue, Aug 13, 2019 at 5:17 AM Andrey Borodin <[hidden email]> wrote:
> We have a bunch of internal testing HA clusters that suffered from corruption conditions.
> We fixed everything that can be detected with parent-check on primaries or usual check on standbys.
> (page updates were lost both on primary and during WAL replay)
> But from time to time when clusters switch primary from one availability zone to another we observe
> "right sibling's left-link doesn't match: block 32709 links to 37022 instead of expected 40953 in index"

That sounds like an issue caused by a failure to replay all available
WAL, where only one page happened to get written out by a checkpoint
before a crash. It's something like that. That wouldn't be caught by
the cross-page bt_index_check() check that we do already.

> We are going to search for these clusters with this [0] tolerating possible fraction of false positives, we have them anyway.
> But I think I could put some effort into making corruption-detection tooling better.
> I think if we observe links discrepancy, we can acquire lock of left and right pages and recheck.

That's one possibility. When I first designed amcheck it was important
to be conservative, so I invented a general rule about never acquiring
multiple buffer locks at once. I still think that that was the correct
decision for the bt_downlink_check() check (the main extra
bt_index_parent_check() check), but I think that you're right about
retrying to verify the sibling links when bt_index_check() is called
from SQL.

nbtree will often "couple" buffer locks on the leaf level; it will
acquire a lock on a leaf page, and not release that lock until it has
also acquired a lock on the right sibling page (I'm mostly thinking of
_bt_stepright()). I am in favor of a patch that makes amcheck perform
sibling link verification within bt_index_check(), by retrying while
pessimistically coupling buffer locks. (Though I think that that
should just happen on the leaf level. We should not try to be too
clever about ignorable/half-dead/deleted pages, to be conservative.)

--
Peter Geoghegan


Reply | Threaded
Open this post in threaded view
|

Re: Do not check unlogged indexes on standby

Andrey Borodin-2


> 13 авг. 2019 г., в 20:30, Peter Geoghegan <[hidden email]> написал(а):
>
> That's one possibility. When I first designed amcheck it was important
> to be conservative, so I invented a general rule about never acquiring
> multiple buffer locks at once. I still think that that was the correct
> decision for the bt_downlink_check() check (the main extra
> bt_index_parent_check() check), but I think that you're right about
> retrying to verify the sibling links when bt_index_check() is called
> from SQL.
>
> nbtree will often "couple" buffer locks on the leaf level; it will
> acquire a lock on a leaf page, and not release that lock until it has
> also acquired a lock on the right sibling page (I'm mostly thinking of
> _bt_stepright()). I am in favor of a patch that makes amcheck perform
> sibling link verification within bt_index_check(), by retrying while
> pessimistically coupling buffer locks. (Though I think that that
> should just happen on the leaf level. We should not try to be too
> clever about ignorable/half-dead/deleted pages, to be conservative.)
PFA V1 of this check retry.

Best regards, Andrey Borodin.

0001-In-amcheck-nbtree-do-rightlink-verification-with-loc.patch (5K) Download Attachment