new heapcheck contrib module


Re: new heapcheck contrib module

Mark Dilger-5
Here is v5 of the patch.  Major changes in this version include:

1) A new module, pg_amcheck, which includes a command line client for checking a database or subset of a database.  Internally it functions by querying the database for a list of tables which are appropriate given the command line switches, and then calls amcheck's functions to validate each table and/or index.  The options for selecting/excluding tables and schemas are patterned on pg_dump, on the assumption that the interface is already familiar to users.

2) amcheck's btree checking functions have been refactored to be able to operate in two modes; the original mode in which all errors are reported via ereport, and a new mode for returning errors as rows from a set returning function.  The new mode is used by a new function verify_btreeam(), analogous to verify_heapam(), both of which are used by the pg_amcheck command line tool.

3) The regression test which generates corruption within a table uses the pageinspect module to determine the on-disk location of each tuple before corrupting it.  This was suggested upthread.

Testing on the command line shows that the pre-existing btree checking code could use some hardening, as it currently crashes the backend on certain corruptions.  When I corrupt relation files for tables and indexes in the backend and then use pg_amcheck to check all objects in the database, I keep getting assertions from the btree checking code.  I think I need to harden this code, but wanted to post an updated patch and solicit opinions before doing so.  Here are some example problems I'm seeing.  Note the stack trace when calling from the command line tool includes the new verify_btreeam function, but you can get the same crashes using the old interface via psql:

From psql, first error:

test=# select bt_index_parent_check('corrupted_idx', true, true);
TRAP: FailedAssertion("_bt_check_natts(rel, key->heapkeyspace, page, offnum)", File: "nbtsearch.c", Line: 663)
0   postgres                            0x0000000106872977 ExceptionalCondition + 103
1   postgres                            0x00000001063a33e2 _bt_compare + 1090
2   amcheck.so                          0x0000000106d62921 bt_target_page_check + 6033
3   amcheck.so                          0x0000000106d5fd2f bt_index_check_internal + 2847
4   amcheck.so                          0x0000000106d60433 bt_index_parent_check + 67
5   postgres                            0x00000001064d6762 ExecInterpExpr + 1634
6   postgres                            0x000000010650d071 ExecResult + 321
7   postgres                            0x00000001064ddc3d standard_ExecutorRun + 301
8   postgres                            0x00000001066600c5 PortalRunSelect + 389
9   postgres                            0x000000010665fc7f PortalRun + 527
10  postgres                            0x000000010665ed59 exec_simple_query + 1641
11  postgres                            0x000000010665c99d PostgresMain + 3661
12  postgres                            0x00000001065d6a8a BackendRun + 410
13  postgres                            0x00000001065d61c4 ServerLoop + 3044
14  postgres                            0x00000001065d2fe9 PostmasterMain + 3769
15  postgres                            0x000000010652e3b0 help + 0
16  libdyld.dylib                       0x00007fff6725fcc9 start + 1
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset:
2020-05-11 10:11:47.394 PDT [41091] LOG:  server process (PID 41309) was terminated by signal 6: Abort trap: 6



From commandline, second error:

pgtest % pg_amcheck -i test                                            
(relname=corrupted,blkno=0,offnum=16,lp_off=7680,lp_flags=1,lp_len=31,attnum=,chunk=)
tuple xmin = 3289393 is in the future
(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
tuple xmax = 0 precedes relation relminmxid = 1
(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
tuple xmin = 12593 is in the future
(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)

<snip>

(relname=corrupted,blkno=107,offnum=20,lp_off=7392,lp_flags=1,lp_len=34,attnum=,chunk=)
tuple xmin = 306 precedes relation relfrozenxid = 487
(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
tuple xmax = 0 precedes relation relminmxid = 1
(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
tuple xmin = 305 precedes relation relfrozenxid = 487
(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
t_hoff > lp_len (54 > 34)
(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
t_hoff not max-aligned (54)
TRAP: FailedAssertion("TransactionIdIsValid(xmax)", File: "heapam_visibility.c", Line: 1319)
0   postgres                            0x0000000105b22977 ExceptionalCondition + 103
1   postgres                            0x0000000105636e86 HeapTupleSatisfiesVacuum + 1158
2   postgres                            0x0000000105634aa1 heapam_index_build_range_scan + 1089
3   amcheck.so                          0x00000001060100f3 bt_index_check_internal + 3811
4   amcheck.so                          0x000000010601057c verify_btreeam + 316
5   postgres                            0x0000000105796266 ExecMakeTableFunctionResult + 422
6   postgres                            0x00000001057a8c35 FunctionNext + 101
7   postgres                            0x00000001057bbf3e ExecNestLoop + 478
8   postgres                            0x000000010578dc3d standard_ExecutorRun + 301
9   postgres                            0x00000001059100c5 PortalRunSelect + 389
10  postgres                            0x000000010590fc7f PortalRun + 527
11  postgres                            0x000000010590ed59 exec_simple_query + 1641
12  postgres                            0x000000010590c99d PostgresMain + 3661
13  postgres                            0x0000000105886a8a BackendRun + 410
14  postgres                            0x00000001058861c4 ServerLoop + 3044
15  postgres                            0x0000000105882fe9 PostmasterMain + 3769
16  postgres                            0x00000001057de3b0 help + 0
17  libdyld.dylib                       0x00007fff6725fcc9 start + 1
pg_amcheck: error: query failed: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.




Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachment: v5-0001-Adding-verify_heapam-and-pg_amcheck.patch (227K)

Re: new heapcheck contrib module

Peter Geoghegan-4
On Mon, May 11, 2020 at 10:21 AM Mark Dilger
<[hidden email]> wrote:
> 2) amcheck's btree checking functions have been refactored to be able to operate in two modes; the original mode in which all errors are reported via ereport, and a new mode for returning errors as rows from a set returning function.

Somebody suggested that I make amcheck work in this way during its
initial development. I rejected that idea at the time, though. It
seems hard to make it work because the B-Tree index scan is a logical
order index scan. It's quite possible that a corrupt index will have
circular sibling links, and things like that. Making everything an
error removes that concern. There are clearly some failures that we
could just soldier on from, but the distinction gets rather blurred.

I understand why you want to do it this way. It makes sense that the
heap stuff would report all inconsistencies together, at the end. I
don't think that that's really workable (or even desirable) in the
case of B-Tree indexes, though. When an index is corrupt, the solution
is always to do root cause analysis, to make sure that the issue does
not recur, and then to REINDEX. There isn't really a question about
doing data recovery of the index structure.

Would it be possible to log the first B-Tree inconsistency, and then
move on to the next high-level phase of verification? You don't have
to throw an error, but it seems like a good idea for amcheck to still
give up on further verification of the index.

The assertion failure that you reported happens because of a generic
assertion made from _bt_compare(). It doesn't have anything to do with
amcheck (you'll see the same thing from regular index scans), really.
I think that removing that assertion would be the opposite of
hardening. Even if you removed it, the backend will still crash once
you come up with a slightly more evil index tuple. Maybe *that* could
be mostly avoided with widespread hardening; we could in principle
perform cross-checks of varlena headers against the tuple or page
layout at any point reachable from _bt_compare(). That seems like
something that would have unacceptable overhead, because the cost
would be imposed on everything. And even then you've only ameliorated
the problem.
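
For illustration, the kind of cross-check being described might look roughly like the sketch below.  The function name is invented, and it assumes an uncompressed, untoasted datum whose tuple boundary ("tupend") is already known from the line pointer:

#include "postgres.h"

/*
 * Sketch only: the sort of varlena sanity check that could in principle
 * be made before _bt_compare() trusts a datum.  Assumes an uncompressed,
 * untoasted datum; "tupend" is the address just past the end of the
 * index tuple, as given by its line pointer length.
 */
static bool
varlena_fits_in_tuple(const char *attr, const char *tupend)
{
    Size        len;

    if (attr >= tupend)
        return false;

    if (VARATT_IS_1B(attr))
        len = VARSIZE_1B(attr);
    else
    {
        /* need the full 4-byte header before it is safe to read it */
        if (attr + VARHDRSZ > tupend)
            return false;
        len = VARSIZE_4B(attr);
    }

    return (attr + len) <= tupend;
}

Doing that for every attribute of every tuple an ordinary index scan touches is exactly the overhead being objected to here.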

Code like amcheck's PageGetItemIdCareful() goes further than the
equivalent backend macro (PageGetItemId()) to avoid assertion failures
and crashes with corrupt data. I doubt that it is practical to take it
much further than that, though. It's subject to diminishing returns.
In general, _bt_compare() calls user-defined code that is usually
written in C. This C code could in principle feel entitled to do any
number of scary things when you corrupt the input data. The amcheck
module's dependency on user-defined operator code is totally
unavoidable -- it is the single source of truth for the nbtree checks.

It boils down to this: I think that regression tests that run on the
buildfarm and actually corrupt data are not practical, at least in the
case of the index checks -- though probably in all cases. Look at the
pageinspect "btree.out" test output file -- it's very limited, because
we have to work around a bunch of implementation details. It's no
accident that the bt_page_items() test shows a palindrome value in the
data column (the value is "01 00 00 00 00 00 00 01"). That's an
endianness workaround.

--
Peter Geoghegan



Re: new heapcheck contrib module

Mark Dilger-5


> On May 12, 2020, at 5:34 PM, Peter Geoghegan <[hidden email]> wrote:
>
> On Mon, May 11, 2020 at 10:21 AM Mark Dilger
> <[hidden email]> wrote:
>> 2) amcheck's btree checking functions have been refactored to be able to operate in two modes; the original mode in which all errors are reported via ereport, and a new mode for returning errors as rows from a set returning function.

Thank you yet again for reviewing.  I really appreciate the feedback!

> Somebody suggested that I make amcheck work in this way during its
> initial development. I rejected that idea at the time, though. It
> seems hard to make it work because the B-Tree index scan is a logical
> order index scan. It's quite possible that a corrupt index will have
> circular sibling links, and things like that. Making everything an
> error removes that concern. There are clearly some failures that we
> could just soldier on from, but the distinction gets rather blurred.

Ok, I take your point that the code cannot soldier on after the first error is returned.  I'll change that for v6 of the patch, moving on to the next relation after hitting the first corruption in any particular index.  Do you mind that I refactored the code to return the error rather than ereporting?  If it offends your sensibilities, I could rip that back out, at the expense of having to use try/catch logic in some other places.  I prefer to avoid the try/catch stuff, but I'm not going to put up a huge fuss.
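
For readers following along, the two modes being discussed might be sketched roughly as below.  The struct and function names are invented for illustration and are not taken from the patch:

#include "postgres.h"

#include "access/tupdesc.h"
#include "storage/block.h"
#include "storage/off.h"
#include "utils/builtins.h"
#include "utils/tuplestore.h"

/*
 * Sketch of a reporter that either ereports a corruption message
 * immediately (the original behavior) or appends it to a tuplestore so
 * a set returning function can emit it as a row.
 */
typedef struct CorruptionReporter
{
    bool             use_ereport;   /* true: raise ERROR immediately */
    Tuplestorestate *tupstore;      /* SRF mode: accumulate rows here */
    TupleDesc        tupdesc;
} CorruptionReporter;

static void
report_corruption(CorruptionReporter *r, BlockNumber blkno,
                  OffsetNumber offnum, const char *msg)
{
    if (r->use_ereport)
        ereport(ERROR,
                (errcode(ERRCODE_INDEX_CORRUPTED),
                 errmsg("block %u, offset %u: %s", blkno, offnum, msg)));
    else
    {
        Datum       values[3];
        bool        nulls[3] = {false, false, false};

        values[0] = Int64GetDatum((int64) blkno);
        values[1] = Int32GetDatum((int32) offnum);
        values[2] = CStringGetTextDatum(msg);
        tuplestore_putvalues(r->tupstore, r->tupdesc, values, nulls);
    }
}

In the sketch, SRF mode passes a reporter like this down instead of calling ereport() directly, which is what would let a client such as pg_amcheck read the results as ordinary query output.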

> I understand why you want to do it this way. It makes sense that the
> heap stuff would report all inconsistencies together, at the end. I
> don't think that that's really workable (or even desirable) in the
> case of B-Tree indexes, though. When an index is corrupt, the solution
> is always to do root cause analysis, to make sure that the issue does
> not recur, and then to REINDEX. There isn't really a question about
> doing data recovery of the index structure.

Yes, I agree that reindexing is the most sensible remedy.  I certainly have no plans to implement some pg_fsck_index type tool.  Even for tables, I'm not interested in creating such a tool. I just want a good tool for finding out what the nature of the corruption is, as that might make it easier to debug what went wrong.  It's not just for debugging production systems, but also for chasing down problems in half-baked code prior to release.

> Would it be possible to log the first B-Tree inconsistency, and then
> move on to the next high-level phase of verification? You don't have
> to throw an error, but it seems like a good idea for amcheck to still
> give up on further verification of the index.

Ok, good, it sounds like we're converging on the same idea.  I'm happy to do so.

> The assertion failure that you reported happens because of a generic
> assertion made from _bt_compare(). It doesn't have anything to do with
> amcheck (you'll see the same thing from regular index scans), really.

Oh, I know that already.  I could see that easily enough in the backtrace.  But if you look at the way I implemented verify_heapam, you might notice this:

/*
 * check_tuphdr_xids
 *
 *  Determine whether tuples are visible for verification.  Similar to
 *  HeapTupleSatisfiesVacuum, but with critical differences.
 *
 *  1) Does not touch hint bits.  It seems imprudent to write hint bits
 *     to a table during a corruption check.
 *  2) Only makes a boolean determination of whether verification should
 *     see the tuple, rather than doing extra work for vacuum-related
 *     categorization.
 *
 *  The caller should already have checked that xmin and xmax are not out of
 *  bounds for the relation.
 */

The point is that when checking the table for corruption I avoid calling anything that might assert (or segfault, or whatever).  I was talking about refactoring the btree checking code to be similarly careful.
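
As a concrete illustration of that style of caution, the xid bounds can be vetted before anything consults pg_xact.  This is a sketch only; the function name and the exact rules are mine, not the patch's:

#include "postgres.h"

#include "access/transam.h"

/*
 * Sketch: decide whether an xmin is even plausible for the relation
 * before handing it to clog-consulting code such as
 * TransactionIdDidCommit().
 */
static bool
xmin_is_in_bounds(TransactionId xmin, TransactionId relfrozenxid,
                  TransactionId next_xid)
{
    /* permanent xids are fine; an invalid xmin is corruption */
    if (!TransactionIdIsNormal(xmin))
        return TransactionIdEquals(xmin, FrozenTransactionId) ||
               TransactionIdEquals(xmin, BootstrapTransactionId);

    /* older than relfrozenxid: clog may already have been truncated */
    if (TransactionIdPrecedes(xmin, relfrozenxid))
        return false;

    /* newer than any xid ever assigned: "xmin is in the future" */
    if (TransactionIdFollowsOrEquals(xmin, next_xid))
        return false;

    return true;
}

The "is in the future" and "precedes relation relfrozenxid" messages in the pg_amcheck output upthread correspond to checks of this general shape; only after they pass is it reasonable to call into code that assumes valid xids.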

> I think that removing that assertion would be the opposite of
> hardening. Even if you removed it, the backend will still crash once
> you come up with a slightly more evil index tuple. Maybe *that* could
> be mostly avoided with widespread hardening; we could in principle
> perform cross-checks of varlena headers against the tuple or page
> layout at any point reachable from _bt_compare(). That seems like
> something that would have unacceptable overhead, because the cost
> would be imposed on everything. And even then you've only ameliorated
> the problem.

I think we may have different mental models of how this all works in practice.  I am (or was) envisioning that the backend, during regular table and index scans, cannot afford to check for corruption at all steps along the way, and therefore does not, but that a corruption checking tool has a fundamentally different purpose, and can and should choose to operate in a way that won't blow up when checking a corrupt relation.  It's the difference between a car designed to drive down the highway at high speed vs. a military vehicle designed to drive over a minefield with a guy on the front bumper scanning for landmines, the whole while going half a mile an hour.

I'm starting to infer from your comments that you see the landmine detection vehicle as also driving at high speed, detecting landmines on occasion by seeing them first, but frequently by failing to see them and just blowing up.

> Code like amcheck's PageGetItemIdCareful() goes further than the
> equivalent backend macro (PageGetItemId()) to avoid assertion failures
> and crashes with corrupt data. I doubt that it is practical to take it
> much further than that, though. It's subject to diminishing returns.

Ok.

> In general, _bt_compare() calls user-defined code that is usually
> written in C. This C code could in principle feel entitled to do any
> number of scary things when you corrupt the input data. The amcheck
> module's dependency on user-defined operator code is totally
> unavoidable -- it is the single source of truth for the nbtree checks.

I don't really understand this argument, since users with buggy user defined operators are not the target audience, but I also don't think there is any point in arguing it, since I'm already resolved to take your advice about not hardening the btree stuff any further.

> It boils down to this: I think that regression tests that run on the
> buildfarm and actually corrupt data are not practical, at least in the
> case of the index checks -- though probably in all cases. Look at the
> pageinspect "btree.out" test output file -- it's very limited, because
> we have to work around a bunch of implementation details. It's no
> accident that the bt_page_items() test shows a palindrome value in the
> data column (the value is "01 00 00 00 00 00 00 01"). That's an
> endianness workaround.

One of the delays in submitting the most recent version of the patch is that I was having trouble creating a reliable, portable btree corrupting regression test.  Ultimately, I submitted v5 without any btree corrupting regression test, as it proved pretty difficult to write one good enough for submission, and I had already put a couple more days into developing v5 than I had intended.  So I can't argue too much with your point here.

I did however address (some?) issues that you and others mentioned about the table corrupting regression test.  Perhaps there are remaining issues that will show up on machines with different endianness than I have thus far tested, but I don't see that they will be insurmountable.  Are you fundamentally opposed to that test framework?  If you're going to vote against committing the patch with that test, I'll back down and just remove it from the patch, but it doesn't seem like a bad regression test to me.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: new heapcheck contrib module

Peter Geoghegan-4
On Tue, May 12, 2020 at 7:07 PM Mark Dilger
<[hidden email]> wrote:
> Thank you yet again for reviewing.  I really appreciate the feedback!

Happy to help. It's important work.

> Ok, I take your point that the code cannot soldier on after the first error is returned.  I'll change that for v6 of the patch, moving on to the next relation after hitting the first corruption in any particular index.  Do you mind that I refactored the code to return the error rather than ereporting?

try/catch seems like the way to do it. Not all amcheck errors come
from amcheck -- some are things that the backend code does, that are
known to appear in amcheck from time to time. I'm thinking in
particular of the
table_index_build_scan()/heapam_index_build_range_scan() errors, as
well as the errors from _bt_checkpage().

> Yes, I agree that reindexing is the most sensible remedy.  I certainly have no plans to implement some pg_fsck_index type tool.  Even for tables, I'm not interested in creating such a tool. I just want a good tool for finding out what the nature of the corruption is, as that might make it easier to debug what went wrong.  It's not just for debugging production systems, but also for chasing down problems in half-baked code prior to release.

All good goals.

>  * check_tuphdr_xids

> The point is that when checking the table for corruption I avoid calling anything that might assert (or segfault, or whatever).

I don't think that you can expect to avoid assertion failures in
general. I'll stick with your example. You're calling
TransactionIdDidCommit() from check_tuphdr_xids(), which will
interrogate the commit log and pg_subtrans. It's just not under your
control. I'm sure that you could get an assertion failure somewhere in
there, and even if you couldn't that could change at any time.

You've quasi-duplicated some sensitive code to do that much, which
seems excessive. But it's also not enough.

> I'm starting to infer from your comments that you see the landmine detection vehicle as also driving at high speed, detecting landmines on occasion by seeing them first, but frequently by failing to see them and just blowing up.

That's not it. I would certainly prefer if the landmine detector
didn't blow up. Not having that happen is certainly a goal I share --
that's why PageGetItemIdCareful() exists. But not at any cost,
especially not when "blow up" means an assertion failure that users
won't actually see in production. Avoiding assertion failures like the
one you showed is likely to have a high cost (removing defensive
asserts in low level access method code) for a low benefit. Any
attempt to avoid having the checker itself blow up rather than throw
an error message needs to be assessed pragmatically, on a case-by-case
basis.

> One of the delays in submitting the most recent version of the patch is that I was having trouble creating a reliable, portable btree corrupting regression test.

To be clear, I think that corrupting data is very helpful with ad-hoc
testing during development.

> I did however address (some?) issues that you and others mentioned about the table corrupting regression test.  Perhaps there are remaining issues that will show up on machines with different endianness than I have thus far tested, but I don't see that they will be insurmountable.  Are you fundamentally opposed to that test framework?

I haven't thought about it enough just yet, but I am certainly suspicious of it.

--
Peter Geoghegan



Re: new heapcheck contrib module

Robert Haas
On Tue, May 12, 2020 at 11:06 PM Peter Geoghegan <[hidden email]> wrote:
> try/catch seems like the way to do it. Not all amcheck errors come
> from amcheck -- some are things that the backend code does, that are
> known to appear in amcheck from time to time. I'm thinking in
> particular of the
> table_index_build_scan()/heapam_index_build_range_scan() errors, as
> well as the errors from _bt_checkpage().

That would require the use of a subtransaction.
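
Roughly, for anyone who has not written that pattern before, it would look like the sketch below.  The wrapper being called inside PG_TRY() is hypothetical, and the rest follows the usual subtransaction dance the PLs use:

#include "postgres.h"

#include "access/xact.h"
#include "utils/memutils.h"
#include "utils/resowner.h"

/*
 * Sketch: run one check inside its own subtransaction so that an ERROR
 * raised by backend code can be captured and reported as text, leaving
 * the outer transaction usable for the next relation.
 */
static char *
run_check_capturing_error(Oid indrelid)
{
    char           *errtext = NULL;
    MemoryContext   oldcontext = CurrentMemoryContext;
    ResourceOwner   oldowner = CurrentResourceOwner;

    BeginInternalSubTransaction(NULL);
    MemoryContextSwitchTo(oldcontext);

    PG_TRY();
    {
        /* hypothetical call into the existing ereport-based checker */
        bt_index_check_internal_wrapper(indrelid);

        ReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;
    }
    PG_CATCH();
    {
        ErrorData  *edata;

        MemoryContextSwitchTo(oldcontext);
        edata = CopyErrorData();
        FlushErrorState();

        RollbackAndReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;

        errtext = edata->message;
    }
    PG_END_TRY();

    return errtext;             /* NULL means the check passed */
}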

> You've quasi-duplicated some sensitive code to do that much, which
> seems excessive. But it's also not enough.

I think this is a good summary of the problems in this area. On the
one hand, I think it's hideous that we sanity check user input to
death, but blindly trust the bytes on disk to the point of seg
faulting if they're wrong. The idea that int4 + int4 has to have
overflow checking because otherwise a user might be sad when they get
a negative result from adding two negative numbers, while at the same
time supposing that the same user will be unwilling to accept the
performance hit to avoid crashing if they have a bad tuple, is quite
suspect in my mind. The overflow checking is also expensive, but we do
it because it's the right thing to do, and then we try to minimize the
overhead. It is unclear to me why we shouldn't also take that approach
with bytes that come from disk. In particular, using Assert() checks
for such things instead of elog() is basically Assert(there is no such
thing as a corrupted database).
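
To make the point concrete with the assertion from the stack trace upthread (a sketch of the contrast only, not a proposed change to heapam_visibility.c):

#include "postgres.h"

#include "access/transam.h"

static void
check_tuple_xmax(TransactionId xmax)
{
    /* current style: compiled out of production builds, aborts assert builds */
    Assert(TransactionIdIsValid(xmax));

    /* defensive style: the same condition as an ordinary, catchable error */
    if (!TransactionIdIsValid(xmax))
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("invalid xmax %u in tuple header", xmax)));
}

The first form is free in production and catches bugs during development; the second costs a test-and-branch everywhere but turns corrupt data into an error a user can actually see.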

On the other hand, that problem is clearly way above this patch's pay
grade. There's a lot of stuff all over the code base that would have
to be changed to fix it. It can't be done as an incidental thing as
part of this patch or any other. It's a massive effort unto itself. We
need to somehow draw a clean line between what this patch does and
what it does not do, such that the scope of this patch remains
something achievable. Otherwise, we'll end up with nothing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: new heapcheck contrib module

Peter Geoghegan-4
On Wed, May 13, 2020 at 12:22 PM Robert Haas <[hidden email]> wrote:

> I think this is a good summary of the problems in this area. On the
> one hand, I think it's hideous that we sanity check user input to
> death, but blindly trust the bytes on disk to the point of seg
> faulting if they're wrong. The idea that int4 + int4 has to have
> overflow checking because otherwise a user might be sad when they get
> a negative result from adding two negative numbers, while at the same
> time supposing that the same user will be unwilling to accept the
> performance hit to avoid crashing if they have a bad tuple, is quite
> suspect in my mind. The overflow checking is also expensive, but we do
> it because it's the right thing to do, and then we try to minimize the
> overhead. It is unclear to me why we shouldn't also take that approach
> with bytes that come from disk. In particular, using Assert() checks
> for such things instead of elog() is basically Assert(there is no such
> thing as a corrupted database).

I think that it depends. It's nice to be able to add an Assert()
without really having to worry about the overhead at all. I sometimes
call relatively expensive functions in assertions. For example, there
is an assert that calls _bt_compare() within _bt_check_unique() that I
added at one point -- it caught a real bug a few weeks later. You
could always be doing more.

In general we don't exactly trust the bytes blindly. I've found that
corrupting tuples in a creative way with pg_hexedit doesn't usually
result in a segfault. Sometimes we'll do things like display NULL
values when heap line pointers are corrupt, which isn't as good as an
error but is still okay. We ought to protect against Murphy, not
Machiavelli. ISTM that access method code naturally evolves towards
avoiding the most disruptive errors in the event of real world
corruption, in particular avoiding segfaulting. It's very hard to
prove that, though.

Do you recall seeing corruption resulting in segfaults in production?
I personally don't recall seeing that. If it happened, the segfaults
themselves probably wouldn't be the main concern.

> On the other hand, that problem is clearly way above this patch's pay
> grade. There's a lot of stuff all over the code base that would have
> to be changed to fix it. It can't be done as an incidental thing as
> part of this patch or any other. It's a massive effort unto itself. We
> need to somehow draw a clean line between what this patch does and
> what it does not do, such that the scope of this patch remains
> something achievable. Otherwise, we'll end up with nothing.

I can easily come up with an adversarial input that will segfault a
backend, even amcheck, but it'll be somewhat contrived. It's hard to
fool amcheck currently because it doesn't exactly trust line pointers.
But I'm sure I could get the backend to segfault amcheck if I tried.
I'd probably try to play around with varlena headers. It would require
a certain amount of craftiness.

It's not exactly clear where you draw the line here. And I don't think
that the line will be very clearly defined, in the end. It'll be
something that is subject to change over time, as new information
comes to light. I think that it's necessary to accept a certain amount
of ambiguity here.

--
Peter Geoghegan



Re: new heapcheck contrib module

Alvaro Herrera-9
In reply to this post by Peter Geoghegan-4
On 2020-May-12, Peter Geoghegan wrote:

> > The point is that when checking the table for corruption I avoid
> > calling anything that might assert (or segfault, or whatever).
>
> I don't think that you can expect to avoid assertion failures in
> general.

Hmm.  I think we should (try to?) write code that avoids all crashes
with production builds, but not extend that to assertion failures.
Sticking again with the provided example,

> I'll stick with your example. You're calling
> TransactionIdDidCommit() from check_tuphdr_xids(), which will
> interrogate the commit log and pg_subtrans. It's just not under your
> control.

in a production build this would just fail with an error that the
pg_xact file cannot be found, which is fine -- if this happens in a
production system, you're not disturbing any other sessions.  Or maybe
the file is there and the byte can be read, in which case you would get
the correct response; but that's fine too.

I don't know to what extent this is possible.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: new heapcheck contrib module

Peter Geoghegan-4
On Wed, May 13, 2020 at 3:10 PM Alvaro Herrera <[hidden email]> wrote:
> Hmm.  I think we should (try to?) write code that avoids all crashes
> with production builds, but not extend that to assertion failures.

Assertions are only a problem at all because Mark would like to write
tests that involve a selection of truly corrupt data. That's a new
requirement, and one that I have my doubts about.

> > I'll stick with your example. You're calling
> > TransactionIdDidCommit() from check_tuphdr_xids(), which will
> > interrogate the commit log and pg_subtrans. It's just not under your
> > control.
>
> in a production build this would just fail with an error that the
> pg_xact file cannot be found, which is fine -- if this happens in a
> production system, you're not disturbing any other sessions.  Or maybe
> the file is there and the byte can be read, in which case you would get
> the correct response; but that's fine too.

I think that this is fine, too, since I don't consider assertion
failures with corrupt data all that important. I'd make some effort to
avoid it, but not too much, and not at the expense of a useful general
purpose assertion that could catch bugs in many different contexts.

I would be willing to make a larger effort to avoid crashing a
backend, since that affects production. I might go to some effort to
not crash with downright adversarial inputs, for example. But it seems
inappropriate to take extreme measures just to avoid a crash with
extremely contrived inputs that will probably never occur. My sense is
that this is subject to sharply diminishing returns. Completely
nailing down hard crashes from corrupt data seems like the wrong
priority, at the very least. Pursuing that objective over other
objectives sounds like zero-risk bias.

--
Peter Geoghegan



Re: new heapcheck contrib module

Alvaro Herrera-9
On 2020-May-13, Peter Geoghegan wrote:

> On Wed, May 13, 2020 at 3:10 PM Alvaro Herrera <[hidden email]> wrote:
> > Hmm.  I think we should (try to?) write code that avoids all crashes
> > with production builds, but not extend that to assertion failures.
>
> Assertions are only a problem at all because Mark would like to write
> tests that involve a selection of truly corrupt data. That's a new
> requirement, and one that I have my doubts about.

I agree that this (a test tool that exercises our code against
arbitrarily corrupted data pages) is not going to work as a test that
all buildfarm members run -- it seems something for specialized
buildfarm members to run, or even something that's run outside of the
buildfarm, like sqlsmith.  Obviously such a tool would not be able to
run against an assertion-enabled build, and we shouldn't even try.

> I would be willing to make a larger effort to avoid crashing a
> backend, since that affects production. I might go to some effort to
> not crash with downright adversarial inputs, for example. But it seems
> inappropriate to take extreme measures just to avoid a crash with
> extremely contrived inputs that will probably never occur. My sense is
> that this is subject to sharply diminishing returns. Completely
> nailing down hard crashes from corrupt data seems like the wrong
> priority, at the very least. Pursuing that objective over other
> objectives sounds like zero-risk bias.

I think my initial approach for this would be to use a fuzzing tool that
generates data blocks semi-randomly, then uses them as Postgres data
pages somehow, and see what happens -- examine any resulting crashes and
make individual judgement calls about the fix(es) necessary to prevent
each of them.  I expect that many such pages would be rejected as
corrupt by page header checks.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: new heapcheck contrib module

Peter Geoghegan-4
On Wed, May 13, 2020 at 4:32 PM Alvaro Herrera <[hidden email]> wrote:
> I think my initial approach for this would be to use a fuzzing tool that
> generates data blocks semi-randomly, then uses them as Postgres data
> pages somehow, and see what happens -- examine any resulting crashes and
> make individual judgement calls about the fix(es) necessary to prevent
> each of them.  I expect that many such pages would be rejected as
> corrupt by page header checks.

As I mentioned in my response to Robert earlier, that's more or less
been my experience with adversarial corruption generated using
pg_hexedit. Within nbtree, as well as heapam. I put a lot of work into
that tool, and have used it to simulate all kinds of weird scenarios.
I've done things like corrupt individual tuple header fields, swap
line pointers, create circular sibling links in indexes, corrupt
varlena headers, and corrupt line pointer flags/status bits. Postgres
itself rarely segfaults, and amcheck will only segfault with a truly
contrived input.

--
Peter Geoghegan



Re: new heapcheck contrib module

Mark Dilger-5
In reply to this post by Peter Geoghegan-4


> On May 13, 2020, at 3:29 PM, Peter Geoghegan <[hidden email]> wrote:
>
> On Wed, May 13, 2020 at 3:10 PM Alvaro Herrera <[hidden email]> wrote:
>> Hmm.  I think we should (try to?) write code that avoids all crashes
>> with production builds, but not extend that to assertion failures.
>
> Assertions are only a problem at all because Mark would like to write
> tests that involve a selection of truly corrupt data. That's a new
> requirement, and one that I have my doubts about.
>
>>> I'll stick with your example. You're calling
>>> TransactionIdDidCommit() from check_tuphdr_xids(), which will
>>> interrogate the commit log and pg_subtrans. It's just not under your
>>> control.
>>
>> in a production build this would just fail with an error that the
>> pg_xact file cannot be found, which is fine -- if this happens in a
>> production system, you're not disturbing any other sessions.  Or maybe
>> the file is there and the byte can be read, in which case you would get
>> the correct response; but that's fine too.
>
> I think that this is fine, too, since I don't consider assertion
> failures with corrupt data all that important. I'd make some effort to
> avoid it, but not too much, and not at the expense of a useful general
> purpose assertion that could catch bugs in many different contexts.

I am not removing any assertions.  I do not propose to remove any assertions. When I talk about "hardening against assertions", that is not in any way a proposal to remove assertions from the code.  What I'm talking about is writing the amcheck contrib module code in such a way that it only calls a function that could assert on bad data after checking that the data is not bad.

I don't know that hardening against assertions in this manner is worth doing, but this is nonetheless what I'm talking about.  You have made decent arguments that it probably isn't worth doing for the btree checking code.  And in any event, it is probably something that could be addressed in a future patch after getting this patch committed.

There is a separate but related question in the offing about whether the backend code, independently of any amcheck contrib stuff, should be more paranoid in how it processes tuples to check for corruption.  The heap deform tuple code in question is on a pretty hot code path, and I don't know that folks would accept the performance hit of more checks being done in that part of the system, but that's pretty far from relevant to this patch.  That should be hashed out, or not, at some other time on some other thread.

> I would be willing to make a larger effort to avoid crashing a
> backend, since that affects production. I might go to some effort to
> not crash with downright adversarial inputs, for example. But it seems
> inappropriate to take extreme measures just to avoid a crash with
> extremely contrived inputs that will probably never occur.

I think this is a misrepresentation of the tests that I've been running.  There are two kinds of tests that I have done:

First, there is the regression test, t/004_verify_heapam.pl, which is obviously contrived.  That was included in the regression test suite because it needed to be something other developers could read, verify, "yeah, I can see why that would be corruption, and would give an error message of the sort the test expects", and then could be run to verify that indeed that expected error message was generated.

The second kind of corruption test I have been running is nothing more than writing random nonsense into randomly chosen locations within heap files and then running verify_heapam against those heap relations.  It's much more Murphy than Machiavelli when it's just generated by calling random().  When I initially did this kind of testing, the heapam checking code had lots of problems.  Now it doesn't.  There's very little contrived about that, as far as I can see. It's the kind of corruption you'd expect from any number of faulty storage systems.  The one "contrived" aspect of my testing in this regard is that the script I use to write random nonsense to random locations in heap files is smart enough not to write random junk to the page headers.  This is because if I corrupt the page headers, the backend never even gets as far as running the verify_heapam functions, as the page cache rejects loading the page.
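
A stripped-down, stand-alone sketch of that kind of corruptor is below.  To be clear, this is an invented example rather than the actual script; it assumes the default 8kB block size and the usual 24-byte page header:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BLCKSZ          8192    /* assumes the default block size */
#define PAGE_HEADER_SZ  24      /* sizeof(PageHeaderData) on typical builds */

int
main(int argc, char **argv)
{
    FILE       *fp;
    long        filesize;
    long        nblocks;
    int         i;

    if (argc != 2)
    {
        fprintf(stderr, "usage: %s relation-file\n", argv[0]);
        return 1;
    }

    fp = fopen(argv[1], "r+b");
    if (fp == NULL)
    {
        perror("fopen");
        return 1;
    }

    fseek(fp, 0, SEEK_END);
    filesize = ftell(fp);
    nblocks = filesize / BLCKSZ;
    if (nblocks == 0)
    {
        fprintf(stderr, "file is smaller than one block\n");
        fclose(fp);
        return 1;
    }

    srandom((unsigned int) time(NULL));

    /* scribble one random byte into each of 100 random non-header locations */
    for (i = 0; i < 100; i++)
    {
        long            blkno = random() % nblocks;
        long            offset = PAGE_HEADER_SZ +
            random() % (BLCKSZ - PAGE_HEADER_SZ);
        unsigned char   junk = (unsigned char) (random() % 256);

        fseek(fp, blkno * BLCKSZ + offset, SEEK_SET);
        fwrite(&junk, 1, 1, fp);
    }

    fclose(fp);
    return 0;
}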



Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: new heapcheck contrib module

Peter Geoghegan-4
On Wed, May 13, 2020 at 5:18 PM Mark Dilger
<[hidden email]> wrote:
> I am not removing any assertions.  I do not propose to remove any assertions. When I talk about "hardening against assertions", that is not in any way a proposal to remove assertions from the code.

I'm sorry if I seemed to suggest that you wanted to remove assertions,
rather than test more things earlier. I recognize that that could be a
useful thing to do, both in general, and maybe even in the specific
example you gave -- on general robustness grounds. At the same time,
it's something that can only be taken so far. It's probably not going
to make it practical to corrupt data in a regression test or tap test.

> There is a separate but related question in the offing about whether the backend code, independently of any amcheck contrib stuff, should be more paranoid in how it processes tuples to check for corruption.

I bet that there is something that we could do to be a bit more
defensive. Of course, we do a certain amount of that on general
robustness grounds already. A systematic review of that could be quite
useful. But as you point out, it's not really in scope here.

> > I would be willing to make a larger effort to avoid crashing a
> > backend, since that affects production. I might go to some effort to
> > not crash with downright adversarial inputs, for example. But it seems
> > inappropriate to take extreme measures just to avoid a crash with
> > extremely contrived inputs that will probably never occur.
>
> I think this is a misrepresentation of the tests that I've been running.

I didn't actually mean it that way, but I can see how my words could
reasonably be interpreted that way. I apologize.

> There are two kinds of tests that I have done:
>
> First, there is the regression test, t/004_verify_heapam.pl, which is obviously contrived.  That was included in the regression test suite because it needed to be something other developers could read, verify, "yeah, I can see why that would be corruption, and would give an error message of the sort the test expects", and then could be run to verify that indeed that expected error message was generated.

I still don't think that this is necessary. It could work for one type
of corruption, that happens to not have any of the problems, but just
testing that one type of corruption seems rather arbitrary to me.

> The second kind of corruption test I have been running is nothing more than writing random nonsense into randomly chosen locations within heap files and then running verify_heapam against those heap relations.  It's much more Murphy than Machiavelli when it's just generated by calling random().

That sounds like a good initial test case, to guide your intuitions
about how to make the feature robust.

--
Peter Geoghegan



Re: new heapcheck contrib module

Mark Dilger-5


> On May 13, 2020, at 5:36 PM, Peter Geoghegan <[hidden email]> wrote:
>
> On Wed, May 13, 2020 at 5:18 PM Mark Dilger
> <[hidden email]> wrote:
>> I am not removing any assertions.  I do not propose to remove any assertions. When I talk about "hardening against assertions", that is not in any way a proposal to remove assertions from the code.
>
> I'm sorry if I seemed to suggest that you wanted to remove assertions

Not a problem at all.  As always, I appreciate your involvement in this code and design review.


>> I think this is a misrepresentation of the tests that I've been running.
>
> I didn't actually mean it that way, but I can see how my words could
> reasonably be interpreted that way. I apologize.

Again, no worries.

>> There are two kinds of tests that I have done:
>>
>> First, there is the regression test, t/004_verify_heapam.pl, which is obviously contrived.  That was included in the regression test suite because it needed to be something other developers could read, verify, "yeah, I can see why that would be corruption, and would give an error message of the sort the test expects", and then could be run to verify that indeed that expected error message was generated.
>
> I still don't think that this is necessary. It could work for one type
> of corruption, that happens to not have any of the problems, but just
> testing that one type of corruption seems rather arbitrary to me.

As discussed with Robert off list, this probably doesn't matter.  The patch can be committed with or without this particular TAP test.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: new heapcheck contrib module

Robert Haas
In reply to this post by Peter Geoghegan-4
On Wed, May 13, 2020 at 5:33 PM Peter Geoghegan <[hidden email]> wrote:
> Do you recall seeing corruption resulting in segfaults in production?

I have seen that, I believe. I think it's more common to fail with
errors about not being able to palloc>1GB, not being able to look up
an xid or mxid, etc. but I am pretty sure I've seen multiple cases
involving seg faults, too. Unfortunately for my credibility, I can't
remember the details right now.

> I personally don't recall seeing that. If it happened, the segfaults
> themselves probably wouldn't be the main concern.

I don't really agree. Hypothetically speaking, suppose you corrupt
your only copy of a critical table in such a way that every time you
select from it, the system seg faults. A user in this situation might
ask questions like:

1. How did my table get corrupted?
2. Why do I only have one copy of it?
3. How do I retrieve the non-corrupted portion of my data from that
table and get back up and running?

In the grand scheme of things, #1 and #2 are the most important
questions, but when something like this actually happens, #3 tends to
be the most urgent question, and it's a lot harder to get the
uncorrupted data out if the system keeps crashing.

Also, a seg fault tends to lead customers to think that the database
has a bug, rather than that the database is corrupted.

Slightly off-topic here, but I think our error reporting in this area
is pretty lame. I've learned over the years that when a customer
reports that they get a complaint about a too-large memory allocation
every time they access a table, they've probably got a corrupted
varlena header. However, that's extremely non-obvious to a typical
user. We should try to report errors indicative of corruption in a way
that gives the user some clue that corruption has happened. Peter made
a stab at improving things there by adding
errcode(ERRCODE_DATA_CORRUPTED) in a bunch of places, but a lot of
users will never see the error code, only the message, and a lot of
corruption still produces errors that weren't changed by that
commit.
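
As an illustration of the difference, a length sanity check made close to where the bytes come off the page could report the problem as corruption rather than as a failed allocation.  This is an invented sketch, not an actual backend change:

#include "postgres.h"

#include "storage/block.h"
#include "storage/off.h"
#include "utils/memutils.h"

static void
check_varlena_length(Size len, BlockNumber blkno, OffsetNumber offnum)
{
    if (len > MaxAllocSize)
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("implausible varlena length %zu at block %u, offset %u",
                        (size_t) len, blkno, offnum),
                 errhint("The block is likely corrupted; consider checking the relation with amcheck.")));
}

Today the same condition usually surfaces later, from palloc(), as "invalid memory alloc request size", which tells the user nothing about where the bad value came from.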

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: new heapcheck contrib module

Robert Haas
In reply to this post by Alvaro Herrera-9
On Wed, May 13, 2020 at 7:32 PM Alvaro Herrera <[hidden email]> wrote:
> I agree that this (a test tool that exercises our code against
> arbitrarily corrupted data pages) is not going to work as a test that
> all buildfarm members run -- it seems something for specialized
> buildfarm members to run, or even something that's run outside of the
> buildfarm, like sqlsmith.  Obviously such a tool would not be able to
> run against an assertion-enabled build, and we shouldn't even try.

I have a question about what you mean here by "arbitrarily."

If you mean that we shouldn't have the buildfarm run the proposed heap
corruption checker against heap pages full of randomly-generated
garbage, I tend to agree. Such a test wouldn't be very stable and
might fail in lots of low-probability ways that could require
unreasonable effort to find and fix.

If you mean that we shouldn't have the buildfarm run the proposed heap
corruption checker against any corrupted heap pages at all, I tend to
disagree. If we did that, then we'd basically be releasing a heap
corruption checker with very limited test coverage. Like, we shouldn't
only have negative test cases, where the absence of corruption
produces no results. We should also have positive test cases, where
the thing finds some problem...

At least, that's what I think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: new heapcheck contrib module

Peter Geoghegan-4
In reply to this post by Robert Haas
On Thu, May 14, 2020 at 11:33 AM Robert Haas <[hidden email]> wrote:
> I have seen that, I believe. I think it's more common to fail with
> errors about not being able to palloc>1GB, not being able to look up
> an xid or mxid, etc. but I am pretty sure I've seen multiple cases
> involving seg faults, too. Unfortunately for my credibility, I can't
> remember the details right now.

I believe you, both in general, and also because what you're saying
here is plausible, even if it doesn't fit my own experience.

Corruption is by its very nature exceptional. At least, if that isn't
true then something must be seriously wrong, so the idea that it will
be different in some way each time seems like a good working
assumption. Your exceptional cases are not necessarily the same as
mine, especially where hardware problems are concerned. On the other
hand, it's also possible for corruption that originates from very
different sources to exhibit the same basic inconsistencies and
symptoms.

I've noticed that SLRU corruption is often a leading indicator of
general storage problems. The inconsistencies between certain SLRU
state and the heap happen to be far easier to notice in practice,
particularly when VACUUM runs. But it's not fundamentally different to
inconsistencies from pages within one single main fork of some heap
relation.

> > I personally don't recall seeing that. If it happened, the segfaults
> > themselves probably wouldn't be the main concern.
>
> I don't really agree. Hypothetically speaking, suppose you corrupt
> your only copy of a critical table in such a way that every time you
> select from it, the system seg faults. A user in this situation might
> ask questions like:

I agree that that could be a problem. But that's not what I've seen
happen in production systems myself.

Maybe there is some low hanging fruit here. Perhaps we can make the
real PageGetItemId() a little closer to PageGetItemIdCareful() without
noticeable overhead, as I suggested already. Are there any real
generalizations that we can make about why backends segfault with
corrupt data? Maybe there are. That seems important.
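
For context, the extra validation PageGetItemIdCareful() layers on top of the ordinary macro is roughly along these lines (simplified and renamed here; the real function lives in contrib/amcheck):

#include "postgres.h"

#include "access/nbtree.h"
#include "storage/bufpage.h"

static ItemId
get_item_id_defensively(Page page, BlockNumber block, OffsetNumber offset)
{
    ItemId      itemid = PageGetItemId(page, offset);

    /* reject a line pointer that points past the nbtree tuple space */
    if (ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
        BLCKSZ - MAXALIGN(sizeof(BTPageOpaqueData)))
        ereport(ERROR,
                (errcode(ERRCODE_INDEX_CORRUPTED),
                 errmsg("line pointer points past end of tuple space in index block %u",
                        block)));

    /* an unused line pointer should be entirely unset */
    if (!ItemIdIsUsed(itemid) &&
        (ItemIdGetOffset(itemid) != 0 || ItemIdGetLength(itemid) != 0))
        ereport(ERROR,
                (errcode(ERRCODE_INDEX_CORRUPTED),
                 errmsg("invalid unused line pointer in index block %u", block)));

    return itemid;
}

The question is whether checks of roughly that weight could live in the regular PageGetItemId() path without showing up in profiles.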

> Slightly off-topic here, but I think our error reporting in this area
> is pretty lame. I've learned over the years that when a customer
> reports that they get a complaint about a too-large memory allocation
> every time they access a table, they've probably got a corrupted
> varlena header.

I certainly learned the same lesson in the same way.

> However, that's extremely non-obvious to a typical
> user. We should try to report errors indicative of corruption in a way
> that gives the user some clue that corruption has happened. Peter made
> a stab at improving things there by adding
> errcode(ERRCODE_DATA_CORRUPTED) in a bunch of places, but a lot of
> users will never see the error code, only the message, and a lot of
> corruption still produces errors that weren't changed by that
> commit.

The theory is that "can't happen" errors have an errcode that should
be considered similar to or equivalent to ERRCODE_DATA_CORRUPTED. I
doubt that it works out that way in practice, though.

--
Peter Geoghegan



Re: new heapcheck contrib module

Alvaro Herrera-9
In reply to this post by Robert Haas
On 2020-May-14, Robert Haas wrote:

> I have a question about what you mean here by "arbitrarily."
>
> If you mean that we shouldn't have the buildfarm run the proposed heap
> corruption checker against heap pages full of randomly-generated
> garbage, I tend to agree. Such a test wouldn't be very stable and
> might fail in lots of low-probability ways that could require
> unreasonable effort to find and fix.

This is what I meant.  I was thinking of blocks generated randomly.

> If you mean that we shouldn't have the buildfarm run the proposed heap
> corruption checker against any corrupted heap pages at all, I tend to
> disagree.

Yeah, IMV those would not be arbitrarily corrupted -- instead they're
crafted to be corrupted in some specific way.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: new heapcheck contrib module

Tom Lane-2
Alvaro Herrera <[hidden email]> writes:
> On 2020-May-14, Robert Haas wrote:
>> If you mean that we shouldn't have the buildfarm run the proposed heap
>> corruption checker against heap pages full of randomly-generated
>> garbage, I tend to agree. Such a test wouldn't be very stable and
>> might fail in lots of low-probability ways that could require
>> unreasonable effort to find and fix.

> This is what I meant.  I was thinking of blocks generated randomly.

Yeah, -1 for using random data --- when it fails, how you gonna
reproduce the problem?

>> If you mean that we shouldn't have the buildfarm run the proposed heap
>> corruption checker against any corrupted heap pages at all, I tend to
>> disagree.

> Yeah, IMV those would not be arbitrarily corrupted -- instead they're
> crafted to be corrupted in some specific way.

I think there's definitely value in corrupting data in some predictable
(reproducible) way and verifying that the check code catches it and
responds as expected.  Sure, this will not be 100% coverage, but it'll be
a lot better than 0% coverage.

                        regards, tom lane



Re: new heapcheck contrib module

Peter Eisentraut-6
In reply to this post by Mark Dilger-5
On 2020-05-11 19:21, Mark Dilger wrote:
> 1) A new module, pg_amcheck, which includes a command line client for checking a database or subset of a database.  Internally it functions by querying the database for a list of tables which are appropriate given the command line switches, and then calls amcheck's functions to validate each table and/or index.  The options for selecting/excluding tables and schemas are patterned on pg_dump, on the assumption that the interface is already familiar to users.

Why is this useful over just using the extension's functions via psql?

I suppose you could make an argument for a command-line wrapper around
almost every admin-focused contrib module (pageinspect, pg_prewarm,
pgstattuple, ...), but that doesn't seem very sensible.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: new heapcheck contrib module

Mark Dilger-5


> On May 14, 2020, at 1:02 PM, Peter Eisentraut <[hidden email]> wrote:
>
> On 2020-05-11 19:21, Mark Dilger wrote:
>> 1) A new module, pg_amcheck, which includes a command line client for checking a database or subset of a database.  Internally it functions by querying the database for a list of tables which are appropriate given the command line switches, and then calls amcheck's functions to validate each table and/or index.  The options for selecting/excluding tables and schemas are patterned on pg_dump, on the assumption that the interface is already familiar to users.
>
> Why is this useful over just using the extension's functions via psql?

The tool doesn't hold a single snapshot or transaction for the lifetime of checking the entire database.  A future improvement to the tool might add parallelism.  Users could do all of this in scripts, but having a single tool with the most commonly useful options avoids duplication of effort.
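
In outline, the flow is roughly the libpq sketch below.  This is invented code rather than code from the patch, and the verify_heapam() call signature and output columns shown here are assumptions:

#include <stdio.h>

#include "libpq-fe.h"

static void
check_all_tables(PGconn *conn)
{
    PGresult   *tables;
    int         i;

    /* the real tool builds the WHERE clause from the command line switches */
    tables = PQexec(conn,
                    "SELECT c.oid::pg_catalog.regclass::text"
                    "  FROM pg_catalog.pg_class c"
                    " WHERE c.relkind = 'r'");
    if (PQresultStatus(tables) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQclear(tables);
        return;
    }

    for (i = 0; i < PQntuples(tables); i++)
    {
        const char *relname = PQgetvalue(tables, i, 0);
        const char *params[1] = {relname};
        PGresult   *res;

        /* one query per relation, so no single snapshot spans the database */
        res = PQexecParams(conn,
                           "SELECT * FROM verify_heapam($1::pg_catalog.regclass)",
                           1, NULL, params, NULL, NULL, 0);
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
        {
            int         j;

            for (j = 0; j < PQntuples(res); j++)
                printf("%s: %s\n", relname, PQgetvalue(res, j, 0));
        }
        else
            fprintf(stderr, "%s: %s", relname, PQerrorMessage(conn));
        PQclear(res);
    }
    PQclear(tables);
}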
 

Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




