new heapcheck contrib module


Re: new heapcheck contrib module

Dilip Kumar-2
On Mon, May 11, 2020 at 10:51 PM Mark Dilger
<[hidden email]> wrote:

>
> Here is v5 of the patch.  Major changes in this version include:
>
> 1) A new module, pg_amcheck, which includes a command line client for checking a database or subset of a database.  Internally it functions by querying the database for a list of tables which are appropriate given the command line switches, and then calls amcheck's functions to validate each table and/or index.  The options for selecting/excluding tables and schemas is patterned on pg_dump, on the assumption that interface is already familiar to users.
>
> 2) amcheck's btree checking functions have been refactored to be able to operate in two modes; the original mode in which all errors are reported via ereport, and a new mode for returning errors as rows from a set returning function.  The new mode is used by a new function verify_btreeam(), analogous to verify_heapam(), both of which are used by the pg_amcheck command line tool.
>
> 3) The regression test which generates corruption within a table uses the pageinspect module to determine the location of each tuple on disk for corrupting.  This was suggested upthread.
>
> Testing on the command line shows that the pre-existing btree checking code could use some hardening, as it currently crashes the backend on certain corruptions.  When I corrupt relation files for tables and indexes in the backend and then use pg_amcheck to check all objects in the database, I keep getting assertions from the btree checking code.  I think I need to harden this code, but wanted to post an updated patch and solicit opinions before doing so.  Here are some example problems I'm seeing.  Note the stack trace when calling from the command line tool includes the new verify_btreeam function, but you can get the same crashes using the old interface via psql:
>
> From psql, first error:
>
> test=# select bt_index_parent_check('corrupted_idx', true, true);
> TRAP: FailedAssertion("_bt_check_natts(rel, key->heapkeyspace, page, offnum)", File: "nbtsearch.c", Line: 663)
> 0   postgres                            0x0000000106872977 ExceptionalCondition + 103
> 1   postgres                            0x00000001063a33e2 _bt_compare + 1090
> 2   amcheck.so                          0x0000000106d62921 bt_target_page_check + 6033
> 3   amcheck.so                          0x0000000106d5fd2f bt_index_check_internal + 2847
> 4   amcheck.so                          0x0000000106d60433 bt_index_parent_check + 67
> 5   postgres                            0x00000001064d6762 ExecInterpExpr + 1634
> 6   postgres                            0x000000010650d071 ExecResult + 321
> 7   postgres                            0x00000001064ddc3d standard_ExecutorRun + 301
> 8   postgres                            0x00000001066600c5 PortalRunSelect + 389
> 9   postgres                            0x000000010665fc7f PortalRun + 527
> 10  postgres                            0x000000010665ed59 exec_simple_query + 1641
> 11  postgres                            0x000000010665c99d PostgresMain + 3661
> 12  postgres                            0x00000001065d6a8a BackendRun + 410
> 13  postgres                            0x00000001065d61c4 ServerLoop + 3044
> 14  postgres                            0x00000001065d2fe9 PostmasterMain + 3769
> 15  postgres                            0x000000010652e3b0 help + 0
> 16  libdyld.dylib                       0x00007fff6725fcc9 start + 1
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: 2020-05-11 10:11:47.394 PDT [41091] LOG:  server process (PID 41309) was terminated by signal 6: Abort trap: 6
>
>
>
> From commandline, second error:
>
> pgtest % pg_amcheck -i test
> (relname=corrupted,blkno=0,offnum=16,lp_off=7680,lp_flags=1,lp_len=31,attnum=,chunk=)
> tuple xmin = 3289393 is in the future
> (relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
> tuple xmax = 0 precedes relation relminmxid = 1
> (relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
> tuple xmin = 12593 is in the future
> (relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
>
> <snip>
>
> (relname=corrupted,blkno=107,offnum=20,lp_off=7392,lp_flags=1,lp_len=34,attnum=,chunk=)
> tuple xmin = 306 precedes relation relfrozenxid = 487
> (relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
> tuple xmax = 0 precedes relation relminmxid = 1
> (relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
> tuple xmin = 305 precedes relation relfrozenxid = 487
> (relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
> t_hoff > lp_len (54 > 34)
> (relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
> t_hoff not max-aligned (54)
> TRAP: FailedAssertion("TransactionIdIsValid(xmax)", File: "heapam_visibility.c", Line: 1319)
> 0   postgres                            0x0000000105b22977 ExceptionalCondition + 103
> 1   postgres                            0x0000000105636e86 HeapTupleSatisfiesVacuum + 1158
> 2   postgres                            0x0000000105634aa1 heapam_index_build_range_scan + 1089
> 3   amcheck.so                          0x00000001060100f3 bt_index_check_internal + 3811
> 4   amcheck.so                          0x000000010601057c verify_btreeam + 316
> 5   postgres                            0x0000000105796266 ExecMakeTableFunctionResult + 422
> 6   postgres                            0x00000001057a8c35 FunctionNext + 101
> 7   postgres                            0x00000001057bbf3e ExecNestLoop + 478
> 8   postgres                            0x000000010578dc3d standard_ExecutorRun + 301
> 9   postgres                            0x00000001059100c5 PortalRunSelect + 389
> 10  postgres                            0x000000010590fc7f PortalRun + 527
> 11  postgres                            0x000000010590ed59 exec_simple_query + 1641
> 12  postgres                            0x000000010590c99d PostgresMain + 3661
> 13  postgres                            0x0000000105886a8a BackendRun + 410
> 14  postgres                            0x00000001058861c4 ServerLoop + 3044
> 15  postgres                            0x0000000105882fe9 PostmasterMain + 3769
> 16  postgres                            0x00000001057de3b0 help + 0
> 17  libdyld.dylib                       0x00007fff6725fcc9 start + 1
> pg_amcheck: error: query failed: server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.

I have just browsed through the patch and the idea is quite
interesting.  I think we can expand it to check whether the flags
set in the infomask are sane with respect to the other flags and the
xid status.  Some examples:

- If HEAP_XMAX_LOCK_ONLY is set in infomask then HEAP_KEYS_UPDATED
should not be set in new_infomask2.
- If HEAP_XMIN(XMAX)_COMMITTED is set in the infomask, can we
actually cross-verify the transaction status from the CLOG and check
whether it matches the hint bit?

While browsing through the code I could not find this kind of check;
please ignore this if it is already covered.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: new heapcheck contrib module

Mark Dilger-5
In reply to this post by Mark Dilger-5


> On May 11, 2020, at 10:21 AM, Mark Dilger <[hidden email]> wrote:
>
> <v5-0001-Adding-verify_heapam-and-pg_amcheck.patch>

Rebased with some whitespace fixes, but otherwise unmodified from v5.





Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachment: v6-0001-Adding-verify_heapam-and-pg_amcheck.patch (224K)

Re: new heapcheck contrib module

Mark Dilger-5
In reply to this post by Dilip Kumar-2


> On Jun 11, 2020, at 9:14 AM, Dilip Kumar <[hidden email]> wrote:
>
> I have just browsed through the patch and the idea is quite
> interesting.  I think we can expand it to check that whether the flags
> set in the infomask are sane or not w.r.t other flags and xid status.
> Some examples are
>
> - If HEAP_XMAX_LOCK_ONLY is set in infomask then HEAP_KEYS_UPDATED
> should not be set in new_infomask2.
> - If HEAP_XMIN(XMAX)_COMMITTED is set in the infomask then can we
> actually cross verify the transaction status from the CLOG and check
> whether is matching the hint bit or not.
>
> While browsing through the code I could not find that we are doing
> this kind of check,  ignore if we are already checking this.
Thanks for taking a look!

Having both of those bits set simultaneously appears to fall into a different category than what I wrote verify_heapam.c to detect.  It doesn't violate any assertion in the backend, nor does it cause the code to crash.  (At least, I don't immediately see how it does either of those things.)  At first glance it appears invalid to have those bits both set simultaneously, but I'm hesitant to enforce that without good reason.  If it is a good thing to enforce, should we also change the backend code to Assert?

I integrated your idea into one of the regression tests.  It now sets these two bits in the header of one of the rows in a table.  The verify_heapam check output (which includes all detected corruptions) does not change, which verifies your observation that verify_heapam is not checking for this.  I've attached that as a patch to this email.  Note that this patch should be applied atop the v6 patch recently posted in another email.


 


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachment: WIP_dilip_kumar_idea.patch (1K)

Re: new heapcheck contrib module

Dilip Kumar-2
On Fri, Jun 12, 2020 at 12:40 AM Mark Dilger
<[hidden email]> wrote:

>
>
>
> > On Jun 11, 2020, at 9:14 AM, Dilip Kumar <[hidden email]> wrote:
> >
> > I have just browsed through the patch and the idea is quite
> > interesting.  I think we can expand it to check that whether the flags
> > set in the infomask are sane or not w.r.t other flags and xid status.
> > Some examples are
> >
> > - If HEAP_XMAX_LOCK_ONLY is set in infomask then HEAP_KEYS_UPDATED
> > should not be set in new_infomask2.
> > - If HEAP_XMIN(XMAX)_COMMITTED is set in the infomask then can we
> > actually cross verify the transaction status from the CLOG and check
> > whether is matching the hint bit or not.
> >
> > While browsing through the code I could not find that we are doing
> > this kind of check,  ignore if we are already checking this.
>
> Thanks for taking a look!
>
> Having both of those bits set simultaneously appears to fall into a different category than what I wrote verify_heapam.c to detect.

Ok

> It doesn't violate any assertion in the backend, nor does it cause
> the code to crash.  (At least, I don't immediately see how it does
> either of those things.)  At first glance it appears invalid to have
> those bits both set simultaneously, but I'm hesitant to enforce that
> without good reason.  If it is a good thing to enforce, should we also
> change the backend code to Assert?

Yeah, it may not hit an assert or crash, but it could lead to a wrong
result.  I agree that it could be an assertion in the backend code.
What about the other check, where the hint bit says the transaction is
committed but, per the CLOG, its status is something else?  In normal
processing it is hard to check such things in the backend, because if
the hint bit says the transaction is committed then we check its
visibility directly against the snapshot.  I think a corruption
checker may be a good tool for catching such anomalies.

> I integrated your idea into one of the regression tests.  It now sets these two bits in the header of one of the rows in a table.  The verify_heapam check output (which includes all detected corruptions) does not change, which verifies your observation that verify_heapam is not checking for this.  I've attached that as a patch to this email.  Note that this patch should be applied atop the v6 patch recently posted in another email.

Ok.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: new heapcheck contrib module

Mark Dilger-5


> On Jun 11, 2020, at 11:35 PM, Dilip Kumar <[hidden email]> wrote:
>
> On Fri, Jun 12, 2020 at 12:40 AM Mark Dilger
> <[hidden email]> wrote:
>>
>>
>>
>>> On Jun 11, 2020, at 9:14 AM, Dilip Kumar <[hidden email]> wrote:
>>>
>>> I have just browsed through the patch and the idea is quite
>>> interesting.  I think we can expand it to check that whether the flags
>>> set in the infomask are sane or not w.r.t other flags and xid status.
>>> Some examples are
>>>
>>> - If HEAP_XMAX_LOCK_ONLY is set in infomask then HEAP_KEYS_UPDATED
>>> should not be set in new_infomask2.
>>> - If HEAP_XMIN(XMAX)_COMMITTED is set in the infomask then can we
>>> actually cross verify the transaction status from the CLOG and check
>>> whether is matching the hint bit or not.
>>>
>>> While browsing through the code I could not find that we are doing
>>> this kind of check,  ignore if we are already checking this.
>>
>> Thanks for taking a look!
>>
>> Having both of those bits set simultaneously appears to fall into a different category than what I wrote verify_heapam.c to detect.
>
> Ok
>
>
>>  It doesn't violate any assertion in the backend, nor does it cause
>> the code to crash.  (At least, I don't immediately see how it does
>> either of those things.)  At first glance it appears invalid to have
>> those bits both set simultaneously, but I'm hesitant to enforce that
>> without good reason.  If it is a good thing to enforce, should we also
>> change the backend code to Assert?
>
> Yeah, it may not hit assert or crash but it could lead to a wrong
> result.  But I agree that it could be an assertion in the backend
> code.  

For v7, I've added an assertion for this.  Per heap/README.tuplock, "We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit is set."  I added an assertion for that, too.  Both new assertions are in RelationPutHeapTuple().  I'm not sure that is the best place for them, but I am confident that the assertions only need to check tuples destined for disk, since in-memory tuples can and do violate them.

Also for v7, I've updated contrib/amcheck to report these two conditions as corruption.

> What about the other check, like hint bit is saying the
> transaction is committed but actually as per the clog the status is
> something else.  I think in general processing it is hard to check
> such things in backend no? because if the hint bit is set saying that
> the transaction is committed then we will directly check its
> visibility with the snapshot.  I think a corruption checker may be a
> good tool for catching such anomalies.

I already made some design changes to this patch to avoid taking the CLogTruncationLock too often.  I'm happy to incorporate this idea, but perhaps you could provide a design on how to do it without all the extra locking?  If not, I can try to get this into v8 as an optional check, so users can turn it on at their discretion.  Having the check enabled by default is probably a non-starter.





Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachment: v7-0001-Adding-verify_heapam-and-pg_amcheck.patch (224K)
Attachment: v7-0002-Adding-checks-of-invalid-combinations-of-hint-bit.patch (7K)

Re: new heapcheck contrib module (typos)

Erik Rijkers
On 2020-06-12 23:06, Mark Dilger wrote:

> [v7-0001-Adding-verify_heapam-and-pg_amcheck.patch]
> [v7-0002-Adding-checks-o...ations-of-hint-bit.patch]

I came across these typos in the sgml:

--exclude-scheam   should be
--exclude-schema

<option>table</option>     should be
<option>--table</option>


I found this connection problem (or perhaps it is as designed):

$ env | grep ^PG
PGPORT=6965
PGPASSFILE=/home/aardvark/.pg_aardvark
PGDATABASE=testdb
PGDATA=/home/aardvark/pg_stuff/pg_installations/pgsql.amcheck/data

-- just to show that psql is connecting (via $PGPASSFILE and $PGPORT and
$PGDATABASE):
-- and showing a  table t  that I made earlier

$ psql
SET
Timing is on.
psql (14devel_amcheck_0612_2f48)
Type "help" for help.

testdb=# \dt+ t
                            List of relations
  Schema | Name | Type  |  Owner   | Persistence |  Size  | Description
--------+------+-------+----------+-------------+--------+-------------
  public | t    | table | aardvark | permanent   | 346 MB |
(1 row)

testdb=# \q

I think this should work:

$ pg_amcheck -i -t t
pg_amcheck: error: no matching tables were found

It seems like a bug that I have to add '-d testdb':

This works OK:
pg_amcheck -i -t t -d testdb

Is that error as expected?


thanks,

Erik Rijkers



Re: new heapcheck contrib module (typos)

Mark Dilger-5


> On Jun 13, 2020, at 2:13 PM, Erik Rijkers <[hidden email]> wrote:

Thanks for the review!

> On 2020-06-12 23:06, Mark Dilger wrote:
>
>> [v7-0001-Adding-verify_heapam-and-pg_amcheck.patch]
>> [v7-0002-Adding-checks-o...ations-of-hint-bit.patch]
>
> I came across these typos in the sgml:
>
> --exclude-scheam   should be
> --exclude-schema
>
> <option>table</option>     should be
> <option>--table</option>

Yeah, I agree and have made these changes for v8.

> I found this connection problem (or perhaps it is as designed):
>
> $ env | grep ^PG
> PGPORT=6965
> PGPASSFILE=/home/aardvark/.pg_aardvark
> PGDATABASE=testdb
> PGDATA=/home/aardvark/pg_stuff/pg_installations/pgsql.amcheck/data
>
> -- just to show that psql is connecting (via $PGPASSFILE and $PGPORT and $PGDATABASE):
> -- and showing a  table t  that I made earlier
>
> $ psql
> SET
> Timing is on.
> psql (14devel_amcheck_0612_2f48)
> Type "help" for help.
>
> testdb=# \dt+ t
>                           List of relations
> Schema | Name | Type  |  Owner   | Persistence |  Size  | Description
> --------+------+-------+----------+-------------+--------+-------------
> public | t    | table | aardvark | permanent   | 346 MB |
> (1 row)
>
> testdb=# \q
>
> I think this should work:
>
> $ pg_amcheck -i -t t
> pg_amcheck: error: no matching tables were found
>
> It seems a bug that I have to add  '-d testdb':
>
> This works OK:
> pg_amcheck -i -t t -d testdb
>
> Is that error as expected?

It was expected, but looking more broadly at other tools, your expectation seems to be the more typical one.  I've changed it in v8.  Thanks again for having a look at this patch!

Note that I've merged the two patches (v7-0001 and v7-0002) back into a single patch, since the separation introduced in v7 was only to illustrate the v7 changes.





Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachment: v8-0001-Adding-verify_heapam-and-pg_amcheck.patch (228K)

Re: new heapcheck contrib module

Dilip Kumar-2
In reply to this post by Mark Dilger-5
On Sat, Jun 13, 2020 at 2:36 AM Mark Dilger
<[hidden email]> wrote:

>
>
>
> > On Jun 11, 2020, at 11:35 PM, Dilip Kumar <[hidden email]> wrote:
> >
> > On Fri, Jun 12, 2020 at 12:40 AM Mark Dilger
> > <[hidden email]> wrote:
> >>
> >>
> >>
> >>> On Jun 11, 2020, at 9:14 AM, Dilip Kumar <[hidden email]> wrote:
> >>>
> >>> I have just browsed through the patch and the idea is quite
> >>> interesting.  I think we can expand it to check that whether the flags
> >>> set in the infomask are sane or not w.r.t other flags and xid status.
> >>> Some examples are
> >>>
> >>> - If HEAP_XMAX_LOCK_ONLY is set in infomask then HEAP_KEYS_UPDATED
> >>> should not be set in new_infomask2.
> >>> - If HEAP_XMIN(XMAX)_COMMITTED is set in the infomask then can we
> >>> actually cross verify the transaction status from the CLOG and check
> >>> whether is matching the hint bit or not.
> >>>
> >>> While browsing through the code I could not find that we are doing
> >>> this kind of check,  ignore if we are already checking this.
> >>
> >> Thanks for taking a look!
> >>
> >> Having both of those bits set simultaneously appears to fall into a different category than what I wrote verify_heapam.c to detect.
> >
> > Ok
> >
> >
> >>  It doesn't violate any assertion in the backend, nor does it cause
> >> the code to crash.  (At least, I don't immediately see how it does
> >> either of those things.)  At first glance it appears invalid to have
> >> those bits both set simultaneously, but I'm hesitant to enforce that
> >> without good reason.  If it is a good thing to enforce, should we also
> >> change the backend code to Assert?
> >
> > Yeah, it may not hit assert or crash but it could lead to a wrong
> > result.  But I agree that it could be an assertion in the backend
> > code.
>
> For v7, I've added an assertion for this.  Per heap/README.tuplock, "We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit is set."  I added an assertion for that, too.  Both new assertions are in RelationPutHeapTuple().  I'm not sure if that is the best place to put the assertion, but I am confident that the assertion needs to only check tuples destined for disk, as in memory tuples can and do violate the assertion.
>
> Also for v7, I've updated contrib/amcheck to report these two conditions as corruption.
>
> > What about the other check, like hint bit is saying the
> > transaction is committed but actually as per the clog the status is
> > something else.  I think in general processing it is hard to check
> > such things in backend no? because if the hint bit is set saying that
> > the transaction is committed then we will directly check its
> > visibility with the snapshot.  I think a corruption checker may be a
> > good tool for catching such anomalies.
>
> I already made some design changes to this patch to avoid taking the CLogTruncationLock too often.  I'm happy to incorporate this idea, but perhaps you could provide a design on how to do it without all the extra locking?  If not, I can try to get this into v8 as an optional check, so users can turn it on at their discretion.  Having the check enabled by default is probably a non-starter.

Okay, I can't think of a way to do it without extra locking either.

I have looked into the 0001 patch and have a few comments.

1.
+
+ /* Skip over unused/dead/redirected line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) ||
+ ItemIdIsDead(ctx.itemid) ||
+ ItemIdIsRedirected(ctx.itemid))
+ continue;

Isn't it a good idea to verify the redirected item id?  We will
still access the redirected item id to find the actual tuple during
an index scan.  Maybe not exactly at this level, but we could verify
that the item id stored in the redirect link is within the page's
item id range.

2.

+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
+ fatal = true;
+ }

I think we can also check that if there are no NULL attributes
(i.e. !(t_infomask & HEAP_HASNULL)), then ctx->tuphdr->t_hoff should
be equal to SizeofHeapTupleHeader.


3.
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }
+ ctx->offset = -1;
+ ctx->attnum = -1;

So we first set ctx->offset to 0; then inside check_tuple_attribute
we keep updating the offset as we process the attributes; and after
the loop is over we set ctx->offset to -1.  I did not understand why
we need to reset it to -1; do we ever check for that?  We don't even
initialize ctx->offset to -1 while initializing the context for the
tuple, so the meaning of the value -1 is unclear to me.

4.
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ confess(ctx,
+ pstrdup("toast chunk is neither short nor extended"));
+ return;
+ }

I think the error message "toast chunk is neither short nor
extended" is not quite right.  Ideally a toast chunk should not
itself be further toasted, so I think the check is correct, but the
error message is not.

5.

+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+
+ /*
+ * Open the toast relation, if any, also protected from concurrent
+ * vacuums.
+ */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+   ShareUpdateExclusiveLock);
+ offset = toast_open_indexes(ctx.toastrel,
....
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+ TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+ confess(&ctx, psprintf("relfrozenxid %u precedes global "
+    "oldest valid xid %u ",
+    ctx.relfrozenxid, ctx.oldestValidXid));
+ PG_RETURN_NULL();
+ }

Don't we need to close the relation/toastrel/toastindexrel on such
a return path, which exits without an abort?  IIRC, we will get a
relcache leak WARNING on commit if we leave them open in the commit
path.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: new heapcheck contrib module

Mark Dilger-5


> On Jun 21, 2020, at 2:54 AM, Dilip Kumar <[hidden email]> wrote:
>
> I have looked into 0001 patch and I have a few comments.
>
> 1.
> +
> + /* Skip over unused/dead/redirected line pointers */
> + if (!ItemIdIsUsed(ctx.itemid) ||
> + ItemIdIsDead(ctx.itemid) ||
> + ItemIdIsRedirected(ctx.itemid))
> + continue;
>
> Isn't it a good idea to verify the Redirected Itemtid?  Because we
> will still access the redirected item id to find the
> actual tuple from the index scan.  Maybe not exactly at this level,
> but we can verify that the link itemid store in that
> is within the itemid range of the page or not.

Good idea.  I've added checks that the redirection is valid, both in terms of being within bounds and in terms of alignment.

> 2.
>
> + /* Check for tuple header corruption */
> + if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
> + {
> + confess(ctx,
> + psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
> + ctx->tuphdr->t_hoff,
> + (unsigned) SizeofHeapTupleHeader));
> + fatal = true;
> + }
>
> I think we can also check that if there is no NULL attributes (if
> (!(t_infomask & HEAP_HASNULL)) then
> ctx->tuphdr->t_hoff should be equal to SizeofHeapTupleHeader.

You have to take alignment padding into account, but otherwise yes, and I've added a check for that.

> 3.
> + ctx->offset = 0;
> + for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
> + {
> + if (!check_tuple_attribute(ctx))
> + break;
> + }
> + ctx->offset = -1;
> + ctx->attnum = -1;
>
> So we are first setting ctx->offset to 0, then inside
> check_tuple_attribute, we will keep updating the offset as we process
> the attributes and after the loop is over we set ctx->offset to -1,  I
> did not understand that why we need to reset it to -1, do we ever
> check for that.  We don't even initialize the ctx->offset to -1 while
> initializing the context for the tuple so I do not understand what is
> the meaning of the random value -1.

Ahh, right, those are left over from a previous design of the code.  Thanks for pointing them out.  They are now removed.

> 4.
> + if (!VARATT_IS_EXTENDED(chunk))
> + {
> + chunksize = VARSIZE(chunk) - VARHDRSZ;
> + chunkdata = VARDATA(chunk);
> + }
> + else if (VARATT_IS_SHORT(chunk))
> + {
> + /*
> + * could happen due to heap_form_tuple doing its thing
> + */
> + chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
> + chunkdata = VARDATA_SHORT(chunk);
> + }
> + else
> + {
> + /* should never happen */
> + confess(ctx,
> + pstrdup("toast chunk is neither short nor extended"));
> + return;
> + }
>
> I think the error message "toast chunk is neither short nor extended".
> Because ideally, the toast chunk should not be further toasted.
> So I think the check is correct, but the error message is not correct.

I agree the error message was wrongly stated, and I've changed it, but perhaps you can suggest a better wording than what I came up with, "corrupt toast chunk va_header".

> 5.
>
> + ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
> + check_relation_relkind_and_relam(ctx.rel);
> +
> + /*
> + * Open the toast relation, if any, also protected from concurrent
> + * vacuums.
> + */
> + if (ctx.rel->rd_rel->reltoastrelid)
> + {
> + int offset;
> +
> + /* Main relation has associated toast relation */
> + ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
> +   ShareUpdateExclusiveLock);
> + offset = toast_open_indexes(ctx.toastrel,
> ....
> + if (TransactionIdIsNormal(ctx.relfrozenxid) &&
> + TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
> + {
> + confess(&ctx, psprintf("relfrozenxid %u precedes global "
> +    "oldest valid xid %u ",
> +    ctx.relfrozenxid, ctx.oldestValidXid));
> + PG_RETURN_NULL();
> + }
>
> Don't we need to close the relation/toastrel/toastindexrel in such
> return which is without an abort? IIRC, we
> will get relcache leak WARNING on commit if we left them open in commit path.

Ok, I've added logic to close them.

All changes inspired by your review are included in the v9-0001 patch.  The differences since v8 are pulled out into v9_diffs for easier review.






Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachment: v9-0001-Adding-verify_heapam-and-pg_amcheck.patch (230K)
Attachment: v9_diffs (6K)

Re: new heapcheck contrib module

Dilip Kumar-2
On Mon, Jun 22, 2020 at 5:44 AM Mark Dilger
<[hidden email]> wrote:

>
>
>
> > On Jun 21, 2020, at 2:54 AM, Dilip Kumar <[hidden email]> wrote:
> >
> > I have looked into 0001 patch and I have a few comments.
> >
> > 1.
> > +
> > + /* Skip over unused/dead/redirected line pointers */
> > + if (!ItemIdIsUsed(ctx.itemid) ||
> > + ItemIdIsDead(ctx.itemid) ||
> > + ItemIdIsRedirected(ctx.itemid))
> > + continue;
> >
> > Isn't it a good idea to verify the Redirected Itemtid?  Because we
> > will still access the redirected item id to find the
> > actual tuple from the index scan.  Maybe not exactly at this level,
> > but we can verify that the link itemid store in that
> > is within the itemid range of the page or not.
>
> Good idea.  I've added checks that the redirection is valid, both in terms of being within bounds and in terms of alignment.
>
> > 2.
> >
> > + /* Check for tuple header corruption */
> > + if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
> > + {
> > + confess(ctx,
> > + psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
> > + ctx->tuphdr->t_hoff,
> > + (unsigned) SizeofHeapTupleHeader));
> > + fatal = true;
> > + }
> >
> > I think we can also check that if there is no NULL attributes (if
> > (!(t_infomask & HEAP_HASNULL)) then
> > ctx->tuphdr->t_hoff should be equal to SizeofHeapTupleHeader.
>
> You have to take alignment padding into account, but otherwise yes, and I've added a check for that.
>
> > 3.
> > + ctx->offset = 0;
> > + for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
> > + {
> > + if (!check_tuple_attribute(ctx))
> > + break;
> > + }
> > + ctx->offset = -1;
> > + ctx->attnum = -1;
> >
> > So we are first setting ctx->offset to 0, then inside
> > check_tuple_attribute, we will keep updating the offset as we process
> > the attributes and after the loop is over we set ctx->offset to -1,  I
> > did not understand that why we need to reset it to -1, do we ever
> > check for that.  We don't even initialize the ctx->offset to -1 while
> > initializing the context for the tuple so I do not understand what is
> > the meaning of the random value -1.
>
> Ahh, right, those are left over from a previous design of the code.  Thanks for pointing them out.  They are now removed.
>
> > 4.
> > + if (!VARATT_IS_EXTENDED(chunk))
> > + {
> > + chunksize = VARSIZE(chunk) - VARHDRSZ;
> > + chunkdata = VARDATA(chunk);
> > + }
> > + else if (VARATT_IS_SHORT(chunk))
> > + {
> > + /*
> > + * could happen due to heap_form_tuple doing its thing
> > + */
> > + chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
> > + chunkdata = VARDATA_SHORT(chunk);
> > + }
> > + else
> > + {
> > + /* should never happen */
> > + confess(ctx,
> > + pstrdup("toast chunk is neither short nor extended"));
> > + return;
> > + }
> >
> > I think the error message "toast chunk is neither short nor extended".
> > Because ideally, the toast chunk should not be further toasted.
> > So I think the check is correct, but the error message is not correct.
>
> I agree the error message was wrongly stated, and I've changed it, but you might suggest a better wording than what I came up with, "corrupt toast chunk va_header".
>
> > 5.
> >
> > + ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
> > + check_relation_relkind_and_relam(ctx.rel);
> > +
> > + /*
> > + * Open the toast relation, if any, also protected from concurrent
> > + * vacuums.
> > + */
> > + if (ctx.rel->rd_rel->reltoastrelid)
> > + {
> > + int offset;
> > +
> > + /* Main relation has associated toast relation */
> > + ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
> > +   ShareUpdateExclusiveLock);
> > + offset = toast_open_indexes(ctx.toastrel,
> > ....
> > + if (TransactionIdIsNormal(ctx.relfrozenxid) &&
> > + TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
> > + {
> > + confess(&ctx, psprintf("relfrozenxid %u precedes global "
> > +    "oldest valid xid %u ",
> > +    ctx.relfrozenxid, ctx.oldestValidXid));
> > + PG_RETURN_NULL();
> > + }
> >
> > Don't we need to close the relation/toastrel/toastindexrel in such
> > return which is without an abort? IIRC, we
> > will get relcache leak WARNING on commit if we left them open in commit path.
>
> Ok, I've added logic to close them.
>
> All changes inspired by your review are included in the v9-0001 patch.  The differences since v8 are pulled out into v9_diffs for easier review.

I have reviewed the changes in v9_diffs and they look fine to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: new heapcheck contrib module

Dilip Kumar-2
On Sun, Jun 28, 2020 at 8:59 PM Dilip Kumar <[hidden email]> wrote:

>
> On Mon, Jun 22, 2020 at 5:44 AM Mark Dilger
> <[hidden email]> wrote:
> >
> >
> >
> > > On Jun 21, 2020, at 2:54 AM, Dilip Kumar <[hidden email]> wrote:
> > >
> > > I have looked into 0001 patch and I have a few comments.
> > >
> > > 1.
> > > +
> > > + /* Skip over unused/dead/redirected line pointers */
> > > + if (!ItemIdIsUsed(ctx.itemid) ||
> > > + ItemIdIsDead(ctx.itemid) ||
> > > + ItemIdIsRedirected(ctx.itemid))
> > > + continue;
> > >
> > > Isn't it a good idea to verify the Redirected Itemtid?  Because we
> > > will still access the redirected item id to find the
> > > actual tuple from the index scan.  Maybe not exactly at this level,
> > > but we can verify that the link itemid store in that
> > > is within the itemid range of the page or not.
> >
> > Good idea.  I've added checks that the redirection is valid, both in terms of being within bounds and in terms of alignment.
> >
> > > 2.
> > >
> > > + /* Check for tuple header corruption */
> > > + if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
> > > + {
> > > + confess(ctx,
> > > + psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
> > > + ctx->tuphdr->t_hoff,
> > > + (unsigned) SizeofHeapTupleHeader));
> > > + fatal = true;
> > > + }
> > >
> > > I think we can also check that if there is no NULL attributes (if
> > > (!(t_infomask & HEAP_HASNULL)) then
> > > ctx->tuphdr->t_hoff should be equal to SizeofHeapTupleHeader.
> >
> > You have to take alignment padding into account, but otherwise yes, and I've added a check for that.
> >
> > > 3.
> > > + ctx->offset = 0;
> > > + for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
> > > + {
> > > + if (!check_tuple_attribute(ctx))
> > > + break;
> > > + }
> > > + ctx->offset = -1;
> > > + ctx->attnum = -1;
> > >
> > > So we are first setting ctx->offset to 0, then inside
> > > check_tuple_attribute, we will keep updating the offset as we process
> > > the attributes and after the loop is over we set ctx->offset to -1,  I
> > > did not understand that why we need to reset it to -1, do we ever
> > > check for that.  We don't even initialize the ctx->offset to -1 while
> > > initializing the context for the tuple so I do not understand what is
> > > the meaning of the random value -1.
> >
> > Ahh, right, those are left over from a previous design of the code.  Thanks for pointing them out.  They are now removed.
> >
> > > 4.
> > > + if (!VARATT_IS_EXTENDED(chunk))
> > > + {
> > > + chunksize = VARSIZE(chunk) - VARHDRSZ;
> > > + chunkdata = VARDATA(chunk);
> > > + }
> > > + else if (VARATT_IS_SHORT(chunk))
> > > + {
> > > + /*
> > > + * could happen due to heap_form_tuple doing its thing
> > > + */
> > > + chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
> > > + chunkdata = VARDATA_SHORT(chunk);
> > > + }
> > > + else
> > > + {
> > > + /* should never happen */
> > > + confess(ctx,
> > > + pstrdup("toast chunk is neither short nor extended"));
> > > + return;
> > > + }
> > >
> > > I think the error message "toast chunk is neither short nor extended".
> > > Because ideally, the toast chunk should not be further toasted.
> > > So I think the check is correct, but the error message is not correct.
> >
> > I agree the error message was wrongly stated, and I've changed it, but you might suggest a better wording than what I came up with, "corrupt toast chunk va_header".
> >
> > > 5.
> > >
> > > + ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
> > > + check_relation_relkind_and_relam(ctx.rel);
> > > +
> > > + /*
> > > + * Open the toast relation, if any, also protected from concurrent
> > > + * vacuums.
> > > + */
> > > + if (ctx.rel->rd_rel->reltoastrelid)
> > > + {
> > > + int offset;
> > > +
> > > + /* Main relation has associated toast relation */
> > > + ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
> > > +   ShareUpdateExclusiveLock);
> > > + offset = toast_open_indexes(ctx.toastrel,
> > > ....
> > > + if (TransactionIdIsNormal(ctx.relfrozenxid) &&
> > > + TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
> > > + {
> > > + confess(&ctx, psprintf("relfrozenxid %u precedes global "
> > > +    "oldest valid xid %u ",
> > > +    ctx.relfrozenxid, ctx.oldestValidXid));
> > > + PG_RETURN_NULL();
> > > + }
> > >
> > > Don't we need to close the relation/toastrel/toastindexrel in such
> > > return which is without an abort? IIRC, we
> > > will get relcache leak WARNING on commit if we left them open in commit path.
> >
> > Ok, I've added logic to close them.
> >
> > All changes inspired by your review are included in the v9-0001 patch.  The differences since v8 are pulled out into v9_diffs for easier review.
>
> I have reviewed the changes in v9_diffs and they look fine to me.

Some more comments on v9_0001.
1.
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ nextFullXid = ShmemVariableCache->nextFullXid;
+ ctx.oldestValidXid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
...
...
+
+ for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
+ {
+ int32 mapbits;
+ OffsetNumber maxoff;
+ PageHeader ph;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_all_frozen || skip_all_visible)
+ {
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+    &vmbuffer);
+ if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);

I might be missing something, but it appears that we first get the
nextFullXid and only after that scan the relation block by block.
So if, while we are scanning, the nextXid is advanced and some tuples
in the heap pages are updated, it seems the current logic will
complain about an out-of-range xid.  I did not test this behavior, so
please point me to the logic that protects against this.

2.
/*
 * Helper function to construct the TupleDesc needed by verify_heapam.
 */
static TupleDesc
verify_heapam_tupdesc(void)

From the function name, it appears to verify a tuple descriptor,
but it just creates one.

3.
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_all_frozen || skip_all_visible)
+ {
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+    &vmbuffer);
+ if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }

Here, do we want to test the case where the all-visible bit is set in
the VM but not on the page?  That can lead to a wrong result in an
index-only scan.

4. One cosmetic comment

+ /* Skip non-varlena values, but update offset first */
..
+
+ /* Ok, we're looking at a varlena attribute. */

Throughout the patch, I have noticed that some of your single-line
comments end with a "full stop" whereas others don't.  Can we keep
them consistent?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: new heapcheck contrib module

Mark Dilger-5


> On Jun 28, 2020, at 9:05 AM, Dilip Kumar <[hidden email]> wrote:
>
> Some more comments on v9_0001.
> 1.
> + LWLockAcquire(XidGenLock, LW_SHARED);
> + nextFullXid = ShmemVariableCache->nextFullXid;
> + ctx.oldestValidXid = ShmemVariableCache->oldestXid;
> + LWLockRelease(XidGenLock);
> + ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
> ...
> ...
> +
> + for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
> + {
> + int32 mapbits;
> + OffsetNumber maxoff;
> + PageHeader ph;
> +
> + /* Optionally skip over all-frozen or all-visible blocks */
> + if (skip_all_frozen || skip_all_visible)
> + {
> + mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
> +    &vmbuffer);
> + if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
> + continue;
> + if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
> + continue;
> + }
> +
> + /* Read and lock the next page. */
> + ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
> + RBM_NORMAL, ctx.bstrategy);
> + LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
>
> I might be missing something, but it appears that first we are getting
> the nextFullXid and after that, we are scanning the block by block.
> So while we are scanning the block if the nextXid is advanced and it
> has updated some tuple in the heap pages,  then it seems the current
> logic will complain about out of range xid.  I did not test this
> behavior so please point me to the logic which is protecting this.

We know the oldest valid Xid cannot advance, because we hold a lock that would prevent it from doing so.  We cannot know that the newest Xid will not advance, but when we see an Xid beyond the end of the known valid range, we check its validity, and either report it as a corruption or advance our idea of the newest valid Xid, depending on that check.  That logic is in TransactionIdValidInRel.

> 2.
> /*
> * Helper function to construct the TupleDesc needed by verify_heapam.
> */
> static TupleDesc
> verify_heapam_tupdesc(void)
>
> From function name, it appeared that it is verifying tuple descriptor
> but this is just creating the tuple descriptor.

In amcheck--1.2--1.3.sql we define a function named verify_heapam which returns a set of records.  This is the tuple descriptor for that function.  I understand that the name can be parsed as verify_(heapam_tupdesc), but it is meant as (verify_heapam)_tupdesc.  Do you have a name you would prefer?

> 3.
> + /* Optionally skip over all-frozen or all-visible blocks */
> + if (skip_all_frozen || skip_all_visible)
> + {
> + mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
> +    &vmbuffer);
> + if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
> + continue;
> + if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
> + continue;
> + }
>
> Here, do we want to test that in VM the all visible bit is set whereas
> on the page it is not set?  That can lead to a wrong result in an
> index-only scan.

If the caller has specified that the corruption check should skip over all-frozen or all-visible data, then we cannot load the page that the VM claims is all-frozen or all-visible without defeating the purpose of the caller having specified these options.  Without loading the page, we cannot check the page's header bits.

When not skipping all-visible or all-frozen blocks, we might like to pin both the heap page and the visibility map page in order to compare the two, being careful not to hold a pin on the one while performing I/O on the other.  See for example the logic in heap_delete().  But I'm not sure what guarantees the system makes about agreement between these two bits.  Certainly, the VM should not claim a page is all visible when it isn't, but are we guaranteed that a page that is all-visible will always have its all-visible bit set?  I don't know if (possibly transient) disagreement between these two bits constitutes corruption.  Perhaps others following this thread can advise?

> 4. One cosmetic comment
>
> + /* Skip non-varlena values, but update offset first */
> ..
> +
> + /* Ok, we're looking at a varlena attribute. */
>
> Throughout the patch, I have noticed that some of your single-line
> comments have "full stop" whereas other don't.  Can we keep them
> consistent?

I try to use a "full stop" at the end of sentences, but not at the end of sentence fragments.  To me, a "full stop" means that a sentence has reached its conclusion.  I don't intentionally use one at the end of a fragment, unless the fragment precedes a full sentence, in which case the "full stop" is needed to separate the two.  Of course, I may have violated my own rule in a few places, but before I submit a v10 patch with comment punctuation changes, perhaps we can agree on what the rule is?  (This has probably been discussed before and agreed before.  A link to the appropriate email thread would be sufficient.)

For example:

        /* red, green, or blue */
        /* set to pink */
        /* set to blue.  We have not closed the file. */
        /* At this point, we have chosen the color. */

The first comment is not a sentence, but the fourth is.  The third comment is a fragment followed by a full sentence, and a "full stop" separates the two.  As for the second comment, as I recall, verb phrases can be interpreted as a full sentence, as in "Close the door!", when they are meant as commands to the listener, but not otherwise.  "set to pink" is not a command to the reader, but rather a description of what the code is doing at that point, so I think of it as a mere verb phrase and not a full sentence.

Making matters even more complicated, portions of the logic in verify_heapam were taken from sections of code that would ereport(), elog(), or Assert() on corruption, and when I took such code, I sometimes also took the comments in unmodified form.  That means that my normal commenting rules don't apply, as I'm not the comment author in such cases.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: new heapcheck contrib module

Alvaro Herrera-9
In reply to this post by Mark Dilger-5
I think there are two very large patches here.  One adds checking of
heapam tables to amcheck, and the other adds a binary that eases calling
amcheck from the command line.  I think these should be two separate
patches.

I don't know what to think of a module contrib/pg_amcheck.  I kinda lean
towards fitting it in src/bin/scripts rather than as a contrib module.
However, it seems a bit weird that it depends on a contrib module.
Maybe amcheck should not be a contrib module at all but rather a new
extension in src/extensions/ that is compiled and installed (in the
filesystem, not in databases) by default.

I strongly agree with hardening backend code so that all the crashes
that Mark has found can be repaired.  (We discussed this topic
before[1]: we'd repair all crashes when run with production code, not
all assertion crashes.)

[1] https://postgr.es/m/20200513221051.GA26592@...

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: new heapcheck contrib module

Mark Dilger-5


> On Jun 30, 2020, at 11:44 AM, Alvaro Herrera <[hidden email]> wrote:
>
> I think there are two very large patches here.  One adds checking of
> heapam tables to amcheck, and the other adds a binary that eases calling
> amcheck from the command line.  I think these should be two separate
> patches.

contrib/amcheck has pretty limited regression test coverage.  I wrote pg_amcheck in large part because the infrastructure I was writing for testing contrib/amcheck was starting to look like a stand-alone tool, so I made it one.  I can split contrib/pg_amcheck into a separate patch, but I would expect reviewers to use it to review contrib/amcheck.  Say the word, and I'll resubmit as two separate patches.

> I don't know what to think of a module contrib/pg_amcheck.  I kinda lean
> towards fitting it in src/bin/scripts rather than as a contrib module.
> However, it seems a bit weird that it depends on a contrib module.

Agreed.

> Maybe amcheck should not be a contrib module at all but rather a new
> extension in src/extensions/ that is compiled and installed (in the
> filesystem, not in databases) by default.

Fine with me, but I'll have to see what others think about that.

> I strongly agree with hardening backend code so that all the crashes
> that Mark has found can be repaired.  (We discussed this topic
> before[1]: we'd repair all crashes when run with production code, not
> all assertion crashes.)

I'm guessing that hardening the backend would be a separate patch?  Or did you want that as part of this one?


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: new heapcheck contrib module

Alvaro Herrera-9
On 2020-Jun-30, Mark Dilger wrote:

> I'm guessing that hardening the backend would be a separate patch?  Or
> did you want that as part of this one?

Lately, to me the foremost criterion to determine what is a separate
patch and what isn't is the way the commit message is structured.  If it
looks too much like a bullet list of unrelated things, that suggests
that the commit should be split into one commit per bullet point; of
course, there are counterexamples.  But when I have a commit message
that says "I do A, and I also do B because I need it for A", then it
makes more sense to do B first standalone and then A on top.  OTOH if
two things are done because they're heavily intermixed (e.g. commit
850196b610d2, bullet points galore), that suggests that one commit is a
decent approach.

Just my opinion, of course.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: new heapcheck contrib module

Dilip Kumar-2
In reply to this post by Mark Dilger-5
On Sun, Jun 28, 2020 at 11:18 PM Mark Dilger
<[hidden email]> wrote:

>
>
>
> > On Jun 28, 2020, at 9:05 AM, Dilip Kumar <[hidden email]> wrote:
> >
> > Some more comments on v9_0001.
> > 1.
> > + LWLockAcquire(XidGenLock, LW_SHARED);
> > + nextFullXid = ShmemVariableCache->nextFullXid;
> > + ctx.oldestValidXid = ShmemVariableCache->oldestXid;
> > + LWLockRelease(XidGenLock);
> > + ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
> > ...
> > ...
> > +
> > + for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
> > + {
> > + int32 mapbits;
> > + OffsetNumber maxoff;
> > + PageHeader ph;
> > +
> > + /* Optionally skip over all-frozen or all-visible blocks */
> > + if (skip_all_frozen || skip_all_visible)
> > + {
> > + mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
> > +    &vmbuffer);
> > + if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
> > + continue;
> > + if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
> > + continue;
> > + }
> > +
> > + /* Read and lock the next page. */
> > + ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
> > + RBM_NORMAL, ctx.bstrategy);
> > + LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
> >
> > I might be missing something, but it appears that first we are getting
> > the nextFullXid and after that, we are scanning the block by block.
> > So while we are scanning the block if the nextXid is advanced and it
> > has updated some tuple in the heap pages,  then it seems the current
> > logic will complain about out of range xid.  I did not test this
> > behavior so please point me to the logic which is protecting this.
>
> We know the oldest valid Xid cannot advance, because we hold a lock that would prevent it from doing so.  We cannot know that the newest Xid will not advance, but when we see an Xid beyond the end of the known valid range, we check its validity, and either report it as a corruption or advance our idea of the newest valid Xid, depending on that check.  That logic is in TransactionIdValidInRel.

That makes sense to me.

>
> > 2.
> > /*
> > * Helper function to construct the TupleDesc needed by verify_heapam.
> > */
> > static TupleDesc
> > verify_heapam_tupdesc(void)
> >
> > From function name, it appeared that it is verifying tuple descriptor
> > but this is just creating the tuple descriptor.
>
> In amcheck--1.2--1.3.sql we define a function named verify_heapam which returns a set of records.  This is the tuple descriptor for that function.  I understand that the name can be parsed as verify_(heapam_tupdesc), but it is meant as (verify_heapam)_tupdesc.  Do you have a name you would prefer?

I am not very particular, but we could use a name like
verify_heapam_get_tupdesc.  It's just a suggestion, so if you prefer
the current name I have no objection.

>
> > 3.
> > + /* Optionally skip over all-frozen or all-visible blocks */
> > + if (skip_all_frozen || skip_all_visible)
> > + {
> > + mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
> > +    &vmbuffer);
> > + if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
> > + continue;
> > + if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
> > + continue;
> > + }
> >
> > Here, do we want to test that in VM the all visible bit is set whereas
> > on the page it is not set?  That can lead to a wrong result in an
> > index-only scan.
>
> If the caller has specified that the corruption check should skip over all-frozen or all-visible data, then we cannot load the page that the VM claims is all-frozen or all-visible without defeating the purpose of the caller having specified these options.  Without loading the page, we cannot check the page's header bits.
>
> When not skipping all-visible or all-frozen blocks, we might like to pin both the heap page and the visibility map page in order to compare the two, being careful not to hold a pin on the one while performing I/O on the other.  See for example the logic in heap_delete().  But I'm not sure what guarantees the system makes about agreement between these two bits.  Certainly, the VM should not claim a page is all visible when it isn't, but are we guaranteed that a page that is all-visible will always have its all-visible bit set?  I don't know if (possibly transient) disagreement between these two bits constitutes corruption.  Perhaps others following this thread can advise?

Right, the VM should not claim a page is all-visible when it actually
is not.  But, IIRC, it is not guaranteed that if a page is all-visible
then the VM must have its all-visible flag set.

> > 4. One cosmetic comment
> >
> > + /* Skip non-varlena values, but update offset first */
> > ..
> > +
> > + /* Ok, we're looking at a varlena attribute. */
> >
> > Throughout the patch, I have noticed that some of your single-line
> > comments have "full stop" whereas other don't.  Can we keep them
> > consistent?
>
> I try to use a "full stop" at the end of sentences, but not at the end of sentence fragments.  To me, a "full stop" means that a sentence has reached its conclusion.  I don't intentionally use one at the end of a fragment, unless the fragment precedes a full sentence, in which case the "full stop" is needed to separate the two.  Of course, I may have violated my own rule in a few places, but before I submit a v10 patch with comment punctuation changes, perhaps we can agree on what the rule is?  (This has probably been discussed before and agreed before.  A link to the appropriate email thread would be sufficient.)

I can see that different files follow different rules.  I am fine as
long as they are consistent within a file.

> For example:
>
>         /* red, green, or blue */
>         /* set to pink */
>         /* set to blue.  We have not closed the file. */
>         /* At this point, we have chosen the color. */
>
> The first comment is not a sentence, but the fourth is.  The third comment is a fragment followed by a full sentence, and a "full stop" separates the two.  As for the second comment, as I recall, verb phrases can be interpreted as a full sentence, as in "Close the door!", when they are meant as commands to the listener, but not otherwise.  "set to pink" is not a command to the reader, but rather a description of what the code is doing at that point, so I think of it as a mere verb phrase and not a full sentence.

> Making matters even more complicated, portions of the logic in verify_heapam were taken from sections of code that would ereport(), elog(), or Assert() on corruption, and when I took such code, I sometimes also took the comments in unmodified form.  That means that my normal commenting rules don't apply, as I'm not the comment author in such cases.

I agree.

A few more comments.
1.

+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ confess(ctx,
+ pstrdup("attribute is external but not marked as on disk"));
+ return true;
+ }
+
....
+
+ /*
+ * Must dereference indirect toast pointers before we can check them
+ */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {


So first we check that if the varatt is not VARATT_IS_EXTERNAL_ONDISK
we return, but just a few statements down we check whether the varatt
is VARATT_IS_EXTERNAL_INDIRECT, which seems like unreachable code.

2. Another point related to the same code: toast_save_datum always
sets the VARTAG_ONDISK tag.  IIUC, we use VARTAG_INDIRECT in the
reorderbuffer for generating temp tuples, so ideally while scanning
the heap we should never get a VARATT_IS_EXTERNAL_INDIRECT tuple.
Am I missing something here?

3.
+ if (VARATT_IS_1B_E(tp + ctx->offset))
+ {
+ uint8 va_tag = va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ confess(ctx, psprintf("unexpected TOAST vartag %u for "
+   "attribute #%u at t_hoff = %u, "
+   "offset = %u",
+   va_tag, ctx->attnum,
+   ctx->tuphdr->t_hoff, ctx->offset));
+ return false; /* We can't know where the next attribute
+ * begins */
+ }
+ }

+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ confess(ctx,
+ pstrdup("attribute is external but not marked as on disk"));
+ return true;
+ }

First, we check VARATT_IS_1B_E and, if so, whether its tag is
VARTAG_ONDISK.  But just after that, we get the actual attribute
pointer and again check the same thing with two different checks.
Can you explain why this is necessary?

4.
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ confess(ctx,
+ psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+ confess(ctx,
+ psprintf("HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set"));
+ }

Maybe we can further expand these checks; for example, if the tuple
has HEAP_XMAX_LOCK_ONLY set then HEAP_UPDATED or HEAP_HOT_UPDATED
should not be set.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: new heapcheck contrib module

Mark Dilger-5


> On Jul 4, 2020, at 6:04 AM, Dilip Kumar <[hidden email]> wrote:
>
> A few more comments.

Your comments all pertain to function check_tuple_attribute(), which follows the logic of heap_deform_tuple() and detoast_external_attr().  The idea is that any error that could result in an assertion or crash in those functions should be checked carefully by check_tuple_attribute(), and checked *before* any such asserts or crashes might be triggered.

I obviously did not explain this thinking in the function comment.  That is rectified in the v10 patch, attached.

> 1.
>
> + if (!VARATT_IS_EXTERNAL_ONDISK(attr))
> + {
> + confess(ctx,
> + pstrdup("attribute is external but not marked as on disk"));
> + return true;
> + }
> +
> ....
> +
> + /*
> + * Must dereference indirect toast pointers before we can check them
> + */
> + if (VARATT_IS_EXTERNAL_INDIRECT(attr))
> + {
>
>
> So first we are checking that if the varatt is not
> VARATT_IS_EXTERNAL_ONDISK then we are returning,  but just a
> few statements down we are checking if the varatt is
> VARATT_IS_EXTERNAL_INDIRECT, so seems like unreachable code.
True.  I've removed the VARATT_IS_EXTERNAL_INDIRECT check.


> 2. Another point related to the same code is that toast_save_datum
> always set the VARTAG_ONDISK tag.  IIUC, we use
> VARTAG_INDIRECT in reorderbuffer for generating temp tuple so ideally
> while scanning the heap we should never get
> VARATT_IS_EXTERNAL_INDIRECT tuple.  Am I missing something here?

I think you are right that we cannot get a VARATT_IS_EXTERNAL_INDIRECT tuple. That check is removed in v10.

> 3.
> + if (VARATT_IS_1B_E(tp + ctx->offset))
> + {
> + uint8 va_tag = va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
> +
> + if (va_tag != VARTAG_ONDISK)
> + {
> + confess(ctx, psprintf("unexpected TOAST vartag %u for "
> +   "attribute #%u at t_hoff = %u, "
> +   "offset = %u",
> +   va_tag, ctx->attnum,
> +   ctx->tuphdr->t_hoff, ctx->offset));
> + return false; /* We can't know where the next attribute
> + * begins */
> + }
> + }
>
> + /* Skip values that are not external */
> + if (!VARATT_IS_EXTERNAL(attr))
> + return true;
> +
> + /* It is external, and we're looking at a page on disk */
> + if (!VARATT_IS_EXTERNAL_ONDISK(attr))
> + {
> + confess(ctx,
> + pstrdup("attribute is external but not marked as on disk"));
> + return true;
> + }
>
> First, we are checking that if VARATT_IS_1B_E and if so we will check
> whether its tag is VARTAG_ONDISK or not.  But just after that, we will
> get the actual attribute pointer and
> Again check the same thing with 2 different checks.  Can you explain
> why this is necessary?
The code that calls check_tuple_attribute() expects it to check the current attribute, but also to safely advance the ctx->offset value to the next attribute, as the caller is iterating over all attributes.  The first check verifies that it is safe to call att_addlength_pointer, as we must not call att_addlength_pointer on a corrupt datum.  The second check simply returns on non-external attributes; having advanced ctx->offset, there is nothing left to do.  The third check validates the external attribute, now that we know it is external.  You are right that the third check cannot fail, as the first check would already have confess()ed and returned false.  The third check is removed in v10, attached.
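The checking order being defended here can be sketched standalone.  This is a hedged illustration, not the patch's code: the macros below are simplified stand-ins for the real varlena macros in postgres.h (the value 18 for VARTAG_ONDISK matches current headers, but treat it as an assumption).

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-ins for PostgreSQL's varlena macros. */
#define VARTAG_ONDISK 18

typedef struct
{
	uint8_t		va_header;
	uint8_t		va_tag;
} varattrib_1b_e;

#define VARATT_IS_1B_E(PTR)  ((((const uint8_t *) (PTR))[0]) == 0x01)
#define VARTAG_EXTERNAL(PTR) (((const varattrib_1b_e *) (PTR))->va_tag)

/*
 * Validate the external tag *before* computing the datum's length, because
 * length computation for an external datum dereferences tag-specific
 * structures and must not run on a corrupt tag.
 */
int
external_tag_is_safe(const uint8_t *attptr)
{
	if (VARATT_IS_1B_E(attptr))
		return VARTAG_EXTERNAL(attptr) == VARTAG_ONDISK;
	return 1;					/* inline datum: length math is safe */
}
```

Only after this returns true would a checker advance the offset with att_addlength_pointer.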

> 4.
> + if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
> + (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
> + {
> + confess(ctx,
> + psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
> + }
> + if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
> + (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
> + {
> + confess(ctx,
> + psprintf("HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set"));
> + }
>
> Maybe we can further expand these checks,  like if the tuple is
> HEAP_XMAX_LOCK_ONLY then HEAP_UPDATED or HEAP_HOT_UPDATED should not
> be set.
Adding Asserts in src/backend/access/heap/hio.c against those two conditions, the regression tests fail in quite a lot of places where HEAP_XMAX_LOCK_ONLY and HEAP_UPDATED are both true.  I'm leaving this idea out for v10, since it doesn't work, but in case you want to tell me what I did wrong, here are the changes I made on top of v10:

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 00de10b7c9..76d23e141a 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -57,6 +57,10 @@ RelationPutHeapTuple(Relation relation,
                         (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
        Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
                         (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+       Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+                        (tuple->t_data->t_infomask & HEAP_UPDATED)));
+       Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+                        (tuple->t_data->t_infomask2 & HEAP_HOT_UPDATED)));

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 49d3d5618a..60e4ad5be0 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -969,12 +969,19 @@ check_tuple(HeapCheckContext * ctx)
                                                          ctx->tuphdr->t_hoff));
                fatal = true;
        }
-       if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
-               (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+       if (ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY)
        {
-               confess(ctx,
-                       psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
+               if (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED)
+                       confess(ctx,
+                               psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
+               if (ctx->tuphdr->t_infomask & HEAP_UPDATED)
+                       confess(ctx,
+                               psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_UPDATED both set"));
+               if (ctx->tuphdr->t_infomask2 & HEAP_HOT_UPDATED)
+                       confess(ctx,
+                               psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_HOT_UPDATED both set"));
        }
+
        if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
                (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
        {


The v10 patch without these ideas is here:






Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachment: v10-0001-Adding-verify_heapam-and-pg_amcheck.patch (230K)

Re: new heapcheck contrib module

Robert Haas
On Mon, Jul 6, 2020 at 2:06 PM Mark Dilger <[hidden email]> wrote:
> The v10 patch without these ideas is here:

Along the lines of what Alvaro was saying before, I think this
definitely needs to be split up into a series of patches. The commit
message for v10 describes it doing three pretty separate things, and I
think that argues for splitting it into a series of three patches. I'd
argue for this ordering:

0001 Refactoring existing amcheck btree checking functions to optionally
return corruption information rather than ereport'ing it.  This is
used by the new pg_amcheck command line tool for reporting back to
the caller.

0002 Adding new function verify_heapam for checking a heap relation and
associated toast relation, if any, to contrib/amcheck.

0003 Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.

It's too hard to review things like this when it's all mixed together.

+++ b/contrib/amcheck/t/skipping.pl

The name of this file is inconsistent with the tree's usual
convention, which is all stuff like 001_whatever.pl, except for
src/test/modules/brin, which randomly decided to use two digits
instead of three. There's no precedent for a test file with no leading
numeric digits. Also, what does "skipping" even have to do with what
the test is checking? Maybe it's intended to refer to the new error
handling "skipping" the actual error in favor of just reporting it
without stopping, but that's not really what the word "skipping"
normally means. Finally, it seems a bit over-engineered: do we really
need 183 test cases to check that detecting a problem doesn't lead to
an abort? Like, if that's the purpose of the test, I'd expect it to
check one corrupt relation and one non-corrupt relation, each with and
without the no-error behavior. And that's about it. Or maybe it's
talking about skipping pages during the checks, because those pages
are all-visible or all-frozen? It's not very clear to me what's going
on here.

+ TransactionId nextKnownValidXid;
+ TransactionId oldestValidXid;

Please add explanatory comments indicating what these are intended to
mean. For most of the structure members, the brief comments
already present seem sufficient; but here, more explanation looks
necessary and less is provided. The "Values for returning tuples"
could possibly also use some more detail.

+#define HEAPCHECK_RELATION_COLS 8

I think this should really be at the top of the file someplace.
Sometimes people have adopted this style when the #define is only used
within the function that contains it, but that's not the case here.

+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized parameter for 'skip': %s", skip),
+ errhint("please choose from 'all visible', 'all frozen', "
+ "or NULL")));

I think it would be better if we had three string values selecting the
different behaviors, and made the parameter NOT NULL but with a
default. It seems like that would be easier to understand. Right now,
I can tell that my options for what to skip are "all visible", "all
frozen", and, uh, some other thing that I don't know what it is. I'm
gonna guess the third option is to skip nothing, but it seems best to
make that explicit. Also, should we maybe consider spelling this
'all-visible' and 'all-frozen' with dashes, instead of using spaces?
Spaces in an option value seems a little icky to me somehow.

+ int64 startblock = -1;
+ int64 endblock = -1;
...
+ if (!PG_ARGISNULL(3))
+ startblock = PG_GETARG_INT64(3);
+ if (!PG_ARGISNULL(4))
+ endblock = PG_GETARG_INT64(4);
...
+ if (startblock < 0)
+ startblock = 0;
+ if (endblock < 0 || endblock > ctx.nblocks)
+ endblock = ctx.nblocks;
+
+ for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)

So, the user can specify a negative value explicitly and it will be
treated as the default, and an endblock value that's larger than the
relation size will be treated as the relation size. The way pg_prewarm
does the corresponding checks seems superior: null indicates the
default value, and any non-null value must be within range or you get
an error. Also, you seem to be treating endblock as the first block
that should not be checked, whereas pg_prewarm takes what seems to me
to be the more natural interpretation: the end block is the last block
that IS checked. If you do it this way, then someone who specifies the
same start and end block will check no blocks -- silently, I think.

+               if (skip_all_frozen || skip_all_visible)

Since you can't skip all frozen without skipping all visible, this
test could be simplified. Or you could introduce a three-valued enum
and test that skip_pages != SKIP_PAGES_NONE, which might be even
better.

+ /* We must unlock the page from the prior iteration, if any */
+ Assert(ctx.blkno == InvalidBlockNumber || ctx.buffer != InvalidBuffer);

I don't understand this assertion, and I don't understand the comment,
either. I think ctx.blkno can never be equal to InvalidBlockNumber
because we never set it to anything outside the range of 0..(endblock
- 1), and I think ctx.buffer must always be unequal to InvalidBuffer
because we just initialized it by calling ReadBufferExtended(). So I
think this assertion would still pass if we wrote && rather than ||.
But even then, I don't know what that has to do with the comment or
why it even makes sense to have an assertion for that in the first
place.

+       /*
+        * Open the relation.  We use ShareUpdateExclusive to prevent concurrent
+        * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+        * global oldestXid to be newer than those.  This protection
saves us from
+        * having to reacquire the locks and recheck those minimums for every
+        * tuple, which would be expensive.
+        */
+       ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);

I don't think we'd need to recheck for every tuple, would we? Just for
cases where there's an apparent violation of the rules. I guess that
could still be expensive if there's a lot of them, but needing
ShareUpdateExclusiveLock rather than only AccessShareLock is a little
unfortunate.

It's also unclear to me why this concerns itself with relfrozenxid and
the cluster-wide oldestXid value but not with datfrozenxid. It seems
like if we're going to sanity-check the relfrozenxid against the
cluster-wide value, we ought to also check it against the
database-wide value. Checking neither would also seem like a plausible
choice. But it seems very strange to only check against the
cluster-wide value.

+               StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
+                                                "InvalidOffsetNumber
increments to FirstOffsetNumber");

If you are going to rely on this property, I agree that it is good to
check it. But it would be better to NOT rely on this property, and I
suspect the code can be written quite cleanly without relying on it.
And actually, that's what you did, because you first set ctx.offnum =
InvalidOffsetNumber but then just after that you set ctx.offnum = 0 in
the loop initializer. So AFAICS the first initializer, and the static
assert, are pointless.

+                       if (ItemIdIsRedirected(ctx.itemid))
+                       {
+                               uint16 redirect = ItemIdGetRedirect(ctx.itemid);
+                               if (redirect <= SizeOfPageHeaderData
|| redirect >= ph->pd_lower)
...
+                               if ((redirect - SizeOfPageHeaderData)
% sizeof(uint16))

I think that ItemIdGetRedirect() returns an offset, not a byte
position. So the expectation that I would have is that it would be any
integer >= 0 and <= maxoff. Am I confused? BTW, it seems like it might
be good to complain if the item to which it points is LP_UNUSED...
AFAIK that shouldn't happen.

+                                errmsg("\"%s\" is not a heap AM",

I think the correct wording would be just "is not a heap." The "heap
AM" is the thing in pg_am, not a specific table.

+confess(HeapCheckContext * ctx, char *msg)
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
+check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)

This is what happens when you pgindent without adding all the right
things to typedefs.list first ... or when you don't pgindent and have
odd ideas about how to indent things.


+       /*
+        * In principle, there is nothing to prevent a scan over a large, highly
+        * corrupted table from using workmem worth of memory building up the
+        * tuplestore.  Don't leak the msg argument memory.
+        */
+       pfree(msg);

Maybe change the second sentence to something like: "That should be
OK, else the user can lower work_mem, but we'd better not leak any
additional memory."

+/*
+ * check_tuphdr_xids
+ *
+ *     Determine whether tuples are visible for verification.  Similar to
+ *  HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ *  1) Does not touch hint bits.  It seems imprudent to write hint bits
+ *     to a table during a corruption check.
+ *  2) Only makes a boolean determination of whether verification should
+ *     see the tuple, rather than doing extra work for vacuum-related
+ *     categorization.
+ *
+ *  The caller should already have checked that xmin and xmax are not out of
+ *  bounds for the relation.
+ */

First, check_tuphdr_xids() doesn't seem like a very good name. If you
have a function with that name and, like this one, it returns Boolean,
what does true mean? What does false mean? Kinda hard to tell. And
also, check the tuple header XIDs *for what*? If you called it, say,
tuple_is_visible(), that would be self-evident.

Second, consider that we hold at least AccessShareLock on the relation
- actually, ATM we hold ShareUpdateExclusiveLock. Either way, there
cannot be a concurrent modification to the tuple descriptor in
progress. Therefore, I think that only a HEAPTUPLE_DEAD tuple is
potentially using a non-current schema. If the tuple is
HEAPTUPLE_INSERT_IN_PROGRESS, there's either no ADD COLUMN in the
inserting transaction, or that transaction committed before we got our
lock. Similarly if it's HEAPTUPLE_DELETE_IN_PROGRESS or
HEAPTUPLE_RECENTLY_DEAD, the original inserter must've committed
before we got our lock. Or if it's both inserted and deleted in the
same transaction, say, then that transaction committed before we got
our lock or else contains no relevant DDL. IOW, I think you can check
everything but dead tuples here.

Capitalization and punctuation for messages complaining about problems
need to be consistent. verify_heapam() has "Invalid redirect line
pointer offset %u out of bounds" which starts with a capital letter,
but check_tuphdr_xids() has "heap tuple with XMAX_IS_MULTI is neither
LOCKED_ONLY nor has a valid xmax" which does not. I vote for lower
case, but in any event it should be the same. Also,
check_tuphdr_xids() has "tuple xvac = %u invalid" which is either a
debugging leftover or a very unclear complaint. I think some real work
needs to be put into the phrasing of these messages so that it's more
clear exactly what is going on and why it's bad. For example the first
example in this paragraph is clearly a problem of some kind, but it's
not very clear exactly what is happening: is %u the offset of the
invalid line redirect or the value to which it points? I don't think
the phrasing is very grammatical, which makes it hard to tell which is
meant, and I actually think it would be a good idea to include both
things.

Project policy is generally against splitting a string across multiple
lines to fit within 80 characters. We like to fit within 80
characters, but we like to be able to grep for strings more, and
breaking them up like this makes that harder.

+               confess(ctx,
+                               pstrdup("corrupt toast chunk va_header"));

This is another message that I don't think is very clear. There's two
elements to that. One is that the phrasing is not very good, and the
other is that there are no % escapes. What's somebody going to do when
they see this message? First, they're probably going to have to look
at the code to figure out in which circumstances it gets generated;
that's a sign that the message isn't phrased clearly enough. That will
tell them that an unexpected bit pattern has been found, but not what
that unexpected bit pattern actually was. So then, they're going to
have to try to find the relevant va_header by some other means and
fish out the relevant bit so that they can see what actually went
wrong.

+ *   Checks the current attribute as tracked in ctx for corruption.  Records
+ *   any corruption found in ctx->corruption.
+ *
+ *

Extra blank line.

+       Form_pg_attribute thisatt = TupleDescAttr(RelationGetDescr(ctx->rel),
+
                   ctx->attnum);

Maybe you could avoid the line wrap by declaring this without
initializing it, and then initializing it as a separate statement.

+               confess(ctx, psprintf("t_hoff + offset > lp_len (%u + %u > %u)",
+
ctx->tuphdr->t_hoff, ctx->offset,
+                                                         ctx->lp_len));

Uggh! This isn't even remotely an English sentence. I don't think
formulas are the way to go here, but I like the idea of formulas in
some places and written-out messages in others even less. I guess the
complaint here in English is something like "tuple attribute %d should
start at offset %u, but tuple length is only %u" or something of that
sort. Also, it seems like this complaint really ought to have been
reported on the *preceding* loop iteration, either complaining that
(1) the fixed length attribute is more than the number of remaining
bytes in the tuple or (2) the varlena header for the tuple specifies
an excessively high length. It seems like you're blaming the wrong
attribute for the problem.

BTW, the header comments for this function (check_tuple_attribute)
neglect to document the meaning of the return value.

+                       confess(ctx, psprintf("tuple xmax = %u
precedes relation "
+
"relfrozenxid = %u",

This is another example of these messages needing work. The
corresponding message from heap_prepare_freeze_tuple() is "found
update xid %u from before relfrozenxid %u". That's better, because we
don't normally include equals signs in our messages like this, and
also because "relation relfrozenxid" is redundant. I think this should
say something like "tuple xmax %u precedes relfrozenxid %u".

+                       confess(ctx, psprintf("tuple xmax = %u is in
the future",
+                                                                 xmax));

And then this could be something like "tuple xmax %u follows
last-assigned xid %u". That would be more symmetric and more
informative.

+               if (SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) >
ctx->tuphdr->t_hoff)

I think we should be able to predict the exact value of t_hoff and
complain if it isn't precisely equal to the expected value. Or is that
not possible for some reason?
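A hedged sketch of that prediction, assuming 8-byte MAXALIGN, the usual 23-byte header (offsetof(HeapTupleHeaderData, t_bits) on common builds), and no legacy OID column; these constants are assumptions about a typical build, not guarantees:

```c
#include <assert.h>
#include <stddef.h>

#define MAXALIGN(LEN) (((size_t) (LEN) + 7) & ~(size_t) 7)	/* 8-byte alignment assumed */
#define BITMAPLEN(N)  (((N) + 7) / 8)
#define SIZEOF_HEAP_TUPLE_HEADER 23	/* offsetof(HeapTupleHeaderData, t_bits) */

/*
 * The exact t_hoff a well-formed tuple should carry: header size, plus a
 * null bitmap when HEAP_HASNULL is set, rounded up to alignment.
 */
size_t
expected_t_hoff(int natts, int has_nulls)
{
	size_t		len = SIZEOF_HEAP_TUPLE_HEADER;

	if (has_nulls)
		len += BITMAPLEN(natts);
	return MAXALIGN(len);
}
```

Under this model a checker could complain whenever the stored t_hoff differs from the computed value, rather than only when it is too small.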

Is there some place that's checking that lp_len >=
SizeOfHeapTupleHeader before check_tuple() goes and starts poking into
the header? If not, there should be.

+$node->command_ok(

+       [
+               'pg_amcheck', '-p', $port, 'postgres'
+       ],
+       'pg_amcheck all schemas and tables implicitly');
+
+$node->command_ok(
+       [
+               'pg_amcheck', '-i', '-p', $port, 'postgres'
+       ],
+       'pg_amcheck all schemas, tables and indexes');

I haven't really looked through the btree-checking and pg_amcheck
parts of this much yet, but this caught my eye. Why would the default
be to check tables but not indexes? I think the default ought to be to
check everything we know how to check.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: new heapcheck contrib module

Michael Paquier-2
In reply to this post by Tom Lane-2
On Thu, May 14, 2020 at 03:50:52PM -0400, Tom Lane wrote:
> I think there's definitely value in corrupting data in some predictable
> (reproducible) way and verifying that the check code catches it and
> responds as expected.  Sure, this will not be 100% coverage, but it'll be
> a lot better than 0% coverage.

Skimming quickly through the patch, that's what is done in a way
similar to pg_checksums's 002_actions.pl.  So it seems fine to me to
use something like that for some basic coverage.  We may want to
refactor the test APIs to unify all that though.
--
Michael


Re: new heapcheck contrib module

Mark Dilger-5
In reply to this post by Robert Haas


> On Jul 16, 2020, at 12:38 PM, Robert Haas <[hidden email]> wrote:
>
> On Mon, Jul 6, 2020 at 2:06 PM Mark Dilger <[hidden email]> wrote:
>> The v10 patch without these ideas is here:
>
> Along the lines of what Alvaro was saying before, I think this
> definitely needs to be split up into a series of patches. The commit
> message for v10 describes it doing three pretty separate things, and I
> think that argues for splitting it into a series of three patches. I'd
> argue for this ordering:
>
> 0001 Refactoring existing amcheck btree checking functions to optionally
> return corruption information rather than ereport'ing it.  This is
> used by the new pg_amcheck command line tool for reporting back to
> the caller.
>
> 0002 Adding new function verify_heapam for checking a heap relation and
> associated toast relation, if any, to contrib/amcheck.
>
> 0003 Adding new contrib module pg_amcheck, which is a command line
> interface for running amcheck's verifications against tables and
> indexes.
>
> It's too hard to review things like this when it's all mixed together.
The v11 patch series is broken up as you suggest.

> +++ b/contrib/amcheck/t/skipping.pl
>
> The name of this file is inconsistent with the tree's usual
> convention, which is all stuff like 001_whatever.pl, except for
> src/test/modules/brin, which randomly decided to use two digits
> instead of three. There's no precedent for a test file with no leading
> numeric digits. Also, what does "skipping" even have to do with what
> the test is checking? Maybe it's intended to refer to the new error
> handling "skipping" the actual error in favor of just reporting it
> without stopping, but that's not really what the word "skipping"
> normally means. Finally, it seems a bit over-engineered: do we really
> need 183 test cases to check that detecting a problem doesn't lead to
> an abort? Like, if that's the purpose of the test, I'd expect it to
> check one corrupt relation and one non-corrupt relation, each with and
> without the no-error behavior. And that's about it. Or maybe it's
> talking about skipping pages during the checks, because those pages
> are all-visible or all-frozen? It's not very clear to me what's going
> on here.
The "skipping" did originally refer to testing verify_heapam()'s option to skip all-visible or all-frozen blocks.  I have renamed it 001_verify_heapam.pl, since it tests that function.

>
> + TransactionId nextKnownValidXid;
> + TransactionId oldestValidXid;
>
> Please add explanatory comments indicating what these are intended to
> mean.

Done.

> For most of the structure members, the brief comments
> already present seem sufficient; but here, more explanation looks
> necessary and less is provided. The "Values for returning tuples"
> could possibly also use some more detail.

Ok, I've expanded the comments for these.

> +#define HEAPCHECK_RELATION_COLS 8
>
> I think this should really be at the top of the file someplace.
> Sometimes people have adopted this style when the #define is only used
> within the function that contains it, but that's not the case here.

Done.

>
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("unrecognized parameter for 'skip': %s", skip),
> + errhint("please choose from 'all visible', 'all frozen', "
> + "or NULL")));
>
> I think it would be better if we had three string values selecting the
> different behaviors, and made the parameter NOT NULL but with a
> default. It seems like that would be easier to understand. Right now,
> I can tell that my options for what to skip are "all visible", "all
> frozen", and, uh, some other thing that I don't know what it is. I'm
> gonna guess the third option is to skip nothing, but it seems best to
> make that explicit. Also, should we maybe consider spelling this
> 'all-visible' and 'all-frozen' with dashes, instead of using spaces?
> Spaces in an option value seems a little icky to me somehow.
I've made the options 'all-visible', 'all-frozen', and 'none'.  It defaults to 'none'.  I did not mark the function as strict, as I think NULL is a reasonable value (and the default) for startblock and endblock.  

> + int64 startblock = -1;
> + int64 endblock = -1;
> ...
> + if (!PG_ARGISNULL(3))
> + startblock = PG_GETARG_INT64(3);
> + if (!PG_ARGISNULL(4))
> + endblock = PG_GETARG_INT64(4);
> ...
> + if (startblock < 0)
> + startblock = 0;
> + if (endblock < 0 || endblock > ctx.nblocks)
> + endblock = ctx.nblocks;
> +
> + for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
>
> So, the user can specify a negative value explicitly and it will be
> treated as the default, and an endblock value that's larger than the
> relation size will be treated as the relation size. The way pg_prewarm
> does the corresponding checks seems superior: null indicates the
> default value, and any non-null value must be within range or you get
> an error. Also, you seem to be treating endblock as the first block
> that should not be checked, whereas pg_prewarm takes what seems to me
> to be the more natural interpretation: the end block is the last block
> that IS checked. If you do it this way, then someone who specifies the
> same start and end block will check no blocks -- silently, I think.
Under that regime, for relations with one block of data, (startblock=0, endblock=0) means "check the zero'th block", and for relations with no blocks of data, specifying any non-null (startblock,endblock) pair raises an exception.  I don't like that too much, but I'm happy to defer to precedent.  Since you say pg_prewarm works this way (I did not check), I have changed verify_heapam to do likewise.
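The pg_prewarm-style regime described above can be sketched as follows.  This is an illustration of the semantics, not the patch's code: -1 stands in for SQL NULL, the end block is the *last* block checked (inclusive), and explicit out-of-range bounds are an error rather than being silently clamped.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Resolve a (startblock, endblock) pair against a relation of nblocks
 * blocks.  Returns the number of blocks that would be checked, or -1 when
 * an explicitly supplied bound is out of range.
 */
int64_t
blocks_to_check(int64_t start, int64_t end, int64_t nblocks)
{
	if (start != -1 && (start < 0 || start >= nblocks))
		return -1;				/* explicit start block out of range */
	if (end != -1 && (end < 0 || end >= nblocks))
		return -1;				/* explicit end block out of range */
	if (start == -1)
		start = 0;
	if (end == -1)
		end = nblocks - 1;
	return (end >= start) ? end - start + 1 : 0;
}
```

Note that equal start and end blocks check exactly one block, and that on an empty relation any explicit bound errors out while the NULL defaults check nothing.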

> +               if (skip_all_frozen || skip_all_visible)
>
> Since you can't skip all frozen without skipping all visible, this
> test could be simplified. Or you could introduce a three-valued enum
> and test that skip_pages != SKIP_PAGES_NONE, which might be even
> better.

It works now with a three-valued enum.
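A minimal sketch of that enum; the names here are assumptions and v11 may spell them differently.  Because every all-frozen page is also all-visible, a single comparison against the "none" value replaces the old two-flag test.

```c
#include <assert.h>

/* Assumed spelling of the three-valued skip-pages setting. */
typedef enum SkipPages
{
	SKIP_PAGES_NONE,
	SKIP_PAGES_ALL_VISIBLE,
	SKIP_PAGES_ALL_FROZEN
} SkipPages;

/* Replaces the old (skip_all_frozen || skip_all_visible) test. */
int
should_consider_skipping(SkipPages skip)
{
	return skip != SKIP_PAGES_NONE;
}
```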

> + /* We must unlock the page from the prior iteration, if any */
> + Assert(ctx.blkno == InvalidBlockNumber || ctx.buffer != InvalidBuffer);
>
> I don't understand this assertion, and I don't understand the comment,
> either. I think ctx.blkno can never be equal to InvalidBlockNumber
> because we never set it to anything outside the range of 0..(endblock
> - 1), and I think ctx.buffer must always be unequal to InvalidBuffer
> because we just initialized it by calling ReadBufferExtended(). So I
> think this assertion would still pass if we wrote && rather than ||.
> But even then, I don't know what that has to do with the comment or
> why it even makes sense to have an assertion for that in the first
> place.
Yes, it is vestigial.  Removed.

> +       /*
> +        * Open the relation.  We use ShareUpdateExclusive to prevent concurrent
> +        * vacuums from changing the relfrozenxid, relminmxid, or advancing the
> +        * global oldestXid to be newer than those.  This protection
> saves us from
> +        * having to reacquire the locks and recheck those minimums for every
> +        * tuple, which would be expensive.
> +        */
> +       ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
>
> I don't think we'd need to recheck for every tuple, would we? Just for
> cases where there's an apparent violation of the rules.
It's a bit fuzzy what an "apparent violation" might be if both ends of the range of valid xids may be moving, and arbitrarily much.  It's also not clear how often to recheck, since you'd be dealing with a race condition no matter how often you check.  Perhaps the comments shouldn't mention how often you'd have to recheck, since there is no really defensible choice for that.  I removed the offending sentence.

> I guess that
> could still be expensive if there's a lot of them, but needing
> ShareUpdateExclusiveLock rather than only AccessShareLock is a little
> unfortunate.

I welcome strategies that would allow for taking a lesser lock.

> It's also unclear to me why this concerns itself with relfrozenxid and
> the cluster-wide oldestXid value but not with datfrozenxid. It seems
> like if we're going to sanity-check the relfrozenxid against the
> cluster-wide value, we ought to also check it against the
> database-wide value. Checking neither would also seem like a plausible
> choice. But it seems very strange to only check against the
> cluster-wide value.

If the relation has a normal relfrozenxid, then the oldest valid xid we can encounter in the table is relfrozenxid.  Otherwise, each row needs to be compared against some other minimum xid value.

Logically, that other minimum xid value should be the oldest valid xid for the database, which must logically be at least as old as any valid row in the table and no older than the oldest valid xid for the cluster.

Unfortunately, if the comments in commands/vacuum.c circa line 1572 can be believed, and if I am reading them correctly, the stored value for the oldest valid xid in the database has been known to be corrupted by bugs in pg_upgrade.  This is awful.  If I compare the xid of a row in a table against the oldest xid value for the database, and the xid of the row is older, what can I do?  I don't have a principled basis for determining which one of them is wrong.  

The logic in verify_heapam is conservative; it makes no guarantees about finding and reporting all corruption, but if it does report a row as corrupt, you can bank on that, bugs in verify_heapam itself notwithstanding.  I think this is a good choice; a tool with only false negatives is much more useful than one with both false positives and false negatives.  

I have added a comment about my reasoning to verify_heapam.c.  I'm happy to be convinced of a better strategy for handling this situation.
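That conservative policy can be sketched in miniature.  This is illustrative only: it ignores 32-bit xid wraparound (the real code must use TransactionIdPrecedes-style modular comparisons), and the bound names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Report a tuple xid as corrupt only when it falls outside the range we can
 * vouch for: at or after the oldest bound we trust (relfrozenxid) and
 * strictly before the next xid to be assigned.  This avoids false positives
 * at the cost of possible false negatives.
 */
int
xid_clearly_corrupt(TransactionId xid,
					TransactionId relfrozenxid,
					TransactionId next_xid)
{
	if (xid < relfrozenxid)
		return 1;				/* precedes the trusted minimum */
	if (xid >= next_xid)
		return 1;				/* in the future */
	return 0;					/* cannot prove corruption */
}
```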

>
> +               StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
> +                                                "InvalidOffsetNumber
> increments to FirstOffsetNumber");
>
> If you are going to rely on this property, I agree that it is good to
> check it. But it would be better to NOT rely on this property, and I
> suspect the code can be written quite cleanly without relying on it.
> And actually, that's what you did, because you first set ctx.offnum =
> InvalidOffsetNumber but then just after that you set ctx.offnum = 0 in
> the loop initializer. So AFAICS the first initializer, and the static
> assert, are pointless.
Ah, right you are.  Removed.

>
> +                       if (ItemIdIsRedirected(ctx.itemid))
> +                       {
> +                               uint16 redirect = ItemIdGetRedirect(ctx.itemid);
> +                               if (redirect <= SizeOfPageHeaderData
> || redirect >= ph->pd_lower)
> ...
> +                               if ((redirect - SizeOfPageHeaderData)
> % sizeof(uint16))
>
> I think that ItemIdGetRedirect() returns an offset, not a byte
> position. So the expectation that I would have is that it would be any
> integer >= 0 and <= maxoff. Am I confused?
I think you are right about it returning an offset, which should be between FirstOffsetNumber and maxoff, inclusive.  I have updated the checks.

> BTW, it seems like it might
> be good to complain if the item to which it points is LP_UNUSED...
> AFAIK that shouldn't happen.

Thanks for mentioning that.  It now checks for that.
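For clarity, the revised check now treats the redirect value as an offset number, something along these lines.  The names here are illustrative stand-ins, not the actual amcheck code, and the target's lp_flags are passed in directly rather than read from the page:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FirstOffsetNumber 1
#define LP_UNUSED 0

/*
 * Hypothetical sketch of the redirect check: the redirect must fall
 * within FirstOffsetNumber .. maxoff, and the item it points at must
 * not be LP_UNUSED.
 */
static bool
redirect_is_sane(uint16_t redirect, uint16_t maxoff, int target_lp_flags)
{
    if (redirect < FirstOffsetNumber || redirect > maxoff)
        return false;                   /* offset number out of bounds */
    if (target_lp_flags == LP_UNUSED)
        return false;                   /* points at an unused slot */
    return true;
}
```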

> +                                errmsg("\"%s\" is not a heap AM",
>
> I think the correct wording would be just "is not a heap." The "heap
> AM" is the thing in pg_am, not a specific table.

Fixed.

> +confess(HeapCheckContext * ctx, char *msg)
> +TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
> +check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
>
> This is what happens when you pgindent without adding all the right
> things to typedefs.list first ... or when you don't pgindent and have
> odd ideas about how to indent things.

Hmm.  I don't see the three lines of code you are quoting.  Which patch is that from?

>
> +       /*
> +        * In principle, there is nothing to prevent a scan over a large, highly
> +        * corrupted table from using workmem worth of memory building up the
> +        * tuplestore.  Don't leak the msg argument memory.
> +        */
> +       pfree(msg);
>
> Maybe change the second sentence to something like: "That should be
> OK, else the user can lower work_mem, but we'd better not leak any
> additional memory."
It may be a little wordy, but I went with

    /*
     * In principle, there is nothing to prevent a scan over a large, highly
     * corrupted table from using work_mem worth of memory building up the
     * tuplestore.  That's ok, but if we also leak the msg argument memory
     * until the end of the query, we could exceed work_mem by more than a
     * trivial amount.  Therefore, free the msg argument each time we are
     * called rather than waiting for our current memory context to be freed.
     */

> +/*
> + * check_tuphdr_xids
> + *
> + *     Determine whether tuples are visible for verification.  Similar to
> + *  HeapTupleSatisfiesVacuum, but with critical differences.
> + *
> + *  1) Does not touch hint bits.  It seems imprudent to write hint bits
> + *     to a table during a corruption check.
> + *  2) Only makes a boolean determination of whether verification should
> + *     see the tuple, rather than doing extra work for vacuum-related
> + *     categorization.
> + *
> + *  The caller should already have checked that xmin and xmax are not out of
> + *  bounds for the relation.
> + */
>
> First, check_tuphdr_xids() doesn't seem like a very good name. If you
> have a function with that name and, like this one, it returns Boolean,
> what does true mean? What does false mean? Kinda hard to tell. And
> also, check the tuple header XIDs *for what*? If you called it, say,
> tuple_is_visible(), that would be self-evident.
Changed.

> Second, consider that we hold at least AccessShareLock on the relation
> - actually, ATM we hold ShareUpdateExclusiveLock. Either way, there
> cannot be a concurrent modification to the tuple descriptor in
> progress. Therefore, I think that only a HEAPTUPLE_DEAD tuple is
> potentially using a non-current schema. If the tuple is
> HEAPTUPLE_INSERT_IN_PROGRESS, there's either no ADD COLUMN in the
> inserting transaction, or that transaction committed before we got our
> lock. Similarly if it's HEAPTUPLE_DELETE_IN_PROGRESS or
> HEAPTUPLE_RECENTLY_DEAD, the original inserter must've committed
> before we got our lock. Or if it's both inserted and deleted in the
> same transaction, say, then that transaction committed before we got
> our lock or else contains no relevant DDL. IOW, I think you can check
> everything but dead tuples here.
Ok, I have changed tuple_is_visible to return true rather than false for those other cases.
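The resulting rule reduces to something very simple.  The sketch below is an illustration of the decision only, using an enum mirroring PostgreSQL's HTSV_Result categories; the real tuple_is_visible must also classify the tuple itself, without writing hint bits:

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors PostgreSQL's HTSV_Result categories */
typedef enum
{
    HEAPTUPLE_DEAD,
    HEAPTUPLE_LIVE,
    HEAPTUPLE_RECENTLY_DEAD,
    HEAPTUPLE_INSERT_IN_PROGRESS,
    HEAPTUPLE_DELETE_IN_PROGRESS
} HTSV_Result;

/*
 * Hypothetical sketch: since we hold at least AccessShareLock, only a
 * HEAPTUPLE_DEAD tuple can predate the current tuple descriptor, so
 * every other category is safe to verify.
 */
static bool
tuple_is_visible_sketch(HTSV_Result status)
{
    return status != HEAPTUPLE_DEAD;
}
```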

> Capitalization and punctuation for messages complaining about problems
> need to be consistent. verify_heapam() has "Invalid redirect line
> pointer offset %u out of bounds" which starts with a capital letter,
> but check_tuphdr_xids() has "heap tuple with XMAX_IS_MULTI is neither
> LOCKED_ONLY nor has a valid xmax" which does not. I vote for lower
> case, but in any event it should be the same.

I standardized on all lowercase text, though I left embedded symbols and constants such as LOCKED_ONLY alone.

> Also,
> check_tuphdr_xids() has "tuple xvac = %u invalid" which is either a
> debugging leftover or a very unclear complaint.

Right.  That has been changed to "old-style VACUUM FULL transaction ID %u is invalid in this relation".

> I think some real work
> needs to be put into the phrasing of these messages so that it's more
> clear exactly what is going on and why it's bad. For example the first
> example in this paragraph is clearly a problem of some kind, but it's
> not very clear exactly what is happening: is %u the offset of the
> invalid line redirect or the value to which it points? I don't think
> the phrasing is very grammatical, which makes it hard to tell which is
> meant, and I actually think it would be a good idea to include both
> things.

Beware that every row returned from amcheck has more fields than just the error message.

    blkno OUT bigint,
    offnum OUT integer,
    lp_off OUT smallint,
    lp_flags OUT smallint,
    lp_len OUT smallint,
    attnum OUT integer,
    chunk OUT integer,
    msg OUT text

Rather than including blkno, offnum, lp_off, lp_flags, lp_len, attnum, or chunk in the message, it would be better to remove these things from messages that include them.  For the specific message under consideration, I've converted the text to "line pointer redirection to item at offset number %u is outside valid bounds %u .. %u".  That avoids duplicating the offset information of the referring item, while reporting the offset of the referred item.

> Project policy is generally against splitting a string across multiple
> lines to fit within 80 characters. We like to fit within 80
> characters, but we like to be able to grep for strings more, and
> breaking them up like this makes that harder.

Thanks for clarifying the project policy.  I joined these message strings back together.

> +               confess(ctx,
> +                               pstrdup("corrupt toast chunk va_header"));
>
> This is another message that I don't think is very clear. There's two
> elements to that. One is that the phrasing is not very good, and the
> other is that there are no % escapes

Changed to "corrupt extended toast chunk with sequence number %d has invalid varlena header %0x".  I think all the other information about where the corruption was found is already present in the other returned columns.

> What's somebody going to do when
> they see this message? First, they're probably going to have to look
> at the code to figure out in which circumstances it gets generated;
> that's a sign that the message isn't phrased clearly enough. That will
> tell them that an unexpected bit pattern has been found, but not what
> that unexpected bit pattern actually was. So then, they're going to
> have to try to find the relevant va_header by some other means and
> fish out the relevant bit so that they can see what actually went
> wrong.

Right.

>
> + *   Checks the current attribute as tracked in ctx for corruption.  Records
> + *   any corruption found in ctx->corruption.
> + *
> + *
>
> Extra blank line.

Fixed.

> +       Form_pg_attribute thisatt = TupleDescAttr(RelationGetDescr(ctx->rel),
> +
>                   ctx->attnum);
>
> Maybe you could avoid the line wrap by declaring this without
> initializing it, and then initializing it as a separate statement.

Yes, I like that better.  I did not need to do the same with infomask, but it looks better to me to break the declaration and initialization for both, so I did that.

>
> +               confess(ctx, psprintf("t_hoff + offset > lp_len (%u + %u > %u)",
> +
> ctx->tuphdr->t_hoff, ctx->offset,
> +                                                         ctx->lp_len));
>
> Uggh! This isn't even remotely an English sentence. I don't think
> formulas are the way to go here, but I like the idea of formulas in
> some places and written-out messages in others even less. I guess the
> complaint here in English is something like "tuple attribute %d should
> start at offset %u, but tuple length is only %u" or something of that
> sort. Also, it seems like this complaint really ought to have been
> reported on the *preceding* loop iteration, either complaining that
> (1) the fixed length attribute is more than the number of remaining
> bytes in the tuple or (2) the varlena header for the tuple specifies
> an excessively high length. It seems like you're blaming the wrong
> attribute for the problem.
Yeah, and it wouldn't complain if the final attribute of a tuple was overlong, as there wouldn't be a next attribute to blame it on.  I've changed it to report as you suggest, although it also still complains if the first attribute starts outside the bounds of the tuple.  The two error messages now read as "tuple attribute should start at offset %u, but tuple length is only %u" and "tuple attribute of length %u ends at offset %u, but tuple length is only %u".
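The two checks boil down to the following shape.  This is a simplified illustration, not the actual check_tuple_attribute code; the real function also has to deal with varlena headers, alignment padding, and null bitmap lookups:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the revised attribute bounds checks: complain
 * if an attribute starts past the end of the tuple, or if its declared
 * length runs past the end.
 */
static bool
attribute_fits(uint32_t att_start, uint32_t att_len, uint32_t tuple_len)
{
    if (att_start > tuple_len)
        return false;   /* "tuple attribute should start at offset %u,
                         * but tuple length is only %u" */
    if (att_len > tuple_len - att_start)
        return false;   /* "tuple attribute of length %u ends at offset
                         * %u, but tuple length is only %u" */
    return true;
}
```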

> BTW, the header comments for this function (check_tuple_attribute)
> neglect to document the meaning of the return value.

Fixed.

> +                       confess(ctx, psprintf("tuple xmax = %u
> precedes relation "
> +
> "relfrozenxid = %u",
>
> This is another example of these messages needing  work. The
> corresponding message from heap_prepare_freeze_tuple() is "found
> update xid %u from before relfrozenxid %u". That's better, because we
> don't normally include equals signs in our messages like this, and
> also because "relation relfrozenxid" is redundant. I think this should
> say something like "tuple xmax %u precedes relfrozenxid %u".
>
> +                       confess(ctx, psprintf("tuple xmax = %u is in
> the future",
> +                                                                 xmax));
>
> And then this could be something like "tuple xmax %u follows
> last-assigned xid %u". That would be more symmetric and more
> informative.
Both of these have been changed.

> +               if (SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) >
> ctx->tuphdr->t_hoff)
>
> I think we should be able to predict the exact value of t_hoff and
> complain if it isn't precisely equal to the expected value. Or is that
> not possible for some reason?

That is possible, and I've updated the error message to match.  There are cases where you can't know if the HEAP_HASNULL bit is wrong or if the t_hoff value is wrong, but I've changed the code to just compute the length based on the HEAP_HASNULL setting and use that as the expected value, and complain when the actual value does not match the expected value.  That sidesteps the problem of not knowing exactly which value to blame.
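The expected-value computation looks roughly like this.  The macros below are simplified stand-ins for the real ones: 23 is the usual SizeofHeapTupleHeader and MAXIMUM_ALIGNOF is assumed to be 8, neither of which is guaranteed on every platform:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's macros */
#define SIZEOF_HEAP_TUPLE_HEADER 23
#define BITMAPLEN(natts)   (((natts) + 7) / 8)
#define MAXALIGN(len)      (((len) + 7) & ~((uintptr_t) 7))

/*
 * Hypothetical sketch of the exact-t_hoff check: compute the expected
 * header size from the HEAP_HASNULL bit and the attribute count, and
 * require t_hoff to match it exactly.
 */
static bool
t_hoff_is_expected(uint8_t t_hoff, bool has_nulls, int natts)
{
    uintptr_t expected = SIZEOF_HEAP_TUPLE_HEADER;

    if (has_nulls)
        expected += BITMAPLEN(natts);
    expected = MAXALIGN(expected);
    return t_hoff == expected;
}
```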

> Is there some place that's checking that lp_len >=
> SizeOfHeapTupleHeader before check_tuple() goes and starts poking into
> the header? If not, there should be.

Good catch.  check_tuple() now does that before reading the header.

> +$node->command_ok(
>
> +       [
> +               'pg_amcheck', '-p', $port, 'postgres'
> +       ],
> +       'pg_amcheck all schemas and tables implicitly');
> +
> +$node->command_ok(
> +       [
> +               'pg_amcheck', '-i', '-p', $port, 'postgres'
> +       ],
> +       'pg_amcheck all schemas, tables and indexes');
>
> I haven't really looked through the btree-checking and pg_amcheck
> parts of this much yet, but this caught my eye. Why would the default
> be to check tables but not indexes? I think the default ought to be to
> check everything we know how to check.
I have changed the default to match your expectations.





Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




v11-0001-Adding-function-verify_btreeam-and-bumping-modul.patch (92K) Download Attachment
v11-0002-Adding-function-verify_heapam-to-amcheck-module.patch (82K) Download Attachment
v11-0003-Adding-contrib-module-pg_amcheck.patch (79K) Download Attachment