new heapcheck contrib module


Re: new heapcheck contrib module

Mark Dilger-5


> On Aug 24, 2020, at 2:48 AM, Amul Sul <[hidden email]> wrote:
>
> Few comments for v14 version:
>
> v14-0001:
>
> verify_heapam.c: In function ‘verify_heapam’:
> verify_heapam.c:339:14: warning: variable ‘ph’ set but not used
> [-Wunused-but-set-variable]
>   PageHeader ph;
>              ^
> verify_heapam.c: In function ‘check_toast_tuple’:
> verify_heapam.c:877:8: warning: variable ‘chunkdata’ set but not used
> [-Wunused-but-set-variable]
>  char    *chunkdata;
>
> Got these compilation warnings
Removed.

>
>
> +++ b/contrib/amcheck/amcheck.h
> @@ -0,0 +1,5 @@
> +#include "postgres.h"
> +
> +Datum verify_heapam(PG_FUNCTION_ARGS);
> +Datum bt_index_check(PG_FUNCTION_ARGS);
> +Datum bt_index_parent_check(PG_FUNCTION_ARGS);
>
> bt_index_* are needed?
This entire header file is not needed.  Removed.

> #include "access/htup_details.h"
> #include "access/xact.h"
> #include "catalog/pg_type.h"
> #include "catalog/storage_xlog.h"
> #include "storage/smgr.h"
> #include "utils/lsyscache.h"
> #include "utils/rel.h"
> #include "utils/snapmgr.h"
> #include "utils/syscache.h"
>
> These header file inclusion to verify_heapam.c. can be omitted. Some of those
> might be implicitly got added by other header files or no longer need due to
> recent changes.
Removed.


> + *   on_error_stop:
> + *     Whether to stop at the end of the first page for which errors are
> + *     detected.  Note that multiple rows may be returned.
> + *
> + *   check_toast:
> + *     Whether to check each toasted attribute against the toast table to
> + *     verify that it can be found there.
> + *
> + *   skip:
> + *     What kinds of pages in the heap relation should be skipped.  Valid
> + *     options are "all-visible", "all-frozen", and "none".
>
> I think it would be good if the description also includes what will be default
> value otherwise.
The defaults are defined in amcheck--1.2--1.3.sql, and I was concerned that documenting them in verify_heapam.c would create a hazard of the defaults and their documented values getting out of sync.  The handling of null arguments in verify_heapam.c was, however, duplicating the defaults from the .sql file, so I've changed that to just ereport error on null.  (I can't make the whole function strict, as some other arguments are allowed to be null.)  I have not documented the defaults in either file, as they are quite self-evident in the .sql file.  I've updated some tests that were passing null to get the default behavior to now either pass nothing or explicitly pass the argument they want.

>
> + /*
> + * Optionally open the toast relation, if any, also protected from
> + * concurrent vacuums.
> + */
>
> Now lock is changed to AccessShareLock, I think we need to rephrase this comment
> as well since we are not really doing anything extra explicitly to protect from
> the concurrent vacuum.

Right.  Comment changed.

> +/*
> + * Return wehter a multitransaction ID is in the cached valid range.
> + */
>
> Typo: s/wehter/whether

Changed.

> v14-0002:
>
> +#define NOPAGER 0
>
> Unused macro.

Removed.

> + appendPQExpBuffer(querybuf,
> +   "SELECT c.relname, v.blkno, v.offnum, v.attnum, v.msg"
> +   "\nFROM public.verify_heapam("
> + "\nrelation := %u,"
> + "\non_error_stop := %s,"
> + "\nskip := %s,"
> + "\ncheck_toast := %s,"
> + "\nstartblock := %s,"
> + "\nendblock := %s) v, "
> + "\npg_catalog.pg_class c"
> +   "\nWHERE c.oid = %u",
> +   tbloid, stop, skip, toast, startblock, endblock, tbloid);
> [....]
> + appendPQExpBuffer(querybuf,
> +   "SELECT public.bt_index_parent_check('%s'::regclass, %s, %s)",
> +   idxoid,
> +   settings.heapallindexed ? "true" : "false",
> +   settings.rootdescend ? "true" : "false");
>
> The assumption that the amcheck extension will be always installed in the public
> schema doesn't seem to be correct. This will not work if amcheck install
> somewhere else.
Right.  I removed the schema qualification, leaving it up to the search path.  

Thanks for the review!




Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Attachment: v15-0001-Adding-function-verify_heapam-to-amcheck-module.patch (112K)
Attachment: v15-0002-Adding-contrib-module-pg_amcheck.patch (118K)

Re: new heapcheck contrib module

Andrey Borodin-2


> On Aug 25, 2020, at 19:36, Mark Dilger <[hidden email]> wrote:

Hi Mark!

Thanks for working on this important feature.

I was experimenting a bit with our internal heapcheck and found out that it does not help with truncated CLOG at all.
Will your module be able to gather TIDs of similar corruptions?

server/db M # select * from heap_check('pg_toast.pg_toast_4848601');
ERROR:  58P01: could not access status of transaction 636558742
DETAIL:  Could not open file "pg_xact/025F": No such file or directory.
LOCATION:  SlruReportIOError, slru.c:913
Time: 3439.915 ms (00:03.440)

Thanks!

Best regards, Andrey Borodin.


Re: new heapcheck contrib module

Robert Haas
On Fri, Aug 28, 2020 at 1:07 AM Andrey M. Borodin <[hidden email]> wrote:
> I was experimenting a bit with our internal heapcheck and found out that it does not help with truncated CLOG at all.
> Will your module be able to gather TIDs of similar corruptions?
>
> server/db M # select * from heap_check('pg_toast.pg_toast_4848601');
> ERROR:  58P01: could not access status of transaction 636558742
> DETAIL:  Could not open file "pg_xact/025F": No such file or directory.
> LOCATION:  SlruReportIOError, slru.c:913
> Time: 3439.915 ms (00:03.440)

This kind of thing gets really tricky. PostgreSQL uses errors in tons
of places to report problems, and if you want to accumulate a list of
errors and report them all rather than just letting the first one
cancel the operation, you need special handling for each individual
error you want to bypass. A tool like this naturally wants to use as
much PostgreSQL infrastructure as possible, to avoid duplicating a ton
of code and creating a bloated monstrosity, but all that code can
throw errors. I think the code in its current form is trying to be
resilient against problems on the table pages that it is actually
checking, but it can't necessarily handle gracefully corruption in
other parts of the system. For instance:

- CLOG could be truncated, as in your example
- the disk files could have had their permissions changed so that they
can't be accessed
- the PageIsVerified() check might fail when pages are read
- the TOAST table's metadata in pg_class/pg_attribute/etc. could be corrupted
- ...or the files for those system catalogs could've had their
permissions changed
- ....or they could contain invalid pages
- ...or their indexes could be messed up

I think there are probably a bunch more, and I don't think it's
practical to allow this tool to continue after arbitrary stuff goes
wrong. It'll be too much code and impossible to maintain. In the case
you mention, I think we should view that as a problem with clog rather
than a problem with the table, and thus out of scope.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: new heapcheck contrib module

Mark Dilger-5
In reply to this post by Andrey Borodin-2


> On Aug 27, 2020, at 10:07 PM, Andrey M. Borodin <[hidden email]> wrote:
>
>
>
>> On Aug 25, 2020, at 19:36, Mark Dilger <[hidden email]> wrote:
>
> Hi Mark!
>
> Thanks for working on this important feature.
>
> I was experimenting a bit with our internal heapcheck and found out that it does not help with truncated CLOG at all.
> Will your module be able to gather TIDs of similar corruptions?
>
> server/db M # select * from heap_check('pg_toast.pg_toast_4848601');
> ERROR:  58P01: could not access status of transaction 636558742
> DETAIL:  Could not open file "pg_xact/025F": No such file or directory.
> LOCATION:  SlruReportIOError, slru.c:913
> Time: 3439.915 ms (00:03.440)

The design principle for verify_heapam.c is, if the rest of the system is not corrupt, corruption in the table being checked should not cause a crash during the table check. This is a very limited principle.  Even corruption in the associated toast table or toast index could cause a crash.  That is why checking against the toast table is optional, and false by default.

Perhaps a more extensive effort could be made later.  I think it is out of scope for this release cycle.  It is a very interesting area for further research, though.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: new heapcheck contrib module

Andrey Borodin-2
In reply to this post by Robert Haas


> On Aug 28, 2020, at 18:58, Robert Haas <[hidden email]> wrote:
> In the case
> you mention, I think we should view that as a problem with clog rather
> than a problem with the table, and thus out of scope.

I don't think so. ISTM it's actually the same problem of xmax < relfrozenxid, just hidden behind detoasting.
Our regular heap_check was checking xmin/xmax invariants for tables, but failed to recognise the problem in toast (while toast was accessible until CLOG truncation).

Best regards, Andrey Borodin.


Re: new heapcheck contrib module

Mark Dilger-5


> On Aug 28, 2020, at 11:10 AM, Andrey M. Borodin <[hidden email]> wrote:
>
>
>
>> On Aug 28, 2020, at 18:58, Robert Haas <[hidden email]> wrote:
>> In the case
>> you mention, I think we should view that as a problem with clog rather
>> than a problem with the table, and thus out of scope.
>
> I don't think so. ISTM it's actually the same problem of xmax < relfrozenxid, just hidden behind detoasting.
> Our regular heap_check was checking xmin/xmax invariants for tables, but failed to recognise the problem in toast (while toast was accessible until CLOG truncation).
>
> Best regards, Andrey Borodin.

If you lock the relations involved, check the toast table first, the toast index second, and the main table third, do you still get the problem?  Look at how pg_amcheck handles this and let me know if you still see a problem.  There is the ever present problem that external forces, like a rogue process deleting backend files, will strike at precisely the wrong moment, but barring that kind of concurrent corruption, I think the toast table being checked prior to the main table being checked solves some of the issues you are worried about.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: new heapcheck contrib module

Robert Haas
In reply to this post by Andrey Borodin-2
On Fri, Aug 28, 2020 at 2:10 PM Andrey M. Borodin <[hidden email]> wrote:
> I don't think so. ISTM it's actually the same problem of xmax < relfrozenxid, just hidden behind detoasting.
> Our regular heap_check was checking xmin/xmax invariants for tables, but failed to recognise the problem in toast (while toast was accessible until CLOG truncation).

The code can (and should, and I think does) refrain from looking up
XIDs that are out of the range thought to be valid -- but how do you
propose that it avoid looking up XIDs that ought to have clog data
associated with them despite being >= relfrozenxid and < nextxid?
TransactionIdDidCommit() does not have a suppress-errors flag, adding
one would be quite invasive, yet we cannot safely perform a
significant number of checks without knowing whether the inserting
transaction committed.
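The range check Robert describes can be made concrete. As a hedged illustration (standalone C; the names `xid_precedes` and `xid_in_expected_range` are invented here, not taken from the patch), the wraparound-aware comparison behind PostgreSQL's TransactionIdPrecedes() reduces to the sign of a 32-bit difference, and the pre-clog sanity check is then a half-open range test:

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Wraparound-aware XID comparison for normal XIDs, modeled on
 * PostgreSQL's TransactionIdPrecedes(): the XID space is a circle
 * modulo 2^32, so ordering is defined by the sign of the 32-bit
 * difference between the two values.
 */
bool
xid_precedes(uint32_t id1, uint32_t id2)
{
    return (int32_t) (id1 - id2) < 0;
}

/*
 * The check a corruption checker can perform *before* consulting clog:
 * an XID found on a page is only expected to have clog status if
 * relfrozenxid <= xid < nextxid.  Anything outside that range is
 * reportable as corruption without any clog lookup at all.
 */
bool
xid_in_expected_range(uint32_t xid, uint32_t relfrozenxid, uint32_t nextxid)
{
    return !xid_precedes(xid, relfrozenxid) && xid_precedes(xid, nextxid);
}
```

An XID failing this test can be reported without touching clog; an XID passing it still requires TransactionIdDidCommit(), which is exactly where the missing suppress-errors mode bites.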

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: new heapcheck contrib module

Andrey Borodin-2


> On Aug 29, 2020, at 00:56, Robert Haas <[hidden email]> wrote:
>
> On Fri, Aug 28, 2020 at 2:10 PM Andrey M. Borodin <[hidden email]> wrote:
>> I don't think so. ISTM it's actually the same problem of xmax < relfrozenxid, just hidden behind detoasting.
>> Our regular heap_check was checking xmin/xmax invariants for tables, but failed to recognise the problem in toast (while toast was accessible until CLOG truncation).
>
> The code can (and should, and I think does) refrain from looking up
> XIDs that are out of the range thought to be valid -- but how do you
> propose that it avoid looking up XIDs that ought to have clog data
> associated with them despite being >= relfrozenxid and < nextxid?
> TransactionIdDidCommit() does not have a suppress-errors flag, adding
> one would be quite invasive, yet we cannot safely perform a
> significant number of checks without knowing whether the inserting
> transaction committed.

What you write seems completely correct to me. I agree that a CLOG threshold lookup seems unnecessary.

But I have a real corruption at hand (on a testing site). I have the heapcheck proposed here, and pg_surgery from the thread nearby, yet I cannot fix the problem, because I cannot list the affected tuples. These tools do not solve a problem that has been neglected for long enough. It would be supercool if they could.

This corruption, like caries, had 3 stages:
1. incorrect VM flag saying the page does not need vacuum
2. xmin and xmax < relfrozenxid
3. CLOG truncated

Stage 2 is curable with the proposed toolset, stage 3 is not. But they are not that different.

Thanks!

Best regards, Andrey Borodin.


Re: new heapcheck contrib module

Mark Dilger-5


> On Aug 29, 2020, at 3:27 AM, Andrey M. Borodin <[hidden email]> wrote:
>
>
>
>> On Aug 29, 2020, at 00:56, Robert Haas <[hidden email]> wrote:
>>
>> On Fri, Aug 28, 2020 at 2:10 PM Andrey M. Borodin <[hidden email]> wrote:
>>> I don't think so. ISTM it's actually the same problem of xmax < relfrozenxid, just hidden behind detoasting.
>>> Our regular heap_check was checking xmin/xmax invariants for tables, but failed to recognise the problem in toast (while toast was accessible until CLOG truncation).
>>
>> The code can (and should, and I think does) refrain from looking up
>> XIDs that are out of the range thought to be valid -- but how do you
>> propose that it avoid looking up XIDs that ought to have clog data
>> associated with them despite being >= relfrozenxid and < nextxid?
>> TransactionIdDidCommit() does not have a suppress-errors flag, adding
>> one would be quite invasive, yet we cannot safely perform a
>> significant number of checks without knowing whether the inserting
>> transaction committed.
>
> What you write seems completely correct to me. I agree that a CLOG threshold lookup seems unnecessary.
>
> But I have a real corruption at hand (on a testing site). I have the heapcheck proposed here, and pg_surgery from the thread nearby, yet I cannot fix the problem, because I cannot list the affected tuples. These tools do not solve a problem that has been neglected for long enough. It would be supercool if they could.
>
> This corruption, like caries, had 3 stages:
> 1. incorrect VM flag saying the page does not need vacuum
> 2. xmin and xmax < relfrozenxid
> 3. CLOG truncated
>
> Stage 2 is curable with the proposed toolset, stage 3 is not. But they are not that different.

I had an earlier version of the verify_heapam patch that included a non-throwing interface to clog.  Ultimately, I ripped that out.  My reasoning was that a simpler patch submission was more likely to be acceptable to the community.

If you want to submit a separate patch that creates a non-throwing version of the clog interface, and get the community to accept and commit it, I would seriously consider using that from verify_heapam.  If it gets committed in time, I might even do so for this release cycle.  But I don't want to make this patch dependent on that hypothetical patch getting written and accepted.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: new heapcheck contrib module

Robert Haas
In reply to this post by Mark Dilger-5
On Tue, Aug 25, 2020 at 10:36 AM Mark Dilger
<[hidden email]> wrote:
> Thanks for the review!

+                                                         msg OUT text
+                                                         )

Looks like atypical formatting.

+REVOKE ALL ON FUNCTION
+verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint)
+FROM PUBLIC;

This too.

+-- Don't want this to be available to public

Add "by default, but superusers can grant access" or so?

I think there should be a call to pg_class_aclcheck() here, just like
the one in pg_prewarm, so that if the superuser does choose to grant
access, users given access can check tables they anyway have
permission to access, but not others. Maybe put that in
check_relation_relkind_and_relam() and rename it. Might want to look
at the pg_surgery precedent, too. Oh, and that function's header
comment is also wrong.

I think that the way the checks on the block range are performed could
be improved. Generally, we want to avoid reporting the same problem
with a variety of different message strings, because it adds burden
for translators and is potentially confusing for users. You've got two
message strings that are only going to be used for empty relations and
a third message string that is only going to be used for non-empty
relations. What stops you from just ripping off the way that this is
done in pg_prewarm, which requires only 2 messages? Then you'd be
adding a net total of 0 new messages instead of 3, and in my view they
would be clearer than your third message, "block range is out of
bounds for relation with block count %u: " INT64_FORMAT " .. "
INT64_FORMAT, which doesn't say very precisely what the problem is,
and also falls afoul of our usual practice of avoiding the use of
INT64_FORMAT in error messages that are subject to translation. I
notice that pg_prewarm just silently does nothing if the start and end
blocks are swapped, rather than generating an error. We could choose
to do differently here, but I'm not sure why we should bother.
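For comparison, the pg_prewarm-style validation Robert points to can be sketched as follows (standalone C; the enum and function names are invented for illustration, not taken from either module). Each bound is validated independently against the relation's block count, so only two message strings are needed, and a swapped range falls through and simply scans zero blocks:

```c
#include <stdint.h>

typedef enum
{
    RANGE_OK,
    RANGE_START_BAD,            /* "starting block number must be between 0 and %u" */
    RANGE_END_BAD               /* "ending block number must be between 0 and %u" */
} range_check_result;

/*
 * pg_prewarm-style block range validation: each bound is checked
 * independently against nblocks, so only two error messages exist.
 * A range with first > last is accepted and the caller's scan loop
 * simply visits no blocks.
 */
range_check_result
check_block_range(int64_t first, int64_t last, uint32_t nblocks)
{
    if (first < 0 || first >= (int64_t) nblocks)
        return RANGE_START_BAD;
    if (last < 0 || last >= (int64_t) nblocks)
        return RANGE_END_BAD;
    return RANGE_OK;
}
```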

+                       all_frozen = mapbits & VISIBILITYMAP_ALL_VISIBLE;
+                       all_visible = mapbits & VISIBILITYMAP_ALL_FROZEN;
+
+                       if ((all_frozen && skip_option == SKIP_PAGES_ALL_FROZEN) ||
+                               (all_visible && skip_option == SKIP_PAGES_ALL_VISIBLE))
+                       {
+                               continue;
+                       }

This isn't horrible style, but why not just get rid of the local
variables? e.g. if (skip_option == SKIP_PAGES_ALL_FROZEN) { if
((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0) continue; } else { ... }

Typically no braces around a block containing only one line.
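Putting that suggestion together, a sketch of the restructured check might look like this (standalone C; the helper name is invented, and the visibility-map bit values are assumed from visibilitymap.h). Note also that the two assignments in the quoted hunk appear to be swapped, with all_frozen assigned the ALL_VISIBLE bit and vice versa, which dropping the locals would eliminate:

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed values for illustration; the real bits live in visibilitymap.h. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

typedef enum
{
    SKIP_PAGES_NONE,
    SKIP_PAGES_ALL_VISIBLE,
    SKIP_PAGES_ALL_FROZEN
} SkipPages;

/*
 * Branch on the skip option first and test the map bits inline, so no
 * intermediate locals are needed and each bit is tested against the
 * matching option.
 */
bool
page_should_be_skipped(uint8_t mapbits, SkipPages skip_option)
{
    if (skip_option == SKIP_PAGES_ALL_FROZEN)
        return (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0;
    if (skip_option == SKIP_PAGES_ALL_VISIBLE)
        return (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
    return false;
}
```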

+ * table contains corrupt all frozen bits, a concurrent vacuum might skip the

all-frozen?

+ * relfrozenxid beyond xid.) Reporting the xid as valid under such conditions
+ * seems acceptable, since if we had checked it earlier in our scan it would
+ * have truly been valid at that time, and we break no MVCC guarantees by
+ * failing to notice the concurrent change in its status.

I agree with the first half of this sentence, but I don't know what
MVCC guarantees have to do with anything. I'd just delete the second
part, or make it a lot clearer.

+ * Some kinds of tuple header corruption make it unsafe to check the tuple
+ * attributes, for example when the tuple is foreshortened and such checks
+ * would read beyond the end of the line pointer (and perhaps the page).  In

I think of foreshortening mostly as an art term, though I guess it has
other meanings. Maybe it would be clearer to say something like "Some
kinds of corruption make it unsafe to check the tuple attributes, for
example when the line pointer refers to a range of bytes outside the
page"?

+ * Other kinds of tuple header corruption do not bare on the question of

bear

+                                                 pstrdup(_("updating transaction ID marked incompatibly as keys updated and locked only")));
+                                                 pstrdup(_("updating transaction ID marked incompatibly as committed and as a multitransaction ID")));

"updating transaction ID" might scare somebody who thinks that you are
telling them that you changed something. That's not what it means, but
it might not be totally clear. Maybe:

tuple is marked as only locked, but also claims key columns were updated
multixact should not be marked committed

+                                                 psprintf(_("data offset differs from expected: %u vs. %u (1 attribute, has nulls)"),

For these, how about:

tuple data should begin at byte %u, but actually begins at byte %u (1
attribute, has nulls)
etc.

+                                                 psprintf(_("old-style VACUUM FULL transaction ID is in the future: %u"),
+                                                 psprintf(_("old-style VACUUM FULL transaction ID precedes freeze threshold: %u"),
+                                                 psprintf(_("old-style VACUUM FULL transaction ID is invalid in this relation: %u"),

old-style VACUUM FULL transaction ID %u is in the future
old-style VACUUM FULL transaction ID %u precedes freeze threshold %u
old-style VACUUM FULL transaction ID %u out of range %u..%u

Doesn't the second of these overlap with the third?

Similarly in other places, e.g.

+                                                 psprintf(_("inserting transaction ID is in the future: %u"),

I think this should change to: inserting transaction ID %u is in the future

+       else if (VARATT_IS_SHORT(chunk))
+               /*
+                * could happen due to heap_form_tuple doing its thing
+                */
+               chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;

Add braces here, since there are multiple lines.

+                                                 psprintf(_("toast chunk sequence number not the expected sequence number: %u vs. %u"),

toast chunk sequence number %u does not match expected sequence number %u

There are more instances of this kind of thing.

+                                                 psprintf(_("toasted attribute has unexpected TOAST tag: %u"),

Remove colon.

+                                                 psprintf(_("attribute ends at offset beyond total tuple length: %u vs. %u (attribute length %u)"),

Let's try to specify the attribute number in the attribute messages
where we can, e.g.

+                                                 psprintf(_("attribute ends at offset beyond total tuple length: %u vs. %u (attribute length %u)"),

How about: attribute %u with length %u should end at offset %u, but
the tuple length is only %u

+               if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+                       TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+               {
+                       report_corruption(ctx,
+                                                         /* translator: Both %u are transaction IDs. */
+                                                         psprintf(_("inserting transaction ID is from before freeze cutoff: %u vs. %u"),
+                                                                          xmin, ctx->relfrozenxid));
+                       fatal = true;
+               }
+               else if (!xid_valid_in_rel(xmin, ctx))
+               {
+                       report_corruption(ctx,
+                                                         /* translator: %u is a transaction ID. */
+                                                         psprintf(_("inserting transaction ID is in the future: %u"),
+                                                                          xmin));
+                       fatal = true;
+               }

This seems like good evidence that xid_valid_in_rel needs some
rethinking. As far as I can see, every place where you call
xid_valid_in_rel, you have checks beforehand that duplicate some of
what it does, so that you can give a more accurate error message.
That's not good. Either the message should be adjusted so that it
covers all the cases "e.g. tuple xmin %u is outside acceptable range
%u..%u" or we should just get rid of xid_valid_in_rel() and have
separate error messages for each case, e.g. tuple xmin %u precedes
relfrozenxid %u". I think it's OK to use terms like xmin and xmax in
these messages, rather than inserting transaction ID etc. We have
existing instances of that, and while someone might judge it
user-unfriendly, I disagree. A person who is qualified to interpret
this output must know what 'tuple xmin' means immediately, but whether
they can understand that 'inserting transaction ID' means the same
thing is questionable, I think.

This is not a full review, but in general I think that this is getting
pretty close to being committable. The error messages seem to still
need some polishing and I wouldn't be surprised if there are a few
more bugs lurking yet, but I think it's come a long way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: new heapcheck contrib module

Mark Dilger-5


> On Sep 21, 2020, at 2:09 PM, Robert Haas <[hidden email]> wrote:
>
> I think there should be a call to pg_class_aclcheck() here, just like
> the one in pg_prewarm, so that if the superuser does choose to grant
> access, users given access can check tables they anyway have
> permission to access, but not others. Maybe put that in
> check_relation_relkind_and_relam() and rename it. Might want to look
> at the pg_surgery precedent, too.

In the presence of corruption, verify_heapam() reports to the user (in other words, leaks) metadata about the corrupted rows.  Reasoning about the attack vectors this creates is hard, but a conservative approach is to assume that an attacker can cause corruption in order to benefit from the leakage, and make sure the leakage does not violate any reasonable security expectations.

Basing the security decision on whether the user has access to read the table seems insufficient, as it ignores row level security.  Perhaps that is ok if row level security is not enabled for the table or if the user has been granted BYPASSRLS.  There is another problem, though.  There is no grantable privilege to read dead rows.  In the case of corruption, verify_heapam() may well report metadata about dead rows.

pg_surgery also appears to leak information about dead rows.  Owners of tables can probe whether supplied TIDs refer to dead rows.  If a table containing sensitive information has rows deleted prior to ownership being transferred, the new owner of the table could probe each page of deleted data to determine something of the content that was there.  Information about the number of deleted rows is already available through the pg_stat_* views, but those views don't give such a fine-grained approach to figuring out how large each deleted row was.  For a table with fixed content options, the content can sometimes be completely inferred from the length of the row.  (Consider a table with a single text column containing either "approved" or "denied".)

But pg_surgery is understood to be a collection of sharp tools only to be used under fairly exceptional conditions.  amcheck, on the other hand, is something that feels safer and more reasonable to use on a regular basis, perhaps from a cron job executed by a less trusted user.  Forcing the user to be superuser makes it clearer that this feeling of safety is not justified.

I am inclined to just restrict verify_heapam() to superusers and be done.  What do you think?


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: new heapcheck contrib module

Peter Geoghegan-4
On Tue, Sep 22, 2020 at 10:55 AM Mark Dilger
<[hidden email]> wrote:
> I am inclined to just restrict verify_heapam() to superusers and be done.  What do you think?

The existing amcheck functions were designed to have execute privilege
granted to non-superusers, though we never actually advertised that
fact. Maybe now would be a good time to start doing so.

--
Peter Geoghegan



Re: new heapcheck contrib module

Robert Haas
In reply to this post by Mark Dilger-5
On Tue, Sep 22, 2020 at 1:55 PM Mark Dilger
<[hidden email]> wrote:
> I am inclined to just restrict verify_heapam() to superusers and be done.  What do you think?

I think that's an old and largely failed approach. If you want to use
pg_class_ownercheck here rather than pg_class_aclcheck or something
like that, seems fair enough. But I don't think there should be an
is-superuser check in the code, because we've been trying really hard
to get rid of those in most places. And I also don't think there
should be no secondary permissions check, because if somebody does
grant execute permission on these functions, it's unlikely that they
want the person getting that permission to be able to check every
relation in the system even those on which they have no other
privileges at all.

But now I see that there's no secondary permission check in the
verify_nbtree.c code. Is that intentional? Peter, what's the
justification for that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: new heapcheck contrib module

Peter Geoghegan-4
On Tue, Sep 22, 2020 at 12:41 PM Robert Haas <[hidden email]> wrote:
> But now I see that there's no secondary permission check in the
> verify_nbtree.c code. Is that intentional? Peter, what's the
> justification for that?

As noted by comments in contrib/amcheck/sql/check_btree.sql (the
verify_nbtree.c tests), this is intentional. Note that we explicitly
test that a non-superuser role can perform verification following
GRANT EXECUTE ON FUNCTION ... .

As I mentioned earlier, this is supported (or at least it is supported
in my interpretation of things). It just isn't documented anywhere
outside the test itself.

--
Peter Geoghegan



Re: new heapcheck contrib module

Peter Geoghegan-4
In reply to this post by Robert Haas
On Mon, Sep 21, 2020 at 2:09 PM Robert Haas <[hidden email]> wrote:
> +REVOKE ALL ON FUNCTION
> +verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint)
> +FROM PUBLIC;
>
> This too.

Do we really want to use a cstring as an enum-like argument?

I think that I see a bug at this point in check_tuple() (in
v15-0001-Adding-function-verify_heapam-to-amcheck-module.patch):

> +   /* If xmax is a multixact, it should be within valid range */
> +   xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
> +   if ((infomask & HEAP_XMAX_IS_MULTI) && !mxid_valid_in_rel(xmax, ctx))
> +   {

*** SNIP ***

> +   }
> +
> +   /* If xmax is normal, it should be within valid range */
> +   if (TransactionIdIsNormal(xmax))
> +   {

Why should it be okay to call TransactionIdIsNormal(xmax) at this
point? It isn't certain that xmax is an XID at all (could be a
MultiXactId, since you called HeapTupleHeaderGetRawXmax() to get the
value in the first place). Don't you need to check "(infomask &
HEAP_XMAX_IS_MULTI) == 0" here?
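Peter's point can be illustrated with a small standalone sketch (the function is invented here; the flag and XID constants are assumed from htup_details.h and transam.h). The raw xmax field holds a MultiXactId when HEAP_XMAX_IS_MULTI is set and an ordinary XID otherwise, so a checker must branch on the infomask before applying any XID-specific test to the field:

```c
#include <stdint.h>

/* Assumed values for illustration; real definitions live in
 * htup_details.h and transam.h. */
#define HEAP_XMAX_IS_MULTI       0x1000
#define FirstNormalTransactionId 3

typedef enum
{
    XMAX_IS_MULTIXACT,          /* interpret the bits as a MultiXactId */
    XMAX_IS_NORMAL_XID,         /* safe to apply XID validity checks */
    XMAX_NOT_NORMAL             /* invalid or a permanent special XID */
} RawXmaxKind;

/*
 * Consult the infomask first: only when HEAP_XMAX_IS_MULTI is clear
 * does it make sense to ask whether the raw value is a normal XID.
 */
RawXmaxKind
classify_raw_xmax(uint16_t infomask, uint32_t raw_xmax)
{
    if (infomask & HEAP_XMAX_IS_MULTI)
        return XMAX_IS_MULTIXACT;
    if (raw_xmax >= FirstNormalTransactionId)
        return XMAX_IS_NORMAL_XID;
    return XMAX_NOT_NORMAL;
}
```

In verify_heapam's terms, the TransactionIdIsNormal(xmax) branch should only be reached when HEAP_XMAX_IS_MULTI is clear.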

This does look like it's shaping up. Thanks for working on it, Mark.

--
Peter Geoghegan



Re: new heapcheck contrib module

Peter Geoghegan-4
In reply to this post by Mark Dilger-5
On Sat, Aug 29, 2020 at 10:48 AM Mark Dilger
<[hidden email]> wrote:
> I had an earlier version of the verify_heapam patch that included a non-throwing interface to clog.  Ultimately, I ripped that out.  My reasoning was that a simpler patch submission was more likely to be acceptable to the community.

Isn't some kind of pragmatic compromise possible?

> But I don't want to make this patch dependent on that hypothetical patch getting written and accepted.

Fair enough, but if you're alluding to what I said then about
check_tuphdr_xids()/clog checking a while back then FWIW I didn't
intend to block progress on clog/xact status verification at all. I
just don't think that it is sensible to impose an iron clad guarantee
about having no assertion failures with corrupt clog data -- that
leads to far too much code duplication. But why should you need to
provide an absolute guarantee of that?

I for one would be fine with making the clog checks an optional extra,
that rescinds the no crash guarantee that you're keen on -- just like
with the TOAST checks that you have already in v15. It might make
sense to review how often crashes occur with simulated corruption, and
then to minimize the number of occurrences in the real world. Maybe we
could tolerate a usually-no-crash interface to clog -- if it could
still have assertion failures. Making a strong guarantee about
assertions seems unnecessary.

I don't see how verify_heapam will avoid raising an error during basic
validation from PageIsVerified(), which will violate the guarantee
about not throwing errors. I don't see that as a problem myself, but
presumably you will.

--
Peter Geoghegan



Re: new heapcheck contrib module

Tom Lane-2
In reply to this post by Peter Geoghegan-4
Peter Geoghegan <[hidden email]> writes:
> On Mon, Sep 21, 2020 at 2:09 PM Robert Haas <[hidden email]> wrote:
>> +REVOKE ALL ON FUNCTION
>> +verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint)
>> +FROM PUBLIC;
>>
>> This too.

> Do we really want to use a cstring as an enum-like argument?

Ugh.  We should not be using cstring as a SQL-exposed datatype
unless there really is no alternative.  Why wasn't this argument
declared "text"?

                        regards, tom lane
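A sketch of the SQL-level change Tom suggests; the parameter names, defaults, and return type below are illustrative assumptions, not the patch's actual extension script:

```sql
-- Declare the skip option as text rather than cstring.
CREATE FUNCTION verify_heapam(relation regclass,
                              on_error_stop boolean DEFAULT false,
                              check_toast boolean DEFAULT false,
                              skip text DEFAULT 'none',
                              startblock bigint DEFAULT NULL,
                              endblock bigint DEFAULT NULL)
RETURNS SETOF record
AS 'MODULE_PATHNAME', 'verify_heapam'
LANGUAGE C;

-- The REVOKE must then name the text signature as well:
REVOKE ALL ON FUNCTION
verify_heapam(regclass, boolean, boolean, text, bigint, bigint)
FROM PUBLIC;
```

Using text also lets the function report bad skip values with a normal error instead of relying on cstring's looser input handling.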



Re: new heapcheck contrib module

Stephen Frost
In reply to this post by Peter Geoghegan-4
Greetings,

* Peter Geoghegan ([hidden email]) wrote:
> On Tue, Sep 22, 2020 at 12:41 PM Robert Haas <[hidden email]> wrote:
> > But now I see that there's no secondary permission check in the
> > verify_nbtree.c code. Is that intentional? Peter, what's the
> > justification for that?
>
> As noted by comments in contrib/amcheck/sql/check_btree.sql (the
> verify_nbtree.c tests), this is intentional. Note that we explicitly
> test that a non-superuser role can perform verification following
> GRANT EXECUTE ON FUNCTION ... .

> As I mentioned earlier, this is supported (or at least it is supported
> in my interpretation of things). It just isn't documented anywhere
> outside the test itself.

Would certainly be good to document this but I tend to agree with the
comments that ideally-

a) it'd be nice if a relatively low-privileged user/process could run
   the tests in an ongoing manner
b) we don't want to add more is-superuser checks
c) users shouldn't really be given the ability to see rows they're not
   supposed to have access to

In other places in the code, when an error is generated and the user
doesn't have access to the underlying table or doesn't have BYPASSRLS,
we don't include the details or the actual data in the error.  Perhaps
that approach would make sense here (or perhaps not, but it doesn't seem
entirely crazy to me, anyway).  In other words:

a) keep the ability for someone who has EXECUTE on the function to be
   able to run the function against any relation
b) when we detect an issue, perform a permissions check to see if the
   user calling the function has rights to read the rows of the table
   and, if RLS is enabled on the table, if they have BYPASSRLS
c) if the user has appropriate privileges, log the detailed error, if
   not, return a generic error with a HINT that details weren't
   available due to lack of privileges on the relation

I can appreciate the concerns regarding dead rows ending up being
visible to someone who wouldn't normally be able to see them but I'd
argue we could simply document that fact rather than try to build
something to address it, for this particular case.  If there's push back
on that then I'd suggest we have a "can read dead rows" or some such
capability that can be GRANT'd (in the form of a default role, I would
think) which a user would also have to have in order to get detailed
error reports from this function.

Thanks,

Stephen


Re: new heapcheck contrib module

Michael Paquier-2
In reply to this post by Mark Dilger-5
On Tue, Aug 25, 2020 at 07:36:53AM -0700, Mark Dilger wrote:
> Removed.

This patch is failing to compile on Windows:
C:\projects\postgresql\src\include\fe_utils/print.h(18): fatal error
  C1083: Cannot open include file: 'libpq-fe.h': No such file or
  directory [C:\projects\postgresql\pg_amcheck.vcxproj]

It looks like you forgot to tweak the scripts in src/tools/msvc/.
--
Michael


Re: new heapcheck contrib module

Mark Dilger-5
In reply to this post by Robert Haas
Robert, Peter, Andrey, Stephen, and Michael,

Attached is a new version based in part on your review comments, quoted and responded to below as necessary.

There remain a few open issues and/or things I did not implement:

- This version follows Robert's suggestion of using pg_class_aclcheck() to check that the caller has permission to select from the table being checked.  This is inconsistent with the btree checking logic, which does no such check.  These two approaches should be reconciled, but there was apparently no agreement on this issue.

- The public facing documentation, currently live at https://www.postgresql.org/docs/13/amcheck.html, claims "amcheck functions may only be used by superusers."  The docs on master still say the same.  This patch replaces that language with alternate language explaining that execute permissions may be granted to non-superusers, along with a warning about the risk of data leakage.  Perhaps some portion of that language in this patch should be back-patched?

- Stephen's comments about restricting how much information goes into the returned corruption report depending on the permissions of the caller has not been implemented.  I may implement some of this if doing so is consistent with whatever we decide to do for the aclcheck issue, above, though probably not.  It seems overly complicated.

- This version does not change clog handling, which leaves Andrey's concern unaddressed.  Peter also showed some support for (or perhaps just a lack of opposition to) doing more of what Andrey suggests.  I may come back to this issue, depending on time available and further feedback.


Moving on to Michael's review....

> On Sep 28, 2020, at 10:56 PM, Michael Paquier <[hidden email]> wrote:
>
> On Tue, Aug 25, 2020 at 07:36:53AM -0700, Mark Dilger wrote:
>> Removed.
>
> This patch is failing to compile on Windows:
> C:\projects\postgresql\src\include\fe_utils/print.h(18): fatal error
>  C1083: Cannot open include file: 'libpq-fe.h': No such file or
>  directory [C:\projects\postgresql\pg_amcheck.vcxproj]
>
> It looks like you forgot to tweak the scripts in src/tools/msvc/.
Fixed, I think.  I have not tested on Windows.


Moving on to Stephen's review....

> On Sep 23, 2020, at 6:46 AM, Stephen Frost <[hidden email]> wrote:
>
> Greetings,
>
> * Peter Geoghegan ([hidden email]) wrote:
>> On Tue, Sep 22, 2020 at 12:41 PM Robert Haas <[hidden email]> wrote:
>>> But now I see that there's no secondary permission check in the
>>> verify_nbtree.c code. Is that intentional? Peter, what's the
>>> justification for that?
>>
>> As noted by comments in contrib/amcheck/sql/check_btree.sql (the
>> verify_nbtree.c tests), this is intentional. Note that we explicitly
>> test that a non-superuser role can perform verification following
>> GRANT EXECUTE ON FUNCTION ... .
>
>> As I mentioned earlier, this is supported (or at least it is supported
>> in my interpretation of things). It just isn't documented anywhere
>> outside the test itself.
>
> Would certainly be good to document this but I tend to agree with the
> comments that ideally-
>
> a) it'd be nice if a relatively low-privileged user/process could run
>   the tests in an ongoing manner
> b) we don't want to add more is-superuser checks
> c) users shouldn't really be given the ability to see rows they're not
>   supposed to have access to
>
> In other places in the code, when an error is generated and the user
> doesn't have access to the underlying table or doesn't have BYPASSRLS,
> we don't include the details or the actual data in the error.  Perhaps
> that approach would make sense here (or perhaps not, but it doesn't seem
> entirely crazy to me, anyway).  In other words:
>
> a) keep the ability for someone who has EXECUTE on the function to be
>   able to run the function against any relation
> b) when we detect an issue, perform a permissions check to see if the
>   user calling the function has rights to read the rows of the table
>   and, if RLS is enabled on the table, if they have BYPASSRLS
> c) if the user has appropriate privileges, log the detailed error, if
>   not, return a generic error with a HINT that details weren't
>   available due to lack of privileges on the relation
>
> I can appreciate the concerns regarding dead rows ending up being
> visible to someone who wouldn't normally be able to see them but I'd
> argue we could simply document that fact rather than try to build
> something to address it, for this particular case.  If there's push back
> on that then I'd suggest we have a "can read dead rows" or some such
> capability that can be GRANT'd (in the form of a default role, I would
> think) which a user would also have to have in order to get detailed
> error reports from this function.
There wasn't enough agreement on the thread about how this should work, so I left this idea unimplemented.

I'm a bit concerned that restricting the results for non-superusers would create a perverse incentive to use a superuser role to connect and check tables.  On the other hand, there would not be any difference in the output in the common case that no corruption exists, so maybe the perverse incentive would not be too significant.

Implementing the idea you outline would complicate the patch a fair amount, as we'd need to tailor all the reports in this way, and extend the tests to verify we're not leaking any information to non-superusers.  I would prefer to find a simpler solution.


Moving on to Robert's review....

> On Sep 21, 2020, at 2:09 PM, Robert Haas <[hidden email]> wrote:
>
> On Tue, Aug 25, 2020 at 10:36 AM Mark Dilger
> <[hidden email]> wrote:
>> Thanks for the review!
>
> +                                                         msg OUT text
> +                                                         )
>
> Looks like atypical formatting.
>
> +REVOKE ALL ON FUNCTION
> +verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint)
> +FROM PUBLIC;
>
> This too.
Changed in this next version.

> +-- Don't want this to be available to public
>
> Add "by default, but superusers can grant access" or so?

Hmm.  I borrowed the verbiage from elsewhere.

contrib/pg_buffercache/pg_buffercache--1.2.sql:-- Don't want these to be available to public.
contrib/pg_freespacemap/pg_freespacemap--1.1.sql:-- Don't want these to be available to public.
contrib/pg_visibility/pg_visibility--1.1.sql:-- Don't want these to be available to public.

> I think there should be a call to pg_class_aclcheck() here, just like
> the one in pg_prewarm, so that if the superuser does choose to grant
> access, users given access can check tables they anyway have
> permission to access, but not others. Maybe put that in
> check_relation_relkind_and_relam() and rename it. Might want to look
> at the pg_surgery precedent, too.

I don't think there are any great options here, but for this next version I've done it with pg_class_aclcheck().

> Oh, and that functions header
> comment is also wrong.

Changed in this next version.

> I think that the way the checks on the block range are performed could
> be improved. Generally, we want to avoid reporting the same problem
> with a variety of different message strings, because it adds burden
> for translators and is potentially confusing for users. You've got two
> message strings that are only going to be used for empty relations and
> a third message string that is only going to be used for non-empty
> relations. What stops you from just ripping off the way that this is
> done in pg_prewarm, which requires only 2 messages? Then you'd be
> adding a net total of 0 new messages instead of 3, and in my view they
> would be clearer than your third message, "block range is out of
> bounds for relation with block count %u: " INT64_FORMAT " .. "
> INT64_FORMAT, which doesn't say very precisely what the problem is,
> and also falls afoul of our usual practice of avoiding the use of
> INT64_FORMAT in error messages that are subject to translation. I
> notice that pg_prewarm just silently does nothing if the start and end
> blocks are swapped, rather than generating an error. We could choose
> to do differently here, but I'm not sure why we should bother.
This next version borrows pg_prewarm's messages as you suggest, except that pg_prewarm embeds INT64_FORMAT in the message strings, which are replaced with %u in this next patch.  Also, there is no good way to report an invalid block range for empty tables using these messages, so the patch now just exits early in such a case for invalid ranges without throwing an error.  This is a little bit non-orthogonal with how invalid block ranges are handled on non-empty tables, but perhaps that's ok.

>
> +                       all_frozen = mapbits & VISIBILITYMAP_ALL_VISIBLE;
> +                       all_visible = mapbits & VISIBILITYMAP_ALL_FROZEN;
> +
> +                       if ((all_frozen && skip_option ==
> SKIP_PAGES_ALL_FROZEN) ||
> +                               (all_visible && skip_option ==
> SKIP_PAGES_ALL_VISIBLE))
> +                       {
> +                               continue;
> +                       }
>
> This isn't horrible style, but why not just get rid of the local
> variables? e.g. if (skip_option == SKIP_PAGES_ALL_FROZEN) { if
> ((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0) continue; } else { ... }
>
> Typically no braces around a block containing only one line.
Changed in this next version.

> + * table contains corrupt all frozen bits, a concurrent vacuum might skip the
>
> all-frozen?

Changed in this next version.

> + * relfrozenxid beyond xid.) Reporting the xid as valid under such conditions
> + * seems acceptable, since if we had checked it earlier in our scan it would
> + * have truly been valid at that time, and we break no MVCC guarantees by
> + * failing to notice the concurrent change in its status.
>
> I agree with the first half of this sentence, but I don't know what
> MVCC guarantees have to do with anything. I'd just delete the second
> part, or make it a lot clearer.

Changed in this next version to simply omit the MVCC related language.

>
> + * Some kinds of tuple header corruption make it unsafe to check the tuple
> + * attributes, for example when the tuple is foreshortened and such checks
> + * would read beyond the end of the line pointer (and perhaps the page).  In
>
> I think of foreshortening mostly as an art term, though I guess it has
> other meanings. Maybe it would be clearer to say something like "Some
> kinds of corruption make it unsafe to check the tuple attributes, for
> example when the line pointer refers to a range of bytes outside the
> page"?
>
> + * Other kinds of tuple header corruption do not bare on the question of
>
> bear
Changed.

> +        pstrdup(_("updating transaction ID marked incompatibly as keys updated and locked only")));
> +        pstrdup(_("updating transaction ID marked incompatibly as committed and as a multitransaction ID")));
>
> "updating transaction ID" might scare somebody who thinks that you are
> telling them that you changed something. That's not what it means, but
> it might not be totally clear. Maybe:
>
> tuple is marked as only locked, but also claims key columns were updated
> multixact should not be marked committed
Changed to use your verbiage.

> +        psprintf(_("data offset differs from expected: %u vs. %u (1 attribute, has nulls)"),
>
> For these, how about:
>
> tuple data should begin at byte %u, but actually begins at byte %u (1
> attribute, has nulls)
> etc.

Is it ok to embed interpolated values into the message string like that?  I thought that made it harder for translators.  I agree that your language is easier to understand, and have used it in this next version of the patch.  Many of your comments that follow raise the same issue, but I'm using your verbiage anyway.

> +        psprintf(_("old-style VACUUM FULL transaction ID is in the future: %u"),
> +        psprintf(_("old-style VACUUM FULL transaction ID precedes freeze threshold: %u"),
> +        psprintf(_("old-style VACUUM FULL transaction ID is invalid in this relation: %u"),
>
> old-style VACUUM FULL transaction ID %u is in the future
> old-style VACUUM FULL transaction ID %u precedes freeze threshold %u
> old-style VACUUM FULL transaction ID %u out of range %u..%u
>
> Doesn't the second of these overlap with the third?
Good point.  If the second one reports, so will the third.  I've changed it to use if/else if logic to avoid that, and to use your suggested verbiage.

>
> Similarly in other places, e.g.
>
> +        psprintf(_("inserting transaction ID is in the future: %u"),
>
> I think this should change to: inserting transaction ID %u is in the future

Changed, along with similarly formatted messages.

>
> +       else if (VARATT_IS_SHORT(chunk))
> +               /*
> +                * could happen due to heap_form_tuple doing its thing
> +                */
> +               chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
>
> Add braces here, since there are multiple lines.

Changed.

>
> +        psprintf(_("toast chunk sequence number not the expected sequence number: %u vs. %u"),
>
> toast chunk sequence number %u does not match expected sequence number %u
>
> There are more instances of this kind of thing.

Changed.

> +        psprintf(_("toasted attribute has unexpected TOAST tag: %u"),
>
> Remove colon.

Changed.

> +        psprintf(_("attribute ends at offset beyond total tuple length: %u vs. %u (attribute length %u)"),
>
> Let's try to specify the attribute number in the attribute messages
> where we can, e.g.
>
> +        psprintf(_("attribute ends at offset beyond total tuple length: %u vs. %u (attribute length %u)"),
>
> How about: attribute %u with length %u should end at offset %u, but
> the tuple length is only %u
I had omitted the attribute numbers from the attribute corruption messages because attnum is one of the OUT parameters from verify_heapam.  I'm including attnum in the message text for this next version, as you request.

> +               if (TransactionIdIsNormal(ctx->relfrozenxid) &&
> +                       TransactionIdPrecedes(xmin, ctx->relfrozenxid))
> +               {
> +                       report_corruption(ctx,
> +                                         /* translator: Both %u are transaction IDs. */
> +                                         psprintf(_("inserting transaction ID is from before freeze cutoff: %u vs. %u"),
> +                                                  xmin, ctx->relfrozenxid));
> +                       fatal = true;
> +               }
> +               else if (!xid_valid_in_rel(xmin, ctx))
> +               {
> +                       report_corruption(ctx,
> +                                         /* translator: %u is a transaction ID. */
> +                                         psprintf(_("inserting transaction ID is in the future: %u"),
> +                                                  xmin));
> +                       fatal = true;
> +               }
>
> This seems like good evidence that xid_valid_in_rel needs some
> rethinking. As far as I can see, every place where you call
> xid_valid_in_rel, you have checks beforehand that duplicate some of
> what it does, so that you can give a more accurate error message.
> That's not good. Either the message should be adjusted so that it
> covers all the cases "e.g. tuple xmin %u is outside acceptable range
> %u..%u" or we should just get rid of xid_valid_in_rel() and have
> separate error messages for each case, e.g. tuple xmin %u precedes
> relfrozenxid %u".
This next version is refactored, removing the function xid_valid_in_rel entirely, and structuring get_xid_status differently.

> I think it's OK to use terms like xmin and xmax in
> these messages, rather than inserting transaction ID etc. We have
> existing instances of that, and while someone might judge it
> user-unfriendly, I disagree. A person who is qualified to interpret
> this output must know what 'tuple xmin' means immediately, but whether
> they can understand that 'inserting transaction ID' means the same
> thing is questionable, I think.

Done.

> This is not a full review, but in general I think that this is getting
> pretty close to being committable. The error messages seem to still
> need some polishing and I wouldn't be surprised if there are a few
> more bugs lurking yet, but I think it's come a long way.

This next version has some other message rewording.  While testing, I found it odd to report an xid as out of bounds (in the future, or before the freeze threshold, etc.), without mentioning the xid value against which it is being compared unfavorably.  We don't normally need to think about the epoch when comparing two xids against each other, as they must both make sense relative to the current epoch; but for corruption, you can't assume the corrupt xid was written relative to any particular epoch, and only the 32-bit xid value can be printed since the epoch is unknown.  The other xid value (freeze threshold, etc) can be printed with the epoch information, but printing the epoch+xid merely as xid8out does (in other words, as a UINT64) makes the messages thoroughly confusing.  I went with the equivalent of sprintf("%u:%u", epoch, xid), which follows the precedent from pg_controldata.c, gistdesc.c, and elsewhere.


Moving on to Peter's reviews....

> On Sep 22, 2020, at 4:18 PM, Peter Geoghegan <[hidden email]> wrote:
>
> On Mon, Sep 21, 2020 at 2:09 PM Robert Haas <[hidden email]> wrote:
>> +REVOKE ALL ON FUNCTION
>> +verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint)
>> +FROM PUBLIC;
>>
>> This too.
>
> Do we really want to use a cstring as an enum-like argument?
Perhaps not.  This next version has that as text.

>
> I think that I see a bug at this point in check_tuple() (in
> v15-0001-Adding-function-verify_heapam-to-amcheck-module.patch):
>
>> +   /* If xmax is a multixact, it should be within valid range */
>> +   xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
>> +   if ((infomask & HEAP_XMAX_IS_MULTI) && !mxid_valid_in_rel(xmax, ctx))
>> +   {
>
> *** SNIP ***
>
>> +   }
>> +
>> +   /* If xmax is normal, it should be within valid range */
>> +   if (TransactionIdIsNormal(xmax))
>> +   {
>
> Why should it be okay to call TransactionIdIsNormal(xmax) at this
> point? It isn't certain that xmax is an XID at all (could be a
> MultiXactId, since you called HeapTupleHeaderGetRawXmax() to get the
> value in the first place). Don't you need to check "(infomask &
> HEAP_XMAX_IS_MULTI) == 0" here?
I think you are right.  This check you suggest is used in this next version.


> On Sep 22, 2020, at 5:16 PM, Peter Geoghegan <[hidden email]> wrote:
>
> On Sat, Aug 29, 2020 at 10:48 AM Mark Dilger
> <[hidden email]> wrote:
>> I had an earlier version of the verify_heapam patch that included a non-throwing interface to clog.  Ultimately, I ripped that out.  My reasoning was that a simpler patch submission was more likely to be acceptable to the community.
>
> Isn't some kind of pragmatic compromise possible?
>
>> But I don't want to make this patch dependent on that hypothetical patch getting written and accepted.
>
> Fair enough, but if you're alluding to what I said then about
> check_tuphdr_xids()/clog checking a while back then FWIW I didn't
> intend to block progress on clog/xact status verification at all.
I don't recall your comments factoring into my thinking on this specific issue, but rather a conversation I had off-list with Robert.  The clog interface may be a hot enough code path that adding a flag for non-throwing behavior merely to support a contrib module might be resisted.  If folks generally like such a change to the clog interface, I could consider adding that as a third patch in this set.

> I
> just don't think that it is sensible to impose an iron clad guarantee
> about having no assertion failures with corrupt clog data -- that
> leads to far too much code duplication. But why should you need to
> provide an absolute guarantee of that?
>
> I for one would be fine with making the clog checks an optional extra,
> that rescinds the no crash guarantee that you're keen on -- just like
> with the TOAST checks that you have already in v15. It might make
> sense to review how often crashes occur with simulated corruption, and
> then to minimize the number of occurrences in the real world. Maybe we
> could tolerate a usually-no-crash interface to clog -- if it could
> still have assertion failures. Making a strong guarantee about
> assertions seems unnecessary.
>
> I don't see how verify_heapam will avoid raising an error during basic
> validation from PageIsVerified(), which will violate the guarantee
> about not throwing errors. I don't see that as a problem myself, but
> presumably you will.
My concern is not so much that verify_heapam will stop with an error, but rather that it might trigger a panic that stops all backends.  Stopping with an error merely because it hits corruption is not ideal, as I would rather it completed the scan and reported all corruptions found, but that's minor compared to the damage done if verify_heapam creates downtime in a production environment offering high availability guarantees.  That statement might seem nuts, given that the corrupt table itself would be causing downtime, but that analysis depends on assumptions about table access patterns, and there is no a priori reason to think that corrupt pages are necessarily ever being accessed, or accessed in a way that causes crashes (rather than merely wrong results) outside verify_heapam scanning the whole table.






Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




v16-0001-Adding-function-verify_heapam-to-amcheck-module.patch (127K) Download Attachment
v16-0002-Adding-contrib-module-pg_amcheck.patch (122K) Download Attachment