Online checksums verification in the backend

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
47 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Julien Rouhaud
On Sat, Mar 28, 2020 at 12:28:27PM +0900, Masahiko Sawada wrote:
> On Wed, 18 Mar 2020 at 19:11, Julien Rouhaud <[hidden email]> wrote:
> >
> > v5 attached
>
> Thank you for updating the patch. I have some comments:

Thanks a lot for the review!

> 1.
> +       <entry>
> +        <literal><function>pg_check_relation(<parameter>relation</parameter>
> <type>oid</type>, <parameter>fork</parameter>
> <type>text</type>)</function></literal>
> +       </entry>
>
> Looking at the declaration of pg_check_relation, 'relation' and 'fork'
> are optional arguments. So I think the above is not correct. But as I
> commented below, 'relation' should not be optional, so maybe the above
> line could be:
>
> +        <literal><function>pg_check_relation(<parameter>relation</parameter>
> <type>oid</type>[, <parameter>fork</parameter>
> <type>text</type>])</function></literal>
Yes I missed that when making relation mandatory.  Fixed.

> 2.
> +   <indexterm>
> +    <primary>pg_check_relation</primary>
> +   </indexterm>
> +   <para>
> +    <function>pg_check_relation</function> iterates over all the blocks of all
> +    or the specified fork of a given relation and verify their checksum.  It
> +    returns the list of blocks for which the found checksum doesn't match the
> +    expected one.  You must be a member of the
> +    <literal>pg_read_all_stats</literal> role to use this function.  It can
> +    only be used if data checksums are enabled.  See <xref
> +    linkend="app-initdb-data-checksums"/> for more information.
> +   </para>
>
> * I think we need a description about possible values for 'fork'
> (i.g., 'main', 'vm', 'fsm' and 'init'), and the behavior when 'fork'
> is omitted.
Done.

> * Do we need to explain about checksum cost-based delay here?

It's probably better in config.sgml, nearby vacuum cost-based delay, so done
this way with a link to reference that part.

> 3.
> +CREATE OR REPLACE FUNCTION pg_check_relation(
> +  IN relation regclass DEFAULT NULL::regclass, IN fork text DEFAULT NULL::text,
> +  OUT relid oid, OUT forknum integer, OUT failed_blocknum bigint,
> +  OUT expected_checksum integer, OUT found_checksum integer)
> +  RETURNS SETOF record VOLATILE LANGUAGE internal AS 'pg_check_relation'
> +  PARALLEL RESTRICTED;
>
> Now that pg_check_relation doesn't accept NULL as 'relation', I think
> we need to make 'relation' a mandatory argument.
Correct, fixed.

> 4.
> +   /* Check if the relation (still) exists */
> +   if (SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
> +   {
> +       /*
> +        * We use a standard relation_open() to acquire the initial lock.  It
> +        * means that this will block until the lock is acquired, or will
> +        * raise an ERROR if lock_timeout has been set.  If caller wants to
> +        * check multiple tables while relying on a maximum wait time, it
> +        * should process tables one by one instead of relying on a global
> +        * processing with the main SRF.
> +        */
> +       relation = relation_open(relid, AccessShareLock);
> +   }
>
> IIUC the above was necessary because we used to have
> check_all_relations() which iterates all relations on the database to
> do checksum checks. But now that we don't have that function and
> pg_check_relation processes one relation. Can we just do
> relation_open() here?
Ah yes I missed that comment.  I think only the comment needed to be updated to
remove any part related to NULL-relation call.  I ended up removign the whole
comment since locking and lock_timeout behavior is inherent to relation_open
and there's no need to document that any further now that we always only check
one relation at a time.

> 5.
> I think we need to check if the relation is a temp relation. I'm not
> sure it's worth to check checksums of temp relations but at least we
> need not to check other's temp relations.

Good point.  I think it's still worthwhile to check the backend's temp
relation, although typical usage should be a bgworker/cron job doing that check
so there shouldn't be any.

> 6.
> +/*
> + * Safely read the wanted buffer from disk, dealing with possible concurrency
> + * issue.  Note that if a buffer is found dirty in shared_buffers, no read will
> + * be performed and the caller will be informed that no check should be done.
> + * We can safely ignore such buffers as they'll be written before next
> + * checkpoint's completion..
> [...]
> + */
>
> I think the above comment also needs some "/*-------" at the beginning and end.
Fixed.

> 7.
> +static void
> +check_get_buffer(Relation relation, ForkNumber forknum,
> +                BlockNumber blkno, char *buffer, bool needlock, bool *checkit,
> +                bool *found_in_sb)
> +{
>
> Maybe we can make check_get_buffer() return a bool indicating we found
> a buffer to check, instead of having '*checkit'. That way, we can
> simplify the following code:
>
> +       check_get_buffer(relation, forknum, blkno, buffer, force_lock,
> +                        &checkit, &found_in_sb);
> +
> +       if (!checkit)
> +           continue;
>
> to something like:
>
> + if (!check_get_buffer(relation, forknum, blkno, buffer, force_lock,
> +                          &found_in_sb))
> +     continue;
Changed.

> 8.
> +       if (PageIsVerified(buffer, blkno))
> +       {
> +           /*
> +            * If the page is really new, there won't by any checksum to be
> +            * computed or expected.
> +            */
> +           *chk_expected = *chk_found = NoComputedChecksum;
> +           return true;
> +       }
> +       else
> +       {
> +           /*
> +            * There's a corruption, but since this affect PageIsNew, we
> +            * can't compute a checksum, so set NoComputedChecksum for the
> +            * expected checksum.
> +            */
> +           *chk_expected = NoComputedChecksum;
> +           *chk_found = hdr->pd_checksum;
> +       }
> +       return false;
>
> * I think the 'else' is not necessary here.
AFAICT it's, see below.

> * Setting *chk_expected and *chk_found seems useless when we return
> true. The caller doesn't use them.

Indeed, fixed.

> * Should we forcibly overwrite ignore_checksum_failure to off in
> pg_check_relation()? Otherwise, this logic seems not work fine.
>
> * I don't understand why we cannot compute a checksum in case where a
> page looks like a new page but is actually corrupted. Could you please
> elaborate on that?

PageIsVerified has a different behavior depending on whether the page looks new
or not.  If the page looks like new, it only checks that it's indeed a new
page, and otherwise try to verify the checksum.

Also, pg_check_page() has an assert to make sure that the page isn't (or don't
look like) new.

So it seems to me that the 'else' is required to properly detect a real or fake
PageIsNew, and try to compute checksums only when required.

> 8.
> +   {
> +       {"checksum_cost_page_hit", PGC_USERSET, RESOURCES_CHECKSUM_DELAY,
> +           gettext_noop("Checksum cost for a page found in the buffer cache."),
> +           NULL
> +       },
> +       &ChecksumCostPageHit,
> +       1, 0, 10000,
> +       NULL, NULL, NULL
> +   },
>
> * There is no description about the newly added four GUC parameters in the doc.
>
> * We need to put new GUC parameters into postgresql.conf.sample as well.
Fixed both.

> * The patch doesn't use checksum_cost_page_hit at all.

Indeed, I also realized that while working on previous issues.  I removed it
and renamed checksum_cost_page_miss to checksum_cost_page.

>
> 9.
> diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
> index eb19644419..37f63e747c 100644
> --- a/src/backend/utils/init/globals.c
> +++ b/src/backend/utils/init/globals.c
> @@ -134,6 +134,14 @@ int            max_worker_processes = 8;
>  int            max_parallel_workers = 8;
>  int            MaxBackends = 0;
>
> +int            ChecksumCostPageHit = 1;    /* GUC parameters for
> checksum check */
> +int            ChecksumCostPageMiss = 10;
> +int            ChecksumCostLimit = 200;
> +double     ChecksumCostDelay = 0;
> +
> +int            ChecksumCostBalance = 0;    /* working state for
> checksums check */
> +bool       ChecksumCostActive = false;
>
> Can we declare them in checksum.c since these parameters are used only
> in checksum.c and it does I/O my itself.
The GUC parameters would still need to be global, so for consistency I kept all
the variables in globals.c.

>
> 10.
> +       /* Report the failure to the stat collector and the logs. */
> +       pgstat_report_checksum_failure();
> +       ereport(WARNING,
> +               (errcode(ERRCODE_DATA_CORRUPTED),
> +                errmsg("invalid page in block %u of relation %s",
> +                       blkno,
> +                       relpath(relation->rd_smgr->smgr_rnode, forknum))));
>
> I think we could do pgstat_report_checksum_failure() and emit WARNING
> twice for the same page since PageIsVerified() also does the same.
As mentioned before, in this patch I only calls PageIsVerified() if the buffer
looks like new, and in this case PageIsVerified() only verify that it's a true
all-zero-page, and won't try to verify the checksum, so there's no possibility
of duplicated report.  I modified the comments to document all the interactions
and expectations.

v6 attached.

v6-0001-Add-a-pg_check_relation-function.patch (43K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Masahiko Sawada-2
On Sat, 28 Mar 2020 at 21:19, Julien Rouhaud <[hidden email]> wrote:
>
> On Sat, Mar 28, 2020 at 12:28:27PM +0900, Masahiko Sawada wrote:
> > On Wed, 18 Mar 2020 at 19:11, Julien Rouhaud <[hidden email]> wrote:
> > >
> > > v5 attached
> >
> > Thank you for updating the patch. I have some comments:
>
> Thanks a lot for the review!

Thank you for updating the patch!

> > 4.
> > +   /* Check if the relation (still) exists */
> > +   if (SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
> > +   {
> > +       /*
> > +        * We use a standard relation_open() to acquire the initial lock.  It
> > +        * means that this will block until the lock is acquired, or will
> > +        * raise an ERROR if lock_timeout has been set.  If caller wants to
> > +        * check multiple tables while relying on a maximum wait time, it
> > +        * should process tables one by one instead of relying on a global
> > +        * processing with the main SRF.
> > +        */
> > +       relation = relation_open(relid, AccessShareLock);
> > +   }
> >
> > IIUC the above was necessary because we used to have
> > check_all_relations() which iterates all relations on the database to
> > do checksum checks. But now that we don't have that function and
> > pg_check_relation processes one relation. Can we just do
> > relation_open() here?
>
> Ah yes I missed that comment.  I think only the comment needed to be updated to
> remove any part related to NULL-relation call.  I ended up removign the whole
> comment since locking and lock_timeout behavior is inherent to relation_open
> and there's no need to document that any further now that we always only check
> one relation at a time.

The current patch still checks SearchSysCacheExists1() before
relation_open. Why do we need to call SearchSysCacheExists1() here? I
think if the given relation doesn't exist, relation_open() will raise
an error "could not open relation with OID %u".

+   /* Open the relation if it exists.  */
+   if (SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
+   {
+       relation = relation_open(relid, AccessShareLock);
+   }


> > 8.
> > +       if (PageIsVerified(buffer, blkno))
> > +       {
> > +           /*
> > +            * If the page is really new, there won't by any checksum to be
> > +            * computed or expected.
> > +            */
> > +           *chk_expected = *chk_found = NoComputedChecksum;
> > +           return true;
> > +       }
> > +       else
> > +       {
> > +           /*
> > +            * There's a corruption, but since this affect PageIsNew, we
> > +            * can't compute a checksum, so set NoComputedChecksum for the
> > +            * expected checksum.
> > +            */
> > +           *chk_expected = NoComputedChecksum;
> > +           *chk_found = hdr->pd_checksum;
> > +       }
> > +       return false;
> >
> > * I think the 'else' is not necessary here.
>
> AFAICT it's, see below.
>
> > * Setting *chk_expected and *chk_found seems useless when we return
> > true. The caller doesn't use them.
>
> Indeed, fixed.

The patch still sets values to both?

+       if (PageIsVerified(buffer, blkno))
+       {
+           /* No corruption. */
+           *chk_expected = *chk_found = NoComputedChecksum;
+           return true;
+       }

>
> > * Should we forcibly overwrite ignore_checksum_failure to off in
> > pg_check_relation()? Otherwise, this logic seems not work fine.
> >
> > * I don't understand why we cannot compute a checksum in case where a
> > page looks like a new page but is actually corrupted. Could you please
> > elaborate on that?
>
> PageIsVerified has a different behavior depending on whether the page looks new
> or not.  If the page looks like new, it only checks that it's indeed a new
> page, and otherwise try to verify the checksum.
>
> Also, pg_check_page() has an assert to make sure that the page isn't (or don't
> look like) new.
>
> So it seems to me that the 'else' is required to properly detect a real or fake
> PageIsNew, and try to compute checksums only when required.

Thank you for your explanation! I understand.

I thought we can arrange the code to something like:

if (PageIsNew(hdr))
{
    if (PageIsVerified(hdr))
    {
        *chk_expected = *chk_found = NoComputedChecksum;
        return true;
    }

    *chk_expected = NoComputedChecksum;
    *chk_found = hdr->pd_checksum;
    return false;
}

But since it's not a critical problem you can ignore it.

>
> > 8.
> > +   {
> > +       {"checksum_cost_page_hit", PGC_USERSET, RESOURCES_CHECKSUM_DELAY,
> > +           gettext_noop("Checksum cost for a page found in the buffer cache."),
> > +           NULL
> > +       },
> > +       &ChecksumCostPageHit,
> > +       1, 0, 10000,
> > +       NULL, NULL, NULL
> > +   },
> >
> > * There is no description about the newly added four GUC parameters in the doc.
> >
> > * We need to put new GUC parameters into postgresql.conf.sample as well.
>
> Fixed both.
>
> > * The patch doesn't use checksum_cost_page_hit at all.
>
> Indeed, I also realized that while working on previous issues.  I removed it
> and renamed checksum_cost_page_miss to checksum_cost_page.

Perhaps we can use checksum_cost_page_hit when we found the page in
the shared buffer but it's marked as dirty?

> >
> > 9.
> > diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
> > index eb19644419..37f63e747c 100644
> > --- a/src/backend/utils/init/globals.c
> > +++ b/src/backend/utils/init/globals.c
> > @@ -134,6 +134,14 @@ int            max_worker_processes = 8;
> >  int            max_parallel_workers = 8;
> >  int            MaxBackends = 0;
> >
> > +int            ChecksumCostPageHit = 1;    /* GUC parameters for
> > checksum check */
> > +int            ChecksumCostPageMiss = 10;
> > +int            ChecksumCostLimit = 200;
> > +double     ChecksumCostDelay = 0;
> > +
> > +int            ChecksumCostBalance = 0;    /* working state for
> > checksums check */
> > +bool       ChecksumCostActive = false;
> >
> > Can we declare them in checksum.c since these parameters are used only
> > in checksum.c and it does I/O my itself.
>
> The GUC parameters would still need to be global, so for consistency I kept all
> the variables in globals.c.

Okay.

> >
> > 10.
> > +       /* Report the failure to the stat collector and the logs. */
> > +       pgstat_report_checksum_failure();
> > +       ereport(WARNING,
> > +               (errcode(ERRCODE_DATA_CORRUPTED),
> > +                errmsg("invalid page in block %u of relation %s",
> > +                       blkno,
> > +                       relpath(relation->rd_smgr->smgr_rnode, forknum))));
> >
> > I think we could do pgstat_report_checksum_failure() and emit WARNING
> > twice for the same page since PageIsVerified() also does the same.
>
> As mentioned before, in this patch I only calls PageIsVerified() if the buffer
> looks like new, and in this case PageIsVerified() only verify that it's a true
> all-zero-page, and won't try to verify the checksum, so there's no possibility
> of duplicated report.  I modified the comments to document all the interactions
> and expectations.

You're right. Thank you for the explanation!

I've read the latest patch and here is random comments:

1.
+       /*
+        * Add a page miss cost, as we're always reading outside the shared
+        * buffers.
+        */
+       /* Add a page cost. */
+       ChecksumCostBalance += ChecksumCostPage;

There are duplicate comments.

2.
+       /* Dirty pages are ignored as they'll be flushed soon. */
+       if (buf_state & BM_DIRTY)
+           checkit = false;

Should we check the buffer if it has BM_TAG_VALID as well here? I
thought there might be a possibility that BufTableLookup() returns a
buf_Id but its buffer tag is not valid for example when the previous
read failed after inserting the buffer tag to the buffer table.

3.
+   /* Add a page cost. */
+   ChecksumCostBalance += ChecksumCostPage;
+
+   return checkit;
+}

The check_get_buffer() seems to be slightly complex to me but when we
reached the end of this function we always return true. Similarly, in
the case where we read the block while holding a partition lock we
always return true as well. Is my understanding right? If so, it might
be better to put some assertions.

4.
@@ -10825,6 +10825,14 @@
   proallargtypes => '{oid,text,int8,timestamptz}', proargmodes => '{i,o,o,o}',
   proargnames => '{tablespace,name,size,modification}',
   prosrc => 'pg_ls_tmpdir_1arg' },
+{ oid => '9147', descr => 'check data integrity for one or all relations',
+  proname => 'pg_check_relation', proisstrict => 'f', procost => '10000',
+  prorows => '20', proretset => 't', proparallel => 'r',
+  provolatile => 'v', prorettype => 'record', proargtypes => 'regclass text',
+  proallargtypes => '{regclass,text,oid,int4,int8,int4,int4}',
+  proargmodes => '{i,i,o,o,o,o,o}',
+  proargnames =>
'{relation,fork,relid,forknum,failed_blocknum,expected_checksum,found_checksum}',
+  prosrc => 'pg_check_relation' },

Why is the pg_check_relation() is not a strict function? I think
prostrict can be 'true' for this function and we can drop checking if
the first argument is NULL.

5.
+       memset(values, 0, sizeof(values));
+       memset(nulls, 0, sizeof(nulls));

I think we can do memset right before setting values to them, that is,
after checking (!found_in_sb && !force_lock).

6.
+static bool
+check_buffer(char *buffer, uint32 blkno, uint16 *chk_expected,
+                  uint16 *chk_found)
+{
+   PageHeader  hdr = (PageHeader) buffer;
+
+   Assert(chk_expected && chk_found);
+
+   if (PageIsNew(hdr))
+   {
+       /*
+        * Check if the page is really new or if there's a corruption that
+        * affected PageIsNew detection.  Note that PageIsVerified won't try to
+        * detect checksum corruption in this case, so there's no risk of
+        * duplicated corruption report.
+        */
+       if (PageIsVerified(buffer, blkno))

How about using Page instead of PageHeader? Looking at other codes,
ISTM we usually pass Page to both PageIsNew() and PageIsVerified().

7.
+       <entry>
+        <literal><function>pg_check_relation(<parameter>relation</parameter>
<type>oid</type>[, <parameter>fork</parameter>
<type>text</type>])</function></literal>.
+       </entry>

+{ oid => '9147', descr => 'check data integrity for one or all relations',
+  proname => 'pg_check_relation', proisstrict => 'f', procost => '10000',
+  prorows => '20', proretset => 't', proparallel => 'r',
+  provolatile => 'v', prorettype => 'record', proargtypes => 'regclass text',
+  proallargtypes => '{regclass,text,oid,int4,int8,int4,int4}',
+  proargmodes => '{i,i,o,o,o,o,o}',
+  proargnames =>
'{relation,fork,relid,forknum,failed_blocknum,expected_checksum,found_checksum}',
+  prosrc => 'pg_check_relation' },

The function argument data types don't match in the doc and function
declaretion. relation is 'oid' in the doc but is 'regclass' in the
function declaretion.

8.
+#define SRF_COLS           5   /* Number of output arguments in the SRF */

Looking at similar built-in functions that return set of records they
use a more specific name for the number of returned columns such as
PG_STAT_GET_WAL_SENDERS_COLS and PG_GET_SHMEM_SIZES_COLS. How about
PG_CHECK_RELATION_COLS?

check_relation_fork() seems to quite depends on pg_check_relation()
because the returned tuplestore is specified by pg_check_relation().
It's just an idea but to improve reusability, how about moving
check_relation_fork() to checksumfunc.c? That is, in checksumfuncs.c
while iterating all blocks we call a new function in checksum.c, say
check_one_block() function, which has the following part and is
responsible for getting, checking the specified block and returning a
boolean indicating whether the block has corruption or not, along with
chk_found and chk_expected:

        /*
         * To avoid too much overhead, the buffer will be first read without
         * the locks that would guarantee the lack of false positive, as such
         * events should be quite rare.
         */
Retry:
        if (!check_get_buffer(relation, forknum, blkno, buffer, force_lock,
                              &found_in_sb))
            continue;

        if (check_buffer(buffer, blkno, &chk_expected, &chk_found))
            continue;

        /*
         * If we get a failure and the buffer wasn't found in shared buffers,
         * reread the buffer with suitable lock to avoid false positive.  See
         * check_get_buffer for more details.
         */
        if (!found_in_sb && !force_lock)
        {
            force_lock = true;
            goto Retry;
        }

A new function in checksumfuncs.c or pg_check_relation will be
responsible for storing the result to the tuplestore. That way,
check_one_block() will be useful for other use when we want to check
if the particular block has corruption with low overhead.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Julien Rouhaud
On Fri, Apr 03, 2020 at 12:24:50PM +0900, Masahiko Sawada wrote:

> On Sat, 28 Mar 2020 at 21:19, Julien Rouhaud <[hidden email]> wrote:
> >
> The current patch still checks SearchSysCacheExists1() before
> relation_open. Why do we need to call SearchSysCacheExists1() here? I
> think if the given relation doesn't exist, relation_open() will raise
> an error "could not open relation with OID %u".
>
> +   /* Open the relation if it exists.  */
> +   if (SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
> +   {
> +       relation = relation_open(relid, AccessShareLock);
> +   }
Oops yes sorry about that.  Fixed.


> > > 8.
> > > +       if (PageIsVerified(buffer, blkno))
> > > +       {
> > > +           /*
> > > +            * If the page is really new, there won't by any checksum to be
> > > +            * computed or expected.
> > > +            */
> > > +           *chk_expected = *chk_found = NoComputedChecksum;
> > > +           return true;
> > > +       }
> > > +       else
> > > +       {
> > > +           /*
> > > +            * There's a corruption, but since this affect PageIsNew, we
> > > +            * can't compute a checksum, so set NoComputedChecksum for the
> > > +            * expected checksum.
> > > +            */
> > > +           *chk_expected = NoComputedChecksum;
> > > +           *chk_found = hdr->pd_checksum;
> > > +       }
> > > +       return false;
> > >
> > > * I think the 'else' is not necessary here.
> >
> > AFAICT it's, see below.
> >
> > > * Setting *chk_expected and *chk_found seems useless when we return
> > > true. The caller doesn't use them.
> >
> > Indeed, fixed.
>
> The patch still sets values to both?
>
> +       if (PageIsVerified(buffer, blkno))
> +       {
> +           /* No corruption. */
> +           *chk_expected = *chk_found = NoComputedChecksum;
> +           return true;
> +       }

Sorry again, fixed.


> > > * Should we forcibly overwrite ignore_checksum_failure to off in
> > > pg_check_relation()? Otherwise, this logic seems not work fine.
> > >
> > > * I don't understand why we cannot compute a checksum in case where a
> > > page looks like a new page but is actually corrupted. Could you please
> > > elaborate on that?
> >
> > PageIsVerified has a different behavior depending on whether the page looks new
> > or not.  If the page looks like new, it only checks that it's indeed a new
> > page, and otherwise try to verify the checksum.
> >
> > Also, pg_check_page() has an assert to make sure that the page isn't (or don't
> > look like) new.
> >
> > So it seems to me that the 'else' is required to properly detect a real or fake
> > PageIsNew, and try to compute checksums only when required.
>
> Thank you for your explanation! I understand.
>
> I thought we can arrange the code to something like:
>
> if (PageIsNew(hdr))
> {
>     if (PageIsVerified(hdr))
>     {
>         *chk_expected = *chk_found = NoComputedChecksum;
>         return true;
>     }
>
>     *chk_expected = NoComputedChecksum;
>     *chk_found = hdr->pd_checksum;
>     return false;
> }
>
> But since it's not a critical problem you can ignore it.

I like it, so done!


> > > 8.
> > > +   {
> > > +       {"checksum_cost_page_hit", PGC_USERSET, RESOURCES_CHECKSUM_DELAY,
> > > +           gettext_noop("Checksum cost for a page found in the buffer cache."),
> > > +           NULL
> > > +       },
> > > +       &ChecksumCostPageHit,
> > > +       1, 0, 10000,
> > > +       NULL, NULL, NULL
> > > +   },
> > >
> > > * There is no description about the newly added four GUC parameters in the doc.
> > >
> > > * We need to put new GUC parameters into postgresql.conf.sample as well.
> >
> > Fixed both.
> >
> > > * The patch doesn't use checksum_cost_page_hit at all.
> >
> > Indeed, I also realized that while working on previous issues.  I removed it
> > and renamed checksum_cost_page_miss to checksum_cost_page.
>
> Perhaps we can use checksum_cost_page_hit when we found the page in
> the shared buffer but it's marked as dirty?

The thing is that when the buffer is dirty, we won't do any additional check,
thus not adding any overhead.  What may be needed here is to account for the
locking overhead (in all cases), so that if all (or almost all) the buffers are
dirty and in shared buffers the execution can be throttled.  I don't know how
much an issue it can be, but if that's something to be fixes then page_hit
doesn't look like the right answer for that.


> I've read the latest patch and here is random comments:
>
> 1.
> +       /*
> +        * Add a page miss cost, as we're always reading outside the shared
> +        * buffers.
> +        */
> +       /* Add a page cost. */
> +       ChecksumCostBalance += ChecksumCostPage;
>
> There are duplicate comments.
Fixed.


> 2.
> +       /* Dirty pages are ignored as they'll be flushed soon. */
> +       if (buf_state & BM_DIRTY)
> +           checkit = false;
>
> Should we check the buffer if it has BM_TAG_VALID as well here? I
> thought there might be a possibility that BufTableLookup() returns a
> buf_Id but its buffer tag is not valid for example when the previous
> read failed after inserting the buffer tag to the buffer table.


Good point, fixed.


> 3.
> +   /* Add a page cost. */
> +   ChecksumCostBalance += ChecksumCostPage;
> +
> +   return checkit;
> +}
>
> The check_get_buffer() seems to be slightly complex to me but when we
> reached the end of this function we always return true. Similarly, in
> the case where we read the block while holding a partition lock we
> always return true as well. Is my understanding right? If so, it might
> be better to put some assertions.

Yes it's a little bit complex.  I used this approach to avoid the need to
release the locks all over the place, but maybe this doesn't really improve
things.  I added asserts and comments anyway as suggested, thanks.


> 4.
> @@ -10825,6 +10825,14 @@
>    proallargtypes => '{oid,text,int8,timestamptz}', proargmodes => '{i,o,o,o}',
>    proargnames => '{tablespace,name,size,modification}',
>    prosrc => 'pg_ls_tmpdir_1arg' },
> +{ oid => '9147', descr => 'check data integrity for one or all relations',
> +  proname => 'pg_check_relation', proisstrict => 'f', procost => '10000',
> +  prorows => '20', proretset => 't', proparallel => 'r',
> +  provolatile => 'v', prorettype => 'record', proargtypes => 'regclass text',
> +  proallargtypes => '{regclass,text,oid,int4,int8,int4,int4}',
> +  proargmodes => '{i,i,o,o,o,o,o}',
> +  proargnames =>
> '{relation,fork,relid,forknum,failed_blocknum,expected_checksum,found_checksum}',
> +  prosrc => 'pg_check_relation' },
>
> Why is the pg_check_relation() is not a strict function? I think
> prostrict can be 'true' for this function and we can drop checking if
> the first argument is NULL.

That's because the fork is still optional.  While this could be made mandatory
without much problems, I think we'll eventually want to add a way to check only
a subset of a fork, so it seemed to me that is wasn't worth changing that now.


> 5.
> +       memset(values, 0, sizeof(values));
> +       memset(nulls, 0, sizeof(nulls));
>
> I think we can do memset right before setting values to them, that is,
> after checking (!found_in_sb && !force_lock).


Indeed, done!


> 6.
> +static bool
> +check_buffer(char *buffer, uint32 blkno, uint16 *chk_expected,
> +                  uint16 *chk_found)
> +{
> +   PageHeader  hdr = (PageHeader) buffer;
> +
> +   Assert(chk_expected && chk_found);
> +
> +   if (PageIsNew(hdr))
> +   {
> +       /*
> +        * Check if the page is really new or if there's a corruption that
> +        * affected PageIsNew detection.  Note that PageIsVerified won't try to
> +        * detect checksum corruption in this case, so there's no risk of
> +        * duplicated corruption report.
> +        */
> +       if (PageIsVerified(buffer, blkno))
>
> How about using Page instead of PageHeader? Looking at other codes,
> ISTM we usually pass Page to both PageIsNew() and PageIsVerified().

Agreed, done.


> 7.
> +       <entry>
> +        <literal><function>pg_check_relation(<parameter>relation</parameter>
> <type>oid</type>[, <parameter>fork</parameter>
> <type>text</type>])</function></literal>.
> +       </entry>
>
> +{ oid => '9147', descr => 'check data integrity for one or all relations',
> +  proname => 'pg_check_relation', proisstrict => 'f', procost => '10000',
> +  prorows => '20', proretset => 't', proparallel => 'r',
> +  provolatile => 'v', prorettype => 'record', proargtypes => 'regclass text',
> +  proallargtypes => '{regclass,text,oid,int4,int8,int4,int4}',
> +  proargmodes => '{i,i,o,o,o,o,o}',
> +  proargnames =>
> '{relation,fork,relid,forknum,failed_blocknum,expected_checksum,found_checksum}',
> +  prosrc => 'pg_check_relation' },
>
> The function argument data types don't match in the doc and function
> declaretion. relation is 'oid' in the doc but is 'regclass' in the
> function declaretion.

Fixed.


> 8.
> +#define SRF_COLS           5   /* Number of output arguments in the SRF */
>
> Looking at similar built-in functions that return set of records they
> use a more specific name for the number of returned columns such as
> PG_STAT_GET_WAL_SENDERS_COLS and PG_GET_SHMEM_SIZES_COLS. How about
> PG_CHECK_RELATION_COLS?
>
> check_relation_fork() seems to quite depends on pg_check_relation()
> because the returned tuplestore is specified by pg_check_relation().
> It's just an idea but to improve reusability, how about moving
> check_relation_fork() to checksumfunc.c? That is, in checksumfuncs.c
> while iterating all blocks we call a new function in checksum.c, say
> check_one_block() function, which has the following part and is
> responsible for getting, checking the specified block and returning a
> boolean indicating whether the block has corruption or not, along with
> chk_found and chk_expected:
>
>         /*
>          * To avoid too much overhead, the buffer will be first read without
>          * the locks that would guarantee the lack of false positive, as such
>          * events should be quite rare.
>          */
> Retry:
>         if (!check_get_buffer(relation, forknum, blkno, buffer, force_lock,
>                               &found_in_sb))
>             continue;
>
>         if (check_buffer(buffer, blkno, &chk_expected, &chk_found))
>             continue;
>
>         /*
>          * If we get a failure and the buffer wasn't found in shared buffers,
>          * reread the buffer with suitable lock to avoid false positive.  See
>          * check_get_buffer for more details.
>          */
>         if (!found_in_sb && !force_lock)
>         {
>             force_lock = true;
>             goto Retry;
>         }
>
> A new function in checksumfuncs.c or pg_check_relation will be
> responsible for storing the result to the tuplestore. That way,
> check_one_block() will be useful for other use when we want to check
> if the particular block has corruption with low overhead.

Yes, I agree that passing the tuplestore isn't an ideal approach and some
refactoring should probably happen.  One thing is that this wouldn't be
"check_one_block()" but "check_one_block_on_disk()" (which could also be from
the OS cache).  I'm not sure how useful it's in itself.  It also raises some
concerns about the throttling.  I didn't change that for now, but I hope
there'll be some other feedback about it.

v6-0001-Add-a-pg_check_relation-function.patch (43K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Julien Rouhaud
On Fri, Apr 03, 2020 at 11:39:11AM +0200, Julien Rouhaud wrote:

> On Fri, Apr 03, 2020 at 12:24:50PM +0900, Masahiko Sawada wrote:
> >
> > check_relation_fork() seems to quite depends on pg_check_relation()
> > because the returned tuplestore is specified by pg_check_relation().
> > It's just an idea but to improve reusability, how about moving
> > check_relation_fork() to checksumfunc.c? That is, in checksumfuncs.c
> > while iterating all blocks we call a new function in checksum.c, say
> > check_one_block() function, which has the following part and is
> > responsible for getting, checking the specified block and returning a
> > boolean indicating whether the block has corruption or not, along with
> > chk_found and chk_expected:
> >
> >         /*
> >          * To avoid too much overhead, the buffer will be first read without
> >          * the locks that would guarantee the lack of false positive, as such
> >          * events should be quite rare.
> >          */
> > Retry:
> >         if (!check_get_buffer(relation, forknum, blkno, buffer, force_lock,
> >                               &found_in_sb))
> >             continue;
> >
> >         if (check_buffer(buffer, blkno, &chk_expected, &chk_found))
> >             continue;
> >
> >         /*
> >          * If we get a failure and the buffer wasn't found in shared buffers,
> >          * reread the buffer with suitable lock to avoid false positive.  See
> >          * check_get_buffer for more details.
> >          */
> >         if (!found_in_sb && !force_lock)
> >         {
> >             force_lock = true;
> >             goto Retry;
> >         }
> >
> > A new function in checksumfuncs.c or pg_check_relation will be
> > responsible for storing the result to the tuplestore. That way,
> > check_one_block() will be useful for other use when we want to check
> > if the particular block has corruption with low overhead.
>
>
> Yes, I agree that passing the tuplestore isn't an ideal approach and some
> refactoring should probably happen.  One thing is that this wouldn't be
> "check_one_block()" but "check_one_block_on_disk()" (which could also be from
> the OS cache).  I'm not sure how useful it's in itself.  It also raises some
> concerns about the throttling.  I didn't change that for now, but I hope
> there'll be some other feedback about it.
>

I had some time this morning, so I did the suggested refactoring as it seems
like a way cleaner interface.  I also kept the suggested check_one_block().

v7-0001-Add-a-pg_check_relation-function.patch (44K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Masahiko Sawada-2
On Sat, 4 Apr 2020 at 18:04, Julien Rouhaud <[hidden email]> wrote:

>
> On Fri, Apr 03, 2020 at 11:39:11AM +0200, Julien Rouhaud wrote:
> > On Fri, Apr 03, 2020 at 12:24:50PM +0900, Masahiko Sawada wrote:
> > >
> > > check_relation_fork() seems to quite depends on pg_check_relation()
> > > because the returned tuplestore is specified by pg_check_relation().
> > > It's just an idea but to improve reusability, how about moving
> > > check_relation_fork() to checksumfunc.c? That is, in checksumfuncs.c
> > > while iterating all blocks we call a new function in checksum.c, say
> > > check_one_block() function, which has the following part and is
> > > responsible for getting, checking the specified block and returning a
> > > boolean indicating whether the block has corruption or not, along with
> > > chk_found and chk_expected:
> > >
> > >         /*
> > >          * To avoid too much overhead, the buffer will be first read without
> > >          * the locks that would guarantee the lack of false positive, as such
> > >          * events should be quite rare.
> > >          */
> > > Retry:
> > >         if (!check_get_buffer(relation, forknum, blkno, buffer, force_lock,
> > >                               &found_in_sb))
> > >             continue;
> > >
> > >         if (check_buffer(buffer, blkno, &chk_expected, &chk_found))
> > >             continue;
> > >
> > >         /*
> > >          * If we get a failure and the buffer wasn't found in shared buffers,
> > >          * reread the buffer with suitable lock to avoid false positive.  See
> > >          * check_get_buffer for more details.
> > >          */
> > >         if (!found_in_sb && !force_lock)
> > >         {
> > >             force_lock = true;
> > >             goto Retry;
> > >         }
> > >
> > > A new function in checksumfuncs.c or pg_check_relation will be
> > > responsible for storing the result to the tuplestore. That way,
> > > check_one_block() will be useful for other use when we want to check
> > > if the particular block has corruption with low overhead.
> >
> >
> > Yes, I agree that passing the tuplestore isn't an ideal approach and some
> > refactoring should probably happen.  One thing is that this wouldn't be
> > "check_one_block()" but "check_one_block_on_disk()" (which could also be from
> > the OS cache).  I'm not sure how useful it's in itself.  It also raises some
> > concerns about the throttling.  I didn't change that for now, but I hope
> > there'll be some other feedback about it.
> >
>
>
> I had some time this morning, so I did the suggested refactoring as it seems
> like a way cleaner interface.  I also kept the suggested check_one_block().

Thank you for updating the patch! The patch looks good to me. Here are
some random comments mostly about cosmetic changes.

1.
I think we can have two separate SQL functions:
pg_check_relation(regclass, text) and pg_check_relation(regclass),
instead of setting NULL by default to the second argument.

2.
+ * Check data sanity for a specific block in the given fork of the given
+ * relation, always retrieved locally with smgrred even if a version exists in

s/smgrred/smgrread/

3.
+       /* The buffer will have to check checked. */
+       Assert(checkit);

Should it be "The buffer will have to be checked"?

4.
+   if (!is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_SERVER_FILES))
+       ereport(ERROR,
+               (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+                errmsg("only superuser or a member of the
pg_read_server_files role may use this function")));

Looking at the definition of pg_stat_read_server_files role, this role
seems to be for operations that could read non-database files such as
csv files. Therefore, currently this role is used by file_fdw and COPY
command. I personally think pg_stat_scan_tables would be more
appropriate for this function but I'm not sure.

5.
+   /* Set cost-based vacuum delay */
+   ChecksumCostActive = (ChecksumCostDelay > 0);
+   ChecksumCostBalance = 0;

s/vacuum/checksum verification/

6.
+       ereport(WARNING,
+               (errcode(ERRCODE_DATA_CORRUPTED),
+                errmsg("invalid page in block %u of relation %s",
+                       blkno,
+                       relpath(relation->rd_smgr->smgr_rnode, forknum))));

I think it's better to show the relation name instead of the relation path here.

7.
+       ereport(ERROR,
+               (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                errmsg("relation \"%s\" does not have storage to be checked",
+                quote_qualified_identifier(
+                    get_namespace_name(get_rel_namespace(relid)),
+                    get_rel_name(relid)))));

Looking at other similar error messages we don't show qualified
relation name but the relation name gotten by
RelationGetRelationName(relation). Can we do that here as well for
consistency?

8.
+   if (!(rsinfo->allowedModes & SFRM_Materialize))
+       ereport(ERROR,
+               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                errmsg("materialize mode required, but it is not " \
+                       "allowed in this context")));

I think it's better to have this error message in one line for easy grepping.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Julien Rouhaud
On Sun, Apr 05, 2020 at 01:13:30PM +0900, Masahiko Sawada wrote:
>
> Thank you for updating the patch! The patch looks good to me. Here are
> some random comments mostly about cosmetic changes.
>

Thanks a lot for the review!

>
> 1.
> I think we can have two separate SQL functions:
> pg_check_relation(regclass, text) and pg_check_relation(regclass),
> instead of setting NULL by default to the second argument.
>

I'm fine with it, so implemented this way with the required documentation
changes.

>
> 2.
> + * Check data sanity for a specific block in the given fork of the given
> + * relation, always retrieved locally with smgrred even if a version exists in
>
> s/smgrred/smgrread/

Fixed.

>
> 3.
> +       /* The buffer will have to check checked. */
> +       Assert(checkit);
>
> Should it be "The buffer will have to be checked"?
>

Oops indeed, fixed.

>
> 4.
> +   if (!is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_SERVER_FILES))
> +       ereport(ERROR,
> +               (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> +                errmsg("only superuser or a member of the
> pg_read_server_files role may use this function")));
>
> Looking at the definition of pg_stat_read_server_files role, this role
> seems to be for operations that could read non-database files such as
> csv files. Therefore, currently this role is used by file_fdw and COPY
> command. I personally think pg_stat_scan_tables would be more
> appropriate for this function but I'm not sure.
>
That's a very good point, especially since the documentation of this default
role is quite relevant for those functions:

"Execute monitoring functions that may take ACCESS SHARE locks on tables,
potentially for a long time."

So changed!

>
> 5.
> +   /* Set cost-based vacuum delay */
> +   ChecksumCostActive = (ChecksumCostDelay > 0);
> +   ChecksumCostBalance = 0;
>
> s/vacuum/checksum verification/
>

Fixed.

>
> 6.
> +       ereport(WARNING,
> +               (errcode(ERRCODE_DATA_CORRUPTED),
> +                errmsg("invalid page in block %u of relation %s",
> +                       blkno,
> +                       relpath(relation->rd_smgr->smgr_rnode, forknum))));
>
> I think it's better to show the relation name instead of the relation path here.
>
I'm here using the same pattern as what ReadBuffer_common() would display if a
corrupted block is read.  I think it's better to keep the format for both, so
any existing log analyzer will keep working with those new functions.

>
> 7.
> +       ereport(ERROR,
> +               (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +                errmsg("relation \"%s\" does not have storage to be checked",
> +                quote_qualified_identifier(
> +                    get_namespace_name(get_rel_namespace(relid)),
> +                    get_rel_name(relid)))));
>
> Looking at other similar error messages we don't show qualified
> relation name but the relation name gotten by
> RelationGetRelationName(relation). Can we do that here as well for
> consistency?
>
Indeed, fixed.

>
> 8.
> +   if (!(rsinfo->allowedModes & SFRM_Materialize))
> +       ereport(ERROR,
> +               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                errmsg("materialize mode required, but it is not " \
> +                       "allowed in this context")));
>
> I think it's better to have this error message in one line for easy grepping.

Fixed.

I also fixed missing leading tab in the perl TAP tests

v8-0001-Add-a-pg_check_relation-function.patch (45K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Masahiko Sawada-2
On Sun, 5 Apr 2020 at 17:44, Julien Rouhaud <[hidden email]> wrote:
>
> On Sun, Apr 05, 2020 at 01:13:30PM +0900, Masahiko Sawada wrote:
> >
> > Thank you for updating the patch! The patch looks good to me. Here are
> > some random comments mostly about cosmetic changes.
> >
>
> Thanks a lot for the review!

Thank you for updating the patch.

>
> >
> > 1.
> > I think we can have two separate SQL functions:
> > pg_check_relation(regclass, text) and pg_check_relation(regclass),
> > instead of setting NULL by default to the second argument.
> >
>
> I'm fine with it, so implemented this way with the required documentation
> changes.

Why do we need two rows in the doc? For instance, replication slot
functions have some optional arguments but there is only one row in
the doc. So I think we don't need to change the doc from the previous
version patch.

And I think these are not necessary as we already defined in
include/catalog/pg_proc.dat:

+CREATE OR REPLACE FUNCTION pg_check_relation(
+  IN relation regclass,
+  OUT relid oid, OUT forknum integer, OUT failed_blocknum bigint,
+  OUT expected_checksum integer, OUT found_checksum integer)
+  RETURNS SETOF record STRICT VOLATILE LANGUAGE internal AS 'pg_check_relation'
+  PARALLEL RESTRICTED;
+
+CREATE OR REPLACE FUNCTION pg_check_relation(
+  IN relation regclass, IN fork text,
+  OUT relid oid, OUT forknum integer, OUT failed_blocknum bigint,
+  OUT expected_checksum integer, OUT found_checksum integer)
+  RETURNS SETOF record STRICT VOLATILE LANGUAGE internal
+  AS 'pg_check_relation_fork'
+  PARALLEL RESTRICTED;

>
> >
> > 2.
> > + * Check data sanity for a specific block in the given fork of the given
> > + * relation, always retrieved locally with smgrred even if a version exists in
> >
> > s/smgrred/smgrread/
>
> Fixed.
>
> >
> > 3.
> > +       /* The buffer will have to check checked. */
> > +       Assert(checkit);
> >
> > Should it be "The buffer will have to be checked"?
> >
>
> Oops indeed, fixed.
>
> >
> > 4.
> > +   if (!is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_SERVER_FILES))
> > +       ereport(ERROR,
> > +               (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > +                errmsg("only superuser or a member of the
> > pg_read_server_files role may use this function")));
> >
> > Looking at the definition of pg_stat_read_server_files role, this role
> > seems to be for operations that could read non-database files such as
> > csv files. Therefore, currently this role is used by file_fdw and COPY
> > command. I personally think pg_stat_scan_tables would be more
> > appropriate for this function but I'm not sure.
> >
>
> That's a very good point, especially since the documentation of this default
> role is quite relevant for those functions:
>
> "Execute monitoring functions that may take ACCESS SHARE locks on tables,
> potentially for a long time."
>
> So changed!
>
> >
> > 5.
> > +   /* Set cost-based vacuum delay */
> > +   ChecksumCostActive = (ChecksumCostDelay > 0);
> > +   ChecksumCostBalance = 0;
> >
> > s/vacuum/checksum verification/
> >
>
> Fixed.
>
> >
> > 6.
> > +       ereport(WARNING,
> > +               (errcode(ERRCODE_DATA_CORRUPTED),
> > +                errmsg("invalid page in block %u of relation %s",
> > +                       blkno,
> > +                       relpath(relation->rd_smgr->smgr_rnode, forknum))));
> >
> > I think it's better to show the relation name instead of the relation path here.
> >
>
> I'm here using the same pattern as what ReadBuffer_common() would display if a
> corrupted block is read.  I think it's better to keep the format for both, so
> any existing log analyzer will keep working with those new functions.

Ok, I agree with you.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Julien Rouhaud
On Sun, Apr 05, 2020 at 06:08:06PM +0900, Masahiko Sawada wrote:
>
> Why do we need two rows in the doc? For instance, replication slot
> functions have some optional arguments but there is only one row in
> the doc. So I think we don't need to change the doc from the previous
> version patch.
>

I thought that if we document the function as pg_check_relation(regclass [,
fork]) users could think that the 2nd argument is optional, so that
pg_check_relation('something', NULL) could be a valid alias for the 1-argument
form, which it isn't.  After checking, I see that e.g. current_setting has the
same semantics and is documented the way you suggest, so fixed back to previous
version.

> And I think these are not necessary as we already defined in
> include/catalog/pg_proc.dat:
>
> +CREATE OR REPLACE FUNCTION pg_check_relation(
> +  IN relation regclass,
> +  OUT relid oid, OUT forknum integer, OUT failed_blocknum bigint,
> +  OUT expected_checksum integer, OUT found_checksum integer)
> +  RETURNS SETOF record STRICT VOLATILE LANGUAGE internal AS 'pg_check_relation'
> +  PARALLEL RESTRICTED;
> +
> +CREATE OR REPLACE FUNCTION pg_check_relation(
> +  IN relation regclass, IN fork text,
> +  OUT relid oid, OUT forknum integer, OUT failed_blocknum bigint,
> +  OUT expected_checksum integer, OUT found_checksum integer)
> +  RETURNS SETOF record STRICT VOLATILE LANGUAGE internal
> +  AS 'pg_check_relation_fork'
> +  PARALLEL RESTRICTED;
>
Oh right this isn't required since there's no default value anymore, fixed.

v9 attached.

v9-0001-Add-a-pg_check_relation-function.patch (44K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Masahiko Sawada-2
On Sun, 5 Apr 2020 at 18:45, Julien Rouhaud <[hidden email]> wrote:

>
> On Sun, Apr 05, 2020 at 06:08:06PM +0900, Masahiko Sawada wrote:
> >
> > Why do we need two rows in the doc? For instance, replication slot
> > functions have some optional arguments but there is only one row in
> > the doc. So I think we don't need to change the doc from the previous
> > version patch.
> >
>
> I thought that if we document the function as pg_check_relation(regclass [,
> fork]) users could think that the 2nd argument is optional, so that
> pg_check_relation('something', NULL) could be a valid alias for the 1-argument
> form, which it isn't.  After checking, I see that e.g. current_setting has the
> same semantics and is documented the way you suggest, so fixed back to previous
> version.
>
> > And I think these are not necessary as we already defined in
> > include/catalog/pg_proc.dat:
> >
> > +CREATE OR REPLACE FUNCTION pg_check_relation(
> > +  IN relation regclass,
> > +  OUT relid oid, OUT forknum integer, OUT failed_blocknum bigint,
> > +  OUT expected_checksum integer, OUT found_checksum integer)
> > +  RETURNS SETOF record STRICT VOLATILE LANGUAGE internal AS 'pg_check_relation'
> > +  PARALLEL RESTRICTED;
> > +
> > +CREATE OR REPLACE FUNCTION pg_check_relation(
> > +  IN relation regclass, IN fork text,
> > +  OUT relid oid, OUT forknum integer, OUT failed_blocknum bigint,
> > +  OUT expected_checksum integer, OUT found_checksum integer)
> > +  RETURNS SETOF record STRICT VOLATILE LANGUAGE internal
> > +  AS 'pg_check_relation_fork'
> > +  PARALLEL RESTRICTED;
> >
>
> Oh right this isn't required since there's no default value anymore, fixed.
>
> v9 attached.

Thank you for updating the patch! The patch looks good to me.

I've marked this patch as Ready for Committer. I hope this patch will
get committed to PG13.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Julien Rouhaud
On Sun, Apr 05, 2020 at 08:01:36PM +0900, Masahiko Sawada wrote:

> On Sun, 5 Apr 2020 at 18:45, Julien Rouhaud <[hidden email]> wrote:
> >
> > On Sun, Apr 05, 2020 at 06:08:06PM +0900, Masahiko Sawada wrote:
> > >
> > > Why do we need two rows in the doc? For instance, replication slot
> > > functions have some optional arguments but there is only one row in
> > > the doc. So I think we don't need to change the doc from the previous
> > > version patch.
> > >
> >
> > I thought that if we document the function as pg_check_relation(regclass [,
> > fork]) users could think that the 2nd argument is optional, so that
> > pg_check_relation('something', NULL) could be a valid alias for the 1-argument
> > form, which it isn't.  After checking, I see that e.g. current_setting has the
> > same semantics and is documented the way you suggest, so fixed back to previous
> > version.
> >
> > > And I think these are not necessary as we already defined in
> > > include/catalog/pg_proc.dat:
> > >
> > > +CREATE OR REPLACE FUNCTION pg_check_relation(
> > > +  IN relation regclass,
> > > +  OUT relid oid, OUT forknum integer, OUT failed_blocknum bigint,
> > > +  OUT expected_checksum integer, OUT found_checksum integer)
> > > +  RETURNS SETOF record STRICT VOLATILE LANGUAGE internal AS 'pg_check_relation'
> > > +  PARALLEL RESTRICTED;
> > > +
> > > +CREATE OR REPLACE FUNCTION pg_check_relation(
> > > +  IN relation regclass, IN fork text,
> > > +  OUT relid oid, OUT forknum integer, OUT failed_blocknum bigint,
> > > +  OUT expected_checksum integer, OUT found_checksum integer)
> > > +  RETURNS SETOF record STRICT VOLATILE LANGUAGE internal
> > > +  AS 'pg_check_relation_fork'
> > > +  PARALLEL RESTRICTED;
> > >
> >
> > Oh right this isn't required since there's no default value anymore, fixed.
> >
> > v9 attached.
>
> Thank you for updating the patch! The patch looks good to me.
>
> I've marked this patch as Ready for Committer. I hope this patch will
> get committed to PG13.
>
Thanks a lot!


Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Daniel Gustafsson
> On 5 Apr 2020, at 13:17, Julien Rouhaud <[hidden email]> wrote:
> On Sun, Apr 05, 2020 at 08:01:36PM +0900, Masahiko Sawada wrote:

>> Thank you for updating the patch! The patch looks good to me.
>>
>> I've marked this patch as Ready for Committer. I hope this patch will
>> get committed to PG13.

> Thanks a lot!

This patch has been through quite thorough review, and skimming the thread all
concerns raised have been addressed.  It still applies and tests gree in the CF
Patchtester.  The feature in itself certainly gets my +1 for inclusion, it
seems a good addition.

Is any committer who has taken part in the thread (or anyone else for that
matter) interested in seeing this to some form of closure in this CF?

cheers ./daniel

Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Justin Pryzby
In reply to this post by Julien Rouhaud
Small language fixes in comments and user-facing docs.


diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 88efb38556..39596db193 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26162,7 +26162,7 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
 
    <para>
     The functions shown in <xref linkend="functions-data-sanity-table"/>
-    provide means to check for health of data file in a cluster.
+    provide a means to check for health of a data file in a cluster.
    </para>
 
    <table id="functions-data-sanity-table">
@@ -26179,8 +26179,8 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
         <literal><function>pg_check_relation(<parameter>relation</parameter> <type>regclass</type> [, <parameter>fork</parameter> <type>text</type>])</function></literal>
        </entry>
        <entry><type>setof record</type></entry>
-       <entry>Validate the checksums for all blocks of all or the given fork of
-       a given relation.</entry>
+       <entry>Validate the checksum for all blocks of a relation.
+       </entry>
       </row>
      </tbody>
     </tgroup>
@@ -26190,15 +26190,15 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8');
     <primary>pg_check_relation</primary>
    </indexterm>
    <para id="functions-check-relation-note" xreflabel="pg_check_relation">
-    <function>pg_check_relation</function> iterates over all the blocks of a
-    given relation and verify their checksum.  If provided,
-    <replaceable>fork</replaceable> should be <literal>'main'</literal> for the
+    <function>pg_check_relation</function> iterates over all blocks of a
+    given relation and verifies their checksums.  If passed,
+    <replaceable>fork</replaceable> specifies that only checksums of the given
+    fork are to be verified.  Fork should be <literal>'main'</literal> for the
     main data fork, <literal>'fsm'</literal> for the free space map,
     <literal>'vm'</literal> for the visibility map, or
-    <literal>'init'</literal> for the initialization fork, and only this
-    specific fork will be verifies, otherwise all forks will.  The function
-    returns the list of blocks for which the found checksum doesn't match the
-    expected one.  See <xref
+    <literal>'init'</literal> for the initialization fork.
+    The function returns a list of blocks for which the computed and stored
+    checksums don't match.  See <xref
     linkend="runtime-config-resource-checksum-verification-cost"/> for
     information on how to configure cost-based verification delay.  You must be
     a member of the <literal>pg_read_all_stats</literal> role to use this
diff --git a/src/backend/storage/page/checksum.c b/src/backend/storage/page/checksum.c
index eb2c919c34..17cd95ec95 100644
--- a/src/backend/storage/page/checksum.c
+++ b/src/backend/storage/page/checksum.c
@@ -36,7 +36,7 @@
  * actual storage, you have to discard the operating system cache before
  * running those functions.
  *
- * To avoid torn page and possible false positive when reading data, and
+ * To avoid torn pages and possible false positives when reading data, and to
  * keeping overhead as low as possible, the following heuristics are used:
  *
  * - a shared LWLock is taken on the target buffer pool partition mapping, and
@@ -92,8 +92,8 @@ check_one_block(Relation relation, ForkNumber forknum, BlockNumber blkno,
  *chk_expected = *chk_found = NoComputedChecksum;
 
  /*
- * To avoid too much overhead, the buffer will be first read without
- * the locks that would guarantee the lack of false positive, as such
+ * To avoid excessive overhead, the buffer will be first read without
+ * the locks that would prevent false positives, as such
  * events should be quite rare.
  */
 Retry:
@@ -120,10 +120,10 @@ Retry:
 }
 
 /*
- * Perform a checksum check on the passed page.  Returns whether the page is
+ * Perform a checksum check on the passed page.  Return True iff the page is
  * valid or not, and assign the expected and found checksum in chk_expected and
  * chk_found, respectively.  Note that a page can look like new but could be
- * the result of a corruption.  We still check for this case, but we can't
+ * the result of corruption.  We still check for this case, but we can't
  * compute its checksum as pg_checksum_page() is explicitly checking for
  * non-new pages, so NoComputedChecksum will be set in chk_found.
  */
@@ -139,7 +139,7 @@ check_buffer(char *buffer, uint32 blkno, uint16 *chk_expected,
  if (PageIsNew(page))
  {
  /*
- * Check if the page is really new or if there's a corruption that
+ * Check if the page is really new or if there's corruption that
  * affected PageIsNew detection.  Note that PageIsVerified won't try to
  * detect checksum corruption in this case, so there's no risk of
  * duplicated corruption report.
@@ -151,7 +151,7 @@ check_buffer(char *buffer, uint32 blkno, uint16 *chk_expected,
  }
 
  /*
- * There's a corruption, but since this affect PageIsNew, we
+ * There's corruption, but since this affects PageIsNew, we
  * can't compute a checksum, so set NoComputedChecksum for the
  * expected checksum.
  */
@@ -218,8 +218,8 @@ check_delay_point(void)
  * held.  Reading with this lock is to avoid the unlikely but possible case
  * that a buffer wasn't present in shared buffers when we checked but it then
  * alloc'ed in shared_buffers, modified and flushed concurrently when we
- * later try to read it, leading to false positive due to torn page.  Caller
- * can read first the buffer without holding the target buffer mapping
+ * later try to read it, leading to false positives due to a torn page.  Caller
+ * can first read the buffer without holding the target buffer mapping
  * partition LWLock to have an optimistic approach, and reread the buffer
  * from disk in case of error.
  *
@@ -280,7 +280,7 @@ check_get_buffer(Relation relation, ForkNumber forknum,
  checkit = false;
 
  /*
- * Read the buffer from disk, taking on IO lock to prevent torn-page
+ * Read the buffer from disk, taking an IO lock to prevent torn-page
  * reads, in the unlikely event that it was concurrently dirtied and
  * flushed.
  */
@@ -320,7 +320,7 @@ check_get_buffer(Relation relation, ForkNumber forknum,
  /*
  * Didn't find it in the buffer pool and didn't read it while holding the
  * buffer mapping partition lock.  We'll have to try to read it from
- * disk, after releasing the target partition lock to avoid too much
+ * disk, after releasing the target partition lock to avoid excessive
  * overhead.  It means that it's possible to get a torn page later, so
  * we'll have to retry with a suitable lock in case of error to avoid
  * false positive.
diff --git a/src/backend/utils/adt/checksumfuncs.c b/src/backend/utils/adt/checksumfuncs.c
index d005b8d01f..fa5823677a 100644
--- a/src/backend/utils/adt/checksumfuncs.c
+++ b/src/backend/utils/adt/checksumfuncs.c
@@ -1,7 +1,7 @@
 /*-------------------------------------------------------------------------
  *
  * checksumfuncs.c
- *  Functions for checksums related feature such as online verification
+ *  Functions for checksum related feature such as online verification
  *
  * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -181,7 +181,7 @@ check_relation_fork(TupleDesc tupdesc, Tuplestorestate *tupstore,
  if (check_one_block(relation, forknum, blkno, &chk_expected,
  &chk_found))
  {
- /* Buffer not corrupted or no worth checking, continue */
+ /* Buffer not corrupted or not worth checking, continue */
  continue;
  }
 
@@ -192,7 +192,7 @@ check_relation_fork(TupleDesc tupdesc, Tuplestorestate *tupstore,
  values[i++] = Int32GetDatum(forknum);
  values[i++] = UInt32GetDatum(blkno);
  /*
- * This can happen if a corruption makes the block appears as
+ * This can happen if corruption makes the block appears as
  * PageIsNew() but isn't a new page.
  */
  if (chk_expected == NoComputedChecksum)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 5a51dccca9..57401580c3 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2383,7 +2383,7 @@ static struct config_int ConfigureNamesInt[] =
 
  {
  {"checksum_cost_page", PGC_USERSET, RESOURCES_CHECKSUM_DELAY,
- gettext_noop("Checksum cost for verifying a page found."),
+ gettext_noop("Checksum cost for verifying a page."),
  NULL
  },
  &ChecksumCostPage,
diff --git a/src/test/check_relation/t/01_checksums_check.pl b/src/test/check_relation/t/01_checksums_check.pl
index 1ad34adcb9..2a3f2880ea 100644
--- a/src/test/check_relation/t/01_checksums_check.pl
+++ b/src/test/check_relation/t/01_checksums_check.pl
@@ -218,7 +218,7 @@ $ENV{PGOPTIONS} = '--client-min-messages=WARNING';
 my ($cmdret, $stdout, $stderr) = $node->psql('postgres', "SELECT"
  . " current_setting('data_checksums')");
 
-is($stdout, 'on', 'Data checksums shoud be enabled');
+is($stdout, 'on', 'Data checksums should be enabled');
 
 ($cmdret, $stdout, $stderr) = $node->psql('postgres', "SELECT"
  . " current_setting('block_size')");
@@ -254,7 +254,7 @@ is(check_checksums_call($node, 'u_t1_id_idx'), '1', 'Can check an unlogged index
  . " current_setting('data_directory') || '/' || pg_relation_filepath('t1')"
 );
 
-isnt($stdout, '', 'A relfinode should be returned');
+isnt($stdout, '', 'A relfilenode should be returned');
 
 my $filename = $stdout;
 
--
2.17.0


0001-fixes-for-online-checksum-verification.patch (10K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Julien Rouhaud
On Sun, Jul 12, 2020 at 12:34:03PM -0500, Justin Pryzby wrote:
> Small language fixes in comments and user-facing docs.

Thanks a lot!  I just fixed a small issue (see below), PFA updated v10.

>
> diff --git a/src/backend/storage/page/checksum.c b/src/backend/storage/page/checksum.c
> index eb2c919c34..17cd95ec95 100644
> --- a/src/backend/storage/page/checksum.c
> +++ b/src/backend/storage/page/checksum.c
> @@ -36,7 +36,7 @@
>   * actual storage, you have to discard the operating system cache before
>   * running those functions.
>   *
> - * To avoid torn page and possible false positive when reading data, and
> + * To avoid torn pages and possible false positives when reading data, and to
>   * keeping overhead as low as possible, the following heuristics are used:
>   *
Changed for "to keep".

v10-0001-Add-a-pg_check_relation-function.patch (44K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Michael Paquier-2
On Tue, Jul 14, 2020 at 11:08:08AM +0200, Julien Rouhaud wrote:
> On Sun, Jul 12, 2020 at 12:34:03PM -0500, Justin Pryzby wrote:
>> Small language fixes in comments and user-facing docs.
>
> Thanks a lot!  I just fixed a small issue (see below), PFA updated v10.

Sawada-san, you are registered as a reviewer of this patch.  Are you
planning to look at it?  If you are busy lately, that's fine as well
(congrats!).  In this case it could be better to unregister from the
CF app for this entry.

I am refreshing my mind here, but here are some high-level comments
for now...

+#include "postgres.h"
+
+#include "access/tupdesc.h"
+#include "common/relpath.h"
 #include "storage/block.h"
+#include "utils/relcache.h"
+#include "utils/tuplestore.h"
[...]
+extern bool check_one_block(Relation relation, ForkNumber forknum,
+                           BlockNumber blkno, uint16 *chk_expected,
+                           uint16 *chk_found);
I don't think that it is a good idea to add this much to checksum.h
as these are includes coming mainly from the backend.  Note that
pg_checksum_page() is a function designed to be also available for
frontend tools, with checksum.h something that can be included in
frontends.  This would mean that we could move all the buffer lookup
APIs directly to checksumfuncs.c, or move that into a separate file
closer to the location.

+ * A zero checksum can never be computed, see pg_checksum_page() */
+#define NoComputedChecksum 0
Wouldn't it be better to rename that something like
InvalidPageChecksum, and make use of it in pg_checksum_page()?  It
would be more consistent with the naming of transaction IDs, OIDs or
even XLogRecPtr.  And that could be a separate patch.

+++ b/src/test/check_relation/t/01_checksums_check.pl
@@ -0,0 +1,276 @@
+use strict;
+use warnings;
It could be better to move that to src/test/modules/, so as it could
be picked more easily by MSVC scripts in the future.  Note that if we
apply the normal naming convention here this should be named
001_checksum_check.pl.

+subdir = src/test/check_relation
+top_builddir = ../../..
+include $(top_builddir)/src/Makefile.global
Let's use a Makefile shaped in a way similar to modules/test_misc that
makes use of TAP_TESTS = 1.  There is the infra, let's rely on it for
the regression tests.

+       pg_usleep(msec * 1000L);
Could it be possible to add a wait event here?  It would be nice to be
able to monitor that in pg_stat_activity.

+if (exists $ENV{MY_PG_REGRESS})
+{
+   $ENV{PG_REGRESS} = $ENV{MY_PG_REGRESS};
+}
What is MY_PG_REGRESS for?  A remnant from an external makefile
perhaps?

+   /*
+    * If we get a failure and the buffer wasn't found in shared buffers,
+    * reread the buffer with suitable lock to avoid false positive. See
+    * check_get_buffer for more details.
+    */
+   if (!found_in_sb && !force_lock)
+   {
+       force_lock = true;
+       goto Retry;
+   }
As designed, we have a risk of false positives with a torn page in the
first loop when trying to look for a given buffer as we would try to
use smgrread() without a partition lock.  This stresses me a bit, and
false positives could scare users easily.  Could we consider first a
safer approach where we don't do that, and just read the page while
holding the partition lock?  OK, that would be more expensive, but at
least that's safe in any case.  My memory of this patch is a bit
fuzzy, but this is itching me and this is the heart of the problem
dealt with here :)
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Julien Rouhaud
On Mon, Sep 7, 2020 at 8:59 AM Michael Paquier <[hidden email]> wrote:

>
> +#include "postgres.h"
> +
> +#include "access/tupdesc.h"
> +#include "common/relpath.h"
>  #include "storage/block.h"
> +#include "utils/relcache.h"
> +#include "utils/tuplestore.h"
> [...]
> +extern bool check_one_block(Relation relation, ForkNumber forknum,
> +                           BlockNumber blkno, uint16 *chk_expected,
> +                           uint16 *chk_found);
> I don't think that it is a good idea to add this much to checksum.h
> as these are includes coming mainly from the backend.  Note that
> pg_checksum_page() is a function designed to be also available for
> frontend tools, with checksum.h something that can be included in
> frontends.  This would mean that we could move all the buffer lookup
> APIs directly to checksumfuncs.c, or move that into a separate file
> closer to the location.

Did you mean creating a new checksumfuncs.c file? I don't find any
such file in the current tree.

> + * A zero checksum can never be computed, see pg_checksum_page() */
> +#define NoComputedChecksum 0
> Wouldn't it be better to rename that something like
> InvalidPageChecksum, and make use of it in pg_checksum_page()?  It
> would be more consistent with the naming of transaction IDs, OIDs or
> even XLogRecPtr.  And that could be a separate patch.

It seems quite ambiguous, as checksum validity usually has a different
meaning.  And in the code added here, the meaning isn't that the
ckecksum is invalid but that there's no checsum as it cannot be
computed due to PageIsNew().

> +++ b/src/test/check_relation/t/01_checksums_check.pl
> @@ -0,0 +1,276 @@
> +use strict;
> +use warnings;
> It could be better to move that to src/test/modules/, so as it could
> be picked more easily by MSVC scripts in the future.  Note that if we
> apply the normal naming convention here this should be named
> 001_checksum_check.pl.
>
> +subdir = src/test/check_relation
> +top_builddir = ../../..
> +include $(top_builddir)/src/Makefile.global
> Let's use a Makefile shaped in a way similar to modules/test_misc that
> makes use of TAP_TESTS = 1.  There is the infra, let's rely on it for
> the regression tests.

Will fix.

> +       pg_usleep(msec * 1000L);
> Could it be possible to add a wait event here?  It would be nice to be
> able to monitor that in pg_stat_activity.

Sure, I missed that as this was first implemented as an extension.

> +if (exists $ENV{MY_PG_REGRESS})
> +{
> +   $ENV{PG_REGRESS} = $ENV{MY_PG_REGRESS};
> +}
> What is MY_PG_REGRESS for?  A remnant from an external makefile
> perhaps?

Indeed.

> +   /*
> +    * If we get a failure and the buffer wasn't found in shared buffers,
> +    * reread the buffer with suitable lock to avoid false positive. See
> +    * check_get_buffer for more details.
> +    */
> +   if (!found_in_sb && !force_lock)
> +   {
> +       force_lock = true;
> +       goto Retry;
> +   }
> As designed, we have a risk of false positives with a torn page in the
> first loop when trying to look for a given buffer as we would try to
> use smgrread() without a partition lock.  This stresses me a bit, and
> false positives could scare users easily.  Could we consider first a
> safer approach where we don't do that, and just read the page while
> holding the partition lock?  OK, that would be more expensive, but at
> least that's safe in any case.  My memory of this patch is a bit
> fuzzy, but this is itching me and this is the heart of the problem
> dealt with here :)

I'm not sure I understand.  Unless I missed something this approach
*cannot* raise a false positive.  What it does is force a 2nd check
with stronger lock *to make sure it's actually a corruption*, so we
don't raise false positive.  The only report that can happen in this
1st loop is if smgread raises an error, which AFAICT can only happen
(at least with mdread) if the whole block couldn't be read, which is a
sign of a very bad problem.  This should clearly be reported, as this
cannot be caused by the locking heuristics used here.


Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Michael Paquier-2
On Mon, Sep 07, 2020 at 09:38:30AM +0200, Julien Rouhaud wrote:
> Did you mean creating a new checksumfuncs.c file? I don't find any
> such file in the current tree.

Your patch adds checksumfuncs.c, so the subroutines grabbing a given
block could just be moved there.

> I'm not sure I understand.  Unless I missed something this approach
> *cannot* raise a false positive.  What it does is force a 2nd check
> with stronger lock *to make sure it's actually a corruption*, so we
> don't raise false positive.  The only report that can happen in this
> 1st loop is if smgread raises an error, which AFAICT can only happen
> (at least with mdread) if the whole block couldn't be read, which is a
> sign of a very bad problem.  This should clearly be reported, as this
> cannot be caused by the locking heuristics used here.

We don't know how much this optimization matters though?  Could it be
possible to get an idea of that?  For example, take the case of one
relation with a fixed size in a read-only workload and a read-write
workload (as long as autovacuum and updates make the number of
relation blocks rather constant for the read-write case), doing a
checksum verification in parallel of multiple clients working on the
relation concurrently.  Assuming that the relation is fully in the OS
cache, we could get an idea of the impact with multiple
(shared_buffers / relation size) rates to make the eviction more
aggressive?  The buffer partition locks, knowing that
NUM_BUFFER_PARTITIONS caps that, should be the bottleneck, still it
seems to me that it would be good to see if we have a difference.
What do you think?
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Masahiko Sawada-2
In reply to this post by Michael Paquier-2
On Mon, 7 Sep 2020 at 15:59, Michael Paquier <[hidden email]> wrote:

>
> On Tue, Jul 14, 2020 at 11:08:08AM +0200, Julien Rouhaud wrote:
> > On Sun, Jul 12, 2020 at 12:34:03PM -0500, Justin Pryzby wrote:
> >> Small language fixes in comments and user-facing docs.
> >
> > Thanks a lot!  I just fixed a small issue (see below), PFA updated v10.
>
> Sawada-san, you are registered as a reviewer of this patch.  Are you
> planning to look at it?  If you are busy lately, that's fine as well
> (congrats!).

Thanks!

>  In this case it could be better to unregister from the
> CF app for this entry.

Well, I sent review comments on this patch and Julien fixed all
comments. So I’d marked this as Ready for Committer since I didn't
have further comments at that time, and I was waiting for the
committer review. I'll look at this patch again but should I remove my
name from the reviewer after that if no comments?

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Michael Paquier-2
On Tue, Sep 08, 2020 at 11:36:45AM +0900, Masahiko Sawada wrote:
> On Mon, 7 Sep 2020 at 15:59, Michael Paquier <[hidden email]> wrote:
>>  In this case it could be better to unregister from the
>> CF app for this entry.
>
> Well, I sent review comments on this patch and Julien fixed all
> comments. So I’d marked this as Ready for Committer since I didn't
> have further comments at that time, and I was waiting for the
> committer review. I'll look at this patch again but should I remove my
> name from the reviewer after that if no comments?

Ah, sorry, I somewhat missed the previous status of the patch.
Perhaps that's an overdose of CF.  Keeping your name as reviewer is
fine I guess.  I have begun looking at the patch and spotted some
issues, so let's see where we do from here.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Julien Rouhaud
In reply to this post by Michael Paquier-2
On Mon, Sep 07, 2020 at 05:50:38PM +0900, Michael Paquier wrote:
> On Mon, Sep 07, 2020 at 09:38:30AM +0200, Julien Rouhaud wrote:
> > Did you mean creating a new checksumfuncs.c file? I don't find any
> > such file in the current tree.
>
> Your patch adds checksumfuncs.c, so the subroutines grabbing a given
> block could just be moved there.
>

Sorry, I was in the middle of a rebase for another patch and missed the new
files added in this one.  I added a new checksumfuncs.h for the required
include that should not be seen by client code.  I kept checksumfuncs.c and
checksums.c so that the SQL visible declaration are separated from the rest of
the implementation as this is what we already do elsewhere I think.  If that's
a problem I'll change and put everything in checksumfuncs.[ch].

I also moved the tap tests in src/test/modules and renamed the file with a 3
digits.  For the record I initially copied src/test/modules/brin, and this is
apparently the only subdir that has a 2 digits pattern.

I also added a new WAIT_EVENT_CHECK_DELAY wait event.

> > I'm not sure I understand.  Unless I missed something this approach
> > *cannot* raise a false positive.  What it does is force a 2nd check
> > with stronger lock *to make sure it's actually a corruption*, so we
> > don't raise false positive.  The only report that can happen in this
> > 1st loop is if smgread raises an error, which AFAICT can only happen
> > (at least with mdread) if the whole block couldn't be read, which is a
> > sign of a very bad problem.  This should clearly be reported, as this
> > cannot be caused by the locking heuristics used here.
>
> We don't know how much this optimization matters though?  Could it be
> possible to get an idea of that?  For example, take the case of one
> relation with a fixed size in a read-only workload and a read-write
> workload (as long as autovacuum and updates make the number of
> relation blocks rather constant for the read-write case), doing a
> checksum verification in parallel of multiple clients working on the
> relation concurrently.  Assuming that the relation is fully in the OS
> cache, we could get an idea of the impact with multiple
> (shared_buffers / relation size) rates to make the eviction more
> aggressive?  The buffer partition locks, knowing that
> NUM_BUFFER_PARTITIONS caps that, should be the bottleneck, still it
> seems to me that it would be good to see if we have a difference.
> What do you think?
I assumed that the odds of having to check the buffer twice were so low, and
avoiding to keep a bufmapping lock while doing some IO was an uncontroversial
enough optimisation, but maybe that's only wishful thinking.

I'll do some becnhmarking and see if I can get some figures, but it'll probably
take some time.  In the meantime I'm attaching v11 of the patch that should
address all other comments.

v11-0001-Add-a-pg_check_relation-function.patch (45K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Online checksums verification in the backend

Michael Paquier-2
On Wed, Sep 09, 2020 at 11:25:24AM +0200, Julien Rouhaud wrote:
> I assumed that the odds of having to check the buffer twice were so low, and
> avoiding to keep a bufmapping lock while doing some IO was an uncontroversial
> enough optimisation, but maybe that's only wishful thinking.

Perhaps it is worth it, so it would be good to make sure of it and see
if that's actually worth the extra complication.  This also depends if
the page is in the OS cache if the page is not in shared buffers,
meaning that smgrread() is needed to fetch the page to check.  I would
be more curious to see if there is an actual difference if the page is
the OS cache.

> I'll do some becnhmarking and see if I can get some figures, but it'll probably
> take some time.  In the meantime I'm attaching v11 of the patch that should
> address all other comments.

Thanks.

Another thing that was itching me is the introduction of 3 GUCs with
one new category for the sake of two SQL functions.  For VACUUM we
have many things relying on the GUC delays, with autovacuum and manual
vacuum.  Perhaps it would make sense to have these if we have some day
a background worker doing checksum verifications, still that could
perfectly be in contrib/, and that would be kind of hard to tune as
well.  The patch enabling checksums on-the-fly could also be a reason
good enough.  Another thing we could consider is to pass down those
parameters as function arguments, at the cost of not being able to
reload them.
--
Michael

signature.asc (849 bytes) Download Attachment
123