Online verification of checksums

Re: Online verification of checksums

Fabien COELHO-3

Hallo Michael,

Patch v6 applies cleanly, compiles, local make check is ok.

>> My current opinion is that when offline some errors are not admissible,
>> whereas the same errors are admissible when online because they may be due
>> to the ongoing database processing, so the behavior should not be strictly
>> the same.
>
> Indeed, the recently-added pg_verify_checksums testsuite

A welcome addition!

> adds a few files with just 'foo' in them and with V5 of the patch,
> pg_verify_checksums no longer bails out with an error on those.

> I have now re-added the retry logic for partially-read pages, so that it
> bails out if it reads a page partially twice. This makes the testsuite
> work again.
>
> I am not convinced we need to differentiate further between online and
> offline operation, can you explain in more detail which other
> differences are ok in online mode and why?

For instance, the "file/directory was removed" case does not look okay at all
when offline, even if unlikely. Moreover, the check hides the error message
and is fully silent in this case, whereas it was not silent beforehand on the
same error when offline.

The "check if page was modified since checkpoint" does not look useful
when offline. Maybe it lacks a comment to say that this cannot (should not
?) happen when offline, but even then I would not like it to be true: ISTM
that no page should be allowed to be skipped on the checkpoint condition
when offline, but it is probably ok to skip with the new page test, which
makes me still think that they should be counted and reported separately,
or at least the checkpoint skip test should not be run when offline.

When offline, the retry logic does not make much sense; shouldn't it complain
directly on the first error? Also, I'm unsure about the read & checksum retry
logic *without any delay*.

>> This might suggest some option to tell the command that it should work in
>> online or offline mode, so that it may be stricter in some cases. The
>> default may be one of the option, eg the stricter offline mode, or maybe
>> guessed at startup.
>
> If we believe the operation should be different, the patch removes the
> "is cluster online?" check (as it is no longer necessary), so we could
> just replace the current error message with a global variable with the
> result of that check and use it where needed (if any).

That could leave open the issue of someone starting the check offline, and
then starting the database while it is not finished. Maybe it is not worth
sweating over such a narrow use case.

If operations are to be different, and it seems to me they should be, I'd
suggest (1) auto-detecting the default based on the existing "is cluster
online" code, and (2) force options, e.g. --online vs --offline, which would
complain and exit if the cluster is not in the right state on startup.
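
To make (1) and (2) concrete, a rough sketch (hypothetical names, reusing the
existing get_controlfile() helper with its PG 11-era signature assumed; not
meant as the actual patch code):

/*
 * Sketch only, not the patch's code: decide online/offline at startup.
 * "force_mode" is a hypothetical variable set from --online/--offline;
 * get_controlfile() is the existing src/common helper.
 */
#include "postgres_fe.h"
#include "catalog/pg_control.h"
#include "common/controldata_utils.h"

typedef enum { MODE_AUTO, MODE_ONLINE, MODE_OFFLINE } force_mode_t;

static bool
detect_online_mode(const char *DataDir, const char *progname,
                   force_mode_t force_mode)
{
    bool        crc_ok;
    ControlFileData *cf = get_controlfile(DataDir, progname, &crc_ok);
    bool        online = (cf->state != DB_SHUTDOWNED &&
                          cf->state != DB_SHUTDOWNED_IN_RECOVERY);

    if (force_mode == MODE_ONLINE && !online)
    {
        fprintf(stderr, "%s: --online given but cluster appears to be shut down\n",
                progname);
        exit(1);
    }
    if (force_mode == MODE_OFFLINE && online)
    {
        fprintf(stderr, "%s: --offline given but cluster appears to be running\n",
                progname);
        exit(1);
    }
    return online;
}

The auto-detected value would be the default; the options would only override
or sanity-check it.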

I'd suggest adding a failing checksum online test, if possible. At least a
"foo" file? It would also be nice if the test could run against an active
database, e.g. with a low-rate pgbench running in parallel to the
verification, but I'm not sure how easy that is to add.

--
Fabien.


Re: Online verification of checksums

Michael Banck-2
Hi,

On Tue, Oct 30, 2018 at 06:22:52PM +0100, Fabien COELHO wrote:
> >I am not convinced we need to differentiate further between online and
> >offline operation, can you explain in more detail which other
> >differences are ok in online mode and why?
>
> For instance the "file/directory was removed" do not look okay at all when
> offline, even if unlikely. Moreover, the checks hides the error message and
> is fully silent in this case, while it was not beforehand on the same error
> when offline.

OK, I kinda see the point here and added that.
 
> The "check if page was modified since checkpoint" does not look useful when
> offline. Maybe it lacks a comment to say that this cannot (should not ?)
> happen when offline, but even then I would not like it to be true: ISTM that
> no page should be allowed to be skipped on the checkpoint condition when
> offline, but it is probably ok to skip with the new page test, which make me
> still think that they should be counted and reported separately, or at least
> the checkpoint skip test should not be run when offline.

What is the rationale for not skipping on the checkpoint condition when the
instance is offline?  If it was shut down cleanly, this should not happen; if
the instance crashed, those would be spurious errors that would get repaired
on recovery.

I have not changed that for now.

> When offline, the retry logic does not make much sense, it should complain
> directly on the first error? Also, I'm unsure of the read & checksum retry
> logic *without any delay*.

I think the small overhead of retrying in offline mode, even if it is useless
there, is worth it to avoid making the code more complicated just to cater
for both modes.

Initially there was a delay, but this was removed after analysis and
requests by several other reviewers.

> >>This might suggest some option to tell the command that it should work in
> >>online or offline mode, so that it may be stricter in some cases. The
> >>default may be one of the option, eg the stricter offline mode, or maybe
> >>guessed at startup.
> >
> >If we believe the operation should be different, the patch removes the
> >"is cluster online?" check (as it is no longer necessary), so we could
> >just replace the current error message with a global variable with the
> >result of that check and use it where needed (if any).
>
> That could let open the issue of someone starting the check offline, and
> then starting the database while it is not finished. Maybe it is not worth
> sweating about such a narrow use case.
I don't think we need to cater for that, yeah.
 
> If operations are to be different, and it seems to me they should be, I'd
> suggest (1) auto detect default based one the existing "is cluster online"
> code, (2) force options, eg --online vs --offline, which would complain and
> exit if the cluster is not in the right state on startup.

The current code bails out if it thinks the cluster is online. What is
wrong with just setting a flag now in case it is?
 
> I'd suggest to add a failing checksum online test, if possible. At least a
> "foo" file?

Ok, done so.

> It would also be nice if the test could apply on an active database,
> eg with a low-rate pgbench running in parallel to the verification,
> but I'm not sure how easy it is to add such a thing.

That sounds much more complicated so I have not tackled that yet.


Michael


[Attachment: online-verification-of-checksums_V7.patch (10K)]

Re: Online verification of checksums

Stephen Frost
Greetings,

* Michael Banck ([hidden email]) wrote:

> On Tue, Oct 30, 2018 at 06:22:52PM +0100, Fabien COELHO wrote:
> > The "check if page was modified since checkpoint" does not look useful when
> > offline. Maybe it lacks a comment to say that this cannot (should not ?)
> > happen when offline, but even then I would not like it to be true: ISTM that
> > no page should be allowed to be skipped on the checkpoint condition when
> > offline, but it is probably ok to skip with the new page test, which make me
> > still think that they should be counted and reported separately, or at least
> > the checkpoint skip test should not be run when offline.
>
> What is the rationale to not skip on the checkpoint condition when the
> instance is offline?  If it was shutdown cleanly, this should not
> happen, if the instance crashed, those would be spurious errors that
> would get repaired on recovery.
>
> I have not changed that for now.
Agreed- this is an important check even in offline mode.

> > When offline, the retry logic does not make much sense, it should complain
> > directly on the first error? Also, I'm unsure of the read & checksum retry
> > logic *without any delay*.

The race condition being considered here is where an 8k read somehow
gets the first 4k, then is scheduled off-cpu, and the full 8k page is
then written by some other process, and then this process is woken up
to read the second 4k.  I agree that this is unnecessary when the
database is offline, but it's also pretty cheap.  When the database is
online, it's an extremely unlikely case to hit (just try to reproduce
it...) but if it does get hit then it's easy enough to recheck by doing
a reread, which should show that the LSN has been updated in the first
4k and we can then know that this page is in the WAL.  We have not yet
seen a case where such a re-read returns an old LSN and an invalid
checksum; based on discussion with other hackers, that shouldn't be
possible as every kernel seems to consistently write in-order, meaning
that the first 4k will be updated before the second, so a single re-read
should be sufficient.

Remember- this is all in-memory activity also, we aren't talking about
what might happen on disk here.
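
For illustration, a minimal sketch of that re-read logic (not the patch's
actual code; the function below and its callers are assumed, and
PageIsNew/segment handling is left out for brevity):

/*
 * Sketch only, not the patch's code: check one block, with a single
 * re-read to cover the torn-read race described above.
 */
#include "postgres_fe.h"
#include "storage/bufpage.h"
#include "storage/checksum.h"
#include <unistd.h>

static void
check_block(int fd, BlockNumber blockno, BlockNumber blkno_in_rel,
            XLogRecPtr checkpoint_lsn, long *badblocks, long *skippedblocks)
{
    char        buf[BLCKSZ];
    PageHeader  header = (PageHeader) buf;

    if (pread(fd, buf, BLCKSZ, (off_t) blockno * BLCKSZ) != BLCKSZ)
        return;                 /* short read, handled elsewhere */

    if (pg_checksum_page(buf, blkno_in_rel) == header->pd_checksum)
        return;                 /* checksum is fine */

    /*
     * Mismatch: re-read once.  A concurrent write must already have made
     * the new header LSN visible, so a newer-than-checkpoint LSN means
     * the page is covered by WAL and can be skipped.
     */
    if (pread(fd, buf, BLCKSZ, (off_t) blockno * BLCKSZ) == BLCKSZ &&
        PageGetLSN((Page) buf) > checkpoint_lsn)
        (*skippedblocks)++;
    else
        (*badblocks)++;
}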

> I think the small overhead of retrying in offline mode even if useless
> is worth avoiding making the code more complicated in order to cater for
> both modes.

Agreed.

> Initially there was a delay, but this was removed after analysis and
> requests by several other reviewers.

Agreed, there's no need for or point to having such a delay.

> > >>This might suggest some option to tell the command that it should work in
> > >>online or offline mode, so that it may be stricter in some cases. The
> > >>default may be one of the option, eg the stricter offline mode, or maybe
> > >>guessed at startup.
> > >
> > >If we believe the operation should be different, the patch removes the
> > >"is cluster online?" check (as it is no longer necessary), so we could
> > >just replace the current error message with a global variable with the
> > >result of that check and use it where needed (if any).
> >
> > That could let open the issue of someone starting the check offline, and
> > then starting the database while it is not finished. Maybe it is not worth
> > sweating about such a narrow use case.
>
> I don't think we need to cater for that, yeah.
Agreed.

> > It would also be nice if the test could apply on an active database,
> > eg with a low-rate pgbench running in parallel to the verification,
> > but I'm not sure how easy it is to add such a thing.
>
> That sounds much more complicated so I have not tackled that yet.

I agree that this would be nice, but I don't want the regression tests
to become much longer...

Thanks!

Stephen


Re: Online verification of checksums

Tomas Vondra-4


On 11/22/18 2:12 AM, Stephen Frost wrote:

> Greetings,
>
> * Michael Banck ([hidden email]) wrote:
>> On Tue, Oct 30, 2018 at 06:22:52PM +0100, Fabien COELHO wrote:
>>> The "check if page was modified since checkpoint" does not look useful when
>>> offline. Maybe it lacks a comment to say that this cannot (should not ?)
>>> happen when offline, but even then I would not like it to be true: ISTM that
>>> no page should be allowed to be skipped on the checkpoint condition when
>>> offline, but it is probably ok to skip with the new page test, which make me
>>> still think that they should be counted and reported separately, or at least
>>> the checkpoint skip test should not be run when offline.
>>
>> What is the rationale to not skip on the checkpoint condition when the
>> instance is offline?  If it was shutdown cleanly, this should not
>> happen, if the instance crashed, those would be spurious errors that
>> would get repaired on recovery.
>>
>> I have not changed that for now.
>
> Agreed- this is an important check even in offline mode.
>

Yeah. I suppose we could detect if the shutdown was clean (like
pg_rewind does), and then skip the check. Or perhaps we should still do
the check (without a retry), and report it as an issue when we find a page
with LSN newer than the last checkpoint.

In any case, the check is pretty cheap (comparing two 64-bit values),
and I don't see how skipping it would optimize anything. It would make
the code a tad simpler, but we still need the check for the online mode.

>>> When offline, the retry logic does not make much sense, it should complain
>>> directly on the first error? Also, I'm unsure of the read & checksum retry
>>> logic *without any delay*.
>
> The race condition being considered here is where an 8k read somehow
> gets the first 4k, then is scheduled off-cpu, and the full 8k page is
> then written by some other process, and then this process is woken up
> to read the second 4k.  I agree that this is unnecessary when the
> database is offline, but it's also pretty cheap.  When the database is
> online, it's an extremely unlikely case to hit (just try to reproduce
> it...) but if it does get hit then it's easy enough to recheck by doing
> a reread, which should show that the LSN has been updated in the first
> 4k and we can then know that this page is in the WAL.  We have not yet
> seen a case where such a re-read returns an old LSN and an invalid
> checksum; based on discussion with other hackers, that shouldn't be
> possible as every kernel seems to consistently write in-order, meaning
> that the first 4k will be updated before the second, so a single re-read
> should be sufficient.
>

Right.

A minor detail is that reads/writes are only guaranteed to be atomic at the
sector level, which used to be 512B, so it's not just about pages torn in a
4kB/4kB manner, but possibly an arbitrary mix of 512B chunks from the old and
new versions.

This also explains why we don't need any delay - by the time we reread, the
write must already have written the page header, so the new LSN must already
be visible.

So no delay is necessary. And if one were, how long should it be? The
processes might end up off-CPU for an arbitrary amount of time, so picking a
good value would be pretty tricky.

> Remember- this is all in-memory activity also, we aren't talking about
> what might happen on disk here.
>
>> I think the small overhead of retrying in offline mode even if useless
>> is worth avoiding making the code more complicated in order to cater for
>> both modes.
>
> Agreed.
>
>> Initially there was a delay, but this was removed after analysis and
>> requests by several other reviewers.
>
> Agreed, there's no need for or point to having such a delay.
>

Yep.

>>>>> This might suggest some option to tell the command that it should work in
>>>>> online or offline mode, so that it may be stricter in some cases. The
>>>>> default may be one of the option, eg the stricter offline mode, or maybe
>>>>> guessed at startup.
>>>>
>>>> If we believe the operation should be different, the patch removes the
>>>> "is cluster online?" check (as it is no longer necessary), so we could
>>>> just replace the current error message with a global variable with the
>>>> result of that check and use it where needed (if any).
>>>
>>> That could let open the issue of someone starting the check offline, and
>>> then starting the database while it is not finished. Maybe it is not worth
>>> sweating about such a narrow use case.
>>
>> I don't think we need to cater for that, yeah.
>
> Agreed.
>

Yep. I don't think other tools protect against that either. And
pg_rewind does actually modify the cluster state, unlike checksum
verification.

>>> It would also be nice if the test could apply on an active database,
>>> eg with a low-rate pgbench running in parallel to the verification,
>>> but I'm not sure how easy it is to add such a thing.
>>
>> That sounds much more complicated so I have not tackled that yet.
>
> I agree that this would be nice, but I don't want the regression tests
> to become much longer...
>

I have to admit I find this thread rather confusing, because the subject
is "online verification of checksums" yet we're discussing verification
on offline instances.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online verification of checksums

Stephen Frost
Greetings,

* Tomas Vondra ([hidden email]) wrote:

> On 11/22/18 2:12 AM, Stephen Frost wrote:
> >* Michael Banck ([hidden email]) wrote:
> >>On Tue, Oct 30, 2018 at 06:22:52PM +0100, Fabien COELHO wrote:
> >>>The "check if page was modified since checkpoint" does not look useful when
> >>>offline. Maybe it lacks a comment to say that this cannot (should not ?)
> >>>happen when offline, but even then I would not like it to be true: ISTM that
> >>>no page should be allowed to be skipped on the checkpoint condition when
> >>>offline, but it is probably ok to skip with the new page test, which make me
> >>>still think that they should be counted and reported separately, or at least
> >>>the checkpoint skip test should not be run when offline.
> >>
> >>What is the rationale to not skip on the checkpoint condition when the
> >>instance is offline?  If it was shutdown cleanly, this should not
> >>happen, if the instance crashed, those would be spurious errors that
> >>would get repaired on recovery.
> >>
> >>I have not changed that for now.
> >
> >Agreed- this is an important check even in offline mode.
>
> Yeah. I suppose we could detect if the shutdown was clean (like pg_rewind
> does), and then skip the check. Or perhaps we should still do the check
> (without a retry), and report it as issue when we find a page with LSN newer
> than the last checkpoint.
I agree that it'd be nice to report an issue if it's a clean shutdown
but there's an LSN newer than the last checkpoint, though I suspect that
would be more useful in debugging and such and not so useful for users.

> In any case, the check is pretty cheap (comparing two 64-bit values), and I
> don't see how skipping it would optimize anything. It would make the code a
> tad simpler, but we still need the check for the online mode.

Yeah, I'd just keep the check.

> A minor detail is that the reads/writes should be atomic at the sector
> level, which used to be 512B, so it's not just about pages torn in 4kB/4kB
> manner, but possibly an arbitrary mix of 512B chunks from old and new
> version.

Sure.

> This also explains why we don't need any delay - the reread happens after
> the write must have already written the page header, so the new LSN must be
> already visible.

Agreed.

Thanks!

Stephen


Re: Online verification of checksums

Dmitry Dolgov
> On Wed, Nov 21, 2018 at 1:38 PM Michael Banck <[hidden email]> wrote:
>
> Hi,
>
> On Tue, Oct 30, 2018 at 06:22:52PM +0100, Fabien COELHO wrote:
> > >I am not convinced we need to differentiate further between online and
> > >offline operation, can you explain in more detail which other
> > >differences are ok in online mode and why?
> >
> > For instance the "file/directory was removed" do not look okay at all when
> > offline, even if unlikely. Moreover, the checks hides the error message and
> > is fully silent in this case, while it was not beforehand on the same error
> > when offline.
>
> OK, I kinda see the point here and added that.

Hi,

Just for information, it looks like part of this patch (or at least some
similar code), related to the tests in 002_actions.pl, was committed recently
as 5c99513975, so there are minor conflicts with master.


Re: Online verification of checksums

Michael Paquier-2
On Sat, Dec 01, 2018 at 12:47:13PM +0100, Dmitry Dolgov wrote:
> Just for the information, looks like part of this patch (or at least some
> similar code), related to the tests in 002_actions.pl, was committed recently
> in 5c99513975, so there are minor conflicts with the master.

From what I can see in v7 of the patch as posted in [1], all the changes
to 002_actions.pl could just be removed because there are already
equivalents.

[1]: https://postgr.es/m/20181121123535.GD23740@...
--
Michael


Re: Online verification of checksums

Michael Banck-2
Hi,

On Mon, Dec 03, 2018 at 09:48:43AM +0900, Michael Paquier wrote:
> On Sat, Dec 01, 2018 at 12:47:13PM +0100, Dmitry Dolgov wrote:
> > Just for the information, looks like part of this patch (or at least some
> > similar code), related to the tests in 002_actions.pl, was committed recently
> > in 5c99513975, so there are minor conflicts with the master.
>
> From what I can see in v7 of the patch as posted in [1], all the changes
> to 002_actions.pl could just be removed because there are already
> equivalents.

Yeah, new rebased version attached.


Michael


[Attachment: online-verification-of-checksums_V8.patch (8K)]

Re: Online verification of checksums

Michael Banck-2
Hi,

On Thu, Dec 20, 2018 at 04:19:11PM +0100, Michael Banck wrote:
> Yeah, new rebased version attached.

By the way, one thing that this patch also fixes is checksum
verification on basebackups (as pointed out the other day by my
colleague Bernd Helmele):

postgres@kohn:~$ initdb -k data
postgres@kohn:~$ pg_ctl -D data -l logfile start
waiting for server to start.... done
server started
postgres@kohn:~$ pg_verify_checksums -D data
pg_verify_checksums: cluster must be shut down to verify checksums
postgres@kohn:~$ pg_basebackup -h /tmp -D backup1
postgres@kohn:~$ pg_verify_checksums -D backup1
pg_verify_checksums: cluster must be shut down to verify checksums
postgres@kohn:~$ pg_checksums -c -D backup1
Checksum scan completed
Files scanned:  1094
Blocks scanned: 2867
Bad checksums:  0
Data checksum version: 1

Where pg_checksums has the online verification patch applied.

As I don't think many people will take down their production servers in
order to verify checksums, verifying them on basebackups looks like a
useful use-case that is currently broken with pg_verify_checksums.


Michael



Re: Online verification of checksums

Fabien COELHO-3
In reply to this post by Michael Banck-2

Hallo Michael,

> Yeah, new rebased version attached.

Patch v8 applies cleanly, compiles, global & local make check are ok.

A few comments:

About the added tests: the node is left running at the end of the script,
which is not very clean. I'd suggest either moving the added checks before
the stop, or stopping the node again at the end of the script, depending on
the intention.

I'm wondering (possibly again) about the existing early exit if one block
cannot be read on retry: the command should count this as a kind of bad
block, proceed on checking other files, and obviously fail in the end, but
having checked everything else and generated a report. I do not think that
this condition warrants a full stop. ISTM that under rare race conditions
(eg, an unlucky concurrent "drop database" or "drop table") this could
happen when online, although I could not trigger one despite heavy
testing, so I'm possibly mistaken.

--
Fabien.


Re: Online verification of checksums

Andres Freund
Hi,

On 2018-12-25 10:25:46 +0100, Fabien COELHO wrote:

> Hallo Michael,
>
> > Yeah, new rebased version attached.
>
> Patch v8 applies cleanly, compiles, global & local make check are ok.
>
> A few comments:
>
> About added tests: the node is left running at the end of the script, which
> is not very clean. I'd suggest to either move the added checks before
> stopping, or to stop again at the end of the script, depending on the
> intention.

Michael?


> I'm wondering (possibly again) about the existing early exit if one block
> cannot be read on retry: the command should count this as a kind of bad
> block, proceed on checking other files, and obviously fail in the end, but
> having checked everything else and generated a report. I do not think that
> this condition warrants a full stop. ISTM that under rare race conditions
> (eg, an unlucky concurrent "drop database" or "drop table") this could
> happen when online, although I could not trigger one despite heavy testing,
> so I'm possibly mistaken.

This seems like a defensible judgement call either way.

Greetings,

Andres Freund


Re: Online verification of checksums

Michael Paquier-2
On Sun, Feb 03, 2019 at 02:06:45AM -0800, Andres Freund wrote:
> On 2018-12-25 10:25:46 +0100, Fabien COELHO wrote:
>> About added tests: the node is left running at the end of the script, which
>> is not very clean. I'd suggest to either move the added checks before
>> stopping, or to stop again at the end of the script, depending on the
>> intention.
>
> Michael?

Unlikely P., and most likely B.

I have marked the patch as returned with feedback as it has been a
couple of weeks already.
--
Michael


Re: Online verification of checksums

Michael Banck-2
In reply to this post by Andres Freund
Hi,

Am Sonntag, den 03.02.2019, 02:06 -0800 schrieb Andres Freund:

> Hi,
>
> On 2018-12-25 10:25:46 +0100, Fabien COELHO wrote:
> > Hallo Michael,
> >
> > > Yeah, new rebased version attached.
> >
> > Patch v8 applies cleanly, compiles, global & local make check are ok.
> >
> > A few comments:
> >
> > About added tests: the node is left running at the end of the script, which
> > is not very clean. I'd suggest to either move the added checks before
> > stopping, or to stop again at the end of the script, depending on the
> > intention.
>
> Michael?

Uh, I kinda forgot about this; I've made the tests stop the node now.

> > I'm wondering (possibly again) about the existing early exit if one block
> > cannot be read on retry: the command should count this as a kind of bad
> > block, proceed on checking other files, and obviously fail in the end, but
> > having checked everything else and generated a report. I do not think that
> > this condition warrants a full stop. ISTM that under rare race conditions
> > (eg, an unlucky concurrent "drop database" or "drop table") this could
> > happen when online, although I could not trigger one despite heavy testing,
> > so I'm possibly mistaken.
>
> This seems like a defensible judgement call either way.

Right now we have a few tests that explicitly check that
pg_verify_checksums fails on broken data ("foo" in the file).  Those
would then just get skipped AFAICT, which I think is the worse behaviour,
but if everybody thinks that should be the way to go, we can
drop/adjust those tests and make pg_verify_checksums skip them.

Thoughts?

In the meantime, v9 is attached with the above change and rebased (without
further changes) onto master.


Michael


[Attachment: online-verification-of-checksums_V9.patch (11K)]

Re: Online verification of checksums

Fabien COELHO-3

Hallo Michael,

>>> I'm wondering (possibly again) about the existing early exit if one block
>>> cannot be read on retry: the command should count this as a kind of bad
>>> block, proceed on checking other files, and obviously fail in the end, but
>>> having checked everything else and generated a report. I do not think that
>>> this condition warrants a full stop. ISTM that under rare race conditions
>>> (eg, an unlucky concurrent "drop database" or "drop table") this could
>>> happen when online, although I could not trigger one despite heavy testing,
>>> so I'm possibly mistaken.
>>
>> This seems like a defensible judgement call either way.
>
> Right now we have a few tests that explicitly check that
> pg_verify_checksums fail on broken data ("foo" in the file).  Those
> would then just get skipped AFAICT, which I think is the worse behaviour
> , but if everybody thinks that should be the way to go, we can
> drop/adjust those tests and make pg_verify_checksums skip them.
>
> Thoughts?

My point is that it should fail as it does, only not immediately (early
exit), but after having checked everything else. This means avoiding calling
"exit(1)" here and there (lseek, fopen...), instead taking note that something
bad happened and calling exit only at the end.

--
Fabien.


Re: Online verification of checksums

Andres Freund
Hi,

On 2019-02-05 06:57:06 +0100, Fabien COELHO wrote:

> > > > I'm wondering (possibly again) about the existing early exit if one block
> > > > cannot be read on retry: the command should count this as a kind of bad
> > > > block, proceed on checking other files, and obviously fail in the end, but
> > > > having checked everything else and generated a report. I do not think that
> > > > this condition warrants a full stop. ISTM that under rare race conditions
> > > > (eg, an unlucky concurrent "drop database" or "drop table") this could
> > > > happen when online, although I could not trigger one despite heavy testing,
> > > > so I'm possibly mistaken.
> > >
> > > This seems like a defensible judgement call either way.
> >
> > Right now we have a few tests that explicitly check that
> > pg_verify_checksums fail on broken data ("foo" in the file).  Those
> > would then just get skipped AFAICT, which I think is the worse behaviour
> > , but if everybody thinks that should be the way to go, we can
> > drop/adjust those tests and make pg_verify_checksums skip them.
> >
> > Thoughts?
>
> My point is that it should fail as it does, only not immediately (early
> exit), but after having checked everything else. This mean avoiding calling
> "exit(1)" here and there (lseek, fopen...), but taking note that something
> bad happened, and call exit only in the end.

I can see both as being valuable (one gives you a more complete picture,
the other a quicker answer in scripts). For me that's the point where
it's the prerogative of the author to make that choice.

Greetings,

Andres Freund


Re: Online verification of checksums

Tomas Vondra-4


On 2/5/19 8:01 AM, Andres Freund wrote:

> Hi,
>
> On 2019-02-05 06:57:06 +0100, Fabien COELHO wrote:
>>>>> I'm wondering (possibly again) about the existing early exit if one block
>>>>> cannot be read on retry: the command should count this as a kind of bad
>>>>> block, proceed on checking other files, and obviously fail in the end, but
>>>>> having checked everything else and generated a report. I do not think that
>>>>> this condition warrants a full stop. ISTM that under rare race conditions
>>>>> (eg, an unlucky concurrent "drop database" or "drop table") this could
>>>>> happen when online, although I could not trigger one despite heavy testing,
>>>>> so I'm possibly mistaken.
>>>>
>>>> This seems like a defensible judgement call either way.
>>>
>>> Right now we have a few tests that explicitly check that
>>> pg_verify_checksums fail on broken data ("foo" in the file).  Those
>>> would then just get skipped AFAICT, which I think is the worse behaviour
>>> , but if everybody thinks that should be the way to go, we can
>>> drop/adjust those tests and make pg_verify_checksums skip them.
>>>
>>> Thoughts?
>>
>> My point is that it should fail as it does, only not immediately (early
>> exit), but after having checked everything else. This mean avoiding calling
>> "exit(1)" here and there (lseek, fopen...), but taking note that something
>> bad happened, and call exit only in the end.
>
> I can see both as being valuable (one gives you a more complete picture,
> the other a quicker answer in scripts). For me that's the point where
> it's the prerogative of the author to make that choice.
>

Why not make this configurable, using a command-line option?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online verification of checksums

Michael Banck-2
Hi,

Am Dienstag, den 05.02.2019, 11:30 +0100 schrieb Tomas Vondra:

> On 2/5/19 8:01 AM, Andres Freund wrote:
> > On 2019-02-05 06:57:06 +0100, Fabien COELHO wrote:
> > > > > > I'm wondering (possibly again) about the existing early exit if one block
> > > > > > cannot be read on retry: the command should count this as a kind of bad
> > > > > > block, proceed on checking other files, and obviously fail in the end, but
> > > > > > having checked everything else and generated a report. I do not think that
> > > > > > this condition warrants a full stop. ISTM that under rare race conditions
> > > > > > (eg, an unlucky concurrent "drop database" or "drop table") this could
> > > > > > happen when online, although I could not trigger one despite heavy testing,
> > > > > > so I'm possibly mistaken.
> > > > >
> > > > > This seems like a defensible judgement call either way.
> > > >
> > > > Right now we have a few tests that explicitly check that
> > > > pg_verify_checksums fail on broken data ("foo" in the file).  Those
> > > > would then just get skipped AFAICT, which I think is the worse behaviour
> > > > , but if everybody thinks that should be the way to go, we can
> > > > drop/adjust those tests and make pg_verify_checksums skip them.
> > > >
> > > > Thoughts?
> > >
> > > My point is that it should fail as it does, only not immediately (early
> > > exit), but after having checked everything else. This mean avoiding calling
> > > "exit(1)" here and there (lseek, fopen...), but taking note that something
> > > bad happened, and call exit only in the end.
> >
> > I can see both as being valuable (one gives you a more complete picture,
> > the other a quicker answer in scripts). For me that's the point where
> > it's the prerogative of the author to make that choice.

Personally, I would prefer to keep it as simple as possible for now and
get this patch committed; in my opinion the behaviour is already like
this (early exit on corrupt files) so I don't think the online
verification patch should change this.

If we see complaints about this, then I'd be happy to change it
afterwards.

> Why not make this configurable, using a command-line option?

I like this even less - this tool is about verifying checksums, so
adding options on what to do when it encounters broken pages looks out-
of-scope to me.  Unless we want to say it should generally abort on the
first issue (i.e. on wrong checksums as well).


Michael



Re: Online verification of checksums

Stephen Frost
Greetings,

* Michael Banck ([hidden email]) wrote:

> Am Dienstag, den 05.02.2019, 11:30 +0100 schrieb Tomas Vondra:
> > On 2/5/19 8:01 AM, Andres Freund wrote:
> > > On 2019-02-05 06:57:06 +0100, Fabien COELHO wrote:
> > > > > > > I'm wondering (possibly again) about the existing early exit if one block
> > > > > > > cannot be read on retry: the command should count this as a kind of bad
> > > > > > > block, proceed on checking other files, and obviously fail in the end, but
> > > > > > > having checked everything else and generated a report. I do not think that
> > > > > > > this condition warrants a full stop. ISTM that under rare race conditions
> > > > > > > (eg, an unlucky concurrent "drop database" or "drop table") this could
> > > > > > > happen when online, although I could not trigger one despite heavy testing,
> > > > > > > so I'm possibly mistaken.
> > > > > >
> > > > > > This seems like a defensible judgement call either way.
> > > > >
> > > > > Right now we have a few tests that explicitly check that
> > > > > pg_verify_checksums fail on broken data ("foo" in the file).  Those
> > > > > would then just get skipped AFAICT, which I think is the worse behaviour
> > > > > , but if everybody thinks that should be the way to go, we can
> > > > > drop/adjust those tests and make pg_verify_checksums skip them.
> > > > >
> > > > > Thoughts?
> > > >
> > > > My point is that it should fail as it does, only not immediately (early
> > > > exit), but after having checked everything else. This mean avoiding calling
> > > > "exit(1)" here and there (lseek, fopen...), but taking note that something
> > > > bad happened, and call exit only in the end.
> > >
> > > I can see both as being valuable (one gives you a more complete picture,
> > > the other a quicker answer in scripts). For me that's the point where
> > > it's the prerogative of the author to make that choice.
... unless people here object or prefer other options, and then it's up
to discussion and hopefully some consensus comes out of it.

Also, I have to say that I really don't think the 'quicker answer'
argument holds any weight, making me question if that's a valid
use-case.  If there *isn't* an issue, which we would likely all agree is
the case the vast majority of the time that this is going to be run,
then it's going to take quite a while and anyone calling it should
expect and be prepared for that.  In the extremely rare cases, what does
exiting early actually do for us?

> Personally, I would prefer to keep it as simple as possible for now and
> get this patch committed; in my opinion the behaviour is already like
> this (early exit on corrupt files) so I don't think the online
> verification patch should change this.

I'm also in the camp of "would rather it not exit immediately, so the
extent of the issue is clear".

> If we see complaints about this, then I'd be happy to change it
> afterwards.

I really don't think this is something we should change later on in a
future release..  If the consensus is that there's really two different
but valid use-cases then we should make it configurable, but I'm not
convinced there is.

> > Why not make this configurable, using a command-line option?
>
> I like this even less - this tool is about verifying checksums, so
> adding options on what to do when it encounters broken pages looks out-
> of-scope to me.  Unless we want to say it should generally abort on the
> first issue (i.e. on wrong checksums as well).

I definitely disagree that it's somehow 'out of scope' for this tool to
skip broken pages, when we can tell that they're broken.  There is a
question here about how to handle a short read since that can happen
under normal conditions if we're unlucky.  The same is also true for
files disappearing entirely.

So, let's talk/think through a few cases:

A file with just 'foo\n' in it- could that be a page starting with
an LSN around 666F6F0A that we somehow only read the first few bytes of?
If not, why not?  I could possibly see an argument that we expect to
always get at least 512 bytes in a read, or 4K, but it seems like we
could possibly run into edge cases on odd filesystems or such.  In the
end, I'm leaning towards categorizing different things, well,
differently- a short read would be reported as a NOTICE or equivalent,
perhaps, meaning that the test case needs to do something more than just
have a file with 'foo' in it, but that is likely a good thing anyway-
the test cases would be better if they were closer to the real world.  Other
read failures would be reported in a more serious category assuming they
are "this really shouldn't happen" cases.  A file disappearing isn't a
"can't happen" case, and might be reported at the same 'NOTICE' level
(or maybe with a 'verbose' option).

A file that's 8k in size and has a checksum but it's not right seems
pretty clear to me.  Might as well include a count of pages which have a
valid checksum, I would think, though perhaps only in a 'verbose' mode
would that get reported.

A completely zero'd page could also be reported at a NOTICE level or
with a count, or perhaps only with verbose.
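
As a purely hypothetical sketch of how those categories might be expressed
(none of these names are from the patch):

/*
 * Sketch only, hypothetical categorization of the cases discussed above.
 */
#include <stdbool.h>

typedef enum
{
    CASE_SHORT_READ,        /* got fewer than a full page back */
    CASE_FILE_VANISHED,     /* file disappeared while scanning */
    CASE_NEW_PAGE,          /* completely zeroed page */
    CASE_BAD_CHECKSUM,      /* full page read, checksum mismatch */
    CASE_READ_FAILURE       /* "this really shouldn't happen" I/O errors */
} verify_case;

/*
 * NOTICE-level cases would only be printed with --verbose (but still
 * counted); the last two would always count towards a non-zero exit status.
 */
static bool
is_notice(verify_case c)
{
    return c == CASE_SHORT_READ || c == CASE_FILE_VANISHED || c == CASE_NEW_PAGE;
}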

Other thoughts about use-cases and what should happen..?

Thanks!

Stephen


Re: Online verification of checksums

Michael Banck-2
Hi,

Am Mittwoch, den 06.02.2019, 11:39 -0500 schrieb Stephen Frost:

> * Michael Banck ([hidden email]) wrote:
> > Am Dienstag, den 05.02.2019, 11:30 +0100 schrieb Tomas Vondra:
> > > On 2/5/19 8:01 AM, Andres Freund wrote:
> > > > On 2019-02-05 06:57:06 +0100, Fabien COELHO wrote:
> > > > > > > > I'm wondering (possibly again) about the existing early exit if one block
> > > > > > > > cannot be read on retry: the command should count this as a kind of bad
> > > > > > > > block, proceed on checking other files, and obviously fail in the end, but
> > > > > > > > having checked everything else and generated a report. I do not think that
> > > > > > > > this condition warrants a full stop. ISTM that under rare race conditions
> > > > > > > > (eg, an unlucky concurrent "drop database" or "drop table") this could
> > > > > > > > happen when online, although I could not trigger one despite heavy testing,
> > > > > > > > so I'm possibly mistaken.
> > > > > > >
> > > > > > > This seems like a defensible judgement call either way.
> > > > > >
> > > > > > Right now we have a few tests that explicitly check that
> > > > > > pg_verify_checksums fail on broken data ("foo" in the file).  Those
> > > > > > would then just get skipped AFAICT, which I think is the worse behaviour
> > > > > > , but if everybody thinks that should be the way to go, we can
> > > > > > drop/adjust those tests and make pg_verify_checksums skip them.
> > > > > >
> > > > > > Thoughts?
> > > > >
> > > > > My point is that it should fail as it does, only not immediately (early
> > > > > exit), but after having checked everything else. This mean avoiding calling
> > > > > "exit(1)" here and there (lseek, fopen...), but taking note that something
> > > > > bad happened, and call exit only in the end.
> > > >
> > > > I can see both as being valuable (one gives you a more complete picture,
> > > > the other a quicker answer in scripts). For me that's the point where
> > > > it's the prerogative of the author to make that choice.
>
> ... unless people here object or prefer other options, and then it's up
> to discussion and hopefully some consensus comes out of it.
>
> Also, I have to say that I really don't think the 'quicker answer'
> argument holds any weight, making me question if that's a valid
> use-case.  If there *isn't* an issue, which we would likely all agree is
> the case the vast majority of the time that this is going to be run,
> then it's going to take quite a while and anyone calling it should
> expect and be prepared for that.  In the extremely rare cases, what does
> exiting early actually do for us?
>
> > Personally, I would prefer to keep it as simple as possible for now and
> > get this patch committed; in my opinion the behaviour is already like
> > this (early exit on corrupt files) so I don't think the online
> > verification patch should change this.
>
> I'm also in the camp of "would rather it not exit immediately, so the
> extent of the issue is clear".
>
> > If we see complaints about this, then I'd be happy to change it
> > afterwards.
>
> I really don't think this is something we should change later on in a
> future release..  If the consensus is that there's really two different
> but valid use-cases then we should make it configurable, but I'm not
> convinced there is.
OK, fair enough.

> > > Why not make this configurable, using a command-line option?
> >
> > I like this even less - this tool is about verifying checksums, so
> > adding options on what to do when it encounters broken pages looks out-
> > of-scope to me.  Unless we want to say it should generally abort on the
> > first issue (i.e. on wrong checksums as well).
>
> I definitely disagree that it's somehow 'out of scope' for this tool to
> skip broken pages, when we can tell that they're broken.  

I didn't mean that it's out of scope for pg_verify_checksums; I meant it is
out of scope for this patch, which adds online checking.

> There is a question here about how to handle a short read since that
> can happen under normal conditions if we're unlucky.  The same is also
> true for files disappearing entirely.
>
> So, let's talk/think through a few cases:
>
> A file with just 'foo\n' in it- could that be a page starting with
> an LSN around 666F6F0A that we somehow only read the first few bytes of?
> If not, why not?  I could possibly see an argument that we expect to
> always get at least 512 bytes in a read, or 4K, but it seems like we
> could possibly run into edge cases on odd filesystems or such.  In the
> end, I'm leaning towards categorizing different things, well,
> differently- a short read would be reported as a NOTICE or equivilant,
> perhaps, meaning that the test case needs to do something more than just
> have a file with 'foo' in it, but that is likely a good things anyway-
> the test cases would be better if they were closer to real world.  Other
> read failures would be reported in a more serious category assuming they
> are "this really shouldn't happen" cases.  A file disappearing isn't a
> "can't happen" case, and might be reported at the same 'NOTICE' level
> (or maybe with a 'verbose' ption).

In the context of this patch, we should also discern whether a particular
case is merely a notice (or warning) on an offline cluster as well; I guess
you think it should be?

So I've changed it such that a short read emits a "warning" message,
increments a new skippedfiles variable (as it is not just a skipped block)
and reports its count at the end - should it then exit with a non-zero
status even if there were no wrong checksums?

> A file that's 8k in size and has a checksum but it's not right seems
> pretty clear to me.  Might as well include a count of pages which have a
> valid checksum, I would think, though perhaps only in a 'verbose' mode
> would that get reported.

What's the use for that? It already reports the number of scanned blocks
at the end, so that number is pretty easy to figure out from it and the
number of bad checksums.
 
> A completely zero'd page could also be reported at a NOTICE level or
> with a count, or perhaps only with verbose.

It is counted as a skipped block right now (well, every block that
qualifies for PageIsNew() is), but skipped blocks are not mentioned right
now. I guess the rationale is that it might lead to excessive screen
output (but then, verbose originally logged /every/ block), but you'd
have to check with the original authors.

So I have now changed the behaviour so that short reads count as skipped
files and pg_verify_checksums no longer bails out on them. When this
occurs, a warning is written to stderr and their overall count is also
reported at the end. However, unless there are other blocks with bad
checksums, the exit status is kept at zero.
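
Roughly, the new short-read path behaves like this (a sketch, not the exact
patch code; fd, fn, progname and the counters come from the surrounding
per-file loop):

/*
 * Sketch of the new short-read handling described above, not the exact
 * patch code; fd, fn, progname and the counters are assumed to exist in
 * the surrounding per-file loop.
 */
r = read(fd, buf, BLCKSZ);
if (r > 0 && r < BLCKSZ)
{
    fprintf(stderr, "%s: warning: could only read %d of %d bytes in file \"%s\", skipping file\n",
            progname, (int) r, BLCKSZ, fn);
    skippedfiles++;         /* counted and reported in the final summary */
    break;                  /* give up on this file, continue with the next */
}

/* ...and at the end of main(), after the summary (which now shows skippedfiles): */
if (badblocks > 0)
    exit(1);                /* skipped files alone keep the exit status at 0 */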

New patch attached.


Michael


[Attachment: online-verification-of-checksums_V10.patch (12K)]

Re: Online verification of checksums

Fabien COELHO-3

Hallo Michael,

> So I have now changed the behaviour so that short reads count as skipped
> files and pg_verify_checksums no longer bails out on them. When this
> occurs, a warning is written to stderr and their overall count is also
> reported at the end. However, unless there are other blocks with bad
> checksums, the exit status is kept at zero.

This seems fair when online; however, I'm wondering whether it is when
offline. I'd say that the whole retry logic should be skipped in this
case? i.e. "if (block_retry || !online) { error message and continue }"
on both short read & checksum failure retries.
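
Concretely, something along these lines at both retry sites (a sketch only;
"block_retry" follows the pseudo-code above, while "online" is an assumed
flag set once at startup):

/*
 * Sketch only: the condition suggested above, applied inside the
 * per-block loop.  "block_retry" is from the pseudo-code above;
 * "online" is an assumed flag set once at startup.
 */
if (r != BLCKSZ)                /* short read; same idea for a checksum failure */
{
    if (block_retry || !online)
    {
        /* Second failure, or offline: report it and move on, no (further) retry. */
        fprintf(stderr, "%s: could not read block %u in file \"%s\"\n",
                progname, blockno, fn);
        skippedblocks++;
        block_retry = false;
        continue;
    }

    /* Online and first failure for this block: re-read the same block once. */
    block_retry = true;
    continue;                   /* loop again without advancing blockno */
}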

> New patch attached.

Patch applies cleanly, compiles, global & local make check ok.

I'm wondering whether it should exit(1) on "lseek" failures. Would it make
sense to skip the file and report it as such? Should it be counted as a
skippedfile?

WRT the final status, ISTM that skippedblocks & files could warrant an
error when offline, although they might be ok when online?

--
Fabien.
