Online verification of checksums


Re: Online verification of checksums

Michael Banck-2
Hi,

On Thursday, 28.02.2019 at 14:29 +0100, Fabien COELHO wrote:

> > So I have now changed behaviour so that short writes count as skipped
> > files and pg_verify_checksums no longer bails out on them. When this
> > occurs, a warning is written to stderr and their overall count is also
> > reported at the end. However, unless there are other blocks with bad
> > checksums, the exit status is kept at zero.
>
> This seems fair when online, however I'm wondering whether it is when
> offline. I'd say that the whole retry logic should be skipped in this
> case? i.e. "if (block_retry || !online) { error message and continue }"
> on both short read & checksum failure retries.
Ok, the stand-alone pg_checksums program also got a PR about the LSN
skip logic not being helpful when the instance is offline and somebody
just writes /dev/urandom over the heap files: 

https://github.com/credativ/pg_checksums/pull/6

So I now tried to change the patch so that it only retries blocks when
online.
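
A minimal sketch of that guard (illustrative names only, not the actual patch): a failing block is re-read once, and only when the cluster is online; offline, every short read or checksum mismatch is reported immediately, as suggested above.

#include <stdbool.h>

typedef enum { BLOCK_OK, BLOCK_RETRY, BLOCK_FAILED } BlockAction;

static BlockAction
on_verification_failure(bool online, bool *block_retry)
{
    /* Offline, or already retried once: report the failure and move on. */
    if (*block_retry || !online)
        return BLOCK_FAILED;

    /* Online and first failure for this block: re-read it once. */
    *block_retry = true;
    return BLOCK_RETRY;
}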

> Patch applies cleanly, compiles, global & local make check ok.
>
> I'm wondering whether it should exit(1) on "lseek" failures. Would it make
> sense to skip the file and report it as such? Should it be counted as a
> skipped file?

Ok, I think it makes sense to march on and I changed it that way.

> WRT the final status, ISTM that skipped blocks & files could warrant an
> error when offline, although they might be ok when online?

Ok, also changed it that way.

New patch attached.


Michael


Attachment: online-verification-of-checksums_V11.patch (13K)

Re: Online verification of checksums

Robert Haas
In reply to this post by Michael Banck-2
On Tue, Sep 18, 2018 at 10:37 AM Michael Banck
<[hidden email]> wrote:
> I have added a retry for this as well now, without a pg_sleep() as well.
> This catches around 80% of the half-reads, but a few slip through. At
> that point we bail out with exit(1), and the user can try again, which I
> think is fine?

Maybe I'm confused here, but catching 80% of torn pages doesn't sound
robust at all.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Online verification of checksums

Michael Banck-2
Hi,

On Friday, 01.03.2019 at 18:03 -0500, Robert Haas wrote:
> On Tue, Sep 18, 2018 at 10:37 AM Michael Banck
> <[hidden email]> wrote:
> > I have added a retry for this as well now, without a pg_sleep() as well.
> > This catches around 80% of the half-reads, but a few slip through. At
> > that point we bail out with exit(1), and the user can try again, which I
> > think is fine?
>
> Maybe I'm confused here, but catching 80% of torn pages doesn't sound
> robust at all.

The chance that pg_verify_checksums hits a torn page (at least in my
tests, see below) is already pretty low, a couple of times per 1000
runs. Maybe 4 out of 5 times, the page is read fine on retry and we march
on. Otherwise, we now just issue a warning and skip the file (or so was
the idea, see below), do you think that is not acceptable?

I re-ran the tests (concurrent createdb/pgbench -i -s 50/dropdb and
pg_verify_checksums in tight loops) with the current patch version, and
I am seeing short reads very, very rarely (maybe every 1000th run) with
a warning like:

|1174
|pg_verify_checksums: warning: could not read block 374 in file "data/base/18032/18045": read 4096 of 8192
|pg_verify_checksums: warning: could not read block 375 in file "data/base/18032/18045": read 4096 of 8192
|Files skipped: 2

The 1174 is the sequence number, the first 1173 runs of
pg_verify_checksums only skipped blocks.

However, the fact it shows two warnings for the same file means there is
something wrong here. It was continuing to the next block while I think
it should just skip to the next file on read failures. So I have changed
that now, new patch attached.


Michael


Attachment: online-verification-of-checksums_V12.patch (13K)

Re: Online verification of checksums

Stephen Frost
Greetings,

* Michael Banck ([hidden email]) wrote:

> On Friday, 01.03.2019 at 18:03 -0500, Robert Haas wrote:
> > On Tue, Sep 18, 2018 at 10:37 AM Michael Banck
> > <[hidden email]> wrote:
> > > I have added a retry for this as well now, without a pg_sleep() as well.
> > > This catches around 80% of the half-reads, but a few slip through. At
> > > that point we bail out with exit(1), and the user can try again, which I
> > > think is fine?
> >
> > Maybe I'm confused here, but catching 80% of torn pages doesn't sound
> > robust at all.
>
> The chance that pg_verify_checksums hits a torn page (at least in my
> tests, see below) is already pretty low, a couple of times per 1000
> runs. Maybe 4 out of 5 times, the page is read fine on retry and we march
> on. Otherwise, we now just issue a warning and skip the file (or so was
> the idea, see below), do you think that is not acceptable?
>
> I re-ran the tests (concurrent createdb/pgbench -i -s 50/dropdb and
> pg_verify_checksums in tight loops) with the current patch version, and
> I am seeing short reads very, very rarely (maybe every 1000th run) with
> a warning like:
>
> |1174
> |pg_verify_checksums: warning: could not read block 374 in file "data/base/18032/18045": read 4096 of 8192
> |pg_verify_checksums: warning: could not read block 375 in file "data/base/18032/18045": read 4096 of 8192
> |Files skipped: 2
>
> The 1174 is the sequence number, the first 1173 runs of
> pg_verify_checksums only skipped blocks.
>
> However, the fact it shows two warnings for the same file means there is
> something wrong here. It was continuing to the next block while I think
> it should just skip to the next file on read failures. So I have changed
> that now, new patch attached.
I'm confused - if previously it was continuing to the next block instead
of doing the re-read on the same block, why don't we just change it to
do the re-read on the same block properly and see if that fixes the
retry, instead of just giving up and skipping..?  I'm not necessarily
against skipping to the next file, to be clear, but I think I'd be
happier if we kept reading the file until we actually get EOF.

(I've not looked at the actual patch, just read what you wrote..)

Thanks!

Stephen


Re: Online verification of checksums

Tomas Vondra-4
In reply to this post by Robert Haas
On 3/2/19 12:03 AM, Robert Haas wrote:

> On Tue, Sep 18, 2018 at 10:37 AM Michael Banck
> <[hidden email]> wrote:
>> I have added a retry for this as well now, without a pg_sleep() as well.
>> This catches around 80% of the half-reads, but a few slip through. At
>> that point we bail out with exit(1), and the user can try again, which I
>> think is fine?
>
> Maybe I'm confused here, but catching 80% of torn pages doesn't sound
> robust at all.
>

FWIW I don't think this qualifies as a torn page - i.e. it's not a full
read with a mix of old and new data. This is a partial write, most likely
because we read the blocks one by one, and when we hit the last page
while the table is being extended, we may only see the first 4kB. And if
we retry very fast, we may still see only the first 4kB.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online verification of checksums

Tomas Vondra-4
In reply to this post by Stephen Frost


On 3/2/19 5:08 PM, Stephen Frost wrote:

> Greetings,
>
> * Michael Banck ([hidden email]) wrote:
>> On Friday, 01.03.2019 at 18:03 -0500, Robert Haas wrote:
>>> On Tue, Sep 18, 2018 at 10:37 AM Michael Banck
>>> <[hidden email]> wrote:
>>>> I have added a retry for this as well now, without a pg_sleep() as well.
>>>> This catches around 80% of the half-reads, but a few slip through. At
>>>> that point we bail out with exit(1), and the user can try again, which I
>>>> think is fine?
>>>
>>> Maybe I'm confused here, but catching 80% of torn pages doesn't sound
>>> robust at all.
>>
>> The chance that pg_verify_checksums hits a torn page (at least in my
>> tests, see below) is already pretty low, a couple of times per 1000
>> runs. Maybe 4 out of 5 times, the page is read fine on retry and we march
>> on. Otherwise, we now just issue a warning and skip the file (or so was
>> the idea, see below), do you think that is not acceptable?
>>
>> I re-ran the tests (concurrent createdb/pgbench -i -s 50/dropdb and
>> pg_verify_checksums in tight loops) with the current patch version, and
>> I am seeing short reads very, very rarely (maybe every 1000th run) with
>> a warning like:
>>
>> |1174
>> |pg_verify_checksums: warning: could not read block 374 in file "data/base/18032/18045": read 4096 of 8192
>> |pg_verify_checksums: warning: could not read block 375 in file "data/base/18032/18045": read 4096 of 8192
>> |Files skipped: 2
>>
>> The 1174 is the sequence number, the first 1173 runs of
>> pg_verify_checksums only skipped blocks.
>>
>> However, the fact it shows two warnings for the same file means there is
>> something wrong here. It was continuing to the next block while I think
>> it should just skip to the next file on read failures. So I have changed
>> that now, new patch attached.
>
> I'm confused - if previously it was continuing to the next block instead
> of doing the re-read on the same block, why don't we just change it to
> do the re-read on the same block properly and see if that fixes the
> retry, instead of just giving up and skipping..?  I'm not necessarily
> against skipping to the next file, to be clear, but I think I'd be
> happier if we kept reading the file until we actually get EOF.
>
> (I've not looked at the actual patch, just read what you wrote..)
>

Notice that those two errors are actually for two consecutive blocks in
the same file. So what probably happened is that postgres started to
extend the page, and the verification tried to read the last page after
the kernel added just the first 4kB filesystem page. Then it probably
succeeded on a retry, and then the same thing happened on the next page.

I don't think EOF addresses this, though - the partial read happens
before we actually reach the end of the file.

And re-reads are not a solution either, because the second read may
still see only the first half, and then what - is it a permanent issue
(in which case it's data corruption), or an extension in progress?

I wonder if we can simply ignore those errors entirely, if it's the last
page in the segment? We can't really check the file is "complete"
anyway, e.g. if you have multiple segments for a table, and the "middle"
one is a page shorter, we'll happily ignore that during verification.

Also, what if we're reading a file and it gets truncated (e.g. after
vacuum notices the last few pages are empty)? Doesn't that have the same
issue?
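
To make the "last page of the segment" idea concrete, a small sketch (a hypothetical helper, not something from the patch) that only excuses a short read when it falls on the current last page of the file:

#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

#define BLCKSZ 8192

static bool
short_read_is_last_page(int fd, long blockno)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return false;

    /* True when the block being read starts at or beyond the last full block. */
    return (off_t) blockno * BLCKSZ >= (st.st_size / BLCKSZ) * (off_t) BLCKSZ;
}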

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online verification of checksums

Andres Freund
Hi,


On 2019-03-02 22:49:33 +0100, Tomas Vondra wrote:

>
>
> On 3/2/19 5:08 PM, Stephen Frost wrote:
> > Greetings,
> >
> > * Michael Banck ([hidden email]) wrote:
> >> On Friday, 01.03.2019 at 18:03 -0500, Robert Haas wrote:
> >>> On Tue, Sep 18, 2018 at 10:37 AM Michael Banck
> >>> <[hidden email]> wrote:
> >>>> I have added a retry for this as well now, without a pg_sleep() as well.
> >>>> This catches around 80% of the half-reads, but a few slip through. At
> >>>> that point we bail out with exit(1), and the user can try again, which I
> >>>> think is fine?
> >>>
> >>> Maybe I'm confused here, but catching 80% of torn pages doesn't sound
> >>> robust at all.
> >>
> >> The chance that pg_verify_checksums hits a torn page (at least in my
> >> tests, see below) is already pretty low, a couple of times per 1000
> >> runs. Maybe 4 out of 5 times, the page is read fine on retry and we march
> >> on. Otherwise, we now just issue a warning and skip the file (or so was
> >> the idea, see below), do you think that is not acceptable?
> >>
> >> I re-ran the tests (concurrent createdb/pgbench -i -s 50/dropdb and
> >> pg_verify_checksums in tight loops) with the current patch version, and
> >> I am seeing short reads very, very rarely (maybe every 1000th run) with
> >> a warning like:
> >>
> >> |1174
> >> |pg_verify_checksums: warning: could not read block 374 in file "data/base/18032/18045": read 4096 of 8192
> >> |pg_verify_checksums: warning: could not read block 375 in file "data/base/18032/18045": read 4096 of 8192
> >> |Files skipped: 2
> >>
> >> The 1174 is the sequence number, the first 1173 runs of
> >> pg_verify_checksums only skipped blocks.
> >>
> >> However, the fact it shows two warnings for the same file means there is
> >> something wrong here. It was continuing to the next block while I think
> >> it should just skip to the next file on read failures. So I have changed
> >> that now, new patch attached.
> >
> > I'm confused - if previously it was continuing to the next block instead
> > of doing the re-read on the same block, why don't we just change it to
> > do the re-read on the same block properly and see if that fixes the
> > retry, instead of just giving up and skipping..?  I'm not necessarily
> > against skipping to the next file, to be clear, but I think I'd be
> > happier if we kept reading the file until we actually get EOF.
> >
> > (I've not looked at the actual patch, just read what you wrote..)
> >
>
> Notice that those two errors are actually for two consecutive blocks in
> the same file. So what probably happened is that postgres started to
> extend the page, and the verification tried to read the last page after
> the kernel added just the first 4kB filesystem page. Then it probably
> succeeded on a retry, and then the same thing happened on the next page.
>
> I don't think EOF addresses this, though - the partial read happens
> before we actually reach the end of the file.
>
> And re-reads are not a solution either, because the second read may
> still see only the first half, and then what - is it a permanent issue
> (in which case it's a data corruption), or an extension in progress?
>
> I wonder if we can simply ignore those errors entirely, if it's the last
> page in the segment? We can't really check the file is "complete"
> anyway, e.g. if you have multiple segments for a table, and the "middle"
> one is a page shorter, we'll happily ignore that during verification.
>
> Also, what if we're reading a file and it gets truncated (e.g. after
> vacuum notices the last few pages are empty)? Doesn't that have the same
> issue?

I gotta say, my conclusion from this debate is that it's simply a
mistake to do this without involvement of the server that can use
locking to prevent these kind of issues.  It seems pretty absurd to me
to have hacky workarounds around partial writes of a live server, around
truncation, etc, even though the server has ways to deal with that.

- Andres


Re: Online verification of checksums

Michael Paquier-2
On Sat, Mar 02, 2019 at 02:00:31PM -0800, Andres Freund wrote:
> I gotta say, my conclusion from this debate is that it's simply a
> mistake to do this without involvement of the server that can use
> locking to prevent these kind of issues.  It seems pretty absurd to me
> to have hacky workarounds around partial writes of a live server, around
> truncation, etc, even though the server has ways to deal with that.

I agree with Andres on this one.  We are never going to make this
stuff safe if we don't handle page reads with the proper locks because
of torn pages.  What I think we should do is provide a SQL function
which reads a page in shared mode, and then checks its checksum if its
LSN is older than the previous redo point.  This discards cases with
rather hot pages, but if the page is hot enough then the backend
re-reading the page would just do the same by verifying the page
checksum by itself.
--
Michael


Re: Online verification of checksums

Tomas Vondra-4
On 3/3/19 12:48 AM, Michael Paquier wrote:

> On Sat, Mar 02, 2019 at 02:00:31PM -0800, Andres Freund wrote:
>> I gotta say, my conclusion from this debate is that it's simply a
>> mistake to do this without involvement of the server that can use
>> locking to prevent these kind of issues.  It seems pretty absurd to me
>> to have hacky workarounds around partial writes of a live server, around
>> truncation, etc, even though the server has ways to deal with that.
>
> I agree with Andres on this one.  We are never going to make this
> stuff safe if we don't handle page reads with the proper locks because
> of torn pages.  What I think we should do is provide a SQL function
> which reads a page in shared mode, and then checks its checksum if its
> LSN is older than the previous redo point.  This discards cases with
> rather hot pages, but if the page is hot enough then the backend
> re-reading the page would just do the same by verifying the page
> checksum by itself.

Handling torn pages is not difficult, and the patch already does that
(it reads the LSN of the last checkpoint from the control file, and uses
it the same way basebackup does). That's working since (at least)
September, so I don't see how the SQL function would help with this?
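
For reference, a rough sketch of that LSN-based skip, the same trick basebackup uses (types simplified, validation and endianness concerns glossed over): the checksum is only verified for pages whose LSN is not newer than the checkpoint LSN read from the control file.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;

/* The page header starts with the page LSN, stored as two 32-bit halves. */
typedef struct PageLSN
{
    uint32_t xlogid;    /* high 32 bits */
    uint32_t xrecoff;   /* low 32 bits */
} PageLSN;

static bool
page_needs_checksum_check(const char *page, XLogRecPtr checkpoint_lsn)
{
    PageLSN     lsn;
    XLogRecPtr  page_lsn;

    memcpy(&lsn, page, sizeof(lsn));
    page_lsn = ((XLogRecPtr) lsn.xlogid << 32) | lsn.xrecoff;

    /* Pages touched after the last checkpoint may legitimately be torn. */
    return page_lsn <= checkpoint_lsn;
}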

The other issue (raised recently) is partial reads, where we read only a
fraction of the page. Basebackup simply ignores such pages, likely on
the assumption that it's either concurrent extension or truncation (in
which case it's newer than the last checkpoint LSN anyway). So maybe we
should do the same thing here. As I mentioned before, we can't reliably
detect incomplete segments anyway (at least I believe that's the case).

You and Andres may be right that trying to verify checksums online
without close interaction with the server is ultimately futile (or at
least overly complex). But I'm not sure those issues (torn pages and
partial reads) are very good arguments, considering basebackup has to
deal with them too. Not sure.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online verification of checksums

Fabien COELHO-3
In reply to this post by Michael Paquier-2

Bonjour Michaël,

>> I gotta say, my conclusion from this debate is that it's simply a
>> mistake to do this without involvement of the server that can use
>> locking to prevent these kind of issues.  It seems pretty absurd to me
>> to have hacky workarounds around partial writes of a live server, around
>> truncation, etc, even though the server has ways to deal with that.
>
> I agree with Andres on this one.  We are never going to make this stuff
> safe if we don't handle page reads with the proper locks because of torn
> pages. What I think we should do is provide a SQL function which reads a
> page in shared mode, and then checks its checksum if its LSN is older
> than the previous redo point.  This discards cases with rather hot
> pages, but if the page is hot enough then the backend re-reading the
> page would just do the same by verifying the page checksum by itself. --
> Michael
My 0.02€ about that, as one of the reviewers of the patch:

I agree that having a server function (extension?) to do a full checksum
verification, possibly bandwidth-controlled, would be a good thing.
However it would have side effects, such as interfering deeply with the
server page cache, which may or may not be desirable.

On the other hand I also see value in an independent system-level external
tool capable of a best effort checksum verification: the current check
that the cluster is offline to prevent pg_verify_checksum from running is
kind of artificial, and when online simply counting
online-database-related checksum issues looks like a reasonable
compromise.

So basically I think that allowing pg_verify_checksum to run on an online
cluster is still a good thing, provided that expected errors are correctly
handled.

--
Fabien.

Re: Online verification of checksums

Michael Banck-2
In reply to this post by Stephen Frost
Hi,

On Saturday, 02.03.2019 at 11:08 -0500, Stephen Frost wrote:

> * Michael Banck ([hidden email]) wrote:
> > On Friday, 01.03.2019 at 18:03 -0500, Robert Haas wrote:
> > > On Tue, Sep 18, 2018 at 10:37 AM Michael Banck
> > > <[hidden email]> wrote:
> > > > I have added a retry for this as well now, without a pg_sleep() as well.
> > > > This catches around 80% of the half-reads, but a few slip through. At
> > > > that point we bail out with exit(1), and the user can try again, which I
> > > > think is fine?
> > >
> > > Maybe I'm confused here, but catching 80% of torn pages doesn't sound
> > > robust at all.
> >
> > The chance that pg_verify_checksums hits a torn page (at least in my
> > tests, see below) is already pretty low, a couple of times per 1000
> > runs. Maybe 4 out of 5 times, the page is read fine on retry and we march
> > on. Otherwise, we now just issue a warning and skip the file (or so was
> > the idea, see below), do you think that is not acceptable?
> >
> > I re-ran the tests (concurrent createdb/pgbench -i -s 50/dropdb and
> > pg_verify_checksums in tight loops) with the current patch version, and
> > I am seeing short reads very, very rarely (maybe every 1000th run) with
> > a warning like:
> >
> > > 1174
> > > pg_verify_checksums: warning: could not read block 374 in file "data/base/18032/18045": read 4096 of 8192
> > > pg_verify_checksums: warning: could not read block 375 in file "data/base/18032/18045": read 4096 of 8192
> > > Files skipped: 2
> >
> > The 1174 is the sequence number, the first 1173 runs of
> > pg_verify_checksums only skipped blocks.
> >
> > However, the fact it shows two warnings for the same file means there is
> > something wrong here. It was continuing to the next block while I think
> > it should just skip to the next file on read failures. So I have changed
> > that now, new patch attached.
>
> I'm confused - if previously it was continuing to the next block instead
> of doing the re-read on the same block, why don't we just change it to
> do the re-read on the same block properly and see if that fixes the
> retry, instead of just giving up and skipping..?

It was re-reading the block and continuing to read the file after it
got a short read even on re-read.

> I'm not necessarily against skipping to the next file, to be clear,
> but I think I'd be happier if we kept reading the file until we
> actually get EOF.

So if we read half a block twice we should seek() to the next block and
continue till EOF, ok. I think in most cases those pages will be new
anyway and there will be no checksum check, but it sounds like a cleaner
approach. I've seen one or two examples where we did successfully verify
the checksum of a page after a half-read, so it might be worth it.
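
A sketch of how that loop might look (hypothetical names, checksum verification elided): after a short read persists on the retry, only that block is skipped, the scan seeks to the next block boundary, and reading continues until EOF:

#include <stdio.h>
#include <unistd.h>

#define BLCKSZ 8192

static long skippedblocks;

static void
scan_file_until_eof(int fd, const char *path)
{
    char    buf[BLCKSZ];
    long    blockno = 0;
    int     retried = 0;

    for (;;)
    {
        ssize_t r = read(fd, buf, BLCKSZ);

        if (r == 0)
            break;                              /* real EOF */

        if (r == BLCKSZ)
        {
            /* ... verify this block's checksum here ... */
            retried = 0;
            blockno++;
            continue;
        }

        if (!retried && r > 0)
        {
            retried = 1;
            lseek(fd, (off_t) -r, SEEK_CUR);    /* retry the same block once */
            continue;
        }

        /* Still short (or a read error): skip this block, not the whole file. */
        fprintf(stderr, "warning: skipping block %ld in file \"%s\" after short read\n",
                blockno, path);
        skippedblocks++;
        lseek(fd, (off_t) (blockno + 1) * BLCKSZ, SEEK_SET);
        retried = 0;
        blockno++;
    }
}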

The alternative would be to just bail out early and skip the file on the
first short read and (possibly) log a skipped file.

I still think that an external checksum verification tool has some
merit, given that basebackup does it and the current offline requirement
is really not useful in practice.


Michael



Re: Online verification of checksums

Michael Paquier-2
In reply to this post by Tomas Vondra-4
On Sun, Mar 03, 2019 at 03:12:51AM +0100, Tomas Vondra wrote:
> You and Andres may be right that trying to verify checksums online
> without close interaction with the server is ultimately futile (or at
> least overly complex). But I'm not sure those issues (torn pages and
> partial reads) are very good arguments, considering basebackup has to
> deal with them too. Not sure.

FWIW, I don't think that the backend is right in its way of checking
checksums the way it does currently either with warnings and a limited
set of failures generated.  I raised concerns about that unfortunately
after 11 has been GA'ed, which was too late, so this time, for this
patch, I prefer raising them before the fact and I'd rather not spread
this kind of methodology around the core code more and more.  I work a
lot with virtualization, and I have seen ESX leave I/O requests hanging
from time to time depending on the environment used (which is
actually wrong, anyway, but a lot of tests happen on a daily basis on
the stuff I work on).  What's presented on this thread is *never*
going to be 100% safe, and would generate false positives which can be
confusing for the user.  This is not a good sign.
--
Michael


Re: Online verification of checksums

Michael Paquier-2
In reply to this post by Michael Banck-2
On Sun, Mar 03, 2019 at 11:51:48AM +0100, Michael Banck wrote:
> I still think that an external checksum verification tool has some
> merit, given that basebackup does it and the current offline requirement
> is really not useful in practice.

I am not going to argue again about the way checksum verification is
done in a base backup..  :)

Being able to do an online verification of checksums has a lot of
value, don't get me wrong, and an SQL interface to do that does not
prevent having a frontend wrapper using it.
--
Michael


Re: Online verification of checksums

Michael Paquier-2
In reply to this post by Fabien COELHO-3
On Sun, Mar 03, 2019 at 07:58:26AM +0100, Fabien COELHO wrote:
> I agree that having a server function (extension?) to do a full checksum
> verification, possibly bandwidth-controlled, would be a good thing. However
> it would have side effects, such as interfering deeply with the server page
> cache, which may or may not be desirable.

In what way is that different from VACUUM or a sequential scan?  It is
possible to use buffer ring replacement strategies in such cases using
the normal clock-sweep algorithm, so that scanning a range of pages
does not really impact Postgres shared buffer cache.
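
A backend-side sketch of that point (assuming a hypothetical server function scanning a relation; simplified, not a complete patch): reading the pages through a BAS_BULKREAD strategy keeps the scan inside a small ring of buffers, much like a large sequential scan or VACUUM, so it does not evict the rest of shared buffers.

#include "postgres.h"

#include "storage/bufmgr.h"
#include "utils/rel.h"

static void
scan_relation_with_ring(Relation rel)
{
    BufferAccessStrategy strategy = GetAccessStrategy(BAS_BULKREAD);
    BlockNumber nblocks = RelationGetNumberOfBlocks(rel);
    BlockNumber blkno;

    for (blkno = 0; blkno < nblocks; blkno++)
    {
        /* Each read recycles a buffer from the strategy's small ring. */
        Buffer  buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno,
                                         RBM_NORMAL, strategy);

        /* ... inspect or verify the page here ... */

        ReleaseBuffer(buf);
    }

    FreeAccessStrategy(strategy);
}
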
--
Michael


Re: Online verification of checksums

Fabien COELHO-3

Bonjour Michaël,

>> I agree that having a server function (extension?) to do a full checksum
>> verification, possibly bandwidth-controlled, would be a good thing. However
>> it would have side effects, such as interfering deeply with the server page
>> cache, which may or may not be desirable.
>
> In what way is that different from VACUUM or a sequential scan?

Scrubbing would read all files, not only relation data? I'm unsure about
what VACUUM does, but it is probably pretty similar.

> It is possible to use buffer ring replacement strategies in such cases
> using the normal clock-sweep algorithm, so that scanning a range of
> pages does not really impact Postgres shared buffer cache.

Good! I did not know that there was an existing strategy to avoid filling
the cache.

--
Fabien.

Re: Online verification of checksums

Magnus Hagander-2
In reply to this post by Michael Paquier-2
On Mon, Mar 4, 2019, 04:10 Michael Paquier <[hidden email]> wrote:
On Sun, Mar 03, 2019 at 07:58:26AM +0100, Fabien COELHO wrote:
> I agree that having a server function (extension?) to do a full checksum
> verification, possibly bandwidth-controlled, would be a good thing. However
> it would have side effects, such as interfering deeply with the server page
> cache, which may or may not be desirable.

In what way is that different from VACUUM or a sequential scan?  It is
possible to use buffer ring replacement strategies in such cases using
the normal clock-sweep algorithm, so that scanning a range of pages
does not really impact Postgres shared buffer cache.


Yeah, I wouldn't worry too much about the effect on the postgres cache when that is done. It could of course have a much worse impact on the OS cache or on the "smart" (aka dumb) storage system cache. But that effect will be there just as much with a separate tool.

/Magnus 


Re: Online verification of checksums

Tomas Vondra-4
In reply to this post by Michael Paquier-2


On 3/4/19 4:09 AM, Michael Paquier wrote:

> On Sun, Mar 03, 2019 at 07:58:26AM +0100, Fabien COELHO wrote:
>> I agree that having a server function (extension?) to do a full checksum
>> verification, possibly bandwidth-controlled, would be a good thing. However
>> it would have side effects, such as interfering deeply with the server page
>> cache, which may or may not be desirable.
>
> In what way is that different from VACUUM or a sequential scan?  It is
> possible to use buffer ring replacement strategies in such cases using
> the normal clock-sweep algorithm, so that scanning a range of pages
> does not really impact Postgres shared buffer cache.
> --

But Fabien was talking about the page cache, not shared buffers. And we
can't use a custom ring buffer there. OTOH I don't see why accessing the
file through an SQL function would behave any differently than direct
access (i.e. what the tool does now).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online verification of checksums

Tomas Vondra-4
In reply to this post by Michael Paquier-2



On 3/4/19 2:00 AM, Michael Paquier wrote:

> On Sun, Mar 03, 2019 at 03:12:51AM +0100, Tomas Vondra wrote:
>> You and Andres may be right that trying to verify checksums online
>> without close interaction with the server is ultimately futile (or at
>> least overly complex). But I'm not sure those issues (torn pages and
>> partial reads) are very good arguments, considering basebackup has to
>> deal with them too. Not sure.
>
> FWIW, I don't think that the backend is right in its way of checking
> checksums the way it does currently either with warnings and a limited
> set of failures generated.  I raised concerns about that unfortunately
> after 11 has been GA'ed, which was too late, so this time, for this
> patch, I prefer raising them before the fact and I'd rather not spread
> this kind of methodology around the core code more and more.

I still don't understand what issue you see in how basebackup verifies
checksums. Can you point me to the explanation you've sent after 11 was
released?

> I work a lot with virtualization, and I have seen ESX leave I/O
> requests hanging from time to time depending on the environment used
> (which is actually wrong, anyway, but a lot of tests happen on a
> daily basis on the stuff I work on).  What's presented on this thread
> is *never* going to be 100% safe, and would generate false positives
> which can be confusing for the user.  This is not a good sign.

So you have a workload/configuration that actually results in data
corruption yet we fail to detect that? Or we generate false positives?
Or what do you mean by "100% safe" here?


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Online verification of checksums

Magnus Hagander-2
In reply to this post by Tomas Vondra-4


On Mon, Mar 4, 2019 at 3:02 PM Tomas Vondra <[hidden email]> wrote:


On 3/4/19 4:09 AM, Michael Paquier wrote:
> On Sun, Mar 03, 2019 at 07:58:26AM +0100, Fabien COELHO wrote:
>> I agree that having a server function (extension?) to do a full checksum
>> verification, possibly bandwidth-controlled, would be a good thing. However
>> it would have side effects, such as interfering deeply with the server page
>> cache, which may or may not be desirable.
>
> In what way is that different from VACUUM or a sequential scan?  It is
> possible to use buffer ring replacement strategies in such cases using
> the normal clock-sweep algorithm, so that scanning a range of pages
> does not really impact Postgres shared buffer cache.
> --

But Fabien was talking about the page cache, not shared buffers. And we
can't use a custom ring buffer there. OTOH I don't see why accessing the
file through an SQL function would behave any differently than direct
access (i.e. what the tool does now).

It shouldn't.

One other thought I had around this, though; if it's been covered before and I missed it, please disregard :)

The *online* version of the tool is very similar to running pg_basebackup to /dev/null, is it not? Except it doesn't set the cluster to backup mode. Perhaps what we really want is a simpler way to do *that*. That wouldn't necessarily make it a SQL callable function, but it would be a CLI tool that would call a command on a walsender for example.

(We'd of course still need the standalone tool for offline checks)
 

Re: Online verification of checksums

Michael Paquier-2
In reply to this post by Tomas Vondra-4
On Mon, Mar 04, 2019 at 03:08:09PM +0100, Tomas Vondra wrote:
> I still don't understand what issue you see in how basebackup verifies
> checksums. Can you point me to the explanation you've sent after 11 was
> released?

The history is mostly on this thread:
https://www.postgresql.org/message-id/20181020044248.GD2553@...

> So you have a workload/configuration that actually results in data
> corruption yet we fail to detect that? Or we generate false positives?
> Or what do you mean by "100% safe" here?

What's proposed on this thread could generate false positives.  Checks
which have deterministic properties and clean failure handling are
reliable when it comes to reports.
--
Michael
