Offline enabling/disabling of data checksums

Re: Offline enabling/disabling of data checksums

Michael Paquier-2
On Wed, Mar 13, 2019 at 10:44:03AM +0100, Fabien COELHO wrote:
> Yep. That is the issue I think is preventable by fsyncing updated data
> *then* writing & syncing the control file, and that should be done by
> pg_checksums.

Well, pg_rewind works similarly: control file gets updated and then
the whole data directory gets flushed.  In my opinion, the takeaway here
is that we should log something after the sync of the whole data folder is
done, so that in the event of a crash an operator can make sure that
everything has happened.
--
Michael

Re: Offline enabling/disabling of data checksums

Michael Banck-2
In reply to this post by Michael Paquier-2
On Wednesday, 13.03.2019 at 18:31 +0900, Michael Paquier wrote:

> On Wed, Mar 13, 2019 at 10:08:33AM +0100, Fabien COELHO wrote:
> > I'm not sure of the punctuation logic on the help line: the first sentence
> > does not end with a ".". I could not find an instance of this style in other
> > help on pg commands. I'd suggest "check data checksums (default)" would work
> > around and be more in line with other commands help.
>
> Good idea, let's do that.
>
> > I slowed down pg_checksums by adding a 0.1s sleep when scanning a new file,
> > then started a "pg_checksums --enable" on a stopped cluster, then started
> > the cluster while the enabling was in progress, then connected and updated
> > data.
>
> Well, yes, don't do that.  You can get into the same class of problems
> while running pg_rewind, pg_basebackup or even pg_resetwal once the
> initial control file check is done for each one of these tools.
>
> > I do not think it is a good thing that two commands can write to the data
> > directory at the same time, really.
>
> We don't prevent either a pg_resetwal and a pg_basebackup to run in
> parallel.  That would be...  Interesting.

But does pg_basebackup actually change the primary's data directory? I
don't think so, so that does not seem to be a problem.

pg_rewind and pg_resetwal are (TTBOMK) pretty quick operations, while
pg_checksums can potentially run for hours, so I see the point of taking
extra care here.

On the other hand, two pg_checksums running in parallel also seem not
much of a problem as the cluster is offline anyway.

What is much more of a footgun is one DBA starting pg_checksums --enable
on a 1TB cluster, then going for lunch, and then the other DBA wondering
why the DB is down and starting the instance again.

We read the control file on pg_checksums' startup, so once pg_checksums
finishes it'll write the old checkpoint LSN into pg_control (along with
the updated checksum version). This is pilot error, but I think we
should try to guard against it.

I propose we re-read the control file for the enable case after we
finished operating on all files and (i) check the instance is still
offline and (ii) update the checksums version from there. That should be
a small but worthwhile change that could be done anyway.
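
Roughly what I have in mind, as an untested sketch (not the attached patch;
the function signatures are from memory and may not match
src/common/controldata_utils.c exactly; DataDir and progname are the globals
already used by pg_checksums):

#include "postgres_fe.h"

#include "catalog/pg_control.h"
#include "common/controldata_utils.h"
#include "storage/bufpage.h"

static void
finalize_enable(void)
{
	ControlFileData *ControlFile;
	bool		crc_ok;

	/* Re-read pg_control now that all relation files have been rewritten. */
	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
	if (!crc_ok)
	{
		fprintf(stderr, _("%s: pg_control CRC mismatch\n"), progname);
		exit(1);
	}

	/* (i) the cluster must still be cleanly shut down */
	if (ControlFile->state != DB_SHUTDOWNED &&
		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
	{
		fprintf(stderr, _("%s: cluster was started while enabling checksums, aborting\n"),
				progname);
		exit(1);
	}

	/* (ii) flip the checksum version on this fresh copy and write it back */
	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
	update_controlfile(DataDir, progname, ControlFile, true);
}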

Another option would be to add a new feature which reliably blocks an
instance from starting up due to maintenance - either a new control file
field, some message in postmaster.pid (like "pg_checksums maintenance in
progress") that would prevent pg_ctl or postgres/postmaster from
starting up like 'FATAL:  bogus data in lock file "postmaster.pid":
"pg_checksums in progress' or some other trigger file.

> > About fsync-ing: ISTM that it is possible that the control file is written
> > to disk while data are still not written, so a failure in between would
> > leave the cluster with an inconsistent state. I think that it should fsync
> > the data *then* update the control file and fsync again on that one.
>
> if --enable is used, we fsync the whole data directory after writing
> all the blocks and updating the control file at the end.  The case you
> are referring to here is in fsync_pgdata(), not pg_checksums actually,
> because you could reach the same state after a simple initdb.  

But in the initdb case you don't have any valuable data in the instance
yet.

> It
> could be possible to reach a state where the control file has
> checksums enabled and some blocks are not correctly synced, still you
> would notice rather quickly if the server is in an incorrect state at
> the follow-up startup.

Would you?  I think I'm with Fabien on this one and it seems worthwhile
to run fsync_pgdata() before and after update_controlfile() - the second
one should be really quick anyway. 

Also, I suggest to maybe add a notice in verbose mode that we are
syncing the data directory - otherwise the user might wonder what's
going on at 100% done, though I haven't seen a large delay in my tests
so far.
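
Something like this, roughly (sketch only; the verbose flag and the exact
fsync_pgdata() signature are assumptions about the current patch):

	if (verbose)
		printf(_("%s: syncing data directory\n"), progname);
	fsync_pgdata(DataDir, progname, PG_VERSION_NUM);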


Michael

Re: Offline enabling/disabling of data checksums

Fabien COELHO-3
In reply to this post by Michael Paquier-2

Hello,

>> Yep. That is the issue I think is preventable by fsyncing updated data
>> *then* writing & syncing the control file, and that should be done by
>> pg_checksums.
>
> Well, pg_rewind works similarly: control file gets updated and then
> the whole data directory gets flushed.

So it is basically prone to the same potential issue?

> In my opinion, the takeaway here is that we should log something after the sync of
> the whole data folder is done, so that in the event of a crash an operator
> can make sure that everything has happened.

I do not understand. I'm basically only suggesting to reorder 3 lines and
add an fsync so that this potential problem goes away; see the attached poc
(which does not compile because pg_fsync is backend-only; it does work with
plain fsync on Linux, but I'm unsure of the portability, so probably
pg_fsync should be moved to src/port or something).
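
To be clear, the intended ordering is roughly the following (sketch only, not
the attached poc verbatim; fsync_pgdata() and fsync_fname() are the frontend
versions from common/file_utils.h, signatures from memory):

	char		controlpath[MAXPGPATH];

	/* ControlFile is the copy read (or re-read) from pg_control */

	/* 1. flush all rewritten relation files first */
	fsync_pgdata(DataDir, progname, PG_VERSION_NUM);

	/* 2. only then flip the checksum flag in pg_control */
	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
	update_controlfile(DataDir, progname, ControlFile, false);

	/* 3. and finally sync pg_control itself */
	snprintf(controlpath, sizeof(controlpath), "%s/global/pg_control", DataDir);
	fsync_fname(controlpath, false, progname);

With this order a crash can never leave a control file that claims checksums
are enabled while some rewritten pages are not yet flushed.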

--
Fabien.

Attachment: checksum-fsync-reorder.patch (2K)

Re: Offline enabling/disabling of data checksums

Magnus Hagander-2
In reply to this post by Michael Banck-2
On Wed, Mar 13, 2019 at 11:41 AM Michael Banck <[hidden email]> wrote:
On Wednesday, 13.03.2019 at 18:31 +0900, Michael Paquier wrote:
> On Wed, Mar 13, 2019 at 10:08:33AM +0100, Fabien COELHO wrote:
> > I'm not sure of the punctuation logic on the help line: the first sentence
> > does not end with a ".". I could not find an instance of this style in other
> > help on pg commands. I'd suggest "check data checksums (default)" would work
> > around and be more in line with other commands help.
>
> Good idea, let's do that.
>
> > I slowed down pg_checksums by adding a 0.1s sleep when scanning a new file,
> > then started a "pg_checksums --enable" on a stopped cluster, then started
> > the cluster while the enabling was in progress, then connected and updated
> > data.
>
> Well, yes, don't do that.  You can get into the same class of problems
> while running pg_rewind, pg_basebackup or even pg_resetwal once the
> initial control file check is done for each one of these tools.
>
> > I do not think it is a good thing that two commands can write to the data
> > directory at the same time, really.
>
> We don't prevent either a pg_resetwal and a pg_basebackup to run in
> parallel.  That would be...  Interesting.

But does pg_basebackup actually change the primary's data directory? I
don't think so, so that does not seem to be a problem.

pg_rewind and pg_resetwal are (TTBOMK) pretty quick operations, while
pg_checksums can potentially run for hours, so I see the point of taking
extra care here.

On the other hand, two pg_checksums running in parallel also seem not
much of a problem as the cluster is offline anyway.

What is much more of a footgun is one DBA starting pg_checksums --enable
on a 1TB cluster, then going for lunch, and then the other DBA wondering
why the DB is down and starting the instance again.

We read the control file on pg_checksums' startup, so once pg_checksums
finishes it'll write the old checkpoint LSN into pg_control (along with
the updated checksum version). This is pilot error, but I think we
should try to guard against it.

I propose we re-read the control file for the enable case after we
finished operating on all files and (i) check the instance is still
offline and (ii) update the checksums version from there. That should be
a small but worthwhile change that could be done anyway.

In (i) you also need to check that it's not merely offline *again*: somebody could start *and* stop the database while pg_checksums is running. But hopefully checking the time field as well should be enough for that?


Another option would be to add a new feature which reliably blocks an
instance from starting up due to maintenance - either a new control file
field, some message in postmaster.pid (like "pg_checksums maintenance in
progress") that would prevent pg_ctl or postgres/postmaster from
starting up like 'FATAL:  bogus data in lock file "postmaster.pid":
"pg_checksums in progress' or some other trigger file.

Instead of overloading yet another thing onto postmaster.pid, it might be better to just have a separate file that, if it exists, blocks startup with a message defined as the content of that file?
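
Hypothetically, something as simple as this early in PostmasterMain(), after
we have chdir'd to the data directory but before the postmaster.pid handling
(the "maintenance_lock" file name and the wording are made up for
illustration):

	FILE	   *mfile = fopen("maintenance_lock", "r");	/* relative to DataDir */

	if (mfile != NULL)
	{
		char		reason[1024];

		if (fgets(reason, sizeof(reason), mfile) == NULL)
			strcpy(reason, "unknown maintenance operation");
		fclose(mfile);
		ereport(FATAL,
				(errmsg("data directory is locked for maintenance: %s", reason)));
	}

pg_checksums (and friends) would create the file with a descriptive first line
before starting and remove it again when done.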


> It
> could be possible to reach a state where the control file has
> checksums enabled and some blocks are not correctly synced, still you
> would notice rather quickly if the server is in an incorrect state at
> the follow-up startup.

Would you?  I think I'm with Fabien on this one and it seems worthwhile
to run fsync_pgdata() before and after update_controlfile() - the second
one should be really quick anyway. 

Also, I suggest to maybe add a notice in verbose mode that we are
syncing the data directory - otherwise the user might wonder what's
going on at 100% done, though I haven't seen a large delay in my tests
so far.

Seems like a good idea -- there certainly could be a substantial delay there depending on data size and underlying storage.
 

Re: Offline enabling/disabling of data checksums

Sergei Kornilov
In reply to this post by Michael Paquier-2
Hi

One new question from me: how about replication?
Case: primary+replica, we shut down the primary and enable checksums, and the replica then "started streaming WAL from primary" without any issue. I end up with a master with checksums but a replica without.
Or a cluster with checksums where we then disable checksums on the primary, but the standby still thinks we have checksums.

Also, do we support ./configure --with-blocksize=(not equal to 8)? make check on HEAD fails for me. If we support this, I think we need to re-check BLCKSZ between the compiled pg_checksums and the one used in PGDATA.

regards, Sergei


Re: Offline enabling/disabling of data checksums

Fabien COELHO-3
In reply to this post by Michael Banck-2

Hello Michael,

> I propose we re-read the control file for the enable case after we
> finished operating on all files and (i) check the instance is still
> offline and (ii) update the checksums version from there. That should be
> a small but worthwhile change that could be done anyway.

That looks like a simple but mostly effective guard.

> Another option would be to add a new feature which reliably blocks an
> instance from starting up due to maintenance - either a new control file
> field, some message in postmaster.pid (like "pg_checksums maintenance in
> progress") that would prevent pg_ctl or postgres/postmaster from
> starting up like 'FATAL:  bogus data in lock file "postmaster.pid":
> "pg_checksums in progress' or some other trigger file.

I think that a clear cluster-locking can-be-overridden-if-needed
shared-between-commands mechanism would be a good thing (tm), although it
requires some work.

My initial suggestion was to update the control file with an appropriate
state, eg some general "admin command in progress", but I understood that
it was rejected, and for another of your patches it seems that the
"postmaster.pid" file is the right approach. Fine with me; the point is
that it should be effective and consistent across all relevant commands.

A good point about the "postmaster.pid" trick, when it does not contain
the postmaster pid, is that overriding is as simple as "rm postmaster.pid".

>> It could be possible to reach a state where the control file has
>> checksums enabled and some blocks are not correctly synced, still you
>> would notice rather quickly if the server is in an incorrect state at
>> the follow-up startup.
>
> Would you?  I think I'm with Fabien on this one and it seems worthwhile
> to run fsync_pgdata() before and after update_controlfile() - the second
> one should be really quick anyway. 

Note that fsync_pgdata is kind of heavy, it recurses everywhere. I think
that a simple fsync on the control file only is enough.

> Also, I suggest to maybe add a notice in verbose mode that we are
> syncing the data directory - otherwise the user might wonder what's
> going on at 100% done, though I haven't seen a large delay in my tests
> so far.

I agree, as it might not be cheap.

--
Fabien.

Re: Offline enabling/disabling of data checksums

Michael Banck-2
In reply to this post by Magnus Hagander-2
Hi,

On Wednesday, 13.03.2019 at 11:47 +0100, Magnus Hagander wrote:
> On Wed, Mar 13, 2019 at 11:41 AM Michael Banck <[hidden email]> wrote:
> > I propose we re-read the control file for the enable case after we
> > finished operating on all files and (i) check the instance is still
> > offline and (ii) update the checksums version from there. That should be
> > a small but worthwhile change that could be done anyway.
>
> In (i) you also need to check that it's not offline *again*. Somebody
> could start *and* stop the database while pg_checksums is running. But
> that should hopefully be enough to check the time field?

Good point.

> > Also, I suggest to maybe add a notice in verbose mode that we are
> > syncing the data directory - otherwise the user might wonder what's
> > going on at 100% done, though I haven't seen a large delay in my tests
> > so far.
>
> Seems like a good idea -- there certainly could be a substantial delay
> there depending on data size and underlying storage.

The attached patch should do the above, on top of Michael's last
patchset.


Michael

Attachment: 0001-Guard-against-concurrent-cluster-changes-when-enabli.patch (4K)

Re: Offline enabling/disabling of data checksums

Magnus Hagander-2
In reply to this post by Sergei Kornilov
On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov <[hidden email]> wrote:
Hi

One new question from me: how about replication?
Case: primary+replica, we shut down primary and enable checksum, and "started streaming WAL from primary" without any issue. I have master with checksums, but replica without.
Or cluster with checksums, then disable checksums on primary, but standby think we have checksums.

Enabling or disabling the checksums offline on the master quite clearly requires a rebuild of the standby, there is no other way (this is one of the reasons for the online enabling in that patch, so I still hope we can get that done -- but not for this version).

You would have the same with PITR backups, for example, and especially if you have some tool that does block- or segment-level differential backups.

Of course, we have to make sure that this actually fails.

I wonder if we should bring out the big hammer and actually change the system id in pg_control when checksums are enabled/disabled by this tool? That should make it clear to any tool that it's changed.
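
For reference, a fresh identifier could be generated the same way initdb and
pg_resetwal do it, roughly (sketch; needs <sys/time.h> and <unistd.h>):

	struct timeval tv;
	uint64		sysidentifier;

	/* same recipe as initdb's BootStrapXLOG() / pg_resetwal's GuessControlValues() */
	gettimeofday(&tv, NULL);
	sysidentifier = ((uint64) tv.tv_sec) << 32;
	sysidentifier |= ((uint64) tv.tv_usec) << 12;
	sysidentifier |= getpid() & 0xFFF;

	ControlFile->system_identifier = sysidentifier;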


Also we support ./configure --with-blocksize=(not equals 8)? make check on HEAD fails for me. If we support this - i think we need recheck BLCKSZ between compiled pg_checksum and used in PGDATA

You mean if the backend and pg_checksums is built with different blocksize? Yeah, that sounds like something which is a cheap check and should be done. 


Re: Offline enabling/disabling of data checksums

Michael Banck-2
Hi,

On Wednesday, 13.03.2019 at 12:24 +0100, Magnus Hagander wrote:
> > Also we support ./configure --with-blocksize=(not equals 8)? make
> > check on HEAD fails for me. If we support this - i think we need
> > recheck BLCKSZ between compiled pg_checksum and used in PGDATA
>
> You mean if the backend and pg_checksums is built with different
> blocksize? Yeah, that sounds like something which is a cheap check and
> should be done. 

I've been doing that in my pg_checksums fork for a while (as it is further
removed from the Postgres binaries), but yeah, we should check for that as
well in pg_checksums; see the attached patch.
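
The check itself is essentially just this (roughly what the attached patch
does):

	if (ControlFile->blcksz != BLCKSZ)
	{
		fprintf(stderr, _("%s: data directory block size %d is different to compiled-in block size %d.\n"),
				progname, ControlFile->blcksz, BLCKSZ);
		exit(1);
	}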


Michael

Attachment: 0001-Check-data-directory-block-size.patch (1K)

Re: Offline enabling/disabling of data checksums

Sergei Kornilov
In reply to this post by Magnus Hagander-2
Hi

>> One new question from me: how about replication?
>> Case: primary+replica, we shut down primary and enable checksum, and "started streaming WAL from primary" without any issue. I have master with checksums, but replica without.
>> Or cluster with checksums, then disable checksums on primary, but standby think we have checksums.
>
> Enabling or disabling the checksums offline on the master quite clearly requires a rebuild of the standby, there is no other way (this is one of the reasons for the online enabling in that patch, so I still hope we can get that done -- but not for this version).

I mean this should at least be documented.
Changing the system id... maybe that is reasonable.

>> Also we support ./configure --with-blocksize=(not equals 8)? make check on HEAD fails for me. If we support this - i think we need recheck BLCKSZ between compiled pg_checksum and used in PGDATA
>
> You mean if the backend and pg_checksums is built with different blocksize? Yeah, that sounds like something which is a cheap check and should be done.

Yep

regards, Sergei


Re: Offline enabling/disabling of data checksums

Sergei Kornilov
In reply to this post by Michael Banck-2
Hi,

>>  > Also we support ./configure --with-blocksize=(not equals 8)? make
>>  > check on HEAD fails for me. If we support this - i think we need
>>  > recheck BLCKSZ between compiled pg_checksum and used in PGDATA
>>
>>  You mean if the backend and pg_checksums is built with different
>>  blocksize? Yeah, that sounds like something which is a cheap check and
>>  should be done.
>
> I've been doing that in my pg_checksums fork for a while (as it further
> removed from the Postgres binaries) but yeah we should check for that as
> well in pg_checksums, see attached patch.

Seems good. And I think we need to backpatch this check to pg11, similar to the cross-version compatibility checks.

regards, Sergei


Re: Offline enabling/disabling of data checksums

Magnus Hagander-2
In reply to this post by Sergei Kornilov
On Wed, Mar 13, 2019 at 12:40 PM Sergei Kornilov <[hidden email]> wrote:
Hi

>> One new question from me: how about replication?
>> Case: primary+replica, we shut down primary and enable checksum, and "started streaming WAL from primary" without any issue. I have master with checksums, but replica without.
>> Or cluster with checksums, then disable checksums on primary, but standby think we have checksums.
>
> Enabling or disabling the checksums offline on the master quite clearly requires a rebuild of the standby, there is no other way (this is one of the reasons for the online enabling in that patch, so I still hope we can get that done -- but not for this version).

I mean this should be at least documented.
Change system id... Maybe is reasonable

I think this is dangerous enough that it needs to be enforced and not documented.

Most people who care about checksums are also going to be having either replication or backup...


>> Also we support ./configure --with-blocksize=(not equals 8)? make check on HEAD fails for me. If we support this - i think we need recheck BLCKSZ between compiled pg_checksum and used in PGDATA
>
> You mean if the backend and pg_checksums is built with different blocksize? Yeah, that sounds like something which is a cheap check and should be done.

Yep

This one I could live with being only a documented problem rather than enforced, but it also seems very simple to enforce.
 

Re: Offline enabling/disabling of data checksums

Michael Paquier-2
In reply to this post by Michael Banck-2
On Wed, Mar 13, 2019 at 12:09:24PM +0100, Michael Banck wrote:
> The attached patch should do the above, on top of Michael's last
> patchset.

What you are doing here looks like a good defense in itself.
--
Michael


Re: Offline enabling/disabling of data checksums

Michael Paquier-2
In reply to this post by Sergei Kornilov
On Wed, Mar 13, 2019 at 02:43:39PM +0300, Sergei Kornilov wrote:
> Seems good. And I think we need backpath this check to pg11. similar
> to cross-version compatibility checks

Good point raised; a backpatch looks appropriate.  It would be nice to get
to something more dynamic, but pg_checksum_block() uses BLCKSZ
directly :(
--
Michael


Re: Offline enabling/disabling of data checksums

Michael Banck-2
In reply to this post by Magnus Hagander-2
Hi,

On Wednesday, 13.03.2019 at 12:24 +0100, Magnus Hagander wrote:

> On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov <[hidden email]> wrote:
> > One new question from me: how about replication?
> > Case: primary+replica, we shut down primary and enable checksum, and
> > "started streaming WAL from primary" without any issue. I have
> > master with checksums, but replica without.
> > Or cluster with checksums, then disable checksums on primary, but
> > standby think we have checksums.
>
> Enabling or disabling the checksums offline on the master quite
> clearly requires a rebuild of the standby, there is no other way

What about shutting down both and running pg_checksums --enable on the
standby as well?


Michael

Re: Offline enabling/disabling of data checksums

Michael Banck-2
In reply to this post by Magnus Hagander-2
Hi,

On Wednesday, 13.03.2019 at 12:43 +0100, Magnus Hagander wrote:
> I think this is dangerous enough that it needs to be enforced and not
> documented.

Changing the cluster ID might have some other side-effects, I think
there are several cloud-native 3rd party solutions that use the cluster
ID as some kind of unique identifier for an instance. It might not be an
issue in practice, but then again, it might break other stuff down the
road.

Another possibility would be to extend the replication protocol's
IDENTIFY_SYSTEM command to also report the checksum version so that the
standby can check against the local control file on startup. But I am
not sure we can easily extend IDENTIFY_SYSTEM this way nor whether we
should for this rather corner-casey thing?


Michael

Re: Offline enabling/disabling of data checksums

Magnus Hagander-2
In reply to this post by Michael Banck-2
On Wed, Mar 13, 2019 at 4:46 PM Michael Banck <[hidden email]> wrote:
Hi,

On Wednesday, 13.03.2019 at 12:24 +0100, Magnus Hagander wrote:
> On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov <[hidden email]> wrote:
> > One new question from me: how about replication?
> > Case: primary+replica, we shut down primary and enable checksum, and
> > "started streaming WAL from primary" without any issue. I have
> > master with checksums, but replica without.
> > Or cluster with checksums, then disable checksums on primary, but
> > standby think we have checksums.
>
> Enabling or disabling the checksums offline on the master quite
> clearly requires a rebuild of the standby, there is no other way

What about shutting down both and running pg_checksums --enable on the
standby as well?

That sounds pretty fragile to me. But if we can prove that the user has done things in the right order, sure. But how can we do that in an offline process? What if the user just quickly restarted the primary node after the standby had been shut down? We'll need to somehow validate it across the nodes.

Re: Offline enabling/disabling of data checksums

Magnus Hagander-2
In reply to this post by Michael Banck-2
On Wed, Mar 13, 2019 at 4:51 PM Michael Banck <[hidden email]> wrote:
Hi,

On Wednesday, 13.03.2019 at 12:43 +0100, Magnus Hagander wrote:
> I think this is dangerous enough that it needs to be enforced and not
> documented.

Changing the cluster ID might have some other side-effects, I think
there are several cloud-native 3rd party solutions that use the cluster
ID as some kind of unique identifier for an instance. It might not be an
issue in practise, but then again, it might break other stuff down the
road.

Well, whatever we do they have to update, right? If we're not changing it, then we're basically saying that it's (systemid, checksums) that is the identifier of the cluster, not just systemid. They'd have to go around and check each node individually for the configuration and not just use systemid anyway, so what's the actual win?


Another possibility would be to extend the replication protocol's
IDENTIFY_SYSTEM command to also report the checksum version so that the
standby can check against the local control file on startup. But I am
not sure we can easily extend IDENTIFY_SYSTEM this way nor whether we
should for this rather corner-casey thing?

We could, but is it really a win in those scenarios, vs. just making the systemid different? With the systemid being different it's obvious that something needs to be done. If it's not, then at best, if we check it at standby startup, the standby won't start. But people can still end up with things like unusable/corrupt backups, for example.


Re: Offline enabling/disabling of data checksums

Michael Paquier-2
In reply to this post by Sergei Kornilov
On Wed, Mar 13, 2019 at 02:43:39PM +0300, Sergei Kornilov wrote:
> Seems good. And I think we need backpath this check to pg11. similar
> to cross-version compatibility checks

+    fprintf(stderr, _("%s: data directory block size %d is different to compiled-in block size %d.\n"),
+            progname, ControlFile->blcksz, BLCKSZ);
The error message looks grammatically a bit weird to me.  What about
the following?  Say:
"database block size of %u is different from supported block size of
%u."
Better ideas are welcome.

Please note that those integers are unsigned, by the way.
--
Michael


Re: Offline enabling/disabling of data checksums

Michael Paquier-2
In reply to this post by Magnus Hagander-2
On Wed, Mar 13, 2019 at 12:24:21PM +0100, Magnus Hagander wrote:
> Enabling or disabling the checksums offline on the master quite clearly
> requires a rebuild of the standby, there is no other way (this is one of
> the reasons for the online enabling in that patch, so I still hope we can
> get that done -- but not for this version).

I am curious to understand why this would require a rebuild of the
standby.  Technically FPWs don't update the checksum of a page when it
is WAL-logged, so even if a primary and a standby don't agree on the
checksum configuration, it is the timing at which pages are flushed in
the local instance that counts for checksum correctness.

> You mean if the backend and pg_checksums is built with different blocksize?
> Yeah, that sounds like something which is a cheap check and should be done.

Yes, we should check for that; the checksum calculation uses BLCKSZ as
a hardcoded value, so a mismatch would cause computation failures.  It
could be possible to not have this restriction if we made the block
size an argument of the checksum calculation, though.
--
Michael
