Offline enabling/disabling of data checksums

classic Classic list List threaded Threaded
149 messages Options
1 ... 345678
Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Michael Banck-2
Hi,

Am Freitag, den 15.03.2019, 21:23 +0900 schrieb Michael Paquier:
> On Fri, Mar 15, 2019 at 12:54:01PM +0100, Michael Banck wrote:
> > 1. There's a typo in line 578 which makes it fail to compile:
> >
> > > src/bin/pg_checksums/pg_checksums.c:578:4: error: ‘y’ undeclared (first use in this function)
> > >   }y
>
> I am wondering where you got this one.  My local branch does not have
> it, and the patch I sent does not seem to have it either.

Mea culpa, I must've fat fingered something in the editor before
applying your patch, sorry. I should've double-checked.

> > 2. Should the pqsignal() stuff only be setup in PG_MODE_ENABLE? Same
> > with the controlfile_path?
>
> PG_MODE_DISABLE needs controlfile_path as well.  We could make the
> cleanup only available when using --enable, the code just looked more
> simple in its current shape.  I think it's just more simple to set
> everything unconditionally.  This code may become more complicated in
> the future.

Ok.

> > 3. There's (I think) leftover debug output in the following places:
> >
> > > + printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path,
> > > + controlfile_path_temp);
> > > + printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path_temp,
> > > + controlfile_path);
> > > + printf(_("Syncing data folder\n"));
> >
> > (that one is debatable, we are mentioning this only in verbose mode in
> > pg_basebackup but pg_checksums is more chatty anyway, so probably
> > fine).
>
> This is wanted.  Many folks have been complaning on this thread about
> crashes and such, surely we want logs about what happens :)
>
> > > + printf(_("Updating control file\n"));
> >
> > Besides to the syncing message (which is user-relevant cause they might
> > wonder what is taking so long), the others seem to be implementation
> > details we don't need to tell the user about.
>
> Perhaps having them under --verbose makes more sense?

Well if we think it is essential in order to tell the user what happened
in the case of an error, it shouldn't be verbose I guess.


Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: [hidden email]

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Fabien COELHO-3
In reply to this post by Michael Paquier-2

Bonjour Michaël-san,

> Yes, that would be nice, for now I have focused.  For pg_resetwal yes
> we could do it easily.  Would you like to send a patch?

Here is a proposal for "pg_resetwal".

The implementation basically removes a lot of copy paste and calls the
new update_controlfile function instead. I like removing useless code:-)

The reserwal implementation was doing a rm/create cycle, which was leaving
a small window for losing the controlfile. Not neat.

I do not see the value of *not* fsyncing the control file when writing it,
as it is by definition very precious, so I added a fsync. The server side
branch uses the backend available "pg_fsync", which complies with server
settings there and can do nothing if fsync is disabled.

Maybe the two changes could be committed separately.

--
Fabien.

controlfile-update-1.patch (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Michael Paquier-2
On Sun, Mar 17, 2019 at 10:01:20AM +0100, Fabien COELHO wrote:
> The implementation basically removes a lot of copy paste and calls the new
> update_controlfile function instead. I like removing useless code:-)

Yes, I spent something like 10 minutes looking at that code yesterday
and I agree that removing the control file to recreate it is not
really necessary, even if the window between its removal and
recreation is short.

> I do not see the value of *not* fsyncing the control file when writing it,
> as it is by definition very precious, so I added a fsync. The server side
> branch uses the backend available "pg_fsync", which complies with server
> settings there and can do nothing if fsync is disabled.

The issue here is that trying to embed directly the fsync routines
from file_utils.c into pg_resetwal.c messes up the inclusions because
pg_resetwal.c includes backend-side includes, which themselves touch
fd.h :(

In short your approach avoids some extra mess with the include
dependencies. .

> Maybe the two changes could be committed separately.

I was thinking about this one, and for pg_rewind we don't care about
the fsync of the control file because the full data folder gets
fsync'd afterwards and in the event of a crash in the middle of a
rewind the target data folder is surely not something to use, but we
do for pg_checksums, and we do for pg_resetwal.  Even if there is the
argument that usually callers of update_controlfile() would care a
lot about the control file and fsync it, I think that we need some
control on if we do the fsync or not because many tools have a
--no-sync and that should be fully respected.  So while your patch is
on a good track, I would suggest to do the following things to
complete it:
- Add an extra argument bits16 to update_controlfile to pass a set of
optional flags, with NOSYNC being the only and current value.  The
default is to flush the file.
- Move the wait event calls WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE and
WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE to controldata_utils.c.
- And then delete UpdateControlFile() in xlog.c, and use
update_controlfile() instead to remove even more code.  The version in
xlog.c uses BasicOpenFile(), so we should use also that in
update_controlfile() instead of OpenTransientFile().  As any errors
result in a PANIC we don't care about leaking fds.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Fabien COELHO-3

Michaël-san,

> The issue here is that trying to embed directly the fsync routines
> from file_utils.c into pg_resetwal.c messes up the inclusions because
> pg_resetwal.c includes backend-side includes, which themselves touch
> fd.h :(
>
> In short your approach avoids some extra mess with the include
> dependencies. .

I could remove the two "catalog/" includes from pg_resetwal, I assume that
you meant these ones.

>> Maybe the two changes could be committed separately.
>
> I was thinking about this one, and for pg_rewind we don't care about
> the fsync of the control file because the full data folder gets
> fsync'd afterwards and in the event of a crash in the middle of a
> rewind the target data folder is surely not something to use, but we
> do for pg_checksums, and we do for pg_resetwal.  Even if there is the
> argument that usually callers of update_controlfile() would care a
> lot about the control file and fsync it, I think that we need some
> control on if we do the fsync or not because many tools have a
> --no-sync and that should be fully respected.

> So while your patch is on a good track, I would suggest to do the
> following things to complete it:

> - Add an extra argument bits16 to update_controlfile to pass a set of
> optional flags, with NOSYNC being the only and current value.  The
> default is to flush the file.

Hmmm. I just did that, but what about just a boolean? What other options
could be required? Maybe some locking/checking?

> - Move the wait event calls WAIT_EVENT_CONTROL_FILE_WRITE_UPDATE and
> WAIT_EVENT_CONTROL_FILE_SYNC_UPDATE to controldata_utils.c.

Done.

> - And then delete UpdateControlFile() in xlog.c, and use
> update_controlfile() instead to remove even more code.

I was keeping that one for another patch because it touches the backend
code, but it makes sense to do that in one go for consistency.

I kept the initial no-parameter function which calls the new one with 4
parameters, though, because it looks more homogeneous this way in the
backend code. This is debatable.

> The version in xlog.c uses BasicOpenFile(), so we should use also that
> in update_controlfile() instead of OpenTransientFile(). As any errors
> result in a PANIC we don't care about leaking fds.

Done.

Attached is an update.

--
Fabien.

controlfile-update-2.patch (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Michael Paquier-2
On Sun, Mar 17, 2019 at 12:44:39PM +0100, Fabien COELHO wrote:
> I could remove the two "catalog/" includes from pg_resetwal, I assume that
> you meant these ones.

Not exactly.  What I meant is that if you try to call directly
fsync_fname and fsync_parent_path from file_utils.h, then you get into
trouble because of xlog.h..  Sure you can remove also the ones you
removed.

> Hmmm. I just did that, but what about just a boolean? What other options
> could be required? Maybe some locking/checking?

It is already expected from the caller to properly take
ControlFileLock.  Note I tend to worry too much about the
extensibility of published APIs these days as well, so perhaps just a
boolean would be fine, please let me reconsider that after some sleep,
and it is not like the contents of this routine are going to become
much complicated either, except potentially to control the flags on
open(). :p

> I kept the initial no-parameter function which calls the new one with 4
> parameters, though, because it looks more homogeneous this way in the
> backend code. This is debatable.

True, this actually makes back-patching a bit easier, and there are 13
calls of UpdateControlFile().

> Attached is an update.

Thanks, I'll take a look at that tomorrow.  You have one error at the
end of update_controlfile(), where close() could issue a frontend-like
error for the backend, calling exit() on the way.  That's not good.
(No need to send a new patch, I'll fix it myself.)
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Fabien COELHO-3

> You have one error at the end of update_controlfile(), where close()
> could issue a frontend-like error for the backend, calling exit() on the
> way.  That's not good. (No need to send a new patch, I'll fix it
> myself.)

Indeed. I meant to merge the "if (close(fd))", but ended merging the error
generation as well.

--
Fabien

Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Michael Paquier-2
In reply to this post by Fabien COELHO-3
On Sun, Mar 17, 2019 at 12:44:39PM +0100, Fabien COELHO wrote:
> I kept the initial no-parameter function which calls the new one with 4
> parameters, though, because it looks more homogeneous this way in the
> backend code. This is debatable.

From a compatibility point of view, your position actually makes
sense, at least to me and after sleeping on it as UpdateControlFile is
not static, and that there are interactions with the other local
routines to read and write the control file because of the variable
ControlFile at the top of xlog.c.  So I have kept the original
interface, being now only a wrapper of the new routine.

> Attached is an update.

Thanks, I have committed the patch after fixing a couple of things.
After considering the interface, I have switched to a single boolean
as I could not actually imagine with what kind of fancy features this
could be extended further more.  If I am wrong, let's adjust it later
on.  Here are my notes about the fixes:
- pg_resetwal got broken because the path to the control file was
incorrect.  Running tests of pg_upgrade or the TAP tests of
pg_resetwal showed the failure.
- The previously-mentioned problem with close() in the new routine is
fixed.
- Header comments at the top of update_controlfile were a bit messed
up (s/Not/Note/, missed an "up" as well).
- pg_rewind was issuing a flush of the control file even if --no-sync
was used.
- Nit: incorrect header order in controldata_utils.c.  I have kept the
backend-only includes grouped though.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Fabien COELHO-3

Bonjour Michaël,

> Here are my notes about the fixes:

Thanks for the fixes.

> - pg_resetwal got broken because the path to the control file was
> incorrect.  Running tests of pg_upgrade or the TAP tests of
> pg_resetwal showed the failure.

Hmmm… I thought I had done that with "make check-world":-(

> - pg_rewind was issuing a flush of the control file even if --no-sync
> was used.

Indeed, I missed this one.

> - Nit: incorrect header order in controldata_utils.c.  I have kept the
> backend-only includes grouped though.

I'll pay attention to that the next time.

Thanks for the push.

--
Fabien.
Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Michael Paquier-2
In reply to this post by Michael Banck-2
On Fri, Mar 15, 2019 at 01:37:27PM +0100, Michael Banck wrote:
> Am Freitag, den 15.03.2019, 21:23 +0900 schrieb Michael Paquier:
>> Perhaps having them under --verbose makes more sense?
>
> Well if we think it is essential in order to tell the user what happened
> in the case of an error, it shouldn't be verbose I guess.

I would still keep them to be honest.  I don't know, if others find
the tool too chatty we could always rework that part and tune it.

Please find attached an updated patch set, I have rebased that stuff
on top of my recent commits to refactor the control file updates.
While reviewing, I have found a problem in the docs (forgot a <para>
markup previously), and there was a problem with the parent path fsync
causing an interruption to not return the correct error code, and
actually we should just use durable_rename() in this case (if
--no-sync gets in then pg_mv_file() should be used of course).

I have also been thinking about what we could add in the
documentation, so this version adds a draft to describe the cases
where enabling checksums can lead to corruption when involving
multiple nodes in a cluster and tools doing physical copy of relation
blocks.

I have not done the --no-sync part yet on purpose, as that will most
likely conflict based on the feedback received for this version..
--
Michael

0001-Add-options-to-enable-and-disable-checksums-in-pg_ch.patch (24K) Download Attachment
signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Fabien COELHO-3

Bonjour Michaël,

> Please find attached an updated patch set, I have rebased that stuff
> on top of my recent commits to refactor the control file updates.

Patch applies cleanly, compiles, make check-world seems ok, doc build ok.

It would help if the patch includes a version number. I assume that this
is v7.

Doc looks ok.

Moving the controlfile looks like an effective way to prevent any
concurrent start, as the fs operation is probably atomic and especially if
external tools uses the same trick. However this is not the case yet, eg
"pg_resetwal" uses a "postmaster.pid" hack instead. Probably the method
could be unified, possibly with some functions in "controlfile_utils.c".

However, I think that there still is a race condition because of the order
in which it is implemented:

  pg_checksums reads control file
  pg_checksums checks control file contents...
  ** cluster may be started and the control file updated
  pg_checksums moves the (updated) control file
  pg_checksums proceeds on a running cluster
  pg_checksums moves back the control file
  pg_checksums updates the control file contents, overriding updates

I think that the correct way to handle this for enable/disable is:

  pg_checksums moves the control file
  pg_checksums reads, checks, proceeds, updates
  pg_checksums moves back the control file

This probably means extending a little bit the update_controlfile function
to allow a suffix. No big deal.

Ok, this might not work, because of the following, less likely, race
condition:

  postmaster opens control file RW
  pg_checksums moves control file, posmater open file handle follows
  ...

So ISTM that we really need some locking to have something clean.

Why not always use "pgrename" instead of the strange pg_mv_file macro?

Help line about --check not simplified as suggested in a prior review,
although you said you would take it into account.

Tests look ok.

--
Fabien.
Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Michael Paquier-2
On Tue, Mar 19, 2019 at 11:48:25AM +0100, Fabien COELHO wrote:
> Moving the controlfile looks like an effective way to prevent any concurrent
> start, as the fs operation is probably atomic and especially if external
> tools uses the same trick. However this is not the case yet, eg
> "pg_resetwal" uses a "postmaster.pid" hack instead.

pg_upgrade does so.  Note that pg_resetwal does not check either that
the PID in the file is actually running.

> Probably the method could be unified, possibly with some functions
> in "controlfile_utils.c".

Hm.  postmaster.pid is just here to make sure that the instance is not
started at all, while we require the instance to be stopped cleanly
with other tools, so that's not really consistent in my opinion to
combine both.

> Ok, this might not work, because of the following, less likely, race
> condition:
>
>  postmaster opens control file RW
>  pg_checksums moves control file, postmater open file handle follows
>  ...
>
> So ISTM that we really need some locking to have something clean.

We are talking about complicating a method which is already fine for a
a window where the whole operation works, as it could take hours to
enable checksums, versus a couple of instructions.  I am not sure that
it is worth complicating the code more.

> Help line about --check not simplified as suggested in a prior review,
> although you said you would take it into account.

Oops, it looks like this got lost because of the successive rebases.
I am sure to have updated it at some point..  Anyway, thanks for
pointing it out, I got that fixed on my local branch.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Michael Banck-2
In reply to this post by Fabien COELHO-3
Hi,

Am Dienstag, den 19.03.2019, 11:48 +0100 schrieb Fabien COELHO:

> Moving the controlfile looks like an effective way to prevent any
> concurrent start, as the fs operation is probably atomic and especially if
> external tools uses the same trick. However this is not the case yet, eg
> "pg_resetwal" uses a "postmaster.pid" hack instead. Probably the method
> could be unified, possibly with some functions in "controlfile_utils.c".
>
> However, I think that there still is a race condition because of the order
> in which it is implemented:
>
>   pg_checksums reads control file
>   pg_checksums checks control file contents...
>   ** cluster may be started and the control file updated
>   pg_checksums moves the (updated) control file
>   pg_checksums proceeds on a running cluster
>   pg_checksums moves back the control file
>   pg_checksums updates the control file contents, overriding updates
>
> I think that the correct way to handle this for enable/disable is:
>
>   pg_checksums moves the control file
>   pg_checksums reads, checks, proceeds, updates
>   pg_checksums moves back the control file
>
> This probably means extending a little bit the update_controlfile function
> to allow a suffix. No big deal.
>
> Ok, this might not work, because of the following, less likely, race
> condition:
>
>   postmaster opens control file RW
>   pg_checksums moves control file, posmater open file handle follows
>   ...
>
> So ISTM that we really need some locking to have something clean.

I think starting the postmaster during offline maintenance is already
quite some pilot error. As pg_checksums can potentially run for hours
though, I agree it is important to disable the cluster in the meantime.

There's really not a lot going on between pg_checksums reading the
control file and moving it away - what you propose above sounds a bit
like overengineering to me.

If anything, we could include the postmaster.pid check from pg_resetwal
after we have renamed the control file to make absolutely sure that the
cluster is offline. Once the control file is gone and there is no
postmaster.pid, it surely cannot be pg_checksums' problem anymore if a
postmaster is started regardless of maintenance.

I leave that to Michael to decide whether he thinks the above is
warranted.

I think the more important open issue is what to do about PITR and
streaming replication, see my replies to Magnus elsewhere in the thread.

> Why not always use "pgrename" instead of the strange pg_mv_file macro?

pg_ugprade does it the same way, possibly both could be converted to
pgrename, dunno.


Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: [hidden email]

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Fabien COELHO-3
In reply to this post by Michael Paquier-2

>> Ok, this might not work, because of the following, less likely, race
>> condition:
>>
>>  postmaster opens control file RW
>>  pg_checksums moves control file, postmater open file handle follows
>>  ...
>>
>> So ISTM that we really need some locking to have something clean.
>
> We are talking about complicating a method which is already fine for a
> a window where the whole operation works, as it could take hours to
> enable checksums, versus a couple of instructions.  I am not sure that
> it is worth complicating the code more.

Hmmm. Possibly. The point is that anything only needs to be implemented
once. The whole point of pg is to have ACID transactional properties, but
it does not achieve that on the controlfile, which I find paradoxical:-)

Now if there is a race condition opportunity, ISTM that it should be as
short as possible. Renaming before manipulating seems safer if other
commands proceeds like that as well. Probably if pg always rename *THEN*
open before doing anything in all commands it could be safe, provided that
the renaming is atomic, which I think is the case.

That would avoid locking, at the price of a small probability of having a
controlfile in a controlfile.command-that-failed-at-the-wrong-time state.
Maybe it is okay. Maybe there is a need to be able to force the
state back to something to recover from such unlikely event, but probably
it does already exists (eg postmaster could be dead without releasing the
controlfile state).

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Andres Freund
In reply to this post by Michael Paquier-2
Hi,

On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> +/*
> + * Locations of persistent and temporary control files.  The control
> + * file gets renamed into a temporary location when enabling checksums
> + * to prevent a parallel startup of Postgres.
> + */
> +#define CONTROL_FILE_PATH "global/pg_control"
> +#define CONTROL_FILE_PATH_TEMP CONTROL_FILE_PATH ".pg_checksums_in_progress"

I think this should be outright rejected. Again, you're making the
control file into something it isn't. And there's no buyin for this as
far as I can tell outside of Fabien and you. For crying out loud, if the
server crashes during this YOU'VE CORRUPTED THE CLUSTER.

- Andres

Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Michael Banck-2
Hi,

Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:

> On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > +/*
> > + * Locations of persistent and temporary control files.  The control
> > + * file gets renamed into a temporary location when enabling checksums
> > + * to prevent a parallel startup of Postgres.
> > + */
> > +#define CONTROL_FILE_PATH "global/pg_control"
> > +#define CONTROL_FILE_PATH_TEMP CONTROL_FILE_PATH ".pg_checksums_in_progress"
>
> I think this should be outright rejected. Again, you're making the
> control file into something it isn't. And there's no buyin for this as
> far as I can tell outside of Fabien and you. For crying out loud, if the
> server crashes during this YOU'VE CORRUPTED THE CLUSTER.

The cluster is supposed to be offline during this.  This is just an
additional precaution so that nobody starts it during the operation -
similar to how pg_upgrade disables the old data directory.


Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: [hidden email]

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Andres Freund
On 2019-03-19 16:55:12 +0100, Michael Banck wrote:

> Hi,
>
> Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > +/*
> > > + * Locations of persistent and temporary control files.  The control
> > > + * file gets renamed into a temporary location when enabling checksums
> > > + * to prevent a parallel startup of Postgres.
> > > + */
> > > +#define CONTROL_FILE_PATH "global/pg_control"
> > > +#define CONTROL_FILE_PATH_TEMP CONTROL_FILE_PATH ".pg_checksums_in_progress"
> >
> > I think this should be outright rejected. Again, you're making the
> > control file into something it isn't. And there's no buyin for this as
> > far as I can tell outside of Fabien and you. For crying out loud, if the
> > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
>
> The cluster is supposed to be offline during this.  This is just an
> additional precaution so that nobody starts it during the operation -
> similar to how pg_upgrade disables the old data directory.

I don't see how that matters. Afterwards the cluster needs low level
surgery to be recovered. That's a) undocumented b) likely to be done
wrongly.  This is completely unacceptable *AND UNNECESSARY*.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Michael Banck-2
Hi,

Am Dienstag, den 19.03.2019, 09:00 -0700 schrieb Andres Freund:

> On 2019-03-19 16:55:12 +0100, Michael Banck wrote:
> > Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> > > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > > +/*
> > > > + * Locations of persistent and temporary control files.  The control
> > > > + * file gets renamed into a temporary location when enabling checksums
> > > > + * to prevent a parallel startup of Postgres.
> > > > + */
> > > > +#define CONTROL_FILE_PATH "global/pg_control"
> > > > +#define CONTROL_FILE_PATH_TEMP CONTROL_FILE_PATH ".pg_checksums_in_progress"
> > >
> > > I think this should be outright rejected. Again, you're making the
> > > control file into something it isn't. And there's no buyin for this as
> > > far as I can tell outside of Fabien and you. For crying out loud, if the
> > > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
> >
> > The cluster is supposed to be offline during this.  This is just an
> > additional precaution so that nobody starts it during the operation -
> > similar to how pg_upgrade disables the old data directory.
>
> I don't see how that matters. Afterwards the cluster needs low level
> surgery to be recovered. That's a) undocumented b) likely to be done
> wrongly.  This is completely unacceptable *AND UNNECESSARY*.

Can you explain why low level surgery is needed and how that would look
like?

If pg_checksums successfully enables checksums, it will move back the
control file and update the checksum version - the cluster is ready to
be started again unless I am missing something?

If pg_checksums is interrupted by the admin, it will move back the
control file and the cluster is ready to be started again as well.

If pg_checksums aborts with a failure, the admin will have to move back
the control file before starting up the instance again, but I don't
think that counts?

If pg_checksums crashes due to I/O failures or other causes I can see
how possibly the block it was currently writing might need low level
surgery, but in that case we are in the domain of forensics already I
guess and that still does not pertain to the control file?

Sorry for being obtuse, I don't get it.


Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: [hidden email]

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Andres Freund
Hi,

On 2019-03-19 17:08:17 +0100, Michael Banck wrote:

> Am Dienstag, den 19.03.2019, 09:00 -0700 schrieb Andres Freund:
> > On 2019-03-19 16:55:12 +0100, Michael Banck wrote:
> > > Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> > > > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > > > +/*
> > > > > + * Locations of persistent and temporary control files.  The control
> > > > > + * file gets renamed into a temporary location when enabling checksums
> > > > > + * to prevent a parallel startup of Postgres.
> > > > > + */
> > > > > +#define CONTROL_FILE_PATH "global/pg_control"
> > > > > +#define CONTROL_FILE_PATH_TEMP CONTROL_FILE_PATH ".pg_checksums_in_progress"
> > > >
> > > > I think this should be outright rejected. Again, you're making the
> > > > control file into something it isn't. And there's no buyin for this as
> > > > far as I can tell outside of Fabien and you. For crying out loud, if the
> > > > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
> > >
> > > The cluster is supposed to be offline during this.  This is just an
> > > additional precaution so that nobody starts it during the operation -
> > > similar to how pg_upgrade disables the old data directory.
> >
> > I don't see how that matters. Afterwards the cluster needs low level
> > surgery to be recovered. That's a) undocumented b) likely to be done
> > wrongly.  This is completely unacceptable *AND UNNECESSARY*.
>
> Can you explain why low level surgery is needed and how that would look
> like?
>
> If pg_checksums successfully enables checksums, it will move back the
> control file and update the checksum version - the cluster is ready to
> be started again unless I am missing something?
>
> If pg_checksums is interrupted by the admin, it will move back the
> control file and the cluster is ready to be started again as well.
>
> If pg_checksums aborts with a failure, the admin will have to move back
> the control file before starting up the instance again, but I don't
> think that counts?

That absolutely counts. Even a short period would imo be unacceptable,
but this will take *hours* in many clusters. It's completely possible
that the machine crashes while the enabling is in progress.

And after restarting postgres or even pg_checksums, you'll just get a
message that there's no control file. How on earth is a normal user
supposed to recover from that?  Now, you could have a check for the
control file under the temporary name, and emit a hint about renaming,
but that has its own angers (like people renaming it just to start
postgres).

And you're basically adding it because Fabien doesn't like
postmaster.pid and wants to invent another lockout mechanism in this
thread.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Michael Banck-2
Hi,

Am Dienstag, den 19.03.2019, 09:13 -0700 schrieb Andres Freund:

> On 2019-03-19 17:08:17 +0100, Michael Banck wrote:
> > Am Dienstag, den 19.03.2019, 09:00 -0700 schrieb Andres Freund:
> > > On 2019-03-19 16:55:12 +0100, Michael Banck wrote:
> > > > Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> > > > > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > > > > +/*
> > > > > > + * Locations of persistent and temporary control files.  The control
> > > > > > + * file gets renamed into a temporary location when enabling checksums
> > > > > > + * to prevent a parallel startup of Postgres.
> > > > > > + */
> > > > > > +#define CONTROL_FILE_PATH "global/pg_control"
> > > > > > +#define CONTROL_FILE_PATH_TEMP CONTROL_FILE_PATH ".pg_checksums_in_progress"
> > > > >
> > > > > I think this should be outright rejected. Again, you're making the
> > > > > control file into something it isn't. And there's no buyin for this as
> > > > > far as I can tell outside of Fabien and you. For crying out loud, if the
> > > > > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
> > > >
> > > > The cluster is supposed to be offline during this.  This is just an
> > > > additional precaution so that nobody starts it during the operation -
> > > > similar to how pg_upgrade disables the old data directory.
> > >
> > > I don't see how that matters. Afterwards the cluster needs low level
> > > surgery to be recovered. That's a) undocumented b) likely to be done
> > > wrongly.  This is completely unacceptable *AND UNNECESSARY*.
> >
> > Can you explain why low level surgery is needed and how that would look
> > like?
> >
> > If pg_checksums successfully enables checksums, it will move back the
> > control file and update the checksum version - the cluster is ready to
> > be started again unless I am missing something?
> >
> > If pg_checksums is interrupted by the admin, it will move back the
> > control file and the cluster is ready to be started again as well.
> >
> > If pg_checksums aborts with a failure, the admin will have to move back
> > the control file before starting up the instance again, but I don't
> > think that counts?
>
> That absolutely counts. Even a short period would imo be unacceptable,
> but this will take *hours* in many clusters. It's completely possible
> that the machine crashes while the enabling is in progress.
>
> And after restarting postgres or even pg_checksums, you'll just get a
> message that there's no control file. How on earth is a normal user
> supposed to recover from that?  Now, you could have a check for the
> control file under the temporary name, and emit a hint about renaming,
> but that has its own angers (like people renaming it just to start
> postgres).

Ok, thanks for explaining. 

I guess if we check for the temporary name in postmaster during startup
if pg_control isn't there then a more generally useful name like
"pg_control.maintenance" should be picked. We could then spit out a nice
error message or hint like "the cluster has been disabled for
maintenance. In order to start it up anyway, rename
pg_control.maintenance to pg_control" or so.

In any case, this would be more of a operational or availability issue
and not a data-loss issue, as I feared from your previous mails.

> And you're basically adding it because Fabien doesn't like
> postmaster.pid and wants to invent another lockout mechanism in this
> thread.

I think the hazard of another DBA (or some automated configuration
management or HA tool for that matter) looking at postmaster.pid,
deciding that it is not a legit file from a running instance, deleting
it and then starting up Postgres while pg_checksums is still at work is
worse than the above scenario, but maybe if we make the content of
postmaster.pid clear enough (like "maintenance in progress"?) it would
be enough of a hint? Or do you have concrete suggestions on how this
should work?

I had the feeling (ab)using postmaster.pid for this would fly less than
using the same scheme as pg_upgrade does, but I'm fine doing it either
way.


Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: [hidden email]

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Reply | Threaded
Open this post in threaded view
|

Re: Offline enabling/disabling of data checksums

Andres Freund
Hi,

On 2019-03-19 17:30:16 +0100, Michael Banck wrote:

> Am Dienstag, den 19.03.2019, 09:13 -0700 schrieb Andres Freund:
> > On 2019-03-19 17:08:17 +0100, Michael Banck wrote:
> > > Am Dienstag, den 19.03.2019, 09:00 -0700 schrieb Andres Freund:
> > > > On 2019-03-19 16:55:12 +0100, Michael Banck wrote:
> > > > > Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> > > > > > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > > > > > +/*
> > > > > > > + * Locations of persistent and temporary control files.  The control
> > > > > > > + * file gets renamed into a temporary location when enabling checksums
> > > > > > > + * to prevent a parallel startup of Postgres.
> > > > > > > + */
> > > > > > > +#define CONTROL_FILE_PATH "global/pg_control"
> > > > > > > +#define CONTROL_FILE_PATH_TEMP CONTROL_FILE_PATH ".pg_checksums_in_progress"
> > > > > >
> > > > > > I think this should be outright rejected. Again, you're making the
> > > > > > control file into something it isn't. And there's no buyin for this as
> > > > > > far as I can tell outside of Fabien and you. For crying out loud, if the
> > > > > > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
> > > > >
> > > > > The cluster is supposed to be offline during this.  This is just an
> > > > > additional precaution so that nobody starts it during the operation -
> > > > > similar to how pg_upgrade disables the old data directory.
> > > >
> > > > I don't see how that matters. Afterwards the cluster needs low level
> > > > surgery to be recovered. That's a) undocumented b) likely to be done
> > > > wrongly.  This is completely unacceptable *AND UNNECESSARY*.
> > >
> > > Can you explain why low level surgery is needed and how that would look
> > > like?
> > >
> > > If pg_checksums successfully enables checksums, it will move back the
> > > control file and update the checksum version - the cluster is ready to
> > > be started again unless I am missing something?
> > >
> > > If pg_checksums is interrupted by the admin, it will move back the
> > > control file and the cluster is ready to be started again as well.
> > >
> > > If pg_checksums aborts with a failure, the admin will have to move back
> > > the control file before starting up the instance again, but I don't
> > > think that counts?
> >
> > That absolutely counts. Even a short period would imo be unacceptable,
> > but this will take *hours* in many clusters. It's completely possible
> > that the machine crashes while the enabling is in progress.
> >
> > And after restarting postgres or even pg_checksums, you'll just get a
> > message that there's no control file. How on earth is a normal user
> > supposed to recover from that?  Now, you could have a check for the
> > control file under the temporary name, and emit a hint about renaming,
> > but that has its own angers (like people renaming it just to start
> > postgres).
>
> Ok, thanks for explaining. 
>
> I guess if we check for the temporary name in postmaster during startup
> if pg_control isn't there then a more generally useful name like
> "pg_control.maintenance" should be picked. We could then spit out a nice
> error message or hint like "the cluster has been disabled for
> maintenance. In order to start it up anyway, rename
> pg_control.maintenance to pg_control" or so.

To be very clear: I am going to try to stop any patch with this
mechanism from going into the tree. I think it's an absurdly bad
idea. There'd need to be significant support from a number of other
committers to convince me otherwise.


> In any case, this would be more of a operational or availability issue
> and not a data-loss issue, as I feared from your previous mails.

It's just about undistinguishable for normal users.


> > And you're basically adding it because Fabien doesn't like
> > postmaster.pid and wants to invent another lockout mechanism in this
> > thread.
>
> I think the hazard of another DBA (or some automated configuration
> management or HA tool for that matter) looking at postmaster.pid,
> deciding that it is not a legit file from a running instance, deleting
> it and then starting up Postgres while pg_checksums is still at work is
> worse than the above scenario, but maybe if we make the content of
> postmaster.pid clear enough (like "maintenance in progress"?) it would
> be enough of a hint?

Err, how would such a tool decide to do that safely? And if it did so,
how would it not cause problems in postgres' normal operation, given
that that postmaster.pid is crucial to prevent two postgres instances
running at the same time?


> Or do you have concrete suggestions on how this should work?

create a postmaster.pid with the pid of pg_checksums. That'll trigger a
postgres start from checking whether that pid is still alive. There'd
probably need to be a tiny change to CreateLockFile() to prevent it from
checking whether any shared memory is connected.  FWIW, it'd probably
actually be good if pg_checksums (and some other tools), did most if not
all the checks in CreateLockFile().

I'm not sure it needs to be this patch's responsibility to come up with
a scheme here at all however. pg_rewind, pg_resetwal, pg_upgrade all
don't really have a lockout mechanism, and it hasn't caused a ton of
problems. I think it'd be good to invent something better, but it can't
be some half assed approach that'll lead to people think their database
is gone.


> I had the feeling (ab)using postmaster.pid for this would fly less than
> using the same scheme as pg_upgrade does, but I'm fine doing it either
> way.

I don't think pg_upgrade is a valid comparison, given that the goal
there is to permanently disable the cluster. It also emits a hint about
it. And only does so at the end of a run.

Greetings,

Andres Freund

1 ... 345678