Re: [HACKERS] Infrastructure changes for recovery

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
32 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Simon Riggs

On Thu, 2008-08-07 at 12:44 +0100, Simon Riggs wrote:

> I would like to propose some changes to the infrastructure for recovery.
> These changes are beneficial in themselves, but also form the basis for
> other work we might later contemplate.
>
> Currently
> * the startup process performs restartpoints during recovery
> * the death of the startup process is tied directly to the change of
> state in the postmaster following recovery
>
> I propose to
> * have startup process signal postmaster when it starts Redo phase (if
> it starts it)

> Decoupling things in this way allows us to
> 1. arrange for the bgwriter to start during Redo, so it can:
> i) clean dirty blocks for the startup process
> ii) perform restartpoints in background
>   Both of these aspects will increase performance of recovery

Taking into account comments from Tom and Alvaro

Included patch with the following changes:

* new postmaster mode known as consistent recovery, entered only when
recovery passes safe/consistent point. InRedo is now set in all
processes when started/connected in consistent recovery mode.

* bgwriter and stats process starts in consistent recovery mode.
bgwriter changes mode when startup process completes.

* bgwriter now performs restartpoints and also cleans shared_buffers
while the startup process performs redo apply

* recovery.conf parameter log_restartpoints is now deprecated, since
function overlaps with log_checkpoints too much. I've kept the
distinction between restartpoints and checkpoints in code, to avoid
convoluted code. Minor change, not critical.

[Replying to one of Alvaro's other comments: Startup process still uses
XLogReadBuffer. I'm not planning on changing that either, at least not
in this patch.]

Patch doesn't conflict with rmgr plugin patch.

Passes make check, but that's easy.
Various other tests all seem to be working.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

recovery_infrastruc.v2.patch (32K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Tom Lane-2
Simon Riggs <[hidden email]> writes:
> Included patch with the following changes:

> * new postmaster mode known as consistent recovery, entered only when
> recovery passes safe/consistent point. InRedo is now set in all
> processes when started/connected in consistent recovery mode.

I looked at this and didn't like the InRedo signaling at all.  In the
first place, it looks like you're expecting the flag to be passed down
via fork(), but that won't work in EXEC_BACKEND mode.  (Yes, easily
fixed, but it's wrong as-is.)  In the second place, the method of
signaling it to the bgwriter looks to have race conditions: in
principle, at least, I think the startup process could try to clear
the shared-memory InRedo flag before the bgwriter has gotten as far as
setting it.  (This might work if the startup process can't exit redo
mode without the bgwriter having finished a checkpoint, but such a
dependency is too rube goldbergian for me.)

ISTM that it would probably be better if there were exactly one InRedo
flag in shared memory, probably in xlog.c's shared state, with the
postmaster not being responsible for setting or clearing it; rather
the startup process should do those things.

> * bgwriter and stats process starts in consistent recovery mode.
> bgwriter changes mode when startup process completes.

I'm not sure about the interaction of this.  In particular, what about
recovery restart points before we have reached the safe stop point?
I don't think we want to give up the capability of having those.

Also, it seems pretty bogus to update the in-memory ControlFile
checkpoint values before the restart point is actually done.  It looks
to me like what you have done is to try to use those fields as signaling
for the restart request in addition to their existing purposes, which
I think is confusing and probably dangerous.  I'd rather there were a
different signaling path and ControlFile maintains its currrent
definition.

I found the changes in bgwriter.c unreadable.  You've evidently
moved blocks of code around, but exactly what did you mean to
change?  Why is there so much duplicated code now?

> I've kept the distinction between restartpoints and checkpoints in
> code, to avoid convoluted code.

Hmm, I dunno, it seems like that might be a bad choice.  Are you sure
it's not cleaner to just use the regular checkpoint code?

                        regards, tom lane

--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Simon Riggs

On Mon, 2008-09-08 at 13:34 -0400, Tom Lane wrote:

> Simon Riggs <[hidden email]> writes:
> > Included patch with the following changes:
>
> > * new postmaster mode known as consistent recovery, entered only when
> > recovery passes safe/consistent point. InRedo is now set in all
> > processes when started/connected in consistent recovery mode.
>
> I looked at this and didn't like the InRedo signaling at all.  In the
> first place, it looks like you're expecting the flag to be passed down
> via fork(), but that won't work in EXEC_BACKEND mode.  (Yes, easily
> fixed, but it's wrong as-is.)

OK, thanks. I didn't check that.

>  In the second place, the method of
> signaling it to the bgwriter looks to have race conditions: in
> principle, at least, I think the startup process could try to clear
> the shared-memory InRedo flag before the bgwriter has gotten as far as
> setting it.  (This might work if the startup process can't exit redo
> mode without the bgwriter having finished a checkpoint, but such a
> dependency is too rube goldbergian for me.)

We never interrupt what the bgwriter does. If it is in the middle of
doing a checkpoint when recovery finishes, then Startup process just
waits behind it and then issues a shutdown checkpoint. If there's loads
of blocks to be written it doesn't matter who writes them. There's only
one soup spoon, and it doesn't matter who stirs the soup.

> ISTM that it would probably be better if there were exactly one InRedo
> flag in shared memory, probably in xlog.c's shared state, with the
> postmaster not being responsible for setting or clearing it; rather
> the startup process should do those things.

So replace InRedo with an IsInRedo() function that accesses XLogCtl?

Sounds good

> > * bgwriter and stats process starts in consistent recovery mode.
> > bgwriter changes mode when startup process completes.
>
> I'm not sure about the interaction of this.  In particular, what about
> recovery restart points before we have reached the safe stop point?
> I don't think we want to give up the capability of having those.

Maybe. I felt it made the code cleaner to give that up. Realistically
its a fairly short window of time. But shouldn't be too hard to put
back.

> Also, it seems pretty bogus to update the in-memory ControlFile
> checkpoint values before the restart point is actually done.  It looks
> to me like what you have done is to try to use those fields as signaling
> for the restart request in addition to their existing purposes, which
> I think is confusing and probably dangerous.  I'd rather there were a
> different signaling path and ControlFile maintains its currrent
> definition.

OK

> I found the changes in bgwriter.c unreadable.  You've evidently
> moved blocks of code around, but exactly what did you mean to
> change?  Why is there so much duplicated code now?

The patch utility did that. Some code reuse confused it. It's real easy
to read though if you apply the patch and then read the main loop.
You'll see what I mean.

> > I've kept the distinction between restartpoints and checkpoints in
> > code, to avoid convoluted code.
>
> Hmm, I dunno, it seems like that might be a bad choice.  Are you sure
> it's not cleaner to just use the regular checkpoint code?

When I tried to write it, it just looked to my eyes like every single
line had a caveat which looked ugly and multiplied the testing. You're
the code dude, always happy to structure things as you suggest. If
you're sure, that is.

Thanks for the review.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Tom Lane-2
Simon Riggs <[hidden email]> writes:
> On Mon, 2008-09-08 at 13:34 -0400, Tom Lane wrote:
>> Hmm, I dunno, it seems like that might be a bad choice.  Are you sure
>> it's not cleaner to just use the regular checkpoint code?

> When I tried to write it, it just looked to my eyes like every single
> line had a caveat which looked ugly and multiplied the testing. You're
> the code dude, always happy to structure things as you suggest. If
> you're sure, that is.

No, was just wondering if the other way would be better.  If you're
sure not, that's fine.

                        regards, tom lane

--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Simon Riggs
In reply to this post by Tom Lane-2

On Mon, 2008-09-08 at 13:34 -0400, Tom Lane wrote:

> ISTM that it would probably be better if there were exactly one InRedo
> flag in shared memory, probably in xlog.c's shared state, with the
> postmaster not being responsible for setting or clearing it; rather
> the startup process should do those things.

Done

> > * bgwriter and stats process starts in consistent recovery mode.
> > bgwriter changes mode when startup process completes.
>
> I'm not sure about the interaction of this.  In particular, what about
> recovery restart points before we have reached the safe stop point?
> I don't think we want to give up the capability of having those.
>
> Also, it seems pretty bogus to update the in-memory ControlFile
> checkpoint values before the restart point is actually done.  It looks
> to me like what you have done is to try to use those fields as signaling
> for the restart request in addition to their existing purposes, which
> I think is confusing and probably dangerous.  I'd rather there were a
> different signaling path and ControlFile maintains its currrent
> definition.
Done


Testing takes a while on this, I probably won't complete it until
Friday. So enclosed patch is for eyeballs only at this stage.

I added in the XLogCtl padding we've discussed before, while I'm there.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

recovery_infrastruc.v3.patch (38K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Alvaro Herrera-7
Simon Riggs wrote:

> --- 5716,5725 ----
>   CheckpointStats.ckpt_sync_end_t,
>   &sync_secs, &sync_usecs);
>  
> ! elog(LOG, "%s complete: wrote %d buffers (%.1f%%); "
>   "%d transaction log file(s) added, %d removed, %d recycled; "
>   "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
> + (checkpoint ? "  checkpoint" : "restartpoint"),
>   CheckpointStats.ckpt_bufs_written,
>   (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
>   CheckpointStats.ckpt_segs_added,

Very minor nit: this really needs a rework.  It is relatively OK in the
previous code, but it was already stuffing too much in a single message.
Maybe

ereport(LOG,
(errmsg(checkpoint ? "checkpoint complete" : "restartpoint complete"),
errdetail("Wrote %d buffers (%.1f%%); "
        "%d transaction log file(s) added, %d removed, %d recycled; "
        "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s.",
        ...
)))

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Simon Riggs

On Fri, 2008-09-12 at 14:14 -0400, Alvaro Herrera wrote:

> Simon Riggs wrote:
>
> > --- 5716,5725 ----
> >   CheckpointStats.ckpt_sync_end_t,
> >   &sync_secs, &sync_usecs);
> >  
> > ! elog(LOG, "%s complete: wrote %d buffers (%.1f%%); "
> >   "%d transaction log file(s) added, %d removed, %d recycled; "
> >   "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
> > + (checkpoint ? "  checkpoint" : "restartpoint"),
> >   CheckpointStats.ckpt_bufs_written,
> >   (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
> >   CheckpointStats.ckpt_segs_added,
>
> Very minor nit: this really needs a rework.  

All I changed was the word "restartpoint"... its otherwise identical to
existing message. I'd rather not change that.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Alvaro Herrera-7
Simon Riggs wrote:

>
> On Fri, 2008-09-12 at 14:14 -0400, Alvaro Herrera wrote:
> > Simon Riggs wrote:
> >
> > > --- 5716,5725 ----
> > >   CheckpointStats.ckpt_sync_end_t,
> > >   &sync_secs, &sync_usecs);
> > >  
> > > ! elog(LOG, "%s complete: wrote %d buffers (%.1f%%); "
> > >   "%d transaction log file(s) added, %d removed, %d recycled; "
> > >   "write=%ld.%03d s, sync=%ld.%03d s, total=%ld.%03d s",
> > > + (checkpoint ? "  checkpoint" : "restartpoint"),
> > >   CheckpointStats.ckpt_bufs_written,
> > >   (double) CheckpointStats.ckpt_bufs_written * 100 / NBuffers,
> > >   CheckpointStats.ckpt_segs_added,
> >
> > Very minor nit: this really needs a rework.  
>
> All I changed was the word "restartpoint"... its otherwise identical to
> existing message. I'd rather not change that.

The new message is not translatable, the original was.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Simon Riggs

On Fri, 2008-09-12 at 15:31 -0400, Alvaro Herrera wrote:
> Simon Riggs wrote:
> >
> > All I changed was the word "restartpoint"... its otherwise identical to
> > existing message. I'd rather not change that.
>
> The new message is not translatable, the original was.

OK, perhaps the word "restartpoint" is redundant anyhow. Anybody looking
at the log can see we're in recovery. So I'll switch it back to say just
checkpoint whether we do restartpoint or checkpoints.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Tom Lane-2
In reply to this post by Simon Riggs
Simon Riggs <[hidden email]> wrote:
> Testing takes a while on this, I probably won't complete it until
> Friday. So enclosed patch is for eyeballs only at this stage.

What's the status on that patch?

                        regards, tom lane

--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Simon Riggs

On Tue, 2008-09-16 at 15:45 -0400, Tom Lane wrote:
> Simon Riggs <[hidden email]> wrote:
> > Testing takes a while on this, I probably won't complete it until
> > Friday. So enclosed patch is for eyeballs only at this stage.
>
> What's the status on that patch?

Checking EXEC_BACKEND case and that reporting back later today.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Simon Riggs
In reply to this post by Tom Lane-2

On Tue, 2008-09-16 at 15:45 -0400, Tom Lane wrote:
> Simon Riggs <[hidden email]> wrote:
> > Testing takes a while on this, I probably won't complete it until
> > Friday. So enclosed patch is for eyeballs only at this stage.
>
> What's the status on that patch?

Having some trouble trying to get a clean state change from recovery to
normal mode.

Startup needs to be able to write WAL at the end of recovery so it can
write a ShutdownCheckpoint, yet must not be allowed to write WAL before
that. Other services are still not fully up yet. So every other process
apart from Startup musn't write WAL until Startup has fully completed
its actions, which is sometime later.

bgwriter needs to know about the impending state change so it can finish
off any checkpoint its currently working on. But then can't start doing
normal processing yet either. So it must have a third state that is
between recovery and normal processing. When normal processing is
reached, it *must* write WAL immediately from that point onwards, yet
using the new timeline that startup decides upon.

So the idea of a single database-wide boolean state seems impossible. We
need a local canInsertWAL flag that is set at different times in
different processes.

Global states changes are now

not started
started - postmaster process then startup process
safestoppingpoint - bgwriter starts
startup_does_shutdown_checkpoint - bgwriter finishes up, just waits,
   startup is now allowed to insert WAL for checkpoint record
startup completes StartupXLog - bgwriter begins normal processing
startup exits - postmaster in normal state

We *might* solve this by making the final checkpoint that Startup writes
into an online checkpoint. That would then allow a straight and safe
transition for bgwriter from just one state to the other. But that
leaves some other ugliness that I don't want to deal with, 'cos there's
other fish frying.

Feels like I should shutdown the bgwriter after recovery and then allow
it to be cranked up again after normal processing starts, and do all of
this through postmaster state changes. That way bgwriter doesn't need to
do a dynamic state change.

Comments anyone?

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Simon Riggs

On Thu, 2008-09-18 at 09:05 +0100, Simon Riggs wrote:

> Feels like I should shutdown the bgwriter after recovery and then
> allow it to be cranked up again after normal processing starts, and do
> all of this through postmaster state changes. That way bgwriter
> doesn't need to do a dynamic state change.

This approach appears to be working nicely so far. Some ugly bits of
former patch removed.

Patch passes basic tests and changes state cleanly.

Restarting test cycle on this patch now, confirm tomorrow.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

recovery_infrastruc.v5.patch (37K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Tom Lane-2
In reply to this post by Simon Riggs
Simon Riggs <[hidden email]> writes:
> Having some trouble trying to get a clean state change from recovery to
> normal mode.

> Startup needs to be able to write WAL at the end of recovery so it can
> write a ShutdownCheckpoint, yet must not be allowed to write WAL before
> that. Other services are still not fully up yet. So every other process
> apart from Startup musn't write WAL until Startup has fully completed
> its actions, which is sometime later.
> ...
> We *might* solve this by making the final checkpoint that Startup writes
> into an online checkpoint.

Do we really need a checkpoint there at all?  I can't recall right now
if there was a good reason why Vadim made it do that.  I have a feeling
that it might have had to do with an assumption that the recovery
environment was completely distinct from live environment; which is a
statement that's certainly not going to be true in a query-answering
slave.

                        regards, tom lane

--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Simon Riggs

On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:

> Simon Riggs <[hidden email]> writes:
> > Having some trouble trying to get a clean state change from recovery to
> > normal mode.
>
> > Startup needs to be able to write WAL at the end of recovery so it can
> > write a ShutdownCheckpoint, yet must not be allowed to write WAL before
> > that. Other services are still not fully up yet. So every other process
> > apart from Startup musn't write WAL until Startup has fully completed
> > its actions, which is sometime later.
> > ...
> > We *might* solve this by making the final checkpoint that Startup writes
> > into an online checkpoint.
>
> Do we really need a checkpoint there at all?  I can't recall right now
> if there was a good reason why Vadim made it do that.  I have a feeling
> that it might have had to do with an assumption that the recovery
> environment was completely distinct from live environment; which is a
> statement that's certainly not going to be true in a query-answering
> slave.

Hmm, a question I hadn't considered.

We definitely don't want to do PreallocXlogFiles().

"Timelines only change at shutdown checkpoints". But its just as easy to
write a short timeline change record rather than the checkpoint itself,
so we can sort that out.

It should be sufficient to request bgwriter to perform an immediate
checkpoint, rather than have startup process perform it. That way
startup can exit and we can change to normal processing faster.

If we crash before the next checkpoint then we would revert to archive
recovery or crash recovery. The last restartpoint might well be
somewhere else. In streaming mode we would presumably have that data
locally, so not really a problem. But we musn't mark control file in
production yet, but we can do so following the next checkpoint.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Tom Lane-2
Simon Riggs <[hidden email]> writes:
> On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
>> Do we really need a checkpoint there at all?

> "Timelines only change at shutdown checkpoints".

Hmm.  I *think* that that is just a debugging crosscheck rather than a
critical property.  But yeah, it would take some close investigation,
which maybe isn't warranted if you have a less-invasive solution.

                        regards, tom lane

--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Simon Riggs

On Thu, 2008-09-18 at 10:09 -0400, Tom Lane wrote:
> Simon Riggs <[hidden email]> writes:
> > On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
> >> Do we really need a checkpoint there at all?
>
> > "Timelines only change at shutdown checkpoints".
>
> Hmm.  I *think* that that is just a debugging crosscheck rather than a
> critical property.  But yeah, it would take some close investigation,
> which maybe isn't warranted if you have a less-invasive solution.

The important thing is that everybody uses the new timelineid.

AFAICS we actually write new timelineids into the WAL file if the
previous timelineid, so it can't be that risky.

Ignore v5 then and look for v6 on Monday. Hopefully the final one :-)

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Simon Riggs
In reply to this post by Tom Lane-2

On Thu, 2008-09-18 at 10:09 -0400, Tom Lane wrote:
> Simon Riggs <[hidden email]> writes:
> > On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
> >> Do we really need a checkpoint there at all?
>
> > "Timelines only change at shutdown checkpoints".
>
> Hmm.  I *think* that that is just a debugging crosscheck rather than a
> critical property.  But yeah, it would take some close investigation,
> which maybe isn't warranted if you have a less-invasive solution.

OK, new patch, version 6. Some major differences to previous patch.

* new IsRecoveryProcessingMode() in shmem
* padding in XLogCtl to ensure above call is cheap
* specific part of bgwriter shmem for passing restartpoint data
* avoid Shutdown checkpoint at end of recovery, with carefully
considered positioning of statements (beware!)
* only one new postmaster mode, PM_RECOVERY
* bgwriter changes state without stopping/starting

Modes I have tested so far
* make check
* Start, Stop
* Crash Recovery
* Archive Recovery
* Archive Recovery, switch in middle of restartpoint

Modes not yet tested
* EXEC_BACKEND

Ready for serious review prior to commit. I will be performing further
testing also.

 backend/access/transam/multixact.c |    2
 backend/access/transam/xlog.c      |  328 ++++++++++++---!!!!!!!!!!!!
 backend/postmaster/bgwriter.c      |  371 +++++---!!!!!!!!!!!!!!!!!!!!!
 backend/postmaster/postmaster.c    |   62 ++++!!
 backend/storage/buffer/README      |    5
 backend/storage/buffer/bufmgr.c    |   34 +!!
 include/access/xlog.h              |   14 !
 include/access/xlog_internal.h     |    3
 include/catalog/pg_control.h       |    2
 include/postmaster/bgwriter.h      |    2
 include/storage/bufmgr.h           |    2
 include/storage/pmsignal.h         |    1
 12 files changed, 279 insertions(+), 56 deletions(-), 491 mods(!)

There's a few subtle points along the way. I've tried to explain them
all in code comments, but questions welcome. At v6, most things are now
done a particular way for a specific reason.

Look especially at InRecovery, which is used extensively in different
parts of the code. The meaning of this has been subdivided into two
meanings, so only *some* of the places that use it have been changed.
All have been checked.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

recovery_infrastruc.v6.patch (52K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Simon Riggs

On Mon, 2008-09-22 at 23:06 +0100, Simon Riggs wrote:

> On Thu, 2008-09-18 at 10:09 -0400, Tom Lane wrote:
> > Simon Riggs <[hidden email]> writes:
> > > On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
> > >> Do we really need a checkpoint there at all?
> >
> > > "Timelines only change at shutdown checkpoints".
> >
> > Hmm.  I *think* that that is just a debugging crosscheck rather than a
> > critical property.  But yeah, it would take some close investigation,
> > which maybe isn't warranted if you have a less-invasive solution.
>
> OK, new patch, version 6. Some major differences to previous patch.

> Ready for serious review prior to commit. I will be performing further
> testing also.

Version 7

I've removed the concept of interrupting a restartpoint half way
through, I found a fault there. It was more ugly than the alternative
and less robust. The code now waits at the end of recovery if we are in
the middle of a restartpoint, but forces a do-it-more-quickly also. That
means we won't always get a fast start even though we skip the shutdown
checkpoint, but at least we're sure there's no chance of breakage
because of concurrent activiy, state changes etc..

I'm happy with this now.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

recovery_infrastruc.v7.patch (51K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [HACKERS] Infrastructure changes for recovery

Tom Lane-2
Simon Riggs <[hidden email]> writes:
> Version 7

After reading this for awhile, I realized that there is a rather
fundamental problem with it: it switches into "consistent recovery"
mode as soon as it's read WAL beyond ControlFile->minRecoveryPoint.
In a crash recovery situation that typically is before the last
checkpoint (if indeed it's not still zero), and what that means is
that this patch will activate the bgwriter and start letting in
backends instantaneously after a crash, long before we can have any
certainty that the DB state really is consistent.

In a normal crash recovery situation this would be easily fixed by
simply not letting it go to "consistent recovery" state at all, but
what about recovery from a restartpoint?  We don't want a slave that's
crashed once to never let backends in again.  But I don't see how to
determine that we're far enough past the restartpoint to be consistent
again.  In crash recovery we assume (without proof ;-)) that we're
consistent once we reach the end of valid-looking WAL, but that rule
doesn't help for a slave that's following a continuing WAL sequence.

Perhaps something could be done based on noting when we have to pull in
a WAL segment from the recovery_command, but it sounds like a pretty
fragile assumption.

Anyway, that's sufficiently bad that I'm bouncing the patch for
reconsideration.  Some other issues I noted before giving up:

* I'm a bit uncomfortable with the fact that the
IsRecoveryProcessingMode flag is read and written with no lock.
There's no atomicity or concurrent-write problem, sure, but on
a multiprocessor with weak memory ordering guarantees (eg PPC)
readers could get a fairly stale value of the flag.  The false
to true transition happens before anyone except the startup process is
running, so that's no problem; the net effect is then that backends
might think that recovery mode was still active for awhile after it
wasn't.  This seems a bit scary, eg in the patch as it stands that'll
disable XLogFlush calls that should have happened.  You could fix that
by grabbing/releasing some spinlock (any old one) around the accesses,
but are any of the call sites performance-critical?  The one in
XLogInsert looks like it is, if nothing else.

* I kinda think you broke XLogFlush anyway.  It's certainly clear
that the WARNING case at the bottom is unreachable with the patch,
and I think that means that you've messed up an important error
robustness behavior.  Is it still possible to get out of recovery mode
if there are any bad LSNs in the shared buffer pool?

* The use of InRecovery in CreateCheckPoint seems pretty bogus, since
that function can be called from the bgwriter in which the flag will
never be true.  Either this needs to be IsRecoveryProcessingMode(),
or it's useless because we'll never create ordinary checkpoints until
after subtrans.c is up anyway.

* The patch moves the clearing of InRecovery from after to before
StartupCLOG, RecoverPreparedTransactions, etc.  Is that really a
good idea?  It makes it hard for those modules to know if they are
coming up after a normal or recovery startup.  I think they may not
care at the moment, but I'd leave that alone without a darn good
reason to change it.

* The comment about CheckpointLock being only pro forma is now wrong,
if you are going to use locking it to implement a delay in exitRecovery.
But I don't understand why the delay there.  If it's needed, seems like
the path where a restartpoint *isn't* in progress is wrong --- don't you
actually need to start one and wait for it?  Also I note that if the
LWLockConditionalAcquire succeeds, you neglect to release the lock,
which surely can't be right.

* The tail end of StartupXLOG() looks pretty unsafe to me.  Surely
we mustn't clear IsRecoveryProcessingMode until after we have
established the safe-recovery checkpoint.  The comments there seem to
be only vaguely related to the current state of the patch, too.

* Logging of recoveryLastXTime seems pretty bogus now.  It's supposed to
be associated with a restartpoint completion report, but now it's just
out in the ether somewhere and doesn't represent a guarantee that we're
synchronized that far.

* backup.sgml needs to be updated to say that log_restartpoints is
obsolete.

* There are a bunch of disturbing assumptions in the SLRU-related
modules about their StartUp calls being executed without any concurrent
access.  This isn't really a problem that needs to be dealt with in this
patch, I think, but that will all have to be cleaned up before we dare
allow any backends to run concurrently with recovery.  btree's
suppression of relcache invals for metapage updates will be a problem
too.

                        regards, tom lane

--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
12