Preventing hangups in bgworker start/stop during DB shutdown

6 messages

Preventing hangups in bgworker start/stop during DB shutdown

Tom Lane-2
Here's an attempt at closing the race condition discussed in [1]
(and in some earlier threads, though I'm too lazy to find them).

The core problem is that the bgworker management APIs were designed
without any thought for exception conditions, notably "we're not
gonna launch any more workers because we're shutting down the database".
A process waiting for a worker in WaitForBackgroundWorkerStartup or
WaitForBackgroundWorkerShutdown will wait forever, so that the database
fails to shut down without manual intervention.

I'd supposed that we would need some incompatible changes in those APIs
in order to fix this, but after further study it seems like we could
hack things by just acting as though a request that won't be serviced
has already run to completion.  I'm not terribly satisfied with that
as a long-term solution --- it seems to me that callers should be told
that there was a failure.  But this looks to be enough to solve the
lockup condition for existing callers, and it seems like it'd be okay
to backpatch.

Thoughts?

                        regards, tom lane

[1] https://www.postgresql.org/message-id/flat/16785-c0207d8c67fb5f25%40postgresql.org
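The waiter-side hang can be modeled with a toy sketch (not PostgreSQL code; the struct fields and status names are illustrative stand-ins for the slot state and BgwHandleStatus values consulted by WaitForBackgroundWorkerStartup):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the worker-handle status values. */
typedef enum { BGWH_NOT_YET_STARTED, BGWH_STARTED, BGWH_STOPPED } BgwStatus;

typedef struct
{
    int  pid;        /* 0 = worker not started yet */
    bool cancelled;  /* postmaster refused the request (shutting down) */
} WorkerSlot;

/*
 * One polling step of a waiter.  The fix amounts to reporting
 * BGWH_STOPPED for a cancelled request, as though the worker had
 * launched and instantly exited, so the caller stops waiting.
 */
static BgwStatus check_worker(const WorkerSlot *slot)
{
    if (slot->pid != 0)
        return BGWH_STARTED;
    if (slot->cancelled)
        return BGWH_STOPPED;        /* pretend it ran to completion */
    return BGWH_NOT_YET_STARTED;    /* caller keeps waiting */
}
```

Without the `cancelled` branch, a waiter polling during shutdown would see BGWH_NOT_YET_STARTED forever, which is exactly the lockup described above.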


diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 7ef6259eb5..b3ab8b0fa3 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -231,13 +231,15 @@ FindRegisteredWorkerBySlotNumber(int slotno)
 }
 
 /*
- * Notice changes to shared memory made by other backends.  This code
- * runs in the postmaster, so we must be very careful not to assume that
- * shared memory contents are sane.  Otherwise, a rogue backend could take
- * out the postmaster.
+ * Notice changes to shared memory made by other backends.
+ * Accept new dynamic worker requests only if allow_new_workers is true.
+ *
+ * This code runs in the postmaster, so we must be very careful not to assume
+ * that shared memory contents are sane.  Otherwise, a rogue backend could
+ * take out the postmaster.
  */
 void
-BackgroundWorkerStateChange(void)
+BackgroundWorkerStateChange(bool allow_new_workers)
 {
  int slotno;
 
@@ -297,6 +299,15 @@ BackgroundWorkerStateChange(void)
  continue;
  }
 
+ /*
+ * If this is a dynamic worker request, and we aren't allowing new
+ * dynamic workers, then immediately mark it for termination; the next
+ * stanza will take care of cleaning it up.
+ */
+ if (slot->worker.bgw_restart_time == BGW_NEVER_RESTART &&
+ !allow_new_workers)
+ slot->terminate = true;
+
  /*
  * If the worker is marked for termination, we don't need to add it to
  * the registered workers list; we can just free the slot. However, if
@@ -503,12 +514,54 @@ BackgroundWorkerStopNotifications(pid_t pid)
  }
 }
 
+/*
+ * Cancel any not-yet-started worker requests that have BGW_NEVER_RESTART.
+ *
+ * This is called during a normal ("smart" or "fast") database shutdown.
+ * After this point, no new background workers will be started, so any
+ * dynamic worker requests should be killed off, allowing anything that
+ * might be waiting for them to clean up and exit.
+ *
+ * This function should only be called from the postmaster.
+ */
+void
+ForgetUnstartedBackgroundWorkers(void)
+{
+ slist_mutable_iter iter;
+
+ slist_foreach_modify(iter, &BackgroundWorkerList)
+ {
+ RegisteredBgWorker *rw;
+ BackgroundWorkerSlot *slot;
+
+ rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
+ Assert(rw->rw_shmem_slot < max_worker_processes);
+ slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
+
+ /* If it's not yet started, and it's a dynamic worker ... */
+ if (slot->pid == InvalidPid &&
+ rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
+ {
+ /* ... then zap it, and notify anything that was waiting */
+ int notify_pid = rw->rw_worker.bgw_notify_pid;
+
+ ForgetBackgroundWorker(&iter);
+ if (notify_pid != 0)
+ kill(notify_pid, SIGUSR1);
+ }
+ }
+}
+
 /*
  * Reset background worker crash state.
  *
  * We assume that, after a crash-and-restart cycle, background workers without
  * the never-restart flag should be restarted immediately, instead of waiting
- * for bgw_restart_time to elapse.
+ * for bgw_restart_time to elapse.  On the other hand, workers with that flag
+ * should be forgotten, because they are dynamic requests from processes that
+ * no longer exist.
+ *
+ * This function should only be called from the postmaster.
  */
 void
 ResetBackgroundWorkerCrashTimes(void)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 5d09822c81..fa35bf4369 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3796,6 +3796,13 @@ PostmasterStateMachine(void)
  */
  if (pmState == PM_STOP_BACKENDS)
  {
+ /*
+ * Forget any pending requests for dynamic background workers, since
+ * we're no longer willing to launch any new workers.  (If additional
+ * requests arrive, BackgroundWorkerStateChange will reject them.)
+ */
+ ForgetUnstartedBackgroundWorkers();
+
  /* Signal all backend children except walsenders */
  SignalSomeChildren(SIGTERM,
    BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND);
@@ -5153,7 +5160,8 @@ sigusr1_handler(SIGNAL_ARGS)
  /* Process background worker state change. */
  if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE))
  {
- BackgroundWorkerStateChange();
+ /* Accept new dynamic worker requests only if not stopping. */
+ BackgroundWorkerStateChange(pmState < PM_STOP_BACKENDS);
  StartWorkerNeeded = true;
  }
 
diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index f7e24664d5..dd6cabc45b 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -46,11 +46,12 @@ extern slist_head BackgroundWorkerList;
 
 extern Size BackgroundWorkerShmemSize(void);
 extern void BackgroundWorkerShmemInit(void);
-extern void BackgroundWorkerStateChange(void);
+extern void BackgroundWorkerStateChange(bool allow_new_workers);
 extern void ForgetBackgroundWorker(slist_mutable_iter *cur);
 extern void ReportBackgroundWorkerPID(RegisteredBgWorker *);
 extern void ReportBackgroundWorkerExit(slist_mutable_iter *cur);
 extern void BackgroundWorkerStopNotifications(pid_t pid);
+extern void ForgetUnstartedBackgroundWorkers(void);
 extern void ResetBackgroundWorkerCrashTimes(void);
 
 /* Function to start a background worker, called from postmaster.c */
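As a rough model of what the new ForgetUnstartedBackgroundWorkers() does, consider this self-contained sketch (not the real postmaster code; the flattened struct and the `notified` flag standing in for kill(notify_pid, SIGUSR1) are assumptions for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BGW_NEVER_RESTART (-1)

/* Simplified stand-in for a registered-worker entry. */
typedef struct
{
    int  pid;           /* 0 = not yet started */
    int  restart_time;  /* BGW_NEVER_RESTART marks a dynamic request */
    int  notify_pid;    /* waiter to wake; 0 = nobody waiting */
    bool forgotten;     /* models removal from BackgroundWorkerList */
    bool notified;      /* models kill(notify_pid, SIGUSR1) */
} Worker;

/*
 * Toy model of ForgetUnstartedBackgroundWorkers(): at shutdown, drop
 * every not-yet-started dynamic request and wake its waiter so it can
 * observe the cancellation instead of sleeping forever.  Returns the
 * number of requests dropped.
 */
static int forget_unstarted_workers(Worker *workers, size_t n)
{
    int dropped = 0;

    for (size_t i = 0; i < n; i++)
    {
        Worker *w = &workers[i];

        if (!w->forgotten && w->pid == 0 &&
            w->restart_time == BGW_NEVER_RESTART)
        {
            w->forgotten = true;
            if (w->notify_pid != 0)
                w->notified = true;  /* real code: kill(notify_pid, SIGUSR1) */
            dropped++;
        }
    }
    return dropped;
}
```

Already-running workers and restartable registrations are left alone; only pending dynamic requests are cancelled, matching the patch's condition.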

Re: Preventing hangups in bgworker start/stop during DB shutdown

Craig Ringer-5
On Wed, 23 Dec 2020 at 05:40, Tom Lane <[hidden email]> wrote:
> Here's an attempt at closing the race condition discussed in [1]
> (and in some earlier threads, though I'm too lazy to find them).
>
> The core problem is that the bgworker management APIs were designed
> without any thought for exception conditions, notably "we're not
> gonna launch any more workers because we're shutting down the database".
> A process waiting for a worker in WaitForBackgroundWorkerStartup or
> WaitForBackgroundWorkerShutdown will wait forever, so that the database
> fails to shut down without manual intervention.
>
> I'd supposed that we would need some incompatible changes in those APIs
> in order to fix this, but after further study it seems like we could
> hack things by just acting as though a request that won't be serviced
> has already run to completion.  I'm not terribly satisfied with that
> as a long-term solution --- it seems to me that callers should be told
> that there was a failure.  But this looks to be enough to solve the
> lockup condition for existing callers, and it seems like it'd be okay
> to backpatch.
>
> Thoughts?

Callers who launch bgworkers already have to cope with conditions such as the worker failing immediately after launch, or before attaching to the shmem segment used for worker management by whatever extension is launching it.

So I think it's reasonable to lie and say we launched it. The caller must already cope with this case to behave correctly.

Patch specifics:

> This function should only be called from the postmaster

It'd be good to

    Assert(IsPostmasterEnvironment && !IsUnderPostmaster)

in these functions.

Otherwise at first read the patch and rationale looks sensible to me.

(When it comes to the bgw APIs in general I have a laundry list of things I'd like to change or improve around signal handling, error trapping and recovery, and lots more, but that's for another thread.)
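Craig's suggested guard might look like this in a toy model (the booleans stand in for the real globals IsPostmasterEnvironment and IsUnderPostmaster; the function body is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-ins for PostgreSQL's IsPostmasterEnvironment / IsUnderPostmaster
 * globals; in this sketch we simply set them by hand to the values they
 * would have inside the postmaster process itself.
 */
static bool is_postmaster_environment = true;
static bool is_under_postmaster = false;

/*
 * A postmaster-only entry point guarded the way Craig suggests: the
 * assertion trips if a backend child ever calls it by mistake.
 */
static int postmaster_only_cleanup(void)
{
    assert(is_postmaster_environment && !is_under_postmaster);
    /* ... cleanup work such as forgetting unstarted workers ... */
    return 0;
}
```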

Re: Preventing hangups in bgworker start/stop during DB shutdown

Bharath Rupireddy
In reply to this post by Tom Lane-2
On Wed, Dec 23, 2020 at 3:10 AM Tom Lane <[hidden email]> wrote:

> Here's an attempt at closing the race condition discussed in [1]
> (and in some earlier threads, though I'm too lazy to find them).
>
> The core problem is that the bgworker management APIs were designed
> without any thought for exception conditions, notably "we're not
> gonna launch any more workers because we're shutting down the database".
> A process waiting for a worker in WaitForBackgroundWorkerStartup or
> WaitForBackgroundWorkerShutdown will wait forever, so that the database
> fails to shut down without manual intervention.
>
> I'd supposed that we would need some incompatible changes in those APIs
> in order to fix this, but after further study it seems like we could
> hack things by just acting as though a request that won't be serviced
> has already run to completion.  I'm not terribly satisfied with that
> as a long-term solution --- it seems to me that callers should be told
> that there was a failure.  But this looks to be enough to solve the
> lockup condition for existing callers, and it seems like it'd be okay
> to backpatch.
>
> Thoughts?
>
> [1] https://www.postgresql.org/message-id/flat/16785-c0207d8c67fb5f25%40postgresql.org

1) Yeah, the postmaster will not be able to start the bg workers in
the following cases, i.e. when bgworker_should_start_now returns
false, so we might encounter the hang issue:
static bool
bgworker_should_start_now(BgWorkerStartTime start_time)
{
    switch (pmState)
    {
        case PM_NO_CHILDREN:
        case PM_WAIT_DEAD_END:
        case PM_SHUTDOWN_2:
        case PM_SHUTDOWN:
        case PM_WAIT_BACKENDS:
        case PM_STOP_BACKENDS:
            break;

2) What if the postmaster enters a state where pmState >=
PM_STOP_BACKENDS after it calls
BackgroundWorkerStateChange(pmState < PM_STOP_BACKENDS)? First of
all, is that possible? I think yes, because we are in
sigusr1_handler(), and I don't see us blocking the signals that set
pmState >= PM_STOP_BACKENDS, either in sigusr1_handler() or in
BackgroundWorkerStateChange(). Though it's a small window, we might
still get into the hangup issue. If so, can we check pmState in the
for loop in BackgroundWorkerStateChange()?

     if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE))
     {
-        BackgroundWorkerStateChange();
+        /* Accept new dynamic worker requests only if not stopping. */
+        BackgroundWorkerStateChange(pmState < PM_STOP_BACKENDS);
         StartWorkerNeeded = true;
     }

3) Can we always say that if bgw_restart_time is BGW_NEVER_RESTART,
then it's a dynamic bg worker? I think we can also have normal
bgworkers with the BGW_NEVER_RESTART flag (I see one in worker_spi.c's
_PG_init()). Will there be any problem, or does the comment need some
tweaking?

+        /*
+         * If this is a dynamic worker request, and we aren't allowing new
+         * dynamic workers, then immediately mark it for termination; the next
+         * stanza will take care of cleaning it up.
+         */
+        if (slot->worker.bgw_restart_time == BGW_NEVER_RESTART &&
+            !allow_new_workers)
+            slot->terminate = true;

4) IIUC, in the patch we set slot->terminate = true only for
BGW_NEVER_RESTART bg workers. What happens if a bg worker has a
bgw_restart_time of some number of seconds? Don't we hit the hanging
issue (that we are trying to solve here) for those bg workers?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Preventing hangups in bgworker start/stop during DB shutdown

Tom Lane-2
Bharath Rupireddy <[hidden email]> writes:
> On Wed, Dec 23, 2020 at 3:10 AM Tom Lane <[hidden email]> wrote:
>> Here's an attempt at closing the race condition discussed in [1]
>> (and in some earlier threads, though I'm too lazy to find them).

> 2) What if postmaster enters pmState >= PM_STOP_BACKENDS state after
> it calls BackgroundWorkerStateChange(pmState < PM_STOP_BACKENDS)?
> First of all, is it possible? I think yes, because we are in
> sigusr1_handler(), and I don't see we blocking the signal that sets
> pmState >= PM_STOP_BACKENDS either in sigusr1_handler or in
> BackgroundWorkerStateChange().

If you're asking whether the postmaster's signal handlers can interrupt
each other, they can't; see comment at the start of each one.  If you're
wondering about the order of operations in sigusr1_handler, I agree that
seems wrong now.  I'm inclined to move the BackgroundWorkerStateChange
call to be just before maybe_start_bgworkers().  That way it's after the
possible entry to PM_HOT_STANDBY state.  The later steps can't change
pmState, except for PostmasterStateMachine, which would be responsible for
anything that needs to be done to bgworkers as a result of making a state
change.

> 3) Can we always say that if bgw_restart_time is BGW_NEVER_RESTART,
> then it's a dynamic bg worker?

That assumption's already baked into ResetBackgroundWorkerCrashTimes,
for one.  Personally I'd have designed things with some more-explicit
indicator, but I'm not interested in revisiting those API decisions now;
any cleanup we might undertake would result in a non-back-patchable fix.

As far as whether it's formally correct to do this given the current
APIs, I think it is.  We're essentially pretending that the worker
got launched and instantly exited.  If it's BGW_NEVER_RESTART then that
would result in deregistration in any case, while if it's not that,
then the worker record should get kept for a possible later restart.

> I think we can also have normal
> bgworkers with BGW_NEVER_RESTART flag(I see one in worker_spi.c
> _PG_init()),

That might be a bug in worker_spi.c, but since that's only test code,
I don't care about it too much.  Nobody's really going to care what
that module does in a postmaster shutdown.

> 4) IIUC, in the patch we mark slot->terminate = true only for
> BGW_NEVER_RESTART kind bg workers, what happens if a bg worker has
> bgw_restart_time seconds and don't we hit the hanging issue(that we
> are trying to solve here) for those bg workers?

The hang problem is not with the worker itself, it's with anything
that might be waiting around for the worker to finish.  It doesn't
seem to me to make a whole lot of sense to wait for a restartable
worker; what would that mean?

There's definitely room to revisit and clarify these APIs, and maybe
(if we don't change them altogether) add some Asserts about what are
sane combinations of properties.  But my purpose today is just to get
a back-patchable bug fix, and that sort of change wouldn't fit in.

                        regards, tom lane



Re: Preventing hangups in bgworker start/stop during DB shutdown

Tom Lane-2
I wrote:
> Bharath Rupireddy <[hidden email]> writes:
>> 4) IIUC, in the patch we mark slot->terminate = true only for
>> BGW_NEVER_RESTART kind bg workers, what happens if a bg worker has
>> bgw_restart_time seconds and don't we hit the hanging issue(that we
>> are trying to solve here) for those bg workers?

> The hang problem is not with the worker itself, it's with anything
> that might be waiting around for the worker to finish.  It doesn't
> seem to me to make a whole lot of sense to wait for a restartable
> worker; what would that mean?

Upon further looking around, I noted that autoprewarm's
autoprewarm_start_worker() function does that, so we can't really
dismiss it.

However, what we can do instead is to change the condition to be
"cancel pending bgworker requests if there is a waiting process".
Almost all of the time, that means it's a dynamic bgworker with
BGW_NEVER_RESTART, so there's no difference.  In the exceptional
cases like autoprewarm_start_worker, this would result in removing
a bgworker registration record for a restartable worker ... but
since we're shutting down, that record would have no effect before
the postmaster exits, anyway.  I think we can live with that, at
least till such time as somebody redesigns this in a cleaner way.

I pushed a fix along those lines.

                        regards, tom lane
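The revised condition described above could be sketched like so (an illustrative model, not the committed code; the assumption that "a waiting process" is indicated by a nonzero bgw_notify_pid follows from how the earlier patch woke waiters):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a not-yet-started worker request. */
typedef struct
{
    int pid;           /* 0 = not yet started */
    int restart_time;  /* -1 modeling BGW_NEVER_RESTART */
    int notify_pid;    /* nonzero = some process is waiting on this worker */
} WorkerReq;

/*
 * Toy model of the revised cancellation test: cancel any unstarted
 * request that somebody is waiting for, even a restartable one such as
 * autoprewarm's, since no worker will start before the postmaster
 * exits anyway.
 */
static bool should_cancel_at_shutdown(const WorkerReq *w)
{
    return w->pid == 0 && w->notify_pid != 0;
}
```

Almost always this selects the same requests as the BGW_NEVER_RESTART test, but it also covers the autoprewarm_start_worker-style case of a waited-for restartable worker.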



Re: Preventing hangups in bgworker start/stop during DB shutdown

Craig Ringer-5
On Fri, 25 Dec 2020 at 06:07, Tom Lane <[hidden email]> wrote:
> I wrote:
>> Bharath Rupireddy <[hidden email]> writes:
>>> 4) IIUC, in the patch we mark slot->terminate = true only for
>>> BGW_NEVER_RESTART kind bg workers, what happens if a bg worker has
>>> bgw_restart_time seconds and don't we hit the hanging issue(that we
>>> are trying to solve here) for those bg workers?
>
>> The hang problem is not with the worker itself, it's with anything
>> that might be waiting around for the worker to finish.  It doesn't
>> seem to me to make a whole lot of sense to wait for a restartable
>> worker; what would that mean?
>
> Upon further looking around, I noted that autoprewarm's
> autoprewarm_start_worker() function does that, so we can't really
> dismiss it.
>
> However, what we can do instead is to change the condition to be
> "cancel pending bgworker requests if there is a waiting process".
> Almost all of the time, that means it's a dynamic bgworker with
> BGW_NEVER_RESTART, so there's no difference.  In the exceptional
> cases like autoprewarm_start_worker, this would result in removing
> a bgworker registration record for a restartable worker ... but
> since we're shutting down, that record would have no effect before
> the postmaster exits, anyway.  I think we can live with that, at
> least till such time as somebody redesigns this in a cleaner way.
>
> I pushed a fix along those lines.


Thanks for the change.

Cleanups like this in the BGW API definitely make life easier.