Unportable implementation of background worker start


Tom Lane-2
I chanced to notice that on gaur/pademelon, the "select_parallel"
regression test sometimes takes a great deal longer than normal,
for no obvious reason.  It does eventually terminate, but sometimes
only after 10 to 20 minutes rather than the couple dozen seconds
that are typical on that slow machine.

After a fair amount of hair-pulling, I traced the problem to the
fact that maybe_start_bgworker() will only start at most one worker
per call; after that, it sets StartWorkerNeeded = true and returns,
opining in a comment that ServerLoop will make another call shortly.

Unfortunately, that's hogwash.  It happens to work reliably on Linux,
because according to signal(7)

       The following interfaces are never restarted after being interrupted by
       a signal handler, regardless of the use of SA_RESTART; they always fail
       with the error EINTR when interrupted by a signal handler:

           ... select(2) ...

However, that's a Linux-ism.  What POSIX 2008 says about it, in the
select(2) reference page, is that

        If SA_RESTART has been set for the interrupting signal, it is
        implementation-defined whether the function restarts or returns
        with [EINTR].

HPUX apparently adopts the "restart" definition, and as we've previously
found out, not only does select(2) not return immediately but it actually
seems to reset the timeout timer to its original value.  (Some googling
suggests that restarting occurs on SVR4-derived kernels but not
BSD-derived ones.)
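The difference is easy to demonstrate with a minimal standalone program
(an illustration, not code from this thread).  With SA_RESTART set, Linux
still fails the select() with EINTR after about one second, while an
SVR4-style kernel silently resumes the call and blocks for the full
timeout:

#include <stdio.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>
#include <sys/select.h>

static void
handler(int sig)
{
    /* nothing to do; the signal's arrival is the point */
}

int
main(void)
{
    struct sigaction sa;
    struct timeval timeout = {5, 0};
    int         rc;

    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;       /* request restart semantics */
    sigaction(SIGALRM, &sa, NULL);

    alarm(1);                       /* interrupt the wait after 1 second */
    rc = select(0, NULL, NULL, NULL, &timeout);

    if (rc < 0 && errno == EINTR)
        printf("EINTR after ~1s: Linux-style, select() is never restarted\n");
    else
        printf("timeout after >=5s: SVR4-style, select() was restarted\n");
    return 0;
}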

So what happens, if several RegisterDynamicBackgroundWorker requests
arrive in a single iteration of the postmaster's sigusr1_handler,
is that the first one is serviced thanks to the maybe_start_bgworker
call appearing in sigusr1_handler, and then we return to the select()
call and sleep.  The next start request is serviced only when the
typically-60-second select timeout expires, or more usually when some
other interrupt arrives at the postmaster.  In the regression tests,
what seems to happen is that we get a PMSIGNAL_START_AUTOVAC_WORKER
from the autovac launcher every thirty seconds, allowing one more bgworker
to get launched each time we go through sigusr1_handler.

Here's an actual trace of one select_parallel.sql query trying to
launch four parallel workers; I added a bunch of elog(LOG) calls
to help diagnose what was going on:

2017-04-19 17:25:33.448 EDT [8827] LOG:  signaling postmaster for startup of slot 1
2017-04-19 17:25:33.448 EDT [8827] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:25:33.448 EDT [8827] LOG:  signaling postmaster for startup of slot 2
2017-04-19 17:25:33.448 EDT [8827] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:25:33.448 EDT [8827] LOG:  signaling postmaster for startup of slot 3
2017-04-19 17:25:33.448 EDT [8827] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:25:33.448 EDT [8827] LOG:  signaling postmaster for startup of slot 4
2017-04-19 17:25:33.448 EDT [8827] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:25:33.563 EDT [6456] LOG:  entering sigusr1_handler
2017-04-19 17:25:33.563 EDT [6456] LOG:  registering background worker "parallel worker for PID 8827"
2017-04-19 17:25:33.563 EDT [6456] LOG:  registering background worker "parallel worker for PID 8827"
2017-04-19 17:25:33.563 EDT [6456] LOG:  registering background worker "parallel worker for PID 8827"
2017-04-19 17:25:33.563 EDT [6456] LOG:  registering background worker "parallel worker for PID 8827"
2017-04-19 17:25:33.563 EDT [6456] LOG:  entered maybe_start_bgworker, StartWorkerNeeded=1, HaveCrashedWorker=1
2017-04-19 17:25:33.563 EDT [6456] LOG:  starting background worker process "parallel worker for PID 8827"
2017-04-19 17:25:33.566 EDT [8871] LOG:  starting parallel worker 3
2017-04-19 17:25:33.642 EDT [8871] LOG:  exiting parallel worker 3
2017-04-19 17:25:33.642 EDT [8871] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:25:33.647 EDT [6456] LOG:  leaving sigusr1_handler
2017-04-19 17:25:33.647 EDT [6456] LOG:  entering reaper
2017-04-19 17:25:33.647 EDT [6456] LOG:  unregistering background worker "parallel worker for PID 8827"
2017-04-19 17:25:33.647 EDT [6456] LOG:  reaped bgworker pid 8871 status 0
2017-04-19 17:25:33.647 EDT [6456] LOG:  leaving reaper
2017-04-19 17:26:03.114 EDT [6456] LOG:  entering sigusr1_handler
2017-04-19 17:26:03.115 EDT [6456] LOG:  entered maybe_start_bgworker, StartWorkerNeeded=1, HaveCrashedWorker=1
2017-04-19 17:26:03.115 EDT [6456] LOG:  starting background worker process "parallel worker for PID 8827"
2017-04-19 17:26:03.118 EDT [8874] LOG:  starting parallel worker 2
2017-04-19 17:26:03.164 EDT [8874] LOG:  exiting parallel worker 2
2017-04-19 17:26:03.164 EDT [8874] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:26:03.185 EDT [6456] LOG:  leaving sigusr1_handler
2017-04-19 17:26:03.185 EDT [6456] LOG:  entering reaper
2017-04-19 17:26:03.186 EDT [6456] LOG:  unregistering background worker "parallel worker for PID 8827"
2017-04-19 17:26:03.186 EDT [6456] LOG:  reaped bgworker pid 8874 status 0
2017-04-19 17:26:03.186 EDT [6456] LOG:  leaving reaper
2017-04-19 17:26:03.284 EDT [6456] LOG:  entering reaper
2017-04-19 17:26:03.284 EDT [6456] LOG:  leaving reaper
2017-04-19 17:26:33.378 EDT [6456] LOG:  entering sigusr1_handler
2017-04-19 17:26:33.378 EDT [6456] LOG:  entered maybe_start_bgworker, StartWorkerNeeded=1, HaveCrashedWorker=1
2017-04-19 17:26:33.378 EDT [6456] LOG:  starting background worker process "parallel worker for PID 8827"
2017-04-19 17:26:33.382 EDT [8876] LOG:  starting parallel worker 1
2017-04-19 17:26:33.428 EDT [8876] LOG:  exiting parallel worker 1
2017-04-19 17:26:33.428 EDT [8876] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:26:33.452 EDT [6456] LOG:  leaving sigusr1_handler
2017-04-19 17:26:33.453 EDT [6456] LOG:  entering reaper
2017-04-19 17:26:33.453 EDT [6456] LOG:  unregistering background worker "parallel worker for PID 8827"
2017-04-19 17:26:33.453 EDT [6456] LOG:  reaped bgworker pid 8876 status 0
2017-04-19 17:26:33.453 EDT [6456] LOG:  leaving reaper
2017-04-19 17:26:33.560 EDT [6456] LOG:  entering reaper
2017-04-19 17:26:33.560 EDT [6456] LOG:  leaving reaper
2017-04-19 17:27:03.114 EDT [6456] LOG:  entering sigusr1_handler
2017-04-19 17:27:03.115 EDT [6456] LOG:  entered maybe_start_bgworker, StartWorkerNeeded=1, HaveCrashedWorker=1
2017-04-19 17:27:03.115 EDT [6456] LOG:  starting background worker process "parallel worker for PID 8827"
2017-04-19 17:27:03.118 EDT [8879] LOG:  starting parallel worker 0
2017-04-19 17:27:03.167 EDT [8879] LOG:  exiting parallel worker 0
2017-04-19 17:27:03.167 EDT [8879] STATEMENT:  select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
2017-04-19 17:27:03.174 EDT [6456] LOG:  leaving sigusr1_handler
2017-04-19 17:27:03.174 EDT [6456] LOG:  entering reaper
2017-04-19 17:27:03.175 EDT [6456] LOG:  unregistering background worker "parallel worker for PID 8827"
2017-04-19 17:27:03.184 EDT [6456] LOG:  reaped bgworker pid 8879 status 0
2017-04-19 17:27:03.184 EDT [6456] LOG:  leaving reaper

While I haven't yet tested it, it seems like a fix might be as simple
as deleting these lines in maybe_start_bgworker:

            /*
             * Have ServerLoop call us again.  Note that there might not
             * actually *be* another runnable worker, but we don't care all
             * that much; we will find out the next time we run.
             */
            StartWorkerNeeded = true;
            return;

So I'm wondering what the design rationale was for only starting one
bgworker per invocation.
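For concreteness, removing the early return would let the function's scan
of the worker list run to completion, roughly like this -- a heavily
simplified paraphrase that borrows the real identifier names, not the
actual postmaster code (the real loop also deals with crashed and
terminated workers):

static void
maybe_start_bgworker(void)
{
    slist_iter  iter;

    StartWorkerNeeded = false;

    slist_foreach(iter, &BackgroundWorkerList)
    {
        RegisteredBgWorker *rw;

        rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);

        /* pid checks, crash handling, etc. omitted in this paraphrase */
        if (bgworker_should_start_now(rw->rw_worker.bgw_start_time))
        {
            do_start_bgworker(rw);
            /*
             * Previously the code set StartWorkerNeeded = true and
             * returned here; without the early return, the scan simply
             * continues and every runnable worker is launched in this
             * one call.
             */
        }
    }
}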

It also appears to me that do_start_bgworker's treatment of fork
failure is completely brain dead.  Did anyone really think about
that case?

                        regards, tom lane


Re: Unportable implementation of background worker start

Alvaro Herrera-9
Tom Lane wrote:

> While I haven't yet tested it, it seems like a fix might be as simple
> as deleting these lines in maybe_start_bgworker:
>
>             /*
>              * Have ServerLoop call us again.  Note that there might not
>              * actually *be* another runnable worker, but we don't care all
>              * that much; we will find out the next time we run.
>              */
>             StartWorkerNeeded = true;
>             return;
>
> So I'm wondering what the design rationale was for only starting one
> bgworker per invocation.

The rationale was that there may be other tasks waiting for postmaster
attention, and if there are many bgworkers needing to be started, the
other work may be delayed for a long time.  This is not the first time
that this rationale has been challenged, but so far there hasn't been
any good reason to change it.  One option is to just remove it as you
propose, but a different one is to stop using select(2) in ServerLoop,
because those behavior differences seem to make it rather unusable.

> It also appears to me that do_start_bgworker's treatment of fork
> failure is completely brain dead.  Did anyone really think about
> that case?

Hmm, I probably modelled it on autovacuum without giving that case much
additional consideration.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unportable implementation of background worker start

Tom Lane-2
Alvaro Herrera <[hidden email]> writes:
> Tom Lane wrote:
>> So I'm wondering what the design rationale was for only starting one
>> bgworker per invocation.

> The rationale was that there may be other tasks waiting for postmaster
> attention, and if there are many bgworkers needing to be started, the
> other work may be delayed for a long time.  This is not the first time
> that this rationale has been challenged, but so far there hasn't been
> any good reason to change it.  One option is to just remove it as you
> propose, but a different one is to stop using select(2) in ServerLoop,
> because those behavior differences seem to make it rather unusable.

Hm.  Do you have a more-portable alternative?

                        regards, tom lane



Re: Unportable implementation of background worker start

Alvaro Herrera-9
Tom Lane wrote:

> Alvaro Herrera <[hidden email]> writes:
> > Tom Lane wrote:
> >> So I'm wondering what the design rationale was for only starting one
> >> bgworker per invocation.
>
> > The rationale was that there may be other tasks waiting for postmaster
> > attention, and if there are many bgworkers needing to be started, the
> > other work may be delayed for a long time.  This is not the first time
> > that this rationale has been challenged, but so far there hasn't been
> > any good reason to change it.  One option is to just remove it as you
> > propose, but a different one is to stop using select(2) in ServerLoop,
> > because those behavior differences seem to make it rather unusable.
>
> Hm.  Do you have a more-portable alternative?

I was thinking of a WaitEventSet from latch.c.
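[latch.c's WaitEventSet bundles latches and sockets into a single wait
primitive.  For readers who haven't met it, a rough usage sketch against
the 9.6-era API (later branches add a wait_event_info argument to
WaitEventSetWait); illustrative only, listen_sock is a stand-in name, and
error handling is omitted:]

WaitEventSet *set;
WaitEvent   events[2];
int         i,
            nevents;

set = CreateWaitEventSet(CurrentMemoryContext, 2);
AddWaitEventToSet(set, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
AddWaitEventToSet(set, WL_SOCKET_READABLE, listen_sock, NULL, NULL);

for (;;)
{
    nevents = WaitEventSetWait(set, 60 * 1000L, events, lengthof(events));
    for (i = 0; i < nevents; i++)
    {
        if (events[i].events & WL_LATCH_SET)
            ResetLatch(MyLatch);    /* then service the flagged work */
        if (events[i].events & WL_SOCKET_READABLE)
        {
            /* accept and dispatch a connection, as ServerLoop does */
        }
    }
}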

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unportable implementation of background worker start

Tom Lane-2
Alvaro Herrera <[hidden email]> writes:
> Tom Lane wrote:
>> Hm.  Do you have a more-portable alternative?

> I was thinking of a WaitEventSet from latch.c.

Yeah, some googling turns up the suggestion that a self-pipe is a portable
way to get consistent semantics from select(); latch.c has already done
that.  I suppose that using latch.c would be convenient in that we'd have
to write little new code, but it's a tad annoying to add the overhead of a
self-pipe on platforms where we don't need it (which seems to be most).
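The trick in miniature (a standalone illustration of the idea, not
latch.c itself): the handler writes a byte into a non-blocking pipe whose
read end select() watches, so a signal's arrival always shows up as a
readable fd instead of depending on EINTR semantics.

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>
#include <sys/select.h>

static int  selfpipe[2];

static void
handler(int sig)
{
    int         save_errno = errno;

    (void) write(selfpipe[1], "x", 1);  /* wake the select() below */
    errno = save_errno;
}

int
main(void)
{
    fd_set      rmask;
    char        buf[16];

    pipe(selfpipe);
    fcntl(selfpipe[0], F_SETFL, O_NONBLOCK);
    fcntl(selfpipe[1], F_SETFL, O_NONBLOCK);
    signal(SIGUSR1, handler);

    for (;;)
    {
        FD_ZERO(&rmask);
        FD_SET(selfpipe[0], &rmask);
        /* the real postmaster would FD_SET its listen sockets too */
        if (select(selfpipe[0] + 1, &rmask, NULL, NULL, NULL) > 0 &&
            FD_ISSET(selfpipe[0], &rmask))
        {
            while (read(selfpipe[0], buf, sizeof(buf)) > 0)
                ;               /* drain, then service pending requests */
        }
    }
}

A signal that arrives just before the select() blocks still leaves a byte
in the pipe, which is exactly the race that bare EINTR-based wakeups lose.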

                        regards, tom lane



Re: Unportable implementation of background worker start

Andres Freund
On 2017-04-19 18:56:26 -0400, Tom Lane wrote:

> Alvaro Herrera <[hidden email]> writes:
> > Tom Lane wrote:
> >> Hm.  Do you have a more-portable alternative?
>
> > I was thinking of a WaitEventSet from latch.c.
>
> Yeah, some googling turns up the suggestion that a self-pipe is a portable
> way to get consistent semantics from select(); latch.c has already done
> that.  I suppose that using latch.c would be convenient in that we'd have
> to write little new code, but it's a tad annoying to add the overhead of a
> self-pipe on platforms where we don't need it (which seems to be most).

FWIW, I'd wished before that we used something a bit more modern than
select() if available... It's nice to be able to listen to a larger
number of sockets without repeated O(sockets) overhead.

BTW, we IIRC had discussed removing the select() backed latch
implementation in this release.  I'll try to dig up that discussion.

- Andres



Re: Unportable implementation of background worker start

Tom Lane-2
Andres Freund <[hidden email]> writes:
> FWIW, I'd wished before that we used something a bit more modern than
> select() if available... It's nice to be able to listen to a larger
> number of sockets without repeated O(sockets) overhead.

[ raised eyebrow... ]  Is anyone really running postmasters with enough
listen sockets for that to be meaningful?

> BTW, we IIRC had discussed removing the select() backed latch
> implementation in this release.  I'll try to dig up that discussion.

Might be sensible.  Even my pet dinosaurs have poll(2).  We should
check the buildfarm to see if the select() implementation is being
tested at all.

                        regards, tom lane



Re: Unportable implementation of background worker start

Tom Lane-2
In reply to this post by Tom Lane-2
I wrote:
> Alvaro Herrera <[hidden email]> writes:
>> Tom Lane wrote:
>>> Hm.  Do you have a more-portable alternative?

>> I was thinking of a WaitEventSet from latch.c.

My first reaction was that that sounded like a lot more work than removing
two lines from maybe_start_bgworker and adjusting some comments.  But on
closer inspection, the slow-bgworker-start issue isn't the only problem
here.  On a machine that restarts select()'s timeout after an interrupt,
as (at least) HPUX does, the postmaster will actually never iterate
ServerLoop's loop except immediately after receiving a new connection
request.  The stream of interrupts from the autovac launcher is alone
sufficient to prevent the initial 60-second timeout from ever elapsing.
So if there are no new connection requests for awhile, none of the
housekeeping actions in ServerLoop get done.

Most of those actions are concerned with restarting failed background
tasks, which is something we could get by without --- it's unlikely
that those tasks would fail without causing a database-wide restart,
and then there really isn't going to be much need for them until at least
one new connection request has arrived.  But the last step in that loop is
concerned with touching the sockets and lock files to prevent aggressive
/tmp cleaners from removing them, and that's something that can't be let
slide, or we might as well not have the logic at all.

I've confirmed experimentally that on my HPUX box, a postmaster not
receiving new connections for an hour or more in fact fails to update
the mod times on the sockets and lock files.  So that problem isn't
hypothetical.

So it's looking to me like we do need to do something about this, and
ideally back-patch it all the way.  But WaitEventSet doesn't exist
before 9.6.  Do we have the stomach for back-patching that into
stable branches?

                        regards, tom lane



Re: Unportable implementation of background worker start

Tom Lane-2
I wrote:
> Alvaro Herrera <[hidden email]> writes:
>> Tom Lane wrote:
>>> Hm.  Do you have a more-portable alternative?

>> I was thinking of a WaitEventSet from latch.c.

> My first reaction was that that sounded like a lot more work than removing
> two lines from maybe_start_bgworker and adjusting some comments.  But on
> closer inspection, the slow-bgworker-start issue isn't the only problem
> here.  On a machine that restarts select()'s timeout after an interrupt,
> as (at least) HPUX does, the postmaster will actually never iterate
> ServerLoop's loop except immediately after receiving a new connection
> request.

I had a go at making the postmaster use a WaitEventSet, and immediately
ran into problems: latch.c is completely unprepared to be used anywhere
except a postmaster child process.  I think we can get around the issue
for the self-pipe, as per the attached untested patch.  But there remains
a problem: we should do a FreeWaitEventSet() after forking a child
process to ensure that postmaster children aren't running around with
open FDs for the postmaster's stuff.  This is no big problem in a regular
Unix build; we can give ClosePostmasterPorts() the responsibility.
It *is* a problem in EXEC_BACKEND children, which won't have inherited
the WaitEventSet data structure.  Maybe we could ignore the problem for
Unix EXEC_BACKEND builds, since we consider those to be for debug
purposes only, not for production.  But I don't think we can get away
with it for Windows --- or are the HANDLEs in a Windows WaitEventSet
not inheritable resources?

                        regards, tom lane


diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 4798370..d4ac54c 100644
*** a/src/backend/storage/ipc/latch.c
--- b/src/backend/storage/ipc/latch.c
*************** static volatile sig_atomic_t waiting = f
*** 129,134 ****
--- 129,136 ----
  /* Read and write ends of the self-pipe */
  static int selfpipe_readfd = -1;
  static int selfpipe_writefd = -1;
+ /* Process owning the self-pipe --- needed for checking purposes */
+ static int selfpipe_owner_pid = 0;
 
  /* Private function prototypes */
  static void sendSelfPipeByte(void);
*************** InitializeLatchSupport(void)
*** 158,164 ****
  #ifndef WIN32
  int pipefd[2];
 
! Assert(selfpipe_readfd == -1);
 
  /*
  * Set up the self-pipe that allows a signal handler to wake up the
--- 160,202 ----
  #ifndef WIN32
  int pipefd[2];
 
! if (IsUnderPostmaster)
! {
! /*
! * We might have inherited connections to a self-pipe created by the
! * postmaster.  It's critical that child processes create their own
! * self-pipes, of course, and we really want them to close the
! * inherited FDs for safety's sake.
! */
! if (selfpipe_owner_pid != 0)
! {
! /* Assert we go through here but once in a child process */
! Assert(selfpipe_owner_pid != MyProcPid);
! /* Release postmaster's pipe */
! close(selfpipe_readfd);
! close(selfpipe_writefd);
! /* Clean up, just for safety's sake; we'll set these in a bit */
! selfpipe_readfd = selfpipe_writefd = -1;
! selfpipe_owner_pid = 0;
! }
! else
! {
! /*
! * Postmaster didn't create a self-pipe ... or else we're in an
! * EXEC_BACKEND build, and don't know because we didn't inherit
! * values for the static variables.  (In that case we'll fail to
! * close the inherited FDs, but that seems acceptable since
! * EXEC_BACKEND builds aren't meant for production on Unix.)
! */
! Assert(selfpipe_readfd == -1);
! }
! }
! else
! {
! /* In postmaster or standalone backend, assert we do this but once */
! Assert(selfpipe_readfd == -1);
! Assert(selfpipe_owner_pid == 0);
! }
 
  /*
  * Set up the self-pipe that allows a signal handler to wake up the
*************** InitializeLatchSupport(void)
*** 176,188 ****
 
  selfpipe_readfd = pipefd[0];
  selfpipe_writefd = pipefd[1];
  #else
  /* currently, nothing to do here for Windows */
  #endif
  }
 
  /*
!  * Initialize a backend-local latch.
   */
  void
  InitLatch(volatile Latch *latch)
--- 214,227 ----
 
  selfpipe_readfd = pipefd[0];
  selfpipe_writefd = pipefd[1];
+ selfpipe_owner_pid = MyProcPid;
  #else
  /* currently, nothing to do here for Windows */
  #endif
  }
 
  /*
!  * Initialize a process-local latch.
   */
  void
  InitLatch(volatile Latch *latch)
*************** InitLatch(volatile Latch *latch)
*** 193,199 ****
 
  #ifndef WIN32
  /* Assert InitializeLatchSupport has been called in this process */
! Assert(selfpipe_readfd >= 0);
  #else
  latch->event = CreateEvent(NULL, TRUE, FALSE, NULL);
  if (latch->event == NULL)
--- 232,238 ----
 
  #ifndef WIN32
  /* Assert InitializeLatchSupport has been called in this process */
! Assert(selfpipe_readfd >= 0 && selfpipe_owner_pid == MyProcPid);
  #else
  latch->event = CreateEvent(NULL, TRUE, FALSE, NULL);
  if (latch->event == NULL)
*************** OwnLatch(volatile Latch *latch)
*** 256,262 ****
 
  #ifndef WIN32
  /* Assert InitializeLatchSupport has been called in this process */
! Assert(selfpipe_readfd >= 0);
  #endif
 
  if (latch->owner_pid != 0)
--- 295,301 ----
 
  #ifndef WIN32
  /* Assert InitializeLatchSupport has been called in this process */
! Assert(selfpipe_readfd >= 0 && selfpipe_owner_pid == MyProcPid);
  #endif
 
  if (latch->owner_pid != 0)
*************** DisownLatch(volatile Latch *latch)
*** 289,295 ****
   * is incurred when WL_TIMEOUT is given, so avoid using a timeout if possible.
   *
   * The latch must be owned by the current process, ie. it must be a
!  * backend-local latch initialized with InitLatch, or a shared latch
   * associated with the current process by calling OwnLatch.
   *
   * Returns bit mask indicating which condition(s) caused the wake-up. Note
--- 328,334 ----
   * is incurred when WL_TIMEOUT is given, so avoid using a timeout if possible.
   *
   * The latch must be owned by the current process, ie. it must be a
!  * process-local latch initialized with InitLatch, or a shared latch
   * associated with the current process by calling OwnLatch.
   *
   * Returns bit mask indicating which condition(s) caused the wake-up. Note
*************** FreeWaitEventSet(WaitEventSet *set)
*** 597,603 ****
   * used to modify previously added wait events using ModifyWaitEvent().
   *
   * In the WL_LATCH_SET case the latch must be owned by the current process,
!  * i.e. it must be a backend-local latch initialized with InitLatch, or a
   * shared latch associated with the current process by calling OwnLatch.
   *
   * In the WL_SOCKET_READABLE/WRITEABLE case, EOF and error conditions are
--- 636,642 ----
   * used to modify previously added wait events using ModifyWaitEvent().
   *
   * In the WL_LATCH_SET case the latch must be owned by the current process,
!  * i.e. it must be a process-local latch initialized with InitLatch, or a
   * shared latch associated with the current process by calling OwnLatch.
   *
   * In the WL_SOCKET_READABLE/WRITEABLE case, EOF and error conditions are



Re: Unportable implementation of background worker start

Andres Freund
On 2017-04-20 19:23:28 -0400, Tom Lane wrote:
> or are the HANDLEs in a Windows WaitEventSet not inheritable
> resources?

I think we have control over that. According to
https://msdn.microsoft.com/en-us/library/windows/desktop/ms724466(v=vs.85).aspx
CreateProcess() has to be called with bInheritHandles = true (which we
do for backends), and SECURITY_ATTRIBUTES.bInheritHandle has to be true
too.  The latter we currently do only for InitSharedLatch(), but not for
InitLatch(), nor for the events created with WSACreateEvent() for sockets -
those apparently can never be inherited.

So that kind of sounds like it should be doable.
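Concretely, the knob looks like this (create_latch_event is a made-up
name; the pattern is the standard Win32 one):

#include <windows.h>

/*
 * Whether a HANDLE crosses CreateProcess(..., bInheritHandles = TRUE, ...)
 * is decided per handle by SECURITY_ATTRIBUTES.bInheritHandle at creation.
 */
static HANDLE
create_latch_event(BOOL inheritable)
{
    SECURITY_ATTRIBUTES sa;

    sa.nLength = sizeof(sa);
    sa.lpSecurityDescriptor = NULL;
    sa.bInheritHandle = inheritable;

    /* manual-reset, initially unsignaled, unnamed event */
    return CreateEvent(&sa, TRUE, FALSE, NULL);
}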

- Andres



Re: Unportable implementation of background worker start

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2017-04-20 19:23:28 -0400, Tom Lane wrote:
>> or are the HANDLEs in a Windows WaitEventSet not inheritable
>> resources?

> I think we have control over that. According to
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms724466(v=vs.85).aspx
> CreateProcess() has to be called with bInheritHandles = true (which we
> do for backends), and SECURITY_ATTRIBUTES.bInheritHandle has to be true
> too.  The latter we currently do only for InitSharedLatch(), but not for
> InitLatch(), nor for the events created with WSACreateEvent() for sockets -
> those apparently can never be inherited.

> So that kind of sounds like it should be doable.

Ah, good.  I'll add a comment about that and press on.

                        regards, tom lane



Re: Unportable implementation of background worker start

Andres Freund
In reply to this post by Tom Lane-2
On 2017-04-20 00:50:13 -0400, Tom Lane wrote:

> I wrote:
> > Alvaro Herrera <[hidden email]> writes:
> >> Tom Lane wrote:
> >>> Hm.  Do you have a more-portable alternative?
>
> >> I was thinking of a WaitEventSet from latch.c.
>
> My first reaction was that that sounded like a lot more work than removing
> two lines from maybe_start_bgworker and adjusting some comments.  But on
> closer inspection, the slow-bgworker-start issue isn't the only problem
> here.  On a machine that restarts select()'s timeout after an interrupt,
> as (at least) HPUX does, the postmaster will actually never iterate
> ServerLoop's loop except immediately after receiving a new connection
> request.  The stream of interrupts from the autovac launcher is alone
> sufficient to prevent the initial 60-second timeout from ever elapsing.
> So if there are no new connection requests for awhile, none of the
> housekeeping actions in ServerLoop get done.

FWIW, I vaguely remember somewhat related issues on x86/linux too.  On
busy machines in autovacuum_freeze_max_age territory (pretty frequent
thing these days), the signalling frequency caused problems in
postmaster, but I unfortunately don't remember the symptoms very well.


> So it's looking to me like we do need to do something about this, and
> ideally back-patch it all the way.  But WaitEventSet doesn't exist
> before 9.6.  Do we have the stomach for back-patching that into
> stable branches?

Hm, that's not exactly no code - on the other hand it's (excepting
the select(2) path) reasonably well exercised.  What we could do is to
convert master, see how the beta process likes it, and then backpatch?
These don't seem like super pressing issues.

- Andres



Re: Unportable implementation of background worker start

Tom Lane-2
In reply to this post by Tom Lane-2
I wrote:
> Andres Freund <[hidden email]> writes:
>> On 2017-04-20 19:23:28 -0400, Tom Lane wrote:
>>> or are the HANDLEs in a Windows WaitEventSet not inheritable
>>> resources?

>> So that kind of sounds like it should be doable.

> Ah, good.  I'll add a comment about that and press on.

So ... what would you say to replacing epoll_create() with
epoll_create1(EPOLL_CLOEXEC) ?  Then a WaitEventSet would not
represent inheritable-across-exec resources on any platform,
making it a lot easier to deal with the EXEC_BACKEND case.

AFAIK, both APIs are Linux-only, and epoll_create1() is not much
newer than epoll_create(), so it seems like we'd not be giving up
much portability if we insist on epoll_create1.

                        regards, tom lane



Re: Unportable implementation of background worker start

Andres Freund
On 2017-04-20 19:53:02 -0400, Tom Lane wrote:

> I wrote:
> > Andres Freund <[hidden email]> writes:
> >> On 2017-04-20 19:23:28 -0400, Tom Lane wrote:
> >>> or are the HANDLEs in a Windows WaitEventSet not inheritable
> >>> resources?
>
> >> So that kind of sounds like it should be doable.
>
> > Ah, good.  I'll add a comment about that and press on.
>
> So ... what would you say to replacing epoll_create() with
> epoll_create1(EPOLL_CLOEXEC) ?  Then a WaitEventSet would not
> represent inheritable-across-exec resources on any platform,
> making it a lot easier to deal with the EXEC_BACKEND case.
>
> AFAIK, both APIs are Linux-only, and epoll_create1() is not much
> newer than epoll_create(), so it seems like we'd not be giving up
> much portability if we insist on epoll_create1.

I'm generally quite in favor of using CLOEXEC as much as possible in our
tree.  I'm a bit concerned with epoll_create1's availability tho - the
glibc support for it was introduced in 2.9, whereas epoll_create is in
2.3.2.  On the other hand 2.9 was released 2008-11-13.   If we remain
concerned we could just fcntl(fd, F_SETFD, FD_CLOEXEC) instead - that
should only be like three lines more code or such, and should be
available for a lot longer.
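Roughly these two routes, side by side (a sketch; pg_epoll_create_cloexec
is a made-up helper name, and real code would use a configure test rather
than this #ifdef):

#include <fcntl.h>
#include <sys/epoll.h>

static int
pg_epoll_create_cloexec(void)
{
#ifdef EPOLL_CLOEXEC
    /* sets close-on-exec atomically at creation */
    return epoll_create1(EPOLL_CLOEXEC);
#else
    int         fd = epoll_create(16);  /* size hint, ignored by kernel */

    if (fd >= 0)
        (void) fcntl(fd, F_SETFD, FD_CLOEXEC);
    return fd;
#endif
}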

- Andres



Re: Unportable implementation of background worker start

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2017-04-20 19:53:02 -0400, Tom Lane wrote:
>> So ... what would you say to replacing epoll_create() with
>> epoll_create1(EPOLL_CLOEXEC) ?  Then a WaitEventSet would not
>> represent inheritable-across-exec resources on any platform,
>> making it a lot easier to deal with the EXEC_BACKEND case.

> I'm generally quite in favor of using CLOEXEC as much as possible in our
> tree.  I'm a bit concerned with epoll_create1's availability tho - the
> glibc support for it was introduced in 2.9, whereas epoll_create is in
> 2.3.2.  On the other hand 2.9 was released 2008-11-13.

Also, if it's not there we'd fall back to using plain poll(), which is
not so awful that we need to work hard to avoid it.  I'd just as soon
keep the number of combinations down.

                        regards, tom lane



Re: Unportable implementation of background worker start

Andres Freund
On 2017-04-20 20:05:02 -0400, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > On 2017-04-20 19:53:02 -0400, Tom Lane wrote:
> >> So ... what would you say to replacing epoll_create() with
> >> epoll_create1(EPOLL_CLOEXEC) ?  Then a WaitEventSet would not
> >> represent inheritable-across-exec resources on any platform,
> >> making it a lot easier to deal with the EXEC_BACKEND case.
>
> > I'm generally quite in favor of using CLOEXEC as much as possible in our
> > tree.  I'm a bit concerned with epoll_create1's availability tho - the
> > glibc support for it was introduced in 2.9, whereas epoll_create is in
> > 2.3.2.  On the other hand 2.9 was released 2008-11-13.
>
> Also, if it's not there we'd fall back to using plain poll(), which is
> not so awful that we need to work hard to avoid it.  I'd just as soon
> keep the number of combinations down.

Just using fcntl(SET, CLOEXEC) wouldn't increase the number of
combinations?

- Andres



Re: Unportable implementation of background worker start

Tom Lane-2
Andres Freund <[hidden email]> writes:
> On 2017-04-20 20:05:02 -0400, Tom Lane wrote:
>> Also, if it's not there we'd fall back to using plain poll(), which is
>> not so awful that we need to work hard to avoid it.  I'd just as soon
>> keep the number of combinations down.

> Just using fcntl(SET, CLOEXEC) wouldn't increase the number of
> combinations?

True, if you just did it that way unconditionally.  But doesn't that
require an extra kernel call per CreateWaitEventSet()?

                        regards, tom lane



Re: Unportable implementation of background worker start

Andres Freund
On 2017-04-20 20:10:41 -0400, Tom Lane wrote:

> Andres Freund <[hidden email]> writes:
> > On 2017-04-20 20:05:02 -0400, Tom Lane wrote:
> >> Also, if it's not there we'd fall back to using plain poll(), which is
> >> not so awful that we need to work hard to avoid it.  I'd just as soon
> >> keep the number of combinations down.
>
> > Just using fcntl(SET, CLOEXEC) wound't increase the number of
> > combinations?
>
> True, if you just did it that way unconditionally.  But doesn't that
> require an extra kernel call per CreateWaitEventSet()?

It does - the question is whether that matters much.  FE/BE uses a
persistent wait set, but unfortunately many of the other latch users
don't.  And some of them can be somewhat frequent - so I guess that'd
possibly be measurable.  Ok, so I'm on board with epoll_create1.

If somebody were to change more frequent latch users to use persistent
wait sets, that'd be good too.

- Andres



Re: Unportable implementation of background worker start

Tom Lane-2
In reply to this post by Andres Freund
Andres Freund <[hidden email]> writes:
> On 2017-04-20 00:50:13 -0400, Tom Lane wrote:
>> My first reaction was that that sounded like a lot more work than removing
>> two lines from maybe_start_bgworker and adjusting some comments.  But on
>> closer inspection, the slow-bgworker-start issue isn't the only problem
>> here.

> FWIW, I vaguely remember somewhat related issues on x86/linux too.

After sleeping and thinking more, I've realized that the
slow-bgworker-start issue actually exists on *every* platform, it's just
harder to hit when select() is interruptable.  But consider the case
where multiple bgworker-start requests arrive while ServerLoop is
actively executing (perhaps because a connection request just came in).
The postmaster has signals blocked, so nothing happens for the moment.
When we go around the loop and reach

            PG_SETMASK(&UnBlockSig);

the pending SIGUSR1 is delivered, and sigusr1_handler reads all the
bgworker start requests, and services just one of them.  Then control
returns and proceeds to

            selres = select(nSockets, &rmask, NULL, NULL, &timeout);

But now there's no interrupt pending.  So the remaining start requests
do not get serviced until (a) some other postmaster interrupt arrives,
or (b) the one-minute timeout elapses.  They could be waiting awhile.

Bottom line is that any request for more than one bgworker at a time
faces a non-negligible risk of suffering serious latency.

I'm coming back to the idea that at least in the back branches, the
thing to do is allow maybe_start_bgworker to start multiple workers.
Is there any actual evidence for the claim that that might have
bad side effects?

                        regards, tom lane



Re: Unportable implementation of background worker start

Alvaro Herrera-9
Tom Lane wrote:

> After sleeping and thinking more, I've realized that the
> slow-bgworker-start issue actually exists on *every* platform, it's just
> harder to hit when select() is interruptable.  But consider the case
> where multiple bgworker-start requests arrive while ServerLoop is
> actively executing (perhaps because a connection request just came in).
> The postmaster has signals blocked, so nothing happens for the moment.
> When we go around the loop and reach
>
>             PG_SETMASK(&UnBlockSig);
>
> the pending SIGUSR1 is delivered, and sigusr1_handler reads all the
> bgworker start requests, and services just one of them.  Then control
> returns and proceeds to
>
>             selres = select(nSockets, &rmask, NULL, NULL, &timeout);
>
> But now there's no interrupt pending.  So the remaining start requests
> do not get serviced until (a) some other postmaster interrupt arrives,
> or (b) the one-minute timeout elapses.  They could be waiting awhile.
>
> Bottom line is that any request for more than one bgworker at a time
> faces a non-negligible risk of suffering serious latency.

Interesting.  It's hard to hit, for sure.

> I'm coming back to the idea that at least in the back branches, the
> thing to do is allow maybe_start_bgworker to start multiple workers.
>
> Is there any actual evidence for the claim that that might have
> bad side effects?

Well, I ran tests with a few tens of thousands of sample workers, and the
neglect of other things (such as connection requests) was visible, but
that's probably not a scenario many servers encounter currently.  I
don't strongly object to the idea of removing the "return" in older
branches, since it's evidently a problem.  However, as bgworkers start
to be used more, I think we should definitely have some protection.  In
a system with a large number of workers available for parallel queries,
it seems possible for a high velocity server to get stuck in the loop
for some time.  (I haven't actually verified this, though.  My
experiments were with the early kind, static bgworkers.)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

