DSM segment handle generation in background workers

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

DSM segment handle generation in background workers

Thomas Munro-3
Hello,

I noticed that every parallel worker generates the same sequence of
handles here:

        /*
         * Loop until we find an unused identifier for the new control
segment. We
         * sometimes use 0 as a sentinel value indicating that no
control segment
         * is known to exist, so avoid using that value for a real control
         * segment.
         */
        for (;;)
        {
                Assert(dsm_control_address == NULL);
                Assert(dsm_control_mapped_size == 0);
                dsm_control_handle = random();
                if (dsm_control_handle == DSM_HANDLE_INVALID)
                        continue;
                if (dsm_impl_op(DSM_OP_CREATE, dsm_control_handle, segsize,

&dsm_control_impl_private, &dsm_control_address,

&dsm_control_mapped_size, ERROR))
                        break;
        }

It's harmless AFAICS, but it produces sequences of syscalls like this
when Parallel Hash is building the hash table:

shm_open("/PostgreSQL.240477264",O_RDWR|O_CREAT|O_EXCL,0600) ERR#17
'File exists'
shm_open("/PostgreSQL.638747851",O_RDWR|O_CREAT|O_EXCL,0600) ERR#17
'File exists'
shm_open("/PostgreSQL.1551053007",O_RDWR|O_CREAT|O_EXCL,0600) = 5 (0x5)

That's because the bgworker startup path doesn't contain a call to
srandom(...distinguishing stuff...), unlike BackendRun().  I suppose
do_start_bgworker() could gain a similar call... or perhaps that call
should be moved into InitPostmasterChild().  If we put it in there
right after MyStartTime is assigned a new value, we could use the same
incantation that PostmasterMain() uses.

I noticed that the comment in PostmasterMain() refers to
PostmasterRandom(), which is gone.

--
Thomas Munro
http://www.enterprisedb.com

Reply | Threaded
Open this post in threaded view
|

Re: DSM segment handle generation in background workers

Thomas Munro-3
On Mon, Oct 8, 2018 at 1:17 AM Thomas Munro
<[hidden email]> wrote:
> That's because the bgworker startup path doesn't contain a call to
> srandom(...distinguishing stuff...), unlike BackendRun().  I suppose
> do_start_bgworker() could gain a similar call... or perhaps that call
> should be moved into InitPostmasterChild().  If we put it in there
> right after MyStartTime is assigned a new value, we could use the same
> incantation that PostmasterMain() uses.
>
> I noticed that the comment in PostmasterMain() refers to
> PostmasterRandom(), which is gone.

Maybe something like this?

--
Thomas Munro
http://www.enterprisedb.com

move-srandom.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: DSM segment handle generation in background workers

Tom Lane-2
Thomas Munro <[hidden email]> writes:
> On Mon, Oct 8, 2018 at 1:17 AM Thomas Munro
> <[hidden email]> wrote:
>> That's because the bgworker startup path doesn't contain a call to
>> srandom(...distinguishing stuff...), unlike BackendRun().  I suppose
>> do_start_bgworker() could gain a similar call... or perhaps that call
>> should be moved into InitPostmasterChild().  If we put it in there
>> right after MyStartTime is assigned a new value, we could use the same
>> incantation that PostmasterMain() uses.

> Maybe something like this?

I think the bit with

#ifndef HAVE_STRONG_RANDOM
  random_seed = 0;
  random_start_time.tv_usec = 0;
#endif

should also be done in every postmaster child, no?  If we want to hide the
postmaster's state from child processes, we ought to hide it from all.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: DSM segment handle generation in background workers

Thomas Munro-3
On Tue, Oct 9, 2018 at 1:53 PM Tom Lane <[hidden email]> wrote:

> Thomas Munro <[hidden email]> writes:
> > On Mon, Oct 8, 2018 at 1:17 AM Thomas Munro
> > <[hidden email]> wrote:
> >> That's because the bgworker startup path doesn't contain a call to
> >> srandom(...distinguishing stuff...), unlike BackendRun().  I suppose
> >> do_start_bgworker() could gain a similar call... or perhaps that call
> >> should be moved into InitPostmasterChild().  If we put it in there
> >> right after MyStartTime is assigned a new value, we could use the same
> >> incantation that PostmasterMain() uses.
>
> > Maybe something like this?
>
> I think the bit with
>
> #ifndef HAVE_STRONG_RANDOM
>         random_seed = 0;
>         random_start_time.tv_usec = 0;
> #endif
>
> should also be done in every postmaster child, no?  If we want to hide the
> postmaster's state from child processes, we ought to hide it from all.
Ok, here is a sketch patch with a new function InitRandomSeeds() to do
that.  It is called from PostmasterMain(), InitPostmasterChild() and
InitStandaloneProcess() for consistency.

It seems a bit strange to me that internal infrastructure shares a
random number generator with SQL-callable functions random() and
setseed(), though I'm not saying it's harmful.

While wondering if something like this should be back-patched, I
noticed that SQL random() is declared as parallel-restricted, which is
good: it means we aren't exposing a random() function that generates
the same values in every parallel worker.  So I think this is probably
just a minor nuisance and should probably only be pushed to master, or
at most to 11 (since Parallel Hash likes to create DSM segments in
workers), unless someone can think of a more serious way this can hurt
you.

(Tangentially:  I suppose it might be useful to have a different SQL
random number function that is parallel safe, that isn't associated
with a user-controllable seed, and whose seed is different in each
backend.)

--
Thomas Munro
http://www.enterprisedb.com

0001-Make-sure-we-initialize-random-seeds-per-backend.patch (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: DSM segment handle generation in background workers

Thomas Munro-3
On Tue, Oct 9, 2018 at 3:21 PM Thomas Munro
<[hidden email]> wrote:

> On Tue, Oct 9, 2018 at 1:53 PM Tom Lane <[hidden email]> wrote:
> > Thomas Munro <[hidden email]> writes:
> > > On Mon, Oct 8, 2018 at 1:17 AM Thomas Munro
> > > <[hidden email]> wrote:
> > >> That's because the bgworker startup path doesn't contain a call to
> > >> srandom(...distinguishing stuff...), unlike BackendRun().  I suppose
> > >> do_start_bgworker() could gain a similar call... or perhaps that call
> > >> should be moved into InitPostmasterChild().  If we put it in there
> > >> right after MyStartTime is assigned a new value, we could use the same
> > >> incantation that PostmasterMain() uses.
> >
> > > Maybe something like this?
> >
> > I think the bit with
> >
> > #ifndef HAVE_STRONG_RANDOM
> >         random_seed = 0;
> >         random_start_time.tv_usec = 0;
> > #endif
> >
> > should also be done in every postmaster child, no?  If we want to hide the
> > postmaster's state from child processes, we ought to hide it from all.
>
> Ok, here is a sketch patch with a new function InitRandomSeeds() to do
> that.  It is called from PostmasterMain(), InitPostmasterChild() and
> InitStandaloneProcess() for consistency.
Here's a version I propose to push to master and 11 tomorrow, if there
are no objections.

--
Thomas Munro
http://www.enterprisedb.com

0001-Use-different-random-seeds-in-all-backends.patch (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: DSM segment handle generation in background workers

Tom Lane-2
Thomas Munro <[hidden email]> writes:
>> Ok, here is a sketch patch with a new function InitRandomSeeds() to do
>> that.  It is called from PostmasterMain(), InitPostmasterChild() and
>> InitStandaloneProcess() for consistency.

> Here's a version I propose to push to master and 11 tomorrow, if there
> are no objections.

The comment adjacent to the change in InitStandaloneProcess bothers me.
In particular, it points out that what BackendRun() is currently doing
creates more entropy in the process's random seed than what you have
here: MyStartTime is only good to the nearest second, whereas the code
you removed from BackendRun() factors in fractional seconds to whatever
the precision of GetCurrentTimestamp is.  This does not seem like an
improvement.

I'm inclined to think we should refactor a bit more aggressively so
that, rather than dumbing backends down to the standard of other
processes, we make them all use the better method.  A reasonable way
to approach this would be to invent a global variable MyStartTimestamp
beside MyStartTime, and initialize both of them in the places that
currently initialize the latter, using code like that in
BackendInitialize:

        /* save process start time */
        port->SessionStartTime = GetCurrentTimestamp();
        MyStartTime = timestamptz_to_time_t(port->SessionStartTime);

and then this new InitRandomSeeds function could adopt BackendRun's
seed initialization method instead of the stupider way.

Possibly it'd be sane to merge the MyStartTime* initializations
and the srandom resets into one function, not sure.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: DSM segment handle generation in background workers

Thomas Munro-3
On Thu, Oct 11, 2018 at 5:51 PM Tom Lane <[hidden email]> wrote:
> The comment adjacent to the change in InitStandaloneProcess bothers me.
> In particular, it points out that what BackendRun() is currently doing
> creates more entropy in the process's random seed than what you have
> here: MyStartTime is only good to the nearest second, whereas the code
> you removed from BackendRun() factors in fractional seconds to whatever
> the precision of GetCurrentTimestamp is.  This does not seem like an
> improvement.

True.

> I'm inclined to think we should refactor a bit more aggressively so
> that, rather than dumbing backends down to the standard of other
> processes, we make them all use the better method.  A reasonable way
> to approach this would be to invent a global variable MyStartTimestamp
> beside MyStartTime, and initialize both of them in the places that
> currently initialize the latter, using code like that in
> BackendInitialize:
>
>         /* save process start time */
>         port->SessionStartTime = GetCurrentTimestamp();
>         MyStartTime = timestamptz_to_time_t(port->SessionStartTime);
>
> and then this new InitRandomSeeds function could adopt BackendRun's
> seed initialization method instead of the stupider way.
Ok, done.

With MyStartTimestamp in the picture, port->SessionStartTime is
probably redundant... <looks around> and in fact the comment in struct
Port says it shouldn't even be there.  So... I removed it, and used
the new MyStartTimestamp in the couple of places that wanted it.
Thoughts?

> Possibly it'd be sane to merge the MyStartTime* initializations
> and the srandom resets into one function, not sure.

+1

The main difficulty was coming up with a name for the function that
does those things.  I went with InitProcessGlobals().  Better
suggestions welcome.

--
Thomas Munro
http://www.enterprisedb.com

0001-Refactor-random-seed-and-start-time-initialization.patch (13K) Download Attachment