LogwrtResult contended spinlock


LogwrtResult contended spinlock

Alvaro Herrera-9
Jaime Casanova recently reported a situation where pglogical replicating
from 64 POS sites to a single central (64-core) node, each with two
replication sets, causes XLog's info_lck to become highly contended
because of frequently reading LogwrtResult.  We tested the simple fix of
adding a new LWLock that protects LogwrtResult and LogwrtRqst; that
seems to solve the problem easily enough.

At first I wanted to make the new LWLock cover only LogwrtResult proper,
and leave LogwrtRqst alone.  However on doing it, it seemed that that
might change the locking protocol in a nontrivial way.  So I decided to
make it cover both and call it a day.  We did verify that the patch
solves the reported problem, at any rate.

--
Álvaro Herrera                PostgreSQL Expert, https://www.2ndQuadrant.com/

Attachment: 0001-use-an-LWLock-rather-than-spinlock-for-Logwrt-Result.patch (9K)

Re: LogwrtResult contended spinlock

Andres Freund
Hi,

On August 31, 2020 11:21:56 AM PDT, Alvaro Herrera <[hidden email]> wrote:

>Jaime Casanova recently reported a situation where pglogical
>replicating
>from 64 POS sites to a single central (64-core) node, each with two
>replication sets, causes XLog's info_lck to become highly contended
>because of frequently reading LogwrtResult.  We tested the simple fix
>of
>adding a new LWLock that protects LogwrtResult and LogwrtRqst; that
>seems to solve the problem easily enough.
>
>At first I wanted to make the new LWLock cover only LogwrtResult
>proper,
>and leave LogwrtRqst alone.  However on doing it, it seemed that that
>might change the locking protocol in a nontrivial way.  So I decided to
>make it cover both and call it a day.  We did verify that the patch
>solves the reported problem, at any rate.

Wouldn't the better fix here be to allow reading of individual members without a lock? E.g. by wrapping each in a 64bit atomic.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: LogwrtResult contended spinlock

Alvaro Herrera-9
On 2020-Aug-31, Andres Freund wrote:

> Hi,
>
> On August 31, 2020 11:21:56 AM PDT, Alvaro Herrera <[hidden email]> wrote:

> >At first I wanted to make the new LWLock cover only LogwrtResult
> >proper,
> >and leave LogwrtRqst alone.  However on doing it, it seemed that that
> >might change the locking protocol in a nontrivial way.  So I decided to
> >make it cover both and call it a day.  We did verify that the patch
> >solves the reported problem, at any rate.
>
> Wouldn't the better fix here be to allow reading of individual members
> without a lock? E.g. by wrapping each in a 64bit atomic.

Heh, Simon said the same.  It's not clear to me due to the lack of
general availability of 64-bit atomics.  If they are spinlock-protected
when emulated, I think that would make the problem worse.

IIRC Thomas wanted to start relying on atomic 64-bit vars in some patch,
but I don't remember what it was.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: LogwrtResult contended spinlock

Andres Freund
Hi,

On August 31, 2020 11:34:45 AM PDT, Alvaro Herrera <[hidden email]> wrote:

>On 2020-Aug-31, Andres Freund wrote:
>
>> Hi,
>>
>> On August 31, 2020 11:21:56 AM PDT, Alvaro Herrera
><[hidden email]> wrote:
>
>> >At first I wanted to make the new LWLock cover only LogwrtResult
>> >proper,
>> >and leave LogwrtRqst alone.  However on doing it, it seemed that
>that
>> >might change the locking protocol in a nontrivial way.  So I decided
>to
>> >make it cover both and call it a day.  We did verify that the patch
>> >solves the reported problem, at any rate.
>>
>> Wouldn't the better fix here be to allow reading of individual
>members
>> without a lock? E.g. by wrapping each in a 64bit atomic.
>
>Heh, Simon said the same.  It's not clear to me due to the lack of
>general availability of 64-bit atomics.  If they are spinlock-protected
>when emulated, I think that would make the problem worse.
>
>IIRC Thomas wanted to start relying on atomic 64-bit vars in some
>patch,
>but I don't remember what it was.

All relevant platforms have 64bit atomics. So I don't think there's much point in worrying about the emulated performance. Correctness, sure. Performance, not so much.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: LogwrtResult contended spinlock

Alvaro Herrera-9
Looking at patterns like this

        if (XLogCtl->LogwrtRqst.Write < EndPos)
                XLogCtl->LogwrtRqst.Write = EndPos;

It seems possible to implement with

    do {
        XLogRecPtr  currwrite;

        currwrite = pg_atomic_read_u64(LogwrtRqst.Write);
        if (currwrite > EndPos)
            break;  // already done by somebody else
        if (pg_atomic_compare_exchange_u64(LogwrtRqst.Write,
                                           currwrite, EndPos))
            break;  // successfully updated
    } while (true);

This assumes that LogwrtRqst.Write never goes backwards, so it doesn't
seem good material for a general routine.

This *seems* correct to me, though this is muddy territory to me.  Also,
are there better ways to go about this?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: LogwrtResult contended spinlock

Andres Freund
Hi,

On 2020-09-03 14:34:52 -0400, Alvaro Herrera wrote:

> Looking at patterns like this
>
> if (XLogCtl->LogwrtRqst.Write < EndPos)
> XLogCtl->LogwrtRqst.Write = EndPos;
>
> It seems possible to implement with
>
>     do {
>     XLogRecPtr currwrite;
>
>         currwrite = pg_atomic_read_u64(LogwrtRqst.Write);
> if (currwrite > EndPos)
>             break;  // already done by somebody else
>         if (pg_atomic_compare_exchange_u64(LogwrtRqst.Write,
>                                   currwrite, EndPos))
>             break;  // successfully updated
>     } while (true);
>
> This assumes that LogwrtRqst.Write never goes backwards, so it doesn't
> seem good material for a general routine.
>
> This *seems* correct to me, though this is muddy territory to me.  Also,
> are there better ways to go about this?

Hm, I was thinking that we'd first go for reading it without a spinlock,
but continuing to write it as we currently do.

But yea, I can't see an issue with what you propose here. I personally
find do {} while () weird and avoid it when not explicitly useful, but
that's extremely minor, obviously.

Greetings,

Andres Freund



Re: LogwrtResult contended spinlock

Andres Freund
Hi,

On 2020-09-04 10:05:45 -0700, Andres Freund wrote:

> On 2020-09-03 14:34:52 -0400, Alvaro Herrera wrote:
> > Looking at patterns like this
> >
> > if (XLogCtl->LogwrtRqst.Write < EndPos)
> > XLogCtl->LogwrtRqst.Write = EndPos;
> >
> > It seems possible to implement with
> >
> >     do {
> >     XLogRecPtr currwrite;
> >
> >         currwrite = pg_atomic_read_u64(LogwrtRqst.Write);
> > if (currwrite > EndPos)
> >             break;  // already done by somebody else
> >         if (pg_atomic_compare_exchange_u64(LogwrtRqst.Write,
> >                                   currwrite, EndPos))
> >             break;  // successfully updated
> >     } while (true);
> >
> > This assumes that LogwrtRqst.Write never goes backwards, so it doesn't
> > seem good material for a general routine.
> >
> > This *seems* correct to me, though this is muddy territory to me.  Also,
> > are there better ways to go about this?
>
> Hm, I was thinking that we'd first go for reading it without a spinlock,
> but continuing to write it as we currently do.
>
> But yea, I can't see an issue with what you propose here. I personally
> find do {} while () weird and avoid it when not explicitly useful, but
> that's extremely minor, obviously.

Re general routine: On second thought, it might actually be worth having
it.  Even just for LSNs - there's plenty of places where it's useful to
ensure a variable is at least a certain size.  I think I would be in
favor of a general helper function.

Greetings,

Andres Freund



Re: LogwrtResult contended spinlock

Anastasia Lubennikova
On 04.09.2020 20:13, Andres Freund wrote:

> Hi,
>
> On 2020-09-04 10:05:45 -0700, Andres Freund wrote:
>> On 2020-09-03 14:34:52 -0400, Alvaro Herrera wrote:
>>> Looking at patterns like this
>>>
>>> if (XLogCtl->LogwrtRqst.Write < EndPos)
>>> XLogCtl->LogwrtRqst.Write = EndPos;
>>>
>>> It seems possible to implement with
>>>
>>>      do {
>>>       XLogRecPtr currwrite;
>>>
>>>          currwrite = pg_atomic_read_u64(LogwrtRqst.Write);
>>> if (currwrite > EndPos)
>>>              break;  // already done by somebody else
>>>          if (pg_atomic_compare_exchange_u64(LogwrtRqst.Write,
>>>                                   currwrite, EndPos))
>>>              break;  // successfully updated
>>>      } while (true);
>>>
>>> This assumes that LogwrtRqst.Write never goes backwards, so it doesn't
>>> seem good material for a general routine.
>>>
>>> This *seems* correct to me, though this is muddy territory to me.  Also,
>>> are there better ways to go about this?
>> Hm, I was thinking that we'd first go for reading it without a spinlock,
>> but continuing to write it as we currently do.
>>
>> But yea, I can't see an issue with what you propose here. I personally
>> find do {} while () weird and avoid it when not explicitly useful, but
>> that's extremely minor, obviously.
> Re general routine: On second thought, it might actually be worth having
> it. Even just for LSNs - there's plenty places where it's useful to
> ensure a variable is at least a certain size.  I think I would be in
> favor of a general helper function.
Do you mean by general helper function something like this?

void
swap_lsn(XLogRecPtr old_value, XLogRecPtr new_value, bool to_largest)
{
   while (true) {
     XLogRecPtr  currwrite;

     currwrite = pg_atomic_read_u64(old_value);

     if (to_largest)
     {
       if (currwrite > new_value)
         break;  /* already done by somebody else */
     }
     else
     {
       if (currwrite < new_value)
         break;  /* already done by somebody else */
     }

     if (pg_atomic_compare_exchange_u64(old_value,
                        currwrite, new_value))
       break;  /* successfully updated */
   }
}


which will be called like
swap_lsn(XLogCtl->LogwrtRqst.Write, EndPos, true);

>
> Greetings,
>
> Andres Freund

This CF entry was inactive for a while. Alvaro, are you going to
continue working on it?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: LogwrtResult contended spinlock

Álvaro Herrera
On 2020-Nov-24, Anastasia Lubennikova wrote:

> On 04.09.2020 20:13, Andres Freund wrote:

> > Re general routine: On second thought, it might actually be worth having
> > it. Even just for LSNs - there's plenty places where it's useful to
> > ensure a variable is at least a certain size.  I think I would be in
> > favor of a general helper function.
> Do you mean by general helper function something like this?
>
> void
> swap_lsn(XLogRecPtr old_value, XLogRecPtr new_value, bool to_largest)

Something like that, yeah, though maybe name it "pg_atomic_increase_lsn"
or some similar name that makes it clear that

1. it is supposed to use atomics
2. it can only be used to *advance* a value rather than a generic swap.

(I'm not 100% clear that that's the exact API we need.)

> This CF entry was inactive for a while. Alvaro, are you going to continue
> working on it?

Yes, please move it forward.  I'll post an update sometime before the
next CF.



Re: LogwrtResult contended spinlock

Masahiko Sawada
Hi Alvaro,

On Wed, Nov 25, 2020 at 12:02 AM Alvaro Herrera <[hidden email]> wrote:

>
> On 2020-Nov-24, Anastasia Lubennikova wrote:
>
> > On 04.09.2020 20:13, Andres Freund wrote:
>
> > > Re general routine: On second thought, it might actually be worth having
> > > it. Even just for LSNs - there's plenty places where it's useful to
> > > ensure a variable is at least a certain size.  I think I would be in
> > > favor of a general helper function.
> > Do you mean by general helper function something like this?
> >
> > void
> > swap_lsn(XLogRecPtr old_value, XLogRecPtr new_value, bool to_largest)
>
> Something like that, yeah, though maybe name it "pg_atomic_increase_lsn"
> or some similar name that makes it clear that
>
> 1. it is supposed to use atomics
> 2. it can only be used to *advance* a value rather than a generic swap.
>
> (I'm not 100% clear that that's the exact API we need.)
>
> > This CF entry was inactive for a while. Alvaro, are you going to continue
> > working on it?
>
> Yes, please move it forward.  I'll post an update sometime before the
> next CF.

Any update on this?

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



Re: LogwrtResult contended spinlock

Álvaro Herrera
On 2021-Jan-22, Masahiko Sawada wrote:

> On Wed, Nov 25, 2020 at 12:02 AM Alvaro Herrera <[hidden email]> wrote:

> > Yes, please move it forward.  I'll post an update sometime before the
> > next CF.
>
> Any update on this?

I'll update this one early next week.

Thanks!

--
Álvaro Herrera       Valdivia, Chile
"I would rather have GNU than GNOT."  (ccchips, lwn.net/Articles/37595/)



Re: LogwrtResult contended spinlock

Masahiko Sawada
On Fri, Jan 22, 2021 at 10:39 PM Alvaro Herrera <[hidden email]> wrote:

>
> On 2021-Jan-22, Masahiko Sawada wrote:
>
> > On Wed, Nov 25, 2020 at 12:02 AM Alvaro Herrera <[hidden email]> wrote:
>
> > > Yes, please move it forward.  I'll post an update sometime before the
> > > next CF.
> >
> > Anything update on this?
>
> I'll update this one early next week.

Great, thanks! I'll look at that.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



Re: LogwrtResult contended spinlock

Alvaro Herrera-9
In reply to this post by Andres Freund
On 2020-Aug-31, Andres Freund wrote:

> Wouldn't the better fix here be to allow reading of individual members without a lock? E.g. by wrapping each in a 64bit atomic.

So I've been playing with this and I'm annoyed about having two
datatypes to represent Write/Flush positions:

typedef struct XLogwrtRqst
{
        XLogRecPtr Write; /* last byte + 1 to write out */
        XLogRecPtr Flush; /* last byte + 1 to flush */
} XLogwrtRqst;

typedef struct XLogwrtResult
{
        XLogRecPtr Write; /* last byte + 1 written out */
        XLogRecPtr Flush; /* last byte + 1 flushed */
} XLogwrtResult;

Don't they look, um, quite similar?  I am strongly tempted to remove
that distinction, since it seems quite pointless, and introduce a
different one:

typedef struct XLogwrtAtomic
{
        pg_atomic_uint64 Write; /* last byte + 1 of write position */
        pg_atomic_uint64 Flush; /* last byte + 1 of flush position */
} XLogwrtAtomic;

this one, with atomics, would be used for the XLogCtl struct members
LogwrtRqst and LogwrtResult, and are always accessed using atomic ops.
On the other hand we would have

typedef struct XLogwrt
{
        XLogRecPtr Write; /* last byte + 1 of write position */
        XLogRecPtr Flush; /* last byte + 1 of flush position */
} XLogwrt;

to be used for the local-memory-only LogwrtResult, using normal
assignment.

Now, I do wonder if there's a point in keeping LogwrtResult as a local
variable at all; maybe since the ones in shared memory are going to use
unlocked access, we don't need it anymore?  I'd prefer to defer that
decision to after this patch is done, since ISTM that it'd merit more
careful benchmarking.

Thoughts?

--
Álvaro Herrera       Valdivia, Chile



Re: LogwrtResult contended spinlock

Alvaro Herrera-9
So I tried this, but -- perhaps not surprisingly -- I can't get it to
work properly; the synchronization fails.  I suspect I need some
barriers, but I tried adding a few (probably some that are not really
necessary) and that didn't have the expected effect.  Strangely, all
tests work for me, but the pg_upgrade one in particular fails.

(The attached is of course POC quality at best.)

I'll have another look next week.

--
Álvaro Herrera                            39°49'30"S 73°17'W

Attachment: v2-0001-add-pg_atomic_monotonic_advance_u64.patch (1K)
Attachment: v2-0002-atomics-in-xlog.c.patch (12K)
Attachment: v2-0003-add-barriers.patch (3K)

Re: LogwrtResult contended spinlock

Andres Freund
In reply to this post by Alvaro Herrera-9
Hi,


On 2021-01-29 12:40:18 -0300, Alvaro Herrera wrote:

> On 2020-Aug-31, Andres Freund wrote:
>
> > Wouldn't the better fix here be to allow reading of individual members without a lock? E.g. by wrapping each in a 64bit atomic.
>
> So I've been playing with this and I'm annoyed about having two
> datatypes to represent Write/Flush positions:
>
> typedef struct XLogwrtRqst
> {
> XLogRecPtr Write; /* last byte + 1 to write out */
> XLogRecPtr Flush; /* last byte + 1 to flush */
> } XLogwrtRqst;
>
> typedef struct XLogwrtResult
> {
> XLogRecPtr Write; /* last byte + 1 written out */
> XLogRecPtr Flush; /* last byte + 1 flushed */
> } XLogwrtResult;
>
> Don't they look, um, quite similar?  I am strongly tempted to remove
> that distinction, since it seems quite pointless, and introduce a
> different one:

Their spelling drives me nuts. Like, one is *Rqst, the other *Result?
Come on.


> Now, I do wonder if there's a point in keeping LogwrtResult as a local
> variable at all; maybe since the ones in shared memory are going to use
> unlocked access, we don't need it anymore?  I'd prefer to defer that
> decision to after this patch is done, since ISTM that it'd merit more
> careful benchmarking.

I think doing that might be a bit harmful - we update LogwrtResult
fairly granularly in XLogWrite(). Doing that in those small steps in
shared memory will increase the likelihood of cache misses in other
backends.

Greetings,

Andres Freund



Re: LogwrtResult contended spinlock

Andres Freund
In reply to this post by Alvaro Herrera-9
Hi,

On 2021-01-29 22:47:56 -0300, Alvaro Herrera wrote:
> So I tried this, but -- perhaps not surprisingly -- I can't get it to
> work properly; the synchronization fails.

What do you mean by "synchronization fails"?


> Strangely, all tests work for me, but the pg_upgrade one in particular
> fails.

It's one of the few tests using fsync=on.


> +static inline void
> +pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)
> +{
> +#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
> + AssertPointerAlignment(ptr, 8);
> +#endif
> + /* FIXME is this algorithm correct if we have u64 simulation? */

I don't see a problem.


> + while (true)
> + {
> + uint64 currval;
> +
> + currval = pg_atomic_read_u64(ptr);
> + if (currval > target_)
> + break; /* already done by somebody else */
> + if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
> + break; /* successfully done */
> + }
> +}

I suggest writing this as

    currval = pg_atomic_read_u64(ptr);
    while (currval < target_)
    {
        if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
            break;
    }

>  /*
>   * Inserting to WAL is protected by a small fixed number of WAL insertion
> @@ -596,8 +599,10 @@ typedef struct XLogCtlData
>  {
>   XLogCtlInsert Insert;
>  
> + XLogwrtAtomic LogwrtRqst;
> + XLogwrtAtomic LogwrtResult;
> +
>   /* Protected by info_lck: */
> - XLogwrtRqst LogwrtRqst;

Not sure putting these into the same cacheline is a good idea.


> @@ -2166,12 +2163,11 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
>   if (opportunistic)
>   break;
>  
> - /* Before waiting, get info_lck and update LogwrtResult */
> - SpinLockAcquire(&XLogCtl->info_lck);
> - if (XLogCtl->LogwrtRqst.Write < OldPageRqstPtr)
> - XLogCtl->LogwrtRqst.Write = OldPageRqstPtr;
> - LogwrtResult = XLogCtl->LogwrtResult;
> - SpinLockRelease(&XLogCtl->info_lck);
> + /* Before waiting, update LogwrtResult */
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, OldPageRqstPtr);
> +
> + LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
> + LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);

I don't think it's quite as easy as this. Write/Flush now aren't
guaranteed to be coherent with each other - previously they were. And
because it's in a global variable used everywhere, we can't just be more
careful about update protocols in one place...

We also shouldn't re-read a variable that we just did via the
pg_atomic_monotonic_advance_u64().

I think we should stop updating both the Write/Flush position at the
same time. That way we don't have an expectation of them being coherent
with each other. Most places really don't need both anyway.


>   {
> - SpinLockAcquire(&XLogCtl->info_lck);
> - XLogCtl->LogwrtResult = LogwrtResult;
> - if (XLogCtl->LogwrtRqst.Write < LogwrtResult.Write)
> - XLogCtl->LogwrtRqst.Write = LogwrtResult.Write;
> - if (XLogCtl->LogwrtRqst.Flush < LogwrtResult.Flush)
> - XLogCtl->LogwrtRqst.Flush = LogwrtResult.Flush;
> - SpinLockRelease(&XLogCtl->info_lck);
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Write, LogwrtResult.Write);
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Flush, LogwrtResult.Flush);
> +
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, LogwrtResult.Write);
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Flush, LogwrtResult.Flush);

Hm. We went from one cheap atomic operation (SpinLockAcquire) to four
expensive ones in the happy path. That's not free...

I don't think we need to manipulate XLogCtl->LogwrtResult.* using atomic
ops - they can only be updated with WALWriteLock held, right?

XLogCtl->LogwrtResult was updated with plain assignment before, why did
you change it to pg_atomic_monotonic_advance_u64()?


> @@ -1169,6 +1169,8 @@ XLogInsertRecord(XLogRecData *rdata,
>   {
>   /* advance global request to include new block(s) */
>   pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, EndPos);
> + pg_memory_barrier();
> +

That's not really useful - the path that actually updates already
implies a barrier. It'd probably be better to add a barrier to a "never
executed cmpxchg" fastpath.



> @@ -2905,6 +2909,7 @@ XLogFlush(XLogRecPtr record)
>   WriteRqstPtr = pg_atomic_read_u64(&XLogCtl->LogwrtRqst.Write);
>   LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
>   LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
> + pg_read_barrier();

I'm not sure you really can get away with just a read barrier in these
places. We can't e.g. have later updates to other shared memory
variables be "moved" to before the barrier (which a read barrier
allows).

Greetings,

Andres Freund



Re: LogwrtResult contended spinlock

Alvaro Herrera-9
Hello,

So I addressed about half of your comments in this version merely by
fixing silly bugs.  The problem I had which I described as
"synchronization fails" was one of those silly bugs.

On further thought, it seems simpler to make only LogwrtResult atomic
and leave LogwrtRqst as it is now, protected by the spinlock.  This
should solve the contention problem we saw at the customer (but I've
asked Jaime very nicely to do a test run, if possible, to confirm).

For things like logical replication, which call GetFlushRecPtr() very
frequently (causing the spinlock issue we saw) it is good, because we're
no longer hitting the spinlock at all in that case.

I have another (pretty mechanical) patch that renames LogwrtResult.Write
to LogWriteResult and LogwrtResult.Flush to LogFlushResult.  That more
clearly shows that we're no longer updating them in unison.  I didn't
want to attach it here because I haven't rebased it on the current
version.  But it seems logical: there's no longer any point in doing
struct assignment, which is the only thing that struct was good for.


On 2021-Jan-29, Andres Freund wrote:

> > @@ -1169,6 +1169,8 @@ XLogInsertRecord(XLogRecData *rdata,
> >   {
> >   /* advance global request to include new block(s)
> >   */
> >   pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, EndPos);
> > + pg_memory_barrier();
>
> That's not really useful - the path that actually updates already
> implies a barrier. It'd probably be better to add a barrier to a "never
> executed cmpxchg" fastpath.

Got it.  Do you mean in pg_atomic_monotonic_advance_u64() itself?  I'm
not sure which is the nicer semantics.  (If it's got to be at the
caller, then we'll need to return a boolean from there, which sounds
worse.)

> > @@ -2905,6 +2909,7 @@ XLogFlush(XLogRecPtr record)
> >   WriteRqstPtr = pg_atomic_read_u64(&XLogCtl->LogwrtRqst.Write);
> >   LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
> >   LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
> > + pg_read_barrier();
>
> I'm not sure you really can get away with just a read barrier in these
> places. We can't e.g. have later updates to other shared memory
> variables be "moved" to before the barrier (which a read barrier
> allows).

Ah, that makes sense.

I have not really studied the barrier locations terribly closely in this
version of the patch.  It probably misses some (eg. in GetFlushRecPtr
and GetXLogWriteRecPtr).  It is passing the tests for me, but alone
that's probably not enough.  I'm gonna try and study the generated
assembly and see if I can make sense of things ...

--
Álvaro Herrera                            39°49'30"S 73°17'W

Attachment: 0001-add-pg_atomic_monotonic_advance_u64.patch (1K)
Attachment: 0002-make-LogwrtResult-atomic.patch (9K)

Re: LogwrtResult contended spinlock

Andres Freund
Hi,

On 2021-02-02 20:19:19 -0300, Alvaro Herrera wrote:

> > > @@ -1169,6 +1169,8 @@ XLogInsertRecord(XLogRecData *rdata,
> > >   {
> > >   /* advance global request to include new block(s)
> > >   */
> > >   pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, EndPos);
> > > + pg_memory_barrier();
> >
> > That's not really useful - the path that actually updates already
> > implies a barrier. It'd probably be better to add a barrier to a "never
> > executed cmpxchg" fastpath.
>
> Got it.  Do you mean in pg_atomic_monotonic_advance_u64() itself?

Yes.


> I'm not sure which is the nicer semantics.  (If it's got to be at the
> caller, then we'll need to return a boolean from there, which sounds
> worse.)

Nearly all other modifying atomic operations have full barrier
semantics, so I think it'd be better to have it inside the
pg_atomic_monotonic_advance_u64().


> +/*
> + * Monotonically advance the given variable using only atomic operations until
> + * it's at least the target value.
> + */
> +static inline void
> +pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)
> +{
> + uint64 currval;
> +
> +#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
> + AssertPointerAlignment(ptr, 8);
> +#endif
> +
> + currval = pg_atomic_read_u64(ptr);
> + while (currval < target_)
> + {
> + if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
> + break;
> + }
> +}

So I think it'd be

static inline void
pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)
{
    uint64      currval;

#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
    AssertPointerAlignment(ptr, 8);
#endif

    currval = pg_atomic_read_u64(ptr);
    if (currval >= target_)
    {
        pg_memory_barrier();
        return;
    }

    while (currval < target_)
    {
        if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
            break;
    }
}


> @@ -1172,9 +1170,10 @@ XLogInsertRecord(XLogRecData *rdata,
>   /* advance global request to include new block(s) */
>   if (XLogCtl->LogwrtRqst.Write < EndPos)
>   XLogCtl->LogwrtRqst.Write = EndPos;
> - /* update local result copy while I have the chance */
> - LogwrtResult = XLogCtl->LogwrtResult;
>   SpinLockRelease(&XLogCtl->info_lck);
> + /* update local result copy */
> + LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
> + LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
>   }

As mentioned before - it's not clear to me why this is a valid thing to
do without verifying all LogwrtResult.* usages. You can get updates
completely out of order / independently.


> @@ -2675,8 +2673,10 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
>   * code in a couple of places.
>   */
>   {
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Write, LogwrtResult.Write);
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Flush, LogwrtResult.Flush);
> + pg_memory_barrier();
>   SpinLockAcquire(&XLogCtl->info_lck);
> - XLogCtl->LogwrtResult = LogwrtResult;

I still don't see why we need "locked" atomic operations here, rather
than just a pg_atomic_write_u64(). They can only be modified
with WALWriteLock held.  There's two reasons for using a spinlock in
this place:
1) it avoids torn reads of 64bit values -
   pg_atomic_write_u64()/pg_atomic_read_u64() avoid that already.
2) it ensures that Write/Flush are updated in unison - but that's not
   useful anymore, given that other places now read the variables
   separately.


> @@ -3064,8 +3063,10 @@ XLogBackgroundFlush(void)
>   WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
>
>   /* if we have already flushed that far, consider async commit records */
> + LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
>   if (WriteRqst.Write <= LogwrtResult.Flush)
>   {
> + pg_memory_barrier();
>   SpinLockAcquire(&XLogCtl->info_lck);
>   WriteRqst.Write = XLogCtl->asyncXactLSN;
>   SpinLockRelease(&XLogCtl->info_lck);

A SpinLockAcquire() is a full memory barrier on its own I think. This'd
probably better solved by just making asyncXactLSN atomic...


Greetings,

Andres Freund