Moving relation extension locks out of heavyweight lock manager


Moving relation extension locks out of heavyweight lock manager

Masahiko Sawada
Hi all,

Currently, the relation extension lock is implemented using the
heavyweight lock manager, and almost all functions (except for
brin_page_cleanup) that call LockRelationForExtension use it in
ExclusiveLock mode. But it doesn't actually need multiple lock modes,
deadlock detection, or any of the other functionality that the
heavyweight lock manager provides. I think it's enough to use
something like an LWLock. So I'd like to propose changing relation
extension lock management so that it works using LWLock instead.

The attached draft patch makes relation extension locks use LWLocks
rather than the heavyweight lock manager, backed by a shared hash table
that stores the state of each relation extension lock. The basic idea is
to add a hash table in shared memory for relation extension locks, where
each hash entry is an LWLock struct. Whenever a process wants to acquire
a relation extension lock, it looks up the appropriate LWLock entry in
the hash table and acquires it. When unlocking, the process can remove
the hash entry if nobody else is holding or waiting on it.
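
For illustration, here is a minimal sketch of this idea. All names here
are invented for this example (they are not the patch's actual code),
and a real implementation also needs a partition lock around the hash
operations plus reference counting before an entry may be removed:

/* Hypothetical shapes, for illustration only. */
typedef struct RelExtLockTag
{
    Oid         dbid;           /* database OID */
    Oid         relid;          /* relation OID */
} RelExtLockTag;

typedef struct RelExtLockEntry
{
    RelExtLockTag tag;          /* hash key; must be first for dynahash */
    LWLock      lock;           /* embedded lock guarding extension */
    int         refcount;       /* holders + waiters, so we know when
                                 * the entry can be removed safely */
} RelExtLockEntry;

static HTAB *RelExtLockHash;    /* created with ShmemInitHash() at startup */

static void
RelExtLockAcquire(Relation rel)
{
    RelExtLockTag tag = {rel->rd_node.dbNode, rel->rd_id};
    RelExtLockEntry *entry;
    bool        found;

    /* A real implementation must hold a partition lock around this. */
    entry = hash_search(RelExtLockHash, &tag, HASH_ENTER, &found);
    if (!found)
    {
        LWLockInitialize(&entry->lock, 0);  /* tranche id placeholder */
        entry->refcount = 0;
    }
    entry->refcount++;
    LWLockAcquire(&entry->lock, LW_EXCLUSIVE);
}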

This work would be helpful not only for existing workloads but also for
future work such as parallel utility commands, which are discussed on
other threads[1]. At least for parallel vacuum, this feature helps to
solve an issue that the current parallel vacuum implementation has.

I ran pgbench for 10 minutes, three times (scale factor 5000); here are
the performance measurement results.

clients   TPS(HEAD)   TPS(Patched)
4         2092.612    2031.277
8         3153.732    3046.789
16        4562.072    4625.419
32        6439.391    6479.526
64        7767.364    7779.636
100       7917.173    7906.567

* 16 core Xeon E5620 2.4GHz
* 32 GB RAM
* ioDrive

With the current implementation, there seems to be no performance
degradation so far. Please give me feedback.

[1]
* Block level parallel vacuum WIP
   <https://www.postgresql.org/message-id/CAD21AoD1xAqp4zK-Vi1cuY3feq2oO8HcpJiz32UDUfe0BE31Xw%40mail.gmail.com>
* CREATE TABLE with parallel workers, 10.0?
  <https://www.postgresql.org/message-id/CAFBoRzeoDdjbPV4riCE%2B2ApV%2BY8nV4HDepYUGftm5SuKWna3rQ%40mail.gmail.com>
* utility commands benefiting from parallel plan
  <https://www.postgresql.org/message-id/CAJrrPGcY3SZa40vU%2BR8d8dunXp9JRcFyjmPn2RF9_4cxjHd7uA%40mail.gmail.com>

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Attachment: Moving_extension_lock_out_of_heavyweight_lock_v1.patch (45K)

Re: Moving relation extension locks out of heavyweight lock manager

Robert Haas
On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada <[hidden email]> wrote:
> Currently, the relation extension lock is implemented using
> heavyweight lock manager and almost all functions (except for
> brin_page_cleanup) using LockRelationForExtension use it with
> ExclusiveLock mode. But actually it doesn't need multiple lock modes
> or deadlock detection or any of the other functionality that the
> heavyweight lock manager provides. I think it's enough to use
> something like LWLock. So I'd like to propose to change relation
> extension lock management so that it works using LWLock instead.

That's not a good idea because it'll make the code that executes while
holding that lock noninterruptible.

Possibly something based on condition variables would work better.
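
For context, PostgreSQL's condition variables
(storage/condition_variable.h) keep waiters interruptible, because
ConditionVariableSleep() checks for interrupts while waiting. A
speculative sketch of an extension lock built on one (the struct and the
atomic lock word are invented for this example, and a real patch would
define its own wait event instead of the 0 placeholder):

#include "port/atomics.h"
#include "storage/condition_variable.h"

/* Illustrative shared state, not from any patch. */
typedef struct RelExtLock
{
    pg_atomic_uint32 locked;    /* 0 = free, 1 = held */
    ConditionVariable cv;       /* broadcast on release */
} RelExtLock;

static void
RelExtLockAcquire(RelExtLock *l)
{
    /* Canonical condition-variable pattern: prepare, test, sleep, retest. */
    ConditionVariablePrepareToSleep(&l->cv);
    while (pg_atomic_exchange_u32(&l->locked, 1) != 0)
        ConditionVariableSleep(&l->cv, 0);      /* wait-event placeholder */
    ConditionVariableCancelSleep();
}

static void
RelExtLockRelease(RelExtLock *l)
{
    pg_atomic_write_u32(&l->locked, 0);
    ConditionVariableBroadcast(&l->cv);
}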

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Moving relation extension locks out of heavyweight lock manager

Tom Lane
Robert Haas <[hidden email]> writes:
> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada <[hidden email]> wrote:
>> ... I'd like to propose to change relation
>> extension lock management so that it works using LWLock instead.

> That's not a good idea because it'll make the code that executes while
> holding that lock noninterruptible.

Is that really a problem?  We typically only hold it over one kernel call,
which ought to be noninterruptible anyway.  Also, the CheckpointLock is
held for far longer, and we've not heard complaints about that one.

I'm slightly suspicious of the claim that we don't need deadlock
detection.  There are places that e.g. touch FSM while holding this
lock.  It might be all right but it needs close review, not just an
assertion that it's not a problem.

                        regards, tom lane



Re: Moving relation extension locks out of heavyweight lock manager

Amit Kapila
On Thu, May 11, 2017 at 6:09 AM, Masahiko Sawada <[hidden email]> wrote:

> This work would be helpful not only for existing workload but also
> future works like some parallel utility commands, which is discussed
> on other threads[1]. At least for parallel vacuum, this feature helps
> to solve issue that the implementation of parallel vacuum has.
>
> I ran pgbench for 10 min three times(scale factor is 5000), here is a
> performance measurement result.
>
> clients   TPS(HEAD)   TPS(Patched)
> 4           2092.612       2031.277
> 8           3153.732       3046.789
> 16         4562.072       4625.419
> 32         6439.391       6479.526
> 64         7767.364       7779.636
> 100       7917.173       7906.567
>
> * 16 core Xeon E5620 2.4GHz
> * 32 GB RAM
> * ioDrive
>
> In current implementation, it seems there is no performance degradation so far.
>

I think it is good to check pgbench, but we should also test bulk
loads, as this lock is stressed during such workloads.  Some of the
tests we did when improving the performance of bulk loading can be
found in an e-mail [1].

[1] -
https://www.postgresql.org/message-id/CAFiTN-tkX6gs-jL8VrPxg6OG9VUAKnObUq7r7pWQqASzdF5OwA%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Moving relation extension locks out of heavyweight lock manager

Amit Kapila
On Fri, May 12, 2017 at 9:14 AM, Tom Lane <[hidden email]> wrote:

> Robert Haas <[hidden email]> writes:
>> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada <[hidden email]> wrote:
>>> ... I'd like to propose to change relation
>>> extension lock management so that it works using LWLock instead.
>
>> That's not a good idea because it'll make the code that executes while
>> holding that lock noninterruptible.
>
> Is that really a problem?  We typically only hold it over one kernel call,
> which ought to be noninterruptible anyway.
>

During parallel bulk load operations, I think we hold it over multiple
kernel calls.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Moving relation extension locks out of heavyweight lock manager

Masahiko Sawada
On Sat, May 13, 2017 at 8:19 PM, Amit Kapila <[hidden email]> wrote:

> On Thu, May 11, 2017 at 6:09 AM, Masahiko Sawada <[hidden email]> wrote:
>> This work would be helpful not only for existing workload but also
>> future works like some parallel utility commands, which is discussed
>> on other threads[1]. At least for parallel vacuum, this feature helps
>> to solve issue that the implementation of parallel vacuum has.
>>
>> I ran pgbench for 10 min three times(scale factor is 5000), here is a
>> performance measurement result.
>>
>> clients   TPS(HEAD)   TPS(Patched)
>> 4           2092.612       2031.277
>> 8           3153.732       3046.789
>> 16         4562.072       4625.419
>> 32         6439.391       6479.526
>> 64         7767.364       7779.636
>> 100       7917.173       7906.567
>>
>> * 16 core Xeon E5620 2.4GHz
>> * 32 GB RAM
>> * ioDrive
>>
>> In current implementation, it seems there is no performance degradation so far.
>>
>
> I think it is good to check pgbench, but we should do tests of the
> bulk load as this lock is stressed during such a workload.  Some of
> the tests we have done when we have improved the performance of bulk
> load can be found in an e-mail [1].
>

Thank you for sharing.

I've measured using the two test scripts attached on that thread. Here
are the results.

* Copy test script
Clients   HEAD      Patched
4         452.60    455.53
8         561.74    561.09
16        592.50    592.21
32        602.53    599.53
64        605.01    606.42

* Insert test script
Clients   HEAD      Patched
4         159.04    158.44
8         169.41    169.69
16        177.11    178.14
32        182.14    181.99
64        182.11    182.73

It seems there is no performance degradation so far.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Moving relation extension locks out of heavyweight lock manager

Robert Haas
On Sat, May 13, 2017 at 7:27 AM, Amit Kapila <[hidden email]> wrote:

> On Fri, May 12, 2017 at 9:14 AM, Tom Lane <[hidden email]> wrote:
>> Robert Haas <[hidden email]> writes:
>>> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada <[hidden email]> wrote:
>>>> ... I'd like to propose to change relation
>>>> extension lock management so that it works using LWLock instead.
>>
>>> That's not a good idea because it'll make the code that executes while
>>> holding that lock noninterruptible.
>>
>> Is that really a problem?  We typically only hold it over one kernel call,
>> which ought to be noninterruptible anyway.
>
> During parallel bulk load operations, I think we hold it over multiple
> kernel calls.

We do.  Also, RelationGetNumberOfBlocks() is not necessarily only one
kernel call, no?  Nor is vm_extend.

Also, it's not just the backend doing the filesystem operation that's
non-interruptible, but also any waiters, right?

Maybe this isn't a big problem, but it does seem to me that it would
be better to avoid it if we can.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Moving relation extension locks out of heavyweight lock manager

Masahiko Sawada
On Wed, May 17, 2017 at 1:30 AM, Robert Haas <[hidden email]> wrote:

> On Sat, May 13, 2017 at 7:27 AM, Amit Kapila <[hidden email]> wrote:
>> On Fri, May 12, 2017 at 9:14 AM, Tom Lane <[hidden email]> wrote:
>>> Robert Haas <[hidden email]> writes:
>>>> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada <[hidden email]> wrote:
>>>>> ... I'd like to propose to change relation
>>>>> extension lock management so that it works using LWLock instead.
>>>
>>>> That's not a good idea because it'll make the code that executes while
>>>> holding that lock noninterruptible.
>>>
>>> Is that really a problem?  We typically only hold it over one kernel call,
>>> which ought to be noninterruptible anyway.
>>
>> During parallel bulk load operations, I think we hold it over multiple
>> kernel calls.
>
> We do.  Also, RelationGetNumberOfBlocks() is not necessarily only one
> kernel call, no?  Nor is vm_extend.

Yeah, these functions could make more than one kernel call while
holding the extension lock.

> Also, it's not just the backend doing the filesystem operation that's
> non-interruptible, but also any waiters, right?
>
> Maybe this isn't a big problem, but it does seem to be that it would
> be better to avoid it if we can.
>

I agree with changing it to be interruptible, for more safety.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Moving relation extension locks out of heavyweight lock manager

Masahiko Sawada
On Fri, May 19, 2017 at 11:12 AM, Masahiko Sawada <[hidden email]> wrote:

> I agree to change it to be interruptible for more safety.
>
Attached is an updated version of the patch. To get a lock mechanism
similar to LWLock but interruptible, I introduced a new lock manager for
extension locks. A lot of the code, especially locking and unlocking, is
inspired by LWLock, but it uses condition variables to wait for the
lock. Other parts are unchanged from the previous patch. This is still a
PoC patch and lacks documentation. The following are measurement results
with the same test scripts I used before.

* Copy test script
Clients   HEAD    Patched
4         436.6   436.1
8         561.8   561.8
16        580.7   579.4
32        588.5   597.0
64        596.1   599.0

* Insert test script
Clients   HEAD    Patched
4         156.5   156.0
8         167.0   167.9
16        176.2   175.6
32        181.1   181.0
64        181.5   183.0

Since I replaced the heavyweight lock with a lightweight one, I expected
performance to improve slightly over HEAD, but the results were almost
the same. I'll continue to look into the details.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Attachment: Moving_extension_lock_out_of_heavyweight_lock_v2.patch (50K)

Re: Moving relation extension locks out of heavyweight lock manager

Masahiko Sawada
On Thu, Jun 22, 2017 at 12:03 AM, Masahiko Sawada <[hidden email]> wrote:

> Attached updated version patch.

The previous patch conflicts with current HEAD, so I've rebased it onto
current HEAD.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Attachment: Moving_extension_lock_out_of_heavyweight_lock_v3.patch (51K)

Re: Moving relation extension locks out of heavyweight lock manager

Thomas Munro
On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada <[hidden email]> wrote:
> The previous patch conflicts with current HEAD, I rebased the patch to
> current HEAD.

Hi Masahiko-san,

FYI this doesn't build anymore.  I think it's just because the wait
event enumerators were re-alphabetised in pgstat.h:

../../../../src/include/pgstat.h:820:2: error: redeclaration of
enumerator ‘WAIT_EVENT_LOGICAL_SYNC_DATA’
  WAIT_EVENT_LOGICAL_SYNC_DATA,
  ^
../../../../src/include/pgstat.h:806:2: note: previous definition of
‘WAIT_EVENT_LOGICAL_SYNC_DATA’ was here
  WAIT_EVENT_LOGICAL_SYNC_DATA,
  ^
../../../../src/include/pgstat.h:821:2: error: redeclaration of
enumerator ‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’
  WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
  ^
../../../../src/include/pgstat.h:807:2: note: previous definition of
‘WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE’ was here
  WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE,
  ^

--
Thomas Munro
http://www.enterprisedb.com



Re: Moving relation extension locks out of heavyweight lock manager

Thomas Munro
On Fri, Sep 8, 2017 at 10:24 AM, Thomas Munro
<[hidden email]> wrote:
> On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada <[hidden email]> wrote:
>> The previous patch conflicts with current HEAD, I rebased the patch to
>> current HEAD.
>
> Hi Masahiko-san,

Hi Sawada-san,

I have just learned from a colleague who is knowledgeable about
Japanese customs and kind enough to correct me that the appropriate
term of address for our colleagues in Japan on this mailing list is
<lastname>-san.  I was confused about that -- apologies for my
clumsiness.

--
Thomas Munro
http://www.enterprisedb.com



Re: Moving relation extension locks out of heavyweight lock manager

Masahiko Sawada
On Fri, Sep 8, 2017 at 8:25 AM, Thomas Munro
<[hidden email]> wrote:

> On Fri, Sep 8, 2017 at 10:24 AM, Thomas Munro
> <[hidden email]> wrote:
>> On Wed, Aug 16, 2017 at 2:13 PM, Masahiko Sawada <[hidden email]> wrote:
>>> The previous patch conflicts with current HEAD, I rebased the patch to
>>> current HEAD.
>>
>> Hi Masahiko-san,
>
> Hi Sawada-san,
>
> I have just learned from a colleague who is knowledgeable about
> Japanese customs and kind enough to correct me that the appropriate
> term of address for our colleagues in Japan on this mailing list is
> <lastname>-san.  I was confused about that -- apologies for my
> clumsiness.

Don't worry about it; either is OK. In Japan there is a custom of
writing <lastname>-san, but <firstname>-san is also not incorrect :-)
(Also, I think it's hard to distinguish the last name from the first
name in Japanese names.)

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Moving relation extension locks out of heavyweight lock manager

Masahiko Sawada
On Fri, Sep 8, 2017 at 7:24 AM, Thomas Munro
<[hidden email]> wrote:

> FYI this doesn't build anymore.  I think it's just because the wait
> event enumerators were re-alphabetised in pgstat.h.

Thank you for the information! Attached rebased patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Attachment: Moving_extension_lock_out_of_heavyweight_lock_v4.patch (50K)

Re: Moving relation extension locks out of heavyweight lock manager

Masahiko Sawada
On Fri, Sep 8, 2017 at 4:32 AM, Masahiko Sawada <[hidden email]> wrote:

> Thank you for the information! Attached rebased patch.
>
Since the previous patch conflicts with current HEAD, I've attached an
updated patch for the next CF.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Attachment: Moving_extension_lock_out_of_heavyweight_lock_v5.patch (50K)

Re: Moving relation extension locks out of heavyweight lock manager

Robert Haas
On Thu, Oct 26, 2017 at 12:36 PM, Masahiko Sawada <[hidden email]> wrote:
> Since the previous patch conflicts with current HEAD, I attached the
> updated patch for next CF.

I think we should back up here and ask ourselves a couple of questions:

1. What are we trying to accomplish here?

2. Is this the best way to accomplish it?

To the first question, the problem as I understand it is as follows:
Heavyweight locks don't conflict between members of a parallel group.
However, this is wrong for LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN.  Currently, those cases
don't arise, because parallel operations are strictly read-only
(except for inserts by the leader into a just-created table, when only
one member of the group can be taking the lock anyway).  However, once
we allow writes, they become possible, so some solution is needed.

To the second question, there are a couple of ways we could fix this.
First, we could continue to allow these locks to be taken in the
heavyweight lock manager, but make them conflict even between members
of the same lock group.  This is, however, complicated.  A significant
problem (or so I think) is that the deadlock detector logic, which is
already quite hard to test, will become even more complicated, since
wait edges between members of a lock group need to exist at some times
and not other times.  Moreover, to the best of my knowledge, the
increased complexity would have no benefit, because it doesn't look to
me like we ever take any other heavyweight lock while holding one of
these four kinds of locks.  Therefore, no deadlock can occur: if we're
waiting for one of these locks, the process that holds it is not
waiting for any other heavyweight lock.  This gives rise to a second
idea: move these locks out of the heavyweight lock manager and handle
them with separate code that does not have deadlock detection and
doesn't need as many lock modes.  I think that idea is basically
sound, although it's possibly not the only sound idea.

However, that makes me wonder whether we shouldn't be a bit more
aggressive with this patch: why JUST relation extension locks?  Why
not all four types of locks listed above?  Actually, tuple locks are a
bit sticky, because they have four lock modes.  The other three kinds
are very similar -- all you can do is "take it" (implicitly, in
exclusive mode), "try to take it" (again, implicitly, in exclusive
mode), or "wait for it to be released" (i.e. share lock and then
release).  Another idea is to try to handle those three types and
leave the tuple locking problem for another day.
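
As a rough sketch, the API for those three operations could be as small
as this (all names invented here, purely to make the shape concrete):

/* "Take it": acquire in (implicit) exclusive mode, blocking. */
extern void RelExtLockAcquire(Oid dbid, Oid relid);

/* "Try to take it": non-blocking; returns true if acquired. */
extern bool RelExtLockConditionalAcquire(Oid dbid, Oid relid);

/* "Wait for it to be released": share-lock and then release. */
extern void RelExtLockAwaitRelease(Oid dbid, Oid relid);

/* Release a previously acquired lock. */
extern void RelExtLockRelease(Oid dbid, Oid relid);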

I suggest that a good thing to do more or less immediately, regardless
of when this patch ends up being ready, would be to insert an
assertion that LockAcquire() is never called while holding a lock of
one of these types.  If that assertion ever fails, then the whole
theory that these lock types don't need deadlock detection is wrong,
and we'd like to find out about that sooner rather than later.
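
Such an assertion might look like this; the per-backend counter is
hypothetical (nothing like it exists today), and the lock grant and
release paths would have to maintain it:

/*
 * Hypothetical per-backend counter, incremented whenever a relation
 * extension, page, tuple, or speculative-token lock is granted and
 * decremented when one is released.
 */
static int  held_nodeadlock_locks = 0;

/* At the top of LockAcquire(), to validate the no-deadlock theory: */
Assert(held_nodeadlock_locks == 0);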

On the details of the patch, it appears that RelExtLockAcquire()
executes the wait-for-lock code with the partition lock held, and then
continues to hold the partition lock for the entire time that the
relation extension lock is held.  That not only makes all code that
runs while holding the lock non-interruptible but makes a lot of the
rest of this code pointless.  How is any of this atomics code going to
be reached by more than one process at the same time if the entire
bucket is exclusive-locked?  I would guess that the concurrency is not
very good here for the same reason.  Of course, just releasing the
bucket lock wouldn't be right either, because then ext_lock might go
away while we've got a pointer to it, which wouldn't be good.  I think
you could make this work if each lock had both a locker count and a
pin count, and the object can only be removed when the pin_count is 0.
So the lock algorithm would look like this:

- Acquire the partition LWLock.
- Find the item of interest, creating it if necessary.  If out of
memory for more elements, sweep through the table and reclaim
0-pin-count entries, then retry.
- Increment the pin count.
- Attempt to acquire the lock atomically; if we succeed, release the
partition lock and return.
- If this was a conditional-acquire, then decrement the pin count,
release the partition lock and return.
- Release the partition lock.
- Sleep on the condition variable until we manage to atomically
acquire the lock.

The unlock algorithm would just decrement the pin count and, if the
resulting value is non-zero, broadcast on the condition variable.
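
Rendered as code, the algorithm might look roughly like this. This is a
sketch only: it reuses the invented RelExtLockTag/RelExtLockHash shapes
from the earlier sketch, the partition-lock helper is hypothetical, the
out-of-memory sweep is reduced to a comment, and the release also clears
the lock word, which the description above leaves implicit:

typedef struct RelExtLockEnt
{
    RelExtLockTag tag;              /* hash key; must be first */
    pg_atomic_uint32 locked;        /* 0 = free, 1 = held */
    pg_atomic_uint32 pin_count;     /* holders + waiters; removable at 0 */
    ConditionVariable cv;           /* broadcast on release */
} RelExtLockEnt;

static bool
RelExtLockAcquire(RelExtLockTag tag, bool conditional)
{
    LWLock     *partlock = RelExtPartitionLock(tag);    /* hypothetical */
    RelExtLockEnt *ent;
    bool        found;

    LWLockAcquire(partlock, LW_EXCLUSIVE);
    ent = hash_search(RelExtLockHash, &tag, HASH_ENTER, &found);
    /* (On out-of-memory: sweep the table for 0-pin entries, then retry.) */
    if (!found)
    {
        pg_atomic_init_u32(&ent->locked, 0);
        pg_atomic_init_u32(&ent->pin_count, 0);
        ConditionVariableInit(&ent->cv);
    }
    pg_atomic_fetch_add_u32(&ent->pin_count, 1);

    if (pg_atomic_exchange_u32(&ent->locked, 1) == 0)
    {
        LWLockRelease(partlock);
        return true;                /* acquired immediately */
    }
    if (conditional)
    {
        pg_atomic_fetch_sub_u32(&ent->pin_count, 1);
        LWLockRelease(partlock);
        return false;
    }
    LWLockRelease(partlock);

    /* Sleep on the condition variable until we atomically get the lock. */
    ConditionVariablePrepareToSleep(&ent->cv);
    while (pg_atomic_exchange_u32(&ent->locked, 1) != 0)
        ConditionVariableSleep(&ent->cv, 0);    /* wait-event placeholder */
    ConditionVariableCancelSleep();
    return true;
}

static void
RelExtLockRelease(RelExtLockEnt *ent)
{
    pg_atomic_write_u32(&ent->locked, 0);
    if (pg_atomic_fetch_sub_u32(&ent->pin_count, 1) > 1)
        ConditionVariableBroadcast(&ent->cv);   /* wake any waiters */
}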

Although I think this will work, I'm not sure this is actually a great
algorithm.  Every lock acquisition has to take and release the
partition lock, use at least two more atomic ops (to take the pin and
the lock), and search a hash table.  I don't think that's going to be
staggeringly fast.  Maybe it's OK.  It's not that much worse, possibly
not any worse, than what the main lock manager does now.  However,
especially if we implement a solution specific to relation locks, it
seems like it would be better if we could somehow optimize based on
the facts that (1) many relation locks will not conflict and (2) it's
very common for the same backend to take and release the same
extension lock over and over again.  I don't have a specific proposal
right now.

Whatever we end up with, I think we should write some kind of a test
harness to benchmark the number of acquire/release cycles per second
that we can do with the current relation extension lock system vs. the
proposed new system.  Ideally, we'd be faster, since we're proposing a
more specialized mechanism.  But at least we should not be slower.
pgbench isn't a good test because the relation extension lock will
barely be taken let alone contended; we need to check something like
parallel copies into the same table to see any effect.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Moving relation extension locks out of heavyweight lock manager

Masahiko Sawada
On Fri, Oct 27, 2017 at 12:03 AM, Robert Haas <[hidden email]> wrote:
> On Thu, Oct 26, 2017 at 12:36 PM, Masahiko Sawada <[hidden email]> wrote:
>> Since the previous patch conflicts with current HEAD, I attached the
>> updated patch for next CF.
>
> I think we should back up here and ask ourselves a couple of questions:

Thank you for summarizing the purpose and discussion of this patch.

> 1. What are we trying to accomplish here?
>
> 2. Is this the best way to accomplish it?
>
> To the first question, the problem as I understand it as follows:
> Heavyweight locks don't conflict between members of a parallel group.
> However, this is wrong for LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
> LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN.  Currently, those cases
> don't arise, because parallel operations are strictly read-only
> (except for inserts by the leader into a just-created table, when only
> one member of the group can be taking the lock anyway).  However, once
> we allow writes, they become possible, so some solution is needed.
>
> To the second question, there are a couple of ways we could fix this.
> First, we could continue to allow these locks to be taken in the
> heavyweight lock manager, but make them conflict even between members
> of the same lock group.  This is, however, complicated.  A significant
> problem (or so I think) is that the deadlock detector logic, which is
> already quite hard to test, will become even more complicated, since
> wait edges between members of a lock group need to exist at some times
> and not other times.  Moreover, to the best of my knowledge, the
> increased complexity would have no benefit, because it doesn't look to
> me like we ever take any other heavyweight lock while holding one of
> these four kinds of locks.  Therefore, no deadlock can occur: if we're
> waiting for one of these locks, the process that holds it is not
> waiting for any other heavyweight lock.  This gives rise to a second
> idea: move these locks out of the heavyweight lock manager and handle
> them with separate code that does not have deadlock detection and
> doesn't need as many lock modes.  I think that idea is basically
> sound, although it's possibly not the only sound idea.

I'm on the same page.

>
> However, that makes me wonder whether we shouldn't be a bit more
> aggressive with this patch: why JUST relation extension locks?  Why
> not all four types of locks listed above?  Actually, tuple locks are a
> bit sticky, because they have four lock modes.  The other three kinds
> are very similar -- all you can do is "take it" (implicitly, in
> exclusive mode), "try to take it" (again, implicitly, in exclusive
> mode), or "wait for it to be released" (i.e. share lock and then
> release).  Another idea is to try to handle those three types and
> leave the tuple locking problem for another day.
>
> I suggest that a good thing to do more or less immediately, regardless
> of when this patch ends up being ready, would be to insert an
> assertion that LockAcquire() is never called while holding a lock of
> one of these types.  If that assertion ever fails, then the whole
> theory that these lock types don't need deadlock detection is wrong,
> and we'd like to find out about that sooner rather than later.

I understood. I'll check that first. If this direction has no problem
and we change these three locks to use the new lock mechanism, we won't
be able to hold these locks at the same time. Since that also imposes a
limitation on future development, we should think carefully about it. We
could implement a deadlock detection mechanism for it again, but that
wouldn't make sense.

>
> On the details of the patch, it appears that RelExtLockAcquire()
> executes the wait-for-lock code with the partition lock held, and then
> continues to hold the partition lock for the entire time that the
> relation extension lock is held.  That not only makes all code that
> runs while holding the lock non-interruptible but makes a lot of the
> rest of this code pointless.  How is any of this atomics code going to
> be reached by more than one process at the same time if the entire
> bucket is exclusive-locked?  I would guess that the concurrency is not
> very good here for the same reason.  Of course, just releasing the
> bucket lock wouldn't be right either, because then ext_lock might go
> away while we've got a pointer to it, which wouldn't be good.  I think
> you could make this work if each lock had both a locker count and a
> pin count, and the object can only be removed when the pin_count is 0.
> So the lock algorithm would look like this:
>
> - Acquire the partition LWLock.
> - Find the item of interest, creating it if necessary.  If out of
> memory for more elements, sweep through the table and reclaim
> 0-pin-count entries, then retry.
> - Increment the pin count.
> - Attempt to acquire the lock atomically; if we succeed, release the
> partition lock and return.
> - If this was a conditional-acquire, then decrement the pin count,
> release the partition lock and return.
> - Release the partition lock.
> - Sleep on the condition variable until we manage to atomically
> acquire the lock.
>
> The unlock algorithm would just decrement the pin count and, if the
> resulting value is non-zero, broadcast on the condition variable.

Thank you for the suggestion!

> Although I think this will work, I'm not sure this is actually a great
> algorithm.  Every lock acquisition has to take and release the
> partition lock, use at least two more atomic ops (to take the pin and
> the lock), and search a hash table.  I don't think that's going to be
> staggeringly fast.  Maybe it's OK.  It's not that much worse, possibly
> not any worse, than what the main lock manager does now.  However,
> especially if we implement a solution specific to relation locks, it
> seems like it would be better if we could somehow optimize based on
> the facts that (1) many relation locks will not conflict and (2) it's
> very common for the same backend to take and release the same
> extension lock over and over again.  I don't have a specific proposal
> right now.

Yeah, we can optimize based on the purpose of the solution. In either
case, I should answer the above question first.

>
> Whatever we end up with, I think we should write some kind of a test
> harness to benchmark the number of acquire/release cycles per second
> that we can do with the current relation extension lock system vs. the
> proposed new system.  Ideally, we'd be faster, since we're proposing a
> more specialized mechanism.  But at least we should not be slower.
> pgbench isn't a good test because the relation extension lock will
> barely be taken let alone contended; we need to check something like
> parallel copies into the same table to see any effect.
>

I did a benchmark using a custom script that always updates the
primary key (disabling HOT updates). But parallel copies into the same
table would also be good. Thank you.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Moving relation extension locks out of heavyweight lock manager

Masahiko Sawada
On Mon, Oct 30, 2017 at 3:17 PM, Masahiko Sawada <[hidden email]> wrote:

>> I suggest that a good thing to do more or less immediately, regardless
>> of when this patch ends up being ready, would be to insert an
>> assertion that LockAcquire() is never called while holding a lock of
>> one of these types.  If that assertion ever fails, then the whole
>> theory that these lock types don't need deadlock detection is wrong,
>> and we'd like to find out about that sooner rather than later.
>
> I understood. I'll check that first.

I've checked whether LockAcquire() is called while holding a lock of one
of the four types: LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. To summarize, I think that
we cannot move all four lock types out of the heavyweight lock manager
together, but we can move just the relation extension lock, with some
tricks.

Here are the details of the survey.

* LOCKTAG_RELATION_EXTENSION
There is a path where LockRelationForExtension() can be called while
already holding a relation extension lock. In brin_getinsertbuffer(), we
acquire a relation extension lock on an index relation and may
initialize a new buffer (brin_initialize_empty_new_buffer()). While
initializing the new buffer, we call RecordPageWithFreeSpace(), which
can eventually call fsm_readbuf(rel, addr, true), where the third
argument means "extend". We can handle this problem by keeping a list
(or local hash) of acquired locks and skipping the acquisition if we
already hold the lock. For the other call paths that call
LockRelationForExtension(), I don't see any problem.
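
A sketch of that "remember what we already hold" trick (illustrative
only: RelExtLockAcquireInternal()/RelExtLockReleaseInternal() stand in
for whatever the new lock manager provides, the tag shape is as in the
earlier sketches, and a single static slot suffices because we only ever
hold one extension lock at a time):

/* Hypothetical backend-local cache of the one extension lock we hold. */
static RelExtLockTag held_relext_tag;
static int  held_relext_count = 0;  /* >0 means held_relext_tag is valid */

void
LockRelationForExtension(Relation relation, LOCKMODE lockmode)
{
    RelExtLockTag tag = {relation->rd_node.dbNode, relation->rd_id};

    /*
     * Re-entry such as brin_getinsertbuffer() ->
     * RecordPageWithFreeSpace() -> fsm_readbuf(..., extend = true):
     * just bump the local count.
     */
    if (held_relext_count > 0 &&
        held_relext_tag.dbid == tag.dbid &&
        held_relext_tag.relid == tag.relid)
    {
        held_relext_count++;
        return;
    }

    RelExtLockAcquireInternal(&tag, lockmode);      /* hypothetical */
    held_relext_tag = tag;
    held_relext_count = 1;
}

void
UnlockRelationForExtension(Relation relation, LOCKMODE lockmode)
{
    if (--held_relext_count > 0)
        return;                     /* still held by an outer caller */
    RelExtLockReleaseInternal(&held_relext_tag);    /* hypothetical */
}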

* LOCKTAG_PAGE, LOCKTAG_TUPLE, LOCKTAG_SPECULATIVE_TOKEN
There are paths where we can acquire a relation extension lock while
holding one of these locks.
For LOCKTAG_PAGE, in ginInsertCleanup() we acquire a page lock on the
meta page and process the pending list, which can acquire a relation
extension lock on the index relation. For LOCKTAG_TUPLE, in
heap_update() we acquire a tuple lock and can call
RelationGetBufferForTuple(). For LOCKTAG_SPECULATIVE_TOKEN, in
ExecInsert() we acquire a speculative insertion lock and call
heap_insert() and ExecInsertIndexTuples(). The operations called while
holding each of these lock types can acquire a relation extension lock.

Also, the following is the list of places where we call LockAcquire()
with these four lock types (result of git grep). I checked based on
this list.

* LockRelationForExtension()
contrib/bloom/blutils.c: LockRelationForExtension(index, ExclusiveLock);
contrib/pgstattuple/pgstattuple.c: LockRelationForExtension(rel, ExclusiveLock);
src/backend/access/brin/brin_pageops.c: LockRelationForExtension(idxrel, ShareLock);
src/backend/access/brin/brin_pageops.c: LockRelationForExtension(irel, ExclusiveLock);
src/backend/access/brin/brin_revmap.c: LockRelationForExtension(irel, ExclusiveLock);
src/backend/access/gin/ginutil.c: LockRelationForExtension(index, ExclusiveLock);
src/backend/access/gin/ginvacuum.c: LockRelationForExtension(index, ExclusiveLock);
src/backend/access/gin/ginvacuum.c: LockRelationForExtension(index, ExclusiveLock);
src/backend/access/gist/gistutil.c: LockRelationForExtension(r, ExclusiveLock);
src/backend/access/gist/gistvacuum.c: LockRelationForExtension(rel, ExclusiveLock);
src/backend/access/gist/gistvacuum.c: LockRelationForExtension(rel, ExclusiveLock);
src/backend/access/heap/hio.c: LockRelationForExtension(relation, ExclusiveLock);
src/backend/access/heap/hio.c: LockRelationForExtension(relation, ExclusiveLock);
src/backend/access/heap/visibilitymap.c: LockRelationForExtension(rel, ExclusiveLock);
src/backend/access/nbtree/nbtpage.c: LockRelationForExtension(rel, ExclusiveLock);
src/backend/access/nbtree/nbtree.c: LockRelationForExtension(rel, ExclusiveLock);
src/backend/access/spgist/spgutils.c: LockRelationForExtension(index, ExclusiveLock);
src/backend/access/spgist/spgvacuum.c: LockRelationForExtension(index, ExclusiveLock);
src/backend/commands/vacuumlazy.c: LockRelationForExtension(onerel, ExclusiveLock);
src/backend/storage/freespace/freespace.c: LockRelationForExtension(rel, ExclusiveLock);
src/backend/storage/lmgr/lmgr.c: LockRelationForExtension(Relation relation, LOCKMODE lockmode)

* ConditionalLockRelationForExtension
src/backend/access/heap/hio.c: else if (!ConditionalLockRelationForExtension(relation, ExclusiveLock))
src/backend/storage/lmgr/lmgr.c: ConditionalLockRelationForExtension(Relation relation, LOCKMODE lockmode)

* LockPage
src/backend/access/gin/ginfast.c: LockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);

* ConditionalLockPage
src/backend/access/gin/ginfast.c: if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))

* LockTuple
src/backend/access/heap/heapam.c: LockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock)

* ConditionalLockTuple
src/backend/access/heap/heapam.c: ConditionalLockTuple((rel), (tup), tupleLockExtraInfo[mode].hwlock)
src/backend/storage/lmgr/lmgr.c: ConditionalLockTuple(Relation relation, ItemPointer tid, LOCKMODE lockmode)

* SpeculativeInsertionLockAcquire
src/backend/executor/nodeModifyTable.c: specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Moving relation extension locks out of heavyweight lock manager

Robert Haas
On Mon, Nov 6, 2017 at 4:42 AM, Masahiko Sawada <[hidden email]> wrote:

>>> I suggest that a good thing to do more or less immediately, regardless
>>> of when this patch ends up being ready, would be to insert an
>>> assertion that LockAcquire() is never called while holding a lock of
>>> one of these types.  If that assertion ever fails, then the whole
>>> theory that these lock types don't need deadlock detection is wrong,
>>> and we'd like to find out about that sooner rather than later.
>>
>> I understood. I'll check that first.
>
> I've checked whether LockAcquire is called while holding a lock of one
> of four types: LOCKTAG_RELATION_EXTENSION, LOCKTAG_PAGE,
> LOCKTAG_TUPLE, and LOCKTAG_SPECULATIVE_TOKEN. To summary, I think that
> we cannot move these four lock types together out of heavy-weight
> lock, but can move only the relation extension lock with tricks.
>
> Here is detail of the survey.

Thanks for these details, but I'm not sure I fully understand.

> * LOCKTAG_RELATION_EXTENSION
> There is a path that LockRelationForExtension() could be called while
> holding another relation extension lock. In brin_getinsertbuffer(), we
> acquire a relation extension lock for a index relation and could
> initialize a new buffer (brin_initailize_empty_new_buffer()). During
> initializing a new buffer, we call RecordPageWithFreeSpace() which
> eventually can call fsm_readbuf(rel, addr, true) where the third
> argument is "extend". We can process this problem by having the list
> (or local hash) of acquired locks and skip acquiring the lock if
> already had. For other call paths calling LockRelationForExtension, I
> don't see any problem.

Does calling fsm_readbuf(rel,addr,true) take some heavyweight lock?

Basically, what matters here in the end is whether we can articulate a
deadlock-proof rule around the order in which these locks are
acquired.  The simplest such rule would be "you can only acquire one
lock of any of these types at a time, and you can't subsequently
acquire a heavyweight lock".  But a more complicated rule would be OK
too, e.g. "you can acquire as many heavyweight locks as you want, and
after that you can optionally acquire one page, tuple, or speculative
token lock, and after that you can acquire a relation extension lock".
The latter rule, although more complex, is still deadlock-proof,
because the heavyweight locks still use the deadlock detector, and the
rest has a consistent order of lock acquisition that precludes one
backend taking A then B while another backend takes B then A.  I'm not
entirely clear whether your survey leads us to a place where we can
articulate such a deadlock-proof rule.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Moving relation extension locks out of heavyweight lock manager

Masahiko Sawada
On Wed, Nov 8, 2017 at 5:41 AM, Robert Haas <[hidden email]> wrote:

> On Mon, Nov 6, 2017 at 4:42 AM, Masahiko Sawada <[hidden email]> wrote:
>> * LOCKTAG_RELATION_EXTENSION
>> There is a path that LockRelationForExtension() could be called while
>> holding another relation extension lock. In brin_getinsertbuffer(), we
>> acquire a relation extension lock for an index relation and could
>> initialize a new buffer (brin_initialize_empty_new_buffer()). During
>> initializing a new buffer, we call RecordPageWithFreeSpace() which
>> eventually can call fsm_readbuf(rel, addr, true) where the third
>> argument is "extend". We can process this problem by having the list
>> (or local hash) of acquired locks and skip acquiring the lock if
>> already had. For other call paths calling LockRelationForExtension, I
>> don't see any problem.
>
> Does calling fsm_readbuf(rel,addr,true) take some heavyweight lock?

No, I meant that fsm_readbuf(rel, addr, true) can acquire a relation
extension lock, so it's not a problem.

> Basically, what matters here in the end is whether we can articulate a
> deadlock-proof rule around the order in which these locks are
> acquired.

You're right, my survey was not enough to make a decision.

As far as acquiring these four lock types goes, there are two kinds of
call paths that acquire one of them while holding another. The first is
acquiring a relation extension lock and then acquiring the relation
extension lock for the same relation again. As explained before, this
can be resolved by remembering the lock we hold (perhaps remembering
only the last one is enough). The second is acquiring a tuple lock, a
page lock, or a speculative insertion lock and then acquiring a relation
extension lock. In that case, we always acquire the two locks in the
same order: one of the three lock types first, then the extension lock.
So there is no problem if we apply the rule that we disallow acquiring
any of these three lock types while holding a relation extension lock.
Also, as far as I surveyed, there is no path that acquires a relation
lock while holding any of the other three lock types.

>  The simplest such rule would be "you can only acquire one
> lock of any of these types at a time, and you can't subsequently
> acquire a heavyweight lock".  But a more complicated rule would be OK
> too, e.g. "you can acquire as many heavyweight locks as you want, and
> after that you can optionally acquire one page, tuple, or speculative
> token lock, and after that you can acquire a relation extension lock".
> The latter rule, although more complex, is still deadlock-proof,
> because the heavyweight locks still use the deadlock detector, and the
> rest has a consistent order of lock acquisition that precludes one
> backend taking A then B while another backend takes B then A.  I'm not
> entirely clear whether your survey leads us to a place where we can
> articulate such a deadlock-proof rule.

Speaking of these four lock types and heavyweight locks: there obviously
are call paths that acquire one of the four lock types while holding a
heavyweight lock. In reverse, there is also a call path where we acquire
a heavyweight lock while holding one of the four lock types. The call
path I found is in heap_delete(), where we acquire a tuple lock and call
XactLockTableWait() or MultiXactIdWait(), which can eventually acquire
LOCKTAG_TRANSACTION in order to wait for concurrent transactions to
finish. But IIUC, since these functions acquire the lock on the
concurrent transaction's transaction ID, deadlocks don't happen.
However, there might be other similar call paths I'm missing. For
example, we do some operations that might acquire heavyweight locks
other than LOCKTAG_TRANSACTION while holding a page lock (in
ginInsertCleanup) or a speculative insertion lock (in nodeModifyTable).

To summarize, I think we can adopt the following rules in order to move
the four lock types out of the heavyweight lock manager:

1. Do not acquire a tuple lock, a page lock, or a speculative insertion
lock while holding an extension lock.
2. Do not acquire any heavyweight lock except for LOCKTAG_TRANSACTION
while holding any of these four lock types.

I'm also concerned that this imposes rules on developers that are
difficult to check statically. We can put several assertions into the
source code, but it's hard to cover all possible paths with regression
tests.
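
For concreteness, assertions enforcing the two rules might look like
this (a sketch under the same assumptions as before: the per-backend
tracking state is hypothetical and would have to be maintained at every
grant and release):

/* Hypothetical per-backend tracking, for assertions only. */
static bool holding_relext_lock = false;    /* rule 1 */
static int  held_nodeadlock_locks = 0;      /* rule 2: all four types */

/* Called before acquiring a tuple, page, or speculative token lock. */
static inline void
AssertRuleOne(void)
{
    Assert(!holding_relext_lock);
}

/* Called at the top of LockAcquire(), for every heavyweight lock. */
static inline void
AssertRuleTwo(const LOCKTAG *locktag)
{
    Assert(held_nodeadlock_locks == 0 ||
           locktag->locktag_type == LOCKTAG_TRANSACTION);
}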

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

