adding wait_start column to pg_locks

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
27 messages Options
12
Reply | Threaded
Open this post in threaded view
|

adding wait_start column to pg_locks

torikoshia
Hi,

When examining the duration of locks, we often do join on pg_locks
and pg_stat_activity and use columns such as query_start or
state_change.

However, since these columns are the moment when queries have
started or their state has changed, we cannot get the exact lock
duration in this way.

So I'm now thinking about adding a new column in pg_locks which
keeps the time at which locks started waiting.

One problem with this idea would be the performance impact of
calling gettimeofday repeatedly.
To avoid it, I reused the result of the gettimeofday which was
called for deadlock_timeout timer start as suggested in the
previous discussion[1].

Attached a patch.

BTW in this patch, for fast path locks, wait_start is set to
zero because it seems the lock will not be waited for.
If my understanding is wrong, I would appreciate it if
someone could point out.


Any thoughts?


[1]
https://www.postgresql.org/message-id/28804.1407907184%40sss.pgh.pa.us

Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

v1-0001-Add-wait_start-field-into-pg_locks.patch (11K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

Justin Pryzby
On Tue, Dec 15, 2020 at 11:47:23AM +0900, torikoshia wrote:
> So I'm now thinking about adding a new column in pg_locks which
> keeps the time at which locks started waiting.
>
> Attached a patch.

This is failing make check-world, would you send an updated patch ?

I added you as an author so it shows up here.
http://cfbot.cputube.org/atsushi-torikoshi.html

--
Justin


Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

torikoshia
On 2021-01-02 06:49, Justin Pryzby wrote:

> On Tue, Dec 15, 2020 at 11:47:23AM +0900, torikoshia wrote:
>> So I'm now thinking about adding a new column in pg_locks which
>> keeps the time at which locks started waiting.
>>
>> Attached a patch.
>
> This is failing make check-world, would you send an updated patch ?
>
> I added you as an author so it shows up here.
> http://cfbot.cputube.org/atsushi-torikoshi.html
Thanks!

Attached an updated patch.

Regards,

v2-0001-To-examine-the-duration-of-locks-we-did-join-on-p.patch (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

Ian Lawrence Barwick

Hi

2021年1月4日(月) 15:04 torikoshia <[hidden email]>:

>
> On 2021-01-02 06:49, Justin Pryzby wrote:
> > On Tue, Dec 15, 2020 at 11:47:23AM +0900, torikoshia wrote:
> >> So I'm now thinking about adding a new column in pg_locks which
> >> keeps the time at which locks started waiting.
> >>
> >> Attached a patch.
> >
> > This is failing make check-world, would you send an updated patch ?
> >
> > I added you as an author so it shows up here.
> > http://cfbot.cputube.org/atsushi-torikoshi.html
>
> Thanks!
>
> Attached an updated patch.

I took a look at this patch as it seems useful (and I have an item on my bucket
list labelled "look at the locking code", which I am not at all familiar with).

I tested the patch by doing the following:

Session 1:

    postgres=# CREATE TABLE foo (id int);
    CREATE TABLE

    postgres=# BEGIN ;
    BEGIN

    postgres=*# INSERT INTO foo VALUES (1);
    INSERT 0 1

Session 2:

    postgres=# BEGIN ;
    BEGIN

    postgres=*# LOCK TABLE foo;

Session 3:

    postgres=# SELECT locktype, relation, pid, mode, granted, fastpath, wait_start
                 FROM pg_locks
                WHERE relation = 'foo'::regclass AND NOT granted\x\g\x

    -[ RECORD 1 ]-----------------------------
    locktype   | relation
    relation   | 16452
    pid        | 3513935
    mode       | AccessExclusiveLock
    granted    | f
    fastpath   | f
    wait_start | 2021-01-14 12:03:06.683053+09

So far so good, but checking *all* the locks on this relation:

    postgres=# SELECT locktype, relation, pid, mode, granted, fastpath, wait_start
                 FROM pg_locks
                WHERE relation = 'foo'::regclass\x\g\x

    -[ RECORD 1 ]-----------------------------
    locktype   | relation
    relation   | 16452
    pid        | 3513824
    mode       | RowExclusiveLock
    granted    | t
    fastpath   | f
    wait_start | 2021-01-14 12:03:06.683053+09
    -[ RECORD 2 ]-----------------------------
    locktype   | relation
    relation   | 16452
    pid        | 3513935
    mode       | AccessExclusiveLock
    granted    | f
    fastpath   | f
    wait_start | 2021-01-14 12:03:06.683053+09

shows the RowExclusiveLock granted in session 1 as apparently waiting from
exactly the same time as session 2's attempt to acquire the lock, which is clearly
not right.

Also, if a further session attempts to acquire a lock, we get:

    postgres=# SELECT locktype, relation, pid, mode, granted, fastpath, wait_start
                 FROM pg_locks
                WHERE relation = 'foo'::regclass\x\g\x

    -[ RECORD 1 ]-----------------------------
    locktype   | relation
    relation   | 16452
    pid        | 3513824
    mode       | RowExclusiveLock
    granted    | t
    fastpath   | f
    wait_start | 2021-01-14 12:05:53.747309+09
    -[ RECORD 2 ]-----------------------------
    locktype   | relation
    relation   | 16452
    pid        | 3514240
    mode       | AccessExclusiveLock
    granted    | f
    fastpath   | f
    wait_start | 2021-01-14 12:05:53.747309+09
    -[ RECORD 3 ]-----------------------------
    locktype   | relation
    relation   | 16452
    pid        | 3513935
    mode       | AccessExclusiveLock
    granted    | f
    fastpath   | f
    wait_start | 2021-01-14 12:05:53.747309+09

i.e. all entries now have "wait_start" set to the start time of the latest session's
lock acquisition attempt.

Looking at the code, this happens as the wait start time is being recorded in
the lock record itself, so always contains the value reported by the latest lock
acquisition attempt.

It looks like the logical place to store the value is in the PROCLOCK
structure; the attached patch reworks your patch to do that, and given the above
scenario produces following output:

    postgres=# SELECT locktype, relation, pid, mode, granted, fastpath, wait_start
                 FROM pg_locks
                WHERE relation = 'foo'::regclass\x\g\x

    -[ RECORD 1 ]-----------------------------
    locktype   | relation
    relation   | 16452
    pid        | 3516391
    mode       | RowExclusiveLock
    granted    | t
    fastpath   | f
    wait_start |
    -[ RECORD 2 ]-----------------------------
    locktype   | relation
    relation   | 16452
    pid        | 3516470
    mode       | AccessExclusiveLock
    granted    | f
    fastpath   | f
    wait_start | 2021-01-14 12:19:16.217163+09
    -[ RECORD 3 ]-----------------------------
    locktype   | relation
    relation   | 16452
    pid        | 3516968
    mode       | AccessExclusiveLock
    granted    | f
    fastpath   | f
    wait_start | 2021-01-14 12:18:08.195429+09

As mentioned, I'm not at all familiar with the locking code so it's quite
possible that it's incomplete,there may be non-obvious side-effects, or it's
the wrong approach altogether etc., so further review is necessary.


Regards

Ian Barwick

--
EnterpriseDB: https://www.enterprisedb.com

v3-0001-To-examine-the-duration-of-locks-we-did-join-on-p.patch (11K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

Robert Haas
On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick <[hidden email]> wrote:
> It looks like the logical place to store the value is in the PROCLOCK
> structure; ...

That seems surprising, because there's one PROCLOCK for every
combination of a process and a lock. But, a process can't be waiting
for more than one lock at the same time, because once it starts
waiting to acquire the first one, it can't do anything else, and thus
can't begin waiting for a second one. So I would have thought that
this would be recorded in the PROC.

But I haven't looked at the patch so maybe I'm dumb.

--
Robert Haas
EDB: http://www.enterprisedb.com


Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

Ian Lawrence Barwick

2021年1月15日(金) 3:45 Robert Haas <[hidden email]>:
On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick <[hidden email]> wrote:
> It looks like the logical place to store the value is in the PROCLOCK
> structure; ...

That seems surprising, because there's one PROCLOCK for every
combination of a process and a lock. But, a process can't be waiting
for more than one lock at the same time, because once it starts
waiting to acquire the first one, it can't do anything else, and thus
can't begin waiting for a second one. So I would have thought that
this would be recorded in the PROC.

Umm, I think we're at cross-purposes here. The suggestion is to note
the time when the process started waiting for the lock in the process's
PROCLOCK, rather than in the lock itself (which in the original version
of the patch resulted in all processes with an interest in the lock appearing
to have been waiting to acquire it since the time a lock acquisition
was most recently attempted).

As mentioned, I hadn't really ever looked at the lock code before so might
be barking up the wrong forest.
 

Regards

Ian Barwick

--
Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

torikoshia
Thanks for your reviewing and comments!

On 2021-01-14 12:39, Ian Lawrence Barwick wrote:
> Looking at the code, this happens as the wait start time is being
> recorded in
> the lock record itself, so always contains the value reported by the
> latest lock
> acquisition attempt.

I think you are right and wait_start should not be recorded
in the LOCK.


On 2021-01-15 11:48, Ian Lawrence Barwick wrote:

> 2021年1月15日(金) 3:45 Robert Haas <[hidden email]>:
>
>> On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick
>> <[hidden email]> wrote:
>>> It looks like the logical place to store the value is in the
>> PROCLOCK
>>> structure; ...
>>
>> That seems surprising, because there's one PROCLOCK for every
>> combination of a process and a lock. But, a process can't be waiting
>> for more than one lock at the same time, because once it starts
>> waiting to acquire the first one, it can't do anything else, and
>> thus
>> can't begin waiting for a second one. So I would have thought that
>> this would be recorded in the PROC.
>
> Umm, I think we're at cross-purposes here. The suggestion is to note
> the time when the process started waiting for the lock in the
> process's
> PROCLOCK, rather than in the lock itself (which in the original
> version
> of the patch resulted in all processes with an interest in the lock
> appearing
> to have been waiting to acquire it since the time a lock acquisition
> was most recently attempted).

AFAIU, it seems possible to record wait_start in the PROCLOCK but
redundant since each process can wait at most one lock.

To confirm my understanding, I'm going to make another patch that
records wait_start in the PGPROC.


Regards,

--
Atsushi Torikoshi


Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

torikoshia
On 2021-01-15 15:23, torikoshia wrote:

> Thanks for your reviewing and comments!
>
> On 2021-01-14 12:39, Ian Lawrence Barwick wrote:
>> Looking at the code, this happens as the wait start time is being
>> recorded in
>> the lock record itself, so always contains the value reported by the
>> latest lock
>> acquisition attempt.
>
> I think you are right and wait_start should not be recorded
> in the LOCK.
>
>
> On 2021-01-15 11:48, Ian Lawrence Barwick wrote:
>> 2021年1月15日(金) 3:45 Robert Haas <[hidden email]>:
>>
>>> On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick
>>> <[hidden email]> wrote:
>>>> It looks like the logical place to store the value is in the
>>> PROCLOCK
>>>> structure; ...
>>>
>>> That seems surprising, because there's one PROCLOCK for every
>>> combination of a process and a lock. But, a process can't be waiting
>>> for more than one lock at the same time, because once it starts
>>> waiting to acquire the first one, it can't do anything else, and
>>> thus
>>> can't begin waiting for a second one. So I would have thought that
>>> this would be recorded in the PROC.
>>
>> Umm, I think we're at cross-purposes here. The suggestion is to note
>> the time when the process started waiting for the lock in the
>> process's
>> PROCLOCK, rather than in the lock itself (which in the original
>> version
>> of the patch resulted in all processes with an interest in the lock
>> appearing
>> to have been waiting to acquire it since the time a lock acquisition
>> was most recently attempted).
>
> AFAIU, it seems possible to record wait_start in the PROCLOCK but
> redundant since each process can wait at most one lock.
>
> To confirm my understanding, I'm going to make another patch that
> records wait_start in the PGPROC.
Attached a patch.

I noticed previous patches left the wait_start untouched even after
it acquired lock.
The patch also fixed it.

Any thoughts?


Regards,

--
Atsushi Torikoshi

v4-0001-To-examine-the-duration-of-locks-we-did-join-on-p.patch (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

Fujii Masao-4


On 2021/01/18 12:00, torikoshia wrote:

> On 2021-01-15 15:23, torikoshia wrote:
>> Thanks for your reviewing and comments!
>>
>> On 2021-01-14 12:39, Ian Lawrence Barwick wrote:
>>> Looking at the code, this happens as the wait start time is being recorded in
>>> the lock record itself, so always contains the value reported by the latest lock
>>> acquisition attempt.
>>
>> I think you are right and wait_start should not be recorded
>> in the LOCK.
>>
>>
>> On 2021-01-15 11:48, Ian Lawrence Barwick wrote:
>>> 2021年1月15日(金) 3:45 Robert Haas <[hidden email]>:
>>>
>>>> On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick
>>>> <[hidden email]> wrote:
>>>>> It looks like the logical place to store the value is in the
>>>> PROCLOCK
>>>>> structure; ...
>>>>
>>>> That seems surprising, because there's one PROCLOCK for every
>>>> combination of a process and a lock. But, a process can't be waiting
>>>> for more than one lock at the same time, because once it starts
>>>> waiting to acquire the first one, it can't do anything else, and
>>>> thus
>>>> can't begin waiting for a second one. So I would have thought that
>>>> this would be recorded in the PROC.
>>>
>>> Umm, I think we're at cross-purposes here. The suggestion is to note
>>> the time when the process started waiting for the lock in the
>>> process's
>>> PROCLOCK, rather than in the lock itself (which in the original
>>> version
>>> of the patch resulted in all processes with an interest in the lock
>>> appearing
>>> to have been waiting to acquire it since the time a lock acquisition
>>> was most recently attempted).
>>
>> AFAIU, it seems possible to record wait_start in the PROCLOCK but
>> redundant since each process can wait at most one lock.
>>
>> To confirm my understanding, I'm going to make another patch that
>> records wait_start in the PGPROC.
>
> Attached a patch.
>
> I noticed previous patches left the wait_start untouched even after
> it acquired lock.
> The patch also fixed it.
>
> Any thoughts?

Thanks for updating the patch! I think that this is really useful feature!!
I have two minor comments.

+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wait_start</structfield> <type>timestamptz</type>

The column name "wait_start" should be "waitstart" for the sake of consistency
with other column names in pg_locks? pg_locks seems to avoid including
an underscore in column names, so "locktype" is used instead of "lock_type",
"virtualtransaction" is used instead of "virtual_transaction", etc.

+       Lock acquisition wait start time. <literal>NULL</literal> if
+       lock acquired.

There seems the case where the wait start time is NULL even when "grant"
is false. It's better to add note about that case into the docs? For example,
I found that the wait start time is NULL while the startup process is waiting
for the lock. Is this only that case?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

torikoshia
On 2021-01-21 12:48, Fujii Masao wrote:

> Thanks for updating the patch! I think that this is really useful
> feature!!

Thanks for reviewing!

> I have two minor comments.
>
> +      <entry role="catalog_table_entry"><para
> role="column_definition">
> +       <structfield>wait_start</structfield> <type>timestamptz</type>
>
> The column name "wait_start" should be "waitstart" for the sake of
> consistency
> with other column names in pg_locks? pg_locks seems to avoid including
> an underscore in column names, so "locktype" is used instead of
> "lock_type",
> "virtualtransaction" is used instead of "virtual_transaction", etc.
>
> +       Lock acquisition wait start time. <literal>NULL</literal> if
> +       lock acquired.
>
Agreed.

I also changed the variable name "wait_start" in struct PGPROC and
LockInstanceData to "waitStart" for the same reason.


> There seems the case where the wait start time is NULL even when
> "grant"
> is false. It's better to add note about that case into the docs? For
> example,
> I found that the wait start time is NULL while the startup process is
> waiting
> for the lock. Is this only that case?

Thanks, this is because I set 'waitstart' in the following
condition.

   ---src/backend/storage/lmgr/proc.c
   > 1250         if (!InHotStandby)

As far as considering this, I guess startup process would
be the only case.

I also think that in case of startup process, it seems possible
to set 'waitstart' in ResolveRecoveryConflictWithLock(), so I
did it in the attached patch.


Any thoughts?


Regards,

--
Atsushi Torikoshi

v5-0001-To-examine-the-duration-of-locks-we-did-join-on-p.patch (13K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

Fujii Masao-4


On 2021/01/22 14:37, torikoshia wrote:

> On 2021-01-21 12:48, Fujii Masao wrote:
>
>> Thanks for updating the patch! I think that this is really useful feature!!
>
> Thanks for reviewing!
>
>> I have two minor comments.
>>
>> +      <entry role="catalog_table_entry"><para role="column_definition">
>> +       <structfield>wait_start</structfield> <type>timestamptz</type>
>>
>> The column name "wait_start" should be "waitstart" for the sake of consistency
>> with other column names in pg_locks? pg_locks seems to avoid including
>> an underscore in column names, so "locktype" is used instead of "lock_type",
>> "virtualtransaction" is used instead of "virtual_transaction", etc.
>>
>> +       Lock acquisition wait start time. <literal>NULL</literal> if
>> +       lock acquired.
>>
>
> Agreed.
>
> I also changed the variable name "wait_start" in struct PGPROC and
> LockInstanceData to "waitStart" for the same reason.
>
>
>> There seems the case where the wait start time is NULL even when "grant"
>> is false. It's better to add note about that case into the docs? For example,
>> I found that the wait start time is NULL while the startup process is waiting
>> for the lock. Is this only that case?
>
> Thanks, this is because I set 'waitstart' in the following
> condition.
>
>    ---src/backend/storage/lmgr/proc.c
>    > 1250         if (!InHotStandby)
>
> As far as considering this, I guess startup process would
> be the only case.
>
> I also think that in case of startup process, it seems possible
> to set 'waitstart' in ResolveRecoveryConflictWithLock(), so I
> did it in the attached patch.

This change seems to cause "waitstart" to be reset every time
ResolveRecoveryConflictWithLock() is called in the do-while loop.
I guess this is not acceptable. Right?

To avoid that issue, IMO the following change is better. Thought?

-       else if (log_recovery_conflict_waits)
+       else
         {
+               TimestampTz now = GetCurrentTimestamp();
+
+               MyProc->waitStart = now;
+
                 /*
                  * Set the wait start timestamp if logging is enabled and in hot
                  * standby.
                  */
-               standbyWaitStart = GetCurrentTimestamp();
+                if (log_recovery_conflict_waits)
+                        standbyWaitStart = now
         }

This change causes the startup process to call GetCurrentTimestamp()
additionally even when log_recovery_conflict_waits is disabled. Which
might decrease the performance of the startup process, but that performance
degradation can happen only when the startup process waits in
ACCESS EXCLUSIVE lock. So if this my understanding right, IMO it's almost
harmless to call GetCurrentTimestamp() additionally in that case. Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

Fujii Masao-4


On 2021/01/22 18:11, Fujii Masao wrote:

>
>
> On 2021/01/22 14:37, torikoshia wrote:
>> On 2021-01-21 12:48, Fujii Masao wrote:
>>
>>> Thanks for updating the patch! I think that this is really useful feature!!
>>
>> Thanks for reviewing!
>>
>>> I have two minor comments.
>>>
>>> +      <entry role="catalog_table_entry"><para role="column_definition">
>>> +       <structfield>wait_start</structfield> <type>timestamptz</type>
>>>
>>> The column name "wait_start" should be "waitstart" for the sake of consistency
>>> with other column names in pg_locks? pg_locks seems to avoid including
>>> an underscore in column names, so "locktype" is used instead of "lock_type",
>>> "virtualtransaction" is used instead of "virtual_transaction", etc.
>>>
>>> +       Lock acquisition wait start time. <literal>NULL</literal> if
>>> +       lock acquired.
>>>
>>
>> Agreed.
>>
>> I also changed the variable name "wait_start" in struct PGPROC and
>> LockInstanceData to "waitStart" for the same reason.
>>
>>
>>> There seems the case where the wait start time is NULL even when "grant"
>>> is false. It's better to add note about that case into the docs? For example,
>>> I found that the wait start time is NULL while the startup process is waiting
>>> for the lock. Is this only that case?
>>
>> Thanks, this is because I set 'waitstart' in the following
>> condition.
>>
>>    ---src/backend/storage/lmgr/proc.c
>>    > 1250         if (!InHotStandby)
>>
>> As far as considering this, I guess startup process would
>> be the only case.
>>
>> I also think that in case of startup process, it seems possible
>> to set 'waitstart' in ResolveRecoveryConflictWithLock(), so I
>> did it in the attached patch.
>
> This change seems to cause "waitstart" to be reset every time
> ResolveRecoveryConflictWithLock() is called in the do-while loop.
> I guess this is not acceptable. Right?
>
> To avoid that issue, IMO the following change is better. Thought?
>
> -       else if (log_recovery_conflict_waits)
> +       else
>          {
> +               TimestampTz now = GetCurrentTimestamp();
> +
> +               MyProc->waitStart = now;
> +
>                  /*
>                   * Set the wait start timestamp if logging is enabled and in hot
>                   * standby.
>                   */
> -               standbyWaitStart = GetCurrentTimestamp();
> +                if (log_recovery_conflict_waits)
> +                        standbyWaitStart = now
>          }
>
> This change causes the startup process to call GetCurrentTimestamp()
> additionally even when log_recovery_conflict_waits is disabled. Which
> might decrease the performance of the startup process, but that performance
> degradation can happen only when the startup process waits in
> ACCESS EXCLUSIVE lock. So if this my understanding right, IMO it's almost
> harmless to call GetCurrentTimestamp() additionally in that case. Thought?

According to the off-list discussion with you, this should not happen because ResolveRecoveryConflictWithDatabase() sets MyProc->waitStart only when it's not set yet (i.e., = 0). That's good. So I'd withdraw my comment.

+ if (MyProc->waitStart == 0)
+ MyProc->waitStart = now;
<snip>
+ MyProc->waitStart = get_timeout_start_time(DEADLOCK_TIMEOUT);

Another comment is; Doesn't the change of MyProc->waitStart need the lock table's partition lock? If yes, we can do that by moving LWLockRelease(partitionLock) just after the change of MyProc->waitStart, but which causes the time that lwlock is being held to be long. So maybe we need another way to do that.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

torikoshia
On 2021-01-25 23:44, Fujii Masao wrote:
> Another comment is; Doesn't the change of MyProc->waitStart need the
> lock table's partition lock? If yes, we can do that by moving
> LWLockRelease(partitionLock) just after the change of
> MyProc->waitStart, but which causes the time that lwlock is being held
> to be long. So maybe we need another way to do that.

Thanks for your comments!

It would be ideal for the consistency of the view to record "waitstart"
during holding the table partition lock.
However, as you pointed out, it would give non-negligible performance
impacts.

I may miss something, but as far as I can see, the influence of not
holding the lock is that "waitstart" can be NULL even though "granted"
is false.

I think people want to know the start time of the lock when locks are
held for a long time.
In that case, "waitstart" should have already been recorded.

If this is true, I think the current implementation may be enough on the
condition that users understand it can happen that "waitStart" is NULL
and "granted" is false.

Attached a patch describing this in the doc and comments.


Any Thoughts?

Regards,


--
Atsushi Torikoshi

v6-0001-To-examine-the-duration-of-locks-we-did-join-on-p.patch (15K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

Fujii Masao-4


On 2021/02/02 22:00, torikoshia wrote:

> On 2021-01-25 23:44, Fujii Masao wrote:
>> Another comment is; Doesn't the change of MyProc->waitStart need the
>> lock table's partition lock? If yes, we can do that by moving
>> LWLockRelease(partitionLock) just after the change of
>> MyProc->waitStart, but which causes the time that lwlock is being held
>> to be long. So maybe we need another way to do that.
>
> Thanks for your comments!
>
> It would be ideal for the consistency of the view to record "waitstart" during holding the table partition lock.
> However, as you pointed out, it would give non-negligible performance impacts.
>
> I may miss something, but as far as I can see, the influence of not holding the lock is that "waitstart" can be NULL even though "granted" is false.
>
> I think people want to know the start time of the lock when locks are held for a long time.
> In that case, "waitstart" should have already been recorded.

Sounds reasonable.


> If this is true, I think the current implementation may be enough on the condition that users understand it can happen that "waitStart" is NULL and "granted" is false.
>
> Attached a patch describing this in the doc and comments.
>
>
> Any Thoughts?

64-bit fetches are not atomic on some platforms. So spinlock is necessary when updating "waitStart" without holding the partition lock? Also GetLockStatusData() needs spinlock when reading "waitStart"?


+       Lock acquisition wait start time.

Isn't it better to describe this more clearly? What about the following?

     Time when the server process started waiting for this lock,
     or null if the lock is held.

+       Note that updating this field and lock acquisition are not performed
+       synchronously for performance reasons.  Therefore, depending on the
+       timing, it can happen that <structfield>waitstart</structfield> is
+       <literal>NULL</literal> even though
+       <structfield>granted</structfield> is false.

I agree that it's helpful to add the note about that NULL can be returned even when "granted" is false. But IMO we don't need to document why this behavior can happen internally. So what about the following?

     Note that this can be null for a very short period of time after
     the wait started even though <structfield>granted</structfield>
     is <literal>false</literal>.

Since the document for pg_locks uses "null" instead of <literal>NULL</literal> (I'm not sure why, though), I used "null" for the sake of consistency.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

Fujii Masao-4


On 2021/02/03 1:49, Fujii Masao wrote:

>
>
> On 2021/02/02 22:00, torikoshia wrote:
>> On 2021-01-25 23:44, Fujii Masao wrote:
>>> Another comment is; Doesn't the change of MyProc->waitStart need the
>>> lock table's partition lock? If yes, we can do that by moving
>>> LWLockRelease(partitionLock) just after the change of
>>> MyProc->waitStart, but which causes the time that lwlock is being held
>>> to be long. So maybe we need another way to do that.
>>
>> Thanks for your comments!
>>
>> It would be ideal for the consistency of the view to record "waitstart" during holding the table partition lock.
>> However, as you pointed out, it would give non-negligible performance impacts.
>>
>> I may miss something, but as far as I can see, the influence of not holding the lock is that "waitstart" can be NULL even though "granted" is false.
>>
>> I think people want to know the start time of the lock when locks are held for a long time.
>> In that case, "waitstart" should have already been recorded.
>
> Sounds reasonable.
>
>
>> If this is true, I think the current implementation may be enough on the condition that users understand it can happen that "waitStart" is NULL and "granted" is false.
>>
>> Attached a patch describing this in the doc and comments.
>>
>>
>> Any Thoughts?
>
> 64-bit fetches are not atomic on some platforms. So spinlock is necessary when updating "waitStart" without holding the partition lock? Also GetLockStatusData() needs spinlock when reading "waitStart"?

Also it might be worth thinking to use 64-bit atomic operations like pg_atomic_read_u64(), for that.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

torikoshia
On 2021-02-03 11:23, Fujii Masao wrote:
>> 64-bit fetches are not atomic on some platforms. So spinlock is
>> necessary when updating "waitStart" without holding the partition
>> lock? Also GetLockStatusData() needs spinlock when reading
>> "waitStart"?
>
> Also it might be worth thinking to use 64-bit atomic operations like
> pg_atomic_read_u64(), for that.

Thanks for your suggestion and advice!

In the attached patch I used pg_atomic_read_u64() and
pg_atomic_write_u64().

waitStart is TimestampTz i.e., int64, but it seems pg_atomic_read_xxx
and pg_atomic_write_xxx only supports unsigned int, so I cast the type.

I may be using these functions not correctly, so if something is wrong,
I would appreciate any comments.


About the documentation, since your suggestion seems better than v6, I
used it as is.


Regards,

--
Atsushi Torikoshi

v7-0001-To-examine-the-duration-of-locks-we-did-join-on-p.patch (16K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

Fujii Masao-4


On 2021/02/05 0:03, torikoshia wrote:

> On 2021-02-03 11:23, Fujii Masao wrote:
>>> 64-bit fetches are not atomic on some platforms. So spinlock is necessary when updating "waitStart" without holding the partition lock? Also GetLockStatusData() needs spinlock when reading "waitStart"?
>>
>> Also it might be worth thinking to use 64-bit atomic operations like
>> pg_atomic_read_u64(), for that.
>
> Thanks for your suggestion and advice!
>
> In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64().
>
> waitStart is TimestampTz i.e., int64, but it seems pg_atomic_read_xxx and pg_atomic_write_xxx only supports unsigned int, so I cast the type.
>
> I may be using these functions not correctly, so if something is wrong, I would appreciate any comments.
>
>
> About the documentation, since your suggestion seems better than v6, I used it as is.
Thanks for updating the patch!

+ if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+ pg_atomic_write_u64(&MyProc->waitStart,
+ pg_atomic_read_u64((pg_atomic_uint64 *) &now));

pg_atomic_read_u64() is really necessary? I think that
"pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.

+ deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+ pg_atomic_write_u64(&MyProc->waitStart,
+ pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));

Same as above.

+ /*
+ * Record waitStart reusing the deadlock timeout timer.
+ *
+ * It would be ideal this can be synchronously done with updating
+ * lock information. Howerver, since it gives performance impacts
+ * to hold partitionLock longer time, we do it here asynchronously.
+ */

IMO it's better to comment why we reuse the deadlock timeout timer.

  proc->waitStatus = waitStatus;
+ pg_atomic_init_u64(&MyProc->waitStart, 0);

pg_atomic_write_u64() should be used instead? Because waitStart can be
accessed concurrently there.

I updated the patch and addressed the above review comments. Patch attached.
Barring any objection, I will commit this version.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

v8.patch (11K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

torikoshia
On 2021-02-05 18:49, Fujii Masao wrote:

> On 2021/02/05 0:03, torikoshia wrote:
>> On 2021-02-03 11:23, Fujii Masao wrote:
>>>> 64-bit fetches are not atomic on some platforms. So spinlock is
>>>> necessary when updating "waitStart" without holding the partition
>>>> lock? Also GetLockStatusData() needs spinlock when reading
>>>> "waitStart"?
>>>
>>> Also it might be worth thinking to use 64-bit atomic operations like
>>> pg_atomic_read_u64(), for that.
>>
>> Thanks for your suggestion and advice!
>>
>> In the attached patch I used pg_atomic_read_u64() and
>> pg_atomic_write_u64().
>>
>> waitStart is TimestampTz i.e., int64, but it seems pg_atomic_read_xxx
>> and pg_atomic_write_xxx only supports unsigned int, so I cast the
>> type.
>>
>> I may be using these functions not correctly, so if something is
>> wrong, I would appreciate any comments.
>>
>>
>> About the documentation, since your suggestion seems better than v6, I
>> used it as is.
>
> Thanks for updating the patch!
>
> + if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
> + pg_atomic_write_u64(&MyProc->waitStart,
> + pg_atomic_read_u64((pg_atomic_uint64 *) &now));
>
> pg_atomic_read_u64() is really necessary? I think that
> "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.
>
> + deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
> + pg_atomic_write_u64(&MyProc->waitStart,
> + pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));
>
> Same as above.
>
> + /*
> + * Record waitStart reusing the deadlock timeout timer.
> + *
> + * It would be ideal this can be synchronously done with updating
> + * lock information. Howerver, since it gives performance impacts
> + * to hold partitionLock longer time, we do it here asynchronously.
> + */
>
> IMO it's better to comment why we reuse the deadlock timeout timer.
>
>   proc->waitStatus = waitStatus;
> + pg_atomic_init_u64(&MyProc->waitStart, 0);
>
> pg_atomic_write_u64() should be used instead? Because waitStart can be
> accessed concurrently there.
>
> I updated the patch and addressed the above review comments. Patch
> attached.
> Barring any objection, I will commit this version.

Thanks for modifying the patch!
I agree with your comments.

BTW, I ran pgbench several times before and after applying
this patch.

The environment is virtual machine(CentOS 8), so this is
just for reference, but there were no significant difference
in latency or tps(both are below 1%).


Regards,

--
Atsushi Torikoshi


Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

Fujii Masao-4


On 2021/02/09 17:48, torikoshia wrote:

> On 2021-02-05 18:49, Fujii Masao wrote:
>> On 2021/02/05 0:03, torikoshia wrote:
>>> On 2021-02-03 11:23, Fujii Masao wrote:
>>>>> 64-bit fetches are not atomic on some platforms. So spinlock is necessary when updating "waitStart" without holding the partition lock? Also GetLockStatusData() needs spinlock when reading "waitStart"?
>>>>
>>>> Also it might be worth thinking to use 64-bit atomic operations like
>>>> pg_atomic_read_u64(), for that.
>>>
>>> Thanks for your suggestion and advice!
>>>
>>> In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64().
>>>
>>> waitStart is TimestampTz i.e., int64, but it seems pg_atomic_read_xxx and pg_atomic_write_xxx only supports unsigned int, so I cast the type.
>>>
>>> I may be using these functions not correctly, so if something is wrong, I would appreciate any comments.
>>>
>>>
>>> About the documentation, since your suggestion seems better than v6, I used it as is.
>>
>> Thanks for updating the patch!
>>
>> +    if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
>> +        pg_atomic_write_u64(&MyProc->waitStart,
>> +                            pg_atomic_read_u64((pg_atomic_uint64 *) &now));
>>
>> pg_atomic_read_u64() is really necessary? I think that
>> "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.
>>
>> +        deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
>> +        pg_atomic_write_u64(&MyProc->waitStart,
>> +                    pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));
>>
>> Same as above.
>>
>> +        /*
>> +         * Record waitStart reusing the deadlock timeout timer.
>> +         *
>> +         * It would be ideal this can be synchronously done with updating
>> +         * lock information. Howerver, since it gives performance impacts
>> +         * to hold partitionLock longer time, we do it here asynchronously.
>> +         */
>>
>> IMO it's better to comment why we reuse the deadlock timeout timer.
>>
>>      proc->waitStatus = waitStatus;
>> +    pg_atomic_init_u64(&MyProc->waitStart, 0);
>>
>> pg_atomic_write_u64() should be used instead? Because waitStart can be
>> accessed concurrently there.
>>
>> I updated the patch and addressed the above review comments. Patch attached.
>> Barring any objection, I will commit this version.
>
> Thanks for modifying the patch!
> I agree with your comments.
>
> BTW, I ran pgbench several times before and after applying
> this patch.
>
> The environment is virtual machine(CentOS 8), so this is
> just for reference, but there were no significant difference
> in latency or tps(both are below 1%).

Thanks for the test! I pushed the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply | Threaded
Open this post in threaded view
|

Re: adding wait_start column to pg_locks

Fujii Masao-4


On 2021/02/09 18:13, Fujii Masao wrote:

>
>
> On 2021/02/09 17:48, torikoshia wrote:
>> On 2021-02-05 18:49, Fujii Masao wrote:
>>> On 2021/02/05 0:03, torikoshia wrote:
>>>> On 2021-02-03 11:23, Fujii Masao wrote:
>>>>>> 64-bit fetches are not atomic on some platforms. So spinlock is necessary when updating "waitStart" without holding the partition lock? Also GetLockStatusData() needs spinlock when reading "waitStart"?
>>>>>
>>>>> Also it might be worth thinking to use 64-bit atomic operations like
>>>>> pg_atomic_read_u64(), for that.
>>>>
>>>> Thanks for your suggestion and advice!
>>>>
>>>> In the attached patch I used pg_atomic_read_u64() and pg_atomic_write_u64().
>>>>
>>>> waitStart is TimestampTz i.e., int64, but it seems pg_atomic_read_xxx and pg_atomic_write_xxx only supports unsigned int, so I cast the type.
>>>>
>>>> I may be using these functions not correctly, so if something is wrong, I would appreciate any comments.
>>>>
>>>>
>>>> About the documentation, since your suggestion seems better than v6, I used it as is.
>>>
>>> Thanks for updating the patch!
>>>
>>> +    if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
>>> +        pg_atomic_write_u64(&MyProc->waitStart,
>>> +                            pg_atomic_read_u64((pg_atomic_uint64 *) &now));
>>>
>>> pg_atomic_read_u64() is really necessary? I think that
>>> "pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.
>>>
>>> +        deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
>>> +        pg_atomic_write_u64(&MyProc->waitStart,
>>> +                    pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));
>>>
>>> Same as above.
>>>
>>> +        /*
>>> +         * Record waitStart reusing the deadlock timeout timer.
>>> +         *
>>> +         * It would be ideal this can be synchronously done with updating
>>> +         * lock information. Howerver, since it gives performance impacts
>>> +         * to hold partitionLock longer time, we do it here asynchronously.
>>> +         */
>>>
>>> IMO it's better to comment why we reuse the deadlock timeout timer.
>>>
>>>      proc->waitStatus = waitStatus;
>>> +    pg_atomic_init_u64(&MyProc->waitStart, 0);
>>>
>>> pg_atomic_write_u64() should be used instead? Because waitStart can be
>>> accessed concurrently there.
>>>
>>> I updated the patch and addressed the above review comments. Patch attached.
>>> Barring any objection, I will commit this version.
>>
>> Thanks for modifying the patch!
>> I agree with your comments.
>>
>> BTW, I ran pgbench several times before and after applying
>> this patch.
>>
>> The environment is virtual machine(CentOS 8), so this is
>> just for reference, but there were no significant difference
>> in latency or tps(both are below 1%).
>
> Thanks for the test! I pushed the patch.

But I reverted the patch because buildfarm members rorqual and
prion don't like the patch. I'm trying to investigate the cause
of this failures.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-02-09%2009%3A13%3A16

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


12