[PATCH] Identify LWLocks in tracepoints


[PATCH] Identify LWLocks in tracepoints

Craig Ringer-5
Hi all

The attached patch set follows on from the discussion in [1] "Add LWLock blocker(s) information" by adding the actual LWLock* and the numeric tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.

This does not provide complete information on blockers, because it's not necessarily valid to compare any two LWLock* pointers between two process address spaces. The locks could be in DSM segments, and those DSM segments could be mapped at different addresses.

I wasn't able to work out a sensible way to map a LWLock* to any sort of (tranche-id, lock-index) because there's no requirement that locks in a tranche be contiguous or known individually to the lmgr.

Despite that, the patches improve the information available for LWLock analysis significantly.

Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be fired from LWLockWaitForVar, despite that function never actually acquiring the lock.

Patch 2 adds the tranche id and lock pointer for each trace hit. This makes it possible to differentiate between individual locks within a tranche, and (so long as they aren't tranches in a DSM segment) compare locks between processes. That means you can do lock-order analysis etc, which was not previously especially feasible. Traces also don't have to do userspace reads for the tranche name all the time, so the trace can run with lower overhead.

Patch 3 adds a single-path tracepoint for all lock acquires and releases, so you only have to probe the lwlock__acquired and lwlock__release events to see all acquires/releases, whether conditional or otherwise. It also adds start markers that can be used for timing wallclock duration of LWLock acquires/releases.
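
To make the shape of the change concrete, here's a rough sketch of what an enhanced acquire path in lwlock.c could look like with patches 2 and 3 applied. The probe macro names and argument order are assumptions based on the descriptions above, not copied from the patches:

    /*
     * Illustrative fragment only, not taken from the patches: each probe
     * carries the tranche name, the numeric tranche ID and the LWLock
     * pointer, and the acquire path is bracketed by a start marker plus a
     * single "acquired" event that fires for every acquire path.
     */
    TRACE_POSTGRESQL_LWLOCK_ACQUIRE_START(T_NAME(lock), lock->tranche,
                                          lock, mode);

    /* ... fast path, or sleep on the wait queue ... */

    TRACE_POSTGRESQL_LWLOCK_ACQUIRED(T_NAME(lock), lock->tranche,
                                     lock, mode);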

Patch 4 adds some comments on LWLock tranches to try to address some points I found confusing and hard to understand when investigating this topic.




Attachments:
  v1-0001-Remove-bogus-lwlock__acquire-tracepoint-from-LWLo.patch (1K)
  v1-0002-Pass-the-target-LWLock-and-tranche-ID-to-LWLock-t.patch (9K)
  v1-0003-Add-to-the-tracepoints-in-LWLock-routines.patch (7K)
  v1-0004-Comments-on-LWLock-tranches.patch (7K)

Re: [PATCH] Identify LWLocks in tracepoints

Masahiko Sawada
Hi Craig,

On Sat, Dec 19, 2020 at 2:00 PM Craig Ringer
<[hidden email]> wrote:

>
> Hi all
>
> The attached patch set follows on from the discussion in [1] "Add LWLock blocker(s) information" by adding the actual LWLock* and the numeric tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
>
> This does not provide complete information on blockers, because it's not necessarily valid to compare any two LWLock* pointers between two process address spaces. The locks could be in DSM segments, and those DSM segments could be mapped at different addresses.
>
> I wasn't able to work out a sensible way to map a LWLock* to any sort of (tranche-id, lock-index) because there's no requirement that locks in a tranche be contiguous or known individually to the lmgr.
>
> Despite that, the patches improve the information available for LWLock analysis significantly.
>
> Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be fired from LWLockWaitForVar, despite that function never actually acquiring the lock.
>
> Patch 2 adds the tranche id and lock pointer for each trace hit. This makes it possible to differentiate between individual locks within a tranche, and (so long as they aren't tranches in a DSM segment) compare locks between processes. That means you can do lock-order analysis etc, which was not previously especially feasible. Traces also don't have to do userspace reads for the tranche name all the time, so the trace can run with lower overhead.
>
> Patch 3 adds a single-path tracepoint for all lock acquires and releases, so you only have to probe the lwlock__acquired and lwlock__release events to see all acquires/releases, whether conditional or otherwise. It also adds start markers that can be used for timing wallclock duration of LWLock acquires/releases.
>
> Patch 4 adds some comments on LWLock tranches to try to address some points I found confusing and hard to understand when investigating this topic.
>

You sent in your patch to pgsql-hackers on Dec 19, but you did not
post it to the next CommitFest[1].  If this was intentional, then you
need to take no action.  However, if you want your patch to be
reviewed as part of the upcoming CommitFest, then you need to add it
yourself before 2021-01-01 AoE[2]. Thanks for your contributions.

Regards,

[1] https://commitfest.postgresql.org/31/
[2] https://en.wikipedia.org/wiki/Anywhere_on_Earth

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/



Re: [PATCH] Identify LWLocks in tracepoints

Craig Ringer-5

On Mon, 28 Dec 2020 at 20:09, Masahiko Sawada <[hidden email]> wrote:
Hi Craig,

On Sat, Dec 19, 2020 at 2:00 PM Craig Ringer
<[hidden email]> wrote:
>
> Hi all
>
> The attached patch set follows on from the discussion in [1] "Add LWLock blocker(s) information" by adding the actual LWLock* and the numeric tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
>
> This does not provide complete information on blockers, because it's not necessarily valid to compare any two LWLock* pointers between two process address spaces. The locks could be in DSM segments, and those DSM segments could be mapped at different addresses.
>
> I wasn't able to work out a sensible way to map a LWLock* to any sort of (tranche-id, lock-index) because there's no requirement that locks in a tranche be contiguous or known individually to the lmgr.
>
> Despite that, the patches improve the information available for LWLock analysis significantly.
>
> Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be fired from LWLockWaitForVar, despite that function never actually acquiring the lock.
>
> Patch 2 adds the tranche id and lock pointer for each trace hit. This makes it possible to differentiate between individual locks within a tranche, and (so long as they aren't tranches in a DSM segment) compare locks between processes. That means you can do lock-order analysis etc, which was not previously especially feasible. Traces also don't have to do userspace reads for the tranche name all the time, so the trace can run with lower overhead.
>
> Patch 3 adds a single-path tracepoint for all lock acquires and releases, so you only have to probe the lwlock__acquired and lwlock__release events to see all acquires/releases, whether conditional or otherwise. It also adds start markers that can be used for timing wallclock duration of LWLock acquires/releases.
>
> Patch 4 adds some comments on LWLock tranches to try to address some points I found confusing and hard to understand when investigating this topic.
>

You sent in your patch to pgsql-hackers on Dec 19, but you did not
post it to the next CommitFest[1].  If this was intentional, then you
need to take no action.  However, if you want your patch to be
reviewed as part of the upcoming CommitFest, then you need to add it
yourself before 2021-01-01 AoE[2]. Thanks for your contributions.

Regards,

[1] https://commitfest.postgresql.org/31/
[2] https://en.wikipedia.org/wiki/Anywhere_on_Earth

Thanks.

CF entry created at https://commitfest.postgresql.org/32/2927/ . I don't think it's urgent, and I'll have limited review time, so I didn't try to wedge it into the current CF.

Re: [PATCH] Identify LWLocks in tracepoints

Craig Ringer-5
In reply to this post by Craig Ringer-5
On Sat, 19 Dec 2020 at 13:00, Craig Ringer <[hidden email]> wrote:
Hi all

The attached patch set follows on from the discussion in [1] "Add LWLock blocker(s) information" by adding the actual LWLock* and the numeric tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.


I've attached a systemtap script that makes use of the information exported by the enhanced LWLock tracepoints. It offers something akin to dynamic -DLWLOCK_STATS with automatic statistical aggregation and some selective LWLOCK_DEBUG output.

The script isn't super pretty. I didn't try to separate event-data collection from results output, and there's some duplication in places. But it gives you an idea what's possible when we report lock pointers and tranche IDs to tracepoints and add entry/exit tracepoints.

Key features:

* Collect statistical aggregates on LWLock hold and wait durations across all processes. Stats are grouped by lock mode (shared or exclusive) and by tranche name, with rollup stats across all tranches.
* Report lock wait and hold durations for each process when that process exits. Again, reported by mode and tranche name.
* For long lock waits, print the waiter pid and the blocker pid, along with each process's backend type and application_name if known, the acquire mode, and the acquire function.

The output is intended to be human-readable, but it'd be quite simple to convert it into raw tsv-style output suitable for ingestion into statistical postprocessing or graphing tools.

It should be fairly easy to break the stats down by acquire function if desired, so that LWLockAcquire(), LWLockWaitForVar(), and LWLockAcquireOrWait() are reported separately; they're combined in the current output.

Capturing the current query string is pretty simple if needed, but I didn't think it was likely to be especially useful.

Sample output for a pg_regress run is attached; an abridged version follows. The !!W!! lines are "waited a long time", the !!H!! lines are "held a long time". The format is: [pid]:MyBackendType tranche_name wait_time_us (wait_time) in wait_func (application_name) => [blocker_pid] (blocker_application_name). If the blocker pid wasn't identified it isn't reported; I know how to fix that and will do so soon.

!!W!! [ 93030]:3          BufferContent    12993 (0m0.012993s) in lwlock__acquire__start (pg_regress/text)
!!W!! [ 93036]:3            LockManager    14540 (0m0.014540s) in lwlock__acquire__start (pg_regress/float8) => [ 93045] (pg_regress/regproc)
!!W!! [ 93035]:3          BufferContent    12608 (0m0.012608s) in lwlock__acquire__start (pg_regress/float4) => [ 93034] (pg_regress/oid)
!!W!! [ 93036]:3            LockManager    10301 (0m0.010301s) in lwlock__acquire__start (pg_regress/float8)
!!W!! [ 93043]:3            LockManager    10356 (0m0.010356s) in lwlock__acquire__start (pg_regress/pg_lsn)
!!H!! [ 93033]:3          BufferContent    20579 (0m0.020579s) (pg_regress/int8)
!!W!! [ 93027]:3          BufferContent    10766 (0m0.010766s) in lwlock__acquire__start (pg_regress/char) => [ 93037] (pg_regress/bit)
!!W!! [ 93036]:3                 OidGen    12876 (0m0.012876s) in lwlock__acquire__start (pg_regress/float8)
...

Then the summary rollup at the end of the run. This can also be output periodically during the run. Abbreviated for highlights:

wait locks: all procs                        tranche mode    count        total      avg variance      min      max
  W LW_EXCLUSIVE                              (all)    E    54185     14062734      259  1850265        1    44177
  W LW_SHARED                                 (all)    S     3668      1116022      304  1527261        2    18642

held locks: all procs                        tranche mode    count        total      avg variance      min      max
  H LW_EXCLUSIVE                              (all)    E 10438060    153077259       14    37035        1   195043
  H LW_SHARED                                 (all)    S 14199902     65466934        4     5318        1    44030

all procs by tranche                        tranche mode    count        total      avg variance      min      max
  W tranche                                   (all)    S     3668      1116022      304  1527261        2    18642
  W tranche                                   (all)    E    54185     14062734      259  1850265        1    44177
  W tranche                               WALInsert    E     9839      2393229      243  1180294        2    14209
  W tranche                           BufferContent    E     3012      1726543      573  3869409        2    28186
  W tranche                           BufferContent    S     1664       657855      395  2185694        2    18642
  W tranche                            LockFastPath    E    28314      6327801      223  1278053        1    26133
  W tranche                            LockFastPath    S       87        59401      682  3703217       19     9454
  W tranche                             LockManager    E     7223      2764392      382  2514863        2    44177


Hope this is interesting to someone.

Attachments:
  sample_output.txt (70K)
  lwlock_hold_durations.stp (30K)

Re: [PATCH] Identify LWLocks in tracepoints

Dmitry Dolgov
In reply to this post by Craig Ringer-5
> On Sat, Dec 19, 2020 at 01:00:01PM +0800, Craig Ringer wrote:
>
> The attached patch set follows on from the discussion in [1] "Add LWLock
> blocker(s) information" by adding the actual LWLock* and the numeric
> tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
>
> This does not provide complete information on blockers, because it's not
> necessarily valid to compare any two LWLock* pointers between two process
> address spaces. The locks could be in DSM segments, and those DSM segments
> could be mapped at different addresses.
>
> I wasn't able to work out a sensible way to map a LWLock* to any sort of
> (tranche-id, lock-index) because there's no requirement that locks in a
> tranche be contiguous or known individually to the lmgr.
>
> Despite that, the patches improve the information available for LWLock
> analysis significantly.

Thanks for the patches, this could indeed be useful. I've looked through
them and haven't noticed any issues with either the tracepoint extensions
or the commentary, except that I find it not quite clear how tranche_id
indicates a re-initialization here:

    /* Re-initialization of individual LWLocks is not permitted */
    Assert(tranche_id >= NUM_INDIVIDUAL_LWLOCKS || !IsUnderPostmaster);

> Patch 2 adds the tranche id and lock pointer for each trace hit. This makes
> it possible to differentiate between individual locks within a tranche, and
> (so long as they aren't tranches in a DSM segment) compare locks between
> processes. That means you can do lock-order analysis etc, which was not
> previously especially feasible.

I'm curious in which kind of situations lock-order analysis could be
helpful?

> Traces also don't have to do userspace reads for the tranche name all
> the time, so the trace can run with lower overhead.

This one is also interesting. Just for me to clarify, wouldn't there be
a bit of overhead anyway (due to switching from kernel context to user
space when a tracepoint was hit) that will mask name read overhead? Or
are there any available numbers about it?



Re: [PATCH] Identify LWLocks in tracepoints

Peter Eisentraut-7
In reply to this post by Craig Ringer-5
On 2020-12-19 06:00, Craig Ringer wrote:
> Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be
> fired from LWLockWaitForVar, despite that function never actually
> acquiring the lock.

This was added in 68a2e52bbaf when LWLockWaitForVar() was first
introduced.  It looks like a mistake to me too, but maybe Heikki wants
to comment.



Re: [PATCH] Identify LWLocks in tracepoints

Craig Ringer-5
In reply to this post by Dmitry Dolgov
On Wed, 13 Jan 2021 at 19:19, Dmitry Dolgov <[hidden email]> wrote:
> On Sat, Dec 19, 2020 at 01:00:01PM +0800, Craig Ringer wrote:
>
> The attached patch set follows on from the discussion in [1] "Add LWLock
> blocker(s) information" by adding the actual LWLock* and the numeric
> tranche ID to each LWLock related TRACE_POSTGRESQL_foo tracepoint.
>
> This does not provide complete information on blockers, because it's not
> necessarily valid to compare any two LWLock* pointers between two process
> address spaces. The locks could be in DSM segments, and those DSM segments
> could be mapped at different addresses.
>
> I wasn't able to work out a sensible way to map a LWLock* to any sort of
> (tranche-id, lock-index) because there's no requirement that locks in a
> tranche be contiguous or known individually to the lmgr.
>
> Despite that, the patches improve the information available for LWLock
> analysis significantly.

Thanks for the patches, this could indeed be useful. I've looked through
them and haven't noticed any issues with either the tracepoint extensions
or the commentary, except that I find it not quite clear how tranche_id
indicates a re-initialization here:

    /* Re-initialization of individual LWLocks is not permitted */
    Assert(tranche_id >= NUM_INDIVIDUAL_LWLOCKS || !IsUnderPostmaster);

Individual LWLocks are the ones with tranche IDs below NUM_INDIVIDUAL_LWLOCKS, and they are all initialized during postmaster startup, so there should be no reason for anything to call LWLockInitialize(...) on one of them afterwards.

Doing so must be a bug.

But that's a trivial change that can be done separately.
 
> Patch 2 adds the tranche id and lock pointer for each trace hit. This makes
> it possible to differentiate between individual locks within a tranche, and
> (so long as they aren't tranches in a DSM segment) compare locks between
> processes. That means you can do lock-order analysis etc, which was not
> previously especially feasible.

I'm curious in which kind of situations lock-order analysis could be
helpful?

If code-path 1 does

    LWLockAcquire(LockA, LW_EXCLUSIVE);
    ...
    LWLockAcquire(LockB, LW_EXCLUSIVE);

and code-path 2 does:

    LWLockAcquire(LockB, LW_EXCLUSIVE);
    ...
    LWLockAcquire(LockA, LW_EXCLUSIVE);
 
then they're subject to deadlock. But you might not actually hit it often in test workloads if the timing required for the deadlock is tight and/or it only arises on infrequent operations.

It's not always easy to reason about or prove things about lock order when acquisitions are potentially nested deep within many layers of other calls and callbacks. That's obviously something we try to avoid with LWLocks, but it's not impossible.

If you trace a workload and derive all possible nestings of lock acquire order, you can then prove things about whether there are any possible ordering conflicts and where they might arise.

A PoC to do so is on my TODO.
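
As a very rough illustration of the post-processing I have in mind (the input format, limits and names below are invented for the example, not part of the patches): reduce the trace to per-pid acquire/release events, record which lock pairs are ever held nested in each order, and flag any pair observed in both orders.

    /*
     * Toy offline lock-order check over a trace (illustrative only).
     * Input on stdin: one "<pid> A <id>" (acquire) or "<pid> R <id>"
     * (release) line per event, where <id> is e.g. the tranche ID, or an
     * index assigned to each distinct LWLock pointer.  Any ordered pair
     * seen nested in both orders is a potential deadlock cycle.
     */
    #include <stdio.h>
    #include <stdbool.h>

    #define MAX_IDS   256
    #define MAX_PIDS  1024
    #define MAX_DEPTH 64

    static bool seen_after[MAX_IDS][MAX_IDS]; /* [i][j]: j acquired while i held */
    static int  held[MAX_PIDS][MAX_DEPTH];
    static int  depth[MAX_PIDS];

    int
    main(void)
    {
        int  pid, id;
        char op;

        while (scanf("%d %c %d", &pid, &op, &id) == 3)
        {
            int slot = pid % MAX_PIDS;  /* toy: assume no pid collisions */

            if (id < 0 || id >= MAX_IDS)
                continue;
            if (op == 'A')
            {
                /* every lock currently held was acquired before this one */
                for (int k = 0; k < depth[slot]; k++)
                    seen_after[held[slot][k]][id] = true;
                if (depth[slot] < MAX_DEPTH)
                    held[slot][depth[slot]++] = id;
            }
            else if (op == 'R' && depth[slot] > 0)
                depth[slot]--;          /* toy: assume strictly nested releases */
        }

        for (int i = 0; i < MAX_IDS; i++)
            for (int j = i + 1; j < MAX_IDS; j++)
                if (seen_after[i][j] && seen_after[j][i])
                    printf("possible lock-order conflict: %d vs %d\n", i, j);
        return 0;
    }

A real version would key on the individual LWLock pointer (within one process, or across processes for non-DSM tranches) rather than a small integer per tranche, but the principle is the same.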

> Traces also don't have to do userspace reads for the tranche name all
> the time, so the trace can run with lower overhead.

This one is also interesting. Just for me to clarify, wouldn't there be
a bit of overhead anyway (due to switching from kernel context to user
space when a tracepoint was hit) that will mask name read overhead? Or
are there any available numbers about it?

I don't have numbers on that. Whether it matters will depend way too much on how you're using the probe points and collecting/consuming the data anyway.

It's a bit unfortunate (IMO) that we make a function call for each tracepoint invocation to get the tranche name. Ideally I'd prefer to be able to omit the tranche name lookups for these probes entirely for something as hot as LWLocks. But it's a bit of a pain to look up tranche names from an external trace tool, so instead I'm inclined to see if we can enable systemtap's semaphores and only compute the tranche name if the target probe is actually enabled. That would be separate to this patch, though, and would require a build change in how systemtap support is compiled and linked.
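
A minimal sketch of the kind of guard I mean, assuming the dtrace-generated probes header exposes the usual *_ENABLED() macros backed by SDT semaphores (the macro names and arguments here are illustrative, not taken from the patches):

    /*
     * Sketch only: skip the tranche-name lookup (and any other argument
     * computation) unless a tool has actually armed the probe.
     */
    if (TRACE_POSTGRESQL_LWLOCK_ACQUIRE_ENABLED())
        TRACE_POSTGRESQL_LWLOCK_ACQUIRE(T_NAME(lock), lock->tranche,
                                        lock, mode);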

BTW, a user->kernel->user context switch only occurs when the trace tool's probes use kernel space - such as for perf based probes, or for systemtap's kernel-runtime probes. The same markers can be used by e.g. systemtap's "dyninst" runtime that runs entirely in userspace.



Re: [PATCH] Identify LWLocks in tracepoints

Craig Ringer-5
In reply to this post by Peter Eisentraut-7
On Thu, 14 Jan 2021 at 15:56, Peter Eisentraut <[hidden email]> wrote:
On 2020-12-19 06:00, Craig Ringer wrote:
> Patch 1 fixes a bogus tracepoint where an lwlock__acquire event would be
> fired from LWLockWaitForVar, despite that function never actually
> acquiring the lock.

This was added in 68a2e52bbaf when LWLockWaitForVar() was first
introduced.  It looks like a mistake to me too, but maybe Heikki wants
to comment.

I'm certain it's a copy/paste bug.

Re: [PATCH] Identify LWLocks in tracepoints

Peter Eisentraut-7
On 2021-01-14 09:39, Craig Ringer wrote:

> On Thu, 14 Jan 2021 at 15:56, Peter Eisentraut <[hidden email]> wrote:
>
>     On 2020-12-19 06:00, Craig Ringer wrote:
>      > Patch 1 fixes a bogus tracepoint where an lwlock__acquire event
>     would be
>      > fired from LWLockWaitForVar, despite that function never actually
>      > acquiring the lock.
>
>     This was added in 68a2e52bbaf when LWLockWaitForVar() was first
>     introduced.  It looks like a mistake to me too, but maybe Heikki wants
>     to comment.
>
>
> I'm certain it's a copy/paste bug.

I have committed that patch.