Postgres stucks in deadlock detection

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

Postgres stucks in deadlock detection

konstantin knizhnik
Hi hackers,
Please notice that it is not a first of April joke;)

Several times we and our customers are suffered from the problem that
Postgres got stuck in deadlock detection.
One scenario is YCSB workload with Zipf's distribution when many clients
are trying to update the same record and compete for it's lock.
Another scenario is large number of clients performing inserts in the
same table. In this case the source of the problem is relation extension
lock.
In both cases number of connection is large enough: several hundreds.

So what's happening? Due to high contention backends will not be able to
obtains requested lock in the specified deadlock detection timeout (1
second by default).
Wait it interrupted by timeout and backend tries to perform deadlock
detection. CheckDeadLock sets exclusive lock on all partitions locks...
Avalanche of deadlock timeout expiration
in backends and there contention of exclusive partition locks cause
Postgres to got stucks.
Speed falls down almost to zero and it is not possible even to login to
Postgres.

It is well known fact that Postgres is not scaling well for such larger
number of connections and it is necessary to use pgbouncer or some other
connection pooler
to limit number of backends. But modern systems has hundreds of CPU
cores. And to utilize all this resources we need to have hundreds of
active backaneds.
So this is not an artificial problem, but real show stopper, which takes
place on real workloads.

There are several ways to solve this problem.
First is trivial: increase deadlock detection timeout. In case of YCSB
it helps. But in case of many concurrent inserts, some backends are
waiting for lock for several minutes.
So there is no any realistic value of deadlock detection timeout which
can completely solve the problem.
Also significant increasing of deadlock detection timeout may case
blocking applications for unacceptable amount of time in case of real
deadlock occurrence.

There is a patch in commitfest proposed by Yury Sokolov:
https://commitfest.postgresql.org/18/1602/
He make deadlock check in two phases: first under shared lock and second
under exclusive lock.

I am proposing much simpler patch (attached) which uses atomic flag to
prevent concurrent deadlock detection by more than one backend.
The obvious drawback of such solution is that detection of unrelated
deadlock loops may take larger amount of time. But deadlock is abnormal
situation in any case and I do not know applications which consider
deadlocks as normal behavior. Also I didn't see in my life situations
when more than one independent deadlocks are happen at the same time
(but obviously it is possible).

So, I see three possible ways to fix this problem:
1. Yury Sololov's patch with two phase deadlock check
2. Avoid concurrent deadlock detection
3. Avoid concurrent deadlock detection + let CheckDeadLock detect all
deadlocks, not only those in which current transaction is involved.

I want to know opinion of community concerning this approaches (or may
we there are some other solutions).

Thanks in advance,

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


deadlock.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

Teodor Sigaev
> So, I see three possible ways to fix this problem:
> 1. Yury Sololov's patch with two phase deadlock check
I could be wrong, but LWLocks in pgsql aren't a "fair" locks: if LWLock is share
locked then exclusive lock will wait, but next shared lock will be acquired
immediately. If so, the we have a risk when two deadlocked processes will
infinitely wait for excluse lock over partitions while other processes will
repeady lock for share to find its owned deadlocks. I.e. lock graph has one loop
and members of that loop could not acquire exclusive lock becouse non-members of
loop constantly holds a share lock. It has a low probability, but nevertheless.


> 2. Avoid concurrent deadlock detection
Seems simple, workable solution with unpredicable delay for deadlock check.
Although it helps very good in cases when there isn't deadlock.


> 3. Avoid concurrent deadlock detection + let CheckDeadLock detect all deadlocks,
> not only those in which current transaction is involved.
I think, it's better solution, but I'm afraid it's too late for 11v...

--
Teodor Sigaev                                   E-mail: [hidden email]
                                                    WWW: http://www.sigaev.ru/

Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

Andres Freund
In reply to this post by konstantin knizhnik
Hi,

On 2018-04-04 11:54:14 +0300, Konstantin Knizhnik wrote:
> Several times we and our customers are suffered from the problem that
> Postgres got stuck in deadlock detection.
> One scenario is YCSB workload with Zipf's distribution when many clients are
> trying to update the same record and compete for it's lock.

> Another scenario is large number of clients performing inserts in the same
> table. In this case the source of the problem is relation extension lock.
> In both cases number of connection is large enough: several hundreds.

Have you ever observed that in the field? This sounds more artificial
than real to me.


> So, I see three possible ways to fix this problem:
> 1. Yury Sololov's patch with two phase deadlock check
> 2. Avoid concurrent deadlock detection
> 3. Avoid concurrent deadlock detection + let CheckDeadLock detect all
> deadlocks, not only those in which current transaction is involved.

4) Don't do anything about deadlock detection, because this is just a
   symptom five steps removed from the cause.

We've to pay attention to fixing actual problems, rather than purely
benchmark ones. Complexity has a significant price.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

Teodor Sigaev
> Have you ever observed that in the field? This sounds more artificial
> than real to me.

Zabbix storage with 50Gb WAL per hour on insertion.

--
Teodor Sigaev                      E-mail: [hidden email]
                                       WWW: http://www.sigaev.ru/

Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

Andres Freund
On 2018-04-04 21:34:27 +0300, Teodor Sigaev wrote:
> > Have you ever observed that in the field? This sounds more artificial
> > than real to me.
>
> Zabbix storage with 50Gb WAL per hour on insertion.

That's not a very detailed description. Isn't that a largely insertion
only workload?

Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

Teodor Sigaev
>>> Have you ever observed that in the field? This sounds more artificial
>>> than real to me.
>>
>> Zabbix storage with 50Gb WAL per hour on insertion.
>
> That's not a very detailed description. Isn't that a largely insertion
> only workload?

It's mostly insert load, collecting monitoring information. Also there
are some read-only transactions, ~10 per second.

We didn't invent this from mind, client came to us with support request,
now we have two different cases.

--
Teodor Sigaev                      E-mail: [hidden email]
                                       WWW: http://www.sigaev.ru/

Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

Andres Freund
Hi,

On 2018-04-04 21:55:10 +0300, Teodor Sigaev wrote:

> > > > Have you ever observed that in the field? This sounds more artificial
> > > > than real to me.
> > >
> > > Zabbix storage with 50Gb WAL per hour on insertion.
> >
> > That's not a very detailed description. Isn't that a largely insertion
> > only workload?
>
> It's mostly insert load, collecting monitoring information. Also there are
> some read-only transactions, ~10 per second.

How are you running into deadlock detector issues then? Where do the
significant number of heavyweight lock weights come from?


> We didn't invent this from mind, client came to us with support request, now
> we have two different cases.

You gotta actually start describing those then.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

konstantin knizhnik
In reply to this post by Andres Freund
Hi,
Thank for your feedback.

On 04.04.2018 21:15, Andres Freund wrote:
Hi,

On 2018-04-04 11:54:14 +0300, Konstantin Knizhnik wrote:
Several times we and our customers are suffered from the problem that
Postgres got stuck in deadlock detection.
One scenario is YCSB workload with Zipf's distribution when many clients are
trying to update the same record and compete for it's lock.

      
Another scenario is large number of clients performing inserts in the same
table. In this case the source of the problem is relation extension lock.
In both cases number of connection is large enough: several hundreds.
Have you ever observed that in the field? This sounds more artificial
than real to me.

Yes, it was real customer's problem. And it  is not so difficult to reproduce it: you just need to have system with larger number of cores (in our case 72 physical cores)
and spawn 1000 clients which performs normal inserts to the same table:


cat test.sql
begin;
insert into test select i, md5(i::text) from generate_series(1,1000) AS i;
end;

pgbench -U postgres test -c 1000 -j 36 -T 3600 -P 1 -f test.sql


With deadlock detection timeout equal to 100msec it happens immediately.
With default 1 second deadlock timeout it will take about 12 minutes after which the following messages can be found in the log:

2018-03-23 20:25:27.287 MSK [41469] STATEMENT: insert into test select i, md5(i::text) from generate_series(1,1000) AS i;
2018-03-23 20:25:27.287 MSK [41707] LOG: process 41707 acquired ExclusiveLock on extension of relation 16387 of database 16384 after 61294.863 ms
2018-03-23 20:25:27.287 MSK [41707] STATEMENT: insert into test select i, md5(i::text) from generate_series(1,1000) AS i;
2018-03-23 20:25:27.303 MSK [42251] LOG: process 42251 still waiting for ExclusiveLock on extension of relation 16387 of database 16384 after 1121.274 ms
2018-03-23 20:25:27.303 MSK [42251] DETAIL: Process holding the lock: 41707. Wait queue: 41577, 41880, 41482, 41649, 42275, 42237, 42117, 41430, 42121, 42091, 41507, 42265, 41738, 42018, 41490, 42048, 42269, 42108, 41843, 42213, 42134, 42341, 42083, 41763, 41536, 41980, 41793, 41878, 42065, 42152, 42193, 42022, 42103, 42354, 41849, 42120, 41520, 41593, 42020, 42123, 42012, 42235, 42242, 41982, 41661, 41734, 42385, 41756, 41733, 41415, 41874, 42161, 41749, 41431, 42175, 42100, 42222, 41435, 41762, 42352, 41840, 41879, 42348, 41445, 42088, 42187, 41992, 42327, 42258, 41586, 42034, 41851, 41440, 42192, 41726, 41471, 42185, 41607, 41844, 42016, 41868, 41397, 41932, 42343, 41545, 41690, 41549, 41684, 42304, 42105, 41551, 41580, 41728, 42172, 41898, 41560, 41411, 41657, 41444, 42170, 41481, 42241, 42263, 41884, 42014, 41608, 42289, 42191, 42067, 42011, 41959, 41681, 41768, 42194, 42090, 41527, 41655, 41638, 41458, 41552, 41446, 41403, 41666, 42021, 41614, 41542, 41588, 41937, 42008, 42280, 42071, 41390, 41483...

 

At this moment TPS drops to zero and it is even not possible to login to Postgres.
Increasing deadlock timeout to one minute still doesn't help:

2018-04-02 17:16:59 MSK [644699] [6-1] db=bp,user=bp,app=[unknown],client=172.29.224.48 LOG: process 644699 acquired ExclusiveLock on extension of relation 23554 of database 16385 after 585248.495 ms
2018-04-02 17:16:59 MSK [629624]: [6-1] db=bp,user=bp,app=[unknown],client=172.29.224.64 LOG: process 629624 acquired ExclusiveLock on extension of relation 23554 of database 16385 after 585132.924 ms
2018-04-02 17:16:59 MSK [646297]: [6-1] db=bp,user=bp,app=[unknown],client=172.29.224.57 LOG: process 646297 acquired ExclusiveLock on extension of relation 23554 of database 16385 after 584982.438 ms
2018-04-02 17:16:59 MSK [629264]: [6-1] db=bp,user=bp,app=[unknown],client=172.29.224.66 LOG: process 629264 acquired ExclusiveLock on extension of relation 23554 of database 16385 after 584959.859 ms


And this is the backtrace of backend which blocks the systems:

#0  0x00007fde34a73576 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007fde34a73668 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x0000000000686b02 in PGSemaphoreLock (sema=0x7fcdc7748618) at pg_sema.c:310
#3  0x00000000006ec0d4 in LWLockAcquire (lock=0x7fcdc774ee80, mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1233
#4  0x00000000006e99c5 in CheckDeadLock () at proc.c:1671
#5  ProcSleep (locallock=locallock@entry=0x118a158, lockMethodTable=lockMethodTable@entry=0x9538a0 <default_lockmethod>) at proc.c:1261
#6  0x00000000006e49ef in WaitOnLock (locallock=locallock@entry=0x118a158, owner=owner@entry=0x12490f0) at lock.c:1702
#7  0x00000000006e5c0b in LockAcquireExtended (locktag=locktag@entry=0x7ffc5b5f7d10, lockmode=lockmode@entry=7, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000',
    reportMemoryError=reportMemoryError@entry=1 '\001') at lock.c:998
#8  0x00000000006e6221 in LockAcquire (locktag=locktag@entry=0x7ffc5b5f7d10, lockmode=lockmode@entry=7, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000') at lock.c:688
#9  0x00000000006e3d7c in LockRelationForExtension (relation=relation@entry=0x7fcdc73ecb68, lockmode=lockmode@entry=7) at lmgr.c:340
#10 0x00000000004b6ce4 in RelationGetBufferForTuple (relation=relation@entry=0x7fcdc73ecb68, len=72, otherBuffer=otherBuffer@entry=0, options=options@entry=0, bistate=bistate@entry=0x0,
    vmbuffer=vmbuffer@entry=0x7ffc5b5f7dc0, vmbuffer_other=0x0) at hio.c:526
#11 0x00000000004b04db in heap_insert (relation=relation@entry=0x7fcdc73ecb68, tup=tup@entry=0x12768f8, cid=<optimized out>, options=options@entry=0, bistate=bistate@entry=0x0) at heapam.c:2418
#12 0x00000000005fce33 in ExecInsert (canSetTag=1 '\001', estate=0x1263148, onconflict=<optimized out>, arbiterIndexes=0x0, planSlot=0x12638c8, slot=0x12638c8, mtstate=0x1263498)
    at nodeModifyTable.c:513
#13 ExecModifyTable (pstate=0x1263498) at nodeModifyTable.c:1781
#14 0x00000000005e0a5a in ExecProcNode (node=0x1263498) at ../../../src/include/executor/executor.h:250
#15 ExecutePlan (execute_once=<optimized out>, dest=0x1260e18, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_INSERT, use_parallel_mode=<optimized out>,
    planstate=0x1263498, estate=0x1263148) at execMain.c:1722
#16 standard_ExecutorRun (queryDesc=0x1291628, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363
#17 0x00000000006fc972 in ProcessQuery (plan=<optimized out>, sourceText=0x1236598 "insert into test select i, md5(i::text) from generate_series(1,1000) AS i;", params=0x0, queryEnv=0x0,
    dest=0x1260e18, completionTag=0x7ffc5b5f81d0 "") at pquery.c:161
#18 0x00000000006fcbb0 in PortalRunMulti (portal=portal@entry=0x1183888, isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x1260e18,
    altdest=altdest@entry=0x1260e18, completionTag=completionTag@entry=0x7ffc5b5f81d0 "") at pquery.c:1286
#19 0x00000000006fd651 in PortalRun (portal=portal@entry=0x1183888, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=run_once@entry=1 '\001',
    dest=dest@entry=0x1260e18, altdest=altdest@entry=0x1260e18, completionTag=0x7ffc5b5f81d0 "") at pquery.c:799
#20 0x00000000006f974f in exec_simple_query (query_string=0x1236598 "insert into test select i, md5(i::text) from generate_series(1,1000) AS i;") at postgres.c:1099
#21 0x00000000006fb594 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x11920e8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4088
#22 0x0000000000479887 in BackendRun (port=0x11862b0) at postmaster.c:4405
#23 BackendStartup (port=0x11862b0) at postmaster.c:4077
#24 ServerLoop () at postmaster.c:1755
#25 0x0000000000694fec in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1165250) at postmaster.c:1363
#26 0x000000000047aca9 in main (argc=3, argv=0x1165250) at main.c:228




So, I see three possible ways to fix this problem:
1. Yury Sololov's patch with two phase deadlock check
2. Avoid concurrent deadlock detection
3. Avoid concurrent deadlock detection + let CheckDeadLock detect all
deadlocks, not only those in which current transaction is involved.
4) Don't do anything about deadlock detection, because this is just a
   symptom five steps removed from the cause.

We've to pay attention to fixing actual problems, rather than purely
benchmark ones. Complexity has a significant price.

Sorry, I once again has to repeat that it was real customer's problem and the bgbench scenario described above is just result of our attempts to reproduce and investigate the problem.
Moreover - this is not the first we have faced with this problem with postgres stalled because of cascade of deadlock detection timeouts expiration.

I do not think that deadlock detection is just "a symptom".
It has quite simple explanation: deadlock detection tries to exclusively obtain all partition locks.
So if under high load deadlock timeout is expired and deadlock detection is launched, then it cause avalanche of lock conflicts which cause expiration of deadlock timeouts in other backends,
which also try to perform deadlock detection and compete for this locks... And finally system is stalled.

The proposed patch fixes the problem: there are still some periodic fall downs of performance but it never drops to zero and it always possible to login to the system.
So I can not agree with you that deadlock detection is "just a symptom five steps removed from the cause".

Certainly, it is possible to find some other solution of the problem, for example do not use heavy weight locks for relation extension.
There is huge discussion about it and patch at commitfest with no recent activity and obscure perspectives:

https://www.postgresql.org/message-id/flat/CA%2BTgmobuP%2BQoJgaj2HWz5VoDgVA0KP7UGtTREY%2BPeghbPtWZLg%40mail.gmail.com#CA+TgmobuP+QoJgaj2HWz5VoDgVA0KP7UGtTREY+PeghbPtWZLg@...

But once again please notice that the same problem takes place not only with relation extension locks.
Please read description of the problem in the Yury Sokolov's patch:

https://commitfest.postgresql.org/18/1602/

In this case there are only updates and tuple locks but still the source of the problem is deadlock detection and simplest workaround is just to increase deadlock timeout.

If you want, I can provide you access at our system where all necessary infrastructure is already install and you can easily reproduce the problem and investigate it yourself.
But if something looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck;)
It seems to be obvious the obtaining all partition locks in CheckDeadLock:

    /*
     * Acquire exclusive lock on the entire shared lock data structures. Must
     * grab LWLocks in partition-number order to avoid LWLock deadlock.
     *
     * Note that the deadlock check interrupt had better not be enabled
     * anywhere that this process itself holds lock partition locks, else this
     * will wait forever.  Also note that LWLockAcquire creates a critical
     * section, so that this routine cannot be interrupted by cancel/die
     * interrupts.
     */
    for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
        LWLockAcquire(LockHashPartitionLockByIndex(i), LW_EXCLUSIVE);

can cause huge contention with other backends in which in turn can cause expiration of deadlock timeout in them and escalation of the problem.

From the other side, the fact that any of this

a) increasing deadlock timeout
b) avoid concurrent execution of deadlock check
c) perform first stage of deadlock check under shared lock

have very noticeable effect on the observed behavior and some of them allow to completely eliminate the problem,
makes me think that deadlock detection is not a symptom but the main source of the problem.

And the last note. I know that 1000 connections is too large for Postgres. But:
1. In customer's case the problem is reproduced even with 100 connections.
2. At modern SMPs systems with hundreds of cores running hundreds of backends is not something exotic/stupid.

And here we are faced with the problem which Postgres has never has before:
1. Too large snapshot and long time needed for its calculation
2. High contention for procarray locks
3. Very inefficient memory usage because of local caches and prepared statements.
...

At such systems we are faced with many new bottlenecks and we really need to address them.
I hope that nobody think that Postgres should normally work only at notebooks and quad core desktops?
-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

konstantin knizhnik
Updated patch is attached.

On 05.04.2018 11:03, Konstantin Knizhnik wrote:
Hi,
Thank for your feedback.

On 04.04.2018 21:15, Andres Freund wrote:
Hi,

On 2018-04-04 11:54:14 +0300, Konstantin Knizhnik wrote:
Several times we and our customers are suffered from the problem that
Postgres got stuck in deadlock detection.
One scenario is YCSB workload with Zipf's distribution when many clients are
trying to update the same record and compete for it's lock.
Another scenario is large number of clients performing inserts in the same
table. In this case the source of the problem is relation extension lock.
In both cases number of connection is large enough: several hundreds.
Have you ever observed that in the field? This sounds more artificial
than real to me.

Yes, it was real customer's problem. And it  is not so difficult to reproduce it: you just need to have system with larger number of cores (in our case 72 physical cores)
and spawn 1000 clients which performs normal inserts to the same table:


cat test.sql
begin;
insert into test select i, md5(i::text) from generate_series(1,1000) AS i;
end;

pgbench -U postgres test -c 1000 -j 36 -T 3600 -P 1 -f test.sql


With deadlock detection timeout equal to 100msec it happens immediately.
With default 1 second deadlock timeout it will take about 12 minutes after which the following messages can be found in the log:

2018-03-23 20:25:27.287 MSK [41469] STATEMENT: insert into test select i, md5(i::text) from generate_series(1,1000) AS i;
2018-03-23 20:25:27.287 MSK [41707] LOG: process 41707 acquired ExclusiveLock on extension of relation 16387 of database 16384 after 61294.863 ms
2018-03-23 20:25:27.287 MSK [41707] STATEMENT: insert into test select i, md5(i::text) from generate_series(1,1000) AS i;
2018-03-23 20:25:27.303 MSK [42251] LOG: process 42251 still waiting for ExclusiveLock on extension of relation 16387 of database 16384 after 1121.274 ms
2018-03-23 20:25:27.303 MSK [42251] DETAIL: Process holding the lock: 41707. Wait queue: 41577, 41880, 41482, 41649, 42275, 42237, 42117, 41430, 42121, 42091, 41507, 42265, 41738, 42018, 41490, 42048, 42269, 42108, 41843, 42213, 42134, 42341, 42083, 41763, 41536, 41980, 41793, 41878, 42065, 42152, 42193, 42022, 42103, 42354, 41849, 42120, 41520, 41593, 42020, 42123, 42012, 42235, 42242, 41982, 41661, 41734, 42385, 41756, 41733, 41415, 41874, 42161, 41749, 41431, 42175, 42100, 42222, 41435, 41762, 42352, 41840, 41879, 42348, 41445, 42088, 42187, 41992, 42327, 42258, 41586, 42034, 41851, 41440, 42192, 41726, 41471, 42185, 41607, 41844, 42016, 41868, 41397, 41932, 42343, 41545, 41690, 41549, 41684, 42304, 42105, 41551, 41580, 41728, 42172, 41898, 41560, 41411, 41657, 41444, 42170, 41481, 42241, 42263, 41884, 42014, 41608, 42289, 42191, 42067, 42011, 41959, 41681, 41768, 42194, 42090, 41527, 41655, 41638, 41458, 41552, 41446, 41403, 41666, 42021, 41614, 41542, 41588, 41937, 42008, 42280, 42071, 41390, 41483...

 

At this moment TPS drops to zero and it is even not possible to login to Postgres.
Increasing deadlock timeout to one minute still doesn't help:

2018-04-02 17:16:59 MSK [644699] [6-1] db=bp,user=bp,app=[unknown],client=172.29.224.48 LOG: process 644699 acquired ExclusiveLock on extension of relation 23554 of database 16385 after 585248.495 ms
2018-04-02 17:16:59 MSK [629624]: [6-1] db=bp,user=bp,app=[unknown],client=172.29.224.64 LOG: process 629624 acquired ExclusiveLock on extension of relation 23554 of database 16385 after 585132.924 ms
2018-04-02 17:16:59 MSK [646297]: [6-1] db=bp,user=bp,app=[unknown],client=172.29.224.57 LOG: process 646297 acquired ExclusiveLock on extension of relation 23554 of database 16385 after 584982.438 ms
2018-04-02 17:16:59 MSK [629264]: [6-1] db=bp,user=bp,app=[unknown],client=172.29.224.66 LOG: process 629264 acquired ExclusiveLock on extension of relation 23554 of database 16385 after 584959.859 ms


And this is the backtrace of backend which blocks the systems:

#0  0x00007fde34a73576 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1  0x00007fde34a73668 in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x0000000000686b02 in PGSemaphoreLock (sema=0x7fcdc7748618) at pg_sema.c:310
#3  0x00000000006ec0d4 in LWLockAcquire (lock=0x7fcdc774ee80, mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1233
#4  0x00000000006e99c5 in CheckDeadLock () at proc.c:1671
#5  ProcSleep (locallock=locallock@entry=0x118a158, lockMethodTable=lockMethodTable@entry=0x9538a0 <default_lockmethod>) at proc.c:1261
#6  0x00000000006e49ef in WaitOnLock (locallock=locallock@entry=0x118a158, owner=owner@entry=0x12490f0) at lock.c:1702
#7  0x00000000006e5c0b in LockAcquireExtended (locktag=locktag@entry=0x7ffc5b5f7d10, lockmode=lockmode@entry=7, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000',
    reportMemoryError=reportMemoryError@entry=1 '\001') at lock.c:998
#8  0x00000000006e6221 in LockAcquire (locktag=locktag@entry=0x7ffc5b5f7d10, lockmode=lockmode@entry=7, sessionLock=sessionLock@entry=0 '\000', dontWait=dontWait@entry=0 '\000') at lock.c:688
#9  0x00000000006e3d7c in LockRelationForExtension (relation=relation@entry=0x7fcdc73ecb68, lockmode=lockmode@entry=7) at lmgr.c:340
#10 0x00000000004b6ce4 in RelationGetBufferForTuple (relation=relation@entry=0x7fcdc73ecb68, len=72, otherBuffer=otherBuffer@entry=0, options=options@entry=0, bistate=bistate@entry=0x0,
    vmbuffer=vmbuffer@entry=0x7ffc5b5f7dc0, vmbuffer_other=0x0) at hio.c:526
#11 0x00000000004b04db in heap_insert (relation=relation@entry=0x7fcdc73ecb68, tup=tup@entry=0x12768f8, cid=<optimized out>, options=options@entry=0, bistate=bistate@entry=0x0) at heapam.c:2418
#12 0x00000000005fce33 in ExecInsert (canSetTag=1 '\001', estate=0x1263148, onconflict=<optimized out>, arbiterIndexes=0x0, planSlot=0x12638c8, slot=0x12638c8, mtstate=0x1263498)
    at nodeModifyTable.c:513
#13 ExecModifyTable (pstate=0x1263498) at nodeModifyTable.c:1781
#14 0x00000000005e0a5a in ExecProcNode (node=0x1263498) at ../../../src/include/executor/executor.h:250
#15 ExecutePlan (execute_once=<optimized out>, dest=0x1260e18, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_INSERT, use_parallel_mode=<optimized out>,
    planstate=0x1263498, estate=0x1263148) at execMain.c:1722
#16 standard_ExecutorRun (queryDesc=0x1291628, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363
#17 0x00000000006fc972 in ProcessQuery (plan=<optimized out>, sourceText=0x1236598 "insert into test select i, md5(i::text) from generate_series(1,1000) AS i;", params=0x0, queryEnv=0x0,
    dest=0x1260e18, completionTag=0x7ffc5b5f81d0 "") at pquery.c:161
#18 0x00000000006fcbb0 in PortalRunMulti (portal=portal@entry=0x1183888, isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x1260e18,
    altdest=altdest@entry=0x1260e18, completionTag=completionTag@entry=0x7ffc5b5f81d0 "") at pquery.c:1286
#19 0x00000000006fd651 in PortalRun (portal=portal@entry=0x1183888, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=run_once@entry=1 '\001',
    dest=dest@entry=0x1260e18, altdest=altdest@entry=0x1260e18, completionTag=0x7ffc5b5f81d0 "") at pquery.c:799
#20 0x00000000006f974f in exec_simple_query (query_string=0x1236598 "insert into test select i, md5(i::text) from generate_series(1,1000) AS i;") at postgres.c:1099
#21 0x00000000006fb594 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x11920e8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4088
#22 0x0000000000479887 in BackendRun (port=0x11862b0) at postmaster.c:4405
#23 BackendStartup (port=0x11862b0) at postmaster.c:4077
#24 ServerLoop () at postmaster.c:1755
#25 0x0000000000694fec in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1165250) at postmaster.c:1363
#26 0x000000000047aca9 in main (argc=3, argv=0x1165250) at main.c:228



So, I see three possible ways to fix this problem:
1. Yury Sololov's patch with two phase deadlock check
2. Avoid concurrent deadlock detection
3. Avoid concurrent deadlock detection + let CheckDeadLock detect all
deadlocks, not only those in which current transaction is involved.
4) Don't do anything about deadlock detection, because this is just a
   symptom five steps removed from the cause.

We've to pay attention to fixing actual problems, rather than purely
benchmark ones. Complexity has a significant price.

Sorry, I once again has to repeat that it was real customer's problem and the bgbench scenario described above is just result of our attempts to reproduce and investigate the problem.
Moreover - this is not the first we have faced with this problem with postgres stalled because of cascade of deadlock detection timeouts expiration.

I do not think that deadlock detection is just "a symptom".
It has quite simple explanation: deadlock detection tries to exclusively obtain all partition locks.
So if under high load deadlock timeout is expired and deadlock detection is launched, then it cause avalanche of lock conflicts which cause expiration of deadlock timeouts in other backends,
which also try to perform deadlock detection and compete for this locks... And finally system is stalled.

The proposed patch fixes the problem: there are still some periodic fall downs of performance but it never drops to zero and it always possible to login to the system.
So I can not agree with you that deadlock detection is "just a symptom five steps removed from the cause".

Certainly, it is possible to find some other solution of the problem, for example do not use heavy weight locks for relation extension.
There is huge discussion about it and patch at commitfest with no recent activity and obscure perspectives:

https://www.postgresql.org/message-id/flat/CA%2BTgmobuP%2BQoJgaj2HWz5VoDgVA0KP7UGtTREY%2BPeghbPtWZLg%40mail.gmail.com#CA+TgmobuP+QoJgaj2HWz5VoDgVA0KP7UGtTREY+PeghbPtWZLg@...

But once again please notice that the same problem takes place not only with relation extension locks.
Please read description of the problem in the Yury Sokolov's patch:

https://commitfest.postgresql.org/18/1602/

In this case there are only updates and tuple locks but still the source of the problem is deadlock detection and simplest workaround is just to increase deadlock timeout.

If you want, I can provide you access at our system where all necessary infrastructure is already install and you can easily reproduce the problem and investigate it yourself.
But if something looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck;)
It seems to be obvious the obtaining all partition locks in CheckDeadLock:

    /*
     * Acquire exclusive lock on the entire shared lock data structures. Must
     * grab LWLocks in partition-number order to avoid LWLock deadlock.
     *
     * Note that the deadlock check interrupt had better not be enabled
     * anywhere that this process itself holds lock partition locks, else this
     * will wait forever.  Also note that LWLockAcquire creates a critical
     * section, so that this routine cannot be interrupted by cancel/die
     * interrupts.
     */
    for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
        LWLockAcquire(LockHashPartitionLockByIndex(i), LW_EXCLUSIVE);

can cause huge contention with other backends in which in turn can cause expiration of deadlock timeout in them and escalation of the problem.

From the other side, the fact that any of this

a) increasing deadlock timeout
b) avoid concurrent execution of deadlock check
c) perform first stage of deadlock check under shared lock

have very noticeable effect on the observed behavior and some of them allow to completely eliminate the problem,
makes me think that deadlock detection is not a symptom but the main source of the problem.

And the last note. I know that 1000 connections is too large for Postgres. But:
1. In customer's case the problem is reproduced even with 100 connections.
2. At modern SMPs systems with hundreds of cores running hundreds of backends is not something exotic/stupid.

And here we are faced with the problem which Postgres has never has before:
1. Too large snapshot and long time needed for its calculation
2. High contention for procarray locks
3. Very inefficient memory usage because of local caches and prepared statements.
...

At such systems we are faced with many new bottlenecks and we really need to address them.
I hope that nobody think that Postgres should normally work only at notebooks and quad core desktops?
-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

deadlock-2.patch (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

Evgeniy Shishkin
In reply to this post by konstantin knizhnik


> On Apr 5, 2018, at 11:03, Konstantin Knizhnik <[hidden email]> wrote:
>
> From the other side, the fact that any of this
>
> a) increasing deadlock timeout
> b) avoid concurrent execution of deadlock check
> c) perform first stage of deadlock check under shared lock
>
> have very noticeable effect on the observed behavior and some of them allow to completely eliminate the problem,
> makes me think that deadlock detection is not a symptom but the main source of the problem.

It seems to me that instead of running deadlock detection one-by-one you can use exponential backoff.
It should provide "good" degradation during high contention and does not make things bad on low contention.

Maybe you should combine it with Yura's patch for additional benefit and benchmark it on different contentions.
Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

Andres Freund
In reply to this post by konstantin knizhnik
Hi,

On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:

> Updated patch is attached.
> + /*
> + * Ensure that only one backend is checking for deadlock.
> + * Otherwise under high load cascade of deadlock timeout expirations can cause stuck of Postgres.
> + */
> + if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
> + {
> + enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
> + return;
> + }
> + inside_deadlock_check = true;

I can't see that ever being accepted.  This means there's absolutely no
bound for deadlock checks happening even under light concurrency, even
if there's no contention for a large fraction of the time.

If you want to improve this, improve the efficiency of the
implementation, check multiple lockers at the same time (set a bit
afterwards that they don't recheck themselves). There's plenty
approaches that don't result in a significantly worse user experience.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

konstantin knizhnik


On 13.04.2018 18:41, Andres Freund wrote:

> Hi,
>
> On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
>> Updated patch is attached.
>> + /*
>> + * Ensure that only one backend is checking for deadlock.
>> + * Otherwise under high load cascade of deadlock timeout expirations can cause stuck of Postgres.
>> + */
>> + if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
>> + {
>> + enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
>> + return;
>> + }
>> + inside_deadlock_check = true;
> I can't see that ever being accepted.  This means there's absolutely no
> bound for deadlock checks happening even under light concurrency, even
> if there's no contention for a large fraction of the time.

It may cause problems only if
1. There is large number of active sessions
2. They perform deadlock-prone queries (so no attempts to avoid
deadlocks at application level)
3. Deadlock timeout is set to be very small (10 msec?)

Otherwise either probability that all backends  once and once again are
trying to check deadlocks concurrently is very small (and can be even
more reduced by using random timeout for subsequent deadlock checks),
either system can not normally function in any case because large number
of clients fall into deadlock.


>
> If you want to improve this, improve the efficiency of the
> implementation, check multiple lockers at the same time (set a bit
> afterwards that they don't recheck themselves). There's plenty
> approaches that don't result in a significantly worse user experience.

It seems to me that the best best solution will be to perform deadlock
by some dedicated process (deadlock checker) with specified frequency,
or do it in backends themselves but ensure that interval between
deadlock checks exceeds deadlock_timeout.
On both cases it is necessary to check for all deadlock loops, not only
deadlocks involving self transaction.
And definitely we should be able to abort "foreign" transaction, not
only transaction performed by this backend.
I do not know how difficult it will be to implement. I expect that it
will be not so easy, otherwise why deadlock detection was not
implemented in this way in Postgres from the very beginning...

I completely agree that there are plenty of different approaches, but
IMHO the currently used strategy is the worst one, because it can stall
system even if there are not deadlocks at all.
I always think that deadlock is a programmer's error rather than normal
situation. May be it is wrong assumption but I can not believe that some
system can normally work when deadlocks happen quite frequently. 
Consider classical example: transfer money from one account to another:

start transaction;
update accounts set balance=balance+? where aid=?;
update accounts set balance=balance-? where aid=?;
commit transactions;

With thousands accounts and random choice of account IDs concurrent
execution of this transaction by multiple clients (~100) cause deadlocks
in less than few percents of transactions.
My patch doesn't somehow affect performance and latency in this case.
Obviously that if number of accounts is changed to 10 and number of
clients to 1000, then deadlocks will happen permanently. But in this
case system will not be able to work normally either with my patch,
either without it.

So before implementing some complicated solution of the problem9too slow
deadlock detection), I think that first it is necessary to understand
whether there is such problem at al and under which workload it can happen.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

Andres Freund
Hi,

On 2018-04-13 19:13:07 +0300, Konstantin Knizhnik wrote:

> On 13.04.2018 18:41, Andres Freund wrote:
> > On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
> > > Updated patch is attached.
> > > + /*
> > > + * Ensure that only one backend is checking for deadlock.
> > > + * Otherwise under high load cascade of deadlock timeout expirations can cause stuck of Postgres.
> > > + */
> > > + if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
> > > + {
> > > + enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
> > > + return;
> > > + }
> > > + inside_deadlock_check = true;
> > I can't see that ever being accepted.  This means there's absolutely no
> > bound for deadlock checks happening even under light concurrency, even
> > if there's no contention for a large fraction of the time.
>
> It may cause problems only if
> 1. There is large number of active sessions
> 2. They perform deadlock-prone queries (so no attempts to avoid deadlocks at
> application level)
> 3. Deadlock timeout is set to be very small (10 msec?)

That's just not true.


> Otherwise either probability that all backends  once and once again are
> trying to check deadlocks concurrently is very small (and can be even more
> reduced by using random timeout for subsequent deadlock checks), either
> system can not normally function in any case because large number of clients
> fall into deadlock.

Operating systems batch wakeups.


> I completely agree that there are plenty of different approaches, but IMHO
> the currently used strategy is the worst one, because it can stall system
> even if there are not deadlocks at all.


> I always think that deadlock is a programmer's error rather than normal
> situation. May be it is wrong assumption

It is.


> So before implementing some complicated solution of the problem9too slow
> deadlock detection), I think that first it is necessary to understand
> whether there is such problem at al and under which workload it can happen.

Sure. I'm not saying that you shouldn't experiment with a patch like the
one you sent. What I am saying is that that can't be the actual solution
that will be integrated.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

Юрий Соколов
пт, 13 апр. 2018 г., 21:10 Andres Freund <[hidden email]>:
Hi,

On 2018-04-13 19:13:07 +0300, Konstantin Knizhnik wrote:
> On 13.04.2018 18:41, Andres Freund wrote:
> > On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
> > > Updated patch is attached.
> > > + /*
> > > +  * Ensure that only one backend is checking for deadlock.
> > > +  * Otherwise under high load cascade of deadlock timeout expirations can cause stuck of Postgres.
> > > +  */
> > > + if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
> > > + {
> > > +         enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
> > > +         return;
> > > + }
> > > + inside_deadlock_check = true;
> > I can't see that ever being accepted.  This means there's absolutely no
> > bound for deadlock checks happening even under light concurrency, even
> > if there's no contention for a large fraction of the time.
>
> It may cause problems only if
> 1. There is large number of active sessions
> 2. They perform deadlock-prone queries (so no attempts to avoid deadlocks at
> application level)
> 3. Deadlock timeout is set to be very small (10 msec?)

That's just not true.


> Otherwise either probability that all backends  once and once again are
> trying to check deadlocks concurrently is very small (and can be even more
> reduced by using random timeout for subsequent deadlock checks), either
> system can not normally function in any case because large number of clients
> fall into deadlock.

Operating systems batch wakeups.


> I completely agree that there are plenty of different approaches, but IMHO
> the currently used strategy is the worst one, because it can stall system
> even if there are not deadlocks at all.


> I always think that deadlock is a programmer's error rather than normal
> situation. May be it is wrong assumption

It is.


> So before implementing some complicated solution of the problem9too slow
> deadlock detection), I think that first it is necessary to understand
> whether there is such problem at al and under which workload it can happen.

Sure. I'm not saying that you shouldn't experiment with a patch like the
one you sent. What I am saying is that that can't be the actual solution
that will be integrated.

What about my version? 
It still performs deadlock detection every time, but it tries to detect it with shared lock first,
and only if there is probability of real deadlock, it rechecks with exclusive lock. 

Although even shared lock leads to some stalleness for active transactions, but in the catastrophic situation, where many backends to check for inexisting deadlock at the same time, it greately reduce pause. 

Regards, 
Yura. 
Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

konstantin knizhnik


On 14.04.2018 10:09, Юрий Соколов wrote:
пт, 13 апр. 2018 г., 21:10 Andres Freund <[hidden email]>:
Hi,

On 2018-04-13 19:13:07 +0300, Konstantin Knizhnik wrote:
> On 13.04.2018 18:41, Andres Freund wrote:
> > On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
> > > Updated patch is attached.
> > > + /*
> > > +  * Ensure that only one backend is checking for deadlock.
> > > +  * Otherwise under high load cascade of deadlock timeout expirations can cause stuck of Postgres.
> > > +  */
> > > + if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
> > > + {
> > > +         enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
> > > +         return;
> > > + }
> > > + inside_deadlock_check = true;
> > I can't see that ever being accepted.  This means there's absolutely no
> > bound for deadlock checks happening even under light concurrency, even
> > if there's no contention for a large fraction of the time.
>
> It may cause problems only if
> 1. There is large number of active sessions
> 2. They perform deadlock-prone queries (so no attempts to avoid deadlocks at
> application level)
> 3. Deadlock timeout is set to be very small (10 msec?)

That's just not true.


> Otherwise either probability that all backends  once and once again are
> trying to check deadlocks concurrently is very small (and can be even more
> reduced by using random timeout for subsequent deadlock checks), either
> system can not normally function in any case because large number of clients
> fall into deadlock.

Operating systems batch wakeups.


> I completely agree that there are plenty of different approaches, but IMHO
> the currently used strategy is the worst one, because it can stall system
> even if there are not deadlocks at all.


> I always think that deadlock is a programmer's error rather than normal
> situation. May be it is wrong assumption

It is.


> So before implementing some complicated solution of the problem9too slow
> deadlock detection), I think that first it is necessary to understand
> whether there is such problem at al and under which workload it can happen.

Sure. I'm not saying that you shouldn't experiment with a patch like the
one you sent. What I am saying is that that can't be the actual solution
that will be integrated.

What about my version? 
It still performs deadlock detection every time, but it tries to detect it with shared lock first,
and only if there is probability of real deadlock, it rechecks with exclusive lock. 

Although even shared lock leads to some stalleness for active transactions, but in the catastrophic situation, where many backends to check for inexisting deadlock at the same time, it greately reduce pause.

Unfortunately your patch doesn't help with inserts.
In most cases Postgres obtains exclusive partition locks: any lock/unlock operation requires exclusive lock.
Shared locks of all partitions are used for collecting some information or to perform recovery.

Below results of three different versions:

1. Vanilla Postgres:

[kk@hp06 knizhnik]$ pgbench -n -c 1000 -j 36 -T 100 -P 1 -f test.sql postgres
progress: 1.0 s, 227.9 tps, lat 109.560 ms stddev 189.328
progress: 2.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 3.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 4.0 s, 53.0 tps, lat 1145.417 ms stddev 1607.484
progress: 5.0 s, 19.0 tps, lat 236.807 ms stddev 819.307
progress: 6.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 7.0 s, 2.0 tps, lat 6649.944 ms stddev 9.691
progress: 8.0 s, 60.0 tps, lat 2343.313 ms stddev 3335.967
progress: 9.0 s, 89.0 tps, lat 1813.495 ms stddev 3337.904
progress: 10.0 s, 99.1 tps, lat 2093.653 ms stddev 3758.468
progress: 11.0 s, 94.9 tps, lat 2424.560 ms stddev 4258.622
progress: 12.0 s, 79.0 tps, lat 2037.880 ms stddev 4152.258
progress: 13.0 s, 30.0 tps, lat 80.697 ms stddev 24.854
progress: 14.0 s, 2.0 tps, lat 71.642 ms stddev 0.022
progress: 15.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 16.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 17.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 18.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 19.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 20.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 21.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 22.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 23.0 s, 3.0 tps, lat 22396.463 ms stddev 1.964
progress: 24.0 s, 1.0 tps, lat 23667.927 ms stddev 0.001
progress: 25.0 s, 25.0 tps, lat 8862.603 ms stddev 11545.517
progress: 26.0 s, 27.0 tps, lat 6738.338 ms stddev 10984.215
progress: 27.0 s, 40.0 tps, lat 6656.893 ms stddev 11236.042
progress: 28.0 s, 49.0 tps, lat 10124.108 ms stddev 13042.391
progress: 29.0 s, 20.0 tps, lat 8638.022 ms stddev 12916.653
progress: 30.0 s, 45.0 tps, lat 9805.039 ms stddev 13624.253
progress: 31.0 s, 30.0 tps, lat 7151.608 ms stddev 12628.340
progress: 32.0 s, 48.0 tps, lat 9838.776 ms stddev 14316.187
progress: 33.0 s, 44.0 tps, lat 7479.878 ms stddev 13394.234
progress: 34.0 s, 55.0 tps, lat 8530.081 ms stddev 14327.334
progress: 35.0 s, 57.0 tps, lat 12054.371 ms stddev 15630.142
progress: 36.0 s, 17.0 tps, lat 8620.003 ms stddev 14896.671
progress: 37.0 s, 39.0 tps, lat 8488.141 ms stddev 15129.849
progress: 38.0 s, 47.0 tps, lat 12140.015 ms stddev 16098.595
progress: 39.0 s, 63.9 tps, lat 9525.523 ms stddev 15603.099
progress: 40.0 s, 64.0 tps, lat 11248.021 ms stddev 16118.116
progress: 41.0 s, 30.0 tps, lat 12375.117 ms stddev 17397.134
progress: 42.0 s, 30.0 tps, lat 11092.475 ms stddev 17997.073
progress: 43.0 s, 51.0 tps, lat 11488.794 ms stddev 18135.660
progress: 44.0 s, 35.0 tps, lat 11243.333 ms stddev 18802.389
progress: 45.0 s, 56.1 tps, lat 9892.959 ms stddev 18010.395
progress: 46.0 s, 29.0 tps, lat 17224.308 ms stddev 20820.138
progress: 47.0 s, 52.0 tps, lat 13475.267 ms stddev 20866.198
progress: 48.0 s, 52.0 tps, lat 8386.839 ms stddev 17733.223
progress: 49.0 s, 65.0 tps, lat 9060.374 ms stddev 18574.099
progress: 50.0 s, 29.0 tps, lat 10390.055 ms stddev 19808.192
progress: 51.0 s, 54.1 tps, lat 15433.057 ms stddev 21218.736
progress: 52.0 s, 39.0 tps, lat 18607.422 ms stddev 23413.561
progress: 53.0 s, 62.0 tps, lat 9898.151 ms stddev 18971.511
progress: 54.0 s, 34.0 tps, lat 9172.970 ms stddev 18655.790
progress: 55.0 s, 35.0 tps, lat 16339.829 ms stddev 24009.687
progress: 56.0 s, 49.0 tps, lat 11474.089 ms stddev 22152.509
progress: 57.0 s, 15.0 tps, lat 23273.790 ms stddev 23972.075
progress: 58.0 s, 45.0 tps, lat 17884.887 ms stddev 24246.962
progress: 59.0 s, 9.0 tps, lat 26061.653 ms stddev 28924.382
progress: 60.0 s, 34.0 tps, lat 17555.891 ms stddev 26873.251
progress: 61.0 s, 15.0 tps, lat 16637.255 ms stddev 22135.661
progress: 62.0 s, 20.0 tps, lat 21534.107 ms stddev 29106.973
progress: 63.0 s, 32.0 tps, lat 9828.245 ms stddev 22411.644
progress: 64.0 s, 61.1 tps, lat 17278.688 ms stddev 26523.489
progress: 65.0 s, 22.0 tps, lat 17707.084 ms stddev 28519.434
progress: 66.0 s, 29.0 tps, lat 14907.572 ms stddev 25857.086
progress: 67.0 s, 20.0 tps, lat 19984.031 ms stddev 30240.857
progress: 68.0 s, 37.0 tps, lat 13499.886 ms stddev 26002.883
progress: 69.0 s, 20.0 tps, lat 24032.741 ms stddev 32445.249
progress: 70.0 s, 38.0 tps, lat 23151.399 ms stddev 31408.612
progress: 71.0 s, 25.0 tps, lat 23165.598 ms stddev 32441.789
progress: 72.0 s, 40.0 tps, lat 10862.140 ms stddev 25275.667
progress: 73.0 s, 30.0 tps, lat 14560.809 ms stddev 28686.763
progress: 74.0 s, 14.0 tps, lat 26269.971 ms stddev 34970.753
progress: 75.0 s, 41.0 tps, lat 17504.487 ms stddev 30219.588
progress: 76.0 s, 49.0 tps, lat 15546.103 ms stddev 30200.665
progress: 77.0 s, 26.0 tps, lat 14385.324 ms stddev 26674.468
progress: 78.0 s, 42.0 tps, lat 22232.188 ms stddev 34742.547
progress: 79.0 s, 15.0 tps, lat 22361.853 ms stddev 33895.018
progress: 80.0 s, 45.0 tps, lat 17768.734 ms stddev 32813.238
progress: 81.0 s, 36.0 tps, lat 15820.789 ms stddev 31662.068
progress: 82.0 s, 31.0 tps, lat 16553.029 ms stddev 31800.897
progress: 83.0 s, 39.0 tps, lat 10799.064 ms stddev 27374.919
progress: 84.0 s, 28.0 tps, lat 27518.499 ms stddev 38362.262
progress: 85.0 s, 27.0 tps, lat 15802.221 ms stddev 32525.404
progress: 86.0 s, 40.0 tps, lat 20152.181 ms stddev 35171.042
progress: 87.0 s, 32.0 tps, lat 20598.004 ms stddev 34830.062
progress: 88.0 s, 30.0 tps, lat 14803.086 ms stddev 32350.232
progress: 89.0 s, 28.0 tps, lat 19647.986 ms stddev 35859.938
progress: 90.0 s, 40.0 tps, lat 18435.264 ms stddev 35364.559
progress: 91.0 s, 6.0 tps, lat 45248.291 ms stddev 45044.607
progress: 92.0 s, 32.0 tps, lat 29046.606 ms stddev 41959.102
progress: 93.0 s, 18.0 tps, lat 20746.328 ms stddev 38230.543
progress: 94.0 s, 49.0 tps, lat 21428.576 ms stddev 38599.069
progress: 95.0 s, 27.0 tps, lat 14428.229 ms stddev 33301.365
progress: 96.0 s, 38.0 tps, lat 18041.058 ms stddev 36628.789
progress: 97.0 s, 50.0 tps, lat 17589.414 ms stddev 36748.200
progress: 98.0 s, 47.0 tps, lat 16817.047 ms stddev 36381.922
progress: 99.0 s, 31.0 tps, lat 16137.795 ms stddev 35997.819
progress: 100.0 s, 32.0 tps, lat 21987.270 ms stddev 40859.388
progress: 101.0 s, 12.0 tps, lat 66793.331 ms stddev 47041.846
progress: 102.0 s, 161.0 tps, lat 84082.482 ms stddev 22621.806
progress: 103.0 s, 50.0 tps, lat 38104.113 ms stddev 21406.175
progress: 104.0 s, 40.0 tps, lat 37981.012 ms stddev 27412.163
progress: 105.0 s, 32.0 tps, lat 32448.045 ms stddev 27158.082
progress: 106.0 s, 39.0 tps, lat 25246.829 ms stddev 23138.786
progress: 107.0 s, 29.0 tps, lat 22691.016 ms stddev 23348.412
progress: 108.0 s, 32.0 tps, lat 20934.772 ms stddev 27771.861
progress: 109.0 s, 26.0 tps, lat 21292.394 ms stddev 31416.424
progress: 110.0 s, 24.0 tps, lat 99099.590 ms stddev 27154.536
transaction type: test.sql
scaling factor: 1
query mode: simple
number of clients: 1000
number of threads: 36
duration: 100 s
number of transactions actually processed: 4518
latency average = 23748.869 ms
latency stddev = 34606.615 ms
tps = 40.811158 (including connections establishing)
tps = 40.815387 (excluding connections establishing)

2. Yury's shared locks patch

[kk@hp06 knizhnik]$ pgbench -n -c 1000 -j 36 -T 100 -P 1 -f test.sql postgres
progress: 1.0 s, 161.9 tps, lat 58.956 ms stddev 65.879
progress: 2.0 s, 75.0 tps, lat 432.110 ms stddev 655.010
progress: 3.0 s, 111.0 tps, lat 312.127 ms stddev 669.351
progress: 4.0 s, 62.0 tps, lat 716.576 ms stddev 1262.699
progress: 5.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 6.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 7.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 8.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 9.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 10.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 11.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 12.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 13.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 14.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 15.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 16.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 17.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 18.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 19.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 20.0 s, 33.0 tps, lat 5910.665 ms stddev 8733.552
progress: 21.0 s, 20.0 tps, lat 4190.000 ms stddev 8012.222
progress: 22.0 s, 44.0 tps, lat 4452.898 ms stddev 8469.555
progress: 23.0 s, 17.0 tps, lat 7951.775 ms stddev 10550.763
progress: 24.0 s, 39.0 tps, lat 4894.502 ms stddev 9270.031
progress: 25.0 s, 48.0 tps, lat 4658.159 ms stddev 9365.012
progress: 26.0 s, 19.0 tps, lat 6740.618 ms stddev 11017.741
progress: 27.0 s, 34.0 tps, lat 6265.141 ms stddev 11004.369
progress: 28.0 s, 33.0 tps, lat 10034.798 ms stddev 13039.985
progress: 29.0 s, 24.0 tps, lat 13023.072 ms stddev 13941.556
progress: 30.0 s, 47.0 tps, lat 6365.943 ms stddev 11922.151
progress: 31.0 s, 31.0 tps, lat 5964.649 ms stddev 11769.435
progress: 32.0 s, 18.0 tps, lat 12246.518 ms stddev 15149.607
progress: 33.0 s, 32.0 tps, lat 5164.978 ms stddev 11534.913
progress: 34.0 s, 11.0 tps, lat 18304.152 ms stddev 16536.782
progress: 35.0 s, 44.0 tps, lat 9330.479 ms stddev 14948.877
progress: 36.0 s, 24.0 tps, lat 11817.116 ms stddev 16467.731
progress: 37.0 s, 41.0 tps, lat 16237.412 ms stddev 17303.912
progress: 38.0 s, 19.0 tps, lat 13821.626 ms stddev 17843.619
progress: 39.0 s, 46.0 tps, lat 6945.577 ms stddev 13146.942
progress: 40.0 s, 46.0 tps, lat 6160.800 ms stddev 13194.172
progress: 41.0 s, 8.0 tps, lat 15238.093 ms stddev 19417.694
progress: 42.0 s, 39.0 tps, lat 9974.308 ms stddev 17210.077
progress: 43.0 s, 15.0 tps, lat 14262.093 ms stddev 19735.918
progress: 44.0 s, 32.0 tps, lat 14679.202 ms stddev 18600.461
progress: 45.0 s, 17.0 tps, lat 2808.233 ms stddev 10258.923
progress: 46.0 s, 8.0 tps, lat 22812.538 ms stddev 22501.104
progress: 47.0 s, 27.0 tps, lat 10755.093 ms stddev 17452.703
progress: 48.0 s, 28.0 tps, lat 12695.089 ms stddev 20150.626
progress: 49.0 s, 40.0 tps, lat 7417.723 ms stddev 17090.798
progress: 50.0 s, 26.0 tps, lat 8836.551 ms stddev 17367.063
progress: 51.0 s, 36.0 tps, lat 10102.477 ms stddev 18143.073
progress: 52.0 s, 18.0 tps, lat 21193.476 ms stddev 24130.141
progress: 53.0 s, 31.0 tps, lat 5231.374 ms stddev 15250.089
progress: 54.0 s, 16.0 tps, lat 17832.545 ms stddev 23926.820
progress: 55.0 s, 40.0 tps, lat 8267.716 ms stddev 19144.680
progress: 56.0 s, 24.0 tps, lat 9717.556 ms stddev 20320.800
progress: 57.0 s, 112.1 tps, lat 29832.052 ms stddev 28082.255
progress: 58.0 s, 80.9 tps, lat 14180.656 ms stddev 24662.298
progress: 59.0 s, 65.1 tps, lat 11692.933 ms stddev 23265.631
progress: 60.0 s, 71.0 tps, lat 14238.960 ms stddev 25257.791
progress: 61.0 s, 73.9 tps, lat 13853.016 ms stddev 25232.753
progress: 62.0 s, 85.0 tps, lat 12292.783 ms stddev 24429.711
progress: 63.0 s, 47.0 tps, lat 14646.395 ms stddev 26355.656
progress: 64.0 s, 60.1 tps, lat 13740.789 ms stddev 25977.830
progress: 65.0 s, 69.0 tps, lat 14942.281 ms stddev 27052.182
progress: 66.0 s, 58.0 tps, lat 14530.473 ms stddev 26876.548
progress: 67.0 s, 56.0 tps, lat 12788.740 ms stddev 25702.123
progress: 68.0 s, 61.0 tps, lat 18125.661 ms stddev 29011.571
progress: 69.0 s, 50.0 tps, lat 7588.453 ms stddev 18761.996
progress: 70.0 s, 50.0 tps, lat 12869.978 ms stddev 21733.866
progress: 71.0 s, 51.0 tps, lat 7888.145 ms stddev 18308.426
progress: 72.0 s, 24.0 tps, lat 20570.400 ms stddev 24677.345
progress: 73.0 s, 35.0 tps, lat 12614.746 ms stddev 21674.382
progress: 74.0 s, 39.0 tps, lat 8972.717 ms stddev 19373.101
progress: 75.0 s, 24.0 tps, lat 17855.659 ms stddev 26005.986
progress: 76.0 s, 48.0 tps, lat 17213.313 ms stddev 25059.842
progress: 77.0 s, 46.0 tps, lat 10431.720 ms stddev 21575.560
progress: 78.0 s, 44.0 tps, lat 18981.286 ms stddev 27742.090
progress: 79.0 s, 17.0 tps, lat 26332.168 ms stddev 33125.673
progress: 80.0 s, 33.0 tps, lat 13477.140 ms stddev 20257.123
progress: 81.0 s, 46.0 tps, lat 10610.854 ms stddev 18704.784
progress: 82.0 s, 52.0 tps, lat 10895.657 ms stddev 21817.542
progress: 83.0 s, 48.0 tps, lat 9902.718 ms stddev 19765.415
progress: 84.0 s, 35.0 tps, lat 11279.406 ms stddev 21385.872
progress: 85.0 s, 51.0 tps, lat 17545.936 ms stddev 29036.491
progress: 86.0 s, 39.0 tps, lat 26478.287 ms stddev 27160.865
progress: 87.0 s, 59.0 tps, lat 17854.885 ms stddev 23408.893
progress: 88.0 s, 50.0 tps, lat 9628.923 ms stddev 18956.878
progress: 89.0 s, 64.0 tps, lat 9504.706 ms stddev 17715.302
progress: 90.0 s, 58.0 tps, lat 13678.080 ms stddev 21006.403
progress: 91.0 s, 43.0 tps, lat 9901.618 ms stddev 17408.284
progress: 92.0 s, 32.0 tps, lat 13467.742 ms stddev 22203.125
progress: 93.0 s, 55.0 tps, lat 8606.426 ms stddev 16911.836
progress: 94.0 s, 32.0 tps, lat 8291.617 ms stddev 23614.208
progress: 95.0 s, 38.0 tps, lat 14395.072 ms stddev 27052.394
progress: 96.0 s, 25.0 tps, lat 33590.759 ms stddev 43220.795
progress: 97.0 s, 50.0 tps, lat 15520.320 ms stddev 35150.669
progress: 98.0 s, 16.0 tps, lat 35268.519 ms stddev 42629.424
progress: 99.0 s, 46.0 tps, lat 17419.726 ms stddev 34680.490
progress: 100.0 s, 32.0 tps, lat 30165.553 ms stddev 41068.048
progress: 101.0 s, 20.0 tps, lat 34004.361 ms stddev 44439.703
progress: 102.0 s, 9.0 tps, lat 94924.234 ms stddev 17750.440
progress: 103.0 s, 18.0 tps, lat 78665.820 ms stddev 34437.075
progress: 104.0 s, 7.0 tps, lat 95136.317 ms stddev 19960.028
progress: 105.0 s, 21.0 tps, lat 76378.853 ms stddev 39943.159
progress: 106.0 s, 10.0 tps, lat 99620.400 ms stddev 16800.308
progress: 107.0 s, 175.2 tps, lat 73492.433 ms stddev 39690.326
progress: 108.0 s, 75.0 tps, lat 25470.068 ms stddev 13140.013
progress: 109.0 s, 58.0 tps, lat 24925.175 ms stddev 21826.550
progress: 110.0 s, 41.0 tps, lat 26060.593 ms stddev 32335.285
progress: 111.0 s, 43.0 tps, lat 106188.963 ms stddev 14389.898
progress: 112.0 s, 35.0 tps, lat 104723.980 ms stddev 17978.512
progress: 113.0 s, 133.1 tps, lat 90808.576 ms stddev 32832.832
transaction type: test.sql
scaling factor: 1
query mode: simple
number of clients: 1000
number of threads: 36
duration: 100 s
number of transactions actually processed: 4656
latency average = 23572.699 ms
latency stddev = 35309.693 ms
tps = 41.151969 (including connections establishing)
tps = 41.156079 (excluding connections establishing)


3. My patch:

[kk@hp06 knizhnik]$ pgbench -n -c 1000 -j 36 -T 100 -P 1 -f test.sql postgres
progress: 1.0 s, 109.0 tps, lat 186.699 ms stddev 155.513
progress: 2.0 s, 185.0 tps, lat 858.366 ms stddev 435.616
progress: 3.0 s, 155.0 tps, lat 1454.427 ms stddev 955.026
progress: 4.0 s, 323.2 tps, lat 1844.037 ms stddev 1573.388
progress: 5.0 s, 182.0 tps, lat 2038.689 ms stddev 1959.624
progress: 6.0 s, 139.0 tps, lat 2063.075 ms stddev 2421.067
progress: 7.0 s, 198.0 tps, lat 4243.453 ms stddev 2724.742
progress: 8.0 s, 234.0 tps, lat 5122.093 ms stddev 2582.399
progress: 9.0 s, 175.0 tps, lat 3268.453 ms stddev 3077.928
progress: 10.0 s, 145.0 tps, lat 4339.716 ms stddev 3855.064
progress: 11.0 s, 418.0 tps, lat 3638.447 ms stddev 3717.018
progress: 12.0 s, 254.0 tps, lat 5739.302 ms stddev 4222.194
progress: 13.0 s, 264.0 tps, lat 3463.426 ms stddev 3312.223
progress: 14.0 s, 241.0 tps, lat 3737.628 ms stddev 4345.002
progress: 15.0 s, 231.0 tps, lat 3655.541 ms stddev 3885.624
progress: 16.0 s, 372.0 tps, lat 4965.353 ms stddev 4092.753
progress: 17.0 s, 203.0 tps, lat 4152.482 ms stddev 3115.167
progress: 18.0 s, 189.0 tps, lat 4069.375 ms stddev 4096.535
progress: 19.0 s, 267.0 tps, lat 4495.891 ms stddev 3897.380
progress: 20.0 s, 182.8 tps, lat 2946.705 ms stddev 3371.206
progress: 21.0 s, 144.1 tps, lat 4176.410 ms stddev 2994.048
progress: 22.0 s, 186.0 tps, lat 4095.609 ms stddev 3998.535
progress: 23.0 s, 283.0 tps, lat 4091.103 ms stddev 4337.856
progress: 24.0 s, 413.0 tps, lat 4913.568 ms stddev 3602.845
progress: 25.0 s, 153.0 tps, lat 4897.216 ms stddev 3412.526
progress: 26.0 s, 362.0 tps, lat 4322.345 ms stddev 4745.684
progress: 27.0 s, 229.0 tps, lat 2722.046 ms stddev 2778.627
progress: 28.0 s, 375.0 tps, lat 3074.234 ms stddev 3372.233
progress: 29.0 s, 227.9 tps, lat 4165.033 ms stddev 3554.064
progress: 30.0 s, 403.2 tps, lat 3777.061 ms stddev 3338.528
progress: 31.0 s, 312.0 tps, lat 2692.811 ms stddev 2786.706
progress: 32.0 s, 172.0 tps, lat 2666.640 ms stddev 2557.934
progress: 33.0 s, 327.0 tps, lat 3753.528 ms stddev 3003.476
progress: 34.0 s, 288.0 tps, lat 3700.307 ms stddev 4360.617
progress: 35.0 s, 138.9 tps, lat 2793.501 ms stddev 2973.115
progress: 36.0 s, 199.1 tps, lat 3914.999 ms stddev 3620.217
progress: 37.0 s, 332.0 tps, lat 4438.935 ms stddev 3367.059
progress: 38.0 s, 273.0 tps, lat 3976.717 ms stddev 3605.981
progress: 39.0 s, 350.0 tps, lat 2564.301 ms stddev 3098.142
progress: 40.0 s, 198.0 tps, lat 3821.723 ms stddev 3720.681
progress: 41.0 s, 379.0 tps, lat 3540.596 ms stddev 3716.245
progress: 42.0 s, 337.0 tps, lat 3359.137 ms stddev 3432.100
progress: 43.0 s, 273.0 tps, lat 4083.521 ms stddev 3446.354
progress: 44.0 s, 267.0 tps, lat 3604.277 ms stddev 3931.611
progress: 45.0 s, 221.0 tps, lat 3187.834 ms stddev 3472.713
progress: 46.0 s, 257.0 tps, lat 4420.584 ms stddev 3269.787
progress: 47.0 s, 216.0 tps, lat 4784.282 ms stddev 4078.919
progress: 48.0 s, 340.0 tps, lat 2878.317 ms stddev 3159.234
progress: 49.0 s, 393.0 tps, lat 3296.432 ms stddev 3120.998
progress: 50.0 s, 384.0 tps, lat 3333.966 ms stddev 3632.887
progress: 51.0 s, 179.9 tps, lat 3357.831 ms stddev 3577.181
progress: 52.0 s, 193.1 tps, lat 2740.384 ms stddev 2853.463
progress: 53.0 s, 266.0 tps, lat 3338.786 ms stddev 3294.210
progress: 54.0 s, 198.0 tps, lat 4350.746 ms stddev 3632.925
progress: 55.0 s, 443.0 tps, lat 4533.887 ms stddev 3284.145
progress: 56.0 s, 209.0 tps, lat 2471.820 ms stddev 2872.507
progress: 57.0 s, 243.0 tps, lat 3657.152 ms stddev 3082.953
progress: 58.0 s, 204.0 tps, lat 3758.366 ms stddev 3981.754
progress: 59.0 s, 255.0 tps, lat 3641.082 ms stddev 4062.395
progress: 60.0 s, 331.0 tps, lat 3752.880 ms stddev 3732.255
progress: 61.0 s, 225.0 tps, lat 3390.763 ms stddev 2763.489
progress: 62.0 s, 298.0 tps, lat 4201.300 ms stddev 3632.733
progress: 63.0 s, 106.0 tps, lat 2692.495 ms stddev 3976.275
progress: 64.0 s, 118.0 tps, lat 3870.886 ms stddev 4552.055
progress: 65.0 s, 111.0 tps, lat 1877.879 ms stddev 2530.083
progress: 66.0 s, 114.0 tps, lat 4240.246 ms stddev 4699.575
progress: 67.0 s, 214.0 tps, lat 4742.589 ms stddev 4249.912
progress: 68.0 s, 279.0 tps, lat 5303.721 ms stddev 5630.854
progress: 69.0 s, 181.0 tps, lat 5367.429 ms stddev 5191.225
progress: 70.0 s, 223.0 tps, lat 5220.545 ms stddev 4810.474
progress: 71.0 s, 181.0 tps, lat 5094.250 ms stddev 4598.197
progress: 72.0 s, 191.0 tps, lat 4775.279 ms stddev 4638.785
progress: 73.0 s, 121.0 tps, lat 6151.599 ms stddev 6044.989
progress: 74.0 s, 352.0 tps, lat 5722.979 ms stddev 5637.691
progress: 75.0 s, 313.0 tps, lat 3681.700 ms stddev 4511.268
progress: 76.0 s, 293.9 tps, lat 6134.980 ms stddev 5633.180
progress: 77.0 s, 292.1 tps, lat 4964.182 ms stddev 5115.833
progress: 78.0 s, 247.0 tps, lat 2958.631 ms stddev 3396.316
progress: 79.0 s, 308.0 tps, lat 4295.905 ms stddev 5392.345
progress: 80.0 s, 260.0 tps, lat 5192.128 ms stddev 4042.820
progress: 81.0 s, 244.0 tps, lat 2378.675 ms stddev 2619.194
progress: 82.0 s, 261.0 tps, lat 3619.289 ms stddev 3678.498
progress: 83.0 s, 193.0 tps, lat 4519.653 ms stddev 4820.241
progress: 84.0 s, 313.0 tps, lat 3838.839 ms stddev 4010.695
progress: 85.0 s, 265.0 tps, lat 3470.704 ms stddev 3009.119
progress: 86.0 s, 169.0 tps, lat 3113.572 ms stddev 3338.449
progress: 87.0 s, 181.0 tps, lat 2722.756 ms stddev 3552.120
progress: 88.0 s, 192.0 tps, lat 4767.495 ms stddev 4794.403
progress: 89.0 s, 276.0 tps, lat 4417.211 ms stddev 4012.534
progress: 90.0 s, 249.0 tps, lat 4600.363 ms stddev 4613.885
progress: 91.0 s, 136.0 tps, lat 5292.106 ms stddev 3254.580
progress: 92.0 s, 335.0 tps, lat 3606.878 ms stddev 4238.592
progress: 93.0 s, 183.0 tps, lat 4802.608 ms stddev 4424.506
progress: 94.0 s, 208.0 tps, lat 4257.261 ms stddev 3882.918
progress: 95.0 s, 154.0 tps, lat 5939.258 ms stddev 4506.066
progress: 96.0 s, 245.0 tps, lat 4968.774 ms stddev 3695.050
progress: 97.0 s, 133.9 tps, lat 3991.399 ms stddev 4884.086
progress: 98.0 s, 309.1 tps, lat 4340.668 ms stddev 4231.593
progress: 99.0 s, 88.0 tps, lat 5672.978 ms stddev 4776.332
progress: 100.0 s, 221.0 tps, lat 3617.880 ms stddev 4441.404
progress: 101.0 s, 486.0 tps, lat 7574.035 ms stddev 4135.087
transaction type: test.sql
scaling factor: 1
query mode: simple
number of clients: 1000
number of threads: 36
duration: 100 s
number of transactions actually processed: 25206
latency average = 3990.599 ms
latency stddev = 3982.713 ms
tps = 248.988411 (including connections establishing)
tps = 249.015469 (excluding connections establishing)

As you can see performance and latency are ~6 times worser.  Please also notice periods of zero TPS in first two cases.
-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

konstantin knizhnik


On 16.04.2018 14:11, Konstantin Knizhnik wrote:


On 14.04.2018 10:09, Юрий Соколов wrote:
пт, 13 апр. 2018 г., 21:10 Andres Freund <[hidden email]>:
Hi,

On 2018-04-13 19:13:07 +0300, Konstantin Knizhnik wrote:
> On 13.04.2018 18:41, Andres Freund wrote:
> > On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
> > > Updated patch is attached.
> > > + /*
> > > +  * Ensure that only one backend is checking for deadlock.
> > > +  * Otherwise under high load cascade of deadlock timeout expirations can cause stuck of Postgres.
> > > +  */
> > > + if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
> > > + {
> > > +         enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
> > > +         return;
> > > + }
> > > + inside_deadlock_check = true;
> > I can't see that ever being accepted.  This means there's absolutely no
> > bound for deadlock checks happening even under light concurrency, even
> > if there's no contention for a large fraction of the time.
>
> It may cause problems only if
> 1. There is large number of active sessions
> 2. They perform deadlock-prone queries (so no attempts to avoid deadlocks at
> application level)
> 3. Deadlock timeout is set to be very small (10 msec?)

That's just not true.


> Otherwise either probability that all backends  once and once again are
> trying to check deadlocks concurrently is very small (and can be even more
> reduced by using random timeout for subsequent deadlock checks), either
> system can not normally function in any case because large number of clients
> fall into deadlock.

Operating systems batch wakeups.


> I completely agree that there are plenty of different approaches, but IMHO
> the currently used strategy is the worst one, because it can stall system
> even if there are not deadlocks at all.


> I always think that deadlock is a programmer's error rather than normal
> situation. May be it is wrong assumption

It is.


> So before implementing some complicated solution of the problem9too slow
> deadlock detection), I think that first it is necessary to understand
> whether there is such problem at al and under which workload it can happen.

Sure. I'm not saying that you shouldn't experiment with a patch like the
one you sent. What I am saying is that that can't be the actual solution
that will be integrated.

What about my version? 
It still performs deadlock detection every time, but it tries to detect it with shared lock first,
and only if there is probability of real deadlock, it rechecks with exclusive lock. 

Although even shared lock leads to some stalleness for active transactions, but in the catastrophic situation, where many backends to check for inexisting deadlock at the same time, it greately reduce pause.

Unfortunately your patch doesn't help with inserts.
In most cases Postgres obtains exclusive partition locks: any lock/unlock operation requires exclusive lock.
Shared locks of all partitions are used for collecting some information or to perform recovery.

Below results of three different versions:

1. Vanilla Postgres:

[kk@hp06 knizhnik]$ pgbench -n -c 1000 -j 36 -T 100 -P 1 -f test.sql postgres
progress: 1.0 s, 227.9 tps, lat 109.560 ms stddev 189.328
progress: 2.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 3.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 4.0 s, 53.0 tps, lat 1145.417 ms stddev 1607.484
progress: 5.0 s, 19.0 tps, lat 236.807 ms stddev 819.307
progress: 6.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 7.0 s, 2.0 tps, lat 6649.944 ms stddev 9.691
progress: 8.0 s, 60.0 tps, lat 2343.313 ms stddev 3335.967
progress: 9.0 s, 89.0 tps, lat 1813.495 ms stddev 3337.904
progress: 10.0 s, 99.1 tps, lat 2093.653 ms stddev 3758.468
progress: 11.0 s, 94.9 tps, lat 2424.560 ms stddev 4258.622
progress: 12.0 s, 79.0 tps, lat 2037.880 ms stddev 4152.258
progress: 13.0 s, 30.0 tps, lat 80.697 ms stddev 24.854
progress: 14.0 s, 2.0 tps, lat 71.642 ms stddev 0.022
progress: 15.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 16.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 17.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 18.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 19.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 20.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 21.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 22.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 23.0 s, 3.0 tps, lat 22396.463 ms stddev 1.964
progress: 24.0 s, 1.0 tps, lat 23667.927 ms stddev 0.001
progress: 25.0 s, 25.0 tps, lat 8862.603 ms stddev 11545.517
progress: 26.0 s, 27.0 tps, lat 6738.338 ms stddev 10984.215
progress: 27.0 s, 40.0 tps, lat 6656.893 ms stddev 11236.042
progress: 28.0 s, 49.0 tps, lat 10124.108 ms stddev 13042.391
progress: 29.0 s, 20.0 tps, lat 8638.022 ms stddev 12916.653
progress: 30.0 s, 45.0 tps, lat 9805.039 ms stddev 13624.253
progress: 31.0 s, 30.0 tps, lat 7151.608 ms stddev 12628.340
progress: 32.0 s, 48.0 tps, lat 9838.776 ms stddev 14316.187
progress: 33.0 s, 44.0 tps, lat 7479.878 ms stddev 13394.234
progress: 34.0 s, 55.0 tps, lat 8530.081 ms stddev 14327.334
progress: 35.0 s, 57.0 tps, lat 12054.371 ms stddev 15630.142
progress: 36.0 s, 17.0 tps, lat 8620.003 ms stddev 14896.671
progress: 37.0 s, 39.0 tps, lat 8488.141 ms stddev 15129.849
progress: 38.0 s, 47.0 tps, lat 12140.015 ms stddev 16098.595
progress: 39.0 s, 63.9 tps, lat 9525.523 ms stddev 15603.099
progress: 40.0 s, 64.0 tps, lat 11248.021 ms stddev 16118.116
progress: 41.0 s, 30.0 tps, lat 12375.117 ms stddev 17397.134
progress: 42.0 s, 30.0 tps, lat 11092.475 ms stddev 17997.073
progress: 43.0 s, 51.0 tps, lat 11488.794 ms stddev 18135.660
progress: 44.0 s, 35.0 tps, lat 11243.333 ms stddev 18802.389
progress: 45.0 s, 56.1 tps, lat 9892.959 ms stddev 18010.395
progress: 46.0 s, 29.0 tps, lat 17224.308 ms stddev 20820.138
progress: 47.0 s, 52.0 tps, lat 13475.267 ms stddev 20866.198
progress: 48.0 s, 52.0 tps, lat 8386.839 ms stddev 17733.223
progress: 49.0 s, 65.0 tps, lat 9060.374 ms stddev 18574.099
progress: 50.0 s, 29.0 tps, lat 10390.055 ms stddev 19808.192
progress: 51.0 s, 54.1 tps, lat 15433.057 ms stddev 21218.736
progress: 52.0 s, 39.0 tps, lat 18607.422 ms stddev 23413.561
progress: 53.0 s, 62.0 tps, lat 9898.151 ms stddev 18971.511
progress: 54.0 s, 34.0 tps, lat 9172.970 ms stddev 18655.790
progress: 55.0 s, 35.0 tps, lat 16339.829 ms stddev 24009.687
progress: 56.0 s, 49.0 tps, lat 11474.089 ms stddev 22152.509
progress: 57.0 s, 15.0 tps, lat 23273.790 ms stddev 23972.075
progress: 58.0 s, 45.0 tps, lat 17884.887 ms stddev 24246.962
progress: 59.0 s, 9.0 tps, lat 26061.653 ms stddev 28924.382
progress: 60.0 s, 34.0 tps, lat 17555.891 ms stddev 26873.251
progress: 61.0 s, 15.0 tps, lat 16637.255 ms stddev 22135.661
progress: 62.0 s, 20.0 tps, lat 21534.107 ms stddev 29106.973
progress: 63.0 s, 32.0 tps, lat 9828.245 ms stddev 22411.644
progress: 64.0 s, 61.1 tps, lat 17278.688 ms stddev 26523.489
progress: 65.0 s, 22.0 tps, lat 17707.084 ms stddev 28519.434
progress: 66.0 s, 29.0 tps, lat 14907.572 ms stddev 25857.086
progress: 67.0 s, 20.0 tps, lat 19984.031 ms stddev 30240.857
progress: 68.0 s, 37.0 tps, lat 13499.886 ms stddev 26002.883
progress: 69.0 s, 20.0 tps, lat 24032.741 ms stddev 32445.249
progress: 70.0 s, 38.0 tps, lat 23151.399 ms stddev 31408.612
progress: 71.0 s, 25.0 tps, lat 23165.598 ms stddev 32441.789
progress: 72.0 s, 40.0 tps, lat 10862.140 ms stddev 25275.667
progress: 73.0 s, 30.0 tps, lat 14560.809 ms stddev 28686.763
progress: 74.0 s, 14.0 tps, lat 26269.971 ms stddev 34970.753
progress: 75.0 s, 41.0 tps, lat 17504.487 ms stddev 30219.588
progress: 76.0 s, 49.0 tps, lat 15546.103 ms stddev 30200.665
progress: 77.0 s, 26.0 tps, lat 14385.324 ms stddev 26674.468
progress: 78.0 s, 42.0 tps, lat 22232.188 ms stddev 34742.547
progress: 79.0 s, 15.0 tps, lat 22361.853 ms stddev 33895.018
progress: 80.0 s, 45.0 tps, lat 17768.734 ms stddev 32813.238
progress: 81.0 s, 36.0 tps, lat 15820.789 ms stddev 31662.068
progress: 82.0 s, 31.0 tps, lat 16553.029 ms stddev 31800.897
progress: 83.0 s, 39.0 tps, lat 10799.064 ms stddev 27374.919
progress: 84.0 s, 28.0 tps, lat 27518.499 ms stddev 38362.262
progress: 85.0 s, 27.0 tps, lat 15802.221 ms stddev 32525.404
progress: 86.0 s, 40.0 tps, lat 20152.181 ms stddev 35171.042
progress: 87.0 s, 32.0 tps, lat 20598.004 ms stddev 34830.062
progress: 88.0 s, 30.0 tps, lat 14803.086 ms stddev 32350.232
progress: 89.0 s, 28.0 tps, lat 19647.986 ms stddev 35859.938
progress: 90.0 s, 40.0 tps, lat 18435.264 ms stddev 35364.559
progress: 91.0 s, 6.0 tps, lat 45248.291 ms stddev 45044.607
progress: 92.0 s, 32.0 tps, lat 29046.606 ms stddev 41959.102
progress: 93.0 s, 18.0 tps, lat 20746.328 ms stddev 38230.543
progress: 94.0 s, 49.0 tps, lat 21428.576 ms stddev 38599.069
progress: 95.0 s, 27.0 tps, lat 14428.229 ms stddev 33301.365
progress: 96.0 s, 38.0 tps, lat 18041.058 ms stddev 36628.789
progress: 97.0 s, 50.0 tps, lat 17589.414 ms stddev 36748.200
progress: 98.0 s, 47.0 tps, lat 16817.047 ms stddev 36381.922
progress: 99.0 s, 31.0 tps, lat 16137.795 ms stddev 35997.819
progress: 100.0 s, 32.0 tps, lat 21987.270 ms stddev 40859.388
progress: 101.0 s, 12.0 tps, lat 66793.331 ms stddev 47041.846
progress: 102.0 s, 161.0 tps, lat 84082.482 ms stddev 22621.806
progress: 103.0 s, 50.0 tps, lat 38104.113 ms stddev 21406.175
progress: 104.0 s, 40.0 tps, lat 37981.012 ms stddev 27412.163
progress: 105.0 s, 32.0 tps, lat 32448.045 ms stddev 27158.082
progress: 106.0 s, 39.0 tps, lat 25246.829 ms stddev 23138.786
progress: 107.0 s, 29.0 tps, lat 22691.016 ms stddev 23348.412
progress: 108.0 s, 32.0 tps, lat 20934.772 ms stddev 27771.861
progress: 109.0 s, 26.0 tps, lat 21292.394 ms stddev 31416.424
progress: 110.0 s, 24.0 tps, lat 99099.590 ms stddev 27154.536
transaction type: test.sql
scaling factor: 1
query mode: simple
number of clients: 1000
number of threads: 36
duration: 100 s
number of transactions actually processed: 4518
latency average = 23748.869 ms
latency stddev = 34606.615 ms
tps = 40.811158 (including connections establishing)
tps = 40.815387 (excluding connections establishing)

2. Yury's shared locks patch

[kk@hp06 knizhnik]$ pgbench -n -c 1000 -j 36 -T 100 -P 1 -f test.sql postgres
progress: 1.0 s, 161.9 tps, lat 58.956 ms stddev 65.879
progress: 2.0 s, 75.0 tps, lat 432.110 ms stddev 655.010
progress: 3.0 s, 111.0 tps, lat 312.127 ms stddev 669.351
progress: 4.0 s, 62.0 tps, lat 716.576 ms stddev 1262.699
progress: 5.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 6.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 7.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 8.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 9.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 10.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 11.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 12.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 13.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 14.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 15.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 16.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 17.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 18.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 19.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 20.0 s, 33.0 tps, lat 5910.665 ms stddev 8733.552
progress: 21.0 s, 20.0 tps, lat 4190.000 ms stddev 8012.222
progress: 22.0 s, 44.0 tps, lat 4452.898 ms stddev 8469.555
progress: 23.0 s, 17.0 tps, lat 7951.775 ms stddev 10550.763
progress: 24.0 s, 39.0 tps, lat 4894.502 ms stddev 9270.031
progress: 25.0 s, 48.0 tps, lat 4658.159 ms stddev 9365.012
progress: 26.0 s, 19.0 tps, lat 6740.618 ms stddev 11017.741
progress: 27.0 s, 34.0 tps, lat 6265.141 ms stddev 11004.369
progress: 28.0 s, 33.0 tps, lat 10034.798 ms stddev 13039.985
progress: 29.0 s, 24.0 tps, lat 13023.072 ms stddev 13941.556
progress: 30.0 s, 47.0 tps, lat 6365.943 ms stddev 11922.151
progress: 31.0 s, 31.0 tps, lat 5964.649 ms stddev 11769.435
progress: 32.0 s, 18.0 tps, lat 12246.518 ms stddev 15149.607
progress: 33.0 s, 32.0 tps, lat 5164.978 ms stddev 11534.913
progress: 34.0 s, 11.0 tps, lat 18304.152 ms stddev 16536.782
progress: 35.0 s, 44.0 tps, lat 9330.479 ms stddev 14948.877
progress: 36.0 s, 24.0 tps, lat 11817.116 ms stddev 16467.731
progress: 37.0 s, 41.0 tps, lat 16237.412 ms stddev 17303.912
progress: 38.0 s, 19.0 tps, lat 13821.626 ms stddev 17843.619
progress: 39.0 s, 46.0 tps, lat 6945.577 ms stddev 13146.942
progress: 40.0 s, 46.0 tps, lat 6160.800 ms stddev 13194.172
progress: 41.0 s, 8.0 tps, lat 15238.093 ms stddev 19417.694
progress: 42.0 s, 39.0 tps, lat 9974.308 ms stddev 17210.077
progress: 43.0 s, 15.0 tps, lat 14262.093 ms stddev 19735.918
progress: 44.0 s, 32.0 tps, lat 14679.202 ms stddev 18600.461
progress: 45.0 s, 17.0 tps, lat 2808.233 ms stddev 10258.923
progress: 46.0 s, 8.0 tps, lat 22812.538 ms stddev 22501.104
progress: 47.0 s, 27.0 tps, lat 10755.093 ms stddev 17452.703
progress: 48.0 s, 28.0 tps, lat 12695.089 ms stddev 20150.626
progress: 49.0 s, 40.0 tps, lat 7417.723 ms stddev 17090.798
progress: 50.0 s, 26.0 tps, lat 8836.551 ms stddev 17367.063
progress: 51.0 s, 36.0 tps, lat 10102.477 ms stddev 18143.073
progress: 52.0 s, 18.0 tps, lat 21193.476 ms stddev 24130.141
progress: 53.0 s, 31.0 tps, lat 5231.374 ms stddev 15250.089
progress: 54.0 s, 16.0 tps, lat 17832.545 ms stddev 23926.820
progress: 55.0 s, 40.0 tps, lat 8267.716 ms stddev 19144.680
progress: 56.0 s, 24.0 tps, lat 9717.556 ms stddev 20320.800
progress: 57.0 s, 112.1 tps, lat 29832.052 ms stddev 28082.255
progress: 58.0 s, 80.9 tps, lat 14180.656 ms stddev 24662.298
progress: 59.0 s, 65.1 tps, lat 11692.933 ms stddev 23265.631
progress: 60.0 s, 71.0 tps, lat 14238.960 ms stddev 25257.791
progress: 61.0 s, 73.9 tps, lat 13853.016 ms stddev 25232.753
progress: 62.0 s, 85.0 tps, lat 12292.783 ms stddev 24429.711
progress: 63.0 s, 47.0 tps, lat 14646.395 ms stddev 26355.656
progress: 64.0 s, 60.1 tps, lat 13740.789 ms stddev 25977.830
progress: 65.0 s, 69.0 tps, lat 14942.281 ms stddev 27052.182
progress: 66.0 s, 58.0 tps, lat 14530.473 ms stddev 26876.548
progress: 67.0 s, 56.0 tps, lat 12788.740 ms stddev 25702.123
progress: 68.0 s, 61.0 tps, lat 18125.661 ms stddev 29011.571
progress: 69.0 s, 50.0 tps, lat 7588.453 ms stddev 18761.996
progress: 70.0 s, 50.0 tps, lat 12869.978 ms stddev 21733.866
progress: 71.0 s, 51.0 tps, lat 7888.145 ms stddev 18308.426
progress: 72.0 s, 24.0 tps, lat 20570.400 ms stddev 24677.345
progress: 73.0 s, 35.0 tps, lat 12614.746 ms stddev 21674.382
progress: 74.0 s, 39.0 tps, lat 8972.717 ms stddev 19373.101
progress: 75.0 s, 24.0 tps, lat 17855.659 ms stddev 26005.986
progress: 76.0 s, 48.0 tps, lat 17213.313 ms stddev 25059.842
progress: 77.0 s, 46.0 tps, lat 10431.720 ms stddev 21575.560
progress: 78.0 s, 44.0 tps, lat 18981.286 ms stddev 27742.090
progress: 79.0 s, 17.0 tps, lat 26332.168 ms stddev 33125.673
progress: 80.0 s, 33.0 tps, lat 13477.140 ms stddev 20257.123
progress: 81.0 s, 46.0 tps, lat 10610.854 ms stddev 18704.784
progress: 82.0 s, 52.0 tps, lat 10895.657 ms stddev 21817.542
progress: 83.0 s, 48.0 tps, lat 9902.718 ms stddev 19765.415
progress: 84.0 s, 35.0 tps, lat 11279.406 ms stddev 21385.872
progress: 85.0 s, 51.0 tps, lat 17545.936 ms stddev 29036.491
progress: 86.0 s, 39.0 tps, lat 26478.287 ms stddev 27160.865
progress: 87.0 s, 59.0 tps, lat 17854.885 ms stddev 23408.893
progress: 88.0 s, 50.0 tps, lat 9628.923 ms stddev 18956.878
progress: 89.0 s, 64.0 tps, lat 9504.706 ms stddev 17715.302
progress: 90.0 s, 58.0 tps, lat 13678.080 ms stddev 21006.403
progress: 91.0 s, 43.0 tps, lat 9901.618 ms stddev 17408.284
progress: 92.0 s, 32.0 tps, lat 13467.742 ms stddev 22203.125
progress: 93.0 s, 55.0 tps, lat 8606.426 ms stddev 16911.836
progress: 94.0 s, 32.0 tps, lat 8291.617 ms stddev 23614.208
progress: 95.0 s, 38.0 tps, lat 14395.072 ms stddev 27052.394
progress: 96.0 s, 25.0 tps, lat 33590.759 ms stddev 43220.795
progress: 97.0 s, 50.0 tps, lat 15520.320 ms stddev 35150.669
progress: 98.0 s, 16.0 tps, lat 35268.519 ms stddev 42629.424
progress: 99.0 s, 46.0 tps, lat 17419.726 ms stddev 34680.490
progress: 100.0 s, 32.0 tps, lat 30165.553 ms stddev 41068.048
progress: 101.0 s, 20.0 tps, lat 34004.361 ms stddev 44439.703
progress: 102.0 s, 9.0 tps, lat 94924.234 ms stddev 17750.440
progress: 103.0 s, 18.0 tps, lat 78665.820 ms stddev 34437.075
progress: 104.0 s, 7.0 tps, lat 95136.317 ms stddev 19960.028
progress: 105.0 s, 21.0 tps, lat 76378.853 ms stddev 39943.159
progress: 106.0 s, 10.0 tps, lat 99620.400 ms stddev 16800.308
progress: 107.0 s, 175.2 tps, lat 73492.433 ms stddev 39690.326
progress: 108.0 s, 75.0 tps, lat 25470.068 ms stddev 13140.013
progress: 109.0 s, 58.0 tps, lat 24925.175 ms stddev 21826.550
progress: 110.0 s, 41.0 tps, lat 26060.593 ms stddev 32335.285
progress: 111.0 s, 43.0 tps, lat 106188.963 ms stddev 14389.898
progress: 112.0 s, 35.0 tps, lat 104723.980 ms stddev 17978.512
progress: 113.0 s, 133.1 tps, lat 90808.576 ms stddev 32832.832
transaction type: test.sql
scaling factor: 1
query mode: simple
number of clients: 1000
number of threads: 36
duration: 100 s
number of transactions actually processed: 4656
latency average = 23572.699 ms
latency stddev = 35309.693 ms
tps = 41.151969 (including connections establishing)
tps = 41.156079 (excluding connections establishing)


3. My patch:

[kk@hp06 knizhnik]$ pgbench -n -c 1000 -j 36 -T 100 -P 1 -f test.sql postgres
progress: 1.0 s, 109.0 tps, lat 186.699 ms stddev 155.513
progress: 2.0 s, 185.0 tps, lat 858.366 ms stddev 435.616
progress: 3.0 s, 155.0 tps, lat 1454.427 ms stddev 955.026
progress: 4.0 s, 323.2 tps, lat 1844.037 ms stddev 1573.388
progress: 5.0 s, 182.0 tps, lat 2038.689 ms stddev 1959.624
progress: 6.0 s, 139.0 tps, lat 2063.075 ms stddev 2421.067
progress: 7.0 s, 198.0 tps, lat 4243.453 ms stddev 2724.742
progress: 8.0 s, 234.0 tps, lat 5122.093 ms stddev 2582.399
progress: 9.0 s, 175.0 tps, lat 3268.453 ms stddev 3077.928
progress: 10.0 s, 145.0 tps, lat 4339.716 ms stddev 3855.064
progress: 11.0 s, 418.0 tps, lat 3638.447 ms stddev 3717.018
progress: 12.0 s, 254.0 tps, lat 5739.302 ms stddev 4222.194
progress: 13.0 s, 264.0 tps, lat 3463.426 ms stddev 3312.223
progress: 14.0 s, 241.0 tps, lat 3737.628 ms stddev 4345.002
progress: 15.0 s, 231.0 tps, lat 3655.541 ms stddev 3885.624
progress: 16.0 s, 372.0 tps, lat 4965.353 ms stddev 4092.753
progress: 17.0 s, 203.0 tps, lat 4152.482 ms stddev 3115.167
progress: 18.0 s, 189.0 tps, lat 4069.375 ms stddev 4096.535
progress: 19.0 s, 267.0 tps, lat 4495.891 ms stddev 3897.380
progress: 20.0 s, 182.8 tps, lat 2946.705 ms stddev 3371.206
progress: 21.0 s, 144.1 tps, lat 4176.410 ms stddev 2994.048
progress: 22.0 s, 186.0 tps, lat 4095.609 ms stddev 3998.535
progress: 23.0 s, 283.0 tps, lat 4091.103 ms stddev 4337.856
progress: 24.0 s, 413.0 tps, lat 4913.568 ms stddev 3602.845
progress: 25.0 s, 153.0 tps, lat 4897.216 ms stddev 3412.526
progress: 26.0 s, 362.0 tps, lat 4322.345 ms stddev 4745.684
progress: 27.0 s, 229.0 tps, lat 2722.046 ms stddev 2778.627
progress: 28.0 s, 375.0 tps, lat 3074.234 ms stddev 3372.233
progress: 29.0 s, 227.9 tps, lat 4165.033 ms stddev 3554.064
progress: 30.0 s, 403.2 tps, lat 3777.061 ms stddev 3338.528
progress: 31.0 s, 312.0 tps, lat 2692.811 ms stddev 2786.706
progress: 32.0 s, 172.0 tps, lat 2666.640 ms stddev 2557.934
progress: 33.0 s, 327.0 tps, lat 3753.528 ms stddev 3003.476
progress: 34.0 s, 288.0 tps, lat 3700.307 ms stddev 4360.617
progress: 35.0 s, 138.9 tps, lat 2793.501 ms stddev 2973.115
progress: 36.0 s, 199.1 tps, lat 3914.999 ms stddev 3620.217
progress: 37.0 s, 332.0 tps, lat 4438.935 ms stddev 3367.059
progress: 38.0 s, 273.0 tps, lat 3976.717 ms stddev 3605.981
progress: 39.0 s, 350.0 tps, lat 2564.301 ms stddev 3098.142
progress: 40.0 s, 198.0 tps, lat 3821.723 ms stddev 3720.681
progress: 41.0 s, 379.0 tps, lat 3540.596 ms stddev 3716.245
progress: 42.0 s, 337.0 tps, lat 3359.137 ms stddev 3432.100
progress: 43.0 s, 273.0 tps, lat 4083.521 ms stddev 3446.354
progress: 44.0 s, 267.0 tps, lat 3604.277 ms stddev 3931.611
progress: 45.0 s, 221.0 tps, lat 3187.834 ms stddev 3472.713
progress: 46.0 s, 257.0 tps, lat 4420.584 ms stddev 3269.787
progress: 47.0 s, 216.0 tps, lat 4784.282 ms stddev 4078.919
progress: 48.0 s, 340.0 tps, lat 2878.317 ms stddev 3159.234
progress: 49.0 s, 393.0 tps, lat 3296.432 ms stddev 3120.998
progress: 50.0 s, 384.0 tps, lat 3333.966 ms stddev 3632.887
progress: 51.0 s, 179.9 tps, lat 3357.831 ms stddev 3577.181
progress: 52.0 s, 193.1 tps, lat 2740.384 ms stddev 2853.463
progress: 53.0 s, 266.0 tps, lat 3338.786 ms stddev 3294.210
progress: 54.0 s, 198.0 tps, lat 4350.746 ms stddev 3632.925
progress: 55.0 s, 443.0 tps, lat 4533.887 ms stddev 3284.145
progress: 56.0 s, 209.0 tps, lat 2471.820 ms stddev 2872.507
progress: 57.0 s, 243.0 tps, lat 3657.152 ms stddev 3082.953
progress: 58.0 s, 204.0 tps, lat 3758.366 ms stddev 3981.754
progress: 59.0 s, 255.0 tps, lat 3641.082 ms stddev 4062.395
progress: 60.0 s, 331.0 tps, lat 3752.880 ms stddev 3732.255
progress: 61.0 s, 225.0 tps, lat 3390.763 ms stddev 2763.489
progress: 62.0 s, 298.0 tps, lat 4201.300 ms stddev 3632.733
progress: 63.0 s, 106.0 tps, lat 2692.495 ms stddev 3976.275
progress: 64.0 s, 118.0 tps, lat 3870.886 ms stddev 4552.055
progress: 65.0 s, 111.0 tps, lat 1877.879 ms stddev 2530.083
progress: 66.0 s, 114.0 tps, lat 4240.246 ms stddev 4699.575
progress: 67.0 s, 214.0 tps, lat 4742.589 ms stddev 4249.912
progress: 68.0 s, 279.0 tps, lat 5303.721 ms stddev 5630.854
progress: 69.0 s, 181.0 tps, lat 5367.429 ms stddev 5191.225
progress: 70.0 s, 223.0 tps, lat 5220.545 ms stddev 4810.474
progress: 71.0 s, 181.0 tps, lat 5094.250 ms stddev 4598.197
progress: 72.0 s, 191.0 tps, lat 4775.279 ms stddev 4638.785
progress: 73.0 s, 121.0 tps, lat 6151.599 ms stddev 6044.989
progress: 74.0 s, 352.0 tps, lat 5722.979 ms stddev 5637.691
progress: 75.0 s, 313.0 tps, lat 3681.700 ms stddev 4511.268
progress: 76.0 s, 293.9 tps, lat 6134.980 ms stddev 5633.180
progress: 77.0 s, 292.1 tps, lat 4964.182 ms stddev 5115.833
progress: 78.0 s, 247.0 tps, lat 2958.631 ms stddev 3396.316
progress: 79.0 s, 308.0 tps, lat 4295.905 ms stddev 5392.345
progress: 80.0 s, 260.0 tps, lat 5192.128 ms stddev 4042.820
progress: 81.0 s, 244.0 tps, lat 2378.675 ms stddev 2619.194
progress: 82.0 s, 261.0 tps, lat 3619.289 ms stddev 3678.498
progress: 83.0 s, 193.0 tps, lat 4519.653 ms stddev 4820.241
progress: 84.0 s, 313.0 tps, lat 3838.839 ms stddev 4010.695
progress: 85.0 s, 265.0 tps, lat 3470.704 ms stddev 3009.119
progress: 86.0 s, 169.0 tps, lat 3113.572 ms stddev 3338.449
progress: 87.0 s, 181.0 tps, lat 2722.756 ms stddev 3552.120
progress: 88.0 s, 192.0 tps, lat 4767.495 ms stddev 4794.403
progress: 89.0 s, 276.0 tps, lat 4417.211 ms stddev 4012.534
progress: 90.0 s, 249.0 tps, lat 4600.363 ms stddev 4613.885
progress: 91.0 s, 136.0 tps, lat 5292.106 ms stddev 3254.580
progress: 92.0 s, 335.0 tps, lat 3606.878 ms stddev 4238.592
progress: 93.0 s, 183.0 tps, lat 4802.608 ms stddev 4424.506
progress: 94.0 s, 208.0 tps, lat 4257.261 ms stddev 3882.918
progress: 95.0 s, 154.0 tps, lat 5939.258 ms stddev 4506.066
progress: 96.0 s, 245.0 tps, lat 4968.774 ms stddev 3695.050
progress: 97.0 s, 133.9 tps, lat 3991.399 ms stddev 4884.086
progress: 98.0 s, 309.1 tps, lat 4340.668 ms stddev 4231.593
progress: 99.0 s, 88.0 tps, lat 5672.978 ms stddev 4776.332
progress: 100.0 s, 221.0 tps, lat 3617.880 ms stddev 4441.404
progress: 101.0 s, 486.0 tps, lat 7574.035 ms stddev 4135.087
transaction type: test.sql
scaling factor: 1
query mode: simple
number of clients: 1000
number of threads: 36
duration: 100 s
number of transactions actually processed: 25206
latency average = 3990.599 ms
latency stddev = 3982.713 ms
tps = 248.988411 (including connections establishing)
tps = 249.015469 (excluding connections establishing)

As you can see performance and latency are ~6 times worser.  Please also notice periods of zero TPS in first two cases.

    

I found a way to speedup this parallel insert "benchmark" even more by remembering visited backends in deadlock check.
I added lastDeadlockCheck to PGPROC, assign it in indLockCycle:

+    for (i = 0; i < nVisitedProcs; i++)
+    {
+        visitedProcs[i]->lastDeadlockCheck = now;
+    }

and check it in CheckDeadLock:

+    /*
+     * Ensure that only one backend is checking for deadlock.
+     * Otherwise under high load cascade of deadlock timeout expirations can cause stuck of Postgres.
+     */
+    if (now - MyProc->lastDeadlockCheck < DeadlockTimeout*1000
+        || !pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
+    {
+        enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+        return;
+    }
+    inside_deadlock_check = true;

It cause increase of TPS about 3 times:

[kk@hp06 knizhnik]$ pgbench -n -c 1000 -j 36 -T 100 -P 1 -f test.sql postgres
progress: 1.0 s, 286.0 tps, lat 378.756 ms stddev 195.544
progress: 2.0 s, 766.0 tps, lat 912.145 ms stddev 544.009
progress: 3.0 s, 806.0 tps, lat 1100.353 ms stddev 748.493
progress: 4.0 s, 927.0 tps, lat 1106.383 ms stddev 928.074
progress: 5.0 s, 799.0 tps, lat 1313.658 ms stddev 971.700
progress: 6.0 s, 822.0 tps, lat 1203.190 ms stddev 856.231
progress: 7.0 s, 847.0 tps, lat 1207.930 ms stddev 867.520
progress: 8.0 s, 932.1 tps, lat 1135.854 ms stddev 831.402
progress: 9.0 s, 709.0 tps, lat 1250.656 ms stddev 865.320
progress: 10.0 s, 901.9 tps, lat 1282.045 ms stddev 870.923
progress: 11.0 s, 601.0 tps, lat 1225.138 ms stddev 890.028
progress: 12.0 s, 877.0 tps, lat 1221.587 ms stddev 935.510
progress: 13.0 s, 772.0 tps, lat 1268.676 ms stddev 965.900
progress: 14.0 s, 391.0 tps, lat 1243.487 ms stddev 1061.889
progress: 15.0 s, 638.1 tps, lat 1889.763 ms stddev 1172.063
progress: 16.0 s, 844.0 tps, lat 1485.988 ms stddev 1269.034
progress: 17.0 s, 844.0 tps, lat 1322.878 ms stddev 1091.370
progress: 18.0 s, 934.1 tps, lat 1260.384 ms stddev 879.003
progress: 19.0 s, 700.0 tps, lat 1102.111 ms stddev 831.983
progress: 20.0 s, 840.0 tps, lat 1233.982 ms stddev 934.540
progress: 21.0 s, 869.0 tps, lat 1123.471 ms stddev 925.283
progress: 22.0 s, 755.9 tps, lat 1204.959 ms stddev 930.349
progress: 23.0 s, 579.0 tps, lat 1446.183 ms stddev 1029.218
progress: 24.0 s, 831.2 tps, lat 1552.375 ms stddev 1018.681
progress: 25.0 s, 748.0 tps, lat 1300.480 ms stddev 1027.219
progress: 26.0 s, 843.0 tps, lat 1181.850 ms stddev 953.880
progress: 27.0 s, 678.0 tps, lat 1285.499 ms stddev 977.864
progress: 28.0 s, 635.0 tps, lat 1618.651 ms stddev 1085.907
progress: 29.0 s, 817.0 tps, lat 1450.936 ms stddev 1071.654
progress: 30.0 s, 527.0 tps, lat 1182.952 ms stddev 967.962
progress: 31.0 s, 909.1 tps, lat 1428.860 ms stddev 1093.289
progress: 32.0 s, 800.0 tps, lat 1229.267 ms stddev 990.799
progress: 33.0 s, 825.0 tps, lat 1246.528 ms stddev 956.339
progress: 34.0 s, 918.0 tps, lat 1156.581 ms stddev 943.265
progress: 35.0 s, 840.0 tps, lat 1203.570 ms stddev 904.040
progress: 36.0 s, 743.0 tps, lat 1190.433 ms stddev 871.474
progress: 37.0 s, 666.0 tps, lat 1459.770 ms stddev 890.496
progress: 38.0 s, 572.9 tps, lat 1189.344 ms stddev 951.229
progress: 39.0 s, 401.8 tps, lat 1483.275 ms stddev 1261.468
progress: 40.0 s, 440.2 tps, lat 1889.157 ms stddev 1358.205
progress: 41.0 s, 485.1 tps, lat 1777.270 ms stddev 1670.644
progress: 42.0 s, 677.9 tps, lat 2097.539 ms stddev 1818.994
progress: 43.0 s, 708.9 tps, lat 1583.930 ms stddev 1552.509
progress: 44.0 s, 408.1 tps, lat 1527.591 ms stddev 1805.555
progress: 45.0 s, 500.0 tps, lat 2338.478 ms stddev 1747.161
progress: 46.0 s, 719.0 tps, lat 1497.862 ms stddev 1624.560
progress: 47.0 s, 609.0 tps, lat 1762.760 ms stddev 1630.875
progress: 48.0 s, 854.1 tps, lat 1863.932 ms stddev 1417.070
progress: 49.0 s, 718.0 tps, lat 1292.527 ms stddev 1155.167
progress: 50.0 s, 590.9 tps, lat 1290.508 ms stddev 1109.264
progress: 51.0 s, 723.1 tps, lat 1205.012 ms stddev 1193.895
progress: 52.0 s, 561.0 tps, lat 1752.166 ms stddev 1298.965
progress: 53.0 s, 1003.0 tps, lat 1431.500 ms stddev 1325.683
progress: 54.0 s, 781.9 tps, lat 1158.599 ms stddev 1201.845
progress: 55.0 s, 718.1 tps, lat 1344.554 ms stddev 1096.038
progress: 56.0 s, 702.0 tps, lat 1335.552 ms stddev 1143.465
progress: 57.0 s, 818.0 tps, lat 1484.493 ms stddev 1117.921
progress: 58.0 s, 710.0 tps, lat 1367.310 ms stddev 1011.972
progress: 59.0 s, 757.0 tps, lat 1237.101 ms stddev 1000.487
progress: 60.0 s, 513.0 tps, lat 1252.146 ms stddev 1080.443
progress: 61.0 s, 794.0 tps, lat 1469.649 ms stddev 1255.285
progress: 62.0 s, 779.9 tps, lat 1417.616 ms stddev 1176.119
progress: 63.0 s, 805.1 tps, lat 1152.615 ms stddev 1135.711
progress: 64.0 s, 689.0 tps, lat 1380.108 ms stddev 1213.144
progress: 65.0 s, 765.0 tps, lat 1441.557 ms stddev 1188.176
progress: 66.0 s, 731.0 tps, lat 1490.278 ms stddev 1173.860
progress: 67.0 s, 814.0 tps, lat 1242.075 ms stddev 1126.199
progress: 68.0 s, 721.9 tps, lat 1203.058 ms stddev 1091.802
progress: 69.0 s, 746.1 tps, lat 1545.313 ms stddev 1115.286
progress: 70.0 s, 807.0 tps, lat 1288.290 ms stddev 1097.507
progress: 71.0 s, 738.0 tps, lat 1196.415 ms stddev 1025.146
progress: 72.0 s, 820.9 tps, lat 1225.964 ms stddev 1095.748
progress: 73.0 s, 662.0 tps, lat 1202.538 ms stddev 1187.343
progress: 74.0 s, 560.0 tps, lat 1599.097 ms stddev 1213.314
progress: 75.0 s, 743.1 tps, lat 1698.436 ms stddev 1239.762
progress: 76.0 s, 765.0 tps, lat 1490.512 ms stddev 1166.862
progress: 77.0 s, 438.9 tps, lat 1703.758 ms stddev 1176.034
progress: 78.0 s, 485.0 tps, lat 1395.690 ms stddev 1244.769
progress: 79.0 s, 659.1 tps, lat 1793.684 ms stddev 1357.133
progress: 80.0 s, 619.1 tps, lat 1692.362 ms stddev 1356.774
progress: 81.0 s, 767.0 tps, lat 1566.055 ms stddev 1359.950
progress: 82.0 s, 805.9 tps, lat 1341.720 ms stddev 1206.535
progress: 83.0 s, 770.1 tps, lat 1361.601 ms stddev 1171.117
progress: 84.0 s, 795.0 tps, lat 1247.346 ms stddev 1120.079
progress: 85.0 s, 734.0 tps, lat 1425.364 ms stddev 1019.483
progress: 86.0 s, 706.9 tps, lat 1205.157 ms stddev 1030.842
progress: 87.0 s, 655.1 tps, lat 1568.083 ms stddev 1067.113
progress: 88.0 s, 575.0 tps, lat 1432.001 ms stddev 1073.638
progress: 89.0 s, 815.0 tps, lat 1411.512 ms stddev 1207.479
progress: 90.0 s, 841.9 tps, lat 1241.370 ms stddev 1116.575
progress: 91.0 s, 742.1 tps, lat 1514.167 ms stddev 1139.773
progress: 92.0 s, 645.9 tps, lat 1215.367 ms stddev 1036.669
progress: 93.0 s, 635.1 tps, lat 1368.759 ms stddev 1057.553
progress: 94.0 s, 392.0 tps, lat 1534.512 ms stddev 1537.359
progress: 95.0 s, 834.0 tps, lat 1609.177 ms stddev 1317.916
progress: 96.0 s, 665.9 tps, lat 1465.995 ms stddev 1357.130
progress: 97.0 s, 865.1 tps, lat 1535.662 ms stddev 1361.381
progress: 98.0 s, 525.0 tps, lat 1460.181 ms stddev 1195.200
progress: 99.0 s, 549.0 tps, lat 1573.314 ms stddev 1263.313
progress: 100.0 s, 708.0 tps, lat 1624.565 ms stddev 1275.677
transaction type: test.sql
scaling factor: 1
query mode: simple
number of clients: 1000
number of threads: 36
duration: 100 s
number of transactions actually processed: 72366
latency average = 1383.774 ms
latency stddev = 1157.510 ms
tps = 719.335255 (including connections establishing)
tps = 719.415616 (excluding connections establishing)


And if we completely eliminate deadlock detection penalty (by setting deadlock_timeout to 100 seconds),
then performance is almost the same:

[kk@hp06 knizhnik]$ vim pgsql/postgresql.conf
[kk@hp06 knizhnik]$ rm -f logfile ; pg_ctl -D pgsql -l logfile restart
waiting for server to shut down.... done
server stopped
waiting for server to start.... done
server started
[kk@hp06 knizhnik]$ pgbench -n -c 1000 -j 36 -T 100 -P 1 -f test.sql postgres
progress: 1.0 s, 221.0 tps, lat 330.565 ms stddev 227.971
progress: 2.0 s, 689.0 tps, lat 910.581 ms stddev 523.957
progress: 3.0 s, 814.0 tps, lat 1275.584 ms stddev 840.832
progress: 4.0 s, 852.9 tps, lat 1248.029 ms stddev 961.648
progress: 5.0 s, 414.0 tps, lat 1659.273 ms stddev 922.593
progress: 6.0 s, 767.0 tps, lat 1501.320 ms stddev 1065.690
progress: 7.0 s, 833.9 tps, lat 1351.513 ms stddev 1064.518
progress: 8.0 s, 950.1 tps, lat 1111.468 ms stddev 865.477
progress: 9.0 s, 552.0 tps, lat 1453.967 ms stddev 859.895
progress: 10.0 s, 928.0 tps, lat 1298.292 ms stddev 980.558
progress: 11.0 s, 793.0 tps, lat 1032.620 ms stddev 946.944
progress: 12.0 s, 927.0 tps, lat 1231.162 ms stddev 970.198
progress: 13.0 s, 751.0 tps, lat 1182.206 ms stddev 909.727
progress: 14.0 s, 790.0 tps, lat 1406.509 ms stddev 883.296
progress: 15.0 s, 797.0 tps, lat 1179.952 ms stddev 870.870
progress: 16.0 s, 681.0 tps, lat 1412.632 ms stddev 868.443
progress: 17.0 s, 855.0 tps, lat 1205.755 ms stddev 927.496
progress: 18.0 s, 723.1 tps, lat 1391.945 ms stddev 1004.360
progress: 19.0 s, 739.0 tps, lat 1339.603 ms stddev 956.890
progress: 20.0 s, 899.0 tps, lat 1161.479 ms stddev 943.670
progress: 21.0 s, 802.0 tps, lat 1156.783 ms stddev 911.192
progress: 22.0 s, 742.0 tps, lat 1073.236 ms stddev 1000.682
progress: 23.0 s, 833.0 tps, lat 1316.291 ms stddev 1070.210
progress: 24.0 s, 631.0 tps, lat 1412.871 ms stddev 1154.117
progress: 25.0 s, 912.0 tps, lat 1418.955 ms stddev 1071.076
progress: 26.0 s, 690.0 tps, lat 1296.630 ms stddev 1086.067
progress: 27.0 s, 848.0 tps, lat 1231.006 ms stddev 957.761
progress: 28.0 s, 854.0 tps, lat 1237.371 ms stddev 958.098
progress: 29.0 s, 721.0 tps, lat 1220.976 ms stddev 955.515
progress: 30.0 s, 855.0 tps, lat 1236.774 ms stddev 999.678
progress: 31.0 s, 788.0 tps, lat 1259.278 ms stddev 963.804
progress: 32.0 s, 779.0 tps, lat 1365.986 ms stddev 979.700
progress: 33.0 s, 836.0 tps, lat 1201.902 ms stddev 947.314
progress: 34.0 s, 808.1 tps, lat 1149.186 ms stddev 929.090
progress: 35.0 s, 801.0 tps, lat 1276.192 ms stddev 954.079
progress: 36.0 s, 834.0 tps, lat 1270.994 ms stddev 1010.887
progress: 37.0 s, 732.0 tps, lat 1147.285 ms stddev 983.649
progress: 38.0 s, 604.0 tps, lat 1380.128 ms stddev 953.497
progress: 39.0 s, 887.0 tps, lat 1369.151 ms stddev 1052.405
progress: 40.0 s, 734.0 tps, lat 1160.158 ms stddev 1116.309
progress: 41.0 s, 477.0 tps, lat 1616.755 ms stddev 1130.704
progress: 42.0 s, 563.0 tps, lat 1688.876 ms stddev 1340.686
progress: 43.0 s, 782.0 tps, lat 1556.497 ms stddev 1335.607
progress: 44.0 s, 908.9 tps, lat 1424.074 ms stddev 1256.952
progress: 45.0 s, 572.0 tps, lat 987.486 ms stddev 1008.998
progress: 46.0 s, 588.0 tps, lat 1410.614 ms stddev 1231.046
progress: 47.0 s, 583.9 tps, lat 1427.524 ms stddev 1325.218
progress: 48.0 s, 499.1 tps, lat 2169.937 ms stddev 1605.538
progress: 49.0 s, 422.8 tps, lat 1466.671 ms stddev 1495.292
progress: 50.0 s, 586.2 tps, lat 2268.693 ms stddev 1732.014
progress: 51.0 s, 738.0 tps, lat 1620.541 ms stddev 1769.911
progress: 52.0 s, 748.9 tps, lat 1824.114 ms stddev 1602.422
progress: 53.0 s, 804.1 tps, lat 1379.048 ms stddev 1222.391
progress: 54.0 s, 797.9 tps, lat 1222.619 ms stddev 1004.013
progress: 55.0 s, 695.9 tps, lat 1049.683 ms stddev 1065.109
progress: 56.0 s, 541.2 tps, lat 1708.637 ms stddev 1123.485
progress: 57.0 s, 763.0 tps, lat 1640.274 ms stddev 1226.199
progress: 58.0 s, 700.0 tps, lat 1451.017 ms stddev 1239.310
progress: 59.0 s, 762.9 tps, lat 1398.833 ms stddev 994.492
progress: 60.0 s, 331.0 tps, lat 1027.525 ms stddev 1084.262
progress: 61.0 s, 419.0 tps, lat 2167.434 ms stddev 1423.208
progress: 62.0 s, 878.1 tps, lat 1917.376 ms stddev 1470.059
progress: 63.0 s, 591.9 tps, lat 1388.745 ms stddev 1381.181
progress: 64.0 s, 330.0 tps, lat 1211.436 ms stddev 1286.245
progress: 65.0 s, 210.0 tps, lat 1980.580 ms stddev 1619.982
progress: 66.0 s, 402.0 tps, lat 2808.983 ms stddev 1872.737
progress: 67.0 s, 549.9 tps, lat 2675.464 ms stddev 1907.931
progress: 68.0 s, 452.1 tps, lat 2371.877 ms stddev 2113.891
progress: 69.0 s, 781.0 tps, lat 2072.552 ms stddev 1753.277
progress: 70.0 s, 726.0 tps, lat 1444.933 ms stddev 1347.249
progress: 71.0 s, 794.0 tps, lat 1360.607 ms stddev 1250.773
progress: 72.0 s, 695.0 tps, lat 1339.559 ms stddev 985.322
progress: 73.0 s, 455.0 tps, lat 1569.596 ms stddev 1231.280
progress: 74.0 s, 808.0 tps, lat 1406.059 ms stddev 1224.969
progress: 75.0 s, 715.0 tps, lat 1479.549 ms stddev 1282.023
progress: 76.0 s, 807.0 tps, lat 1372.573 ms stddev 1200.525
progress: 77.0 s, 628.0 tps, lat 1294.128 ms stddev 1204.801
progress: 78.0 s, 674.0 tps, lat 1642.165 ms stddev 1157.958
progress: 79.0 s, 771.0 tps, lat 1375.102 ms stddev 1170.966
progress: 80.0 s, 643.0 tps, lat 1404.529 ms stddev 1210.807
progress: 81.0 s, 925.0 tps, lat 1248.700 ms stddev 1155.241
progress: 82.0 s, 629.0 tps, lat 1179.140 ms stddev 1100.482
progress: 83.0 s, 606.0 tps, lat 1323.679 ms stddev 1211.809
progress: 84.0 s, 612.0 tps, lat 1836.341 ms stddev 1379.864
progress: 85.0 s, 835.0 tps, lat 1549.102 ms stddev 1332.389
progress: 86.0 s, 437.0 tps, lat 1382.066 ms stddev 1304.104
progress: 87.0 s, 756.0 tps, lat 1668.598 ms stddev 1256.808
progress: 88.0 s, 563.0 tps, lat 1619.388 ms stddev 1336.066
progress: 89.0 s, 633.0 tps, lat 1526.824 ms stddev 1257.665
progress: 90.0 s, 665.0 tps, lat 1562.750 ms stddev 1321.690
progress: 91.0 s, 860.0 tps, lat 1445.318 ms stddev 1280.446
progress: 92.0 s, 703.0 tps, lat 1507.119 ms stddev 1148.576
progress: 93.0 s, 809.0 tps, lat 1267.364 ms stddev 1028.977
progress: 94.0 s, 597.9 tps, lat 1296.165 ms stddev 1119.035
progress: 95.0 s, 673.8 tps, lat 1317.714 ms stddev 1132.078
progress: 96.0 s, 673.3 tps, lat 1461.346 ms stddev 1152.063
progress: 97.0 s, 749.0 tps, lat 1491.871 ms stddev 1272.290
progress: 98.0 s, 747.0 tps, lat 1450.976 ms stddev 1301.491
progress: 99.0 s, 582.9 tps, lat 1605.604 ms stddev 1150.761
progress: 100.0 s, 655.1 tps, lat 1527.173 ms stddev 1111.883
transaction type: test.sql
scaling factor: 1
query mode: simple
number of clients: 1000
number of threads: 36
duration: 100 s
number of transactions actually processed: 70576
latency average = 1416.917 ms
latency stddev = 1201.539 ms
tps = 702.465649 (including connections establishing)
tps = 702.540071 (excluding connections establishing)


So I understand Andres concern about starvation of deadlock detection.
But I failed to create scenario where
a) Postgres normally works with current deadlock detection policy
b) Performance is dropped and latency increased because of starvation of deadlock detection caused by this patch.

And it is very hard not to notice 17-times difference.
Certainly it is true in the assumption that most deadlock timeout expiration are caused by high workload and contention, and not by real deadlocks.
But it seems to be quite common case.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

deadlock-4.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

Robert Haas
On Wed, Apr 18, 2018 at 10:08 AM, Konstantin Knizhnik
<[hidden email]> wrote:
> And it is very hard not to notice 17-times difference.
> Certainly it is true in the assumption that most deadlock timeout expiration
> are caused by high workload and contention, and not by real deadlocks.
> But it seems to be quite common case.

If I understand this workload correctly, the contention is for the
relation extension lock.  But I think we're likely to move that out of
the heavyweight lock manager altogether in the not-too-distant future,
as proposed in https://commitfest.postgresql.org/17/1133/ ?  I'd be
interested in hearing what happens to performance with that patch
applied.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

konstantin knizhnik


On 20.04.2018 18:36, Robert Haas wrote:
On Wed, Apr 18, 2018 at 10:08 AM, Konstantin Knizhnik
[hidden email] wrote:
And it is very hard not to notice 17-times difference.
Certainly it is true in the assumption that most deadlock timeout expiration
are caused by high workload and contention, and not by real deadlocks.
But it seems to be quite common case.
If I understand this workload correctly, the contention is for the
relation extension lock.  But I think we're likely to move that out of
the heavyweight lock manager altogether in the not-too-distant future,
as proposed in https://commitfest.postgresql.org/17/1133/ ?  I'd be
interested in hearing what happens to performance with that patch
applied.


With the extension lock patch performance in increased to 1146 TPS.
So it is much better than with vanilla postgres and about 40% better than with deadlock patch (1146 vs. 719 TPS).
Profile is the following:

 33.51%  postgres                                [.] s_lock
   4.59%  postgres                                [.] LWLockWaitListLock
   3.67%  postgres                                [.] perform_spin_delay
   3.04%  [kernel]                                [k] gup_pgd_range
   2.43%  [kernel]                                [k] get_futex_key
   2.00%  [kernel]                                [k] __basepage_index
   1.20%  postgres                                [.] calculateDigestFromBuffer
   0.97%  [kernel]                                [k] update_load_avg
   0.97%  postgres                                [.] XLogInsertRecord
   0.93%  [kernel]                                [k] switch_mm_irqs_off
   0.90%  postgres                                [.] LWLockAttemptLock
   0.88%  [kernel]                                [k] _atomic_dec_and_lock
   0.84%  [kernel]                                [k] __schedule
   0.82%  postgres                                [.] ConditionVariableBroadcast
   0.75%  postgres                                [.] LWLockRelease
   0.74%  [kernel]                                [k] syscall_return_via_sysret
   0.65%  postgres                                [.] SetLatch
   0.64%  [kernel]                                [k] _raw_spin_lock_irqsave
   0.62%  [kernel]                                [k] copy_user_enhanced_fast_string
   0.59%  postgres                                [.] RelationPutHeapTuple
   0.55%  [kernel]                                [k] select_task_rq_fair
   0.54%  [kernel]                                [k] try_to_wake_up
   0.52%  [kernel]                                [k] menu_select

So definitely elimination heavy weight relation extension lock is good idea which eliminates the need for my deadlock patch ... but only in this insert test.
As I have mentioned at the beginning of this thread the same problem with deadlock detection timeout expiration we have with YSCB benchmark with zipf distribution.
Here the source of contention are tuple locks. And as far as I understand from the discussion in the mentioned thread, it is not possible to eliminate heavy weight tuple locks.



-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

Álvaro Herrera
Konstantin Knizhnik wrote:

> As I have mentioned at the beginning of this thread the same problem with
> deadlock detection timeout expiration we have with YSCB benchmark with zipf
> distribution.
> Here the source of contention are tuple locks. And as far as I understand
> from the discussion in the mentioned thread, it is not possible to eliminate
> heavy weight tuple locks.

Well, if the source of tuple locks are foreign keys, maybe we can fix
that problem by removing the need for tuple locks in the first place --
for example, mark a table as prohibiting DELETEs, TRUNCATE, and any
UPDATE that modifies the columns of the primary key.  With that, you
don't need to lock rows in the FK triggers.  (This is like 'ALTER TABLE
tb SET READ ONLY', except the restrictions are less severe).

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Postgres stucks in deadlock detection

Robert Haas
In reply to this post by konstantin knizhnik
On Fri, Apr 20, 2018 at 12:14 PM, Konstantin Knizhnik
<[hidden email]> wrote:
> With the extension lock patch performance in increased to 1146 TPS.
> So it is much better than with vanilla postgres and about 40% better than
> with deadlock patch (1146 vs. 719 TPS).

Cool, that's good to know.

> So definitely elimination heavy weight relation extension lock is good idea
> which eliminates the need for my deadlock patch ... but only in this insert
> test.
> As I have mentioned at the beginning of this thread the same problem with
> deadlock detection timeout expiration we have with YSCB benchmark with zipf
> distribution.
> Here the source of contention are tuple locks. And as far as I understand
> from the discussion in the mentioned thread, it is not possible to eliminate
> heavy weight tuple locks.

Right, that variant would have to be solved in some other way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company