BUG #16016: deadlock with startup process, AccessExclusiveLock on pg_statistic's toast table

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

BUG #16016: deadlock with startup process, AccessExclusiveLock on pg_statistic's toast table

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      16016
Logged by:          Alexey Ermakov
Email address:      [hidden email]
PostgreSQL version: 11.5
Operating system:   Ubuntu 16.04.6 LTS
Description:        

Hello,

recently I encountered strange situation when replica server stopped serving
queries. most likely it happened because startup process was holding access
exclusive lock on pg_statistic's toast table and was waiting for some
backend which lead to deadlock.
situation resolved after max_standby_streaming_delay timeout kicked in and
conflicting backend's queries were canceled.

more details:

1) first waiting backend in logs:
2019-09-19 12:14:11.314 UTC 144285 user@db from [local] [vxid:21/106350189
txid:0] [BIND waiting] LOG:  process 144285 still waiting for
AccessShareLock on relation 2840 of database 16389 after 1000.115 ms
2019-09-19 12:14:11.314 UTC 144285 user@db from [local] [vxid:21/106350189
txid:0] [BIND waiting] DETAIL:  Process holding the lock: 6002. Wait queue:
144285, 136649, 127349, 76417, 97350, 93019, 109010, 103163, 136300.

2) 6002 pid is indeed startup process:
ps ax | grep 6002
6002 ?        Ss   200:16 postgres: 11/main: startup   recovering
000000010000425900000094

3) locked relation is pg_statistic's toast table:

select 2840::regclass;
        regclass
------------------------
 pg_toast.pg_toast_2619

select 2619::regclass;
   regclass
--------------
 pg_statistic

4) I guess autovacuum was working on pg_statistic table on master server
right before problem occurred and cleaned up some pages at the end of
relation.
but can't confirm, log_autovacuum_min_duration = 1s and no entries in logs,
but table is pretty small (~2MB) so autovacuum might be finished in less
than 1s.

5) according to client no DB migrations were performed at that time.


same thing (again toast table of pg_statistic) also happened year ago twice
on 9.6.6 version on completely different server and project.
unfortunately I wasn't able to get gdb backtrace of startup process, so it
might be difficult to understand where it got stuck :(

at year's ago case I was able to get output of pg_locks, but not much
interesting there:

select * from pg_locks;
  locktype  | database | relation | page | tuple | virtualxid  |
transactionid | classid | objid | objsubid | virtualtransaction |  pid   |  
     mode         | granted | fastpath
------------+----------+----------+------+-------+-------------+---------------+---------+-------+----------+--------------------+--------+---------------------+---------+----------
...truncated, I can show full output if it might help...
 virtualxid |          |          |      |       | 12/19089055 |            
  |         |       |          | 12/19089055        |  91145 | ExclusiveLock
      | t       | t
 virtualxid |          |          |      |       | 1/1         |            
  |         |       |          | 1/0                |  16141 | ExclusiveLock
      | t       | t
 relation   |    20508 |     2840 |      |       |             |            
  |         |       |          | 12/19089055        |  91145 |
AccessShareLock     | f       | f
 relation   |    20508 |     2840 |      |       |             |            
  |         |       |          | 24/18427740        |  94698 |
AccessShareLock     | f       | f
 relation   |    20508 |     2840 |      |       |             |            
  |         |       |          | 26/15964159        |  96387 |
AccessShareLock     | f       | f
 relation   |    20508 |     2840 |      |       |             |            
  |         |       |          | 27/11337343        | 107448 |
AccessShareLock     | f       | f
 relation   |    20508 |     2840 |      |       |             |            
  |         |       |          | 28/9666022         |  99110 |
AccessShareLock     | f       | f
 relation   |    20508 |     2840 |      |       |             |            
  |         |       |          | 1/0                |  16141 |
AccessExclusiveLock | t       | f
(56 rows)

in that case pid 16141 - startup process and again 2840 - oid of toast table
of pg_statistic.
16141 ?        Ss   14169:14 postgres: 9.6/main: startup process  
recovering 000000020000A3E3000000DC waiting

--
Thanks,
Alexey Ermakov

Reply | Threaded
Open this post in threaded view
|

Re: BUG #16016: deadlock with startup process, AccessExclusiveLock on pg_statistic's toast table

Sergei Kornilov
Hello

Seems I have testcase for this report:

primary + streaming replica with such settings:
log_lock_waits = 'on'
max_standby_streaming_delay = '-1'
fsync = off
synchronous_commit = off
(seems this is rare race condition and I can't catch on slow fsync)

Create some table for queries:
create table tablename as select generate_series(1,100) as i;

Run on primary:
pgbench -f primary.sql -c 1 -t 100000 --port 5555 postgres
primary.sql is:
vacuum full pg_statistic;
vacuum full tablename;
(some activity with AccessExclusiveLock)

On replica:
pgbench -f ro.sql --time=300 -n -c 20 --port 5556 postgres
Script content:
\set i random(1,100)
select * from tablename where i = :i;

Usually replica kills queries with ERROR:  deadlock detected and pgbench stops. But sometimes (usually less than 5-10 times on my host) both startup and backend with running query will start waiting something. Also new connections may become in "startup waiting" status indefinitely.

With another max_standby_streaming_delay queries are killed, but only after this timeout (as reported in this bug report). I think this should be detected as deadlock but not happens for some reason.

regards, Sergei


Reply | Threaded
Open this post in threaded view
|

Re: BUG #16016: deadlock with startup process, AccessExclusiveLock on pg_statistic's toast table

Tom Lane-2
Sergei Kornilov <[hidden email]> writes:
> Seems I have testcase for this report:

If you have a test case, could you collect stack traces from each
of the stuck processes?  That would eliminate a lot of hypothesizing.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: BUG #16016: deadlock with startup process, AccessExclusiveLock on pg_statistic's toast table

alexey.ermakov-2
Hello,

On 9/23/19 01:06, Tom Lane wrote:
> If you have a test case, could you collect stack traces from each
> of the stuck processes?  That would eliminate a lot of hypothesizing.
>
> regards, tom lane

I reproduced Sergei's test case on postgresql 11.5, replica hung up
almost immediately after pgbench ran.

9907 - pid of startup process, 16387 - oid of test table, 2619 - oid of
pg_statistic, 2840 - oid of toast table of pg_statistic.

1) pgbench on replica with 1 concurrent processes (-c 1):

this case looks a bit different from what happened on initial report
(which recently happened again btw) because at this time I can't even
open new connection via psql or run query with pg_stat_activity - in
hangs (pg_locks query works).
perhaps because this time access exclusive lock is on pg_statistic table
too, not only on it's toast table.

select * from pg_locks;
   locktype  | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction | pid  
|        mode         | granted | fastpath
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+---------------------+---------+----------
  relation   |    13217 |    11645 |      |       | |              
|         |       |          | 2/73518 | 17266 | AccessShareLock     |
t       | t
  virtualxid |          |          |      |       | 2/73518
|               |         |       |          | 2/73518 | 17266 |
ExclusiveLock       | t       | t
  virtualxid |          |          |      |       | 3/4590466
|               |         |       |          | 3/4590466 | 32074 |
ExclusiveLock       | t       | t
  virtualxid |          |          |      |       | 1/1 |              
|         |       |          | 1/0 |  9907 | ExclusiveLock       |
t       | t
  relation   |    13217 |     2841 |      |       | |              
|         |       |          | 1/0 |  9907 | AccessExclusiveLock |
t       | f
  relation   |    13217 |     2619 |      |       | |              
|         |       |          | 3/4590466 | 32074 | AccessShareLock     |
f       | f
  relation   |    13217 |     2840 |      |       | |              
|         |       |          | 1/0 |  9907 | AccessExclusiveLock |
t       | f
  relation   |    13217 |  5364206 |      |       | |              
|         |       |          | 1/0 |  9907 | AccessExclusiveLock |
t       | f
  relation   |    13217 |     2619 |      |       | |              
|         |       |          | 1/0 |  9907 | AccessExclusiveLock |
t       | f
  relation   |    13217 |  5364204 |      |       | |              
|         |       |          | 1/0 |  9907 | AccessExclusiveLock |
t       | f
  relation   |    13217 |    16387 |      |       | |              
|         |       |          | 3/4590466 | 32074 | AccessShareLock     |
t       | f
  relation   |    13217 |  5364201 |      |       | |              
|         |       |          | 1/0 |  9907 | AccessExclusiveLock |
t       | f
(12 rows)


stack trace of startup process 9907:
#0  0x00007fdff4a025b3 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:84
#1  0x00005608fde0c6cd in pg_usleep (microsec=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/port/pgsleep.c:56
#2  0x00005608fdc94126 in WaitExceedsMaxStandbyDelay () at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/standby.c:201
#3  ResolveRecoveryConflictWithVirtualXIDs (waitlist=0x5608fe46a450,
reason=reason@entry=PROCSIG_RECOVERY_CONFLICT_SNAPSHOT) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/standby.c:262
#4  0x00005608fdc9434a in ResolveRecoveryConflictWithVirtualXIDs
(reason=PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, waitlist=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/standby.c:315
#5  ResolveRecoveryConflictWithSnapshot (latestRemovedXid=<optimized
out>, node=...) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/standby.c:313
#6  0x00005608fda04ddc in heap_xlog_clean (record=0x5608fe426578) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/access/heap/heapam.c:8238
#7  heap2_redo (record=0x5608fe426578) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/access/heap/heapam.c:9391
#8  0x00005608fda4b974 in StartupXLOG () at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/access/transam/xlog.c:7306
#9  0x00005608fdc3b4d3 in StartupProcessMain () at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/startup.c:211
#10 0x00005608fda5b763 in AuxiliaryProcessMain (argc=argc@entry=2,
argv=argv@entry=0x7ffc9277c7e0) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/bootstrap/bootstrap.c:441
#11 0x00005608fdc381e3 in StartChildProcess (type=StartupProcess) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:5337
#12 0x00005608fdc3af5e in PostmasterMain (argc=5, argv=<optimized out>)
at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:1372
#13 0x00005608fd9c0075 in main (argc=5, argv=0x5608fe425820) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/main/main.c:228

stack trace of backend 32074:
#0  0x00007fdff4a0c9f3 in __epoll_wait_nocancel () at
../sysdeps/unix/syscall-template.S:84
#1  0x00005608fdc8cb51 in WaitEventSetWaitBlock (nevents=1,
occurred_events=0x7ffc9277a6f0, cur_timeout=-1, set=0x5608fe42bed8) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/latch.c:1048
#2  WaitEventSetWait (set=set@entry=0x5608fe42bed8,
timeout=timeout@entry=-1,
occurred_events=occurred_events@entry=0x7ffc9277a6f0,
nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=50331648)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/latch.c:1000
#3  0x00005608fdc8cfc4 in WaitLatchOrSocket (latch=0x7fdfebbf1594,
wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=-1,
timeout@entry=0, wait_event_info=50331648)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/latch.c:385
#4  0x00005608fdc8d075 in WaitLatch (latch=<optimized out>,
wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0,
wait_event_info=<optimized out>)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/latch.c:339
#5  0x00005608fdc9d591 in ProcSleep
(locallock=locallock@entry=0x5608fe4488e0,
lockMethodTable=lockMethodTable@entry=0x5608fe2321e0 <default_lockmethod>)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/lmgr/proc.c:1272
#6  0x00005608fdc97c99 in WaitOnLock
(locallock=locallock@entry=0x5608fe4488e0,
owner=owner@entry=0x5608fe474370) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/lmgr/lock.c:1744
#7  0x00005608fdc98fe1 in LockAcquireExtended
(locktag=locktag@entry=0x7ffc9277aac0, lockmode=lockmode@entry=1,
sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false,
reportMemoryError=reportMemoryError@entry=true,
     locallockp=locallockp@entry=0x7ffc9277aab8) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/lmgr/lock.c:1026
#8  0x00005608fdc96648 in LockRelationOid (relid=2619, lockmode=1) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/lmgr/lmgr.c:113
#9  0x00005608fd9fb4cd in relation_open (relationId=2619,
lockmode=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/access/heap/heapam.c:1132
#10 0x00005608fd9fb6de in heap_open (relationId=<optimized out>,
lockmode=lockmode@entry=1) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/access/heap/heapam.c:1298
#11 0x00005608fddafb81 in SearchCatCacheMiss
(cache=cache@entry=0x5608fe498500, nkeys=nkeys@entry=3,
hashValue=hashValue@entry=2374461275, hashIndex=hashIndex@entry=91,
v1=v1@entry=16387, v2=v2@entry=1, v3=0, v4=0)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/utils/cache/catcache.c:1375
#12 0x00005608fddb093f in SearchCatCacheInternal (v4=0, v3=0,
v2=<optimized out>, v1=<optimized out>, nkeys=3, cache=0x5608fe498500)
at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/utils/cache/catcache.c:1317
#13 SearchCatCache3 (cache=0x5608fe498500, v1=16387, v2=1, v3=0) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/utils/cache/catcache.c:1201
#14 0x00005608fddc2f18 in SearchSysCache3 (cacheId=cacheId@entry=57,
key1=<optimized out>, key2=<optimized out>, key3=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/utils/cache/syscache.c:1141
#15 0x00005608fdd76ca6 in examine_simple_variable (var=<optimized out>,
var=<optimized out>, vardata=0x7ffc9277aeb0, root=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/utils/adt/selfuncs.c:5037
#16 examine_variable (root=root@entry=0x5608fe4f5fd8, node=<optimized
out>, varRelid=varRelid@entry=0, vardata=vardata@entry=0x7ffc9277aeb0)
at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/utils/adt/selfuncs.c:4821
#17 0x00005608fdd772c7 in get_restriction_variable (root=0x5608fe4f5fd8,
args=<optimized out>, varRelid=0, vardata=0x7ffc9277aeb0,
other=0x7ffc9277aea8, varonleft=0x7ffc9277aea6)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/utils/adt/selfuncs.c:4683
#18 0x00005608fdd783f6 in eqsel_internal (fcinfo=<optimized out>,
negate=negate@entry=false) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/utils/adt/selfuncs.c:279
#19 0x00005608fdd786eb in eqsel (fcinfo=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/utils/adt/selfuncs.c:243
#20 0x00005608fddd1cf5 in OidFunctionCall4Coll (functionId=<optimized
out>, collation=collation@entry=0, arg1=arg1@entry=94596626341848,
arg2=arg2@entry=96, arg3=arg3@entry=94596626343080, arg4=arg4@entry=0)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/utils/fmgr/fmgr.c:1501
#21 0x00005608fdc1c07d in restriction_selectivity
(root=root@entry=0x5608fe4f5fd8, operatorid=96, args=0x5608fe4f64a8,
inputcollid=0, varRelid=varRelid@entry=0)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/optimizer/util/plancat.c:1813
#22 0x00005608fdbdba11 in clause_selectivity
(root=root@entry=0x5608fe4f5fd8, clause=<optimized out>,
varRelid=varRelid@entry=0, jointype=jointype@entry=JOIN_INNER,
sjinfo=sjinfo@entry=0x0)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/optimizer/path/clausesel.c:748
#23 0x00005608fdbdbbcb in clauselist_selectivity
(root=root@entry=0x5608fe4f5fd8, clauses=0x5608fe4f6d38,
varRelid=varRelid@entry=0, jointype=jointype@entry=JOIN_INNER,
sjinfo=sjinfo@entry=0x0)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/optimizer/path/clausesel.c:117
#24 0x00005608fdbe19b1 in set_baserel_size_estimates
(root=root@entry=0x5608fe4f5fd8, rel=rel@entry=0x5608fe4f6628) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/optimizer/path/costsize.c:4308
#25 0x00005608fdbda64c in set_plain_rel_size (rte=<optimized out>,
rel=0x5608fe4f6628, root=0x5608fe4f5fd8) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/optimizer/path/allpaths.c:532
#26 set_rel_size (root=root@entry=0x5608fe4f5fd8,
rel=rel@entry=0x5608fe4f6628, rti=rti@entry=1, rte=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/optimizer/path/allpaths.c:369
#27 0x00005608fdbdb37e in set_base_rel_sizes (root=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/optimizer/path/allpaths.c:281
#28 make_one_rel (root=root@entry=0x5608fe4f5fd8,
joinlist=joinlist@entry=0x5608fe4f69d8) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/optimizer/path/allpaths.c:179
#29 0x00005608fdbfa130 in query_planner (root=root@entry=0x5608fe4f5fd8,
tlist=tlist@entry=0x5608fe4f6408,
qp_callback=qp_callback@entry=0x5608fdbfb530 <standard_qp_callback>,
qp_extra=qp_extra@entry=0x7ffc9277b760)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/optimizer/plan/planmain.c:265
#30 0x00005608fdbfe94b in grouping_planner
(root=root@entry=0x5608fe4f5fd8,
inheritance_update=inheritance_update@entry=false,
tuple_fraction=<optimized out>, tuple_fraction@entry=0)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/optimizer/plan/planner.c:1942
#31 0x00005608fdc00d36 in subquery_planner
(glob=glob@entry=0x5608fe42c128, parse=parse@entry=0x5608fe42bb38,
parent_root=parent_root@entry=0x0,
hasRecursion=hasRecursion@entry=false,
tuple_fraction=tuple_fraction@entry=0)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/optimizer/plan/planner.c:966
#32 0x00005608fdc01d15 in standard_planner (parse=0x5608fe42bb38,
cursorOptions=256, boundParams=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/optimizer/plan/planner.c:405
#33 0x00005608fdcaec59 in pg_plan_query
(querytree=querytree@entry=0x5608fe42bb38,
cursorOptions=cursorOptions@entry=256, boundParams=boundParams@entry=0x0)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/tcop/postgres.c:832
#34 0x00005608fdcaed36 in pg_plan_queries (querytrees=<optimized out>,
cursorOptions=cursorOptions@entry=256,
boundParams=boundParams@entry=0x0) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/tcop/postgres.c:898
#35 0x00005608fdcaf267 in exec_simple_query (query_string=0x5608fe42ab10
"select * from tablename where i = 6;") at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/tcop/postgres.c:1073
#36 0x00005608fdcb064b in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x5608fe472900, dbname=0x5608fe472858 "postgres",
username=<optimized out>)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/tcop/postgres.c:4182
#37 0x00005608fd9becd2 in BackendRun (port=0x5608fe46b070) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:4358
#38 BackendStartup (port=0x5608fe46b070) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:4030
#39 ServerLoop () at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:1707
#40 0x00005608fdc3af82 in PostmasterMain (argc=5, argv=<optimized out>)
at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:1380
#41 0x00005608fd9c0075 in main (argc=5, argv=0x5608fe425820) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/main/main.c:228


2) pgbench on replica with 2 concurrent processes (-c 2):
this case also looks a bit different from initial report, here we have
AccessExclusiveLock on test table, not on pg_statistic's toast table:

select * from pg_stat_activity (both backend connections are waiting
until max_standby_streaming_delay timeout happened):
...
-[ RECORD 2 ]----+--------------------------------------
datid            | 13217
datname          | postgres
pid              | 21152
usesysid         | 10
usename          | postgres
application_name | pgbench
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 2019-11-01 13:00:59.910532+06
xact_start       | 2019-11-01 13:01:08.160028+06
query_start      | 2019-11-01 13:01:08.160028+06
state_change     | 2019-11-01 13:01:08.160029+06
wait_event_type  | Lock
wait_event       | relation
state            | active
backend_xid      |
backend_xmin     | 473235
query            | select * from tablename where i = 89;
backend_type     | client backend
-[ RECORD 3 ]----+--------------------------------------
datid            | 13217
datname          | postgres
pid              | 21153
usesysid         | 10
usename          | postgres
application_name | pgbench
client_addr      |
client_hostname  |
client_port      | -1
backend_start    | 2019-11-01 13:00:59.913201+06
xact_start       | 2019-11-01 13:01:08.159873+06
query_start      | 2019-11-01 13:01:08.159873+06
state_change     | 2019-11-01 13:01:08.159873+06
wait_event_type  | Lock
wait_event       | relation
state            | active
backend_xid      |
backend_xmin     | 473235
query            | select * from tablename where i = 72;
backend_type     | client backend
-[ RECORD 4 ]----+--------------------------------------
datid            |
datname          |
pid              | 9907
usesysid         |
usename          |
application_name |
client_addr      |
client_hostname  |
client_port      |
backend_start    | 2019-10-31 19:11:55.514707+06
xact_start       |
query_start      |
state_change     |
wait_event_type  |
wait_event       |
state            |
backend_xid      |
backend_xmin     |
query            |
backend_type     | startup
...

select * from pg_locks:
   locktype  | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction |  pid  
|        mode         | granted | fastpath
------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+---------------------+---------+----------
  relation   |    13217 |    11645 |      | |            |              
|         |       |          | 2/30436            | 17266 |
AccessShareLock     | t       | t
  virtualxid |          |          |      |       | 2/30436    
|               |         |       |          | 2/30436            |
17266 | ExclusiveLock       | t       | t
  virtualxid |          |          |      |       | 4/2366585  
|               |         |       |          | 4/2366585          |
21153 | ExclusiveLock       | t       | t
  virtualxid |          |          |      |       | 3/2624520  
|               |         |       |          | 3/2624520          |
21152 | ExclusiveLock       | t       | t
  virtualxid |          |          |      |       | 1/1        
|               |         |       |          | 1/0                |  
9907 | ExclusiveLock       | t       | t
  relation   |    13217 |    16387 |      | |            |              
|         |       |          | 1/0                |  9907 |
AccessExclusiveLock | t       | f
  relation   |    13217 |    16387 |      | |            |              
|         |       |          | 3/2624520          | 21152 |
AccessShareLock     | f       | f
  relation   |    13217 |    16387 |      | |            |              
|         |       |          | 4/2366585          | 21153 |
AccessShareLock     | f       | f
(8 rows)

stack trace of startup process 9907:

#0  0x00007fdff4a025b3 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:84
#1  0x00005608fde0c6cd in pg_usleep (microsec=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/port/pgsleep.c:56
#2  0x00005608fdc94126 in WaitExceedsMaxStandbyDelay () at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/standby.c:201
#3  ResolveRecoveryConflictWithVirtualXIDs (waitlist=0x5608fe46a450,
reason=reason@entry=PROCSIG_RECOVERY_CONFLICT_SNAPSHOT) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/standby.c:262
#4  0x00005608fdc9434a in ResolveRecoveryConflictWithVirtualXIDs
(reason=PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, waitlist=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/standby.c:315
#5  ResolveRecoveryConflictWithSnapshot
(latestRemovedXid=latestRemovedXid@entry=473242, node=...) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/standby.c:313
#6  0x00005608fda1d8b2 in btree_xlog_delete (record=0x5608fe426578) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/access/nbtree/nbtxlog.c:703
#7  btree_redo (record=0x5608fe426578) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/access/nbtree/nbtxlog.c:1023
#8  0x00005608fda4b974 in StartupXLOG () at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/access/transam/xlog.c:7306
#9  0x00005608fdc3b4d3 in StartupProcessMain () at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/startup.c:211
#10 0x00005608fda5b763 in AuxiliaryProcessMain (argc=argc@entry=2,
argv=argv@entry=0x7ffc9277c7e0) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/bootstrap/bootstrap.c:441
#11 0x00005608fdc381e3 in StartChildProcess (type=StartupProcess) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:5337
#12 0x00005608fdc3af5e in PostmasterMain (argc=5, argv=<optimized out>)
at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:1372
#13 0x00005608fd9c0075 in main (argc=5, argv=0x5608fe425820) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/main/main.c:228

backend 21153:
#0  0x00007fdff4a0c9f3 in __epoll_wait_nocancel () at
../sysdeps/unix/syscall-template.S:84
#1  0x00005608fdc8cb51 in WaitEventSetWaitBlock (nevents=1,
occurred_events=0x7ffc9277b1a0, cur_timeout=-1, set=0x5608fe42c8b8) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/latch.c:1048
#2  WaitEventSetWait (set=set@entry=0x5608fe42c8b8,
timeout=timeout@entry=-1,
occurred_events=occurred_events@entry=0x7ffc9277b1a0,
nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=50331648)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/latch.c:1000
#3  0x00005608fdc8cfc4 in WaitLatchOrSocket (latch=0x7fdfebbf0ba4,
wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=-1,
timeout@entry=0, wait_event_info=50331648)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/latch.c:385
#4  0x00005608fdc8d075 in WaitLatch (latch=<optimized out>,
wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0,
wait_event_info=<optimized out>)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/latch.c:339
#5  0x00005608fdc9d591 in ProcSleep
(locallock=locallock@entry=0x5608fe448940,
lockMethodTable=lockMethodTable@entry=0x5608fe2321e0 <default_lockmethod>)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/lmgr/proc.c:1272
#6  0x00005608fdc97c99 in WaitOnLock
(locallock=locallock@entry=0x5608fe448940,
owner=owner@entry=0x5608fe474370) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/lmgr/lock.c:1744
#7  0x00005608fdc98fe1 in LockAcquireExtended
(locktag=locktag@entry=0x7ffc9277b570, lockmode=lockmode@entry=1,
sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false,
reportMemoryError=reportMemoryError@entry=true,
     locallockp=locallockp@entry=0x7ffc9277b568) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/lmgr/lock.c:1026
#8  0x00005608fdc96648 in LockRelationOid (relid=16387, lockmode=1) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/lmgr/lmgr.c:113
#9  0x00005608fda6a4cd in RangeVarGetRelidExtended
(relation=0x5608fe42b650, lockmode=1, flags=1,
callback=callback@entry=0x0, callback_arg=callback_arg@entry=0x0)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/catalog/namespace.c:379
#10 0x00005608fd9fb629 in relation_openrv_extended (relation=<optimized
out>, lockmode=<optimized out>, missing_ok=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/access/heap/heapam.c:1250
#11 0x00005608fd9fb841 in heap_openrv_extended
(relation=relation@entry=0x5608fe42b650, lockmode=lockmode@entry=1,
missing_ok=missing_ok@entry=true)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/access/heap/heapam.c:1358
#12 0x00005608fdac6049 in parserOpenTable (pstate=0x5608fe42ba28,
relation=0x5608fe42b650, lockmode=1) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/parse_relation.c:1152
#13 0x00005608fdac7a6a in addRangeTableEntry
(pstate=pstate@entry=0x5608fe42ba28,
relation=relation@entry=0x5608fe42b650, alias=0x0, inh=<optimized out>,
inFromCl=inFromCl@entry=true)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/parse_relation.c:1217
#14 0x00005608fdab2369 in transformTableEntry (r=0x5608fe42b650,
pstate=0x5608fe42ba28) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/parse_clause.c:435
#15 transformFromClauseItem (pstate=pstate@entry=0x5608fe42ba28,
n=0x5608fe42b650, top_rte=top_rte@entry=0x7ffc9277b858,
top_rti=top_rti@entry=0x7ffc9277b854,
namespace=namespace@entry=0x7ffc9277b860)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/parse_clause.c:1122
#16 0x00005608fdab3835 in transformFromClause
(pstate=pstate@entry=0x5608fe42ba28, frmList=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/parse_clause.c:139
#17 0x00005608fda90804 in transformSelectStmt (stmt=0x5608fe42b898,
pstate=0x5608fe42ba28) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/analyze.c:1240
#18 transformStmt (pstate=pstate@entry=0x5608fe42ba28,
parseTree=0x5608fe42b898) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/analyze.c:301
#19 0x00005608fda92550 in transformOptionalSelectInto
(pstate=pstate@entry=0x5608fe42ba28, parseTree=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/analyze.c:246
#20 0x00005608fda925d5 in transformTopLevelStmt
(pstate=pstate@entry=0x5608fe42ba28,
parseTree=parseTree@entry=0x5608fe42b9a8) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/analyze.c:196
#21 0x00005608fda92648 in parse_analyze (parseTree=0x5608fe42b9a8,
sourceText=sourceText@entry=0x5608fe42ab10 "select * from tablename
where i = 72;", paramTypes=0x0, numParams=0, queryEnv=0x0)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/analyze.c:116
#22 0x00005608fdcaeb07 in pg_analyze_and_rewrite
(parsetree=parsetree@entry=0x5608fe42b9a8,
query_string=query_string@entry=0x5608fe42ab10 "select * from tablename
where i = 72;", paramTypes=paramTypes@entry=0x0,
     numParams=numParams@entry=0, queryEnv=queryEnv@entry=0x0) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/tcop/postgres.c:689
#23 0x00005608fdcaf258 in exec_simple_query (query_string=0x5608fe42ab10
"select * from tablename where i = 72;") at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/tcop/postgres.c:1070
#24 0x00005608fdcb064b in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x5608fe472900, dbname=0x5608fe472858 "postgres",
username=<optimized out>)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/tcop/postgres.c:4182
#25 0x00005608fd9becd2 in BackendRun (port=0x5608fe46b070) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:4358
#26 BackendStartup (port=0x5608fe46b070) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:4030
#27 ServerLoop () at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:1707
#28 0x00005608fdc3af82 in PostmasterMain (argc=5, argv=<optimized out>)
at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:1380
#29 0x00005608fd9c0075 in main (argc=5, argv=0x5608fe425820) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/main/main.c:228


backend 21152 (looks the same):
#0  0x00007fdff4a0c9f3 in __epoll_wait_nocancel () at
../sysdeps/unix/syscall-template.S:84
#1  0x00005608fdc8cb51 in WaitEventSetWaitBlock (nevents=1,
occurred_events=0x7ffc9277b1a0, cur_timeout=-1, set=0x5608fe42c8b8) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/latch.c:1048
#2  WaitEventSetWait (set=set@entry=0x5608fe42c8b8,
timeout=timeout@entry=-1,
occurred_events=occurred_events@entry=0x7ffc9277b1a0,
nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=50331648)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/latch.c:1000
#3  0x00005608fdc8cfc4 in WaitLatchOrSocket (latch=0x7fdfebbf1244,
wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=-1,
timeout@entry=0, wait_event_info=50331648)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/latch.c:385
#4  0x00005608fdc8d075 in WaitLatch (latch=<optimized out>,
wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0,
wait_event_info=<optimized out>)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/latch.c:339
#5  0x00005608fdc9d591 in ProcSleep
(locallock=locallock@entry=0x5608fe448940,
lockMethodTable=lockMethodTable@entry=0x5608fe2321e0 <default_lockmethod>)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/lmgr/proc.c:1272
#6  0x00005608fdc97c99 in WaitOnLock
(locallock=locallock@entry=0x5608fe448940,
owner=owner@entry=0x5608fe474370) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/lmgr/lock.c:1744
#7  0x00005608fdc98fe1 in LockAcquireExtended
(locktag=locktag@entry=0x7ffc9277b570, lockmode=lockmode@entry=1,
sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false,
reportMemoryError=reportMemoryError@entry=true,
     locallockp=locallockp@entry=0x7ffc9277b568) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/lmgr/lock.c:1026
#8  0x00005608fdc96648 in LockRelationOid (relid=16387, lockmode=1) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/lmgr/lmgr.c:113
#9  0x00005608fda6a4cd in RangeVarGetRelidExtended
(relation=0x5608fe42b650, lockmode=1, flags=1,
callback=callback@entry=0x0, callback_arg=callback_arg@entry=0x0)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/catalog/namespace.c:379
#10 0x00005608fd9fb629 in relation_openrv_extended (relation=<optimized
out>, lockmode=<optimized out>, missing_ok=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/access/heap/heapam.c:1250
#11 0x00005608fd9fb841 in heap_openrv_extended
(relation=relation@entry=0x5608fe42b650, lockmode=lockmode@entry=1,
missing_ok=missing_ok@entry=true)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/access/heap/heapam.c:1358
#12 0x00005608fdac6049 in parserOpenTable (pstate=0x5608fe42ba28,
relation=0x5608fe42b650, lockmode=1) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/parse_relation.c:1152
#13 0x00005608fdac7a6a in addRangeTableEntry
(pstate=pstate@entry=0x5608fe42ba28,
relation=relation@entry=0x5608fe42b650, alias=0x0, inh=<optimized out>,
inFromCl=inFromCl@entry=true)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/parse_relation.c:1217
#14 0x00005608fdab2369 in transformTableEntry (r=0x5608fe42b650,
pstate=0x5608fe42ba28) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/parse_clause.c:435
#15 transformFromClauseItem (pstate=pstate@entry=0x5608fe42ba28,
n=0x5608fe42b650, top_rte=top_rte@entry=0x7ffc9277b858,
top_rti=top_rti@entry=0x7ffc9277b854,
namespace=namespace@entry=0x7ffc9277b860)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/parse_clause.c:1122
#16 0x00005608fdab3835 in transformFromClause
(pstate=pstate@entry=0x5608fe42ba28, frmList=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/parse_clause.c:139
#17 0x00005608fda90804 in transformSelectStmt (stmt=0x5608fe42b898,
pstate=0x5608fe42ba28) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/analyze.c:1240
#18 transformStmt (pstate=pstate@entry=0x5608fe42ba28,
parseTree=0x5608fe42b898) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/analyze.c:301
#19 0x00005608fda92550 in transformOptionalSelectInto
(pstate=pstate@entry=0x5608fe42ba28, parseTree=<optimized out>) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/analyze.c:246
#20 0x00005608fda925d5 in transformTopLevelStmt
(pstate=pstate@entry=0x5608fe42ba28,
parseTree=parseTree@entry=0x5608fe42b9a8) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/analyze.c:196
#21 0x00005608fda92648 in parse_analyze (parseTree=0x5608fe42b9a8,
sourceText=sourceText@entry=0x5608fe42ab10 "select * from tablename
where i = 89;", paramTypes=0x0, numParams=0, queryEnv=0x0)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/parser/analyze.c:116
#22 0x00005608fdcaeb07 in pg_analyze_and_rewrite
(parsetree=parsetree@entry=0x5608fe42b9a8,
query_string=query_string@entry=0x5608fe42ab10 "select * from tablename
where i = 89;", paramTypes=paramTypes@entry=0x0,
     numParams=numParams@entry=0, queryEnv=queryEnv@entry=0x0) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/tcop/postgres.c:689
#23 0x00005608fdcaf258 in exec_simple_query (query_string=0x5608fe42ab10
"select * from tablename where i = 89;") at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/tcop/postgres.c:1070
#24 0x00005608fdcb064b in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x5608fe472900, dbname=0x5608fe472858 "postgres",
username=<optimized out>)
     at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/tcop/postgres.c:4182
#25 0x00005608fd9becd2 in BackendRun (port=0x5608fe46b070) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:4358
#26 BackendStartup (port=0x5608fe46b070) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:4030
#27 ServerLoop () at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:1707
#28 0x00005608fdc3af82 in PostmasterMain (argc=5, argv=<optimized out>)
at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/postmaster/postmaster.c:1380
#29 0x00005608fd9c0075 in main (argc=5, argv=0x5608fe425820) at
/build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/main/main.c:228

Hope that will help, I'll try to reproduce issue without vacuum full, so
far no luck.

--
Thanks,
Alexey Ermakov


Reply | Threaded
Open this post in threaded view
|

Re: BUG #16016: deadlock with startup process, AccessExclusiveLock on pg_statistic's toast table

Dmitry Dolgov
> On Fri, Nov 01, 2019 at 03:15:33PM +0600, Alexey Ermakov wrote:
>
> I reproduced Sergei's test case on postgresql 11.5, replica hung up almost
> immediately after pgbench ran.
>
> 9907 - pid of startup process, 16387 - oid of test table, 2619 - oid of
> pg_statistic, 2840 - oid of toast table of pg_statistic.
>
> 1) pgbench on replica with 1 concurrent processes (-c 1):
>
> this case looks a bit different from what happened on initial report (which
> recently happened again btw) because at this time I can't even open new
> connection via psql or run query with pg_stat_activity - in hangs (pg_locks
> query works).
> perhaps because this time access exclusive lock is on pg_statistic table
> too, not only on it's toast table.

Interesting. I've tried the test case from previous email on the master
branch, and looks like I've got something similar with similar
stacktraces. After a short investigation, it looks pretty strange, a
backend 12682 is waiting on a lock, taken by 12584 (startup process):

    [12682] LOG:  process 12682 still waiting for AccessShareLock on relation 2619 of database 16384 after 1000.038 ms
    [12682] DETAIL:  Process holding the lock: 12584. Wait queue: 12689, 12674, 12671, 12682, 12683, 12677, 12680, 12676, 12686, 12670, 12678, 12688, 12679, 12681, 12684, 12685, 12687.
    [12682] STATEMENT:  select * from tablename where i = 95;

And if I understand correctly, startup process is waiting inside
ResolveRecoveryConflictWithVirtualXIDs with a waitlist, containing
backendId = 14:

    >>> p *waitlist
    $3 = {
      backendId = 14,
      localTransactionId = 218
    }

    >>> p allProcs[pgprocnos]
      ...
      lxid = 218,
      pid = 12682,
      pgprocno = 87,
      backendId = 14,
      databaseId = 16384,
      ...

So the same backend 12682, although I'm not sure yet why.


Reply | Threaded
Open this post in threaded view
|

Re: BUG #16016: deadlock with startup process, AccessExclusiveLock on pg_statistic's toast table

Tom Lane-2
In reply to this post by alexey.ermakov-2
Alexey Ermakov <[hidden email]> writes:
> On 9/23/19 01:06, Tom Lane wrote:
>> If you have a test case, could you collect stack traces from each
>> of the stuck processes?  That would eliminate a lot of hypothesizing.

> I reproduced Sergei's test case on postgresql 11.5, replica hung up
> almost immediately after pgbench ran.

> stack trace of startup process 9907:
> #0  0x00007fdff4a025b3 in __select_nocancel () at
> ../sysdeps/unix/syscall-template.S:84
> #1  0x00005608fde0c6cd in pg_usleep (microsec=<optimized out>) at
> /build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/port/pgsleep.c:56
> #2  0x00005608fdc94126 in WaitExceedsMaxStandbyDelay () at
> /build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/standby.c:201
> #3  ResolveRecoveryConflictWithVirtualXIDs (waitlist=0x5608fe46a450,
> reason=reason@entry=PROCSIG_RECOVERY_CONFLICT_SNAPSHOT) at
> /build/postgresql-11-d6c2wG/postgresql-11-11.5/build/../src/backend/storage/ipc/standby.c:262

This does not look like a deadlock: the startup process is just biding
its time until the standby delay timeout elapses, after which it's
gonna kill the conflicting queries.

It is, perhaps, arguable that it's a damn bad idea to allow
max_standby_streaming_delay or max_standby_archive_delay to be
set to "forever".

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: BUG #16016: deadlock with startup process, AccessExclusiveLock on pg_statistic's toast table

alexey.ermakov-2
On 11/2/19 21:10, Tom Lane wrote:
> This does not look like a deadlock: the startup process is just biding
> its time until the standby delay timeout elapses, after which it's
> gonna kill the conflicting queries.
>
> It is, perhaps, arguable that it's a damn bad idea to allow
> max_standby_streaming_delay or max_standby_archive_delay to be
> set to "forever".
>
>

I agree, allowing max_standby_streaming_delay be set to -1 is perhaps
not a good idea. there is although case when such setting could be used
now, see "BUG #14044: Queries immediately conflict with recovery when
recovery_min_apply_delay is used". [1]

[1] https://postgr.es/m/20160324204550.2920.87946%40wrigleys.postgresql.org

--
Alexey Ermakov