BUG #16125: Crash of PostgreSQL's wal sender during logical replication


BUG #16125: Crash of PostgreSQL's wal sender during logical replication

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      16125
Logged by:          Andrey Salnikov
Email address:      [hidden email]
PostgreSQL version: 10.11
Operating system:   Ubuntu 18.04.2 LTS
Description:        

Hi,
PostgreSQL's wal sender has started to crash continuously during logical
replication. On each start after a crash, the sender tries to decode the same
part of the WAL and crashes again and again.

The log contains the following lines:
2019-11-16 03:01:39.512 UTC 22235 @ from  [vxid: txid:0] [] LOG:  server
process (PID 26972) was terminated by signal 11: Segmentation fault
--
2019-11-16 03:03:16.240 UTC 8286 @ from  [vxid: txid:0] [] LOG:  checkpoint
complete: wrote 675790 buffers (32.2%); 0 WAL file(s) added, 198 removed, 0
recycled; write=68.768 s, sync=0.011 s, total=69.214 s; sync files=361,
longest=0.002 s, average=0.000 s; distance=3164435 kB, estimate=3164435 kB
2019-11-16 03:03:16.251 UTC 22235 @ from  [vxid: txid:0] [] LOG:  database
system is ready to accept connections
2019-11-16 03:03:16.533 UTC 10070 rep_usr@db from 1.1.1.1 [vxid:5/0 txid:0]
[idle] LOG:  received replication command: IDENTIFY_SYSTEM
2019-11-16 03:03:16.533 UTC 10070 rep_usr@db from 1.1.1.1 [vxid:5/0 txid:0]
[idle] LOG:  received replication command: START_REPLICATION SLOT "rep_slot"
LOGICAL 25EE/D6DD72D0 (proto_version '1', publication_names '"rep_pub"')
2019-11-16 03:03:16.534 UTC 10070 rep_usr@db from 1.1.1.1 [vxid:5/0 txid:0]
[idle] LOG:  starting logical decoding for slot "rep_slot"
2019-11-16 03:03:16.534 UTC 10070 rep_usr@db from 1.1.1.1 [vxid:5/0 txid:0]
[idle] DETAIL:  streaming transactions committing after 25EE/D614FC78,
reading WAL from 25EE/D614B580
2019-11-16 03:03:16.534 UTC 10070 rep_usr@db from 1.1.1.1 [vxid:5/0 txid:0]
[idle] LOG:  logical decoding found consistent point at 25EE/D614B580
2019-11-16 03:03:16.534 UTC 10070 rep_usr@db from 1.1.1.1 [vxid:5/0 txid:0]
[idle] DETAIL:  There are no running transactions.
2019-11-16 03:03:16.732 UTC 22235 @ from  [vxid: txid:0] [] LOG:  server
process (PID 10070) was terminated by signal 11: Segmentation fault
--
2019-11-16 03:03:18.410 UTC 10072 @ from  [vxid: txid:0] [] LOG:  database
system was not properly shut down; automatic recovery in progress
2019-11-16 03:03:18.413 UTC 10072 @ from  [vxid: txid:0] [] LOG:  invalid
record length at 25EE/D6E155C8: wanted 24, got 0
2019-11-16 03:03:18.413 UTC 10072 @ from  [vxid: txid:0] [] LOG:  redo is
not required
2019-11-16 03:03:18.419 UTC 10072 @ from  [vxid: txid:0] [] LOG:  checkpoint
starting: end-of-recovery immediate
2019-11-16 03:03:18.467 UTC 10072 @ from  [vxid: txid:0] [] LOG:  checkpoint
complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0
recycled; write=0.028 s, sync=0.000 s, total=0.050 s; sync files=0,
longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB
2019-11-16 03:03:18.476 UTC 22235 @ from  [vxid: txid:0] [] LOG:  database
system is ready to accept connections
2019-11-16 03:03:18.754 UTC 10112 rep_usr@db from 1.1.1.1 [vxid:4/0 txid:0]
[idle] LOG:  received replication command: IDENTIFY_SYSTEM
2019-11-16 03:03:18.754 UTC 10112 rep_usr@db from 1.1.1.1 [vxid:4/0 txid:0]
[idle] LOG:  received replication command: START_REPLICATION SLOT "rep_slot"
LOGICAL 25EE/D6DD72D0 (proto_version '1', publication_names '"rep_pub"')
2019-11-16 03:03:18.754 UTC 10112 rep_usr@db from 1.1.1.1 [vxid:4/0 txid:0]
[idle] LOG:  starting logical decoding for slot "rep_slot"
2019-11-16 03:03:18.754 UTC 10112 rep_usr@db from 1.1.1.1 [vxid:4/0 txid:0]
[idle] DETAIL:  streaming transactions committing after 25EE/D614FC78,
reading WAL from 25EE/D614B580
2019-11-16 03:03:18.754 UTC 10112 rep_usr@db from 1.1.1.1 [vxid:4/0 txid:0]
[idle] LOG:  logical decoding found consistent point at 25EE/D614B580
2019-11-16 03:03:18.754 UTC 10112 rep_usr@db from 1.1.1.1 [vxid:4/0 txid:0]
[idle] DETAIL:  There are no running transactions.
2019-11-16 03:03:18.949 UTC 22235 @ from  [vxid: txid:0] [] LOG:  server
process (PID 10112) was terminated by signal 11: Segmentation fault

The core dump contains:
Reading symbols from /usr/lib/postgresql/10/bin/postgres...Reading symbols
from
/usr/lib/debug/.build-id/92/b3b790a5135245f2cbb2a0a2f48fc7ff9f689a.debug...done.
done.
[New LWP 13678]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: 10/main: wal sender process replication
1.1.1.1(43224) idle       '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  heap_deform_tuple (tuple=tuple@entry=0x8,
tupleDesc=tupleDesc@entry=0x555713b2e638,
values=values@entry=0x555713b7ca48, isnull=isnull@entry=0x555713b7cf98 "")
at ./build/../src/backend/access/common/heaptuple.c:936
936 ./build/../src/backend/access/common/heaptuple.c: No such file or
directory.
(gdb) bt
#0  heap_deform_tuple (tuple=tuple@entry=0x8,
tupleDesc=tupleDesc@entry=0x555713b2e638,
values=values@entry=0x555713b7ca48, isnull=isnull@entry=0x555713b7cf98 "")
at ./build/../src/backend/access/common/heaptuple.c:936
#1  0x00005557127c918b in ReorderBufferToastReplace (rb=0x555713b7b788,
txn=0x555713b982e8, relation=0x7f4b3fafa248, relation=0x7f4b3fafa248,
change=0x555713b9a108)
    at ./build/../src/backend/replication/logical/reorderbuffer.c:2985
#2  ReorderBufferCommit (rb=0x555713b7b788, xid=xid@entry=1667601527,
commit_lsn=commit_lsn@entry=41707737345720,
end_lsn=end_lsn@entry=41707737345792,
commit_time=commit_time@entry=627188499326201,
    origin_id=origin_id@entry=0, origin_lsn=0) at
./build/../src/backend/replication/logical/reorderbuffer.c:1585
#3  0x00005557127bef32 in DecodeCommit (xid=1667601527,
parsed=0x7ffd288fe4a0, buf=<synthetic pointer>, ctx=0x555713a9ca08) at
./build/../src/backend/replication/logical/decode.c:614
#4  DecodeXactOp (buf=<synthetic pointer>, ctx=0x555713a9ca08) at
./build/../src/backend/replication/logical/decode.c:244
#5  LogicalDecodingProcessRecord (ctx=0x555713a9ca08, record=<optimized
out>) at ./build/../src/backend/replication/logical/decode.c:113
#6  0x00005557127d0b86 in XLogSendLogical () at
./build/../src/backend/replication/walsender.c:2797
#7  0x00005557127d232f in WalSndLoop
(send_data=send_data@entry=0x5557127d0b10 <XLogSendLogical>) at
./build/../src/backend/replication/walsender.c:2157
#8  0x00005557127d30aa in StartLogicalReplication (cmd=0x555713b499d8) at
./build/../src/backend/replication/walsender.c:1105
#9  exec_replication_command (cmd_string=cmd_string@entry=0x555713b4da38
"START_REPLICATION SLOT \"pgprod_office_basic\" LOGICAL 25EE/D6DD72D0
(proto_version '1', publication_names '\"pgprod_master_basic\"')")
    at ./build/../src/backend/replication/walsender.c:1536
#10 0x000055571281de3a in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x555713add8d8, dbname=<optimized out>, username=<optimized
out>) at ./build/../src/backend/tcop/postgres.c:4113
#11 0x000055571255c72c in BackendRun (port=0x555713ad56d0) at
./build/../src/backend/postmaster/postmaster.c:4408
#12 BackendStartup (port=0x555713ad56d0) at
./build/../src/backend/postmaster/postmaster.c:4080
#13 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1756
#14 0x00005557127ada21 in PostmasterMain (argc=9, argv=0x555713a7d190) at
./build/../src/backend/postmaster/postmaster.c:1364
#15 0x000055571255e422 in main (argc=9, argv=0x555713a7d190) at
./build/../src/backend/main/main.c:228
(gdb) quit

This behaviour does not depend on the data defined in the tables, because we
see it in different databases with different sets of tables in the
publications.
It looks like a real issue in logical replication.
I will be happy to provide additional information about this issue, but I need
to know what else to collect to help solve the problem.

Regards,
Andrey.


Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Tomas Vondra-4
On Mon, Nov 18, 2019 at 06:41:43PM +0000, PG Bug reporting form wrote:

>The following bug has been logged on the website:
>
>Bug reference:      16125
>Logged by:          Andrey Salnikov
>Email address:      [hidden email]
>PostgreSQL version: 10.11
>Operating system:   Ubuntu 18.04.2 LTS
>Description:
>
>Hi,
>PostgreSQL's wal sender begins continuously crash during logical
>replication. Each start after crash sender tries to decode the same part of
>wal and crash again and again.
>

Thanks for the report. Unfortunately this does not tell us much, except
that it crashed :-(

>Log contains next lines:
>2019-11-16 03:01:39.512 UTC 22235 @ from  [vxid: txid:0] [] LOG:  server
>process (PID 26972) was terminated by signal 11: Segmentation fault
>--
> ...
>
>[idle] DETAIL:  There are no running transactions.
>2019-11-16 03:03:18.949 UTC 22235 @ from  [vxid: txid:0] [] LOG:  server
>process (PID 10112) was terminated by signal 11: Segmentation fault
>

This is more interesting:

>Core dump contains:
>Reading symbols from /usr/lib/postgresql/10/bin/postgres...Reading symbols
>from
>/usr/lib/debug/.build-id/92/b3b790a5135245f2cbb2a0a2f48fc7ff9f689a.debug...done.
>done.
>[New LWP 13678]
>[Thread debugging using libthread_db enabled]
>Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>Core was generated by `postgres: 10/main: wal sender process replication
>1.1.1.1(43224) idle       '.
>Program terminated with signal SIGSEGV, Segmentation fault.
>#0  heap_deform_tuple (tuple=tuple@entry=0x8,
>tupleDesc=tupleDesc@entry=0x555713b2e638,
>values=values@entry=0x555713b7ca48, isnull=isnull@entry=0x555713b7cf98 "")
>at ./build/../src/backend/access/common/heaptuple.c:936
>936 ./build/../src/backend/access/common/heaptuple.c: No such file or
>directory.
>(gdb) bt
>#0  heap_deform_tuple (tuple=tuple@entry=0x8,
>tupleDesc=tupleDesc@entry=0x555713b2e638,
>values=values@entry=0x555713b7ca48, isnull=isnull@entry=0x555713b7cf98 "")
>at ./build/../src/backend/access/common/heaptuple.c:936
>#1  0x00005557127c918b in ReorderBufferToastReplace (rb=0x555713b7b788,
>txn=0x555713b982e8, relation=0x7f4b3fafa248, relation=0x7f4b3fafa248,
>change=0x555713b9a108)
>    at ./build/../src/backend/replication/logical/reorderbuffer.c:2985
>#2  ReorderBufferCommit (rb=0x555713b7b788, xid=xid@entry=1667601527,
>commit_lsn=commit_lsn@entry=41707737345720,
>end_lsn=end_lsn@entry=41707737345792,
>commit_time=commit_time@entry=627188499326201,
>    origin_id=origin_id@entry=0, origin_lsn=0) at
>./build/../src/backend/replication/logical/reorderbuffer.c:1585

OK, so the issue seems to be related to handling TOASTed values, i.e. we
do this:

     /* user-triggered change */
     if (!IsToastRelation(relation))
     {
         ReorderBufferToastReplace(rb, txn, relation, change);
         rb->apply_change(rb, txn, relation, change);

         /*
          * Only clear reassembled toast chunks if we're sure
          * they're not required anymore. The creator of the
          * tuple tells us.
          */
         if (change->data.tp.clear_toast_afterwards)
             ReorderBufferToastReset(rb, txn);
     }

and the ReorderBufferToastReplace does this:

     newtup = change->data.tp.newtuple;

     heap_deform_tuple(&newtup->tuple, desc, attrs, isnull);

but that fails, because the tuple pointer happens to be 0x8, which is
clearly bogus. Not sure where that comes from; I don't recognize it as a
typical pattern.

Can you create a core dump (see [1]), and print 'change' and 'txn' in
frame #2? I wonder if some of the other fields are bogus too (but that can't
be entirely true ...), and whether the transaction got serialized.

You may need to install debug symbols first, BTW.
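For reference, once the symbols are in place, a session to pull that out of
the core might look like this sketch (the binary and core paths are
placeholders):

```
$ gdb /usr/lib/postgresql/10/bin/postgres /path/to/core
(gdb) bt
(gdb) frame 2
(gdb) print change
(gdb) print *change
(gdb) print txn
(gdb) print *txn
```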

>
>This behaviour does not depends on defined data in tables, because we see it
>in different database with different sets of tables in publications.

I'm not sure I really believe that. Surely there has to be something
special about your schema, or possibly an access pattern, that triggers this
bug in your environment and not elsewhere.

Do you see it on different PostgreSQL instances? Can you share the
schema definition?

>Looks like a real issue in logical replication.
>I will happy to provide an additional information about that issue, but i
>should know what else to need to collect for helping to solve this
>problem.
>

Well, if you can create a reproducer, that'd be the best option, because
then we can investigate locally instead of the ping-pong here.

But if that's not possible, let's start with the schema and the
additional information from the core file.

I'd also like to see the contents of the WAL, particularly for the XID
triggering this issue. Please run pg_waldump and see how much data is
there for XID 1667601527. It commits at 25EE/D6DE6EB8; not sure where it
starts. It may have subtransactions, so don't just grep.


regards

[1] https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Andres Freund
Hi,

On 2019-11-18 21:58:16 +0100, Tomas Vondra wrote:
> and the ReorderBufferToastReplace does this:
>
>     newtup = change->data.tp.newtuple;
>
>     heap_deform_tuple(&newtup->tuple, desc, attrs, isnull);
>
> but that fails, because the tuple pointer happens to be 0x8, which is
> clearly bogus. Not sure where that comes from, I don't recognize that as
> a typical patter.

It indicates that change->data.tp.newtuple is NULL,
afaict. &newtup->tuple boils down to
((char *) newtup) + offsetof(ReorderBufferTupleBuf, tuple)
and offsetof(ReorderBufferTupleBuf, tuple) is 0x8.


> Can you create a core dump (see [1]), and print 'change' and 'txn' in
> frame #2? I wonder if some the other fields are bogus too (but it can't
> be entirely true ...), and if the transaction got serialized.

Please print both change and *change.

I suspect what's happening is that a change that shouldn't have toast
changes - e.g. a DELETE - somehow has them. That then triggers a failure in
ReorderBufferToastReplace(), which expects newtuple to be valid.

It's probably worthwhile to add an elog(ERROR) check for this, even if
it does not turn out to be the cause.



> > This behaviour does not depends on defined data in tables, because we see it
> > in different database with different sets of tables in publications.
>
> I'm not sure I really believe that. Surely there has to be something
> special about your schema, or possibly access patter that triggers this
> bug in your environment and not elsewhere.

Yea.  Are there any C triggers present? Any unusual extensions? Users of
the transaction hook, for example?


> > Looks like a real issue in logical replication.
> > I will happy to provide an additional information about that issue, but i
> > should know what else to need to collect for helping to solve this
> > problem.
> >
>
> Well, if you can create a reproducer, that'd be the best option, because
> then we can investigate locally instead of the ping-ping here.
>
> But if that's not possible, let's start with the schema and the
> additional information from the core file.
>
> I'd also like to see the contents of the WAL, particularly for the XID
> triggering this issue. Please run pg_waldump and see how much data is
> there for XID 1667601527. It does commit at 25EE/D6DE6EB8, not sure
> where it starts. It may have subtransactions, so don't do just grep.

Yea, that'd be helpful.

Greetings,

Andres Freund



Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Andrey Salnikov
In reply to this post by Tomas Vondra-4
Hi,

> Can you create a core dump (see [1]), and print 'change' and 'txn' in
> frame #2? I wonder if some the other fields are bogus too (but it can't
> be entirely true ...), and if the transaction got serialized.
>
> You may need to install debug symbols first, BTW.

I hope I've understood how to correctly extract that data:

(gdb) frame 2
#2  ReorderBufferCommit (rb=0x555713b7b788, xid=xid@entry=1667601527, commit_lsn=commit_lsn@entry=41707737345720, end_lsn=end_lsn@entry=41707737345792, commit_time=commit_time@entry=627188499326201,
    origin_id=origin_id@entry=0, origin_lsn=0) at ./build/../src/backend/replication/logical/reorderbuffer.c:1585
1585 ./build/../src/backend/replication/logical/reorderbuffer.c: No such file or directory.
(gdb) print change
$1 = (ReorderBufferChange *) 0x555713b9a108
(gdb) print *change
$2 = {lsn = 41707737330608, action = REORDER_BUFFER_CHANGE_DELETE, origin_id = 0, data = {tp = {relnode = {spcNode = 1663, dbNode = 13018, relNode = 88964815}, clear_toast_afterwards = 1 '\001', oldtuple = 0x555713cd0d88,
      newtuple = 0x0}, msg = {prefix = 0x32da0000067f <error: Cannot access memory at address 0x32da0000067f>, message_size = 4383932111, message = 0x555713cd0d88 "\330-\315\023WU"}, snapshot = 0x32da0000067f,
    command_id = 1663, tuplecid = {node = {spcNode = 1663, dbNode = 13018, relNode = 88964815}, tid = {ip_blkid = {bi_hi = 1, bi_lo = 0}, ip_posid = 3464}, cmin = 21847, cmax = 0, combocid = 0}}, node = {
    prev = 0x555713b98178, next = 0x555713b98178}}
(gdb) print txn
$3 = (ReorderBufferTXN *) 0x555713b982e8
(gdb) print *txn
$6 = {xid = 1667601527, has_catalog_changes = 0 '\000', is_known_as_subxact = 0 '\000', toplevel_xid = 0, first_lsn = 41707730043960, final_lsn = 41707737345720, end_lsn = 41707737345792, restart_decoding_lsn = 0,
  origin_id = 0, origin_lsn = 0, commit_time = 627188499326201, base_snapshot = 0x555713b90408, base_snapshot_lsn = 41707730047464, base_snapshot_node = {prev = 0x555713b7b7a0, next = 0x555713b7b7a0}, nentries = 0,
  nentries_mem = 0, serialized = 0 '\000', changes = {head = {prev = 0x555713b98368, next = 0x555713b98368}}, tuplecids = {head = {prev = 0x555713b98378, next = 0x555713b98378}}, ntuplecids = 0, tuplecid_hash = 0x0,
  toast_hash = 0x555714ad0fb8, subtxns = {head = {prev = 0x555713b985b0, next = 0x555713b984b8}}, nsubtxns = 4, ninvalidations = 0, invalidations = 0x0, node = {prev = 0x555713b7b790, next = 0x555713b7b790}}


> Do you see it on different PostgreSQL instances? Can you share the
> schema definition?

Unfortunately I can't share it completely, but here is some of it:
                   List of installed extensions
   Name    | Version |   Schema   |          Description          
-----------+---------+------------+--------------------------------
 pgcrypto  | 1.3     | public     | cryptographic functions
 pglogical | 2.2.1   | pglogical  | PostgreSQL Logical Replication
 plpgsql   | 1.0     | pg_catalog | PL/pgSQL procedural language

Each table in the publication has the following structure:
one jsonb field plus a few smallint, integer, and bigint fields. No table has any triggers or special constraints, only a primary key and one or two btree indexes.


> Well, if you can create a reproducer, that'd be the best option, because
> then we can investigate locally instead of the ping-ping here.

I will try to do it.


> I'd also like to see the contents of the WAL, particularly for the XID
> triggering this issue. Please run pg_waldump and see how much data is
> there for XID 1667601527. It does commit at 25EE/D6DE6EB8, not sure
> where it starts. It may have subtransactions, so don't do just grep.

Hm, it looks like I no longer have the WAL containing the start of the transaction, but I've found the following records using the txid and its subxacts. I hope it will be useful.

2514:rmgr: Transaction len (rec/tot):     38/    38, tx:          0, lsn: 25EE/D66F0438, prev 25EE/D66F0410, desc: ASSIGNMENT xtop 1667601527: subxacts: 1667601528
2518:rmgr: Heap        len (rec/tot):   2063/  2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0, desc: INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
2520:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8, desc: INSERT_LEAF off 256, blkref #0: rel 1663/13018/88964797 blk 15840
2522:rmgr: Heap        len (rec/tot):   1341/  9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78, desc: INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
2524:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0, desc: INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
2525:rmgr: Heap        len (rec/tot):   1671/  1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
2527:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
2530:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
2531:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
2532:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475
2533:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0, desc: DELETE off 4 , blkref #0: rel 1663/13018/88964795 blk 3125
2534:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28, desc: DELETE off 7 , blkref #0: rel 1663/13018/88964795 blk 3139
56346:rmgr: Heap        len (rec/tot):    378/  8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628, desc: DELETE off 6 KEYS_UPDATED , blkref #0: rel 1663/13018/88964815 blk 1938695 FPW
56347:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0, desc: LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/13018/88964792 blk 18460
56348:rmgr: Heap        len (rec/tot):   6563/  6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0, desc: UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel 1663/13018/88964792 blk 18460
56349:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601538, lsn: 25EE/D6DE6DC8, prev 25EE/D6DE5408, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
56350:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601538, lsn: 25EE/D6DE6E08, prev 25EE/D6DE6DC8, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
56351:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601538, lsn: 25EE/D6DE6E68, prev 25EE/D6DE6E08, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
56352:rmgr: Transaction len (rec/tot):     66/    66, tx: 1667601527, lsn: 25EE/D6DE6EB8, prev 25EE/D6DE6E68, desc: COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538

Regards,
Andrey.


Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Tomas Vondra-4
On Wed, Nov 20, 2019 at 12:46:44AM +0300, Andrey Salnikov wrote:

>Hi,
>
>> Can you create a core dump (see [1]), and print 'change' and 'txn' in
>> frame #2? I wonder if some the other fields are bogus too (but it can't
>> be entirely true ...), and if the transaction got serialized.
>>
>> You may need to install debug symbols first, BTW.
>
>I hope that I've understood how to extract correctly those data
>
>(gdb) frame 2
>#2  ReorderBufferCommit (rb=0x555713b7b788, xid=xid@entry=1667601527, commit_lsn=commit_lsn@entry=41707737345720, end_lsn=end_lsn@entry=41707737345792, commit_time=commit_time@entry=627188499326201,
>    origin_id=origin_id@entry=0, origin_lsn=0) at ./build/../src/backend/replication/logical/reorderbuffer.c:1585
>1585 ./build/../src/backend/replication/logical/reorderbuffer.c: No such file or directory.
>(gdb) print change
>$1 = (ReorderBufferChange *) 0x555713b9a108
>(gdb) print *change
>$2 = {lsn = 41707737330608, action = REORDER_BUFFER_CHANGE_DELETE, origin_id = 0, data = {tp = {relnode = {spcNode = 1663, dbNode = 13018, relNode = 88964815}, clear_toast_afterwards = 1 '\001', oldtuple = 0x555713cd0d88,
>      newtuple = 0x0}, msg = {prefix = 0x32da0000067f <error: Cannot access memory at address 0x32da0000067f>, message_size = 4383932111, message = 0x555713cd0d88 "\330-\315\023WU"}, snapshot = 0x32da0000067f,
>    command_id = 1663, tuplecid = {node = {spcNode = 1663, dbNode = 13018, relNode = 88964815}, tid = {ip_blkid = {bi_hi = 1, bi_lo = 0}, ip_posid = 3464}, cmin = 21847, cmax = 0, combocid = 0}}, node = {
>    prev = 0x555713b98178, next = 0x555713b98178}}
>(gdb) print txn
>$3 = (ReorderBufferTXN *) 0x555713b982e8
>(gdb) print *txn
>$6 = {xid = 1667601527, has_catalog_changes = 0 '\000', is_known_as_subxact = 0 '\000', toplevel_xid = 0, first_lsn = 41707730043960, final_lsn = 41707737345720, end_lsn = 41707737345792, restart_decoding_lsn = 0,
>  origin_id = 0, origin_lsn = 0, commit_time = 627188499326201, base_snapshot = 0x555713b90408, base_snapshot_lsn = 41707730047464, base_snapshot_node = {prev = 0x555713b7b7a0, next = 0x555713b7b7a0}, nentries = 0,
>  nentries_mem = 0, serialized = 0 '\000', changes = {head = {prev = 0x555713b98368, next = 0x555713b98368}}, tuplecids = {head = {prev = 0x555713b98378, next = 0x555713b98378}}, ntuplecids = 0, tuplecid_hash = 0x0,
>  toast_hash = 0x555714ad0fb8, subtxns = {head = {prev = 0x555713b985b0, next = 0x555713b984b8}}, nsubtxns = 4, ninvalidations = 0, invalidations = 0x0, node = {prev = 0x555713b7b790, next = 0x555713b7b790}}
>

Yep, thanks for this data. It seems to confirm Andres' suspicion that
the change is actually a delete. It's not clear to me why we're doing
this toast thing for DELETEs unconditionally.

For the record, the change LSN is 25EE/D6DE33B0 and the transaction is
between 25EE/D66F0438 and 25EE/D6DE6F00.

>
>> Do you see it on different PostgreSQL instances? Can you share the
>> schema definition?
>
>Unfortunately I can't do it completely, but there are
>                   List of installed extensions
>   Name    | Version |   Schema   |          Description
>-----------+---------+------------+--------------------------------
> pgcrypto  | 1.3     | public     | cryptographic functions
> pglogical | 2.2.1   | pglogical  | PostgreSQL Logical Replication
> plpgsql   | 1.0     | pg_catalog | PL/pgSQL procedural language
>
>And each table in publication has next structure:
>1 jsonb field, and few smallint, integer, bigint fields. No one table has any triggers or special constraints, only primary key and one or two btree indexes.
>

I think it'd be interesting to actually see the table structure - column
types, primary key / replica identity.

>
>> Well, if you can create a reproducer, that'd be the best option, because
>> then we can investigate locally instead of the ping-ping here.
>
>I will try to do it.
>

OK.

>
>> I'd also like to see the contents of the WAL, particularly for the XID
>> triggering this issue. Please run pg_waldump and see how much data is
>> there for XID 1667601527. It does commit at 25EE/D6DE6EB8, not sure
>> where it starts. It may have subtransactions, so don't do just grep.
>
>Hm, it looks like I do not have wal with start of transaction, but I've found next records using txid and subxacts. I hope it will be useful.
>
>2514:rmgr: Transaction len (rec/tot):     38/    38, tx:          0, lsn: 25EE/D66F0438, prev 25EE/D66F0410, desc: ASSIGNMENT xtop 1667601527: subxacts: 1667601528
>2518:rmgr: Heap        len (rec/tot):   2063/  2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0, desc: INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
>2520:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8, desc: INSERT_LEAF off 256, blkref #0: rel 1663/13018/88964797 blk 15840
>2522:rmgr: Heap        len (rec/tot):   1341/  9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78, desc: INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
>2524:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0, desc: INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
>2525:rmgr: Heap        len (rec/tot):   1671/  1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
>2527:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
>2530:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
>2531:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
>2532:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475
>2533:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0, desc: DELETE off 4 , blkref #0: rel 1663/13018/88964795 blk 3125
>2534:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28, desc: DELETE off 7 , blkref #0: rel 1663/13018/88964795 blk 3139
>56346:rmgr: Heap        len (rec/tot):    378/  8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628, desc: DELETE off 6 KEYS_UPDATED , blkref #0: rel 1663/13018/88964815 blk 1938695 FPW

So, this DELETE change triggers the issue. If I understand it correctly,
that means the transaction locked the tuple and then deleted it. But I
don't see any record of the lock.

This output also seems a bit incomplete - there's an assignment record
for subxact 1667601528 without any records, and there's no assignment
for 1667601530. Can you simply post the whole WAL output for the
transaction's LSN range, i.e. something like

   pg_waldump -s 25EE/D66F0438 -e 25EE/D6DE6F00 ...

Any idea what this transaction actually does, i.e. what commands it
executes? Presumably it does some plpgsql with subtransactions?

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Andres Freund
Hi,

(Alvaro, see multixact point below)

Are there any partitions?

On 2019-11-19 23:17:32 +0100, Tomas Vondra wrote:
> Yep. thanks for this data. This seems to confirm Andres' suspicion that
> the change is actually a delete. It's not clear to me why we're doing
> this toast thing for DELETEs unconditionally.

Well, there should never be any toast activity in the case of deletions, so
something already is wrong in that case. I don't think just
ignoring that would be good - but raising an error would make sense.


> For the record, the change LSN is 25EE/D6DE33B0 and the transactions is
> between 25EE/D66F0438 and 25EE/D6DE6F00.


> > 2514:rmgr: Transaction len (rec/tot):     38/    38, tx:          0, lsn: 25EE/D66F0438, prev 25EE/D66F0410, desc: ASSIGNMENT xtop 1667601527: subxacts: 1667601528

So there are subtransactions in this workload...


> > 2518:rmgr: Heap        len (rec/tot):   2063/  2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0, desc: INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
> > 2520:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8, desc: INSERT_LEAF off 256, blkref #0: rel 1663/13018/88964797 blk 15840
> > 2522:rmgr: Heap        len (rec/tot):   1341/  9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78, desc: INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
> > 2524:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0, desc: INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
> > 2525:rmgr: Heap        len (rec/tot):   1671/  1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
> > 2527:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
> > 2530:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
> > 2531:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
> > 2532:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475
> > 2533:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0, desc: DELETE off 4 , blkref #0: rel 1663/13018/88964795 blk 3125
> > 2534:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28, desc: DELETE off 7 , blkref #0: rel 1663/13018/88964795 blk 3139
> > 56346:rmgr: Heap        len (rec/tot):    378/  8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628, desc: DELETE off 6 KEYS_UPDATED , blkref #0: rel 1663/13018/88964815 blk 1938695 FPW

> > 56347:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0, desc: LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/13018/88964792 blk 18460
> > 56348:rmgr: Heap        len (rec/tot):   6563/  6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0, desc: UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel 1663/13018/88964792 blk 18460
> > 56349:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601538, lsn: 25EE/D6DE6DC8, prev 25EE/D6DE5408, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
> > 56350:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601538, lsn: 25EE/D6DE6E08, prev 25EE/D6DE6DC8, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
> > 56351:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601538, lsn: 25EE/D6DE6E68, prev 25EE/D6DE6E08, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
> > 56352:rmgr: Transaction len (rec/tot):     66/    66, tx: 1667601527, lsn: 25EE/D6DE6EB8, prev 25EE/D6DE6E68, desc: COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538


> So, this DELETE change triggers the issue. If I understand it correctly,
> that means the transaction locked the tuple and then deleted it. But I
> don't see any record about the lock.

Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when
called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when

        if (old_infomask & HEAP_XMAX_INVALID)

is true. Which is the most common case. I actually find it *more*
curious to *not* see that for a delete. I've not fully traced through
it yet.


Alvaro, it's not clear to me whether the multixact codepaths in
compute_new_xmax_infomask() actually work as expected for deletes.


> But this output seems a bit incomplete - there's an assignment record
> for subxact 1667601528, without any records. And there's no assignment
> for 1667601530. Can you simply post the whole WAL output for the
> transaction LSN range, i.e. something like
>
>   pg_waldump -s 25EE/D66F0438 -e 25EE/D6DE6F00 ...

Yea, that'd be quite helpful.


Which relations do the relfilenodes correspond to? Most importantly
1663/13018/88964815, 1663/13018/88964795, 1663/13018/88964792

You can look up the mapping, if there's not been any rewriting DDL
since, with e.g.

SELECT r, r::regclass FROM pg_filenode_relation(1663, 88964815) r;

Greetings,

Andres Freund



Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Alvaro Herrera-9
On 2019-Nov-19, Andres Freund wrote:

> > > 2518:rmgr: Heap        len (rec/tot):   2063/  2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0, desc: INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
> > > 2520:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8, desc: INSERT_LEAF off 256, blkref #0: rel 1663/13018/88964797 blk 15840
> > > 2522:rmgr: Heap        len (rec/tot):   1341/  9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78, desc: INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
> > > 2524:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0, desc: INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
> > > 2525:rmgr: Heap        len (rec/tot):   1671/  1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
> > > 2527:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
> > > 2530:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
> > > 2531:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
> > > 2532:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475
> > > 2533:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0, desc: DELETE off 4 , blkref #0: rel 1663/13018/88964795 blk 3125
> > > 2534:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28, desc: DELETE off 7 , blkref #0: rel 1663/13018/88964795 blk 3139
> > > 56346:rmgr: Heap        len (rec/tot):    378/  8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628, desc: DELETE off 6 KEYS_UPDATED , blkref #0: rel 1663/13018/88964815 blk 1938695 FPW
>
> > > 56347:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0, desc: LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/13018/88964792 blk 18460
> > > 56348:rmgr: Heap        len (rec/tot):   6563/  6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0, desc: UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel 1663/13018/88964792 blk 18460
> > > 56349:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601538, lsn: 25EE/D6DE6DC8, prev 25EE/D6DE5408, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
> > > 56350:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601538, lsn: 25EE/D6DE6E08, prev 25EE/D6DE6DC8, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
> > > 56351:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601538, lsn: 25EE/D6DE6E68, prev 25EE/D6DE6E08, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
> > > 56352:rmgr: Transaction len (rec/tot):     66/    66, tx: 1667601527, lsn: 25EE/D6DE6EB8, prev 25EE/D6DE6E68, desc: COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538
>
> > So, this DELETE change triggers the issue. If I understand it correctly,
> > that means the transaction locked the tuple and then deleted it. But I
> > don't see any record about the lock.
>
> Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when
> called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when
>
> if (old_infomask & HEAP_XMAX_INVALID)
>
> is true. Which is the most common case.

I don't understand what you guys are saying.  There's a DELETE at
25EE/D6DE33B0 (line 56346) which seems to be the one causing trouble.
Right?  There's also a LOCK+UPDATE sequence just below that, which is
unrelated (different rel).  Are you saying that one of them is toast and
the other is its main heap?  It would be useful to know more about the
rels.

> I actually find it *more*
> curious to *not* see that for a delete. I've not traced fully through
> it, yet.
> Alvaro, it's not clear to me whether the multixact codepaths in
> compute_new_xmax_infomask() actually work as expected for deletes.

Are you saying that the three *prior* deletes (lines 2532-2534) are
strange because they lack the KEYS_UPDATED bit?  (Note these are all for
different rels).

I'm not sure what to do about this report at this point.
Notwithstanding my general failure to focus on anything too complex
given my surrounding social context.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Tomas Vondra-4
On Tue, Nov 26, 2019 at 02:10:54PM -0300, Alvaro Herrera wrote:

>On 2019-Nov-19, Andres Freund wrote:
>
>> > > 2518:rmgr: Heap        len (rec/tot):   2063/  2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0, desc: INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
>> > > 2520:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8, desc: INSERT_LEAF off 256, blkref #0: rel 1663/13018/88964797 blk 15840
>> > > 2522:rmgr: Heap        len (rec/tot):   1341/  9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78, desc: INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
>> > > 2524:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0, desc: INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
>> > > 2525:rmgr: Heap        len (rec/tot):   1671/  1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
>> > > 2527:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
>> > > 2530:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
>> > > 2531:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
>> > > 2532:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475
>> > > 2533:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0, desc: DELETE off 4 , blkref #0: rel 1663/13018/88964795 blk 3125
>> > > 2534:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28, desc: DELETE off 7 , blkref #0: rel 1663/13018/88964795 blk 3139
>> > > 56346:rmgr: Heap        len (rec/tot):    378/  8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628, desc: DELETE off 6 KEYS_UPDATED , blkref #0: rel 1663/13018/88964815 blk 1938695 FPW
>>
>> > > 56347:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0, desc: LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/13018/88964792 blk 18460
>> > > 56348:rmgr: Heap        len (rec/tot):   6563/  6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0, desc: UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel 1663/13018/88964792 blk 18460
>> > > 56349:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601538, lsn: 25EE/D6DE6DC8, prev 25EE/D6DE5408, desc: INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
>> > > 56350:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601538, lsn: 25EE/D6DE6E08, prev 25EE/D6DE6DC8, desc: INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
>> > > 56351:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601538, lsn: 25EE/D6DE6E68, prev 25EE/D6DE6E08, desc: INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
>> > > 56352:rmgr: Transaction len (rec/tot):     66/    66, tx: 1667601527, lsn: 25EE/D6DE6EB8, prev 25EE/D6DE6E68, desc: COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538
>>
>> > So, this DELETE change triggers the issue. If I understand it correctly,
>> > that means the transaction locked the tuple and then deleted it. But I
>> > don't see any record about the lock.
>>
>> Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when
>> called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when
>>
>> if (old_infomask & HEAP_XMAX_INVALID)
>>
>> is true. Which is the most common case.
>
>I don't understand what you guys are saying.  There's a DELETE at
>25EE/D6DE33B0 (line 56346) which seems to be the one causing trouble.
>Right?  There's also a LOCK+UPDATE sequence just below that, which is
>unrelated (different rel).  Are you saying that one of them is toast and
>the other is its main heap?  It would be useful to know more about the
>rels.
>

I was wondering why there's no LOCK before the DELETE, when the waldump
output has KEYS_UPDATED for that change. But as Andres says, it's likely
normal, because e.g. heap_delete sets the flag in certain cases.

I agree it would be useful to know more about the rels. AFAIK 88964815
is the heap, and my guess is the other rels are various indexes on it.
I've already said so in my message from last week.

>> I actually find it *more*
>> curious to *not* see that for a delete. I've not traced fully through
>> it, yet.
>> Alvaro, it's not clear to me whether the multixact codepaths in
>> compute_new_xmax_infomask() actually work as expected for deletes.
>
>Are you saying that the three *prior* deletes (lines 2532-2534) are
>strange because they lack the KEYS_UPDATED bit?  (Note these are all for
>different rels).
>
>I'm not sure what to do about this report at this point.
>Notwithstanding my general failure to focus on anything too complex
>given my surrounding social context.
>

I think having more information about the structure (tables, indexes,
mapping for relfilenodes) and a better idea what the transaction is
doing, would be helpful.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Andrey Salnikov
Hi, I’m sorry for the late response.

On 26 Nov 2019, at 21:27, Tomas Vondra <[hidden email]> wrote:


I think having more information about the structure (tables, indexes,
mapping for relfilenodes) and a better idea what the transaction is
doing, would be helpful.

Here is the information about the tables:

 filenode |      relation
----------+---------------------------
 88964815 | table1                     
 88964795 | pg_toast.pg_toast_4029268 (toast table2)
 88964792 | table2                     
(3 rows)

                                  Table "public.table1"
   Column   |   Type   | Collation | Nullable | Default | Storage  | Stats target | Description 
------------+----------+-----------+----------+---------+----------+--------------+-------------
 column1    | jsonb    |           |          |         | extended |              | 
 column3    | smallint |           | not null |         | plain    |              | 
 column4    | integer  |           | not null |         | plain    |              | 
 column5    | integer  |           | not null |         | plain    |              | 
 column6    | integer  |           | not null |         | plain    |              | 
 column7    | smallint |           | not null |         | plain    |              | 
Indexes:
    "table1_pkey" PRIMARY KEY, btree (column6, column7, column5, column4, column3) WITH (fillfactor='70')
    "table1_index1" btree (column7, column5, column4) WHERE column7 = ? OR column7 = ? OR column7 = ?
Publications:
    "pub1"
Replica Identity: FULL
Options: autovacuum_vacuum_scale_factor=0.0, autovacuum_vacuum_threshold=1000, autovacuum_analyze_scale_factor=0.0, autovacuum_analyze_threshold=1000, autovacuum_enabled=true, fillfactor=100

                                          Table "public.table2"
    Column    |         Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
--------------+-----------------------+-----------+----------+---------+----------+--------------+-------------
 column1      | integer               |           | not null |         | plain    |              | 
 column2      | jsonb                 |           |          |         | extended |              | 
 column3      | jsonb                 |           |          |         | extended |              | 
 column5      | bigint                |           |          |         | plain    |              | 
 column6      | double precision      |           |          |         | plain    |              | 
 column7      | character varying(32) |           |          |         | extended |              | 
 column8      | bigint                |           |          |         | plain    |              | 
 column10     | smallint              |           |          |         | plain    |              | 
 column11     | bigint                |           |          |         | plain    |              | 
 column12     | bigint                |           |          |         | plain    |              | 
 column13     | integer               |           |          |         | plain    |              | 
 column14     | bigint                |           |          |         | plain    |              | 
Indexes:
    "table2_pkey" PRIMARY KEY, btree (column1)
    "table2_index1" btree (column1, column14, column12) WITH (fillfactor='90')
    "table2_index2" btree (column11, column14, column12, column8, column1) WITH (fillfactor='50')
Publications:
    "pub1"
Replica Identity: FULL
Options: autovacuum_vacuum_scale_factor=0.0, autovacuum_vacuum_threshold=1000, autovacuum_analyze_scale_factor=0.0, autovacuum_analyze_threshold=1000, autovacuum_enabled=true, fillfactor=100

   attrelid   | attrelid |           attname            | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | atthasdef | attidentity | attisdropped | attislocal | attinhcount | attcollation | attacl | attoptions | attfdwoptions 
--------------+----------+------------------------------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+-------------+--------------+------------+-------------+--------------+--------+------------+---------------
 table1       |  4029244 | tableoid                     |       26 |             0 |      4 |     -7 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table1       |  4029244 | cmax                         |       29 |             0 |      4 |     -6 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table1       |  4029244 | xmax                         |       28 |             0 |      4 |     -5 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table1       |  4029244 | cmin                         |       29 |             0 |      4 |     -4 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table1       |  4029244 | xmin                         |       28 |             0 |      4 |     -3 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table1       |  4029244 | ctid                         |       27 |             0 |      6 |     -1 |        0 |          -1 |        -1 | f        | p          | s        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table1       |  4029244 | column1                      |     3802 |            -1 |     -1 |      1 |        0 |          -1 |        -1 | f        | x          | i        | f          | f         |             | f            | t          |           0 |            0 |        |            | 
 table1       |  4029244 | ........pg.dropped.2........ |        0 |             0 |     -1 |      2 |        0 |          -1 |        -1 | f        | x          | i        | f          | f         |             | t            | t          |           0 |            0 |        |            | 
 table1       |  4029244 | column3                      |       21 |            -1 |      2 |      3 |        0 |          -1 |        -1 | t        | p          | s        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table1       |  4029244 | column4                      |       23 |            -1 |      4 |      4 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table1       |  4029244 | column5                      |       23 |            -1 |      4 |      5 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table1       |  4029244 | column6                      |       23 |            -1 |      4 |      6 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table1       |  4029244 | column7                      |       21 |            -1 |      2 |      7 |        0 |          -1 |        -1 | t        | p          | s        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | tableoid                     |       26 |             0 |      4 |     -7 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | cmax                         |       29 |             0 |      4 |     -6 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | xmax                         |       28 |             0 |      4 |     -5 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | cmin                         |       29 |             0 |      4 |     -4 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | xmin                         |       28 |             0 |      4 |     -3 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | ctid                         |       27 |             0 |      6 |     -1 |        0 |          -1 |        -1 | f        | p          | s        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | column1                      |       23 |            -1 |      4 |      1 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | column2                      |     3802 |            -1 |     -1 |      2 |        0 |          -1 |        -1 | f        | x          | i        | f          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | column3                      |     3802 |            -1 |     -1 |      3 |        0 |          -1 |        -1 | f        | x          | i        | f          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | ........pg.dropped.4........ |        0 |             0 |     -1 |      4 |        0 |          -1 |        -1 | f        | x          | i        | f          | f         |             | t            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | column5                      |       20 |            -1 |      8 |      5 |        0 |          -1 |        -1 | t        | p          | d        | f          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | column6                      |      701 |            -1 |      8 |      6 |        0 |          -1 |        -1 | t        | p          | d        | f          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | column7                      |     1043 |            -1 |     -1 |      7 |        0 |          -1 |        36 | f        | x          | i        | f          | f         |             | f            | t          |           0 |          100 |        |            | 
 table2       |  4029268 | column8                      |       20 |            -1 |      8 |      8 |        0 |          -1 |        -1 | t        | p          | d        | f          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | ........pg.dropped.9........ |        0 |             0 |      4 |      9 |        0 |          -1 |        -1 | t        | p          | i        | f          | f         |             | t            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | column10                     |       21 |            -1 |      2 |     10 |        0 |          -1 |        -1 | t        | p          | s        | f          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | column11                     |       20 |            -1 |      8 |     11 |        0 |          -1 |        -1 | t        | p          | d        | f          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | column12                     |       20 |            -1 |      8 |     12 |        0 |          -1 |        -1 | t        | p          | d        | f          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | column13                     |       23 |            -1 |      4 |     13 |        0 |          -1 |        -1 | t        | p          | i        | f          | f         |             | f            | t          |           0 |            0 |        |            | 
 table2       |  4029268 | column14                     |       20 |            -1 |      8 |     14 |        0 |          -1 |        -1 | t        | p          | d        | f          | f         |             | f            | t          |           0 |            0 |        |            | 

And the information extracted from the WAL by pg_waldump -s 25EE/D66F0438 -e 25EE/D6DE6F00 is in the attached file.


Attachment: waldump.output.gz (952K)

Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Tomas Vondra-4
On Tue, Nov 26, 2019 at 11:35:51PM +0300, Andrey Salnikov wrote:

>Hi, I’m sorry for the late response.
>
>> On 26 Nov 2019, at 21:27, Tomas Vondra <[hidden email]> wrote:
>>
>>
>> I think having more information about the structure (tables, indexes,
>> mapping for relfilenodes) and a better idea what the transaction is
>> doing, would be helpful.
>
>Here is information about tables
>
> filenode |      relation
>----------+---------------------------
> 88964815 | table1
> 88964795 | pg_toast.pg_toast_4029268 (toast table2)
> 88964792 | table2
>(3 rows)
>
>                                  Table "public.table1"
>   Column   |   Type   | Collation | Nullable | Default | Storage  | Stats target | Description
>------------+----------+-----------+----------+---------+----------+--------------+-------------
> column1    | jsonb    |           |          |         | extended |              |
> column3    | smallint |           | not null |         | plain    |              |
> column4    | integer  |           | not null |         | plain    |              |
> column5    | integer  |           | not null |         | plain    |              |
> column6    | integer  |           | not null |         | plain    |              |
> column7    | smallint |           | not null |         | plain    |              |
>Indexes:
>    "table1_pkey" PRIMARY KEY, btree (column6, column7, column5, column4, column3) WITH (fillfactor='70')
>    "table1_index1" btree (column7, column5, column4) WHERE column7 = ? OR column7 = ? OR column7 = ?
>Publications:
>    "pub1"
>Replica Identity: FULL
>Options: autovacuum_vacuum_scale_factor=0.0, autovacuum_vacuum_threshold=1000, autovacuum_analyze_scale_factor=0.0, autovacuum_analyze_threshold=1000, autovacuum_enabled=true, fillfactor=100
>
>                                          Table "public.table2"
>    Column    |         Type          | Collation | Nullable | Default | Storage  | Stats target | Description
>--------------+-----------------------+-----------+----------+---------+----------+--------------+-------------
> column1      | integer               |           | not null |         | plain    |              |
> column2      | jsonb                 |           |          |         | extended |              |
> column3      | jsonb                 |           |          |         | extended |              |
> column5      | bigint                |           |          |         | plain    |              |
> column6      | double precision      |           |          |         | plain    |              |
> column7      | character varying(32) |           |          |         | extended |              |
> column8      | bigint                |           |          |         | plain    |              |
> column10     | smallint              |           |          |         | plain    |              |
> column11     | bigint                |           |          |         | plain    |              |
> column12     | bigint                |           |          |         | plain    |              |
> column13     | integer               |           |          |         | plain    |              |
> column14     | bigint                |           |          |         | plain    |              |
>Indexes:
>    "table2_pkey" PRIMARY KEY, btree (column1)
>    "table2_index1" btree (column1, column14, column12) WITH (fillfactor='90')
>    "table2_index2" btree (column11, column14, column12, column8, column1) WITH (fillfactor='50')
>Publications:
>    "pub1"
>Replica Identity: FULL
>Options: autovacuum_vacuum_scale_factor=0.0, autovacuum_vacuum_threshold=1000, autovacuum_analyze_scale_factor=0.0, autovacuum_analyze_threshold=1000, autovacuum_enabled=true, fillfactor=100
>
>   attrelid   | attrelid |           attname            | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | atthasdef | attidentity | attisdropped | attislocal | attinhcount | attcollation | attacl | attoptions | attfdwoptions
>--------------+----------+------------------------------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+-------------+--------------+------------+-------------+--------------+--------+------------+---------------
> table1       |  4029244 | tableoid                     |       26 |             0 |      4 |     -7 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table1       |  4029244 | cmax                         |       29 |             0 |      4 |     -6 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table1       |  4029244 | xmax                         |       28 |             0 |      4 |     -5 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table1       |  4029244 | cmin                         |       29 |             0 |      4 |     -4 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table1       |  4029244 | xmin                         |       28 |             0 |      4 |     -3 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table1       |  4029244 | ctid                         |       27 |             0 |      6 |     -1 |        0 |          -1 |        -1 | f        | p          | s        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table1       |  4029244 | column1                      |     3802 |            -1 |     -1 |      1 |        0 |          -1 |        -1 | f        | x          | i        | f          | f         |             | f            | t          |           0 |            0 |        |            |
> table1       |  4029244 | ........pg.dropped.2........ |        0 |             0 |     -1 |      2 |        0 |          -1 |        -1 | f        | x          | i        | f          | f         |             | t            | t          |           0 |            0 |        |            |
> table1       |  4029244 | column3                      |       21 |            -1 |      2 |      3 |        0 |          -1 |        -1 | t        | p          | s        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table1       |  4029244 | column4                      |       23 |            -1 |      4 |      4 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table1       |  4029244 | column5                      |       23 |            -1 |      4 |      5 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table1       |  4029244 | column6                      |       23 |            -1 |      4 |      6 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table1       |  4029244 | column7                      |       21 |            -1 |      2 |      7 |        0 |          -1 |        -1 | t        | p          | s        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | tableoid                     |       26 |             0 |      4 |     -7 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | cmax                         |       29 |             0 |      4 |     -6 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | xmax                         |       28 |             0 |      4 |     -5 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | cmin                         |       29 |             0 |      4 |     -4 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | xmin                         |       28 |             0 |      4 |     -3 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | ctid                         |       27 |             0 |      6 |     -1 |        0 |          -1 |        -1 | f        | p          | s        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | column1                      |       23 |            -1 |      4 |      1 |        0 |          -1 |        -1 | t        | p          | i        | t          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | column2                      |     3802 |            -1 |     -1 |      2 |        0 |          -1 |        -1 | f        | x          | i        | f          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | column3                      |     3802 |            -1 |     -1 |      3 |        0 |          -1 |        -1 | f        | x          | i        | f          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | ........pg.dropped.4........ |        0 |             0 |     -1 |      4 |        0 |          -1 |        -1 | f        | x          | i        | f          | f         |             | t            | t          |           0 |            0 |        |            |
> table2       |  4029268 | column5                      |       20 |            -1 |      8 |      5 |        0 |          -1 |        -1 | t        | p          | d        | f          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | column6                      |      701 |            -1 |      8 |      6 |        0 |          -1 |        -1 | t        | p          | d        | f          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | column7                      |     1043 |            -1 |     -1 |      7 |        0 |          -1 |        36 | f        | x          | i        | f          | f         |             | f            | t          |           0 |          100 |        |            |
> table2       |  4029268 | column8                      |       20 |            -1 |      8 |      8 |        0 |          -1 |        -1 | t        | p          | d        | f          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | ........pg.dropped.9........ |        0 |             0 |      4 |      9 |        0 |          -1 |        -1 | t        | p          | i        | f          | f         |             | t            | t          |           0 |            0 |        |            |
> table2       |  4029268 | column10                     |       21 |            -1 |      2 |     10 |        0 |          -1 |        -1 | t        | p          | s        | f          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | column11                     |       20 |            -1 |      8 |     11 |        0 |          -1 |        -1 | t        | p          | d        | f          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | column12                     |       20 |            -1 |      8 |     12 |        0 |          -1 |        -1 | t        | p          | d        | f          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | column13                     |       23 |            -1 |      4 |     13 |        0 |          -1 |        -1 | t        | p          | i        | f          | f         |             | f            | t          |           0 |            0 |        |            |
> table2       |  4029268 | column14                     |       20 |            -1 |      8 |     14 |        0 |          -1 |        -1 | t        | p          | d        | f          | f         |             | f            | t          |           0 |            0 |        |            |
>
>And the information extracted from the WAL file with pg_waldump -s 25EE/D66F0438 -e 25EE/D6DE6F00 is in the attached file.

Can you also show how those relations map to the relfilenodes referenced
by the WAL? This should do the trick, I think:

   SELECT relname FROM pg_class
    WHERE relfilenode IN (88964795, 88964797, 88964792,
                          88964798, 88964799, 88964800, 88964815);

Also, any idea what the transaction does? It seems it inserts 2 rows
into 88964795, then one row into 88964792, and then deletes those new
records in the same subxact. Then it does a delete on 88964815, which
triggers the segfault.

How do you create the subtransactions? A plpgsql function with exception
blocks, savepoints? I'm trying to reproduce it and I'm not sure whether
those details matter.
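(For reference, the two usual ways such subtransactions arise look like this; a sketch only, with the table name reused from the thread and the values invented:)

```sql
-- Sketch: two common sources of subtransactions such as the
-- subxact 1667601528 seen in the WAL above.

-- 1. An explicit savepoint:
BEGIN;
SAVEPOINT s1;                              -- opens a subtransaction
INSERT INTO table2 (column1) VALUES (1);
RELEASE SAVEPOINT s1;
COMMIT;

-- 2. A plpgsql EXCEPTION block, which opens an implicit subtransaction:
DO $$
BEGIN
  INSERT INTO table2 (column1) VALUES (1);
EXCEPTION WHEN unique_violation THEN
  NULL;  -- an error rolls back only the implicit subtransaction
END $$;
```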

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Alvaro Herrera-9
In reply to this post by Andres Freund
On 2019-Nov-19, Andres Freund wrote:

> > > 2525:rmgr: Heap        len (rec/tot):   1671/  1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
> > > 2532:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475

> Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when
> called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when
>
> if (old_infomask & HEAP_XMAX_INVALID)
>
> is true. Which is the most common case. I actually find it *more*
> curious to *not* see that for a delete. I've not traced fully through
> it, yet.
>
>
> Alvaro, it's not clear to me whether the multixact codepaths in
> compute_new_xmax_infomask() actually work as expected for deletes.

Hmm ... I think you're complaining about failing to set the
HEAP_KEYS_UPDATED bit when the delete transaction is the same
as the insert transaction.

Grepping the new waldump file the OP sent for relfilenode 88964792
I saw this:

rmgr: Heap        len (rec/tot):   1671/  1671, tx: 1667601529, lsn: 25EE/D66F1A78, prev 25EE/D66F1A38, desc: INSERT off 5, blkref #0: rel 1663/13018/88964792 blk 18460
rmgr: Heap        len (rec/tot):   1671/  1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
rmgr: Heap        len (rec/tot):     48/    48, tx: 1667601529, lsn: 25EE/D66F7E10, prev 25EE/D66F6778, desc: HEAP_CONFIRM off 5, blkref #0: rel 1663/13018/88964792 blk 18460
rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475
rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601536, lsn: 25EE/D6BA3BC0, prev 25EE/D6BA3B70, desc: LOCK off 5: xid 1667601536: flags 0 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/13018/88964792 blk 20806
rmgr: Heap        len (rec/tot):   3955/  3955, tx: 1667601536, lsn: 25EE/D6BA3D00, prev 25EE/D6BA3CB0, desc: UPDATE off 5 xmax 1667601536 ; new off 2 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18491, blkref #1: rel 1663/13018/88964792 blk 20806
rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0, desc: LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/13018/88964792 blk 18460
rmgr: Heap        len (rec/tot):   6563/  6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0, desc: UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel 1663/13018/88964792 blk 18460

I wonder about the HEAP_CONFIRM there ... it's in a completely different
transaction, of course, so it's not related. But note that the
corresponding insert (25EE/D66F1A78) looks like a regular INSERT; it
doesn't indicate that it's a speculative insert, which it must have
been or there wouldn't be a confirm.  So I wonder if our insert just
below is *also* a speculative insert ... which perhaps could cause the
compute_new_xmax_infomask code to misbehave.
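(For context, speculative insertions come from INSERT ... ON CONFLICT; a sketch of a statement that could produce the INSERT-then-DELETE pattern above, with invented values:)

```sql
-- Sketch: an ON CONFLICT insert writes the heap tuple speculatively
-- first; if a conflict is then detected during index insertion, the
-- tuple is speculatively aborted, which shows up in the WAL as an
-- INSERT followed by a DELETE of the same heap offset.
INSERT INTO table2 (column1, column2)
VALUES (1, '{"k": 1}'::jsonb)
ON CONFLICT (column1) DO NOTHING;
```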

I think pg_waldump should print out the speculativeness of an insertion.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Alvaro Herrera-9
On 2019-Nov-27, Alvaro Herrera wrote:

> I think pg_waldump should print out the speculativeness of an insertion.

Ah, in pg12 we do print the "flags" for the insertion.  But earlier
releases don't.  IMO we should have backpatched that change, as I said
in https://postgr.es/m/20180423160216.m2yjhx67grofg7h5@...
but I was outvoted.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Tomas Vondra-4
On Wed, Nov 27, 2019 at 11:57:00AM -0300, Alvaro Herrera wrote:
>On 2019-Nov-27, Alvaro Herrera wrote:
>
>> I think pg_waldump should print out the speculativeness of an insertion.
>
>Ah, in pg12 we do print the "flags" for the insertion.  But earlier
>releases don't.  IMO we should have backpatched that change, as I said
>in https://postgr.es/m/20180423160216.m2yjhx67grofg7h5@...
>but I was outvoted.
>

Hmmmm, yeah. I still think the decision not to backpatch was reasonable,
although it seems it might have been useful in this particular case.

But hey, it should not be that difficult to build a custom pg_waldump
with that patch applied and decode the WAL segment. It can even be
done on a completely separate system; it's just a single binary.
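(A sketch of what that one-off build might look like; the download URL, patch file name, and WAL directory path are assumptions, and the patch is the hypothetical backpatch of the pg12 flags-printing change:)

```shell
# Sketch only: fetch the matching source, apply the flags patch,
# build just pg_waldump, and decode the segment on any machine.
wget https://ftp.postgresql.org/pub/source/v10.11/postgresql-10.11.tar.gz
tar xzf postgresql-10.11.tar.gz && cd postgresql-10.11
patch -p1 < ../print-insert-flags.patch   # hypothetical backpatched change
./configure
make -C src/bin/pg_waldump
src/bin/pg_waldump/pg_waldump -s 25EE/D66F0438 -e 25EE/D6DE6F00 /path/to/waldir
```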

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Alvaro Herrera-9
In reply to this post by Andres Freund
On 2019-Nov-19, Andres Freund wrote:

> > > 2525:rmgr: Heap        len (rec/tot):   1671/  1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
> > > 2532:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475

> > So, this DELETE change triggers the issue. If I understand it correctly,
> > that means the transaction locked the tuple and then deleted it. But I
> > don't see any record about the lock.
>
> Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when
> called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when
>
> if (old_infomask & HEAP_XMAX_INVALID)
>
> is true. Which is the most common case. I actually find it *more*
> curious to *not* see that for a delete. I've not traced fully through
> it, yet.
>
>
> Alvaro, it's not clear to me whether the multixact codepaths in
> compute_new_xmax_infomask() actually work as expected for deletes.

So, what's really happening here is heap_insert(speculative) followed by
heap_abort_speculative().  That seems to explain the above WAL sequence.
I *think* the speculative insertion code in reorderbuffer simply delays
the tuple insertion until it is confirmed; so this sequence would result
in nothing at all happening at ReorderBufferCommit.  Unless something
has messed up the state somehow.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Tomas Vondra-4
On Wed, Nov 27, 2019 at 02:59:35PM -0300, Alvaro Herrera wrote:

>On 2019-Nov-19, Andres Freund wrote:
>
>> > > 2525:rmgr: Heap        len (rec/tot):   1671/  1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8, desc: INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
>> > > 2532:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0, desc: DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475
>
>> > So, this DELETE change triggers the issue. If I understand it correctly,
>> > that means the transaction locked the tuple and then deleted it. But I
>> > don't see any record about the lock.
>>
>> Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when
>> called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when
>>
>> if (old_infomask & HEAP_XMAX_INVALID)
>>
>> is true. Which is the most common case. I actually find it *more*
>> curious to *not* see that for a delete. I've not traced fully through
>> it, yet.
>>
>>
>> Alvaro, it's not clear to me whether the multixact codepaths in
>> compute_new_xmax_infomask() actually work as expected for deletes.
>
>So, what's really happening here is heap_insert(speculative) followed by
>heap_abort_speculative().  That seems to explain the above WAL sequence.
>I *think* the speculative insertion code in reorderbuffer simply delays
>the tuple insertion until it is confirmed; so this sequence would result
>in nothing at all happening at ReorderBufferCommit.  Unless something
>has messed up the state somehow.
>

Hmm, so what would that transaction look like? My reading of the WAL
(based on the available relfilenode info) is this:

   Transaction  25EE/D66F0438  ASSIGNMENT xtop 1667601527: subxacts: 1667601528

   -> seems like a savepoint, probably

   Heap         25EE/D66F11E8  INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
   Heap         25EE/D66F2118  INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
   Btree        25EE/D66F54C8  INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
   Heap         25EE/D66F5508  INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
   Btree        25EE/D66F6738  INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
   Btree        25EE/D66F7E40  INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
   Btree        25EE/D66F7EA0  INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412

   -> insert into the TOAST table (88964795) + TOAST index (probably 88964797)
   -> then an insert into table2 (88964792) and its indexes

   Heap         25EE/D66F7EF0  DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475
   Heap         25EE/D66F7F28  DELETE off 4 , blkref #0: rel 1663/13018/88964795 blk 3125
   Heap         25EE/D66F7F60  DELETE off 7 , blkref #0: rel 1663/13018/88964795 blk 3139

   -> now we delete the record from table2 and also its TOAST table (how
   do we trigger that?)

   Heap         25EE/D6DE33B0  DELETE off 6 KEYS_UPDATED , blkref #0: rel 1663/13018/88964815 blk 1938695 FPW

   -> delete from table1 (88964815), which we haven't touched in this xact
   until now (this is the crash LSN)

   Heap         25EE/D6DE53D0  LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/13018/88964792 blk 18460
   Heap         25EE/D6DE5408  UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel 1663/13018/88964792 blk 18460
   Btree        25EE/D6DE6DC8  INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
   Btree        25EE/D6DE6E08  INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
   Btree        25EE/D6DE6E68  INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
   Transaction  25EE/D6DE6EB8  COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538

   -> seems like a lock of a record in table2, along with update of indexes

I can more or less simulate this by

   begin;

   savepoint s;

   insert into table2 values (1, ... jsonb ..., NULL, 1, 1, 'x', 1, 1, 1, 1, 1, 1)
   on conflict (column1) do update set column14 = 1;

   delete from table2 where column1 = 1;

   update table2 set column14 = 2 where column1 = 1;

   commit;

Which however does not explain the three deletes from table2 + TOAST.
But, maybe that's not the issue, because the decoding fails on the next
record, which is for table1.

FWIW attached is a SQL script creating tables with the same structure
(without the dropped columns, but I think those are irrelevant).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Tomas Vondra-4
On Wed, Nov 27, 2019 at 11:39:33PM +0100, Tomas Vondra wrote:
>On Wed, Nov 27, 2019 at 02:59:35PM -0300, Alvaro Herrera wrote:
>>On 2019-Nov-19, Andres Freund wrote:
>>
>
> ...
>
>FWIW attached is a SQL script creating tables with the same structure
>(without the dropped columns, but I think those are irrelevant).
>

Meh, this time with the file actually attached ...

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: bug.sql (1K)

Re: BUG #16125: Crash of PostgreSQL's wal sender during logical replication

Alvaro Herrera-9
In reply to this post by Tomas Vondra-4
> Hmm, so what would that transaction look like? My reading of the WAL
> (based on the available relfilenode info) is this:
>
>   Transaction  25EE/D66F0438  ASSIGNMENT xtop 1667601527: subxacts: 1667601528
>
>   -> seems like a savepoint, probably
>
>   Heap         25EE/D66F11E8  INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
>   Heap         25EE/D66F2118  INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
>   Btree        25EE/D66F54C8  INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
>   Heap         25EE/D66F5508  INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
>   Btree        25EE/D66F6738  INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
>   Btree        25EE/D66F7E40  INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
>   Btree        25EE/D66F7EA0  INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
>
>   -> insert into the TOAST table (88964795) + TOAST index (probably 88964797)
>   -> then an insert into table2 (88964792) and its indexes

I think it's an insert into table2 (which has at least four indexes),
followed by insertions into 3 indexes, followed by a failure to insert
into the fourth index (which naturally does not appear in the WAL).

>   Heap         25EE/D66F7EF0  DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475

Then the heap insertion is speculative-aborted because of failure to
insert into the fourth index ...

>   Heap         25EE/D66F7F28  DELETE off 4 , blkref #0: rel 1663/13018/88964795 blk 3125
>   Heap         25EE/D66F7F60  DELETE off 7 , blkref #0: rel 1663/13018/88964795 blk 3139

... and this propagates to the toast table (toast_delete in
heap_abort_speculative)

I didn't look from this point down.  I'll look at your reproducer script
now; I think you need to make it so that the index specified in the ON
CONFLICT clause is the fourth one in OID order.  (Not sure how relevant
the other index insertions really are.)
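(In the reproducer that ordering could be forced by creating the arbiter index last; a sketch, with the column list abbreviated and assuming index OIDs follow creation order:)

```sql
-- Sketch: create the arbiter (unique) index last so it gets the
-- highest OID and is reached last during index insertion.
CREATE TABLE table2 (
  column1  integer NOT NULL,
  column8  bigint,
  column11 bigint,
  column12 bigint,
  column14 bigint
  -- ... remaining columns from the \d+ output earlier in the thread
);
CREATE INDEX table2_index1 ON table2 (column1, column14, column12);
CREATE INDEX table2_index2 ON table2 (column11, column14, column12, column8, column1);
ALTER TABLE table2 ADD PRIMARY KEY (column1);  -- arbiter for ON CONFLICT (column1)
```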

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services