BUG #15744: Replication slot peak query throwing error for wrong sequence entry for toast chunk

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15744
Logged by:          Nitesh Yadav
Email address:      [hidden email]
PostgreSQL version: 9.6.3
Operating system:   x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 2
Description:        

Hi,

Postgres server setup:
  The Postgres server is running as an AWS RDS instance.
  The server version is PostgreSQL 9.6.3 on x86_64-pc-linux-gnu, compiled by gcc
(GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit.
  In the parameter group, rds.logical_replication is set to 1, which
internally sets the following parameters: wal_level, max_wal_senders,
max_replication_slots, max_connections.
  We are using the test_decoding module to read the WAL data through the
logical decoding mechanism.

Application setup:
  1. Periodically we run the peek command to retrieve the data from the slot,
e.g. SELECT * FROM pg_logical_slot_peek_changes('pgldpublic_cdc_slot', NULL,
NULL, 'include-timestamp', 'on') LIMIT 200000 OFFSET 0;
  2. From the above query result, we use the location of the last transaction
to remove the data from the slot, e.g. SELECT location, xid FROM
pg_logical_slot_get_changes('pgldpublic_cdc_slot', 'B92/C7394678', NULL,
'include-timestamp', 'on') LIMIT 1;
  We run steps 1 and 2 in a loop, reading data in chunks of 200K records at a
time in a given process.
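
For readers following along, one iteration of the peek-then-consume loop described above could be sketched roughly as follows. This is only an illustration, not the reporter's actual code (which runs on Node.js). The `execute(sql, params)` callable, the `%s` placeholder style, and the function names are hypothetical stand-ins for whatever database client is in use. It assumes the 9.6 column names (location, xid, data) and that test_decoding emits a row whose data starts with "COMMIT" at the end of each transaction.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# One decoded change: (location, xid, data), the 9.6 column names.
Row = Tuple[str, int, str]

PEEK_SQL = (
    "SELECT location, xid, data "
    "FROM pg_logical_slot_peek_changes(%s, NULL, NULL, 'include-timestamp', 'on') "
    "LIMIT %s OFFSET 0"
)
GET_SQL = (
    "SELECT location, xid "
    "FROM pg_logical_slot_get_changes(%s, %s, NULL, 'include-timestamp', 'on') "
    "LIMIT 1"
)


def last_commit_index(rows: Sequence[Row]) -> Optional[int]:
    """Return the index of the last COMMIT row in the batch, or None."""
    last = None
    for i, (_location, _xid, data) in enumerate(rows):
        if data.startswith("COMMIT"):
            last = i
    return last


def drain_once(
    execute: Callable[[str, tuple], Sequence[Row]],  # hypothetical client hook
    slot: str,
    batch_size: int = 200000,
) -> List[Row]:
    """Peek one batch, then consume the slot up to the last complete transaction.

    Returns the rows that belong to fully committed transactions; a trailing
    partial transaction is left in the slot for the next iteration.
    """
    rows = list(execute(PEEK_SQL, (slot, batch_size)))
    idx = last_commit_index(rows)
    if idx is None:
        return []  # no complete transaction in this batch, consume nothing
    upto_location = rows[idx][0]
    execute(GET_SQL, (slot, upto_location))  # step 2: advance the slot
    return rows[: idx + 1]
```

Consuming only up to the last COMMIT location keeps a transaction that was cut off by the LIMIT from being dropped from the slot before it has been read in full.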

Behavior reported (Bug)
  We have had a replication slot running successfully, but recently we
encountered the following error:

error: got sequence entry 2 for toast chunk 30954054 instead of seq 0
at Connection.parseE
(/var/task/node_modules/datacoral-utils/node_modules/pg/lib/connection.js:555:11)
at Connection.parseMessage
(/var/task/node_modules/datacoral-utils/node_modules/pg/lib/connection.js:380:19)
at TLSSocket.<anonymous>
(/var/task/node_modules/datacoral-utils/node_modules/pg/lib/connection.js:120:22)
at emitOne (events.js:116:13)
at TLSSocket.emit (events.js:211:7)
at addChunk (_stream_readable.js:263:12)
at readableAddChunk (_stream_readable.js:250:11)
at TLSSocket.Readable.push (_stream_readable.js:208:10)
at TLSWrap.onread (net.js:607:20)

Temporary resolution
    After running the query 2-3 times, the error went away. But this causes
the whole process to shut down.
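
Since re-running the query cleared the error, a caller could in principle retry instead of shutting down. The sketch below is only a stopgap under that assumption and does not address the underlying cause. `run_query` is a hypothetical zero-argument callable that executes the peek query and raises an exception carrying the server's error message; the retry count and delay are arbitrary.

```python
import re
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Matches the message reported in this thread, with any chunk/sequence numbers.
TOAST_SEQ_ERROR = re.compile(
    r"got sequence entry \d+ for toast chunk \d+ instead of seq \d+"
)


def run_with_retry(
    run_query: Callable[[], T],  # hypothetical: runs the peek query
    attempts: int = 3,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Re-run run_query when it fails with the TOAST sequence error.

    Any other exception is re-raised immediately. If every attempt fails
    with the TOAST error, the last such exception is raised.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except Exception as exc:
            if not TOAST_SEQ_ERROR.search(str(exc)):
                raise  # unrelated failure: do not mask it
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error
```

Matching on the message text is deliberately narrow, so that unrelated failures still stop the process as before.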

Is there any permanent resolution for the issue, or is it resolved in a
later version of Postgres?

Regards,
Nitesh

Re: BUG #15744: Replication slot peak query throwing error for wrong sequence entry for toast chunk

Peter Eisentraut-6
On 2019-04-09 22:09, PG Bug reporting form wrote:
> PostgreSQL version: 9.6.3

This is very old.  9.6.12 is current.

> Behavior reported (Bug)
>   We have a replication slot running for successfully but recently we
> encountered following error:
>
> error: got sequence entry 2 for toast chunk 30954054 instead of seq 0

> Temporary resolution
>     After running the query 2-3 times, the error went away. But this causes
> the whole process to shut down.
>
> Is there any permanent resolution for the issue or is it resolved in the
> higher version of postgres?

I've not seen this particular error, but there are a number of fixes
related to possible TOAST corruption, so this might be related to that.
Upgrading would certainly be the first step before investigating further.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: BUG #15744: Replication slot peak query throwing error for wrong sequence entry for toast chunk

Andres Freund
Hi,

On 2019-04-10 10:28:55 +0200, Peter Eisentraut wrote:
> On 2019-04-09 22:09, PG Bug reporting form wrote:
> > PostgreSQL version: 9.6.3
>
> This is very old.  9.6.12 is current.

Right, there were a number of fixes.

> > Behavior reported (Bug)
> >   We have a replication slot running for successfully but recently we
> > encountered following error:
> >
> > error: got sequence entry 2 for toast chunk 30954054 instead of seq 0
>
> > Temporary resolution
> >     After running the query 2-3 times, the error went away. But this causes
> > the whole process to shut down.
> >
> > Is there any permanent resolution for the issue or is it resolved in the
> > higher version of postgres?
>
> I've not seen this particular error, but there are a number of fixes
> related to possible TOAST corruption, so this might be related to that.
> Upgrading would certainly be the first step before investigating further.

Indeed. We really would need a reproducible example to debug this
easily. Alternatively, we could inspect the WAL (via you) and see exactly
what's happening. But there's not much point before you upgrade and
reproduce with a newer version.

Greetings,

Andres Freund