recovery snapshot waiting for non-overflowed snapshot or until oldest active xid on standby is at least 4739126 (now 1422751)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

recovery snapshot waiting for non-overflowed snapshot or until oldest active xid on standby is at least 4739126 (now 1422751)

Li EF Zhang
We have a postgresql database cluster with 3 node, one is primary, the other 2 are secondary. The cluster is managed by patroni. The 3 databases are in respective containers. When we update one secondary container image, the database in this container can not start. It reports "database system is starting up". We tried to delete the container and recover it from primary, the same error is reported.
 
I checked the db log, there is a message:
DEBUG:  recovery snapshot waiting for non-overflowed snapshot or until oldest active xid on standby is at least 4739126 (now 1422751)#0122020-12-15 07:40:08.513 UTC [146-5182] CONTEXT:  WAL redo at 3/ED11B020 for Standby/RUNNING_XACTS: nextXid 4739482 latestCompletedXid 4739475 oldestRunningXid 1422751; 16 xacts: 2716862 2721890 4665244 2495592 2289138 2288820 2287653 1422751 4280517 2288510 2287620 3297674 1757103 4739326 3320989 2259670; subxid ovf
 
 
Seems snapshot overflowed which causes the secondary can not start up. I am newer to postgresql. I do not know very clearly how this happened and how to fix it. Thanks!


 

Reply | Threaded
Open this post in threaded view
|

Re: recovery snapshot waiting for non-overflowed snapshot or until oldest active xid on standby is at least 4739126 (now 1422751)

Magnus Hagander-2
On Sat, Jan 2, 2021 at 12:38 PM Li EF Zhang <[hidden email]> wrote:
>
> We have a postgresql database cluster with 3 node, one is primary, the other 2 are secondary. The cluster is managed by patroni. The 3 databases are in respective containers. When we update one secondary container image, the database in this container can not start. It reports "database system is starting up". We tried to delete the container and recover it from primary, the same error is reported.
>
> I checked the db log, there is a message:
> DEBUG:  recovery snapshot waiting for non-overflowed snapshot or until oldest active xid on standby is at least 4739126 (now 1422751)#0122020-12-15 07:40:08.513 UTC [146-5182] CONTEXT:  WAL redo at 3/ED11B020 for Standby/RUNNING_XACTS: nextXid 4739482 latestCompletedXid 4739475 oldestRunningXid 1422751; 16 xacts: 2716862 2721890 4665244 2495592 2289138 2288820 2287653 1422751 4280517 2288510 2287620 3297674 1757103 4739326 3320989 2259670; subxid ovf
>
>
> Seems snapshot overflowed which causes the secondary can not start up. I am newer to postgresql. I do not know very clearly how this happened and how to fix it. Thanks!

AFAICT it looks like you have a very old running transaction on the
system (transaction id 1422751 is 3 million transactions ago, and
there are other transactions also very old).  As mentioned on
https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-ADMIN,
having very long running transactions in combinations with many
subtransactions can cause it to take a long time for the system to
reach a consistent state. See also the caveat list at the bottom of
that page.

If those transactions are such that they're needed, you have no other
choice but to wait I believe. But in (the more likely, I'd say) they
represent hung or otherwise broken clients, then terminating those
sessions on the primary should help the process along.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/


Reply | Threaded
Open this post in threaded view
|

RE: recovery snapshot waiting for non-overflowed snapshot or until oldest active xid on standby is at least 4739126 (now 1422751)

Li EF Zhang
Thanks Magnus!
Are there any SQL or method to query the running transactions with subtransaction that exceed 64?
 
----- Original message -----
From: Magnus Hagander <[hidden email]>
To: Li EF Zhang <[hidden email]>
Cc: [hidden email]
Subject: [EXTERNAL] Re: recovery snapshot waiting for non-overflowed snapshot or until oldest active xid on standby is at least 4739126 (now 1422751)
Date: Sat, Jan 2, 2021 7:50 PM
 
On Sat, Jan 2, 2021 at 12:38 PM Li EF Zhang <[hidden email]> wrote:
>
> We have a postgresql database cluster with 3 node, one is primary, the other 2 are secondary. The cluster is managed by patroni. The 3 databases are in respective containers. When we update one secondary container image, the database in this container can not start. It reports "database system is starting up". We tried to delete the container and recover it from primary, the same error is reported.
>
> I checked the db log, there is a message:
> DEBUG:  recovery snapshot waiting for non-overflowed snapshot or until oldest active xid on standby is at least 4739126 (now 1422751)#0122020-12-15 07:40:08.513 UTC [146-5182] CONTEXT:  WAL redo at 3/ED11B020 for Standby/RUNNING_XACTS: nextXid 4739482 latestCompletedXid 4739475 oldestRunningXid 1422751; 16 xacts: 2716862 2721890 4665244 2495592 2289138 2288820 2287653 1422751 4280517 2288510 2287620 3297674 1757103 4739326 3320989 2259670; subxid ovf
>
>
> Seems snapshot overflowed which causes the secondary can not start up. I am newer to postgresql. I do not know very clearly how this happened and how to fix it. Thanks!

AFAICT it looks like you have a very old running transaction on the
system (transaction id 1422751 is 3 million transactions ago, and
there are other transactions also very old).  As mentioned on
https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-ADMIN ,
having very long running transactions in combinations with many
subtransactions can cause it to take a long time for the system to
reach a consistent state. See also the caveat list at the bottom of
that page.

If those transactions are such that they're needed, you have no other
choice but to wait I believe. But in (the more likely, I'd say) they
represent hung or otherwise broken clients, then terminating those
sessions on the primary should help the process along.

--
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/