BUG #16879: Delayed standby does not connect to primary on startup

BUG #16879: Delayed standby does not connect to primary on startup

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      16879
Logged by:          Mahadevan Ramachandran
Email address:      [hidden email]
PostgreSQL version: 13.2
Operating system:   Linux, Debian 10
Description:        

Hi.

Below is a situation reproducible with versions 13.1 and 13.2 (at least). At
the end of it, the streaming replication standby on startup does not connect
to the primary. It is unclear whether this is an issue, or whether the
standby will connect later on somehow and resume replication.

We have a customer who reported pg_last_wal_receive_lsn() returning NULL on
a delayed standby, and this is what the investigation led to. In their case,
though, the standby was pulling in WAL via restore_command rather than
replication slots. The customer reports that he can see changes in tables as
expected, with the appropriate delay. Primary and standby have been running
well beyond recovery_min_apply_delay.

Here are the steps to reproduce:

Step 1: Have a primary streaming-replicating to a standby via a replication
slot. Both primary and standby have default configurations from initdb,
except for:

@primary:
port = 7000

@standby:
port = 7001
hot_standby = on
primary_conninfo = 'port=7000'
primary_slot_name = 'slot1'

Step 2: Ensure standby is all caught up, and no ongoing changes in
primary.

@primary:
postgres=# select slot_name, restart_lsn, active, active_pid from
pg_replication_slots ;
 slot_name | restart_lsn | active | active_pid
-----------+-------------+--------+------------
 slot1     | 0/2EDFF8C8  | t      |      28061
(1 row)

@standby:
postgres=# select pg_last_wal_replay_lsn(), pg_last_wal_receive_lsn();
 pg_last_wal_replay_lsn | pg_last_wal_receive_lsn
------------------------+-------------------------
 0/2EDFF8C8             | 0/2EDFF8C8
(1 row)

Step 3: Set recovery_min_apply_delay = 1h in the standby's configuration.
Restart the standby.

@primary:
postgres=# select slot_name, restart_lsn, active, active_pid from
pg_replication_slots ;
 slot_name | restart_lsn | active | active_pid
-----------+-------------+--------+------------
 slot1     | 0/2EDFF8C8  | t      |      28180
(1 row)

@standby:
postgres=# select pg_last_wal_replay_lsn(), pg_last_wal_receive_lsn();
 pg_last_wal_replay_lsn | pg_last_wal_receive_lsn
------------------------+-------------------------
 0/2EDFF8C8             | 0/2E000000
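
A side note on the output above: the receive LSN (0/2E000000) being lower
than the replay LSN (0/2EDFF8C8) looks consistent with the WAL receiver
restarting streaming from the beginning of the WAL segment that contains the
replay position. The following is only a minimal sketch of that arithmetic,
assuming the default 16 MB WAL segment size; the helper names (parse_lsn,
format_lsn, segment_start) are made up for illustration and are not
PostgreSQL functions.

```python
# Assumption: default WAL segment size of 16 MB (initdb --wal-segsize unchanged).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text):
    """Convert an LSN printed as 'X/Y' (two hex numbers) to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(value):
    """Convert a 64-bit integer back to the 'X/Y' hex notation."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def segment_start(lsn_text, segment_size=WAL_SEGMENT_SIZE):
    """Return the LSN of the first byte of the segment containing lsn_text."""
    lsn = parse_lsn(lsn_text)
    return format_lsn(lsn - lsn % segment_size)


print(segment_start("0/2EDFF8C8"))  # 0/2E000000
```

With the replay LSN from Step 3 as input, the result matches the
pg_last_wal_receive_lsn() value shown above.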

Step 4: Make some updates in the primary: "pgbench -T5" should do it. Wait
for changes to finish.

Step 5: Restart the standby again.

@primary:
postgres=# select slot_name, restart_lsn, active, active_pid from
pg_replication_slots ;
 slot_name | restart_lsn | active | active_pid
-----------+-------------+--------+------------
 slot1     | 0/2FDE97A8  | f      |          ~

@standby:
postgres=# select pg_last_wal_replay_lsn(), pg_last_wal_receive_lsn();
 pg_last_wal_replay_lsn | pg_last_wal_receive_lsn
------------------------+-------------------------
 0/2EE05DB8             | ~
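
For reference, the gap between the slot's restart_lsn on the primary and the
standby's replay LSN in Step 5 can be computed from the two printed values.
This is a rough sketch only; lsn_diff_bytes is a hypothetical helper name,
and the subtraction is the same one pg_wal_lsn_diff() performs on the server.
Reading the gap as "WAL the standby already holds locally but has not yet
replayed" is an assumption on my part, not something the output proves.

```python
def parse_lsn(text):
    """Convert an LSN printed as 'X/Y' (two hex numbers) to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff_bytes(later, earlier):
    """Number of WAL bytes between two LSNs (later minus earlier)."""
    return parse_lsn(later) - parse_lsn(earlier)


# Values taken from the Step 5 output: slot restart_lsn vs. standby replay LSN.
gap = lsn_diff_bytes("0/2FDE97A8", "0/2EE05DB8")
print(gap)  # 16660976
```

That is a little under 16 MB of WAL between the two positions.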


Re: BUG #16879: Delayed standby does not connect to primary on startup

Euler Taveira-3
On Sun, Feb 21, 2021, at 6:39 AM, PG Bug reporting form wrote:
> We have a customer who reported pg_last_wal_receive_lsn() returning NULL on
> a delayed standby, and this is what the investigation led to.

This is not a bug. It is working as documented.

"If streaming replication is disabled, or if it has not yet started, the
function returns NULL." [1]

In this scenario, the server was (re)started but cannot start streaming
because of the recovery_min_apply_delay setting; hence,
pg_last_wal_receive_lsn() returns NULL. When the standby applies the first
transaction (after 1h), this function will return a non-NULL value.
                                                                                   


--
Euler Taveira