Lag clarification with Sync Replication

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Lag clarification with Sync Replication

Raj kumar
I have made a synchronous replication setup with PG12.
During data loading of 200 gig in master node, I found that there is a lag that it shows on the slave. But, ideally for sync replication, lag should be always 0 right.
Because commit never happens in master before updating the slave.
 log_delay
------------
 871.549671
(1 row) log_delay
------------
 871.557246
(1 row) log_delay
-----------
         0
(1 row) log_delay
-----------
         0
(1 row)

Query Used for checking Standby Replication Lag:
psql -d perfbench -p 5432 -c "SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;"

Setting Used in PG12

 cat postgresql.conf |grep synchronous_

synchronous_commit = remote_write                # synchronization level;

synchronous_standby_names = 'ANY 1 (standby1,standby2)'        # standby servers that provide sync rep


cat postgresql.conf |grep primary_conn

primary_conninfo = 'user=edbrepl host= 10.12.11.101 port=5432 application_name=standby1 '


Thanks,
Raj Kumar
Reply | Threaded
Open this post in threaded view
|

Re: Lag clarification with Sync Replication

Scott Ribe-2
Your query is unrelated to whether or not the commit has completed in the master.



Reply | Threaded
Open this post in threaded view
|

Re: Lag clarification with Sync Replication

Keith
In reply to this post by Raj kumar


On Fri, May 22, 2020 at 12:36 AM Raj kumar <[hidden email]> wrote:
I have made a synchronous replication setup with PG12.
During data loading of 200 gig in master node, I found that there is a lag that it shows on the slave. But, ideally for sync replication, lag should be always 0 right.
Because commit never happens in master before updating the slave.
 log_delay
------------
 871.549671
(1 row) log_delay
------------
 871.557246
(1 row) log_delay
-----------
         0
(1 row) log_delay
-----------
         0
(1 row)

Query Used for checking Standby Replication Lag:
psql -d perfbench -p 5432 -c "SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
THEN 0
ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
END AS log_delay;"

Setting Used in PG12

 cat postgresql.conf |grep synchronous_

synchronous_commit = remote_write                # synchronization level;

synchronous_standby_names = 'ANY 1 (standby1,standby2)'        # standby servers that provide sync rep


cat postgresql.conf |grep primary_conn

primary_conninfo = 'user=edbrepl host= 10.12.11.101 port=5432 application_name=standby1 '


Thanks,
Raj Kumar


For synchronous replication, and any replication really, I would recommend monitoring byte lag from the primary. This is much more accurate and, if the count of replicas goes down from an expected number, you'll know it's been disconnected as well. Monitoring replication from the replica can be done, but it has caveats and shouldn't be the only way it's monitored

Wrote a short post on this a while ago. See relavant updates further down - https://www.keithf4.com/monitoring_streaming_slave_lag/
Reply | Threaded
Open this post in threaded view
|

Re: Lag clarification with Sync Replication

Rui DeSousa
In reply to this post by Raj kumar


On May 22, 2020, at 12:36 AM, Raj kumar <[hidden email]> wrote:

. But, ideally for sync replication, lag should be always 0 right.

Incorrect.  Synchronous replication means that a commit will not return until it has been safely written to disk on the primary and the replica.  That means the transaction is written to WAL file on both primary and replica.  On the primary, the transaction is also visible to transactions with a later xmin.  On the replica, the transaction has been recorded in the WAL; it still needs to get applied to the database for it to become visible to read transactions.

Also, changes could occur faster on the primary where the replica could be lagging behind — either by lagging in receiving the updates or writing the information to the WAL.  What will happen is when the commit is iussed; the session will hang until replica catches up and writes the all the data need safely to the WAL.

I would look at pg_stat_replication view on the primary/upstream provider.  Here’s a query that make it easier.

Sent_lag — data that still needs to be sent to the replica
Flush_lag —data that still has not been fsync’d to WAL files yet
Replay_lag — data that still need to be applied to the database so that it can be queried

select pg_stat_replication.client_addr
  , pg_stat_replication.application_name
  , pg_stat_replication.sync_priority
  , pg_stat_replication.sync_state
  , pg_stat_replication.state
  , case pg_is_in_recovery()
      when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), sent_lsn))
      else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn))
    end as sent_lag
  , case pg_is_in_recovery()
      when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), flush_lsn))
      else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)) 
    end as flush_lag
  , case pg_is_in_recovery()
      when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), replay_lsn)) 
      else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn))
    end as replay_lag
from pg_stat_replication
;

Reply | Threaded
Open this post in threaded view
|

Re: Lag clarification with Sync Replication

Raj kumar
Thanks Rui, Keith and Scott for nice explanation and blog :)

Thanks,
Raj


On Fri, May 22, 2020 at 10:18 PM Rui DeSousa <[hidden email]> wrote:


On May 22, 2020, at 12:36 AM, Raj kumar <[hidden email]> wrote:

. But, ideally for sync replication, lag should be always 0 right.

Incorrect.  Synchronous replication means that a commit will not return until it has been safely written to disk on the primary and the replica.  That means the transaction is written to WAL file on both primary and replica.  On the primary, the transaction is also visible to transactions with a later xmin.  On the replica, the transaction has been recorded in the WAL; it still needs to get applied to the database for it to become visible to read transactions.

Also, changes could occur faster on the primary where the replica could be lagging behind — either by lagging in receiving the updates or writing the information to the WAL.  What will happen is when the commit is iussed; the session will hang until replica catches up and writes the all the data need safely to the WAL.

I would look at pg_stat_replication view on the primary/upstream provider.  Here’s a query that make it easier.

Sent_lag — data that still needs to be sent to the replica
Flush_lag —data that still has not been fsync’d to WAL files yet
Replay_lag — data that still need to be applied to the database so that it can be queried

select pg_stat_replication.client_addr
  , pg_stat_replication.application_name
  , pg_stat_replication.sync_priority
  , pg_stat_replication.sync_state
  , pg_stat_replication.state
  , case pg_is_in_recovery()
      when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), sent_lsn))
      else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn))
    end as sent_lag
  , case pg_is_in_recovery()
      when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), flush_lsn))
      else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn)) 
    end as flush_lag
  , case pg_is_in_recovery()
      when true then pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_receive_lsn(), replay_lsn)) 
      else pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn))
    end as replay_lag
from pg_stat_replication
;

Reply | Threaded
Open this post in threaded view
|

Re: Lag clarification with Sync Replication

Laurenz Albe
In reply to this post by Rui DeSousa
On Fri, 2020-05-22 at 12:48 -0400, Rui DeSousa wrote:
> > On May 22, 2020, at 12:36 AM, Raj kumar <[hidden email]> wrote:
> >
> > . But, ideally for sync replication, lag should be always 0 right.
>
> Incorrect.  Synchronous replication means that a commit will not return until it has been safely written to disk on the primary and the replica.  That means the transaction is written to WAL file on
> both primary and replica.  On the primary, the transaction is also visible to transactions with a later xmin.  On the replica, the transaction has been recorded in the WAL; it still needs to get
> applied to the database for it to become visible to read transactions.

If you set "synchronous_commit = remote_apply", the commit will only return
when the change has been replayed on the synchronous standby server.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com