PostgreSQL logical replication slot LSN values

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

PostgreSQL logical replication slot LSN values

Rashmi V Bharadwaj
Hi,

We have an application that uses the PostgreSQL logical replication API to read the changes made to the PostgreSQL database and applies it to a different database (like Db2 etc). We are using logical replication slots for this. Currently I am facing an issue where the replication slot is pointing to an older restart_lsn and confirmed_flush_lsn (like 10 days back) and the corresponding WAL file is already deleted from the pg_wal directory. Ideally this should not happen, right? since the slot is holding this LSN the wal file should not have been deleted. Now when I try to use query like

select * from pg_logical_slot_get_changes(<slot_name>,<LSN>, NULL)

or use the logical replication API with a start position as any newer LSN, I get the following error:

ERROR: requested WAL segment pg_wal/000000010000000000000036 has already been removed
SQL state: 58P01.

How do I get past this issue? I have not enabled log archiving. I would also like to know how I can modify the restart_lsn and confirmed_flush_lsn positions of the slot?

Thanks,
Rashmi

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL logical replication slot LSN values

Andres Freund
Hi,

(please don't send HTML only emails to this list)

On 2019-03-12 11:08:56 +0000, Rashmi V Bharadwaj wrote:
> We have an application that uses the PostgreSQL logical replication API to read
> the changes made to the PostgreSQL database and applies it to a different
> database (like Db2 etc). We are using logical replication slots for
> this.

Cool.


> Currently I am facing an issue where the replication slot is pointing to an
> older restart_lsn and confirmed_flush_lsn (like 10 days back) and the
> corresponding WAL file is already deleted from the pg_wal directory. Ideally
> this should not happen, right?

Well, did you consume the logical data, and if so how? When you use the
streaming interface - HIGHLY recommended - you need to send feedback
messages as to where you've received the data.


> since the slot is holding this LSN the wal file
> should not have been deleted. Now when I try to use query like
> select * from pg_logical_slot_get_changes(<slot_name>,<LSN>, NULL)
>

> or use the logical replication API with a start position as any newer LSN, I
> get the following error:
>
> ERROR: requested WAL segment pg_wal/000000010000000000000036 has already been
> removed
> SQL state: 58P01.

Hm, that should not happen. Did you by any chance externally (manually
or by script) delete WAL files?


> How do I get past this issue? I have not enabled log archiving. I would also
> like to know how I can modify the restart_lsn and confirmed_flush_lsn positions
> of the slot?

You need to send feedback messages confirming up to wher eyou've
consumed the data when using the streaming protocol. When using the SQL
functions the _get_ function confirms when it returns, the _peek_
function never does so. It's recommended to limit the size of the
resultset a bit using the nchanges paramter so you can call it in
smaller increments.

Greetings,

Andres Freund

Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL logical replication slot LSN values

Rashmi V Bharadwaj
Hi,

> Well, did you consume the logical data, and if so how? When you use the
> streaming interface - HIGHLY recommended - you need to send feedback
> messages as to where you've received the data.
     Yes, I am consuming data using the PGReplicationStream.readPending() method in my program.


> Hm, that should not happen. Did you by any chance externally (manually
> or by script) delete WAL files?
        No, I am not deleting the WAL files externally.


> You need to send feedback messages confirming up to wher eyou've
> consumed the data when using the streaming protocol. When using the SQL
> functions the _get_ function confirms when it returns, the _peek_
> function never does so. It's recommended to limit the size of the
> resultset a bit using the nchanges paramter so you can call it in
> smaller increments.
      I am sending feedback messages using method PGReplicationStream.setAppliedLSN(<LSN>). I cannot do PGReplicationStream.setFlushedLSN(<LSN>) as I want the option to go back and read data in case of data loss. I have a separate utility that can be used to set the flush_lsn position periodically.


Currently since the WAL file was deleted by postgresql, I am not able to move forward at all - replication thru the SQL or Java API is not happening. I tried doing a PGReplicationStream.setAppliedLSN and PGReplicationStream.setFlushedLSN (to current LSN) thru another Java program for the same replication slot, but that didn't work. It still gives the WAL segment already removed error.

Could you please suggest a solution for this? Is there a way to set the restart_lsn and flush_lsn of slot? Or is recreating the slot the only possible solution?


Thanks,
Rashmi



-----Andres Freund <[hidden email]> wrote: -----
To: Rashmi V Bharadwaj <[hidden email]>
From: Andres Freund <[hidden email]>
Date: 12/03/2019 09:07PM
Cc: [hidden email]
Subject: Re: PostgreSQL logical replication slot LSN values

Hi,

(please don't send HTML only emails to this list)

On 2019-03-12 11:08:56 +0000, Rashmi V Bharadwaj wrote:
> We have an application that uses the PostgreSQL logical replication API to read
> the changes made to the PostgreSQL database and applies it to a different
> database (like Db2 etc). We are using logical replication slots for
> this.

Cool.


> Currently I am facing an issue where the replication slot is pointing to an
> older restart_lsn and confirmed_flush_lsn (like 10 days back) and the
> corresponding WAL file is already deleted from the pg_wal directory. Ideally
> this should not happen, right?

Well, did you consume the logical data, and if so how? When you use the
streaming interface - HIGHLY recommended - you need to send feedback
messages as to where you've received the data.


> since the slot is holding this LSN the wal file
> should not have been deleted. Now when I try to use query like
> select * from pg_logical_slot_get_changes(<slot_name>,<LSN>, NULL)
>

> or use the logical replication API with a start position as any newer LSN, I
> get the following error:
>
> ERROR: requested WAL segment pg_wal/000000010000000000000036 has already been
> removed
> SQL state: 58P01.

Hm, that should not happen. Did you by any chance externally (manually
or by script) delete WAL files?


> How do I get past this issue? I have not enabled log archiving. I would also
> like to know how I can modify the restart_lsn and confirmed_flush_lsn positions
> of the slot?

You need to send feedback messages confirming up to wher eyou've
consumed the data when using the streaming protocol. When using the SQL
functions the _get_ function confirms when it returns, the _peek_
function never does so. It's recommended to limit the size of the
resultset a bit using the nchanges paramter so you can call it in
smaller increments.

Greetings,

Andres Freund



Reply | Threaded
Open this post in threaded view
|

Re: PostgreSQL logical replication slot LSN values

Rashmi V Bharadwaj
The PGReplicationStream.setAppliedLSN is only applicable for physical replication, right? So this may not actually make a difference for my logical replication program.
Periodically running the utility for setting the flush LSN using PGReplicationStream.setFlushedLSN should still work for the feedback mechanism right?

Thanks,
Rashmi

-----"Rashmi V Bharadwaj" <[hidden email]> wrote: -----
To: Andres Freund <[hidden email]>
From: "Rashmi V Bharadwaj" <[hidden email]>
Date: 13/03/2019 10:59AM
Cc: [hidden email]
Subject: Re: PostgreSQL logical replication slot LSN values

Hi,

> Well, did you consume the logical data, and if so how? When you use the
> streaming interface - HIGHLY recommended - you need to send feedback
> messages as to where you've received the data.
     Yes, I am consuming data using the PGReplicationStream.readPending() method in my program.


> Hm, that should not happen. Did you by any chance externally (manually
> or by script) delete WAL files?
        No, I am not deleting the WAL files externally.


> You need to send feedback messages confirming up to wher eyou've
> consumed the data when using the streaming protocol. When using the SQL
> functions the _get_ function confirms when it returns, the _peek_
> function never does so. It's recommended to limit the size of the
> resultset a bit using the nchanges paramter so you can call it in
> smaller increments.
      I am sending feedback messages using method PGReplicationStream.setAppliedLSN(<LSN>). I cannot do PGReplicationStream.setFlushedLSN(<LSN>) as I want the option to go back and read data in case of data loss. I have a separate utility that can be used to set the flush_lsn position periodically.


Currently since the WAL file was deleted by postgresql, I am not able to move forward at all - replication thru the SQL or Java API is not happening. I tried doing a PGReplicationStream.setAppliedLSN and PGReplicationStream.setFlushedLSN (to current LSN) thru another Java program for the same replication slot, but that didn't work. It still gives the WAL segment already removed error.

Could you please suggest a solution for this? Is there a way to set the restart_lsn and flush_lsn of slot? Or is recreating the slot the only possible solution?


Thanks,
Rashmi



-----Andres Freund <[hidden email]> wrote: -----
To: Rashmi V Bharadwaj <[hidden email]>
From: Andres Freund <[hidden email]>
Date: 12/03/2019 09:07PM
Cc: [hidden email]
Subject: Re: PostgreSQL logical replication slot LSN values

Hi,

(please don't send HTML only emails to this list)

On 2019-03-12 11:08:56 +0000, Rashmi V Bharadwaj wrote:
> We have an application that uses the PostgreSQL logical replication API to read
> the changes made to the PostgreSQL database and applies it to a different
> database (like Db2 etc). We are using logical replication slots for
> this.

Cool.


> Currently I am facing an issue where the replication slot is pointing to an
> older restart_lsn and confirmed_flush_lsn (like 10 days back) and the
> corresponding WAL file is already deleted from the pg_wal directory. Ideally
> this should not happen, right?

Well, did you consume the logical data, and if so how? When you use the
streaming interface - HIGHLY recommended - you need to send feedback
messages as to where you've received the data.


> since the slot is holding this LSN the wal file
> should not have been deleted. Now when I try to use query like
> select * from pg_logical_slot_get_changes(<slot_name>,<LSN>, NULL)
>

> or use the logical replication API with a start position as any newer LSN, I
> get the following error:
>
> ERROR: requested WAL segment pg_wal/000000010000000000000036 has already been
> removed
> SQL state: 58P01.

Hm, that should not happen. Did you by any chance externally (manually
or by script) delete WAL files?


> How do I get past this issue? I have not enabled log archiving. I would also
> like to know how I can modify the restart_lsn and confirmed_flush_lsn positions
> of the slot?

You need to send feedback messages confirming up to wher eyou've
consumed the data when using the streaming protocol. When using the SQL
functions the _get_ function confirms when it returns, the _peek_
function never does so. It's recommended to limit the size of the
resultset a bit using the nchanges paramter so you can call it in
smaller increments.

Greetings,

Andres Freund