pg_rewind success even though getting error 'record with incorrect prev-link'

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

pg_rewind success even though getting error 'record with incorrect prev-link'

Abdullah Al Maruf
When I try to attach an old master with 'archiving set to on` as a new standby, `pg_rewind` doesn't throw any error, But, when the database starts, The following error appears:

```
LOG:  invalid record length at 0/B000098: wanted 24, got 0
LOG:  started streaming WAL from primary at 0/B000000 on timeline 2
LOG:  record with incorrect prev-link 10000/21B0000 at 0/B000098
FATAL:  terminating walreceiver process due to administrator command
LOG:  record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG:  record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG:  record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG:  record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG:  record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG:  record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG:  record with incorrect prev-link 10000/21B0000 at 0/B000098
```
Reply | Threaded
Open this post in threaded view
|

Re: pg_rewind success even though getting error 'record with incorrect prev-link'

Michael Paquier-2
On Tue, Jan 29, 2019 at 07:13:11PM +0600, Abdullah Al Maruf wrote:
> When I try to attach an old master with 'archiving set to on` as a new
> standby, `pg_rewind` doesn't throw any error, But, when the database
> starts, The following error appears:
>
> ```
> LOG:  invalid record length at 0/B000098: wanted 24, got 0
> LOG:  started streaming WAL from primary at 0/B000000 on timeline 2
> LOG:  record with incorrect prev-link 10000/21B0000 at 0/B000098
> FATAL:  terminating walreceiver process due to administrator command

This is pointing out to the end of WAL for the current timeline.  You
may face it after reading a WAL segment in an area which has been used
in the past for a recycled segment.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_rewind success even though getting error 'record with incorrect prev-link'

Abdullah Al Maruf
Hi Michael

> This is pointing out to the end of WAL for the current timeline.  You
> may face it after reading a WAL segment in an area which has been used
> in the past for a recycled segment.

Are you talking about error ` LOG:  invalid record length at 0/B000098: wanted 24, got 0` ? 
or,
 `LOG:  record with incorrect prev-link 10000/21B0000 at 0/B000098`

Actually, the 1st error is not making any issue. This node starts to streaming from primary successfully. 
But when the second error comes, It appears every 5 seconds. And, the node is not streaming from master. 

pg_rewind still resolves timeline conflict, but it's not fixing this second error.

Any work around?? 
----------------
My scenario, in short, I have 1 master nodes (0th node) and three standby nodes (1st,

2nd & 3rd node). When I make the 3rd node as master (by trigger file) and
restarts 0th node as a replica, It shows no problem.

But when both nodes are offline and our leader selection chooses the 0th node
as a master, and tries to reattach the 3rd node as Replica, It throws an
error similar to:

```
LOG: invalid record length at 0/B000098: wanted 24, got 0
LOG: started streaming WAL from primary at 0/B000000 on timeline 2
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
FATAL: terminating walreceiver process due to administrator command
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098
```

If I disable archive_mode, I never faced this error with the same script. It
only appears when archive is on, and also not all the times it happens but
most of the time it does.
The error message appears after every 5 seconds.

Scenario In details:

I have two folders for scripts.

├── primary
│ ├── postgresql.conf
│ ├── restore.sh
│ ├── run.sh
│ └── start.sh
└── replica
├── recovery.conf
└── run.sh

I have a system that will choose the leader. If the current pod is the leader,
it will run `primary/run.sh`, and If it is a replica, it will run
`replica/run.sh`.

The files can be found here.



On Wed, Jan 30, 2019 at 7:44 AM Michael Paquier <[hidden email]> wrote:
On Tue, Jan 29, 2019 at 07:13:11PM +0600, Abdullah Al Maruf wrote:
> When I try to attach an old master with 'archiving set to on` as a new
> standby, `pg_rewind` doesn't throw any error, But, when the database
> starts, The following error appears:
>
> ```
> LOG:  invalid record length at 0/B000098: wanted 24, got 0
> LOG:  started streaming WAL from primary at 0/B000000 on timeline 2
> LOG:  record with incorrect prev-link 10000/21B0000 at 0/B000098
> FATAL:  terminating walreceiver process due to administrator command

This is pointing out to the end of WAL for the current timeline.  You
may face it after reading a WAL segment in an area which has been used
in the past for a recycled segment.
--
Michael
Reply | Threaded
Open this post in threaded view
|

Re: pg_rewind success even though getting error 'record with incorrect prev-link'

Ron-2
On 1/29/19 9:57 PM, Abdullah Al Maruf wrote:
Hi Michael

> This is pointing out to the end of WAL for the current timeline.  You
> may face it after reading a WAL segment in an area which has been used
> in the past for a recycled segment.

Are you talking about error ` LOG:  invalid record length at 0/B000098: wanted 24, got 0` ? 
or,
 `LOG:  record with incorrect prev-link 10000/21B0000 at 0/B000098`

Actually, the 1st error is not making any issue. This node starts to streaming from primary successfully. 
But when the second error comes, It appears every 5 seconds. And, the node is not streaming from master. 

pg_rewind still resolves timeline conflict, but it's not fixing this second error.

Any work around?? 
----------------
My scenario, in short, I have 1 master nodes (0th node) and three standby nodes (1st,

2nd & 3rd node). When I make the 3rd node as master (by trigger file) and restarts 0th node as a replica, It shows no problem.

But when both nodes are offline and our leader selection chooses the 0th node as a master, and tries to reattach the 3rd node as Replica, It throws an error similar to:

``` LOG: invalid record length at 0/B000098: wanted 24, got 0 LOG: started streaming WAL from primary at 0/B000000 on timeline 2 LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098 FATAL: terminating walreceiver process due to administrator command LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098 LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098 LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098 LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098 LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098 LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098 LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098 LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098 LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098 LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098 LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098 LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098 LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098 LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098 LOG: record with incorrect prev-link 10000/21B0000 at 0/B000098


The only error I see is when you apparently manually kill the process.  The LOG messages aren't actually errors.

--
Angular momentum makes the world go 'round.
Reply | Threaded
Open this post in threaded view
|

Re: pg_rewind success even though getting error 'record with incorrect prev-link'

Abdullah Al Maruf
> The only error I see is when you apparently manually kill the process.

You mean walreceiver process?? 'FATAL: terminating walreceiver process due to administrator command' ?
Actually, I didn't kill the receiver. It is done by postgres itself, as far as I understand.

I restart this node using a replica script. You can see the script here. https://github.com/kubedb/postgres/blob/pg-db/hack/docker/postgres/9.6.7/scripts/replica/run.sh
It then gives the above error, then stops streaming from primary (or, you can see the wal receiver is terminated by itself).

If I restart this node again, It throws following error:

LOG:  skipping missing configuration file "/etc/config/user.conf"
LOG:  skipping missing configuration file "/etc/config/user.conf"
LOG:  00000: database system was shut down in recovery at 2019-01-23 05:12:17 UTC
LOG:  00000: entering standby mode
LOG:  00000: invalid resource manager ID in primary checkpoint record
PANIC:  XX000: could not locate a valid checkpoint record
LOG:  00000: startup process (PID 33) was terminated by signal 6
LOG:  00000: aborting startup due to startup process failure
LOG:  00000: database system is shut down

On Wed, Jan 30, 2019 at 12:02 PM Ron <[hidden email]> wrote:
The only error I see is when you apparently manually kill the process.  The LOG messages aren't actually errors.

--
Angular momentum makes the world go 'round.