[HELP] Issue with standby server using WAL archive

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

[HELP] Issue with standby server using WAL archive

Praveen Kumar K S
Hello,

I'm facing this issue with a standby server which is setup to use WAL archive.

Config on Master:
archive_mode = on
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

Config on standby:
standby_mode = 'on'
restore_command = 'cp /mnt/server/archivedir/%f %p'

Before starting standby, I took base backup

pg_basebackup -v -D main -R -P -h pg1 -p 5432 -U replication
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
43518/43518 kB (100%), 1/1 tablespace                                        
NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
pg_basebackup: base backup completed

Now, copied recovery.conf to data directory

Started postgres server on standby and seeing below error in log.

2020-07-02 13:14:02.134 UTC [7643] postgres@template1 FATAL:  the database system is starting up
2020-07-02 13:14:02.644 UTC [7646] postgres@template1 FATAL:  the database system is starting up
2020-07-02 13:14:03.154 UTC [7649] postgres@template1 FATAL:  the database system is starting up
2020-07-02 13:14:03.664 UTC [7652] postgres@template1 FATAL:  the database system is starting up
2020-07-02 13:14:04.174 UTC [7655] postgres@template1 FATAL:  the database system is starting up
cp: cannot stat '/mnt/server/archivedir/00000001000000000000000D': No such file or directory
2020-07-02 13:14:04.684 UTC [7660] postgres@template1 FATAL:  the database system is starting up
2020-07-02 13:14:05.194 UTC [7663] postgres@template1 FATAL:  the database system is starting up
2020-07-02 13:14:05.197 UTC [7664] [unknown]@[unknown] LOG:  incomplete startup packet
cp: cannot stat '/mnt/server/archivedir/00000001000000000000000D': No such file or directory
cp: cannot stat '/mnt/server/archivedir/00000001000000000000000D': No such file or directory

On Master:
postgres@pg1:~$ ls -lrt /var/lib/postgresql/9.6/main/pg_xlog/
total 65544
-rw------- 1 postgres postgres 16777216 Jul  2 13:09 00000001000000000000000E
-rw------- 1 postgres postgres 16777216 Jul  2 13:13 00000001000000000000000F
-rw------- 1 postgres postgres 16777216 Jul  2 13:13 00000001000000000000000C
-rw------- 1 postgres postgres      302 Jul  2 13:13 00000001000000000000000C.00000028.backup
drwx------ 2 postgres postgres     4096 Jul  2 13:18 archive_status
-rw------- 1 postgres postgres 16777216 Jul  2 13:18 00000001000000000000000D

postgres@pg1:~$ ls -lrt /var/lib/postgresql/9.6/main/pg_xlog/archive_status/
total 0
-rw------- 1 postgres postgres 0 Jul  2 13:13 00000001000000000000000C.done
-rw------- 1 postgres postgres 0 Jul  2 13:13 00000001000000000000000C.00000028.backup.done

On standby:
postgres@pg3:~$ ls -lrt /var/lib/postgresql/9.6/main/pg_xlog/
total 16388
-rw------- 1 postgres postgres 16777216 Jul  2 13:13 00000001000000000000000C
drwx------ 2 postgres postgres     4096 Jul  2 13:13 archive_status
postgres@pg3:~$ ls -lrt /var/lib/postgresql/9.6/main/pg_xlog/archive_status/
total 0
-rw------- 1 postgres postgres 0 Jul  2 13:13 00000001000000000000000C.done

/mnt/server/archivedir/ is a NAS mount and is accessible from master and standby.

postgres@pg3:~$ ls -lrt /mnt/server/archivedir/
total 32788
drwx------ 2 postgres postgres    16384 Jul  1 14:44 lost+found
-rw------- 1 postgres postgres 16777216 Jul  2 13:13 00000001000000000000000B
-rw------- 1 postgres postgres 16777216 Jul  2 13:13 00000001000000000000000C
-rw------- 1 postgres postgres      302 Jul  2 13:13 00000001000000000000000C.00000028.backup

Am I missing anything here ? Please suggest. TIA.

PS: I have another slave streaming using synchronous replication and not WAL archive and is working fine.

--
Regards,

K S Praveen Kumar

Reply | Threaded
Open this post in threaded view
|

Re: [HELP] Issue with standby server using WAL archive

Stephen Frost
Greetings,

* Praveen Kumar K S ([hidden email]) wrote:
> Config on Master:
> archive_mode = on
> archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p
> /mnt/server/archivedir/%f'

You really shouldn't use 'cp' or such a simple archive command, you'll
very likely lose WAL on a system crash with that setup.

> Config on standby:
> standby_mode = 'on'
> restore_command = 'cp /mnt/server/archivedir/%f %p'
>
> Before starting standby, I took base backup
>
> pg_basebackup -v -D main -R -P -h pg1 -p 5432 -U replication
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> 43518/43518 kB (100%), 1/1 tablespace
>
> NOTICE:  pg_stop_backup complete, all required WAL segments have been
> archived
> pg_basebackup: base backup completed
>
> Now, copied recovery.conf to data directory
>
> Started postgres server on standby and seeing below error in log.
>
> 2020-07-02 13:14:02.134 UTC [7643] postgres@template1 FATAL:  the database
> system is starting up
> 2020-07-02 13:14:02.644 UTC [7646] postgres@template1 FATAL:  the database
> system is starting up
> 2020-07-02 13:14:03.154 UTC [7649] postgres@template1 FATAL:  the database
> system is starting up
> 2020-07-02 13:14:03.664 UTC [7652] postgres@template1 FATAL:  the database
> system is starting up
> 2020-07-02 13:14:04.174 UTC [7655] postgres@template1 FATAL:  the database
> system is starting up
> cp: cannot stat '/mnt/server/archivedir/00000001000000000000000D': No such
> file or directory
> 2020-07-02 13:14:04.684 UTC [7660] postgres@template1 FATAL:  the database
> system is starting up
> 2020-07-02 13:14:05.194 UTC [7663] postgres@template1 FATAL:  the database
> system is starting up
> 2020-07-02 13:14:05.197 UTC [7664] [unknown]@[unknown] LOG:  incomplete
> startup packet
> cp: cannot stat '/mnt/server/archivedir/00000001000000000000000D': No such
> file or directory
> cp: cannot stat '/mnt/server/archivedir/00000001000000000000000D': No such
> file or directory

> Am I missing anything here ? Please suggest. TIA.

Do you have hot_standby enabled in postgresql.conf?  What version of PG?

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [HELP] Issue with standby server using WAL archive

Praveen Kumar K S
Hello,

Thanks for your response.

I'm using 9.6

I have 3 servers. Let us call it pg1,pg2,pg3

pg1 is master
pg2 is hot stand by
pg3 is a DR server which doesn't serve any requests. hot_standby is not enabled in postgresql.conf on DR server.

pg1 and pg2 are in the same network and are in sync.

I followed official documentation and used the cp command. Are there any other best approaches ?

On Thu, Jul 2, 2020 at 7:44 PM Stephen Frost <[hidden email]> wrote:
Greetings,

* Praveen Kumar K S ([hidden email]) wrote:
> Config on Master:
> archive_mode = on
> archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p
> /mnt/server/archivedir/%f'

You really shouldn't use 'cp' or such a simple archive command, you'll
very likely lose WAL on a system crash with that setup.

> Config on standby:
> standby_mode = 'on'
> restore_command = 'cp /mnt/server/archivedir/%f %p'
>
> Before starting standby, I took base backup
>
> pg_basebackup -v -D main -R -P -h pg1 -p 5432 -U replication
> pg_basebackup: initiating base backup, waiting for checkpoint to complete
> pg_basebackup: checkpoint completed
> 43518/43518 kB (100%), 1/1 tablespace
>
> NOTICE:  pg_stop_backup complete, all required WAL segments have been
> archived
> pg_basebackup: base backup completed
>
> Now, copied recovery.conf to data directory
>
> Started postgres server on standby and seeing below error in log.
>
> 2020-07-02 13:14:02.134 UTC [7643] postgres@template1 FATAL:  the database
> system is starting up
> 2020-07-02 13:14:02.644 UTC [7646] postgres@template1 FATAL:  the database
> system is starting up
> 2020-07-02 13:14:03.154 UTC [7649] postgres@template1 FATAL:  the database
> system is starting up
> 2020-07-02 13:14:03.664 UTC [7652] postgres@template1 FATAL:  the database
> system is starting up
> 2020-07-02 13:14:04.174 UTC [7655] postgres@template1 FATAL:  the database
> system is starting up
> cp: cannot stat '/mnt/server/archivedir/00000001000000000000000D': No such
> file or directory
> 2020-07-02 13:14:04.684 UTC [7660] postgres@template1 FATAL:  the database
> system is starting up
> 2020-07-02 13:14:05.194 UTC [7663] postgres@template1 FATAL:  the database
> system is starting up
> 2020-07-02 13:14:05.197 UTC [7664] [unknown]@[unknown] LOG:  incomplete
> startup packet
> cp: cannot stat '/mnt/server/archivedir/00000001000000000000000D': No such
> file or directory
> cp: cannot stat '/mnt/server/archivedir/00000001000000000000000D': No such
> file or directory

> Am I missing anything here ? Please suggest. TIA.

Do you have hot_standby enabled in postgresql.conf?  What version of PG?

Thanks,

Stephen


--
Regards,

K S Praveen Kumar

Reply | Threaded
Open this post in threaded view
|

Re: [HELP] Issue with standby server using WAL archive

Stephen Frost
Greetings,

* Praveen Kumar K S ([hidden email]) wrote:
> I'm using 9.6

That's quite old- I'd strongly recommend that you look at using a newer
version.

> I have 3 servers. Let us call it pg1,pg2,pg3
>
> pg1 is master
> pg2 is hot stand by
> pg3 is a DR server which doesn't serve any requests. hot_standby is not
> enabled in postgresql.conf on DR server.

If hot_standby isn't enable then you won't be able to connect- which is
entirely fine, but if that's what you want then I'm not sure what the
error is that you're concerned about..?

> pg1 and pg2 are in the same network and are in sync.

Ok.  Note that having only one replica when you're using syncronous
replication will lead to a situation where, if either node fails, the
database system won't accept writes.

> I followed official documentation and used the cp command. Are there any
> other best approaches ?

The documentation provides an example of how to use the command, using
'cp' isn't actually recommended.  I'd strongly suggest you look at
proper backup/archiving solutions for PostgreSQL like pgbackrest to use
instead.

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [HELP] Issue with standby server using WAL archive

Praveen Kumar K S
Hello,

Thanks for your response. I agree with all your suggestions.

At this moment, I'm concerned about the WAL archiving.

I'm seeing below error on pg3 (DR server) in log. Is this expected ?
cp: cannot stat '/mnt/server/archivedir/00000001000000000000000D': No such file or directory

On master, I see below info. I would like to know why the WALs are not being archived.

postgres@pg1:~/9.6/main/pg_xlog$ ls -lrth
total 65M
-rw------- 1 postgres postgres  16M Jul  2 13:09 00000001000000000000000E
-rw------- 1 postgres postgres  16M Jul  2 13:13 00000001000000000000000F
-rw------- 1 postgres postgres  16M Jul  2 13:13 000000010000000000000010
-rw------- 1 postgres postgres  302 Jul  2 13:13 00000001000000000000000C.00000028.backup
drwx------ 2 postgres postgres 4.0K Jul  2 13:23 archive_status
-rw------- 1 postgres postgres  16M Jul  2 15:38 00000001000000000000000D

postgres@pg1:~/9.6/main/pg_xlog$ ls -lrt /mnt/server/archivedir/
total 32788
drwx------ 2 postgres postgres    16384 Jul  1 14:44 lost+found
-rw------- 1 postgres postgres 16777216 Jul  2 13:13 00000001000000000000000B
-rw------- 1 postgres postgres 16777216 Jul  2 13:13 00000001000000000000000C
-rw------- 1 postgres postgres      302 Jul  2 13:13 00000001000000000000000C.00000028.backup


On Thu, Jul 2, 2020 at 8:56 PM Stephen Frost <[hidden email]> wrote:
Greetings,

* Praveen Kumar K S ([hidden email]) wrote:
> I'm using 9.6

That's quite old- I'd strongly recommend that you look at using a newer
version.

> I have 3 servers. Let us call it pg1,pg2,pg3
>
> pg1 is master
> pg2 is hot stand by
> pg3 is a DR server which doesn't serve any requests. hot_standby is not
> enabled in postgresql.conf on DR server.

If hot_standby isn't enable then you won't be able to connect- which is
entirely fine, but if that's what you want then I'm not sure what the
error is that you're concerned about..?

> pg1 and pg2 are in the same network and are in sync.

Ok.  Note that having only one replica when you're using syncronous
replication will lead to a situation where, if either node fails, the
database system won't accept writes.

> I followed official documentation and used the cp command. Are there any
> other best approaches ?

The documentation provides an example of how to use the command, using
'cp' isn't actually recommended.  I'd strongly suggest you look at
proper backup/archiving solutions for PostgreSQL like pgbackrest to use
instead.

Thanks,

Stephen


--
Regards,

K S Praveen Kumar
M: +91-9986855625

Reply | Threaded
Open this post in threaded view
|

Re: [HELP] Issue with standby server using WAL archive

Stephen Frost
Greetings,

* Praveen Kumar K S ([hidden email]) wrote:
> I'm seeing below error on pg3 (DR server) in log. Is this expected ?
> *cp: cannot stat '/mnt/server/archivedir/00000001000000000000000D': No such
> file or directory*

Yes, that's expected, the replica will constantly attempt to get the
next WAL segment to replay, unless you have streaming replication
configured.  There's nothing wrong with that.

> On master, I see below info. I would like to know why the WALs are not
> being archived.

The WAL file isn't going to be archived until PG is done writing into
it, which won't happen until there's been ~16MB of write activity on the
database.  If you want WAL to be archived more frequently even if the
segment isn't full then you should look at the archive_timeout option.

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [HELP] Issue with standby server using WAL archive

Jeff Janes
In reply to this post by Praveen Kumar K S
On Thu, Jul 2, 2020 at 11:44 AM Praveen Kumar K S <[hidden email]> wrote:
Hello,

Thanks for your response. I agree with all your suggestions.

At this moment, I'm concerned about the WAL archiving.

I'm seeing below error on pg3 (DR server) in log. Is this expected ?
cp: cannot stat '/mnt/server/archivedir/00000001000000000000000D': No such file or directory

Yes, this is expected.  That file has not been archived yet because the master is not done writing data to it yet.  pg3 will keep retrying the command until the file shows up.
 
Cheers, 

Jeff
Reply | Threaded
Open this post in threaded view
|

Re: [HELP] Issue with standby server using WAL archive

Praveen Kumar K S
Hello,

Thanks for your response.

But if you look at the output below, 00000001000000000000000D is already 16MB and new WAL are created. As per my understanding, 00000001000000000000000D should have been archived. Am I wrong here ?

postgres@pg1:~/9.6/main/pg_xlog$ ls -lrth
total 65M
-rw------- 1 postgres postgres  16M Jul  2 13:09 00000001000000000000000E
-rw------- 1 postgres postgres  16M Jul  2 13:13 00000001000000000000000F
-rw------- 1 postgres postgres  16M Jul  2 13:13 000000010000000000000010
-rw------- 1 postgres postgres  302 Jul  2 13:13 00000001000000000000000C.00000028.backup
drwx------ 2 postgres postgres 4.0K Jul  2 13:23 archive_status
-rw------- 1 postgres postgres  16M Jul  2 15:38 00000001000000000000000D

On Thu, Jul 2, 2020 at 9:26 PM Jeff Janes <[hidden email]> wrote:
On Thu, Jul 2, 2020 at 11:44 AM Praveen Kumar K S <[hidden email]> wrote:
Hello,

Thanks for your response. I agree with all your suggestions.

At this moment, I'm concerned about the WAL archiving.

I'm seeing below error on pg3 (DR server) in log. Is this expected ?
cp: cannot stat '/mnt/server/archivedir/00000001000000000000000D': No such file or directory

Yes, this is expected.  That file has not been archived yet because the master is not done writing data to it yet.  pg3 will keep retrying the command until the file shows up.
 
Cheers, 

Jeff


--
Regards,

K S Praveen Kumar
M: +91-9986855625

Reply | Threaded
Open this post in threaded view
|

Re: [HELP] Issue with standby server using WAL archive

Stephen Frost
Greetings,

* Praveen Kumar K S ([hidden email]) wrote:
> But if you look at the output below, 00000001000000000000000D is already
> 16MB and new WAL are created. As per my understanding,
> 00000001000000000000000D should have been archived. Am I wrong here ?

Yes, you're wrong- WAL files are re-used and will always be 16MB even if
they aren't yet completed.

Thanks,

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [HELP] Issue with standby server using WAL archive

Praveen Kumar K S
Hello,

Thanks for the clarification. Learned something new today. For now, I have my DC and DR setup done. I will look into pgbackrest for backup options. Thanks again!

On Thu, Jul 2, 2020 at 9:33 PM Stephen Frost <[hidden email]> wrote:
Greetings,

* Praveen Kumar K S ([hidden email]) wrote:
> But if you look at the output below, 00000001000000000000000D is already
> 16MB and new WAL are created. As per my understanding,
> 00000001000000000000000D should have been archived. Am I wrong here ?

Yes, you're wrong- WAL files are re-used and will always be 16MB even if
they aren't yet completed.

Thanks,

Stephen


--
Regards,

K S Praveen Kumar
M: +91-9986855625

Reply | Threaded
Open this post in threaded view
|

RE: [HELP] Issue with standby server using WAL archive

amit
In reply to this post by Stephen Frost
I do not wish to get emails from these. Kindly remove me from mailing list.



-----Original Message-----
From: Stephen Frost <[hidden email]>
Sent: 02 July 2020 20:56
To: Praveen Kumar K S <[hidden email]>
Cc: [hidden email]
Subject: Re: [HELP] Issue with standby server using WAL archive

Greetings,

* Praveen Kumar K S ([hidden email]) wrote:
> I'm using 9.6

That's quite old- I'd strongly recommend that you look at using a newer
version.

> I have 3 servers. Let us call it pg1,pg2,pg3
>
> pg1 is master
> pg2 is hot stand by
> pg3 is a DR server which doesn't serve any requests. hot_standby is
> not enabled in postgresql.conf on DR server.

If hot_standby isn't enable then you won't be able to connect- which is
entirely fine, but if that's what you want then I'm not sure what the error
is that you're concerned about..?

> pg1 and pg2 are in the same network and are in sync.

Ok.  Note that having only one replica when you're using syncronous
replication will lead to a situation where, if either node fails, the
database system won't accept writes.

> I followed official documentation and used the cp command. Are there
> any other best approaches ?

The documentation provides an example of how to use the command, using 'cp'
isn't actually recommended.  I'd strongly suggest you look at proper
backup/archiving solutions for PostgreSQL like pgbackrest to use instead.

Thanks,

Stephen