BUG #15989: Cluster unable to open as hot standby after SIGKILL during exclusive backup

PG Doc comments form
The following bug has been logged on the website:

Bug reference:      15989
Logged by:          James Lucas
Email address:      [hidden email]
PostgreSQL version: 11.5
Operating system:   CentOS 7
Description:        

After the postmaster is sent SIGKILL while an exclusive backup is in effect,
the cluster becomes unable to open as a hot standby.  Reproducible on (at
least) 11.5 and 9.6.15.

Steps to reproduce on a fresh database cluster:

* Initialize new cluster.
initdb -D data

* Start cluster.
pg_ctl -D data start

* Verify hot_standby is enabled (Should return "on" on 11.5.  On 9.6.15,
need to enable it.)
psql -c 'show hot_standby'

* Verify wal_level is set to replica or logical (Okay on 11.5.  On 9.6.15
need to set and restart.)
psql -c 'show wal_level'

* Stop cluster.
pg_ctl -D data stop

* Add recovery.conf
echo 'standby_mode = true' >> data/recovery.conf

* Start as hot standby
pg_ctl -D data start

* Validate cluster is running as hot standby (should return true)
psql -c 'select pg_is_in_recovery()'

* Stop cluster
pg_ctl -D data stop

* Remove recovery.conf
rm data/recovery.conf

* Start cluster normally
pg_ctl -D data start

* Enable exclusive backup
psql -c "select pg_start_backup('')"

* Find pid of main postgres process
ps -ef | grep 'postgres -D'

* Send SIGKILL to found pid
kill -s KILL <pid>

* Add recovery.conf
echo 'standby_mode = true' >> data/recovery.conf

* Attempt to start cluster
pg_ctl -D data start

At this point, the cluster fails to open.  On 11.5 pg_ctl hangs waiting for
the database to open, and eventually times out.  On 9.6.15, pg_ctl runs
normally, but tailing the database log shows that it never opens.  It loops
at "starting up."
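
The PID lookup and SIGKILL steps above can also be scripted.  This is only a
minimal sketch, assuming the same "data" directory as in the steps and the
PostgreSQL binaries on PATH.  It reads the postmaster PID from the first line
of data/postmaster.pid instead of grepping ps output.  The function names are
hypothetical, and it should only ever be run against a throwaway cluster.

```shell
#!/bin/sh
# Assumption: PGDATA is the scratch cluster created with initdb above.
PGDATA=${PGDATA:-data}

# Print the postmaster PID, which is the first line of postmaster.pid.
postmaster_pid() {
    head -n 1 "$1/postmaster.pid"
}

# Replays the failing part of the report: exclusive backup, SIGKILL,
# then an attempt to start as a hot standby.
reproduce() {
    psql -c "select pg_start_backup('')"
    kill -s KILL "$(postmaster_pid "$PGDATA")"
    echo 'standby_mode = true' >> "$PGDATA/recovery.conf"
    pg_ctl -D "$PGDATA" start
}

# Nothing runs until reproduce is called explicitly:
# reproduce
```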

I've found that if you stop the instance and remove recovery.conf, the
database will actually open normally.  But even after that, if you go back
and try to open it as a hot standby, it fails to open again.  So far I have
not found a way to let this cluster open as a hot standby again.
Affected instances had to be restored from backup.

This is particularly a problem when running postgres in Docker, as Docker
will send SIGKILL if database shutdown takes more than a few seconds.

Please let me know if you have any questions.

Thanks,
James Lucas

Re: BUG #15989: Cluster unable to open as hot standby after SIGKILL during exclusive backup

Stephen Frost
Greetings,

* PG Bug reporting form ([hidden email]) wrote:
> * Enable exclusive backup
> psql -c "select pg_start_backup('')"
>
> * Find pid of main postgres process
> ps -ef | grep 'postgres -D'
>
> * Send SIGKILL to found pid
> kill -s KILL <pid>

Don't kill the postmaster and don't use exclusive backup (which has been
deprecated, in part specifically because it causes problems after a
crash).
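
One way to avoid exclusive backup mode is pg_basebackup, which takes a base
backup over the replication protocol without calling pg_start_backup() in
exclusive mode.  The following is a rough sketch rather than a recommended
procedure: the target directory is a placeholder, the "run" and
"take_base_backup" helpers are hypothetical, and connection settings are
assumed to come from the usual PG* environment variables.

```shell
#!/bin/sh
# Echo the command instead of running it when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Take a base backup into the directory given as $1.
# -X stream includes the WAL needed to make the backup consistent,
# -P prints progress.
take_base_backup() {
    run pg_basebackup -D "$1" -X stream -P
}

# Example (placeholder path):
# take_base_backup /path/to/backup
```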

> This is particularly a problem when running postgres in Docker, as Docker
> will send SIGKILL if database shutdown takes more than a few seconds.

You'll want to fix that then.
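
For the Docker case, one possible fix is to give the container a longer grace
period, since "docker stop" only sends SIGKILL after its timeout expires (10
seconds by default).  A small sketch, assuming a hypothetical container name
and a 120 second timeout chosen purely for illustration; the "run" and
"stop_container" helpers are not part of Docker.

```shell
#!/bin/sh
# Echo the command instead of running it when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop container $1, waiting up to $2 seconds before Docker sends SIGKILL.
stop_container() {
    run docker stop -t "$2" "$1"
}

# Example (placeholder container name):
# stop_container my-postgres 120
```

The right timeout depends on how long a clean shutdown of the cluster
actually takes, so the value above is only a starting point.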

Thanks,

Stephen
