FATAL: SMgrRelation hashtable corrupted

Daulat Ram-2

Hello team,

I need your help with this issue.

My Postgres 11.2 container fails to start because of the error below. It runs in a streaming replication environment.

2019-05-17 06:41:08.989 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2019-05-17 06:41:09.093 UTC [11] LOG:  database system was interrupted while in recovery at 2019-05-17 06:40:24 UTC
2019-05-17 06:41:09.093 UTC [11] HINT:  This probably means that some data is corrupted and you will have to use the last backup for recovery.
2019-05-17 06:41:11.260 UTC [12] FATAL:  the database system is starting up
2019-05-17 06:41:11.673 UTC [13] FATAL:  the database system is starting up
2019-05-17 06:41:12.209 UTC [14] FATAL:  the database system is starting up
2019-05-17 06:41:12.427 UTC [15] FATAL:  the database system is starting up
2019-05-17 06:41:15.425 UTC [16] FATAL:  the database system is starting up
2019-05-17 06:41:15.680 UTC [17] FATAL:  the database system is starting up
2019-05-17 06:41:16.059 UTC [18] FATAL:  the database system is starting up
2019-05-17 06:41:16.263 UTC [19] FATAL:  the database system is starting up
2019-05-17 06:41:16.624 UTC [20] FATAL:  the database system is starting up
2019-05-17 06:41:17.471 UTC [21] FATAL:  the database system is starting up
2019-05-17 06:41:18.739 UTC [22] FATAL:  the database system is starting up
2019-05-17 06:41:19.877 UTC [11] LOG:  database system was not properly shut down; automatic recovery in progress
2019-05-17 06:41:19.887 UTC [11] LOG:  redo starts at 5E/170349E8
2019-05-17 06:41:19.954 UTC [11] FATAL:  SMgrRelation hashtable corrupted
2019-05-17 06:41:19.954 UTC [11] CONTEXT:  WAL redo at 5E/17061648 for Transaction/COMMIT: 2019-05-17 06:39:46.902988+00; rels: base/59265/105367 base/59265/105349 base/59265/105365 base/59265/105362 base/59265/105360 base/59265/105349 base/59265/105358 base/59265/105355; inval msgs: catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 catcache 50 catcache 49 relcache 105365 relcache 105367 relcache 105367 relcache 105293 relcache 105411 relcache 105411 relcache 105365 relcache 105293 relcache 105358 relcache 105360 relcache 105360 relcache 105285 relcache 105413 relcache 105413 relcache 105358 relcache 105285
2019-05-17 06:41:19.955 UTC [1] LOG:  startup process (PID 11) exited with exit code 1
2019-05-17 06:41:19.955 UTC [1] LOG:  aborting startup due to startup process failure
2019-05-17 06:41:19.961 UTC [1] LOG:  database system is shut down

Regards,

Daulat
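
Editorial note: the CONTEXT line in the log is dense, so here is a minimal Python sketch for pulling the useful parts out of it. It is an illustration only, not something posted in the thread. The helper names (parse_lsn, dropped_rels, duplicates) are made up for this example, and the CONTEXT string is shortened to omit most of the inval msgs. The sketch extracts the LSN of the failing record, measures how far replay got past the redo start point, and checks whether any relation file appears more than once in the rels list.

```python
import re
from collections import Counter

# Shortened copy of the CONTEXT line from the log (inval msgs truncated).
CONTEXT = (
    "WAL redo at 5E/17061648 for Transaction/COMMIT: "
    "2019-05-17 06:39:46.902988+00; rels: base/59265/105367 "
    "base/59265/105349 base/59265/105365 base/59265/105362 "
    "base/59265/105360 base/59265/105349 base/59265/105358 "
    "base/59265/105355; inval msgs: catcache 50"
)
# Taken from the "redo starts at" log line.
REDO_START = "5E/170349E8"


def parse_lsn(text):
    """Convert an LSN such as '5E/17061648' to a 64-bit integer.

    An LSN is printed as two hexadecimal numbers: the high 32 bits,
    a slash, then the low 32 bits.
    """
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def dropped_rels(context):
    """Return the relation file paths listed after 'rels:'."""
    section = context.split("rels:", 1)[1].split(";", 1)[0]
    return section.split()


def duplicates(items):
    """Return the items that occur more than once, in first-seen order."""
    return [item for item, count in Counter(items).items() if count > 1]


failing_lsn = re.search(r"WAL redo at ([0-9A-F]+/[0-9A-F]+)", CONTEXT).group(1)
rels = dropped_rels(CONTEXT)

print("failing record:", failing_lsn)
print("bytes replayed before failure:", parse_lsn(failing_lsn) - parse_lsn(REDO_START))
print("relations listed more than once:", duplicates(rels))
```

For this log the sketch reports that base/59265/105349 is listed twice in the commit record. The thread does not say whether that duplicate is what trips the check, so treat it as an observation rather than a diagnosis.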

Re: FATAL: SMgrRelation hashtable corrupted

Tom Lane-2
Daulat Ram <[hidden email]> writes:
> My Postgres 11.2 container fails to start because of the error below. It runs in a streaming replication environment.

> 2019-05-17 06:41:19.954 UTC [11] FATAL:  SMgrRelation hashtable corrupted

Yes, this is probably the same issue reported in

https://www.postgresql.org/message-id/15672-b9fa7db32698269f@...

https://www.postgresql.org/message-id/15684-4ef33de3271cf929@...

The good news is that the underlying ALTER TABLE bug is fixed in 11.3.
The bad news is that your database is probably toast anyway --- an update
won't undo the catalog corruption that is causing the WAL replay crash.
I hope you have a recent backup to restore from.

                        regards, tom lane


Re: FATAL: SMgrRelation hashtable corrupted

Andres Freund
Hi,

On 2019-05-17 09:30:05 -0400, Tom Lane wrote:
> The good news is that the underlying ALTER TABLE bug is fixed in 11.3.
> The bad news is that your database is probably toast anyway --- an update
> won't undo the catalog corruption that is causing the WAL replay crash.
> I hope you have a recent backup to restore from.

Should there not be a backup, couldn't one weaken the error checks during
replay a bit (locally) to allow replay to progress? The indexes will be
toast, but it ought to allow the table data to be recovered completely.

Greetings,

Andres Freund


Re: FATAL: SMgrRelation hashtable corrupted

Alvaro Herrera-9
In reply to this post by Tom Lane-2
On 2019-May-17, Tom Lane wrote:

> The good news is that the underlying ALTER TABLE bug is fixed in 11.3.
> The bad news is that your database is probably toast anyway --- an update
> won't undo the catalog corruption that is causing the WAL replay crash.
> I hope you have a recent backup to restore from.

Hmm, shouldn't it be possible to do a PITR restore to the point just
before the problem record, ie. 5E/17061648?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
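
Editorial note: a minimal sketch of what the point-in-time recovery suggested above could look like on PostgreSQL 11, where recovery settings live in recovery.conf. The function name render_recovery_conf and the archive path in restore_command are placeholders invented for this example; the real restore_command depends on how WAL archiving was set up. The settings themselves (restore_command, recovery_target_lsn, recovery_target_inclusive) are standard PostgreSQL 11 recovery parameters. The sketch assumes a base backup taken before the problem record and a WAL archive that reaches it.

```python
import re

# An LSN is two hexadecimal numbers of up to 8 digits separated by a slash.
LSN_PATTERN = re.compile(r"^[0-9A-Fa-f]{1,8}/[0-9A-Fa-f]{1,8}$")


def render_recovery_conf(target_lsn, restore_command):
    """Return recovery.conf text that stops replay just before target_lsn."""
    if not LSN_PATTERN.match(target_lsn):
        raise ValueError(f"not a valid LSN: {target_lsn!r}")
    settings = {
        "restore_command": restore_command,
        "recovery_target_lsn": target_lsn,
        # 'false' stops just before the target instead of just after it,
        # so the record at the target LSN is not replayed.
        "recovery_target_inclusive": "false",
    }
    lines = []
    for name, value in settings.items():
        # Single quotes inside a quoted value are written doubled.
        escaped = value.replace("'", "''")
        lines.append(f"{name} = '{escaped}'\n")
    return "".join(lines)


if __name__ == "__main__":
    # The archive path below is a placeholder, not a value from the thread.
    print(render_recovery_conf("5E/17061648", 'cp /path/to/wal_archive/%f "%p"'))
```

The generated file would go into the data directory of the restored base backup before the server is started. As noted in the follow-up, this only helps if the WAL archive needed to reach that point exists.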


Re: FATAL: SMgrRelation hashtable corrupted

Tom Lane-2
Alvaro Herrera <[hidden email]> writes:
> On 2019-May-17, Tom Lane wrote:
>> The good news is that the underlying ALTER TABLE bug is fixed in 11.3.
>> The bad news is that your database is probably toast anyway --- an update
>> won't undo the catalog corruption that is causing the WAL replay crash.
>> I hope you have a recent backup to restore from.

> Hmm, shouldn't it be possible to do a PITR restore to the point just
> before the problem record, ie. 5E/17061648?

If he's got the necessary WAL archives :-(

A dump and restore would be advisable afterwards in any case, since
the catalog corruption would still be there.

                        regards, tom lane
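
Editorial note: a minimal sketch of the dump-and-restore step mentioned above, written as a Python helper that only builds the command lines rather than running them. The database name "appdb", the dump file name, and the function names are hypothetical. The pg_dump and pg_restore options used (--format=custom, --file, --dbname) are standard ones. The sketch assumes the restore goes into a freshly initialized cluster where the target database has already been created.

```python
import shlex


def dump_command(dbname, dump_path):
    """Build a pg_dump invocation that writes a custom-format archive."""
    return ["pg_dump", "--format=custom", f"--file={dump_path}", dbname]


def restore_command(target_db, dump_path):
    """Build a pg_restore invocation that loads the archive into target_db."""
    return ["pg_restore", f"--dbname={target_db}", dump_path]


def as_shell(argv):
    """Render an argument list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # "appdb" and "appdb.dump" are placeholder names for illustration.
    print(as_shell(dump_command("appdb", "appdb.dump")))
    print(as_shell(restore_command("appdb", "appdb.dump")))
```

Connection options such as host, port, and user are left out here and would need to be added for a real environment. Roles and other cluster-wide objects are not included in a single-database dump.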