block corruption on slave db.

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

block corruption on slave db.

vinod kale
Hello Team,

I have to clear some doughts regarding block corruption on slave db.

If m having streaming replication in my environment. And in some moments corruption happened in my slave server's data then how will i come to know the corruption occurred.

Bcoz if same happened on primary db then i wll immediately come to know bcoz there is daily dump backup we take on primary and in dump block corruption detects.

Bt my dought is how we can identify on slave db.

Kindly help me in above.

Thanks in advance.

--
Thanks & regards,
Vinod  Kale
Reply | Threaded
Open this post in threaded view
|

Re: block corruption on slave db.

Keith


On Mon, Jun 15, 2020 at 9:45 AM vinod kale <[hidden email]> wrote:
Hello Team,

I have to clear some doughts regarding block corruption on slave db.

If m having streaming replication in my environment. And in some moments corruption happened in my slave server's data then how will i come to know the corruption occurred.

Bcoz if same happened on primary db then i wll immediately come to know bcoz there is daily dump backup we take on primary and in dump block corruption detects.

Bt my dought is how we can identify on slave db.

Kindly help me in above.

Thanks in advance.

--
Thanks & regards,
Vinod  Kale

There's many different kinds of corruption, but one fairly effective way you can watch for some of it is by ensuring that page level checksums are enabled. If you're on a version of PG prior to 12, this can only be done during database initialization (when you run initdb), so if it wasn't enabled before, you'd have to dump and restore your database into a newly initialized instance.

If you are on PG12+, or you can upgrade to it, you can use the pg_checksum tool to enable page level checksums on an offline database


Also on PG12+, you have the ability to monitor if a page level checksum is encountered from within the database by querying the pg_stat_database catalog. This is useful for monitoring tools such as check_postgres(https://bucardo.org/check_postgres/) or pgMonitor (https://github.com/CrunchyData/pgmonitor)  An example query from pgMonitor to pull out the count and time since the last failure is as follows.

SELECT datname AS dbname
    , checksum_failures AS count
    , coalesce(extract(epoch from (now() - checksum_last_failure)), 0) AS time_since_last_failure_seconds
FROM pg_catalog.pg_stat_database;

Note that if statistics are reset, the errors detected and recorded here are also reset. That is also how you'd reset it if you encountered corruption and fixed it. Prior to PG12, page level checksum errors were only reported via the logs.

Keith