replication consistency checking


replication consistency checking

hydra
After setting up streaming replication, is it possible to check whether the slave has the same data as the master?

In the MySQL world there is the percona-toolkit with pt-table-checksum that does this job:
https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html

Thank you
Reply | Threaded
Open this post in threaded view
|

Re: replication consistency checking

Venkata B Nagothi


On Thu, Jun 4, 2015 at 2:49 PM, hydra <[hidden email]> wrote:
After setting up streaming replication, is it possible to check whether the slave has the same data as the master?


Yes. There are quite a few ways to check the sync status between master and standby. Which version of PostgreSQL are you using, by the way?

Generally, you can easily build some scripts and schedule them in cron to check the status. You can use functions or catalog views to monitor it; which ones are available depends on the version of PG you are using.
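
For example, something along these lines could go into a cron job (a minimal sketch; the column names are the 9.4 ones):

-- on the primary: streaming status per standby
SELECT application_name, client_addr, state,
       sent_location, replay_location, sync_state
FROM pg_stat_replication;

-- on the standby: how far WAL has been received and replayed
SELECT pg_last_xlog_receive_location() AS receive_location,
       pg_last_xlog_replay_location()  AS replay_location;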

You can also use tools like "repmgr" to monitor and manage replication. I am not 100% sure how good or efficient it is.

Regards,
Venkata Balaji N

Fujitsu Australia


Re: replication consistency checking

hydra


On Thu, Jun 4, 2015 at 8:27 AM, Venkata Balaji N <[hidden email]> wrote:
> Yes. There are quite a few ways to check the sync status between master and standby. Which version of PostgreSQL are you using, by the way?


Thanks for the answer, however I'm looking for a data consistency check.

That is, you deploy your master-standby replication, you're happily running it, and after 3 months you decide to do a failover. How can one be sure the data are the same?

I'm running PostgreSQL 9.4.

Reply | Threaded
Open this post in threaded view
|

Re: replication consistency checking

Greg Clough
If you're using Streaming Replication, then the internal PostgreSQL code ensures consistency... but if you wanted to make sure that your standby is applying logs, then you can use something like:

postgres@HOST2:~$ psql -c 'select pg_last_xlog_receive_location() "receive_location", pg_last_xlog_replay_location() "replay_location", pg_is_in_recovery() "recovery_status";'
 receive_location | replay_location | recovery_status
------------------+-----------------+-----------------
 7/68002388       | 7/68002388      | t
(1 row)



... or as suggested previously, use repmgr, which has a "repmgrd" daemon that keeps things monitored for you:

postgres@HOST1:~$ psql repmgr repmgr -c 'select * from repmgr_test.repl_monitor;'
 primary_node | standby_node |       last_monitor_time       |        last_apply_time        | last_wal_primary_location | last_wal_standby_location | replication_lag | apply_lag
--------------+--------------+-------------------------------+-------------------------------+---------------------------+---------------------------+-----------------+-----------
            1 |            2 | 2015-06-04 09:09:37.920809+00 | 2015-06-04 09:09:37.712235+00 | 30/F85D7280               | 30/F85D7280               |               0 |         0
            1 |            2 | 2015-06-04 09:09:40.45123+00  | 2015-06-04 09:09:38.251625+00 | 30/F85D73C0               | 30/F85D73C0               |               0 |         0
            1 |            2 | 2015-06-04 09:09:42.966728+00 | 2015-06-04 09:09:40.75668+00  | 30/F85D74C0               | 30/F85D74C0               |               0 |         0
            1 |            2 | 2015-06-04 09:09:45.480471+00 | 2015-06-04 09:09:43.273428+00 | 30/F85D75C0               | 30/F85D75C0               |               0 |         0
            1 |            2 | 2015-06-04 09:09:47.997739+00 | 2015-06-04 09:09:45.785806+00 | 30/F85D76C0               | 30/F85D76C0               |               0 |         0
            1 |            2 | 2015-06-04 09:09:50.515236+00 | 2015-06-04 09:09:48.303822+00 | 30/F85D77C0               | 30/F85D77C0               |               0 |         0



Greg Clough
Database Administrator | CWT Digital


On 4 June 2015 at 09:47, hydra <[hidden email]> wrote:
> Thanks for the answer, however I'm looking for a data consistency check. [...] I'm running PostgreSQL 9.4.



Re: replication consistency checking

hydra
Thanks Greg, this looks nice.

However my original question still remains. You know, every piece of software has bugs, bits and pieces can break, hardware can misbehave. Really, checking the data and computing checksums is the only way to be sure.


On Thu, Jun 4, 2015 at 11:12 AM, Greg Clough <[hidden email]> wrote:
> If you're using Streaming Replication, then the internal PostgreSQL code ensures consistency... but if you wanted to make sure that your standby is applying logs, then you can use something like: [examples trimmed]




Re: replication consistency checking

Guillaume Lelarge-3
2015-06-04 13:03 GMT+02:00 hydra <[hidden email]>:
> Thanks Greg, this looks nice.
> However my original question still remains. [...] Really, checking the data and counting the checksum is the only way to be sure.


There is no such tool available as far as I know. Writing one should not be too difficult. The main issue, AFAICT, would be to stop writes on both servers while you do the check. I know many users wouldn't be happy with that. And if you do not stop the writes, you'll get quite a lot of false positives on a busy system.
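
For what it's worth, a naive version of such a check could just be the same aggregate run on both servers and compared by hand (a rough sketch; some_table and id are placeholders, and the result is only meaningful while writes are paused):

-- run the identical query on the primary and on the standby, then compare the output
SELECT count(*)                                  AS row_count,
       md5(string_agg(t::text, '|' ORDER BY id)) AS table_checksum
FROM some_table t;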
 


Re: replication consistency checking

hydra


On Thu, Jun 4, 2015 at 1:41 PM, Guillaume Lelarge <[hidden email]> wrote:
> There is no such tool available as far as I know. [...] And if you do not stop the writes, you'll get quite a lot of false positives on a busy system.


I have little experience with PostgreSQL, but that is not true of MySQL: you can keep the system running and do the consistency checking online.
 
 



Re: replication consistency checking

Guillaume Lelarge-3
2015-06-04 14:04 GMT+02:00 hydra <[hidden email]>:
> I have little experience with PostgreSQL, but that is not true for MySQL. You can have your system running and doing consistency checking online.
 

I was only speaking about PostgreSQL. I have no experience with MySQL.

 


Re: replication consistency checking

Bruce Momjian
On Thu, Jun  4, 2015 at 01:03:00PM +0200, hydra wrote:
> Thanks Greg, this looks nice.
>
> However my original question still remains. You know, every software has bugs,
> every bits and pieces can break, hardware can be misbehaving. Really, checking
> the data and counting the checksum is the only way to be sure.

I believe MySQL needed such a tool because it had known replication
synchronization problems --- Postgres does not, so it has no such tool.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: replication consistency checking

David G Johnston
On Thu, Jun 4, 2015 at 8:32 AM, Bruce Momjian <[hidden email]> wrote:
On Thu, Jun  4, 2015 at 01:03:00PM +0200, hydra wrote:
> Thanks Greg, this looks nice.
>
> However my original question still remains. You know, every software has bugs,
> every bits and pieces can break, hardware can be misbehaving. Really, checking
> the data and counting the checksum is the only way to be sure.

I believe MySQL needed such a tool because it had known replication
synchronization problems  ---  Postgres does not, so has no such tool.

It's those unknown replication synchronization problems that this tool would be able to catch...

How do page checksums factor into this?

David J.


Re: replication consistency checking

Ravi Krishna-4
> I believe MySQL needed such a tool because it had known replication
> synchronization problems  ---  Postgres does not, so has no such tool.

I agree with Bruce here. In DB2, for HADR (similar to PG replication)
there is no tool which checks whether primary and standby are the same,
so that there are no surprises after a failover. I am pretty sure the
same is true for Oracle Data Guard.

If an RDBMS offers a tool to check for consistency, there is a
high likelihood it is because of inherent problems.



Re: replication consistency checking

Sergey Konoplev-2
On Wed, Jun 3, 2015 at 9:49 PM, hydra <[hidden email]> wrote:
> After setting up streaming replication, is it possible to check whether the
> slave has the same data as the master?
>
> In the MySQL world there is the percona-toolkit with pt-table-checksum that
> does this job:
> https://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html

I believe you only need
http://www.postgresql.org/docs/9.4/static/app-initdb.html#APP-INITDB-DATA-CHECKSUMS.
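
For what it's worth, data checksums have to be chosen at initdb time in 9.4 and cannot be enabled on an existing cluster (a sketch; the path is a placeholder):

# initialize a new cluster with page-level checksums
initdb -D /path/to/data --data-checksums
# verify: pg_controldata reports a non-zero "Data page checksum version"
pg_controldata /path/to/data | grep -i checksum

Note that page checksums guard against on-disk corruption on each node; they are computed independently on the primary and on the standby.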

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
[hidden email]



Re: replication consistency checking

hydra
On Fri, Jun 5, 2015 at 4:40 AM, Sergey Konoplev <[hidden email]> wrote:
> I believe you only need
> http://www.postgresql.org/docs/9.4/static/app-initdb.html#APP-INITDB-DATA-CHECKSUMS.



Thank you all for the replies.
While looking for replication information I found this:
http://thebuild.com/presentations/worst-day-fosdem-2014.pdf

It's a real-life account of hitting this replication bug:
https://wiki.postgresql.org/wiki/Nov2013ReplicationIssue

The primary symptoms of this corruption are rows that:
  • are present on the master, but missing on the replica
  • have been deleted on the master, but still appear to be visible on the replica
  • have been updated, and their old versions appear alongside the new, updated versions on the replica

How can I verify whether I already have this corruption?
There is no known way to identify that the issue has affected a standby in the past other than by comparing the data from the primary with the standby.

So hands up: who still thinks PostgreSQL doesn't need some way of checking data consistency between master and standby? :)

Re: replication consistency checking

Igor Neyman

 

 

hydra wrote:

> So hands up: who still thinks PostgreSQL doesn't need some way of checking data consistency between master and standby? :)

My hand is up.

From the wiki page that you referenced you forgot to quote this:

 

“This is an issue, discovered Nov. 18, 2013, which can cause data corruption on a Hot-Standby replica when it is (re-)started, by marking committed transactions as uncommitted. This issue is fixed in the December 5th 2013 update releases.”

 

So, what’s your point?  Yes, any software can (and does) have bugs.  What’s important is how quickly the bug is discovered and fixed.

 

You wish to write a utility that compares data on 2 different clusters – sure, by all means, but I believe time could be better spent on something else.

But who is to say that this utility will be 100% error-free?

 

Regards,

Igor Neyman

 

 


Re: replication consistency checking

hydra


On Fri, Jun 5, 2015 at 3:42 PM, Igor Neyman <[hidden email]> wrote:

 

 

> So, what's your point? Yes, any software can (and does) have bugs. What's important is how quickly the bug is discovered and fixed.
> You wish to write a utility that compares data on 2 different clusters – sure, by all means, but I believe time could be better spent on something else.
> But who is to say that this utility will be 100% error-free?

 

 


Hello Igor,
no code is without bugs; yes, we are humans. That is why I'm so shocked that everybody blindly trusts the replication (which is code again, plus network and hardware).

My point was to show that such a tool would be beneficial even for PostgreSQL. Yes, even the checking utility can have bugs, but at least you would be better off than now, because now everybody relies on the assumption that it should be OK. But as we see from the bug report, such problems exist here too. So instead of simply believing everything should be OK, you would have a tool reporting "checksums ok", which would allow you to trust your standby data more.

Re: replication consistency checking

Igor Neyman

 

 

hydra wrote:

> My point was to show that such a tool would be beneficial even for PostgreSQL. Yes, even the checking utility can have bugs, but at least you would be better off than now. [...] You would have a tool reporting "checksums ok", which would allow you to trust your standby data more.

 

 

The problem I see with a “checksum utility” is that for it to work, both compared servers should be “static”: no transactions while it does its job.

 

Igor Neyman


Re: replication consistency checking

Scott Ribe-2
On Jun 5, 2015, at 8:42 AM, Igor Neyman <[hidden email]> wrote:
>
> The problem I see with a “checksum utility” is that for it to work, both compared servers should be “static”: no transactions while it does its job.

Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?

One could imagine a built-in feature in PG which depends on using MVCC and having both sides look at the same snapshot. (Which would require repeatable reads.)

But for an external utility, that's quite a bit harder. One suggestion which would involve minimal interruption to processing: if you have a snapshottable file system, shut down master, take snapshot, shut down replica, bring master back up, snapshot replica, bring it back up. You *still* have the issue of making sure that at the moment you take master down all changes have been streamed to replica, and the (easier) issue of making sure replica has applied them before taking its snapshot... But if you can manage to pull that off, then you can checksum & compare as much as you want, *IF* you can actually pull that off correctly ;-)
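
A rough sketch of that sequence, assuming ZFS and made-up dataset and path names (every name below is a placeholder, and each command runs on the relevant host):

# 1. shut down the master so no further WAL is generated
pg_ctl -D /master/data stop -m fast
# 2. snapshot the master's data directory
zfs snapshot tank/pgmaster@consistency-check
# 3. check the replica has replayed everything it received, then shut it down too
psql -h replica -c 'select pg_last_xlog_receive_location(), pg_last_xlog_replay_location();'
pg_ctl -D /replica/data stop -m fast
# 4. bring the master back up
pg_ctl -D /master/data start
# 5. snapshot the replica, then bring it back up
zfs snapshot tank/pgreplica@consistency-check
pg_ctl -D /replica/data start
# 6. compare at leisure, e.g. start throwaway instances on clones of the two
#    snapshots and run the same checksum queries against both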

--
Scott Ribe
[hidden email]
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice








Re: replication consistency checking

hydra


On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <[hidden email]> wrote:
> Indeed, and that was brought up before and OP seems to be ignoring it. What magic does MySQL (supposedly) use to compare databases without interfering with updates?



Hello,
I wasn't talking about a static check; I was indeed referring to an online tool.

I haven't read the source of the MySQL tool yet, but it computes the checksum on the master and writes the operation to the binary log; with statement-based replication the slave then computes the checksum itself, and the two results can be compared. Not all data in a table are checksummed at once; smaller chunks are used instead.
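
Translated into plain SQL, the per-chunk idea is roughly this (a sketch only; some_table and id are placeholders, and on PostgreSQL you would have to run the identical query on both servers yourself, since there is no statement-level replication to piggyback on):

-- checksum one chunk of rows, identified by a primary-key range
SELECT count(*) AS rows_in_chunk,
       md5(string_agg(t::text, '|' ORDER BY id)) AS chunk_checksum
FROM some_table t
WHERE id BETWEEN 1 AND 10000;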

As Igor mentioned before, that tool can also fail and is not 100% reliable. But I suppose that if it were misbehaving, chances are the checksums would differ and you would notice it. It is unlikely to compute the same CRC/MD5 by accident.

I have done some tests with it: comparing data after setting up replication went fine; then I changed an integer on the slave and it really computed a different checksum, as expected. I also compared logical dumps; it really seems to be doing its job fine.

 

Re: replication consistency checking

Wei Shan
Hi,

Usually, for a failover, we don't do a data consistency check, because a failover is unplanned! In my opinion, it's quite safe to trust pg_stat_replication: because it's binary-level replication, it's hard to end up with a missing row unless there's a bug; you have a higher chance of getting corruption. However, for a switchover, we will do the following:

1. Check the pg_stat_replication view
2. Have a script that does a row count on all the data tables over a period of time. We should see that the two databases have very similar row counts, with maybe a slight delay of 1s? (a sketch of such a check follows below)
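
A minimal sketch of such a check (run the same pipeline against both servers and diff the output; mydb is a placeholder, and counts taken while writes are in flight will naturally differ a little):

# generate one count(*) statement per user table, then feed them back to psql
psql -AtX -d mydb -c "
  SELECT format('SELECT %L AS tbl, count(*) FROM %I.%I;',
                n.nspname || '.' || c.relname, n.nspname, c.relname)
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE c.relkind = 'r'
    AND n.nspname NOT IN ('pg_catalog', 'information_schema');
" | psql -AtX -d mydb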

I know that this is not perfect, but at least it's something to start with. The only tool out there that's doing this is Oracle Veridata, but I'm not too sure how they do it.

Cheers.

On 6 June 2015 at 12:43, hydra <[hidden email]> wrote:


> I wasn't talking about a static check, indeed I was referring to an online tool. [...] I also did a logical dump compare, it really seems to be doing its job fine.

--
Regards,
Ang Wei Shan

Re: replication consistency checking

d0uble

On 9 June 2015, at 7:06, Wei Shan <[hidden email]> wrote:

> Usually, for failover, we don't do a data consistency check, because it's an unplanned failover! [...]
> 1. Checking pg_stat_replication views
> 2. Have a script that does a row count on all the data tables for a period of time. We should see that the two databases have very similar row counts and maybe a slight delay of 1s?

Starting from 9.3 it is sufficient to gracefully stop the master (with '-m fast' or '-m smart') and all changes are guaranteed to be delivered to the archive and the replicas (even if they are asynchronous). See this thread [0].
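
For example (a sketch; the data directory path and host name are placeholders):

# on the master: graceful shutdown; since 9.3 the walsenders wait until all
# pending WAL has been streamed to connected standbys before exiting
pg_ctl -D /var/lib/postgresql/9.4/main stop -m fast
# on the standby: receive and replay locations should settle at the master's
# final WAL position
psql -h standby -c 'select pg_last_xlog_receive_location(), pg_last_xlog_replay_location();'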





--
May the force be with you…
