Master-slave failover question

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
23 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Master-slave failover question

Athanasios | ZenGuard
Dear list,
I hope that everyone has a nice holiday. Currently, I am researching master-slave automated promotion. I have one question. Let's assume that the clients connect to the master 192.168.1.1 for read/write traffic and I have a slave at 192.168.1.2. Assume that the master goes down and that slave gets promoted to a slave (this I have found out how to do automatically). How will the clients know to connect to the new master? Any ideas on this one?

Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Sameer Kumar

You can use pgpool to do this. Or you can use failover manager by enterpriseDB.

Regards
Sameer

PS: Sent from my Mobile device. Pls ignore typo n abb

Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Athanasios | ZenGuard
Thanks for the quick answer Sameer. However, using PGPool is not an option. Where can I find additional information for enterpriseDB?


On Fri, Jan 3, 2014 at 12:23 PM, Sameer Kumar <[hidden email]> wrote:

You can use pgpool to do this. Or you can use failover manager by enterpriseDB.

Regards
Sameer

PS: Sent from my Mobile device. Pls ignore typo n abb


Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Sameer Kumar

Check their site. Www.enterprisedb.com

Can you share you failover command here? You may use ifconfig to switch a vitrual IP while doing a failover.

BTW why are you reluctant to use pgpool?

Regards
Sameer

PS: Sent from my Mobile device. Pls ignore typo n abb

Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Athanasios | ZenGuard
Hi,
PGPool is a bit of a blackbox. I am confident in the use of VRRP but I have not seen any documentation on how to combine this with pgpool to eliminate single points of failure.
Can you share a few more details about how I can use a virtual IP?


On Fri, Jan 3, 2014 at 1:12 PM, Sameer Kumar <[hidden email]> wrote:

Check their site. Www.enterprisedb.com

Can you share you failover command here? You may use ifconfig to switch a vitrual IP while doing a failover.

BTW why are you reluctant to use pgpool?

Regards
Sameer

PS: Sent from my Mobile device. Pls ignore typo n abb


Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Athanasios | ZenGuard
Hi all again,
I rigged up a solution with PGPool (!) in master-slave mode, using the existing documentation and it works like a charm with a downtime of only a few seconds, which is acceptable in our case. However assume the following scenario:
1) Master goes down
2) slave gets promoted to master
owing to the usage of pgpool, this works fine. However, assuming that there is a certain amount of write traffic that goes to the new master. When the old master (sorry for the terminology) comes back up, how can I "sync" him with all the changes that have been performed? This is one of the two pieces of the puzzle missing for me (the other one is pgpool specific and I directed my query to their specific mailing list).
Thanks for any replies.


On Fri, Jan 3, 2014 at 2:00 PM, Athanasios | ZenGuard <[hidden email]> wrote:
Hi,
PGPool is a bit of a blackbox. I am confident in the use of VRRP but I have not seen any documentation on how to combine this with pgpool to eliminate single points of failure.
Can you share a few more details about how I can use a virtual IP?


On Fri, Jan 3, 2014 at 1:12 PM, Sameer Kumar <[hidden email]> wrote:

Check their site. Www.enterprisedb.com

Can you share you failover command here? You may use ifconfig to switch a vitrual IP while doing a failover.

BTW why are you reluctant to use pgpool?

Regards
Sameer

PS: Sent from my Mobile device. Pls ignore typo n abb



Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Sameer Kumar

As of now (v9.3 of postgreSQL), you need to rebuild old (lost) master from new master.
Glad you could discover the magic of pgpool :-)

You can either use pgpoll admin to do this rebuilding in a single click or you can write your own shell script (like I had done) to resotre a lost node (do remember to test a scenario where slave is lost and you have to rebuild that).

Regards
Sameer

PS: Sent from my Mobile device. Pls ignore typo n abb

Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Athanasios | ZenGuard
Hi Sameer,
thank you for the quick reply (again!). I am not using pgpool admin so is it possible to share the shell script that does the rebuilding? Also, if a slave is lost, when he comes back up, will he not "catch up" using Streaming Replication?
Cheers!


On Fri, Jan 3, 2014 at 5:00 PM, Sameer Kumar <[hidden email]> wrote:

As of now (v9.3 of postgreSQL), you need to rebuild old (lost) master from new master.
Glad you could discover the magic of pgpool :-)

You can either use pgpoll admin to do this rebuilding in a single click or you can write your own shell script (like I had done) to resotre a lost node (do remember to test a scenario where slave is lost and you have to rebuild that).

Regards
Sameer

PS: Sent from my Mobile device. Pls ignore typo n abb


Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Sameer Kumar

If you are only relying on streaming replication (and not using archive shipping/hybrid replication), the probablity of 'catching-up' is as high as your wal_keep_segment.

Regards
Sameer

PS: Sent from my Mobile device. Pls ignore typo n abb

Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Athanasios | ZenGuard
Any sane values to it? This is what I am using right now (10 just to be on the safe side, the 5000 (!) is a repmgr requirement(!))

max_wal_senders = 10            # max number of walsender processes
wal_keep_segments = 5000        # in logfile segments, 16MB each; 0 disables





On Fri, Jan 3, 2014 at 5:07 PM, Sameer Kumar <[hidden email]> wrote:

If you are only relying on streaming replication (and not using archive shipping/hybrid replication), the probablity of 'catching-up' is as high as your wal_keep_segment.

Regards
Sameer

PS: Sent from my Mobile device. Pls ignore typo n abb


Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Sameer Kumar

If the other node node is not available for a few day then??? Esp if it is slave node?
I would keeping it (wal_keep_segment) fair and rebuilding the lost node if I ever lose one.

Let's see what others have to suggest.

Regards
Sameer

PS: Sent from my Mobile device. Pls ignore typo n abb

Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Sergey Konoplev-2
In reply to this post by Athanasios | ZenGuard
On Fri, Jan 3, 2014 at 3:18 AM, Athanasios | ZenGuard
<[hidden email]> wrote:
> Dear list,
> I hope that everyone has a nice holiday. Currently, I am researching
> master-slave automated promotion. I have one question. Let's assume that the
> clients connect to the master 192.168.1.1 for read/write traffic and I have
> a slave at 192.168.1.2. Assume that the master goes down and that slave gets
> promoted to a slave (this I have found out how to do automatically). How
> will the clients know to connect to the new master? Any ideas on this one?

One of the ideas, and BTW my favorite method, is to use PgBouncer to
redirect queries from master to slave with pause. The full description
of the process is by the link below.

https://github.com/grayhemp/pgcookbook/blob/master/switching_to_another_server_with_pgbouncer.md

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
[hidden email]


--
Sent via pgsql-novice mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice
Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Sameer Kumar

On Tue, Jan 7, 2014 at 4:53 AM, Sergey Konoplev <[hidden email]> wrote:
On Fri, Jan 3, 2014 at 3:18 AM, Athanasios | ZenGuard
<[hidden email]> wrote:
> Dear list,
> I hope that everyone has a nice holiday. Currently, I am researching
> master-slave automated promotion. I have one question. Let's assume that the
> clients connect to the master 192.168.1.1 for read/write traffic and I have
> a slave at 192.168.1.2. Assume that the master goes down and that slave gets
> promoted to a slave (this I have found out how to do automatically). How
> will the clients know to connect to the new master? Any ideas on this one?

One of the ideas, and BTW my favorite method, is to use PgBouncer to
redirect queries from master to slave with pause. The full description
of the process is by the link below.

https://github.com/grayhemp/pgcookbook/blob/master/switching_to_another_server_with_pgbouncer.md


I checked it out. Looks like the process involves manual identification of failure(?). But this is a nice method for manual switchovers.

As a suggestion, won't it be a good idea to run the pgbouncer on Application Server [so that even if the Master Server has to be shutdown, IP address changes or starting pgbouncer on slave is not needed]?

 
--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
<a href="tel:%2B1%20%28415%29%20867-9984" value="+14158679984">+1 (415) 867-9984, <a href="tel:%2B7%20%28901%29%20903-0499" value="+79019030499">+7 (901) 903-0499, <a href="tel:%2B7%20%28988%29%20888-1979" value="+79888881979">+7 (988) 888-1979
[hidden email]


--
Sent via pgsql-novice mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice

Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Sergey Konoplev-2
On Tue, Jan 7, 2014 at 8:16 AM, Sameer Kumar <[hidden email]> wrote:

>> > promoted to a slave (this I have found out how to do automatically). How
>> > will the clients know to connect to the new master? Any ideas on this
>> > one?
>>
>> One of the ideas, and BTW my favorite method, is to use PgBouncer to
>> redirect queries from master to slave with pause. The full description
>> of the process is by the link below.
>>
>> https://github.com/grayhemp/pgcookbook/blob/master/switching_to_another_server_with_pgbouncer.md
>
> I checked it out. Looks like the process involves manual identification of
> failure(?). But this is a nice method for manual switchovers.

Correct, this is about manual detection of failure. There were a lot
of discussions around automatic failover but I have never seen anyone
defined a really working (for me) criteria to detect the failure.

> As a suggestion, won't it be a good idea to run the pgbouncer on Application
> Server [so that even if the Master Server has to be shutdown, IP address
> changes or starting pgbouncer on slave is not needed]?

It depends. In some cases it is worth doing if you have a single
application server (and do not planning to have more), or the overhead
of a high number of app->db network connections per second is
significantly high in comparison to a local one. However, modern
architectures usually use multiple application servers and persistent
connections, and in this case installing pgbouncer on database servers
is preferable. I also do not recommend to install it on a separate
machine as it brings another point of failure in any case. And again,
it depends on your situation.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
[hidden email]


--
Sent via pgsql-novice mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice
Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Sameer Kumar

On Wed, Jan 8, 2014 at 1:55 AM, Sergey Konoplev <[hidden email]> wrote:
> I checked it out. Looks like the process involves manual identification of
> failure(?). But this is a nice method for manual switchovers.

Correct, this is about manual detection of failure. There were a lot
of discussions around automatic failover but
I have never seen anyone
defined a really working (for me) criteria to detect the failure.


I guess there are few aspects involved: Auto Detection of Failure, Auto-promotion of slave to master and automatic switch of application to point to New Master.

- Auto-detection of Failure: This can be tricky and may lead to split-brain scenario when both the nodes think they have lost there fellow node and Slave promotes itself as master and Master. If you have a proper log monitoring in place, this can be detected and you can bring things back in shape. I agree to your point "
 I have never seen anyone defined a really working (for me) criteria to detect the failure.
"

- Auto-promotion of slave to master: On failure detection you can use a shell script/ssh to create trigger file which shall promote the slave

- Auto switch of Application to New Master: You can use Virtual IP or use a middle-tier e.g. pgbouncer on a different node

- SO far I have used pgpool in HA mode to cater to all these points. I have run into split brain scenario but once detected, you just need to rebuild slave and attach it back. Despite split brain my application continues to use original master (since Virtual IP has not shifted). 

- Even if Virtual IP shifts my application would connect to slave (not promoted to a standalone master), which I can live with.

- You will run into an issue if you fail to notice split brain scenario. Later when you actually loose your master you will face a problem (you won't have a slave to failover to). 

- If the scenario is of a DR site, I would not recommend auto-failovers. The setup I am talking about is HA setup and I am using synchronous streaming replication. 

- In same setup I have a DR site, which is using file based log shipping (archive restore) and has band width constraints. The replication is not synchronous.

- In DR there could be a data/transaction loss while switching 2nd Node and hence IMHO decision and process should be manual.

 
> As a suggestion, won't it be a good idea to run the pgbouncer on Application
> Server [so that even if the Master Server has to be shutdown, IP address
> changes or starting pgbouncer on slave is not needed]?

It depends. In some cases it is worth doing if you have a single
application server (and do not planning to have more), or the overhead
of a high number of app->db network connections per second is
significantly high in comparison to a local one.

I can't recollect where but I read recommended setup for pgbouncer to be installed away from db server (to avoid process contention and few other points were mentioned which I can not exactly recollect). I have seen/done setups which has pgbouncer running on dbserver and they seem to holding fine.
Recently I had done a benchmark of running pgbouncer on to Client (windows machine running HammerDB as benchmark tool) and then tried to run it on db server. There was not much of a difference. 

However, modern
architectures usually use multiple application servers and persistent
connections, and in this case installing pgbouncer on database servers
is preferable.
I guess one can always setup multiple pgbouncer (one on each application node) (?).

 
I also do not recommend to install it on a separate
machine as it brings another point of failure in any case.
Agree
And again,
it depends on your situation.
That's fair




Best Regards,
Sameer Kumar | Database Consultant
ASHNIK PTE. LTD.
101 Cecil Street, #11-11 Tong Eng Building, Singapore 069533
M : +65 8110 0350 T: +65 6438 3504 | www.ashnik.com
www.facebook.com/ashnikbiz | www.twitter.com/ashnikbiz

email patch

This email may contain confidential, privileged or copyright material and is solely for the use of the intended recipient(s).

Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Jayadevan
This post was updated on .
Sameer Kumar wrote
- SO far I have used pgpool in HA mode to cater to all these points. I have
run into split brain scenario but once detected, you just need to rebuild
slave and attach it back. Despite split brain my application continues to
use original master (since Virtual IP has not shifted).
Just curious - once pgpool switches to the slave, it will mark it as primary (I did not use virtual IPs.) So split-brain scenario will not happen,right?
Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Sameer Kumar
In reply to this post by Athanasios | ZenGuard


On Sat, Jan 4, 2014 at 12:03 AM, Athanasios | ZenGuard <[hidden email]> wrote:
Hi Sameer,
thank you for the quick reply (again!). I am not using pgpool admin so is it possible to share the shell script that does the rebuilding? 

For failover detection and failover the script should be as simple as:

        Check the failing node
               if failing_node=primary_node
                    create trigger file on secondary;
               end if;
               
               if failing_node=secondary_node
                     set synchornous replication off;
              end if;



The logic for rebuilding was simple (if I remember it correctly):

              node_to_rebuild=node_2;
              active_node=node_1;              

              node_to_rebuild=$1;    //pass this as a parameter

              if node_1=node_to_rebuild
                   then active_node=node_2;     
             end if;
               
              copy data directory from active_node to node_to_rebuild;
              start postgresql on node_to_rebuild;
              use ppc_attach_node to attach newly created node;

You need to use ssh (remote command execution) for some of these commands and make sure that ssh can login without password prompt to both the nodes. If you are going to use pgpool in HA mode, then keep a copy of this rebuilding script on each node so that you can rebuild any node from any node. 

With pgpoolAdmin the process must be much more simpler, but I could never get it installed. :-(


Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Sameer Kumar
In reply to this post by Jayadevan

> - SO far I have used pgpool in HA mode to cater to all these points. I
> have
> run into split brain scenario but once detected, you just need to rebuild
> slave and attach it back. Despite split brain my application continues to
> use original master (since Virtual IP has not shifted).

Just curious - once pgpool switches to the slave, it will mark it as primary
(I did not use virtual IPs.) So spli-brain scenarion will not happen,right?




Are you using pgpool in HA mode or you are using pgpool only on Primary/Secondary server? Or using pgpool on a third server?

If you have only one pgpool talking to both master and slave and it performs a failover that should not suffer from split-brain. But yeah, it may at some point think (e.g. in case of a network failure) that both the nodes are lost [esp if it is running on a third server]. 

I can not think of all the scenarios and hence can not rule out a split brain situation (but I think it is not possible as long as you have only one pgpool).
Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Sameer Kumar
CCin the community mailing list
On Wed, Jan 8, 2014 at 3:02 PM, Jayadevan M <[hidden email]> wrote:
I am using pgpool with streaming replication option for PostgreSQL, with a failover command (shell script). 2 pgpool instances (with watchdog) on the database nodes. You were referring to split brain for pgpool instances or PostgreSQL instances?



I was talking about split-brain for pgpool which cascades to respective PostgreSQL instances i.e. pgpool would think that the other node is lost and hence it assume itself as only available pgpool+DB node and hence pgpool on slave will switchover to its on DB and pgpool on master will consider slave to be lost. This can happen if there is a network fluctuation between slave and master (can be avoided with redundant network path/line between two servers).

I do not understand if you are using HA mode, how come you are not using Virtual IP (you mentioned earlier). 
Just curious - once pgpool switches to the slave, it will mark it as primary
(I did not use virtual IPs.) So spli-brain scenarion will not happen,right?
Am I missing something here? or some mode of pgpool which I am not aware of?

 

Reply | Threaded
Open this post in threaded view
|

Re: Master-slave failover question

Jayadevan
Sameer Kumar wrote
 Am I missing something here? or some mode of pgpool which I am not aware
of?
I thought you were mentioning split-brain for PostgreSQL. For pgpool, I am using virtual IP. Somehow I got the message that the split-brain/virtual IPs in your mail were with reference to PostgreSQL servers.
12