hostorder and failover_timeout for libpq


Ildar Musin
Hello hackers,

A couple of years ago Victor Wagner presented a patch [1] that introduced
multiple-host capability along with the hostorder and failover_timeout
parameters for libpq. Subsequently the multi-host feature was reimplemented
by Robert Haas and committed. Later the target_session_attrs parameter was
also added. In this thread I want to revisit the hostorder and
failover_timeout proposal.

'hostorder' defines the order in which the postgres instances listed in
the connection string will be tried. Possible values are:
* sequential (default)
* random

Random order can be used, for instance, to balance load (which is
particularly useful in a multi-master cluster, but can also be used to
load-balance read-only connections to standbys).
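
As a rough illustration of what a random host order could mean, here is a
minimal sketch that shuffles a list of host names in place with the
Fisher-Yates algorithm. The function name shuffle_hosts and the use of
rand() are assumptions made for this example; they are not taken from the
patch. With the proposed parameter, a connection string might look like
"host=node1,node2,node3 hostorder=random".

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): permute an array of host
 * names in place using the Fisher-Yates shuffle.  The caller is expected
 * to seed the generator with srand() beforehand.
 */
void
shuffle_hosts(const char **hosts, size_t n)
{
    for (size_t i = n; i > 1; i--)
    {
        /* pick a slot in [0, i); rand() % i has a slight modulo bias */
        size_t      j = (size_t) rand() % i;
        const char *tmp = hosts[i - 1];

        hosts[i - 1] = hosts[j];
        hosts[j] = tmp;
    }
}
```

Because the shuffle is a permutation, every host still appears exactly
once in the resulting order.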

'failover_timeout' specifies the time span (in seconds) during which libpq
will keep trying to connect to the hosts listed in the connection
string. If failover_timeout is specified, libpq will loop over the hosts
again and again until it either successfully connects to one of them
or runs out of time.
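
The retry behaviour described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: the names
connect_with_failover, connect_fn and flaky_connect are hypothetical, the
connection attempt is replaced by a callback, and the deadline is checked
only between full passes over the host list. It is not how the patch
implements the parameter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns true if connecting to `host` succeeded. */
typedef bool (*connect_fn) (const char *host, void *arg);

/*
 * Try each host in order.  With failover_timeout > 0, keep looping over
 * the list until a connection succeeds or the time span has elapsed;
 * otherwise make a single pass.  Returns the index of the host that
 * accepted the connection, or -1 on failure.
 */
int
connect_with_failover(const char **hosts, size_t n, int failover_timeout,
                      connect_fn try_connect, void *arg)
{
    time_t      deadline = time(NULL) + failover_timeout;

    if (n == 0)
        return -1;

    for (;;)
    {
        for (size_t i = 0; i < n; i++)
        {
            if (try_connect(hosts[i], arg))
                return (int) i;
        }

        /* out of time, or no failover requested: give up */
        if (failover_timeout <= 0 || time(NULL) >= deadline)
            return -1;

        /* a real client would sleep or back off here before retrying */
    }
}

/*
 * Demo stub standing in for a real connection attempt: fails until the
 * counter pointed to by `arg` reaches zero.
 */
bool
flaky_connect(const char *host, void *arg)
{
    int        *failures_left = arg;

    (void) host;
    if (*failures_left > 0)
    {
        (*failures_left)--;
        return false;
    }
    return true;
}
```

For example, with three hosts and a stub that fails four times, a call
with a non-zero timeout succeeds on the second pass, while a call with a
timeout of 0 gives up after the first pass.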

I reimplemented the 'hostorder' and 'failover_timeout' parameters in the
attached patch. I also took some documentation pieces from Victor
Wagner's original patch. I'll be glad to see any comments and
suggestions. Thanks!

[1]
https://www.postgresql.org/message-id/flat/20150818041850.GA5092%40wagner.pp.ru

--
Ildar Musin

Attachment: hostorder_v1.patch (28K)

RE: hostorder and failover_timeout for libpq

Iwata, Aya
Hello Ildar,

I have a question about the failover_timeout parameter.
Which would be better: implementing a parameter that keeps retrying until the
timeout expires, or controlling connection retries on the application side?

Also, I am not sure whether picking hosts at random via the hostorder parameter will have a good effect on load balancing.
Please let me know if there are examples.

I am sorry if these points were already examined in the previous thread. I haven't read it yet.

Regards,
Aya Iwata

Re: hostorder and failover_timeout for libpq

Surafel Temesgen
In reply to this post by Ildar Musin
Hey,
Here are a few comments.
+     <varlistentry id="libpq-connect-falover-timeout"
xreflabel="failover_timeout">
Here's a typo: ="libpq-connect-falover-timeout"
+ {"failover_timeout", NULL, NULL, NULL,
+ "Failover Timeout", "", 10,
Words in internalPQconninfoOption labels are separated by hyphens in the
surrounding code.
+        If the value is <literal>random</literal>, the host to connect to
+        will be randomly picked from the list. It allows load balacing between
+        several cluster nodes.
I can't think of a use case where randomly picking a node, rather than
following the user-specified order, can load balance the cluster better.
Can you explain the purpose of this feature more? Also, in the code I
can't see a mechanism for preventing one host from being picked multiple times.
By the way, the patch doesn't apply cleanly; I think it needs a rebase.
http://cfbot.cputube.org/patch_19_1631.log

Regards
Surafel


Re: hostorder and failover_timeout for libpq

Ildar Musin
Hello Surafel,

On Fri, Sep 14, 2018 at 2:03 PM Surafel Temesgen wrote:
> Hey,
> Here are a few comments.
> +     <varlistentry id="libpq-connect-falover-timeout"
> xreflabel="failover_timeout">
> Here's a typo: ="libpq-connect-falover-timeout"
> +       {"failover_timeout", NULL, NULL, NULL,
> +               "Failover Timeout", "", 10,
> Words in internalPQconninfoOption labels are separated by hyphens in the
> surrounding code.
> +        If the value is <literal>random</literal>, the host to connect to
> +        will be randomly picked from the list. It allows load balacing between
> +        several cluster nodes.
> I can't think of a use case where randomly picking a node, rather than
> following the user-specified order, can load balance the cluster better.
> Can you explain the purpose of this feature more?
Load balancing is probably the wrong term for this. Think of it as a connection routing mechanism. Let's say you have 10 servers and 100 clients wanting to establish read-only connections. Without this feature, all clients will go to the first specified host (unless they hit the max_connections limit). With random `hostorder`, they would be split between the hosts more or less evenly.
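
To make the 10-servers/100-clients example concrete, here is a small
simulation sketch. It assumes every server is reachable and ignores the
max_connections limit, so each client simply lands on the first host it
tries. The function name assign_clients is hypothetical and used only for
this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Illustration only: count how many clients end up on each server.
 * counts must have room for nservers entries.
 */
void
assign_clients(size_t nclients, size_t nservers, bool random_order,
               size_t *counts)
{
    for (size_t s = 0; s < nservers; s++)
        counts[s] = 0;

    if (nservers == 0)
        return;

    for (size_t c = 0; c < nclients; c++)
    {
        /*
         * Sequential order: the first listed host always wins.
         * Random order: the first host tried is drawn at random.
         */
        size_t      first = random_order ? (size_t) rand() % nservers : 0;

        counts[first]++;
    }
}
```

With sequential order all 100 clients are counted on server 0; with
random order the counts are spread across the servers, though not
perfectly evenly.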
 
> Also, in the code I can't see
> a mechanism for preventing one host from being picked multiple times.
The original idea was to collect all the IP addresses we get after resolving the specified host names, put them into a global array, apply a random permutation to it, and then try each address in turn, round-robin, until one connection succeeds. Now I'm not sure that this approach was the best. There are two concerns:

1. A host name can resolve to several IP addresses (which in turn can point either to the same physical server with multiple network interfaces or to different servers). In the scheme described above, each IP address would be added to the global array. This may lead to a situation where one host has a higher chance of being picked because it has more addresses in the global array than the other hosts.
2. A host may support both IPv4 and IPv6 connections, which again leads to extra items in the global array and therefore also increases its chance of being picked.
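
The skew described in the two concerns above is simple arithmetic: if the
flat address array is shuffled uniformly, the chance that a given host is
tried first equals its share of the addresses. A minimal sketch, with the
hypothetical helper name first_pick_probability:

```c
#include <stddef.h>

/*
 * Probability that a uniformly shuffled flat address array starts with
 * an address belonging to host `host`, where addr_counts[i] is the
 * number of addresses host i contributed to the array.
 */
double
first_pick_probability(const int *addr_counts, size_t nhosts, size_t host)
{
    int         total = 0;

    for (size_t i = 0; i < nhosts; i++)
        total += addr_counts[i];

    if (total == 0 || host >= nhosts)
        return 0.0;

    return (double) addr_counts[host] / total;
}
```

For instance, with three hosts where only the third one has both an IPv4
and an IPv6 address (address counts 1, 1, 2), the third host is tried
first half of the time and each of the others a quarter of the time.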

Another approach would be to leave `pg_conn->connhost` as it is now (i.e. not create a global address array) and just apply a random permutation to it if `hostorder=random` is specified, and probably also permute the address list within each individual host.
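
A rough sketch of this two-level alternative follows. The host_entry
struct is a simplified stand-in invented for this example and does not
mirror the real layout of `pg_conn->connhost`; the function names
shuffle_addr_list and randomize_hosts are likewise hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified, hypothetical host record: a name plus its resolved addresses. */
typedef struct
{
    const char *name;
    const char **addrs;
    size_t      naddrs;
} host_entry;

/* Fisher-Yates shuffle of one host's address list. */
void
shuffle_addr_list(const char **addrs, size_t n)
{
    for (size_t i = n; i > 1; i--)
    {
        size_t      j = (size_t) rand() % i;
        const char *tmp = addrs[i - 1];

        addrs[i - 1] = addrs[j];
        addrs[j] = tmp;
    }
}

/*
 * Shuffle the hosts first, then the addresses inside each host.  Every
 * host occupies exactly one slot in the outer permutation, so a host
 * with many addresses is no more likely to be tried first.
 */
void
randomize_hosts(host_entry *hosts, size_t n)
{
    for (size_t i = n; i > 1; i--)
    {
        size_t      j = (size_t) rand() % i;
        host_entry  tmp = hosts[i - 1];

        hosts[i - 1] = hosts[j];
        hosts[j] = tmp;
    }

    for (size_t i = 0; i < n; i++)
        shuffle_addr_list(hosts[i].addrs, hosts[i].naddrs);
}
```

The design choice here is that the number of addresses only affects the
order of attempts within a host, not the order of the hosts themselves.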

At this point I'd like to ask the community: what, in your opinion, would be the best course of action, and should this feature be implemented within libpq at all? From my POV there are factors that really depend on the network architecture, and there is probably no single right solution.

Kind regards,
Ildar