hostorder and failover_timeout for libpq

Ildar Musin
Hello hackers,

A couple of years ago Victor Wagner presented a patch [1] that introduced
multiple-host capability as well as the hostorder and failover_timeout
parameters for libpq. Subsequently the multi-host feature was reimplemented
by Robert Haas and committed. Later the target_session_attrs parameter was
also added. In this thread I want to revisit the hostorder and
failover_timeout proposal.

'hostorder' defines the order in which the postgres instances listed in
the connection string will be tried. Possible values are:
* sequential (default)
* random

Random order can be used, for instance, for load balancing
(which is particularly useful in a multi-master cluster, but can also be
used to load-balance read-only connections to standbys).
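
As a rough illustration of the two values (a sketch of the ordering logic only, not code from the patch; the function name `order_hosts` and its arguments are hypothetical), the host list could be ordered like this:

```python
import random


def order_hosts(hosts, hostorder="sequential", rng=None):
    """Return the hosts in the order in which they would be tried.

    'sequential' keeps the order given in the connection string;
    'random' returns a random permutation of the same hosts.
    """
    if hostorder == "sequential":
        return list(hosts)
    if hostorder == "random":
        rng = rng or random.Random()
        shuffled = list(hosts)
        rng.shuffle(shuffled)  # a permutation: every host appears exactly once
        return shuffled
    raise ValueError(f"unknown hostorder: {hostorder!r}")
```

Using a permutation rather than repeated independent picks means no host is tried twice within one pass over the list.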

'failover_timeout' specifies the time span (in seconds) during which libpq
will keep trying to connect to the hosts listed in the connection
string. If failover_timeout is specified, libpq will loop over the hosts
again and again until either it successfully connects to one of them
or it runs out of time.
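
The retry behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: `connect_with_failover`, the `try_connect` callback and the `pause` between passes are all hypothetical, and the description above does not say whether libpq would wait between passes.

```python
import time


def connect_with_failover(hosts, try_connect, failover_timeout=None,
                          clock=time.monotonic, sleep=time.sleep, pause=1.0):
    """Try each host in order; with a timeout, keep looping until it expires.

    try_connect(host) must return a connection object, or None on failure.
    Returns the first successful connection, or None if none succeeded.
    """
    deadline = None if failover_timeout is None else clock() + failover_timeout
    while True:
        for host in hosts:
            conn = try_connect(host)
            if conn is not None:
                return conn
        # Without a timeout a single pass over the list is all we do.
        if deadline is None or clock() >= deadline:
            return None
        sleep(pause)  # assumed back-off between passes
```

The clock and sleep functions are parameters only so that the loop can be exercised without real waiting.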

I reimplemented 'hostorder' and 'failover_timeout' parameters in the
attached patch. I also took some documentation pieces from Victor
Wagner's original patch. I'll be glad to see any comments and
suggestions. Thanks!

[1]
https://www.postgresql.org/message-id/flat/20150818041850.GA5092%40wagner.pp.ru

--
Ildar Musin
[hidden email]

Attachment: hostorder_v1.patch (28K)

RE: hostorder and failover_timeout for libpq

Iwata, Aya
Hello Ildar,

I have a question about the failover_timeout parameter.
Which would be better: having libpq keep retrying until the timeout expires,
or controlling connection retries on the application side?

Also, I am not sure whether picking hosts at random via the hostorder parameter would actually improve load balancing.
Please let me know if there are examples.

I am sorry if these points were already examined in the previous thread. I haven't read it yet.

Regards,
Aya Iwata

Re: hostorder and failover_timeout for libpq

Surafel Temesgen
In reply to this post by Ildar Musin
Hey,
Here are a few comments.
+     <varlistentry id="libpq-connect-falover-timeout"
xreflabel="failover_timeout">
There's a typo in the id here: "libpq-connect-falover-timeout"
+ {"failover_timeout", NULL, NULL, NULL,
+ "Failover Timeout", "", 10,
Words in internalPQconninfoOption labels are separated by hyphens in the
surrounding code.
+        If the value is <literal>random</literal>, the host to connect to
+        will be randomly picked from the list. It allows load balacing between
+        several cluster nodes.
I can't think of a use case where randomly picking a node, rather than
following the user-specified order, load-balances the cluster better. Can you
explain the purpose of this feature more? Also, in the code I can't see
a mechanism for preventing one host from being picked multiple times.
By the way, the patch doesn't apply cleanly; I think it needs a rebase:
http://cfbot.cputube.org/patch_19_1631.log

Regards
Surafel

Re: hostorder and failover_timeout for libpq

Ildar Musin-2
Hello Surafel,

On Fri, Sep 14, 2018 at 2:03 PM Surafel Temesgen <[hidden email]> wrote:
> Hey,
> Here are a few comments.
> +     <varlistentry id="libpq-connect-falover-timeout"
> xreflabel="failover_timeout">
> There's a typo in the id here: "libpq-connect-falover-timeout"
> +       {"failover_timeout", NULL, NULL, NULL,
> +               "Failover Timeout", "", 10,
> Words in internalPQconninfoOption labels are separated by hyphens in the
> surrounding code.
> +        If the value is <literal>random</literal>, the host to connect to
> +        will be randomly picked from the list. It allows load balacing between
> +        several cluster nodes.
> I can't think of a use case where randomly picking a node, rather than
> following the user-specified order, load-balances the cluster better. Can you
> explain the purpose of this feature more?

Load balancing is probably the wrong term for this. Think of it as a connection routing mechanism. Let's say you have 10 servers and 100 clients that want to establish read-only connections. Without this feature all clients will go to the first specified host (unless they hit the max_connections limit). With random `hostorder` they would be split between the hosts more or less evenly.
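
The 10-servers/100-clients scenario can be simulated with a few lines. This is only a sketch under simplifying assumptions: every host accepts the connection, so only each client's first choice matters, and the name `first_choice_counts` is hypothetical.

```python
import random
from collections import Counter


def first_choice_counts(hosts, n_clients, hostorder, seed=0):
    """Count how many clients land on each host.

    Assumes every host is up and below max_connections, so each
    client connects to the first host in its own ordering.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_clients):
        if hostorder == "sequential":
            first = hosts[0]           # everyone tries the same host first
        else:
            first = rng.choice(hosts)  # first element of a random permutation
        counts[first] += 1
    return counts


if __name__ == "__main__":
    servers = [f"db{i}" for i in range(1, 11)]
    print(first_choice_counts(servers, 100, "sequential"))
    print(first_choice_counts(servers, 100, "random"))
```

With sequential order all 100 clients end up on the first server; with random order the counts vary from run to run but average 10 per server.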
 
> Also, in the code I can't see
> a mechanism for preventing one host from being picked multiple times.

The original idea was to collect all the IP addresses that we get after resolving the specified hostnames, put those addresses into a global array, apply a random permutation to it and then go through the array in round-robin fashion, trying to connect to each address until we succeed. Now I'm not sure that this approach was the best. There are two concerns:

1. A host name can resolve to several IP addresses (which in turn can point either to the same physical server with multiple network interfaces or to different servers). In the scheme described above each IP address would be added to the global array. This may lead to a situation where one host gets a higher chance of being picked because it has more addresses in the global array than other hosts.
2. A host may support both IPv4 and IPv6 connections, which again leads to extra items in the global array and therefore also increases its chance of being picked.
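
The skew from a single flat address array is easy to quantify. In the sketch below (hypothetical names, documentation-range addresses), each address is equally likely to come first after shuffling, so a host's chance of being tried first is proportional to how many addresses it contributes:

```python
from fractions import Fraction


def first_pick_probability(addresses_per_host):
    """Probability that each host is tried first when all resolved
    addresses are shuffled together in one flat array."""
    total = sum(len(addrs) for addrs in addresses_per_host.values())
    return {host: Fraction(len(addrs), total)
            for host, addrs in addresses_per_host.items()}


resolved = {
    "db1": ["192.0.2.1", "192.0.2.2", "2001:db8::1"],  # two IPv4 + one IPv6
    "db2": ["192.0.2.3"],
}
probabilities = first_pick_probability(resolved)
```

Here db1 is tried first with probability 3/4 and db2 with probability 1/4, although the user listed only two hosts.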

Another approach would be to leave `pg_conn->connhost` as it is now (i.e. not to create a global address array) and just apply a random permutation to it if `hostorder=random` is specified, and probably also permute the address list within each individual host.
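
A minimal sketch of this two-level variant, written in Python for brevity rather than against the libpq structures (the function name and the dictionary input are assumptions for illustration):

```python
import random


def two_level_order(addresses_per_host, rng=None):
    """Shuffle the hosts first, then the addresses within each host.

    Returns a list of (host, address) pairs in the order they would be
    tried. All addresses of one host stay together, so every host is
    equally likely to come first regardless of its address count.
    """
    rng = rng or random.Random()
    hosts = list(addresses_per_host)
    rng.shuffle(hosts)
    order = []
    for host in hosts:
        addrs = list(addresses_per_host[host])
        rng.shuffle(addrs)
        order.extend((host, addr) for addr in addrs)
    return order
```

Compared with one flat array, a host that resolves to several addresses no longer gains a higher chance of being tried first.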

At this point I'd like to ask the community what, in your opinion, would be the best course of action, and whether this feature should be implemented within libpq at all. From my POV there are factors that really depend on network architecture, and there is probably no single right solution.

Kind regards,
Ildar

Re: hostorder and failover_timeout for libpq

Michael Paquier-2
On Wed, Sep 19, 2018 at 02:26:53PM +0200, Ildar Musin wrote:
> Another approach would be to leave `pg_conn->connhost` as it is now (i.e.
> not to create global addresses array) and just apply random permutations to
> it if `hostorder=random` is specified. And probably apply permutations to
> addresses list within each individual host.
>
> At this point I'd like to ask community what in your opinion would be the
> best course of action and whether this feature should be implemented within
> libpq at all? Because from my POV there are factors that really depend on
> network architecture and there is probably no single right solution.

As things stand now, when multiple hosts are defined in a connection
string, the order specified in the string is used until a successful
connection is made.  When working on Postgres-XC, we implemented a
similar capability at application level.  However, now that libpq also
supports multi-host capabilities, I could see a point in having
something within libpq.  What could we get, though, except a random mode
for read-only or read-write load balancing?  That single use case looks a
bit limited to me to justify reworking yet again the code paths that
discard connection failures, and there is also the argument for
telling the application to generate its own connection string based on
libpq properties.  So my take would be to just do that at
application level and not bother.
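
For comparison, the application-level alternative might look like the sketch below. It relies only on the existing multi-host syntax of libpq connection strings (comma-separated `host` and `port` lists); the helper name is hypothetical, and host names and the database name are assumed to need no quoting.

```python
import random


def shuffled_conninfo(hosts, dbname, rng=None):
    """Build a libpq keyword/value connection string with the
    (hostname, port) pairs in random order.

    Hosts and ports are shuffled together so that each port stays
    paired with its host.
    """
    rng = rng or random.Random()
    pairs = list(hosts)
    rng.shuffle(pairs)
    host_list = ",".join(host for host, _ in pairs)
    port_list = ",".join(str(port) for _, port in pairs)
    return f"host={host_list} port={port_list} dbname={dbname}"
```

The resulting string can be passed to any libpq-based driver, which then tries the hosts in the order the application chose.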

By the way, I can see that the latest patch available does not apply and
tries to juggle multiple concepts.  I can see at least two of them:
failover_timeout and hostorder.  You should split things.  I have moved
the patch to next CF, waiting on author.
--
Michael

Re: hostorder and failover_timeout for libpq

Dmitry Dolgov
> On Mon, Oct 1, 2018 at 9:10 AM Michael Paquier <[hidden email]> wrote:
>
> By the way, I can see that the latest patch available does not apply and
> tries to juggle with multiple concepts.  I can see at least two of them:
> failover_timeout and hostorder.  You should split things.  I have moved
> the patch to next CF, waiting on author.

Unfortunately, the patch still needs to be rebased, and probably split into two, as
Michael suggested. Are there any plans to do that?

Re: hostorder and failover_timeout for libpq

Tom Lane-2
In reply to this post by Michael Paquier-2
Michael Paquier <[hidden email]> writes:
> On Wed, Sep 19, 2018 at 02:26:53PM +0200, Ildar Musin wrote:
>> At this point I'd like to ask community what in your opinion would be the
>> best course of action and whether this feature should be implemented within
>> libpq at all? Because from my POV there are factors that really depend on
>> network architecture and there is probably no single right solution.

> By the way, I can see that the latest patch available does not apply and
> tries to juggle with multiple concepts.  I can see at least two of them:
> failover_timeout and hostorder.  You should split things.  I have moved
> the patch to next CF, waiting on author.

Per the discussion about the nearby prefer-standby patch,

https://www.postgresql.org/message-id/flat/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@...

it seems pretty unfortunate that this patch proposes functionality
that's nearly identical to something in pgJDBC, but isn't using the
same terminology pgJDBC uses.

It's even more unfortunate that we have three separate patch proposal
threads that are touching more or less the same territory, but don't
seem to be talking to each other.  This one is also relevant:

https://www.postgresql.org/message-id/flat/1700970.cRWpxnom9y@...

                        regards, tom lane

Re: hostorder and failover_timeout for libpq

Andres Freund
In reply to this post by Dmitry Dolgov
Hi,

On 2018-11-29 17:23:11 +0100, Dmitry Dolgov wrote:
> > On Mon, Oct 1, 2018 at 9:10 AM Michael Paquier <[hidden email]> wrote:
> >
> > By the way, I can see that the latest patch available does not apply and
> > tries to juggle with multiple concepts.  I can see at least two of them:
> > failover_timeout and hostorder.  You should split things.  I have moved
> > the patch to next CF, waiting on author.
>
> Unfortunately, patch still needs to be rebased, and probably split into two, as
> Michael suggested. Any plans about it?

As this hasn't been done, and Tom's questions haven't been addressed,
I'm marking this as returned with feedback.

Greetings,

Andres Freund