Designing a better connection pool for psycopg3

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|

Designing a better connection pool for psycopg3

Daniele Varrazzo
Hello,

I've been gathering a few ideas about the connection pool I would like
to provide with psycopg3. I would be happy if you would like to take a
look and give some feedback.

https://www.psycopg.org/articles/2021/01/17/pool-design/

Thank you very much,

-- Daniele


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Karsten Hilbert
> I've been gathering a few ideas about the connection pool I would like
> to provide with psycopg3. I would be happy if you would like to take a
> look and give some feedback.
>
> https://www.psycopg.org/articles/2021/01/17/pool-design/

I would strongly advise against making sys.exit() the default
for pool.terminate() unless I misunderstand something.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Daniele Varrazzo
On Mon, 18 Jan 2021 at 14:13, Karsten Hilbert <[hidden email]> wrote:

> I would strongly advise against making sys.exit() the default
> for pool.terminate() unless I misunderstand something.

How would you terminate the program if a maintenance thread, not the
main one, thinks that the program is not in working state?

Thank you!

-- Daniele


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Magnus Hagander-2
On Mon, Jan 18, 2021 at 2:50 PM Daniele Varrazzo
<[hidden email]> wrote:
>
> On Mon, 18 Jan 2021 at 14:13, Karsten Hilbert <[hidden email]> wrote:
>
> > I would strongly advise against making sys.exit() the default
> > for pool.terminate() unless I misunderstand something.
>
> How would you terminate the program if a maintenance thread, not the
> main one, thinks that the program is not in working state?

Why would it be OK for a maintenance thread to terminate the program
at all? And certainly by default?

Wouldn't the reasonable thing to do be to flag the pool as broken, and
then just stop trying. Then whenever the application makes an attempt
to use the pool, it can he thrown an exception saying that this
happened.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Karsten Hilbert
In reply to this post by Karsten Hilbert
Am Mon, Jan 18, 2021 at 02:13:21PM +0100 schrieb Karsten Hilbert:

> > https://www.psycopg.org/articles/2021/01/17/pool-design/
>
> I would strongly advise against making sys.exit() the default
> for pool.terminate() unless I misunderstand something.

IMO it should fall back to stopping the pooling machinery and
trying to return new connections on request.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Karsten Hilbert
In reply to this post by Daniele Varrazzo
Am Mon, Jan 18, 2021 at 02:50:34PM +0100 schrieb Daniele Varrazzo:

> > I would strongly advise against making sys.exit() the default
> > for pool.terminate() unless I misunderstand something.
>
> How would you terminate the program if a maintenance thread, not the
> main one, thinks that the program is not in working state?

To me it is not the business of a library to terminate its
user (eg an application) upon resource starvation. After all,
the app may be perfectly fine with not being able to talk to
the database. Only it knows what to do under such
circumstances.

If one wants to support such machinery, I would suggest a
callback into the application, set up by the application
code.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Daniele Varrazzo
In reply to this post by Magnus Hagander-2
On Mon, 18 Jan 2021 at 15:05, Magnus Hagander <[hidden email]> wrote:

>
> On Mon, Jan 18, 2021 at 2:50 PM Daniele Varrazzo
> <[hidden email]> wrote:
> >
> > On Mon, 18 Jan 2021 at 14:13, Karsten Hilbert <[hidden email]> wrote:
> >
> > > I would strongly advise against making sys.exit() the default
> > > for pool.terminate() unless I misunderstand something.
> >
> > How would you terminate the program if a maintenance thread, not the
> > main one, thinks that the program is not in working state?
>
> Why would it be OK for a maintenance thread to terminate the program
> at all? And certainly by default?
>
> Wouldn't the reasonable thing to do be to flag the pool as broken, and
> then just stop trying. Then whenever the application makes an attempt
> to use the pool, it can he thrown an exception saying that this
> happened.

I'm trying to imagine what happens in a case such as a network
partition or reconfiguration, and the app server doesn't see the
database anymore. This node is arguably broken.

If the reconnection thread fails to obtain new connections, and the
ones currently in the pool are discarded as detected broken,
eventually the pool is depleted.

The requesting threads (e.g. web requests handlers) would try to
obtain a connection, time out after e.g. 30 seconds, and receive an
error. The error would result in a 500 for the user, probably a sentry
exception and log in a file, but the program would likely not die.

The program could remain in this condition for an arbitrary long time,
until someone notices by looking at the logs and scratches their head
to understand how to fix the problem.

If the program dies, its manager would try to restart it, insisting
until the configuration makes the database visible again. A service
down in a crash loop is more visible than a service up and running but
not serving.

Anyway I appreciate that the default of terminating a program is
probably too aggressive. So I would remove the `terminate()` function
and base implementation and leave a `connection_failed()` handler,
with a default no-op implementation, which people preferring their
program to terminate can subclass (with `sys.exit(1)` or whatever
termination strategy they find useful).

-- Daniele


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Daniele Varrazzo
In reply to this post by Karsten Hilbert
On Mon, 18 Jan 2021 at 15:19, Karsten Hilbert <[hidden email]> wrote:

>
> Am Mon, Jan 18, 2021 at 02:50:34PM +0100 schrieb Daniele Varrazzo:
>
> > > I would strongly advise against making sys.exit() the default
> > > for pool.terminate() unless I misunderstand something.
> >
> > How would you terminate the program if a maintenance thread, not the
> > main one, thinks that the program is not in working state?
>
> To me it is not the business of a library to terminate its
> user (eg an application) upon resource starvation. After all,
> the app may be perfectly fine with not being able to talk to
> the database. Only it knows what to do under such
> circumstances.
>
> If one wants to support such machinery, I would suggest a
> callback into the application, set up by the application
> code.

Thank you for the feedback, Karsten. Probably a customisable no-op
callback is a better default behaviour :)

-- Daniele


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Magnus Hagander-2
In reply to this post by Daniele Varrazzo
On Mon, Jan 18, 2021 at 3:29 PM Daniele Varrazzo
<[hidden email]> wrote:

>
> On Mon, 18 Jan 2021 at 15:05, Magnus Hagander <[hidden email]> wrote:
> >
> > On Mon, Jan 18, 2021 at 2:50 PM Daniele Varrazzo
> > <[hidden email]> wrote:
> > >
> > > On Mon, 18 Jan 2021 at 14:13, Karsten Hilbert <[hidden email]> wrote:
> > >
> > > > I would strongly advise against making sys.exit() the default
> > > > for pool.terminate() unless I misunderstand something.
> > >
> > > How would you terminate the program if a maintenance thread, not the
> > > main one, thinks that the program is not in working state?
> >
> > Why would it be OK for a maintenance thread to terminate the program
> > at all? And certainly by default?
> >
> > Wouldn't the reasonable thing to do be to flag the pool as broken, and
> > then just stop trying. Then whenever the application makes an attempt
> > to use the pool, it can he thrown an exception saying that this
> > happened.
>
> I'm trying to imagine what happens in a case such as a network
> partition or reconfiguration, and the app server doesn't see the
> database anymore. This node is arguably broken.

Only if this is the *only* thing it does.

It might still be able to reach other services on other nodes. Other
databases. Heck, even other database son the same node if it was a
config error.


> If the reconnection thread fails to obtain new connections, and the
> ones currently in the pool are discarded as detected broken,
> eventually the pool is depleted.
>
> The requesting threads (e.g. web requests handlers) would try to
> obtain a connection, time out after e.g. 30 seconds, and receive an
> error. The error would result in a 500 for the user, probably a sentry
> exception and log in a file, but the program would likely not die.

Yes, and thus it would continue to serve all requests that don't need
a database connection from this particular pool.


> The program could remain in this condition for an arbitrary long time,
> until someone notices by looking at the logs and scratches their head
> to understand how to fix the problem.

The *program* can still decide to do this. By itself doing an exit(1)
when the exception is thrown (per my suggestion above), or by just not
catching that exception at all.


> If the program dies, its manager would try to restart it, insisting
> until the configuration makes the database visible again. A service
> down in a crash loop is more visible than a service up and running but
> not serving.

But what about the other services that potentially ran in the same process?

It's an extremely simplistic view to think that a single web server
will only ever talk to a single database, *and* require this database
for all it's operations. Yet that's what this default would do.

The entire point of a connection pool is persistent processes that do
more than one thing, after all. And there is (surprising maybe, but
there is) still an entire world out that that isn't just serving
individual web requests.


> Anyway I appreciate that the default of terminating a program is
> probably too aggressive. So I would remove the `terminate()` function
> and base implementation and leave a `connection_failed()` handler,
> with a default no-op implementation, which people preferring their
> program to terminate can subclass (with `sys.exit(1)` or whatever
> termination strategy they find useful).

That would definitely work. A plugin-point for this would be very useful.

And if you want to avoid the "timeout based error" in the client,
adding a flag to the pool saying "this pool is currently unhealthy" in
this case would work, and then whenever someone tries to get a
connection out of this pool it can throw an exception immediately
instead of waiting for a timeout to happen.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Karsten Hilbert
In reply to this post by Daniele Varrazzo
Am Mon, Jan 18, 2021 at 03:29:34PM +0100 schrieb Daniele Varrazzo:

> Anyway I appreciate that the default of terminating a program is
> probably too aggressive. So I would remove the `terminate()` function
> and base implementation and leave a `connection_failed()` handler,
> with a default no-op implementation, which people preferring their
> program to terminate can subclass (with `sys.exit(1)` or whatever
> termination strategy they find useful).

+1

Some apps may just gladly carry on buffering data locally,
and sending them to the backend next time they get a chance
to do so.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Daniele Varrazzo
In reply to this post by Magnus Hagander-2
On Mon, 18 Jan 2021 at 15:39, Magnus Hagander <[hidden email]> wrote:

> And if you want to avoid the "timeout based error" in the client,
> adding a flag to the pool saying "this pool is currently unhealthy" in
> this case would work, and then whenever someone tries to get a
> connection out of this pool it can throw an exception immediately
> instead of waiting for a timeout to happen.

This is a good idea, thank you :)

-- Daniele


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Karsten Hilbert
In reply to this post by Magnus Hagander-2
Am Mon, Jan 18, 2021 at 03:39:20PM +0100 schrieb Magnus Hagander:

> > Anyway I appreciate that the default of terminating a program is
> > probably too aggressive. So I would remove the `terminate()` function
> > and base implementation and leave a `connection_failed()` handler,
> > with a default no-op implementation, which people preferring their
> > program to terminate can subclass (with `sys.exit(1)` or whatever
> > termination strategy they find useful).
>
> That would definitely work. A plugin-point for this would be very useful.
>
> And if you want to avoid the "timeout based error" in the client,
> adding a flag to the pool saying "this pool is currently unhealthy" in
> this case would work, and then whenever someone tries to get a
> connection out of this pool it can throw an exception immediately
> instead of waiting for a timeout to happen.

+1 :-)

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Daniele Varrazzo
In reply to this post by Magnus Hagander-2
On Mon, 18 Jan 2021 at 15:39, Magnus Hagander <[hidden email]> wrote:
>
> On Mon, Jan 18, 2021 at 3:29 PM Daniele Varrazzo
> <[hidden email]> wrote:

> > I'm trying to imagine what happens in a case such as a network
> > partition or reconfiguration, and the app server doesn't see the
> > database anymore. This node is arguably broken.
>
> Only if this is the *only* thing it does.
>
> It might still be able to reach other services on other nodes. Other
> databases. Heck, even other database son the same node if it was a
> config error.

In your opinion (and Karsten's, if he'd like to chip in), what should
happen on program start? Is attempting to create a new connection in
the main thread, and throwing an exception if failing to do so, a
reasonable behaviour?

Cheers,

-- Daniele


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Karsten Hilbert
Am Mon, Jan 18, 2021 at 04:12:05PM +0100 schrieb Daniele Varrazzo:

> > > partition or reconfiguration, and the app server doesn't see the
> > > database anymore. This node is arguably broken.
> >
> > Only if this is the *only* thing it does.
> >
> > It might still be able to reach other services on other nodes. Other
> > databases. Heck, even other database son the same node if it was a
> > config error.
>
> In your opinion (and Karsten's, if he'd like to chip in), what should
> happen on program start? Is attempting to create a new connection in
> the main thread, and throwing an exception if failing to do so, a
> reasonable behaviour?

Sure ! The app can always catch that and decide what to do.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Magnus Hagander-2
In reply to this post by Daniele Varrazzo
On Mon, Jan 18, 2021 at 4:12 PM Daniele Varrazzo
<[hidden email]> wrote:

>
> On Mon, 18 Jan 2021 at 15:39, Magnus Hagander <[hidden email]> wrote:
> >
> > On Mon, Jan 18, 2021 at 3:29 PM Daniele Varrazzo
> > <[hidden email]> wrote:
>
> > > I'm trying to imagine what happens in a case such as a network
> > > partition or reconfiguration, and the app server doesn't see the
> > > database anymore. This node is arguably broken.
> >
> > Only if this is the *only* thing it does.
> >
> > It might still be able to reach other services on other nodes. Other
> > databases. Heck, even other database son the same node if it was a
> > config error.
>
> In your opinion (and Karsten's, if he'd like to chip in), what should
> happen on program start? Is attempting to create a new connection in
> the main thread, and throwing an exception if failing to do so, a
> reasonable behaviour?

Yes, I think that's perfectly reasonable.

The application is always going to have to be ready to get an
exception on the call to get a connection out of the pool. If it cares
about the "early startup failure", it should just try to grab a
connection immediately on startup and will then notice if it fails.


--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Vladimir Ryabtsev
In reply to this post by Daniele Varrazzo
> Connection configuration
In my opinion it is not only static vs dynamic,
I have a feeling that subclassing gives more freedom for improper
implementation (resulting in problems). Also it depends on internal
API (not necessarily public), which puts more obligations on the
maintainer to keep it compatible with old releases. In this case
I like the configuring approach better
(and it does not exclude subclassing as well).

> Exclusive connections?
The feature itself seems useful, but what are the benefits of getting it
from the pool (e.g. checking state, reconnection, configuration)
as opposed to creating one manually?
Is such a connection counted in regard to 'minconn', 'maxconn'?

> What to do in case of connection failure?
> A first step towards a sane behaviour could be to die early, on startup, if the connection is not working when the first pool population is done
Agree.
BTW, at which moment will the driver be trying to connect? At pool
initialization or on first "getconn()"?

> What if the database is missing in action during the program lifetime? Following connection attempts may be repeated, with an exponential backoff, until dying after a few minutes of fruitless attempts
Yes, I would prefer exponential backoff, with a timeout prior to "panic" callback.
Preferably, both configurable.

> Connections usage pattern
I don't see cases where an application should bother about it,
choose whatever is better for your implementation. And yes, stack
looks reasonable.

> 'minconn', Proposed default: 4
Many services and applications do not really need many connections
but would like to benefit from automatic reconnections. So the default
is questionable, as many people will leave it unchanged.

> "get_info()"
One thing that comes into my head every time I use connection pools
is how many are really needed for my application. Usually it is adjusted
experimentally or using some performance tests. Can you collect and
provide some stats to get more insights on that? I see you want to
expose some in "get_info()", which exactly? E.g. number of times
the max limit was hit, current number of connections acquired?

> 'max_lifetime_sec'
Why is it necessary at all? Does a connection get rotten with time?
As I understand, you are planning to check a connection state in
background. As long as it is OK, what is the need to replace it?

> num_workers, default: 3
Assuming every worker does the same job, is there really the amount
of work for three workers?

"get_maintenance_task()"
This API seems to be too advanced, in what cases you see a sane use case for it?

Thank you,
Vladimir


On Mon, 18 Jan 2021 at 04:04, Daniele Varrazzo <[hidden email]> wrote:
Hello,

I've been gathering a few ideas about the connection pool I would like
to provide with psycopg3. I would be happy if you would like to take a
look and give some feedback.

https://www.psycopg.org/articles/2021/01/17/pool-design/

Thank you very much,

-- Daniele


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Karsten Hilbert
Am Mon, Jan 18, 2021 at 10:20:12AM -0800 schrieb Vladimir Ryabtsev:

> > Exclusive connections?
> The feature itself seems useful, but what are the benefits of getting it
> from the pool (e.g. checking state, reconnection, configuration)
> as opposed to creating one manually?

It might decrease the necessary import surface.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Karsten Hilbert
In reply to this post by Vladimir Ryabtsev
Am Mon, Jan 18, 2021 at 10:20:12AM -0800 schrieb Vladimir Ryabtsev:

> > 'minconn', Proposed default: 4
> Many services and applications do not really need many connections
> but would like to benefit from automatic reconnections. So the default
> is questionable, as many people will leave it unchanged.

I would, perhaps, default to 1 *per thread*. Or to 2 per
thread, one readonly, and one readwrite, if the application
cares (requests ro vs rw, that is). But that's just me and my
use case :-)

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B


Reply | Threaded
Open this post in threaded view
|

Re: Designing a better connection pool for psycopg3

Karsten Hilbert
In reply to this post by Vladimir Ryabtsev
Am Mon, Jan 18, 2021 at 10:20:12AM -0800 schrieb Vladimir Ryabtsev:

> BTW, at which moment will the driver be trying to connect? At pool
> initialization or on first "getconn()"?

I tend to favor the latter, as a default.

As you said, an application can always attempt to grab a
connection early on to detect failures.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B