Re: PATCH: Batch/pipelining support for libpq

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
46 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

matt-42
Hi,

It seems like this patch is nearly finished. I fixed all the remaining
issues. I'm also asking
a confirmation of the test scenarios you want to see in the next
version of the patch.

> Hi,
>
> On 2020-07-10 19:01:49 -0400, Alvaro Herrera wrote:
> > Totally unasked for, here's a rebase of this patch series.  I didn't do
> > anything other than rebasing to current master, solving a couple of very
> > trivial conflicts, fixing some whitespace complaints by git apply, and
> > running tests to verify everthing works.
> >
> > I don't foresee working on this at all, so if anyone is interested in
> > seeing this feature in, I encourage them to read and address
> > Horiguchi-san's feedback.
>
> Nor am I planning to do so, but I do think its a pretty important
> improvement.
>
>
Fixed

>
>
> > +/*
> > + * PQrecyclePipelinedCommand
> > + * Push a command queue entry onto the freelist. It must be a dangling entry
> > + * with null next pointer and not referenced by any other entry's next pointer.
> > + */
>
> Dangling sounds a bit like it's already freed.
>
>
Fixed

>
>
> > +/*
> > + * PQbatchSendQueue
> > + * End a batch submission by sending a protocol sync. The connection will
> > + * remain in batch mode and unavailable for new synchronous command execution
> > + * functions until all results from the batch are processed by the client.
>
> I feel like the reference to the protocol sync is a bit too low level
> for an external API. It should first document what the function does
> from a user's POV.
>
> I think it'd also be good to document whether / whether not queries can
> already have been sent before PQbatchSendQueue is called or not.
>
Fixed

>
>
>
> > + if (conn->batch_status == PQBATCH_MODE_ABORTED && conn->queryclass != PGQUERY_SYNC)
> > + {
> > + /*
> > + * In an aborted batch we don't get anything from the server for each
> > + * result; we're just discarding input until we get to the next sync
> > + * from the server. The client needs to know its queries got aborted
> > + * so we create a fake PGresult to return immediately from
> > + * PQgetResult.
> > + */
> > + conn->result = PQmakeEmptyPGresult(conn,
> > +   PGRES_BATCH_ABORTED);
> > + if (!conn->result)
> > + {
> > + printfPQExpBuffer(&conn->errorMessage,
> > +  libpq_gettext("out of memory"));
> > + pqSaveErrorResult(conn);
> > + return 0;
>
> Is there any way an application can recover at this point? ISTM we'd be
> stuck in the previous asyncStatus, no?
>
conn->result is null when malloc(sizeof(PGresult)) returns null. It's
very unlikely unless
the server machine is out of memory, so the server will probably be
unresponsive anyway.

I'm leaving this as it is but if anyone has a solution simple to
implement I'll fix it.

>
>
> > +/* pqBatchFlush
> > + * In batch mode, data will be flushed only when the out buffer reaches the threshold value.
> > + * In non-batch mode, data will be flushed all the time.
> > + */
> > +static int
> > +pqBatchFlush(PGconn *conn)
> > +{
> > + if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount >= OUTBUFFER_THRESHOLD))
> > + return(pqFlush(conn));
> > + return 0; /* Just to keep compiler quiet */
> > +}
>
> unnecessarily long line.
>
Fixed

>
> > +/*
> > + * Connection's outbuffer threshold is set to 64k as it is safe
> > + * in Windows as per comments in pqSendSome() API.
> > + */
> > +#define OUTBUFFER_THRESHOLD 65536
>
> I don't think the comment explains much. It's fine to send more than 64k
> with pqSendSome(), they'll just be send with separate pgsecure_write()
> invocations. And only on windows.
>
> It clearly makes sense to start sending out data at a certain
> granularity to avoid needing unnecessary amounts of memory, and to make
> more efficient use of latency / serer side compute.
>
> It's not implausible that 64k is the right amount for that, I just don't
> think the explanation above is good.
>
Fixed

> > diff --git a/src/test/modules/test_libpq/testlibpqbatch.c b/src/test/modules/test_libpq/testlibpqbatch.c
> > new file mode 100644
> > index 0000000000..4d6ba266e5
> > --- /dev/null
> > +++ b/src/test/modules/test_libpq/testlibpqbatch.c
> > @@ -0,0 +1,1456 @@
> > +/*
> > + * src/test/modules/test_libpq/testlibpqbatch.c
> > + *
> > + *
> > + * testlibpqbatch.c
> > + * Test of batch execution functionality
> > + */
> > +
> > +#ifdef WIN32
> > +#include <windows.h>
> > +#endif
>
> ISTM that this shouldn't be needed in a test program like this?
> Shouldn't libpq abstract all of this away?
>
Fixed.

>
> > +static void
> > +simple_batch(PGconn *conn)
> > +{
>
> ISTM that all or at least several of these should include tests of
> transactional behaviour with pipelining (e.g. using both implicit and
> explicit transactions inside a single batch, using transactions across
> batches, multiple explicit transactions inside a batch).
>
@Andres, just to make sure I understood, here is the test scenarios I'll add:

Implicit and explicit multiple transactions:
   start batch:
     create_table
     insert X
     begin transaction
       insert X
     commit transaction
     begin transaction
       insert Y
     commit transaction
  end batch

Transaction across batches:
   start batch:
     create_table
     begin transaction
       insert X
   end batch
   start batch:
       insert Y
    commit transaction
  end batch

>
>
> > + PGresult   *res = NULL;
> > + const char *dummy_params[1] = {"1"};
> > + Oid dummy_param_oids[1] = {INT4OID};
> > +
> > + fprintf(stderr, "simple batch... ");
> > + fflush(stderr);
>
> Why do we need fflush()? IMO that shouldn't be needed in a use like
> this? Especially not on stderr, which ought to be unbuffered?
>
Removed.

>
> > + /*
> > + * Enter batch mode and dispatch a set of operations, which we'll then
> > + * process the results of as they come in.
> > + *
> > + * For a simple case we should be able to do this without interim
> > + * processing of results since our out buffer will give us enough slush to
> > + * work with and we won't block on sending. So blocking mode is fine.
> > + */
> > + if (PQisnonblocking(conn))
> > + {
> > + fprintf(stderr, "Expected blocking connection mode\n");
> > + goto fail;
> > + }
>
> Perhaps worth adding a helper for this?
>
> #define EXPECT(condition, explanation) \
>
Fixed (and saved 600 lines :).



Once I get your confirmation of the test scenarios, I'll implement
them and share another patch.

libpq-batch-v19.patch (118K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

Alvaro Herrera-9
On 2020-Aug-31, Matthieu Garrigues wrote:

> It seems like this patch is nearly finished. I fixed all the remaining
> issues. I'm also asking a confirmation of the test scenarios you want
> to see in the next version of the patch.

Hmm, apparently nobody noticed that this patch is not registered in the
current commitfest.

Since it was submitted ahead of the deadline, I'm going to add it there.
(The patch has also undergone a lot of review already; it's certainly
not a newcomer.)

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

Dave Cramer-7

Alvaro,


On Fri, 4 Sep 2020 at 17:26, Alvaro Herrera <[hidden email]> wrote:
On 2020-Aug-31, Matthieu Garrigues wrote:

> It seems like this patch is nearly finished. I fixed all the remaining
> issues. I'm also asking a confirmation of the test scenarios you want
> to see in the next version of the patch.

Hmm, apparently nobody noticed that this patch is not registered in the
current commitfest.

Since it was submitted ahead of the deadline, I'm going to add it there.
(The patch has also undergone a lot of review already; it's certainly
not a newcomer.)

I am looking for this in the commitfest and can't find it. However there is an old commitfest entry


Do you have the link for the new one ?

Dave Cramer
www.postgres.rocks
 


Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

Dave Cramer-7
In reply to this post by matt-42

Dave Cramer
www.postgres.rocks


On Mon, 31 Aug 2020 at 11:46, Matthieu Garrigues <[hidden email]> wrote:
Hi,

It seems like this patch is nearly finished. I fixed all the remaining
issues. I'm also asking
a confirmation of the test scenarios you want to see in the next
version of the patch.

> Hi,
>
> On 2020-07-10 19:01:49 -0400, Alvaro Herrera wrote:
> > Totally unasked for, here's a rebase of this patch series.  I didn't do
> > anything other than rebasing to current master, solving a couple of very
> > trivial conflicts, fixing some whitespace complaints by git apply, and
> > running tests to verify everthing works.
> >
> > I don't foresee working on this at all, so if anyone is interested in
> > seeing this feature in, I encourage them to read and address
> > Horiguchi-san's feedback.
>
> Nor am I planning to do so, but I do think its a pretty important
> improvement.
>
>
Fixed

>
>
> > +/*
> > + * PQrecyclePipelinedCommand
> > + * Push a command queue entry onto the freelist. It must be a dangling entry
> > + * with null next pointer and not referenced by any other entry's next pointer.
> > + */
>
> Dangling sounds a bit like it's already freed.
>
>
Fixed

>
>
> > +/*
> > + * PQbatchSendQueue
> > + * End a batch submission by sending a protocol sync. The connection will
> > + * remain in batch mode and unavailable for new synchronous command execution
> > + * functions until all results from the batch are processed by the client.
>
> I feel like the reference to the protocol sync is a bit too low level
> for an external API. It should first document what the function does
> from a user's POV.
>
> I think it'd also be good to document whether / whether not queries can
> already have been sent before PQbatchSendQueue is called or not.
>
Fixed

>
>
>
> > + if (conn->batch_status == PQBATCH_MODE_ABORTED && conn->queryclass != PGQUERY_SYNC)
> > + {
> > + /*
> > + * In an aborted batch we don't get anything from the server for each
> > + * result; we're just discarding input until we get to the next sync
> > + * from the server. The client needs to know its queries got aborted
> > + * so we create a fake PGresult to return immediately from
> > + * PQgetResult.
> > + */
> > + conn->result = PQmakeEmptyPGresult(conn,
> > +   PGRES_BATCH_ABORTED);
> > + if (!conn->result)
> > + {
> > + printfPQExpBuffer(&conn->errorMessage,
> > +  libpq_gettext("out of memory"));
> > + pqSaveErrorResult(conn);
> > + return 0;
>
> Is there any way an application can recover at this point? ISTM we'd be
> stuck in the previous asyncStatus, no?
>

conn->result is null when malloc(sizeof(PGresult)) returns null. It's
very unlikely unless
the server machine is out of memory, so the server will probably be
unresponsive anyway.

I'm leaving this as it is but if anyone has a solution simple to
implement I'll fix it.

>
>
> > +/* pqBatchFlush
> > + * In batch mode, data will be flushed only when the out buffer reaches the threshold value.
> > + * In non-batch mode, data will be flushed all the time.
> > + */
> > +static int
> > +pqBatchFlush(PGconn *conn)
> > +{
> > + if ((conn->batch_status == PQBATCH_MODE_OFF)||(conn->outCount >= OUTBUFFER_THRESHOLD))
> > + return(pqFlush(conn));
> > + return 0; /* Just to keep compiler quiet */
> > +}
>
> unnecessarily long line.
>
Fixed

>
> > +/*
> > + * Connection's outbuffer threshold is set to 64k as it is safe
> > + * in Windows as per comments in pqSendSome() API.
> > + */
> > +#define OUTBUFFER_THRESHOLD 65536
>
> I don't think the comment explains much. It's fine to send more than 64k
> with pqSendSome(), they'll just be send with separate pgsecure_write()
> invocations. And only on windows.
>
> It clearly makes sense to start sending out data at a certain
> granularity to avoid needing unnecessary amounts of memory, and to make
> more efficient use of latency / serer side compute.
>
> It's not implausible that 64k is the right amount for that, I just don't
> think the explanation above is good.
>

Fixed

> > diff --git a/src/test/modules/test_libpq/testlibpqbatch.c b/src/test/modules/test_libpq/testlibpqbatch.c
> > new file mode 100644
> > index 0000000000..4d6ba266e5
> > --- /dev/null
> > +++ b/src/test/modules/test_libpq/testlibpqbatch.c
> > @@ -0,0 +1,1456 @@
> > +/*
> > + * src/test/modules/test_libpq/testlibpqbatch.c
> > + *
> > + *
> > + * testlibpqbatch.c
> > + * Test of batch execution functionality
> > + */
> > +
> > +#ifdef WIN32
> > +#include <windows.h>
> > +#endif
>
> ISTM that this shouldn't be needed in a test program like this?
> Shouldn't libpq abstract all of this away?
>

Fixed.

>
> > +static void
> > +simple_batch(PGconn *conn)
> > +{
>
> ISTM that all or at least several of these should include tests of
> transactional behaviour with pipelining (e.g. using both implicit and
> explicit transactions inside a single batch, using transactions across
> batches, multiple explicit transactions inside a batch).
>

@Andres, just to make sure I understood, here is the test scenarios I'll add:

Implicit and explicit multiple transactions:
   start batch:
     create_table
     insert X
     begin transaction
       insert X
     commit transaction
     begin transaction
       insert Y
     commit transaction
  end batch

Transaction across batches:
   start batch:
     create_table
     begin transaction
       insert X
   end batch
   start batch:
       insert Y
    commit transaction
  end batch

>
>
> > + PGresult   *res = NULL;
> > + const char *dummy_params[1] = {"1"};
> > + Oid dummy_param_oids[1] = {INT4OID};
> > +
> > + fprintf(stderr, "simple batch... ");
> > + fflush(stderr);
>
> Why do we need fflush()? IMO that shouldn't be needed in a use like
> this? Especially not on stderr, which ought to be unbuffered?
>
Removed.

>
> > + /*
> > + * Enter batch mode and dispatch a set of operations, which we'll then
> > + * process the results of as they come in.
> > + *
> > + * For a simple case we should be able to do this without interim
> > + * processing of results since our out buffer will give us enough slush to
> > + * work with and we won't block on sending. So blocking mode is fine.
> > + */
> > + if (PQisnonblocking(conn))
> > + {
> > + fprintf(stderr, "Expected blocking connection mode\n");
> > + goto fail;
> > + }
>
> Perhaps worth adding a helper for this?
>
> #define EXPECT(condition, explanation) \
>

Fixed (and saved 600 lines :).



Once I get your confirmation of the test scenarios, I'll implement
them and share another patch.


There was a comment upthread a while back that people should look at the comments made in https://www.postgresql.org/message-id/20180322.211148.187821341.horiguchi.kyotaro%40lab.ntt.co.jp by Horiguchi-San. 

From what I can tell this has not been addressed. The one big thing is the use of PQbatchProcessQueue vs just putting it in PQgetResult.

The argument is that adding PQbatchProcessQueue is unnecessary and just adds another step. Looking at this, it seems like putting this inside PQgetResult would get my vote as it leaves the interface unchanged.

Dave 
Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

Alvaro Herrera-9
In reply to this post by Dave Cramer-7
On 2020-Sep-21, Dave Cramer wrote:

Hello Dave,

> I am looking for this in the commitfest and can't find it. However there is
> an old commitfest entry
>
> https://commitfest.postgresql.org/13/1024/
>
> Do you have the link for the new one ?

Here you go:

https://commitfest.postgresql.org/29/2724/

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

matt-42
In reply to this post by Dave Cramer-7
Matthieu Garrigues

On Mon, Sep 21, 2020 at 3:09 PM Dave Cramer <[hidden email]> wrote:
>>
> There was a comment upthread a while back that people should look at the comments made in https://www.postgresql.org/message-id/20180322.211148.187821341.horiguchi.kyotaro%40lab.ntt.co.jp by Horiguchi-San.
>
> From what I can tell this has not been addressed. The one big thing is the use of PQbatchProcessQueue vs just putting it in PQgetResult.
>
> The argument is that adding PQbatchProcessQueue is unnecessary and just adds another step. Looking at this, it seems like putting this inside PQgetResult would get my vote as it leaves the interface unchanged.
>

Ok. I'll merge PQbatchProcessQueue into PQgetResult. But just one
thing: I'll keep PQgetResult returning null between the result of two
batched query so the user
can know which result comes from which query.


Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

Dave Cramer-7


On Mon, 21 Sep 2020 at 09:21, Matthieu Garrigues <[hidden email]> wrote:
Matthieu Garrigues

On Mon, Sep 21, 2020 at 3:09 PM Dave Cramer <[hidden email]> wrote:
>>
> There was a comment upthread a while back that people should look at the comments made in https://www.postgresql.org/message-id/20180322.211148.187821341.horiguchi.kyotaro%40lab.ntt.co.jp by Horiguchi-San.
>
> From what I can tell this has not been addressed. The one big thing is the use of PQbatchProcessQueue vs just putting it in PQgetResult.
>
> The argument is that adding PQbatchProcessQueue is unnecessary and just adds another step. Looking at this, it seems like putting this inside PQgetResult would get my vote as it leaves the interface unchanged.
>

Ok. I'll merge PQbatchProcessQueue into PQgetResult. But just one
thing: I'll keep PQgetResult returning null between the result of two
batched query so the user
can know which result comes from which query.

Fair enough.

There may be other things in his comments that need to be addressed. That was the big one that stuck out for me.

Thanks for working on this!


Dave Cramer
www.postgres.rocks 
Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

matt-42
On Mon, Sep 21, 2020 at 3:39 PM Dave Cramer <[hidden email]> wrote:

>
>
>
> On Mon, 21 Sep 2020 at 09:21, Matthieu Garrigues <[hidden email]> wrote:
>>
>> Matthieu Garrigues
>>
>> On Mon, Sep 21, 2020 at 3:09 PM Dave Cramer <[hidden email]> wrote:
>> >>
>> > There was a comment upthread a while back that people should look at the comments made in https://www.postgresql.org/message-id/20180322.211148.187821341.horiguchi.kyotaro%40lab.ntt.co.jp by Horiguchi-San.
>> >
>> > From what I can tell this has not been addressed. The one big thing is the use of PQbatchProcessQueue vs just putting it in PQgetResult.
>> >
>> > The argument is that adding PQbatchProcessQueue is unnecessary and just adds another step. Looking at this, it seems like putting this inside PQgetResult would get my vote as it leaves the interface unchanged.
>> >
>>
>> Ok. I'll merge PQbatchProcessQueue into PQgetResult. But just one
>> thing: I'll keep PQgetResult returning null between the result of two
>> batched query so the user
>> can know which result comes from which query.
>
>
> Fair enough.
>
> There may be other things in his comments that need to be addressed. That was the big one that stuck out for me.
>
> Thanks for working on this!
>

Yes I already addressed the other things in the v19 patch:
https://www.postgresql.org/message-id/flat/CAJkzx4T5E-2cQe3dtv2R78dYFvz+in8PY7A8MArvLhs_pg75gg@...


Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

matt-42
In reply to this post by Dave Cramer-7
Hi Dave,
I merged PQbatchProcessQueue into PQgetResult.
One first init call to PQbatchProcessQueue was also required in
PQsendQueue to have
PQgetResult ready to read the first batch query.

Tests and documentation are updated accordingly.

Matthieu Garrigues

On Mon, Sep 21, 2020 at 3:39 PM Dave Cramer <[hidden email]> wrote:

>
>
>
> On Mon, 21 Sep 2020 at 09:21, Matthieu Garrigues <[hidden email]> wrote:
>>
>> Matthieu Garrigues
>>
>> On Mon, Sep 21, 2020 at 3:09 PM Dave Cramer <[hidden email]> wrote:
>> >>
>> > There was a comment upthread a while back that people should look at the comments made in https://www.postgresql.org/message-id/20180322.211148.187821341.horiguchi.kyotaro%40lab.ntt.co.jp by Horiguchi-San.
>> >
>> > From what I can tell this has not been addressed. The one big thing is the use of PQbatchProcessQueue vs just putting it in PQgetResult.
>> >
>> > The argument is that adding PQbatchProcessQueue is unnecessary and just adds another step. Looking at this, it seems like putting this inside PQgetResult would get my vote as it leaves the interface unchanged.
>> >
>>
>> Ok. I'll merge PQbatchProcessQueue into PQgetResult. But just one
>> thing: I'll keep PQgetResult returning null between the result of two
>> batched query so the user
>> can know which result comes from which query.
>
>
> Fair enough.
>
> There may be other things in his comments that need to be addressed. That was the big one that stuck out for me.
>
> Thanks for working on this!
>
>
> Dave Cramer
> www.postgres.rocks

libpq-batch-v20.patch (115K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

Michael Paquier-2
On Mon, Sep 21, 2020 at 07:55:03PM +0200, Matthieu Garrigues wrote:
> I merged PQbatchProcessQueue into PQgetResult.
> One first init call to PQbatchProcessQueue was also required in
> PQsendQueue to have
> PQgetResult ready to read the first batch query.
>
> Tests and documentation are updated accordingly.

The documentation is failing to build, and the patch does not build
correctly on Windows.  Could you address that?
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

matt-42
On Thu, Oct 1, 2020 at 6:35 AM Michael Paquier <[hidden email]> wrote:

> The documentation is failing to build, and the patch does not build
> correctly on Windows.  Could you address that?
> --
> Michael

Yes I'm on it.

--
Matthieu


Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

matt-42
This patch fixes compilation on windows and compilation of the documentation.

Matthieu Garrigues

On Thu, Oct 1, 2020 at 8:41 AM Matthieu Garrigues
<[hidden email]> wrote:

>
> On Thu, Oct 1, 2020 at 6:35 AM Michael Paquier <[hidden email]> wrote:
>
> > The documentation is failing to build, and the patch does not build
> > correctly on Windows.  Could you address that?
> > --
> > Michael
>
> Yes I'm on it.
>
> --
> Matthieu

libpq-batch-v21.patch (119K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

Álvaro Herrera
I started reading this patch.  As a starting point I decided to post an
updated copy (v22), wherein I reverted the overwriting of
src/include/port/linux.h with the win32.h contents (?) and the inclusion
of <windows.h> in libpq-fe.h.  If those are needed, some different
solution will have to be found.

Trivial other changes (pgindent etc); nothing of substance.  With the
PQtrace() patch I posted at [1] the added test program crashes -- I
don't know if the fault lies in this patch or the other one.

[1] https://postgr.es/m/20201026162313.GA22502@...

v22-0001-libpq-batch.patch (89K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

Álvaro Herrera
In v23 I've gone over docs; discovered that PQgetResults docs were
missing the new values.  Added those.  No significant other changes yet.



Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

Álvaro Herrera
On 2020-Nov-02, Alvaro Herrera wrote:

> In v23 I've gone over docs; discovered that PQgetResults docs were
> missing the new values.  Added those.  No significant other changes yet.



v23-0001-libpq-batch.patch (90K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

Dave Cramer-7
Alvaro,


On Mon, 2 Nov 2020 at 10:57, Alvaro Herrera <[hidden email]> wrote:
On 2020-Nov-02, Alvaro Herrera wrote:

> In v23 I've gone over docs; discovered that PQgetResults docs were
> missing the new values.  Added those.  No significant other changes yet.



Thanks for looking at this.

What else does it need to get it in shape to apply?


Dave Cramer
www.postgres.rocks 
Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

Álvaro Herrera
Hi Dave,

On 2020-Nov-03, Dave Cramer wrote:

> On Mon, 2 Nov 2020 at 10:57, Alvaro Herrera <[hidden email]> wrote:
>
> > On 2020-Nov-02, Alvaro Herrera wrote:
> >
> > > In v23 I've gone over docs; discovered that PQgetResults docs were
> > > missing the new values.  Added those.  No significant other changes yet.
>
> Thanks for looking at this.
>
> What else does it need to get it in shape to apply?

I want to go over the code in depth to grok the design more fully.

It would definitely help if you (and others) could think about the API
being added: Does it fulfill the promises being made?  Does it offer the
guarantees that real-world apps want to have?  I'm not much of an
application writer myself -- particularly high-traffic apps that would
want to use this.  As a driver author I would welcome your insight in
these questions.



Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

Dave Cramer-7


On Tue, 3 Nov 2020 at 08:42, Alvaro Herrera <[hidden email]> wrote:
Hi Dave,

On 2020-Nov-03, Dave Cramer wrote:

> On Mon, 2 Nov 2020 at 10:57, Alvaro Herrera <[hidden email]> wrote:
>
> > On 2020-Nov-02, Alvaro Herrera wrote:
> >
> > > In v23 I've gone over docs; discovered that PQgetResults docs were
> > > missing the new values.  Added those.  No significant other changes yet.
>
> Thanks for looking at this.
>
> What else does it need to get it in shape to apply?

I want to go over the code in depth to grok the design more fully.

It would definitely help if you (and others) could think about the API
being added: Does it fulfill the promises being made?  Does it offer the
guarantees that real-world apps want to have?  I'm not much of an
application writer myself -- particularly high-traffic apps that would
want to use this.  As a driver author I would welcome your insight in
these questions.


I'm sort of in the same boat as you. While I'm closer to the client. I don't personally write that much client code.

I'd really like to hear from the users here.


Dave Cramer
www.postgres.rocks 
Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

matt-42
I implemented a C++ async HTTP server using this new batch mode and it
provides everything I needed to transparently batch sql requests.
It gives a performance boost  between x2 and x3 on this benchmark:
https://www.techempower.com/benchmarks/#section=test&runid=3097dbae-5228-454c-ba2e-2055d3982790&hw=ph&test=query&a=2&f=zik0zj-zik0zj-zik0zj-zik0zj-zieepr-zik0zj-zik0zj-zik0zj-zik0zj-zik0zj-zik0zj

I'll ask other users interested in this to review the API.

Matthieu Garrigues

On Tue, Nov 3, 2020 at 4:56 PM Dave Cramer <[hidden email]> wrote:

>
>
>
> On Tue, 3 Nov 2020 at 08:42, Alvaro Herrera <[hidden email]> wrote:
>>
>> Hi Dave,
>>
>> On 2020-Nov-03, Dave Cramer wrote:
>>
>> > On Mon, 2 Nov 2020 at 10:57, Alvaro Herrera <[hidden email]> wrote:
>> >
>> > > On 2020-Nov-02, Alvaro Herrera wrote:
>> > >
>> > > > In v23 I've gone over docs; discovered that PQgetResults docs were
>> > > > missing the new values.  Added those.  No significant other changes yet.
>> >
>> > Thanks for looking at this.
>> >
>> > What else does it need to get it in shape to apply?
>>
>> I want to go over the code in depth to grok the design more fully.
>>
>> It would definitely help if you (and others) could think about the API
>> being added: Does it fulfill the promises being made?  Does it offer the
>> guarantees that real-world apps want to have?  I'm not much of an
>> application writer myself -- particularly high-traffic apps that would
>> want to use this.  As a driver author I would welcome your insight in
>> these questions.
>>
>
> I'm sort of in the same boat as you. While I'm closer to the client. I don't personally write that much client code.
>
> I'd really like to hear from the users here.
>
>
> Dave Cramer
> www.postgres.rocks


Reply | Threaded
Open this post in threaded view
|

Re: PATCH: Batch/pipelining support for libpq

David G Johnston
In reply to this post by Álvaro Herrera
On Mon, Nov 2, 2020 at 8:58 AM Alvaro Herrera <[hidden email]> wrote:
On 2020-Nov-02, Alvaro Herrera wrote:

> In v23 I've gone over docs; discovered that PQgetResults docs were
> missing the new values.  Added those.  No significant other changes yet.


Just reading the documentation of this patch, haven't been following the longer thread:

Given the caveats around blocking mode connections why not just require non-blocking mode, in a similar fashion to how synchronous functions are disallowed?

"Batched operations will be executed by the server in the order the client
sends them. The server will send the results in the order the statements
executed."

Maybe:

"The server executes statements, and returns results, in the order the client sends them."

Using two sentences and relying on the user to mentally link the two "in the order" descriptions together seems to add unnecessary cognitive load.

+     The client <link linkend="libpq-batch-interleave">interleaves result
+     processing</link> with sending batch queries, or for small batches may
+     process all results after sending the whole batch.

Suggest: "The client may choose to interleave result processing with sending batch queries, or wait until the complete batch has been sent."

I would expect to process the results of a batch only after sending the entire batch to the server.  That I don't have to is informative but knowing when I should avoid doing so, and why, is informative as well.  To the extreme while you can use batch mode and interleave if you just poll getResult after every command you will make the whole batch thing pointless.  Directing the reader from here to the section "Interleaving Result Processing and Query Dispatch" seems worth considering.  The dynamics of small sizes and sockets remains a bit unclear as to what will break (if anything, or is it just process memory on the server) if interleaving it not performed and sizes are large.

I would suggest placing commentary about "all transactions subsequent to a failed transaction in a batch are ignored while previous completed transactions are retained" in the "When to Use Batching".  Something like "Batching is less useful, and more complex, when a single batch contains multiple transactions (see Error Handling)."

My imagined use case would be to open a batch, start a transaction, send all of its components, end the transaction, end the batch, check for batch failure and if it doesn't fail have the option to easily continue without processing individual pgResults (or if it does fail, have the option to extract the first error pgResult and continue, ignoring the rest, knowing that the transaction as a whole was reverted and the batch unapplied).  I've never interfaced with libpq directly.  Though given how the existing C API works what is implemented here seems consistent.

The "queueing up queries into a pipeline to be executed as a batch on the server" can be read as a client-side behavior where nothing is sent to the server until the batch has been completed.  Reading further it becomes clear that all it basically is is a sever-side toggle that instructs the server to continue processing incoming commands even while prior commands have their results waiting to be ingested by the client.

Batch seems like the user-visible term to describe this feature.  Pipeline seems like an implementation detail that doesn't need to be mentioned in the documentation - especially given that pipeline doesn't get a mentioned beyond the first two paragraphs of the chapter and never without being linked directly to "batch".  I would probably leave the indexterm and have a paragraph describing that batching is implemented using a query pipeline so that people with the implementation detail on their mind can find this chapter, but the prose for the user should just stick to batching.

Sorry, that all is a bit unfocused, but the documentation for the user of the API could be cleaned up a bit and some more words spent on what trade-offs are being made when using batching versus normal command-response processing.  That said, while I don't see all of this purely a matter of style I'm also not seeing anything demonstrably wrong with the documentation at the moment.  Hopefully my perspective helps though, and depending on what happens next I may try and make my thoughts more concrete with an actual patch.

David J.

123