POC: postgres_fdw insert batching


RE: POC: postgres_fdw insert batching

Tim.Colles
On Wed, 11 Nov 2020, [hidden email] wrote:

> From: Tomas Vondra <[hidden email]>
>> I see the patch builds the "bulk" query in execute_foreign_modify. IMO
>> that's something we should do earlier, when we're building the simple
>> query (for 1-row inserts). I'd understand if you were concerned about
>> overhead in case of 1-row inserts, trying to not plan the bulk query
>> until necessary, but I'm not sure this actually helps.
>>
>> Or was the goal to build a query for every possible number of slots? I
>> don't think that's really useful, considering it requires deallocating
>> the old plan, preparing a new one, etc. IMO it should be sufficient to
>> have two queries - one for 1-row inserts, one for the full batch. The
>> last incomplete batch can be inserted using a loop of 1-row queries.
>>
>> That's what my patch was doing, but I'm not insisting on that - it just
>> seems like a better approach to me. So feel free to argue why this is
>> better.
>
> Don't be concerned, the processing is not changed for 1-row inserts: the INSERT query string is built in PlanForeignModify(), and the remote statement is prepared in execute_foreign_modify() during the first call to ExecForeignInsert() and it's reused for subsequent ExecForeignInsert() calls.
>
> The re-creation of INSERT query string and its corresponding PREPARE happen when the number of tuples to be inserted is different from the previous call to ExecForeignInsert()/ExecForeignBulkInsert().  That's because we don't know how many tuples will be inserted during planning (PlanForeignModify) or execution (until the scan ends for SELECT).  For example, if we insert 10,030 rows with the bulk size 100, the flow is:
>
>  PlanForeignModify():
>    build the INSERT query string for 1 row
>  ExecForeignBulkInsert(100):
>    drop the INSERT query string and prepared statement for 1 row
>    build the query string and prepare statement for 100 row INSERT
>    execute it
>  ExecForeignBulkInsert(100):
>    reuse the prepared statement for 100 row INSERT and execute it
> ...
>  ExecForeignBulkInsert(30):
>    drop the INSERT query string and prepared statement for 100 row
>    build the query string and prepare statement for 30 row INSERT
>    execute it
>
>
>> I think it'd be good to have such GUC, even if only for testing and
>> development. We should probably have a way to disable the batching,
>> which the GUC could also do, I think. So +1 to have the GUC.
>
> OK, I'll add it.  The name would be max_bulk_insert_tuples, because a) it might cover bulk insert for local relations in the future, and b) "tuple" is used in cpu_(index_)tuple_cost and parallel_tuple_cost, while "row" or "record" is not used in GUC (except for row_security).
>
> The valid range would be between 1 and 1,000 (I said 10,000 previously, but I think that was an overreaction, and I'm a bit worried about unforeseen trouble that too many tuples might cause.)  1 disables the bulk processing and uses the traditional ExecForeignInsert().  The default value is 100 (would 1 be sensible as a default value to avoid surprising users with increased memory usage?)
>
>
>>> 2. Should we accumulate records per relation in ResultRelInfo
>>> instead? That is, when inserting into a partitioned table that has
>>> foreign partitions, delay insertion until a certain number of input
>>> records accumulate, and then insert accumulated records per relation
>>> (e.g., 50 records to relation A, 30 records to relation B, and 20
>>> records to relation C.)  If we do that,
>>>
>>
>> I think there's a chunk of text missing here? If we do that, then what?
>
> Sorry, the two bullet points below are what follows.  Perhaps I should have written ":" instead of ",".
>
>
>> Anyway, I don't see why accumulating the records in ResultRelInfo would
>> be better than what the patch does now. It seems to me fairly
>> specific to FDWs, so keeping it in FDW state seems appropriate. What
>> would be the advantage of stashing it in ResultRelInfo?
>
> I thought of distributing input records to their corresponding partitions' ResultRelInfos.  For example, when an input record for partition 1 comes, store it in the ResultRelInfo for partition 1; when an input record for partition 2 comes, store it in the ResultRelInfo for partition 2.  When a ResultRelInfo accumulates some number of rows, insert the accumulated rows into the partition.  When the input ends, perform bulk inserts for the ResultRelInfos that still have accumulated rows.
>
>
>
>> I think that's OK for most use cases, and if it's not (e.g. when there's
>> something requiring the exact order of writes) then it's not possible to
>> use batching. That's one of the reasons why I think we should have a GUC
>> to disable the batching.
>
> Agreed.
>
>
>>> * Should the maximum count of accumulated records be applied per
>>> relation or the query? When many foreign partitions belong to a
>>> partitioned table, if the former is chosen, it may use much memory in
>>> total.  If the latter is chosen, the records per relation could be
>>> few and thus the benefit of bulk insert could be small.
>>>
>>
>> I think it needs to be applied per relation, because that's the level at
>> which we can do it easily and consistently. The whole point is to send
>> data in sufficiently large chunks to minimize the communication overhead
>> (latency etc.), but if you enforce it "per query" that seems hard.
>>
>> Imagine you're inserting data into a table with many partitions - how do
>> you pick the number of rows to accumulate? The table may have 10 or 1000
>> partitions, we may be inserting into all partitions or just a small
>> subset, not all partitions may be foreign, etc. It seems pretty
>> difficult to pick and enforce a reliable limit at the query level. But
>> maybe I'm missing something and it's easier than I think?
>>
>> Of course, you're entirely correct enforcing this at the partition level
>> may require a lot of memory. Sadly, I don't see a way around that,
>> except for (a) disabling batching or (b) ordering the data to insert
>> data into one partition at a time.
>
> OK, I think I'll try doing it that way, after waiting a few days for other opinions.
>
>
>> Two more comments regarding this:
>>
>> 1) If we want to be more strict about the memory consumption, we should
>> probably set the limit in terms of memory, not number of rows. Currently
>> the 100 rows may be 10kB or 10MB, there's no way to know. Of course,
>> this is not the only place with this issue.
>>
>> 2) I wonder what the COPY FROM patch [1] does in this regard. I don't
>> have time to check right now, but I suggest we try to do the same thing,
>> if only to be consistent.
>>
>> [1]
>> https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
>
> That COPY FROM patch uses the tuple accumulation mechanism for local tables as-is.  That is, it accumulates at most 1,000 tuples per partition.
>
> /*
> * No more than this many tuples per CopyMultiInsertBuffer
> *
> * Caution: Don't make this too big, as we could end up with this many
> * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
> * multiInsertBuffers list.  Increasing this can cause quadratic growth in
> * memory requirements during copies into partitioned tables with a large
> * number of partitions.
> */
> #define MAX_BUFFERED_TUPLES     1000
>
>
> Regards
> Takayuki Tsunakawa
>
>

Does this patch affect trigger semantics on the base table?

At the moment when I insert 1000 rows into a postgres_fdw table using a
single insert statement (e.g. INSERT INTO fdw_foo SELECT ... FROM bar) I
naively expect a "statement level" trigger on the base table to trigger
once. But this is not the case. The postgres_fdw implements this
operation as 1000 separate insert statements on the base table, so the
trigger happens 1000 times instead of once. Hence there is no
distinction between using a statement level and a row level trigger on
the base table in this context.

So would this patch change the behaviour so only 10 separate insert
statements (each of 100 rows) would be made against the base table?
If so, that's useful, as it means improving performance using
statement-level triggers becomes possible. But it would also result in
more obscure semantics and might break user processes that depend on the
existing behaviour after the patch is applied.

BTW, is this subtlety documented? I haven't found anything, but I'm
happy to be proved wrong.

Tim





RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
From: [hidden email] <[hidden email]> On Behalf Of

> Does this patch affect trigger semantics on the base table?
>
> At the moment when I insert 1000 rows into a postgres_fdw table using a
> single insert statement (e.g. INSERT INTO fdw_foo SELECT ... FROM bar) I
> naively expect a "statement level" trigger on the base table to trigger
> once. But this is not the case. The postgres_fdw implements this
> operation as 1000 separate insert statements on the base table, so the
> trigger happens 1000 times instead of once. Hence there is no
> distinction between using a statement level and a row level trigger on
> the base table in this context.
>
> So would this patch change the behaviour so only 10 separate insert
> statements (each of 100 rows) would be made against the base table?
> If so thats useful as it means improving performance using statement
> level triggers becomes possible. But it would also result in more
> obscure semantics and might break user processes dependent on the
> existing behaviour after the patch is applied.

Yes, the number of times the statement trigger defined on the base (remote) table fires will be reduced, as you said.


> BTW is this subtlety documented, I haven't found anything but happy
> to be proved wrong?

Unfortunately, there doesn't seem to be any documentation of triggers on base tables.  For example, if the local foreign table has an AFTER ROW trigger and its remote base table has a BEFORE ROW trigger that modifies the input record, it seems that the AFTER ROW trigger doesn't see the modified record.


Regards
Takayuki Tsunakawa




RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
In reply to this post by tsunakawa.takay@fujitsu.com
Hello,


Modified the patch as I talked with Tomas-san.  The performance results of loading one million records into a hash-partitioned table with 8 partitions are as follows:

    unpatched, local: 8.6 seconds
    unpatched, fdw:   113.7 seconds
    patched, fdw:     12.5 seconds  (9x improvement)

The test scripts are also attached.  Run prepare.sql once to set up tables and source data.  Run local_part.sql and fdw_part.sql to load source data into a partitioned table with local partitions and a partitioned table with foreign tables respectively.


Regards
Takayuki Tsunakawa


fdw.sql (74 bytes)
fdw_part.sql (304 bytes)
local.sql (70 bytes)
local_part.sql (88 bytes)
prepare.sql (5K)
v2-0001-Add-bulk-insert-for-foreign-tables.patch (46K)

Re: POC: postgres_fdw insert batching

Tomas Vondra-6


On 11/17/20 10:11 AM, [hidden email] wrote:

> Hello,
>
>
> Modified the patch as I talked with Tomas-san.  The performance
> results of loading one million records into a hash-partitioned table
> with 8 partitions are as follows:
>
> unpatched, local: 8.6 seconds
> unpatched, fdw: 113.7 seconds
> patched, fdw: 12.5 seconds  (9x improvement)
>
> The test scripts are also attached.  Run prepare.sql once to set up
> tables and source data.  Run local_part.sql and fdw_part.sql to load
> source data into a partitioned table with local partitions and a
> partitioned table with foreign tables respectively.
>
Unfortunately, this does not compile for me, because nodeModifyTable
calls ExecGetTouchedPartitions, which is not defined anywhere. Not sure
what that's about, so I simply commented it out. That probably breaks
the partitioned cases, but it allowed me to do some review and testing.

As for the patch, I have a couple of comments

1) As I mentioned before, I really don't think we should be doing
deparsing in execute_foreign_modify - that's something that should
happen earlier, and should be in a deparse.c function.

2) I think the GUC should be replaced with an server/table option,
similar to fetch_size.

The attached patch tries to address both of these points.

Firstly, it adds a new deparseBulkInsertSql function, that builds a
query for the "full" batch, and then uses those two queries - when we
get a full batch we use the bulk query, otherwise we use the single-row
query in a loop. IMO this is cleaner than deparsing queries ad hoc in
the execute_foreign_modify.
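
The gist of the deparse helper is just to build the VALUES list for the
requested number of rows. In simplified form it looks roughly like this
(a sketch only, not the actual patch code; the function and parameter
names here are made up):

--------------------------------------------------
/* Sketch only: build
 *   INSERT INTO tab (c1, c2) VALUES ($1, $2), ($3, $4), ...
 * for batch_size rows with ncols columns each. */
#include "postgres.h"
#include "lib/stringinfo.h"

static void
build_bulk_insert_sql(StringInfo buf, const char *tabname,
                      const char *collist, int ncols, int batch_size)
{
    int     paramno = 1;

    appendStringInfo(buf, "INSERT INTO %s (%s) VALUES ", tabname, collist);

    for (int row = 0; row < batch_size; row++)
    {
        appendStringInfoString(buf, row > 0 ? ", (" : "(");
        for (int col = 0; col < ncols; col++)
            appendStringInfo(buf, "%s$%d", col > 0 ? ", " : "", paramno++);
        appendStringInfoChar(buf, ')');
    }
}
--------------------------------------------------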

Of course, this might be worse when we don't have a full batch, e.g. for
a query that inserts only 50 rows with batch_size=100. If this case is
common, one option would be lowering the batch_size accordingly. If we
really want to improve this case too, I suggest we pass more info than
just a position of the VALUES clause - that seems a bit too hackish.


Secondly, it adds the batch_size option to server/foreign table, and
uses that. This is not complete, though. postgresPlanForeignModify
currently passes a hard-coded value; it needs to look up the correct
value for the server/table from RelOptInfo or something. And I suppose
the ModifyTable infrastructure will need to determine the value in
order to pass the correct number of slots to the FDW API.
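
The option lookup itself is the easy part; it could presumably mirror
what we already do for fetch_size, i.e. prefer the table option and fall
back to the server option (again just a sketch, not what the patch does
verbatim):

--------------------------------------------------
/* Sketch only: resolve batch_size for a foreign table, preferring the
 * table-level option over the server-level one, defaulting to 1
 * (no batching), much like the existing fetch_size handling. */
#include "postgres.h"
#include "commands/defrem.h"
#include "foreign/foreign.h"

static int
get_batch_size_option(Oid foreigntableid)
{
    ForeignTable   *table = GetForeignTable(foreigntableid);
    ForeignServer  *server = GetForeignServer(table->serverid);
    List           *options;
    ListCell       *lc;

    /* Table options first, so they take precedence over server options. */
    options = list_concat(list_copy(table->options), server->options);

    foreach(lc, options)
    {
        DefElem    *def = (DefElem *) lfirst(lc);

        if (strcmp(def->defname, "batch_size") == 0)
            return atoi(defGetString(def));
    }

    return 1;                    /* no batching by default */
}
--------------------------------------------------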

There are a couple of other smaller changes. E.g. it undoes changes to
finish_foreign_modify, and instead calls separate functions to prepare
the bulk statement. It also adds list_make5/list_make6 macros, so as to
not have to do strange stuff with the parameter lists.


And finally, this should probably add a bunch of regression tests.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

v3-0001-Add-bulk-insert-for-foreign-tables.patch (34K)
v3-0002-make-it-compile.patch (873 bytes)
v3-0003-reworks.patch (33K)

RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
From: Tomas Vondra <[hidden email]>
> Unfortunately, this does not compile for me, because nodeModifyTable calls
> ExecGetTouchedPartitions, which is not defined anywhere. Not sure what's
> that about, so I simply commented-out this. That probably fails the partitioned
> cases, but it allowed me to do some review and testing.

Ouch, sorry.  I'm ashamed to have forgotten including execPartition.c.


> The are a couple other smaller changes. E.g. it undoes changes to
> finish_foreign_modify, and instead calls separate functions to prepare the bulk
> statement. It also adds list_make5/list_make6 macros, so as to not have to do
> strange stuff with the parameter lists.

Thanks, I'll gladly take them!  I wonder why I didn't think of separating deallocate_query() from finish_foreign_modify() ... perhaps my brain was dying.  As for list_make5/6(), I saw your first patch avoided adding them, so I thought you found them ugly (and I felt so, too.)  But thinking about it now, there's no reason to hesitate.


> A finally, this should probably add a bunch of regression tests.

Sure.


> 1) As I mentioned before, I really don't think we should be doing deparsing in
> execute_foreign_modify - that's something that should happen earlier, and
> should be in a deparse.c function.
...
> The attached patch tries to address both of these points.
>
> Firstly, it adds a new deparseBulkInsertSql function, that builds a query for the
> "full" batch, and then uses those two queries - when we get a full batch we use
> the bulk query, otherwise we use the single-row query in a loop. IMO this is
> cleaner than deparsing queries ad hoc in the execute_foreign_modify.
...
> Of course, this might be worse when we don't have a full batch, e.g. for a query
> that insert only 50 rows with batch_size=100. If this case is common, one
> option would be lowering the batch_size accordingly. If we really want to
> improve this case too, I suggest we pass more info than just a position of the
> VALUES clause - that seems a bit too hackish.
...
> Secondly, it adds the batch_size option to server/foreign table, and uses that.
> This is not complete, though. postgresPlanForeignModify currently passes a
> hard-coded value at the moment, it needs to lookup the correct value for the
> server/table from RelOptInfo or something. And I suppose ModifyTable
> inftractructure will need to determine the value in order to pass the correct
> number of slots to the FDW API.

I can sort of understand your feeling, but I'd like to reconstruct the query and prepare it in execute_foreign_modify() because:

* Some of our customers use bulk insert in ECPG (INSERT ... VALUES (record1), (record2), ...) to insert a variable number of records per query.  (Oracle's Pro*C has such a feature.)  So, I want to be prepared to enable such a thing with FDW.

* The number of records to insert is not known during planning (in general), so it feels natural, or at least not unnatural, to do the preparation during the execution phase.

* I wanted to avoid the overhead of building the full query string for a 100-record insert statement during query planning, because it may be a bit costly for the usual 1-record inserts.  (The overhead may be hidden behind the high communication cost of postgres_fdw, though.)

So, in terms of code cleanliness, how about moving my code for rebuilding the query string from execute_foreign_modify() to some new function in deparse.c?


> 2) I think the GUC should be replaced with an server/table option, similar to
> fetch_size.

Hmm, batch_size differs from fetch_size.  fetch_size is a postgres_fdw-specific feature with no relevant FDW routine, while batch_size is a configuration parameter for all FDWs that implement ExecForeignBulkInsert().  The ideas I can think of are:

1. Follow JDBC/ODBC and add standard FDW properties.  For example, the JDBC standard defines standard connection pool properties such as maxPoolSize and minPoolSize.  JDBC drivers have to provide them under those defined names.  Likewise, the FDW interface requires FDW implementors to handle the foreign server option name "max_bulk_insert_tuples" if they want to provide the bulk insert feature and implement ExecForeignBulkInsert().  The core executor gets that setting from the FDW by calling a new FDW routine like GetMaxBulkInsertTuples().  Sigh...

2. Add a new max_bulk_insert_tuples reloption to CREATE/ALTER FOREIGN TABLE.  The executor gets the value from the Relation and uses it.  (But is this a table-specific configuration?  I don't think so, sigh...)

3. Adopt the current USERSET GUC max_bulk_insert_tuples.  I think this is enough because the user can change the setting per session, application, and database.



Regards
Takayuki Tsunakawa


Re: POC: postgres_fdw insert batching

Tomas Vondra-6
On 11/19/20 3:43 AM, [hidden email] wrote:

> From: Tomas Vondra <[hidden email]>
>> Unfortunately, this does not compile for me, because
>> nodeModifyTable calls ExecGetTouchedPartitions, which is not
>> defined anywhere. Not sure what's that about, so I simply
>> commented-out this. That probably fails the partitioned cases, but
>> it allowed me to do some review and testing.
>
> Ouch, sorry.  I'm ashamed to have forgotten including
> execPartition.c.
>

No reason to feel ashamed. Mistakes do happen from time to time.

>
>> The are a couple other smaller changes. E.g. it undoes changes to
>> finish_foreign_modify, and instead calls separate functions to
>> prepare the bulk statement. It also adds list_make5/list_make6
>> macros, so as to not have to do strange stuff with the parameter
>> lists.
>
> Thanks, I'll take them thankfully!  I wonder why I didn't think of
> separating deallocate_query() from finish_foreign_modify() ...
> perhaps my brain was dying.  As for list_make5/6(), I saw your first
> patch avoid adding them, so I thought you found them ugly (and I felt
> so, too.)  But thinking now, there's no reason to hesitate it.
>

I think it's often easier to look at changes like deallocate_query with a
bit of distance, not while hacking on the patch and just trying to make
it work somehow.

For the list_make# stuff, I think I've decided to do the simplest thing
possible in the extension, without having to recompile the server. But I
think for a proper patch it's better to keep it more readable.

> ...
>
>> 1) As I mentioned before, I really don't think we should be doing
>> deparsing in execute_foreign_modify - that's something that should
>> happen earlier, and should be in a deparse.c function.
> ...
>> The attached patch tries to address both of these points.
>>
>> Firstly, it adds a new deparseBulkInsertSql function, that builds a
>> query for the "full" batch, and then uses those two queries - when
>> we get a full batch we use the bulk query, otherwise we use the
>> single-row query in a loop. IMO this is cleaner than deparsing
>> queries ad hoc in the execute_foreign_modify.
> ...
>> Of course, this might be worse when we don't have a full batch,
>> e.g. for a query that insert only 50 rows with batch_size=100. If
>> this case is common, one option would be lowering the batch_size
>> accordingly. If we really want to improve this case too, I suggest
>> we pass more info than just a position of the VALUES clause - that
>> seems a bit too hackish.
> ...
>> Secondly, it adds the batch_size option to server/foreign table,
>> and uses that. This is not complete, though.
>> postgresPlanForeignModify currently passes a hard-coded value at
>> the moment, it needs to lookup the correct value for the
>> server/table from RelOptInfo or something. And I suppose
>> ModifyTable inftractructure will need to determine the value in
>> order to pass the correct number of slots to the FDW API.
>
> I can sort of understand your feeling, but I'd like to reconstruct
> the query and prepare it in execute_foreign_modify() because:
>
> * Some of our customers use bulk insert in ECPG (INSERT ...
> VALUES(record1, (record2), ...) to insert variable number of records
> per query.  (Oracle's Pro*C has such a feature.)  So, I want to be
> prepared to enable such a thing with FDW.
>
> * The number of records to insert is not known during planning (in
> general), so it feels natural to get prepared during execution phase,
> or not unnatural at least.
>

I think we should differentiate between "deparsing" and "preparing".

> * I wanted to avoid the overhead of building the full query string
> for 100-record insert statement during query planning, because it may
> be a bit costly for usual 1-record inserts.  (The overhead may be
> hidden behind the high communication cost of postgres_fdw, though.)
>

Hmm, ok. I haven't measured how expensive that would be, but my assumption
was it's much cheaper than the latency we save. But maybe I'm wrong.

> So, in terms of code cleanness, how about moving my code for
> rebuilding query string from execute_foreign_modify() to some new
> function in deparse.c?
>

That might work, yeah. I suggest we do this:

1) try to use the same approach for both single-row inserts and larger
batches, to not have a lot of different branches

2) modify deparseInsertSql to produce not the "final" query but some
intermediate representation useful to generate queries inserting
an arbitrary number of rows

3) in execute_foreign_modify remember the last number of rows, and only
rebuild/replan the query when it changes
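
For point 3, the execution-time logic might end up looking roughly like
this (a hand-wavy sketch; the PgFdwModifyState field and the
rebuild_insert_sql helper are assumed names, not taken from the patch):

--------------------------------------------------
/* Sketch of point 3, called from execute_foreign_modify() with the
 * number of slots in the current batch.  Only re-deparse and re-prepare
 * when the batch size actually changes. */
static void
maybe_reprepare_for_batch(PgFdwModifyState *fmstate, int numSlots)
{
    if (numSlots == fmstate->last_num_slots)
        return;                 /* prepared statement already matches */

    /* Throw away the statement prepared for the previous batch size. */
    if (fmstate->p_name)
        deallocate_query(fmstate);

    /* Rebuild the INSERT text for numSlots rows and prepare it again. */
    rebuild_insert_sql(fmstate, numSlots);      /* deparse.c helper */
    prepare_foreign_modify(fmstate);

    fmstate->last_num_slots = numSlots;
}
--------------------------------------------------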

>
>> 2) I think the GUC should be replaced with an server/table option,
>> similar to fetch_size.
>
> Hmm, batch_size differs from fetch_size.  fetch_size is a
> postgres_fdw-specific feature with no relevant FDW routine, while
> batch_size is a configuration parameter for all FDWs that implement
> ExecForeignBulkInsert().  The ideas I can think of are:
>
> 1. Follow JDBC/ODBC and add standard FDW properties.  For example,
> the JDBC standard defines standard connection pool properties such as
> maxPoolSize and minPoolSize.  JDBC drivers have to provide them with
> those defined names.  Likewise, the FDW interface requires FDW
> implementors to handle the foreign server option name
> "max_bulk_insert_tuples" if he/she wants to provide bulk insert
> feature and implement ExecForeignBulkInsert().  The core executor
> gets that setting from the FDW by calling a new FDW routine like
> GetMaxBulkInsertTuples().  Sigh...
>
> 2. Add a new max_bulk_insert_tuples reloption to CREATE/ALTER FOREIGN
> TABLE.  executor gets the value from Relation and uses it.  (But is
> this a table-specific configuration?  I don't think so, sigh...)
>

I do agree there's a difference between fetch_size and batch_size. For
fetch_size, it's internal to postgres_fdw - no external code needs to
know about it. For batch_size that's not the case, the ModifyTable core
code needs to be aware of that.

That means the "batch_size" is becoming part of the API, and IMO the way
to do that is by exposing it as an explicit API method. So +1 to add
something like GetMaxBulkInsertTuples.

It still needs to be configurable at the server/table level, though. The
new API method should only inform ModifyTable about the final max batch
size the FDW decided to use.
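
In FdwRoutine terms I'd imagine something like the following pair of
callbacks, though the exact names and signatures are obviously still up
for discussion (this is only a sketch):

--------------------------------------------------
/* Sketch only -- names and signatures are still under discussion. */

/* How many rows the FDW wants per batch for this target relation
 * (presumably derived from its server/table options); 1 = no batching. */
typedef int (*GetModifyBatchSize_function) (ResultRelInfo *rinfo);

/* Insert up to *numSlots rows in one remote round trip, setting
 * *numSlots to the number actually processed. */
typedef TupleTableSlot **(*ExecForeignBulkInsert_function) (EState *estate,
                                                    ResultRelInfo *rinfo,
                                                    TupleTableSlot **slots,
                                                    TupleTableSlot **planSlots,
                                                    int *numSlots);
--------------------------------------------------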

> 3. Adopt the current USERSET GUC max_bulk_insert_tuples.  I think
> this is enough because the user can change the setting per session,
> application, and database.
>

I don't think this is usable in practice, because a single session may
be using multiple FDW servers, with different implementations, latency
to the data nodes, etc. It's unlikely a single GUC value will be
suitable for all of them.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
From: Tomas Vondra <[hidden email]>
> I don't think this is usable in practice, because a single session may
> be using multiple FDW servers, with different implementations, latency
> to the data nodes, etc. It's unlikely a single GUC value will be
> suitable for all of them.

That makes sense.  The row size varies from table to table, so the user may want to tune this option to reduce memory consumption.

I think the attached patch has reflected all your comments.  I hope this will pass..


Regards
Takayuki Tsunakawa




v4-0001-Add-bulk-insert-for-foreign-tables.patch (66K)

Re: POC: postgres_fdw insert batching

Tomas Vondra-6

On 11/23/20 3:17 AM, [hidden email] wrote:

> From: Tomas Vondra <[hidden email]>
>> I don't think this is usable in practice, because a single session
>> may be using multiple FDW servers, with different implementations,
>> latency to the data nodes, etc. It's unlikely a single GUC value
>> will be suitable for all of them.
>
> That makes sense.  The row size varies from table to table, so the
> user may want to tune this option to reduce memory consumption.
>
> I think the attached patch has reflected all your comments.  I hope
> this will pass..
>

Thanks - I didn't have time for a thorough review, so I only skimmed
through the diff and did a couple of very simple tests. And I think
overall it looks quite nice.

A couple minor comments/questions:

1) We're calling it "batch_size" but the API function is named
postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
to postgresGetModifyBatchSize()? That has the advantage it'd work if we
ever add support for batching to UPDATE/DELETE.

2) Do we have to look up the batch_size in create_foreign_modify (in
server/table options)? I'd have expected to look it up while planning
the modify and then pass it through the list, just like the other
FdwModifyPrivateIndex stuff. But maybe that's not possible.

3) That reminds me - should we show the batching info on EXPLAIN? That
seems like a fairly interesting thing to show to the user. Perhaps
showing the average batch size would also be useful? Or maybe not, we
create the batches as large as possible, with the last one smaller.

4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
over for every tuple. I don't know if that has a measurable impact, but it
seems a bit excessive IMO. I don't think we should support the batch
size changing during execution (seems tricky).


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
From: Tomas Vondra <[hidden email]>
> 1) We're calling it "batch_size" but the API function is named
> postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
> to postgresGetModifyBatchSize()? That has the advantage it'd work if we
> ever add support for batching to UPDATE/DELETE.

Actually, I was in two minds about whether the term batch or bulk is better, because Oracle uses "bulk insert" and "bulk fetch", as in FETCH cur BULK COLLECT INTO array and FORALL in array INSERT INTO, while JDBC uses batch, as in "batch updates" and its API method names (addBatch, executeBatch).

But it seems better, or at least more common, to use "batch", according to the etymology and the following Stack Overflow page:

https://english.stackexchange.com/questions/141884/which-is-a-better-and-commonly-used-word-bulk-or-batch

OTOH, as for the name GetModifyBatchSize() you suggest, I think GetInsertBatchSize may be better.  That is, this API deals with multiple records in a single INSERT statement.  Your GetModifyBatchSize would be reserved for statement batching once libpq supports batch/pipelining to execute multiple INSERT/UPDATE/DELETE statements, as in the following JDBC batch updates.  What do you think?

CODE EXAMPLE 14-1 Creating and executing a batch of insert statements
--------------------------------------------------
Statement stmt = con.createStatement();
stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')");
stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')");
stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)");

// submit a batch of update commands for execution
int[] updateCounts = stmt.executeBatch();
--------------------------------------------------


> 2) Do we have to lookup the batch_size in create_foreign_modify (in
> server/table options)? I'd have expected to look it up while planning
> the modify and then pass it through the list, just like the other
> FdwModifyPrivateIndex stuff. But maybe that's not possible.

Don't worry, create_foreign_modify() is called from PlanForeignModify() during planning.  Unfortunately, it's also called from BeginForeignInsert(), but the other stuff passed to create_foreign_modify(), including the query string, is constructed there.


> 3) That reminds me - should we show the batching info on EXPLAIN? That
> seems like a fairly interesting thing to show to the user. Perhaps
> showing the average batch size would also be useful? Or maybe not, we
> create the batches as large as possible, with the last one smaller.

Hmm, maybe batch_size is not for EXPLAIN because its value doesn't change dynamically based on the planning or system state unlike shared buffers and parallel workers.  OTOH, I sometimes want to see what configuration parameter values the user set, such as work_mem, enable_*, and shared_buffers, together with the query plan (EXPLAIN and auto_explain).  For example, it'd be nice if EXPLAIN (parameters on) could do that.  Some relevant FDW-related parameters could be included in that output.

> 4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
> over for every tuple. I don't know it that has measurable impact, but it
> seems a bit excessive IMO. I don't think we should support the batch
> size changing during execution (seems tricky).

Don't worry about this either.  GetMaxBulkInsertTuples() just returns a value that was already saved in a struct in create_foreign_modify().


Regards
Takayuki Tsunakawa


Re: POC: postgres_fdw insert batching

Tomas Vondra-6


On 11/24/20 9:45 AM, [hidden email] wrote:

> From: Tomas Vondra <[hidden email]>
>> 1) We're calling it "batch_size" but the API function is named
>> postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
>> to postgresGetModifyBatchSize()? That has the advantage it'd work if we
>> ever add support for batching to UPDATE/DELETE.
>
> Actually, I was in two minds whether the term batch or bulk is better.  Because Oracle uses "bulk insert" and "bulk fetch", like in FETCH cur BULK COLLECT INTO array and FORALL in array INSERT INTO, while JDBC uses batch as in "batch updates" and its API method names (addBatch, executeBatch).
>
> But it seems better or common to use batch according to the etymology and the following Stack Overflow page:
>
> https://english.stackexchange.com/questions/141884/which-is-a-better-and-commonly-used-word-bulk-or-batch
>
> OTOH, as for the name GetModifyBatchSize() you suggest, I think GetInsertBatchSize may be better.  That is, this API deals with multiple records in a single INSERT statement.  Your GetModifyBatchSize will be reserved for statement batching when libpq has supported batch/pipelining to execute multiple INSERT/UPDATE/DELETE statements, as in the following JDBC batch updates.  What do you think?
>

I don't know. I was really only thinking about batching in the context
of a single DML command, not about batching of multiple commands at the
protocol level. IMHO it's far more likely we'll add support for batching
for DELETE/UPDATE than libpq pipelining, which seems rather different
from how the FDW API works. Which is why I was suggesting to use a name
that would work for all DML commands, not just for inserts.

> CODE EXAMPLE 14-1 Creating and executing a batch of insert statements
> --------------------------------------------------
> Statement stmt = con.createStatement();
> stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')");
> stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')");
> stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)");
>
> // submit a batch of update commands for execution
> int[] updateCounts = stmt.executeBatch();
> --------------------------------------------------
>

Sure. We already have a patch to support something like this at the
libpq level, IIRC. But I'm not sure how well that matches the FDW API
approach in general.

>
>> 2) Do we have to lookup the batch_size in create_foreign_modify (in
>> server/table options)? I'd have expected to look it up while planning
>> the modify and then pass it through the list, just like the other
>> FdwModifyPrivateIndex stuff. But maybe that's not possible.
>
> Don't worry, create_foreign_modify() is called from PlanForeignModify() during planning.  Unfortunately, it's also called from BeginForeignInsert(), but other stuff passed to create_foreign_modify() including the query string is constructed there.
>

Hmm, ok.

>
>> 3) That reminds me - should we show the batching info on EXPLAIN? That
>> seems like a fairly interesting thing to show to the user. Perhaps
>> showing the average batch size would also be useful? Or maybe not, we
>> create the batches as large as possible, with the last one smaller.
>
> Hmm, maybe batch_size is not for EXPLAIN because its value doesn't change dynamically based on the planning or system state unlike shared buffers and parallel workers.  OTOH, I sometimes want to see what configuration parameter values the user set, such as work_mem, enable_*, and shared_buffers, together with the query plan (EXPLAIN and auto_explain).  For example, it'd be nice if EXPLAIN (parameters on) could do that.  Some relevant FDW-related parameters could be included in that output.
>

Not sure, but I'd guess knowing whether batching is used would be
useful. We only print the single-row SQL query, which kinda gives the
impression that there's no batching.

>> 4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
>> over for every tuple. I don't know it that has measurable impact, but it
>> seems a bit excessive IMO. I don't think we should support the batch
>> size changing during execution (seems tricky).
>
> Don't worry about this, too.  GetMaxBulkInsertTuples() just returns a value that was already saved in a struct in create_foreign_modify().
>

Well, I do worry for two reasons.

Firstly, the fact that in postgres_fdw the call is cheap does not mean
it'll be like that in every other FDW. Presumably, the other FDWs might
cache it in the struct and do the same thing, of course.

But the fact that we're calling it over and over for each row kinda
seems like we allow the value to change during execution, but I very
much doubt the code is expecting that. I haven't tried, but assume the
function first returns 10 and then 100. ISTM the code will allocate
ri_Slots with 25 slots, but then we'll try stashing 100 tuples there.
That can't end well. Sure, we can claim it's a bug in the FDW extension,
but it's also due to the API design.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: POC: postgres_fdw insert batching

Craig Ringer-5
In reply to this post by tsunakawa.takay@fujitsu.com
On Thu, Oct 8, 2020 at 10:40 AM [hidden email] <[hidden email]> wrote:

> Thank you for picking up this.  I'm interested in this topic, too.  (As an aside, we'd like to submit a bulk insert patch for ECPG in the near future.)
>
> As others referred, Andrey-san's fast COPY to foreign partitions is also promising.  But I think your bulk INSERT is a separate feature and offers what COPY cannot do -- data transformation during loading with INSERT SELECT and CREATE TABLE AS SELECT.
>
> Is there anything that makes you worry and stops development?  Could I give it a try to implement this?  (I'm not sure I can, sorry.  I'm worried whether we can change the executor's call chain easily.)


I suggest that when developing this, you keep in mind the ongoing work on the libpq pipelining/batching enhancements, and also the way many interfaces to foreign data sources support asynchronous, concurrent operations.

Best results with postgres_fdw insert batching would be achieved if it can also send its batches as asynchronous queries and only block when it's required to report on the results of the work. This will also be true of any other FDW where the backing remote interface can support asynchronous concurrent or pipelined operation.

I'd argue it's pretty much vital for decent performance when talking to a cloud database from an on-prem server for example, or any other time that round-trip-time reduction is important.

The most important characteristic of an FDW API to permit this would be decoupling of request and response into separate non-blocking calls that don't have to occur in ordered pairs. Instead of "insert_foo(foo) -> insert_result", have "queue_insert_foo(foo) -> future_result", "get_result_if_available(future_result) -> maybe result" and "get_result_blocking(future_result) -> result". Permit multiple queue_insert_foo(...)s without a/b interleaving with result fetches being required.
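
To be explicit about the shape I mean, something like this (every name
here is invented; it's purely an illustration of the calling pattern,
not a proposal for concrete signatures):

--------------------------------------------------
/* Illustration only; every name here is invented. */
typedef struct FdwInsertFuture FdwInsertFuture;     /* opaque request handle */

/* Queue an insert and return immediately, without waiting for the
 * remote server to reply. */
extern FdwInsertFuture *queue_insert_foo(FooFdwState *state,
                                         TupleTableSlot *slot);

/* Non-blocking poll: returns true and fills *result if the reply for
 * this particular request has already arrived. */
extern bool get_result_if_available(FdwInsertFuture *future,
                                    FdwInsertResult *result);

/* Blocking wait, used only when the executor really needs the outcome
 * (RETURNING, row counts, end of the node, ...). */
extern void get_result_blocking(FdwInsertFuture *future,
                                FdwInsertResult *result);
--------------------------------------------------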

Ideally it'd be able to accumulate small batches of inserts locally and send a batch to the remote end once it's accumulated enough. But instead of blocking waiting for the result, return control to the executor after sending, without forcing a socket flush (which might block) and without waiting to learn what the outcome was. Allow new batches to be accumulated and sent before the results of the first batch are received, so long as it's within the same executor node so we don't make any unfortunate mistakes with mixing things up in compound statements or functions etc. Only report outcomes like rowcounts lazily when results are received, or when required to do so.

If now we have

REQUEST -> [block] -> RESULT
~~ round trip delay ~~
REQUEST -> [block] -> RESULT
~~ round trip delay ~~
REQUEST -> [block] -> RESULT
~~ round trip delay ~~
REQUEST -> [block] -> RESULT

and batching would give us

{ REQUEST, REQUEST} -> [block] -> { RESULT, RESULT }
~~ round trip delay ~~
{ REQUEST, REQUEST} -> [block] -> { RESULT, RESULT }

consider if room can be left in the batching API to permit:

{ REQUEST, REQUEST} -> [nonblocking send...]
{ REQUEST, REQUEST} -> [nonblocking send...]
~~ round trip delay ~~
[....] -> RESULT, RESULT
[....] -> RESULT, RESULT


... where we only actually block at the point where the result is required as input into the next node.

Honestly I don't know the executor structure well enough to say if this is even remotely feasible right now. Maybe Andres may be able to comment. But please keep it in mind if you're thinking of making FDW API changes.

RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
In reply to this post by Tomas Vondra-6
From: Tomas Vondra <[hidden email]>

> On 11/24/20 9:45 AM, [hidden email] wrote:
> > OTOH, as for the name GetModifyBatchSize() you suggest, I think
> GetInsertBatchSize may be better.  That is, this API deals with multiple
> records in a single INSERT statement.  Your GetModifyBatchSize will be
> reserved for statement batching when libpq has supported batch/pipelining to
> execute multiple INSERT/UPDATE/DELETE statements, as in the following
> JDBC batch updates.  What do you think?
> >
>
> I don't know. I was really only thinking about batching in the context
> of a single DML command, not about batching of multiple commands at the
> protocol level. IMHO it's far more likely we'll add support for batching
> for DELETE/UPDATE than libpq pipelining, which seems rather different
> from how the FDW API works. Which is why I was suggesting to use a name
> that would work for all DML commands, not just for inserts.
Right; right now I can't picture how the interaction among the client, the server core and FDWs would work for statement batching.  So I'll take your suggested name.


> Not sure, but I'd guess knowing whether batching is used would be
> useful. We only print the single-row SQL query, which kinda gives the
> impression that there's no batching.

Added it in postgres_fdw, shown like "Remote SQL" when EXPLAIN VERBOSE is run.


> > Don't worry about this, too.  GetMaxBulkInsertTuples() just returns a value
> that was already saved in a struct in create_foreign_modify().
> >
>
> Well, I do worry for two reasons.
>
> Firstly, the fact that in postgres_fdw the call is cheap does not mean
> it'll be like that in every other FDW. Presumably, the other FDWs might
> cache it in the struct and do the same thing, of course.
>
> But the fact that we're calling it over and over for each row kinda
> seems like we allow the value to change during execution, but I very
> much doubt the code is expecting that. I haven't tried, but assume the
> function first returns 10 and then 100. ISTM the code will allocate
> ri_Slots with 25 slots, but then we'll try stashing 100 tuples there.
> That can't end well. Sure, we can claim it's a bug in the FDW extension,
> but it's also due to the API design.
You're worried about FDWs other than postgres_fdw.  That's reasonable.  I argued in other threads that PG developers care only about postgres_fdw, not other FDWs, when designing the FDW interface, but I made the same mistake myself.  I made changes so that the executor calls GetModifyBatchSize() once per relation per statement.
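
For other FDW authors reading along, the point is simply that the
executor fetches the value once when the target relation is set up and
caches it; conceptually it is something like the following (the cached
field name is illustrative, not necessarily what the patch uses):

--------------------------------------------------
/* Sketch: done once per target relation when it is initialized for the
 * modify, not once per tuple. */
if (resultRelInfo->ri_FdwRoutine != NULL &&
    resultRelInfo->ri_FdwRoutine->GetModifyBatchSize != NULL)
    resultRelInfo->ri_BatchSize =
        resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
else
    resultRelInfo->ri_BatchSize = 1;    /* no batching */
--------------------------------------------------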


Regards
Takayuki Tsunakawa


v5-0001-Add-bulk-insert-for-foreign-tables.patch (66K)

RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
In reply to this post by Craig Ringer-5
From: Craig Ringer <[hidden email]>
> I suggest that when developing this, you keep in mind the ongoing work on the libpq pipelining/batching enhancements, and also the way many interfaces to foreign data sources support asynchronous, concurrent operations.

Yes, thank you, I'll bear it in mind.  I understand it's a feature for batching multiple kinds of SQL statements, like JDBC's batch updates.


> I'd argue it's pretty much vital for decent performance when talking to a cloud database from an on-prem server for example, or any other time that round-trip-time reduction is important.

Yeah, I'm thinking of data migration and integration as the prominent use cases.


Regards
Takayuki Tsunakawa


Re: POC: postgres_fdw insert batching

Tomas Vondra-6
On 11/25/20 7:31 AM, [hidden email] wrote:
> From: Craig Ringer <[hidden email]>
>> I suggest that when developing this, you keep in mind the ongoing
>> work on the libpq pipelining/batching enhancements, and also the
>> way many interfaces to foreign data sources support asynchronous,
>> concurrent operations.
>
> Yes, thank you, I bear it in mind.  I understand it's a feature for
> batching multiple kinds of SQL statements like DBC's batch updates.
>

I haven't followed the libpq pipelining thread very closely. It does
seem related, but I'm not sure if it's a good match for this patch, or
how far it is from being committable ...

>
>> I'd argue it's pretty much vital for decent performance when
>> talking to a cloud database from an on-prem server for example, or
>> any other time that round-trip-time reduction is important.
>
> Yeah, I'm thinking of the data migration and integration as the
> prominent use case.
>

Well, good that we all agree this is a useful feature to have (in
general). The question is whether postgres_fdw should be doing batching
on its own (per this thread) or rely on some other feature (libpq
pipelining). I haven't followed the other thread, so I don't have an
opinion on that.

Note however we're doing two things here, actually - we're implementing
custom batching for postgres_fdw, but we're also extending the FDW API
to allow other implementations to do the same thing. And most of them won't
be able to rely on the connection library providing that, I believe.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
From: Tomas Vondra <[hidden email]>
> Well, good that we all agree this is a useful feature to have (in
> general). The question is whether postgres_fdw should be doing batching
> on it's onw (per this thread) or rely on some other feature (libpq
> pipelining). I haven't followed the other thread, so I don't have an
> opinion on that.

Well, as someone said in this thread, I think bulk insert is much more common than updates/deletes.  Thus, major DBMSs have INSERT ... VALUES (record1), (record2), ... and INSERT SELECT.  Oracle has direct-path INSERT in addition.  As for the comparison of INSERT with multiple records and libpq batching (= multiple INSERTs), I think the former is more efficient because the amount of data transfer is smaller and the parsing and planning of an INSERT for each record is eliminated.

I never deny the usefulness of libpq batch/pipelining, but I'm not sure if app developers would really use it.  If they want to reduce the client-server round-trips, won't they use traditional stored procedures?  Yes, the stored procedure language is very DBMS-specific.  Then, I'd like to know what kind of well-known applications are using standard batching API like JDBC's batch updates.  (Sorry, I think that should be discussed in libpq batch/pipelining thread and this thread should not be polluted.)


> Note however we're doing two things here, actually - we're implementing
> custom batching for postgres_fdw, but we're also extending the FDW API
> to allow other implementations do the same thing. And most of them won't
> be able to rely on the connection library providing that, I believe.

I'm afraid so, too.  Then, postgres_fdw would be an example that other FDW developers would look at when they use INSERT with multiple records.


Regards
Takayuki Tsunakawa





Re: POC: postgres_fdw insert batching

Tomas Vondra-6


On 11/26/20 2:48 AM, [hidden email] wrote:

> From: Tomas Vondra <[hidden email]>
>> Well, good that we all agree this is a useful feature to have (in
>> general). The question is whether postgres_fdw should be doing
>> batching on it's onw (per this thread) or rely on some other
>> feature (libpq pipelining). I haven't followed the other thread,
>> so I don't have an opinion on that.
>
> Well, as someone said in this thread, I think bulk insert is much
> more common than updates/deletes.  Thus, major DBMSs have INSERT
> VALUES(record1), (record2)... and INSERT SELECT.  Oracle has direct
> path INSERT in addition.  As for the comparison of INSERT with
> multiple records and libpq batching (= multiple INSERTs), I think
> the former is more efficient because the amount of data transfer is
> less and the parsing-planning of INSERT for each record is
> eliminated.
>
> I never deny the usefulness of libpq batch/pipelining, but I'm not
> sure if app developers would really use it.  If they want to reduce
> the client-server round-trips, won't they use traditional stored
> procedures?  Yes, the stored procedure language is very
> DBMS-specific.  Then, I'd like to know what kind of well-known
> applications are using standard batching API like JDBC's batch
> updates.  (Sorry, I think that should be discussed in libpq
> batch/pipelining thread and this thread should not be polluted.)
>

Not sure how this is related to app developers? I think the idea was
that the libpq features might be useful between the two PostgreSQL
instances. I.e. the postgres_fdw would use the libpq batching to send
chunks of data to the other side.

>
>> Note however we're doing two things here, actually - we're
>> implementing custom batching for postgres_fdw, but we're also
>> extending the FDW API to allow other implementations do the same
>> thing. And most of them won't be able to rely on the connection
>> library providing that, I believe.
>
> I'm afraid so, too.  Then, postgres_fdw would be an example that
> other FDW developers would look at when they use INSERT with
> multiple records.
>

Well, my point was that we could keep the API, but maybe it should be
implemented using the proposed libpq batching. They could still use the
postgres_fdw example of how to use the API, but the internals would need to
be different, of course.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: POC: postgres_fdw insert batching

Craig Ringer-5
On Fri, Nov 27, 2020 at 3:34 AM Tomas Vondra <[hidden email]> wrote:
 
> Not sure how is this related to app developers? I think the idea was
> that the libpq features might be useful between the two PostgreSQL
> instances. I.e. the postgres_fdw would use the libpq batching to send
> chunks of data to the other side.

Right. Or at least, when designing the FDW API, do so in a way that doesn't strictly enforce Request/Response alternation without interleaving, so you can benefit from it in the future.

It's hardly just libpq after all. A *lot* of client libraries and drivers will be capable of non-blocking reads or writes with multiple ones in flight at once. Any REST-like API generally can, for example. So for performance reasons we should if possible avoid baking the assumption that a request cannot be made until the response from the previous request is received, and instead have a wait interface to use for when a new request requires the prior response's result before it can proceed.

> Well, my point was that we could keep the API, but maybe it should be
> implemented using the proposed libpq batching. They could still use the
> postgres_fdw example how to use the API, but the internals would need to
> be different, of course.

Sure. Or just allow room for it in the FDW API, though using the pipelining support natively would be nice.

If the FDW interface allows Pg to

Insert A
Insert B
Insert C
Wait for outcome of insert A
...

then that'll be useful for using libpq pipelining, but also FDWs for all sorts of other DBs, especially cloud-y ones where latency is a big concern.



RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
In reply to this post by Tomas Vondra-6
From: Tomas Vondra <[hidden email]>
> Not sure how is this related to app developers? I think the idea was
> that the libpq features might be useful between the two PostgreSQL
> instances. I.e. the postgres_fdw would use the libpq batching to send
> chunks of data to the other side.

> Well, my point was that we could keep the API, but maybe it should be
> implemented using the proposed libpq batching. They could still use the
> postgres_fdw example how to use the API, but the internals would need to
> be different, of course.

Yes, I understand them.  I just wondered whether app developers use the statement batching API of libpq or JDBC, and in what kinds of apps.  That is, I was talking about the batching API itself, not about FDW.  (So, as I mentioned, I think I should ask such a question in the libpq batching thread.)

I expect postgresExecForeignBatchInsert() would be able to use the libpq batching API, because it receives an array of tuples and can generate and issue an INSERT statement for each tuple.  But I'm not sure either whether the libpq batching is likely to be committed in the near future.  (The thread looks too long...)  Anyway, this thread's batch insert can progress (and hopefully be committed), and once the libpq batching has been committed, we can try using it in postgres_fdw to see if we can get a further performance boost.


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

Craig Ringer-5
On Fri, Nov 27, 2020 at 10:47 AM [hidden email] <[hidden email]> wrote:

Covering this one first:

> I expect postgresExecForeignBatchInsert() would be able to use the libpq batching API, because it receives an array of tuples and can generate and issue INSERT statement for each tuple.

Sure, you can generate big multi-inserts. Or even do a COPY. But you still have to block for a full round-trip until the foreign server replies. So if you have 6000 calls to postgresExecForeignBatchInsert() during a single query, and a 100ms round trip time to the foreign server, you're going to waste 6000*0.1 = 600s = 10 min blocked in  postgresExecForeignBatchInsert() waiting for results from the foreign server.

Such batches have some major downsides:

* The foreign server cannot start executing the first query in the batch until the last query in the batch has been accumulated and the whole batch has been sent to the foreign server;
* The FDW has to block waiting for the batch to execute on the foreign server and for a full network round-trip before it can start another batch or let the backend do other work
This means RTTs get multiplied by batch counts. Still a lot better than individual statements, but plenty slow for high latency connections.

* Prepare 1000 rows to insert [10ms]
* INSERT 1000 values [100ms RTT + 50ms foreign server execution time]
* Prepare 1000 rows to insert [10ms]
* INSERT 1000 values [100ms RTT + 50ms foreign server execution time]
* ...

If you can instead send new inserts (or sets of inserts) to the foreign server without having to wait for the result of the previous batch to arrive, you can spend 100ms total waiting for results instead of 10 mins. You can start the execution of the first query earlier, spend less time blocked waiting on network, and let the local backend continue doing other work while the foreign server is busy executing the statements.

The time spent preparing local rows to insert now overlaps with the RTT and remote execution time, instead of happening serially. And there only has to be one RTT wait, assuming the foreign server and network can keep up with the rate we are generating requests at.
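
To make that concrete, the client-side flow with pipelining looks
roughly like this (a sketch along the lines of the pipeline-mode API in
the libpq patch, e.g. PQenterPipelineMode() and friends; exact function
names may differ from what eventually gets committed, and error handling
and non-blocking flush management are omitted):

--------------------------------------------------
#include <libpq-fe.h>

static void
send_batches_pipelined(PGconn *conn, const char *const *batch_sql,
                       int nbatches)
{
    PQenterPipelineMode(conn);

    /* Queue every batched INSERT without waiting for any round trip. */
    for (int i = 0; i < nbatches; i++)
        PQsendQueryParams(conn, batch_sql[i],
                          0, NULL, NULL, NULL, NULL, 0);

    PQpipelineSync(conn);               /* one sync point for the whole lot */

    /* Now pay (roughly) a single round trip to collect all the results. */
    for (int i = 0; i < nbatches; i++)
    {
        PGresult *res;

        while ((res = PQgetResult(conn)) != NULL)
            PQclear(res);               /* each query's results end with NULL */
    }

    PQclear(PQgetResult(conn));         /* the PGRES_PIPELINE_SYNC marker */
    PQexitPipelineMode(conn);
}
--------------------------------------------------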

I can throw together some diagrams if it'll help. But in the libpq pipelining patch I demonstrated a 300 times (3000%) performance improvement on a test workload...

> Anyway, this thread's batch insert can be progressed (and hopefully committed), and once the libpq batching has been committed, we can give it a try to use it and modify postgres_fdw to see if we can get further performance boost.

My point is that you should seriously consider whether batching is the appropriate interface here, or whether the FDW can expose a pipeline-like "queue work" then "wait for results" interface. That can be used to implement batching exactly as currently proposed, it does not have to wait for any libpq pipelining features. But it can *also* be used to implement concurrent async requests in other FDWs, and to implement pipelining in postgres_fdw once the needed libpq support is available.

I don't know the FDW to postgres API well enough, and it's possible I'm talking entirely out of my hat here.
 
> From: Tomas Vondra <[hidden email]>
>> Not sure how is this related to app developers? I think the idea was
>> that the libpq features might be useful between the two PostgreSQL
>> instances. I.e. the postgres_fdw would use the libpq batching to send
>> chunks of data to the other side.
>
>> Well, my point was that we could keep the API, but maybe it should be
>> implemented using the proposed libpq batching. They could still use the
>> postgres_fdw example how to use the API, but the internals would need to
>> be different, of course.
>
> Yes, I understand them.  I just wondered if app developers use the statement batching API for libpq or JDBC in what kind of apps.

For JDBC, yes, it's used very heavily and has been for a long time, because PgJDBC doesn't rely on libpq - it implements the protocol directly and isn't bound by libpq's limitations. The application interface for it in JDBC is a batch interface [1][2], not a pipelined interface, so that's what PgJDBC users interact with [3] but batch execution is implemented using protocol pipelining support inside PgJDBC [4]. A while ago I did some work on deadlock prevention to work around issues with PgJDBC's implementation [5] which was needed because the feature was so heavily used. Both were to address customer needs in real world applications. The latter increased application performance over 50x through round-trip elimination.

For libpq, no, batching and pipelining are not yet used by anybody because application authors have to write to the libpq API and there hasn't been any in-core support for batching. We've had async / non-blocking support for a while, but it still enforces strict request/response ordering without interleaving, so application authors cannot make use of the same postgres server and protocol capabilities as PgJDBC. Most other drivers (like psqlODBC and psycopg2) are implemented on top of libpq, so they inherit the same limitations.

I don't expect most application authors to adopt pipelining directly, mainly because hardly anyone writes application code against libpq anyway. But drivers written on top of libpq will be able to adopt it to expose the batching, pipeline, or async/callback/event driven interfaces supported by their client database language interface specifications, or expose their own extension interfaces to give users callback-driven or batched query capabilities. In particular, psqlODBC will be able to implement ODBC batch query [6] efficiently. Right now psqlODBC can't execute batches efficiently via libpq, since it must perform one round-trip per query. It will be able to use the libpq pipelining API to greatly reduce round trips.

 
> But I'm not sure either if the libpq batching is likely to be committed in the near future.  (The thread looks too long...)

I think it's getting there tbh.




Regards
Takayuki Tsunakawa



RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
From: Craig Ringer <[hidden email]>
> But in the libpq pipelining patch I demonstrated a 300 times (3000%) performance improvement on a test workload...

Wow, impressive number.  I've just seen it at the beginning of the libpq pipelining thread (oh, already four years ago!)  Could you share the workload and the network latency (ping time)?  I'm sorry if I'm just overlooking it.

Thank you for your (always) concise explanation.  I'd like to check other DBMSs and your rich references for the FDW interface.  (My first intuition is that many major DBMSs might not have client C APIs that can be used to implement an async pipelining FDW interface.  Also, I'm afraid it would require major surgery on, or a reform of, the executor.  I don't want that to delay the release of a reasonably good (10x) improvement with the synchronous interface.)

(It'd be kind of you to send emails in text format.  I've changed the format of this reply from HTML to text.)


Regards
Takayuki Tsunakawa
 