POC: postgres_fdw insert batching


Re: POC: postgres_fdw insert batching

Tomas Vondra-6


On 11/27/20 7:05 AM, [hidden email] wrote:

> From: Craig Ringer <[hidden email]>
>> But in the libpq pipelining patch I demonstrated a 300 times
>> (3000%) performance improvement on a test workload...
>
> Wow, impressive  number.  I've just seen it in the beginning of the
> libpq pipelining thread (oh, already four years ago..!)  Could you
> share the workload and the network latency (ping time)?  I'm sorry
> I'm just overlooking it.
>
> Thank you for your (always) concise explanation.  I'd like to check
> other DBMSs and your rich references for the FDW interface.  (My
> first intuition is that many major DBMSs might not have client C APIs
> that can be used to implement an async pipelining FDW interface.
> Also, I'm afraid it requires major surgery or reform of executor.  I
> don't want it to delay the release of reasonably good (10x)
> improvement with the synchronous interface.)
>

I do agree that pipelining is nice, and can bring huge improvements.

However, the FDW interface as it's implemented today is not designed to
allow that, I believe (we pretty much just invoke the FDW callbacks as
if it was a local AM). It assumes the calls are synchronous, and
redesigning it to work in an async way is a much larger/more complex
patch than what's being discussed here.

I do think the FDW extension proposed here (adding the bulk-insert
callback) is useful in general, for two reasons: (a) even if most client
libraries support some sort of pipelining, some don't, and (b) I'd bet
it's still more efficient to send one large insert than pipelining many
individual inserts.

That being said, I'm against expanding the scope of this patch to also
require redesign of the whole FDW infrastructure - that would likely
mean no such improvement landing in PG14. If the libpq pipelining patch
seems likely to get committed, we can try using it for the bulk insert
callback (instead of the current multi-value stuff).


> (It'd be kind of you to send emails in text format.  I've changed the
> format of this reply from HTML to text.)
>

Craig's client is sending messages in both text/plain and text/html. You
probably need to tell your client to prefer that over html, somehow.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: POC: postgres_fdw insert batching

Craig Ringer-5


On Sat, 28 Nov 2020, 10:10 Tomas Vondra, <[hidden email]> wrote:

> However, the FDW interface as it's implemented today is not designed to
> allow that, I believe (we pretty much just invoke the FDW callbacks as
> if it was a local AM). It assumes the calls are synchronous, and
> redesigning it to work in an async way is a much larger/more complex
> patch than what's being discussed here.
>
> I do think the FDW extension proposed here (adding the bulk-insert
> callback) is useful in general, for two reasons: (a) even if most client
> libraries support some sort of pipelining, some don't, and (b) I'd bet
> it's still more efficient to send one large insert than pipelining many
> individual inserts.
>
> That being said, I'm against expanding the scope of this patch to also
> require redesign of the whole FDW infrastructure - that would likely
> mean no such improvement landing in PG14. If the libpq pipelining patch
> seems likely to get committed, we can try using it for the bulk insert
> callback (instead of the current multi-value stuff).

I totally agree on all points. It was not my intent to expand the scope of this significantly and I really don't want to hold it back.

I raised the interface consideration in case it was something easy to accommodate. It's not, so that's done, topic over.

Re: POC: postgres_fdw insert batching

Craig Ringer-5
In reply to this post by tsunakawa.takay@fujitsu.com
On Fri, 27 Nov 2020, 14:06 [hidden email],
<[hidden email]> wrote:

> Also, I'm afraid it requires major surgery or reform of executor.  I
> don't want it to delay the release of reasonably good (10x)
> improvement with the synchronous interface.)


Totally sensible. If it isn't feasible without significant executor
change that's all that needs to be said.

I was afraid that'd be the case given the executor's pull flow but
just didn't know enough.

It was not my intention to hold this patch up or greatly expand its
scope. I'll spend some time testing it out and have a closer read soon
to see if I can help progress it.

I know Andres did some initial work on executor parallelism and
pipelining. I should take a look.

> > But in the libpq pipelining patch I demonstrated a 300 times (3000%) performance improvement on a test workload...
>
> Wow, impressive  number.  I've just seen it in the beginning of the libpq pipelining thread (oh, already four years ago..!)

Yikes.

> Could you share the workload and the network latency (ping time)?  I'm sorry I'm just overlooking it.

I thought I gave it at the time, and a demo program. IIRC it was just
doing small multi row inserts or single row inserts. Latency would've
been a couple of hundred ms probably, I think I did something like
running on my laptop (Australia, ADSL) to a server on AWS US or EU.

> Thank you for your (always) concise explanation.

You joke! I am many things but despite my best efforts concise is
rarely one of them.

> I'd like to check other DBMSs and your rich references for the FDW interface.  (My first intuition is that many major DBMSs might not have client C APIs that can be used to implement an async pipelining FDW interface.

Likely correct for C APIs of other traditional DBMSes. I'd be less
sure about newer non SQL ones, especially cloud oriented. For example
DynamoDB supports at least async requests in the Java client [3] and
C++ client [4]; it's not immediately clear if requests can be
pipelined, but the API suggests they can.

Most things with a REST-like API can do a fair bit of concurrency
though. Multiple async nonblocking HTTP connections can be serviced at
once. Or HTTP/1.1 pipelining can be used [1], or even better HTTP/2.0
streams [2]. This is relevant for any REST-like API.

> (It'd be kind of you to send emails in text format.  I've changed the format of this reply from HTML to text.)

I try to remember. Stupid Gmail.  Sorry. On mobile it offers very
little control over format but I'll do my best when I can.

[1] https://en.wikipedia.org/wiki/HTTP_pipelining
[2] https://blog.restcase.com/http2-benefits-for-rest-apis/
[3] https://aws.amazon.com/blogs/developer/asynchronous-requests-with-the-aws-sdk-for-java/
[4] https://sdk.amazonaws.com/cpp/api/LATEST/class_aws_1_1_dynamo_d_b_1_1_dynamo_d_b_client.html#ab631edaccca5f3f8988af15e7e9aa4f0



Re: POC: postgres_fdw insert batching

David Fetter
In reply to this post by tsunakawa.takay@fujitsu.com
On Wed, Nov 25, 2020 at 05:04:36AM +0000, [hidden email] wrote:

> From: Tomas Vondra <[hidden email]>
> > On 11/24/20 9:45 AM, [hidden email] wrote:
> > > OTOH, as for the name GetModifyBatchSize() you suggest, I think
> > GetInsertBatchSize may be better.  That is, this API deals with multiple
> > records in a single INSERT statement.  Your GetModifyBatchSize will be
> > reserved for statement batching when libpq has supported batch/pipelining to
> > execute multiple INSERT/UPDATE/DELETE statements, as in the following
> > JDBC batch updates.  What do you think?
> > >
> >
> > I don't know. I was really only thinking about batching in the context
> > of a single DML command, not about batching of multiple commands at the
> > protocol level. IMHO it's far more likely we'll add support for batching
> > for DELETE/UPDATE than libpq pipelining, which seems rather different
> > from how the FDW API works. Which is why I was suggesting to use a name
> > that would work for all DML commands, not just for inserts.
>
> Right, I can't imagine now how the interaction among the client, server core and FDWs would be regarding the statement batching.  So I'll take your suggested name.
>
>
> > Not sure, but I'd guess knowing whether batching is used would be
> > useful. We only print the single-row SQL query, which kinda gives the
> > impression that there's no batching.
>
> Added in postgres_fdw like "Remote SQL" when EXPLAIN VERBOSE is run.
>
>
> > > Don't worry about this, too.  GetMaxBulkInsertTuples() just returns a value
> > that was already saved in a struct in create_foreign_modify().
> > >
> >
> > Well, I do worry for two reasons.
> >
> > Firstly, the fact that in postgres_fdw the call is cheap does not mean
> > it'll be like that in every other FDW. Presumably, the other FDWs might
> > cache it in the struct and do the same thing, of course.
> >
> > But the fact that we're calling it over and over for each row kinda
> > seems like we allow the value to change during execution, but I very
> > much doubt the code is expecting that. I haven't tried, but assume the
> > function first returns 10 and then 100. ISTM the code will allocate
> > ri_Slots with 25 slots, but then we'll try stashing 100 tuples there.
> > That can't end well. Sure, we can claim it's a bug in the FDW extension,
> > but it's also due to the API design.
>
> You worried about other FDWs than postgres_fdw.  That's reasonable.  I insisted in other threads that PG developers care only about postgres_fdw, not other FDWs, when designing the FDW interface, but I myself made the same mistake.  I made changes so that the executor calls GetModifyBatchSize() once per relation per statement.

Please pardon me for barging in late in this discussion, but if we're
going to be using a bulk API here, wouldn't it make more sense to use
COPY, except where RETURNING is specified, in place of INSERT?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
In reply to this post by Craig Ringer-5
From: Craig Ringer <[hidden email]>
> It was not my intention to hold this patch up or greatly expand its
> scope. I'll spend some time testing it out and have a closer read soon
> to see if I can help progress it.

Thank you, I'm relieved to hear that.  Last weekend, I was scared of a possible mood that's something like "We won't accept the insert speedup patch for foreign tables unless you take full advantage of pipelining and achieve maximum conceivable speed!"


> I thought I gave it at the time, and a demo program. IIRC it was just
> doing small multi row inserts or single row inserts. Latency would've
> been a couple of hundred ms probably, I think I did something like
> running on my laptop (Australia, ADSL) to a server on AWS US or EU.

A couple of hundred ms would be dominant in each prepare-send-execute-receive cycle, possibly even for a batch insert with hundreds of rows per batch.  Then, the synchronous batch insert of the current patch may achieve a few hundred times speedup compared to single-row inserts when the batch size is hundreds or more.
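As a back-of-the-envelope check of that estimate (assumed numbers, not measurements): if each statement costs one ~200 ms round trip and per-row server work is negligible, elapsed time is roughly round_trips × latency, so the speedup tracks the batch size:

```python
def insert_time_ms(nrows, batch_size, latency_ms=200.0):
    """Crude model: elapsed time is dominated by one network round
    trip per statement; server-side work is ignored."""
    round_trips = -(-nrows // batch_size)  # ceiling division
    return round_trips * latency_ms

single = insert_time_ms(10_000, 1)     # 10,000 round trips
batched = insert_time_ms(10_000, 100)  # 100 round trips
speedup = single / batched             # ~100x, matching the batch size
```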


> > I'd like to check other DBMSs and your rich references for the FDW interface.
> (My first intuition is that many major DBMSs might not have client C APIs that
> can be used to implement an async pipelining FDW interface.
>
> Likely correct for C APIs of other traditional DBMSes. I'd be less
> sure about newer non SQL ones, especially cloud oriented. For example
> DynamoDB supports at least async requests in the Java client [3] and
> C++ client [4]; it's not immediately clear if requests can be
> pipelined, but the API suggests they can.

I've checked ODBC, MySQL, Microsoft Synapse Analytics, Redshift, and BigQuery, guessing that data warehouses may have asynchronous/pipelining APIs that enable efficient data integration/migration.  But none of them had one.  (I seem to have spent too long and am a bit tired... but it was a bit fun as well.)  They all support INSERT with multiple records in its VALUES clause.  So, it will be useful to provide a synchronous batch insert FDW API.  I guess Oracle's OCI has an asynchronous API, but I didn't check it.

As an aside, MySQL 8.0.16 added support for asynchronous execution in its C API, but it allows only one active SQL statement per connection.  Likewise, although the ODBC standard defines asynchronous execution (SQLSetStmtAttr(SQL_ASYNC_ENABLE) and SQLCompleteAsync), SQL Server and Synapse Analytics allow only one active statement per connection.  psqlODBC doesn't support asynchronous execution.


> Most things with a REST-like API can do a fair bit of concurrency
> though. Multiple async nonblocking HTTP connections can be serviced at
> once. Or HTTP/1.1 pipelining can be used [1], or even better HTTP/2.0
> streams [2]. This is relevant for any REST-like API.

I'm not sure if this is related, but Google has deprecated its Batch HTTP API [1].


[1]
https://cloud.google.com/bigquery/batch


Regards
Takayuki Tsunakawa


RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
In reply to this post by David Fetter
From: David Fetter <[hidden email]>
> Please pardon me for barging in late in this discussion, but if we're
> going to be using a bulk API here, wouldn't it make more sense to use
> COPY, except where RETURNING is specified, in place of INSERT?

Please do not hesitate.  I mentioned earlier in this thread that I think INSERT is better because:


--------------------------------------------------
* When the user executes INSERT statements, it would look strange to the user if the remote SQL is displayed as COPY.

* COPY doesn't invoke rules, unlike INSERT.  (I don't think rules are a feature that users care about, though.)  Also, I'm a bit concerned that there might be, or will be, other differences between INSERT and COPY.
--------------------------------------------------


Also, COPY to foreign tables currently uses INSERTs; the improvement of using COPY instead of INSERT is in progress [1].  Keeping the "COPY uses COPY, INSERT uses INSERT" correspondence seems natural, and it makes COPY's high-speed advantage stand out.


[1]
Fast COPY FROM command for the table with foreign partitions
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru


Regards
Takayuki Tsunakawa




Re: POC: postgres_fdw insert batching

Tomas Vondra-6
In reply to this post by tsunakawa.takay@fujitsu.com
Hi,

Attached is a v6 of this patch, rebased to current master and with some
minor improvements (mostly comments and renaming the "end" struct field
to "values_end" which I think is more descriptive).

The one thing that keeps bugging me is convert_prep_stmt_params - it
does the right thing, but the code is somewhat confusing.


AFAICS the discussions about making this use COPY and/or libpq
pipelining (neither of which is committed yet) ended with the conclusion
that those changes are somewhat independent, and that it's worth getting
this committed in the current form. Barring objections, I'll push this
within the next couple days.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: 0001-Add-bulk-insert-for-foreign-tables-v6.patch (51K)

RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
From: Tomas Vondra <[hidden email]>
> Attached is a v6 of this patch, rebased to current master and with some minor
> improvements (mostly comments and renaming the "end" struct field to
> "values_end" which I think is more descriptive).

Thank you very much.  In fact, my initial patches used values_end, and I changed it to len considering that it may be used for bulk UPDATE and DELETE in the future.  But I think values_end makes its role easier to understand, too.


Regards
Takayuki Tsunakawa


Re: POC: postgres_fdw insert batching

Amit Langote
In reply to this post by Tomas Vondra-6
Hi Tomas, Tsunakawa-san,

Thanks for your work on this.

On Tue, Jan 12, 2021 at 11:06 AM Tomas Vondra
<[hidden email]> wrote:
> AFAICS the discussions about making this use COPY and/or libpq
> pipelining (neither of which is committed yet) ended with the conclusion
> that those changes are somewhat independent, and that it's worth getting
> this committed in the current form. Barring objections, I'll push this
> within the next couple days.

I was trying this out today (been meaning to do so for a while) and
noticed that this fails when there are AFTER ROW triggers on the
foreign table.  Here's an example:

create extension postgres_fdw ;
create server lb foreign data wrapper postgres_fdw ;
create user mapping for current_user server lb;
create table p (a numeric primary key);
create foreign table fp (a int) server lb options (table_name 'p');
create function print_row () returns trigger as $$ begin raise notice
'%', new; return null; end; $$ language plpgsql;
create trigger after_insert_trig after insert on fp for each row
execute function print_row();
insert into fp select generate_series (1, 10);
<crashes>

Apparently, the new code assumes that batching won't be active when the
original query contains a RETURNING clause, but some parts fail to
account for the case where RETURNING is added to the query internally
to retrieve the tuple to pass to the AFTER ROW trigger.
Specifically, the Assert in the following block in
execute_foreign_modify() is problematic:

    /* Check number of rows affected, and fetch RETURNING tuple if any */
    if (fmstate->has_returning)
    {
        Assert(*numSlots == 1);
        n_rows = PQntuples(res);
        if (n_rows > 0)
            store_returning_result(fmstate, slots[0], res);
    }

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: POC: postgres_fdw insert batching

Tomas Vondra-6
On 1/13/21 10:15 AM, Amit Langote wrote:

> Hi Tomas, Tsunakawa-san,
>
> Thanks for your work on this.
>
> On Tue, Jan 12, 2021 at 11:06 AM Tomas Vondra
> <[hidden email]> wrote:
>> AFAICS the discussions about making this use COPY and/or libpq
>> pipelining (neither of which is committed yet) ended with the conclusion
>> that those changes are somewhat independent, and that it's worth getting
>> this committed in the current form. Barring objections, I'll push this
>> within the next couple days.
>
> I was trying this out today (been meaning to do so for a while) and
> noticed that this fails when there are AFTER ROW triggers on the
> foreign table.  Here's an example:
>
> create extension postgres_fdw ;
> create server lb foreign data wrapper postgres_fdw ;
> create user mapping for current_user server lb;
> create table p (a numeric primary key);
> create foreign table fp (a int) server lb options (table_name 'p');
> create function print_row () returns trigger as $$ begin raise notice
> '%', new; return null; end; $$ language plpgsql;
> create trigger after_insert_trig after insert on fp for each row
> execute function print_row();
> insert into fp select generate_series (1, 10);
> <crashes>
>
> Apparently, the new code seems to assume that batching wouldn't be
> active when the original query contains RETURNING clause but some
> parts fail to account for the case where RETURNING is added to the
> query to retrieve the tuple to pass to the AFTER TRIGGER.
> Specifically, the Assert in the following block in
> execute_foreign_modify() is problematic:
>
>     /* Check number of rows affected, and fetch RETURNING tuple if any */
>     if (fmstate->has_returning)
>     {
>         Assert(*numSlots == 1);
>         n_rows = PQntuples(res);
>         if (n_rows > 0)
>             store_returning_result(fmstate, slots[0], res);
>     }
>
Thanks for the report. Yeah, I think there's a missing check in
ExecInsert. Adding

  (!resultRelInfo->ri_TrigDesc->trig_insert_after_row)

solves this. But now I'm wondering if this is the wrong place to make
this decision. I mean, why should we make the decision here, when the
decision whether to have a RETURNING clause is made in postgres_fdw in
deparseReturningList? We don't really know what the other FDWs will do,
for example.

So I think we should just move all of this into GetModifyBatchSize. We
can start with ri_BatchSize = 0. And then do

  if (resultRelInfo->ri_BatchSize == 0)
    resultRelInfo->ri_BatchSize =
      resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);

  if (resultRelInfo->ri_BatchSize > 1)
  {
    ... do batching ...
  }

The GetModifyBatchSize would always return value > 0, so either 1 (no
batching) or >1 (batching).
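The call-once contract above can be sketched in Python (a stand-in for the executor logic; the names mirror the C snippet, but the code is illustrative only):

```python
class ResultRelInfo:
    """Minimal stand-in for the executor's per-relation state."""
    def __init__(self, fdw_get_batch_size):
        self.ri_BatchSize = 0  # 0 means "haven't asked the FDW yet"
        self.fdw_get_batch_size = fdw_get_batch_size

def decide_batching(rri):
    """Ask the FDW once; it must return >= 1 (1 means no batching)."""
    if rri.ri_BatchSize == 0:
        rri.ri_BatchSize = rri.fdw_get_batch_size(rri)
        assert rri.ri_BatchSize >= 1
    return rri.ri_BatchSize > 1

calls = []
rri = ResultRelInfo(lambda r: calls.append(r) or 100)
first = decide_batching(rri)   # consults the FDW
second = decide_batching(rri)  # cached; the FDW is not asked again
```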


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: 0001-Add-bulk-insert-for-foreign-tables-v7.patch (50K)

Re: POC: postgres_fdw insert batching

Tomas Vondra-6
On 1/13/21 3:43 PM, Tomas Vondra wrote:

>
> ...
>
> Thanks for the report. Yeah, I think there's a missing check in
> ExecInsert. Adding
>
>   (!resultRelInfo->ri_TrigDesc->trig_insert_after_row)
>
> solves this. But now I'm wondering if this is the wrong place to make
> this decision. I mean, why should we make the decision here, when the
> decision whether to have a RETURNING clause is made in postgres_fdw in
> deparseReturningList? We don't really know what the other FDWs will do,
> for example.
>
> So I think we should just move all of this into GetModifyBatchSize. We
> can start with ri_BatchSize = 0. And then do
>
>   if (resultRelInfo->ri_BatchSize == 0)
>     resultRelInfo->ri_BatchSize =
>       resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
>
>   if (resultRelInfo->ri_BatchSize > 1)
>   {
>     ... do batching ...
>   }
>
> The GetModifyBatchSize would always return value > 0, so either 1 (no
> batching) or >1 (batching).
>
FWIW the attached v8 patch does this - most of the conditions are moved
to the GetModifyBatchSize() callback. I've removed the check for the
BatchInsert callback, though - the FDW knows whether it supports that,
and it seems a bit pointless at the moment as there are no other batch
callbacks. Maybe we should add an Assert somewhere, though?


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: 0001-Add-bulk-insert-for-foreign-tables-v8.patch (50K)

RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
From: Tomas Vondra <[hidden email]>
> FWIW the attached v8 patch does this - most of the conditions are moved to the
> GetModifyBatchSize() callback. I've removed the check for the BatchInsert
> callback, though - the FDW knows whether it supports that, and it seems a bit
> pointless at the moment as there are no other batch callbacks. Maybe we
> should add an Assert somewhere, though?

Thank you.  I'm in favor of this idea that the decision to support RETURNING and triggers is left to the FDW.  I don't think another Assert is needed, as the caller already has one for the returned batch size.


Regards
Takayuki Tsunakawa


Re: POC: postgres_fdw insert batching

Amit Langote
In reply to this post by Tomas Vondra-6
Hi,

On Thu, Jan 14, 2021 at 2:41 AM Tomas Vondra
<[hidden email]> wrote:

> On 1/13/21 3:43 PM, Tomas Vondra wrote:
> > Thanks for the report. Yeah, I think there's a missing check in
> > ExecInsert. Adding
> >
> >   (!resultRelInfo->ri_TrigDesc->trig_insert_after_row)
> >
> > solves this. But now I'm wondering if this is the wrong place to make
> > this decision. I mean, why should we make the decision here, when the
> > decision whether to have a RETURNING clause is made in postgres_fdw in
> > deparseReturningList? We don't really know what the other FDWs will do,
> > for example.
> >
> > So I think we should just move all of this into GetModifyBatchSize. We
> > can start with ri_BatchSize = 0. And then do
> >
> >   if (resultRelInfo->ri_BatchSize == 0)
> >     resultRelInfo->ri_BatchSize =
> >       resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
> >
> >   if (resultRelInfo->ri_BatchSize > 1)
> >   {
> >     ... do batching ...
> >   }
> >
> > The GetModifyBatchSize would always return value > 0, so either 1 (no
> > batching) or >1 (batching).
> >
>
> FWIW the attached v8 patch does this - most of the conditions are moved
> to the GetModifyBatchSize() callback.

Thanks.  A few comments:

* I agree with leaving it up to an FDW to look at the properties of
the table and of the operation being performed to decide whether or
not to use batching, although maybe BeginForeignModify() is a better
place for putting that logic instead of GetModifyBatchSize()?  So, in
create_foreign_modify(), instead of PgFdwModifyState.batch_size simply
being set to match the table's or the server's value for the
batch_size option, make it also consider the things that prevent
batching and set the execution state's batch_size based on that.
GetModifyBatchSize() simply returns that value.

* Regarding the timing of calling GetModifyBatchSize() to set
ri_BatchSize, I wonder if it wouldn't be better to call it just once,
say from ExecInitModifyTable(), right after BeginForeignModify()
returns?  I don't quite understand why it is being called from
ExecInsert().  Can the batch size change once the execution starts?

* Lastly, how about calling it GetForeignModifyBatchSize() to be
consistent with other nearby callbacks?

> I've removed the check for the
> BatchInsert callback, though - the FDW knows whether it supports that,
> and it seems a bit pointless at the moment as there are no other batch
> callbacks. Maybe we should add an Assert somewhere, though?

Hmm, not checking whether BatchInsert() exists may not be a good idea,
because if an FDW's GetModifyBatchSize() returns a value > 1 but
there's no BatchInsert() function to call, ExecBatchInsert() would
trip.  I don't see the newly added documentation telling FDW authors
to either define both or none.
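The "define both or none" requirement amounts to a consistency check between the two callbacks. A hedged Python sketch (the field names are stand-ins for the FdwRoutine members, not the actual API):

```python
class FdwRoutine:
    """Stand-in for the FDW routine struct; either callback may be unset."""
    def __init__(self, get_modify_batch_size=None, batch_insert=None):
        self.GetModifyBatchSize = get_modify_batch_size
        self.BatchInsert = batch_insert

def effective_batch_size(routine, rri=None):
    """Fall back to 1 (no batching) unless both callbacks exist, so the
    executor never reaches a batch insert with no function to call."""
    if routine.GetModifyBatchSize is None or routine.BatchInsert is None:
        return 1
    size = routine.GetModifyBatchSize(rri)
    assert size >= 1
    return size

# An FDW advertising a batch size but no BatchInsert degrades gracefully:
broken = FdwRoutine(get_modify_batch_size=lambda rri: 100)
complete = FdwRoutine(get_modify_batch_size=lambda rri: 100,
                      batch_insert=lambda rri, slots: len(slots))
```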

Regarding how this plays with partitions, I don't think we need
ExecGetTouchedPartitions(), because you can get the routed-to
partitions using es_tuple_routing_result_relations.  Also, perhaps
it's a good idea to put the "finishing" ExecBatchInsert() calls into a
function ExecFinishBatchInsert().  Maybe the logic to choose the
relations to perform the finishing calls on will get complicated in
the future as batching is added for updates/deletes too and it seems
better to encapsulate that in the separate function than have it out
in the open in ExecModifyTable().

(Sorry about being so late reviewing this.)

--
Amit Langote
EDB: http://www.enterprisedb.com



Re: POC: postgres_fdw insert batching

Tomas Vondra-6
On 1/14/21 9:58 AM, Amit Langote wrote:

> Hi,
>
> On Thu, Jan 14, 2021 at 2:41 AM Tomas Vondra
> <[hidden email]> wrote:
>> On 1/13/21 3:43 PM, Tomas Vondra wrote:
>>> Thanks for the report. Yeah, I think there's a missing check in
>>> ExecInsert. Adding
>>>
>>>   (!resultRelInfo->ri_TrigDesc->trig_insert_after_row)
>>>
>>> solves this. But now I'm wondering if this is the wrong place to make
>>> this decision. I mean, why should we make the decision here, when the
>>> decision whether to have a RETURNING clause is made in postgres_fdw in
>>> deparseReturningList? We don't really know what the other FDWs will do,
>>> for example.
>>>
>>> So I think we should just move all of this into GetModifyBatchSize. We
>>> can start with ri_BatchSize = 0. And then do
>>>
>>>   if (resultRelInfo->ri_BatchSize == 0)
>>>     resultRelInfo->ri_BatchSize =
>>>       resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
>>>
>>>   if (resultRelInfo->ri_BatchSize > 1)
>>>   {
>>>     ... do batching ...
>>>   }
>>>
>>> The GetModifyBatchSize would always return value > 0, so either 1 (no
>>> batching) or >1 (batching).
>>>
>>
>> FWIW the attached v8 patch does this - most of the conditions are moved
>> to the GetModifyBatchSize() callback.
>
> Thanks.  A few comments:
>
> * I agree with leaving it up to an FDW to look at the properties of
> the table and of the operation being performed to decide whether or
> not to use batching, although maybe BeginForeignModify() is a better
> place for putting that logic instead of GetModifyBatchSize()?  So, in
> create_foreign_modify(), instead of PgFdwModifyState.batch_size simply
> being set to match the table's or the server's value for the
> batch_size option, make it also consider the things that prevent
> batching and set the execution state's batch_size based on that.
> GetModifyBatchSize() simply returns that value.
>
> * Regarding the timing of calling GetModifyBatchSize() to set
> ri_BatchSize, I wonder if it wouldn't be better to call it just once,
> say from ExecInitModifyTable(), right after BeginForeignModify()
> returns?  I don't quite understand why it is being called from
> ExecInsert().  Can the batch size change once the execution starts?
>

But it should be called just once. The idea is that initially we have
batch_size=0, and the first call returns a value that is >= 1. So we
never call it again. But maybe it could be called from
BeginForeignModify, in which case we'd not need this logic with first
setting it to 0 etc.

> * Lastly, how about calling it GetForeignModifyBatchSize() to be
> consistent with other nearby callbacks?
>

Yeah, good point.

>> I've removed the check for the
>> BatchInsert callback, though - the FDW knows whether it supports that,
>> and it seems a bit pointless at the moment as there are no other batch
>> callbacks. Maybe we should add an Assert somewhere, though?
>
> Hmm, not checking whether BatchInsert() exists may not be good idea,
> because if an FDW's GetModifyBatchSize() returns a value > 1 but
> there's no BatchInsert() function to call, ExecBatchInsert() would
> trip.  I don't see the newly added documentation telling FDW authors
> to either define both or none.
>

Hmm. The BatchInsert check seemed somewhat unnecessary to me, but OTOH
it can't hurt, I guess. I'll add it back.

> Regarding how this plays with partitions, I don't think we need
> ExecGetTouchedPartitions(), because you can get the routed-to
> partitions using es_tuple_routing_result_relations.  Also, perhaps

I'm not very familiar with es_tuple_routing_result_relations, but that
doesn't seem to work. I've replaced the flushing code at the end of
ExecModifyTable with a loop over es_tuple_routing_result_relations, but
then some of the rows are missing (i.e. not flushed).

> it's a good idea to put the "finishing" ExecBatchInsert() calls into a
> function ExecFinishBatchInsert().  Maybe the logic to choose the
> relations to perform the finishing calls on will get complicated in
> the future as batching is added for updates/deletes too and it seems
> better to encapsulate that in the separate function than have it out
> in the open in ExecModifyTable().
>

IMO that'd be over-engineering at this point. We don't need such a
separate function yet, so why complicate the API? If we need it in the
future, we can add it.

> (Sorry about being so late reviewing this.)

thanks

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: POC: postgres_fdw insert batching

Amit Langote
On Thu, Jan 14, 2021 at 21:57 Tomas Vondra <[hidden email]> wrote:
On 1/14/21 9:58 AM, Amit Langote wrote:
> Hi,
>
> On Thu, Jan 14, 2021 at 2:41 AM Tomas Vondra
> <[hidden email]> wrote:
>> On 1/13/21 3:43 PM, Tomas Vondra wrote:
>>> Thanks for the report. Yeah, I think there's a missing check in
>>> ExecInsert. Adding
>>>
>>>   (!resultRelInfo->ri_TrigDesc->trig_insert_after_row)
>>>
>>> solves this. But now I'm wondering if this is the wrong place to make
>>> this decision. I mean, why should we make the decision here, when the
>>> decision whether to have a RETURNING clause is made in postgres_fdw in
>>> deparseReturningList? We don't really know what the other FDWs will do,
>>> for example.
>>>
>>> So I think we should just move all of this into GetModifyBatchSize. We
>>> can start with ri_BatchSize = 0. And then do
>>>
>>>   if (resultRelInfo->ri_BatchSize == 0)
>>>     resultRelInfo->ri_BatchSize =
>>>       resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
>>>
>>>   if (resultRelInfo->ri_BatchSize > 1)
>>>   {
>>>     ... do batching ...
>>>   }
>>>
>>> The GetModifyBatchSize would always return value > 0, so either 1 (no
>>> batching) or >1 (batching).
>>>
>>
>> FWIW the attached v8 patch does this - most of the conditions are moved
>> to the GetModifyBatchSize() callback.
>
> Thanks.  A few comments:
>
> * I agree with leaving it up to an FDW to look at the properties of
> the table and of the operation being performed to decide whether or
> not to use batching, although maybe BeginForeignModify() is a better
> place for putting that logic instead of GetModifyBatchSize()?  So, in
> create_foreign_modify(), instead of PgFdwModifyState.batch_size simply
> being set to match the table's or the server's value for the
> batch_size option, make it also consider the things that prevent
> batching and set the execution state's batch_size based on that.
> GetModifyBatchSize() simply returns that value.
>
> * Regarding the timing of calling GetModifyBatchSize() to set
> ri_BatchSize, I wonder if it wouldn't be better to call it just once,
> say from ExecInitModifyTable(), right after BeginForeignModify()
> returns?  I don't quite understand why it is being called from
> ExecInsert().  Can the batch size change once the execution starts?
>

But it should be called just once. The idea is that initially we have
batch_size=0, and the first call returns a value that is >= 1. So we never
call it again. But maybe it could be called from BeginForeignModify, in
which case we'd not need this logic with first setting it to 0 etc.

Right, although I was thinking that maybe ri_BatchSize itself shouldn't be written to by the FDW.  Not that that's doing anything wrong, though.

> * Lastly, how about calling it GetForeignModifyBatchSize() to be
> consistent with other nearby callbacks?
>

Yeah, good point.

>> I've removed the check for the
>> BatchInsert callback, though - the FDW knows whether it supports that,
>> and it seems a bit pointless at the moment as there are no other batch
>> callbacks. Maybe we should add an Assert somewhere, though?
>
> Hmm, not checking whether BatchInsert() exists may not be good idea,
> because if an FDW's GetModifyBatchSize() returns a value > 1 but
> there's no BatchInsert() function to call, ExecBatchInsert() would
> trip.  I don't see the newly added documentation telling FDW authors
> to either define both or none.
>

Hmm. The BatchInsert check seemed somewhat unnecessary to me, but OTOH
it can't hurt, I guess. I'll add it back.

> Regarding how this plays with partitions, I don't think we need
> ExecGetTouchedPartitions(), because you can get the routed-to
> partitions using es_tuple_routing_result_relations.  Also, perhaps

I'm not very familiar with es_tuple_routing_result_relations, but that
doesn't seem to work. I've replaced the flushing code at the end of
ExecModifyTable with a loop over es_tuple_routing_result_relations, but
then some of the rows are missing (i.e. not flushed).

I should’ve mentioned es_opened_result_relations too, which contains non-routing result relations.  So I really meant: if (proute) then use es_tuple_routing_result_relations, else es_opened_result_relations.  This should work as long as batching is only used for inserts.


> it's a good idea to put the "finishing" ExecBatchInsert() calls into a
> function ExecFinishBatchInsert().  Maybe the logic to choose the
> relations to perform the finishing calls on will get complicated in
> the future as batching is added for updates/deletes too and it seems
> better to encapsulate that in the separate function than have it out
> in the open in ExecModifyTable().
>

IMO that'd be over-engineering at this point. We don't need such a
separate function yet, so why complicate the API? If we need it in the
future, we can add it.

Fair enough.
--

Re: POC: postgres_fdw insert batching

Tomas Vondra-6


On 1/14/21 2:57 PM, Amit Langote wrote:

> On Thu, Jan 14, 2021 at 21:57 Tomas Vondra
> <[hidden email] <mailto:[hidden email]>>
> wrote:
>
>     On 1/14/21 9:58 AM, Amit Langote wrote:
>     > Hi,
>     >
>     > On Thu, Jan 14, 2021 at 2:41 AM Tomas Vondra
>     > <[hidden email]
>     <mailto:[hidden email]>> wrote:
>     >> On 1/13/21 3:43 PM, Tomas Vondra wrote:
>     >>> Thanks for the report. Yeah, I think there's a missing check in
>     >>> ExecInsert. Adding
>     >>>
>     >>>   (!resultRelInfo->ri_TrigDesc->trig_insert_after_row)
>     >>>
>     >>> solves this. But now I'm wondering if this is the wrong place to
>     make
>     >>> this decision. I mean, why should we make the decision here,
>     when the
>     >>> decision whether to have a RETURNING clause is made in
>     postgres_fdw in
>     >>> deparseReturningList? We don't really know what the other FDWs
>     will do,
>     >>> for example.
>     >>>
>     >>> So I think we should just move all of this into
>     GetModifyBatchSize. We
>     >>> can start with ri_BatchSize = 0. And then do
>     >>>
>     >>>   if (resultRelInfo->ri_BatchSize == 0)
>     >>>     resultRelInfo->ri_BatchSize =
>     >>>     
>      resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
>     >>>
>     >>>   if (resultRelInfo->ri_BatchSize > 1)
>     >>>   {
>     >>>     ... do batching ...
>     >>>   }
>     >>>
>     >>> The GetModifyBatchSize would always return value > 0, so either
>     1 (no
>     >>> batching) or >1 (batching).
>     >>>
>     >>
>     >> FWIW the attached v8 patch does this - most of the conditions are
>     moved
>     >> to the GetModifyBatchSize() callback.
>     >
>     > Thanks.  A few comments:
>     >
>     > * I agree with leaving it up to an FDW to look at the properties of
>     > the table and of the operation being performed to decide whether or
>     > not to use batching, although maybe BeginForeignModify() is a better
>     > place for putting that logic instead of GetModifyBatchSize()?  So, in
>     > create_foreign_modify(), instead of PgFdwModifyState.batch_size simply
>     > being set to match the table's or the server's value for the
>     > batch_size option, make it also consider the things that prevent
>     > batching and set the execution state's batch_size based on that.
>     > GetModifyBatchSize() simply returns that value.
>     >
>     > * Regarding the timing of calling GetModifyBatchSize() to set
>     > ri_BatchSize, I wonder if it wouldn't be better to call it just once,
>     > say from ExecInitModifyTable(), right after BeginForeignModify()
>     > returns?  I don't quite understand why it is being called from
>     > ExecInsert().  Can the batch size change once the execution starts?
>     >
>
>     But it should be called just once. The idea is that initially we have
>     batch_size=0, and the first call returns a value that is >= 1. So we never
>     call it again. But maybe it could be called from BeginForeignModify, in
>     which case we'd not need this logic with first setting it to 0 etc.
>
>
> Right, although I was thinking that maybe ri_BatchSize itself is not to
> be written to by the FDW.  Not to say that’s doing anything wrong though.
>
>     > * Lastly, how about calling it GetForeignModifyBatchSize() to be
>     > consistent with other nearby callbacks?
>     >
>
>     Yeah, good point.
>
>     >> I've removed the check for the
>     >> BatchInsert callback, though - the FDW knows whether it supports
>     that,
>     >> and it seems a bit pointless at the moment as there are no other
>     batch
>     >> callbacks. Maybe we should add an Assert somewhere, though?
>     >
>     > Hmm, not checking whether BatchInsert() exists may not be good idea,
>     > because if an FDW's GetModifyBatchSize() returns a value > 1 but
>     > there's no BatchInsert() function to call, ExecBatchInsert() would
>     > trip.  I don't see the newly added documentation telling FDW authors
>     > to either define both or none.
>     >
>
>     Hmm. The BatchInsert check seemed somewhat unnecessary to me, but OTOH
>     it can't hurt, I guess. I'll add it back.
>
>     > Regarding how this plays with partitions, I don't think we need
>     > ExecGetTouchedPartitions(), because you can get the routed-to
>     > partitions using es_tuple_routing_result_relations.  Also, perhaps
>
>     I'm not very familiar with es_tuple_routing_result_relations, but that
>     doesn't seem to work. I've replaced the flushing code at the end of
>     ExecModifyTable with a loop over es_tuple_routing_result_relations, but
>     then some of the rows are missing (i.e. not flushed).
>
>
> I should’ve mentioned es_opened_result_relations too which contain
> non-routing result relations.  So I really meant if (proute) then use
> es_tuple_routing_result_relations, else es_opened_result_relations. 
> This should work as long as batching is only used for inserts.
>
Ah, right. That did the trick.

Attached is v9 with all of those tweaks, except for moving the BatchSize
call to BeginForeignModify - I tried that, but it did not seem like an
improvement, because we'd still need the checks for API callbacks in
ExecInsert for example. So I decided not to do that.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

0001-Add-bulk-insert-for-foreign-tables-v9.patch (50K)

RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
From: Tomas Vondra <[hidden email]>
> Attached is v9 with all of those tweaks, except for moving the BatchSize call to
> BeginForeignModify - I tried that, but it did not seem like an improvement,
> because we'd still need the checks for API callbacks in ExecInsert for example.
> So I decided not to do that.

Thanks, Tomas-san.  The patch looks good again.

Amit-san, thank you for teaching us about es_tuple_routing_result_relations and es_opened_result_relations.


Regards
Takayuki Tsunakawa


Re: POC: postgres_fdw insert batching

Amit Langote
In reply to this post by Tomas Vondra-6
On Fri, Jan 15, 2021 at 12:05 AM Tomas Vondra
<[hidden email]> wrote:
> Attached is v9 with all of those tweaks,

Thanks.

> except for moving the BatchSize
> call to BeginForeignModify - I tried that, but it did not seem like an
> improvement, because we'd still need the checks for API callbacks in
> ExecInsert for example. So I decided not to do that.

Okay, so maybe not moving the whole logic into the FDW's
BeginForeignModify(), but at least if we move this...

@@ -441,6 +449,72 @@ ExecInsert(ModifyTableState *mtstate,
+       /*
+        * Determine if the FDW supports batch insert and determine the batch
+        * size (a FDW may support batching, but it may be disabled for the
+        * server/table). Do this only once, at the beginning - we don't want
+        * the batch size to change during execution.
+        */
+       if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+           resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert &&
+           resultRelInfo->ri_BatchSize == 0)
+           resultRelInfo->ri_BatchSize =
+               resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);

...into ExecInitModifyTable(), ExecInsert() only needs the following block:

       /*
+        * If the FDW supports batching, and batching is requested, accumulate
+        * rows and insert them in batches. Otherwise use the per-row inserts.
+        */
+       if (resultRelInfo->ri_BatchSize > 1)
+       {
+ ...

AFAICS, I don't see anything that will cause ri_BatchSize to become 0
once set, so I don't see the point of checking whether it needs to be set
again on every ExecInsert() call.  Also, maybe not that important, but
shaving off 3 comparisons for every tuple would add up nicely, IMHO,
especially given that we're targeting bulk loads.

--
Amit Langote
EDB: http://www.enterprisedb.com



RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
From: Amit Langote <[hidden email]>

> Okay, so maybe not moving the whole logic into the FDW's
> BeginForeignModify(), but at least if we move this...
>
> @@ -441,6 +449,72 @@ ExecInsert(ModifyTableState *mtstate,
> +       /*
> +        * Determine if the FDW supports batch insert and determine the
> batch
> +        * size (a FDW may support batching, but it may be disabled for the
> +        * server/table). Do this only once, at the beginning - we don't want
> +        * the batch size to change during execution.
> +        */
> +       if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
> +           resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert &&
> +           resultRelInfo->ri_BatchSize == 0)
> +           resultRelInfo->ri_BatchSize =
> +               resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
>
> ...into ExecInitModifyTable(), ExecInsert() only needs the following block:

Does ExecInitModifyTable() know all the leaf partitions where the tuples produced by VALUES or SELECT go?  Doesn't ExecInsert() find the target leaf partition for the first time, through the call to ExecPrepareTupleRouting()?  Leaf partitions can have different batch_size settings.


Regards
Takayuki Tsunakawa


Re: POC: postgres_fdw insert batching

Amit Langote
On Sat, Jan 16, 2021 at 12:00 AM [hidden email]
<[hidden email]> wrote:

> From: Amit Langote <[hidden email]>
> > Okay, so maybe not moving the whole logic into the FDW's
> > BeginForeignModify(), but at least if we move this...
> >
> > @@ -441,6 +449,72 @@ ExecInsert(ModifyTableState *mtstate,
> > +       /*
> > +        * Determine if the FDW supports batch insert and determine the
> > batch
> > +        * size (a FDW may support batching, but it may be disabled for the
> > +        * server/table). Do this only once, at the beginning - we don't want
> > +        * the batch size to change during execution.
> > +        */
> > +       if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
> > +           resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert &&
> > +           resultRelInfo->ri_BatchSize == 0)
> > +           resultRelInfo->ri_BatchSize =
> > +
> > resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
> >
> > ...into ExecInitModifyTable(), ExecInsert() only needs the following block:
>
> Does ExecInitModifyTable() know all leaf partitions where the tuples produced by VALUES or SELECT go?  ExecInsert() doesn't find the target leaf partition for the first time through the call to ExecPrepareTupleRouting()?  Leaf partitions can have different batch_size settings.

Good thing you reminded me that this is about inserts, and in that
case no, ExecInitModifyTable() doesn't know all the leaf partitions; it
only sees the root table, whose batch_size doesn't really matter.  So
it's really ExecInitRoutingInfo() where I would recommend setting
ri_BatchSize, right after this block:

/*
 * If the partition is a foreign table, let the FDW init itself for
 * routing tuples to the partition.
 */
if (partRelInfo->ri_FdwRoutine != NULL &&
    partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
    partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);

Note that ExecInitRoutingInfo() is called only once for a partition
when it is initialized after being inserted into for the first time.

For non-partitioned targets, I'd still say set ri_BatchSize in
ExecInitModifyTable().

--
Amit Langote
EDB: http://www.enterprisedb.com

