POC: postgres_fdw insert batching


POC: postgres_fdw insert batching

Tomas Vondra
Hi,

One of the issues I'm fairly regularly reminded of by users/customers is
that inserting into tables sharded using FDWs is rather slow. We even
get it reported on pgsql-bugs from time to time [1].

Some of the slowness / overhead is expected, due to the latency between
machines in the sharded setup. Even just 1ms latency will make it way
more expensive than a single instance.

But let's do a simple experiment, comparing a hash-partitioned table with
regular partitions to one with FDW partitions on the same instance. Scripts
to run this are attached. The duration of inserting 1M rows into this table
(average of 10 runs on my laptop) looks like this:

   regular: 2872 ms
   FDW:     64454 ms

Yep, it's ~20x slower, on a setup with ping latency well below 0.05ms.
Imagine how it would look on sharded setups with 0.1ms or 1ms latency,
which is probably where most single-DC clusters are :-(
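
For reference, the attached scripts build roughly the following two setups
(the names below are made up for illustration - see the attached local.sql
and fdw.sql for the real thing):

    -- "regular": hash-partitioned table with plain local partitions
    CREATE TABLE t (a int, b int) PARTITION BY HASH (a);
    CREATE TABLE t_p0 PARTITION OF t FOR VALUES WITH (MODULUS 8, REMAINDER 0);
    -- ... and so on for the remaining partitions

    -- "FDW": same partitioning, but the partitions are postgres_fdw foreign
    -- tables pointing at plain tables on the same instance
    CREATE EXTENSION postgres_fdw;
    CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'localhost', dbname 'test');
    CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

    CREATE TABLE f (a int, b int) PARTITION BY HASH (a);
    CREATE FOREIGN TABLE f_p0 PARTITION OF f
        FOR VALUES WITH (MODULUS 8, REMAINDER 0)
        SERVER loopback OPTIONS (table_name 'f_p0_data');
    -- ... one foreign partition (and matching f_pN_data table) per partition

    -- the measured insert is then simply
    INSERT INTO t SELECT i, i FROM generate_series(1, 1000000) s(i);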

Now, the primary reason why the performance degrades like this is that
while FDW has batching for SELECT queries (i.e. we read larger chunks of
data from the cursors), we don't have that for INSERTs (or other DML).
Every time you insert a row, it has to go all the way down into the
partition synchronously.

For some use cases this may be reduced by having many independent
connections from different users, so the per-user latency is higher but
acceptable. But if you need to import larger amounts of data (say, a CSV
file for analytics, ...) this may not work.

Some time ago I wrote an ugly PoC adding batching, just to see how far
would it get us, and it seems quite promising - results for the same
INSERT benchmarks look like this:

    FDW batching: 4584 ms

So, rather nice improvement, I'd say ...

Before I spend more time hacking on this, I have a couple open questions
about the design, restrictions etc.


1) Extend the FDW API?

In the patch, the batching is simply "injected" into the existing insert
API method, i.e. ExecForeignInsert et al. I wonder if it'd be better to
extend the API with a "batched" version of the method, so that we can
easily determine whether the FDW supports batching or not - it would
require changes in the callers, though. OTOH it might be useful for
COPY, where we could do something similar to multi_insert (COPY already
benefits from this patch, but it does not use the batching built-into
COPY).


2) What about the insert results?

I'm not sure what to do about "result" status for the inserted rows. We
only really "stash" the rows into a buffer, so we don't know if it will
succeed or not. The patch simply assumes it will succeed, but that's
clearly wrong, and it may result in reporting a wrong number of rows.

The patch also disables the batching when the insert has a RETURNING
clause, because there's just a single slot (for the currently inserted
row). I suppose a "batching" method would take an array of slots.


3) What about the other DML operations (DELETE/UPDATE)?

The other DML operations could probably benefit from the batching too.
INSERT was good enough for a PoC, but having batching only for INSERT
seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
of quals, but likely doable.


4) Should we do batching for COPY instead?

While looking at multi_insert, I've realized it's mostly exactly what
the new "batching insert" API function would need to be. But it's only
really used in COPY, so I wonder if we should just abandon the idea of
batching INSERTs and do batching COPY for FDW tables.

For cases that can replace INSERT with COPY this would be enough, but
unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
this :-(


5) Expected consistency?

I'm not entirely sure what are the consistency expectations for FDWs.
Currently the FDW nodes pointing to the same server share a connection,
so the inserted rows might be visible to other nodes. But if we only
stash the rows in a local buffer for a while, that's no longer true. So
maybe this breaks the consistency expectations?

But maybe that's OK - I'm not sure how the prepared statements/cursors
affect this. I can imagine restricting the batching only to plans where
this is not an issue (single FDW node or something), but it seems rather
fragile and undesirable.

I was thinking about adding a GUC to enable/disable the batching at some
level (global, server, table, ...) but it seems like a bad match because
the consistency expectations likely depend on a query. There should be a
GUC to set the batch size, though (it's hardcoded to 100 for now).


regards



[1] https://www.postgresql.org/message-id/CACnz%2BQ1q0%2B2KoJam9LyNMk8JmdC6qYHXWB895Wu2xcpoip18xQ%40mail.gmail.com

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

fdw.sql (3K)
local.sql (1K)
0001-fdw-insert-batching-v1.patch (18K)

Re: POC: postgres_fdw insert batching

Amit Langote
Hi Tomas,

On Mon, Jun 29, 2020 at 12:10 AM Tomas Vondra
<[hidden email]> wrote:

>
> [...]
>
> Some time ago I wrote an ugly PoC adding batching, just to see how far
> would it get us, and it seems quite promising - results for the same
> INSERT benchmarks look like this:
>
>     FDW batching: 4584 ms
>
> So, rather nice improvement, I'd say ...

Very nice indeed.

> Before I spend more time hacking on this, I have a couple open questions
> about the design, restrictions etc.

I think you may want to take a look at this recent proposal by Andrey Lepikhov:

* [POC] Fast COPY FROM command for the table with foreign partitions *
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: POC: postgres_fdw insert batching

Ashutosh Bapat
In reply to this post by Tomas Vondra
On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
<[hidden email]> wrote:

>
>     FDW batching: 4584 ms
>
> So, rather nice improvement, I'd say ...

Very nice.

>
> Before I spend more time hacking on this, I have a couple open questions
> about the design, restrictions etc.
>
>
> 1) Extend the FDW API?
>
> In the patch, the batching is simply "injected" into the existing insert
> API method, i.e. ExecForeignInsert et al. I wonder if it'd be better to
> extend the API with a "batched" version of the method, so that we can
> easily determine whether the FDW supports batching or not - it would
> require changes in the callers, though. OTOH it might be useful for
> COPY, where we could do something similar to multi_insert (COPY already
> benefits from this patch, but it does not use the batching built-into
> COPY).

Amit Langote has pointed out a related patch being discussed on hackers at [1].

That patch introduces a new API. But if we can do it without
introducing a new API that will be good. FDWs which can support
batching can just modify their code and don't have to implement and
manage a new API. We already have a handful of those APIs.

>
> 2) What about the insert results?
>
> I'm not sure what to do about "result" status for the inserted rows. We
> only really "stash" the rows into a buffer, so we don't know if it will
> succeed or not. The patch simply assumes it will succeed, but that's
> clearly wrong, and it may result in reporting a wrong number of rows.

I didn't get this. We are executing an INSERT on the foreign server,
so we get the number of rows INSERTed from that server. We should just
add those up across batches. If there's a failure, it would abort the
transaction, local as well as remote.

>
> The patch also disables the batching when the insert has a RETURNING
> clause, because there's just a single slot (for the currently inserted
> row). I suppose a "batching" method would take an array of slots.
>

It will be a rare case when a bulk load also has a RETURNING clause.
So, we can live with this restriction. We should try to choose a
design which allows that restriction to be lifted in the future. But I
doubt that restriction will be a serious one.

>
> 3) What about the other DML operations (DELETE/UPDATE)?
>
> The other DML operations could probably benefit from the batching too.
> INSERT was good enough for a PoC, but having batching only for INSERT
> seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
> of quals, but likely doable.

Bulk INSERTs are more common in a sharded environment because of data
load in say OLAP systems. Bulk update/delete are rare, although not
that rare. So if an approach just supports bulk insert and not bulk
UPDATE/DELETE that will address a large number of usecases IMO. But if
we can make everything work together that would be good as well.

In your patch, I see that an INSERT statement with batch is
constructed as INSERT INTO ... VALUES (...), (...) as many values as
the batch size. That won't work as is for UPDATE/DELETE since we can't
pass multiple pairs of ctids and columns to be updated for each ctid
in one statement. Maybe we could build as many UPDATE/DELETE
statements as the size of a batch, but that would be ugly. What we
need is a feature like a batch prepared statement in libpq similar to
what JDBC supports
(https://mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/).
This will allow a single prepared statement to be executed with a
batch of parameters, each batch corresponding to one foreign DML
statement.
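
To illustrate with made-up names, what your patch sends for a batch is a
single multi-row INSERT, while the UPDATE/DELETE statements postgres_fdw
builds address one row at a time by ctid, roughly:

    -- INSERT: one remote statement carries the whole batch
    INSERT INTO remote_tab (a, b) VALUES ($1, $2), ($3, $4), ($5, $6);

    -- UPDATE/DELETE: each row is addressed by its ctid, so there is no
    -- equivalent multi-row form without batched/pipelined parameters
    UPDATE remote_tab SET b = $2 WHERE ctid = $1;
    DELETE FROM remote_tab WHERE ctid = $1;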

>
>
> 4) Should we do batching for COPY instead?
>
> While looking at multi_insert, I've realized it's mostly exactly what
> the new "batching insert" API function would need to be. But it's only
> really used in COPY, so I wonder if we should just abandon the idea of
> batching INSERTs and do batching COPY for FDW tables.

I think this won't support RETURNING either. But if we could somehow
use the copy protocol to send the data to the foreign server and yet treat
it as INSERT, that might work. I think we have to find out which performs
better, COPY or batch INSERT.

>
> For cases that can replace INSERT with COPY this would be enough, but
> unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
> this :-(

Agreed, if we want to support bulk UPDATE/DELETE as well.

>
>
> 5) Expected consistency?
>
> I'm not entirely sure what are the consistency expectations for FDWs.
> Currently the FDW nodes pointing to the same server share a connection,
> so the inserted rows might be visible to other nodes. But if we only
> stash the rows in a local buffer for a while, that's no longer true. So
> maybe this breaks the consistency expectations?
>
> But maybe that's OK - I'm not sure how the prepared statements/cursors
> affect this. I can imagine restricting the batching only to plans where
> this is not an issue (single FDW node or something), but it seems rather
> fragile and undesirable.

I think that area is grey. Depending upon where the cursor is
positioned when a DML node executes a query, the data fetched from
cursor may or may not see the effect of DML. The cursor position is
based on the batch size so we already have problems in this area I
think. Assuming that the DML and SELECT are independent this will
work. So the consistency problems exist; they will just be modulated
by batching DML. I doubt that's related to this feature exclusively,
and it should be solved independently of this feature.

>
> I was thinking about adding a GUC to enable/disable the batching at some
> level (global, server, table, ...) but it seems like a bad match because
> the consistency expectations likely depend on a query. There should be a
> GUC to set the batch size, though (it's hardcoded to 100 for now).
>

Similar to fetch_size, it should be a foreign server or table level setting, IMO.
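
Something along these lines, say (the option name batch_size and the object
names are just placeholders here, mirroring how fetch_size is handled):

    ALTER SERVER loopback OPTIONS (ADD batch_size '100');
    ALTER FOREIGN TABLE f_p0 OPTIONS (ADD batch_size '10');  -- per-table override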

[1] https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru


--
Best Wishes,
Ashutosh Bapat



Re: POC: postgres_fdw insert batching

Etsuro Fujita
On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat
<[hidden email]> wrote:
> On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
> <[hidden email]> wrote:

> > 3) What about the other DML operations (DELETE/UPDATE)?
> >
> > The other DML operations could probably benefit from the batching too.
> > INSERT was good enough for a PoC, but having batching only for INSERT
> > seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
> > of quals, but likely doable.
>
> Bulk INSERTs are more common in a sharded environment because of data
> load in say OLAP systems. Bulk update/delete are rare, although not
> that rare. So if an approach just supports bulk insert and not bulk
> UPDATE/DELETE that will address a large number of usecases IMO. But if
> we can make everything work together that would be good as well.

In most cases, I think the entire UPDATE/DELETE operations would be
pushed down to the remote side by DirectModify.  So, I'm not sure we
really need the bulk UPDATE/DELETE.

> > 4) Should we do batching for COPY instead?
> >
> > While looking at multi_insert, I've realized it's mostly exactly what
> > the new "batching insert" API function would need to be. But it's only
> > really used in COPY, so I wonder if we should just abandon the idea of
> > batching INSERTs and do batching COPY for FDW tables.

> I think we have to find out which performs
> better, COPY or batch INSERT.

Maybe I'm missing something, but I think the COPY patch [1] seems more
promising to me, because 1) it would not get the remote side's planner
and executor involved, and 2) the data would be loaded more
efficiently by multi-insert on the remote side.  (Yeah, COPY doesn't
support RETURNING, but it's rare that RETURNING is needed in a bulk
load, as you mentioned.)

> [1] https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Best regards,
Etsuro Fujita



Re: POC: postgres_fdw insert batching

Ashutosh Bapat


On Tue, 30 Jun 2020 at 08:47, Etsuro Fujita <[hidden email]> wrote:
> On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat
> <[hidden email]> wrote:
> > On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
> > <[hidden email]> wrote:
>
> > > 3) What about the other DML operations (DELETE/UPDATE)?
> > >
> > > The other DML operations could probably benefit from the batching too.
> > > INSERT was good enough for a PoC, but having batching only for INSERT
> > > seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
> > > of quals, but likely doable.
> >
> > Bulk INSERTs are more common in a sharded environment because of data
> > load in say OLAP systems. Bulk update/delete are rare, although not
> > that rare. So if an approach just supports bulk insert and not bulk
> > UPDATE/DELETE that will address a large number of usecases IMO. But if
> > we can make everything work together that would be good as well.
>
> In most cases, I think the entire UPDATE/DELETE operations would be
> pushed down to the remote side by DirectModify.  So, I'm not sure we
> really need the bulk UPDATE/DELETE.

That may not be true for a partitioned table whose partitions are foreign tables. Esp. given the work that Amit Langote is doing [1]. It really depends on the ability of postgres_fdw to detect that the DML modifying each of the partitions can be pushed down. That may not come easily.
 

> [...]

--
Best Wishes,
Ashutosh

Re: POC: postgres_fdw insert batching

Amit Langote
On Tue, Jun 30, 2020 at 1:22 PM Ashutosh Bapat
<[hidden email]> wrote:

> On Tue, 30 Jun 2020 at 08:47, Etsuro Fujita <[hidden email]> wrote:
>> On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat
>> <[hidden email]> wrote:
>> > On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
>> > <[hidden email]> wrote:
>>
>> > > 3) What about the other DML operations (DELETE/UPDATE)?
>> > >
>> > > The other DML operations could probably benefit from the batching too.
>> > > INSERT was good enough for a PoC, but having batching only for INSERT
>> > > seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
>> > > of quals, but likely doable.
>> >
>> > Bulk INSERTs are more common in a sharded environment because of data
>> > load in say OLAP systems. Bulk update/delete are rare, although not
>> > that rare. So if an approach just supports bulk insert and not bulk
>> > UPDATE/DELETE that will address a large number of usecases IMO. But if
>> > we can make everything work together that would be good as well.
>>
>> In most cases, I think the entire UPDATE/DELETE operations would be
>> pushed down to the remote side by DirectModify.  So, I'm not sure we
>> really need the bulk UPDATE/DELETE.
> That may not be true for a partitioned table whose partitions are foreign tables. Esp. given the work that Amit Langote is doing [1]. It really depends on the ability of postgres_fdw to detect that the DML modifying each of the partitions can be pushed down. That may not come easily.

While it's true that how to accommodate the DirectModify API in the
new inherited update/delete planning approach is an open question on
that thread, I would eventually like to find an answer to that.  That
is, that work shouldn't result in losing the foreign partition's
ability to use DirectModify API to optimize updates/deletes.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: POC: postgres_fdw insert batching

Etsuro Fujita
On Tue, Jun 30, 2020 at 2:54 PM Amit Langote <[hidden email]> wrote:

> On Tue, Jun 30, 2020 at 1:22 PM Ashutosh Bapat
> <[hidden email]> wrote:
> > On Tue, 30 Jun 2020 at 08:47, Etsuro Fujita <[hidden email]> wrote:
> >> On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat
> >> <[hidden email]> wrote:
> >> > On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
> >> > <[hidden email]> wrote:
> >>
> >> > > 3) What about the other DML operations (DELETE/UPDATE)?
> >> > >
> >> > > The other DML operations could probably benefit from the batching too.
> >> > > INSERT was good enough for a PoC, but having batching only for INSERT
> >> > > seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
> >> > > of quals, but likely doable.
> >> >
> >> > Bulk INSERTs are more common in a sharded environment because of data
> >> > load in say OLAP systems. Bulk update/delete are rare, although not
> >> > that rare. So if an approach just supports bulk insert and not bulk
> >> > UPDATE/DELETE that will address a large number of usecases IMO. But if
> >> > we can make everything work together that would be good as well.
> >>
> >> In most cases, I think the entire UPDATE/DELETE operations would be
> >> pushed down to the remote side by DirectModify.  So, I'm not sure we
> >> really need the bulk UPDATE/DELETE.
> > That may not be true for a partitioned table whose partitions are foreign tables. Esp. given the work that Amit Langote is doing [1]. It really depends on the ability of postgres_fdw to detect that the DML modifying each of the partitions can be pushed down. That may not come easily.
>
> While it's true that how to accommodate the DirectModify API in the
> new inherited update/delete planning approach is an open question on
> that thread, I would eventually like to find an answer to that.  That
> is, that work shouldn't result in losing the foreign partition's
> ability to use DirectModify API to optimize updates/deletes.

That would be great!

Best regards,
Etsuro Fujita



Re: POC: postgres_fdw insert batching

Tomas Vondra
In reply to this post by Ashutosh Bapat
On Mon, Jun 29, 2020 at 04:22:15PM +0530, Ashutosh Bapat wrote:

>On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
><[hidden email]> wrote:
>
>>
>>     FDW batching: 4584 ms
>>
>> So, rather nice improvement, I'd say ...
>
>Very nice.
>
>>
>> Before I spend more time hacking on this, I have a couple open questions
>> about the design, restrictions etc.
>>
>>
>> 1) Extend the FDW API?
>>
>> In the patch, the batching is simply "injected" into the existing insert
>> API method, i.e. ExecForeignInsert et al. I wonder if it'd be better to
>> extend the API with a "batched" version of the method, so that we can
>> easily determine whether the FDW supports batching or not - it would
>> require changes in the callers, though. OTOH it might be useful for
>> COPY, where we could do something similar to multi_insert (COPY already
>> benefits from this patch, but it does not use the batching built-into
>> COPY).
>
>Amit Langote has pointed out a related patch being discussed on hackers at [1].
>
>That patch introduces a new API. But if we can do it without
>introducing a new API that will be good. FDWs which can support
>batching can just modify their code and don't have to implement and
>manage a new API. We already have a handful of those APIs.
>

I don't think extending the API is a big issue - the FDW code will need
changing anyway, so this seems minor.

I'll take a look at the COPY patch - I agree it seems like a good idea,
although it can be less convenient in various cases (e.g. I've seen a lot
of INSERT ... SELECT queries in sharded systems, etc.).

>>
>> 2) What about the insert results?
>>
>> I'm not sure what to do about "result" status for the inserted rows. We
>> only really "stash" the rows into a buffer, so we don't know if it will
>> succeed or not. The patch simply assumes it will succeed, but that's
>> clearly wrong, and it may result in reporting a wrong number of rows.
>
>I didn't get this. We are executing an INSERT on the foreign server,
>so we get the number of rows INSERTed from that server. We should just
>add those up across batches. If there's a failure, it would abort the
>transaction, local as well as remote.
>

True, but it's not the FDW code doing the counting - it's the caller,
depending on whether the ExecForeignInsert returns a valid slot or NULL.
So it's not quite possible to just return a number of inserted tuples,
as returned by the remote server.

>>
>> The patch also disables the batching when the insert has a RETURNING
>> clause, because there's just a single slot (for the currently inserted
>> row). I suppose a "batching" method would take an array of slots.
>>
>
>It will be a rare case when a bulk load also has a RETURNING clause.
>So, we can live with this restriction. We should try to choose a
>design which allows that restriction to be lifted in the future. But I
>doubt that restriction will be a serious one.
>
>>
>> 3) What about the other DML operations (DELETE/UPDATE)?
>>
>> The other DML operations could probably benefit from the batching too.
>> INSERT was good enough for a PoC, but having batching only for INSERT
>seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
>> of quals, but likely doable.
>
>Bulk INSERTs are more common in a sharded environment because of data
>load in say OLAP systems. Bulk update/delete are rare, although not
>that rare. So if an approach just supports bulk insert and not bulk
>UPDATE/DELETE that will address a large number of usecases IMO. But if
>we can make everything work together that would be good as well.
>
>In your patch, I see that an INSERT statement with batch is
>constructed as INSERT INTO ... VALUES (...), (...) as many values as
>the batch size. That won't work as is for UPDATE/DELETE since we can't
>pass multiple pairs of ctids and columns to be updated for each ctid
>in one statement. Maybe we could build as many UPDATE/DELETE
>statements as the size of a batch, but that would be ugly. What we
>need is a feature like a batch prepared statement in libpq similar to
>what JDBC supports
>(https://mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/).
>This will allow a single prepared statement to be executed with a
>batch of parameters, each batch corresponding to one foreign DML
>statement.
>

I'm pretty sure we could make it work with some array/unnest tricks to
build a relation, and use that as a source of data.
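
Roughly something like this, i.e. ship the batch as arrays and turn it back
into a relation on the remote side (just a sketch with made-up names, not
what the patch does today):

    -- one parameterized statement updates a whole batch; $1 and $2 are
    -- arrays of ctids and new values, unnested into a source relation
    UPDATE remote_tab t
       SET b = v.new_b
      FROM unnest($1::tid[], $2::int[]) AS v(target_ctid, new_b)
     WHERE t.ctid = v.target_ctid;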

>>
>>
>> 4) Should we do batching for COPY instead?
>>
>> While looking at multi_insert, I've realized it's mostly exactly what
>> the new "batching insert" API function would need to be. But it's only
>> really used in COPY, so I wonder if we should just abandon the idea of
>> batching INSERTs and do batching COPY for FDW tables.
>
>I think this won't support RETURNING either. But if we could somehow
>use the copy protocol to send the data to the foreign server and yet treat
>it as INSERT, that might work. I think we have to find out which performs
>better, COPY or batch INSERT.
>

I don't see why not support both, the use cases are somewhat different I
think.

>>
>> For cases that can replace INSERT with COPY this would be enough, but
>> unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
>> this :-(
>
>Agreed, if we want to support bulk UPDATE/DELETE as well.
>
>>
>>
>> 5) Expected consistency?
>>
>> I'm not entirely sure what are the consistency expectations for FDWs.
>> Currently the FDW nodes pointing to the same server share a connection,
>> so the inserted rows might be visible to other nodes. But if we only
>> stash the rows in a local buffer for a while, that's no longer true. So
>> maybe this breaks the consistency expectations?
>>
>> But maybe that's OK - I'm not sure how the prepared statements/cursors
>> affect this. I can imagine restricting the batching only to plans where
>> this is not an issue (single FDW node or something), but it seems rather
>> fragile and undesirable.
>
>I think that area is grey. Depending upon where the cursor is
>positioned when a DML node executes a query, the data fetched from
>cursor may or may not see the effect of DML. The cursor position is
>based on the batch size so we already have problems in this area I
>think. Assuming that the DML and SELECT are independent this will
>work. So the consistency problems exist; they will just be modulated
>by batching DML. I doubt that's related to this feature exclusively,
>and it should be solved independently of this feature.
>

OK, thanks for the feedback.

>>
>> I was thinking about adding a GUC to enable/disable the batching at some
>> level (global, server, table, ...) but it seems like a bad match because
>> the consistency expectations likely depend on a query. There should be a
>> GUC to set the batch size, though (it's hardcoded to 100 for now).
>>
>
>Similar to fetch_size, it should be a foreign server or table level setting, IMO.
>
>[1] https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
>

Yeah, I agree we should have a GUC to define the batch size. What I had
in mind was something that would allow us to enable/disable batching to
increase the consistency guarantees, or something like that. I think
simple GUCs are a poor solution for that.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: POC: postgres_fdw insert batching

Ashutosh Bapat


On Tue, 30 Jun 2020 at 22:23, Tomas Vondra <[hidden email]> wrote:
>>I didn't get this. We are executing an INSERT on the foreign server,
>>so we get the number of rows INSERTed from that server. We should just
>>add those up across batches. If there's a failure, it would abort the
>>transaction, local as well as remote.
>>

>True, but it's not the FDW code doing the counting - it's the caller,
>depending on whether the ExecForeignInsert returns a valid slot or NULL.
>So it's not quite possible to just return a number of inserted tuples,
>as returned by the remote server.

Hmm yes, now I remember that bit. So for every row buffered, we return a valid slot without knowing whether that row was inserted on the remote server or not. I think we have that problem even now where a single INSERT might result in multiple INSERTs on the remote server (rare but not completely impossible).
 

>>In your patch, I see that an INSERT statement with batch is
>>constructed as INSERT INTO ... VALUES (...), (...) as many values as
>>the batch size. That won't work as is for UPDATE/DELETE since we can't
>>pass multiple pairs of ctids and columns to be updated for each ctid
>>in one statement. Maybe we could build as many UPDATE/DELETE
>>statements as the size of a batch, but that would be ugly. What we
>>need is a feature like a batch prepared statement in libpq similar to
>>what JDBC supports
>>(https://mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/).
>>This will allow a single prepared statement to be executed with a
>>batch of parameters, each batch corresponding to one foreign DML
>>statement.
>>

>I'm pretty sure we could make it work with some array/unnest tricks to
>build a relation, and use that as a source of data.

That sounds great. The solution will be limited to postgres_fdw only.
 
>I don't see why not support both, the use cases are somewhat different I
>think.

+1, if we can do both.

--
Best Wishes,
Ashutosh

Re: POC: postgres_fdw insert batching

Andres Freund
In reply to this post by Tomas Vondra
Hi,

On 2020-06-28 17:10:02 +0200, Tomas Vondra wrote:

> 4) Should we do batching for COPY instead?
>
> While looking at multi_insert, I've realized it's mostly exactly what
> the new "batching insert" API function would need to be. But it's only
> really used in COPY, so I wonder if we should just abandon the idea of
> batching INSERTs and do batching COPY for FDW tables.
>
> For cases that can replace INSERT with COPY this would be enough, but
> unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
> this :-(

I personally think - and I realize that that might be annoying to
somebody looking to make an incremental improvement - that the
nodeModifyTable.c and copy.c code dealing with DML has become too
complicated to add features like this without a larger
refactoring. Leading to choices like this, whether to add a feature in
one place but not the other.

I think before we add more complexity, we ought to centralize and clean
up the DML handling code so most is shared between copy.c and
nodeModifyTable.c. Then we can much more easily add batching to FDWs, to
CTAS, to INSERT INTO SELECT etc, for which there are patches already.


> 5) Expected consistency?
>
> I'm not entirely sure what are the consistency expectations for FDWs.
> Currently the FDW nodes pointing to the same server share a connection,
> so the inserted rows might be visible to other nodes. But if we only
> stash the rows in a local buffer for a while, that's no longer true. So
> maybe this breaks the consistency expectations?

Given that for local queries that's not the case (since the snapshot
won't have those changes visible), I think we shouldn't be too concerned
about that. If anything we should be concerned about the opposite.

If we are concerned, perhaps we could add functionality to flush all
pending changes before executing further statements?



> I was thinking about adding a GUC to enable/disable the batching at some
> level (global, server, table, ...) but it seems like a bad match because
> the consistency expectations likely depend on a query. There should be a
> GUC to set the batch size, though (it's hardcoded to 100 for now).

Hm. If libpq allowed us to utilize pipelining, ISTM the answer here would be
to not batch by building a single statement with all rows as a VALUES,
but issue the single INSERTs in a pipelined manner. That'd probably
remove all behavioural differences.   I really wish somebody would pick
up that libpq patch again.
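
A rough sketch of how that might look from the FDW side - the function names
here are assumptions loosely based on that proposed libpq pipelining API, and
error handling is omitted, so treat it as an illustration only:

    #include <libpq-fe.h>

    /*
     * Instead of a single INSERT ... VALUES (...), (...), send one prepared
     * single-row INSERT per row without waiting, then collect all results
     * after a single sync point.
     */
    static void
    insert_batch_pipelined(PGconn *conn, const char *stmt_name,
                           const char *const *row_params[],
                           int nrows, int nparams)
    {
        PQenterPipelineMode(conn);

        for (int i = 0; i < nrows; i++)
            PQsendQueryPrepared(conn, stmt_name, nparams,
                                row_params[i], NULL, NULL, 0);

        PQpipelineSync(conn);

        /* each queued INSERT yields its results, then a NULL terminator */
        for (int i = 0; i < nrows; i++)
        {
            PGresult   *res;

            while ((res = PQgetResult(conn)) != NULL)
                PQclear(res);       /* expect PGRES_COMMAND_OK */
        }

        PQclear(PQgetResult(conn)); /* consume the sync-point result */
        PQexitPipelineMode(conn);
    }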

Greetings,

Andres Freund



Re: POC: postgres_fdw insert batching

Andrey V. Lepikhov
In reply to this post by Tomas Vondra
On 6/28/20 8:10 PM, Tomas Vondra wrote:
> Now, the primary reason why the performance degrades like this is that
> while FDW has batching for SELECT queries (i.e. we read larger chunks of
> data from the cursors), we don't have that for INSERTs (or other DML).
> Every time you insert a row, it has to go all the way down into the
> partition synchronously.

You added new fields into the PgFdwModifyState struct. Why didn't you
reuse the ResultRelInfo::ri_CopyMultiInsertBuffer field and the
CopyMultiInsertBuffer machinery as storage for incoming tuples?

--
regards,
Andrey Lepikhov
Postgres Professional



Re: POC: postgres_fdw insert batching

Tomas Vondra
On Fri, Jul 10, 2020 at 09:28:44AM +0500, Andrey V. Lepikhov wrote:

>On 6/28/20 8:10 PM, Tomas Vondra wrote:
>>Now, the primary reason why the performance degrades like this is that
>>while FDW has batching for SELECT queries (i.e. we read larger chunks of
>>data from the cursors), we don't have that for INSERTs (or other DML).
>>Every time you insert a row, it has to go all the way down into the
>>partition synchronously.
>
>You added new fields into the PgFdwModifyState struct. Why didn't you
>reuse the ResultRelInfo::ri_CopyMultiInsertBuffer field and the
>CopyMultiInsertBuffer machinery as storage for incoming tuples?
>

Because I was focused on speeding-up inserts, and that is not using
CopyMultiInsertBuffer I think. I agree the way the tuples are stored
may be improved, of course.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: POC: postgres_fdw insert batching

Michael Paquier
On Sun, Jul 12, 2020 at 02:11:01AM +0200, Tomas Vondra wrote:
> Because I was focused on speeding-up inserts, and that is not using
> CopyMultiInsertBuffer I think. I agree the way the tuples are stored
> may be improved, of course.

The CF bot is telling that the regression tests of postgres_fdw are
crashing.  Could you look at that?
--
Michael


RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
In reply to this post by Tomas Vondra-4
Hello Tomas san,


Thank you for picking up this.  I'm interested in this topic, too.  (As an aside, we'd like to submit a bulk insert patch for ECPG in the near future.)

As others have mentioned, Andrey-san's fast COPY to foreign partitions is also promising.  But I think your bulk INSERT is a separate feature and offers something COPY cannot do -- data transformation during loading with INSERT SELECT and CREATE TABLE AS SELECT.

Is there anything that makes you worry and stops development?  Could I give it a try to implement this (I'm not sure I can, sorry.  I'm worried if we can change the executor's call chain easily.)


> 1) Extend the FDW API?

Yes, I think, because FDWs for other DBMSs will benefit from this.  (But it's questionable whether we want users to transfer data in Postgres database to other DBMSs...)

MySQL and SQL Server have the same bulk insert syntax as Postgres, i.e., INSERT INTO table VALUES(record1), (record2), ...  Oracle doesn't have this syntax, but it can use a CTE as follows:

  INSERT INTO table
  WITH t AS (
    SELECT record1 FROM DUAL UNION ALL
    SELECT record2 FROM DUAL UNION ALL
    ...
  )
  SELECT * FROM t;

And many DBMSs should have CTAS, INSERT SELECT, and INSERT SELECT record1 UNION ALL SELECT record2 ...

The API would simply be:

TupleTableSlot **
ExecForeignMultiInsert(EState *estate,
                  ResultRelInfo *rinfo,
                  TupleTableSlot **slot,
                  TupleTableSlot **planSlot,
                  int numSlots);
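
For context, the FDW would presumably advertise this via its handler
function, and callers could treat a NULL pointer as "batching not
supported".  Roughly (the FdwRoutine field name below is just an
assumption, mirroring the function above):

    Datum
    postgres_fdw_handler(PG_FUNCTION_ARGS)
    {
        FdwRoutine *routine = makeNode(FdwRoutine);

        /* ... existing callbacks ... */
        routine->ExecForeignInsert = postgresExecForeignInsert;

        /* hypothetical batched variant; left NULL by FDWs without support */
        routine->ExecForeignMultiInsert = postgresExecForeignMultiInsert;

        PG_RETURN_POINTER(routine);
    }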


> 2) What about the insert results?

I'm wondering if we can report success or failure of each inserted row, because the remote INSERT will fail entirely.  Other FDWs may be able to do it, so the API can be like above.

For the same reason, support for RETURNING clause will vary from DBMS to DBMS.


> 3) What about the other DML operations (DELETE/UPDATE)?

I don't think they are necessary for the time being.  If we want them, they will be implemented using the libpq batch/pipelining as Andres-san said.


> 4) Should we do batching for COPY instead?

I'm thinking of issuing INSERT with multiple records as your patch does, because:

* When the user executed INSERT statements, it would look strange to the user if the remote SQL is displayed as COPY.

* COPY doesn't invoke rules unlike INSERT.  (I don't think the rule is a feature what users care about, though.)  Also, I'm a bit concerned that there might be, or will be, other differences between INSERT and COPY.


[1]
Fast COPY FROM command for the table with foreign partitions
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru


Regards
Takayuki Tsunakawa




Re: POC: postgres_fdw insert batching

Tomas Vondra
On Thu, Oct 08, 2020 at 02:40:10AM +0000, [hidden email] wrote:
>Hello Tomas san,
>
>
>Thank you for picking up this.  I'm interested in this topic, too.  (As an aside, we'd like to submit a bulk insert patch for ECPG in the near future.)
>
>As others have mentioned, Andrey-san's fast COPY to foreign partitions is also promising.  But I think your bulk INSERT is a separate feature and offers something COPY cannot do -- data transformation during loading with INSERT SELECT and CREATE TABLE AS SELECT.
>
>Is there anything that makes you worry and stops development?  Could I give it a try to implement this (I'm not sure I can, sorry.  I'm worried if we can change the executor's call chain easily.)
>

It's primarily a matter of having too much other stuff on my plate, thus
not having time to work on this feature. I was not too worried about any
particular issue, but I wanted some feedback before spending more time
on extending the API.

I'm not sure when I'll have time to work on this again, so if you are
interested and willing to work on it, please go ahead. I'll gladly do
reviews and help you with it.

>
>> 1) Extend the FDW API?
>
>Yes, I think, because FDWs for other DBMSs will benefit from this.  (But it's questionable whether we want users to transfer data in Postgres database to other DBMSs...)
>

I think transferring data to other databases is fine - interoperability
is a big advantage for users, I don't see it as something threatening
the PostgreSQL project. I doubt this would make it more likely for users
to migrate from PostgreSQL - there are many ways to do that already.


>MySQL and SQL Server have the same bulk insert syntax as Postgres, i.e., INSERT INTO table VALUES(record1), (record2), ...  Oracle doesn't have this syntax, but it can use a CTE as follows:
>
>  INSERT INTO table
>  WITH t AS (
>    SELECT record1 FROM DUAL UNION ALL
>    SELECT record2 FROM DUAL UNION ALL
>    ...
>  )
>  SELECT * FROM t;
>
>And many DBMSs should have CTAS, INSERT SELECT, and INSERT SELECT record1 UNION ALL SELECT record2 ...
>

True. In some cases INSERT may be replaced by COPY, but it has various
other features too.

>The API would simply be:
>
>TupleTableSlot **
>ExecForeignMultiInsert(EState *estate,
>                  ResultRelInfo *rinfo,
>                  TupleTableSlot **slot,
>                  TupleTableSlot **planSlot,
>                  int numSlots);
>
>

+1, seems quite reasonable

>> 2) What about the insert results?
>
>I'm wondering if we can report success or failure of each inserted row, because the remote INSERT will fail entirely.  Other FDWs may be able to do it, so the API can be like above.
>

Yeah. I think handling complete failure should not be very difficult,
but there are cases that worry me more. For example, what if there's a
before trigger (on the remote db) that "skips" inserting some of the
rows by returning NULL?
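
Just to illustrate (made-up names), a remote-side trigger like this silently
drops some rows, so the local side can't simply count the rows it sent:

    CREATE FUNCTION skip_negative() RETURNS trigger AS $$
    BEGIN
        IF NEW.b < 0 THEN
            RETURN NULL;   -- row is silently skipped, not inserted
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER maybe_skip BEFORE INSERT ON remote_tab
        FOR EACH ROW EXECUTE FUNCTION skip_negative();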

>For the same reason, support for RETURNING clause will vary from DBMS to DBMS.
>

Yeah. I wonder if the FDW needs to indicate which features are supported
by the ExecForeignMultiInsert, e.g. by adding a function that decides
whether batch insert is supported (it might also do that internally by
calling ExecForeignInsert, of course).

>
>> 3) What about the other DML operations (DELETE/UPDATE)?
>
>I don't think they are necessary for the time being.  If we want them, they will be implemented using the libpq batch/pipelining as Andres-san said.
>

I agree.

>
>> 4) Should we do batching for COPY instead?
>
>I'm thinking of issuing INSERT with multiple records as your patch does, because:
>
>* When the user executed INSERT statements, it would look strange to the user if the remote SQL is displayed as COPY.
>
>* COPY doesn't invoke rules unlike INSERT.  (I don't think the rule is a feature what users care about, though.)  Also, I'm a bit concerned that there might be, or will be, other differences between INSERT and COPY.
>

I agree.



regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
From: Tomas Vondra <[hidden email]>
> I'm not sure when I'll have time to work on this again, so if you are
> interested and willing to work on it, please go ahead. I'll gladly do
> reviews and help you with it.

Thank you very much.


> I think transferring data to other databases is fine - interoperability
> is a big advantage for users, I don't see it as something threatening
> the PostgreSQL project. I doubt this would make it more likely for users
> to migrate from PostgreSQL - there are many ways to do that already.

Definitely true.  Users may want to use INSERT SELECT to do some data transformation in their OLTP database and load it into a non-Postgres data warehouse.


> Yeah. I think handling complete failure should not be very difficult,
> but there are cases that worry me more. For example, what if there's a
> before trigger (on the remote db) that "skips" inserting some of the
> rows by returning NULL?

> Yeah. I wonder if the FDW needs to indicate which features are supported
> by the ExecForeignMultiInsert, e.g. by adding a function that decides
> whether batch insert is supported (it might also do that internally by
> calling ExecForeignInsert, of course).

Thanks for your advice.  I'll try to address them.


 Regards
Takayuki Tsunakawa




RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
Hello,


The attached patch implements the new bulk insert routine for postgres_fdw and the executor utilizing it.  It passes make check-world.

I measured performance in a basic non-partitioned case by modifying Tomas-san's scripts.  They perform an INSERT SELECT statement that copies one million records.  The table consists of two integer columns, with a primary key on one of them.  You can run the attached prepare.sql to set up once.  local.sql inserts to the table directly, while fdw.sql inserts through a foreign table.

The performance results, the average time of 5 runs,  were as follows on a Linux host where the average round-trip time of "ping localhost" was 34 us:

    master, local: 6.1 seconds
    master, fdw: 125.3 seconds
    patched, fdw: 11.1 seconds (11x improvement)


The patch accumulates at most 100 records in ModifyTableState before inserting in bulk.  Also, when an input record is targeted for a different relation (= partition) than that for already accumulated records, insert the accumulated records and store the new record for later insert.

[Issues]

1. Do we want a GUC parameter, say, max_bulk_insert_records = (integer), to control the number of records inserted at once?
The range of allowed values would be between 1 and 1,000.  1 disables bulk insert.
The possible reason of the need for this kind of parameter would be to limit the amount of memory used for accumulated records, which could be prohibitively large if each record is big.  I don't think this is a must, but I think we can have it.

2. Should we accumulate records per relation in ResultRelInfo instead?
That is, when inserting into a partitioned table that has foreign partitions, delay insertion until a certain number of input records accumulate, and then insert accumulated records per relation (e.g., 50 records to relation A, 30 records to relation B, and 20 records to relation C.)  If we do that,

* The order of insertion differs from the order of input records.  Is it OK?

* Should the maximum count of accumulated records be applied per relation or the query?
When many foreign partitions belong to a partitioned table, if the former is chosen, it may use much memory in total.  If the latter is chosen, the records per relation could be few and thus the benefit of bulk insert could be small.


Regards
Takayuki Tsunakawa


fdw.sql (74 bytes)
local.sql (70 bytes)
prepare.sql (748 bytes)
v1-0001-Add-bulk-insert-for-foreign-tables.patch (41K)

Re: POC: postgres_fdw insert batching

Tomas Vondra
Hi,

Thanks for working on this!

On 11/10/20 1:45 AM, [hidden email] wrote:
> Hello,
>
>
> The attached patch implements the new bulk insert routine for
> postgres_fdw and the executor utilizing it.  It passes make
> check-world.
>

I haven't done any testing yet, just a quick review.

I see the patch builds the "bulk" query in execute_foreign_modify. IMO
that's something we should do earlier, when we're building the simple
query (for 1-row inserts). I'd understand if you were concerned about
overhead in case of 1-row inserts, trying to not plan the bulk query
until necessary, but I'm not sure this actually helps.

Or was the goal to build a query for every possible number of slots? I
don't think that's really useful, considering it requires deallocating
the old plan, preparing a new one, etc. IMO it should be sufficient to
have two queries - one for 1-row inserts, one for the full batch. The
last incomplete batch can be inserted using a loop of 1-row queries.

That's what my patch was doing, but I'm not insisting on that - it just
seems like a better approach to me. So feel free to argue why this is
better.


> I measured performance in a basic non-partitioned case by modifying
> Tomas-san's scripts.  They perform an INSERT SELECT statement that
> copies one million records.  The table consists of two integer
> columns, with a primary key on one of them.  You can run the
> attached prepare.sql to set up once.  local.sql inserts to the table
> directly, while fdw.sql inserts through a foreign table.
>
> The performance results, the average time of 5 runs,  were as follows
> on a Linux host where the average round-trip time of "ping localhost"
> was 34 us:
>
>     master, local: 6.1 seconds
>     master, fdw: 125.3 seconds
>     patched, fdw: 11.1 seconds (11x improvement)
>

Nice. I think we can't really get much closer to local master, so 6.1
vs. 11.1 seconds look quite acceptable.

>
> The patch accumulates at most 100 records in ModifyTableState before
> inserting in bulk.  Also, when an input record is targeted for a
> different relation (= partition) than that for already accumulated
> records, insert the accumulated records and store the new record for
> later insert.
>
> [Issues]
>
> 1. Do we want a GUC parameter, say, max_bulk_insert_records =
> (integer), to control the number of records inserted at once? The
> range of allowed values would be between 1 and 1,000.  1 disables
> bulk insert. The possible reason of the need for this kind of
> parameter would be to limit the amount of memory used for accumulated
> records, which could be prohibitively large if each record is big.  I
> don't think this is a must, but I think we can have it.
>

I think it'd be good to have such GUC, even if only for testing and
development. We should probably have a way to disable the batching,
which the GUC could also do, I think. So +1 to have the GUC.

> 2. Should we accumulate records per relation in ResultRelInfo
> instead? That is, when inserting into a partitioned table that has
> foreign partitions, delay insertion until a certain number of input
> records accumulate, and then insert accumulated records per relation
> (e.g., 50 records to relation A, 30 records to relation B, and 20
> records to relation C.)  If we do that,
>

I think there's a chunk of text missing here? If we do that, then what?

Anyway, I don't see why accumulating the records in ResultRelInfo would
be better than what the patch does now. It seems to me fairly
specific to FDWs, so keeping it in the FDW state seems appropriate. What
would be the advantage of stashing it in ResultRelInfo?

> * The order of insertion differs from the order of input records.  Is
> it OK?
>

I think that's OK for most use cases, and if it's not (e.g. when there's
something requiring the exact order of writes) then it's not possible to
use batching. That's one of the reasons why I think we should have a GUC
to disable the batching.

> * Should the maximum count of accumulated records be applied per
> relation or the query? When many foreign partitions belong to a
> partitioned table, if the former is chosen, it may use much memory in
> total.  If the latter is chosen, the records per relation could be
> few and thus the benefit of bulk insert could be small.
>

I think it needs to be applied per relation, because that's the level at
which we can do it easily and consistently. The whole point is to send
data in sufficiently large chunks to minimize the communication overhead
(latency etc.), but if you enforce it "per query" that seems hard.

Imagine you're inserting data into a table with many partitions - how do
you pick the number of rows to accumulate? The table may have 10 or 1000
partitions, we may be inserting into all partitions or just a small
subset, not all partitions may be foreign, etc. It seems pretty
difficult to pick and enforce a reliable limit at the query level. But
maybe I'm missing something and it's easier than I think?

Of course, you're entirely correct enforcing this at the partition level
may require a lot of memory. Sadly, I don't see a way around that,
except for (a) disabling batching or (b) ordering the data to insert
data into one partition at a time.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: POC: postgres_fdw insert batching

Tomas Vondra


On 11/10/20 4:05 PM, Tomas Vondra wrote:

> [...]
>
>> * Should the maximum count of accumulated records be applied per
>> relation or the query? When many foreign partitions belong to a
>> partitioned table, if the former is chosen, it may use much memory in
>> total.  If the latter is chosen, the records per relation could be
>> few and thus the benefit of bulk insert could be small.
>>
>
> I think it needs to be applied per relation, because that's the level at
> which we can do it easily and consistently. The whole point is to send
> data in sufficiently large chunks to minimize the communication overhead
> (latency etc.), but if you enforce it "per query" that seems hard.
>
> Imagine you're inserting data into a table with many partitions - how do
> you pick the number of rows to accumulate? The table may have 10 or 1000
> partitions, we may be inserting into all partitions or just a small
> subset, not all partitions may be foreign, etc. It seems pretty
> difficult to pick and enforce a reliable limit at the query level. But
> maybe I'm missing something and it's easier than I think?
>
> Of course, you're entirely correct enforcing this at the partition level
> may require a lot of memory. Sadly, I don't see a way around that,
> except for (a) disabling batching or (b) ordering the data to insert
> data into one partition at a time.
>

Two more comments regarding this:

1) If we want to be more strict about the memory consumption, we should
probably set the limit in terms of memory, not number of rows (a rough
sketch follows below). Currently the 100 rows may be 10kB or 10MB;
there's no way to know. Of course, this is not the only place with this
issue.

2) I wonder what the COPY FROM patch [1] does in this regard. I don't
have time to check right now, but I suggest we try to do the same thing,
if only to be consistent.

[1]
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
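
Regarding (1), a minimal sketch of how a byte-based limit might look.  The
buffer struct, flush_batch() and the threshold are hypothetical names for
illustration only, not part of the actual patch:

#define BATCH_FLUSH_BYTES   (64 * 1024)     /* assumed threshold */

typedef struct BatchBuffer
{
    TupleTableSlot **slots;     /* accumulated rows */
    int         nslots;
    Size        nbytes;         /* approximate memory used so far */
} BatchBuffer;

static void flush_batch(BatchBuffer *buf);  /* hypothetical: sends the batch */

static void
batch_add_row(BatchBuffer *buf, TupleTableSlot *slot)
{
    bool        shouldFree;
    MinimalTuple tup = ExecFetchSlotMinimalTuple(slot, &shouldFree);

    buf->slots[buf->nslots++] = slot;
    buf->nbytes += tup->t_len;  /* track bytes, not just the row count */

    if (shouldFree)
        pfree(tup);

    /* Flush when the batch grows large, regardless of the number of rows. */
    if (buf->nbytes >= BATCH_FLUSH_BYTES)
        flush_batch(buf);
}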

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



RE: POC: postgres_fdw insert batching

tsunakawa.takay@fujitsu.com
In reply to this post by Tomas Vondra-6
From: Tomas Vondra <[hidden email]>

> I see the patch builds the "bulk" query in execute_foreign_modify. IMO
> that's something we should do earlier, when we're building the simple
> query (for 1-row inserts). I'd understand if you were concerned about
> overhead in case of 1-row inserts, trying to not plan the bulk query
> until necessary, but I'm not sure this actually helps.
>
> Or was the goal to build a query for every possible number of slots? I
> don't think that's really useful, considering it requires deallocating
> the old plan, preparing a new one, etc. IMO it should be sufficient to
> have two queries - one for 1-row inserts, one for the full batch. The
> last incomplete batch can be inserted using a loop of 1-row queries.
>
> That's what my patch was doing, but I'm not insisting on that - it just
> seems like a better approach to me. So feel free to argue why this is
> better.

Don't be concerned; the processing is not changed for 1-row inserts: the INSERT query string is built in PlanForeignModify(), and the remote statement is prepared in execute_foreign_modify() during the first call to ExecForeignInsert() and then reused for subsequent ExecForeignInsert() calls.

The INSERT query string and its corresponding PREPARE are re-created when the number of tuples to be inserted differs from the previous call to ExecForeignInsert()/ExecForeignBulkInsert().  That's because we don't know how many tuples will be inserted during planning (PlanForeignModify) or during execution (until the source scan ends, in the INSERT ... SELECT case).  For example, if we insert 10,030 rows with a bulk size of 100, the flow is:

  PlanForeignModify():
    build the INSERT query string for 1 row
  ExecForeignBulkInsert(100):
    drop the INSERT query string and prepared statement for 1 row
    build the query string and prepare the statement for a 100-row INSERT
    execute it
  ExecForeignBulkInsert(100):
    reuse the prepared statement for the 100-row INSERT and execute it
  ...
  ExecForeignBulkInsert(30):
    drop the INSERT query string and prepared statement for 100 rows
    build the query string and prepare the statement for a 30-row INSERT
    execute it
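
To make that flow concrete, here is a rough sketch of the re-prepare logic.
FdwBulkState stands in for the FDW's modify state, and the struct fields
and helper functions are illustrative placeholders, not the actual patch:

typedef struct FdwBulkState
{
    char   *p_name;         /* name of the remote prepared statement, or NULL */
    char   *query;          /* current INSERT ... VALUES query string */
    char   *orig_query;     /* single-row INSERT built by PlanForeignModify */
    int     num_slots;      /* number of rows the current query inserts */
} FdwBulkState;

static void
prepare_batch_insert(FdwBulkState *fmstate, int numSlots)
{
    /* Only re-plan when this batch differs in size from the previous one. */
    if (fmstate->num_slots != numSlots)
    {
        /* Drop the remote prepared statement built for the old size. */
        if (fmstate->p_name != NULL)
            deallocate_query(fmstate);          /* issues DEALLOCATE remotely */

        /* Rebuild the INSERT with numSlots rows in its VALUES clause. */
        fmstate->query = rebuild_insert_sql(fmstate->orig_query, numSlots);
        fmstate->num_slots = numSlots;
    }

    /* PREPARE the (possibly rebuilt) statement if it is not prepared yet. */
    if (fmstate->p_name == NULL)
        prepare_foreign_modify(fmstate);        /* issues PREPARE remotely */
}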


> I think it'd be good to have such a GUC, even if only for testing and
> development. We should probably have a way to disable the batching,
> which the GUC could also do, I think. So +1 to have the GUC.

OK, I'll add it.  The name would be max_bulk_insert_tuples, because a) it might cover bulk insert for local relations in the future, and b) "tuple" is used in cpu_(index_)tuple_cost and parallel_tuple_cost, while "row" or "record" does not appear in GUC names (except for row_security).

The valid range would be between 1 and 1,000 (I said 10,000 previously, but I think that was an overreaction, and I'm a bit worried about unforeseen trouble that too many tuples might cause).  1 disables the bulk processing and uses the traditional ExecForeignInsert().  The default value is 100 (would 1 be a more sensible default, to avoid surprising users with increased memory usage?)
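
Assuming the setting ends up as a custom GUC in postgres_fdw (rather than,
say, a server or table option), its registration could look roughly like
the sketch below.  The "postgres_fdw." prefix, the _PG_init() placement and
the wording are my assumptions; only the range and default follow the
proposal above:

#include "postgres.h"
#include "utils/guc.h"

static int  max_bulk_insert_tuples = 100;   /* proposed default */

void
_PG_init(void)
{
    DefineCustomIntVariable("postgres_fdw.max_bulk_insert_tuples",
                            "Maximum number of tuples per remote bulk INSERT.",
                            "1 disables batching and falls back to one-row ExecForeignInsert() calls.",
                            &max_bulk_insert_tuples,
                            100,            /* default, per the proposal */
                            1,              /* 1 = bulk insert disabled */
                            1000,           /* proposed upper limit */
                            PGC_USERSET,
                            0,
                            NULL, NULL, NULL);
}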


> > 2. Should we accumulate records per relation in ResultRelInfo
> > instead? That is, when inserting into a partitioned table that has
> > foreign partitions, delay insertion until a certain number of input
> > records accumulate, and then insert accumulated records per relation
> > (e.g., 50 records to relation A, 30 records to relation B, and 20
> > records to relation C.)  If we do that,
> >
>
> I think there's a chunk of text missing here? If we do that, then what?

Sorry, the two bullets below are what follows.  Perhaps I should have written ":" instead of ",".


> Anyway, I don't see why accumulating the records in ResultRelInfo would
> be better than what the patch does now. It seems to me fairly
> specific to FDWs, so keeping it in the FDW state seems appropriate. What
> would be the advantage of stashing it in ResultRelInfo?

I thought of distributing input records to their corresponding partitions' ResultRelInfos.  For example, when an input record for partition 1 arrives, store it in the ResultRelInfo for partition 1; when an input record for partition 2 arrives, store it in the ResultRelInfo for partition 2.  When a ResultRelInfo accumulates a certain number of rows, insert the accumulated rows into that partition.  When the input ends, perform bulk inserts for the ResultRelInfos that still hold accumulated rows.
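
In pseudocode, the idea is roughly the following.  ri_PendingSlots,
ri_NumPendingSlots, batch_size and flush_pending_inserts() are hypothetical
names, not existing executor fields or functions:

static void
route_row_to_partition(ResultRelInfo *rri, TupleTableSlot *slot)
{
    /* Stash the row (a copy, in practice) in its partition's buffer. */
    rri->ri_PendingSlots[rri->ri_NumPendingSlots++] = slot;

    /* Once this partition has accumulated enough rows, insert them. */
    if (rri->ri_NumPendingSlots >= batch_size)
        flush_pending_inserts(rri);     /* one bulk INSERT for this partition */
}

/* When the input ends, flush whatever each partition still holds. */
static void
finish_pending_inserts(List *resultRelInfos)
{
    ListCell   *lc;

    foreach(lc, resultRelInfos)
    {
        ResultRelInfo *rri = (ResultRelInfo *) lfirst(lc);

        if (rri->ri_NumPendingSlots > 0)
            flush_pending_inserts(rri);
    }
}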



> I think that's OK for most use cases, and if it's not (e.g. when there's
> something requiring the exact order of writes) then it's not possible to
> use batching. That's one of the reasons why I think we should have a GUC
> to disable the batching.

Agreed.


> > * Should the maximum count of accumulated records be applied per
> > relation or the query? When many foreign partitions belong to a
> > partitioned table, if the former is chosen, it may use much memory in
> > total.  If the latter is chosen, the records per relation could be
> > few and thus the benefit of bulk insert could be small.
> >
>
> I think it needs to be applied per relation, because that's the level at
> which we can do it easily and consistently. The whole point is to send
> data in sufficiently large chunks to minimize the communication overhead
> (latency etc.), but if you enforce it "per query" that seems hard.
>
> Imagine you're inserting data into a table with many partitions - how do
> you pick the number of rows to accumulate? The table may have 10 or 1000
> partitions, we may be inserting into all partitions or just a small
> subset, not all partitions may be foreign, etc. It seems pretty
> difficult to pick and enforce a reliable limit at the query level. But
> maybe I'm missing something and it's easier than I think?
>
> Of course, you're entirely correct enforcing this at the partition level
> may require a lot of memory. Sadly, I don't see a way around that,
> except for (a) disabling batching or (b) ordering the data to insert
> data into one partition at a time.

OK, I think I'll try doing it like that, after waiting a few days for other opinions.


> Two more comments regarding this:
>
> 1) If we want to be more strict about the memory consumption, we should
> probably set the limit in terms of memory, not number of rows. Currently
> the 100 rows may be 10kB or 10MB, there's no way to know. Of course,
> this is not the only place with this issue.
>
> 2) I wonder what the COPY FROM patch [1] does in this regard. I don't
> have time to check right now, but I suggest we try to do the same thing,
> if only to be consistent.
>
> [1]
> https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

The COPY FROM patch uses the existing tuple accumulation mechanism for local tables as-is.  That is, it accumulates at most 1,000 tuples per partition:

/*
 * No more than this many tuples per CopyMultiInsertBuffer
 *
 * Caution: Don't make this too big, as we could end up with this many
 * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
 * multiInsertBuffers list.  Increasing this can cause quadratic growth in
 * memory requirements during copies into partitioned tables with a large
 * number of partitions.
 */
#define MAX_BUFFERED_TUPLES     1000


Regards
Takayuki Tsunakawa
