POC: postgres_fdw insert batching


POC: postgres_fdw insert batching

Tomas Vondra-4
Hi,

One of the issues I'm fairly regularly reminded of by users/customers is
that inserting into tables sharded using FDWs is rather slow. We even
get it reported on pgsql-bugs from time to time [1].

Some of the slowness / overhead is expected, due to the latency between
machines in the sharded setup. Even just 1ms latency will make it way
more expensive than a single instance.

But let's do a simple experiment, comparing a hash-partitioned table with
regular partitions and one with FDW partitions on the same instance.
Scripts to run this are attached. The duration of inserting 1M rows into
this table (average of 10 runs on my laptop) looks like this:

   regular: 2872 ms
   FDW:     64454 ms

Yep, it's ~20x slower, on a setup with ping latency well below 0.05ms.
Imagine how it would look on sharded setups with 0.1ms or 1ms latency,
which is probably where most single-DC clusters are :-(
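
For reference, a minimal sketch of the two setups being compared - the
attached local.sql / fdw.sql scripts are the authoritative versions; the
names below are just illustrative:

    -- "regular" case: hash-partitioned table with local partitions
    CREATE TABLE t (a int, b int) PARTITION BY HASH (a);
    CREATE TABLE t_p0 PARTITION OF t
        FOR VALUES WITH (MODULUS 4, REMAINDER 0);
    -- ... and so on for the remaining partitions

    -- "FDW" case: same parent definition, but the partitions are
    -- postgres_fdw foreign tables pointing back at the same instance
    CREATE EXTENSION postgres_fdw;
    CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'localhost', dbname 'test');
    CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
    CREATE FOREIGN TABLE t_fp0 PARTITION OF t
        FOR VALUES WITH (MODULUS 4, REMAINDER 0)
        SERVER loopback OPTIONS (table_name 't_p0_remote');

The benchmark then simply inserts 1M rows into the parent table.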

Now, the primary reason why the performance degrades like this is that
while FDW has batching for SELECT queries (i.e. we read larger chunks of
data from the cursors), we don't have that for INSERTs (or other DML).
Every time you insert a row, it has to go all the way down into the
partition synchronously.

For some use cases this may be reduced by having many independent
connections from different users, so the per-user latency is higher but
acceptable. But if you need to import larger amounts of data (say, a CSV
file for analytics, ...) this may not work.

Some time ago I wrote an ugly PoC adding batching, just to see how far
it would get us, and it seems quite promising - results for the same
INSERT benchmark look like this:

    FDW batching: 4584 ms

So, rather nice improvement, I'd say ...
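
To illustrate where the difference comes from: without batching, each row
means a separate round trip, with the remote side executing a prepared
statement of roughly this shape once per row

    INSERT INTO t_p0_remote (a, b) VALUES ($1, $2);

whereas with batching the rows are stashed in a local buffer and flushed
in groups, so the remote side sees one statement per batch, with as many
value lists as the batch size (100 in the PoC), e.g.

    INSERT INTO t_p0_remote (a, b) VALUES ($1, $2), ($3, $4), ($5, $6);

Both statements are just illustrative (made-up table name) - see the
attached patch for the actual deparsing.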

Before I spend more time hacking on this, I have a couple open questions
about the design, restrictions etc.


1) Extend the FDW API?

In the patch, the batching is simply "injected" into the existing insert
API method, i.e. ExecForeignInsert et al. I wonder if it'd be better to
extend the API with a "batched" version of the method, so that we can
easily determine whether the FDW supports batching or not - it would
require changes in the callers, though. OTOH it might be useful for
COPY, where we could do something similar to multi_insert (COPY already
benefits from this patch, but it does not use the batching built into
COPY).


2) What about the insert results?

I'm not sure what to do about "result" status for the inserted rows. We
only really "stash" the rows into a buffer, so we don't know if it will
succeed or not. The patch simply assumes it will succeed, but that's
clearly wrong, and it may result in reporting a wrong number of rows.

The patch also disables the batching when the insert has a RETURNING
clause, because there's just a single slot (for the currently inserted
row). I suppose a "batching" method would take an array of slots.


3) What about the other DML operations (DELETE/UPDATE)?

The other DML operations could probably benefit from the batching too.
INSERT was good enough for a PoC, but having batching only for INSERT
seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
of quals, but likely doable.


3) Should we do batching for COPY instead?

While looking at multi_insert, I've realized it's mostly exactly what
the new "batching insert" API function would need to be. But it's only
really used in COPY, so I wonder if we should just abandon the idea of
batching INSERTs and do batching COPY for FDW tables.

For cases that can replace INSERT with COPY this would be enough, but
unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
this :-(


4) Expected consistency?

I'm not entirely sure what the consistency expectations for FDWs are.
Currently the FDW nodes pointing to the same server share a connection,
so the inserted rows might be visible to other nodes. But if we only
stash the rows in a local buffer for a while, that's no longer true. So
maybe this breaks the consistency expectations?

But maybe that's OK - I'm not sure how the prepared statements/cursors
affect this. I can imagine restricting the batching only to plans where
this is not an issue (single FDW node or something), but it seems rather
fragile and undesirable.

I was thinking about adding a GUC to enable/disable the batching at some
level (global, server, table, ...) but it seems like a bad match because
the consistency expectations likely depend on a query. There should be a
GUC to set the batch size, though (it's hardcoded to 100 for now).


regards



[1] https://www.postgresql.org/message-id/CACnz%2BQ1q0%2B2KoJam9LyNMk8JmdC6qYHXWB895Wu2xcpoip18xQ%40mail.gmail.com

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments: fdw.sql (3K), local.sql (1K), 0001-fdw-insert-batching-v1.patch (18K)

Re: POC: postgres_fdw insert batching

Amit Langote
Hi Tomas,

On Mon, Jun 29, 2020 at 12:10 AM Tomas Vondra
<[hidden email]> wrote:

>
> Hi,
>
> One of the issues I'm fairly regularly reminded of by users/customers is
> that inserting into tables sharded using FDWs is rather slow. We even
> get it reported on pgsql-bugs from time to time [1].
>
> Some of the slowness / overhead is expected, due to the latency between
> machines in the sharded setup. Even just 1ms latency will make it way
> more expensive than a single instance.
>
> But let's do a simple experiment, comparing a hash-partitioned table with
> regular partitions and one with FDW partitions on the same instance.
> Scripts to run this are attached. The duration of inserting 1M rows into
> this table (average of 10 runs on my laptop) looks like this:
>
>    regular: 2872 ms
>    FDW:     64454 ms
>
> Yep, it's ~20x slower, on a setup with ping latency well below 0.05ms.
> Imagine how it would look on sharded setups with 0.1ms or 1ms latency,
> which is probably where most single-DC clusters are :-(
>
> Now, the primary reason why the performance degrades like this is that
> while FDW has batching for SELECT queries (i.e. we read larger chunks of
> data from the cursors), we don't have that for INSERTs (or other DML).
> Every time you insert a row, it has to go all the way down into the
> partition synchronously.
>
> For some use cases this may be reduced by having many independent
> connections from different users, so the per-user latency is higher but
> acceptable. But if you need to import larger amounts of data (say, a CSV
> file for analytics, ...) this may not work.
>
> Some time ago I wrote an ugly PoC adding batching, just to see how far
> it would get us, and it seems quite promising - results for the same
> INSERT benchmark look like this:
>
>     FDW batching: 4584 ms
>
> So, rather nice improvement, I'd say ...

Very nice indeed.

> Before I spend more time hacking on this, I have a couple open questions
> about the design, restrictions etc.

I think you may want to take a look this recent proposal by Andrey Lepikhov:

* [POC] Fast COPY FROM command for the table with foreign partitions *
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: POC: postgres_fdw insert batching

Ashutosh Bapat-2
In reply to this post by Tomas Vondra-4
On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
<[hidden email]> wrote:

>
>     FDW batching: 4584 ms
>
> So, rather nice improvement, I'd say ...

Very nice.

>
> Before I spend more time hacking on this, I have a couple open questions
> about the design, restrictions etc.
>
>
> 1) Extend the FDW API?
>
> In the patch, the batching is simply "injected" into the existing insert
> API method, i.e. ExecForeignInsert et al. I wonder if it'd be better to
> extend the API with a "batched" version of the method, so that we can
> easily determine whether the FDW supports batching or not - it would
> require changes in the callers, though. OTOH it might be useful for
> COPY, where we could do something similar to multi_insert (COPY already
> benefits from this patch, but it does not use the batching built into
> COPY).

Amit Langote has pointed out a related patch being discussed on hackers at [1].

That patch introduces a new API. But if we can do it without
introducing a new API that will be good. FDWs which can support
batching can just modify their code and don't have to implement and
manage a new API. We already have a handful of those APIs.

>
> 2) What about the insert results?
>
> I'm not sure what to do about "result" status for the inserted rows. We
> only really "stash" the rows into a buffer, so we don't know if it will
> succeed or not. The patch simply assumes it will succeed, but that's
> clearly wrong, and it may result in reporting a wrong number of rows.

I didn't get this. We are executing an INSERT on the foreign server,
so we get the number of rows INSERTed from that server. We should just
add those up across batches. If there's a failure, it would abort the
transaction, local as well as remote.

>
> The patch also disables the batching when the insert has a RETURNING
> clause, because there's just a single slot (for the currently inserted
> row). I suppose a "batching" method would take an array of slots.
>

It will be a rare case when a bulk load also has a RETURNING clause.
So, we can live with this restriction. We should try to choose a
design which allows that restriction to be lifted in the future. But I
doubt that restriction will be a serious one.

>
> 3) What about the other DML operations (DELETE/UPDATE)?
>
> The other DML operations could probably benefit from the batching too.
> INSERT was good enough for a PoC, but having batching only for INSERT
> seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
> of quals, but likely doable.

Bulk INSERTs are more common in a sharded environment because of data
load in say OLAP systems. Bulk update/delete are rare, although not
that rare. So if an approach just supports bulk insert and not bulk
UPDATE/DELETE that will address a large number of usecases IMO. But if
we can make everything work together that would be good as well.

In your patch, I see that an INSERT statement with batch is
constructed as INSERT INTO ... VALUES (...), (...) as many values as
the batch size. That won't work as is for UPDATE/DELETE since we can't
pass multiple pairs of ctids and columns to be updated for each ctid
in one statement. Maybe we could build as many UPDATE/DELETE
statements as the size of a batch, but that would be ugly. What we
need is a feature like a batch prepared statement in libpq similar to
what JDBC supports
(https://mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/).
This will allow a single prepared statement to be executed with a
batch of parameters, each batch corresponding to one foreign DML
statement.
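
For context, the remote statement postgres_fdw currently builds for a
foreign UPDATE is roughly of this shape (made-up table name, parameter
numbering illustrative), executed once per row:

    UPDATE some_remote_table SET b = $2 WHERE ctid = $1;

so unlike INSERT there is no obvious way to fold a whole batch into a
single VALUES list.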

>
>
> 3) Should we do batching for COPY instead?
>
> While looking at multi_insert, I've realized it's mostly exactly what
> the new "batching insert" API function would need to be. But it's only
> really used in COPY, so I wonder if we should just abandon the idea of
> batching INSERTs and do batching COPY for FDW tables.

I think this won't support RETURNING either. But if we could somehow
use the COPY protocol to send the data to the foreign server and yet treat
it as an INSERT, that might work. I think we have to find out which
performs better, COPY or batch INSERT.

>
> For cases that can replace INSERT with COPY this would be enough, but
> unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
> this :-(

Agreed, if we want to support bulk UPDATE/DELETE as well.

>
>
> 4) Expected consistency?
>
> I'm not entirely sure what the consistency expectations for FDWs are.
> Currently the FDW nodes pointing to the same server share a connection,
> so the inserted rows might be visible to other nodes. But if we only
> stash the rows in a local buffer for a while, that's no longer true. So
> maybe this breaks the consistency expectations?
>
> But maybe that's OK - I'm not sure how the prepared statements/cursors
> affect this. I can imagine restricting the batching only to plans where
> this is not an issue (single FDW node or something), but it seems rather
> fragile and undesirable.

I think that area is grey. Depending upon where the cursor is
positioned when a DML node executes a query, the data fetched from the
cursor may or may not see the effect of the DML. The cursor position is
based on the batch size, so we already have problems in this area I
think. Assuming that the DML and SELECT are independent, this will
work. So the consistency problem exists; it will just be modulated
by batching the DML. I doubt it's related to this feature exclusively,
and it should be solved independently of this feature.

>
> I was thinking about adding a GUC to enable/disable the batching at some
> level (global, server, table, ...) but it seems like a bad match because
> the consistency expectations likely depend on a query. There should be a
> GUC to set the batch size, though (it's hardcoded to 100 for now).
>

Similar to fetch_size, it should be a foreign server / table level setting, IMO.
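
Something along these lines, where "batch_size" is a hypothetical option
name and the server/table names are made up (fetch_size exists today):

    ALTER SERVER loopback OPTIONS (ADD fetch_size '1000');
    -- hypothetical knobs for the insert batch size:
    ALTER SERVER loopback OPTIONS (ADD batch_size '100');
    ALTER FOREIGN TABLE t_fp0 OPTIONS (ADD batch_size '1000');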

[1] https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru


--
Best Wishes,
Ashutosh Bapat



Re: POC: postgres_fdw insert batching

Etsuro Fujita-2
On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat
<[hidden email]> wrote:
> On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
> <[hidden email]> wrote:

> > 3) What about the other DML operations (DELETE/UPDATE)?
> >
> > The other DML operations could probably benefit from the batching too.
> > INSERT was good enough for a PoC, but having batching only for INSERT
> > seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
> > of quals, but likely doable.
>
> Bulk INSERTs are more common in a sharded environment because of data
> load in say OLAP systems. Bulk update/delete are rare, although not
> that rare. So if an approach just supports bulk insert and not bulk
> UPDATE/DELETE that will address a large number of usecases IMO. But if
> we can make everything work together that would be good as well.

In most cases, I think the entire UPDATE/DELETE operations would be
pushed down to the remote side by DirectModify.  So, I'm not sure we
really need the bulk UPDATE/DELETE.

> > 3) Should we do batching for COPY instead?
> >
> > While looking at multi_insert, I've realized it's mostly exactly what
> > the new "batching insert" API function would need to be. But it's only
> > really used in COPY, so I wonder if we should just abandon the idea of
> > batching INSERTs and do batching COPY for FDW tables.

> I think we have to find out which performs
> better, COPY or batch INSERT.

Maybe I'm missing something, but I think the COPY patch [1] seems more
promising to me, because 1) it would not get the remote side's planner
and executor involved, and 2) the data would be loaded more
efficiently by multi-insert on the remote side.  (Yeah, COPY doesn't
support RETURNING, but it's rare that RETURNING is needed in a bulk
load, as you mentioned.)
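
That is, roughly the difference between the remote server running a
regular INSERT for every batch, e.g.

    INSERT INTO some_remote_table (a, b) VALUES ($1, $2), ($3, $4);

which goes through the planner/executor each time, and streaming the rows
with

    COPY some_remote_table (a, b) FROM STDIN;

which can use multi-insert on the receiving side.  (Both statements are
illustrative; the table name is made up.)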

> [1] https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Best regards,
Etsuro Fujita



Re: POC: postgres_fdw insert batching

Ashutosh Bapat-3


On Tue, 30 Jun 2020 at 08:47, Etsuro Fujita <[hidden email]> wrote:
> On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat
> <[hidden email]> wrote:
> On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
> <[hidden email]> wrote:

> > 3) What about the other DML operations (DELETE/UPDATE)?
> >
> > The other DML operations could probably benefit from the batching too.
> > INSERT was good enough for a PoC, but having batching only for INSERT
> > seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
> > of quals, but likely doable.
>
> Bulk INSERTs are more common in a sharded environment because of data
> load in say OLAP systems. Bulk update/delete are rare, although not
> that rare. So if an approach just supports bulk insert and not bulk
> UPDATE/DELETE that will address a large number of usecases IMO. But if
> we can make everything work together that would be good as well.

> In most cases, I think the entire UPDATE/DELETE operations would be
> pushed down to the remote side by DirectModify.  So, I'm not sure we
> really need the bulk UPDATE/DELETE.

That may not be true for a partitioned table whose partitions are foreign tables. Esp. given the work that Amit Langote is doing [1]. It really depends on the ability of postgres_fdw to detect that the DML modifying each of the partitions can be pushed down. That may not come easily.
 

> > 3) Should we do batching for COPY instead?
> >
> > While looking at multi_insert, I've realized it's mostly exactly what
> > the new "batching insert" API function would need to be. But it's only
> > really used in COPY, so I wonder if we should just abandon the idea of
> > batching INSERTs and do batching COPY for FDW tables.

> I think we have to find out which performs
> better, COPY or batch INSERT.

> Maybe I'm missing something, but I think the COPY patch [1] seems more
> promising to me, because 1) it would not get the remote side's planner
> and executor involved, and 2) the data would be loaded more
> efficiently by multi-insert on the remote side.  (Yeah, COPY doesn't
> support RETURNING, but it's rare that RETURNING is needed in a bulk
> load, as you mentioned.)

> [1] https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

> Best regards,
> Etsuro Fujita

--
Best Wishes,
Ashutosh

Re: POC: postgres_fdw insert batching

Amit Langote
On Tue, Jun 30, 2020 at 1:22 PM Ashutosh Bapat
<[hidden email]> wrote:

> On Tue, 30 Jun 2020 at 08:47, Etsuro Fujita <[hidden email]> wrote:
>> On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat
>> <[hidden email]> wrote:
>> > On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
>> > <[hidden email]> wrote:
>>
>> > > 3) What about the other DML operations (DELETE/UPDATE)?
>> > >
>> > > The other DML operations could probably benefit from the batching too.
>> > > INSERT was good enough for a PoC, but having batching only for INSERT
>> > > seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
>> > > of quals, but likely doable.
>> >
>> > Bulk INSERTs are more common in a sharded environment because of data
>> > load in say OLAP systems. Bulk update/delete are rare, although not
>> > that rare. So if an approach just supports bulk insert and not bulk
>> > UPDATE/DELETE that will address a large number of usecases IMO. But if
>> > we can make everything work together that would be good as well.
>>
>> In most cases, I think the entire UPDATE/DELETE operations would be
>> pushed down to the remote side by DirectModify.  So, I'm not sure we
>> really need the bulk UPDATE/DELETE.
> That may not be true for a partitioned table whose partitions are foreign tables. Esp. given the work that Amit Langote is doing [1]. It really depends on the ability of postgres_fdw to detect that the DML modifying each of the partitions can be pushed down. That may not come easily.

While it's true that how to accommodate the DirectModify API in the
new inherited update/delete planning approach is an open question on
that thread, I would eventually like to find an answer to that.  That
is, that work shouldn't result in losing the foreign partition's
ability to use DirectModify API to optimize updates/deletes.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: POC: postgres_fdw insert batching

Etsuro Fujita-2
On Tue, Jun 30, 2020 at 2:54 PM Amit Langote <[hidden email]> wrote:

> On Tue, Jun 30, 2020 at 1:22 PM Ashutosh Bapat
> <[hidden email]> wrote:
> > On Tue, 30 Jun 2020 at 08:47, Etsuro Fujita <[hidden email]> wrote:
> >> On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat
> >> <[hidden email]> wrote:
> >> > On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
> >> > <[hidden email]> wrote:
> >>
> >> > > 3) What about the other DML operations (DELETE/UPDATE)?
> >> > >
> >> > > The other DML operations could probably benefit from the batching too.
> >> > > INSERT was good enough for a PoC, but having batching only for INSERT
> >> > > seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
> >> > > of quals, but likely doable.
> >> >
> >> > Bulk INSERTs are more common in a sharded environment because of data
> >> > load in say OLAP systems. Bulk update/delete are rare, although not
> >> > that rare. So if an approach just supports bulk insert and not bulk
> >> > UPDATE/DELETE that will address a large number of usecases IMO. But if
> >> > we can make everything work together that would be good as well.
> >>
> >> In most cases, I think the entire UPDATE/DELETE operations would be
> >> pushed down to the remote side by DirectModify.  So, I'm not sure we
> >> really need the bulk UPDATE/DELETE.
> > That may not be true for a partitioned table whose partitions are foreign tables. Esp. given the work that Amit Langote is doing [1]. It really depends on the ability of postgres_fdw to detect that the DML modifying each of the partitions can be pushed down. That may not come easily.
>
> While it's true that how to accommodate the DirectModify API in the
> new inherited update/delete planning approach is an open question on
> that thread, I would eventually like to find an answer to that.  That
> is, that work shouldn't result in losing the foreign partition's
> ability to use DirectModify API to optimize updates/deletes.

That would be great!

Best regards,
Etsuro Fujita



Re: POC: postgres_fdw insert batching

Tomas Vondra-4
In reply to this post by Ashutosh Bapat-2
On Mon, Jun 29, 2020 at 04:22:15PM +0530, Ashutosh Bapat wrote:

>On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
><[hidden email]> wrote:
>
>>
>>     FDW batching: 4584 ms
>>
>> So, rather nice improvement, I'd say ...
>
>Very nice.
>
>>
>> Before I spend more time hacking on this, I have a couple open questions
>> about the design, restrictions etc.
>>
>>
>> 1) Extend the FDW API?
>>
>> In the patch, the batching is simply "injected" into the existing insert
>> API method, i.e. ExecForeignInsert et al. I wonder if it'd be better to
>> extend the API with a "batched" version of the method, so that we can
>> easily determine whether the FDW supports batching or not - it would
>> require changes in the callers, though. OTOH it might be useful for
>> COPY, where we could do something similar to multi_insert (COPY already
>> benefits from this patch, but it does not use the batching built into
>> COPY).
>
>Amit Langote has pointed out a related patch being discussed on hackers at [1].
>
>That patch introduces a new API. But if we can do it without
>introducing a new API that will be good. FDWs which can support
>batching can just modify their code and don't have to implement and
>manage a new API. We already have a handful of those APIs.
>

I don't think extending the API is a big issue - the FDW code will need
changing anyway, so this seems minor.

I'll take a look at the COPY patch - I agree it seems like a good idea,
although it can be less convenient in various cases (e.g. I've seen a lot
of INSERT ... SELECT queries in sharded systems, etc.).

>>
>> 2) What about the insert results?
>>
>> I'm not sure what to do about "result" status for the inserted rows. We
>> only really "stash" the rows into a buffer, so we don't know if it will
>> succeed or not. The patch simply assumes it will succeed, but that's
>> clearly wrong, and it may result in reporting a wrong number of rows.
>
>I didn't get this. We are executing an INSERT on the foreign server,
>so we get the number of rows INSERTed from that server. We should just
>add those up across batches. If there's a failure, it would abort the
>transaction, local as well as remote.
>

True, but it's not the FDW code doing the counting - it's the caller,
depending on whether the ExecForeignInsert returns a valid slot or NULL.
So it's not quite possible to just return a number of inserted tuples,
as returned by the remote server.

>>
>> The patch also disables the batching when the insert has a RETURNING
>> clause, because there's just a single slot (for the currently inserted
>> row). I suppose a "batching" method would take an array of slots.
>>
>
>It will be a rare case when a bulk load also has a RETURNING clause.
>So, we can leave with this restriction. We should try to choose a
>design which allows that restriction to be lifted in the future. But I
>doubt that restriction will be a serious one.
>
>>
>> 3) What about the other DML operations (DELETE/UPDATE)?
>>
>> The other DML operations could probably benefit from the batching too.
>> INSERT was good enough for a PoC, but having batching only for INSERT
>> seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
>> of quals, but likely doable.
>
>Bulk INSERTs are more common in a sharded environment because of data
>load in say OLAP systems. Bulk update/delete are rare, although not
>that rare. So if an approach just supports bulk insert and not bulk
>UPDATE/DELETE that will address a large number of usecases IMO. But if
>we can make everything work together that would be good as well.
>
>In your patch, I see that an INSERT statement with batch is
>constructed as INSERT INTO ... VALUES (...), (...) as many values as
>the batch size. That won't work as is for UPDATE/DELETE since we can't
>pass multiple pairs of ctids and columns to be updated for each ctid
>in one statement. Maybe we could build as many UPDATE/DELETE
>statements as the size of a batch, but that would be ugly. What we
>need is a feature like a batch prepared statement in libpq similar to
>what JDBC supports
>(https://mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/).
>This will allow a single prepared statement to be executed with a
>batch of parameters, each batch corresponding to one foreign DML
>statement.
>

I'm pretty sure we could make it work with some array/unnest tricks to
build a relation, and use that as a source of data.
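
A rough sketch of what I mean, for an UPDATE (untested, names made up):

    UPDATE some_remote_table t SET b = v.b
    FROM unnest($1::tid[], $2::int[]) AS v(tid, b)
    WHERE t.ctid = v.tid;

i.e. ship the ctids and the new values as arrays in a single statement,
and let unnest() turn them back into a relation on the remote side.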

>>
>>
>> 3) Should we do batching for COPY instead?
>>
>> While looking at multi_insert, I've realized it's mostly exactly what
>> the new "batching insert" API function would need to be. But it's only
>> really used in COPY, so I wonder if we should just abandon the idea of
>> batching INSERTs and do batching COPY for FDW tables.
>
>I think this won't support RETURNING either. But if we could somehow
>use the COPY protocol to send the data to the foreign server and yet treat
>it as an INSERT, that might work. I think we have to find out which
>performs better, COPY or batch INSERT.
>

I don't see why not support both, the use cases are somewhat different I
think.

>>
>> For cases that can replace INSERT with COPY this would be enough, but
>> unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
>> this :-(
>
>Agreed, if we want to support bulk UPDATE/DELETE as well.
>
>>
>>
>> 4) Expected consistency?
>>
>> I'm not entirely sure what the consistency expectations for FDWs are.
>> Currently the FDW nodes pointing to the same server share a connection,
>> so the inserted rows might be visible to other nodes. But if we only
>> stash the rows in a local buffer for a while, that's no longer true. So
>> maybe this breaks the consistency expectations?
>>
>> But maybe that's OK - I'm not sure how the prepared statements/cursors
>> affect this. I can imagine restricting the batching only to plans where
>> this is not an issue (single FDW node or something), but it seems rather
>> fragile and undesirable.
>
>I think that area is grey. Depending upon where the cursor is
>positioned when a DML node executes a query, the data fetched from the
>cursor may or may not see the effect of the DML. The cursor position is
>based on the batch size, so we already have problems in this area I
>think. Assuming that the DML and SELECT are independent, this will
>work. So the consistency problem exists; it will just be modulated
>by batching the DML. I doubt it's related to this feature exclusively,
>and it should be solved independently of this feature.
>

OK, thanks for the feedback.

>>
>> I was thinking about adding a GUC to enable/disable the batching at some
>> level (global, server, table, ...) but it seems like a bad match because
>> the consistency expectations likely depend on a query. There should be a
>> GUC to set the batch size, though (it's hardcoded to 100 for now).
>>
>
>Similar to fetch_size, it should be a foreign server / table level setting, IMO.
>
>[1] https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
>

Yeah, I agree we should have a GUC to define the batch size. What I had
in mind was something that would allow us to enable/disable batching to
increase the consistency guarantees, or something like that. I think
simple GUCs are a poor solution for that.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: POC: postgres_fdw insert batching

Ashutosh Bapat-3


On Tue, 30 Jun 2020 at 22:23, Tomas Vondra <[hidden email]> wrote:
>I didn't get this. We are executing an INSERT on the foreign server,
>so we get the number of rows INSERTed from that server. We should just
>add those up across batches. If there's a failure, it would abort the
>transaction, local as well as remote.
>

> True, but it's not the FDW code doing the counting - it's the caller,
> depending on whether the ExecForeignInsert returns a valid slot or NULL.
> So it's not quite possible to just return a number of inserted tuples,
> as returned by the remote server.

Hmm yes, now I remember that bit. So for every row buffered, we return a valid slot without knowing whether that row was inserted on the remote server or not. I think we have that problem even now where a single INSERT might result in multiple INSERTs on the remote server (rare but not completely impossible).
 

>In your patch, I see that an INSERT statement with batch is
>constructed as INSERT INTO ... VALUES (...), (...) as many values as
>the batch size. That won't work as is for UPDATE/DELETE since we can't
>pass multiple pairs of ctids and columns to be updated for each ctid
>in one statement. Maybe we could build as many UPDATE/DELETE
>statements as the size of a batch, but that would be ugly. What we
>need is a feature like a batch prepared statement in libpq similar to
>what JDBC supports
>(https://mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/).
>This will allow a single prepared statement to be executed with a
>batch of parameters, each batch corresponding to one foreign DML
>statement.
>

> I'm pretty sure we could make it work with some array/unnest tricks to
> build a relation, and use that as a source of data.

That sounds great. The solution will be limited to postgres_fdw only.
 
> I don't see why not support both, the use cases are somewhat different I
> think.

+1, if we can do both.

--
Best Wishes,
Ashutosh

Re: POC: postgres_fdw insert batching

Andres Freund
In reply to this post by Tomas Vondra-4
Hi,

On 2020-06-28 17:10:02 +0200, Tomas Vondra wrote:

> 3) Should we do batching for COPY instead?
>
> While looking at multi_insert, I've realized it's mostly exactly what
> the new "batching insert" API function would need to be. But it's only
> really used in COPY, so I wonder if we should just abandon the idea of
> batching INSERTs and do batching COPY for FDW tables.
>
> For cases that can replace INSERT with COPY this would be enough, but
> unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
> this :-(

I personally think - and I realize that that might be annoying to
somebody looking to make an incremental improvement - that the
nodeModifyTable.c and copy.c code dealing with DML has become too
complicated to add features like this without a larger
refactoring. Leading to choices like this, whether to add a feature in
one place but not the other.

I think before we add more complexity, we ought to centralize and clean
up the DML handling code so most is shared between copy.c and
nodeModifyTable.c. Then we can much more easily add batching to FDWs, to
CTAS, to INSERT INTO SELECT etc, for which there are patches already.


> 4) Expected consistency?
>
> I'm not entirely sure what are the consistency expectations for FDWs.
> Currently the FDW nodes pointing to the same server share a connection,
> so the inserted rows might be visible to other nodes. But if we only
> stash the rows in a local buffer for a while, that's no longer true. So
> maybe this breaks the consistency expectations?

Given that for local queries that's not the case (since the snapshot
won't have those changes visible), I think we shouldn't be too concerned
about that. If anything we should be concerned about the opposite.

If we are concerned, perhaps we could add functionality to flush all
pending changes before executing further statements?



> I was thinking about adding a GUC to enable/disable the batching at some
> level (global, server, table, ...) but it seems like a bad match because
> the consistency expectations likely depend on a query. There should be a
> GUC to set the batch size, though (it's hardcoded to 100 for now).

Hm. If libpq allowed us to utilize pipelining, ISTM the answer here would
be to not batch by building a single statement with all rows as a VALUES
list, but to issue the individual INSERTs in a pipelined manner. That'd
probably remove all behavioural differences. I really wish somebody would
pick up that libpq patch again.

Greetings,

Andres Freund



Re: POC: postgres_fdw insert batching

Andrey Lepikhov
In reply to this post by Tomas Vondra-4
On 6/28/20 8:10 PM, Tomas Vondra wrote:
> Now, the primary reason why the performance degrades like this is that
> while FDW has batching for SELECT queries (i.e. we read larger chunks of
> data from the cursors), we don't have that for INSERTs (or other DML).
> Every time you insert a row, it has to go all the way down into the
> partition synchronously.

You added new fields into the PgFdwModifyState struct. Why didn't you
reuse the ResultRelInfo::ri_CopyMultiInsertBuffer field and the
CopyMultiInsertBuffer machinery as storage for incoming tuples?

--
regards,
Andrey Lepikhov
Postgres Professional



Re: POC: postgres_fdw insert batching

Tomas Vondra-4
On Fri, Jul 10, 2020 at 09:28:44AM +0500, Andrey V. Lepikhov wrote:

>On 6/28/20 8:10 PM, Tomas Vondra wrote:
>>Now, the primary reason why the performance degrades like this is that
>>while FDW has batching for SELECT queries (i.e. we read larger chunks of
>>data from the cursors), we don't have that for INSERTs (or other DML).
>>Every time you insert a row, it has to go all the way down into the
>>partition synchronously.
>
>You added new fields into the PgFdwModifyState struct. Why didn't you
>reuse the ResultRelInfo::ri_CopyMultiInsertBuffer field and the
>CopyMultiInsertBuffer machinery as storage for incoming tuples?
>

Because I was focused on speeding up inserts, and that is not using
CopyMultiInsertBuffer I think. I agree the way the tuples are stored
may be improved, of course.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services