Hi,
One of the issues I'm fairly regularly reminded of by users/customers is
that inserting into tables sharded using FDWs is rather slow. We even
get it reported on pgsql-bugs from time to time [1].

Some of the slowness / overhead is expected, due to the latency between
machines in the sharded setup. Even just 1ms latency will make it way
more expensive than a single instance.

But let's do a simple experiment, comparing a hash-partitioned table with
regular partitions, and one with FDW partitions in the same instance.
Scripts to run this are attached. The duration of inserting 1M rows into
this table (average of 10 runs on my laptop) looks like this:

  regular: 2872 ms
  FDW:     64454 ms

Yep, it's ~20x slower. On a setup with ping latency well below 0.05ms.
Imagine how it would look on sharded setups with 0.1ms or 1ms latency,
which is probably where most single-DC clusters are :-(

Now, the primary reason why the performance degrades like this is that
while FDW has batching for SELECT queries (i.e. we read larger chunks of
data from the cursors), we don't have that for INSERTs (or other DML).
Every time you insert a row, it has to go all the way down into the
partition synchronously.

For some use cases this may be reduced by having many independent
connections from different users, so the per-user latency is higher but
acceptable. But if you need to import larger amounts of data (say, a CSV
file for analytics, ...) this may not work.

Some time ago I wrote an ugly PoC adding batching, just to see how far it
would get us, and it seems quite promising - results for the same INSERT
benchmark look like this:

  FDW batching: 4584 ms

So, rather a nice improvement, I'd say ...

Before I spend more time hacking on this, I have a couple open questions
about the design, restrictions etc.


1) Extend the FDW API?

In the patch, the batching is simply "injected" into the existing insert
API method, i.e. ExecForeignInsert et al. I wonder if it'd be better to
extend the API with a "batched" version of the method, so that we can
easily determine whether the FDW supports batching or not - it would
require changes in the callers, though. OTOH it might be useful for COPY,
where we could do something similar to multi_insert (COPY already
benefits from this patch, but it does not use the batching built into
COPY).


2) What about the insert results?

I'm not sure what to do about the "result" status for the inserted rows.
We only really "stash" the rows into a buffer, so we don't know if it
will succeed or not. The patch simply assumes it will succeed, but that's
clearly wrong, and it may result in reporting a wrong number of rows.

The patch also disables the batching when the insert has a RETURNING
clause, because there's just a single slot (for the currently inserted
row). I suppose a "batching" method would take an array of slots.


3) What about the other DML operations (DELETE/UPDATE)?

The other DML operations could probably benefit from the batching too.
INSERT was good enough for a PoC, but having batching only for INSERT
seems somewhat asymmetric. DELETE/UPDATE seem more complicated because of
quals, but likely doable.


3) Should we do batching for COPY instead?

While looking at multi_insert, I've realized it's mostly exactly what the
new "batching insert" API function would need to be. But it's only really
used in COPY, so I wonder if we should just abandon the idea of batching
INSERTs and do batching COPY for FDW tables.

For cases that can replace INSERT with COPY this would be enough, but
unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
this :-(


4) Expected consistency?

I'm not entirely sure what are the consistency expectations for FDWs.
Currently the FDW nodes pointing to the same server share a connection,
so the inserted rows might be visible to other nodes. But if we only
stash the rows in a local buffer for a while, that's no longer true. So
maybe this breaks the consistency expectations?

But maybe that's OK - I'm not sure how the prepared statements/cursors
affect this. I can imagine restricting the batching only to plans where
this is not an issue (single FDW node or something), but it seems rather
fragile and undesirable.

I was thinking about adding a GUC to enable/disable the batching at some
level (global, server, table, ...) but it seems like a bad match because
the consistency expectations likely depend on a query. There should be a
GUC to set the batch size, though (it's hardcoded to 100 for now).


regards

[1] https://www.postgresql.org/message-id/CACnz%2BQ1q0%2B2KoJam9LyNMk8JmdC6qYHXWB895Wu2xcpoip18xQ%40mail.gmail.com

--
Tomas Vondra                      http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
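[The benchmark scripts are attachments not preserved in this archive. For readers who want to follow along, a minimal sketch of a comparable setup might look as follows; the object names, partition count and connection options are illustrative guesses, not taken from the actual scripts.]

  CREATE EXTENSION IF NOT EXISTS postgres_fdw;

  -- loopback server pointing at the same instance (adjust connection
  -- options and authentication as needed)
  CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'localhost', dbname 'test');
  CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;

  -- "regular": hash-partitioned table with local partitions
  CREATE TABLE t_local (a int, b int) PARTITION BY HASH (a);
  CREATE TABLE t_local_p0 PARTITION OF t_local
      FOR VALUES WITH (MODULUS 4, REMAINDER 0);
  -- ... remaining partitions created the same way

  -- "FDW": same shape, but every partition is a foreign table pointing
  -- at a plain table through the loopback server
  CREATE TABLE t_fdw (a int, b int) PARTITION BY HASH (a);
  CREATE TABLE t_remote_p0 (a int, b int);
  CREATE FOREIGN TABLE t_fdw_p0 PARTITION OF t_fdw
      FOR VALUES WITH (MODULUS 4, REMAINDER 0)
      SERVER loopback OPTIONS (table_name 't_remote_p0');
  -- ... remaining partitions created the same way

  -- the measured workload: insert 1M rows into each variant
  INSERT INTO t_local SELECT i, i FROM generate_series(1, 1000000) s(i);
  INSERT INTO t_fdw   SELECT i, i FROM generate_series(1, 1000000) s(i);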
Hi Tomas,
On Mon, Jun 29, 2020 at 12:10 AM Tomas Vondra <[hidden email]> wrote: > > Hi, > > One of the issues I'm fairly regularly reminded by users/customers is > that inserting into tables sharded using FDWs are rather slow. We do > even get it reported on pgsql-bugs from time to time [1]. > > Some of the slowness / overhead is expected, doe to the latency between > machines in the sharded setup. Even just 1ms latency will make it way > more expensive than a single instance. > > But let's do a simple experiment, comparing a hash-partitioned regular > partitions, and one with FDW partitions in the same instance. Scripts to > run this are attached. The duration of inserting 1M rows to this table > (average of 10 runs on my laptop) looks like this: > > regular: 2872 ms > FDW: 64454 ms > > Yep, it's ~20x slower. On setup with ping latency well below 0.05ms. > Imagine how would it look on sharded setups with 0.1ms or 1ms latency, > which is probably where most single-DC clusters are :-( > > Now, the primary reason why the performance degrades like this is that > while FDW has batching for SELECT queries (i.e. we read larger chunks of > data from the cursors), we don't have that for INSERTs (or other DML). > Every time you insert a row, it has to go all the way down into the > partition synchronously. > > For some use cases this may be reduced by having many independent > connnections from different users, so the per-user latency is higher but > acceptable. But if you need to import larger amounts of data (say, a CSV > file for analytics, ...) this may not work. > > Some time ago I wrote an ugly PoC adding batching, just to see how far > would it get us, and it seems quite promising - results for he same > INSERT benchmarks look like this: > > FDW batching: 4584 ms > > So, rather nice improvement, I'd say ... Very nice indeed. > Before I spend more time hacking on this, I have a couple open questions > about the design, restrictions etc. I think you may want to take a look this recent proposal by Andrey Lepikhov: * [POC] Fast COPY FROM command for the table with foreign partitions * https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru -- Amit Langote EnterpriseDB: http://www.enterprisedb.com |
In reply to this post by Tomas Vondra-4
On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
<[hidden email]> wrote:
>
> FDW batching: 4584 ms
>
> So, rather nice improvement, I'd say ...

Very nice.

>
> Before I spend more time hacking on this, I have a couple open questions
> about the design, restrictions etc.
>
>
> 1) Extend the FDW API?
>
> In the patch, the batching is simply "injected" into the existing insert
> API method, i.e. ExecForeignInsert et al. I wonder if it'd be better to
> extend the API with a "batched" version of the method, so that we can
> easily determine whether the FDW supports batching or not - it would
> require changes in the callers, though. OTOH it might be useful for COPY,
> where we could do something similar to multi_insert (COPY already
> benefits from this patch, but it does not use the batching built into
> COPY).

Amit Langote has pointed out a related patch being discussed on hackers
at [1].

That patch introduces a new API. But if we can do it without introducing
a new API, that will be good. FDWs which can support batching can just
modify their code and don't have to implement and manage a new API. We
already have a handful of those APIs.

>
> 2) What about the insert results?
>
> I'm not sure what to do about the "result" status for the inserted rows.
> We only really "stash" the rows into a buffer, so we don't know if it
> will succeed or not. The patch simply assumes it will succeed, but that's
> clearly wrong, and it may result in reporting a wrong number of rows.

I didn't get this. We are executing an INSERT on the foreign server, so
we get the number of rows INSERTed from that server. We should just add
those up across batches. If there's a failure, it would abort the
transaction, local as well as remote.

>
> The patch also disables the batching when the insert has a RETURNING
> clause, because there's just a single slot (for the currently inserted
> row). I suppose a "batching" method would take an array of slots.
>

It will be a rare case when a bulk load also has a RETURNING clause. So,
we can live with this restriction. We should try to choose a design which
allows that restriction to be lifted in the future. But I doubt that
restriction will be a serious one.

>
> 3) What about the other DML operations (DELETE/UPDATE)?
>
> The other DML operations could probably benefit from the batching too.
> INSERT was good enough for a PoC, but having batching only for INSERT
> seems somewhat asymmetric. DELETE/UPDATE seem more complicated because of
> quals, but likely doable.

Bulk INSERTs are more common in a sharded environment because of data
load in, say, OLAP systems. Bulk update/delete are rare, although not
that rare. So if an approach just supports bulk insert and not bulk
UPDATE/DELETE, that will address a large number of use cases IMO. But if
we can make everything work together that would be good as well.

In your patch, I see that an INSERT statement with a batch is constructed
as INSERT INTO ... VALUES (...), (...) with as many values as the batch
size. That won't work as is for UPDATE/DELETE since we can't pass
multiple pairs of ctids and columns to be updated for each ctid in one
statement. Maybe we could build as many UPDATE/DELETE statements as the
size of a batch, but that would be ugly. What we need is a feature like a
batch prepared statement in libpq, similar to what JDBC supports
(https://mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/).
This will allow a single prepared statement to be executed with a batch
of parameters, each batch corresponding to one foreign DML statement.

>
>
> 3) Should we do batching for COPY instead?
>
> While looking at multi_insert, I've realized it's mostly exactly what the
> new "batching insert" API function would need to be. But it's only really
> used in COPY, so I wonder if we should just abandon the idea of batching
> INSERTs and do batching COPY for FDW tables.

I think this won't support RETURNING as well. But if we could somehow use
the COPY protocol to send the data to the foreign server and yet treat it
as INSERT, that might work. I think we have to find out which performs
better, COPY or batch INSERT.

>
> For cases that can replace INSERT with COPY this would be enough, but
> unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
> this :-(

Agreed, if we want to support bulk UPDATE/DELETE as well.

>
>
> 4) Expected consistency?
>
> I'm not entirely sure what are the consistency expectations for FDWs.
> Currently the FDW nodes pointing to the same server share a connection,
> so the inserted rows might be visible to other nodes. But if we only
> stash the rows in a local buffer for a while, that's no longer true. So
> maybe this breaks the consistency expectations?
>
> But maybe that's OK - I'm not sure how the prepared statements/cursors
> affect this. I can imagine restricting the batching only to plans where
> this is not an issue (single FDW node or something), but it seems rather
> fragile and undesirable.

I think that area is grey. Depending upon where the cursor is positioned
when a DML node executes a query, the data fetched from the cursor may or
may not see the effect of the DML. The cursor position is based on the
batch size, so we already have problems in this area I think. Assuming
that the DML and SELECT are independent, this will work. So, the
consistency problems exist, they will just be modulated by batching DML.
I doubt that's related to this feature exclusively and it should be
solved independent of this feature.

>
> I was thinking about adding a GUC to enable/disable the batching at some
> level (global, server, table, ...) but it seems like a bad match because
> the consistency expectations likely depend on a query. There should be a
> GUC to set the batch size, though (it's hardcoded to 100 for now).
>

Similar to fetch_size, it should be a foreign server / foreign table
level setting, IMO.

[1] https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

--
Best Wishes,
Ashutosh Bapat
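[For illustration, a per-server / per-table knob could follow the existing fetch_size pattern. Note that a batch_size option is purely hypothetical at this point in the thread and is not part of the posted patch:]

  -- existing postgres_fdw option, shown for comparison
  ALTER SERVER loopback OPTIONS (ADD fetch_size '1000');

  -- hypothetical batching options, following the same pattern
  ALTER SERVER loopback OPTIONS (ADD batch_size '100');
  ALTER FOREIGN TABLE orders_2020 OPTIONS (ADD batch_size '500');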
On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat
<[hidden email]> wrote:
> On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
> <[hidden email]> wrote:
>
> > 3) What about the other DML operations (DELETE/UPDATE)?
> >
> > The other DML operations could probably benefit from the batching too.
> > INSERT was good enough for a PoC, but having batching only for INSERT
> > seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
> > of quals, but likely doable.
>
> Bulk INSERTs are more common in a sharded environment because of data
> load in, say, OLAP systems. Bulk update/delete are rare, although not
> that rare. So if an approach just supports bulk insert and not bulk
> UPDATE/DELETE, that will address a large number of use cases IMO. But if
> we can make everything work together that would be good as well.

In most cases, I think the entire UPDATE/DELETE operations would be
pushed down to the remote side by DirectModify. So, I'm not sure we
really need the bulk UPDATE/DELETE.

> > 3) Should we do batching for COPY instead?
> >
> > While looking at multi_insert, I've realized it's mostly exactly what
> > the new "batching insert" API function would need to be. But it's only
> > really used in COPY, so I wonder if we should just abandon the idea of
> > batching INSERTs and do batching COPY for FDW tables.
>
> I think we have to find out which performs
> better, COPY or batch INSERT.

Maybe I'm missing something, but I think the COPY patch [1] seems more
promising to me, because 1) it would not get the remote side's planner
and executor involved, and 2) the data would be loaded more efficiently
by multi-insert on the remote side. (Yeah, COPY doesn't support
RETURNING, but it's rare that RETURNING is needed in a bulk load, as you
mentioned.)

> [1] https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Best regards,
Etsuro Fujita
On Tue, 30 Jun 2020 at 08:47, Etsuro Fujita <[hidden email]> wrote:
>> In most cases, I think the entire UPDATE/DELETE operations would be
>> pushed down to the remote side by DirectModify. So, I'm not sure we
>> really need the bulk UPDATE/DELETE.

That may not be true for a partitioned table whose partitions are foreign
tables, esp. given the work that Amit Langote is doing [1]. It really
depends on the ability of postgres_fdw to detect that the DML modifying
each of the partitions can be pushed down. That may not come easily.

--
Best Wishes,
Ashutosh
On Tue, Jun 30, 2020 at 1:22 PM Ashutosh Bapat
<[hidden email]> wrote: > On Tue, 30 Jun 2020 at 08:47, Etsuro Fujita <[hidden email]> wrote: >> On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat >> <[hidden email]> wrote: >> > On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra >> > <[hidden email]> wrote: >> >> > > 3) What about the other DML operations (DELETE/UPDATE)? >> > > >> > > The other DML operations could probably benefit from the batching too. >> > > INSERT was good enough for a PoC, but having batching only for INSERT >> > > seems somewhat asmymetric. DELETE/UPDATE seem more complicated because >> > > of quals, but likely doable. >> > >> > Bulk INSERTs are more common in a sharded environment because of data >> > load in say OLAP systems. Bulk update/delete are rare, although not >> > that rare. So if an approach just supports bulk insert and not bulk >> > UPDATE/DELETE that will address a large number of usecases IMO. But if >> > we can make everything work together that would be good as well. >> >> In most cases, I think the entire UPDATE/DELETE operations would be >> pushed down to the remote side by DirectModify. So, I'm not sure we >> really need the bulk UPDATE/DELETE. > That may not be true for a partitioned table whose partitions are foreign tables. Esp. given the work that Amit Langote is doing [1]. It really depends on the ability of postgres_fdw to detect that the DML modifying each of the partitions can be pushed down. That may not come easily. While it's true that how to accommodate the DirectModify API in the new inherited update/delete planning approach is an open question on that thread, I would eventually like to find an answer to that. That is, that work shouldn't result in losing the foreign partition's ability to use DirectModify API to optimize updates/deletes. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com |
On Tue, Jun 30, 2020 at 2:54 PM Amit Langote <[hidden email]> wrote:
> On Tue, Jun 30, 2020 at 1:22 PM Ashutosh Bapat > <[hidden email]> wrote: > > On Tue, 30 Jun 2020 at 08:47, Etsuro Fujita <[hidden email]> wrote: > >> On Mon, Jun 29, 2020 at 7:52 PM Ashutosh Bapat > >> <[hidden email]> wrote: > >> > On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra > >> > <[hidden email]> wrote: > >> > >> > > 3) What about the other DML operations (DELETE/UPDATE)? > >> > > > >> > > The other DML operations could probably benefit from the batching too. > >> > > INSERT was good enough for a PoC, but having batching only for INSERT > >> > > seems somewhat asmymetric. DELETE/UPDATE seem more complicated because > >> > > of quals, but likely doable. > >> > > >> > Bulk INSERTs are more common in a sharded environment because of data > >> > load in say OLAP systems. Bulk update/delete are rare, although not > >> > that rare. So if an approach just supports bulk insert and not bulk > >> > UPDATE/DELETE that will address a large number of usecases IMO. But if > >> > we can make everything work together that would be good as well. > >> > >> In most cases, I think the entire UPDATE/DELETE operations would be > >> pushed down to the remote side by DirectModify. So, I'm not sure we > >> really need the bulk UPDATE/DELETE. > > That may not be true for a partitioned table whose partitions are foreign tables. Esp. given the work that Amit Langote is doing [1]. It really depends on the ability of postgres_fdw to detect that the DML modifying each of the partitions can be pushed down. That may not come easily. > > While it's true that how to accommodate the DirectModify API in the > new inherited update/delete planning approach is an open question on > that thread, I would eventually like to find an answer to that. That > is, that work shouldn't result in losing the foreign partition's > ability to use DirectModify API to optimize updates/deletes. That would be great! Best regards, Etsuro Fujita |
In reply to this post by Ashutosh Bapat-2
On Mon, Jun 29, 2020 at 04:22:15PM +0530, Ashutosh Bapat wrote:
>On Sun, Jun 28, 2020 at 8:40 PM Tomas Vondra
><[hidden email]> wrote:
>
>>
>> FDW batching: 4584 ms
>>
>> So, rather nice improvement, I'd say ...
>
>Very nice.
>
>>
>> Before I spend more time hacking on this, I have a couple open questions
>> about the design, restrictions etc.
>>
>>
>> 1) Extend the FDW API?
>>
>> In the patch, the batching is simply "injected" into the existing insert
>> API method, i.e. ExecForeignInsert et al. I wonder if it'd be better to
>> extend the API with a "batched" version of the method, so that we can
>> easily determine whether the FDW supports batching or not - it would
>> require changes in the callers, though. OTOH it might be useful for
>> COPY, where we could do something similar to multi_insert (COPY already
>> benefits from this patch, but it does not use the batching built into
>> COPY).
>
>Amit Langote has pointed out a related patch being discussed on hackers at [1].
>
>That patch introduces a new API. But if we can do it without
>introducing a new API, that will be good. FDWs which can support
>batching can just modify their code and don't have to implement and
>manage a new API. We already have a handful of those APIs.
>

I don't think extending the API is a big issue - the FDW code will need
changing anyway, so this seems minor.

I'll take a look at the COPY patch - I agree it seems like a good idea,
although it can be less convenient in various cases (e.g. I've seen a lot
of INSERT ... SELECT queries in sharded systems, etc.).

>>
>> 2) What about the insert results?
>>
>> I'm not sure what to do about the "result" status for the inserted rows.
>> We only really "stash" the rows into a buffer, so we don't know if it
>> will succeed or not. The patch simply assumes it will succeed, but that's
>> clearly wrong, and it may result in reporting a wrong number of rows.
>
>I didn't get this. We are executing an INSERT on the foreign server,
>so we get the number of rows INSERTed from that server. We should just
>add those up across batches. If there's a failure, it would abort the
>transaction, local as well as remote.
>

True, but it's not the FDW code doing the counting - it's the caller,
depending on whether the ExecForeignInsert returns a valid slot or NULL.
So it's not quite possible to just return a number of inserted tuples, as
returned by the remote server.

>>
>> The patch also disables the batching when the insert has a RETURNING
>> clause, because there's just a single slot (for the currently inserted
>> row). I suppose a "batching" method would take an array of slots.
>>
>
>It will be a rare case when a bulk load also has a RETURNING clause.
>So, we can live with this restriction. We should try to choose a
>design which allows that restriction to be lifted in the future. But I
>doubt that restriction will be a serious one.
>
>>
>> 3) What about the other DML operations (DELETE/UPDATE)?
>>
>> The other DML operations could probably benefit from the batching too.
>> INSERT was good enough for a PoC, but having batching only for INSERT
>> seems somewhat asymmetric. DELETE/UPDATE seem more complicated because
>> of quals, but likely doable.
>
>Bulk INSERTs are more common in a sharded environment because of data
>load in, say, OLAP systems. Bulk update/delete are rare, although not
>that rare. So if an approach just supports bulk insert and not bulk
>UPDATE/DELETE, that will address a large number of use cases IMO. But if
>we can make everything work together that would be good as well.
>
>In your patch, I see that an INSERT statement with a batch is
>constructed as INSERT INTO ... VALUES (...), (...) with as many values as
>the batch size. That won't work as is for UPDATE/DELETE since we can't
>pass multiple pairs of ctids and columns to be updated for each ctid
>in one statement. Maybe we could build as many UPDATE/DELETE
>statements as the size of a batch, but that would be ugly. What we
>need is a feature like a batch prepared statement in libpq similar to
>what JDBC supports
>(https://mkyong.com/jdbc/jdbc-preparedstatement-example-batch-update/).
>This will allow a single prepared statement to be executed with a
>batch of parameters, each batch corresponding to one foreign DML
>statement.
>

I'm pretty sure we could make it work with some array/unnest tricks to
build a relation, and use that as a source of data.

>>
>>
>> 3) Should we do batching for COPY instead?
>>
>> While looking at multi_insert, I've realized it's mostly exactly what
>> the new "batching insert" API function would need to be. But it's only
>> really used in COPY, so I wonder if we should just abandon the idea of
>> batching INSERTs and do batching COPY for FDW tables.
>
>I think this won't support RETURNING as well. But if we could somehow
>use the COPY protocol to send the data to the foreign server and yet treat
>it as INSERT, that might work. I think we have to find out which performs
>better, COPY or batch INSERT.
>

I don't see why not support both, the use cases are somewhat different I
think.

>>
>> For cases that can replace INSERT with COPY this would be enough, but
>> unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do
>> this :-(
>
>Agreed, if we want to support bulk UPDATE/DELETE as well.
>
>>
>>
>> 4) Expected consistency?
>>
>> I'm not entirely sure what are the consistency expectations for FDWs.
>> Currently the FDW nodes pointing to the same server share a connection,
>> so the inserted rows might be visible to other nodes. But if we only
>> stash the rows in a local buffer for a while, that's no longer true. So
>> maybe this breaks the consistency expectations?
>>
>> But maybe that's OK - I'm not sure how the prepared statements/cursors
>> affect this. I can imagine restricting the batching only to plans where
>> this is not an issue (single FDW node or something), but it seems rather
>> fragile and undesirable.
>
>I think that area is grey. Depending upon where the cursor is
>positioned when a DML node executes a query, the data fetched from
>the cursor may or may not see the effect of the DML. The cursor position
>is based on the batch size, so we already have problems in this area I
>think. Assuming that the DML and SELECT are independent, this will
>work. So, the consistency problems exist, they will just be modulated
>by batching DML. I doubt that's related to this feature exclusively
>and it should be solved independent of this feature.
>

OK, thanks for the feedback.

>>
>> I was thinking about adding a GUC to enable/disable the batching at some
>> level (global, server, table, ...) but it seems like a bad match because
>> the consistency expectations likely depend on a query. There should be a
>> GUC to set the batch size, though (it's hardcoded to 100 for now).
>>
>
>Similar to fetch_size, it should be a foreign server / foreign table
>level setting, IMO.
>
>[1] https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru
>

Yeah, I agree we should have a GUC to define the batch size.

What I had in mind was something that would allow us to enable/disable
batching to increase the consistency guarantees, or something like that.
I think simple GUCs are a poor solution for that.


regards

--
Tomas Vondra                      http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
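[A rough sketch of the array/unnest idea mentioned above, for a remote UPDATE keyed by ctid. The table and column names are made up, and this is not what the posted PoC patch does today:]

  -- one remote statement updates a whole batch: the batch is passed as
  -- parallel arrays and turned back into a relation by unnest()
  PREPARE batch_update(tid[], int[]) AS
  UPDATE remote_tab AS t
  SET    val = batch.new_val
  FROM   unnest($1, $2) AS batch(row_ctid, new_val)
  WHERE  t.ctid = batch.row_ctid;

  EXECUTE batch_update('{"(0,1)","(0,2)","(0,3)"}', '{10,20,30}');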
On Tue, 30 Jun 2020 at 22:23, Tomas Vondra <[hidden email]> wrote:
>> I didn't get this. We are executing an INSERT on the foreign server,
>
> True, but it's not the FDW code doing the counting - it's the caller,
> depending on whether the ExecForeignInsert returns a valid slot or NULL.

Hmm yes, now I remember that bit. So for every row buffered, we return a
valid slot without knowing whether that row was inserted on the remote
server or not. I think we have that problem even now, where a single
INSERT might result in multiple INSERTs on the remote server (rare but
not completely impossible).

> I'm pretty sure we could make it work with some array/unnest tricks to
> build a relation, and use that as a source of data.

That sounds great. The solution will be limited to postgres_fdw only.

> I don't see why not support both, the use cases are somewhat different I
> think.

+1, if we can do both.

Best Wishes,
Ashutosh
In reply to this post by Tomas Vondra-4
Hi,
On 2020-06-28 17:10:02 +0200, Tomas Vondra wrote: > 3) Should we do batching for COPY insteads? > > While looking at multi_insert, I've realized it's mostly exactly what > the new "batching insert" API function would need to be. But it's only > really used in COPY, so I wonder if we should just abandon the idea of > batching INSERTs and do batching COPY for FDW tables. > > For cases that can replace INSERT with COPY this would be enough, but > unfortunately it does nothing for DELETE/UPDATE so I'm hesitant to do > this :-( I personally think - and I realize that that might be annoying to somebody looking to make an incremental improvement - that the nodeModifyTable.c and copy.c code dealing with DML has become too complicated to add features like this without a larger refactoring. Leading to choices like this, whether to add a feature in one place but not the other. I think before we add more complexity, we ought to centralize and clean up the DML handling code so most is shared between copy.c and nodeModifyTable.c. Then we can much more easily add batching to FDWs, to CTAS, to INSERT INTO SELECT etc, for which there are patches already. > 4) Expected consistency? > > I'm not entirely sure what are the consistency expectations for FDWs. > Currently the FDW nodes pointing to the same server share a connection, > so the inserted rows might be visible to other nodes. But if we only > stash the rows in a local buffer for a while, that's no longer true. So > maybe this breaks the consistency expectations? Given that for local queries that's not the case (since the snapshot won't have those changes visible), I think we shouldn't be too concerned about that. If anything we should be concerned about the opposite. If we are concerned, perhaps we could add functionality to flush all pending changes before executing further statements? > I was thinking about adding a GUC to enable/disable the batching at some > level (global, server, table, ...) but it seems like a bad match because > the consistency expectations likely depend on a query. There should be a > GUC to set the batch size, though (it's hardcoded to 100 for now). Hm. If libpq allowed to utilize pipelining ISTM the answer here would be to not batch by building a single statement with all rows as a VALUES, but issue the single INSERTs in a pipelined manner. That'd probably remove all behavioural differences. I really wish somebody would pick up that libpq patch again. Greetings, Andres Freund |
In reply to this post by Tomas Vondra-4
On 6/28/20 8:10 PM, Tomas Vondra wrote:
> Now, the primary reason why the performance degrades like this is that
> while FDW has batching for SELECT queries (i.e. we read larger chunks of
> data from the cursors), we don't have that for INSERTs (or other DML).
> Every time you insert a row, it has to go all the way down into the
> partition synchronously.

You added new fields to the PgFdwModifyState struct. Why didn't you reuse
the ResultRelInfo::ri_CopyMultiInsertBuffer field and the
CopyMultiInsertBuffer machinery as storage for incoming tuples?

--
regards,
Andrey Lepikhov
Postgres Professional
On Fri, Jul 10, 2020 at 09:28:44AM +0500, Andrey V. Lepikhov wrote:
>On 6/28/20 8:10 PM, Tomas Vondra wrote: >>Now, the primary reason why the performance degrades like this is that >>while FDW has batching for SELECT queries (i.e. we read larger chunks of >>data from the cursors), we don't have that for INSERTs (or other DML). >>Every time you insert a row, it has to go all the way down into the >>partition synchronously. > >You added new fields into the PgFdwModifyState struct. Why you didn't >reused ResultRelInfo::ri_CopyMultiInsertBuffer field and >CopyMultiInsertBuffer machinery as storage for incoming tuples? > Because I was focused on speeding-up inserts, and that is not using CopyMultiInsertBuffer I think. I agree the way the tuples are stored may be improved, of course. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services |
On Sun, Jul 12, 2020 at 02:11:01AM +0200, Tomas Vondra wrote:
> Because I was focused on speeding-up inserts, and that is not using > CopyMultiInsertBuffer I think. I agree the way the tuples are stored > may be improved, of course. The CF bot is telling that the regression tests of postgres_fdw are crashing. Could you look at that? -- Michael |
In reply to this post by Tomas Vondra-4
Hello Tomas san,
Thank you for picking this up. I'm interested in this topic, too. (As an
aside, we'd like to submit a bulk insert patch for ECPG in the near
future.)

As others have mentioned, Andrey-san's fast COPY to foreign partitions
[1] is also promising. But I think your bulk INSERT is a separate feature
and offers what COPY cannot do -- data transformation during loading with
INSERT SELECT and CREATE TABLE AS SELECT.

Is there anything that makes you worry and stops development? Could I
give it a try and implement this? (I'm not sure I can, sorry. I'm worried
whether we can change the executor's call chain easily.)

> 1) Extend the FDW API?

Yes, I think so, because FDWs for other DBMSs will benefit from this.
(But it's questionable whether we want users to transfer data in a
Postgres database to other DBMSs...)

MySQL and SQL Server have the same bulk insert syntax as Postgres, i.e.,
INSERT INTO table VALUES (record1), (record2), ... Oracle doesn't have
this syntax, but it can use a CTE as follows:

    INSERT INTO table
    WITH t AS (
        SELECT record1 FROM DUAL UNION ALL
        SELECT record2 FROM DUAL UNION ALL
        ...
    )
    SELECT * FROM t;

And many DBMSs should have CTAS, INSERT SELECT, and INSERT SELECT record1
UNION ALL SELECT record2 ...

The API would simply be:

    TupleTableSlot **
    ExecForeignMultiInsert(EState *estate,
                           ResultRelInfo *rinfo,
                           TupleTableSlot **slot,
                           TupleTableSlot **planSlot,
                           int numSlots);

> 2) What about the insert results?

I'm wondering if we can report success or failure of each inserted row,
because the remote INSERT will fail entirely. Other FDWs may be able to
do it, so the API can be like above. For the same reason, support for
RETURNING clause will vary from DBMS to DBMS.

> 3) What about the other DML operations (DELETE/UPDATE)?

I don't think they are necessary for the time being. If we want them,
they will be implemented using the libpq batch/pipelining, as Andres-san
said.

> 3) Should we do batching for COPY instead?

I'm thinking of issuing INSERT with multiple records as your patch does,
because:

* When the user executed INSERT statements, it would look strange to the
user if the remote SQL is displayed as COPY.

* COPY doesn't invoke rules unlike INSERT. (I don't think the rule is a
feature that users care about, though.) Also, I'm a bit concerned that
there might be, or will be, other differences between INSERT and COPY.

[1] Fast COPY FROM command for the table with foreign partitions
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

Regards
Takayuki Tsunakawa
On Thu, Oct 08, 2020 at 02:40:10AM +0000, [hidden email] wrote:
>Hello Tomas san, > > >Thank you for picking up this. I'm interested in this topic, too. (As an aside, we'd like to submit a bulk insert patch for ECPG in the near future.) > >As others referred, Andrey-san's fast COPY to foreign partitions is also promising. But I think your bulk INSERT is a separate feature and offers COPY cannot do -- data transformation during loading with INSERT SELECT and CREATE TABLE AS SELECT. > >Is there anything that makes you worry and stops development? Could I give it a try to implement this (I'm not sure I can, sorry. I'm worried if we can change the executor's call chain easily.) > It's primarily a matter of having too much other stuff on my plate, thus not having time to work on this feature. I was not too worried about any particular issue, but I wanted some feedback before spending more time on extending the API. I'm not sure when I'll have time to work on this again, so if you are interested and willing to work on it, please go ahead. I'll gladly do reviews and help you with it. > >> 1) Extend the FDW API? > >Yes, I think, because FDWs for other DBMSs will benefit from this. (But it's questionable whether we want users to transfer data in Postgres database to other DBMSs...) > I think transferring data to other databases is fine - interoperability is a big advantage for users, I don't see it as something threatening the PostgreSQL project. I doubt this would make it more likely for users to migrate from PostgreSQL - there are many ways to do that already. >MySQL and SQL Server has the same bulk insert syntax as Postgres, i.e., INSERT INTO table VALUES(record1), (record2), ... Oracle doesn't have this syntax, but it can use CTE as follows: > > INSERT INTO table > WITH t AS ( > SELECT record1 FROM DUAL UNION ALL > SELECT record2 FROM DUAL UNION ALL > ... > ) > SELECT * FROM t; > >And many DBMSs should have CTAS, INSERT SELECT, and INSERT SELECT record1 UNION ALL SELECT record2 ... > True. In some cases INSERT may be replaced by COPY, but it has various other features too. >The API would simply be: > >TupleTableSlot ** >ExecForeignMultiInsert(EState *estate, > ResultRelInfo *rinfo, > TupleTableSlot **slot, > TupleTableSlot **planSlot, > int numSlots); > > +1, seems quite reasonable >> 2) What about the insert results? > >I'm wondering if we can report success or failure of each inserted row, because the remote INSERT will fail entirely. Other FDWs may be able to do it, so the API can be like above. > Yeah. I think handling complete failure should not be very difficult, but there are cases that worry me more. For example, what if there's a before trigger (on the remote db) that "skips" inserting some of the rows by returning NULL? >For the same reason, support for RETURNING clause will vary from DBMS to DBMS. > Yeah. I wonder if the FDW needs to indicate which features are supported by the ExecForeignMultiInsert, e.g. by adding a function that decides whether batch insert is supported (it might also do that internally by calling ExecForeignInsert, of course). > >> 3) What about the other DML operations (DELETE/UPDATE)? > >I don't think they are necessary for the time being. If we want them, they will be implemented using the libpq batch/pipelining as Andres-san said. > I agree. > >> 3) Should we do batching for COPY insteads? > >I'm thinking of issuing INSERT with multiple records as your patch does, because: > >* When the user executed INSERT statements, it would look strange to the user if the remote SQL is displayed as COPY. 
> >* COPY doesn't invoke rules unlike INSERT. (I don't think the rule is a feature what users care about, though.) Also, I'm a bit concerned that there might be, or will be, other differences between INSERT and COPY. > I agree. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services |
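[To make the before-trigger concern above concrete, here is a sketch of a remote-side trigger that silently skips some rows; with batching, the local side would have already counted those rows as inserted before the remote server decided to drop them. The names, including t_remote_p0, are illustrative only:]

  -- on the remote server: skip rows with a negative key instead of
  -- inserting them
  CREATE FUNCTION skip_negative() RETURNS trigger LANGUAGE plpgsql AS $$
  BEGIN
      IF NEW.a < 0 THEN
          RETURN NULL;   -- row is silently not inserted
      END IF;
      RETURN NEW;
  END;
  $$;

  CREATE TRIGGER skip_negative_rows
      BEFORE INSERT ON t_remote_p0
      FOR EACH ROW EXECUTE FUNCTION skip_negative();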
From: Tomas Vondra <[hidden email]>
> I'm not sure when I'll have time to work on this again, so if you are > interested and willing to work on it, please go ahead. I'll gladly do > reviews and help you with it. Thank you very much. > I think transferring data to other databases is fine - interoperability > is a big advantage for users, I don't see it as something threatening > the PostgreSQL project. I doubt this would make it more likely for users > to migrate from PostgreSQL - there are many ways to do that already. Definitely true. Users may want to use INSERT SELECT to do some data transformation in their OLTP database and load it into a non-Postgres data warehouse. > Yeah. I think handling complete failure should not be very difficult, > but there are cases that worry me more. For example, what if there's a > before trigger (on the remote db) that "skips" inserting some of the > rows by returning NULL? > Yeah. I wonder if the FDW needs to indicate which features are supported > by the ExecForeignMultiInsert, e.g. by adding a function that decides > whether batch insert is supported (it might also do that internally by > calling ExecForeignInsert, of course). Thanks for your advice. I'll try to address them. Regards Takayuki Tsunakawa |
Hello,
The attached patch implements the new bulk insert routine for
postgres_fdw and the executor utilizing it. It passes make check-world.

I measured performance in a basic non-partitioned case by modifying
Tomas-san's scripts. They perform an INSERT SELECT statement that copies
one million records. The table consists of two integer columns, with a
primary key on one of them. You can run the attached prepare.sql to set
up once. local.sql inserts to the table directly, while fdw.sql inserts
through a foreign table.

The performance results, the average time of 5 runs, were as follows on a
Linux host where the average round-trip time of "ping localhost" was
34 us:

    master, local: 6.1 seconds
    master, fdw:   125.3 seconds
    patched, fdw:  11.1 seconds (11x improvement)

The patch accumulates at most 100 records in ModifyTableState before
inserting in bulk. Also, when an input record is targeted for a different
relation (= partition) than that for the already accumulated records, it
inserts the accumulated records and stores the new record for a later
insert.

[Issues]

1. Do we want a GUC parameter, say, max_bulk_insert_records = (integer),
to control the number of records inserted at once? The range of allowed
values would be between 1 and 1,000. 1 disables bulk insert. The possible
reason of the need for this kind of parameter would be to limit the
amount of memory used for accumulated records, which could be
prohibitively large if each record is big. I don't think this is a must,
but I think we can have it.

2. Should we accumulate records per relation in ResultRelInfo instead?
That is, when inserting into a partitioned table that has foreign
partitions, delay insertion until a certain number of input records
accumulate, and then insert accumulated records per relation (e.g., 50
records to relation A, 30 records to relation B, and 20 records to
relation C.) If we do that:

* The order of insertion differs from the order of input records. Is it
OK?

* Should the maximum count of accumulated records be applied per relation
or the query? When many foreign partitions belong to a partitioned table,
if the former is chosen, it may use much memory in total. If the latter
is chosen, the records per relation could be few and thus the benefit of
bulk insert could be small.

Regards
Takayuki Tsunakawa
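[The attached prepare.sql / local.sql / fdw.sql are not preserved in the archive. Based on the description above, their shape is roughly the following; the names and the loopback server are illustrative, and the target tables would be truncated between runs:]

  -- prepare.sql (run once): two integer columns, primary key on one,
  -- plus a foreign table pointing at it through a loopback server
  CREATE TABLE target (a int PRIMARY KEY, b int);
  CREATE FOREIGN TABLE ftarget (a int, b int)
      SERVER loopback OPTIONS (table_name 'target');

  -- source data for the INSERT SELECT of one million records
  CREATE TABLE source (a int, b int);
  INSERT INTO source SELECT i, i FROM generate_series(1, 1000000) s(i);

  -- local.sql: insert directly into the table
  INSERT INTO target SELECT * FROM source;

  -- fdw.sql: insert through the foreign table
  INSERT INTO ftarget SELECT * FROM source;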
Hi,
Thanks for working on this! On 11/10/20 1:45 AM, [hidden email] wrote: > Hello, > > > The attached patch implements the new bulk insert routine for > postgres_fdw and the executor utilizing it. It passes make > check-world. > I haven't done any testing yet, just a quick review. I see the patch builds the "bulk" query in execute_foreign_modify. IMO that's something we should do earlier, when we're building the simple query (for 1-row inserts). I'd understand if you were concerned about overhead in case of 1-row inserts, trying to not plan the bulk query until necessary, but I'm not sure this actually helps. Or was the goal to build a query for every possible number of slots? I don't think that's really useful, considering it requires deallocating the old plan, preparing a new one, etc. IMO it should be sufficient to have two queries - one for 1-row inserts, one for the full batch. The last incomplete batch can be inserted using a loop of 1-row queries. That's what my patch was doing, but I'm not insisting on that - it just seems like a better approach to me. So feel free to argue why this is better. > I measured performance in a basic non-partitioned case by modifying > Tomas-san's scripts. They perform an INSERT SELECT statement that > copies one million records. The table consists of two integer > columns, with a primary key on one of those them. You can run the > attached prepare.sql to set up once. local.sql inserts to the table > directly, while fdw.sql inserts through a foreign table. > > The performance results, the average time of 5 runs, were as follows > on a Linux host where the average round-trip time of "ping localhost" > was 34 us: > > master, local: 6.1 seconds master, fdw: 125.3 seconds patched, fdw: > 11.1 seconds (11x improvement) > Nice. I think we can't really get much closer to local master, so 6.1 vs. 11.1 seconds look quite acceptable. > > The patch accumulates at most 100 records in ModifyTableState before > inserting in bulk. Also, when an input record is targeted for a > different relation (= partition) than that for already accumulated > records, insert the accumulated records and store the new record for > later insert. > > [Issues] > > 1. Do we want a GUC parameter, say, max_bulk_insert_records = > (integer), to control the number of records inserted at once? The > range of allowed values would be between 1 and 1,000. 1 disables > bulk insert. The possible reason of the need for this kind of > parameter would be to limit the amount of memory used for accumulated > records, which could be prohibitively large if each record is big. I > don't think this is a must, but I think we can have it. > I think it'd be good to have such GUC, even if only for testing and development. We should probably have a way to disable the batching, which the GUC could also do, I think. So +1 to have the GUC. > 2. Should we accumulate records per relation in ResultRelInfo > instead? That is, when inserting into a partitioned table that has > foreign partitions, delay insertion until a certain number of input > records accumulate, and then insert accumulated records per relation > (e.g., 50 records to relation A, 30 records to relation B, and 20 > records to relation C.) If we do that, > I think there's a chunk of text missing here? If we do that, then what? Anyway, I don't see why accumulating the records in ResultRelInfo would be better than what the patch does now. It seems to me like fairly specific to FDWs, so keeping it int FDW state seems appropriate. 
What would be the advantage of stashing it in ResultRelInfo? > * The order of insertion differs from the order of input records. Is > it OK? > I think that's OK for most use cases, and if it's not (e.g. when there's something requiring the exact order of writes) then it's not possible to use batching. That's one of the reasons why I think we should have a GUC to disable the batching. > * Should the maximum count of accumulated records be applied per > relation or the query? When many foreign partitions belong to a > partitioned table, if the former is chosen, it may use much memory in > total. If the latter is chosen, the records per relation could be > few and thus the benefit of bulk insert could be small. > I think it needs to be applied per relation, because that's the level at which we can do it easily and consistently. The whole point is to send data in sufficiently large chunks to minimize the communication overhead (latency etc.), but if you enforce it "per query" that seems hard. Imagine you're inserting data into a table with many partitions - how do you pick the number of rows to accumulate? The table may have 10 or 1000 partitions, we may be inserting into all partitions or just a small subset, not all partitions may be foreign, etc. It seems pretty difficult to pick and enforce a reliable limit at the query level. But maybe I'm missing something and it's easier than I think? Of course, you're entirely correct enforcing this at the partition level may require a lot of memory. Sadly, I don't see a way around that, except for (a) disabling batching or (b) ordering the data to insert data into one partition at a time. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company |
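[To illustrate the two kinds of remote statements being discussed (one for 1-row inserts, one for a full batch), with a two-column table and, for brevity, a batch size of 3, the deparsed SQL would look roughly like this; the exact text produced by the patch may differ:]

  -- 1-row INSERT, as built when the query is planned
  INSERT INTO public.target(a, b) VALUES ($1, $2);

  -- statement prepared for a full batch of 3 rows
  INSERT INTO public.target(a, b) VALUES ($1, $2), ($3, $4), ($5, $6);

  -- a final incomplete batch either re-prepares a shorter variant (the
  -- posted patch) or falls back to a loop of 1-row inserts (the PoC)
  INSERT INTO public.target(a, b) VALUES ($1, $2), ($3, $4);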
On 11/10/20 4:05 PM, Tomas Vondra wrote: > Hi, > > Thanks for working on this! > > On 11/10/20 1:45 AM, [hidden email] wrote: >> Hello, >> >> >> The attached patch implements the new bulk insert routine for >> postgres_fdw and the executor utilizing it. It passes make >> check-world. >> > > I haven't done any testing yet, just a quick review. > > I see the patch builds the "bulk" query in execute_foreign_modify. IMO > that's something we should do earlier, when we're building the simple > query (for 1-row inserts). I'd understand if you were concerned about > overhead in case of 1-row inserts, trying to not plan the bulk query > until necessary, but I'm not sure this actually helps. > > Or was the goal to build a query for every possible number of slots? I > don't think that's really useful, considering it requires deallocating > the old plan, preparing a new one, etc. IMO it should be sufficient to > have two queries - one for 1-row inserts, one for the full batch. The > last incomplete batch can be inserted using a loop of 1-row queries. > > That's what my patch was doing, but I'm not insisting on that - it just > seems like a better approach to me. So feel free to argue why this is > better. > > >> I measured performance in a basic non-partitioned case by modifying >> Tomas-san's scripts. They perform an INSERT SELECT statement that >> copies one million records. The table consists of two integer >> columns, with a primary key on one of those them. You can run the >> attached prepare.sql to set up once. local.sql inserts to the table >> directly, while fdw.sql inserts through a foreign table. >> >> The performance results, the average time of 5 runs, were as follows >> on a Linux host where the average round-trip time of "ping localhost" >> was 34 us: >> >> master, local: 6.1 seconds master, fdw: 125.3 seconds patched, fdw: >> 11.1 seconds (11x improvement) >> > > Nice. I think we can't really get much closer to local master, so 6.1 > vs. 11.1 seconds look quite acceptable. > >> >> The patch accumulates at most 100 records in ModifyTableState before >> inserting in bulk. Also, when an input record is targeted for a >> different relation (= partition) than that for already accumulated >> records, insert the accumulated records and store the new record for >> later insert. >> >> [Issues] >> >> 1. Do we want a GUC parameter, say, max_bulk_insert_records = >> (integer), to control the number of records inserted at once? The >> range of allowed values would be between 1 and 1,000. 1 disables >> bulk insert. The possible reason of the need for this kind of >> parameter would be to limit the amount of memory used for accumulated >> records, which could be prohibitively large if each record is big. I >> don't think this is a must, but I think we can have it. >> > > I think it'd be good to have such GUC, even if only for testing and > development. We should probably have a way to disable the batching, > which the GUC could also do, I think. So +1 to have the GUC. > >> 2. Should we accumulate records per relation in ResultRelInfo >> instead? That is, when inserting into a partitioned table that has >> foreign partitions, delay insertion until a certain number of input >> records accumulate, and then insert accumulated records per relation >> (e.g., 50 records to relation A, 30 records to relation B, and 20 >> records to relation C.) If we do that, >> > > I think there's a chunk of text missing here? If we do that, then what? 
> > Anyway, I don't see why accumulating the records in ResultRelInfo would > be better than what the patch does now. It seems to me like fairly > specific to FDWs, so keeping it int FDW state seems appropriate. What > would be the advantage of stashing it in ResultRelInfo? > >> * The order of insertion differs from the order of input records. Is >> it OK? >> > > I think that's OK for most use cases, and if it's not (e.g. when there's > something requiring the exact order of writes) then it's not possible to > use batching. That's one of the reasons why I think we should have a GUC > to disable the batching. > >> * Should the maximum count of accumulated records be applied per >> relation or the query? When many foreign partitions belong to a >> partitioned table, if the former is chosen, it may use much memory in >> total. If the latter is chosen, the records per relation could be >> few and thus the benefit of bulk insert could be small. >> > > I think it needs to be applied per relation, because that's the level at > which we can do it easily and consistently. The whole point is to send > data in sufficiently large chunks to minimize the communication overhead > (latency etc.), but if you enforce it "per query" that seems hard. > > Imagine you're inserting data into a table with many partitions - how do > you pick the number of rows to accumulate? The table may have 10 or 1000 > partitions, we may be inserting into all partitions or just a small > subset, not all partitions may be foreign, etc. It seems pretty > difficult to pick and enforce a reliable limit at the query level. But > maybe I'm missing something and it's easier than I think? > > Of course, you're entirely correct enforcing this at the partition level > may require a lot of memory. Sadly, I don't see a way around that, > except for (a) disabling batching or (b) ordering the data to insert > data into one partition at a time. > Two more comments regarding this: 1) If we want to be more strict about the memory consumption, we should probably set the limit in terms of memory, not number of rows. Currently the 100 rows may be 10kB or 10MB, there's no way to know. Of course, this is not the only place with this issue. 2) I wonder what the COPY FROM patch [1] does in this regard. I don't have time to check right now, but I suggest we try to do the same thing, if only to be consistent. [1] https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company |
In reply to this post by Tomas Vondra-6
From: Tomas Vondra <[hidden email]>
> I see the patch builds the "bulk" query in execute_foreign_modify. IMO
> that's something we should do earlier, when we're building the simple
> query (for 1-row inserts). I'd understand if you were concerned about
> overhead in case of 1-row inserts, trying to not plan the bulk query
> until necessary, but I'm not sure this actually helps.
>
> Or was the goal to build a query for every possible number of slots? I
> don't think that's really useful, considering it requires deallocating
> the old plan, preparing a new one, etc. IMO it should be sufficient to
> have two queries - one for 1-row inserts, one for the full batch. The
> last incomplete batch can be inserted using a loop of 1-row queries.
>
> That's what my patch was doing, but I'm not insisting on that - it just
> seems like a better approach to me. So feel free to argue why this is
> better.

Don't be concerned, the processing is not changed for 1-row inserts: the
INSERT query string is built in PlanForeignModify(), and the remote
statement is prepared in execute_foreign_modify() during the first call
to ExecForeignInsert(), and it's reused for subsequent
ExecForeignInsert() calls.

The re-creation of the INSERT query string and its corresponding PREPARE
happen when the number of tuples to be inserted is different from the
previous call to ExecForeignInsert()/ExecForeignBulkInsert(). That's
because we don't know how many tuples will be inserted during planning
(PlanForeignModify) or execution (until the scan ends for SELECT). For
example, if we insert 10,030 rows with the bulk size 100, the flow is:

PlanForeignModify():
    build the INSERT query string for 1 row
ExecForeignBulkInsert(100):
    drop the INSERT query string and prepared statement for 1 row
    build the query string and prepare the statement for a 100-row INSERT
    execute it
ExecForeignBulkInsert(100):
    reuse the prepared statement for the 100-row INSERT and execute it
...
ExecForeignBulkInsert(30):
    drop the INSERT query string and prepared statement for 100 rows
    build the query string and prepare the statement for a 30-row INSERT
    execute it

> I think it'd be good to have such GUC, even if only for testing and
> development. We should probably have a way to disable the batching,
> which the GUC could also do, I think. So +1 to have the GUC.

OK, I'll add it. The name would be max_bulk_insert_tuples, because a) it
might cover bulk insert for local relations in the future, and b) "tuple"
is used in cpu_(index_)tuple_cost and parallel_tuple_cost, while "row" or
"record" is not used in GUC names (except for row_security). The valid
range would be between 1 and 1,000. (I said 10,000 previously, but I
think that was an overreaction and I am a bit worried about unforeseen
trouble too many tuples might cause.) 1 disables the bulk processing and
uses the traditional ExecForeignInsert(). The default value is 100 (would
1 be sensible as a default value, to avoid surprising users by increased
memory usage?)

> > 2. Should we accumulate records per relation in ResultRelInfo
> > instead? That is, when inserting into a partitioned table that has
> > foreign partitions, delay insertion until a certain number of input
> > records accumulate, and then insert accumulated records per relation
> > (e.g., 50 records to relation A, 30 records to relation B, and 20
> > records to relation C.) If we do that,
> >
>
> I think there's a chunk of text missing here? If we do that, then what?

Sorry, the two bullets below there are what follows. Perhaps I should
have written ":" instead of ",".
> Anyway, I don't see why accumulating the records in ResultRelInfo would
> be better than what the patch does now. It seems to me like fairly
> specific to FDWs, so keeping it in FDW state seems appropriate. What
> would be the advantage of stashing it in ResultRelInfo?

I thought of distributing input records to their corresponding
partitions' ResultRelInfos. For example, an input record for partition 1
comes, store it in the ResultRelInfo for partition 1; then an input
record for partition 2 comes, store it in the ResultRelInfo for
partition 2. When a ResultRelInfo accumulates some number of rows, insert
the accumulated rows therein into the partition. When the input ends,
perform bulk inserts for the ResultRelInfos that have accumulated rows.

> I think that's OK for most use cases, and if it's not (e.g. when there's
> something requiring the exact order of writes) then it's not possible to
> use batching. That's one of the reasons why I think we should have a GUC
> to disable the batching.

Agreed.

> > * Should the maximum count of accumulated records be applied per
> > relation or the query? When many foreign partitions belong to a
> > partitioned table, if the former is chosen, it may use much memory in
> > total. If the latter is chosen, the records per relation could be
> > few and thus the benefit of bulk insert could be small.
> >
>
> I think it needs to be applied per relation, because that's the level at
> which we can do it easily and consistently. The whole point is to send
> data in sufficiently large chunks to minimize the communication overhead
> (latency etc.), but if you enforce it "per query" that seems hard.
>
> Imagine you're inserting data into a table with many partitions - how do
> you pick the number of rows to accumulate? The table may have 10 or 1000
> partitions, we may be inserting into all partitions or just a small
> subset, not all partitions may be foreign, etc. It seems pretty
> difficult to pick and enforce a reliable limit at the query level. But
> maybe I'm missing something and it's easier than I think?
>
> Of course, you're entirely correct enforcing this at the partition level
> may require a lot of memory. Sadly, I don't see a way around that,
> except for (a) disabling batching or (b) ordering the data to insert
> data into one partition at a time.

OK, I think I'll try doing it like that, after waiting for other opinions
for some days.

> Two more comments regarding this:
>
> 1) If we want to be more strict about the memory consumption, we should
> probably set the limit in terms of memory, not number of rows. Currently
> the 100 rows may be 10kB or 10MB, there's no way to know. Of course,
> this is not the only place with this issue.
>
> 2) I wonder what the COPY FROM patch [1] does in this regard. I don't
> have time to check right now, but I suggest we try to do the same thing,
> if only to be consistent.
>
> [1] https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru

That COPY FROM patch uses the tuple accumulation mechanism for local
tables as-is. That is, it accumulates at most 1,000 tuples per partition:

    /*
     * No more than this many tuples per CopyMultiInsertBuffer
     *
     * Caution: Don't make this too big, as we could end up with this many
     * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
     * multiInsertBuffers list.  Increasing this can cause quadratic growth in
     * memory requirements during copies into partitioned tables with a large
     * number of partitions.
     */
    #define MAX_BUFFERED_TUPLES     1000

Regards
Takayuki Tsunakawa