pg_dump multi VALUES INSERT

classic Classic list List threaded Threaded
62 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

pg_dump multi VALUES INSERT

Surafel Temesgen

Hello,

According to the documentation –inserts option is mainly useful for making dumps that can be loaded into non-PostgreSQL databases and to reduce the amount of rows that might lost during error in reloading but multi values insert command are equally portable and compact and also faster to reload than single row statement. I think it deserve an option of its own

The patch attached add additional option for multi values insert statement with a default values of 100 row per statement so the row lose during error is at most 100 rather than entire table.

Comments?

Regards

Surafel


multi_values_inserts_dum_v1.patch (19K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Tom Lane-2
Surafel Temesgen <[hidden email]> writes:
> According to the documentation –inserts option is mainly useful for making
> dumps that can be loaded into non-PostgreSQL databases and to reduce the
> amount of rows that might lost during error in reloading but multi values
> insert command are equally portable and compact and also faster to reload
> than single row statement. I think it deserve an option of its own

I don't actually see the point of this.  If you want dump/reload speed
you should be using COPY.  If that isn't your first priority, it's
unlikely that using an option like this would be a good idea.  It makes
the effects of a bad row much harder to predict, and it increases your
odds of OOM problems with wide rows substantially.

I grant that COPY might not be an option if you're trying to transfer
data to a different DBMS, but the other problems seem likely to apply
anywhere.  The bad-data hazard, in particular, is probably a much larger
concern than it is for Postgres-to-Postgres transfers.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Stephen Frost
Greetings,

* Tom Lane ([hidden email]) wrote:

> Surafel Temesgen <[hidden email]> writes:
> > According to the documentation –inserts option is mainly useful for making
> > dumps that can be loaded into non-PostgreSQL databases and to reduce the
> > amount of rows that might lost during error in reloading but multi values
> > insert command are equally portable and compact and also faster to reload
> > than single row statement. I think it deserve an option of its own
>
> I don't actually see the point of this.  If you want dump/reload speed
> you should be using COPY.  If that isn't your first priority, it's
> unlikely that using an option like this would be a good idea.  It makes
> the effects of a bad row much harder to predict, and it increases your
> odds of OOM problems with wide rows substantially.
>
> I grant that COPY might not be an option if you're trying to transfer
> data to a different DBMS, but the other problems seem likely to apply
> anywhere.  The bad-data hazard, in particular, is probably a much larger
> concern than it is for Postgres-to-Postgres transfers.
I can't say that I can really buy off on this argument.

The point of it is that it makes loading into other RDBMS faster.  Yes,
it has many of the same issues as our COPY does, but we support it
because it's much faster.  The same is true here, just for other
databases, so I'm +1 on the general idea.

I've not looked at the patch itself at all, yet.

Thanks!

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Fabien COELHO-3
In reply to this post by Surafel Temesgen

> The patch attached add additional option for multi values insert statement
> with a default values of 100 row per statement so the row lose during error
> is at most 100 rather than entire table.

Patch does not seem to apply anymore, could you rebase?

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Michael Paquier-2
In reply to this post by Stephen Frost
On Wed, Oct 17, 2018 at 03:05:28PM -0400, Stephen Frost wrote:
> The point of it is that it makes loading into other RDBMS faster.  Yes,
> it has many of the same issues as our COPY does, but we support it
> because it's much faster.  The same is true here, just for other
> databases, so I'm +1 on the general idea.

Well, the patch author has mentioned that he cares about also being able
to detect errors when processing the dump, which multi inserts make that
easier to check for.  However, even if you specify --data-only you still
need to worry about the first SET commands ahead, which requires manual
handling of the dump...

I am honestly not convinced that it is worth complicating pg_dump for
that, as there is no guarantee either that the DDLs generated by pg_dump
will be compatible with what other systems expect.  This kind of
compatibility for fetch and reload can also be kind of tricky with
portability issues, so I'd rather let this stuff being handled correctly
by other tools like pgloader or others rather than expecting to get this
stuff half-baked within Postgres core tools.
--
Michael

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Stephen Frost
Greetings,

* Michael Paquier ([hidden email]) wrote:

> On Wed, Oct 17, 2018 at 03:05:28PM -0400, Stephen Frost wrote:
> > The point of it is that it makes loading into other RDBMS faster.  Yes,
> > it has many of the same issues as our COPY does, but we support it
> > because it's much faster.  The same is true here, just for other
> > databases, so I'm +1 on the general idea.
>
> Well, the patch author has mentioned that he cares about also being able
> to detect errors when processing the dump, which multi inserts make that
> easier to check for.  However, even if you specify --data-only you still
> need to worry about the first SET commands ahead, which requires manual
> handling of the dump...
That's hardly a serious complication..

> I am honestly not convinced that it is worth complicating pg_dump for
> that, as there is no guarantee either that the DDLs generated by pg_dump
> will be compatible with what other systems expect.  This kind of
> compatibility for fetch and reload can also be kind of tricky with
> portability issues, so I'd rather let this stuff being handled correctly
> by other tools like pgloader or others rather than expecting to get this
> stuff half-baked within Postgres core tools.

I can see an argument for not wanting to complicate pg_dump, but we've
explicitly stated that the purpose of --inserts is to facilitate
restoring into other database systems and I don't agree that we should
just punt on that entirely.  For better or worse, there's an awful lot
of weight put behind things which are in core and we should take care to
do what we can to make those things better, especially when someone is
proposing a patch to improve the situation.

Sure, the patch might need work or have other issues, but that's an
opportunity for us to provide feedback to the author and encourage them
to improve the patch.

As for the other things that make it difficult to use pg_dump to get a
result out that can be loaded into other database systems- let's try to
improve on that too.  Having an option to skip the 'setup' bits, such as
the SET commands, certainly wouldn't be hard.

I certainly don't see us adding code to pg_dump to handle 'fetch and
reload' into some non-PG system, or, really, even into a PG system, and
that certainly isn't something the patch does, so I don't think it's a
particularly interesting argument.  Users can handle that as needed
themselves.

In other words, none of the arguments put forth really seem to be a
reason to reject the general idea of this patch, so I'm still +1 on
that.  Having just glanced over the patch quickly, I think I would have
done something like '--inserts=100' as the way to enable it instead of
adding a new option though.  Not that I feel very strongly about it.

Thanks!

Stephen

signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Surafel Temesgen
In reply to this post by Fabien COELHO-3
hi,

On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO <[hidden email]> wrote:
Patch does not seem to apply anymore, could you rebase?


The attached patch is a rebased version and work by ‘inserts=100’ as Stephen suggest


regards
Surafel

multi_values_inserts_dum_v2.patch (19K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Alvaro Herrera-9
On 2018-Nov-06, Surafel Temesgen wrote:

> hi,
>
> On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO <[hidden email]> wrote:
>
> > Patch does not seem to apply anymore, could you rebase?
> >
> The attached patch is a rebased version and work by ‘inserts=100’ as
> Stephen suggest

I thought the suggestion was that the number could be any positive
integer, not hardcoded 100.  It shouldn't take much more code to handle
it that way, which makes more sense to me.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Pavel Stehule


út 6. 11. 2018 v 18:18 odesílatel Alvaro Herrera <[hidden email]> napsal:
On 2018-Nov-06, Surafel Temesgen wrote:

> hi,
>
> On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO <[hidden email]> wrote:
>
> > Patch does not seem to apply anymore, could you rebase?
> >
> The attached patch is a rebased version and work by ‘inserts=100’ as
> Stephen suggest

I thought the suggestion was that the number could be any positive
integer, not hardcoded 100.  It shouldn't take much more code to handle
it that way, which makes more sense to me.


+1

100 looks really strange

Pavel

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Surafel Temesgen
In reply to this post by Alvaro Herrera-9


On Tue, Nov 6, 2018 at 8:18 PM Alvaro Herrera <[hidden email]> wrote:
On 2018-Nov-06, Surafel Temesgen wrote:

> hi,
>
> On Sun, Nov 4, 2018 at 1:18 PM Fabien COELHO <[hidden email]> wrote:
>
> > Patch does not seem to apply anymore, could you rebase?
> >
> The attached patch is a rebased version and work by ‘inserts=100’ as
> Stephen suggest

I thought the suggestion was that the number could be any positive
integer, not hardcoded 100. 
ok
It shouldn't take much more code to handle
it that way, which makes more sense to me

yes its not much line of code. Attach is a patch that optionally accept the number of row in a single insert statement and if it is not specified one row per statement used

regards

Surafel


multi_values_inserts_dum_v3.patch (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Dmitry Dolgov
> On Thu, Nov 8, 2018 at 2:03 PM Surafel Temesgen <[hidden email]> wrote:
>
> yes its not much line of code. Attach is a patch that optionally accept the number of row in a single insert statement and if it is not specified one row per statement used

Hi,

Unfortunately, patch needs to be rebased, could you please post an updated
version?

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Surafel Temesgen


On Fri, Nov 30, 2018 at 7:16 PM Dmitry Dolgov <[hidden email]> wrote:

Unfortunately, patch needs to be rebased, could you please post an updated
version?
 
Thank you for informing, Here is an updated patch against current master
Regards
Surafel
 

multi_values_inserts_dum_v4.patch (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Fabien COELHO-3

Hello Surafel,

> Thank you for informing, Here is an updated patch against current master

Patch applies cleanly, compiles, "make check" is okay, but given that the
feature is not tested...

Feature should be tested somewhere.

ISTM that command-line switches with optional arguments should be avoided:
This feature is seldom used (hmmm... 2 existing instances), because it
interferes with argument processing if such switches are used as the last
one. It is only okay with commands which do not expect arguments. For
backward compatibility, this suggests to add another switch, eg
--insert-multi=100 or whatever, which would possibly default to 100. The
alternative is to break compatibility with adding a mandatory argument,
but I guess it would not be admissible to committers.

Function "atoi" parses "1zzz" as 1, which is debatable, so I'd suggest to
avoid it and use some stricter option and error out on malformed integers.

The --help output does not document the --inserts argument, nor the
documentation.

There is an indendation issue within the while loop.

Given that the implementation is largely a copy-paste of the preceding
function, I'd suggest to simply extend it so that it takes into account
the "multi insert" setting and default to the previous behavior if not
set.

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Surafel Temesgen


On Tue, Dec 25, 2018 at 2:47 PM Fabien COELHO <[hidden email]> wrote:

Thank you for looking into it

Hello Surafel,

> Thank you for informing, Here is an updated patch against current master

Patch applies cleanly, compiles, "make check" is okay, but given that the
feature is not tested...

Feature should be tested somewhere.

ISTM that command-line switches with optional arguments should be avoided:
This feature is seldom used (hmmm... 2 existing instances), because it
interferes with argument processing if such switches are used as the last
one. It is only okay with commands which do not expect arguments. For
backward compatibility, this suggests to add another switch, eg
--insert-multi=100 or whatever, which would possibly default to 100. The
alternative is to break compatibility with adding a mandatory argument,
but I guess it would not be admissible to committers.

Function "atoi" parses "1zzz" as 1, which is debatable, so I'd suggest to
avoid it and use some stricter option and error out on malformed integers.

The --help output does not document the --inserts argument, nor the
documentation.


done

There is an indendation issue within the while loop.

Given that the implementation is largely a copy-paste of the preceding
function, I'd suggest to simply extend it so that it takes into account
the "multi insert" setting and default to the previous behavior if not
set. 
 

At first i also try to do it like that but it seems the function will became long and more complex to me

Regards

Surafel


multi_values_inserts_dum_v5.patch (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Fabien COELHO-3

> At first i also try to do it like that but it seems the function will
> became long and more complex to me

Probably. But calling it with size 100 should result in the same behavior,
so it is really just an extension of the preceeding one? Or am I missing
something?

--
Fabien.

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Surafel Temesgen


On Fri, Dec 28, 2018 at 6:46 PM Fabien COELHO <[hidden email]> wrote:

> At first i also try to do it like that but it seems the function will
> became long and more complex to me

Probably. But calling it with size 100 should result in the same behavior,
so it is really just an extension of the preceeding one? Or am I missing
something?


Specifying table data using single value insert statement and user specified values insert statement
have enough deference that demand to be separate function and they are not the same thing that should implement
with the same function. Regarding code duplication i think the solution is making those code separate function
and call at appropriate place.

Regards
Surafel

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

David Rowley-3
In reply to this post by Surafel Temesgen
Just looking at the v5 patch, it seems not to handle 0 column tables correctly.

For example:

# create table t();
# insert into t default values;
# insert into t default values;

$ pg_dump --table t --inserts --insert-multi=100 postgres > dump.sql

# \i dump.sql
[...]
INSERT 0 1
psql:dump.sql:35: ERROR:  syntax error at or near ")"
LINE 1: );
        ^

I'm not aware of a valid way to insert multiple 0 column rows in a
single INSERT statement, so not sure how you're going to handle that
case.


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Surafel Temesgen
Hi,
Thank you for looking at it
On Mon, Dec 31, 2018 at 12:38 PM David Rowley <[hidden email]> wrote:
Just looking at the v5 patch, it seems not to handle 0 column tables correctly.

For example:

# create table t();
# insert into t default values;
# insert into t default values;

$ pg_dump --table t --inserts --insert-multi=100 postgres > dump.sql

# \i dump.sql
[...]
INSERT 0 1
psql:dump.sql:35: ERROR:  syntax error at or near ")"
LINE 1: );
        ^

The attach patch contain a fix for it
Regards
Surafel
 

multi_values_inserts_dum_v6.patch (15K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

David Rowley-3
On Thu, 3 Jan 2019 at 01:50, Surafel Temesgen <[hidden email]> wrote:
> On Mon, Dec 31, 2018 at 12:38 PM David Rowley <[hidden email]> wrote:
>> Just looking at the v5 patch, it seems not to handle 0 column tables correctly.

> The attach patch contain a fix for it

+ /* if it is zero-column table then we're done */
+ if (nfields == 0)
+ {
+ archputs(insertStmt->data, fout);
+ continue;
+ }

So looks like you're falling back on one INSERT per row for this case.
Given that this function is meant to be doing 'dump_inserts_multiple'
INSERTs per row, I think the comment should give some details of why
we can't do multi-inserts, and explain the reason for it. "we're done"
is just not enough detail.

I've not looked at the rest of the patch.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: pg_dump multi VALUES INSERT

Surafel Temesgen


On Thu, Jan 3, 2019 at 1:38 AM David Rowley <[hidden email]> wrote:
On Thu, 3 Jan 2019 at 01:50, Surafel Temesgen <[hidden email]> wrote:
> On Mon, Dec 31, 2018 at 12:38 PM David Rowley <[hidden email]> wrote:
>> Just looking at the v5 patch, it seems not to handle 0 column tables correctly.

> The attach patch contain a fix for it

+ /* if it is zero-column table then we're done */
+ if (nfields == 0)
+ {
+ archputs(insertStmt->data, fout);
+ continue;
+ }

So looks like you're falling back on one INSERT per row for this case.
Given that this function is meant to be doing 'dump_inserts_multiple'
INSERTs per row, I think the comment should give some details of why
we can't do multi-inserts, and explain the reason for it. "we're done"
is just not enough detail.


right ,  attach patch add more detail comment

regards
Surafel

multi_values_inserts_dum_v6.patch (15K) Download Attachment
1234