pg_dump multi VALUES INSERT


Re: pg_dump multi VALUES INSERT

David Rowley-3
On Mon, 31 Dec 2018 at 18:58, Surafel Temesgen <[hidden email]> wrote:

> On Fri, Dec 28, 2018 at 6:46 PM Fabien COELHO <[hidden email]> wrote:
>> > At first I also tried to do it that way, but it seemed the function would
>> > become long and more complex
>>
>> Probably. But calling it with size 100 should result in the same behavior,
>> so it is really just an extension of the preceding one? Or am I missing
>> something?
>>
>
> Specifying table data using single-value INSERT statements and user-specified multi-row INSERT statements
> differ enough to demand separate functions; they are not the same thing and should not be implemented
> in the same function. Regarding code duplication, I think the solution is to move that code into a separate function
> and call it in the appropriate places.
I don't really buy this. I've just hacked up a version of
dumpTableData_insert() which supports a variable number of rows per
statement. It seems fairly clean and easy to me. Likely the fact that
this is very possible greatly increases the chances of this getting in
since it gets rid of the code duplication. I did also happen to move
the row building code out of the function into its own function, but
that's not really required. I just did that so I could see all the
code in charge of terminating each statement on my screen without
having to scroll.  I've not touched any of the plumbing work to plug
the rows_per_statement variable into the command line argument. So
it'll need a bit of merge work with the existing patch.   There will
need to be some code that ensures that the user does not attempt to
have 0 rows per statement. The code I wrote won't behave well if that
happens.
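The batching David describes (terminate the statement every N rows, with a possibly short final statement, and no sane behavior for N = 0) can be sketched outside of C. The following Python is an illustrative model of that logic only, not the patch's dumpTableData_insert() code; the function and variable names are hypothetical:

```python
def build_insert_statements(table, rows, rows_per_statement):
    """Group pre-formatted row tuples into multi-row INSERT statements.

    The final statement may carry fewer than rows_per_statement rows,
    which is why the documentation should say "maximum" rows per statement.
    """
    assert rows_per_statement > 0, "0 rows per statement must be rejected"
    statements = []
    for i in range(0, len(rows), rows_per_statement):
        chunk = rows[i:i + rows_per_statement]
        values = ", ".join(
            "(%s)" % ", ".join(str(v) for v in row) for row in chunk)
        statements.append("INSERT INTO %s VALUES %s;" % (table, values))
    return statements
```

For example, three rows with a maximum of two rows per statement produce two statements, the second holding the single leftover row.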

... Checks existing patch ...

I see you added a test, but missed checking for 0. That needs to be fixed.

+ if (dopt.dump_inserts_multiple < 0)
+ {
+ write_msg(NULL, "argument of --insert-multi must be positive number\n");
+ exit_nicely(1);
+ }

I also didn't adopt passing the rows-per-statement into the FETCH.  I
think that's a very bad idea and we should keep that strictly at 100.
I don't see any reason to tie the two together. If a user wants 10
rows per statement, do we really want to FETCH 10 times more often?
And what happens when they want 1 million rows per statement? We've no
reason to run out of memory from this since we're just dumping the
rows out to the archive on each row.
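The decoupling argument can be made concrete with a sketch: rows arrive from the cursor in fixed FETCH batches of 100, while statement boundaries are drawn independently every rows-per-statement rows. This is an illustrative Python model under those assumptions, not pg_dump code:

```python
def fetched_batches(all_rows, fetch_size=100):
    """Simulate 'FETCH 100 FROM cursor': yield rows in fixed-size round trips."""
    for i in range(0, len(all_rows), fetch_size):
        yield all_rows[i:i + fetch_size]

def statements_per_fetch(nrows, fetch_size, rows_per_statement):
    """Count emitted statements; boundaries depend only on rows_per_statement,
    never on how the rows were grouped by FETCH."""
    emitted = 0
    pending = 0
    for batch in fetched_batches(list(range(nrows)), fetch_size):
        for _row in batch:
            pending += 1
            if pending == rows_per_statement:
                emitted += 1  # statement complete, flush to archive
                pending = 0
    if pending:
        emitted += 1  # final, possibly short, statement
    return emitted
```

A request for one million rows per statement simply leaves the counter running across many 100-row fetches; memory use stays bounded because each row is written out as it is seen.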

1.

+        Specify the number of values per <command>INSERT</command> command.
+        This will make the dump file smaller than <option>--inserts</option>
+        and it is faster to reload but lack per row data lost on error
+        instead entire affected insert statement data lost.

Unsure what you mean about "data lost".  It also controls the number
of "rows" per <command>INSERT</command> statement, not the number of
values.

I think it would be fine just to say:

+        When using <option>--inserts</option>, this allows the maximum number
+        of rows per <command>INSERT</command> statement to be specified.
+        This setting defaults to 1.

I've used "maximum" there as the final statement on each table can
have less and also 0-column tables will always be 1 row per statement.

2. Is --insert-multi a good name? What if they do --insert-multi=1?
That's not very "multi".  Is --rows-per-insert better?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment: multi-row-inserts_for_pg_dump_drowley.patch (12K)

Re: pg_dump multi VALUES INSERT

Alvaro Herrera-9
FWIW you can insert multiple zero-column rows with "insert into ..
select union all select union all select".
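For illustration, the UNION ALL trick can be generated mechanically; this hypothetical Python helper builds such a statement for a zero-column table (note that a bare SELECT with no target list is only accepted by PostgreSQL 9.4 and later):

```python
def zero_column_insert(table, nrows):
    """Build a multi-row INSERT for a zero-column table using
    'SELECT UNION ALL SELECT ...' instead of a VALUES list."""
    assert nrows >= 1
    selects = " UNION ALL ".join(["SELECT"] * nrows)
    return "INSERT INTO %s %s;" % (table, selects)
```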

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_dump multi VALUES INSERT

Surafel Temesgen
In reply to this post by David Rowley-3


On Fri, Jan 4, 2019 at 3:08 PM David Rowley <[hidden email]> wrote:
On Mon, 31 Dec 2018 at 18:58, Surafel Temesgen <[hidden email]> wrote:
> On Fri, Dec 28, 2018 at 6:46 PM Fabien COELHO <[hidden email]> wrote:
>> > At first I also tried to do it that way, but it seemed the function would
>> > become long and more complex
>>
>> Probably. But calling it with size 100 should result in the same behavior,
>> so it is really just an extension of the preceding one? Or am I missing
>> something?
>>
>
> Specifying table data using single-value INSERT statements and user-specified multi-row INSERT statements
> differ enough to demand separate functions; they are not the same thing and should not be implemented
> in the same function. Regarding code duplication, I think the solution is to move that code into a separate function
> and call it in the appropriate places.

I don't really buy this. I've just hacked up a version of
dumpTableData_insert() which supports a variable number of rows per
statement. It seems fairly clean and easy to me. Likely the fact that
this is very possible greatly increases the chances of this getting in
since it gets rid of the code duplication. I did also happen to move
the row building code out of the function into its own function, but
that's not really required. I just did that so I could see all the
code in charge of terminating each statement on my screen without
having to scroll.  I've not touched any of the plumbing work to plug
the rows_per_statement variable into the command line argument. So
it'll need a bit of merge work with the existing patch.   There will
need to be some code that ensures that the user does not attempt to
have 0 rows per statement. The code I wrote won't behave well if that
happens.


The attached patch uses your method, mostly.

... Checks existing patch ...

I see you added a test, but missed checking for 0. That needs to be fixed.

+ if (dopt.dump_inserts_multiple < 0)
+ {
+ write_msg(NULL, "argument of --insert-multi must be positive number\n");
+ exit_nicely(1);
+ }


Fixed.

I also didn't adopt passing the rows-per-statement into the FETCH.  I
think that's a very bad idea and we should keep that strictly at 100.
I don't see any reason to tie the two together. If a user wants 10
rows per statement, do we really want to FETCH 10 times more often?
And what happens when they want 1 million rows per statement? We've no
reason to run out of memory from this since we're just dumping the
rows out to the archive on each row.


Okay.

+        Specify the number of values per <command>INSERT</command> command.
+        This will make the dump file smaller than <option>--inserts</option>
+        and it is faster to reload but lack per row data lost on error
+        instead entire affected insert statement data lost.

Unsure what you mean about "data lost".  It also controls the number
of "rows" per <command>INSERT</command> statement, not the number of
values.

I think it would be fine just to say:

+        When using <option>--inserts</option>, this allows the maximum number
+        of rows per <command>INSERT</command> statement to be specified.
+        This setting defaults to 1.



I changed it too, except for "This setting defaults to 1", because it doesn't have a
default value. One row per statement is what the --inserts option means.

2. Is --insert-multi a good name? What if they do --insert-multi=1?
That's not very "multi".  Is --rows-per-insert better?


--rows-per-insert is better for me.

regards
Surafel

Attachment: multi_values_inserts_dum_v7.patch (14K)

Re: pg_dump multi VALUES INSERT

David Rowley-3
On Fri, 18 Jan 2019 at 01:15, Surafel Temesgen <[hidden email]> wrote:
> The attached patch uses your method, mostly.

I disagree with the "mostly" part.  As far as I can see, you took the
idea and then made a series of changes to completely break it.  For
bonus points, you put back my comment change to make it incorrect
again.

Here's what I got after applying your latest patch:

$ pg_dump --table=t --inserts --rows-per-insert=4 postgres

[...]
INSERT INTO public.t VALUES (1);
)
INSERT INTO public.t VALUES (, ( 2);
)
INSERT INTO public.t VALUES (, ( 3);
)
INSERT INTO public.t VALUES (, ( 4);
);
INSERT INTO public.t VALUES (5);
)
INSERT INTO public.t VALUES (, ( 6);
)
INSERT INTO public.t VALUES (, ( 7);
)
INSERT INTO public.t VALUES (, ( 8);
);
INSERT INTO public.t VALUES (9);
)
;

I didn't test, but I'm pretty sure that's not valid INSERT syntax.

I'd suggest taking my changes and doing the plumbing work to tie the
rows_per_statement into the command line arg instead of how I left it
hardcoded as 3.

>> +        When using <option>--inserts</option>, this allows the maximum number
>> +        of rows per <command>INSERT</command> statement to be specified.
>> +        This setting defaults to 1.
>>
> I changed it too, except for "This setting defaults to 1", because it doesn't have a default value.
> One row per statement is what the --inserts option means.

If it does not default to 1 then what happens when the option is not
specified?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

David G Johnston
In reply to this post by Surafel Temesgen
On Thu, Jan 17, 2019 at 5:15 AM Surafel Temesgen <[hidden email]> wrote:

> On Fri, Jan 4, 2019 at 3:08 PM David Rowley <[hidden email]> wrote:
>>
>> On Mon, 31 Dec 2018 at 18:58, Surafel Temesgen <[hidden email]> wrote:
>>
>>
>> 2. Is --insert-multi a good name? What if they do --insert-multi=1?
>> That's not very "multi".  Is --rows-per-insert better?
>>
>
> --rows-per-insert is better for me.

Some thoughts/suggestions:

+ int dump_inserts_multiple;

The option name uses rows, seems like this should mirror that and be
named "dump_inserts_max_rows"

+     <varlistentry>
+      <term><option>--rows-per-insert</option></term>
+      <listitem>
+       <para>
+        When using <option>--rows-per-insert</option>, this allows
the maximum number
+        of rows per <command>INSERT</command> statement to be specified.
+       </para>
+      </listitem>
+     </varlistentry>

"When using <repeat option name from 20 characters ago>..." - no other
option description uses this redundant language and this should not
either.  Just say what it does.

This specifies the maximum number of rows (default 1) that will be
attached to each <command>INSERT</command> command generated by the
<option>--inserts</option> or <option>--column-inserts</option>
options; exactly one of which must be specified as well. (see my note
at the end)

+ printf(_("  --rows-per-insert            number of row per INSERT
command\n"));

(maximum?) number of row[s] per INSERT command

+ qr/\Qpg_dump: option --on-conflict-do-nothing requires option
--inserts , --rows-per-insert or --column-inserts\E/,
+ 'pg_dump: option --on-conflict-do-nothing requires option --inserts
, --rows-per-insert or --column-inserts');

You don't put spaces on both sides of the comma, just after; and add a
comma before the "or" (I think... notwithstanding the below)

I suggest we require that --rows-per-insert be dependent upon exactly
one of --inserts or --column-inserts being set and not let it be set
by itself (in which case the original message for
--on-conflict-do-nothing is OK).

David J.


Re: pg_dump multi VALUES INSERT

Surafel Temesgen
In reply to this post by David Rowley-3


On Fri, Jan 18, 2019 at 7:14 AM David Rowley <[hidden email]> wrote:
On Fri, 18 Jan 2019 at 01:15, Surafel Temesgen <[hidden email]> wrote:
> The attached patch uses your method, mostly.

I disagree with the "mostly" part.  As far as I can see, you took the
idea and then made a series of changes to completely break it.  For
bonus points, you put back my comment change to make it incorrect
again.

Here's what I got after applying your latest patch:

$ pg_dump --table=t --inserts --rows-per-insert=4 postgres

[...]
INSERT INTO public.t VALUES (1);
)
INSERT INTO public.t VALUES (, ( 2);
)
INSERT INTO public.t VALUES (, ( 3);
)
INSERT INTO public.t VALUES (, ( 4);
);
INSERT INTO public.t VALUES (5);
)
INSERT INTO public.t VALUES (, ( 6);
)
INSERT INTO public.t VALUES (, ( 7);
)
INSERT INTO public.t VALUES (, ( 8);
);
INSERT INTO public.t VALUES (9);
)
;

I didn't test, but I'm pretty sure that's not valid INSERT syntax.

This happens because I didn't disallow using the --inserts and --rows-per-insert
options together; it should error out in that case. I corrected it in the attached patch.


I'd suggest taking my changes and doing the plumbing work to tie the
rows_per_statement into the command line arg instead of how I left it
hardcoded as 3.

>> +        When using <option>--inserts</option>, this allows the maximum number
>> +        of rows per <command>INSERT</command> statement to be specified.
>> +        This setting defaults to 1.
>>
> i change it too except "This setting defaults to 1"  because it doesn't have default value.
> 1 row per statement means --inserts option .

If it does not default to 1 then what happens when the option is not
specified?

If the --inserts option is specified, it uses single-value INSERT statements;
otherwise it uses the COPY command.

regards
Surafel

Attachment: multi_values_inserts_dum_v8.patch (15K)

Re: pg_dump multi VALUES INSERT

David Rowley-3
On Fri, 18 Jan 2019 at 19:29, Surafel Temesgen <[hidden email]> wrote:
> This happens because I didn't disallow using the --inserts and --rows-per-insert
> options together; it should error out in that case. I corrected it in the attached patch.

I don't think it should be an error. It's not like the two options
conflict. I imagined that you'd need to specify you want --inserts and
optionally could control how many rows per statement that would be put
in those commands. I'd be surprised to be confronted with an error for
asking for that.

It might be worth doing the same as what we do if --column-inserts is
specified without --inserts. In this case we just do:

/* --column-inserts implies --inserts */
if (dopt.column_inserts)
dopt.dump_inserts = 1;

If you do it that way you'll not need to modify the code much from how
I wrote it. We can likely debate if we want --rows-per-insert to imply
--inserts once there's a working patch.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

Surafel Temesgen


On Fri, Jan 18, 2019 at 2:29 PM David Rowley <[hidden email]> wrote:
On Fri, 18 Jan 2019 at 19:29, Surafel Temesgen <[hidden email]> wrote:
> This happens because I didn't disallow using the --inserts and --rows-per-insert
> options together; it should error out in that case. I corrected it in the attached patch.

I don't think it should be an error. It's not like the two options
conflict. I imagined that you'd need to specify you want --inserts and
optionally could control how many rows per statement that would be put
in those commands. I'd be surprised to be confronted with an error for
asking for that.


If you specified the --inserts option, you have already specified the number of rows
per statement, which is 1. If more than one row per statement is needed, it must be
specified using --rows-per-insert, and specifying one row per statement with the
--inserts option while at the same time specifying a different number of rows per
statement with the --rows-per-insert option seems conflicting to me.

It might be worth doing the same as what we do if --column-inserts is
specified without --inserts. In this case we just do:

/* --column-inserts implies --inserts */
if (dopt.column_inserts)
dopt.dump_inserts = 1;

If you do it that way you'll not need to modify the code much from how
I wrote it. We can likely debate if we want --rows-per-insert to imply
--inserts once there's a working patch.
 

Version 3 of the patch works in a similar way, except it doesn't have two options.

regards
Surafel

Re: pg_dump multi VALUES INSERT

David G Johnston
On Fri, Jan 18, 2019 at 5:02 AM Surafel Temesgen <[hidden email]> wrote:

> On Fri, Jan 18, 2019 at 2:29 PM David Rowley <[hidden email]> wrote:
>>
>> On Fri, 18 Jan 2019 at 19:29, Surafel Temesgen <[hidden email]> wrote:
>> > This happens because I didn't disallow using the --inserts and --rows-per-insert
>> > options together; it should error out in that case. I corrected it in the attached patch.
>>
>> I don't think it should be an error. It's not like the two options
>> conflict. I imagined that you'd need to specify you want --inserts and
>> optionally could control how many rows per statement that would be put
>> in those commands. I'd be surprised to be confronted with an error for
>> asking for that.
>>
>
> If you specified the --inserts option, you have already specified the number of rows
> per statement, which is 1. If more than one row per statement is needed, it must be
> specified using --rows-per-insert, and specifying one row per statement with the
> --inserts option while at the same time specifying a different number of rows per
> statement with the --rows-per-insert option seems conflicting to me.

So, the other way of looking at it: why do we even need an entirely
new option?  Modify --inserts to accept an optional integer value that
defaults to 1 (I'm not sure how tricky dealing with optional option
values is though...).

--column-inserts implies --inserts, but if you want to change the
number of rows you need to specify both (or add the same optional
integer to --column-inserts).

David J.


Re: pg_dump multi VALUES INSERT

David G Johnston
In reply to this post by Fabien COELHO-3
On Tue, Dec 25, 2018 at 4:47 AM Fabien COELHO <[hidden email]> wrote:
> ISTM that command-line switches with optional arguments should be avoided:
> This feature is seldom used (hmmm... 2 existing instances), because it
> interferes with argument processing if such switches are used as the last
> one.

Excellent point; though avoiding adding yet another limited-use option
seems like a fair trade-off here.  Though maybe we also need to add
the traditional "--" option as well. I'm not married to the idea
though; but its also not like mis-interpreting the final argument as
an integer instead of a database is going to be a silent error.

David J.


Re: pg_dump multi VALUES INSERT

David Rowley-3
In reply to this post by Surafel Temesgen
On Sat, 19 Jan 2019 at 01:01, Surafel Temesgen <[hidden email]> wrote:
> If you specified the --inserts option, you have already specified the number of rows
> per statement, which is 1. If more than one row per statement is needed, it must be
> specified using --rows-per-insert, and specifying one row per statement with the
> --inserts option while at the same time specifying a different number of rows per
> statement with the --rows-per-insert option seems conflicting to me.

So you're saying an INSERT, where you insert multiple rows in a single
statement is not an insert? That logic surprises me.  --inserts makes
pg_dump use INSERTs rather than COPY. --rows-per-insert still uses
INSERTs. I'd love to know why you think there's some conflict with
that.

By your logic, you could say --column-inserts and --inserts should
also conflict, but they don't. --column-inserts happens to be coded to
imply --inserts. I really suggest we follow the lead from that. Doing
it this way reduces the complexity of the code where we build the
INSERT statement.  Remember that a patch that is overly complex has
much less chance of making it.  I'd really suggest you keep this as
simple as possible.

It also seems perfectly logical to me to default --rows-per-insert to
1 so that when --inserts is specified we do 1 row per INSERT. If the
user changes that value to something higher then nothing special needs
to happen as the function building the INSERT statement will always be
paying attention to whatever the --rows-per-insert value is set to.
That simplifies the logic meaning you don't need to check if --inserts
was specified.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

Surafel Temesgen


On Tue, Jan 22, 2019 at 3:35 PM David Rowley <[hidden email]> wrote:
On Sat, 19 Jan 2019 at 01:01, Surafel Temesgen <[hidden email]> wrote:
> If you specified the --inserts option, you have already specified the number of rows
> per statement, which is 1. If more than one row per statement is needed, it must be
> specified using --rows-per-insert, and specifying one row per statement with the
> --inserts option while at the same time specifying a different number of rows per
> statement with the --rows-per-insert option seems conflicting to me.

So you're saying an INSERT, where you insert multiple rows in a single
statement is not an insert? That logic surprises me.  --inserts makes
pg_dump use INSERTs rather than COPY. --rows-per-insert still uses
INSERTs. I'd love to know why you think there's some conflict with
that.

By your logic, you could say --column-inserts and --inserts should
also conflict, but they don't. --column-inserts happens to be coded to
imply --inserts. I really suggest we follow the lead from that. Doing
it this way reduces the complexity of the code where we build the
INSERT statement.  Remember that a patch that is overly complex has
much less chance of making it.  I'd really suggest you keep this as
simple as possible.


Okay, I understand it now. Fabien also commented about it upthread; I misunderstood
it as using a separate new option.

It also seems perfectly logical to me to default --rows-per-insert to
1 so that when --inserts is specified we do 1 row per INSERT. If the
user changes that value to something higher then nothing special needs
to happen as the function building the INSERT statement will always be
paying attention to whatever the --rows-per-insert value is set to.
That simplifies the logic meaning you don't need to check if --inserts
was specified.

 
Okay, thank you for explaining. I attach a patch corrected as such.
 

Attachment: multi_values_inserts_dum_v9.patch (15K)

Re: pg_dump multi VALUES INSERT

Alvaro Herrera-9
Nice stuff.

Is it possible to avoid the special case for 0 columns by using the
UNION ALL syntax I showed?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pg_dump multi VALUES INSERT

Fabien COELHO-3
In reply to this post by Surafel Temesgen

Hello Surafel,

> Okay, thank you for explaining. I attach a patch corrected as such.

About this v9: applies cleanly, compiles, global and local "make check"
ok.

The option is not exercised in the TAP tests. I'd suggest that it should be
tested on a small table with zero rows, one row, and more rows than the
configured value. Maybe use -t and other options to reduce the output to the minimum.

About the documentation:

  +   When using <option>--rows-per-insert</option>, this allows the maximum number
  +   of rows per <command>INSERT</command> statement to be specified.

I'd suggest a more direct and simple style, something like:

   Set the maximum number of rows per INSERT statement.
   This option implies --inserts.
   Defaults to 1.

About the help message, the new option expects an argument, but it does
not show:

  +  printf(_("  --rows-per-insert    number of row per INSERT command\n"));

About the code, maybe avoid using an int as a bool, eg:

     ... && !dopt.dump_inserts_multiple)
  -> ... && dopt.dump_inserts_multiple == 0)

Spacing around operators, eg: "!=1)" -> "!= 1)"

ISTM that the "dump_inserts_multiple" field is useless, you can reuse
"dump_inserts" instead, i.e. --inserts sets it to 1 *if zero*, and
--rows-per-inserts=XXX sets it to XXX. That would simplify the code
significantly.

ISTM that there are indentation issues, eg on added

   if (dopt->dump_inserts_multiple == 1) {

The old code is not indented properly.

--
Fabien.


Re: pg_dump multi VALUES INSERT

David Rowley-3
In reply to this post by Alvaro Herrera-9
On Wed, 23 Jan 2019 at 04:08, Alvaro Herrera <[hidden email]> wrote:
> Is it possible to avoid the special case for 0 columns by using the
> UNION ALL syntax I showed?

It would be possible, but my thoughts are that we're moving away from
the SQL standard by doing so.

Looking at the standard I see:

<row value constructor element list> ::=
  <row value constructor element> [ { <comma> <row value constructor element> }... ]

so it appears that multirow VALUES clauses are allowed.

INSERT INTO ... DEFAULT VALUES; is standard too, but looking at
SELECT, neither the target list nor the FROM clause is optional.

You could maybe argue that 0-column tables are not standard anyway.
Going by DROP COLUMN I see "4) C shall be a column of T and C shall
not be the only column of T.". Are we the only database to break that?

I think since pg_dump --inserts is meant to be for importing data into
other databases then we should likely keep it as standard as possible.

Another argument against is that we've only supported empty SELECT
clauses since 9.4, so it may not help anyone who mysteriously wanted
to import data into an old version. Maybe that's a corner case, but
I'm sure 0 column tables are too.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: pg_dump multi VALUES INSERT

David Rowley-3
In reply to this post by Surafel Temesgen
On Wed, 23 Jan 2019 at 02:13, Surafel Temesgen <[hidden email]> wrote:
> Okay, thank you for explaining. I attach a patch corrected as such.

I did a bit of work to this to fix a bunch of things:

1. Docs for --rows-per-insert didn't mention anything about a parameter.
2. You'd not followed the alphabetical order of how the parameters are
documented.
3. Various parts of the docs claimed that --inserts just inserted 1
row per statement. Those needed to be updated.
4. New options out of order in --help. The rest were in alphabetical order.
5. DumpOptions struct variable was not in the right place. It was
grouped in with some parameterless options.
6. Code in dumpTableData_insert() was convoluted. Not sure what you
had added end_of_statement for or why you were checking PQntuples(res)
== 0.  You'd also made the number_of_row variable 1-based and set it
to 1 when we had added 0 rows. You then checked for the existence of 1
row by checking the variable was > 1... That made very little sense to
me. I've pretty much put back the code that I had sent to you
previously, just without the part where I split the row building code
out into another function.
7. A comment in dumpTableData_insert() claimed that the insertStmt
would end in "VALUES(", but it'll end in "VALUES ". I had updated that
in my last version, but you must have missed that.
8. I've made it so --rows-per-insert implies --inserts. This is
aligned with how --column-inserts behaves.

I changed a few other things. I simplified the condition that raises
an error when someone does --on-conflict-do-nothing without the
--inserts option. There was no need to check for the other options
that imply --inserts since that will already be enabled if one of the
other options has.

I also removed most of the error checking you'd added to ensure that
the --rows-per-insert parameter was a number.  I'd have kept this but
I saw that we do nothing that special for the compression option. I
didn't see why --rows-per-insert was any more special. It was quite a
bit of code for very little reward.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment: pg_dump-rows-per-insert-option_v10.patch (19K)

Re: pg_dump multi VALUES INSERT

Fabien COELHO-3

Hello David & Surafel,

About this v10:

Patch applies and compiles cleanly, local & global "make check" ok.

A few comments, possibly redundant with some already in the thread.

Out of abc-order rows-per-inserts option in getopt struct.

help string does not specify the expected argument.

I still think that the added rows_per_insert field is useless; ISTM that
"int dump_inserts" can be reused, and you could also drop the boolean
got_rows_per_insert.

The feature is not tested anywhere. I still think that there should be a
test on empty/small/larger-than-rows-per-insert tables, possibly added to
existing TAP-tests.

--
Fabien.


Re: pg_dump multi VALUES INSERT

David Rowley-3
On Wed, 23 Jan 2019 at 22:08, Fabien COELHO <[hidden email]> wrote:
> Out of abc-order rows-per-inserts option in getopt struct.

I missed that. Thanks. Fixed in attached.

> help string does not specify the expected argument.

Also fixed.

> I still think that the added rows_per_insert field is useless, ISTM That
> "int dump_inserts" can be reused, and you could also drop boolean
> got_rows_per_insert.

I thought about this and looked into it, but I decided it didn't look
like a smart thing to do.  The reason is that if --inserts sets
dump_inserts to 1 then --rows-per-insert sets it to something else
that's likely fine, but if that happens in the opposite order then the
--rows-per-insert gets overwritten with 1. The bad news is the order
that happens is defined by the order of the command line args.  It
might be possible to make it work by having --inserts set some other
variable, then set dump_inserts to 1 if it's set to 0 and the other
variable is set to >= 1... but that requires another variable,  which
is what you want to avoid... I think it's best to have a variable per
argument.

I could get rid of the got_rows_per_insert variable, but it would
require setting the default value for rows_per_insert in the main()
function rather than in InitDumpOptions().   I thought
InitDumpOptions() looked like just the place to do this, so went with
that option.  To make it work without got_rows_per_insert,
rows_per_insert would have to be 0 by default and we'd know we saw a
--rows-per-insert command line arg by the fact that rows_per_insert
was non-zero.  Would you rather have it that way?

> The feature is not tested anywhere. I still think that there should be a
> test on empty/small/larger-than-rows-per-insert tables, possibly added to
> existing TAP-tests.

I was hoping to get away with not having to do that... mainly because
I've no idea how.   Please have at it if you know how it's done.

FWIW I looked at 002_pg_dump.pl and did add a test, but I was unable
to make it pass because I couldn't really figure out how the regex
matching is meant to work. It does not seem to be explained very well
in the comments and I lack patience for Perl.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment: pg_dump-rows-per-insert-option_v11.patch (18K)

Re: pg_dump multi VALUES INSERT

Fabien COELHO-3

Hello David,

> I thought about this and looked into it, but I decided it didn't look
> like a smart thing to do.  The reason is that if --inserts sets
> dump_inserts to 1 then --rows-per-insert sets it to something else
> that's likely fine, but if that happens in the opposite order then the
> --rows-per-insert gets overwritten with 1.

You can test before doing that!

   case X:
     if (opt.dump_inserts == 0)
       opt.dump_inserts = 1;
     // otherwise option is already set

> The bad news is the order that happens is defined by the order of the
> command line args.

> It might be possible to make it work by having --inserts set some other
> variable,

ISTM that it is enough to test whether the variable is zero.

> then set dump_inserts to 1 if it's set to 0 and the other variable is
> set to >= 1... but that requires another variable, which is what you
> want to avoid...

I still do not understand the need for another variable.

   int ninserts = 0; // default is to use copy
   while (getopt...)
   {
     switch (...) {
       case "--inserts":
         if (ninserts == 0) ninserts = 1;
         break;
       case "--rows-per-insert":
         ninserts = arg_value;
         checks...
         break;
      ...

> I think it's best to have a variable per argument.

I disagree, because it adds complexity where none is needed: here the new
option is an extension of a previous one, so the previous one just
becomes a particular case, and it seems simpler to manage it as the
particular case it is rather than as a special case, which creates the need
for consistency checks between the two variables if both are used.
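The single-variable scheme sketched above can be modeled to show it is order-independent, which was the concern raised earlier in the thread. This is an illustrative Python sketch of the argument handling (the real code is C in pg_dump's main()); the function name is hypothetical:

```python
def parse_dump_inserts(argv):
    """Handle --inserts / --rows-per-insert with one counter, per Fabien's
    sketch: 0 means use COPY, 1 means plain --inserts, N > 1 means N rows
    per INSERT statement."""
    ninserts = 0  # default is to use COPY
    for arg in argv:
        if arg == "--inserts":
            if ninserts == 0:
                ninserts = 1  # don't clobber an earlier --rows-per-insert
        elif arg.startswith("--rows-per-insert="):
            ninserts = int(arg.split("=", 1)[1])
            if ninserts <= 0:
                raise ValueError("rows-per-insert must be a positive number")
    return ninserts
```

Because --inserts only promotes the counter from 0 to 1 and never overwrites a larger value, the result is the same whichever order the two options appear in.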

> I could get rid of the got_rows_per_insert variable, but it would
> require setting the default value for rows_per_insert in the main()
> function rather than in InitDumpOptions().   I thought
> InitDumpOptions() looked like just the place to do this, so went with
> that option.  To make it work without got_rows_per_insert,
> rows_per_insert would have to be 0 by default and we'd know we saw a
> --rows-per-insert command line arg by the fact that rows_per_insert
> was non-zero.  Would you rather have it that way?

Yep, esp as rows_per_insert & dump_inserts could be the same.

>> The feature is not tested anywhere. I still think that there should be a
>> test on empty/small/larger-than-rows-per-insert tables, possibly added to
>> existing TAP-tests.
>
> I was hoping to get away with not having to do that... mainly because
> I've no idea how.

Hmmm. That is another question! Maybe someone will help.

--
Fabien.


Re: pg_dump multi VALUES INSERT

David Rowley-3
On Thu, 24 Jan 2019 at 04:45, Fabien COELHO <[hidden email]> wrote:

> I still do not understand the need for another variable.
>
>    int ninserts = 0; // default is to use copy
>    while (getopt...)
>    {
>      switch (...) {
>        case "--inserts":
>          if (ninserts == 0) ninserts = 1;
>          break;
>        case "--rows-per-insert":
>          ninserts = arg_value;
>          checks...
>          break;
>       ...
I didn't think of that. Attached is a version that changes it to work
along those lines.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment: pg_dump-rows-per-insert-option_v12.patch (18K)