pg_dump additional options for performance


pg_dump additional options for performance

Stephen Frost
Simon,

  I agree with adding these options in general, since I find myself
  frustrated by having to vi huge dumps to change simple schema things.
  A couple of comments on the patch though:

  - Conflicting option handling
    I think we are doing our users a disservice by putting it on them to
    figure out exactly what:
        multiple object groups cannot be used together
    means to them.  You and I may understand what an "object group" is,
    and why there can be only one, but it's a great deal less clear than
    the prior message of
        options -s/--schema-only and -a/--data-only cannot be used together
    My suggestion would be to either list out the specific options which
    can't be used together, as was done previously, or add a bit of (I
    realize, boring) code and actually tell the user which of the
    conflicting options were used.

  - Documentation
    When writing the documentation I would stress that "pre-schema" and
    "post-schema" be defined in terms of PostgreSQL objects and why they
    are pre vs. post.

  - Technically, the patch needs to be updated slightly since another
    pg_dump-related patch was committed recently which also added
    options and thus causes a conflict.

  Beyond those minor points, the patch looks good to me.
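
  The second suggestion above (telling the user which of the conflicting
  options were actually used) could look roughly like the sketch below.
  It is written in Python for brevity rather than pg_dump's C, and the
  function name conflict_message and the way the used options are
  collected are assumptions for illustration, not code from the patch.

```python
def conflict_message(used_options):
    """Return an error string naming the conflicting options, or None.

    `used_options` is the list of mutually exclusive switches the user
    actually passed, in the order they appeared on the command line.
    (Hypothetical helper; not part of pg_dump.)
    """
    if len(used_options) < 2:
        # Zero or one switch from the exclusive group: nothing conflicts.
        return None
    names = " and ".join(used_options)
    return f"options {names} cannot be used together"


print(conflict_message(["-s/--schema-only", "-a/--data-only"]))
```

  With the two classic switches this reproduces the wording of the prior
  message, and it extends to any new switch without another hand-written
  string.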

  Thanks,

                Stephen


Re: pg_dump additional options for performance

Simon Riggs

On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote:

> [...]

Thanks for the review. I'll make the changes you suggest.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


--
Sent via pgsql-patches mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Re: pg_dump additional options for performance

Simon Riggs

On Sun, 2008-07-20 at 05:47 +0100, Simon Riggs wrote:

> On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote:
> > [...]
>
> Thanks for the review. I'll make the changes you suggest.

Patch updated to head, plus changes/docs requested.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



pg_dump_prepost.v3.patch (38K) Download Attachment

Re: pg_dump additional options for performance

Stephen Frost
Simon,

* Simon Riggs ([hidden email]) wrote:
> On Sun, 2008-07-20 at 05:47 +0100, Simon Riggs wrote:
> > On Sat, 2008-07-19 at 23:07 -0400, Stephen Frost wrote:
[...]
> > >   - Conflicting option handling

Thanks for putting in the extra code to explicitly indicate which
conflicting options were used.

> > >   - Documentation
> > > When writing the documentation I would stress that "pre-schema" and
> > > "post-schema" be defined in terms of PostgreSQL objects and why they
> > > are pre vs. post.

Perhaps this is up for some debate, but I find the documentation added
for these options to be lacking the definitions I was looking for, and
the explanation of why they are what they are.  I'm also not sure I
agree with the "Pre-Schema" and "Post-Schema" nomenclature as it doesn't
really fit with the option names or what they do.  Would you consider:

<term><option>--schema-pre-load</option></term>
<listitem>
 <para>
   Pre-Data Load - Minimum amount of the schema required before data
   loading can begin.  This consists mainly of creating the tables
   using <command>CREATE TABLE</command>.
   This part can be requested using <option>--schema-pre-load</>.
 </para>
</listitem>

<term><option>--schema-post-load</option></term>
<listitem>
 <para>
   Post-Data Load - The rest of the schema definition, including keys,
   indexes, etc.  By putting keys and indexes after the data has been
   loaded the whole process of restoring data is much faster.  This is
   because it is faster to build indexes and check keys in bulk than
   piecemeal as the data is loaded.
   This part can be requested using <option>--schema-post-load</>.
 </para>
</listitem>

Even this doesn't cover everything though- it's too focused on tables
and data loading.  Where do functions go?  What about types?

A couple of additional points:

  - The command-line help hasn't been updated.  Clearly, that also needs
    to be done to consider the documentation aspect complete.

  - There appear to be some mistakenly included additions.  The
    patch to pg_restore.sgml attempts to add in documentation for
    --superuser.  I'm guessing that was unintentional, and looks like
    just a mistaken extra copy&paste.

> > >   - Technically, the patch needs to be updated slightly since another
> > > pg_dump-related patch was committed recently which also added
> > > options and thus causes a conflict.

I think this might have just happened again, funny enough.  It's
something that a committer could perhaps fix, but if you're reworking
the patch anyway...

        Thanks,

                Stephen


Re: pg_dump additional options for performance

Simon Riggs

On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote:

> Perhaps this is up for some debate, but I find the documentation added
> for these options to be lacking the definitions I was looking for, and
> the explanation of why they are what they are.  I'm also not sure I
> agree with the "Pre-Schema" and "Post-Schema" nomenclature as it doesn't
> really fit with the option names or what they do.  Would you consider:

Will reword.

> Even this doesn't cover everything though- it's too focused on tables
> and data loading.  Where do functions go?  What about types?

Yes, it is focused on tables and data loading. What about
functions/types? No relevance here.

>   - The command-line help hasn't been updated.  Clearly, that also needs
>     to be done to consider the documentation aspect complete.
>
>   - There appear to be some mistakenly included additions.  The
>     patch to pg_restore.sgml attempts to add in documentation for
>     --superuser.  I'm guessing that was unintentional, and looks like
>     just a mistaken extra copy&paste.

Thanks, will do.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: pg_dump additional options for performance

Stephen Frost
* Simon Riggs ([hidden email]) wrote:
> On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote:
> > Even this doesn't cover everything though- it's too focused on tables
> > and data loading.  Where do functions go?  What about types?
>
> Yes, it is focused on tables and data loading. What about
> functions/types? No relevance here.

I don't see how they're not relevant, it's not like they're being
excluded and in fact they show up in the pre-load output.  Heck, even if
they *were* excluded, that should be made clear in the documentation
(either be an explicit include list, or saying they're excluded).

Part of what's driving this is making sure we have a plan for future
objects and where they'll go.  Perhaps it would be enough to just say
"pre-load is everything in the schema, except things which are faster
done in bulk (eg: indexes, keys)".  I don't think it's right to say
pre-load is "only object definitions required to load data" when it
includes functions and ACLs though.

Hopefully my suggestion and these comments will get us to a happy
middle-ground.
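
One way to read the rule proposed above ("pre-load is everything in the
schema, except things which are faster done in bulk") is as a default
plus a short list of exceptions.  A minimal Python sketch under that
reading follows; the set DEFERRED_KINDS and the function section_for are
hypothetical, and the split the patch actually makes may differ.

```python
# Object kinds that this thread describes as faster to create in bulk
# after the data is loaded.  Illustrative assumption only; this is not
# the list pg_dump actually uses.
DEFERRED_KINDS = {"INDEX", "PRIMARY KEY", "FOREIGN KEY"}


def section_for(kind):
    """Classify a schema object kind as 'pre-load' or 'post-load'."""
    return "post-load" if kind in DEFERRED_KINDS else "pre-load"


for kind in ["TABLE", "FUNCTION", "TYPE", "ACL", "INDEX", "FOREIGN KEY"]:
    print(f"{kind}: {section_for(kind)}")
```

Under this rule any future object kind lands in pre-load unless it is
explicitly added to the deferred set, which is the kind of plan for
future objects asked for above.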

        Thanks,

                Stephen


Re: pg_dump additional options for performance

David Gould
On Sun, Jul 20, 2008 at 09:18:29PM -0400, Stephen Frost wrote:

> * Simon Riggs ([hidden email]) wrote:
> > On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote:
> > > Even this doesn't cover everything though- it's too focused on tables
> > > and data loading.  Where do functions go?  What about types?
> >
> > Yes, it is focused on tables and data loading. What about
> > functions/types? No relevance here.
>
> I don't see how they're not relevant, it's not like they're being
> excluded and in fact they show up in the pre-load output.  Heck, even if
> they *were* excluded, that should be made clear in the documentation
> (either be an explicit include list, or saying they're excluded).
>
> Part of what's driving this is making sure we have a plan for future
> objects and where they'll go.  Perhaps it would be enough to just say
> "pre-load is everything in the schema, except things which are faster
> done in bulk (eg: indexes, keys)".  I don't think it's right to say
> pre-load is "only object definitions required to load data" when it
> includes functions and ACLs though.
>
> Hopefully my suggestion and these comments will get us to a happy
> middle-ground.

One observation, indexes should be built right after the table data
is loaded for each table, this way, the index build gets a hot cache
for the table data instead of having to re-read it later as we do now.
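
The difference between the two orderings can be sketched as below.  This
is a toy Python illustration only: the table names and both function
names are made up, and it models step order, not cache behaviour or
actual pg_dump output.

```python
def traditional_order(tables):
    """All table data first, then all index builds."""
    steps = [("load", t) for t in tables]
    steps += [("index", t) for t in tables]
    return steps


def interleaved_order(tables):
    """Each table's indexes are built right after that table is loaded."""
    steps = []
    for t in tables:
        steps.append(("load", t))
        steps.append(("index", t))
    return steps


tables = ["orders", "customers"]  # hypothetical table names
print(traditional_order(tables))
print(interleaved_order(tables))
```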

-dg
 


--
David Gould       [hidden email]      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.


Re: pg_dump additional options for performance

Stephen Frost
* daveg ([hidden email]) wrote:
> One observation, indexes should be built right after the table data
> is loaded for each table, this way, the index build gets a hot cache
> for the table data instead of having to re-read it later as we do now.

That's not how pg_dump has traditionally worked, and the point of this
patch is to add options to easily segregate the main pieces of the
existing pg_dump output (main schema definition, data dump, key/index
building).  Your suggestion brings up an interesting point: should
pg_dump's traditional output structure change, the "--schema-post-load"
set of objects wouldn't be as clear to newcomers, since the load and the
indexes would be interleaved in the regular output.

I'd be curious about the performance impact this has on an actual load
too.  It would probably be more valuable on smaller loads where it would
have less of an impact anyway than on loads larger than the cache size.
Still, not an issue for this patch, imv.

        Thanks,

                Stephen


Re: pg_dump additional options for performance

Tom Lane
Stephen Frost <[hidden email]> writes:
> * daveg ([hidden email]) wrote:
>> One observation, indexes should be built right after the table data
>> is loaded for each table, this way, the index build gets a hot cache
>> for the table data instead of having to re-read it later as we do now.

> That's not how pg_dump has traditionally worked, and the point of this
> patch is to add options to easily segregate the main pieces of the
> existing pg_dump output (main schema definition, data dump, key/index
> building).  Your suggestion brings up an interesting point: should
> pg_dump's traditional output structure change, the "--schema-post-load"
> set of objects wouldn't be as clear to newcomers, since the load and the
> indexes would be interleaved in the regular output.

Yeah.  Also, that is pushing into an entirely different line of
development, which is to enable multithreaded pg_restore.  The patch
at hand is necessarily incompatible with that type of operation, and
wouldn't be used together with it.

As far as the documentation/definition aspect goes, I think it should
just say the parts are
        * stuff needed before you can load the data
        * the data
        * stuff needed after loading the data
and not try to be any more specific than that.  There are corner cases
that will turn any simple breakdown into a lie, and I doubt that it's
worth trying to explain them all.  (Take a close look at the dependency
loop breaking logic in pg_dump if you doubt this.)

I hadn't realized that Simon was using "pre-schema" and "post-schema"
to name the first and third parts.  I'd agree that this is confusing
nomenclature: it looks like it's trying to say that the data is the
schema, and the schema is not!  How about "pre-data" and "post-data"?

                        regards, tom lane


Re: pg_dump additional options for performance

Simon Riggs
In reply to this post by Stephen Frost

On Sun, 2008-07-20 at 21:18 -0400, Stephen Frost wrote:

> * Simon Riggs ([hidden email]) wrote:
> > On Sun, 2008-07-20 at 17:43 -0400, Stephen Frost wrote:
> > > Even this doesn't cover everything though- it's too focused on tables
> > > and data loading.  Where do functions go?  What about types?
> >
> > Yes, it is focused on tables and data loading. What about
> > functions/types? No relevance here.
>
> I don't see how they're not relevant, it's not like they're being
> excluded and in fact they show up in the pre-load output.  Heck, even if
> they *were* excluded, that should be made clear in the documentation
> (either be an explicit include list, or saying they're excluded).
>
> Part of what's driving this is making sure we have a plan for future
> objects and where they'll go.  Perhaps it would be enough to just say
> "pre-load is everything in the schema, except things which are faster
> done in bulk (eg: indexes, keys)".  I don't think it's right to say
> pre-load is "only object definitions required to load data" when it
> includes functions and ACLs though.
>
> Hopefully my suggestion and these comments will get us to a happy
> middle-ground.

I don't really understand what you're saying.

The options split the dump into 3 parts that's all: before the load, the
load and after the load.

--schema-pre-load says
"Dumps exactly what <option>--schema-only</> would dump, but only those
statements before the data load."

What is it you are suggesting? I'm unclear.
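
Read literally, the wording quoted above defines the pre-load part by
position in the dump rather than by what the data load requires.  A
minimal Python sketch of that positional split is below; the (kind, name)
tuples, the "TABLE DATA" label, the example object names, and the
assumption that all data entries are contiguous are illustrative, not
taken from the patch.

```python
def split_by_position(entries):
    """Split an ordered dump into (pre, data, post) purely by position.

    `entries` is a list of (kind, name) tuples in dump order.  Entries
    whose kind is "TABLE DATA" are assumed to form one contiguous run.
    """
    data_idx = [i for i, (kind, _) in enumerate(entries) if kind == "TABLE DATA"]
    if not data_idx:
        # No data at all: everything counts as "before the load".
        return list(entries), [], []
    first, last = data_idx[0], data_idx[-1]
    return entries[:first], entries[first:last + 1], entries[last + 1:]


dump = [
    ("TABLE", "t"),
    ("FUNCTION", "f"),
    ("TABLE DATA", "t"),
    ("INDEX", "t_idx"),
]
pre, data, post = split_by_position(dump)
print(pre)
print(data)
print(post)
```

Here the function lands in the first part simply because it precedes the
data, whether or not the load needs it.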

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: pg_dump additional options for performance

Simon Riggs
In reply to this post by Tom Lane

On Sun, 2008-07-20 at 23:34 -0400, Tom Lane wrote:

> Stephen Frost <[hidden email]> writes:
> > * daveg ([hidden email]) wrote:
> >> One observation, indexes should be built right after the table data
> >> is loaded for each table, this way, the index build gets a hot cache
> >> for the table data instead of having to re-read it later as we do now.
>
> > That's not how pg_dump has traditionally worked, and the point of this
> > patch is to add options to easily segregate the main pieces of the
> > existing pg_dump output (main schema definition, data dump, key/index
> > building).  Your suggestion brings up an interesting point: should
> > pg_dump's traditional output structure change, the "--schema-post-load"
> > set of objects wouldn't be as clear to newcomers, since the load and the
> > indexes would be interleaved in the regular output.

Stephen: Agreed.

> Yeah.  Also, that is pushing into an entirely different line of
> development, which is to enable multithreaded pg_restore.  The patch
> at hand is necessarily incompatible with that type of operation, and
> wouldn't be used together with it.
>
> As far as the documentation/definition aspect goes, I think it should
> just say the parts are
> * stuff needed before you can load the data
> * the data
> * stuff needed after loading the data
> and not try to be any more specific than that.  There are corner cases
> that will turn any simple breakdown into a lie, and I doubt that it's
> worth trying to explain them all.  (Take a close look at the dependency
> loop breaking logic in pg_dump if you doubt this.)

Tom: Agreed.

> I hadn't realized that Simon was using "pre-schema" and "post-schema"
> to name the first and third parts.  I'd agree that this is confusing
> nomenclature: it looks like it's trying to say that the data is the
> schema, and the schema is not!  How about "pre-data" and "post-data"?

OK by me. Any other takers?



I also suggested having three options
--want-pre-schema
--want-data
--want-post-schema
so we could ask for any or all parts in the one dump. --data-only and
--schema-only are negative options so don't allow this.
(I don't like those names either, just thinking about capabilities)

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: pg_dump additional options for performance

Tom Lane
Simon Riggs <[hidden email]> writes:
> I also suggested having three options
> --want-pre-schema
> --want-data
> --want-post-schema
> so we could ask for any or all parts in the one dump. --data-only and
> --schema-only are negative options so don't allow this.
> (I don't like those names either, just thinking about capabilities)

Maybe invert the logic?

        --omit-pre-data
        --omit-data
        --omit-post-data

Not wedded to these either, just tossing out an idea...

                        regards, tom lane


Re: pg_dump additional options for performance

Stephen Frost
In reply to this post by Simon Riggs
* Simon Riggs ([hidden email]) wrote:
> The options split the dump into 3 parts that's all: before the load, the
> load and after the load.
>
> --schema-pre-load says
> "Dumps exactly what <option>--schema-only</> would dump, but only those
> statements before the data load."
>
> What is it you are suggesting? I'm unclear.

That part is fine, the problem is that elsewhere in the documentation
(patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you
change it to be "objects required before data loading", which isn't the
same.

        Thanks,

                Stephen


Re: pg_dump additional options for performance

Stephen Frost
In reply to this post by Tom Lane
Tom,

* Tom Lane ([hidden email]) wrote:
> As far as the documentation/definition aspect goes, I think it should
> just say the parts are
> * stuff needed before you can load the data
> * the data
> * stuff needed after loading the data
> and not try to be any more specific than that.  There are corner cases
> that will turn any simple breakdown into a lie, and I doubt that it's
> worth trying to explain them all.  (Take a close look at the dependency
> loop breaking logic in pg_dump if you doubt this.)

Even that is a lie though, which I guess is what my problem is.  It's
really "everything for the schema, except stuff that is better done in
bulk", I believe.  Also, I'm a bit concerned about people who would
argue that you need PKs and FKs before you can load the data.  Probably
couldn't be avoided tho.

> I hadn't realized that Simon was using "pre-schema" and "post-schema"
> to name the first and third parts.  I'd agree that this is confusing
> nomenclature: it looks like it's trying to say that the data is the
> schema, and the schema is not!  How about "pre-data" and "post-data"?

Argh.  The command-line options follow the 'data'/'load' line
(--schema-pre-load and --schema-post-load), and so I think those are
fine.  The problem was that in the documentation he switched to saying
they were "Pre-Schema" and "Post-Schema", which could lead to confusion.

        Thanks,

                Stephen


Re: pg_dump additional options for performance

Stephen Frost
In reply to this post by Simon Riggs
Simon,

* Simon Riggs ([hidden email]) wrote:
> > I hadn't realized that Simon was using "pre-schema" and "post-schema"
> > to name the first and third parts.  I'd agree that this is confusing
> > nomenclature: it looks like it's trying to say that the data is the
> > schema, and the schema is not!  How about "pre-data" and "post-data"?
>
> OK by me. Any other takers?

Having the command-line options be "--schema-pre-data" and
"--schema-post-data" is fine with me.  Leaving them the way they are is
also fine by me.  It's the documentation (back to pg_dump.sgml,
~774/~797) that starts talking about Pre-Schema and Post-Schema.

        Thanks,

                Stephen


Re: pg_dump additional options for performance

Andrew Dunstan
In reply to this post by Tom Lane

Tom Lane wrote:

> Simon Riggs <[hidden email]> writes:
>> I also suggested having three options
>> --want-pre-schema
>> --want-data
>> --want-post-schema
>> so we could ask for any or all parts in the one dump. --data-only and
>> --schema-only are negative options so don't allow this.
>> (I don't like those names either, just thinking about capabilities)
>
> Maybe invert the logic?
>
> --omit-pre-data
> --omit-data
> --omit-post-data
>
> Not wedded to these either, just tossing out an idea...

Please, no. Negative logic seems likely to cause endless confusion.

I'd even be happier with --schema-part-1 and --schema-part-2 if we can't
find some more expressive way of designating them.

cheers

andrew


Re: pg_dump additional options for performance

Tom Lane
Andrew Dunstan <[hidden email]> writes:
> Tom Lane wrote:
>> Maybe invert the logic?
>> --omit-pre-data
>> --omit-data
>> --omit-post-data

> Please, no. Negative logic seems likely to cause endless confusion.

I think it might actually be less confusing, because with this approach,
each switch has an identifiable default (no) and setting it doesn't
cause side-effects on settings of other switches.  The interactions of
the switches as Simon presents 'em seem less than obvious.
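
The property described above (each switch defaults to "no" and does not
affect the others) can be sketched with three independent boolean flags.
This is a Python illustration using argparse; the --omit-* names are the
ones floated in this thread, not existing pg_dump options, and
dump-sketch, build_parser and sections_to_dump are hypothetical names.

```python
import argparse

SECTIONS = ["pre-data", "data", "post-data"]


def build_parser():
    parser = argparse.ArgumentParser(prog="dump-sketch")
    for section in SECTIONS:
        # Each switch is independent and defaults to False ("no").
        parser.add_argument(f"--omit-{section}", action="store_true")
    return parser


def sections_to_dump(argv):
    """Return the sections left after applying the --omit-* switches."""
    args = build_parser().parse_args(argv)
    # argparse stores --omit-pre-data as the attribute omit_pre_data.
    return [s for s in SECTIONS
            if not getattr(args, "omit_" + s.replace("-", "_"))]


print(sections_to_dump([]))
print(sections_to_dump(["--omit-data"]))
```

With no switches everything is dumped, and each switch removes exactly
one part regardless of which others are set.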

                        regards, tom lane


Re: pg_dump additional options for performance

Simon Riggs
In reply to this post by Stephen Frost

On Mon, 2008-07-21 at 07:46 -0400, Stephen Frost wrote:

> * Simon Riggs ([hidden email]) wrote:
> > The options split the dump into 3 parts that's all: before the load, the
> > load and after the load.
> >
> > --schema-pre-load says
> > "Dumps exactly what <option>--schema-only</> would dump, but only those
> > statements before the data load."
> >
> > What is it you are suggesting? I'm unclear.
>
> That part is fine, the problem is that elsewhere in the documentation
> (patch line starting ~774 before, ~797 after, to the pg_dump.sgml) you
> change it to be "objects required before data loading", which isn't the
> same.

OK, gotcha now - will change that. I thought you might mean something
about changing the output itself.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: pg_dump additional options for performance

Tom Lane
In reply to this post by Stephen Frost
Stephen Frost <[hidden email]> writes:
> * Tom Lane ([hidden email]) wrote:
>> As far as the documentation/definition aspect goes, I think it should
>> just say the parts are
>> * stuff needed before you can load the data
>> * the data
>> * stuff needed after loading the data

> Even that is a lie though, which I guess is what my problem is.

True; the stuff done after is done that way at least in part for
performance reasons rather than because it has to be done that way.
(I think it's not only performance issues, though --- for circular
FKs you pretty much have to load the data first.)

>> I hadn't realized that Simon was using "pre-schema" and "post-schema"
>> to name the first and third parts.  I'd agree that this is confusing
>> nomenclature: it looks like it's trying to say that the data is the
>> schema, and the schema is not!  How about "pre-data" and "post-data"?

> Argh.  The command-line options follow the 'data'/'load' line
> (--schema-pre-load and --schema-post-load), and so I think those are
> fine.  The problem was that in the documentation he switched to saying
> they were "Pre-Schema" and "Post-Schema", which could lead to confusion.

Ah, I see.  No objection to those switch names, at least assuming we
want to stick to positive-logic switches.  What did you think of the
negative-logic suggestion (--omit-xxx)?

                        regards, tom lane


Re: pg_dump additional options for performance

Stephen Frost
Tom, et al,

* Tom Lane ([hidden email]) wrote:
> Ah, I see.  No objection to those switch names, at least assuming we
> want to stick to positive-logic switches.  What did you think of the
> negative-logic suggestion (--omit-xxx)?

My preference is for positive-logic switches in general.  In the place
where I would use this patch, --omit-xxxx switches would mean passing
more options.  I expect that would hold true for most people.
It would be:

  --omit-data --omit-post-load
  --omit-pre-load --omit-post-load
  --omit-pre-load --omit-data

vs.

  --schema-pre-load
  --data-only
  --schema-post-load

Point being that I'd be dumping these into separate files where I could
more easily manipulate the pre-load or post-load files.  I'd still want
pre/post load to be separate though, since this would be used in cases
where there's a lot of data (hence the reason for the split) and putting
pre and post together and running them before data would slow things
down quite a bit.

Are there use cases for just --omit-post-load or --omit-pre-load?
Probably, but I just don't see any situation where I'd use them like
that.
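
The mapping listed above can be generated mechanically, which shows why
the negative form needs two switches for each single-part dump.  A small
Python sketch, where PARTS and omit_flags_for are hypothetical names and
the --omit-* switches are only the proposal under discussion:

```python
PARTS = ["pre-load", "data", "post-load"]


def omit_flags_for(wanted):
    """Return the --omit-* switches needed to dump only the `wanted` part."""
    return ["--omit-" + part for part in PARTS if part != wanted]


for part in PARTS:
    print(part, "->", " ".join(omit_flags_for(part)))
```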

        Thanks,

                Stephen
