Speeding up pg_upgrade

Speeding up pg_upgrade

Bruce Momjian
As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
zero-downtime upgrades.  After the usual discussion of using logical
replication, Slony, and perhaps having the server be able to read old
and new system catalogs, we discussed speeding up pg_upgrade.

There are clusters that take a long time to dump the schema from the old
cluster and recreate it in the new cluster.  One idea of speeding up
pg_upgrade would be to allow pg_upgrade to be run in two stages:

1.  prevent system catalog changes while the old cluster is running, and
dump the old cluster's schema and restore it in the new cluster

2.  shut down the old cluster and copy/link the data files
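
For illustration, a rough sketch of what those two stages could look like
driven from a wrapper script -- the ports, paths, and use of plain
pg_dumpall/psql are made up for the example; pg_upgrade itself runs
pg_dump --binary-upgrade per database rather than this:

    #!/usr/bin/env python3
    # Rough sketch of the proposed two-stage flow, NOT what pg_upgrade runs
    # internally.  Ports, paths, and binary locations are made-up examples;
    # the new cluster is assumed to be initdb'd and already running.
    import os
    import subprocess

    OLD_BIN = "/usr/lib/postgresql/9.6/bin"
    NEW_BIN = "/usr/lib/postgresql/10/bin"
    OLD_DATA = "/var/lib/postgresql/9.6/main"
    OLD_PORT, NEW_PORT = "5432", "5433"

    def stage1_dump_restore():
        # Stage 1: with catalog changes frozen on the old cluster, copy the
        # schema into the new cluster while the old one keeps serving traffic.
        dump = subprocess.run(
            [os.path.join(OLD_BIN, "pg_dumpall"), "--schema-only", "-p", OLD_PORT],
            check=True, capture_output=True, text=True)
        subprocess.run(
            [os.path.join(NEW_BIN, "psql"), "-p", NEW_PORT, "-d", "postgres", "-f", "-"],
            input=dump.stdout, check=True, text=True)

    def stage2_stop_and_transfer():
        # Stage 2: only now does the old cluster have to go down; the data
        # files are then hard-linked or copied into the new cluster, offline.
        subprocess.run(
            [os.path.join(OLD_BIN, "pg_ctl"), "-D", OLD_DATA, "stop", "-m", "fast"],
            check=True)
        # ... link/copy of the relation files happens here ...

    if __name__ == "__main__":
        stage1_dump_restore()
        stage2_stop_and_transfer()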

My question is whether the schema dump/restore is time-consuming enough
to warrant this optional more complex API, and whether people would
support adding a server setting that prevented all system table changes?

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Speeding up pg_upgrade

Stephen Frost
Bruce,

* Bruce Momjian ([hidden email]) wrote:
> As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> zero-downtime upgrades.  After the usual discussion of using logical
> replication, Slony, and perhaps having the server be able to read old
> and new system catalogs, we discussed speeding up pg_upgrade.

Sounds familiar.

> There are clusters that take a long time to dump the schema from the old
> cluster and recreate it in the new cluster.  One idea of speeding up
> pg_upgrade would be to allow pg_upgrade to be run in two stages:
>
> 1.  prevent system catalog changes while the old cluster is running, and
> dump the old cluster's schema and restore it in the new cluster
>
> 2.  shut down the old cluster and copy/link the data files

Perhaps a bit more complicated, but couldn't we copy/link while the
old cluster is online and in backup mode, finish backup mode, shut down
the old cluster, and then play forward the WAL to catch any relation
extents being added or similar, and then flip to the new PG version?
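
Roughly this shape (a sketch only; it assumes psycopg2, made-up paths, and
the pre-v15 exclusive backup functions pg_start_backup()/pg_stop_backup()):

    #!/usr/bin/env python3
    # Sketch of the online-copy idea: do the bulk copy while the old cluster
    # is running and in backup mode; the WAL generated in the meantime would
    # later be replayed by the *old* server binaries before switching
    # versions.  DSN and paths are made up; psycopg2 is assumed.
    import subprocess
    import psycopg2

    OLD_DSN = "port=5432 dbname=postgres"
    OLD_DATA = "/var/lib/postgresql/9.6/main"
    STAGING = "/srv/pg_upgrade_staging"

    conn = psycopg2.connect(OLD_DSN)
    conn.autocommit = True
    cur = conn.cursor()

    cur.execute("SELECT pg_start_backup('pg_upgrade staging', true)")
    try:
        # The bulk copy happens while the old cluster stays online.
        subprocess.run(["rsync", "-a", OLD_DATA + "/", STAGING + "/"], check=True)
    finally:
        cur.execute("SELECT pg_stop_backup()")

    # Not shown: shut the old cluster down, rsync the remaining delta, let the
    # old binaries replay WAL on the staging copy so it becomes consistent,
    # then run pg_upgrade against the staging directory.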

> My question is whether the schema dump/restore is time-consuming enough
> to warrant this optional more complex API, and whether people would
> support adding a server setting that prevented all system table changes?

When you say 'system table changes', you're referring to basically all
DDL, right?  Just wish to clarify as there might be some confusion
between the terminology you're using here and allow_system_table_mods.

Would we need to have autovacuum shut down too..?

The other concern is if there's changes made to the catalogs by non-DDL
activity that needs to be addressed too (logical replication?); nothing
definite springs to mind off-hand for me, but perhaps others will think
of things.

Thanks!

Stephen


Re: Speeding up pg_upgrade

Dave Page-7
In reply to this post by Bruce Momjian
Hi

On Tue, Dec 5, 2017 at 11:01 PM, Bruce Momjian <[hidden email]> wrote:
> As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> zero-downtime upgrades.  After the usual discussion of using logical
> replication, Slony, and perhaps having the server be able to read old
> and new system catalogs, we discussed speeding up pg_upgrade.
>
> There are clusters that take a long time to dump the schema from the old
> cluster and recreate it in the new cluster.  One idea of speeding up
> pg_upgrade would be to allow pg_upgrade to be run in two stages:
>
> 1.  prevent system catalog changes while the old cluster is running, and
> dump the old cluster's schema and restore it in the new cluster
>
> 2.  shut down the old cluster and copy/link the data files

When we were discussing this, I was thinking that the linking could be done in phase 1 too, as that's potentially slow on a very large schema.

> My question is whether the schema dump/restore is time-consuming enough
> to warrant this optional more complex API, and whether people would
> support adding a server setting that prevented all system table changes?

I've certainly heard of cases where pg_upgrade takes significant amounts of time to run on very complex databases. 

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Speeding up pg_upgrade

Bruce Momjian
In reply to this post by Stephen Frost
On Tue, Dec  5, 2017 at 09:16:02AM -0500, Stephen Frost wrote:
> Bruce,
>
> * Bruce Momjian ([hidden email]) wrote:
> > As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> > zero-downtime upgrades.  After the usual discussion of using logical
> > replication, Slony, and perhaps having the server be able to read old
> > and new system catalogs, we discussed speeding up pg_upgrade.
>
> Sounds familiar.

Yeah.  :-|

> > There are clusters that take a long time to dump the schema from the old
> > cluster and recreate it in the new cluster.  One idea of speeding up
> > pg_upgrade would be to allow pg_upgrade to be run in two stages:
> >
> > 1.  prevent system catalog changes while the old cluster is running, and
> > dump the old cluster's schema and restore it in the new cluster
> >
> > 2.  shut down the old cluster and copy/link the data files
>
> Perhaps a bit more complicated, but couldn't we copy/link while the
> old cluster is online and in backup mode, finish backup mode, shut down
> the old cluster, and then play forward the WAL to catch any relation
> extents being added or similar, and then flip to the new PG version?

Well, that would require reading the old WAL, which would add an
additional compatibility requirement that seems unwise.

> > My question is whether the schema dump/restore is time-consuming enough
> > to warrant this optional more complex API, and whether people would
> > support adding a server setting that prevented all system table changes?
>
> When you say 'system table changes', you're referring to basically all
> DDL, right?  Just wish to clarify as there might be some confusion
> between the terminology you're using here and allow_system_table_mods.

Not only all DDL, but even updating them for the internal stuff, like
pg_class.relfrozenxid.

> Would we need to have autovacuum shut down too..?

Yes.

> The other concern is if there's changes made to the catalogs by non-DDL
> activity that needs to be addressed too (logical replication?); nothing
> definite springs to mind off-hand for me, but perhaps others will think
> of things.

Yes, it could extend to many parts of the system, which is why I am
asking if it is worth it.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Speeding up pg_upgrade

Bruce Momjian
In reply to this post by Dave Page-7
On Tue, Dec  5, 2017 at 11:16:26PM +0900, Dave Page wrote:

> Hi
>
> On Tue, Dec 5, 2017 at 11:01 PM, Bruce Momjian <[hidden email]> wrote:
>
>     As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
>     zero-downtime upgrades.  After the usual discussion of using logical
>     replication, Slony, and perhaps having the server be able to read old
>     and new system catalogs, we discussed speeding up pg_upgrade.
>
>     There are clusters that take a long time to dump the schema from the old
>     cluster and recreate it in the new cluster.  One idea of speeding up
>     pg_upgrade would be to allow pg_upgrade to be run in two stages:
>
>     1.  prevent system catalog changes while the old cluster is running, and
>     dump the old cluster's schema and restore it in the new cluster
>
>     2.  shut down the old cluster and copy/link the data files
>
>
> When we were discussing this, I was thinking that the linking could be done in
> phase 1 too, as that's potentially slow on a very large schema.

Uh, good point!  You can create the hard links while the system is
running, no problem!  It is only the copy that can't be done while the
system is running.  Of course a big question is whether hard linking
takes any measurable time.
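
A quick way to put a number on that (made-up paths; run it against a
scratch copy on the same filesystem, not a live data directory):

    #!/usr/bin/env python3
    # Time how long it takes to hard-link every file under a directory tree,
    # to see whether the link step is worth moving out of the downtime window.
    import os
    import sys
    import time

    def link_tree(src_root, dst_root):
        n = 0
        for root, _dirs, files in os.walk(src_root):
            rel = os.path.relpath(root, src_root)
            os.makedirs(os.path.join(dst_root, rel), exist_ok=True)
            for f in files:
                os.link(os.path.join(root, f), os.path.join(dst_root, rel, f))
                n += 1
        return n

    if __name__ == "__main__":
        src, dst = sys.argv[1], sys.argv[2]
        start = time.monotonic()
        count = link_tree(src, dst)
        print("hard-linked %d files in %.2fs" % (count, time.monotonic() - start))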

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Speeding up pg_upgrade

Stephen Frost
In reply to this post by Dave Page-7
Dave,

* Dave Page ([hidden email]) wrote:

> On Tue, Dec 5, 2017 at 11:01 PM, Bruce Momjian <[hidden email]> wrote:
> > As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> > zero-downtime upgrades.  After the usual discussion of using logical
> > replication, Slony, and perhaps having the server be able to read old
> > and new system catalogs, we discussed speeding up pg_upgrade.
> >
> > There are clusters that take a long time to dump the schema from the old
> > cluster and recreate it in the new cluster.  One idea of speeding up
> > pg_upgrade would be to allow pg_upgrade to be run in two stages:
> >
> > 1.  prevent system catalog changes while the old cluster is running, and
> > dump the old cluster's schema and restore it in the new cluster
> >
> > 2.  shut down the old cluster and copy/link the data files
>
> When we were discussing this, I was thinking that the linking could be done
> in phase 1 too, as that's potentially slow on a very large schema.
Right, I had that thought too when first reading this, but the problem
there is that new files can show up due to a relation being extended (at
least, and perhaps in other cases too..).

> > My question is whether the schema dump/restore is time-consuming enough
> > to warrant this optional more complex API, and whether people would
> > support adding a server setting that prevented all system table changes?
>
> I've certainly heard of cases where pg_upgrade takes significant amounts of
> time to run on very complex databases.

Right, but that doesn't really answer the question as to which part of
the pg_upgrade process is taking up the time.

In any case, of course, if we're able to move part of what pg_upgrade
does to be while the old server is online then that takes whatever the
cost of that is out of the downtime window.  The question is if that's a
5% improvement in the overall performance of pg_upgrade or a 70% one.
This will be case-by-case, of course, but if, in the best-case, we only
get a 5% improvement then this might not be worth the risk.

Thanks!

Stephen


Re: Speeding up pg_upgrade

Bruce Momjian
On Tue, Dec  5, 2017 at 09:23:49AM -0500, Stephen Frost wrote:

> Dave,
>
> * Dave Page ([hidden email]) wrote:
> > On Tue, Dec 5, 2017 at 11:01 PM, Bruce Momjian <[hidden email]> wrote:
> > > As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> > > zero-downtime upgrades.  After the usual discussion of using logical
> > > replication, Slony, and perhaps having the server be able to read old
> > > and new system catalogs, we discussed speeding up pg_upgrade.
> > >
> > > There are clusters that take a long time to dump the schema from the old
> > > cluster and recreate it in the new cluster.  One idea of speeding up
> > > pg_upgrade would be to allow pg_upgrade to be run in two stages:
> > >
> > > 1.  prevent system catalog changes while the old cluster is running, and
> > > dump the old cluster's schema and restore it in the new cluster
> > >
> > > 2.  shut down the old cluster and copy/link the data files
> >
> > When we were discussing this, I was thinking that the linking could be done
> > in phase 1 too, as that's potentially slow on a very large schema.
>
> Right, I had that thought too when first reading this, but the problem
> there is that new files can show up due to a relation being extended (at
> least, and perhaps in other cases too..).

Oh, yikes, yes.

> > > My question is whether the schema dump/restore is time-consuming enough
> > > to warrant this optional more complex API, and whether people would
> > > support adding a server setting that prevented all system table changes?
> >
> > I've certainly heard of cases where pg_upgrade takes significant amounts of
> > time to run on very complex databases.
>
> Right, but that doesn't really answer the question as to which part of
> the pg_upgrade process is taking up the time.
>
> In any case, of course, if we're able to move part of what pg_upgrade
> does to be while the old server is online then that takes whatever the
> cost of that is out of the downtime window.  The question is if that's a
> 5% improvement in the overall performance of pg_upgrade or a 70% one.
> This will be case-by-case, of course, but if, in the best-case, we only
> get a 5% improvement then this might not be worth the risk.

Yes, and who is going to know if they have a setup where the more
complex API is worth it?  pg_upgrade is already complex enough to use.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Speeding up pg_upgrade

Stephen Frost
In reply to this post by Bruce Momjian
Bruce,

* Bruce Momjian ([hidden email]) wrote:

> On Tue, Dec  5, 2017 at 09:16:02AM -0500, Stephen Frost wrote:
> > > There are clusters that take a long time to dump the schema from the old
> > > cluster and recreate it in the new cluster.  One idea of speeding up
> > > pg_upgrade would be to allow pg_upgrade to be run in two stages:
> > >
> > > 1.  prevent system catalog changes while the old cluster is running, and
> > > dump the old cluster's schema and restore it in the new cluster
> > >
> > > 2.  shut down the old cluster and copy/link the data files
> >
> > Perhaps a bit more complicated, but couldn't we copy/link while the
> > old cluster is online and in backup mode, finish backup mode, shut down
> > the old cluster, and then play forward the WAL to catch any relation
> > extents being added or similar, and then flip to the new PG version?
>
> Well, that would require reading the old WAL, which would add an
> additional compatibility requirement that seems unwise.
In my proposal, this would be the old version of PG reading the old WAL.

Thinking about it a bit further though, I'm not sure it'd end up working
in link mode anyway, due to post-backup-finish changes that could be
made by the old server on the data files before it's shut down.

We have to have a way of dealing with the delta between the hard link
trees after the old server is shut down though because there could be
new relation extents, at least.

> > > My question is whether the schema dump/restore is time-consuming enough
> > > to warrant this optional more complex API, and whether people would
> > > support adding a server setting that prevented all system table changes?
> >
> > When you say 'system table changes', you're referring to basically all
> > DDL, right?  Just wish to clarify as there might be some confusion
> > between the terminology you're using here and allow_system_table_mods.
>
> Not only all DDL, but even updating them for the internal stuff, like
> pg_class.relfrozenxid.
Good point.  We'd really need a pretty bullet-proof way to ensure that
the catalog isn't changed during this time period and that seems like it
might be difficult without a lot of work.

> > Would we need to have autovacuum shut down too..?
>
> Yes.

Ok, makes sense.

> > The other concern is if there's changes made to the catalogs by non-DDL
> > activity that needs to be addressed too (logical replication?); nothing
> > definite springs to mind off-hand for me, but perhaps others will think
> > of things.
>
> Yes, it could extend to many parts of the system, which is why I am
> asking if it is worth it.

My initial reaction is that it's worth it, but then I also wonder about
other issues (having to get an ANALYZE done on the new cluster before
opening it up, for example..) and it makes me wonder if perhaps it'll
end up being too much risk for too little gain.

Thanks!

Stephen


Re: Speeding up pg_upgrade

Stephen Frost
In reply to this post by Bruce Momjian
Bruce,

* Bruce Momjian ([hidden email]) wrote:
> > In any case, of course, if we're able to move part of what pg_upgrade
> > does to be while the old server is online then that takes whatever the
> > cost of that is out of the downtime window.  The question is if that's a
> > 5% improvement in the overall performance of pg_upgrade or a 70% one.
> > This will be case-by-case, of course, but if, in the best-case, we only
> > get a 5% improvement then this might not be worth the risk.
>
> Yes, and who is going to know if they have a setup where the more
> complex API is worth it?  pg_upgrade is already complex enough to use.

Sure, but the solution there is really to make pg_upgrade simpler to
use, even as we add these more complicated APIs to it.  What that likely
means in practical terms is that we have another utility, which uses
pg_upgrade underneath, that you're able to configure to know about your
existing cluster and the version of PG you want to upgrade to and where
you want it and if you want a copy or if hard-links are ok, etc.

Having such a tool is actually what I'd been hoping would come out of
the documented process for doing a "pg_upgrade" on replicas that's
currently in our documentation.  That's not happened yet, but it's
something that David Steele and I have been chatting about because the
procedure in the documentation is terribly difficult and dangerous for
those who aren't as familiar with the system.

Perhaps we could have one tool that handles both the more complicated
pg_upgrade API and deals with upgrading replicas.  Alternatively, we
could have a config file for pg_upgrade instead which might be a simpler
way for people to describe exactly their current configuration and what
they'd like to go to.  Upgrading replicas involves using something like
SSH though..
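
Just to make that concrete, the kind of thing I have in mind -- an entirely
hypothetical layout, sketched as a Python structure rather than committing
to a file format; none of these keys exist today:

    # Entirely hypothetical sketch of a declarative upgrade description that
    # a wrapper around pg_upgrade could consume; every key here is invented.
    upgrade_plan = {
        "old_cluster": {"bindir": "/usr/lib/postgresql/9.6/bin",
                        "datadir": "/var/lib/postgresql/9.6/main",
                        "port": 5432},
        "new_cluster": {"bindir": "/usr/lib/postgresql/10/bin",
                        "datadir": "/var/lib/postgresql/10/main",
                        "port": 5433},
        "mode": "link",                     # or "copy"
        "replicas": [                       # standbys to handle over SSH
            {"host": "standby1.example.com",
             "datadir": "/var/lib/postgresql/10/main"},
        ],
    }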

Thanks!

Stephen


Re: Speeding up pg_upgrade

Peter Eisentraut-6
In reply to this post by Stephen Frost
On 12/5/17 09:23, Stephen Frost wrote:
> Right, but that doesn't really answer the question as to which part of
> the pg_upgrade process is taking up the time.

Yeah, that should be measured before we do anything.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Speeding up pg_upgrade

Bruce Momjian
In reply to this post by Stephen Frost
On Tue, Dec  5, 2017 at 09:30:53AM -0500, Stephen Frost wrote:

> > > The other concern is if there's changes made to the catalogs by non-DDL
> > > activity that needs to be addressed too (logical replication?); nothing
> > > definite springs to mind off-hand for me, but perhaps others will think
> > > of things.
> >
> > Yes, it could extend to many parts of the system, which is why I am
> > asking if it is worth it.
>
> My initial reaction is that it's worth it, but then I also wonder about
> other issues (having to get an ANALYZE done on the new cluster before
> opening it up, for example..) and it makes me wonder if perhaps it'll
> end up being too much risk for too little gain.

Yes, dump/reload of analyze statistics seems like a better use of time.
I have avoided it since it locks us into supporting the text
representation of data types, but at this point it might be worth it.

--
  Bruce Momjian  <[hidden email]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


Re: Speeding up pg_upgrade

Alexander Kukushkin
Hi,


> Yes, dump/reload of analyze statistics seems like a better use of time.
> I have avoided it since it locks us into supporting the text
> representation of data types, but at this point it might be worth it.


A couple of months ago we at Zalando upgraded a few databases of different sizes to 9.6.
During preparations for the upgrade I found 2.5 pain points:

1. We are using schema-based API deployment. Basically ~every week we create a new schema in the database and hundreds of stored procedures in it.
    Of course we remove old API schemas and try not to keep more than the last 10. Before the upgrade we basically dropped all API schemas except the one used in production.
    And even in this case the dump-restore phase was taking much more time than relinking of data files.
    Unfortunately I don't have any numbers right now, but usually a run of pg_upgrade was taking about 30-35 seconds, and about 2/3 of the time was spent in dump-restore.

2. The ANALYZE phase is a pain. I think everybody agrees with it.

2.5 Usually ANALYZE stage 1 completes quite fast and performance becomes reasonable, except in one case: some of the columns might have a non-default statistics target.
    It breaks `vacuumdb --analyze-in-stages`, because for those specific columns it will not use the default_statistics_target value provided by vacuumdb.
    What I did was reset those non-default values right before running pg_upgrade and restore them only when the analyze was completed. Of course after that I re-analyzed those columns.
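
A minimal sketch of that workaround, for reference -- the connection string
and file names are made up, psycopg2 is assumed, and it relies on
attstattarget = -1 meaning "use default_statistics_target" as in these
releases:

    #!/usr/bin/env python3
    # Emit SQL to reset non-default per-column statistics targets before
    # pg_upgrade and to restore them afterwards.
    import psycopg2

    QUERY = """
    SELECT c.oid::regclass, a.attname, a.attstattarget
      FROM pg_attribute a
      JOIN pg_class c ON c.oid = a.attrelid
     WHERE a.attstattarget >= 0        -- -1 means the default target
       AND a.attnum > 0
       AND NOT a.attisdropped
       AND c.relkind IN ('r', 'm')
    """

    with psycopg2.connect("port=5432 dbname=mydb") as conn, conn.cursor() as cur:
        cur.execute(QUERY)
        with open("reset_targets.sql", "w") as reset, \
             open("restore_targets.sql", "w") as restore:
            for tbl, col, target in cur.fetchall():
                reset.write('ALTER TABLE %s ALTER COLUMN "%s" SET STATISTICS -1;\n'
                            % (tbl, col))
                restore.write('ALTER TABLE %s ALTER COLUMN "%s" SET STATISTICS %d;\n'
                              % (tbl, col, target))

Run reset_targets.sql before pg_upgrade, restore_targets.sql after
`vacuumdb --analyze-in-stages` has finished, and then re-analyze the
affected columns.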



Regards,
Alexander Kukushkin

Re: Speeding up pg_upgrade

Stephen Frost
Alexander,

* Alexander Kukushkin ([hidden email]) wrote:
> Couple of months ago we at Zalando upgraded a few databases of different
> sizes to 9.6.

Thanks for sharing your experience!

> During preparations for the upgrade I found 2.5 pain points:
>
> 1. We are using schema-based API deployment. Basically ~every week we
> create a new schema in the database and hundreds of stored procedures in it.
>     Of course we remove old API schemas and try not to keep more than the
> last 10. Before the upgrade we basically dropped all API schemas except the
> one used in production.
>     And even in this case the dump-restore phase was taking much more time
> than relinking of data files.
>     Unfortunately I don't have any numbers right now, but usually a run of
> pg_upgrade was taking about 30-35 seconds, and about 2/3 of the time was
> spent in dump-restore.
Ok, so eliminating 2/3 of the time would mean bringing it down to more
like 10 seconds.  That certainly seems worthwhile to me.  With the
linking time being much less than the dump/restore, we could at least
consider moving forward with Bruce's original idea where we do the
dump/restore while the system is online but then the linking with it
offline and get a serious performance boost out of it.  That also avoids
the issue with new files showing up while the system is running that I
brought up when we were talking about having the linking done with the
system online.

> 2. The ANALYZE phase is a pain. I think everybody agrees with it.
>
> 2.5 Usually ANALYZE stage 1 completes quite fast and performance becomes
> reasonable, except in one case: some of the columns might have a non-default
> statistics target.

Ok, if the stage-1 is very fast and performance is reasonable enough
after that then perhaps it's not so bad to keep it as-is for now and
focus on the dump/restore time.  That said, we should certainly also
work on improving this too.

>     It breaks `vacuumdb --analyze-in-stages`, because for those specific
> columns it will not use the default_statistics_target value provided by
> vacuumdb.
>     What I did was reset those non-default values right before running
> pg_upgrade and restore them only when the analyze was completed. Of course
> after that I re-analyzed those columns.

Ah, yeah, ouch, that's unfortunate..  I wonder if there's something we
could do there to fix it..

Thanks!

Stephen


Re: Speeding up pg_upgrade

Álvaro Herrera
Stephen Frost wrote:
> Alexander,

> * Alexander Kukushkin ([hidden email]) wrote:

> > 2. The ANALYZE phase is a pain. I think everybody agrees with it.
> >
> > 2.5 Usually ANALYZE stage 1 completes quite fast and performance becomes
> > reasonable, except in one case: some of the columns might have a non-default
> > statistics target.
>
> Ok, if the stage-1 is very fast and performance is reasonable enough
> after that then perhaps it's not so bad to keep it as-is for now and
> focus on the dump/restore time.  That said, we should certainly also
> work on improving this too.

It seems pretty clear to me that we should somehow transfer stats from
the old server to the new one.  Shouldn't it just be a matter of
serializing the MCV/histogram/ndistinct values, then have capabilities
to load on the new server?  I suppose it'd just be used during binary
upgrade, but the point seems painful enough for a lot of users.
Obviously it would not be the raw contents of pg_statistic{,_ext}, but
rather something a bit higher-level.
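
As a very rough illustration of the export half (the load side is the part
that would need real server support, and psycopg2 plus the connection
string are assumptions here):

    #!/usr/bin/env python3
    # Pull a text-form copy of the per-column statistics out of the pg_stats
    # view and write it as JSON.  This is only the serialization half;
    # turning it back into pg_statistic rows on the new server is the part
    # that needs server-side work.
    import json
    import psycopg2

    QUERY = """
    SELECT schemaname, tablename, attname, inherited,
           null_frac, avg_width, n_distinct,
           most_common_vals::text, most_common_freqs,
           histogram_bounds::text, correlation
      FROM pg_stats
     WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
    """

    with psycopg2.connect("port=5432 dbname=mydb") as conn, conn.cursor() as cur:
        cur.execute(QUERY)
        cols = [d[0] for d in cur.description]
        rows = [dict(zip(cols, r)) for r in cur.fetchall()]

    with open("stats_export.json", "w") as f:
        json.dump(rows, f, default=str, indent=1)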

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Speeding up pg_upgrade

Stephen Frost
Alvaro,

* Alvaro Herrera ([hidden email]) wrote:

> Stephen Frost wrote:
> > * Alexander Kukushkin ([hidden email]) wrote:
>
> > > 2. The ANALYZE phase is a pain. I think everybody agrees with it.
> > >
> > > 2.5 Usually ANALYZE stage 1 completes quite fast and performance becomes
> > > reasonable, except in one case: some of the columns might have a
> > > non-default statistics target.
> >
> > Ok, if the stage-1 is very fast and performance is reasonable enough
> > after that then perhaps it's not so bad to keep it as-is for now and
> > focus on the dump/restore time.  That said, we should certainly also
> > work on improving this too.
>
> It seems pretty clear to me that we should somehow transfer stats from
> the old server to the new one.  Shouldn't it just be a matter of
> serializing the MCV/histogram/ndistinct values, then have capabilities
> to load on the new server?  I suppose it'd just be used during binary
> upgrade, but the point seems painful enough for a lot of users.
> Obviously it would not be the raw contents of pg_statistic{,_ext}, but
> rather something a bit higher-level.
Right, I think that's what Bruce was getting at and certainly makes
sense to me as well.  I agree that it's a definite pain point for
people.  One complication is going to be custom data types, of course..

Thanks!

Stephen


Re: Speeding up pg_upgrade

Justin Pryzby
In reply to this post by Bruce Momjian
On Tue, Dec 05, 2017 at 09:01:35AM -0500, Bruce Momjian wrote:
> As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> zero-downtime upgrades.  ... we discussed speeding up pg_upgrade.
>
> There are clusters that take a long time to dump the schema from the old
> cluster

Maybe it isn't representative of a typical case, but I can offer a data point:

For us, we have ~40 customers with DBs ranging in size from <100GB to ~25TB
(for which ~90% is on a ZFS tablespace with compression).  We have what's
traditionally considered to be an excessive number of child tables, which works
okay since planning time is unimportant to us for the report queries which hit
them.  Some of the tables are wide (historically up to 1600 columns).  Some of
those have default values on nearly every column, and pg_attrdef was large
(>500MB), causing pg_dump --section pre-data to be slow (10+ minutes).  Since
something similar is run by pg_upgrade, I worked around the issue for now by
dropping defaults on the historic children in advance of upgrades (at some
point I'll figure out what I have to do to allow DROPing DEFAULTs).  It's not
the first time we've seen an issue with a large number of children*columns.

Our slowest pg_upgrade was ~40min, caused by column defaults in a case where I
failed to re-DROP DEFAULTs after our first scheduled upgrade date was pushed
back by over a month.  Most of the rest were completed in less than 15min.
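
For anyone wondering whether they are in the same situation, a sketch of a
query that sizes up the stored column defaults per table (connection string
is made up; psycopg2 assumed):

    #!/usr/bin/env python3
    # List the tables carrying the most stored DEFAULT expressions
    # (pg_attrdef entries), which is what bloats the pre-data schema dump.
    import psycopg2

    QUERY = """
    SELECT d.adrelid::regclass AS tbl,
           count(*)            AS n_defaults,
           pg_size_pretty(sum(pg_column_size(d.adbin))) AS stored_size
      FROM pg_attrdef d
     GROUP BY d.adrelid
     ORDER BY count(*) DESC
     LIMIT 20
    """

    with psycopg2.connect("port=5432 dbname=mydb") as conn, conn.cursor() as cur:
        cur.execute(QUERY)
        for tbl, n, size in cur.fetchall():
            print("%s: %d column defaults, ~%s of stored expressions" % (tbl, n, size))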

Justin


Re: Speeding up pg_upgrade

Tom Lane-2
In reply to this post by Álvaro Herrera
Alvaro Herrera <[hidden email]> writes:
> It seems pretty clear to me that we should somehow transfer stats from
> the old server to the new one.  Shouldn't it just be a matter of
> serializing the MCV/histogram/ndistinct values, then have capabilities
> to load on the new server?

The reason pg_upgrade hasn't done that in the past is not wishing to
assume that the new version does stats identically to the old version.
Since we do in fact add stats or change stuff around from time to time,
that's not a negligible consideration.

                        regards, tom lane


Re: Speeding up pg_upgrade

Robert Haas
On Thu, Dec 7, 2017 at 11:42 AM, Tom Lane <[hidden email]> wrote:

> Alvaro Herrera <[hidden email]> writes:
>> It seems pretty clear to me that we should somehow transfer stats from
>> the old server to the new one.  Shouldn't it just be a matter of
>> serializing the MCV/histogram/ndistinct values, then have capabilities
>> to load on the new server?
>
> The reason pg_upgrade hasn't done that in the past is not wishing to
> assume that the new version does stats identically to the old version.
> Since we do in fact add stats or change stuff around from time to time,
> that's not a negligible consideration.

Yes, but we don't do that for every release.  We could put rules into
pg_upgrade about which releases changed the stats format incompatibly,
and not transfer the stats when crossing between two releases with
incompatible formats.  That's more than zero effort, of course, but it
might be worth it.  We've already got CATALOG_VERSION_NO,
XLOG_PAGE_MAGIC, PG_CONTROL_VERSION, PG_PROTOCOL_LATEST,
BTREE_VERSION, HASH_VERSION, BRIN_CURRENT_VERSION,
GIN_CURRENT_VERSION, LOGICALREP_PROTO_VERSION_NUM,
PG_PAGE_LAYOUT_VERSION, PG_DATA_CHECKSUM_VERSION, K_VERS_MAJOR,
K_VERS_MINOR, K_VERS_REV, and the utterly unused MIGRATOR_API_VERSION.
Now, I have to admit that I find the process of trying to remember to
bump the correct set of version numbers in every commit just a tad
frustrating; it adds a cognitive burden I'd just as well skip.
However, the failure to transfer stats over the years seems to have
actually caused real problems for many users, so I think in this case
we might be best off sucking it up and adding one more version number.

We might even want to make it a little more fine-grained and track it
separately by data type, but I'm not sure if that's really worth it.
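
For what it's worth, the rule itself could be tiny -- something like the
sketch below, where the version numbers are invented placeholders, not a
claim about which releases actually changed the stats format:

    # Only transfer statistics when the upgrade does not cross a release in
    # which the stats format changed incompatibly.  The boundary versions
    # below are hypothetical placeholders.
    STATS_FORMAT_CHANGED_IN = {12, 15}

    def stats_transferable(old_major, new_major):
        """True if no incompatible stats-format change lies in (old, new]."""
        return not any(old_major < v <= new_major for v in STATS_FORMAT_CHANGED_IN)

    assert stats_transferable(10, 11)        # no boundary crossed
    assert not stats_transferable(11, 12)    # crosses the hypothetical change in 12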

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Speeding up pg_upgrade

Stephen Frost
Robert, all,

* Robert Haas ([hidden email]) wrote:

> On Thu, Dec 7, 2017 at 11:42 AM, Tom Lane <[hidden email]> wrote:
> > Alvaro Herrera <[hidden email]> writes:
> >> It seems pretty clear to me that we should somehow transfer stats from
> >> the old server to the new one.  Shouldn't it just be a matter of
> >> serializing the MCV/histogram/ndistinct values, then have capabilities
> >> to load on the new server?
> >
> > The reason pg_upgrade hasn't done that in the past is not wishing to
> > assume that the new version does stats identically to the old version.
> > Since we do in fact add stats or change stuff around from time to time,
> > that's not a negligible consideration.
>
> Yes, but we don't do that for every release.  We could put rules into
> pg_upgrade about which releases changed the stats format incompatibly,
> and not transfer the stats when crossing between two releases with
> incompatible formats.  That's more than zero effort, of course, but it
> might be worth it.  We've already got CATALOG_VERSION_NO,
> XLOG_PAGE_MAGIC, PG_CONTROL_VERSION, PG_PROTOCOL_LATEST,
> BTREE_VERSION, HASH_VERSION, BRIN_CURRENT_VERSION,
> GIN_CURRENT_VERSION, LOGICALREP_PROTO_VERSION_NUM,
> PG_PAGE_LAYOUT_VERSION, PG_DATA_CHECKSUM_VERSION, K_VERS_MAJOR,
> K_VERS_MINOR, K_VERS_REV, and the utterly unused MIGRATOR_API_VERSION.
If we go down that route, since this makes a pretty serious difference
in terms of what the user has to deal with post-pg_upgrade, I'd suggest
we require an additional option for the user to pass when stats aren't
going to be migrated, so they are aware of that.

The concern I have here is that we end up changing things in v13 and
suddenly everyone has to re-analyze, but they didn't have to when going
from 10->11 or 11->12, and they'll get caught off-guard by it.

Of course, this might end up having an entirely different effect: it
might mean that we're suddenly a lot shier about changing the stats in a
backwards-incompatible way, just as we now are basically stuck with the
existing on-disk heap format..

> Now, I have to admit that I find the process of trying to remember to
> bump the correct set of version numbers in every commit just a tad
> frustrating; it adds a cognitive burden I'd just as well skip.

Agreed, would be great if we could improve on this.

> However, the failure to transfer stats over the years seems to have
> actually caused real problems for many users, so I think in this case
> we might be best off sucking it up and adding one more version number.

Yes, it's definitely been an issue for users.

> We might even want to make it a little more fine-grained and track it
> separately by data type, but I'm not sure if that's really worth it.

This would have the nice property that we could just re-analyze the data
types where things changed, something that's more likely to happen with
new data types than existing ones, I'd guess, and so that might be much
more reasonable for users.

Thanks!

Stephen


Re: Speeding up pg_upgrade

Álvaro Herrera
In reply to this post by Tom Lane-2
Tom Lane wrote:

> Alvaro Herrera <[hidden email]> writes:
> > It seems pretty clear to me that we should somehow transfer stats from
> > the old server to the new one.  Shouldn't it just be a matter of
> > serializing the MCV/histogram/ndistinct values, then have capabilities
> > to load on the new server?
>
> The reason pg_upgrade hasn't done that in the past is not wishing to
> assume that the new version does stats identically to the old version.
> Since we do in fact add stats or change stuff around from time to time,
> that's not a negligible consideration.

Sure, but the new version can probably limp along with incomplete stats
until the next natural ANALYZE runs -- the system is operational in much
shorter time than if you have to make it all wait for the post-upgrade
full-database ANALYZE run.  The serialization step is so that the
underlying representation doesn't have to remain identical -- surely the
new server would be able to represent whatever the old server was able
to, regardless of any improvement made.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
