[PERFORM] Can anyone explain this: duplicate dbs.


[PERFORM] Can anyone explain this: duplicate dbs.

SpaceBallOne
Wondering if someone could explain a peculiarity for me:

We have a database which takes 1000ms to perform a certain query on.

If I pg_dump that database then create a new database (e.g. "tempdb") and upload the dump file (thus making a duplicate) then the same query only takes 190ms !!
Vacuum, vacuum analyse, and vacuum full analyse do not seem to have an impact on these times.

Can anyone explain why this may be occurring and how I might be able to keep the original database running at the same speed as "tempdb"?

Thanks in advance,

Dave.

Re: [PERFORM] Can anyone explain this: duplicate dbs.

Christopher Kings-Lynne
> Can anyone explain why this may be occurring and how I might be able to
> keep the original database running at the same speed as "tempdb"?

You're not vacuuming anywhere near often enough.  Read up on the database
maintenance section of the manual.  Then, set up contrib/pg_autovacuum
to vacuum your database regularly, or make a cron job to run "vacuumdb
-a -z -q" once an hour, say.

You can fix the case where you haven't been vacuuming enough with a
one-off VACUUM FULL ANALYZE command, but this will lock tables
exclusively as it does its work.
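
The suggested hourly job could be expressed as the following crontab entry (a sketch; it assumes vacuumdb is on the postgres user's PATH and that the user can connect to all databases without a password prompt):

```shell
# crontab entry for the postgres user:
# at minute 0 of every hour, vacuum all databases (-a),
# update statistics (-z), and run quietly (-q)
0 * * * * vacuumdb -a -z -q
```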

Chris

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org

Re: [PERFORM] Can anyone explain this: duplicate dbs.

Christopher Kings-Lynne
In reply to this post by SpaceBallOne
> If I pg_dump that database then create a new database (e.g. "tempdb")
> and upload the dump file (thus making a duplicate) then the same query
> only takes 190ms !!
> Vacuum, vacuum analyse, and vacuum full analyse do not seem to have an
> impact on these times.

Damn, for some reason I didn't read that you had already tried vacuum
full.  In that case, I can't explain it, except that perhaps you aren't
vacuuming properly, or not vacuuming the right thing, or it's a disk
cache effect.

Chris


Re: [PERFORM] Can anyone explain this: duplicate dbs.

John Arbash Meinel
In reply to this post by SpaceBallOne
SpaceBallOne wrote:

> Wondering if someone could explain a peculiarity for me:
>
> We have a database which takes 1000ms to perform a certain query on.
>
> If I pg_dump that database then create a new database (e.g. "tempdb")
> and upload the dump file (thus making a duplicate) then the same query
> only takes 190ms !!
> Vacuum, vacuum analyse, and vacuum full analyse do not seem to have
> an impact on these times.
>
> Can anyone explain why this may be occurring and how I might be able
> to keep the original database running at the same speed as "tempdb"?
>
> Thanks in advance,
>
> Dave.
What version of postgres?

There are a few possibilities. If you are having a lot of updates to the
table, you can get index bloat. And vacuum doesn't fix indexes. You have
to "REINDEX" to do that. Though REINDEX takes the same lock that VACUUM
FULL does, so you need to be a little careful with it.

Probably better is to do CLUSTER, as it does a REINDEX and a sort, so
your table ends up nicer when you are done.
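
In psql, the commands above might look like this (a sketch; the table and index names are hypothetical, and CLUSTER is shown in its pre-8.3 "indexname ON tablename" form to match the 8.0-era servers discussed here):

```sql
-- Rebuild all indexes on a bloated table (takes an exclusive lock):
REINDEX TABLE jobs;

-- Or rewrite the table in index order, which also rebuilds its indexes:
CLUSTER jobs_date_idx ON jobs;

-- Refresh planner statistics afterwards:
ANALYZE jobs;
```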

Also, older versions of postgres had a worse time with index bloat. One
pattern that caused a lot of problems is a table that you insert into over
time, so that all the values are incrementing. If you are deleting older
entries, that area won't be re-used because they fall at the back end. I
believe newer versions have fixed this.

By the way, I think doing:

CREATE DATABASE tempdb WITH TEMPLATE = originaldb;

is a much faster way of doing a dump and load. I *think* it would recreate
indexes, etc. If it just does a copy, it may not show the dump/restore
improvement.

John
=:->



Re: [PERFORM] Can anyone explain this: duplicate dbs.

SpaceBallOne
> What version of postgres?

8.0.2 ... but I think I've seen this before on 7.3 ...

> There are a few possibilities. If you are having a lot of updates to the
> table, you can get index bloat. And vacuum doesn't fix indexes. You have
> to "REINDEX" to do that. Though REINDEX has the same lock that VACUUM
> FULL has, so you need to be a little careful with it.

> Probably better is to do CLUSTER, as it does a REINDEX and a sort, so
> your table ends up nicer when you are done.

Thanks, will try those next time this problem crops up (I just deleted /
recreated the database to speed things up for its users in the office ...
probably should have held off to see if I could find a solution first!).

Yes, the database / table-in-question does have a lot of updates, deletes,
and new rows (relatively speaking for a small business).

Would CLUSTER / REINDEX still have an effect if our queries were done via
sequential scan? This is an old database (as in, built by me when I was just
starting to learn unix / postgres), so the database design is pretty horrible
(little normalisation, no indexes).

Have taken Chris's advice on board too and set up cron to do a vacuumdb
hourly instead of my weekly vacuum.

Cheers,

Dave.





Re: [PERFORM] Can anyone explain this: duplicate dbs.

John Arbash Meinel
SpaceBallOne wrote:

>> What version of postgres?
>
>
> 8.0.2 ... but I think I've seen this before on 7.3 ...
>
>> There are a few possibilities. If you are having a lot of updates to the
>> table, you can get index bloat. And vacuum doesn't fix indexes. You have
>> to "REINDEX" to do that. Though REINDEX has the same lock that VACUUM
>> FULL has, so you need to be a little careful with it.
>
>
>> Probably better is to do CLUSTER, as it does a REINDEX and a sort, so
>> your table ends up nicer when you are done.
>
>
> Thanks, will try those next time this problem crops up (I just deleted
> / recreated the database to speed things up for its users in the office
> ... probably should have held off to see if I could find a solution
> first!).
>
> Yes, the database / table-in-question does have a lot of updates,
> deletes, and new rows (relatively speaking for a small business).
>
> Would CLUSTER / REINDEX still have an effect if our queries were done
> via sequential scan? This is an old database (as in, built by me when I
> was just starting to learn unix / postgres), so the database design is
> pretty horrible (little normalisation, no indexes).
Well, my first recommendation is to put in some indexes. :) They are
relatively easy to set up and can drastically improve SELECT performance.

What version of postgres are you using?
What does it say at the end of "VACUUM FULL ANALYZE VERBOSE"? That
should tell you how many free pages were reclaimed and how big your free
space map should be.

If you only did one VACUUM FULL, you might try another, as it sounds like
your tables weren't properly compacted. I'm pretty sure plain VACUUM only
removes empty pages and marks locations in the free space map so they can
be re-used, while VACUUM FULL will move entries around to create free
pages.

It sounds like it didn't do that properly.
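
The check described above can be run in psql (a sketch; in 8.0-era servers the INFO lines at the end of the verbose output report how many pages the free space map is tracking):

```sql
-- Compact tables, rebuild statistics, and print per-table details:
VACUUM FULL ANALYZE VERBOSE;

-- Compare the FSM numbers reported at the end with the configured limit:
SHOW max_fsm_pages;
```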

But even so, CLUSTER is still your friend, as it allows you to "presort"
the rows in your tables.

>
> Have taken Chris's advice onboard too and setup cron to do a vacuumdb
> hourly instead of my weekly vacuum.
>
> Cheers,
>
> Dave.
>
>
John
=:->



Re: [PERFORM] Can anyone explain this: duplicate dbs.

Christopher Kings-Lynne
In reply to this post by SpaceBallOne
> Would CLUSTER / REINDEX still have an effect if our queries were done
> via sequential scan?

SELECTs don't write to the database, so they have no effect at all on
vacuuming/analyzing.  You only need to worry about that with writes.

> This is a old database (as in built by me when i
> was just starting to learn unix / postgres) so the database design is
> pretty horrible (little normalisation, no indexes).

No indexes?  Bloody hell :D

Use EXPLAIN ANALYZE SELECT ... ; on all of your selects to see where
they are slow and where you can add indexes...
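
For example (a sketch; the table, column, and index names are hypothetical):

```sql
-- Look for "Seq Scan" nodes with large actual times:
EXPLAIN ANALYZE SELECT * FROM jobs WHERE customer_id = 42;

-- If the whole table is being scanned for a selective filter,
-- an index on the filtered column may help:
CREATE INDEX jobs_customer_id_idx ON jobs (customer_id);
ANALYZE jobs;
```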

Chris


Re: [PERFORM] Can anyone explain this: duplicate dbs.

Tom Lane-2
In reply to this post by Christopher Kings-Lynne
Christopher Kings-Lynne <[hidden email]> writes:
>> If I pg_dump that database then create a new database (e.g. "tempdb")
>> and upload the dump file (thus making a duplicate) then the same query
>> only takes 190ms !!
>> Vacuum, vacuum analyse, and vacuum full analyse do not seem to have an
>> impact on these times.

> Damn, for some reason I didn't read that you had already tried vacuum
> full.

I'm thinking index bloat, and a PG version too old for vacuum full to
recover any index space.  But without any information about PG version
or EXPLAIN ANALYZE results, we're all just guessing.

                        regards, tom lane


Re: [PERFORM] Can anyone explain this: duplicate dbs.

Tom Lane-2
In reply to this post by John Arbash Meinel
John A Meinel <[hidden email]> writes:
> By the way, I think doing:

> CREATE DATABASE tempdb WITH TEMPLATE = originaldb;

> Is a much faster way of doing dump and load. I *think* it would recreate
> indexes, etc. If it just does a copy it may not show the dump/restore
> improvement.

CREATE DATABASE just does a physical copy, so it won't do anything at
all for bloat issues.

                        regards, tom lane


Re: [PERFORM] Can anyone explain this: duplicate dbs.

Bruno Wolff III
In reply to this post by John Arbash Meinel
On Tue, May 24, 2005 at 21:39:15 -0500,
  John A Meinel <[hidden email]> wrote:
>
> By the way, I think doing:
>
> CREATE DATABASE tempdb WITH TEMPLATE = originaldb;
>
> Is a much faster way of doing dump and load. I *think* it would recreate
> indexes, etc. If it just does a copy it may not show the dump/restore
> improvement.

You need to be careful when doing this. See section 18.3 of the 8.0 docs
for caveats.
