Date vs Timestamp without timezone Partition Key

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Date vs Timestamp without timezone Partition Key

Cedric Leong
Just in case someone is interested enough to answer this. Does anyone know if the performance for a date column vs a timestamp column as a partition key is large? What i mean with large is that say you have 6 partitions with 10GB each. Would it be a 10 second+ difference? An explanation of how this internally works would be appreciated.

Reply | Threaded
Open this post in threaded view
|

Re: Date vs Timestamp without timezone Partition Key

Tom Lane-2
Cedric Leong <[hidden email]> writes:
> Just in case someone is interested enough to answer this. Does anyone know
> if the performance for a date column vs a timestamp column as a partition
> key is large?

I doubt it's even measurable, at least on 64-bit machines.  You're
basically talking about 32-bit integer comparisons vs 64-bit integer
comparisons.

On a 32-bit machine it's possible that an index on a date column
will be physically smaller, so you could get some wins from reduced
I/O.  But on (most?) 64-bit machines that difference goes away too,
because of alignment restrictions.

As always, YMMV; it never hurts to do your own testing.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Date vs Timestamp without timezone Partition Key

Cedric Leong
Somewhat unrelated but note to anyone who wants to swap out partition keys. Don't create a clone of the table with the new partition key and insert data. It messes up the query planner massively and makes everything much slower.

On Mon, May 25, 2020 at 12:48 AM Tom Lane <[hidden email]> wrote:
Cedric Leong <[hidden email]> writes:
> Just in case someone is interested enough to answer this. Does anyone know
> if the performance for a date column vs a timestamp column as a partition
> key is large?

I doubt it's even measurable, at least on 64-bit machines.  You're
basically talking about 32-bit integer comparisons vs 64-bit integer
comparisons.

On a 32-bit machine it's possible that an index on a date column
will be physically smaller, so you could get some wins from reduced
I/O.  But on (most?) 64-bit machines that difference goes away too,
because of alignment restrictions.

As always, YMMV; it never hurts to do your own testing.

                        regards, tom lane
Reply | Threaded
Open this post in threaded view
|

Re: Date vs Timestamp without timezone Partition Key

David Rowley
On Sat, 6 Jun 2020 at 14:12, Cedric Leong <[hidden email]> wrote:
> Somewhat unrelated but note to anyone who wants to swap out partition keys. Don't create a clone of the table with the new partition key and insert data. It messes up the query planner massively and makes everything much slower.

That complaint would have more meaning if you'd mentioned which
version of PostgreSQL you're using. The performance of partitioning in
PostgreSQL has changed significantly over the past 3 releases. Also
would be useful to know what you've actually done (actual commands).
I can't imagine it makes *everything* slower, so it might be good to
mention what is actually slower.

David


Reply | Threaded
Open this post in threaded view
|

Re: Date vs Timestamp without timezone Partition Key

Cedric Leong
It's less of a complaint rather than just a warning not to do what I did.

Version:
PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.9.3, 64-bit

The actual command list would probably be impractical to put in here just because the majority of it would just be creating a large amount of partition tables. But in summary what i've done is basically this:
Existing database has a partitioned fact table
1.  Create an exact clone of that partitioned fact table which includes all the same indexes, columns, and partitioned tables 
2. Change the partitioned table's partition key from an indexed date column to an indexed timestamp without timezone column
3. Do an insert into from the old partitioned fact table to the new partitioned fact table which includes all the same rows (insert into since i wanted the timestamp without timezone column to be in a new timezone)
4. Switch the names of the tables so the new one will be the one that's used
5. VACUUM FULL; ANALYZE;

For my use case which is a data warehouse star schema, this fact table is basically the base table of every report. To be more specific, the reports I've tested on varied from 2x slower to 4x slower. From what I see so far that's because the query plan is drastically different for both. An example of this test would look like this: https://explain.depesz.com/s/6rP8 and https://explain.depesz.com/s/cLUY
These tests are running the exact same query on two different tables with the exception that they use their respective partition keys.


On Fri, Jun 5, 2020 at 10:17 PM David Rowley <[hidden email]> wrote:
On Sat, 6 Jun 2020 at 14:12, Cedric Leong <[hidden email]> wrote:
> Somewhat unrelated but note to anyone who wants to swap out partition keys. Don't create a clone of the table with the new partition key and insert data. It messes up the query planner massively and makes everything much slower.

That complaint would have more meaning if you'd mentioned which
version of PostgreSQL you're using. The performance of partitioning in
PostgreSQL has changed significantly over the past 3 releases. Also
would be useful to know what you've actually done (actual commands).
I can't imagine it makes *everything* slower, so it might be good to
mention what is actually slower.

David
Reply | Threaded
Open this post in threaded view
|

Re: Date vs Timestamp without timezone Partition Key

David Rowley
On Sat, 6 Jun 2020 at 14:49, Cedric Leong <[hidden email]> wrote:
> It's less of a complaint rather than just a warning not to do what I did.

My point was really that nobody really knew what you did or what you
did it on. So it didn't seem like a worthwhile warning as it
completely lacked detail.

> These tests are running the exact same query on two different tables with the exception that they use their respective partition keys.

Are you sure?  It looks like the old one does WHERE date =
((now())::date - '7 days'::interval) and the new version does
(date(created_at) = ((now())::date - '7 days'::interval). I guess you
renamed date to "created_at" and changed the query to use date(). If
that expression is not indexed then I imagine that would be a good
reason for the planner to have moved away from using the index on that
column. Also having date(created_at) will also not allow run-time
pruning to work since your partition key is "created_at".

You might be able to change the query to query a range of value on the
new timestamp column. This will allow you to get rid of the date()
function. For example:

where created_at >= date_trunc('day', now() - '7 days'::interval) and
created_at < date_trunc('day', now() - '6 days'::interval)

David


Reply | Threaded
Open this post in threaded view
|

Re: Date vs Timestamp without timezone Partition Key

Cedric Leong
I can confirm that was the issue, after removing the expression and using only what was indexed it definitely fixed the query plan. I appreciate all the help you've given me, I didn't really think to look there but it makes a ton of sense that a filter on the database would only work well if it's indexed.

Thanks again,

On Fri, Jun 5, 2020 at 11:13 PM David Rowley <[hidden email]> wrote:
On Sat, 6 Jun 2020 at 14:49, Cedric Leong <[hidden email]> wrote:
> It's less of a complaint rather than just a warning not to do what I did.

My point was really that nobody really knew what you did or what you
did it on. So it didn't seem like a worthwhile warning as it
completely lacked detail.

> These tests are running the exact same query on two different tables with the exception that they use their respective partition keys.

Are you sure?  It looks like the old one does WHERE date =
((now())::date - '7 days'::interval) and the new version does
(date(created_at) = ((now())::date - '7 days'::interval). I guess you
renamed date to "created_at" and changed the query to use date(). If
that expression is not indexed then I imagine that would be a good
reason for the planner to have moved away from using the index on that
column. Also having date(created_at) will also not allow run-time
pruning to work since your partition key is "created_at".

You might be able to change the query to query a range of value on the
new timestamp column. This will allow you to get rid of the date()
function. For example:

where created_at >= date_trunc('day', now() - '7 days'::interval) and
created_at < date_trunc('day', now() - '6 days'::interval)

David