truncating timestamps on arbitrary intervals

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

truncating timestamps on arbitrary intervals

John Naylor-2
Hi,

When analyzing time-series data, it's useful to be able to bin
timestamps into equally spaced ranges. date_trunc() is only able to
bin on a specified whole unit. In the attached patch for the March
commitfest, I propose a new function date_trunc_interval(), which can
truncate to arbitrary intervals, e.g.:

select date_trunc_interval('15 minutes', timestamp '2020-02-16
20:48:40'); date_trunc_interval
---------------------
 2020-02-16 20:45:00
(1 row)

With this addition, it might be possible to turn the existing
date_trunc() functions into wrappers. I haven't done that here because
it didn't seem practical at this point. For one, the existing
functions have special treatment for weeks, centuries, and millennia.

Note: I've only written the implementation for the type timestamp
without timezone. Adding timezone support would be pretty simple, but
I wanted to get feedback on the basic idea first before making it
complete. I've also written tests and very basic documentation.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v1-datetrunc_interval.patch (11K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

David Fetter
On Wed, Feb 26, 2020 at 10:50:19AM +0800, John Naylor wrote:
> Hi,
>
> When analyzing time-series data, it's useful to be able to bin
> timestamps into equally spaced ranges. date_trunc() is only able to
> bin on a specified whole unit.

Thanks for adding this very handy feature!

> In the attached patch for the March
> commitfest, I propose a new function date_trunc_interval(), which can
> truncate to arbitrary intervals, e.g.:
>
> select date_trunc_interval('15 minutes', timestamp '2020-02-16
> 20:48:40'); date_trunc_interval
> ---------------------
>  2020-02-16 20:45:00
> (1 row)

I believe the following should error out, but doesn't.

# SELECT date_trunc_interval('1 year 1 ms', TIMESTAMP '2001-02-16 20:38:40');
 date_trunc_interval
═════════════════════
 2001-01-01 00:00:00
(1 row)

> With this addition, it might be possible to turn the existing
> date_trunc() functions into wrappers. I haven't done that here because
> it didn't seem practical at this point. For one, the existing
> functions have special treatment for weeks, centuries, and millennia.

I agree that turning it into a wrapper would be separate work.

> Note: I've only written the implementation for the type timestamp
> without timezone. Adding timezone support would be pretty simple,
> but I wanted to get feedback on the basic idea first before making
> it complete. I've also written tests and very basic documentation.

Please find attached an update that I believe fixes the bug I found in
a principled way.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

v2-0001-Add-date_trunc_interval-interval-timestamp.patch (9K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

John Naylor-2
On Wed, Feb 26, 2020 at 3:51 PM David Fetter <[hidden email]> wrote:
>
> I believe the following should error out, but doesn't.
>
> # SELECT date_trunc_interval('1 year 1 ms', TIMESTAMP '2001-02-16 20:38:40');
>  date_trunc_interval
> ═════════════════════
>  2001-01-01 00:00:00
> (1 row)

You're quite right. I forgot to add error checking for
second-and-below units. I've added your example to the tests. (I
neglected to mention in my first email that because I chose to convert
the interval to the pg_tm struct (seemed easiest), it's not
straightforward how to allow multiple unit types, and I imagine the
use case is small, so I had it throw an error.)

> Please find attached an update that I believe fixes the bug I found in
> a principled way.

Thanks for that! I made a couple adjustments and incorporated your fix
into v3: While working on v1, I noticed the DTK_FOO macros already had
an idiom for bitmasking (see utils/datetime.h), so I used that instead
of a bespoke enum. Also, since the bitmask is checked once, I removed
the individual member checks, allowing me to remove all the gotos.

There's another small wrinkle: Since we store microseconds internally,
it's neither convenient nor useful to try to error out for things like
'2 ms 500 us', since that is just as well written as '2500 us', and
stored exactly the same. I'm inclined to just skip the millisecond
check and just use microseconds, but I haven't done that yet.

Also, I noticed this bug in v1:

SELECT date_trunc_interval('17 days', TIMESTAMP '2001-02-16 20:38:40.123456');
 date_trunc_interval
---------------------
 2001-01-31 00:00:00
(1 row)

This is another consequence of month and day being 1-based. Fixed,
with new tests.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v3-datetrunc_interval.patch (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

Tom Lane-2
John Naylor <[hidden email]> writes:
> [ v3-datetrunc_interval.patch ]

A few thoughts:

* In general, binning involves both an origin and a stride.  When
working with plain numbers it's almost always OK to set the origin
to zero, but it's less clear to me whether that's all right for
timestamps.  Do we need another optional argument?  Even if we
don't, "zero" for tm_year is 1900, which is going to give results
that surprise somebody.

* I'm still not convinced that the code does the right thing for
1-based months or days.  Shouldn't you need to subtract 1, then
do the modulus, then add back 1?

* Speaking of modulus, would it be clearer to express the
calculations like
        timestamp -= timestamp % interval;
(That's just a question, I'm not sure.)

* Code doesn't look to have thought carefully about what to do with
negative intervals, or BC timestamps.

* The comment
         * Justify all lower timestamp units and throw an error if any
         * of the lower interval units are non-zero.
doesn't seem to have a lot to do with what the code after it actually
does.  Also, you need explicit /* FALLTHRU */-type comments in that
switch, or pickier buildfarm members will complain.

* Seems like you could jam all the unit-related error checking into
that switch's default: case, where it will cost nothing if the
call is valid:

        switch (unit)
        {
            ...
            default:
                if (unit == 0)
                        // complain about zero interval
                else
                        // complain about interval with multiple components
        }

* I'd use ERRCODE_INVALID_PARAMETER_VALUE for any case of disallowed
contents of the interval.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

David Fetter
In reply to this post by John Naylor-2
On Wed, Feb 26, 2020 at 06:38:57PM +0800, John Naylor wrote:

> On Wed, Feb 26, 2020 at 3:51 PM David Fetter <[hidden email]> wrote:
> >
> > I believe the following should error out, but doesn't.
> >
> > # SELECT date_trunc_interval('1 year 1 ms', TIMESTAMP '2001-02-16 20:38:40');
> >  date_trunc_interval
> > ═════════════════════
> >  2001-01-01 00:00:00
> > (1 row)
>
> You're quite right. I forgot to add error checking for
> second-and-below units. I've added your example to the tests. (I
> neglected to mention in my first email that because I chose to convert
> the interval to the pg_tm struct (seemed easiest), it's not
> straightforward how to allow multiple unit types, and I imagine the
> use case is small, so I had it throw an error.)

I suspect that this could be sanely expanded to span some sets of
adjacent types in a future patch, e.g. year + month or hour + minute.

> > Please find attached an update that I believe fixes the bug I found in
> > a principled way.
>
> Thanks for that! I made a couple adjustments and incorporated your fix
> into v3: While working on v1, I noticed the DTK_FOO macros already had
> an idiom for bitmasking (see utils/datetime.h),

Oops!  Sorry I missed that.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

John Naylor-2
In reply to this post by Tom Lane-2
On Wed, Feb 26, 2020 at 11:36 PM Tom Lane <[hidden email]> wrote:

>
> John Naylor <[hidden email]> writes:
> > [ v3-datetrunc_interval.patch ]
>
> A few thoughts:
>
> * In general, binning involves both an origin and a stride.  When
> working with plain numbers it's almost always OK to set the origin
> to zero, but it's less clear to me whether that's all right for
> timestamps.  Do we need another optional argument?  Even if we
> don't, "zero" for tm_year is 1900, which is going to give results
> that surprise somebody.
Not sure.

A surprise I foresee in general might be: '1 week' is just '7 days',
and not aligned on "WOY". Since the function is passed an interval and
not text, we can't raise a warning. But date_trunc() already covers
that, so probably not a big deal.

> * I'm still not convinced that the code does the right thing for
> 1-based months or days.  Shouldn't you need to subtract 1, then
> do the modulus, then add back 1?

Yes, brain fade on my part. Fixed in the attached v4.

> * Speaking of modulus, would it be clearer to express the
> calculations like
>         timestamp -= timestamp % interval;
> (That's just a question, I'm not sure.)

Seems nicer, so done that way.

> * Code doesn't look to have thought carefully about what to do with
> negative intervals, or BC timestamps.

By accident, negative intervals currently behave the same as positive
ones. We could make negative intervals round up rather than truncate
(or vice versa). I don't know the best thing to do here.

As for BC, changed so it goes in the correct direction at least, and added test.

> * The comment
>          * Justify all lower timestamp units and throw an error if any
>          * of the lower interval units are non-zero.
> doesn't seem to have a lot to do with what the code after it actually
> does.  Also, you need explicit /* FALLTHRU */-type comments in that
> switch, or pickier buildfarm members will complain.

Stale comment from an earlier version, fixed. Not sure if "justify" is
the right term, but "zero" is a bit misleading. Added fall thru's.

> * Seems like you could jam all the unit-related error checking into
> that switch's default: case, where it will cost nothing if the
> call is valid:
>
>         switch (unit)
>         {
>             ...
>             default:
>                 if (unit == 0)
>                         // complain about zero interval
>                 else
>                         // complain about interval with multiple components
>         }
Done.

> * I'd use ERRCODE_INVALID_PARAMETER_VALUE for any case of disallowed
> contents of the interval.

Done.

Also removed the millisecond case, since it's impossible, or at least
not worth it, to distinguish from the microsecond case.

Note: I haven't done any additional docs changes in v4.

TODO: with timezone

possible TODO: origin parameter

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v4-datetrunc_interval.patch (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

John Naylor-2
In reply to this post by Tom Lane-2
On Wed, Feb 26, 2020 at 11:36 PM Tom Lane <[hidden email]> wrote:
>
> * In general, binning involves both an origin and a stride.  When
> working with plain numbers it's almost always OK to set the origin
> to zero, but it's less clear to me whether that's all right for
> timestamps.  Do we need another optional argument?  Even if we
> don't, "zero" for tm_year is 1900, which is going to give results
> that surprise somebody.

I tried the simplest way in the attached v5. Examples (third param is origin):

-- same result as no origin:
select date_trunc_interval('5 min'::interval, TIMESTAMP '2020-02-01
01:01:01', TIMESTAMP '2020-02-01');
 date_trunc_interval
---------------------
 2020-02-01 01:00:00
(1 row)

-- shift bins by 2.5 min:
select date_trunc_interval('5 min'::interval, TIMESTAMP '2020-02-1
01:01:01', TIMESTAMP '2020-02-01 00:02:30');
 date_trunc_interval
---------------------
 2020-02-01 00:57:30
(1 row)

-- align weeks to start on Sunday
select date_trunc_interval('7 days'::interval, TIMESTAMP '2020-02-11
01:01:01.0', TIMESTAMP '1900-01-02');
 date_trunc_interval
---------------------
 2020-02-09 00:00:00
(1 row)

I've put off adding documentation on the origin piece pending comments
about the approach.

I haven't thought seriously about timezone yet, but hopefully it's
just work and nothing to think too hard about.


--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v5-datetrunc_interval.patch (14K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

Isaac Morland
On Fri, 13 Mar 2020 at 03:13, John Naylor <[hidden email]> wrote:
On Wed, Feb 26, 2020 at 11:36 PM Tom Lane <[hidden email]> wrote:
>
> * In general, binning involves both an origin and a stride.  When
> working with plain numbers it's almost always OK to set the origin
> to zero, but it's less clear to me whether that's all right for
> timestamps.  Do we need another optional argument?  Even if we
> don't, "zero" for tm_year is 1900, which is going to give results
> that surprise somebody.

- align weeks to start on Sunday
select date_trunc_interval('7 days'::interval, TIMESTAMP '2020-02-11
01:01:01.0', TIMESTAMP '1900-01-02');
 date_trunc_interval
---------------------
 2020-02-09 00:00:00
(1 row)

I'm confused by this. If my calendars are correct, both 1900-01-02 and 2020-02-11 are Tuesdays. So if the date being adjusted and the origin are both Tuesday, shouldn't the day part be left alone when truncating to 7 days? Also, I'd like to confirm that the default starting point for 7 day periods (weeks) is Monday, per ISO. I know it's very fashionable in North America to split the weekend in half but it's not the international standard.

Perhaps the starting point for dates should be either 0001-01-01 (the proleptic beginning of the CE calendar) or 2001-01-01 (the beginning of the current 400-year repeating cycle of leap years and weeks, and a Monday, giving the appropriate ISO result for truncating to 7 day periods).
Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

John Naylor-2
On Fri, Mar 13, 2020 at 7:48 PM Isaac Morland <[hidden email]> wrote:

>
> On Fri, 13 Mar 2020 at 03:13, John Naylor <[hidden email]> wrote:
>
>> - align weeks to start on Sunday
>> select date_trunc_interval('7 days'::interval, TIMESTAMP '2020-02-11
>> 01:01:01.0', TIMESTAMP '1900-01-02');
>>  date_trunc_interval
>> ---------------------
>>  2020-02-09 00:00:00
>> (1 row)
>
>
> I'm confused by this. If my calendars are correct, both 1900-01-02 and 2020-02-11 are Tuesdays. So if the date being adjusted and the origin are both Tuesday, shouldn't the day part be left alone when truncating to 7 days?

Thanks for taking a look! The non-intuitive behavior you found is
because the patch shifts the timestamp before converting to the
internal pg_tm type. The pg_tm type stores day of the month, which is
used for the calculation. It's not counting the days since the origin.
Then the result is shifted back.

To get more logical behavior, perhaps the optional parameter is better
as an offset instead of an origin. Alternatively (or additionally),
the function could do the math on int64 timestamps directly.

> Also, I'd like to confirm that the default starting point for 7 day periods (weeks) is Monday, per ISO.

That's currently the behavior in the existing date_trunc function,
when passed the string 'week'. Given that keyword, it calculates the
week of the year.

When using the proposed function with arbitrary intervals, it uses day
of the month, as found in the pg_tm struct. It doesn't treat 7 days
differently then 5 or 10 without user input (origin or offset), since
there is nothing special about 7 day intervals as such internally. To
show the difference between date_trunc, and date_trunc_interval as
implemented in v5 with no origin:

select date_trunc('week', d), count(*) from generate_series(
'2020-02-01'::timestamp, '2020-03-31', '1 day') d group by 1 order by
1;
     date_trunc      | count
---------------------+-------
 2020-01-27 00:00:00 |     2
 2020-02-03 00:00:00 |     7
 2020-02-10 00:00:00 |     7
 2020-02-17 00:00:00 |     7
 2020-02-24 00:00:00 |     7
 2020-03-02 00:00:00 |     7
 2020-03-09 00:00:00 |     7
 2020-03-16 00:00:00 |     7
 2020-03-23 00:00:00 |     7
 2020-03-30 00:00:00 |     2
(10 rows)

select date_trunc_interval('7 days'::interval, d), count(*) from
generate_series( '2020-02-01'::timestamp, '2020-03-31', '1 day') d
group by 1 order by 1;
 date_trunc_interval | count
---------------------+-------
 2020-02-01 00:00:00 |     7
 2020-02-08 00:00:00 |     7
 2020-02-15 00:00:00 |     7
 2020-02-22 00:00:00 |     7
 2020-02-29 00:00:00 |     1
 2020-03-01 00:00:00 |     7
 2020-03-08 00:00:00 |     7
 2020-03-15 00:00:00 |     7
 2020-03-22 00:00:00 |     7
 2020-03-29 00:00:00 |     3
(10 rows)

Resetting the day every month is counterintuitive if not broken, and
as I mentioned it might make more sense to use the int64 timestamp
directly, at least for intervals less than one month. I'll go look
into doing that.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

Artur Zakirov
In reply to this post by John Naylor-2
Hello,

On 3/13/2020 4:13 PM, John Naylor wrote:
> I've put off adding documentation on the origin piece pending comments
> about the approach.
>
> I haven't thought seriously about timezone yet, but hopefully it's
> just work and nothing to think too hard about.

Thank you for the patch. I looked it and tested a bit.

There is one interesting case which might be mentioned in the
documentation or in the tests is the following. The function has
interesting behaviour with real numbers:

=# select date_trunc_interval('0.1 year'::interval, TIMESTAMP
'2020-02-01 01:21:01');
  date_trunc_interval
---------------------
  2020-02-01 00:00:00

=# select date_trunc_interval('1.1 year'::interval, TIMESTAMP
'2020-02-01 01:21:01');
ERROR:  only one interval unit allowed for truncation

It is because the second interval has two interval units:

=# select '0.1 year'::interval;
  interval
----------
  1 mon

=# select '1.1 year'::interval;
    interval
--------------
  1 year 1 mon

--
Artur


Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

John Naylor-2
On Sun, Mar 15, 2020 at 2:26 PM I wrote:
>
> To get more logical behavior, perhaps the optional parameter is better
> as an offset instead of an origin. Alternatively (or additionally),
> the function could do the math on int64 timestamps directly.

For v6, I changed the algorithm to use pg_tm for months and years, and
int64 for all smaller units. Despite the split, I think it's easier to
read now, and certainly shorter. This has the advantage that now
mixing units is allowed, as long as you don't mix months/years with
days or smaller, which often doesn't make sense and is not very
practical. (not yet documented) One consequence of this is that when
operating on months/years, and the origin contains smaller units, the
smaller units are ignored. Example:

select date_trunc_interval('12 months'::interval, timestamp
'2012-03-01 01:21:01', timestamp '2011-03-22');
 date_trunc_interval
---------------------
 2012-03-01 00:00:00
(1 row)

Even though not quite a full year has passed, it ignores the days in
the origin time and detects a difference in 12 calendar months. That
might be fine, although we could also throw an error and say origins
must be in the form of 'YYYY-01-01 00:00:00' when truncating on months
and/or years.

I added a sketch of documentation for the origin parameter and more tests.

On Fri, Mar 13, 2020 at 7:48 PM Isaac Morland <[hidden email]> wrote:
> I'm confused by this. If my calendars are correct, both 1900-01-02 and 2020-02-11 are Tuesdays. So if the date being adjusted and the origin are both Tuesday, shouldn't the day part be left alone when truncating to 7 days? Also, I'd like to confirm that the default starting point for 7 day periods (weeks) is Monday, per ISO.

This is fixed.

select date_trunc_interval('7 days'::interval, timestamp '2020-02-11
01:01:01.0', TIMESTAMP '1900-01-02');
 date_trunc_interval
---------------------
 2020-02-11 00:00:00
(1 row)

select date_trunc_interval('7 days'::interval, d), count(*) from
generate_series( '2020-02-01'::timestamp, '2020-03-31', '1 day') d
group by 1 order by 1;
 date_trunc_interval | count
---------------------+-------
 2020-01-27 00:00:00 |     2
 2020-02-03 00:00:00 |     7
 2020-02-10 00:00:00 |     7
 2020-02-17 00:00:00 |     7
 2020-02-24 00:00:00 |     7
 2020-03-02 00:00:00 |     7
 2020-03-09 00:00:00 |     7
 2020-03-16 00:00:00 |     7
 2020-03-23 00:00:00 |     7
 2020-03-30 00:00:00 |     2
(10 rows)

> Perhaps the starting point for dates should be either 0001-01-01 (the proleptic beginning of the CE calendar) or 2001-01-01 (the beginning of the current 400-year repeating cycle of leap years and weeks, and a Monday, giving the appropriate ISO result for truncating to 7 day periods).

I went ahead with 2001-01-01 for the time being.

On Thu, Mar 19, 2020 at 4:20 PM Artur Zakirov <[hidden email]> wrote:
>
> =# select date_trunc_interval('1.1 year'::interval, TIMESTAMP
> '2020-02-01 01:21:01');
> ERROR:  only one interval unit allowed for truncation

For any lingering cases like this (i don't see any), maybe an error
hint is in order. The following works now, as expected for 1 year 1
month:

select date_trunc_interval('1.1 year'::interval, timestamp '2002-05-01
01:21:01');
 date_trunc_interval
---------------------
 2002-02-01 00:00:0

I'm going to look into implementing timezone while awaiting comments on v6.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v6-datetrunc_interval.patch (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

Tels
In reply to this post by John Naylor-2
Hello John,

this looks like a nice feature. I'm wondering how it relates to the
following use-case:

When drawing charts, the user can select pre-defined widths on times
(like "15 min", "1 hour").

The data for these slots is fitted always to intervalls that start in 0
in the slot, e.g. if the user selects "15 min", the interval always
starts at xx:00, xx:15, xx:30 or xx:45. This is to aid caching of the
resulting data, and so that requesting the same chart at xx:00 and xx:01
actually draws the same chart, and not some bar with only one minute
data at at the end.

In PSQL, this works out to using this as GROUP BY and then summing up
all values:

   SELECT FLOOR(EXTRACT(EPOCH from thetime) / 3600) * 3600, SUM(events)
FROM mytable ... GROUP BY 1;

whereas here 3600 means "hourly".

It is of course easy for things like "1 hour", but for yearly I had to
come up with things like:

   EXTRAC(YEAR FROM thetime) * 12 + EXTRACT(MONTH FROM thetime)

and its gets even more involved for weeks, weekdays or odd things like
fortnights.

So would that mean one could replace this by:

  date_trunc_interval('3600 seconds'::interval, thetime)

and it would be easier, and (hopefully) faster?

Best regards,

Tels


Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

John Naylor-2
On Tue, Mar 24, 2020 at 9:34 PM Tels <[hidden email]> wrote:

>
> Hello John,
>
> this looks like a nice feature. I'm wondering how it relates to the
> following use-case:
>
> When drawing charts, the user can select pre-defined widths on times
> (like "15 min", "1 hour").
>
> The data for these slots is fitted always to intervalls that start in 0
> in the slot, e.g. if the user selects "15 min", the interval always
> starts at xx:00, xx:15, xx:30 or xx:45. This is to aid caching of the
> resulting data, and so that requesting the same chart at xx:00 and xx:01
> actually draws the same chart, and not some bar with only one minute
> data at at the end.

Hi Tels, thanks for your interest! Sounds like the exact use case I had in mind.

> It is of course easy for things like "1 hour", but for yearly I had to
> come up with things like:
>
>    EXTRAC(YEAR FROM thetime) * 12 + EXTRACT(MONTH FROM thetime)
>
> and its gets even more involved for weeks, weekdays or odd things like
> fortnights.

To be clear, this particular case was already handled by the existing
date_trunc, but only single units and a few other special cases. I
understand if you have to write code to handle 15 minutes, you need to
use that structure for all cases.

Fortnight would be trivial:

date_trunc_interval('14 days'::interval, thetime [, optional start of
the fortnight])

...but weekdays would still need something extra.

> So would that mean one could replace this by:
>
>   date_trunc_interval('3600 seconds'::interval, thetime)
>
> and it would be easier, and (hopefully) faster?

Certainly easier and more flexible. It's hard to make guesses about
performance, but with your example above where you have two SQL
function calls plus some expression evaluation, I think a single
function would be faster in many cases.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

John Naylor-2
In reply to this post by John Naylor-2
I wrote:

> I'm going to look into implementing timezone while awaiting comments on v6.

I attempted this in the attached v7. There are 4 new functions for
truncating timestamptz on an interval -- with and without origin, and
with and without time zone.

Parts of it are hackish, and need more work, but I think it's in
passable enough shape to get feedback on. The origin parameter logic
was designed with timestamps-without-time-zone in mind, and
retrofitting time zone on top of that was a bit messy. There might be
bugs.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v7-datetrunc_interval.patch (33K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

Artur Zakirov
On 3/30/2020 9:30 PM, John Naylor wrote:
> I attempted this in the attached v7. There are 4 new functions for
> truncating timestamptz on an interval -- with and without origin, and
> with and without time zone.

Thank you for new version of the patch.

I'm not sure that I fully understand the 'origin' parameter. Is it valid
to have a value of 'origin' which is greater than a value of 'timestamp'
parameter?

I get some different results in such case:

=# select date_trunc_interval('2 year', timestamp '2020-01-16 20:38:40',
timestamp '2022-01-17 00:00:00');
  date_trunc_interval
---------------------
  2020-01-01 00:00:00

=# select date_trunc_interval('3 year', timestamp '2020-01-16 20:38:40',
timestamp '2022-01-17 00:00:00');
  date_trunc_interval
---------------------
  2022-01-01 00:00:00

So here I'm not sure which result is correct.

It seems that the patch is still in progress, but I have some nitpicking.

> +        <entry><literal><function>date_trunc_interval(<type>interval</type>, <type>timestamptz</type>, <type>text</type>)</function></literal></entry>
> +        <entry><type>timestamptz  </type></entry>

It seems that 'timestamptz' in both argument and result descriptions
should be replaced by 'timestamp with time zone' (see other functions
descriptions). Though it is okay to use 'timestamptz' in SQL examples.

timestamp_trunc_interval_internal() and
timestamptz_trunc_interval_internal() have similar code. I think they
can be rewritten to avoid code duplication.

--
Artur


Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

John Naylor-2
On Tue, Mar 31, 2020 at 4:34 PM Artur Zakirov <[hidden email]> wrote:
> Thank you for new version of the patch.

Thanks for taking a look! Attached is v8, which addresses your points,
adds tests and fixes some bugs. There are still some WIP detritus in
the timezone code, so I'm not claiming it's ready, but it's much
closer. I'm fairly confident in the implementation of timestamp
without time zone, however.

> I'm not sure that I fully understand the 'origin' parameter. Is it valid
> to have a value of 'origin' which is greater than a value of 'timestamp'
> parameter?

That is the intention. The returned values should be

origin +/- (n * interval)

where n is an integer.

> I get some different results in such case:
>
> =# select date_trunc_interval('2 year', timestamp '2020-01-16 20:38:40',
> timestamp '2022-01-17 00:00:00');
>   date_trunc_interval
> ---------------------
>   2020-01-01 00:00:00

This was correct per how I coded it, but I have rethought where to
draw the bins for user-specified origins. I have decided that the
above is inconsistent with units smaller than a month. We shouldn't
"cross" the bin until the input has reached Jan. 17, in this case. In
v8, the answer to the above is

 date_trunc_interval
---------------------
 2018-01-17 00:00:00
(1 row)

(This could probably be better documented)

> =# select date_trunc_interval('3 year', timestamp '2020-01-16 20:38:40',
 timestamp '2022-01-17 00:00:00');
>   date_trunc_interval
> ---------------------
>   2022-01-01 00:00:00
>
> So here I'm not sure which result is correct.

This one is just plain broken. The result should always be equal or
earlier than the input. In v8 the result is now:

 date_trunc_interval
---------------------
 2019-01-17 00:00:00
(1 row)

> It seems that the patch is still in progress, but I have some nitpicking.
>
> > +        <entry><literal><function>date_trunc_interval(<type>interval</type>, <type>timestamptz</type>, <type>text</type>)</function></literal></entry>
> > +        <entry><type>timestamptz  </type></entry>
>
> It seems that 'timestamptz' in both argument and result descriptions
> should be replaced by 'timestamp with time zone' (see other functions
> descriptions). Though it is okay to use 'timestamptz' in SQL examples.

Any and all nitpicks welcome! I have made these match the existing
date_trunc documentation more closely.

> timestamp_trunc_interval_internal() and
> timestamptz_trunc_interval_internal() have similar code. I think they
> can be rewritten to avoid code duplication.

I thought so too (and noticed the same about the existing date_trunc),
but it's more difficult than it looks.

Note: I copied some tests from timestamp to timestamptz with a few
tweaks. A few tz tests still don't pass. I'm not yet sure if the
problem is in the test, or my code. Some detailed review of the tests
and their results would be helpful.

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v8-datetrunc_interval.patch (51K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: truncating timestamps on arbitrary intervals

John Naylor-2
In v9, I've simplified the patch somewhat to make it easier for future
work to build on.

- When truncating on month-or-greater intervals, require the origin to
align on month. This removes the need to handle weird corner cases
that have no straightforward behavior.
- Remove hackish and possibly broken code to allow origin to be after
the input timestamp. The default origin is Jan 1, 1 AD, so only AD
dates will behave correctly by default. This is not enforced for now,
since it may be desirable to find a way to get this to work in a nicer
way.
- Rebase docs over PG13 formatting changes.



--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

v9-datetrunc-interval.patch (39K) Download Attachment