[PROPOSAL] Temporal query processing with range types

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
47 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[PROPOSAL] Temporal query processing with range types

Anton Dignös
Hi hackers,

we are a group of researches that work on temporal databases. Our main focus is
the processing of data with time intervals, such as the range types in
PostgreSQL.

For this type of processing the range types that are stored with the tuples are
considered to represent the tuples' valid time, such as for instance the time
period an employee works for a project. Intuitively, the processing of temporal
operators corresponds to the use of "at each time point" in natural language.
For example, a temporal count on such a relation (temporal relation) shall
determine the count at each time point, while an anti-join shall compute an
anti-join at each time point.

We would like to contribute to PostgreSQL a solution that supports the query
processing of "at each time point". The basic idea is to offer two new
operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
ranges of tuples so that subsequent queries can use the usual grouping and
equality conditions to get the intended results.

Query processing works as follows:
<temporal operator> := <NORMALIZE | ALIGN> + <traditional operator>

Attached is a proposal patch file that implements this extension and on which
the example queries below can be run. The support includes distinct,
aggregation, set operations, Cartesian product, Join, Left Join, Right Join,
Full Join, and Anti-join.

In our prototype the syntax for the two new operators is:

... FROM (table_ref NORMALIZE table_ref
          USING(att_list)
          WITH (columnref, columnref)
         ) alias_clause ...

and

... FROM (table_ref ALIGN table_ref
          ON a_expr
          WITH (columnref, columnref)
         ) alias_clause ...

where NORMALIZE is used for distinct, aggregation and set operations and ALIGN
is used for Cartesian product, Join, Left Join, Right Join, Full Join, and
Anti-join.

-------------------------------------------------------------------------------
EXAMPLE FOR AGGREGATION
-------------------------------------------------------------------------------

Relation 'projects' records when employees are assigned to projects.

CREATE TABLE projects (empl VARCHAR, proj VARCHAR, pay FLOAT, t DATERANGE);
INSERT INTO projects VALUES
('Ann', 'P1', 60, DATERANGE('2016-01-01', '2016-09-01')),
('Sam', 'P1', 40, DATERANGE('2016-06-01', '2016-12-01')),
('Joe', 'P2', 80, DATERANGE('2016-03-01', '2016-06-01'));

TABLE projects;

 empl | proj | pay |            t
------+------+-----+-------------------------
 Ann  | P1   |  60 | [2016-01-01,2016-09-01)
 Sam  | P1   |  40 | [2016-06-01,2016-12-01)
 Joe  | P2   |  80 | [2016-03-01,2016-06-01)
(3 rows)

QUERY1: At each time point, what is the number of employees assigned
to projects?

First, we use NORMALIZE to adjust the ranges so that they can be
aggregated as usual by using grouping on the adjusted timestamp t:

SELECT *
FROM (projects p1 NORMALIZE projects p2 USING() WITH(t,t)) p_adjusted;

 empl | proj | pay |            t
------+------+-----+-------------------------
 Ann  | P1   |  60 | [2016-01-01,2016-03-01)
 Ann  | P1   |  60 | [2016-03-01,2016-06-01)
 Ann  | P1   |  60 | [2016-06-01,2016-09-01)
 Sam  | P1   |  40 | [2016-06-01,2016-09-01)
 Sam  | P1   |  40 | [2016-09-01,2016-12-01)
 Joe  | P2   |  80 | [2016-03-01,2016-06-01)
(6 rows)

Observe that the time ranges have been adjusted, and it is now trivial
to compute the count of employees at each time point by grouping on t:

SELECT COUNT(*), t
FROM (projects p1 NORMALIZE projects p2 USING() WITH(t,t)) p_adjusted
GROUP BY t;

 count |            t
-------+-------------------------
     1 | [2016-01-01,2016-03-01)
     2 | [2016-03-01,2016-06-01)
     2 | [2016-06-01,2016-09-01)
     1 | [2016-09-01,2016-12-01)
(4 rows)

QUERY2: At each time point, what is the number of employees assigned
to EACH project?

SELECT *
FROM (projects p1 NORMALIZE projects p2 USING(proj) WITH(t,t)) p_adjusted;

 empl | proj | pay |            t
------+------+-----+-------------------------
 Ann  | P1   |  60 | [2016-01-01,2016-06-01)
 Ann  | P1   |  60 | [2016-06-01,2016-09-01)
 Sam  | P1   |  40 | [2016-06-01,2016-09-01)
 Sam  | P1   |  40 | [2016-09-01,2016-12-01)
 Joe  | P2   |  80 | [2016-03-01,2016-06-01)
(5 rows)

and

SELECT COUNT(*), proj, t
FROM (projects p1 NORMALIZE projects p2 USING(proj) WITH(t,t)) p_adjusted
GROUP BY proj, t;

 count | proj |            t
-------+------+-------------------------
     1 | P2   | [2016-03-01,2016-06-01)
     1 | P1   | [2016-01-01,2016-06-01)
     2 | P1   | [2016-06-01,2016-09-01)
     1 | P1   | [2016-09-01,2016-12-01)
(4 rows)

-------------------------------------------------------------------------------
EXAMPLE FOR ANTI-JOIN
-------------------------------------------------------------------------------

QUERY3: At each time point, deterime the employees with the highest
salary.

Without time this query is answered with a NOT EXISTS subquery. With
time periods everything remains, but the time ranges must be adjusted
first:

SELECT *
FROM (projects p1 ALIGN projects p2 ON p1.pay < p2.pay WITH (t,t)) p1_adjusted
WHERE NOT EXISTS (
 SELECT *
 FROM (projects p2 ALIGN projects p1 ON p1.pay < p2.pay WITH (t,t)) p2_adjusted
 WHERE p1_adjusted.pay < p2_adjusted.pay
       AND p1_adjusted.t = p2_adjusted.t );

 empl | proj | pay |            t
------+------+-----+-------------------------
 Ann  | P1   |  60 | [2016-01-01,2016-03-01)
 Ann  | P1   |  60 | [2016-06-01,2016-09-01)
 Sam  | P1   |  40 | [2016-09-01,2016-12-01)
 Joe  | P2   |  80 | [2016-03-01,2016-06-01)
(4 rows)


-------------------------------------------------------------------------------
EXAMPLE FOR JOINS
-------------------------------------------------------------------------------

CREATE TABLE managers (mgr VARCHAR, proj VARCHAR, t DATERANGE);
INSERT INTO managers VALUES
('Tom', 'P1', DATERANGE('2016-09-01', '2016-12-01')),
('Frank', 'P2', DATERANGE('2016-01-01', '2016-12-01'));

TABLE managers;

  mgr  | proj |            t
-------+------+-------------------------
 Tom   | P1   | [2016-09-01,2016-12-01)
 Frank | P2   | [2016-01-01,2016-12-01)
(2 rows)

QUERY4: At each time point, determine employees and their managers
(including the periods with no employee or no manager).

WITH p_ext AS (SELECT *, t u FROM projects),
     m_ext AS (SELECT *, t v FROM managers)
SELECT proj, empl, mgr, t
FROM (p_ext ALIGN m_ext ON m_ext.proj = p_ext.proj WITH (t,t)) p_adjusted
       FULL OUTER JOIN
     (m_ext ALIGN p_ext ON m_ext.proj = p_ext.proj WITH (t,t)) m_adjusted
       USING (proj, t)
WHERE t = u * v OR v IS NULL OR u IS NULL;

 proj | empl |  mgr  |            t
------+------+-------+-------------------------
 P1   | Ann  |       | [2016-01-01,2016-09-01)
 P1   | Sam  |       | [2016-06-01,2016-09-01)
 P1   | Sam  | Tom   | [2016-09-01,2016-12-01)
 P2   |      | Frank | [2016-01-01,2016-03-01)
 P2   | Joe  | Frank | [2016-03-01,2016-06-01)
 P2   |      | Frank | [2016-06-01,2016-12-01)
(6 rows)

-------------------------------------------------------------------------------
IMPLEMENTATION
-------------------------------------------------------------------------------

Our implementation approach is to use two operators that take care of
adjusting the ranges in such a way that afterwards the corresponding
nontemporal operator can be used to compute the result:
input --> {NORMALIZE/ALIGN} --> nontemporal operator --> result.

The two operators (NORMALIZE and ALIGN) in this prototype are rewritten into
subqueries, which then serve as input for a new executor function
ExecTemporalAdjustment. The executor function takes care of the splitting of
the range types.

For NORMALIZE the tuples' ranges need to be split into all sub-ranges
according to all matching ranges of the second relation. For this we
create a subquery that first joins one relation with the range
boundaries of the other and then sorts the result. The executor
function splits the ranges in a sweep-line based manner.

For ALIGN the tuples' ranges must be split into all intersections and
differences with the other relation according to the join condition.
For this we create a subquery that first joins the two relations and
then sorts the result. The executor function splits the ranges
accordingly in a sweep-line based manner.

After the splitting it is possible to only use equality (GROUP BY,
DISTINCT, JOIN CONDITION, ...) on the range types to get the intended
result.

The subquery is 'visible' with the EXPLAIN command.

Range types need to be of half-open, i.e., [lower, upper).

Unbound ranges (Infinity), NaN, and NULL values (of ranges and range
bounds) are not yet supported.

-------------------------------------------------------------------------------

We are grateful for any feedback.

Best regards,

Anton, Johann, Michael, Peter


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tpg_primitives_out_v1.patch (160K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

David Fetter
On Fri, Jul 22, 2016 at 01:15:17PM +0200, Anton Dignös wrote:
> Hi hackers,
>
> we are a group of researches that work on temporal databases.  Our
> main focus is the processing of data with time intervals, such as
> the range types in PostgreSQL.

Thanks for your hard work so far!

[Explanation and examples elided]

To what extent, if any, are you attempting to follow the SQL:2011
standard?

http://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Anton Dignös
On Sat, Jul 23, 2016 at 12:01 AM, David Fetter <[hidden email]> wrote:

> On Fri, Jul 22, 2016 at 01:15:17PM +0200, Anton Dignös wrote:
>> Hi hackers,
>>
>> we are a group of researches that work on temporal databases.  Our
>> main focus is the processing of data with time intervals, such as
>> the range types in PostgreSQL.
>
> Thanks for your hard work so far!
>
> [Explanation and examples elided]
>
> To what extent, if any, are you attempting to follow the SQL:2011
> standard?
>
> http://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf

The querying in the SQL:2011 standard is based on simple SQL range restrictions
and period predicates (OVERLAP, PRECEDES, FOR SYSTEM_TIME AS OF, etc) that
functionality-wise in PostgreSQL are already covered by the operators and
functions on range types.

Operations such as aggregation, outer joins, set-operations on ranges
(mentioned in
Section 2.5 "Future directions" in the above paper) are not yet part of the
standard. These are the operations that require the adjustment (or splitting) of
ranges.

Best,

Anton


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Robert Haas
In reply to this post by Anton Dignös
On Fri, Jul 22, 2016 at 7:15 AM, Anton Dignös <[hidden email]> wrote:
> We would like to contribute to PostgreSQL a solution that supports the query
> processing of "at each time point". The basic idea is to offer two new
> operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
> ranges of tuples so that subsequent queries can use the usual grouping and
> equality conditions to get the intended results.

I think that it is great that you want to contribute your work to
PostgreSQL.  I don't know whether there will be a consensus that this
is generally useful functionality that we should accept, but I commend
the effort anyhow.  Assuming there is, getting this into a state that
we consider committable will probably take quite a bit of additional
work on your part; no one will do it for you.  If you're still
interested in proceeding given those caveats, please add your patch
here so that it gets reviewed:

https://commitfest.postgresql.org/action/commitfest_view/open

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Peter Moser
On 27.07.2016 at 16:09 Robert Haas wrote:

> On Fri, Jul 22, 2016 at 7:15 AM, Anton Dignös <[hidden email]> wrote:
>> We would like to contribute to PostgreSQL a solution that supports the query
>> processing of "at each time point". The basic idea is to offer two new
>> operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
>> ranges of tuples so that subsequent queries can use the usual grouping and
>> equality conditions to get the intended results.
>
> I think that it is great that you want to contribute your work to
> PostgreSQL.  I don't know whether there will be a consensus that this
> is generally useful functionality that we should accept, but I commend
> the effort anyhow.  Assuming there is, getting this into a state that
> we consider committable will probably take quite a bit of additional
> work on your part; no one will do it for you.


Hi hackers,

thank you for your feedback.

We are aware that contributing to PostgreSQL is a long way with a lot
of work.  We are committed to go all the way and do the work as
discussed in the community.

We had some internal discussions about the project, looking also at
some other patches to better understand whether the patch is
work-in-progress or ready for commitfest.


> If you're still
> interested in proceeding given those caveats, please add your patch
> here so that it gets reviewed:
>
> https://commitfest.postgresql.org/action/commitfest_view/open


We decided to follow your recommendation and add the patch to the
commitfest.


Looking forward for your feedback,
Anton, Johann, Michael, Peter


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Haribabu Kommi-2


On Tue, Oct 25, 2016 at 8:44 PM, Peter Moser <[hidden email]> wrote:

We decided to follow your recommendation and add the patch to the commitfest.


Path is not applying properly to HEAD.
Moved to next CF with "waiting on author" status.


Regards,
Hari Babu
Fujitsu Australia
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Peter Moser
Am 05.12.2016 um 06:11 schrieb Haribabu Kommi:

>
>
> On Tue, Oct 25, 2016 at 8:44 PM, Peter Moser <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>
>     We decided to follow your recommendation and add the patch to the
>     commitfest.
>
>
> Path is not applying properly to HEAD.
> Moved to next CF with "waiting on author" status.
>
We updated our patch. We tested it with the latest
commit dfe530a09226a9de80f2b4c3d5f667bf51481c49.

>
> Regards,
> Hari Babu
> Fujitsu Australia

Best regards,
Anton, Johann, Michael, Peter


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tpg_primitives_out_v2.patch (119K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

David Fetter
On Wed, Dec 07, 2016 at 03:57:33PM +0100, Peter Moser wrote:

> Am 05.12.2016 um 06:11 schrieb Haribabu Kommi:
> >
> >
> > On Tue, Oct 25, 2016 at 8:44 PM, Peter Moser <[hidden email]
> > <mailto:[hidden email]>> wrote:
> >
> >
> >     We decided to follow your recommendation and add the patch to the
> >     commitfest.
> >
> >
> > Path is not applying properly to HEAD.
> > Moved to next CF with "waiting on author" status.
> >
>
> We updated our patch. We tested it with the latest
> commit dfe530a09226a9de80f2b4c3d5f667bf51481c49.

This looks neat, but it no longer applies to master.  Is a rebase in
the offing?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Peter Moser
Am 16.12.2016 um 07:17 schrieb David Fetter:

> On Wed, Dec 07, 2016 at 03:57:33PM +0100, Peter Moser wrote:
>> Am 05.12.2016 um 06:11 schrieb Haribabu Kommi:
>>>
>>>
>>> On Tue, Oct 25, 2016 at 8:44 PM, Peter Moser <[hidden email]
>>> <mailto:[hidden email]>> wrote:
>>>
>>>
>>>     We decided to follow your recommendation and add the patch to the
>>>     commitfest.
>>>
>>>
>>> Path is not applying properly to HEAD.
>>> Moved to next CF with "waiting on author" status.
>>>
>>
>> We updated our patch. We tested it with the latest
>> commit dfe530a09226a9de80f2b4c3d5f667bf51481c49.
>
> This looks neat, but it no longer applies to master.  Is a rebase in
> the offing?
We rebased our patch on top of HEAD, that is, commit
93513d1b6559b2d0805f0b02d312ee550e3d010b.


Best regards,
Anton, Johann, Michael, Peter


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tpg_primitives_out_v3.patch (119K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Peter Eisentraut-6
So I'm looking at this patch in the commit fest.  I have only a general
understanding of temporal query processing.

What this patch does is to add two new clauses for FROM-list items,
NORMALIZE and ALIGN, which reshuffle a set of ranges into a new list
that can then be aggregated more easily.  From the original message:

> For NORMALIZE the tuples' ranges need to be split into all sub-ranges
> according to all matching ranges of the second relation. For this we
> create a subquery that first joins one relation with the range
> boundaries of the other and then sorts the result. The executor
> function splits the ranges in a sweep-line based manner.
>
> For ALIGN the tuples' ranges must be split into all intersections and
> differences with the other relation according to the join condition.
> For this we create a subquery that first joins the two relations and
> then sorts the result. The executor function splits the ranges
> accordingly in a sweep-line based manner.

So there isn't really temporal query processing as such here, only some
helpers that can make it easier.

I can see how those operations can be useful, but it would help if there
were a more formal definition to be able to check that further.

What I'm missing here is some references: existing implementations,
standards, documentation, research papers, alternative ideas, rejected
alternatives, etc.

Also, the submission is missing documentation and test cases.  There are
technical terms used in the code that I don't understand.

I think there are probably many interesting applications for normalizing
or otherwise adjusting ranges.  I'd like to see an overview and
consideration of other applications.

Ideally, I'd like to see these things implemented as some kind of
user-space construct, like an operator or function.  I think we'd need a
clearer definition of what it is they do before we can evaluate that.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Peter Moser
> What this patch does is to add two new clauses for FROM-list items,

> NORMALIZE and ALIGN, which reshuffle a set of ranges into a new list
> that can then be aggregated more easily.  From the original message:
>
> > For NORMALIZE the tuples' ranges need to be split into all sub-ranges
> > according to all matching ranges of the second relation. For this we
> > create a subquery that first joins one relation with the range
> > boundaries of the other and then sorts the result. The executor
> > function splits the ranges in a sweep-line based manner.
> >
> > For ALIGN the tuples' ranges must be split into all intersections and
> > differences with the other relation according to the join condition.
> > For this we create a subquery that first joins the two relations and
> > then sorts the result. The executor function splits the ranges
> > accordingly in a sweep-line based manner.
>
> So there isn't really temporal query processing as such here, only some
> helpers that can make it easier.

The goal of temporal aligners and normalizers is to split ranges to allow a
reduction from temporal queries to their non-temporal counterparts. Splitting
ranges is necessary for temporal query processing. Temporal aligners and
normalizer may then be used as building-blocks for any temporal query construct.

> I can see how those operations can be useful, but it would help if there
> were a more formal definition to be able to check that further.

We have published two papers, that contain formal definitions and related work
for the temporal aligner and normalizer. Please see [1] and [2].

> What I'm missing here is some references: existing implementations,
> standards, documentation, research papers, alternative ideas, rejected
> alternatives, etc.

A good overview of existing implementations in DBMSs, SQL standard, and history
is given in [3].

> Also, the submission is missing documentation and test cases.  There are
> technical terms used in the code that I don't understand.

We added a second patch with test cases and expected results. We are now
writing the documentation in sgml-format.

> I think there are probably many interesting applications for normalizing
> or otherwise adjusting ranges.  I'd like to see an overview and
> consideration of other applications.

Please see the attached file adjustment.sql for some interesting applications.

> Ideally, I'd like to see these things implemented as some kind of
> user-space construct, like an operator or function.  I think we'd need a
> clearer definition of what it is they do before we can evaluate that.

Can you please explain what you mean by "user-space construct" in this case.



Best regards,
Anton, Johann, Michael, Peter

----
[1] Anton Dignös, Michael H. Böhlen, Johann Gamper:
Temporal alignment. SIGMOD Conference 2012: 433-444
http://doi.acm.org/10.1145/2213836.2213886
[2] Anton Dignös, Michael H. Böhlen, Johann Gamper, Christian S. Jensen:
Extending the Kernel of a Relational DBMS with Comprehensive Support for
Sequenced Temporal Queries. ACM Trans. Database Syst. 41(4): 26:1-26:46 (2016)
http://doi.acm.org/10.1145/2967608
[3] https://www2.cs.arizona.edu/people/rts/sql3.html and
https://www2.cs.arizona.edu/people/rts/tsql2.html



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

adjustment.sql (9K) Download Attachment
tpg_primitives_out_tests_v1.patch (46K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Peter Eisentraut-6
On 1/13/17 9:22 AM, Peter Moser wrote:
> The goal of temporal aligners and normalizers is to split ranges to allow a
> reduction from temporal queries to their non-temporal counterparts.
> Splitting
> ranges is necessary for temporal query processing. Temporal aligners and
> normalizer may then be used as building-blocks for any temporal query
> construct.

I would need to see the exact definitions of these constructs.  Please
send some documentation.

> We have published two papers, that contain formal definitions and
> related work
> for the temporal aligner and normalizer. Please see [1] and [2].

I don't have access to those.

>> I think there are probably many interesting applications for normalizing
>> or otherwise adjusting ranges.  I'd like to see an overview and
>> consideration of other applications.
>
> Please see the attached file adjustment.sql for some interesting
> applications.

That's surely interesting, but without knowing what these operations are
supposed to do, I can only reverse engineer and guess.

>> Ideally, I'd like to see these things implemented as some kind of
>> user-space construct, like an operator or function.  I think we'd need a
>> clearer definition of what it is they do before we can evaluate that.
>
> Can you please explain what you mean by "user-space construct" in this case.

Implement them using the extensibility features, such as a user-defined
operator.  I don't know if it's possible, but it's something to consider.

Using common terms such as ALIGN and NORMALIZE for such a specific
functionality seems a bit wrong.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Peter Moser
2017-01-18 3:57 GMT+01:00 Peter Eisentraut <[hidden email]>:

>
> On 1/13/17 9:22 AM, Peter Moser wrote:
> > The goal of temporal aligners and normalizers is to split ranges to allow a
> > reduction from temporal queries to their non-temporal counterparts.
> > Splitting
> > ranges is necessary for temporal query processing. Temporal aligners and
> > normalizer may then be used as building-blocks for any temporal query
> > construct.
>
> I would need to see the exact definitions of these constructs.  Please
> send some documentation.
>
> > We have published two papers, that contain formal definitions and
> > related work
> > for the temporal aligner and normalizer. Please see [1] and [2].
>
> I don't have access to those.

The papers can be freely downloaded from
http://www.inf.unibz.it/~dignoes/publications.html using the "Author-ize link".

> >> I think there are probably many interesting applications for normalizing
> >> or otherwise adjusting ranges.  I'd like to see an overview and
> >> consideration of other applications.
> >
> > Please see the attached file adjustment.sql for some interesting
> > applications.
>
> That's surely interesting, but without knowing what these operations are
> supposed to do, I can only reverse engineer and guess.

Intuitively what they do is as follows:


NORMALIZE: splits all the ranges of one relation according to all the range
boundaries of another (but possibly the same) relation whenever some equality
condition over some given attributes is satisfied.

When the two relations are the same, all ranges with the given equal attributes
are either equal or disjoint. After this, the traditional GROUP BY or DISTINCT
can be applied. The attributes given to NORMALIZE for a (temporal) GROUP BY are
the grouping attributes and for a (temporal) DISTINCT the target list
attributes.

When the two relations are different, but they each contain disjoint ranges
for the same attributes (as the current limitation for the set operations is)
we perform a symmetric NORMALIZE on each of them. Then we have a similar
situation as before, i.e., in both relations ranges with the same attributes
are either equal or disjoint and a traditional set operation
(EXCEPT/INTERSECT/UNION) can be applied. The attributes given to NORMALIZE for
a (temporal) EXCEPT/INTERSECT/UNION are the target list attributes.


ALIGN: splits all the ranges of one relation according to all the range
intersections of another relation, i.e., it produces all intersections and
non-overlapping parts, whenever some condition is satisfied.

We perform a symmetric ALIGN on each relation, after which a traditional inner
or outer join can be applied using equality on the ranges to calculate the
overlap. The condition given to a (temporal) inner or outer join is the
join condition without overlap.

> >> Ideally, I'd like to see these things implemented as some kind of
> >> user-space construct, like an operator or function.  I think we'd need a
> >> clearer definition of what it is they do before we can evaluate that.
> >
> > Can you please explain what you mean by "user-space construct" in this case.
>
> Implement them using the extensibility features, such as a user-defined
> operator.  I don't know if it's possible, but it's something to consider.

We experimented with user-defined operators and set-returning functions. The
main issue with these is that ALIGN and NORMALIZE rely on comparison and
processing of one tuple with many tuples at a time that is not possible with
operators, and for set-returning functions there is no clean way of passing
tables and/or conditions.

> Using common terms such as ALIGN and NORMALIZE for such a specific
> functionality seems a bit wrong.

Would ALIGN RANGES/RANGE ALIGN and NORMALIZE RANGES/RANGE NORMALIZE be better
options? We are also thankful for any suggestion or comments about the syntax.


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Michael Paquier
On Tue, Jan 24, 2017 at 6:32 PM, Peter Moser <[hidden email]> wrote:
> [reviews and discussions]

The patch proposed has rotten. Please provide a rebase. By the way, I
am having a hard time applying your patches with patch or any other
methods... I am moving it to CF 2017-03 because of the lack of
reviews.
--
Michael


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Peter Moser
2017-02-01 6:19 GMT+01:00 Michael Paquier <[hidden email]>:
> The patch proposed has rotten. Please provide a rebase. By the way, I
> am having a hard time applying your patches with patch or any other
> methods... I am moving it to CF 2017-03 because of the lack of
> reviews.

We have rebased our patches on top of commit
53dd2da257fb5904b087b97dd9c2867390d309c1
from "Thu Feb 2 14:12:35 2017 +0200".

Hereby, we used the following commands to create both patches:
git diff --no-prefix origin/master -- src/ ':!src/test/*' >
tpg_primitives_out_v4.patch

git diff --no-prefix origin/master -- src/test/ >
tpg_primitives_out_tests_v2.patch

We have also tested our patches on the current HEAD with the command:
patch -p0 < patch-file

Both worked without problems or warnings on our Linux machine.
Could you please explain, which problems occurred while you tried to
apply our patches?


Best regards,
Anton, Johann, Michael, Peter


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

tpg_primitives_out_v4.patch (162K) Download Attachment
tpg_primitives_out_tests_v2.patch (47K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Peter Eisentraut-6
On 2/2/17 12:43 PM, Peter Moser wrote:

> Hereby, we used the following commands to create both patches:
> git diff --no-prefix origin/master -- src/ ':!src/test/*' >
> tpg_primitives_out_v4.patch
>
> git diff --no-prefix origin/master -- src/test/ >
> tpg_primitives_out_tests_v2.patch
>
> We have also tested our patches on the current HEAD with the command:
> patch -p0 < patch-file
>
> Both worked without problems or warnings on our Linux machine.
> Could you please explain, which problems occurred while you tried to
> apply our patches?

Your patches apply OK for me.

In the future, just use git diff without any options or git
format-patch, and put both the code and the tests in one patch.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Robert Haas
In reply to this post by Peter Moser
On Tue, Jan 24, 2017 at 4:32 AM, Peter Moser <[hidden email]> wrote:

> NORMALIZE: splits all the ranges of one relation according to all the range
> boundaries of another (but possibly the same) relation whenever some equality
> condition over some given attributes is satisfied.
>
> When the two relations are the same, all ranges with the given equal attributes
> are either equal or disjoint. After this, the traditional GROUP BY or DISTINCT
> can be applied. The attributes given to NORMALIZE for a (temporal) GROUP BY are
> the grouping attributes and for a (temporal) DISTINCT the target list
> attributes.
>
> When the two relations are different, but they each contain disjoint ranges
> for the same attributes (as the current limitation for the set operations is)
> we perform a symmetric NORMALIZE on each of them. Then we have a similar
> situation as before, i.e., in both relations ranges with the same attributes
> are either equal or disjoint and a traditional set operation
> (EXCEPT/INTERSECT/UNION) can be applied. The attributes given to NORMALIZE for
> a (temporal) EXCEPT/INTERSECT/UNION are the target list attributes.
>
>
> ALIGN: splits all the ranges of one relation according to all the range
> intersections of another relation, i.e., it produces all intersections and
> non-overlapping parts, whenever some condition is satisfied.
>
> We perform a symmetric ALIGN on each relation, after which a traditional inner
> or outer join can be applied using equality on the ranges to calculate the
> overlap. The condition given to a (temporal) inner or outer join is the
> join condition without overlap.

I don't quite understand the difference between NORMALIZE and ALIGN.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Robert Haas
In reply to this post by Peter Moser
On Tue, Jan 24, 2017 at 4:32 AM, Peter Moser <[hidden email]> wrote:
>> Using common terms such as ALIGN and NORMALIZE for such a specific
>> functionality seems a bit wrong.
>
> Would ALIGN RANGES/RANGE ALIGN and NORMALIZE RANGES/RANGE NORMALIZE be better
> options? We are also thankful for any suggestion or comments about the syntax.

So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN,
except apparently there's no join type and the optimizer can never
reorder these operations with each other or with other joins.  Is that
right?  The optimizer changes in this patch seem fairly minimal, so
I'm guessing it can't be doing anything very complex here.

What happens if you perform the ALIGN or NORMALIZE operation using
something other than an equality operator, like, say, less-than?  Or
an arbitrary user-defined operator.

There's no documentation in this patch.  I'm not sure you want to go
to the trouble of writing SGML documentation until this has been
reviewed enough that it has a real chance of getting committed, but on
the other hand we're obviously all struggling to understand what it
does, so I think if not SGML documentation it at least needs a real
clear explanation of what the syntax is and does in a README or
something, even just for initial review.

We don't have anything quite like this in PostgreSQL today.  An
ordinary join just matches up things in relation A and relation B and
outputs the matching rows, and something like a SRF takes a set of
input rows and returns a set of output rows.  This is a hybrid - it
takes in two sets of rows, one from each relation being joined, and
produces a derived set of output rows that takes into account both
inputs.

+ * INPUT:
+ *             (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c
+ *             where q can be any join qualifier, and r.ts, r.te, s.ts, and s.t
e
+ *             can be any column name.
+ *
+ * OUTPUT:
+ *             (
+ *             SELECT r.*, GREATEST(r.ts, s.ts) P1, LEAST(r.te, s.te) P2
+ *      FROM
+ *      (
+ *             SELECT *, row_id() OVER () rn FROM r
+ *      ) r
+ *      LEFT OUTER JOIN
+ *      s
+ *      ON q AND r.ts < s.te AND r.te > s.ts
+ *      ORDER BY rn, P1, P2
+ *      ) c

It's hard to see what's going on here.  What's ts?  What's te?  If you
used longer names for these things, it might be a bit more
self-documenting.

If we are going to transform an ALIGN operator in to a left outer
join, why do we also have an executor node for it?

+               fcLowerLarg = makeFuncCall(SystemFuncName("lower"),
+
list_make1(crLargTs),
+
UNKNOWN_LOCATION);
+               fcLowerRarg = makeFuncCall(SystemFuncName("lower"),
+
list_make1(crRargTs),
+
UNKNOWN_LOCATION);
+               fcUpperLarg = makeFuncCall(SystemFuncName("upper"),
+
list_make1(crLargTs),
+
UNKNOWN_LOCATION);
+               fcUpperRarg = makeFuncCall(SystemFuncName("upper"),
+
list_make1(crRargTs),
+
UNKNOWN_LOCATION);

Why is a temporal operator calling functions that upper-case and
lower-case strings?  In one sense this whole function (and much of the
nearby code) is very straightforward code and you can see exactly why
it's doing it.  In another sense it's totally inscrutable: WHY is it
doing any of that stuff?

-       char       *strategy;           /* partitioning strategy
('list' or 'range') */
-       List       *partParams;         /* List of PartitionElems */
-       int                     location;               /* token
location, or -1 if unknown */
+       char       *strategy;   /* partitioning strategy ('list' or 'range') */
+       List       *partParams; /* List of PartitionElems */
+       int                     location;       /* token location, or
-1 if unknown */

I think this is some kind of mistake on your end while generating the
patch.  It looks like you patched one version of the source code, and
diffed against another.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

David G Johnston
On Wed, Feb 15, 2017 at 12:24 PM, Robert Haas <[hidden email]> wrote:
On Tue, Jan 24, 2017 at 4:32 AM, Peter Moser <[hidden email]> wrote:
>> Using common terms such as ALIGN and NORMALIZE for such a specific
>> functionality seems a bit wrong.
>
> Would ALIGN RANGES/RANGE ALIGN and NORMALIZE RANGES/RANGE NORMALIZE be better
> options? We are also thankful for any suggestion or comments about the syntax.

So it seems like an ALIGN or NORMALIZE option is kind of like a JOIN,
except apparently there's no join type and the optimizer can never
reorder these operations with each other or with other joins.  Is that
right?  The optimizer changes in this patch seem fairly minimal, so
I'm guessing it can't be doing anything very complex here.

+ * INPUT:
+ *             (r ALIGN s ON q WITH (r.ts, r.te, s.ts, s.te)) c
+ *             where q can be any join qualifier, and r.ts, r.te, s.ts, and s.t
e
+ *             can be any column name.
+ *
+ * OUTPUT:
+ *             (
+ *             SELECT r.*, GREATEST(r.ts, s.ts) P1, LEAST(r.te, s.te) P2
+ *      FROM
+ *      (
+ *             SELECT *, row_id() OVER () rn FROM r
+ *      ) r
+ *      LEFT OUTER JOIN
+ *      s
+ *      ON q AND r.ts < s.te AND r.te > s.ts
+ *      ORDER BY rn, P1, P2
+ *      ) c

It's hard to see what's going on here.  What's ts?  What's te?  If you
used longer names for these things, it might be a bit more
self-documenting.

Just reasoning out loud here...​

ISTM ts and te are "temporal [range] start" and "temporal [range] end"​ (or probably just the common "timestamp start/end")

​From what I can see it is affecting an intersection of the two ranges and, furthermore, splitting the LEFT range into sub-ranges that match up with the sub-ranges found on the right side.  From the example above this seems like it should be acting on self-normalized ranges - but I may be missing something by evaluating this out of context.

r1 [1, 6] [ts, te] [time period start, time period end]
s1 [2, 3]
s2 [3, 4]
s3 [5, 7]

r LEFT JOIN s ON (r.ts < s.te AND r.te > s.ts)

r1[1, 6],s1[2, 3] => [max(r.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[2, 3]
r1[1, 6],s2[3, 4] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[3, 4]
r1[1, 6],s3[5, 7] => [max(t.ts, s.ts),min(r.te, s.te)] => r1[1, 6],d[5, 6]

Thus the intersection is [2,6] but since s1 has three ranges that begin between 2 and 6 (i.e., 2, 3, and 5) there are three output records that correspond to those sub-ranges.

The description in the OP basically distinguishes between NORMALIZE and ALIGN in that ALIGN, as described above, affects an INTERSECTION on the two ranges - discarding the non-overlapping data - while NORMALIZE performs the alignment while also retaining the non-overlapping data.

The rest of the syntax seems to deal with selecting subsets of range records based upon attribute data.

David J.
Reply | Threaded
Open this post in threaded view
|

Re: [PROPOSAL] Temporal query processing with range types

Robert Haas
On Wed, Feb 15, 2017 at 3:33 PM, David G. Johnston
<[hidden email]> wrote:
> The description in the OP basically distinguishes between NORMALIZE and
> ALIGN in that ALIGN, as described above, affects an INTERSECTION on the two
> ranges - discarding the non-overlapping data - while NORMALIZE performs the
> alignment while also retaining the non-overlapping data.

Hmm.  So is ALIGN like an inner join and NORMALIZE like a full outer
join?  Couldn't you have left and right outer joins, too, like you
null-extend the parts of the lefthand range that don't match but throw
away parts of the righthand range that don't match?

Also, it sounds like all of this is intended to work with ranges that
are stored in different columns rather than with PostgreSQL's built-in
range types.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
123
Previous Thread Next Thread