RE: Handling time series data with PostgreSQL


RE: Handling time series data with PostgreSQL

Jayaram
Dear All,

I'm Jayaram S, an Oracle DBA. We are currently planning to develop a stock market application in which roughly 80% of the data is time-based. We are in the process of choosing the right database for this requirement, especially for the time-series data. After multiple investigations, I found that PostgreSQL with TimescaleDB works better than other databases.

But I'm still new to PostgreSQL, and we want to use only open-source technology for our requirements. It would be helpful if anyone could suggest how to implement time-series concepts in a PostgreSQL database.
It would be even better if I could get proper docs or links with explanations.

Thanks in advance.

--
Thanks & Regards,
Jayaram S,
Bangalore.
India.

Re: Handling time series data with PostgreSQL

Adalberto Caccia
Hi,
TimescaleDB as a PostgreSQL extension has been used in my firm for two years now; I've recently managed to upgrade it from pg10 to pg12, and from discrete VMs to Kubernetes as well.
Frankly speaking, being new to TimescaleDB at the time, I found it easy to manage, easy to scale (it's 100% compatible with PG replication, though unfortunately not logical replication yet), easy to install, easy to upgrade... what else?
From a developer's perspective, it just adds "superpowers" to ordinary PG tables, all under the hood. On disk it uses a "chunked" layout, where each chunk covers a definite time range; the "time" column on which to index the data is simply passed as a parameter to the TimescaleDB call, for each table that needs such power.
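For illustration only, here is a minimal sketch of what that looks like (table and column names are made up for the example, not from our setup):

CREATE EXTENSION IF NOT EXISTS timescaledb;

CREATE TABLE ticks (
    time   TIMESTAMPTZ NOT NULL,
    symbol TEXT        NOT NULL,
    price  NUMERIC,
    volume BIGINT
);

-- turn the plain table into a hypertable, chunked on the "time" column
SELECT create_hypertable('ticks', 'time', chunk_time_interval => INTERVAL '1 day');

After that, inserts and queries against ticks are just ordinary SQL; the chunk routing happens under the hood.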
At the moment, we're also using it for time aggregate calculations, but only for the coarse ones (30m --> 1h and 1h --> 1 day), while we're still handling the finer ones (1s --> 1m and so on) in Kafka+Flink, which is a common scenario for a streaming data platform, anyway.

Regards,
Adalberto



Re: Handling time series data with PostgreSQL

Jayaram
Hi Adalberto,

Awesome! Thanks for your reply.

So, is TimescaleDB mandatory for handling time-series data? Is there any way to handle hourly, daily, monthly, and yearly data with PostgreSQL alone, without the Timescale add-on?
Ours is a new project, and we are unsure whether we should have both TimescaleDB and PostgreSQL, or whether PostgreSQL alone is capable of handling this time-series data by tuning the right indexes, etc.

What we are planning is to start with PostgreSQL alone (for handling hourly, daily, weekly, monthly, and yearly calculations) and later, when we add second- and minute-level data, to include TimescaleDB. Is this the right approach?
Is PostgreSQL alone capable of meeting this requirement?

Thank you in advance.

Regards,
Jayaram S.

--
Thanks & Regards,
Jayaram S,
Bangalore.
Mobile: 91-7760951366.

Re: Handling time series data with PostgreSQL

Stephen Frost
Greetings,

* Jayaram ([hidden email]) wrote:
> So, is TimescaleDB mandatory for handling time-series data? Is there any
> way to handle hourly, daily, monthly, and yearly data with PostgreSQL
> alone, without the Timescale add-on?

Certainly there is, and a lot of people do it - what isn't clear is what
you feel is missing from PG when it comes to handling time-series data.
Generally speaking, there have been concerns about PG's ability to
handle lots of partitions (which comes from very large amounts of data
being stored), but v12 and v13 have made great improvements in that
area, and it's not nearly the issue it used to be (in quite a few cases
it performs better than the extensions).

> Ours is a new project, and we are unsure whether we should have both
> TimescaleDB and PostgreSQL, or whether PostgreSQL alone is capable of
> handling this time-series data by tuning the right indexes, etc.

Partitioning and index tuning in PG (look at using BRIN indexes if you
haven't already) become important when you get to larger data volumes.
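As a rough sketch only (names and the monthly granularity are just
illustrative), declarative range partitioning plus a BRIN index would
look something like:

CREATE TABLE ticks (
    time   TIMESTAMPTZ NOT NULL,
    symbol TEXT        NOT NULL,
    price  NUMERIC
) PARTITION BY RANGE (time);

CREATE TABLE ticks_2020_10 PARTITION OF ticks
    FOR VALUES FROM ('2020-10-01') TO ('2020-11-01');

-- BRIN stays very small and works well when rows arrive roughly in time order
CREATE INDEX ticks_time_brin ON ticks USING brin (time);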

Thanks,

Stephen


Re: Handling time series data with PostgreSQL

Mark Johnson-2
I think the OP may be referring to Oracle's Temporal Validity feature.  This type of feature has yet to be implemented in PostgreSQL (see https://www.postgresql.org/docs/13/unsupported-features-sql-standard.html item T181).

Temporal Validity allows you to add a time dimension to any table, and only display rows of data that are valid for the requested time period.  Oracle's implementation of Temporal Validity uses the PERIOD FOR clause in CREATE TABLE, ALTER TABLE, and SELECT statements as illustrated below:

CREATE TABLE EMPLOYEE
(
   ID        NUMBER PRIMARY KEY,
   TAX_ID    VARCHAR2(10),
   HIRE_DATE TIMESTAMP,
   TERM_DATE TIMESTAMP,
   PERIOD FOR EMP_VALID_TIME (HIRE_DATE, TERM_DATE)
);

SELECT * FROM EMPLOYEE
   VERSIONS PERIOD FOR EMP_VALID_TIME
   BETWEEN TO_TIMESTAMP('06-OCT-2013', 'DD-MON-YYYY')
   AND TO_TIMESTAMP('31-OCT-2013', 'DD-MON-YYYY');

   ID TAX_ID     HIRE_DATE                                TERM_DATE
----- ---------- ------------------------------ ------------------------------
    1 123456789  06-OCT-13 12.00.00.000000 AM   07-NOV-15 12.00.00.000000 AM
    2 222456789  07-OCT-13 12.00.00.000000 AM
    4 444004444
    5 505050505  30-OCT-13 12.00.00.000000 AM   31-OCT-13 12.00.00.000000 AM
    6 666999666  30-SEP-13 12.00.00.000000 AM   31-DEC-13 12.00.00.000000 AM


The above feature requires Oracle 12 or higher.  SQL Server 2016 and later also support it.  In earlier releases of each DBMS, we tried to accomplish the same thing by adding pairs of timestamp columns to each table and then writing our own code to handle row filtering.  Partitioning isn't needed.  Certainly partitioning by range could be used, but it would still require some manual effort.
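A rough PostgreSQL sketch of that emulation (using the same illustrative columns as above) would be:

CREATE TABLE employee (
    id        INTEGER PRIMARY KEY,
    tax_id    VARCHAR(10),
    hire_date TIMESTAMP,
    term_date TIMESTAMP
);

-- rows whose validity period overlaps 06-OCT-2013 through 31-OCT-2013;
-- a NULL term_date is treated as "still valid"
SELECT *
  FROM employee
 WHERE hire_date <= TIMESTAMP '2013-10-31'
   AND (term_date IS NULL OR term_date >= TIMESTAMP '2013-10-06');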

-Mark


Re: Handling time series data with PostgreSQL

Stephen Frost
Greetings,

* Mark Johnson ([hidden email]) wrote:
> I think the OP may be referring to Oracle's Temporal Validity feature.

Perhaps, but that's not the only way to manage time series data.

> [ ... ] In earlier releases of each DBMS we tried to accomplish
> the same by adding pairs of timestamp columns to each table and then
> writing our own code to handle row filtering.  Partitioning isn't needed.
> Certainly partitioning by range could be used, but it would still require
> some manual effort.

I've found that using the range data types, with overlaps queries, can
work quite well for managing time-series data instead of using pairs of
timestamp columns.  With range data types you can also create exclusion
constraints to ensure that you don't end up introducing overlapping
ranges.
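For example (a sketch only, with made-up names):

CREATE EXTENSION IF NOT EXISTS btree_gist;  -- lets = and && be combined in one exclusion constraint

CREATE TABLE employee_validity (
    emp_id       INTEGER   NOT NULL,
    valid_during TSTZRANGE NOT NULL,
    EXCLUDE USING gist (emp_id WITH =, valid_during WITH &&)
);

-- overlaps query: rows valid at any point in the requested window
SELECT *
  FROM employee_validity
 WHERE valid_during && tstzrange('2013-10-06', '2013-10-31');

The exclusion constraint rejects any insert whose range overlaps an
existing range for the same emp_id.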

Either way requires adjusting your queries though, no?  And inserting
and maintaining the data?  I can appreciate wanting to be standards
compliant, but this specific use case doesn't really provide much
justification for using this particular feature.  Perhaps there are
better ones.

Thanks,

Stephen


Re: Handling time series data with PostgreSQL

Jayaram
Thank you, Stephen and Mark, for your inputs. I will go through the documentation on time-series best practices in PostgreSQL.

Regards,
Jayaram S.

--
Thanks & Regards,
Jayaram S,
Bangalore.
Mobile: 91-7760951366.