Re: Pulling data from Postgres DB table for every 5 seconds.

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: Pulling data from Postgres DB table for every 5 seconds.

github kran

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around  1 million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existing 
product information.). 

This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).

We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients. 

UseCase
We dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.

Questions
1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.
2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.


Any ideas would be helpful.

Thanks !!
GithubKran
lup
Reply | Threaded
Open this post in threaded view
|

Re: Pulling data from Postgres DB table for every 5 seconds.

lup


On Jan 9, 2019, at 10:02 AM, github kran <[hidden email]> wrote:


Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around  1 million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existing 
product information.). 

This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).

We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients. 

UseCase
We dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.

Questions
1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.
2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.


Any ideas would be helpful.

Thanks !!
GithubKran

There is DML event trapping.  You don’t poll every 5seconds you react immediately to each event (with trigger or event).  From the trigger perspective you probably have everything you need to update IOT with addition searching.
Reply | Threaded
Open this post in threaded view
|

Re: Pulling data from Postgres DB table for every 5 seconds.

Ron-2
In reply to this post by github kran
On 1/9/19 11:02 AM, github kran wrote:

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around  1 million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB

"the data".  All 1GB every 5 seconds?

( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existing 
product information.). 

This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).

We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients. 

UseCase
We dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.

Or just a tiny subset every 5 seconds?


Questions
1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.
2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.


Any ideas would be helpful.

Thanks !!
GithubKran

--
Angular momentum makes the world go 'round.
Reply | Threaded
Open this post in threaded view
|

Re: Pulling data from Postgres DB table for every 5 seconds.

github kran
In reply to this post by lup
Thanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.

1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html 
(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_startddl_command_endtable_rewrite and sql_drop. Support for additional events may be added in future releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.

Ron- Its a tiny subset of 1 GB Data for every 5 seconds but not on the entire data.


Thanks !!.

On Wed, Jan 9, 2019 at 11:10 AM Rob Sargent <[hidden email]> wrote:


On Jan 9, 2019, at 10:02 AM, github kran <[hidden email]> wrote:


Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around  1 million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existing 
product information.). 

This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).

We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients. 

UseCase
We dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.

Questions
1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.
2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.


Any ideas would be helpful.

Thanks !!
GithubKran

There is DML event trapping.  You don’t poll every 5seconds you react immediately to each event (with trigger or event).  From the trigger perspective you probably have everything you need to update IOT with addition searching.
Reply | Threaded
Open this post in threaded view
|

Re: Pulling data from Postgres DB table for every 5 seconds.

Mark Fletcher
In reply to this post by github kran
On Wed, Jan 9, 2019 at 9:02 AM github kran <[hidden email]> wrote:

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around  1 million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existing 
product information.). 


It's unclear whether you want to do table scans or if you're just looking for changes to the database. If you're looking just for changes, consider implementing something using logical replication. We have a logical replication system set up to stream changes from the database into an elastic search cluster, and it works great.

Mark
lup
Reply | Threaded
Open this post in threaded view
|

Re: Pulling data from Postgres DB table for every 5 seconds.

lup
In reply to this post by github kran


On 1/9/19 10:21 AM, github kran wrote:
Thanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.

1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html 
(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_startddl_command_endtable_rewrite and sql_drop. Support for additional events may be added in future releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.


Ah, right you are.  Are triggers off the table?  You would want to write the trigger function in some (trusted?) language with access to the outside
Reply | Threaded
Open this post in threaded view
|

Re: Pulling data from Postgres DB table for every 5 seconds.

github kran
In reply to this post by Mark Fletcher
Mark - We are currently on 9.6 version of postgres and cant use this feature of logical replication.Answering to your question we are looking for any changes in the data related to a specific table ( changes like any update on a timestamp field
OR any new inserts happened in the last 5 seconds for a specific product entity). 
Any other alternatives ?.

On Wed, Jan 9, 2019 at 11:24 AM Mark Fletcher <[hidden email]> wrote:
On Wed, Jan 9, 2019 at 9:02 AM github kran <[hidden email]> wrote:

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around  1 million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existing 
product information.). 


It's unclear whether you want to do table scans or if you're just looking for changes to the database. If you're looking just for changes, consider implementing something using logical replication. We have a logical replication system set up to stream changes from the database into an elastic search cluster, and it works great.

Mark
Reply | Threaded
Open this post in threaded view
|

Re: Pulling data from Postgres DB table for every 5 seconds.

github kran
In reply to this post by lup
Rob -   It's a Java based application. We dont have triggers yet on the table and is trigger a only option in 9.6 version ?.  

On Wed, Jan 9, 2019 at 12:01 PM Rob Sargent <[hidden email]> wrote:


On 1/9/19 10:21 AM, github kran wrote:
Thanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.

1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html 
(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_startddl_command_endtable_rewrite and sql_drop. Support for additional events may be added in future releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.


Ah, right you are.  Are triggers off the table?  You would want to write the trigger function in some (trusted?) language with access to the outside
lup
Reply | Threaded
Open this post in threaded view
|

Re: Pulling data from Postgres DB table for every 5 seconds.

lup


On Jan 9, 2019, at 11:11 AM, github kran <[hidden email]> wrote:

Rob -   It's a Java based application. We dont have triggers yet on the table and is trigger a only option in 9.6 version ?.  

On Wed, Jan 9, 2019 at 12:01 PM Rob Sargent <[hidden email]> wrote:


On 1/9/19 10:21 AM, github kran wrote:
Thanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.

1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html 
(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_startddl_command_endtable_rewrite and sql_drop. Support for additional events may be added in future releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.


Ah, right you are.  Are triggers off the table?  You would want to write the trigger function in some (trusted?) language with access to the outside

(Custom here is to “bottom post”)

Have you tried triggers and found them to have too much impact on total system?  I can’t see them being more expensive than looking for changes every 5 seconds.  If your hardware can scan 1T that quickly then I suspect your trigger will not be noticed.  I would have the trigger write to queue and have something else using the queue to talk to IOT piece.

Failing that, perhaps your java (server-side?) app making the changes can be taught to emit the necessary details to IOT-thingy?

Reply | Threaded
Open this post in threaded view
|

Re: Pulling data from Postgres DB table for every 5 seconds.

Mark Fletcher
In reply to this post by github kran
On Wed, Jan 9, 2019 at 10:10 AM github kran <[hidden email]> wrote:
Mark - We are currently on 9.6 version of postgres and cant use this feature of logical replication.Answering to your question we are looking for any changes in the data related to a specific table ( changes like any update on a timestamp field
OR any new inserts happened in the last 5 seconds for a specific product entity). 
Any other alternatives ?.

The feature was added in 9.4 (I think). We are on 9.6 and it works great. Not sure about RDS Aurora, however.

Mark 
Reply | Threaded
Open this post in threaded view
|

Re: Pulling data from Postgres DB table for every 5 seconds.

github kran
In reply to this post by lup



On Wed, Jan 9, 2019 at 12:26 PM Rob Sargent <[hidden email]> wrote:


On Jan 9, 2019, at 11:11 AM, github kran <[hidden email]> wrote:

Rob -   It's a Java based application. We dont have triggers yet on the table and is trigger a only option in 9.6 version ?.  

On Wed, Jan 9, 2019 at 12:01 PM Rob Sargent <[hidden email]> wrote:


On 1/9/19 10:21 AM, github kran wrote:
Thanks for your reply Rob. Reading the below documentation link says the EVENT trigger is only supported for DDL commands. Is it not correct ?.

1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html 
(An event trigger fires whenever the event with which it is associated occurs in the database in which it is defined. Currently, the only supported events are ddl_command_startddl_command_endtable_rewrite and sql_drop. Support for additional events may be added in future releases.).
2) Doesnt the trigger slow down inserts/update we are doing to the table ?. Does it slow down if we are reading the data using the API when we have a trigger in place ?.


Ah, right you are.  Are triggers off the table?  You would want to write the trigger function in some (trusted?) language with access to the outside

(Custom here is to “bottom post”)

Have you tried triggers and found them to have too much impact on total system?  I can’t see them being more expensive than looking for changes every 5 seconds.  If your hardware can scan 1T that quickly then I suspect your trigger will not be noticed.  I would have the trigger write to queue and have something else using the queue to talk to IOT piece.

Failing that, perhaps your java (server-side?) app making the changes can be taught to emit the necessary details to IOT-thingy?

Sure will try with a trigger and send data to a Queue and gather data for every 5 seconds reading off from the queue and send to IOT topic. Already we have a queue where every message inserted or updated on the table is sent to a queue but that is lot of data we are gathering. We want to rather minimize collecting data from DB.
 
Reply | Threaded
Open this post in threaded view
|

Re: Pulling data from Postgres DB table for every 5 seconds.

github kran
In reply to this post by Mark Fletcher


On Wed, Jan 9, 2019 at 12:36 PM Mark Fletcher <[hidden email]> wrote:
On Wed, Jan 9, 2019 at 10:10 AM github kran <[hidden email]> wrote:
Mark - We are currently on 9.6 version of postgres and cant use this feature of logical replication.Answering to your question we are looking for any changes in the data related to a specific table ( changes like any update on a timestamp field
OR any new inserts happened in the last 5 seconds for a specific product entity). 
Any other alternatives ?.

The feature was added in 9.4 (I think). We are on 9.6 and it works great. Not sure about RDS Aurora, however.

Mark 

Mark - just curious to know on the logical replication. Do you think I can use it for my use case where i need to publish data to a subscriber when there is a change in the data updated for a row or any new inserts happening on the table. Intention
is to send this data in Json format by collecting this modified data in real time to a subscriber.

Tahanks
Kran
 
Reply | Threaded
Open this post in threaded view
|

Re: Pulling data from Postgres DB table for every 5 seconds.

Mark Fletcher
On Wed, Jan 9, 2019 at 12:58 PM github kran <[hidden email]> wrote:

Mark - just curious to know on the logical replication. Do you think I can use it for my use case where i need to publish data to a subscriber when there is a change in the data updated for a row or any new inserts happening on the table. Intention
is to send this data in Json format by collecting this modified data in real time to a subscriber.

From what you've said, it's a great use case for that feature. The one thing to note is that you will have to code up a logical replication client. If I can do it, pretty much anyone can, but it might take some time to get things right. I wrote about some of what I found when developing our client a year ago here: https://wingedpig.com/2017/09/20/streaming-postgres-changes/

We ended up just using the included test output plugin that comes with the postgresql distribution. And we didn't end up streaming to Kafka or anything else first. We just take the data and insert it into our elasticsearch cluster directly as we get it.

Mark
Reply | Threaded
Open this post in threaded view
|

Re: Pulling data from Postgres DB table for every 5 seconds.

Martín Marqués-2
El 9/1/19 a las 20:22, Mark Fletcher escribió:

> On Wed, Jan 9, 2019 at 12:58 PM github kran <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>
>     Mark - just curious to know on the logical replication. Do you think
>     I can use it for my use case where i need to publish data to a
>     subscriber when there is a change in the data updated for a row or
>     any new inserts happening on the table. Intention
>     is to send this data in Json format by collecting this modified data
>     in real time to a subscriber.
>
> From what you've said, it's a great use case for that feature. The one
> thing to note is that you will have to code up a logical replication
> client. If I can do it, pretty much anyone can, but it might take some
> time to get things right. I wrote about some of what I found when
> developing our client a year ago
> here: https://wingedpig.com/2017/09/20/streaming-postgres-changes/
>
> We ended up just using the included test output plugin that comes with
> the postgresql distribution. And we didn't end up streaming to Kafka or
> anything else first. We just take the data and insert it into our
> elasticsearch cluster directly as we get it.

I realy doubt that would work. Aurora doesn't have WALs, so how would
you be able to decode the transactions?

AFAIU, you can't use logical decoding on Aurora. Maybe you should be
asking at the Aurora support channel.

Regards,

--
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: Pulling data from Postgres DB table for every 5 seconds.

Tony Shelver
In reply to this post by github kran
I am not familiar with Aurora, but..


Set up a message queue in Postgres, which calls into AMPQ (RabbitMQ) to send a message for consumption by one or more clients.

This provides a function that can be called from a trigger to send the message.

After that, you have all the goodness of a standards-based open source MQ platform to distribute your data / notifications.



On Wed, 9 Jan 2019 at 19:02, github kran <[hidden email]> wrote:

Hi Postgres Team,

I have an application using RDS Aurora Postgresql 9.6 version having 4 TB of DB size. In this DB we have a table PRODUCT_INFO with around  1 million rows and table size of 1 GB.
We are looking for a implementation where we want to pull the data in real time for every 5 seconds from the DB ( Table mentioned above) and send it to IOT topic whenever an event occurs for a product. ( event is any new product information or change in the existing 
product information.). 

This table has few DML operations in real time either INSERT or UPDATE based on the productId. ( Update whenever there is a change in the product information and INSERT when a record doesnt exists for that product).

We have REST API's built in the backend pulling data from this backend RDS Aurora POSTGRES DB and used by clients. 

UseCase
We dont want clients to pull the data for every 5 seconds from DB but rather provide a service which can fetch the data from DB in real time and push the data to IOT topic by pulling data for every 5 seconds from DB.

Questions
1) How can I get information by pulling from the DB every 5 seconds without impacting the performance of the DB.
2) What are the options I have pulling the data from this table every 5 seconds. Does POSTGRES has any other options apart from TRIGGER ?.


Any ideas would be helpful.

Thanks !!
GithubKran