Solving my query needs with Rank and may be CrossTab

Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Solving my query needs with Rank and may be CrossTab

Iaam Onkara
Hi Friends,

I have a table with data like this gist https://gist.github.com/daya/d0794efcd4278fc5dce6e7339d03a8fd and I want to fetch the latest values for a given set of attributes so the result set looks like this gist https://gist.github.com/daya/0cb7f8682520a1dd4cdda8c0266f77f6

Please note in the desired result set 
  1. There is an assumed mapping of Code to display i.e. code 39156-5 is BMI.
  2. "Oxygen Saturation" has only one value and "Pulse" has no value
In my attempts and with some help I have  this query 

with v_max as
(SELECT
code, uom, val, created_on, dense_rank() over ( partition by code order by created_on desc) as r
FROM vitals v
where (v.code = '8480-6' or v.code='8462-4' or v.code='39156-5' or v.code='8302-2')
)
SELECT c.display, uom, val, created_on
from v_max v inner join codes c on v.code=c.code
where r = 1; 

which gives this result 

But the result set that I want I am unable to get. Or if is it even possible to get?

Thanks for your help

lup
Reply | Threaded
Open this post in threaded view
|

Re: Solving my query needs with Rank and may be CrossTab

lup


On Dec 1, 2019, at 3:54 PM, Iaam Onkara <[hidden email]> wrote:

Hi Friends,

I have a table with data like this gist https://gist.github.com/daya/d0794efcd4278fc5dce6e7339d03a8fd and I want to fetch the latest values for a given set of attributes so the result set looks like this gist https://gist.github.com/daya/0cb7f8682520a1dd4cdda8c0266f77f6

Please note in the desired result set 
  1. There is an assumed mapping of Code to display i.e. code 39156-5 is BMI.
  2. "Oxygen Saturation" has only one value and "Pulse" has no value
In my attempts and with some help I have  this query 

with v_max as
(SELECT
code, uom, val, created_on, dense_rank() over ( partition by code order by created_on desc) as r
FROM vitals v
where (v.code = '8480-6' or v.code='8462-4' or v.code='39156-5' or v.code='8302-2')
)
SELECT c.display, uom, val, created_on
from v_max v inner join codes c on v.code=c.code
where r = 1; 

which gives this result 

But the result set that I want I am unable to get. Or if is it even possible to get?

Thanks for your help


I take it the last value by timestamp per code per patient is the one to be reported? Or is there a time window?
Turning rows into columns can be done with sub-selects per derived column or (usually faster) temporary tables built up in separate selects with each adding usually one column (but possibly more).

Reply | Threaded
Open this post in threaded view
|

Re: Solving my query needs with Rank and may be CrossTab

Iaam Onkara
@Rob: There is no time window. It is the latest values for given set of attributes regardless of timestamp. If some attributes have multiple values then multiple rows can be returned with other attributes having blank values.

Creating one sub select for one column is an obvious approach but will not be performant specially when the dataset grows, I am looking for a solution which doesn't require one sub select per column.

On Sun, Dec 1, 2019 at 5:28 PM Rob Sargent <[hidden email]> wrote:


On Dec 1, 2019, at 3:54 PM, Iaam Onkara <[hidden email]> wrote:

Hi Friends,

I have a table with data like this gist https://gist.github.com/daya/d0794efcd4278fc5dce6e7339d03a8fd and I want to fetch the latest values for a given set of attributes so the result set looks like this gist https://gist.github.com/daya/0cb7f8682520a1dd4cdda8c0266f77f6

Please note in the desired result set 
  1. There is an assumed mapping of Code to display i.e. code 39156-5 is BMI.
  2. "Oxygen Saturation" has only one value and "Pulse" has no value
In my attempts and with some help I have  this query 

with v_max as
(SELECT
code, uom, val, created_on, dense_rank() over ( partition by code order by created_on desc) as r
FROM vitals v
where (v.code = '8480-6' or v.code='8462-4' or v.code='39156-5' or v.code='8302-2')
)
SELECT c.display, uom, val, created_on
from v_max v inner join codes c on v.code=c.code
where r = 1; 

which gives this result 

But the result set that I want I am unable to get. Or if is it even possible to get?

Thanks for your help


I take it the last value by timestamp per code per patient is the one to be reported? Or is there a time window?
Turning rows into columns can be done with sub-selects per derived column or (usually faster) temporary tables built up in separate selects with each adding usually one column (but possibly more).

lup
Reply | Threaded
Open this post in threaded view
|

Re: Solving my query needs with Rank and may be CrossTab

lup


On Dec 1, 2019, at 4:38 PM, Iaam Onkara <[hidden email]> wrote:


@Rob: There is no time window. It is the latest values for given set of attributes regardless of timestamp. If some attributes have multiple values then multiple rows can be returned with other attributes having blank values.

Creating one sub select for one column is an obvious approach but will not be performant specially when the dataset grows, I am looking for a solution which doesn't require one sub select per column.

On Sun, Dec 1, 2019 at 5:28 PM Rob Sargent <[hidden email]> wrote:


On Dec 1, 2019, at 3:54 PM, Iaam Onkara <[hidden email]> wrote:

Hi Friends,

I have a table with data like this gist https://gist.github.com/daya/d0794efcd4278fc5dce6e7339d03a8fd and I want to fetch the latest values for a given set of attributes so the result set looks like this gist https://gist.github.com/daya/0cb7f8682520a1dd4cdda8c0266f77f6

Please note in the desired result set 
  1. There is an assumed mapping of Code to display i.e. code 39156-5 is BMI.
  2. "Oxygen Saturation" has only one value and "Pulse" has no value
In my attempts and with some help I have  this query 

with v_max as
(SELECT
code, uom, val, created_on, dense_rank() over ( partition by code order by created_on desc) as r
FROM vitals v
where (v.code = '8480-6' or v.code='8462-4' or v.code='39156-5' or v.code='8302-2')
)
SELECT c.display, uom, val, created_on
from v_max v inner join codes c on v.code=c.code
where r = 1; 

which gives this result 

But the result set that I want I am unable to get. Or if is it even possible to get?

Thanks for your help


I take it the last value by timestamp per code per patient is the one to be reported? Or is there a time window?
Turning rows into columns can be done with sub-selects per derived column or (usually faster) temporary tables built up in separate selects with each adding usually one column (but possibly more).


I agree the sub select per column can easily become slow. Incremental tables does not, in my experience, suffer the same problem.  (One can also repeatedly update a single table with the predefined structure.) There maybe a way to get what you want with multiple CTEs but I suspect that approach would be more akin to multiple sub selects than to incremental tables. 
From your further description of the problem it will be critical to have an index on the code AND time stamp columns of the source table. 
Reply | Threaded
Open this post in threaded view
|

Re: Solving my query needs with Rank and may be CrossTab

Iaam Onkara
Yes indexes on Code and Timestamp column may also be needed even though Patient_ID column will be indexed.

By incremental tables do you mean tables with Auto Increment primary key for ID ?

What I am having tough time figuring out is how to transform the result into this even after using multiple CTEs

On Sun, Dec 1, 2019 at 10:58 PM Rob Sargent <[hidden email]> wrote:


On Dec 1, 2019, at 4:38 PM, Iaam Onkara <[hidden email]> wrote:


@Rob: There is no time window. It is the latest values for given set of attributes regardless of timestamp. If some attributes have multiple values then multiple rows can be returned with other attributes having blank values.

Creating one sub select for one column is an obvious approach but will not be performant specially when the dataset grows, I am looking for a solution which doesn't require one sub select per column.

On Sun, Dec 1, 2019 at 5:28 PM Rob Sargent <[hidden email]> wrote:


On Dec 1, 2019, at 3:54 PM, Iaam Onkara <[hidden email]> wrote:

Hi Friends,

I have a table with data like this gist https://gist.github.com/daya/d0794efcd4278fc5dce6e7339d03a8fd and I want to fetch the latest values for a given set of attributes so the result set looks like this gist https://gist.github.com/daya/0cb7f8682520a1dd4cdda8c0266f77f6

Please note in the desired result set 
  1. There is an assumed mapping of Code to display i.e. code 39156-5 is BMI.
  2. "Oxygen Saturation" has only one value and "Pulse" has no value
In my attempts and with some help I have  this query 

with v_max as
(SELECT
code, uom, val, created_on, dense_rank() over ( partition by code order by created_on desc) as r
FROM vitals v
where (v.code = '8480-6' or v.code='8462-4' or v.code='39156-5' or v.code='8302-2')
)
SELECT c.display, uom, val, created_on
from v_max v inner join codes c on v.code=c.code
where r = 1; 

which gives this result 

But the result set that I want I am unable to get. Or if is it even possible to get?

Thanks for your help


I take it the last value by timestamp per code per patient is the one to be reported? Or is there a time window?
Turning rows into columns can be done with sub-selects per derived column or (usually faster) temporary tables built up in separate selects with each adding usually one column (but possibly more).


I agree the sub select per column can easily become slow. Incremental tables does not, in my experience, suffer the same problem.  (One can also repeatedly update a single table with the predefined structure.) There maybe a way to get what you want with multiple CTEs but I suspect that approach would be more akin to multiple sub selects than to incremental tables. 
From your further description of the problem it will be critical to have an index on the code AND time stamp columns of the source table. 
lup
Reply | Threaded
Open this post in threaded view
|

Re: Solving my query needs with Rank and may be CrossTab

lup


On Dec 1, 2019, at 11:09 PM, Iaam Onkara <[hidden email]> wrote:


Yes indexes on Code and Timestamp column may also be needed even though Patient_ID column will be indexed.

I believe you will want a compound index covering both columns
By incremental tables do you mean tables with Auto Increment primary key for ID 
No. I mean a series of intermediate tables each with one more report column. These can be temporary and unlogged but the will need an index on patient. (Again, you have the option of predefining the full report table and repeatedly updating a single column.)

What I am having tough time figuring out is how to transform the result into this even after using multiple CTEs

On Sun, Dec 1, 2019 at 10:58 PM Rob Sargent <[hidden email]> wrote:


On Dec 1, 2019, at 4:38 PM, Iaam Onkara <[hidden email]> wrote:


@Rob: There is no time window. It is the latest values for given set of attributes regardless of timestamp. If some attributes have multiple values then multiple rows can be returned with other attributes having blank values.

Creating one sub select for one column is an obvious approach but will not be performant specially when the dataset grows, I am looking for a solution which doesn't require one sub select per column.

On Sun, Dec 1, 2019 at 5:28 PM Rob Sargent <[hidden email]> wrote:


On Dec 1, 2019, at 3:54 PM, Iaam Onkara <[hidden email]> wrote:

Hi Friends,

I have a table with data like this gist https://gist.github.com/daya/d0794efcd4278fc5dce6e7339d03a8fd and I want to fetch the latest values for a given set of attributes so the result set looks like this gist https://gist.github.com/daya/0cb7f8682520a1dd4cdda8c0266f77f6

Please note in the desired result set 
  1. There is an assumed mapping of Code to display i.e. code 39156-5 is BMI.
  2. "Oxygen Saturation" has only one value and "Pulse" has no value
In my attempts and with some help I have  this query 

with v_max as
(SELECT
code, uom, val, created_on, dense_rank() over ( partition by code order by created_on desc) as r
FROM vitals v
where (v.code = '8480-6' or v.code='8462-4' or v.code='39156-5' or v.code='8302-2')
)
SELECT c.display, uom, val, created_on
from v_max v inner join codes c on v.code=c.code
where r = 1; 

which gives this result 

But the result set that I want I am unable to get. Or if is it even possible to get?

Thanks for your help


I take it the last value by timestamp per code per patient is the one to be reported? Or is there a time window?
Turning rows into columns can be done with sub-selects per derived column or (usually faster) temporary tables built up in separate selects with each adding usually one column (but possibly more).


I agree the sub select per column can easily become slow. Incremental tables does not, in my experience, suffer the same problem.  (One can also repeatedly update a single table with the predefined structure.) There maybe a way to get what you want with multiple CTEs but I suspect that approach would be more akin to multiple sub selects than to incremental tables. 
From your further description of the problem it will be critical to have an index on the code AND time stamp columns of the source table. 
Reply | Threaded
Open this post in threaded view
|

Re: Solving my query needs with Rank and may be CrossTab

Iaam Onkara
@Rob. What your referring to sounds like Materialized views, isn't it? An example query would be helpful in understand your recommendation/approach better.

On Mon, Dec 2, 2019 at 7:42 AM Rob Sargent <[hidden email]> wrote:


On Dec 1, 2019, at 11:09 PM, Iaam Onkara <[hidden email]> wrote:


Yes indexes on Code and Timestamp column may also be needed even though Patient_ID column will be indexed.

I believe you will want a compound index covering both columns
By incremental tables do you mean tables with Auto Increment primary key for ID 
No. I mean a series of intermediate tables each with one more report column. These can be temporary and unlogged but the will need an index on patient. (Again, you have the option of predefining the full report table and repeatedly updating a single column.)

What I am having tough time figuring out is how to transform the result into this even after using multiple CTEs

On Sun, Dec 1, 2019 at 10:58 PM Rob Sargent <[hidden email]> wrote:


On Dec 1, 2019, at 4:38 PM, Iaam Onkara <[hidden email]> wrote:


@Rob: There is no time window. It is the latest values for given set of attributes regardless of timestamp. If some attributes have multiple values then multiple rows can be returned with other attributes having blank values.

Creating one sub select for one column is an obvious approach but will not be performant specially when the dataset grows, I am looking for a solution which doesn't require one sub select per column.

On Sun, Dec 1, 2019 at 5:28 PM Rob Sargent <[hidden email]> wrote:


On Dec 1, 2019, at 3:54 PM, Iaam Onkara <[hidden email]> wrote:

Hi Friends,

I have a table with data like this gist https://gist.github.com/daya/d0794efcd4278fc5dce6e7339d03a8fd and I want to fetch the latest values for a given set of attributes so the result set looks like this gist https://gist.github.com/daya/0cb7f8682520a1dd4cdda8c0266f77f6

Please note in the desired result set 
  1. There is an assumed mapping of Code to display i.e. code 39156-5 is BMI.
  2. "Oxygen Saturation" has only one value and "Pulse" has no value
In my attempts and with some help I have  this query 

with v_max as
(SELECT
code, uom, val, created_on, dense_rank() over ( partition by code order by created_on desc) as r
FROM vitals v
where (v.code = '8480-6' or v.code='8462-4' or v.code='39156-5' or v.code='8302-2')
)
SELECT c.display, uom, val, created_on
from v_max v inner join codes c on v.code=c.code
where r = 1; 

which gives this result 

But the result set that I want I am unable to get. Or if is it even possible to get?

Thanks for your help


I take it the last value by timestamp per code per patient is the one to be reported? Or is there a time window?
Turning rows into columns can be done with sub-selects per derived column or (usually faster) temporary tables built up in separate selects with each adding usually one column (but possibly more).


I agree the sub select per column can easily become slow. Incremental tables does not, in my experience, suffer the same problem.  (One can also repeatedly update a single table with the predefined structure.) There maybe a way to get what you want with multiple CTEs but I suspect that approach would be more akin to multiple sub selects than to incremental tables. 
From your further description of the problem it will be critical to have an index on the code AND time stamp columns of the source table. 
lup
Reply | Threaded
Open this post in threaded view
|

Re: Solving my query needs with Rank and may be CrossTab

lup


On Dec 2, 2019, at 9:55 AM, Iaam Onkara <[hidden email]> wrote:

@Rob. What your referring to sounds like Materialized views, isn't it? An example query would be helpful in understand your recommendation/approach better.

On Mon, Dec 2, 2019 at 7:42 AM Rob Sargent <[hidden email]> wrote:


Using the update-fixed-table style:
-- Get all possible people, null their values
create table report as
select distinct patient, null::float as bmi, null::float as sysbp, null::float as diabp, null::int as height
from source_table;
create index on report(patient);
-- get the height code (8302-2 using tilde operator because the import included leading blanks)
update report r set height = last_value 
from (select distinct patient, last_value(measurement) over
       (partition by patient, code
        order by sampletime)
from source_table
where code ~ '8302-2') as m where r.patient = m.patient
;
-- then similar for other codes. You may want to format the results, as in combining sys/dia bp readings after the update operations
-- the time drag of course is forever finding max(measurement time).  A composite index might help; indeed the unique key on the source is patient,code,timestamp I think.