Conditional return of aggregated data

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Conditional return of aggregated data

Laura Smith
Hi,

I have some aggregated statistics which are currently being queried as follows:

create view stats_age as
SELECT a.category,
       a.statcount,
       b.a_desc
FROM reg_aggregated_stats a,regs_select_age b where a.category='age' and a.datapoint::smallint=b.a_val order by a.datapoint asc;

However, as these relate to event registrations, a suggestion has been made that the database really should be returning nothing until a certain number of registrations has been met (in order to avoid privacy infringing inferrence from what should be an otherwise fully irreversibly aggregated dataset).

Now, the queries are wrapped in PL/PGSQL functions anyway, so I could make a second call to Postgres to find out sum(statcount) and then conditionally return based on that.

But is there a smarter way to do this out of a single SQL query ?

My initial idea was something along the lines of :
 select (select sum(statcount) from stats_residence) as aggstat,statcount,short_name_en from stats_residence where aggstat>some_number;

But as I soon discovered that's not valid syntax! Hence ideas welcome from those smarter than me.

Thanks !

Laura


Reply | Threaded
Open this post in threaded view
|

Re: Conditional return of aggregated data

Wim Bertels-3
Hallo Laura,

i don't know if i understand your question fully,
but this might be helpfull?: FILTER

SELECT
    count(*) AS unfiltered,
    count(*) FILTER (WHERE i < 5) AS filtered
FROM generate_series(1,10) AS s(i);
 unfiltered | filtered
------------+----------
         10 |        4
(1 row)

https://www.postgresql.org/docs/current/sql-expressions.html

hth,
Wim

Laura Smith schreef op ma 02-12-2019 om 11:10 [+0000]:

> Hi,
>
> I have some aggregated statistics which are currently being queried
> as follows:
>
> create view stats_age as
> SELECT a.category,
>        a.statcount,
>        b.a_desc
> FROM reg_aggregated_stats a,regs_select_age b where a.category='age'
> and a.datapoint::smallint=b.a_val order by a.datapoint asc;
>
> However, as these relate to event registrations, a suggestion has
> been made that the database really should be returning nothing until
> a certain number of registrations has been met (in order to avoid
> privacy infringing inferrence from what should be an otherwise fully
> irreversibly aggregated dataset).
>
> Now, the queries are wrapped in PL/PGSQL functions anyway, so I could
> make a second call to Postgres to find out sum(statcount) and then
> conditionally return based on that.
>
> But is there a smarter way to do this out of a single SQL query ?
>
> My initial idea was something along the lines of :
>  select (select sum(statcount) from stats_residence) as
> aggstat,statcount,short_name_en from stats_residence where
> aggstat>some_number;
>
> But as I soon discovered that's not valid syntax! Hence ideas welcome
> from those smarter than me.
>
> Thanks !
>
> Laura
>
>
--
mvg,
Wim Bertels
--
Lector
UC Leuven-Limburg
--
My only love sprung from my only hate!
Too early seen unknown, and known too late!
                -- William Shakespeare, "Romeo and Juliet"

Reply | Threaded
Open this post in threaded view
|

Re: Conditional return of aggregated data

Ravi Krishna-5
In reply to this post by Laura Smith

> My initial idea was something along the lines of :
>  select (select sum(statcount) from stats_residence) as aggstat,statcount,short_name_en from stats_residence where aggstat>some_number;

Am I missing something basic.  The above can be done using
GROUP BY and HAVING clause.


Reply | Threaded
Open this post in threaded view
|

Re: Conditional return of aggregated data

Ravi Krishna-5

>
> > My initial idea was something along the lines of :
> >  select (select sum(statcount) from stats_residence) as aggstat,statcount,short_name_en from stats_residence where aggstat>some_number;
>
> Am I missing something basic.  The above can be done using
> GROUP BY and HAVING clause.
or this
with t as
(select (select sum(statcount) from stats_residence) as aggstat,
         statcount,short_name_en from stats_residence
)
select * from t where aggstat > some_number

Apology if I did not understand the question correctly.


Reply | Threaded
Open this post in threaded view
|

Re: Conditional return of aggregated data

Laura Smith
>
> or this
> with t as
> (select (select sum(statcount) from stats_residence) as aggstat,
> statcount,short_name_en from stats_residence
> )
> select * from t where aggstat > some_number
>
> Apology if I did not understand the question correctly.


Hi Ravi,

Thanks for helping show me the way.  You're quite right, a CTE did the trick

Laura


Reply | Threaded
Open this post in threaded view
|

Re: Conditional return of aggregated data

Alban Hertroys-4
In reply to this post by Laura Smith

On Mon, 2 Dec 2019 at 12:11, Laura Smith <[hidden email]> wrote:

My initial idea was something along the lines of :
 select (select sum(statcount) from stats_residence) as aggstat,statcount,short_name_en from stats_residence where aggstat>some_number;
 
One option is to move the aggregate to the where-clause. If you also need the value in your select-list, you can just repeat the subselect there, usually the planner is smart enough to figure out that it can just re-use the result.

select short_name_en from stats_residence where (select sum(statcount) from stats_residence) >some_number;

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.