A GROUP BY question

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

A GROUP BY question

stan-9
I am trying to write, what is for me, a fairly complex query. It uses JOINS,
and also GROUP BY. I have this working with the exception of adding the
GROUP BY clause.

Is there some reason I cannot add a GROUP BY function to a JOIN?

Here is what I have:


CREATE OR REPLACE view tasks_view as
select
        project.proj_no ,
        employee.first_name ,
        employee.last_name ,
        employee.id ,
        task_instance.hours ,
        work_type.type,
        work_type.descrip,
        rate.rate,
        employee.hourly_rate ,
        rate.rate * task_instance.hours as result ,
        SUM(rate.rate * task_instance.hours)
        ^^^^^^^^^^^^^^
from
        task_instance
GROUP BY
^^^^^^^^^^^^^^^^^^
        project.project_key
^^^^^^^^^^^^^^^^^^^^^^
join rate on
        rate.employee_key = task_instance.employee_key
        AND
        rate.work_type_key = task_instance.work_type_key
inner join employee on
        rate.employee_key = employee.employee_key
inner join work_type on
        rate.work_type_key = work_type.work_type_key
inner join project on
        project.project_key = task_instance.project_key
ORDER BY
        project.proj_no ,
        employee.id
        ;

The underlined lines are what I added, and I get a syntax error pointing to
the join. This works fine without the added lines.

--
"They that would give up essential liberty for temporary safety deserve
neither liberty nor safety."
                                                -- Benjamin Franklin


Reply | Threaded
Open this post in threaded view
|

Re: A GROUP BY question

stan-9
On Tue, Aug 13, 2019 at 05:54:03AM -0400, stan wrote:

> I am trying to write, what is for me, a fairly complex query. It uses JOINS,
> and also GROUP BY. I have this working with the exception of adding the
> GROUP BY clause.
>
> Is there some reason I cannot add a GROUP BY function to a JOIN?
>
> Here is what I have:
>
>
> CREATE OR REPLACE view tasks_view as
> select
> project.proj_no ,
> employee.first_name ,
> employee.last_name ,
> employee.id ,
> task_instance.hours ,
> work_type.type,
> work_type.descrip,
> rate.rate,
> employee.hourly_rate ,
> rate.rate * task_instance.hours as result ,
> SUM(rate.rate * task_instance.hours)
> ^^^^^^^^^^^^^^
> from
> task_instance
> GROUP BY
> ^^^^^^^^^^^^^^^^^^
> project.project_key
> ^^^^^^^^^^^^^^^^^^^^^^
> join rate on
> rate.employee_key = task_instance.employee_key
> AND
> rate.work_type_key = task_instance.work_type_key
> inner join employee on
> rate.employee_key = employee.employee_key
> inner join work_type on
> rate.work_type_key = work_type.work_type_key
> inner join project on
> project.project_key = task_instance.project_key
> ORDER BY
> project.proj_no ,
> employee.id
> ;
>
Maybe I have a basic misunderstanding. What I am trying to get is a total
cost for each project. This would be calculated by multiplying rate and
hours for each row, on a per project base, and then summing all of th
products of this multiplication.

I did get the following to be accepted from a syntax basis, but it returns
rows with the product for each row, and something in the sum column which
is the same.


DROP view tasks_view ;

CREATE OR REPLACE view tasks_view as
select
        project.proj_no ,
        employee.first_name ,
        employee.last_name ,
        employee.id ,
        task_instance.hours ,
        work_type.type,
        work_type.descrip,
        rate.rate,
        employee.hourly_rate ,
        rate.rate * task_instance.hours as result ,
        SUM (rate.rate * task_instance.hours)
from
        task_instance
join rate on
        rate.employee_key = task_instance.employee_key
        AND
        rate.work_type_key = task_instance.work_type_key
inner join employee on
        rate.employee_key = employee.employee_key
inner join work_type on
        rate.work_type_key = work_type.work_type_key
inner join project on
        project.project_key = task_instance.project_key
GROUP BY
        project.project_key ,
        employee.first_name ,
        employee.last_name ,
        employee.id ,
        task_instance.hours ,
        work_type.type,
        work_type.descrip,
        rate.rate,
        employee.hourly_rate
ORDER BY
        project.proj_no
        ;

--
"They that would give up essential liberty for temporary safety deserve
neither liberty nor safety."
                                                -- Benjamin Franklin


Reply | Threaded
Open this post in threaded view
|

Re: A GROUP BY question

Alban Hertroys-4

> On 13 Aug 2019, at 13:10, stan <[hidden email]> wrote:
>
> select
> project.proj_no ,

Removed columns that get in the way of your desired result. You can’t have both details and the sum over them in a meaningful way.

> SUM (rate.rate * task_instance.hours)
> from
> task_instance
> join rate on
> rate.employee_key = task_instance.employee_key
> AND
> rate.work_type_key = task_instance.work_type_key

(break)

> inner join employee on
> rate.employee_key = employee.employee_key
> inner join work_type on
> rate.work_type_key = work_type.work_type_key

These are now probably redundant, you don’t need them unless they filter your results.

> inner join project on
> project.project_key = task_instance.project_key

And this JOIN could be dropped if project_key and proj_no weren’t different fields. If both are unique in project, you could drop one of them and keep the same functionality with fewer joins. That said, in the “war” between surrogate and natural keys I’m on the natural keys side. Clearly, not everyone agrees on that.

> GROUP BY
> project.project_key ,

Same columns removed here too.

> ORDER BY
> project.proj_no
> ;

That should give you the total cost for each project.

You could get the same result repeated per employee and per work type as you tried originally, by putting the above revised query as a subquery and joining that back into the full query in the place of your project-related tables (add the project_key so you have something to join against).

The repeated sum risks getting multiplied in the final output though, especially if unaware people will be putting the results in an Excel sheet or something. From experience, that either results in people reporting the wrong financial results (several orders too high) or blaming your query.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Reply | Threaded
Open this post in threaded view
|

Re: A GROUP BY question

Jan Kohnert
In reply to this post by stan-9

Hi Stan,

 

Am Dienstag, 13. August 2019, 13:10:18 CEST schrieb stan:

> Maybe I have a basic misunderstanding. What I am trying to get is a total

> cost for each project. This would be calculated by multiplying rate and

> hours for each row, on a per project base, and then summing all of th

> products of this multiplication.

 

First of all, complex views including many joins, group by, having etc usually tend to show bad performance on large dataset if used for anything else than simple select from ... statement. So at least my personal experience.

 

> CREATE OR REPLACE view tasks_view as

> select

> project.proj_no ,

> employee.first_name ,

> employee.last_name ,

> employee.id ,

> task_instance.hours ,

> work_type.type,

> work_type.descrip,

> rate.rate,

> employee.hourly_rate ,

> rate.rate * task_instance.hours as result ,

--------------------^

this gives you the product *per line*

 

> SUM (rate.rate * task_instance.hours)

--------------------^

this gives you *sum of all products over all grouped lines*

 

> from

> task_instance

> join rate on

> rate.employee_key = task_instance.employee_key

> AND

> rate.work_type_key = task_instance.work_type_key

> inner join employee on

> rate.employee_key = employee.employee_key

> inner join work_type on

> rate.work_type_key = work_type.work_type_key

> inner join project on

> project.project_key = task_instance.project_key

> GROUP BY

> project.project_key ,

> employee.first_name ,

> employee.last_name ,

> employee.id ,

> task_instance.hours ,

> work_type.type,

> work_type.descrip,

> rate.rate,

> employee.hourly_rate

---------------------^

 

You group by the columns you use in the sum, so you will get no sum at all, but the product *per line* as selected just before the sum

 

> ORDER BY

> project.proj_no

> ;

 

You will have to find out if you really need to group by some lines, and take a sum over those lines or need the product (rate.rate * task_instance.hours) per line.

 

--

Best regards

Jan

Reply | Threaded
Open this post in threaded view
|

Re: A GROUP BY question

David G Johnston
In reply to this post by Alban Hertroys-4
On Tuesday, August 13, 2019, Alban Hertroys <[hidden email]> wrote:

> On 13 Aug 2019, at 13:10, stan <[hidden email]> wrote:
>
> select
>       project.proj_no ,

Removed columns that get in the way of your desired result. You can’t have both details and the sum over them in a meaningful way.

Sure you can, at least generally, with Window Functions/Expressions (i.e., OVER)

David J.

Reply | Threaded
Open this post in threaded view
|

Re: A GROUP BY question

Alban Hertroys-4

> On 13 Aug 2019, at 15:19, David G. Johnston <[hidden email]> wrote:
>
> On Tuesday, August 13, 2019, Alban Hertroys <[hidden email]> wrote:
>
> > On 13 Aug 2019, at 13:10, stan <[hidden email]> wrote:
> >
> > select
> >       project.proj_no ,
>
> Removed columns that get in the way of your desired result. You can’t have both details and the sum over them in a meaningful way.
>
> Sure you can, at least generally, with Window Functions/Expressions (i.e., OVER)

That’s why I added “in a meaningful way” ;)

Repeating the same SUM-result on every line in a group is not what I’d call a meaningful result; the SUM has no bearing on the detailed line and leads to the kind of mistakes I already mentioned.
(For the record; I do this kind of grouping in a hierarchical database regularly, but there the grouped SUM is at a different level in the hierarchy and I consider it thus sufficiently separated from the detail rows.)

Besides, I figured the OP was already struggling with the query syntax, adding window functions into the mix didn’t seem a good idea in the context. Possible?, sure, desirable?, I would say not.

Alban Hertroys
--
There is always an exception to always.