Improper const-evaluation of HAVING with grouping sets and subquery pullup

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Improper const-evaluation of HAVING with grouping sets and subquery pullup

Heikki Linnakangas
This query produces an incorrect result:

regression=# select four, x
   from (select four, ten, 'foo'::text as x from tenk1 ) as t
   group by grouping sets(four, x) having x = 'foo' order by four;
  four |  x
------+-----
     0 |
     1 |
     2 |
     3 |
       | foo
(5 rows)

The "having x = 'foo'" clause should've filtered out the rows where x is
NULL, leaving only the last row as the result. Even though x is a
constant 'foo' in the subquery, HAVING clause is supposed to be
evaluated after grouping. What happens is that subquery pullup replaces
x with the constant, and the "'foo' = 'foo'" qual is later
const-evaluated to true.

I propose the attached patch to fix that. It forces the use of
PlaceHolderVars in subquery pullup, if the parent query has grouping
sets and HAVING. I'm not 100% sure that's the right approach or a misuse
of the placeholder system, so comments welcome. At first, I tried to set
wrap_non_vars=true only when processing the havingQual, so that
placeholders would only be there. But that didn't work out, I think
because grouping sets planning would then put both the Const, and the
PlaceHolderVar for the Const, in the Agg's targetlist, but only one of
them would be set to NULL when doing the grouping.

Another thing is that the check could be made much tighter, so that
PlaceHolderVars were only used for expressions actually used in the
HAVING. But it didn't seem worth the trouble to me.

- Heikki


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

0001-Fix-subquery-pullup-into-a-query-containing-GROUPING.patch (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Tom Lane-2
Heikki Linnakangas <[hidden email]> writes:
> This query produces an incorrect result:
> regression=# select four, x
>    from (select four, ten, 'foo'::text as x from tenk1 ) as t
>    group by grouping sets(four, x) having x = 'foo' order by four;

> The "having x = 'foo'" clause should've filtered out the rows where x is
> NULL, leaving only the last row as the result. Even though x is a
> constant 'foo' in the subquery, HAVING clause is supposed to be
> evaluated after grouping. What happens is that subquery pullup replaces
> x with the constant, and the "'foo' = 'foo'" qual is later
> const-evaluated to true.

Ouch.

> I propose the attached patch to fix that. It forces the use of
> PlaceHolderVars in subquery pullup, if the parent query has grouping
> sets and HAVING. I'm not 100% sure that's the right approach or a misuse
> of the placeholder system, so comments welcome.

Seems like the point is that grouping sets can inject null values of
columns, in more or less the same way that outer joins can.  So it
seems like using PlaceHolderVars, which were invented to account
for that property of outer joins, is a reasonable approach to a fix.
I don't have time to review the patch in detail right now though;
do you want to put it in the CF queue?

One thing I'm wondering is why only the HAVING clause would be subject
to the problem.  I'm a bit surprised that the "x" in the targetlist
didn't become a constant as well.  This may be pointing to some
klugery in the GROUPING SETS patch that we could clean up if we
use placeholders for this.

                        regards, tom lane


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Andrew Gierth
>>>>> "Tom" == Tom Lane <[hidden email]> writes:

 Tom> One thing I'm wondering is why only the HAVING clause would be
 Tom> subject to the problem. I'm a bit surprised that the "x" in the
 Tom> targetlist didn't become a constant as well. This may be pointing
 Tom> to some klugery in the GROUPING SETS patch that we could clean up
 Tom> if we use placeholders for this.

This shows that the problem can extend to the targetlist too:

select four, x || 'x'
  from (select four, ten, 'foo'::text as x from tenk1 ) as t
 group by grouping sets(four, x);

 four | ?column?
------+----------
    3 | foox
    0 | foox
    1 | foox
    2 | foox
      | foox
(5 rows)

What seems to happen in the original case is that the 'foo' constant
ends up in the projection of the input to the aggregate node, with a Var
in the output.

--
Andrew (irc:RhodiumToad)


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Andrew Gierth
In reply to this post by Tom Lane-2
>>>>> "Tom" == Tom Lane <[hidden email]> writes:

 >> I propose the attached patch to fix that. It forces the use of
 >> PlaceHolderVars in subquery pullup, if the parent query has grouping
 >> sets and HAVING. I'm not 100% sure that's the right approach or a
 >> misuse of the placeholder system, so comments welcome.

I've been testing Heikki's patch with the havingQual condition removed,
and I haven't found any issues yet.

 Tom> One thing I'm wondering is why only the HAVING clause would be
 Tom> subject to the problem. I'm a bit surprised that the "x" in the
 Tom> targetlist didn't become a constant as well. This may be pointing
 Tom> to some klugery in the GROUPING SETS patch that we could clean up
 Tom> if we use placeholders for this.

As far as I can tell, the "x" _does_ become a constant, but then in
setrefs, because it's still labelled with a sortgroupref, it gets
replaced by a Var again. I don't recall touching any of that in the GS
work, because it was already like that for plain GROUP BY.

--
Andrew (irc:RhodiumToad)


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <[hidden email]> writes:

 Andrew> I've been testing Heikki's patch with the havingQual condition
 Andrew> removed, and I haven't found any issues yet.

Further experimentation reveals that this change has the effect of
blocking constant-folding in aggregate input expressions that refer to
constant outputs of the subquery.

Is it worth doing anything about that?

--
Andrew (irc:RhodiumToad)


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Tom Lane-2
Andrew Gierth <[hidden email]> writes:
> Further experimentation reveals that this change has the effect of
> blocking constant-folding in aggregate input expressions that refer to
> constant outputs of the subquery.
> Is it worth doing anything about that?

Uh ... but I thought the point here is that the outputs aren't really
constant in the presence of grouping sets.

                        regards, tom lane


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Andrew Gierth
>>>>> "Tom" == Tom Lane <[hidden email]> writes:

 >> Further experimentation reveals that this change has the effect of
 >> blocking constant-folding in aggregate input expressions that refer
 >> to constant outputs of the subquery. Is it worth doing anything
 >> about that?

 Tom> Uh ... but I thought the point here is that the outputs aren't
 Tom> really constant in the presence of grouping sets.

select x, y, sum(z) from (select 1 as x, 2 as y, 3 as z) s
 group by grouping sets (x,y);

x and y aren't constants, but z is.

--
Andrew (irc:RhodiumToad)


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Tom Lane-2
Andrew Gierth <[hidden email]> writes:
> "Tom" == Tom Lane <[hidden email]> writes:
>  Tom> Uh ... but I thought the point here is that the outputs aren't
>  Tom> really constant in the presence of grouping sets.

> select x, y, sum(z) from (select 1 as x, 2 as y, 3 as z) s
>  group by grouping sets (x,y);

> x and y aren't constants, but z is.

OK, but that just means we should put PHV wrapping around only the
grouping-set columns.

BTW, also need to think about GS expressions, eg

select x+y, sum(z) from (select 1 as x, 2 as y, 3 as z) s
 group by grouping sets (x+y);

Not real sure what needs to happen here.

                        regards, tom lane


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Andrew Gierth
>>>>> "Tom" == Tom Lane <[hidden email]> writes:

 >> x and y aren't constants, but z is.

 Tom> OK, but that just means we should put PHV wrapping around only the
 Tom> grouping-set columns.

Well, can we also take advantage of the fact that we know that anything
that's not in the grouping-set columns must be in an aggregate argument,
and just omit the PHV inside aggregate args? (And even if grouping
columns appear inside aggregate args, they are _not_ nulled out there.)

 Tom> BTW, also need to think about GS expressions, eg

 Tom> select x+y, sum(z) from (select 1 as x, 2 as y, 3 as z) s
 Tom>  group by grouping sets (x+y);

 Tom> Not real sure what needs to happen here.

That one currently works (note you have to add another grouping set to
test it, since the case of exactly one grouping set is reduced to plain
GROUP BY) because setrefs fixes up the reference after-the-fact,
replacing the outer x+y (or whatever it got munged to) with a Var based
on matching the sortgroupref. This currently fails:

select (x+y)*1, sum(z) from (select 1 as x, 2 as y, 3 as z) s
 group by grouping sets (x+y, x);

because the logic in setrefs that would normally detect that (x+y)
exists in the child tlist doesn't fire because the whole expression was
replaced by a constant.

With the patch to use PHVs it works, but I admit to some confusion over
exactly why.

--
Andrew (irc:RhodiumToad)


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Heikki Linnakangas
Here's another interesting case, without any subqueries:

postgres=# SELECT g as newalias1, g as newalias3
FROM generate_series(1,3) g
GROUP BY newalias1, ROLLUP(newalias3);
  newalias1 | newalias3
-----------+-----------
          1 |         1
          3 |         3
          2 |         2
          2 |         2
          3 |         3
          1 |         1
(6 rows)

Why are there no "summary" rows with NULLs, despite the ROLLUP? If you
replace one of the g's with (g+0), you get the expected result:

postgres=# SELECT g as newalias1, (g+0) as newalias3
FROM generate_series(1,3) g
GROUP BY newalias1, ROLLUP(newalias3);
  newalias1 | newalias3
-----------+-----------
          1 |         1
          3 |         3
          2 |         2
          2 |
          3 |
          1 |
(6 rows)

- Heikki


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Andrew Gierth
>>>>> "Heikki" == Heikki Linnakangas <[hidden email]> writes:

 Heikki> Here's another interesting case, without any subqueries:
 Heikki> postgres=# SELECT g as newalias1, g as newalias3
 Heikki> FROM generate_series(1,3) g
 Heikki> GROUP BY newalias1, ROLLUP(newalias3);
 Heikki>  newalias1 | newalias3
 Heikki> -----------+-----------
 Heikki>          1 |         1
 Heikki>          3 |         3
 Heikki>          2 |         2
 Heikki>          2 |         2
 Heikki>          3 |         3
 Heikki>          1 |         1
 Heikki> (6 rows)

 Heikki> Why are there no "summary" rows with NULLs, despite the ROLLUP?

To my knowledge this is the correct result. (Though neither version of
the query is legal per the SQL spec; allowing expressions and aliases in
GROUP BY are nonstandard extensions.)

Here's why it happens: after substituting for the aliases, you have

GROUP BY g, rollup(g)

which is equivalent to

GROUP BY GROUPING SETS ((g,g), (g))

which is equivalent to

GROUP BY GROUPING SETS ((g), (g))

because duplicate terms within a single grouping set are redundant just
as they are in GROUP BY.

 Heikki> If you replace one of the g's with (g+0), you get the expected
 Heikki> result:

Well, in this case the terms in the grouping set are no longer
duplicate; the expansion becomes

GROUP BY GROUPING SETS ((g,(g+0)), (g))

and therefore the (g+0) expression becomes null for one of the resulting
sets.

--
Andrew (irc:RhodiumToad)


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <[hidden email]> writes:

 Andrew> To my knowledge this is the correct result. (Though neither
 Andrew> version of the query is legal per the SQL spec; allowing
 Andrew> expressions and aliases in GROUP BY are nonstandard
 Andrew> extensions.)

And neither Oracle nor MSSQL (at least the versions available on
sqlfiddle) allow either query, so there's nothing I can compare against.

--
Andrew (irc:RhodiumToad)


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Tom Lane-2
In reply to this post by Andrew Gierth
Andrew Gierth <[hidden email]> writes:
> "Heikki" == Heikki Linnakangas <[hidden email]> writes:
>  Heikki> Why are there no "summary" rows with NULLs, despite the ROLLUP?

> Here's why it happens: after substituting for the aliases, you have
> GROUP BY g, rollup(g)
> which is equivalent to
> GROUP BY GROUPING SETS ((g,g), (g))
> which is equivalent to
> GROUP BY GROUPING SETS ((g), (g))

I don't think I buy this explanation, because the plan tree doesn't show
any indication that we're actually folding (g,g) to (g):

regression=# EXPLAIN SELECT g as newalias1, g as newalias3
FROM generate_series(1,3) g
GROUP BY newalias1, ROLLUP(newalias3);
                                   QUERY PLAN                                  
--------------------------------------------------------------------------------
 HashAggregate  (cost=15.00..21.50 rows=400 width=8)
   Hash Key: g, g
   Hash Key: g
   ->  Function Scan on generate_series g  (cost=0.00..10.00 rows=1000 width=8)
(4 rows)

regression=# EXPLAIN SELECT g as newalias1, g+0 as newalias3
FROM generate_series(1,3) g
GROUP BY newalias1, ROLLUP(newalias3);
                                   QUERY PLAN                                  
--------------------------------------------------------------------------------
 HashAggregate  (cost=17.50..25.00 rows=400 width=8)
   Hash Key: g, (g + 0)
   Hash Key: g
   ->  Function Scan on generate_series g  (cost=0.00..12.50 rows=1000 width=8)
(4 rows)

If these behave differently, why does the plan structure look the same?

I think that Heikki's expectation is the correct one, and the reason the
output looks the way it does is that setrefs.c is dropping the ball
somehow and confusing the two "g" references.  The finished plan has two
identical Var references in the Agg node's output tlist:

              :targetlist (
                 {TARGETENTRY
                 :expr
                    {VAR
                    :varno 65001
                    :varattno 1
                    :vartype 23
                    :vartypmod -1
                    :varcollid 0
                    :varlevelsup 0
                    :varnoold 1
                    :varoattno 1
                    :location 7
                    }
                 :resno 1
                 :resname newalias1
                 :ressortgroupref 1
                 :resorigtbl 0
                 :resorigcol 0
                 :resjunk false
                 }
                 {TARGETENTRY
                 :expr
                    {VAR
                    :varno 65001
                    :varattno 1
                    :vartype 23
                    :vartypmod -1
                    :varcollid 0
                    :varlevelsup 0
                    :varnoold 1
                    :varoattno 1
                    :location 23
                    }
                 :resno 2
                 :resname newalias3
                 :ressortgroupref 2
                 :resorigtbl 0
                 :resorigcol 0
                 :resjunk false
                 }
              )

The two "g" values are correctly distinguished in the FunctionScan's
tlist, however:

                 :targetlist (
                    {TARGETENTRY
                    :expr
                       {VAR
                       :varno 1
                       :varattno 1
                       :vartype 23
                       :vartypmod -1
                       :varcollid 0
                       :varlevelsup 0
                       :varnoold 1
                       :varoattno 1
                       :location 7
                       }
                    :resno 1
                    :resname <>
                    :ressortgroupref 1
                    :resorigtbl 0
                    :resorigcol 0
                    :resjunk false
                    }
                    {TARGETENTRY
                    :expr
                       {VAR
                       :varno 1
                       :varattno 1
                       :vartype 23
                       :vartypmod -1
                       :varcollid 0
                       :varlevelsup 0
                       :varnoold 1
                       :varoattno 1
                       :location 23
                       }
                    :resno 2
                    :resname <>
                    :ressortgroupref 2
                    :resorigtbl 0
                    :resorigcol 0
                    :resjunk false
                    }
                 )

We should have used ressortgroupref matching to prevent this, but without
having checked the code right now, I think that we might only apply that
logic to non-Var tlist entries.  If the Agg output tlist had mentioned
column 2 not column 1 of the child node, I bet we'd get the right answer.

Digression: this seems like another example in which the "same" Var can
represent two different values.  I've had a bee in my bonnet for awhile
that we need to stop doing that, but I'm not entirely sure what to do
instead.  In the case of Vars that might go to null because of an outer
join, we could perhaps fix things by not smashing join alias Vars to their
referents (at least, not till much much later in the planner than we do
now).  That doesn't seem to apply here though.  Perhaps the GROUP BY
operation ought to be understood as providing its own set of output Vars?
That would mean creating an RTE to represent GROUP BY, but maybe that's
not awful.

In the nearer term, maybe we can get a back-patchable fix by being
more careful about ressortgroupref matching.

                        regards, tom lane


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Tom Lane-2
I wrote:
> I think that Heikki's expectation is the correct one, and the reason the
> output looks the way it does is that setrefs.c is dropping the ball
> somehow and confusing the two "g" references.  ...
> We should have used ressortgroupref matching to prevent this, but without
> having checked the code right now, I think that we might only apply that
> logic to non-Var tlist entries.  If the Agg output tlist had mentioned
> column 2 not column 1 of the child node, I bet we'd get the right answer.

Indeed, the attached patch passes all regression tests and produces the
same answers for both of Heikki's examples:

regression=# SELECT g as newalias1, g as newalias3
FROM generate_series(1,3) g
GROUP BY newalias1, ROLLUP(newalias3);
 newalias1 | newalias3
-----------+-----------
         1 |         1
         3 |         3
         2 |         2
         2 |          
         3 |          
         1 |          
(6 rows)


                        regards, tom lane


diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1382b67..9aeaf87 100644
*** a/src/backend/optimizer/plan/setrefs.c
--- b/src/backend/optimizer/plan/setrefs.c
*************** set_upper_references(PlannerInfo *root,
*** 1744,1751 ****
  TargetEntry *tle = (TargetEntry *) lfirst(l);
  Node   *newexpr;
 
! /* If it's a non-Var sort/group item, first try to match by sortref */
! if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
  {
  newexpr = (Node *)
  search_indexed_tlist_for_sortgroupref(tle->expr,
--- 1744,1751 ----
  TargetEntry *tle = (TargetEntry *) lfirst(l);
  Node   *newexpr;
 
! /* If it's a sort/group item, first try to match by sortref */
! if (tle->ressortgroupref != 0)
  {
  newexpr = (Node *)
  search_indexed_tlist_for_sortgroupref(tle->expr,
*************** search_indexed_tlist_for_non_var(Expr *n
*** 2113,2119 ****
 
  /*
   * search_indexed_tlist_for_sortgroupref --- find a sort/group expression
-  * (which is assumed not to be just a Var)
   *
   * If a match is found, return a Var constructed to reference the tlist item.
   * If no match, return NULL.
--- 2113,2118 ----


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Andrew Gierth
In reply to this post by Tom Lane-2
>>>>> "Tom" == Tom Lane <[hidden email]> writes:

 Tom> I don't think I buy this explanation, because the plan tree
 Tom> doesn't show any indication that we're actually folding (g,g) to
 Tom> (g):

Huh. Yes, you're right.

 Tom> Digression: this seems like another example in which the "same"
 Tom> Var can represent two different values.

At one point in the evolution of the GS patch it had a new node type,
GroupedVar or some such name, to represent the possibly-nulled-out
values. I'm not sure that it was being introduced early enough in
planning to have prevented this problem (I suspect not, I'd have to dig
up the old code). That stuff was all ripped out very late in the
development process because as I recall, both you and Andres disliked
it.

--
Andrew (irc:RhodiumToad)


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Tom Lane-2
In reply to this post by Tom Lane-2
I wrote:
>> We should have used ressortgroupref matching to prevent this, but without
>> having checked the code right now, I think that we might only apply that
>> logic to non-Var tlist entries.  If the Agg output tlist had mentioned
>> column 2 not column 1 of the child node, I bet we'd get the right answer.

> Indeed, the attached patch passes all regression tests and produces the
> same answers for both of Heikki's examples:

I did some back-porting work on this.  The patch applies back to 9.5
where grouping sets were introduced, but it only fixes the problem
in 9.6 and later --- in 9.5, you still get the wrong output :-(.

Bisecting says the behavior changed at commit 3fc6e2d7f "Make the upper
part of the planner work by generating and comparing Paths".  Ugh.
We are certainly not back-patching that into 9.5, and I'm disinclined
even to try to identify exactly why that commit changed the behavior.

Given that this is such a weird corner case, and we've not heard
complaints from actual users about it, I'm inclined just to apply
the patch in 9.6 and up and call it good.

                        regards, tom lane


diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1382b67..fa9a3f0 100644
*** a/src/backend/optimizer/plan/setrefs.c
--- b/src/backend/optimizer/plan/setrefs.c
*************** set_upper_references(PlannerInfo *root,
*** 1744,1751 ****
  TargetEntry *tle = (TargetEntry *) lfirst(l);
  Node   *newexpr;
 
! /* If it's a non-Var sort/group item, first try to match by sortref */
! if (tle->ressortgroupref != 0 && !IsA(tle->expr, Var))
  {
  newexpr = (Node *)
  search_indexed_tlist_for_sortgroupref(tle->expr,
--- 1744,1751 ----
  TargetEntry *tle = (TargetEntry *) lfirst(l);
  Node   *newexpr;
 
! /* If it's a sort/group item, first try to match by sortref */
! if (tle->ressortgroupref != 0)
  {
  newexpr = (Node *)
  search_indexed_tlist_for_sortgroupref(tle->expr,
*************** search_indexed_tlist_for_non_var(Expr *n
*** 2113,2119 ****
 
  /*
   * search_indexed_tlist_for_sortgroupref --- find a sort/group expression
-  * (which is assumed not to be just a Var)
   *
   * If a match is found, return a Var constructed to reference the tlist item.
   * If no match, return NULL.
--- 2113,2118 ----
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index fd618af..833d515 100644
*** a/src/test/regress/expected/groupingsets.out
--- b/src/test/regress/expected/groupingsets.out
*************** select a, d, grouping(a,b,c)
*** 360,365 ****
--- 360,394 ----
   2 | 2 |        2
  (4 rows)
 
+ -- check that distinct grouping columns are kept separate
+ -- even if they are equal()
+ explain (costs off)
+ select g as alias1, g as alias2
+   from generate_series(1,3) g
+  group by alias1, rollup(alias2);
+                    QUERY PLAN                  
+ ------------------------------------------------
+  GroupAggregate
+    Group Key: g, g
+    Group Key: g
+    ->  Sort
+          Sort Key: g
+          ->  Function Scan on generate_series g
+ (6 rows)
+
+ select g as alias1, g as alias2
+   from generate_series(1,3) g
+  group by alias1, rollup(alias2);
+  alias1 | alias2
+ --------+--------
+       1 |      1
+       1 |      
+       2 |      2
+       2 |      
+       3 |      3
+       3 |      
+ (6 rows)
+
  -- simple rescan tests
  select a, b, sum(v.x)
    from (values (1),(2)) v(x), gstest_data(v.x)
diff --git a/src/test/regress/sql/groupingsets.sql b/src/test/regress/sql/groupingsets.sql
index 564ebc9..2b4ab69 100644
*** a/src/test/regress/sql/groupingsets.sql
--- b/src/test/regress/sql/groupingsets.sql
*************** select a, d, grouping(a,b,c)
*** 141,146 ****
--- 141,157 ----
    from gstest3
   group by grouping sets ((a,b), (a,c));
 
+ -- check that distinct grouping columns are kept separate
+ -- even if they are equal()
+ explain (costs off)
+ select g as alias1, g as alias2
+   from generate_series(1,3) g
+  group by alias1, rollup(alias2);
+
+ select g as alias1, g as alias2
+   from generate_series(1,3) g
+  group by alias1, rollup(alias2);
+
  -- simple rescan tests
 
  select a, b, sum(v.x)


--
Sent via pgsql-bugs mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Michael Paquier
On Thu, Oct 26, 2017 at 4:29 AM, Tom Lane <[hidden email]> wrote:
> Given that this is such a weird corner case, and we've not heard
> complaints from actual users about it, I'm inclined just to apply
> the patch in 9.6 and up and call it good.

Tom, are you planning to finish wrapping up this one? For now I am
moving it to next CF.
--
Michael

Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Andrew Gierth
>>>>> "Michael" == Michael Paquier <[hidden email]> writes:

 Michael> Tom, are you planning to finish wrapping up this one? For now
 Michael> I am moving it to next CF.

I can take it if Tom and Heikki are busy.

--
Andrew.

Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Tom Lane-2
Andrew Gierth <[hidden email]> writes:
> "Michael" == Michael Paquier <[hidden email]> writes:
>  Michael> Tom, are you planning to finish wrapping up this one? For now
>  Michael> I am moving it to next CF.

> I can take it if Tom and Heikki are busy.

FWIW, I'm not really comfortable that the proposed patch is correct
or complete.  It may just need more study to get there.

                        regards, tom lane

Reply | Threaded
Open this post in threaded view
|

Re: Improper const-evaluation of HAVING with grouping sets and subquery pullup

Tom Lane-2
I wrote:
> FWIW, I'm not really comfortable that the proposed patch is correct
> or complete.  It may just need more study to get there.

I've done another round of study on this patch.  The attached updated
version is the same code as Heikki proposed (minus the incorrect
restriction to queries with HAVING quals), but I reworked the comments
and expanded the regression test cases.

One thing that wasn't clear to me before was whether we need wrap_non_vars
for this case or not; we don't for outer joins, so I was unconvinced about
it here.  It turns out we do: the point of the wrapper is to prevent
constant folding or other expression preprocessing from merging a
pulled-up expression with the surrounding expression, resulting in
something that won't match the grouping set expression when it comes time
to do that matching.  For instance if we have a boolean subquery output
expression, say "x = y as cond", and that gets hoisted into an upper
expression "not cond", then without the PHV wrapper we will happily
simplify that to "x <> y" which will not match the grouping set
expression.  There's a regression test below that misbehaves if you take
out the "wrap_non_vars = true" line.

I spent some time thinking about Andrew's observation that we don't really
need the wrappers everyplace.  It's true, but pullup_replace_vars is far
from being able to do the right thing there, and I'm not sure that trying
to teach it to do so is reasonable.  (I'm inclined to think that the idea
I threw out upthread about converting grouping expressions into Vars
belonging to a new RTE kind might be the way to go.)  In any case I don't
think we'd possibly come out with a patch simple enough to back-patch.
So let's leave that optimization for future work.

I think the attached is probably ready to go, though I've not checked yet
whether it will work pre-9.6.  Anyone want to do more review?

                        regards, tom lane


diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 0e2a220..45d82da 100644
*** a/src/backend/optimizer/prep/prepjointree.c
--- b/src/backend/optimizer/prep/prepjointree.c
*************** pull_up_simple_subquery(PlannerInfo *roo
*** 1003,1013 ****
 
  /*
  * The subquery's targetlist items are now in the appropriate form to
! * insert into the top query, but if we are under an outer join then
! * non-nullable items and lateral references may have to be turned into
! * PlaceHolderVars.  If we are dealing with an appendrel member then
! * anything that's not a simple Var has to be turned into a
! * PlaceHolderVar.  Set up required context data for pullup_replace_vars.
  */
  rvcontext.root = root;
  rvcontext.targetlist = subquery->targetList;
--- 1003,1010 ----
 
  /*
  * The subquery's targetlist items are now in the appropriate form to
! * insert into the top query, except that we may need to wrap them in
! * PlaceHolderVars.  Set up required context data for pullup_replace_vars.
  */
  rvcontext.root = root;
  rvcontext.targetlist = subquery->targetList;
*************** pull_up_simple_subquery(PlannerInfo *roo
*** 1019,1032 ****
  rvcontext.relids = NULL;
  rvcontext.outer_hasSubLinks = &parse->hasSubLinks;
  rvcontext.varno = varno;
! rvcontext.need_phvs = (lowest_nulling_outer_join != NULL ||
!   containing_appendrel != NULL);
! rvcontext.wrap_non_vars = (containing_appendrel != NULL);
  /* initialize cache array with indexes 0 .. length(tlist) */
  rvcontext.rv_cache = palloc0((list_length(subquery->targetList) + 1) *
  sizeof(Node *));
 
  /*
  * Replace all of the top query's references to the subquery's outputs
  * with copies of the adjusted subtlist items, being careful not to
  * replace any of the jointree structure. (This'd be a lot cleaner if we
--- 1016,1064 ----
  rvcontext.relids = NULL;
  rvcontext.outer_hasSubLinks = &parse->hasSubLinks;
  rvcontext.varno = varno;
! /* these flags will be set below, if needed */
! rvcontext.need_phvs = false;
! rvcontext.wrap_non_vars = false;
  /* initialize cache array with indexes 0 .. length(tlist) */
  rvcontext.rv_cache = palloc0((list_length(subquery->targetList) + 1) *
  sizeof(Node *));
 
  /*
+ * If we are under an outer join then non-nullable items and lateral
+ * references may have to be turned into PlaceHolderVars.
+ */
+ if (lowest_nulling_outer_join != NULL)
+ rvcontext.need_phvs = true;
+
+ /*
+ * If we are dealing with an appendrel member then anything that's not a
+ * simple Var has to be turned into a PlaceHolderVar.  We force this to
+ * ensure that what we pull up doesn't get merged into a surrounding
+ * expression during later processing and then fail to match the
+ * expression actually available from the appendrel.
+ */
+ if (containing_appendrel != NULL)
+ {
+ rvcontext.need_phvs = true;
+ rvcontext.wrap_non_vars = true;
+ }
+
+ /*
+ * If the parent query uses grouping sets, we need a PlaceHolderVar for
+ * anything that's not a simple Var.  Again, this ensures that expressions
+ * retain their separate identity so that they will match grouping set
+ * columns when appropriate.  (It'd be sufficient to wrap values used in
+ * grouping set columns, and do so only in non-aggregated portions of the
+ * tlist and havingQual, but that would require a lot of infrastructure
+ * that pullup_replace_vars hasn't currently got.)
+ */
+ if (parse->groupingSets)
+ {
+ rvcontext.need_phvs = true;
+ rvcontext.wrap_non_vars = true;
+ }
+
+ /*
  * Replace all of the top query's references to the subquery's outputs
  * with copies of the adjusted subtlist items, being careful not to
  * replace any of the jointree structure. (This'd be a lot cleaner if we
diff --git a/src/test/regress/expected/groupingsets.out b/src/test/regress/expected/groupingsets.out
index 833d515..cbfdbfd 100644
*** a/src/test/regress/expected/groupingsets.out
--- b/src/test/regress/expected/groupingsets.out
*************** select g as alias1, g as alias2
*** 389,394 ****
--- 389,439 ----
        3 |      
  (6 rows)
 
+ -- check that pulled-up subquery outputs still go to null when appropriate
+ select four, x
+   from (select four, ten, 'foo'::text as x from tenk1) as t
+   group by grouping sets (four, x)
+   having x = 'foo';
+  four |  x  
+ ------+-----
+       | foo
+ (1 row)
+
+ select four, x || 'x'
+   from (select four, ten, 'foo'::text as x from tenk1) as t
+   group by grouping sets (four, x)
+   order by four;
+  four | ?column?
+ ------+----------
+     0 |
+     1 |
+     2 |
+     3 |
+       | foox
+ (5 rows)
+
+ select (x+y)*1, sum(z)
+  from (select 1 as x, 2 as y, 3 as z) s
+  group by grouping sets (x+y, x);
+  ?column? | sum
+ ----------+-----
+         3 |   3
+           |   3
+ (2 rows)
+
+ select x, not x as not_x, q2 from
+   (select *, q1 = 1 as x from int8_tbl i1) as t
+   group by grouping sets(x, q2)
+   order by x, q2;
+  x | not_x |        q2        
+ ---+-------+-------------------
+  f | t     |                  
+    |       | -4567890123456789
+    |       |               123
+    |       |               456
+    |       |  4567890123456789
+ (5 rows)
+
  -- simple rescan tests
  select a, b, sum(v.x)
    from (values (1),(2)) v(x), gstest_data(v.x)
diff --git a/src/test/regress/sql/groupingsets.sql b/src/test/regress/sql/groupingsets.sql
index 2b4ab69..b28d821 100644
*** a/src/test/regress/sql/groupingsets.sql
--- b/src/test/regress/sql/groupingsets.sql
*************** select g as alias1, g as alias2
*** 152,157 ****
--- 152,177 ----
    from generate_series(1,3) g
   group by alias1, rollup(alias2);
 
+ -- check that pulled-up subquery outputs still go to null when appropriate
+ select four, x
+   from (select four, ten, 'foo'::text as x from tenk1) as t
+   group by grouping sets (four, x)
+   having x = 'foo';
+
+ select four, x || 'x'
+   from (select four, ten, 'foo'::text as x from tenk1) as t
+   group by grouping sets (four, x)
+   order by four;
+
+ select (x+y)*1, sum(z)
+  from (select 1 as x, 2 as y, 3 as z) s
+  group by grouping sets (x+y, x);
+
+ select x, not x as not_x, q2 from
+   (select *, q1 = 1 as x from int8_tbl i1) as t
+   group by grouping sets(x, q2)
+   order by x, q2;
+
  -- simple rescan tests
 
  select a, b, sum(v.x)
12
Previous Thread Next Thread