BUG #15383: Join Filter cost estimation problem in 10.5


BUG #15383: Join Filter cost estimation problem in 10.5

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15383
Logged by:          Marko Tiikkaja
Email address:      [hidden email]
PostgreSQL version: 10.5
Operating system:   Linux
Description:        

I was looking at a problematic plan submitted by "sjamaan" on IRC and I
noticed that the join filter estimation seems completely wrong here:

create function expensive_func(int) returns int as $$ begin return 1; end $$
language plpgsql stable cost 10000;
create table unique_inner(a int primary key);
insert into unique_inner select generate_series(1, 10000);

explain select * from unique_inner gs1(i) join generate_series(1, 10) gs2(i)
using (i) where expensive_func(gs1.i + gs2.i) > 0;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Hash Join  (cost=303.19..315.81 rows=333 width=4)
   Hash Cond: (gs2.i = gs1.i)
   Join Filter: (expensive_func((gs1.i + gs2.i)) > 0)
   ->  Function Scan on generate_series gs2  (cost=0.00..10.00 rows=1000 width=4)
   ->  Hash  (cost=159.75..159.75 rows=11475 width=4)
         ->  Seq Scan on unique_inner gs1  (cost=0.00..159.75 rows=11475 width=4)
(6 rows)

(Notice how even though the function is expected to be called at least 333
times, the cost doesn't account for even a single call.)
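As a rough sanity check (my own arithmetic, not planner code, and assuming the default cpu_operator_cost of 0.0025): with the function declared at COST 10000, each call should add about 25 cost units, so even the plan's own estimate of 333 evaluations ought to dwarf the 315.81 total:

```python
# Back-of-the-envelope check (assumes PostgreSQL's default
# cpu_operator_cost = 0.0025; the declared COST 10000 is charged per call).
cpu_operator_cost = 0.0025
declared_func_cost = 10000          # from CREATE FUNCTION ... COST 10000
per_call = declared_func_cost * cpu_operator_cost   # 25.0 cost units per call

estimated_calls = 333               # the plan's own row estimate
filter_cost = estimated_calls * per_call
print(filter_cost)                  # 8325.0 -- vs. a plan total of 315.81
```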

Dropping the primary key constraint makes the costs more reasonable (though
I'm still not sure how the planner arrives at these costs):

alter table unique_inner drop constraint unique_inner_pkey;
explain select * from unique_inner gs1(i) join generate_series(1, 10) gs2(i)
using (i) where expensive_func(gs1.i + gs2.i) > 0;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Hash Join  (cost=22.50..1436880.94 rows=19125 width=4)
   Hash Cond: (gs1.i = gs2.i)
   Join Filter: (expensive_func((gs1.i + gs2.i)) > 0)
   ->  Seq Scan on unique_inner gs1  (cost=0.00..159.75 rows=11475 width=4)
   ->  Hash  (cost=10.00..10.00 rows=1000 width=4)
         ->  Function Scan on generate_series gs2  (cost=0.00..10.00 rows=1000 width=4)
(6 rows)


Re: BUG #15383: Join Filter cost estimation problem in 10.5

Tom Lane
PG Bug reporting form <[hidden email]> writes:

>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  Hash Join  (cost=303.19..315.81 rows=333 width=4)
>    Hash Cond: (gs2.i = gs1.i)
>    Join Filter: (expensive_func((gs1.i + gs2.i)) > 0)
>    ->  Function Scan on generate_series gs2  (cost=0.00..10.00 rows=1000 width=4)
>    ->  Hash  (cost=159.75..159.75 rows=11475 width=4)
>          ->  Seq Scan on unique_inner gs1  (cost=0.00..159.75 rows=11475 width=4)
> (6 rows)

> (Notice how even though the function is expected to be called at least 333
> times, the cost doesn't account for even a single call.)

Yeah.  This evidently got broken sometime during v10 development,
because 9.6 and below generate a more reasonable cost:

 Hash Join  (cost=270.00..25298.75 rows=333 width=4)
   Hash Cond: (gs2.i = gs1.i)
   Join Filter: (expensive_func((gs1.i + gs2.i)) > 0)
   ->  Function Scan on generate_series gs2  (cost=0.00..10.00 rows=1000 width=4)
   ->  Hash  (cost=145.00..145.00 rows=10000 width=4)
         ->  Seq Scan on unique_inner gs1  (cost=0.00..145.00 rows=10000 width=4)

> Dropping the primary key constraint makes the costs more reasonable

Interesting.  That sort of points the finger in the direction of the
inner_unique patch, though it could be elsewhere.

Will look into it if nobody beats me to it.

                        regards, tom lane


Re: BUG #15383: Join Filter cost estimation problem in 10.5

David Rowley
On 14 September 2018 at 05:57, Tom Lane <[hidden email]> wrote:

>
> > (Notice how even though the function is expected to be called at least 333
> > times, the cost doesn't account for even a single call.)
>
> Yeah.  This evidently got broken sometime during v10 development,
> because 9.6 and below generate a more reasonable cost:
>
>  Hash Join  (cost=270.00..25298.75 rows=333 width=4)
>    Hash Cond: (gs2.i = gs1.i)
>    Join Filter: (expensive_func((gs1.i + gs2.i)) > 0)
>    ->  Function Scan on generate_series gs2  (cost=0.00..10.00 rows=1000 width=4)
>    ->  Hash  (cost=145.00..145.00 rows=10000 width=4)
>          ->  Seq Scan on unique_inner gs1  (cost=0.00..145.00 rows=10000 width=4)
>
> > Dropping the primary key constraint makes the costs more reasonable
>
> Interesting.  That sort of points the finger in the direction of the
> inner_unique patch, though it could be elsewhere.

This seems to be a result of using semifactors.outer_match_frac in
final_cost_hashjoin(). This is calculated to be very low, at
3.3333333333333335e-05, which results in outer_matched_rows being set
to 0 in:

outer_matched_rows = rint(outer_path_rows *
                          extra->semifactors.outer_match_frac);

It's not all that obvious what might be done to fix this, given that
the low outer_match_frac is the result of performing an estimation on
both the gs1.i = gs2.i qual and the function call, each of which is
estimated independently as:

gs1.i = gs2.i                      -> 0.0001
expensive_func(gs1.i + gs2.i) > 0  -> 0.33333333333333331
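To illustrate, here is a sketch of the arithmetic just described, with Python's round() standing in for C's rint():

```python
# Why outer_matched_rows ends up as 0: the two independently-estimated
# selectivities are multiplied together before any rounding happens.
sel_eq = 0.0001                    # gs1.i = gs2.i
sel_func = 0.33333333333333331     # expensive_func(gs1.i + gs2.i) > 0

outer_match_frac = sel_eq * sel_func   # ~3.33e-05
outer_path_rows = 1000

# Python's round() stands in for C's rint() here.
outer_matched_rows = round(outer_path_rows * outer_match_frac)
print(outer_matched_rows)              # 0 -- the function is costed zero times
```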

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: BUG #15383: Join Filter cost estimation problem in 10.5

Tom Lane
David Rowley <[hidden email]> writes:
> On 14 September 2018 at 05:57, Tom Lane <[hidden email]> wrote:
>> Yeah.  This evidently got broken sometime during v10 development,
>> because 9.6 and below generate a more reasonable cost:

> This seems to be a result of using the semifactors.outer_match_frac in
> final_cost_hashjoin(). This is calculated to be very low @
> 3.3333333333333335e-05, which results in outer_matched_rows being set
> to 0 in:

So after poking around for awhile, my conclusion is that the cost
estimation aspects of the inner_unique patch are completely broken,
and it's not going to be very easy to fix.

The core issue here is that compute_semi_anti_join_factors was never
designed to work for any joins that aren't really SEMI or ANTI joins,
and just taking out the assert about that doesn't fix it.  In
particular, passing jointype == JOIN_SEMI to clauselist_selectivity,
when the underlying sjinfo has jointype == JOIN_INNER, isn't a supported
combination.  If you look at eqjoinsel() you will notice that it pays
no attention at all to the jointype parameter, only to sjinfo->jointype.
Therefore what we get out of the first clauselist_selectivity is the
same value as we get from the second one (ie, inner-join selectivity)
leading to entirely insane results from compute_semi_anti_join_factors.

There are more problems though.  If you change eqjoinsel() to look at
the jointype parameter, you get a cost estimate of around 4000, which
is because outer_match_frac gets set to 0.16667, which is better but
still not exactly good.  The reason for that is that eqjoinsel_semi
punts (due to lack of any stats for the generate_series RTE) and
returns 0.5, and likewise we get a default estimate of 0.33333 from
the inequality on the expensive_func result, so 0.16667 is the
combined jselec estimate.  So the default estimate from eqjoinsel_semi
is unrelated to the default estimate from eqjoinsel_inner, which is
unhelpful for what we're doing here, plus scalargtjoinsel didn't
adjust its behavior at all.  We have, in fact, not really built out
the semijoin estimation infrastructure anywhere except eqjoinsel and
neqjoinsel (though I see somebody made it work for networkjoinsel).
This is mostly tolerable for the purposes of real SEMI and ANTI joins,
because it's relatively hard/unusual for those to have join quals that
aren't equality quals.  But if you expect that infrastructure to give
you sane results for random other joins, you're going to be sorely
disappointed.
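For what it's worth, the "around 4000" figure seems to follow directly from those defaults (my reconstruction, assuming the default cpu_operator_cost of 0.0025):

```python
# Reconstruction of the intermediate estimate after making eqjoinsel()
# honour the jointype parameter (assumes default cpu_operator_cost = 0.0025).
eq_semi_default = 0.5                 # eqjoinsel_semi punts: no stats for the SRF
ineq_default = 0.33333333333333331    # default estimate for the > qual

outer_match_frac = eq_semi_default * ineq_default     # 0.16667
outer_matched_rows = round(1000 * outer_match_frac)   # 167 rows

per_call = 10000 * 0.0025                             # 25.0 cost units per call
print(outer_matched_rows * per_call)                  # 4175.0 -- "around 4000"
```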

Also worth noting here is that it's wrong in the first place to be
including the selectivity of the inequality qual in our calculation
of how many rows are going to be fed to the inequality qual :-(.
Again, the infrastructure involved isn't really broken for its
designed purpose, because with a normal semijoin there aren't any
filter quals to be considered separately; but that assumption breaks
down when you try to use that infrastructure for a regular join that
happens to have a unique inner side.
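The circularity is easy to see in miniature (hypothetical numbers in the spirit of this example):

```python
# Double-counting in miniature: every row that matches the hash clause
# reaches the join filter, but folding the filter's own selectivity into
# the match fraction makes the planner charge for far fewer evaluations.
rows_matching_eq = 1000            # rows that actually reach the filter
filter_sel = 0.33333333333333331   # the filter's own default selectivity

rows_charged = round(rows_matching_eq * filter_sel)   # evaluations costed
print(rows_matching_eq, rows_charged)   # 1000 real evaluations vs 333 charged
```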

So my conclusion here is that we probably ought to revert the changes
in compute_semi_anti_join_factors that made it not reject other join
types, and that unique_inner cases need some other cost estimation
mechanism that doesn't depend on pretending that a join is a SEMI
join when it isn't.

(Another, longer-term project is to rationalize the situation with
joinsel functions getting jointype parameters that are different
from sjinfo->jointype.  The cases where that can happen are pretty
random and underdocumented, and to the extent that there's any
guidance at all, it's the comment at the head of selfuncs.c that
says it's better to ignore jointype in favor of sjinfo->jointype.
So I'd not be very much on board with just changing eqjoinsel
even if that were enough to fix this completely --- we'd need to
take a very hard look at a bunch of cases to figure out what behavior
we really want the estimators to have.  In the end it might be best to
just refuse to allow those values to be different.)

                        regards, tom lane


Re: BUG #15383: Join Filter cost estimation problem in 10.5

David Rowley
On 20 September 2018 at 08:18, Tom Lane <[hidden email]> wrote:

> So after poking around for awhile, my conclusion is that the cost
> estimation aspects of the inner_unique patch are completely broken,
> and it's not going to be very easy to fix.
>
> The core issue here is that compute_semi_anti_join_factors was never
> designed to work for any joins that aren't really SEMI or ANTI joins,
> and just taking out the assert about that doesn't fix it.  In
> particular, passing jointype == JOIN_SEMI to clauselist_selectivity,
> when the underlying sjinfo has jointype == JOIN_INNER, isn't a supported
> combination.  If you look at eqjoinsel() you will notice that it pays
> no attention at all to the jointype parameter, only to sjinfo->jointype.
> Therefore what we get out of the first clauselist_selectivity is the
> same value as we get from the second one (ie, inner-join selectivity)
> leading to entirely insane results from compute_semi_anti_join_factors.

I looked at this again and I see that when we're building the join rel
we call calc_joinrel_size_estimate() to set ->rows. We do this based
on the actual join type, as at this stage we've yet to detect whether
the join has a unique inner side or not.  My original idea with this
code was that unique joins would be costed in the same way as semi
joins, which I had hoped would encourage the unique side to be put on
the inner side when the costs were roughly the same.  Now, in light of
this bug, and given that the join rel's row estimate is set based on
the original join type, I think it's better just to pull out the code
that attempts to cost unique joins differently.  I don't really see
changing the rows property of the joinrel as something we can actually
do, since any updated estimate would be bogus if the join order were
reversed, which is possible for JOIN_INNER with a unique inner, but
not possible for JOIN_SEMI.

I've attached a patch which is a partial revert of the costing code
that was changed in 9c7f5229ad6.

I'm a bit unsure how good an idea it is to put back the Assert in
compute_semi_anti_join_factors() in the back branches.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

revert_unique_join_costing.patch (11K)