IS NOT DISTINCT FROM statement

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

IS NOT DISTINCT FROM statement

Artur Zając

I have some query:

 

EXPLAIN ANALYZE select id from sometable where fkey IS NOT DISTINCT FROM 21580;

 

 

                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------

Gather  (cost=10.00..39465.11 rows=1 width=4) (actual time=0.512..129.625 rows=1 loops=1)

   Workers Planned: 4

   Workers Launched: 4

   ->  Parallel Seq Scan on sometable  (cost=0.00..39455.01 rows=1 width=4) (actual time=77.995..103.806 rows=0 loops=5)

         Filter: (NOT (fkey IS DISTINCT FROM 21580))

         Rows Removed by Filter: 675238

Planning time: 0.101 ms

Execution time: 148.517 ms

 

 

Other Query:

 

EXPLAIN ANALYZE select id from table where fkey=21580;

 

                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------

Index Scan using sometable_index1 on sometable  (cost=0.43..8.45 rows=1 width=4) (actual time=0.075..0.076 rows=1 loops=1)

   Index Cond: (fkey = 21580)

Planning time: 0.117 ms

Execution time: 0.101 ms

(4 rows)

 

There is unique index on sometable(fkey);

 

Is there any reason that „NOT DISTINCT FROM” can’t be autotransformed to „=” when value on right side of expression is not NULL or is this any way to use index with „IS NOT DISTINCT FROM” statement?

 

 

Artur Zajac

 

 

Reply | Threaded
Open this post in threaded view
|

Re: IS NOT DISTINCT FROM statement

Laurenz Albe
Artur Zając wrote:
> Is there any reason that „NOT DISTINCT FROM” can’t be autotransformed to „=” when value
> on right side of expression is not NULL or is this any way to use index with „IS NOT DISTINCT FROM” statement?

That would subtly change the semantics of the expression:

test=> SELECT NULL IS NOT DISTINCT FROM 21580;
 ?column?
----------
 f
(1 row)

test=> SELECT NULL = 21580;
 ?column?
----------
 
(1 row)

One expression is FALSE, the other NULL.

It doesn't matter in the context of your specific query, but it could matter.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com


Reply | Threaded
Open this post in threaded view
|

Re: IS NOT DISTINCT FROM statement

David Rowley-3
In reply to this post by Artur Zając
On Sat, 9 Mar 2019 at 00:30, Artur Zając <[hidden email]> wrote:
> Is there any reason that „NOT DISTINCT FROM” can’t be autotransformed to „=” when value on right side of expression is not NULL or is this any way to use index with „IS NOT DISTINCT FROM” statement?

Probably nothing other than nobody has done it yet.  It might be
reasonable to have some transformation stage called from
distribute_restrictinfo_to_rels() when adding single rel RestrictInfos
to RTE_RELATION base rels.  It's only these you can check for NOT NULL
constraints, i.e. not so possible with rtekinds such as RTE_FUNCTION
and the like.

It becomes more complex if you consider that someone might have added
a partial index on the relation that matches the IS NOT DISTINCT FROM
clause.  In this case, they might not be happy that their index can no
longer be used. Fixing that would require some careful surgery on
predicate_implied_by() to teach it about IS NOT DISTINCT FROM clauses.
However, that seems to go a step beyond what predicate_implied_by()
does for now. Currently, it only gets to know about quals. Not the
relations they belong to, so there'd be no way to know that the NOT
NULL constraint exists from there.  I'm not sure if there's a good
reason for this or not, it might be because it's not been required
before.  It gets more complex still if you want to consider other
quals in the list to prove not nullness.

In short, probably possible, but why not just write an equality
clause, if you know NULLs are not possible?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

RE: IS NOT DISTINCT FROM statement

Artur Zając

> In short, probably possible, but why not just write an equality clause, if you know NULLs are not possible?

In fact I construct query like this (usually in pl/pgsql).

SELECT column FROM table WHERE column1 IS NOT DISTINCT FROM $1 AND column2 IS NOT DISTINCT FROM $2;

"IS NOT DISTINCT FROM" statement simplifies the query ($1 OR $2 may be null, col1 and col2 has indexes).

I made some workaround. I made function:

CREATE OR REPLACE FUNCTION smarteq(v1 int,v2 INT) RETURNS BOOL AS
$BODY$
        SELECT (CASE WHEN v2 IS NULL THEN (v1 IS NULL) ELSE v1=v2 END);
$BODY$ LANGUAGE 'sql' IMMUTABLE PARALLEL SAFE;


And then

explain analyze select id from sometable where smarteq(id1,21580);
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using sometable_index1 on sometable  (cost=0.43..8.45 rows=1 width=4) (actual time=0.085..0.086 rows=1 loops=1)
   Index Cond: (id1 = 21580)
 Planning time: 0.223 ms
 Execution time: 0.117 ms
(4 rows)

explain analyze select id from sometable where smarteq(id1,NULL);
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on sometable  (cost=19338.59..57045.02 rows=882343 width=4) (actual time=116.236..306.304 rows=881657 loops=1)
   Recheck Cond: (id1 IS NULL)
   Heap Blocks: exact=9581
   ->  Bitmap Index Scan on sometable_index1  (cost=0.00..19118.00 rows=882343 width=0) (actual time=114.209..114.209 rows=892552 loops=1)
         Index Cond: (id1 IS NULL)
 Planning time: 0.135 ms
 Execution time: 339.229 ms

It looks like it works, but I must check if it will still works in plpgsql (I expect some problems if query is prepared).

Artur Zajac



Reply | Threaded
Open this post in threaded view
|

Re: IS NOT DISTINCT FROM statement

David Rowley-3
On Sat, 9 Mar 2019 at 01:25, Artur Zając <[hidden email]> wrote:
> I made some workaround. I made function:
>
> CREATE OR REPLACE FUNCTION smarteq(v1 int,v2 INT) RETURNS BOOL AS
> $BODY$
>         SELECT (CASE WHEN v2 IS NULL THEN (v1 IS NULL) ELSE v1=v2 END);
> $BODY$ LANGUAGE 'sql' IMMUTABLE PARALLEL SAFE;

> explain analyze select id from sometable where smarteq(id1,NULL);
>                                                                   QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on sometable  (cost=19338.59..57045.02 rows=882343 width=4) (actual time=116.236..306.304 rows=881657 loops=1)
>    Recheck Cond: (id1 IS NULL)
>    Heap Blocks: exact=9581
>    ->  Bitmap Index Scan on sometable_index1  (cost=0.00..19118.00 rows=882343 width=0) (actual time=114.209..114.209 rows=892552 loops=1)
>          Index Cond: (id1 IS NULL)
>  Planning time: 0.135 ms
>  Execution time: 339.229 ms
>
> It looks like it works, but I must check if it will still works in plpgsql (I expect some problems if query is prepared).

I think with either that you'll just be at the mercy of whether a
generic or custom plan is chosen.  If you get a custom plan then
likely your case statement will be inlined and constant folded away,
but for a generic plan, that can't happen since those constants are
not consts, they're parameters.   Most likely, if you've got an index
on the column you'll perhaps always get a custom plan as the generic
plan would result in a seqscan and it would have to evaluate your case
statement for each row.  By default, generic plans are only considered
on the 6th query execution and are only chosen if the generic cost is
cheaper than the average custom plan cost + fuzz cost for planning.
PG12 gives you a bit more control over that with the plan_cache_mode
GUC, but... that's the not out yet.

However, possibly the cost of planning each execution is cheaper than
doing the seq scan, so you might be better off with this.  There is a
risk that the planner does for some reason choose a generic plan and
ends up doing the seq scan, but for that to happen likely the table
would have to be small, in which case it wouldn't matter or the costs
would have to be off, which might cause you some pain.

The transformation mentioned earlier could only work if the arguments
of the IS NOT DISTINCT FROM were Vars or Consts. It couldn't work with
Params since the values are unknown to the planner.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Reply | Threaded
Open this post in threaded view
|

Re: IS NOT DISTINCT FROM statement

Tom Lane-2
David Rowley <[hidden email]> writes:
> On Sat, 9 Mar 2019 at 01:25, Artur Zając <[hidden email]> wrote:
>> CREATE OR REPLACE FUNCTION smarteq(v1 int,v2 INT) RETURNS BOOL AS
>> $BODY$
>> SELECT (CASE WHEN v2 IS NULL THEN (v1 IS NULL) ELSE v1=v2 END);
>> $BODY$ LANGUAGE 'sql' IMMUTABLE PARALLEL SAFE;

> The transformation mentioned earlier could only work if the arguments
> of the IS NOT DISTINCT FROM were Vars or Consts. It couldn't work with
> Params since the values are unknown to the planner.

Just looking at this example, I'm wondering if there'd be any value in
adding a rule to eval_const_expressions that converts IS DISTINCT FROM
with one constant-NULL argument into an IS NOT NULL test on the other
argument.  Doing anything with the general case would be hard, as you
mentioned, but this "workaround" suggests that the OP isn't actually
concerned with the general case.

[ experiments... ] Oh, look at this:

regression=# explain verbose select f1 is distinct from null from int4_tbl;
                          QUERY PLAN                          
---------------------------------------------------------------
 Seq Scan on public.int4_tbl  (cost=0.00..1.05 rows=5 width=1)
   Output: (f1 IS NOT NULL)
(2 rows)

regression=# explain verbose select f1 is not distinct from null from int4_tbl;
                          QUERY PLAN                          
---------------------------------------------------------------
 Seq Scan on public.int4_tbl  (cost=0.00..1.05 rows=5 width=1)
   Output: (f1 IS NULL)
(2 rows)

So somebody already inserted this optimization, but I don't see it
happening in eval_const_expressions ... oh, it's way earlier,
in transformAExprDistinct:

    /*
     * If either input is an undecorated NULL literal, transform to a NullTest
     * on the other input. That's simpler to process than a full DistinctExpr,
     * and it avoids needing to require that the datatype have an = operator.
     */
    if (exprIsNullConstant(rexpr))
        return make_nulltest_from_distinct(pstate, a, lexpr);
    if (exprIsNullConstant(lexpr))
        return make_nulltest_from_distinct(pstate, a, rexpr);

I'm hesitant to call that wrong; the ability to avoid a dependency on an
"=" operator is kind of nice.  But it doesn't help for cases requiring a
Param substitution.

So maybe if we *also* had a check for this in eval_const_expressions,
that would address the OP's problem.  But the use-case would be a bit
narrow given that the parser is catching the simplest case.

                        regards, tom lane