BUG #16251: ::text type casting of a constant breaks query performance

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

BUG #16251: ::text type casting of a constant breaks query performance

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      16251
Logged by:          Dima Pavlov
Email address:      [hidden email]
PostgreSQL version: 12.1
Operating system:   Windows 10
Description:        

Test table and indexes (PostgreSQL 12.1):

CREATE TABLE t (dt timestamp with time zone);
CREATE INDEX ind ON t USING btree (dt);

INSERT
INTO t(dt)
    SELECT
    (
        timestamp '2020-01-01 00:00:00' +
        random() * (
            timestamp '2020-02-29 00:00:00' -
            timestamp '2020-01-01 00:00:00'
        )
    )
    FROM generate_series(1, 10000)

-------------------------------------

In the first query, everything is ok, appropriate index "ind" is used:

explain (analyze, buffers)
SELECT *
FROM t
WHERE
   ('2020-02-08')::date IS NULL
   OR
   dt > '2020-02-08'
ORDER BY dt
LIMIT 1

"Limit  (cost=0.29..0.37 rows=1 width=8) (actual time=0.186..0.188 rows=1
loops=1)"
"  Buffers: shared hit=3"
"  ->  Index Only Scan using ind on t  (cost=0.29..303.75 rows=3627 width=8)
(actual time=0.184..0.184 rows=1 loops=1)"
"        Index Cond: (dt > '2020-02-08 00:00:00+05'::timestamp with time
zone)"
"        Heap Fetches: 1"
"        Buffers: shared hit=3"
"Planning Time: 2.365 ms"
"Execution Time: 0.239 ms"

-----------------------------------------------

With '::text' type casting of '2020-02-08' (which is already text) query
permofance is very low

explain (analyze, buffers)
SELECT *
FROM t
WHERE
   ('2020-02-08'::text)::date IS NULL
   OR
   dt > '2020-02-08'
ORDER BY dt
LIMIT 1

"Limit  (cost=0.29..0.44 rows=1 width=8) (actual time=45.306..45.307 rows=1
loops=1)"
"  Buffers: shared hit=6232"
"  ->  Index Only Scan using ind on t  (cost=0.29..561.28 rows=3658 width=8)
(actual time=45.304..45.304 rows=1 loops=1)"
"        Filter: ((('2020-02-08'::cstring)::date IS NULL) OR (dt >
'2020-02-08 00:00:00+05'::timestamp with time zone))"
"        Rows Removed by Filter: 6367"
"        Heap Fetches: 6368"
"        Buffers: shared hit=6232"
"Planning Time: 0.348 ms"
"Execution Time: 45.343 ms"

Reply | Threaded
Open this post in threaded view
|

Re: BUG #16251: ::text type casting of a constant breaks query performance

Pavel Stehule


so 8. 2. 2020 v 7:44 odesílatel PG Bug reporting form <[hidden email]> napsal:
The following bug has been logged on the website:

Bug reference:      16251
Logged by:          Dima Pavlov
Email address:      [hidden email]
PostgreSQL version: 12.1
Operating system:   Windows 10
Description:       

Test table and indexes (PostgreSQL 12.1):

CREATE TABLE t (dt timestamp with time zone);
CREATE INDEX ind ON t USING btree (dt);

INSERT
INTO t(dt)
    SELECT
    (
        timestamp '2020-01-01 00:00:00' +
        random() * (
            timestamp '2020-02-29 00:00:00' -
            timestamp '2020-01-01 00:00:00'
        )
    )
    FROM generate_series(1, 10000)

-------------------------------------

In the first query, everything is ok, appropriate index "ind" is used:

explain (analyze, buffers)
SELECT *
FROM t
WHERE
   ('2020-02-08')::date IS NULL
   OR
   dt > '2020-02-08'
ORDER BY dt
LIMIT 1

"Limit  (cost=0.29..0.37 rows=1 width=8) (actual time=0.186..0.188 rows=1
loops=1)"
"  Buffers: shared hit=3"
"  ->  Index Only Scan using ind on t  (cost=0.29..303.75 rows=3627 width=8)
(actual time=0.184..0.184 rows=1 loops=1)"
"        Index Cond: (dt > '2020-02-08 00:00:00+05'::timestamp with time
zone)"
"        Heap Fetches: 1"
"        Buffers: shared hit=3"
"Planning Time: 2.365 ms"
"Execution Time: 0.239 ms"

-----------------------------------------------

With '::text' type casting of '2020-02-08' (which is already text) query
permofance is very low

explain (analyze, buffers)
SELECT *
FROM t
WHERE
   ('2020-02-08'::text)::date IS NULL
   OR
   dt > '2020-02-08'
ORDER BY dt
LIMIT 1

"Limit  (cost=0.29..0.44 rows=1 width=8) (actual time=45.306..45.307 rows=1
loops=1)"
"  Buffers: shared hit=6232"
"  ->  Index Only Scan using ind on t  (cost=0.29..561.28 rows=3658 width=8)
(actual time=45.304..45.304 rows=1 loops=1)"
"        Filter: ((('2020-02-08'::cstring)::date IS NULL) OR (dt >
'2020-02-08 00:00:00+05'::timestamp with time zone))"
"        Rows Removed by Filter: 6367"
"        Heap Fetches: 6368"
"        Buffers: shared hit=6232"
"Planning Time: 0.348 ms"
"Execution Time: 45.343 ms"

This is not a bug, but just feature.

The '2020-02-08' is not text type - it is 'unknown' type - and then is just directly transformed to date. I think so cast from text to date is not maybe immutable, and it can stops some optimizations.


postgres=# explain analyze select * from foo where '2020-02-01'::date is null or a > '2020-01-01';
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                      QUERY PLAN                                                      │
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Index Only Scan using foo_a_idx on foo  (cost=0.42..9.12 rows=40 width=8) (actual time=0.083..0.125 rows=37 loops=1) │
│   Index Cond: (a > '2020-01-01 00:00:00+01'::timestamp with time zone)                                               │
│   Heap Fetches: 37                                                                                                   │
│ Planning Time: 0.279 ms                                                                                              │
│ Execution Time: 0.189 ms                                                                                             │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(5 rows)

postgres=# explain analyze select * from foo where '2020-02-01'::text::date is null or a > '2020-01-01';
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                   QUERY PLAN                                                    │
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Seq Scan on foo  (cost=0.00..2193.00 rows=540 width=8) (actual time=0.050..71.671 rows=37 loops=1)              │
│   Filter: ((('2020-02-01'::cstring)::date IS NULL) OR (a > '2020-01-01 00:00:00+01'::timestamp with time zone)) │
│   Rows Removed by Filter: 99963                                                                                 │
│ Planning Time: 0.299 ms                                                                                         │
│ Execution Time: 71.714 ms                                                                                       │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(5 rows)


The problem is in casting from Date to Timestamptz - the related function "timestamptz" is marked as "stable" - and then probably the all expression is marked as "stable", what block to use this expression as index condition.

I don't know the context, but the expression "'2020-02-01'::date is null or a > '2020-01-01'" has not too much sense, so just don't do this. Postgres is not too smart and don't try to reduce some useless part of expressions.

Or if you need it, then use UNION and separate this expression to two independent expressions

postgres=# explain analyze select * from foo where '2020-02-01'::text::date is null union select * from foo where  a > '2020-01-01';
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                               QUERY PLAN                                                               │
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ HashAggregate  (cost=3202.82..4203.22 rows=100040 width=8) (actual time=0.175..1.635 rows=37 loops=1)                                  │
│   Group Key: foo.a                                                                                                                     │
│   ->  Append  (cost=0.01..2952.72 rows=100040 width=8) (actual time=0.061..0.120 rows=37 loops=1)                                      │
│         ->  Result  (cost=0.01..1443.01 rows=100000 width=8) (actual time=0.016..0.017 rows=0 loops=1)                                 │
│               One-Time Filter: (('2020-02-01'::cstring)::date IS NULL)                                                                 │
│               ->  Seq Scan on foo  (cost=0.01..1443.01 rows=100000 width=8) (never executed)                                           │
│         ->  Index Only Scan using foo_a_idx on foo foo_1  (cost=0.42..9.12 rows=40 width=8) (actual time=0.043..0.090 rows=37 loops=1) │
│               Index Cond: (a > '2020-01-01 00:00:00+01'::timestamp with time zone)                                                     │
│               Heap Fetches: 37                                                                                                         │
│ Planning Time: 0.437 ms                                                                                                                │
│ Execution Time: 6.690 ms                                                                                                               │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)



Regards

Pavel
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16251: ::text type casting of a constant breaks query performance

Pavel Stehule


so 8. 2. 2020 v 8:09 odesílatel Pavel Stehule <[hidden email]> napsal:


so 8. 2. 2020 v 7:44 odesílatel PG Bug reporting form <[hidden email]> napsal:
The following bug has been logged on the website:

Bug reference:      16251
Logged by:          Dima Pavlov
Email address:      [hidden email]
PostgreSQL version: 12.1
Operating system:   Windows 10
Description:       

Test table and indexes (PostgreSQL 12.1):

CREATE TABLE t (dt timestamp with time zone);
CREATE INDEX ind ON t USING btree (dt);

INSERT
INTO t(dt)
    SELECT
    (
        timestamp '2020-01-01 00:00:00' +
        random() * (
            timestamp '2020-02-29 00:00:00' -
            timestamp '2020-01-01 00:00:00'
        )
    )
    FROM generate_series(1, 10000)

-------------------------------------

In the first query, everything is ok, appropriate index "ind" is used:

explain (analyze, buffers)
SELECT *
FROM t
WHERE
   ('2020-02-08')::date IS NULL
   OR
   dt > '2020-02-08'
ORDER BY dt
LIMIT 1

"Limit  (cost=0.29..0.37 rows=1 width=8) (actual time=0.186..0.188 rows=1
loops=1)"
"  Buffers: shared hit=3"
"  ->  Index Only Scan using ind on t  (cost=0.29..303.75 rows=3627 width=8)
(actual time=0.184..0.184 rows=1 loops=1)"
"        Index Cond: (dt > '2020-02-08 00:00:00+05'::timestamp with time
zone)"
"        Heap Fetches: 1"
"        Buffers: shared hit=3"
"Planning Time: 2.365 ms"
"Execution Time: 0.239 ms"

-----------------------------------------------

With '::text' type casting of '2020-02-08' (which is already text) query
permofance is very low

explain (analyze, buffers)
SELECT *
FROM t
WHERE
   ('2020-02-08'::text)::date IS NULL
   OR
   dt > '2020-02-08'
ORDER BY dt
LIMIT 1

"Limit  (cost=0.29..0.44 rows=1 width=8) (actual time=45.306..45.307 rows=1
loops=1)"
"  Buffers: shared hit=6232"
"  ->  Index Only Scan using ind on t  (cost=0.29..561.28 rows=3658 width=8)
(actual time=45.304..45.304 rows=1 loops=1)"
"        Filter: ((('2020-02-08'::cstring)::date IS NULL) OR (dt >
'2020-02-08 00:00:00+05'::timestamp with time zone))"
"        Rows Removed by Filter: 6367"
"        Heap Fetches: 6368"
"        Buffers: shared hit=6232"
"Planning Time: 0.348 ms"
"Execution Time: 45.343 ms"

This is not a bug, but just feature.

The '2020-02-08' is not text type - it is 'unknown' type - and then is just directly transformed to date. I think so cast from text to date is not maybe immutable, and it can stops some optimizations.


postgres=# explain analyze select * from foo where '2020-02-01'::date is null or a > '2020-01-01';
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                      QUERY PLAN                                                      │
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Index Only Scan using foo_a_idx on foo  (cost=0.42..9.12 rows=40 width=8) (actual time=0.083..0.125 rows=37 loops=1) │
│   Index Cond: (a > '2020-01-01 00:00:00+01'::timestamp with time zone)                                               │
│   Heap Fetches: 37                                                                                                   │
│ Planning Time: 0.279 ms                                                                                              │
│ Execution Time: 0.189 ms                                                                                             │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(5 rows)

postgres=# explain analyze select * from foo where '2020-02-01'::text::date is null or a > '2020-01-01';
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                   QUERY PLAN                                                    │
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Seq Scan on foo  (cost=0.00..2193.00 rows=540 width=8) (actual time=0.050..71.671 rows=37 loops=1)              │
│   Filter: ((('2020-02-01'::cstring)::date IS NULL) OR (a > '2020-01-01 00:00:00+01'::timestamp with time zone)) │
│   Rows Removed by Filter: 99963                                                                                 │
│ Planning Time: 0.299 ms                                                                                         │
│ Execution Time: 71.714 ms                                                                                       │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(5 rows)


The problem is in casting from Date to Timestamptz - the related function "timestamptz" is marked as "stable" - and then probably the all expression is marked as "stable", what block to use this expression as index condition.

I was wrong - problem is in probably cast from text to date.


I don't know the context, but the expression "'2020-02-01'::date is null or a > '2020-01-01'" has not too much sense, so just don't do this. Postgres is not too smart and don't try to reduce some useless part of expressions.

More times Postgres is more sensitive on data types due special rules of casting.


Or if you need it, then use UNION and separate this expression to two independent expressions

postgres=# explain analyze select * from foo where '2020-02-01'::text::date is null union select * from foo where  a > '2020-01-01';
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                               QUERY PLAN                                                               │
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ HashAggregate  (cost=3202.82..4203.22 rows=100040 width=8) (actual time=0.175..1.635 rows=37 loops=1)                                  │
│   Group Key: foo.a                                                                                                                     │
│   ->  Append  (cost=0.01..2952.72 rows=100040 width=8) (actual time=0.061..0.120 rows=37 loops=1)                                      │
│         ->  Result  (cost=0.01..1443.01 rows=100000 width=8) (actual time=0.016..0.017 rows=0 loops=1)                                 │
│               One-Time Filter: (('2020-02-01'::cstring)::date IS NULL)                                                                 │
│               ->  Seq Scan on foo  (cost=0.01..1443.01 rows=100000 width=8) (never executed)                                           │
│         ->  Index Only Scan using foo_a_idx on foo foo_1  (cost=0.42..9.12 rows=40 width=8) (actual time=0.043..0.090 rows=37 loops=1) │
│               Index Cond: (a > '2020-01-01 00:00:00+01'::timestamp with time zone)                                                     │
│               Heap Fetches: 37                                                                                                         │
│ Planning Time: 0.437 ms                                                                                                                │
│ Execution Time: 6.690 ms                                                                                                               │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)



Regards

Pavel
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16251: ::text type casting of a constant breaks query performance

Tom Lane-2
In reply to this post by Pavel Stehule
Pavel Stehule <[hidden email]> writes:
> so 8. 2. 2020 v 7:44 odesílatel PG Bug reporting form <
> [hidden email]> napsal:
>> With '::text' type casting of '2020-02-08' (which is already text) query
>> permofance is very low

> The '2020-02-08' is not text type - it is 'unknown' type - and then is just
> directly transformed to date. I think so cast from text to date is not
> maybe immutable, and it can stops some optimizations.

Yeah.  There actually isn't any cast from text to date, if you look into
pg_cast.  So "('2020-02-08'::text)::date" is implemented as a text Const
that's fed through a CoerceViaIO node that applies date_in(), and
date_in() is only stable not immutable.  (That must be so because its
behavior depends on the DateStyle setting, and maybe TimeZone too; not
sure about the latter but definitely the former.)  So the planner is
unable to reduce the IS NULL test to constant-false and thereby get
rid of the OR, and that means it can't usefully apply the index.

If you can't rearrange things so that the IS NULL argument is seen
as a constant, the UNION trick that Pavel mentioned might be a useful
workaround.  But I'm inclined to think that you need to take two steps
back and figure out whether this query logic is really sane or not.
You do realize that the query is asking to retrieve the entire table,
if whatever-it-is is NULL?  Why would that be what you want?

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: BUG #16251: ::text type casting of a constant breaks query performance

Pavel Stehule


so 8. 2. 2020 v 17:49 odesílatel Tom Lane <[hidden email]> napsal:
Pavel Stehule <[hidden email]> writes:
> so 8. 2. 2020 v 7:44 odesílatel PG Bug reporting form <
> [hidden email]> napsal:
>> With '::text' type casting of '2020-02-08' (which is already text) query
>> permofance is very low

> The '2020-02-08' is not text type - it is 'unknown' type - and then is just
> directly transformed to date. I think so cast from text to date is not
> maybe immutable, and it can stops some optimizations.

Yeah.  There actually isn't any cast from text to date, if you look into
pg_cast.  So "('2020-02-08'::text)::date" is implemented as a text Const
that's fed through a CoerceViaIO node that applies date_in(), and
date_in() is only stable not immutable.  (That must be so because its
behavior depends on the DateStyle setting, and maybe TimeZone too; not
sure about the latter but definitely the former.)  So the planner is
unable to reduce the IS NULL test to constant-false and thereby get
rid of the OR, and that means it can't usefully apply the index.

If you can't rearrange things so that the IS NULL argument is seen
as a constant, the UNION trick that Pavel mentioned might be a useful
workaround.  But I'm inclined to think that you need to take two steps
back and figure out whether this query logic is really sane or not.
You do realize that the query is asking to retrieve the entire table,
if whatever-it-is is NULL?  Why would that be what you want?

If I remember well, this technique was a trick to use one query for variables that can be (or should not be) specified by user.

I can has a variable $ID. If user specifies this variable, it has some number, else it has NULL.

When you want to use one query for both possibilities (static query), then you can write

SELECT * FROM tab WHERE ($ID is NULL OR id = $ID)

We used this technique 20 years ago, and I think it was very popular, but databases was significantly smaller, and only few people had good knowledge of SQL databases.



                        regards, tom lane
Reply | Threaded
Open this post in threaded view
|

Re: BUG #16251: ::text type casting of a constant breaks query performance

Jeff Janes
In reply to this post by Tom Lane-2
On Sat, Feb 8, 2020 at 11:49 AM Tom Lane <[hidden email]> wrote:
Pavel Stehule <[hidden email]> writes:
> so 8. 2. 2020 v 7:44 odesílatel PG Bug reporting form <
> [hidden email]> napsal:
 
If you can't rearrange things so that the IS NULL argument is seen
as a constant, the UNION trick that Pavel mentioned might be a useful
workaround.  But I'm inclined to think that you need to take two steps
back and figure out whether this query logic is really sane or not.
You do realize that the query is asking to retrieve the entire table,
if whatever-it-is is NULL?  Why would that be what you want?

It is a pretty common tactic to do this.  It much easier on the client side to bind NULL to a parameter when you don't care, rather than dynamically rewrite the query text to remove that condition from it.  Of course that whole thing is likely to be ANDed together with other clauses in an unsimplified real-world example.

Cheers,

Jeff