BUG #16356: Can't perform json operations on casted text in where clause when a join is used. WHERE is too early

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

BUG #16356: Can't perform json operations on casted text in where clause when a join is used. WHERE is too early

apt.postgresql.org Repository Update
The following bug has been logged on the website:

Bug reference:      16356
Logged by:          Vale Violet Mote
Email address:      [hidden email]
PostgreSQL version: 12.2
Operating system:   Win10
Description:        

Live question: https://stackoverflow.com/questions/61147921
Paste of text:

Postgres (12.2) Setup:

```sql
CREATE TABLE public.test_table (
        id int NOT NULL,
        value_type text NOT NULL,
        value text NOT NULL
);
```
```sql
INSERT INTO public.test_table
(id, value_type, value)
VALUES (1, 'string', 'a'),
(2, 'json', '{"hello":"world"}'),
(3, 'json', '{"color":"blue"}');
```
Initial Queries:

```sql
select value::jsonb as json_value from test_table where value_type =
'json'
```

```
json_value        |
------------------|
{"hello": "world"}|
{"color": "blue"} |
```

But I'm only interested in ones with 'color'.

Moving it to a subquery so that I can get only 'color', also just fine:

```sql
select only_json.json_value
from(
        select value::jsonb as json_value from test_table where value_type =
'json'
) only_json
where only_json.json_value ? 'color' = true
```
```
json_value        |
------------------|
{"color": "blue"} |
```

Now let's break that main table up into two, and suddenly effectively the
same query has trouble:

```sql

CREATE TABLE public.test_table (
    id INT PRIMARY KEY,
    value TEXT NOT NULL
);
CREATE TABLE public.test_types (
    id INT PRIMARY KEY REFERENCES public.test_table (id),
    value_type TEXT NOT NULL
);

INSERT INTO public.test_table
(id, value)
VALUES (1, 'a'),
(2, '{"hello":"world"}'),
(3, '{"color":"blue"}');

insert into public.test_types
(id, value_type)
values (1, 'string'),
(2, 'json'),
(3, 'json');
```

Now this query:
```sql
select id, value from (
select id, value::jsonb from public.test_table natural join
public.test_types
where value_type = 'json') only_json
```

returns, as expected:
```
id|value             |
--|------------------|
 2|{"hello": "world"}|
 3|{"color": "blue"} |
```

But as soon as I attach the where clause, it fails:

```sql
select id, value from (
select id, value::jsonb from public.test_table natural join
public.test_types
where value_type = 'json') only_json
where only_json.value ? 'color' = true
```

```
SQL Error [22P02]: ERROR: invalid input syntax for type json
  Detail: Token "a" is invalid.
  Where: JSON data, line 1: a
```

It's somehow resurrected the value of 'a' that was well-eliminated prior to
this where clause. So what gives? Why does the join cause it to apply the
last where clause (which should happen logically last) too early?  Failed
workarounds I've tried:

 - Using left join instead of natural join.
 - Applying `where value_type = 'json'` to the joined table first, prior to
the join.
 - Moving it to a "with".
 - Creating a view and then applying the where clause to a select from the
view.
 - Creating a column via select called `is_color_holder` with `SELECT
only_json.value ? 'color' as is_color_holder`. This column populates
correctly, but if I use a where clause, `WHERE is_color_holder = true`, I
receive the same error.

Reply | Threaded
Open this post in threaded view
|

Re: BUG #16356: Can't perform json operations on casted text in where clause when a join is used. WHERE is too early

Tom Lane-2
PG Bug reporting form <[hidden email]> writes:
> It's somehow resurrected the value of 'a' that was well-eliminated prior to
> this where clause. So what gives? Why does the join cause it to apply the
> last where clause (which should happen logically last) too early?

The documentation explicitly disclaims any specific evaluation order for
WHERE clauses, cf

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-EXPRESS-EVAL

If you really need to, you can force the matter by putting some type of
optimization fence into the sub-select ("OFFSET 0" is the traditional
way, or you can use WITH ... AS MATERIALIZED in v12 and up).  This is
typically disastrous for performance, of course.  Not being able to
filter rows before joining would make the join much slower.

I'll just note that the sort of EAV schema that you have here is widely
agreed to be an anti-pattern in database design.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: BUG #16356: Can't perform json operations on casted text in where clause when a join is used. WHERE is too early

David G Johnston
In reply to this post by apt.postgresql.org Repository Update

On Friday, April 10, 2020, PG Bug reporting form <[hidden email]> wrote:
The following bug has been logged on the website:

Bug reference:      16356
Logged by:          Vale Violet Mote
Email address:      [hidden email]
PostgreSQL version: 12.2
Operating system:   Win10
Description:       

Live question: https://stackoverflow.com/questions/61147921
Paste of text:

But as soon as I attach the where clause, it fails:

```sql
select id, value from (
select id, value::jsonb from public.test_table natural join
public.test_types
where value_type = 'json') only_json
where only_json.value ? 'color' = true
```

```
SQL Error [22P02]: ERROR: invalid input syntax for type json
  Detail: Token "a" is invalid.
  Where: JSON data, line 1: a
```

It's somehow resurrected the value of 'a' that was well-eliminated prior to
this where clause.  So what gives? Why does the join cause it to apply the
last where clause (which should happen logically last) too early?


PostgreSQL is trying to be helpful by re-arranging things to execute in the most efficient way possible.  It cannot adapt those optimizations on the fly in response to data.  Since your model has issues that make this optimization fail you need to make it so the optimization cannot be applied.  Adding “offset 0” to the subquery should accomplish this.

 - Moving it to a "with".

It would have in prior versions.  I think 12 removed the optimization fence that used to be in place here.

The best solution is not to have column content formats vary, though I get that is not always possible.

David J.