Postgres Full Text Search Jsonb Array column does not search for first row

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Postgres Full Text Search Jsonb Array column does not search for first row

Dmytro Zhluktenko

Hey,

Pgsql is unable to perform indexed full text search onto jsonb column containing an array when looking for the first row in the table.

Any ideas why this is happening?

Table is nothing more than just Id and Foo which is jsonb column.

The case is that I have an empty database with predefined gin index repro_fts_idx on make_tsvector function. make_tsvector creates tsvector from given jsonb column.

When I add a new item into the table, I expect it to appear in make_tsvector function in a form of tsvector. It's there. Also, I expect that if I run full text search query onto it, it would appear in search results. However, this is not the case because it returns empty specifically for the first row. It simply does not take it into account. If I add one more row which is completely the same, the system is able to find it with the same query.

here is a small repro case:

 
-- drop table cp."Repro" cascade
 
CREATE TABLE cp."Repro" (
    "Id" serial NOT NULL,
    "Foo" jsonb NULL
);
 
CREATE OR REPLACE FUNCTION cp.make_tsvector(in_t cp."Repro")
 RETURNS tsvector
 LANGUAGE plpgsql
 IMMUTABLE
AS $function$ begin    
                        return to_tsvector(jsonb_agg(x.prop))
                        from (SELECT CONCAT( jsonb_array_elements(in_t."Foo") ->> 'Name', ' ', jsonb_array_elements(in_t."Foo") ->> 'Address' ) as prop from cp."Repro" f) as x;
                        END;
                        $function$
;
 
 
CREATE INDEX repro_fts_idx ON cp."Repro" USING gin (cp.make_tsvector(cp."Repro".*)) WITH (fastupdate=off, gin_pending_list_limit='64');
 
 
INSERT INTO cp."Repro"
("Foo")
VALUES('[{"Name": "Sup", "Address": "Adress", "IsCurrent": true}]');
 
-- just in case it's the indexing issue
-- REINDEX INDEX cp.repro_fts_idx;
 
select * from cp."Repro"
 
select cp.make_tsvector(x) from cp."Repro" x
 
select * from ts_stat('select cp.make_tsvector(x) from cp."Repro" x')
 
-- explain analyze
SELECT *
 FROM "cp"."Repro" x where cp.make_tsvector(x) @@ 'sup:*'::tsquery
 
 
 INSERT INTO cp."Repro"
("Foo")
VALUES('[{"Name": "Sup", "Address": "Adress", "IsCurrent": true}]');
 
 
-- explain analyze
SELECT *
 FROM "cp"."Repro" x where cp.make_tsvector(x) @@ 'sup:*'::tsquery

 

 

BR, Dmytro.

 

Reply | Threaded
Open this post in threaded view
|

Re: Postgres Full Text Search Jsonb Array column does not search for first row

Laurenz Albe
On Tue, 2019-11-26 at 13:37 +0200, Dmytro Zhluktenko wrote:

> Pgsql is unable to perform indexed full text search onto jsonb column containing an array when looking for the first row in the table.
>
> Any ideas why this is happening?
>  
> CREATE OR REPLACE FUNCTION cp.make_tsvector(in_t cp."Repro")
>  RETURNS tsvector
>  LANGUAGE plpgsql
>  IMMUTABLE
>
> [...]
>  
> CREATE INDEX repro_fts_idx ON cp."Repro" USING gin (cp.make_tsvector(cp."Repro".*)) WITH (fastupdate=off, gin_pending_list_limit='64');
>  
> [...]
>
> -- explain analyze
> SELECT *
>  FROM "cp"."Repro" x where cp.make_tsvector(x) @@ 'sup:*'::tsquery

One possibility is that there ar just too few rows in the table.

SET enable_seqscan = off;

and then try again.

If that is not the problm, please provide EXPLAIN (ANALYZE, BUFFERS) output for
the query.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



Reply | Threaded
Open this post in threaded view
|

RE: Re: Postgres Full Text Search Jsonb Array column does not search for first row

Dmytro Zhluktenko
In reply to this post by Dmytro Zhluktenko

Hello, thanks for helping!


explain (analyze, BUFFERS)

SELECT *

FROM "cp"."Repro" x where cp.make_tsvector(x) @@ 'fir:*'::tsquery



outputs this query plan:

Bitmap Heap Scan on "Repro" x  (cost=12.00..16.26 rows=1 width=72) (actual time=0.007..0.007 rows=0 loops=1)

  Recheck Cond: (cp.make_tsvector(x.*) @@ '''fir'':*'::tsquery)

  Buffers: shared hit=2

  ->  Bitmap Index Scan on repro_fts_idx  (cost=0.00..12.00 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)

        Index Cond: (cp.make_tsvector(x.*) @@ '''fir'':*'::tsquery)

        Buffers: shared hit=2

Planning Time: 0.070 ms

Execution Time: 0.040 ms

 

 

Query runs fine if uses seq scan. Seq Scan is not desired here since this is the query that should run on huge amounts of data and it should find the first element.

Obviously, if seq_scan is off, then query still does the same result.


Also, if you add 100000 more entries, it will still fail to find the first one using index.

 

BR, Dmytro.

 

From: [hidden email]
Sent: 26 листопада 2019 р. 21:13
To: [hidden email]; [hidden email]
Subject: Re: Postgres Full Text Search Jsonb Array column does not search for first row

 

On Tue, 2019-11-26 at 13:37 +0200, Dmytro Zhluktenko wrote:

> Pgsql is unable to perform indexed full text search onto jsonb column containing an array when looking for the first row in the table.

>

> Any ideas why this is happening?

> CREATE OR REPLACE FUNCTION cp.make_tsvector(in_t cp."Repro")

>  RETURNS tsvector

>  LANGUAGE plpgsql

>  IMMUTABLE

>

> [...]

> CREATE INDEX repro_fts_idx ON cp."Repro" USING gin (cp.make_tsvector(cp."Repro".*)) WITH (fastupdate=off, gin_pending_list_limit='64');

> [...]

> 

> -- explain analyze

> SELECT *

>  FROM "cp"."Repro" x where cp.make_tsvector(x) @@ 'sup:*'::tsquery

 

One possibility is that there ar just too few rows in the table.

 

SET enable_seqscan = off;

 

and then try again.

 

If that is not the problm, please provide EXPLAIN (ANALYZE, BUFFERS) output for

the query.

 

Yours,

Laurenz Albe

--

Cybertec | https://www.cybertec-postgresql.com

 

 

Reply | Threaded
Open this post in threaded view
|

Re: Re: Postgres Full Text Search Jsonb Array column does not search for first row

Laurenz Albe
On Wed, 2019-11-27 at 11:54 +0200, Dmytro Zhluktenko wrote:

> explain (analyze, BUFFERS)
> SELECT *
> FROM "cp"."Repro" x where cp.make_tsvector(x) @@ 'fir:*'::tsquery
>
> outputs this query plan:
> Bitmap Heap Scan on "Repro" x  (cost=12.00..16.26 rows=1 width=72) (actual time=0.007..0.007 rows=0 loops=1)
>   Recheck Cond: (cp.make_tsvector(x.*) @@ '''fir'':*'::tsquery)
>   Buffers: shared hit=2
>   ->  Bitmap Index Scan on repro_fts_idx  (cost=0.00..12.00 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
>         Index Cond: (cp.make_tsvector(x.*) @@ '''fir'':*'::tsquery)
>         Buffers: shared hit=2
> Planning Time: 0.070 ms
> Execution Time: 0.040 ms
>
> Query runs fine if uses seq scan. Seq Scan is not desired here since this is the query that should run on huge amounts of data and it should find the first element.
> Obviously, if seq_scan is off, then query still does the same result.
>
> Also, if you add 100000 more entries, it will still fail to find the first one using index.

I cannot quite follow.

We have seen that the query can use the index by setting "enable_seqscan = off",
but that PostgreSQL prefers to use a sequential scan because the table is small.

If the table were bigger, PostgreSQL would prefer the index scan.

Are your concerns hypothetical or real?
If real, can you show EXPLAIN (ANALYZE, BUFFERS) output of a query
execution where PostgreSQL chooses a sequential scan, but you think
it shouldn't?

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



Reply | Threaded
Open this post in threaded view
|

RE: Re: Re: Postgres Full Text Search Jsonb Array column does not search for first row

Dmytro Zhluktenko
In reply to this post by Dmytro Zhluktenko

The issue was solved with the approach suggested here.

https://stackoverflow.com/questions/59049873/postgres-full-text-search-jsonb-array-column-does-not-show-first-row

 

BR, Dmytro.

 

From: [hidden email]
Sent: 28 листопада 2019 р. 16:39
To: [hidden email]; [hidden email]
Subject: Re: Re: Postgres Full Text Search Jsonb Array column does not search for first row

 

On Wed, 2019-11-27 at 11:54 +0200, Dmytro Zhluktenko wrote:

> explain (analyze, BUFFERS)

> SELECT *

> FROM "cp"."Repro" x where cp.make_tsvector(x) @@ 'fir:*'::tsquery

>

> outputs this query plan:

> Bitmap Heap Scan on "Repro" x  (cost=12.00..16.26 rows=1 width=72) (actual time=0.007.0.007 rows=0 loops=1)

>   Recheck Cond: (cp.make_tsvector(x.*) @@ '''fir'':*'::tsquery)

>   Buffers: shared hit=2

>   ->  Bitmap Index Scan on repro_fts_idx  (cost=0.00..12.00 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)

>         Index Cond: (cp.make_tsvector(x.*) @@ '''fir'':*'::tsquery)

>         Buffers: shared hit=2

> Planning Time: 0.070 ms

> Execution Time: 0.040 ms

>

> Query runs fine if uses seq scan. Seq Scan is not desired here since this is the query that should run on huge amounts of data and it should find the first element.

> Obviously, if seq_scan is off, then query still does the same result.

>

> Also, if you add 100000 more entries, it will still fail to find the first one using index.

 

I cannot quite follow.

 

We have seen that the query can use the index by setting "enable_seqscan = off",

but that PostgreSQL prefers to use a sequential scan because the table is small.

 

If the table were bigger, PostgreSQL would prefer the index scan.

 

Are your concerns hypothetical or real?

If real, can you show EXPLAIN (ANALYZE, BUFFERS) output of a query

execution where PostgreSQL chooses a sequential scan, but you think

it shouldn't?

 

Yours,

Laurenz Albe

--

Cybertec | https://www.cybertec-postgresql.com