BUG #15948: TRGM gin index is not be taken into account when using like all (array)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

BUG #15948: TRGM gin index is not be taken into account when using like all (array)

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15948
Logged by:          James Inform
Email address:      [hidden email]
PostgreSQL version: 11.5
Operating system:   Mac OS X 10.13 / Ubuntu 18.04 LTS
Description:        

Use the following to reproduce the problem:

create extension pg_trgm;
create table mytable(myid int8 primary key, mytext text);
insert into mytable select g,md5(g::text) from generate_series(1,100000)
g;
create index gin1 on mytable using gin(mytext gin_trgm_ops);

-- Now there is the following record existing in the table
-- myid: 7 mytext: 8f14e45fceea167a5a36dedd4bea2543

-- Now try to query all record which contain "36dedd" AND "4e45f"
-- Using LIKE ALL, a seq scan is used which is very expensive
explain analyze
select * from mytable where mytext like all (array['%36dedd%','%4e45f%']);

-- Seq Scan on mytable  (cost=0.00..2185.00 rows=1 width=41) (actual
time=0.012..16.302 rows=1 loops=1)
--   Filter: (mytext ~~ ALL ('{%36dedd%,%4e45f%}'::text[]))
--  Rows Removed by Filter: 99999
-- Planning Time: 0.071 ms
-- Execution Time: 16.312 ms

-- Rewriting the same query with an where clause using the AND operator
works like one would expect from the LIKE ALL
explain analyze
select * from mytable where mytext like '%36dedd%' and mytext like
'%4e45f%';

-- Bitmap Heap Scan on mytable  (cost=22.00..23.02 rows=1 width=41) (actual
time=0.136..0.136 rows=1 loops=1)
--   Recheck Cond: ((mytext ~~ '%36dedd%'::text) AND (mytext ~~
'%4e45f%'::text))
--   Heap Blocks: exact=1
--   ->  Bitmap Index Scan on gin1  (cost=0.00..22.00 rows=1 width=0)
(actual time=0.130..0.130 rows=1 loops=1)
--         Index Cond: ((mytext ~~ '%36dedd%'::text) AND (mytext ~~
'%4e45f%'::text))
-- Planning Time: 0.084 ms
-- Execution Time: 0.155 ms

So, is this a bug or a feature? ;)

Reply | Threaded
Open this post in threaded view
|

Re: BUG #15948: TRGM gin index is not be taken into account when using like all (array)

Tom Lane-2
PG Bug reporting form <[hidden email]> writes:
> -- Now try to query all record which contain "36dedd" AND "4e45f"
> -- Using LIKE ALL, a seq scan is used which is very expensive
> explain analyze
> select * from mytable where mytext like all (array['%36dedd%','%4e45f%']);

This is not a bug.  At most it's an unimplemented feature ...
one I can't get very excited about, considering how little
use there is for such queries.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: BUG #15948: TRGM gin index is not be taken into account when using like all (array)

James Inform


On August 12, 2019 at 1:41 AM Tom Lane <[hidden email]> wrote:


PG Bug reporting form <[hidden email]> writes:
-- Now try to query all record which contain "36dedd" AND "4e45f"
-- Using LIKE ALL, a seq scan is used which is very expensive
explain analyze
select * from mytable where mytext like all (array['%36dedd%','%4e45f%']);

This is not a bug. At most it's an unimplemented feature ...
one I can't get very excited about, considering how little
use there is for such queries.

regards, tom lane

Thanks for the clarification. I just stumbled about this because

select * from mytable where mytext like any (array['%36dedd%','%4e45f%'])

uses the gin index the same way as

select * from mytable where mytext like '%36dedd%' or mytext like '%4e45f%'

does.


Its not a show stopper in any way, but for me a working

select * from mytable where mytext like all (array['%36dedd%','%4e45f%'])

would be more elegant and more readable, because I am dealing with big amounts of free text columns.

Although I use full text search on those columns, I have a need to identify records where different search strings are contained (front and end masked, like '%part%'). This is because the free texts have many compound words, where I want to find words containing specific search strings.


Maybe this could be a good starting point for me for becoming involved. Could someone point me to the relevant source code area where this "LIKE ALL" handling should be implemented?!