websearch_to_tsquery() and apostrophe inside double quotes

Alastair McKinley
Hi all,

I am a little confused about what is being generated by websearch_to_tsquery() in the case of an apostrophe inside double quotes.

Here is an example of searching for a name containing an apostrophe.

The following works as expected:

select to_tsvector('peter o''toole') @@ websearch_to_tsquery('peter o''toole');
 ?column?
----------
 t
(1 row)


When the name is in double quotes, the search fails:

select to_tsvector('peter o''toole') @@ websearch_to_tsquery('"peter o''toole"');
 ?column?
----------
 f
(1 row)

In the first case, websearch_to_tsquery() returns:

select websearch_to_tsquery('peter o''toole');
  websearch_to_tsquery  
------------------------
 'peter' & 'o' & 'tool'
(1 row)

which makes sense to me.

In the second case websearch_to_tsquery() returns something that I can't quite understand:

select websearch_to_tsquery('"peter o''toole"');
     websearch_to_tsquery    
------------------------------
 'peter' <-> ( 'o' & 'tool' )
(1 row)

I am not quite sure what text this will actually match.

Best regards,

Alastair


Re: websearch_to_tsquery() and apostrophe inside double quotes

Tom Lane-2
Alastair McKinley <[hidden email]> writes:
> I am a little confused about what is being generated by websearch_to_tsquery() in the case of an apostrophe inside double quotes.
> ...

> select websearch_to_tsquery('"peter o''toole"');
>      websearch_to_tsquery
> ------------------------------
>  'peter' <-> ( 'o' & 'tool' )
> (1 row)

> I am not quite sure what text this will actually match?

I believe it's impossible for that to match anything :-(.
It would require 'o' and 'tool' to match the same lexeme
(one immediately after a 'peter') which of course is impossible.
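
One way to check this without involving the parser at all is to cast the printed query and the printed tsvector directly and test them against each other. This is only a sketch: the literals are copied from the outputs in this thread, the dollar quoting is there just to avoid escaping the inner single quotes, and the column alias is made up for illustration.

```sql
-- Both operands of the & would have to sit at the single position
-- right after 'peter', but 'o' is at 2 and 'tool' is at 3.
select $$'o':2 'peter':1 'tool':3$$::tsvector
       @@ $$'peter' <-> ( 'o' & 'tool' )$$::tsquery as phrase_with_and_matches;
```

If the reasoning above is right, this returns f for any tsvector in which 'o' and 'tool' occupy different positions.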

The underlying tsvector type seems to treat the apostrophe the
same as whitespace; it separates 'o' and 'toole' into
distinct words:

# select to_tsvector('peter o''toole');
       to_tsvector        
--------------------------
 'o':2 'peter':1 'tool':3
(1 row)

So it seems to me that this is a bug: websearch_to_tsquery
should also treat "'" like whitespace.  There's certainly
not anything in its documentation that suggests it should
treat "'" specially.  If it didn't, you'd get

# select websearch_to_tsquery('"peter o toole"');
    websearch_to_tsquery    
----------------------------
 'peter' <-> 'o' <-> 'tool'
(1 row)

which would match this tsvector.
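
Until the function itself treats the apostrophe that way, a possible workaround is to replace apostrophes with spaces before calling it. This is a sketch, not a fix: it assumes the same default text search configuration as the examples in this thread, and it blindly rewrites every apostrophe in the input, which may not be what you want for other search strings.

```sql
-- replace() turns "peter o'toole" into "peter o toole" before parsing,
-- so the quoted phrase is built from three separate words.
select to_tsvector('peter o''toole')
       @@ websearch_to_tsquery(replace('"peter o''toole"', '''', ' '));
```

The argument passed to websearch_to_tsquery() is then the same string as in the query shown just above, so it should produce that same tsquery.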

                        regards, tom lane


Re: websearch_to_tsquery() and apostrophe inside double quotes

Alastair McKinley
Hi Tom,

Thank you for looking at this.  You are right, I couldn't find anything in the docs that would explain this.

I can't think of any rationale for producing a query like this, so it does look like a bug.

Best regards,

Alastair





From: Tom Lane <[hidden email]>
Sent: 10 October 2019 14:35
To: Alastair McKinley <[hidden email]>
Cc: [hidden email] <[hidden email]>; [hidden email] <[hidden email]>
Subject: Re: websearch_to_tsquery() and apostrophe inside double quotes
 