using regular expression queries with regular expression indexes

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

using regular expression queries with regular expression indexes

Mark Steben-3
Good afternoon,
We are currently running postgres version 12.2. I have a query that uses regexp code to confirm if a text field is a date or not.  It is resource intensive and long running.  I have made a regexp index (actually several flavors of regexp indexes) to aid in this queries performance. However I cannot seem to 'convince' the planner to use the indexes I create. 
 Following is table/index construction for your reference:

 \d customertab
                                                Table "public.customertab"
          Column           |            Type             | Collation | Nullable |                     Default                    
---------------------------+-----------------------------+-----------+----------+-------------------------------------------------
 id                        | bigint                      |           | not null | nextval('customer_attributes_id_seq'::regclass)
 customer_attribute_key_id | integer                     |           |          |
 attributable_id           | integer                     |           |          |
 attributable_type         | character varying(255)      |           |          |
 value                     | character varying(255)      |           |          |
 created_at                | timestamp without time zone |           |          |
 updated_at                | timestamp without time zone |           |          |
 group_num                 | character varying(255)      |           |          |
 customer_transaction_id   | integer                     |           |          |
 account_id                | integer                     |           |          |
Indexes:
    "primets_custattr_pkey" PRIMARY KEY, btree (id)
    "custattr_account_id" btree (account_id)
    "custattr_value" btree (value) WHERE value::text ~ '^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text AND value::text ~ '([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text AND account_id IS NOT NULL
   
    "custattr_value_acctid_keyid" btree (value, account_id, customer_attribute_key_id) WHERE value::text ~ '^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text AND value::text ~ '([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text AND account_id IS NOT NULL
    
"custattr_value_includes" btree (value) INCLUDE (id, account_id, group_num) WHERE value::text ~ '^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text AND value::text ~ '([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text AND account_id IS NOT NULL

The query I am attempting to run:

explain select * FROM customertab ca
                 where ca.value::text ~ '^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text
                 AND ca.value::text ~ '([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text
                   AND ca.account_id IS NOT NULL

and the explain:
 Bitmap Heap Scan on customertab ca  (cost=9907.40..903793.74 rows=479041 width=95)
   Recheck Cond: (((value)::text ~ '^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text) AND ((value)::text ~ '([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0
-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text) AND (accou
nt_id IS NOT NULL))
   ->  Bitmap Index Scan on custattr_value  (cost=0.00..9787.64 rows=479041 width=0)
(3 rows)


Finally if I remove the 'ca.account_id is not null' expression

  explain select * FROM customertab ca
                 where ca.value::text ~ '^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text
                 AND ca.value::text ~ '([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text  

This is the explain:
 Gather  (cost=1000.00..1641792.30 rows=479041 width=95)
   Workers Planned: 2

   ->  Parallel Seq Scan on customertab ca  (cost=0.00..1592888.20 rows=199600 width=95)
         Filter: (((value)::text ~ '^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text) AND ((value)::text ~ '([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0
-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text))
(4 rows)


Is there something else I can try, or am I missing something?  Comments welcome.





   

--
Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com





Reply | Threaded
Open this post in threaded view
|

Re: using regular expression queries with regular expression indexes

Tom Lane-2
Mark Steben <[hidden email]> writes:
> We are currently running postgres version 12.2. I have a query that uses
> regexp code to confirm if a text field is a date or not.  It is resource
> intensive and long running.  I have made a regexp index (actually several
> flavors of regexp indexes) to aid in this queries performance. However I
> cannot seem to 'convince' the planner to use the indexes I create.

I don't understand your complaint.  Your example shows that it *is*
using the index:

>    ->  Bitmap Index Scan on custattr_value  (cost=0.00..9787.64 rows=479041 width=0)

> Finally if I remove the 'ca.account_id is not null' expression
> [ it doesn't use the index ]

Well, of course not, because then the query doesn't match the index's
WHERE condition (i.e., there could be rows the query needs to return that
are not included in the index).  But I imagine that restriction isn't very
selective, so you could probably omit it from the index definition as
well; then the index would work for queries with or without that
restriction.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: using regular expression queries with regular expression indexes

Thomas Kellerer-4
In reply to this post by Mark Steben-3
Mark Steben schrieb am 25.06.2020 um 18:45:
> I have a query that uses regexp code to confirm if a text field is a date or not.

Why don't you define it as a date column? (e.g. so that leap years are corretly dealt with)

But if you can't do that, what about a generated column that uses that regex to populate a boolean flag, e.g. is_valid_date?

Then the expensive regex evaluation would only be done when you change the value, not everytime you query it.

Thomas