BUG #16826: Regex in substring(... from ..) wrong

PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      16826
Logged by:          James Inform
Email address:      [hidden email]
PostgreSQL version: 13.1
Operating system:   Mac and Ubuntu
Description:        

Hopefully I am not messing up regex syntax, but it seems that handling of
non-greedy quantifiers is not correct in the substring function's regex.

When I set the non-greedy operator ? inside a regex in the substring
function, everything after the ? seems to be also treated as non-greedy,
which is wrong.

Please look at the following examples, the last one shows the issue:

select substring('part1.part2.part3' from '^.*');
-- Result: part1.part2.part3
-- Correct, gets the whole string

select substring('part1.part2.part3' from '^.*\.');
-- Result: part1.part2.
-- Correct, because the default mode is greedy, so everything up to the second
-- dot is caught

select substring('part1.part2.part3' from '^.*?\.');
-- Result: part1.
-- Correct, because the mode is non-greedy, so everything up to the first dot is
-- caught

select substring('part1.part2.part3' from '^.*\..*');
-- Result: part1.part2.part3
-- Correct, everything is caught

select substring('part1.part2.part3' from '^.*?\..*');
-- Result: part1.
-- Wrong, should catch everything but seems to stay non-greedy after the ?

I have also tested against REL_13_STABLE including commit
49c928c0c067a8ec0882eeea5c03ccbd1b1b1a62, but the issue is the same.


Re: BUG #16826: Regex in substring(... from ..) wrong

David G Johnston
On Friday, January 15, 2021, PG Bug reporting form <[hidden email]> wrote:
Hopefully I am not messing up regex syntax, but it seems that handling of
non-greedy quantifiers is not correct in the substring function's regex.

When I set the non-greedy operator ? inside a regex in the substring
function, everything after the ? seems to be also treated as non-greedy,
which is wrong.

This seems to behave per the documentation in section 9.7.3.5 (Regular Expression Matching Rules).


Implementations of regex do differ, so referencing prior experience for correctness has limits.

David J.
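
As an illustration of the rule in section 9.7.3.5 referenced in the reply, here is a minimal sketch. In PostgreSQL's regular expressions, greediness is an attribute of the expression as a whole: a sequence of quantified atoms takes its greediness from the first quantified atom that has one. In '^.*?\..*' that atom is the non-greedy .*?, so the whole expression prefers the shortest match. The second statement is an assumed workaround based on the {1,1} technique described in that section, not something proposed in this thread.

```sql
-- Non-greedy as a whole: the first quantified atom (.*?) determines the
-- greediness of the entire expression, so the shortest overall match wins.
select substring('part1.part2.part3' from '^.*?\..*');
-- Result: part1.

-- Assumed workaround: wrap the expression in a non-capturing group with the
-- quantifier {1,1}, which counts as greedy. The expression as a whole then
-- prefers the longest match, while .*? inside the group still matches as
-- little as possible.
select substring('part1.part2.part3' from '^(?:.*?\..*){1,1}');
-- Expected result: part1.part2.part3
```

The group is non-capturing on purpose: substring returns the first parenthesized subexpression when the pattern contains capturing parentheses, so a plain (...) here would change which part of the match is returned.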