Getting unexpected results from regexp_replace

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Getting unexpected results from regexp_replace

Dan Nessett
I freely admit this may be my problem. Writing regular expression patterns is more an art than a skill. However, I am getting an unexpected result from regex_replace().

I have a table that is partially defined as follows (names and email addresses hidden for privacy):

user_name user_email
“AAAAA"
“BBBBB” “bbbbb(xxxxx)"
“CCCCC” "ccccct(home)"
“DDDDD"
“EEEEE" "aeeeee(home)”

The second entry is an email address - bbbbb - followed by the name of an individual (xxxxx) in parentheses. The email address for CCCCC and EEEEE have the word “home” in parentheses appended to the email address.

I want to delete the parenthetical expression including the parentheses for all email addresses. I also have a column (not shown) called email_list that contains a comma separated list of all email addresses associated with each name or NULL if there is no list. I create a table:

CREATE TABLE "households_with_email" AS
SELECT user_name, regexp_replace(user_email, E'\(.*\)', '') AS user_email, family_list, street_address, city, state, zip, phone_list, email_list
FROM "household_data"
WHERE email_list != ‘';

I expected the regex_replace to the parenthetical text with the null character. Instead, it replaces the whole string in user_email with the null string:

user_name user_email
"Rodriguez” ""
"Armstrong" ""
"Bauer" ""
"Berst" ""
"Berst” ""

I realize there may be some characteristic such as greedy matching that is causing this result, but if so, I don’t see how. The pattern indicates first find the ‘(‘ character, then match all characters until a ‘)’ character arrives. Those characters, including the parentheses should then be replaced with the null string.

Or am I misinterpreting the pattern?

Dan

Reply | Threaded
Open this post in threaded view
|

Re: Getting unexpected results from regexp_replace

Tom Lane-2
Dan Nessett <[hidden email]> writes:
> SELECT user_name, regexp_replace(user_email, E'\(.*\)', '') AS user_email, family_list, street_address, city, state, zip, phone_list, email_list
> FROM "household_data"
> WHERE email_list != ‘';

Because you used E'...', the backslashes are eaten by the string literal
parser.  So the pattern seen by regexp_replace() is just  '(.*)', in
which the parens are capturing parens not literal characters.  Thus it
matches the whole string.

Personally I'd leave off the E, but if you must use it then double the
backslashes.

                        regards, tom lane


Reply | Threaded
Open this post in threaded view
|

Re: Getting unexpected results from regexp_replace

Dan Nessett
Thanks. Doubling the backslashes did the trick. I tried to use the original expression without the E, but postgres threw an error and said to use the “E” version of the pattern.

Dan

> On Feb 21, 2021, at 8:50 AM, Tom Lane <[hidden email]> wrote:
>
> Dan Nessett <[hidden email]> writes:
>> SELECT user_name, regexp_replace(user_email, E'\(.*\)', '') AS user_email, family_list, street_address, city, state, zip, phone_list, email_list
>> FROM "household_data"
>> WHERE email_list != ‘';
>
> Because you used E'...', the backslashes are eaten by the string literal
> parser.  So the pattern seen by regexp_replace() is just  '(.*)', in
> which the parens are capturing parens not literal characters.  Thus it
> matches the whole string.
>
> Personally I'd leave off the E, but if you must use it then double the
> backslashes.
>
> regards, tom lane
>
>