Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex_Replace vs. Replace - why Regex_replace is "malfunctioning"? (bug?)

kwieto
8 - Asteroid

I have two fields:
Field 1 contain a single value (like COMPANY NAME 1), 

Field 2 contain list of values, delimited with "|", among which there may be (or not) the value matching Field 1 (so it might be i.e. Company Name 1|Company Name 2|... or Company Name 2, or Company Name 3|Company Name 4, etc.)

 

Values in F1 are all uppercase, values in F2 are not. 


I want to remove all values matching with ones listed in Field 1 (So if Field 2 is Company Name 1|Company Name 2, and Field 1 is COMPANY NAME 2, then I want to have only Company Name 1 in F2

Since Replace Function is case sensitive, I wanted to use Regex_Replace function for that, written in a way: Regex_Replace([F2], [F1], "") so it will remove values (and I handle potentially doubled separators in the next step)

This works only on part of the data - there area values in F1 which are exactly matching (except the case) with F2, but are not removed.
I don't know what is the reason, I looked for any values which are same visually but are considered as different (like space and non-breaking space), any spaces on the end or beginning of values stored in F1, but couldn't find anything.

Surprisingly, if I use Uppercase function on Field 2 and then use standard Replace function - Replace([F1], [F2], "") everything works as expected.
Regex_Replace still doesn't work as expected,

Any idea why standard Replace works and Regex_Replace doesn't? There isn't any multiplication of values stored in F2, every value is appearing exactly once (sometimes F2 contains only the value matching F1 and still it isn't removed by Regex_Replace, while standard Replace works).

I want to keep original formating in F2, so using UPPERCASE to handle that with the standard REPLACE and then reverting remaining values to their original formating is possible, but requires additional workarounds.
And except that I would like to understand why Regex_Replace doesn't work?

5 REPLIES 5
kwieto
8 - Asteroid

OK, seems that I found the issue. Some values listed in F1 are containing parenthesis - i.e. COMPANY NAME (XYZ) 1, and this makes a trouble. Strange, as for example commas and especially dots (also considered as special characters in regex) are not causing the trouble.

Qiu
21 - Polaris
21 - Polaris

@kwieto 
Please share some sample data so we can take a closer look/

kwieto
8 - Asteroid

For example:

Field 1Field 2expected outputoutput given by Regex_Replace
NEW HORIZON LTD.Sunshine Stories GmbH|New Horizon Ltd.Sunshine Stories GmbHSunshine Stories GmbH
FLOWER POWER (UK) Co.Sunshine Stories GmbH|Flower Power (UK) Co.|New Horizon Ltd.Sunshine Stories GmbH|New Horizon Ltd.Sunshine Stories GmbH|Flower Power (UK) Co.|New Horizon Ltd.
TRADE CAPITAL (CANADA)Sunshine Stories GmbH|Flower Power (UK) Co.|New Horizon Ltd.|Trade Capital (Canada)Sunshine Stories GmbH|Flower Power (UK) Co.|New Horizon Ltd.Sunshine Stories GmbH|Flower Power (UK) Co.|New Horizon Ltd.|Trade Capital (Canada)

 

Formula used: Regex_Replace([Field 2], "\|?"+[Field 1], "") or Regex_Replace([Field 2], [Field 1], "") (in this case the orphan delimiter will stay and will need to be removed later)

CoG
14 - Magnetar

You are right on the money. The parenthesis and any other REGEX "globally" reserved characters are going to pose a problem for you. Commas are only special characters in certain contexts in REGEX, the period/dot is not going to cause you issues with under-matching because it matches any character to begin with, but the Parentheses define a named group that you are searching on, which is not what you want, obviously. If you know all of the special characters you expect to see that need to be escaped, you can use a prior formula to prepare [Field 1]:

 

REGEX_Replace([Field 1],"(\(|\)|\.)", "\\$1")

 

Screenshot.png

 

Hope this helps and Happy Solving!

 

kwieto
8 - Asteroid

Thanks, I've found similar way of handling that, but your formula is more elegant than mine. Fortunately there are only parentheses to be handled. 

Labels
Top Solution Authors