Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

How to remove a string of text with Regex tool?

TwanD
7 - Meteor

Hi everyone,

 

I have been using 'Text to Column' tool and that has been working fine but I really want to get into using the 'Regex' tool because of its capabilities. Granted I'm a novice at this so a lot of trials and errors to get things to work but for the life of me I can't figure out how to write the expression properly to replace the field with a string of text I want.

 

Original text string:

 

Sum_STATUTORY.O1 Entity Name

 

And I want just this:

 

O1 Entity Name

 

I understand I can use the replace function in Regex tool to accomplish this but I just haven't been able to write out the expression to accomplish it. Any help would be greatly appreciated, thanks!

 

10 REPLIES 10
Kenda
16 - Nebula
16 - Nebula

Hello again @TwanD! I would actually suggest using the Formula tool with an expression like the following:

REGEX_Replace([Field1], ".*\.(.*)","$1")

This will only keep what's after the period from your field. Hope this helps!

TwanD
7 - Meteor

@BarnesK, thank you so much, you're the best! so the Regex formula is similar to what is in the Regex tool?

TwanD
7 - Meteor

Also could you please explain the logic in that regex formula, like what is it doing? thank you.

Kenda
16 - Nebula
16 - Nebula

@TwanD similar, yes. I tend to like to use the Formula tool just because it is what I'm used to using and you can do multiple things at once.

 

If you read Alteryx's description for the REGEX_Replace(String, pattern, replace) formula, it tells you "Returns the string resulting from the RegEx find pattern and replace string." Basically, what it is doing is looking for the 'pattern' you give it within the 'string' and replacing that with your 'replace'. Within the pattern, a period represents a single character and an asterisk means zero or more of that character. If you want an actual period captured, you must precede it with the backslash. Therefore, here, we use ".*\." to indicate everything before and including the period. Additionally, you can section groups using the parentheses. In this case, because we wanted to keep everything after the period, we used (.*). Finally, in the replace portion, we put $1 indicating our first grouped pattern (in other words, whatever is in the first set of parentheses from the pattern). 

 

RegEx can be overwhelming at first, but once you get the hang of it, it is super powerful. Try using this site https://regexone.com/ if you want to learn more about it.

TwanD
7 - Meteor

Ah perfect! Thank you again for your response and help, greatly appreciate it. Saved me some hair pulling haha. Have a great day!

TwanD
7 - Meteor
Spoiler
 

As an add on question, what if I have

 

 

Sum_Statutory.O1 Entity Name.HD

 

and I want to have

O1 EntityName.HD

 

How would that change the expression? I attempted to create it but just not getting the logic

NicoleJohnson
ACE Emeritus
ACE Emeritus

Replacing your RegEx expression in the formula @Kenda provided with this should do the trick:

 

[^.]*\.(.*)

 

First part [^.] looks for any character except a period due to adding the ^ sign at the beginning... followed by the period, and then whatever is in the parentheses will be returned in your results. 

 

Cheers,

NJ

TwanD
7 - Meteor

Thank you so much, that worked! Sorry about the empty Spoiler box, not sure how it got there. Thanks again!

Kenda
16 - Nebula
16 - Nebula

@TwanD Another option that I tend to do in these situations is to use an expression like the one I showed you previously, but adding a question mark like this:

 

REGEX_Replace([Field1], ".*?\.(.*)","$1")

Basically, by adding the question mark here, it says look for the first period and keep everything after that. The previous expression was giving you the last period in the field.

Labels