Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Community is experiencing an influx of spam. As we work toward a solution, please use the 'Notify Moderator' option on the ellipsis menu to flag inappropriate posts.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

String Parsing within a formula field

usnascimc
5 - Atom

Hi Everyone,

 

I'm looking for a way to parse a particular string within a formula field. Please see the examples below: 

 

if(left(ZXY_IBTI_TREND_N[User Status],4)="CLAA", "Yes","No")

 

CONCATENATE(ZXY_IB4[Deadline to Implement Year],ZBE_IB4[Deadline to implement Month])

 

if(ZXV_AUTHWF[Case End Date Time]<ZCV_AUTHWF[SLA Target End Date Time],"On time","Delay")

 

It always starts with either  "ZXY_" or "ZXV_", it is always preceded by  "("  and followed by "[". 

 

Unfortunately, I have been trying to use the REGEX function without success. 

 

Thanks in advance for your help.

Claudio

4 REPLIES 4
Qiu
21 - Polaris
21 - Polaris

@usnascimc 
Can you give some sample data for input and output? 😁

jbichachi003
9 - Comet

Your post is a bit confusing--are the formulas stored as data within a field? If so, are you only trying to extract what you highlighted in blue?

 

If the answer to both of my questions is "yes", you can use the the RegEx tool configured to have an Output Method of Parse. The Regular expression would be as follows:

 

^.*?\(((ZXY_|ZXV_)(.*?))\[.*$

 

I've also attached a solution (created in Alteryx Designer, version 2020.4).

 

Let's break this regular expression down...

 

The above regular expression can be read left to right, one character (or group of characters) at a time. I'm breaking out each one for you below.

 

^ - this first symbol means we are starting at the beginning of the string and there is no text that precedes it (you can delete this character if this is not applicable to your data).

 

.*? - the dot represents any character. The asterisk is known as a quantifier and asterisks will match 0 or more of the preceding character. Quantifiers, by nature, are greedy; this means they will match every instance of the preceding character (since the dot represents any character, dot asterisk matches everything). Since we know our desired text is immediately preceded by an open parenthesis, we can convert our greedy quantifier into a lazy one by adding the question mark to it. A lazy quantifier will grab everything up to an instance of the following character, but won't grab the next character itself.

 

\( - in RegEx, there are "literal" characters and "metacharacters". Literal characters are literally the characters you know well: A, B, C, 1, 2, 3, !, @, #, etc. Metacharacters are characters (or a combination of characters) that serve a specific purpose in RegEx. In our case the open parenthesis is a metacharacter, but we want it to be a literal character. In order to convert the character from a metacharacter to a literal character, we escape it with the backslash (another metacharacter).

 

I'm skipping some parentheses, but I'll explain their function later. Just know they are not literal characters. Parentheses are a metacharacter and can only be considered literal if there is a backslash that precedes them (as in my purple example immediately above).

 

ZXY_|ZXV_ - The "ZXY_" and "ZXV_" are literal characters. The pipe symbol in between them is the English equivalent for "or". In our example, the string will contain either ZXY_ or ZXV_. Note, you could have written the expression as ^.*?\(((ZXY|ZXV)_(.*?))\[.*$ or even ^.*?\((ZX(Y|V)_(.*?))\[.*$. Since both options start with a ZX and end with an underscore, placing either or both outside the or clause would still work (I actually think the first option improves legibility, but the version I posted keeps the underscore within the second captured group--more on that later).

 

.*? - see the same character group above for detailed explanation. After the ZXY_ or ZXV_, we need to match the rest of the blue text. Since I don't know what characters it might contain or how long it will be, I'm using the .* to match 0 or more of any following characters, but adding the ? to convert the asterisk to a lazy quantifier, (meaning it will only match everything up to the following character). Note, I'm assuming the blue text will never contain an open square bracket.

 

\[ - the open square bracket is a metacharacter. I'm using the backslash to convert it to a literal character.

 

.* - as mentioned above, the dot represents any character and the asterisk is a greedy quantifier that will match 0 or more of the preceding character (the dot). So this combination will match everything through the last instance of the following character.

 

$ - this is a metacharacter. Just how we started the regular expression with the carrot metacharacter (^), we end it with the dollar symbol metacharacter. This represents the end of the string and no text will follow. It's actually unneeded here and can be deleted, but I like to add it as a bookend to the opening carrot metacharacter.

 

Let's talk about those parentheses...

 

Parentheses create what are known as captured groups. If you use the RegEx tool with the Replace Output Method, you can use these captured groups as variables, with each variable storing part of the original string. Anything in the above expression that is between an open parenthesis and a closed parenthesis will be captured within a captured group. When using the RegEx tool configured with the Parse Output Method, each captured group will be output as its own field.

 

In the above expression, I created three captured groups:

((ZXY_|ZXV_)(.*?)) this is the entire blue text from your example. Note, the parentheses won't print (if you are using the Replace Output Method, this can be referenced by typing $1).

(ZXY_|ZXV_) this is the beginning ZXY_ or ZXV_. Note, the parentheses won't print (if you are using the Replace Output Method, this can be referenced by typing $2).

(.*?) this is the rest of the blue text following the ZXY_ or ZXV_. Note, the parentheses won't print (if you are using the Replace Output Method, this can be referenced by typing $3).

 

I wasn't sure how you wanted the data, so you can always add or remove a pair of open and closed parentheses to add or remove a captured group.

 

RegEx tool configured to the Parse output methodRegEx tool configured to the Parse output method

 

Hope this helped!

 

PS - if anyone can teach me how to use inline code block or markup text when posting on the Alteryx Community, that would be very helpful!

 

JarekSkudrzyk
11 - Bolide

@jbichachi003 nice solution!
regarding formatting posts - if you do not see this panel highlighted in yellow below then I think it might be a feature that is unlocked once you progress in community ranks.

JarekSkudrzyk_0-1649836334218.png

 

usnascimc
5 - Atom

@jbichachi003 

 

You have understood my questions correctly. The formulas are data within a field from which I want to extract the blue strings.

 

Your proposed solution worked precisely the way I needed. Thank you very much for your generous and detailed explanation. That was very helpful for a beginner like me. 

 

I truly appreciate the time you have taken to help me.

Labels