
Alteryx Designer Desktop Discussions

SOLVED

Conditional Regex

mweiser_dup_512
6 - Meteoroid

Hi Alteryx Community,

 

I am trying to tokenize a series of strings with letters, numbers, and special characters by every instance of a capital letter of a word, but I only want to tokenize if there is no "/" in between two words. How would I go about that? The data is very random and inconsistent. Ideally I would like to tokenize by the "/" but every now and then there is a date using "/" so I am attempting to go with each capital letter.

 

For example:

Polar Bear / Train (CAR) 18-19 / Nickel/Vacation 1/2 - 1/6/ CAT 1/3 Alligator X&Y / FLOMINGO / Vulture/GORILLA / HOMECARE2 Provide

 

To turn into:

Polar Bear

Train (CAR)

Nickel

Vacation

CAT

Alligator X&Y

FLOMINGO

Vulture

GORILLA

HOMECARE2 Provide

 

 

7 REPLIES
charlie_archer
7 - Meteor

Hi there,

 

I got to your list by using the following within the Regex tokenize tool:

 

[^\d-]

 

To strip out the numbers that follow (and make sure the list was consistent with yours), I followed up with a parse using the following:

 

([^\d-]+)

 

Hope this helps but let me know if this doesn't give you what you're after.

 

Charlie

mweiser_dup_512
6 - Meteoroid

Hi Charlie,

 

Thank you for the reply. I tried using regex to tokenize by [^\d-] and then had a regex parsing by ([^\d-]+), but the first regex simply broke out the set one character at a time (and therefore the second regex didn't do much). Please let me know if I misunderstood your response.

The following is a screen shot of the tokenize tool:

Alteryx Regex Configuration.PNG

 

 

Thank you

 

Matt
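
For anyone following along outside Alteryx, the one-character-at-a-time behaviour can be reproduced with Python's re module. This is only a sketch: Python's regex engine is not the one Alteryx uses, and the sample string is just a fragment of the example above.

```python
import re

sample = "CAT 1/3"

# Without a quantifier, each match is exactly one character
# that is neither a digit nor a hyphen.
single = re.findall(r"[^\d-]", sample)

# With +, consecutive matching characters are grouped into runs.
runs = re.findall(r"[^\d-]+", sample)

print(single)  # ['C', 'A', 'T', ' ', '/']
print(runs)    # ['CAT ', '/']
```

The character class on its own matches a single character, so a tokenize step built on it emits one token per character.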

charlie_archer
7 - Meteor

Yep, sorry Matt, that's my fault for giving you the wrong regex. I think I must have pasted the same one in twice for some reason.

 

The tokenize is actually (\u.+?)/

The parse is the same.

 

I've attached my workflow just in case.
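
A rough Python equivalent of the tokenize-then-parse steps, as a sketch only. It assumes \u in the Alteryx expression means "any uppercase letter" and substitutes [A-Z], because Python's re module does not treat \u that way. The function name tokenize_then_parse is made up for illustration and is not part of any Alteryx workflow.

```python
import re

text = ("Polar Bear / Train (CAR) 18-19 / Nickel/Vacation 1/2 - 1/6/ CAT 1/3 "
        "Alligator X&Y / FLOMINGO / Vulture/GORILLA / HOMECARE2 Provide")

def tokenize_then_parse(s):
    # Tokenize: lazily capture from an uppercase letter up to the next "/".
    # [A-Z] stands in for Alteryx's \u (assumption).
    tokens = re.findall(r"([A-Z].+?)/", s)
    results = []
    for token in tokens:
        # Parse: keep the first run of characters that are not digits or hyphens.
        match = re.search(r"[^\d-]+", token)
        if match:
            results.append(match.group().strip())
    return results

print(tokenize_then_parse(text))
# ['Polar Bear', 'Train (CAR)', 'Nickel', 'Vacation', 'CAT',
#  'Alligator X&Y', 'FLOMINGO', 'Vulture', 'GORILLA']
```

In this sketch the final item, "HOMECARE2 Provide", does not come through, because the tokenize pattern requires a "/" after each token and the string does not end with one.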

mweiser_dup_512
6 - Meteoroid

Thank you, this is great. My only concern is what to do when there is a number within the word/phrase like the "HOMECARE2 Provide" at the end.

Bob_Blackey
11 - Bolide

I love a good Regex question! I'll just add that in your example you wanted to also capture the "HOMECARE2 Provide". In that case I would change the regex in the tokenize to:

 

(\u.+?)(?:/|$)

 

The (?:/|$) is an unmarked (non-capturing) group: it says to stop when you hit either a / or the end of the line ($ in regex).
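
To see the effect of the (?:/|$) group outside Alteryx, here is a small Python sketch. It assumes [A-Z] is an acceptable stand-in for \u, which Python's re module does not support as an uppercase class, and it trims whitespace from each token for readability.

```python
import re

text = ("Polar Bear / Train (CAR) 18-19 / Nickel/Vacation 1/2 - 1/6/ CAT 1/3 "
        "Alligator X&Y / FLOMINGO / Vulture/GORILLA / HOMECARE2 Provide")

# (?:/|$) is non-capturing, so findall returns only the first group.
# The lazy .+? stops at the first "/" or, failing that, at the end of the string.
tokens = [t.strip() for t in re.findall(r"([A-Z].+?)(?:/|$)", text)]

print(len(tokens))   # 10
print(tokens[-1])    # HOMECARE2 Provide
```

Because the group is non-capturing, the "/" itself never ends up in the token, and the last token is picked up even though nothing follows it.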

 

 

mweiser_dup_512
6 - Meteoroid

Thank you Bob. What do you suggest I do for the second regex parsing tool since it is deleting the 2 and everything after that in "HOMECARE2 Provide"? Please keep in mind that the digit (if part of the word/phrase) would not always be at the end of the word itself, but may be in the beginning or middle of the word.

Please see the screen shot below:

 

Alteryx Regex Configuration 2.PNG

Bob_Blackey
11 - Bolide

Hi Matt,

 

Regex is fun because how you build it really depends on your data and your knowledge of how quirky it can be. 

 

Based on your original post it looks like you want to keep "HOMECARE2 Provide" and the numbers you want to get rid of are the dates.

 

So what I did was first replace the possible dates with nothing.

 

regex_001.png

 

The first replace gets rid of dates in either 2/17 or 12/15 format, the second gets rid of the same patterns written with a hyphen, such as 2-17 or 12-15.
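
The exact expressions are only visible in the screenshot, so the following Python sketch guesses at them: one or two digits, a separator, then one or two digits. Both patterns and the helper name strip_dates are assumptions for illustration, not the expressions from the workflow.

```python
import re

def strip_dates(s):
    # Assumed pattern: one or two digits, "/", one or two digits (e.g. 2/17, 12/15).
    s = re.sub(r"\d{1,2}/\d{1,2}", "", s)
    # Assumed pattern: the same shape with a hyphen (e.g. 18-19).
    s = re.sub(r"\d{1,2}-\d{1,2}", "", s)
    return s

for fragment in ["Train (CAR) 18-19", "CAT 1/3", "HOMECARE2 Provide"]:
    print(repr(strip_dates(fragment).strip()))
# 'Train (CAR)'
# 'CAT'
# 'HOMECARE2 Provide'
```

A digit that is part of a word, like the 2 in HOMECARE2, is left alone because it has no "/" or "-" followed by more digits. One thing to watch in this sketch: if the replace runs on the full string before tokenizing, removing 1/3 also removes the slash between CAT and Alligator X&Y, so where this step sits in the workflow matters.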

 

As I said regex is a great tool, but how complex your expressions are depends on how extreme your data is.

 

Cheers,

Bob

 

 

 
