Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

How to parse string and dates more than once in Regex

sguindi
7 - Meteor

Hey,

I want to return the two words before a date more than once with Regex

Ex: I want to parse this date of birth 03/03/1999 and also i want to parse out this date as well with this text 2/2/2000

 

Result to be:

RegExOut1       Regexout2         RegexOut3    RegexOut4

of birth               03/03/1999         this text          2/2/2000

 

I used this expression but it only returned out 1 & 2 only, it didn't parse out the rest of the text. I tried different things but can't get it to work

([a-z]+\s[a-z]+)\s(\d{1,2}\/\d{1,2}\/\d{4})

7 REPLIES 7
Thableaus
17 - Castor
17 - Castor

Hi @sguindi 

 

This might work:

 

.*?([a-z]+\s[a-z]+)\s(\d{1,2}\/\d{1,2}\/\d{4}).*?([a-z]+\s[a-z]+)\s(\d{1,2}\/\d{1,2}\/\d{4})

 

Parse method on RegEX tool.

 

Cheers,

sguindi
7 - Meteor

Thank you so much! it worked! YAY!

sguindi
7 - Meteor

@Thableaus

some of the results are null because the word before the date is actually number

Ex: This customer lives is 123 Maine street, NY 1234. DOB is 01/01/1111

I removed the "." so now it should return the number 1234 and DOB as the two words before the date, but can't get my expression to include either letters or numbers

Thableaus
17 - Castor
17 - Castor

@sguindi 

 

Instead of [a-z], you can use \w. \w stands for either digits or letters. It's considered the "word" character.

 

.*?(\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4}).*?(\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4})

 

Cheers.

 

Thableaus
17 - Castor
17 - Castor

@sguindi 

 

But you need to check if there is punctuation between the two words before date besides \s (space).

 

This totally changes how you parse the expression.

 

Cheers,

sguindi
7 - Meteor

yes, i did replace some of the punctuation with space. So i went ahead and used the last expression to parse 3 words and 3 dates instead of two each, it worked with most of the data but with some it returned Null although there were 2 dates in there.. i noticed it returns null for every regout if the dates were less than 3 dates.

.*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4}).*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4}).*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4})

I tried adding "?" at the end of the 3rd date (didn't work) :

.*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4}).*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4}).*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4})?

 

I also have another another case where the first thing in a string is the date, so it returned null because the expression reads 3 words first. I tried doing this but also didn't work:

.*?(\w+\s\w+\s\w+)?\s(\d{1,2}\/\d{1,2}\/\d{4}).*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4}).*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4})

 

Could you help me! TIA

Thableaus
17 - Castor
17 - Castor

@sguindi 

 

Here's a dynamic approach for your case. Using Tokenize method with RegEX tool can separate this pattern repeatedly according to the number of occurrences it happens.

 

tokenize.PNG

 

"I also have another another case where the first thing in a string is the date, so it returned null because the expression reads 3 words first. I tried doing this but also didn't work:"

 

This is a different case and it should be treated separately. One common mistake using RegEX is trying to solve all problems in a singular expression. Alteryx has a variety of tools to treat, separate and organize data, that's the whole purpose of using it.

 

So what I recommend is that you flag these odd records and apply a different RegEX for them. 

Otherwise your expression gets too complicated, difficult to mantain and sometimes you can't even understand what's going on (even though you make it work).

 

WF attached.

 

Cheers,

Labels