Hey,
I want to return the two words before a date more than once with Regex
Ex: I want to parse this date of birth 03/03/1999 and also i want to parse out this date as well with this text 2/2/2000
Result to be:
RegExOut1 Regexout2 RegexOut3 RegexOut4
of birth 03/03/1999 this text 2/2/2000
I used this expression but it only returned out 1 & 2 only, it didn't parse out the rest of the text. I tried different things but can't get it to work
([a-z]+\s[a-z]+)\s(\d{1,2}\/\d{1,2}\/\d{4})
Solved! Go to Solution.
Hi @sguindi
This might work:
.*?([a-z]+\s[a-z]+)\s(\d{1,2}\/\d{1,2}\/\d{4}).*?([a-z]+\s[a-z]+)\s(\d{1,2}\/\d{1,2}\/\d{4})
Parse method on RegEX tool.
Cheers,
Thank you so much! it worked! YAY!
@Thableaus
some of the results are null because the word before the date is actually number
Ex: This customer lives is 123 Maine street, NY 1234. DOB is 01/01/1111
I removed the "." so now it should return the number 1234 and DOB as the two words before the date, but can't get my expression to include either letters or numbers
Instead of [a-z], you can use \w. \w stands for either digits or letters. It's considered the "word" character.
.*?(\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4}).*?(\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4})
Cheers.
But you need to check if there is punctuation between the two words before date besides \s (space).
This totally changes how you parse the expression.
Cheers,
yes, i did replace some of the punctuation with space. So i went ahead and used the last expression to parse 3 words and 3 dates instead of two each, it worked with most of the data but with some it returned Null although there were 2 dates in there.. i noticed it returns null for every regout if the dates were less than 3 dates.
.*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4}).*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4}).*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4})
I tried adding "?" at the end of the 3rd date (didn't work) :
.*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4}).*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4}).*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4})?
I also have another another case where the first thing in a string is the date, so it returned null because the expression reads 3 words first. I tried doing this but also didn't work:
.*?(\w+\s\w+\s\w+)?\s(\d{1,2}\/\d{1,2}\/\d{4}).*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4}).*?(\w+\s\w+\s\w+)\s(\d{1,2}\/\d{1,2}\/\d{4})
Could you help me! TIA
Here's a dynamic approach for your case. Using Tokenize method with RegEX tool can separate this pattern repeatedly according to the number of occurrences it happens.
"I also have another another case where the first thing in a string is the date, so it returned null because the expression reads 3 words first. I tried doing this but also didn't work:"
This is a different case and it should be treated separately. One common mistake using RegEX is trying to solve all problems in a singular expression. Alteryx has a variety of tools to treat, separate and organize data, that's the whole purpose of using it.
So what I recommend is that you flag these odd records and apply a different RegEX for them.
Otherwise your expression gets too complicated, difficult to mantain and sometimes you can't even understand what's going on (even though you make it work).
WF attached.
Cheers,