Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Regex Match to find number pattern

RMotiwalla
8 - Asteroid

Hello

 

I am trying to use the Regex match in a filter to find numbers that have dates in them. I cannot seem to find anywhere how to look for specific numbers in a specific order, just to show that there are numbers in the field. The field is set as a string, so that is not the problem. 

 

My issue is that I have invoice numbers that have dates at the end in the format yyMMDD, but they are all not going to be there, and some of the digits will be cutoff depending on how long the invoice number is (for a max of 12 characters). Here are some of the sample invoice numbers:

 

Inv#"Date"
38045021907019070
37182181905019050
969472190701190701
30011906019060
1349920288711
1926720026111
3023587988Z 
30477475341818
19927341904219042
20116831905219052
122499190501190501
10170411902119021
11045491905019050
1872620007011
11061121905319053
140045190628190628
10064671907019070
16152151906219062
30252162006020060
30233362106021060

 

As you can see, the "date" is rather inconsistent other than it will start with 1 or 2, but a few don't even have what looks to be a date. A lot of them seem to be from 2019, so I could probably capture the majority stating that the pattern I am looking for starts with 19 and may have up to 4 more digits after it, with a maximum of 12 characters in the field.

 

Is there any way to use Regex in a filter to find these? Or any other way possible?

 

Thanks!

3 REPLIES 3
apathetichell
18 - Pollux

regex tool set to parse with:

(19[01]\d{2,3})

 

gives you the bulk of it - but I do not think your data as presently constructed has a standard enough formatting to make regex full work. You could also run the above with 20 and 21 in lieu of19 via: (19[01]\d{2,3})|(20[01]\d{2,3})|(21[01]\d{2,3}) to see what you get

 

note - for months 10,11 and 12 it's a bit trickier.

RMotiwalla
8 - Asteroid

Thank you for the quick response apathetichell! I mostly understand the formula, but I do not understand what [01] means, or why it gets trickier looking for months 10, 11, and 12. Also, is there a way to split the beginning of the number from the date portion? I was able to use replace and remove the date portion, but I would like to keep it just for spot checking purposes.

 

Thanks!

apathetichell
18 - Pollux

O.k. so the number pattern doesn't follow a standard order that I can see. Ideally it would be 6 digits exactly - with a 19/20/or 21 to start it. So here's what we have:

(19[01]\d{2,3})|(20[01]\d{2,3})|(21[01]\d{2,3})

 

Note - it will create 3 different date columns in parse (which you could keep are regexoutetc... or you could rename as 2019/2020/2021 as you desire.

 

This looks for a 19, 20 or 21 to start. It has to match any of those 3 exactly. Following that we need a 0 or a 1. This is for the month column. (in retrospect it should be able to match a 10/11/12 here). The following looks for any 2 or 3 digits. This will match anything. Ideally this should be exactly 3 - so a date in the format of 190501 would be matched. I do not know if May 1st would be formatted as as that in your dataset or as 19051- which is only five digits.

 

There are a few ways to get the other characters. You can either remove the regex from the original field via if [regexout1]!=null() then replace([field1],regexout1,"") elseif [regexout2]!=null() then replace([field1],regexout2,"") else replace([regexout3],"") endif

 

or you could (.*)before each of the entries above and the part preceding the "date" would be captured. There's also probably a regex replace way but I'd have to play around to get it.

 

 

Labels