
Alteryx Designer Desktop Discussions

SOLVED

Modify REGEX to account for specific variations

hellyars
13 - Pulsar

 

I want to parse FIELD1 into two groups: GROUP_1 and GROUP_2.

 

My REGEX tool is set to parse.  My REGEX is:  

 

 

(^[A-Z]{3,4}).*?(\b\d{4}\b)

 

 

This works for the majority of the records, but it does not work for #4 and #7.

 

 

RecordID  FIELD1             GROUP_1  GROUP_2
1         GHY/5010           GHY      5010
2         TRY/4268           TRY      4268
3         SDE CTC/5300       SDE      5300
4         CMP/652000         CMP      6520
5         GVW/0605/J0449     GVW      0605
6         CMP/0204567M/6518  CMP      6518
7         PNA/0449C          PNA      0449
8         CMNAP/5678         CMNAP    5678

 

 

GROUP_1 is always the first letter group - as long as it is either 3 or 5 characters (not 4).

GROUP_2 is trickier.  The target is a 4 digit number group.   It will always only be a numbers group.  

 

In #4 two additional zeros have been added.  So, 4 digits followed by two zeroes and only two zeroes tells me it is okay to use the first 4-digits in Group 2.

In #7 the 4-digit code is there, but it is followed by a C.  The presence of the C tells me it's okay to ignore it and use the 4 digits before it in GROUP_2.

 

How can I modify my REGEX to account for #4 and #7?
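
For readers who want to see why #4 and #7 fail, here is a minimal sketch that runs the original expression over the sample values. It uses Python's re module as a stand-in for the Alteryx RegEx tool, which is an assumption; the two engines should agree on these simple constructs. The helper name parse and the RECORDS dictionary are illustrative only.

```python
import re

# The original expression from the post, unchanged.
ORIGINAL = re.compile(r"(^[A-Z]{3,4}).*?(\b\d{4}\b)")

# FIELD1 values from the sample table, keyed by RecordID.
RECORDS = {
    1: "GHY/5010",
    2: "TRY/4268",
    3: "SDE CTC/5300",
    4: "CMP/652000",
    5: "GVW/0605/J0449",
    6: "CMP/0204567M/6518",
    7: "PNA/0449C",
    8: "CMNAP/5678",
}


def parse(pattern, text):
    """Return the captured groups, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.groups() if match else None


original_results = {rid: parse(ORIGINAL, value) for rid, value in RECORDS.items()}

for rid, groups in original_results.items():
    print(rid, RECORDS[rid], groups)
```

The trailing \b is the culprit: a word boundary cannot sit between two digits (#4, 652000) or between a digit and a letter (#7, 0449C), so no 4-digit run qualifies in either value. In this sketch, #8 also comes back as CMNA rather than CMNAP, because {3,4} stops at four letters.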

 

4 REPLIES
echuong1
Alteryx Alumni (Retired)

Try this: (\u{3,5}).*/(\d{4}).*

 

I'm looking for 3 or 5 uppercase letters, anything, a forward slash, 4 digits, anything after. I have the 3 or 5 uppercase as a marked group to parse, as well as the first 4 digits after the forward slash.
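
A rough way to check this expression outside Alteryx is the Python sketch below. Two assumptions: Python's re module stands in for the Alteryx engine, and [A-Z] replaces \u, because Python's re does not accept \u as an uppercase-letter class. The EXPECTED values are copied from the table in the question; the other names are illustrative.

```python
import re

# Assumption: [A-Z] stands in for Alteryx's \u (uppercase letter).
FIRST_REPLY = re.compile(r"([A-Z]{3,5}).*/(\d{4}).*")

# FIELD1 -> (GROUP_1, GROUP_2) as listed in the question's table.
EXPECTED = {
    "GHY/5010": ("GHY", "5010"),
    "TRY/4268": ("TRY", "4268"),
    "SDE CTC/5300": ("SDE", "5300"),
    "CMP/652000": ("CMP", "6520"),
    "GVW/0605/J0449": ("GVW", "0605"),
    "CMP/0204567M/6518": ("CMP", "6518"),
    "PNA/0449C": ("PNA", "0449"),
    "CMNAP/5678": ("CMNAP", "5678"),
}


def parse(text):
    """Return (GROUP_1, GROUP_2), or None when the pattern does not match."""
    match = FIRST_REPLY.search(text)
    return match.groups() if match else None


results = {field1: parse(field1) for field1 in EXPECTED}

for field1, groups in results.items():
    status = "ok" if groups == EXPECTED[field1] else "MISMATCH"
    print(f"{field1:20} {groups} {status}")
```

The greedy .* before the slash is what makes #6 work: it runs to the last slash that is directly followed by four digits, so 6518 is picked rather than 0204. Note that {3,5} also accepts a 4-letter prefix, which the sample data never exercises.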

 


 

hellyars
13 - Pulsar

@echuong1 

 

Hi E,  

It works.  Using your logic, I was also able to modify my original expression to (^[A-Z]{3,4}).*(\d{4}) by removing the ? and \b.  But, I went with yours.  

QUESTION:  In the real data, I realize I need to keep the C and parse it into a 3rd group if and when it is present. How could I modify your expression to account for this?  I tried (\u{3,5}).*/(\d{4}).*|\u{3,5}.*/\d{4}.([C]).* but Alteryx just laughed. 

 

Oh, I still can't get the Intel Suite to work with my PDFs.  I think the Parallels VM tax is the biggest culprit, but I have not had a chance to test on a pure PC.

echuong1
Alteryx Alumni (Retired)

I assume you're referring to the C at the end of record 7? 

 

Try the following:  (\u{3,5}).*/(\d{4})(C)?.* 

 

It is looking for the same as before, but I included an optional marked group containing a C. The ? at the end makes it optional.
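
The optional group can be tried out with the sketch below, again using Python's re module as an assumed stand-in for Alteryx and [A-Z] in place of \u. It also runs the earlier alternation attempt for comparison. The names WITH_OPTIONAL_C, ALTERNATION_ATTEMPT, and parse are illustrative.

```python
import re

# Assumption: [A-Z] stands in for Alteryx's \u (uppercase letter).
WITH_OPTIONAL_C = re.compile(r"([A-Z]{3,5}).*/(\d{4})(C)?.*")

# The alternation attempt from the follow-up question, translated the same way.
ALTERNATION_ATTEMPT = re.compile(
    r"([A-Z]{3,5}).*/(\d{4}).*|[A-Z]{3,5}.*/\d{4}.([C]).*"
)


def parse(pattern, text):
    """Return all captured groups; groups that did not take part are None."""
    match = pattern.search(text)
    return match.groups() if match else None


with_c = parse(WITH_OPTIONAL_C, "PNA/0449C")
without_c = parse(WITH_OPTIONAL_C, "CMP/652000")
attempt = parse(ALTERNATION_ATTEMPT, "PNA/0449C")

print(with_c)     # third group holds the C
print(without_c)  # third group is None because no C follows the digits
print(attempt)    # third group is None
```

In the alternation attempt, the left-hand branch already matches every record, so the right-hand branch with the C group is never used. In Python an unmatched optional group comes back as None; how the Alteryx parse output represents it (for example as a null field) is not shown in the thread.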

 

If this resolves your issue, please mark this thread as solved, so others can find answers more easily. Thanks!

 


 

BretCarr
10 - Fireball

I like @echuong1's expression, but if you want to stay true to your “the first set of letters is either 3 or 5 but never 4,” then you should use a catch for that to keep it clear:

 

((?:[A-Z]{3}|[A-Z]{5})).*\/(\d{4})(C)?
 
I also removed the last .* since after that final optional C character, there was nothing left to bother with.
 
The OR capture block is worth memorizing. I find myself using it constantly to catch and handle obscure data. I also make it a habit to “escape” (put a backslash immediately before) all symbol characters, even if it makes the expression a little cluttered. It's just good coding that translates well between languages!
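
One thing worth checking with an OR block is the order of the alternatives. The sketch below runs the expression as posted in Python's re module (an assumed stand-in for Alteryx) and compares it with a hypothetical variant that is not from the thread: it tries the 5-letter branch first, anchors at the start, and adds a negative lookahead so that a 4-letter prefix is rejected. Whether the lookahead behaves the same way in Alteryx is an assumption to verify there.

```python
import re

# The expression exactly as posted above.
AS_POSTED = re.compile(r"((?:[A-Z]{3}|[A-Z]{5})).*\/(\d{4})(C)?")

# Hypothetical variant (not from the thread): longest branch first, anchored,
# and (?![A-Z]) refuses to stop in the middle of a longer run of letters.
VARIANT = re.compile(r"^([A-Z]{5}|[A-Z]{3})(?![A-Z]).*\/(\d{4})(C)?")


def parse(pattern, text):
    """Return the captured groups, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.groups() if match else None


# "ABCD/1234" is a made-up 4-letter value used only to probe the rule.
samples = ["CMNAP/5678", "PNA/0449C", "SDE CTC/5300", "ABCD/1234"]

posted_results = {s: parse(AS_POSTED, s) for s in samples}
variant_results = {s: parse(VARIANT, s) for s in samples}

for s in samples:
    print(f"{s:14} posted={posted_results[s]} variant={variant_results[s]}")
```

In this sketch the posted expression returns CMN for CMNAP/5678, because the 3-letter branch succeeds first and the following .* absorbs the remaining letters. It also accepts the first three letters of a 4-letter prefix. The variant returns CMNAP and gives no match for the 4-letter probe.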
 
Good luck in your endeavors! 🤓
 
 