Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex Parse Help

cp2019
7 - Meteor

Hi All - I am trying to parse the following text out of a field with data like the following...

 

I need to extract any code that either has 1 or 2 text characters followed by at least 4 numeric characters.

 

So below in this case it would be 'BU18292' or 'H4484' (anything bolded)

 

BU18292 - Integration Project
BU18292 - Restructuring Expenses
FINA - Integration - Finance - FINA - Integration - Finance
OAA - H4484 - Restructuring/Transaction Expenses
GK_Lab - H4484 - Dept for Dev of New Products

 

I am trying to use the Regex tool with parsing but am kind of stuck. Haven't used it in awhile.

 

Thanks in advance for your help!

8 REPLIES 8
OllieClarke
15 - Aurora
15 - Aurora

Hi @cp2019 

 

Try (\u{1,2}\d{4,})
This will look for 1 or 2 Uppercase Characters immediately followed by 4 or more digits (if you untick Case Insensitive)

 

Hope that helps

 

Ollie

echuong1
Alteryx Alumni (Retired)

The following will look for a letter followed by 4 or more numbers OR 2 letters followed by 4 or more letters:

(\w\d{4}+|\w{2}\d{4}+)

 

 

 

echuong1_0-1611332280266.png

 

Maskell_Rascal
13 - Pulsar

Hi @cp2019 

 

A little late to the party, but this is another code that will work. 

 

(\S*\d{4,})

 

Maskell_Rascal_0-1611332958278.png

If this solves your issue please mark the answer as correct, if not let me know!

 

Thanks!

Phil

 

OllieClarke
15 - Aurora
15 - Aurora

Hi @echuong1 

\w includes numbers and underscores so BU1654 or 876524 would both be parsed by that RegEx

Maskell_Rascal
13 - Pulsar

@cp2019 

 

Now that I think about this more, this code will cover you better without the need to caps or it being dependent on spaces. Adding in a quantifier to limit the non-white space to 2 will ensure that you always get two digits regardless of whether its a special character text or capital/lowercase letter. 

 

 

 

(\S{1,2}\d{4,})

 

Maskell_Rascal_0-1611333884194.png

 

Thanks!

Phil

cp2019
7 - Meteor

Thank you, I used this and works perfectly. Much appreciation to you and everyone else here. I'll mark this as the solution to account for the case sensitivity/whitespace aspect.

OllieClarke
15 - Aurora
15 - Aurora

@Maskell_Rascal respectfully, I think using \S in this instance is too broad of a choice. If it's a free text field, it parses the desired parts, but it would also parse a lot of undesired strings: '-*9872' or '981762192834' or '1.8763' would all be parsed.

If case sensitivity is an issue, then you could use \u and tick 'Case Insensitive' (as it is by default) or you could use [a-zA-Z]. 

However, this was all under the assumption that @cp2019 just wanted strings of the form 1/2 letters (unaccented?) followed by 4+ digits

Maskell_Rascal
13 - Pulsar

@OllieClarke I agree. I was only giving an alternative solution to cover a more broad range of scenarios if needed. Your original solution works perfectly! 

Labels