Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex Parse Help

cp2019
7 - Meteor

Hi All - I am trying to parse the following text out of a field with data like the following...

 

I need to extract any code that either has 1 or 2 text characters followed by at least 4 numeric characters.

 

So below in this case it would be 'BU18292' or 'H4484' (anything bolded)

 

BU18292 - Integration Project
BU18292 - Restructuring Expenses
FINA - Integration - Finance - FINA - Integration - Finance
OAA - H4484 - Restructuring/Transaction Expenses
GK_Lab - H4484 - Dept for Dev of New Products

 

I am trying to use the Regex tool with parsing but am kind of stuck. Haven't used it in awhile.

 

Thanks in advance for your help!

8 REPLIES 8
OllieClarke
15 - Aurora
15 - Aurora

Hi @cp2019 

 

Try (\u{1,2}\d{4,})
This will look for 1 or 2 Uppercase Characters immediately followed by 4 or more digits (if you untick Case Insensitive)

 

Hope that helps

 

Ollie

echuong1
Alteryx Alumni (Retired)

The following will look for a letter followed by 4 or more numbers OR 2 letters followed by 4 or more letters:

(\w\d{4}+|\w{2}\d{4}+)

 

 

 

echuong1_0-1611332280266.png

 

Maskell_Rascal
13 - Pulsar

Hi @cp2019 

 

A little late to the party, but this is another code that will work. 

 

(\S*\d{4,})

 

Maskell_Rascal_0-1611332958278.png

If this solves your issue please mark the answer as correct, if not let me know!

 

Thanks!

Phil

 

OllieClarke
15 - Aurora
15 - Aurora

Hi @echuong1 

\w includes numbers and underscores so BU1654 or 876524 would both be parsed by that RegEx

Maskell_Rascal
13 - Pulsar

@cp2019 

 

Now that I think about this more, this code will cover you better without the need to caps or it being dependent on spaces. Adding in a quantifier to limit the non-white space to 2 will ensure that you always get two digits regardless of whether its a special character text or capital/lowercase letter. 

 

 

 

(\S{1,2}\d{4,})

 

Maskell_Rascal_0-1611333884194.png

 

Thanks!

Phil

cp2019
7 - Meteor

Thank you, I used this and works perfectly. Much appreciation to you and everyone else here. I'll mark this as the solution to account for the case sensitivity/whitespace aspect.

OllieClarke
15 - Aurora
15 - Aurora

@Maskell_Rascal respectfully, I think using \S in this instance is too broad of a choice. If it's a free text field, it parses the desired parts, but it would also parse a lot of undesired strings: '-*9872' or '981762192834' or '1.8763' would all be parsed.

If case sensitivity is an issue, then you could use \u and tick 'Case Insensitive' (as it is by default) or you could use [a-zA-Z]. 

However, this was all under the assumption that @cp2019 just wanted strings of the form 1/2 letters (unaccented?) followed by 4+ digits

Maskell_Rascal
13 - Pulsar

@OllieClarke I agree. I was only giving an alternative solution to cover a more broad range of scenarios if needed. Your original solution works perfectly! 

Labels