Alteryx Designer Desktop Discussions

cp2019 · 01-22-2021

Hi All - I am trying to parse a code out of a field with data like the following.

I need to extract any code that has either 1 or 2 text characters followed by at least 4 numeric characters.

So in the sample below, that would be 'BU18292' or 'H4484'.

BU18292 - Integration Project
BU18292 - Restructuring Expenses
FINA - Integration - Finance - FINA - Integration - Finance
OAA - H4484 - Restructuring/Transaction Expenses
GK_Lab - H4484 - Dept for Dev of New Products

I am trying to use the Regex tool to parse this but am kind of stuck. I haven't used it in a while.

Thanks in advance for your help!

OllieClarke · 01-22-2021

Hi @cp2019

Try (\u{1,2}\d{4,})
This will look for 1 or 2 uppercase letters immediately followed by 4 or more digits (if you untick Case Insensitive).
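To try this outside Designer, here is a minimal Python sketch, assuming Python's `re` module behaves like the Regex tool for a simple pattern like this one. Python has no `\u` uppercase class (in a Python pattern `\u` starts a Unicode escape instead), so `[A-Z]` stands in for it. That mirrors the Case Insensitive unticked behavior, and unlike `\u` it only covers unaccented A to Z. `UPPER_CODE_RE` and `parse_code` are hypothetical names, and `re.search(...).group(1)` approximates taking the first marked group from each row.

```python
import re

# Sample values copied from the original question.
SAMPLES = [
    "BU18292 - Integration Project",
    "BU18292 - Restructuring Expenses",
    "FINA - Integration - Finance - FINA - Integration - Finance",
    "OAA - H4484 - Restructuring/Transaction Expenses",
    "GK_Lab - H4484 - Dept for Dev of New Products",
]

# [A-Z] stands in for \u (an uppercase letter), i.e. Case Insensitive unticked.
UPPER_CODE_RE = re.compile(r"([A-Z]{1,2}\d{4,})")


def parse_code(text, pattern=UPPER_CODE_RE):
    """Hypothetical helper: return the first marked group, or None if no match."""
    match = pattern.search(text)
    return match.group(1) if match else None


results = [parse_code(s) for s in SAMPLES]
print(results)  # ['BU18292', 'BU18292', None, 'H4484', 'H4484']
```

The FINA row has no digits at all, so it comes back empty (None here).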

Hope that helps

Ollie

echuong1 · 01-22-2021

The following will look for a letter followed by 4 or more digits OR 2 letters followed by 4 or more digits:

(\w\d{4,}|\w{2}\d{4,})

Maskell_Rascal · 01-22-2021

Hi @cp2019

A little late to the party, but here is another pattern that will work.

(\S*\d{4,})
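A quick Python sketch of this pattern, assuming `re` behaves like the Regex tool here; `ANY_PREFIX_RE`, `parse_code`, and the input "Project ABC12345 - Budget" are made up for illustration. It shows that `\S*` stops at spaces but has no upper bound on how many non-space characters it takes before the digits.

```python
import re

# Any run of non-whitespace characters, then 4 or more digits.
ANY_PREFIX_RE = re.compile(r"(\S*\d{4,})")


def parse_code(text):
    """Hypothetical helper: return the first marked group, or None if no match."""
    match = ANY_PREFIX_RE.search(text)
    return match.group(1) if match else None


print(parse_code("OAA - H4484 - Restructuring/Transaction Expenses"))  # H4484
print(parse_code("GK_Lab - H4484 - Dept for Dev of New Products"))     # H4484
# \S* has no upper bound, so a longer prefix comes along too (made-up input):
print(parse_code("Project ABC12345 - Budget"))                         # ABC12345
```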

If this solves your issue, please mark the answer as correct; if not, let me know!

Thanks!

Phil

OllieClarke · 01-22-2021

Hi @echuong1

\w includes digits and underscores as well as letters, so both BU1654 and 876524 would be parsed by that RegEx.
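A small Python sketch of the point about `\w`, assuming `re` treats `\w` the same way here (letters, digits, underscore). The alternation is written with `{4,}`, the 4-or-more quantifier the post describes, and compared against a letters-only version; `WORD_CHAR_RE`, `LETTERS_RE`, `first_group`, and the input "_12345" are hypothetical.

```python
import re

# echuong1's alternation, written with the 4-or-more quantifier {4,}.
WORD_CHAR_RE = re.compile(r"(\w\d{4,}|\w{2}\d{4,})")
# Letters-only version for comparison.
LETTERS_RE = re.compile(r"([A-Za-z]{1,2}\d{4,})")


def first_group(pattern, text):
    """Return the first marked group, or None if the pattern finds nothing."""
    match = pattern.search(text)
    return match.group(1) if match else None


for text in ["BU1654", "876524", "_12345"]:
    print(text, first_group(WORD_CHAR_RE, text), first_group(LETTERS_RE, text))
# BU1654 BU1654 BU1654
# 876524 876524 None
# _12345 _12345 None
```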

Maskell_Rascal · 01-22-2021

@cp2019

Now that I think about this more, this pattern will cover you better, since it doesn't depend on capitalization or on spaces. Adding a quantifier that limits the non-whitespace characters to 2 ensures the match has at most two characters before the digits, whether they are special characters or uppercase/lowercase letters.

(\S{1,2}\d{4,})
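The same kind of Python check for the bounded version, again assuming `re` behaves like the Regex tool here; `BOUNDED_RE`, `parse_code`, and the inputs "bu18292 - lowercase" and "Project ABC12345 - Budget" are made up. Lowercase codes now come through, and the last line shows that `{1,2}` caps the prefix rather than requiring the code to start at the beginning of a word.

```python
import re

# One or two non-whitespace characters, then 4 or more digits.
BOUNDED_RE = re.compile(r"(\S{1,2}\d{4,})")


def parse_code(text):
    """Hypothetical helper: return the first marked group, or None if no match."""
    match = BOUNDED_RE.search(text)
    return match.group(1) if match else None


print(parse_code("BU18292 - Integration Project"))                  # BU18292
print(parse_code("bu18292 - lowercase"))                            # bu18292
print(parse_code("GK_Lab - H4484 - Dept for Dev of New Products"))  # H4484
print(parse_code("Project ABC12345 - Budget"))                      # BC12345
```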

Thanks!

Phil

cp2019 · 01-22-2021

Thank you, I used this and it works perfectly. Much appreciation to you and everyone else here. I'll mark this as the solution to account for the case sensitivity/whitespace aspect.

OllieClarke · 01-22-2021

@Maskell_Rascal respectfully, I think using \S in this instance is too broad a choice. If it's a free text field, it would parse the desired parts, but it would also parse a lot of undesired strings: '-*9872' or '981762192834' or '1.8763' would all be parsed.
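To make the comparison concrete, here is a minimal Python sketch, assuming `re` matches these patterns the way the Regex tool does. It runs the three unwanted strings from the post through the `\S{1,2}` pattern and a letters-only alternative; `BROAD_RE`, `LETTERS_RE`, and `first_group` are hypothetical names.

```python
import re

BROAD_RE = re.compile(r"(\S{1,2}\d{4,})")          # any non-whitespace prefix
LETTERS_RE = re.compile(r"([A-Za-z]{1,2}\d{4,})")  # letters-only prefix


def first_group(pattern, text):
    """Return the first marked group, or None if the pattern finds nothing."""
    match = pattern.search(text)
    return match.group(1) if match else None


unwanted = ["-*9872", "981762192834", "1.8763"]
broad_hits = [first_group(BROAD_RE, s) for s in unwanted]
letter_hits = [first_group(LETTERS_RE, s) for s in unwanted]
print(broad_hits)   # ['-*9872', '981762192834', '1.8763']
print(letter_hits)  # [None, None, None]
```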

If case sensitivity is an issue, then you could use \u and tick 'Case Insensitive' (as it is by default) or you could use [a-zA-Z].
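In Python terms (an assumption: `re.IGNORECASE` playing the role of the Case Insensitive checkbox, and `[A-Z]` standing in for `\u`), the two options look like this; `IGNORECASE_RE` and `EXPLICIT_RE` are hypothetical names.

```python
import re

# Option 1: uppercase class plus a case-insensitive flag.
IGNORECASE_RE = re.compile(r"([A-Z]{1,2}\d{4,})", re.IGNORECASE)
# Option 2: spell out both cases in the character class.
EXPLICIT_RE = re.compile(r"([a-zA-Z]{1,2}\d{4,})")

for text in ["BU18292 - Integration Project", "bu18292 - lowercase"]:
    a = IGNORECASE_RE.search(text)
    b = EXPLICIT_RE.search(text)
    print(a.group(1), b.group(1))
# BU18292 BU18292
# bu18292 bu18292
```

Both reject the digits-only and punctuation cases, since neither class admits anything but letters before the digits.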

However, this was all under the assumption that @cp2019 just wanted strings of the form 1 or 2 letters (unaccented?) followed by 4 or more digits.
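On the "(unaccented?)" question: `[A-Za-z]` only covers unaccented ASCII letters. A minimal Python sketch, assuming Python 3's Unicode-aware `\w` (not necessarily how the Regex tool treats accented characters), uses `[^\W\d_]` to mean "a word character that is not a digit or underscore", i.e. any letter. The input "É12345" and both pattern names are hypothetical.

```python
import re

ASCII_RE = re.compile(r"([A-Za-z]{1,2}\d{4,})")
# [^\W\d_]: word characters minus digits and underscore, i.e. letters.
UNICODE_LETTER_RE = re.compile(r"([^\W\d_]{1,2}\d{4,})")

text = "É12345"
print(ASCII_RE.search(text))                    # None
print(UNICODE_LETTER_RE.search(text).group(1))  # É12345
```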

Maskell_Rascal · 01-22-2021

@OllieClarke I agree. I was only giving an alternative solution to cover a broader range of scenarios if needed. Your original solution works perfectly!
