Hi All - I am trying to parse the following text out of a field with data like the following...
I need to extract any code that either has 1 or 2 text characters followed by at least 4 numeric characters.
So below in this case it would be 'BU18292' or 'H4484' (anything bolded)
BU18292 - Integration Project
BU18292 - Restructuring Expenses
FINA - Integration - Finance - FINA - Integration - Finance
OAA - H4484 - Restructuring/Transaction Expenses
GK_Lab - H4484 - Dept for Dev of New Products
I am trying to use the Regex tool with parsing but am kind of stuck. Haven't used it in awhile.
Thanks in advance for your help!
Solved! Go to Solution.
Hi @cp2019
Try (\u{1,2}\d{4,})
This will look for 1 or 2 Uppercase Characters immediately followed by 4 or more digits (if you untick Case Insensitive)
Hope that helps
Ollie
Hi @cp2019
A little late to the party, but this is another code that will work.
(\S*\d{4,})
If this solves your issue please mark the answer as correct, if not let me know!
Thanks!
Phil
Hi @echuong1
\w includes numbers and underscores so BU1654 or 876524 would both be parsed by that RegEx
Now that I think about this more, this code will cover you better without the need to caps or it being dependent on spaces. Adding in a quantifier to limit the non-white space to 2 will ensure that you always get two digits regardless of whether its a special character text or capital/lowercase letter.
(\S{1,2}\d{4,})
Thanks!
Phil
Thank you, I used this and works perfectly. Much appreciation to you and everyone else here. I'll mark this as the solution to account for the case sensitivity/whitespace aspect.
@Maskell_Rascal respectfully, I think using \S in this instance is too broad of a choice. If it's a free text field, it parses the desired parts, but it would also parse a lot of undesired strings: '-*9872' or '981762192834' or '1.8763' would all be parsed.
If case sensitivity is an issue, then you could use \u and tick 'Case Insensitive' (as it is by default) or you could use [a-zA-Z].
However, this was all under the assumption that @cp2019 just wanted strings of the form 1/2 letters (unaccented?) followed by 4+ digits
@OllieClarke I agree. I was only giving an alternative solution to cover a more broad range of scenarios if needed. Your original solution works perfectly!