Hi, all.
This is my first post asking a big favor!!!
I have a dataset and one of the text fields has short descriptions. The length of value varies.. from 10 to 50 characters.
In this field, I'm trying to parse out strings with a specific format.
For example,
Data value: SP001 Mar 2022 Operating expense --> Value I need to extract SP001
Data value: Fixed asset UI001 2022 --> Value I need to extract UI001
Basically the values I need to parse out have one of the formats below;
Since I don't know in which part of the description this string appears, I was hoping to use multiple RegEx formula to handle this complicated task.
I'm not good at using RegEx, it'll be greatly appreciated if you could help me out with this.
Thank you the great experts in the community. Please help!!!
Solved! Go to Solution.
@byung0917
Can you provide a sample input so we can test?
Here are some sample data. Sorry I wish I could attach Excel, but the security system in my network does not allow...
Description | Value I need |
Q1 NOR001 Mar 2022 Liability - LT - Change Liability | NOR001 |
Q3 NOR001 Mar 2022 Liability - LT | NOR001 |
NOR001 Mar 2022 Liability - LT - Interest | NOR001 |
NOR001 Mar 2022 Liability - LT - Liability Reversal | NOR001 |
NY012 Mar 2022 Liability - LT - Interest | NY012 |
MO007-SB-Reclass March Prepaid against Liability | MO007-SB |
DC005-Reclass March Prepaid against Liability | DC005 |
MO007-B-Reclass March Prepaid against Liability | MO007-B |
MO007-2-Reclass March Prepaid against Liability | MO007-2 |
CA021-2-Reclass March Prepaid against Liability | CA021-2 |
MO007-1-Reclass March Prepaid against Liability | MO007-1 |
CA029-Reclass March Prepaid against Liability | CA029 |
Fix MO007-4-Reclass March Prepaid against Liability | MO007-4 |
Fix MO007-5-Reclass March Prepaid against Liability | MO007-5 |
Fix MO007-6-Reclass March Prepaid against Liability | MO007-6 |
[A-Z]{2,3}-?\d{3}
This matches 2 or 3 UPPERCASE letters followed by one or zero dashes followed by three numbers.
This matches most of your use-cases
it doesn't match your last two
Which is harder.
I think [A-Z]{2,3}-?\d{3}-?\d?\d? would do it, but you don't haveany test cases in your sample for that.
Also you sample has MO007-B and MO007-SB. Neither of which matches your initial use cases.
So then I got to
[A-Z]{2,3}-?\d{3}-?[A-Z1-9]?[A-Z1-9]?
Which seems to work EXCEPT it adds a hyphen R on the next of rows like CA029-Reclass March Prepaid against Liability because it sees the -R as being like the rest of the patterns.
So, yeah. This is close but not quite there. Hopefully it gets you somewhere.