Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Finding strings with specific patterns in a text column using RegEx

byung0917
7 - Meteor

Hi, all. 

 

This is my first post asking a big favor!!!

 

I have a dataset and one of the text fields has short descriptions. The length of value varies.. from 10 to 50 characters. 

In this field, I'm trying to parse out strings with a specific format. 

 

For example, 

Data value: SP001 Mar 2022 Operating expense --> Value I need to extract SP001

Data value: Fixed asset UI001 2022 --> Value I need to extract UI001

 

Basically the values I need to parse out have one of the formats below;

  • Three characters + three digit numeric value (e.g. NOR001)
  • Two characters + three digit numeric value (e.g. CA002)
  • Three characters + "-" + three digit numeric value (e.g. NOR-002)
  • Two characters + "-" + three digit numeric value (e.g. NY-001)
  • Three characters + "-" + three digit numeric value + "-" + one digit numeric value (e.g. CAD-091-1)
  • Two characters + "-" + three digit numeric value + "-" + one digit numeric value (e.g. CAD-023-11)

Since I don't know in which part of the description this string appears, I was hoping to use multiple RegEx formula to handle this complicated task. 

 

I'm not good at using RegEx, it'll be greatly appreciated if you could help me out with this. 

Thank you the great experts in the community. Please help!!!

 

 

 

 

3 REPLIES 3
Qiu
20 - Arcturus
20 - Arcturus

@byung0917 
Can you provide a sample input so we can test?

byung0917
7 - Meteor

 

Here are some sample data. Sorry I wish I could attach Excel, but the security system in my network does not allow... 

 

DescriptionValue I need
Q1 NOR001 Mar 2022 Liability - LT - Change LiabilityNOR001
Q3 NOR001 Mar 2022  Liability - LTNOR001
NOR001 Mar 2022  Liability - LT - InterestNOR001
NOR001 Mar 2022 Liability - LT - Liability ReversalNOR001
NY012 Mar 2022 Liability - LT - InterestNY012
MO007-SB-Reclass March Prepaid against LiabilityMO007-SB
DC005-Reclass March Prepaid against LiabilityDC005
MO007-B-Reclass March Prepaid against LiabilityMO007-B
MO007-2-Reclass March Prepaid against LiabilityMO007-2
CA021-2-Reclass March Prepaid against LiabilityCA021-2
MO007-1-Reclass March Prepaid against LiabilityMO007-1
CA029-Reclass March Prepaid against LiabilityCA029
Fix MO007-4-Reclass March Prepaid against LiabilityMO007-4
Fix MO007-5-Reclass March Prepaid against LiabilityMO007-5
Fix MO007-6-Reclass March Prepaid against LiabilityMO007-6
rfoster7
9 - Comet

[A-Z]{2,3}-?\d{3}

 

This matches 2 or 3 UPPERCASE letters followed by one or zero dashes followed by three numbers. 

 

This matches most of your use-cases

 

  • Three characters + three digit numeric value (e.g. NOR001)
  • Two characters + three digit numeric value (e.g. CA002)
  • Three characters + "-" + three digit numeric value (e.g. NOR-002)
  • Two characters + "-" + three digit numeric value (e.g. NY-001)

 

it doesn't match your last two 

 

  • Three characters + "-" + three digit numeric value + "-" + one digit numeric value (e.g. CAD-091-1)
  • Two characters + "-" + three digit numeric value + "-" + one digit numeric value (e.g. CAD-023-11)

 

Which is harder. 

 

I think [A-Z]{2,3}-?\d{3}-?\d?\d? would do it, but you don't haveany test cases in your sample for that. 

 

Also you sample has MO007-B and MO007-SB. Neither of which matches your initial use cases.  

 

So then I got to 

 

[A-Z]{2,3}-?\d{3}-?[A-Z1-9]?[A-Z1-9]?

 

Which seems to work EXCEPT it adds a hyphen R on the next of rows like CA029-Reclass March Prepaid against Liability because it sees the -R as being like the rest of the patterns.

 

So, yeah. This is close but not quite there. Hopefully it gets you somewhere.  

image.pngimage.png

 

Labels