I get the below data from a system which includes 3 fields I need to parse out, but they don't always appear in the same place. Two of the fields are the exact same size but one of the two fields is not always present. I've uploaded the sample data and the desired parsed results, color coded for where they appear in the string. I can pretty easily get cost center from first 9 digits or No Cost Center just from regular formula, but I'm struggling with getting the other 2 fields because of the same size.
Payment |
0650/1596_SN00-5301311 |
0650/1899_SN00-5309104--No_ARIO_Number |
No_Cost_Center-Need_ARIO_Number--1034664-10475100_0_1987001_SN00 |
0650/1574_SN00-7301181--No_ARIO_Number |
0650/1645_SN00-1987001 |
0650/1645_SN00-5309104 |
0650/1575_SN00-5301311 |
No_Cost_Center-Need_ARIO_Number--1027660-10369066_0_1987001_SN00 |
0650/1748_SN00-5301311 |
0650/1748_SN00-5309104--No_ARIO_Number |
0650/1631_SN00-5301809--No_ARIO_Number |
No_Cost_Center-Need_ARIO_Number--1028736-10123404_0_1987001_SN00 |
0650/1688_SN00-5301809--No_ARIO_Number |
0650/1591_SN00-6204301--No_ARIO_Number |
No_Cost_Center-Need_ARIO_Number-HDQ-1036659-10475065_0_1987001_SN00 |
0650/1871_SN00-5302300 |
Solved! Go to Solution.
Hi @182129 ,
I tried to follow the format as much as possible, but still not sure if it works for larger volume.
Please adjust the formula tool if needed.
Workflow
RegEx tool configuration
Formula tool expression
[HasAR#] = REGEX_Match([GL], ".*_\d{7}_.*")
[GL] = IF ![HasAR#] THEN [AR#] ELSE REGEX_Replace([GL], ".*_(\d{7})_.*", "$1") ENDIF
[AR#] = IF [HasAR#] THEN [AR#] ELSE Null() ENDIF
[Cost Center] = IF StartsWith([Cost Center], "No") THEN "No_Cost_Center" ELSE Replace([Cost Center], "_SN00", "") ENDIF
Output
I took a slightly different approach.
I thought it might be easier to split the data into the three (3) data configurations present and tackle each case individually.
This might offer a little more flexibility should other patterns appear.
As you know, the Record ID field will let you split, process in groups, and re-join each record.
I hope this helps.
@hellyars I like this approach but because we're on version 2020.3 of Alteryx I can't import your solution. Would you mind posting the configurations?
@182129will do. back in a few.
Cant downgrade...but this should help.
Reference my image above. The steps below will reference the name of the discussion boxes that contain the RegEx tools.
0. Add a Record ID tool.
1. Set the first Filter Tool to filter Payments, Contains, No_ARIO.
2. Connect the True output to a RegEx tool in the No_ARIO box and configure it to parse Payment using the following configuration.
^(\d{4}\/\d{4}).*?SN00-(\d{7})
3. Connect the False output from the first Filter tool to a second Filter tool.
Configure this second Filter tool to filter Payment, Contains, No_Cost.
4. From the second Filter tool, connect the True output to a RegEx tool in the No_Cost_Need Present box and configure it to parse Payment as follows.
-(\d{7})-(\d{8})_0_(\d{7})
5. From the second Filter tool, connect the False output to a third (and last) RegEx tool in the Cost Center Present box and configure it to parse Payment using the following configuration:
(\d{4}\/\d{4})_.*-(\d{7})
6. Add a Join tool to bring all the streams back together and then Sort using Record ID.
@hellyars Do I need Record ID if the final sort order doesn't matter?
Technically you do not need it here, but it is a good practice to include it.
You never know when it might come in handy.