Hi All,
I am trying to figure out how to get distinct records based on a column which has similar values. As you can see below, i have multiple rows or records but they are with similar values. i want to eliminate the duplicates from this.
I was thinking of sorting them and using muti row with regex to get an output and using unique after that but not able to find out. Pls provide your comments.
| Input | Output |
| 0.ES Data Extraction Main | Data Extraction Main |
| 0.FR Data Extraction Main | Data Extraction Main |
| 0.GB Data Extraction Main | Data Extraction Main |
| 0.IT Data Extraction Main | Data Extraction Main |
| 00 Duplicate PO Check Tool_after AGB_v2 | Duplicate PO Check Tool_after AGB |
| 00 Duplicate PO Check Tool_after AGB_v3 | Duplicate PO Check Tool_after AGB |
| 01 Top 11_InDI_PowerBI_major overhaul_v0.98 | 01 Top 11_InDI_PowerBI_major overhaul |
| 01 Top 11_InDI_PowerBI_major overhaul_v0.98 - QC | 01 Top 11_InDI_PowerBI_major overhaul |
| 01 Top 11_InDI_PowerBI_major overhaul_v0.99 | 01 Top 11_InDI_PowerBI_major overhaul |
| 01. CRM_MCCP | 01. CRM_MCCP |
| 01. CRM_MCCP_TEAM_ACTUAL | 01. CRM_MCCP_TEAM |
| 01. CRM_MCCP_TEAM_ACTUAL_DIVIDED | 01. CRM_MCCP_TEAM |
| 05 MB to CML filter 1 Part 2 | 05 MB to CML filter 1 Part 2 |
| 05 MB to CML filter 1 Part 2 - Copy | 05 MB to CML filter 1 Part 2 |
| 05 MB to CML filter 1 Part 2 - test | 05 MB to CML filter 1 Part 2 |
Solved! Go to Solution.
The challenge you have here is that you have varied prefixes for the numbers. But the pattern you appear to be wanting removed is: "#.aa" so that is easily done via Regex. I used Regex_Replace to find the pattern and remove it.
Here's the syntax: Regex_Replace([Input], "^\d+\.[A-Za-z]{2}\s", "")
Then we move on to the Suffix you want to remove. From your example it appears that you want to remove anything starting from a " - " or "_v" or “_ACTUAL” so we can find those string pieces and simply replace with a symbol like "|". You may need to add more derivatives to your fuller dataset.
From there a Text-To-Columns Tool can split the string on that symbol and you can deselect the remaining characters that come after it using the Select Tool.
Here's what all that looks like:
I've also attached the sample workflow. You can adapt the concepts and/or merge them together as you see fit. Hope this helps you move forward with your work! -Jay
