Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

How to get distinct records based on the string with similar looking values

sundarp
5 - Atom

Hi All,

 

I am trying to figure out how to get distinct records based on a column which has similar values. As you can see below, i have multiple rows or records but they are with similar values. i want to eliminate the duplicates from this.

 

I was thinking of sorting them and using muti row with regex to get an output and using unique after that but not able to find out. Pls provide your comments.

 

InputOutput
0.ES Data Extraction MainData Extraction Main
0.FR Data Extraction MainData Extraction Main
0.GB Data Extraction MainData Extraction Main
0.IT Data Extraction MainData Extraction Main
00 Duplicate PO Check Tool_after AGB_v2Duplicate PO Check Tool_after AGB
00 Duplicate PO Check Tool_after AGB_v3Duplicate PO Check Tool_after AGB
01  Top 11_InDI_PowerBI_major overhaul_v0.9801  Top 11_InDI_PowerBI_major overhaul
01  Top 11_InDI_PowerBI_major overhaul_v0.98 - QC01  Top 11_InDI_PowerBI_major overhaul
01  Top 11_InDI_PowerBI_major overhaul_v0.9901  Top 11_InDI_PowerBI_major overhaul
01. CRM_MCCP01. CRM_MCCP
01. CRM_MCCP_TEAM_ACTUAL01. CRM_MCCP_TEAM
01. CRM_MCCP_TEAM_ACTUAL_DIVIDED01. CRM_MCCP_TEAM
05 MB to CML filter 1 Part 205 MB to CML filter 1 Part 2
05 MB to CML filter 1 Part 2 - Copy05 MB to CML filter 1 Part 2
05 MB to CML filter 1 Part 2 - test05 MB to CML filter 1 Part 2

 

2 REPLIES 2
jrlindem
12 - Quasar

The challenge you have here is that you have varied prefixes for the numbers.  But the pattern you appear to be wanting removed is: "#.aa" so that is easily done via Regex.  I used Regex_Replace to find the pattern and remove it.

 

Here's the syntax:  Regex_Replace([Input], "^\d+\.[A-Za-z]{2}\s", "")

 

Then we move on to the Suffix you want to remove.  From your example it appears that you want to remove anything starting from a " - " or "_v" or “_ACTUAL” so we can find those string pieces and simply replace with a symbol like "|".  You may need to add more derivatives to your fuller dataset.

 

From there a Text-To-Columns Tool can split the string on that symbol and you can deselect the remaining characters that come after it using the Select Tool.

 

Here's what all that looks like:

 

jrlindem_1-1763139689402.png

 

 

I've also attached the sample workflow.  You can adapt the concepts and/or merge them together as you see fit.  Hope this helps you move forward with your work!  -Jay

flying008
15 - Aurora

Hi, @sundarp 

 

FYI.

Spoiler
REGEX_Replace([Input], '^\d\.\w+?\s|_(?:v|ACTUAL).*?$|\s-.+?$', '')

录制_2025_11_15_09_50_36_657.gif

 

 

 

Labels
Top Solution Authors