Alteryx Designer Desktop Discussions

siddharthpotlia · ‎05-25-2021

Hi, I have data in the below format (In a single cell) Deutsche Boerse AG (DB); Deutsche Boerse AG (DB); Bolsa Mexicana de Valores (BMV); London Stock Exchange (LSE); Pink Sheets LLC (OTCPK); Pink Sheets LLC (OTCPK) I want to remove all duplicates in here Ideally the result shall look like: Deutsche Boerse AG (DB); Bolsa Mexicana de Valores (BMV); London Stock Exchange (LSE); Pink Sheets LLC (OTCPK) How do I do this in Alteryx, I know Regex is one way but I am not able to figure out the correct code for it. I found the a code shared in the community for this regex_replace([field],"\b(\w+),(?=.*\b\1,?)",""), but this is not working for me (Most likely the code has to be customized in line with the data but I do not know much about stack overflow at the moment, can anyone please help out in this. Thanks in advance.

mceleavey · ‎05-25-2021

Hi @siddharthpotlia ,

the unique tool is what you need after you've split the cell contents out to analyse them:

Workflow attached.

M.

atcodedog05 · ‎05-25-2021

Hi @siddharthpotlia

I know this not regex but this how i would do it.

1. Use record id as key.

2. Split fields to rows using ; as delimiter

3. Using unique tool to keep unique instance

4. Conacting it back with ; as delimiter

Hope this helps 🙂

atcodedog05 · ‎05-25-2021

Hi @siddharthpotlia

And RegEx method i had to try. Taking reference from weekly challenge https://community.alteryx.com/t5/Weekly-Challenge/Challenge-130-Is-There-an-Echo-in-Here/td-p/206817

Workflow:

TrimRight(
REGEX_Replace([Field1]+";", "(.+\);)\s.*\1", "${1}"),
";")

Hope this works 😅 & Hope this helps 🙂

vlad_kutateladze · ‎05-25-2021

@atcodedog05 Why the Record ID tool is necessary for this workflow? The Sample tool gives the same result while grouping only by Field1.

mceleavey · ‎05-25-2021

@vlad_kutateladze , I used a method that doesn't need it. You can parse out the contents to rows, unique, then use the sum tool to concatenate back to the same format.

@atcodedog05 just used a different method.

In Alteryx, there are always multiple ways to do the same thing 🙂

atcodedog05 · ‎05-25-2021

Hi @vlad_kutateladze

If you have multiple rows a key field is necessary to split and join back. RecordID is used as groupby in summarize tool (key filed for rows) for joining back.

Qiu · ‎05-25-2021

@atcodedog05
While I am trying...

siddharthpotlia · ‎05-25-2021

Thanks for your prompt response it is really helpful. One challenge over here is that I have multiple companies (~100) for which I need to run this. So the data set looks like this

Company 1	Deutsche Boerse AG (DB); Deutsche Boerse AG (DB); Bolsa Mexicana de Valores (BMV); London Stock Exchange (LSE); Pink Sheets LLC (OTCPK); Pink Sheets LLC (OTCPK)
Company 2	Botswana Share Market (BSM); Deutsche Boerse AG (DB); Deutsche Boerse AG (DB); Bolsa Mexicana de Valores (BMV); Namibian Stock Exchange (NMSE); The Johannesburg Securities Exchange (JSE); SIX Swiss Exchange (SWX); London Stock Exchange (LSE); XETRA Trading Platform (XTRA); XETRA Trading Platform (XTRA); Pink Sheets LLC (OTCPK); Pink Sheets LLC (OTCPK)
Company 3	Deutsche Boerse AG (DB); Bolsa Mexicana de Valores (BMV); London Stock Exchange (LSE); Pink Sheets LLC (OTCPK)

mceleavey · ‎05-25-2021

Hi @siddharthpotlia ,

this is a simple fix to my original, just added a Record ID field.

Alternatively (it wasn't in the data) you can simply group by the company ID in the sum tool. (EDIT: I've added this)

Workflow attached.

M.

Alteryx Designer Desktop Discussions

Remove Duplicates in a single cell