Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Remove duplicate words from a field

Brad1
11 - Bolide

Need some help.

 

So I have a field that look like this:

 

Lang_Spoken

ENG, SPA, OTH, CHI, JPN, OTH, OTH

 

We are keeping track of more languages than the Client is asking for - so every time a language is return that isn't on their list it goes down as "OTH". 

 

How can I best clean this field of duplicate "OTH" or any other dup for that matter.

 

Thanks for any help.

 

5 REPLIES 5
BenMoss
ACE Emeritus
ACE Emeritus

First, record ID each row. You can use the 'text to columns' tool, set your delimiter as , and choose the mode 'split to rows'.

 

You can then unique on the 'Record ID' field and the 'Lang_Spoken' field.

 

Finally, to bring them back onto a single line you can use the summerize tool, grouping by your ID field and concatting your 'Lang_Spoken' field. Also group by any other fields you may have in your data.

 

Give this a go, and let us know if it works!

 

Ben

Claje
14 - Magnetar

Hi,


The easiest way (I know of) to do this is to Use Text to Columns with the setting (output as rows).  Then, you can use the Unique tool to limit to one of each unique value, and the Summarize tool to concatenate the string back together.

 

If you run into issues with this i can put together a quick example!

Brad1
11 - Bolide

Never mind.

 

2018-07-12_11-13-16.jpg

NJT
11 - Bolide

Worked great for me with a similar issue!

OmaratRJL
7 - Meteor

Is there a more "compact way to do this? What if there are 50 potential fields. Is there a way that we can use regex to walk through the string and say example:

 

SYNWH01,SYNWH02,SYNWH4,SYNWH5,SYNWH6,SYNWH7,SYNWH10,SYNWH14,SYNWH15,SYNWH01,SYNWH02,SYNWH4,SYNWH5,SYNWH6,SYNWH7,SYNWH10,SYNWH14,SYNWH15

 

then produces something like this:

 

SYNWH01,SYNWH02,SYNWH4,SYNWH5,SYNWH6,SYNWH7,SYNWH10,SYNWH14,SYNWH15,SYNWH01,SYNWH02,SYNWH4,SYNWH5,SYNWH6,SYNWH7,SYNWH10,SYNWH14,SYNWH15

 

My fix to this was to put in a unique to get rid of the duplicates but my regex limited mind wants to know if there is a regex fix to this?

Labels