Alteryx Designer Desktop Discussions

SOLVED

Remove Duplicates within a cell?

HM
8 - Asteroid

Hi,

 

I've summarised my data using concat because I want to keep all unique values but it also repeats duplicates - is there any way to remove duplicates within a cell? 

 

For example: change "A,A,A,B,B,C,D,E" to "A,B,C,D,E"?

 

Thanks

11 REPLIES
MarqueeCrew
20 - Arcturus

That was an interesting puzzle.  Thanks for giving me some work to do while waiting on a plane.

 

regex_replace([field],"\b(\w+),(?=.*\b\1,?)","")

 

This will get you your result of A,B,C,D,E

 

How did I figure it out?  Google:

 

http://stackoverflow.com/questions/3309805/what-regular-expression-can-remove-duplicate-items-from-a...
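
For anyone who wants to try the pattern outside Designer, here is a minimal Python sketch of the same expression. It assumes Python's `re` module handles this pattern the same way Designer's regex engine does, which should hold for this simple input but is not guaranteed in general. The function name `dedupe_single_words` is made up for illustration.

```python
import re

# Same pattern as the expression above: a word followed by a comma is dropped
# whenever the same text appears again later in the string at a word boundary.
PATTERN = re.compile(r"\b(\w+),(?=.*\b\1,?)")

def dedupe_single_words(cell: str) -> str:
    """Remove earlier copies of a repeated word, keeping the last one."""
    return PATTERN.sub("", cell)

print(dedupe_single_words("A,A,A,B,B,C,D,E"))  # A,B,C,D,E
```

Two things to keep in mind: `\w+` only matches letters, digits and underscores, so this is aimed at single-word values, and because the lookahead does not require the later match to be a whole value, an input like "A,AB" is reduced to "AB".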

joe_strellis
7 - Meteor

I was also wondering if it would be easier for you to add a Unique tool before your Summarize tool?

Should be a little more efficient too if you have a lot of data.

paul_houghton
12 - Quasar

If you didn't want to get into the RegEx, another option would be to break the string out into individual rows using the Text to Columns tool, add a Unique tool after it, and then use a Cross Tab tool to return the rows to a single string.
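
The split, unique, rejoin idea can be sketched in Python as follows. This is only an illustration of the logic, not of the Designer tools themselves; the function name `unique_in_cell` and the default separator are assumptions.

```python
def unique_in_cell(cell: str, sep: str = ",") -> str:
    """Split a delimited cell, drop repeated values, and join it back."""
    parts = cell.split(sep)                    # like Text to Columns (split to rows)
    unique_parts = list(dict.fromkeys(parts))  # like Unique, keeping first occurrences
    return sep.join(unique_parts)              # like concatenating the rows again

print(unique_in_cell("A,A,A,B,B,C,D,E"))  # A,B,C,D,E
```

Unlike the regex, this compares whole values, so it does not care whether a value contains spaces or punctuation.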

HM
8 - Asteroid

Hi Joe,

 

There was only one character for each record before it was summarised, but after it was summarised (concatenated), the characters were combined into one string regardless of whether they were unique or duplicate values.

 

Thanks,

Heidi

joe_strellis
7 - Meteor

Hi Heidi,

 

I am not completely sure that I follow....

 

 

In the stream, before the Summarize tool that concatenates them together, couldn't you add a Unique tool? Check the field you are concatenating (and quite probably another field you are grouping by).

This should then contain the list you require without the need for RegEx. Not that the result will be any different, as Mark's function works; it would just hopefully be a little more efficient.

 

Thanks

Joe
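
The "deduplicate before you concatenate" suggestion can be sketched in Python like this. The record layout (a group key plus one value per row) and the name `concat_unique` are assumptions made for the example, not something taken from the original workflow.

```python
from collections import defaultdict

def concat_unique(rows, sep=","):
    """Concatenate values per group, skipping values already seen in that group."""
    # A dict per group acts as an insertion-ordered set of values.
    seen = defaultdict(dict)
    for group, value in rows:
        seen[group][value] = None
    return {group: sep.join(values) for group, values in seen.items()}

# Hypothetical records: (group key, single-character value)
rows = [("x", "A"), ("x", "A"), ("x", "B"), ("y", "C"), ("y", "C")]
print(concat_unique(rows))  # {'x': 'A,B', 'y': 'C'}
```

Because duplicates are dropped per group before joining, no clean-up of the concatenated string is needed afterwards.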

dan1
5 - Atom

Hello,

 

I tried using this expression to remove duplicate values within the same cell after concatenating rows. It does not seem to be working though. How would I modify this to remove strings that are multiple words and not just single character strings?

 

regex_replace([field],"\b(\w+),(?=.*\b\1,?)","")
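
A likely reason this fails on multi-word values is that `\w+` cannot cross a space, so only the last word before each comma is treated as the value. The Python sketch below reproduces that effect with the `re` module; Designer's engine may not produce exactly the same output, so treat it as an illustration of the cause rather than of Designer's behaviour.

```python
import re

# The original single-word pattern, unchanged.
PATTERN = re.compile(r"\b(\w+),(?=.*\b\1,?)")

cell = "8985 VENICE BLVD,8985 VENICE BLVD"
result = PATTERN.sub("", cell)

# Only "BLVD," is matched and removed, because "8985" and "VENICE" are each
# followed by a space rather than a comma.
print(result)  # 8985 VENICE 8985 VENICE BLVD
```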

iceman
6 - Meteoroid

Hello,

 

I have the same issue after using the same formula. regex_replace([field],"\b(\w+),(?=.*\b\1,?)","")

 

The good news is that the commas were removed, but the duplicate values inside the cell were not.

 

From

8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD

 

Output I got is:

8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD

 

Thank you

Bbrelo1
5 - Atom

Give this a try: regex_replace([Field1],'([^,]*)(\,\1)*(\,|$)','\1\3')

 

It worked for what I needed and when I ran it against your example I got: 8985 VENICE BLVD

 

Which I think is what you are looking for.
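
Here is the same expression as a Python sketch, assuming Python's `re` module behaves like Designer's engine for this pattern (the escaped comma `\,` is written as a plain `,`, which is equivalent). The name `collapse_adjacent` is made up for the example.

```python
import re

# A value, any number of immediate repeats of it, then a comma or end of string.
PATTERN = re.compile(r"([^,]*)(,\1)*(,|$)")

def collapse_adjacent(cell: str) -> str:
    """Collapse runs of identical neighbouring values into a single value."""
    return PATTERN.sub(r"\1\3", cell)

print(collapse_adjacent("8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD"))
# 8985 VENICE BLVD
print(collapse_adjacent("CEN,NE,CEN,NE"))
# CEN,NE,CEN,NE
```

As the second call shows, only duplicates that sit next to each other are collapsed, so the values need to be sorted or grouped first if repeats can be separated by other values.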

phoebe_kelley
9 - Comet

I am having trouble with this formula. It seems to remove the commas (my delimiter) in all but the first set. I need to use the regex replace formula because I have multiple fields to edit and want to use the multi-field formula tool.

 

I changed it from "\w+" to ".+" since I have some non-alphanumeric characters in my data. I don't care about the order of the output, just that the values are unique. I can change the delimiter from comma to something else if that works better.

 

This is what I'm currently using: regex_replace([Regions Submitted],"\b(.+),(?=.*\b\1,?)",""). Sample data is attached.

 

Here is how I would like the formula to work:

input: CEN,NE,CEN,NE,CEN,NE,RM,RM,RM,RM,RM,RM

output: CEN, NE, RM (order doesn't matter here, as long as the three unique values are present, separated by a comma or other delimiter)

 

input: ICE CREAM BLACK SESAME,ICE CREAM BLACK SESAME

output: ICE CREAM BLACK SESAME

 

input: BIOSIL® HAIR, SKIN, NAILS

output: BIOSIL® HAIR, SKIN, NAILS

 

input: HUMPHRY SLOCOMBE | HMSLCM | 129921,HUMPHRY SLOCOMBE | HMSLCM | 129921

output: HUMPHRY SLOCOMBE | HMSLCM | 129921

 

input: email.last@test.com,email.last@test.com,email2.last@test.com

output: email.last@test.com,email2.last@test.com
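
One problem with the `.+` variant is that `.` also matches commas, so the capture group can span several values at once. A possible alternative, sketched below in Python, captures one whole comma-delimited value with `[^,]+` and removes it only if the identical whole value appears again later. The pattern and the name `dedupe_values` are my own assumptions; the sketch was written against Python's `re` module and has not been tried in Designer.

```python
import re

# A whole value (anchored at the start of the string or just after a comma),
# followed by a comma, is removed if the same whole value occurs again later.
PATTERN = re.compile(r"(?:^|(?<=,))([^,]+),(?=(?:.*,)?\1(?:,|$))")

def dedupe_values(cell: str) -> str:
    """Keep only the last occurrence of each comma-delimited value."""
    return PATTERN.sub("", cell)

samples = [
    "CEN,NE,CEN,NE,CEN,NE,RM,RM,RM,RM,RM,RM",
    "ICE CREAM BLACK SESAME,ICE CREAM BLACK SESAME",
    "BIOSIL® HAIR, SKIN, NAILS",
    "email.last@test.com,email.last@test.com,email2.last@test.com",
]
for sample in samples:
    print(dedupe_values(sample))
```

Values are compared exactly, including any leading space, which is why "BIOSIL® HAIR, SKIN, NAILS" passes through unchanged: its three pieces all differ.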
