
Alteryx designer Discussions

SOLVED

Remove Duplicates within a cell?

Asteroid

Hi,

 

I've summarised my data using concat because I want to keep all unique values but it also repeats duplicates - is there any way to remove duplicates within a cell? 

 

For example: change "A,A,A,B,B,C,D,E" to "A,B,C,D,E"?

 

Thanks

Alteryx Certified Partner

That was an interesting puzzle.  Thanks for giving me some work to do while waiting on a plane.

 

regex_replace([field],"\b(\w+),(?=.*\b\1,?)","")

 

This will get you your result of A,B,C,D,E
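For anyone who wants to check the pattern outside Alteryx, here is a minimal sketch using Python's `re` module. That is an assumption made for illustration: Alteryx's regex engine may behave differently in edge cases, and the function name `dedupe_lookahead` is hypothetical.

```python
import re

# Same pattern as the Alteryx expression above, run through Python's `re`
# engine (an assumption: Alteryx's own engine may differ in edge cases).
PATTERN = re.compile(r"\b(\w+),(?=.*\b\1,?)")


def dedupe_lookahead(text: str) -> str:
    """Remove each 'item,' that is followed later in the string by the same item."""
    return PATTERN.sub("", text)


print(dedupe_lookahead("A,A,A,B,B,C,D,E"))  # A,B,C,D,E
print(dedupe_lookahead("A,B,A"))            # B,A  (the last occurrence is the one kept)
print(dedupe_lookahead("A,AB"))             # AB   (no boundary after \1, so "A" matches the start of "AB")
```

Two things worth noting from the sketch: the pattern keeps the last occurrence of each item rather than the first, and because the lookahead has no boundary after `\1`, a short item can be dropped when a longer item later in the string starts with the same characters.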

 

How did I figure it out?  Google:

 

http://stackoverflow.com/questions/3309805/what-regular-expression-can-remove-duplicate-items-from-a...

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and reboot. Order shall return.
Alteryx Partner

Was also wondering if it would be easier for you to add a unique before your summarize? 

Should be a little more efficient too if you have a lot of data.
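The idea of removing duplicates before concatenating can be sketched in Python. This is only an illustration of the logic, not an Alteryx workflow: the `rows` data and the function name `concat_unique` are hypothetical stand-ins for the records going into the Summarize tool.

```python
# Hypothetical (group key, value) rows, standing in for the records
# that exist before the Summarize tool concatenates them.
rows = [("x", "A"), ("x", "A"), ("x", "B"), ("y", "C"), ("y", "C")]


def concat_unique(rows):
    """Concatenate values per key, keeping only the first occurrence of each value."""
    grouped = {}
    for key, value in rows:
        # An inner dict acts as an ordered set: re-inserting a value is a no-op.
        grouped.setdefault(key, {})[value] = None
    return {key: ",".join(values) for key, values in grouped.items()}


print(concat_unique(rows))  # {'x': 'A,B', 'y': 'C'}
```

Because the duplicates never reach the concatenation step, no regex clean-up is needed afterwards.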

Alteryx Certified Partner

If you didn't want to get into the Regex, another option would be to break the string out into individual rows using the Text to Columns tool. From there, add a Unique tool after it, then Cross Tab back to return the rows to a string.
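The same split, de-duplicate, rejoin logic looks like this as a minimal Python sketch. It mirrors the steps only and says nothing about how the Alteryx tools are configured; the function name `dedupe_items` is hypothetical.

```python
def dedupe_items(cell: str, sep: str = ",") -> str:
    """Split a delimited cell, drop repeated items, and join it back together."""
    items = cell.split(sep)            # break the string out into individual items
    unique = dict.fromkeys(items)      # keep the first occurrence of each, in order
    return sep.join(unique)            # return the items to a single string


print(dedupe_items("A,A,A,B,B,C,D,E"))  # A,B,C,D,E
print(dedupe_items("A,B,A"))            # A,B
```

Since each item is compared as a whole, this works for multi-word values and for duplicates that are not next to each other.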

Asteroid

Hi Joe,

 

There was only one character for each record before it was summarised, but after it was summarised (concatenated), the characters were combined into one string regardless of whether they were unique or duplicate values.

 

Thanks,

Heidi

Alteryx Partner

Hi Heidi,

 

I am not completely sure that I follow....

 

 

In the stream, before the Summarize tool that concatenates the values, couldn't you add a Unique tool? Check the field you are concatenating (and quite probably another field you are grouping by).

This should then contain the list you require without the need for RegEx. Not that the result will be any different, as Mark's function works; it would just hopefully be a little more efficient.

 

Thanks

Joe

Atom

Hello,

 

I tried using this expression to remove duplicate values within the same cell after concatenating rows. It does not seem to be working though. How would I modify this to remove strings that are multiple words and not just single character strings?

 

regex_replace([field],"\b(\w+),(?=.*\b\1,?)","")
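One likely reason the pattern struggles with multi-word values is that `\w` matches only letters, digits, and underscores, so `(\w+),` can capture at most the last word before each comma. The minimal Python sketch below shows the difference between `\w+` and `[^,]+`. It uses Python's `re` engine as an assumption, so the exact Alteryx behaviour may differ.

```python
import re

cell = "8985 VENICE BLVD,8985 VENICE BLVD"

# `\w+` stops at spaces, so only the word directly before the comma is captured.
last_words = re.findall(r"\b(\w+),", cell)

# `[^,]+` matches everything up to the comma, so the whole item is captured.
whole_items = re.findall(r"([^,]+),", cell)

print(last_words)   # ['BLVD']
print(whole_items)  # ['8985 VENICE BLVD']
```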

Meteoroid

Hello,

 

I have the same issue after using the same formula. regex_replace([field],"\b(\w+),(?=.*\b\1,?)","")

 

The good news is that the commas were removed, but the duplicate values inside the cell were not.

 

From

8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD

 

Output I got is:

8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD

 

Thank you

Atom

Give this a try: regex_replace([Field1],'([^,]*)(\,\1)*(\,|$)','\1\3')

 

It worked for what I needed and when I ran it against your example I got: 8985 VENICE BLVD

 

Which I think is what you are looking for.
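Here is a minimal sketch of the same pattern in Python's `re` module, for checking it against other inputs. Using Python is an assumption made for illustration, since Alteryx's engine may differ in edge cases, and the function name `collapse_runs` is hypothetical.

```python
import re

# Same pattern as the expression above; the backslashes before the commas
# are dropped because a comma needs no escaping in Python's `re`.
PATTERN = re.compile(r"([^,]*)(,\1)*(,|$)")


def collapse_runs(cell: str) -> str:
    """Collapse each run of identical, adjacent comma-separated items to one item."""
    return PATTERN.sub(r"\1\3", cell)


address = ",".join(["8985 VENICE BLVD"] * 6)
print(collapse_runs(address))            # 8985 VENICE BLVD
print(collapse_runs("A,A,A,B,B,C,D,E"))  # A,B,C,D,E
print(collapse_runs("A,B,A"))            # A,B,A  (non-adjacent duplicates are kept)
```

In this sketch the pattern only collapses duplicates that sit next to each other, so it suits values that arrive sorted or grouped. If duplicates can be scattered through the string, a split-and-unique approach is the safer choice.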
