Alteryx Designer Desktop Discussions

HM · ‎10-28-2015

Hi,

I've summarised my data using concat because I want to keep all unique values but it also repeats duplicates - is there any way to remove duplicates within a cell?

For example: change "A,A,A,B,B,C,D,E" to "A,B,C,D,E"?

Thanks

MarqueeCrew · ‎10-28-2015

That was an interesting puzzle. Thanks for giving me some work to do while waiting on a plane.

regex_replace([field],"\b(\w+),(?=.*\b\1,?)","")

This will get you your result of A,B,C,D,E
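
For anyone who wants to check the pattern outside Alteryx, here is a minimal sketch in Python. It assumes Python's re module handles this pattern the same way Alteryx's regex engine does, which may not hold in every case, and the function name is made up for illustration. The lookahead removes an item whenever the same item appears again later in the string, so the last occurrence of each value is the one that survives. Note also that (\w+) only captures a single run of letters, digits, and underscores.

```python
import re

# Pattern copied from the formula above. (\w+) captures one run of word
# characters, so a value containing spaces is not captured as a whole.
PATTERN = re.compile(r"\b(\w+),(?=.*\b\1,?)")

def drop_earlier_duplicates(cell: str) -> str:
    """Remove each 'item,' that is followed later in the string by the same item.

    Hypothetical helper name, used only for this illustration.
    """
    return PATTERN.sub("", cell)

print(drop_earlier_duplicates("A,A,A,B,B,C,D,E"))  # A,B,C,D,E
print(drop_earlier_duplicates("A,B,A"))            # B,A (the last A is kept)
```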

How did I figure it out? Google:

http://stackoverflow.com/questions/3309805/what-regular-expression-can-remove-duplicate-items-from-a...

joe_strellis · ‎10-29-2015

I was also wondering whether it would be easier to add a Unique tool before your Summarize tool.

Should be a little more efficient too if you have a lot of data.

paul_houghton · ‎10-29-2015

If you didn't want to get into the RegEx, another option would be to break the string out into individual rows using the Text to Columns tool, add a Unique tool after it, and then use Cross Tab to return the rows to a string.
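
The split, unique, rejoin idea can be sketched in Python as below. This is only an illustration of the logic, not Alteryx configuration; the three function names are hypothetical stand-ins for the tools mentioned, and a comma delimiter is assumed.

```python
def split_to_rows(cell, delimiter=","):
    """Stand-in for splitting the cell into one row per value."""
    return cell.split(delimiter)

def unique_rows(rows):
    """Stand-in for a Unique step: keep the first occurrence of each value, in order."""
    return list(dict.fromkeys(rows))

def join_rows(rows, delimiter=","):
    """Stand-in for turning the rows back into a single delimited string."""
    return delimiter.join(rows)

cell = "A,A,A,B,B,C,D,E"
result = join_rows(unique_rows(split_to_rows(cell)))
print(result)  # A,B,C,D,E
```

Because each value is compared as a whole string, this approach does not care whether a value is one character or several words.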

HM · ‎10-29-2015

Hi Joe,

There was only one character for each record before it was summarised, but after it was summarised (concatenated), the characters were combined into one string regardless of whether they were unique or duplicate values.

Thanks,

Heidi

joe_strellis · ‎10-30-2015

Hi Heidi,

I am not completely sure that I follow....

In the stream, before the Summarize tool that concatenates the values, couldn't you add a Unique tool? In it, check the field you are concatenating (and quite probably the field you are grouping by).

The output should then contain the list you require without the need for RegEx. The result won't be any different, since Mark's function works; it would just hopefully be a little more efficient.
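
To make the ordering concrete, here is a rough Python sketch of removing duplicate records first and concatenating afterwards. The sample rows and the group keys "g1" and "g2" are invented for illustration; they are not from the original data.

```python
# Each row is (group key, value), one value per record as in the original data.
rows = [("g1", "A"), ("g1", "A"), ("g1", "B"), ("g2", "C"), ("g2", "C")]

# Unique step: drop repeated (group, value) pairs, keeping first occurrences in order.
unique_rows = list(dict.fromkeys(rows))

# Summarize step: concatenate the remaining values per group.
grouped = {}
for key, value in unique_rows:
    grouped.setdefault(key, []).append(value)

concatenated = {key: ",".join(values) for key, values in grouped.items()}
print(concatenated)  # {'g1': 'A,B', 'g2': 'C'}
```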

Thanks

Joe

dan1 · ‎09-28-2017

Hello,

I tried using this expression to remove duplicate values within the same cell after concatenating rows. It does not seem to be working though. How would I modify this to remove strings that are multiple words and not just single character strings?

regex_replace([field],"\b(\w+),(?=.*\b\1,?)","")

iceman · ‎02-26-2018

Hello,

I have the same issue after using the same formula: regex_replace([field],"\b(\w+),(?=.*\b\1,?)","")

The commas were removed, but the duplicate values inside the cell were not.

From

8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD,8985 VENICE BLVD

Output I got is:

8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD 8985 VENICE BLVD

Thank you

Bbrelo1 · ‎08-29-2018

Give this a try: regex_replace([Field1],'([^,]*)(\,\1)*(\,|$)','\1\3')

It worked for what I needed and when I ran it against your example I got: 8985 VENICE BLVD

Which I think is what you are looking for.
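
A quick way to see what this pattern does is to try it in Python. The sketch below assumes Python's re module behaves like Alteryx's regex engine for this pattern, and the function name is hypothetical. The escaped commas are written as plain commas, which match the same character. The pattern collapses runs of identical neighbouring values, so duplicates that are not next to each other remain.

```python
import re

# ([^,]*) captures one value, (,\1)* swallows immediate repeats of it,
# and (,|$) keeps the delimiter (or end of string) that follows the run.
PATTERN = re.compile(r"([^,]*)(,\1)*(,|$)")

def collapse_adjacent_duplicates(cell: str) -> str:
    """Collapse consecutive repeats of the same comma-separated value."""
    return PATTERN.sub(r"\1\3", cell)

address = ",".join(["8985 VENICE BLVD"] * 6)
print(collapse_adjacent_duplicates(address))            # 8985 VENICE BLVD
print(collapse_adjacent_duplicates("A,A,A,B,B,C,D,E"))  # A,B,C,D,E
# Non-adjacent duplicates are left alone:
print(collapse_adjacent_duplicates("CEN,NE,CEN,NE,CEN,NE,RM,RM,RM,RM,RM,RM"))
# CEN,NE,CEN,NE,CEN,NE,RM
```

Sorting the values before concatenating them would put duplicates next to each other, which is the case this pattern handles.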

phoebe_kelley · ‎12-24-2019

I am having trouble with this formula. It seems to remove the commas (my delimiter) in all but the first set. I need to use the regex replace formula because I have multiple fields to edit and want to use the multi-field formula tool.

I changed it from "\w+" to ".+" since I have some non-alphanumeric characters in my data. I don't care about the order of the output, just that the values are unique. I can change the delimiter from comma to something else if that works better.

This is what I'm currently using: regex_replace([Regions Submitted],"\b(.+),(?=.*\b\1,?)",""). Sample data is attached.

Here is how I would like the formula to work:

input: CEN,NE,CEN,NE,CEN,NE,RM,RM,RM,RM,RM,RM

output: CEN, NE, RM (order doesn't matter here, as long as the three unique values are present, separated by a comma or other delimiter)

input: ICE CREAM BLACK SESAME,ICE CREAM BLACK SESAME

output: ICE CREAM BLACK SESAME

input: BIOSIL® HAIR, SKIN, NAILS

output: BIOSIL® HAIR, SKIN, NAILS

input: HUMPHRY SLOCOMBE | HMSLCM | 129921,HUMPHRY SLOCOMBE | HMSLCM | 129921

output: HUMPHRY SLOCOMBE | HMSLCM | 129921

input: email.last@test.com,email.last@test.com,email2.last@test.com

output: email.last@test.com,email2.last@test.com
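
As a reference for what the desired behaviour looks like, here is a small Python sketch that treats each delimited value as a whole string and applies the same rule to several fields of a record. It is not an Alteryx formula; the helper names and the "Notes" field are hypothetical, and only "Regions Submitted" comes from the post above.

```python
def dedupe_cell(cell, delimiter=","):
    """Keep the first occurrence of each delimited value, preserving order."""
    return delimiter.join(dict.fromkeys(cell.split(delimiter)))

def dedupe_fields(record, fields, delimiter=","):
    """Apply dedupe_cell to the named fields of a record; leave other fields untouched."""
    return {
        name: dedupe_cell(value, delimiter) if name in fields else value
        for name, value in record.items()
    }

record = {
    "Regions Submitted": "CEN,NE,CEN,NE,CEN,NE,RM,RM,RM,RM,RM,RM",
    "Notes": "left,as,is,is",  # hypothetical field that is not selected below
}
print(dedupe_fields(record, fields={"Regions Submitted"}))
# {'Regions Submitted': 'CEN,NE,RM', 'Notes': 'left,as,is,is'}
```

With a comma delimiter, "BIOSIL® HAIR, SKIN, NAILS" is split into three pieces and comes back unchanged only because none of those pieces repeat; a delimiter that never occurs inside a value would be safer.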

Remove Duplicates within a cell?