Remove Duplicates from Cell after Concatenation

Hi all,

I'm having trouble with removing duplicates from a cell after concatenation. Basically, in my Summarize tool, I regroup my data and concatenate multiple fields. But since in the concatenation, I can have the same value multiple times, I want to remove the duplicate values from the cell and only have a concatenation of unique values.

Examples:

EUR, EUR, USD, CHF, EUR ---> EUR, USD, CHF

01, 03, 01, 04, 01 ---> 01, 03, 04

Since I'm doing this on multiple fields at the same time, it's not very efficient to do it by splitting to rows and then using the unique tool. I've found a REGEX formula for this, but this only works on String values, not on numbers formatted as String, where it also removes the comma between the values (01, 01, 02 ---> 0102)

Trim(regex_replace([_CurrentField_],"\b(.+),(?=.*\b\1,?)",""),",")

Anyone have a formula or way of doing this easily?

Thanks in advance!

Accepted answers

AbhilashR

Hi @nathalie93, I took your regex_replace approach and included leading and lagging whitespace to come up with a feasible solution. Give it a try on your larger dataset and let us know if this works.

REPLACE(
    TRIM(
	REGEX_Replace([_CurrentField_], "\b(.+),(?=.*\b\1,?)", "")
    )
  ," ",", "
)

sample.yxmd

All comments

ggruccio

Hi @nathalie93,

One easy way may be to Transpose your data, then do Text to Columns to split to rows, then Group By, then Crosstab.

Assuming this is your start data:

The Transpose tool will Stack them

Text to Columns will allow you to split to rows:

Summarize - Group By (on both columns) will let you remove the duplicates

And Crosstab - concatenate will let you put them back in the original table format:

This should work for as many columns as you have!

I've attached a workflow for reference:

transpose-group by-crosstab.yxmd

AbhilashR

Hi @nathalie93, I took your regex_replace approach and included leading and lagging whitespace to come up with a feasible solution. Give it a try on your larger dataset and let us know if this works.

REPLACE(
    TRIM(
	REGEX_Replace([_CurrentField_], "\b(.+),(?=.*\b\1,?)", "")
    )
  ," ",", "
)

sample.yxmd

nathalie93

Thanks @AbhilashR for the formula, I just had to add another one to remove any duplicate commas that remained, but works great now!

nathalie93

Hi @ggruccio, I'm sure this works as well, but as I have quite a large dataset and workflow, I prefer to have a shorter solution, such as the RegEx formula. But thanks for the answer!

ggruccio

Hi @nathalie93, no problem! Glad @AbhilashR was able to provide a solution that fit your need!

Priyam_Goel

Hi, I tried this for one of my worksheets. But it creates unnecessary commas or spaces.

sjpostlethwaite

Great solution, thanks so much for sharing

Quick Links

This months top contributors

atcodedog05 19458

Qiu 15867

binu_acs 15708

MarqueeCrew 13708

apathetichell 13703