Hi all,
I'm having trouble with removing duplicates from a cell after concatenation. Basically, in my Summarize tool, I regroup my data and concatenate multiple fields. But since in the concatenation, I can have the same value multiple times, I want to remove the duplicate values from the cell and only have a concatenation of unique values.
Examples:
EUR, EUR, USD, CHF, EUR ---> EUR, USD, CHF
01, 03, 01, 04, 01 ---> 01, 03, 04
Since I'm doing this on multiple fields at the same time, it's not very efficient to do it by splitting to rows and then using the unique tool. I've found a REGEX formula for this, but this only works on String values, not on numbers formatted as String, where it also removes the comma between the values (01, 01, 02 ---> 0102)
Trim(regex_replace([_CurrentField_],"\b(.+),(?=.*\b\1,?)",""),",")
Anyone have a formula or way of doing this easily?
Thanks in advance!
Solved! Go to Solution.
Hi @nathalie93,
One easy way may be to Transpose your data, then do Text to Columns to split to rows, then Group By, then Crosstab.
Assuming this is your start data:
The Transpose tool will Stack them
Text to Columns will allow you to split to rows:
Summarize - Group By (on both columns) will let you remove the duplicates
And Crosstab - concatenate will let you put them back in the original table format:
This should work for as many columns as you have!
I've attached a workflow for reference:
Hi @nathalie93, I took your regex_replace approach and included leading and lagging whitespace to come up with a feasible solution. Give it a try on your larger dataset and let us know if this works.
REPLACE(
TRIM(
REGEX_Replace([_CurrentField_], "\b(.+),(?=.*\b\1,?)", "")
)
," ",", "
)
Thanks @AbhilashR for the formula, I just had to add another one to remove any duplicate commas that remained, but works great now!
Hi @ggruccio, I'm sure this works as well, but as I have quite a large dataset and workflow, I prefer to have a shorter solution, such as the RegEx formula. But thanks for the answer!
Hi @nathalie93, no problem! Glad @AbhilashR was able to provide a solution that fit your need!
Hi, I tried this for one of my worksheets. But it creates unnecessary commas or spaces.
Great solution, thanks so much for sharing :)