Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Community is experiencing an influx of spam. As we work toward a solution, please use the 'Notify Moderator' option on the ellipsis menu to flag inappropriate posts.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Remove Duplicates from Cell after Concatenation

nathalie93
6 - Meteoroid

Hi all, 

 

I'm having trouble with removing duplicates from a cell after concatenation. Basically, in my Summarize tool, I regroup my data and concatenate multiple fields. But since in the concatenation, I can have the same value multiple times, I want to remove the duplicate values from the cell and only have a concatenation of unique values.

 

Examples:

EUR, EUR, USD, CHF, EUR ---> EUR, USD, CHF

01, 03, 01, 04, 01 ---> 01, 03, 04

 

Since I'm doing this on multiple fields at the same time, it's not very efficient to do it by splitting to rows and then using the unique tool. I've found a REGEX formula for this, but this only works on String values, not on numbers formatted as String, where it also removes the comma between the values (01, 01, 02 ---> 0102)

 

Trim(regex_replace([_CurrentField_],"\b(.+),(?=.*\b\1,?)",""),",")

 

Anyone have a formula or way of doing this easily?

 

Thanks in advance!

 

 

7 REPLIES 7
ggruccio
ACE Emeritus
ACE Emeritus

Hi @nathalie93,

 

One easy way may be to Transpose your data, then do Text to Columns to split to rows, then Group By, then Crosstab.

 

ggruccio_0-1587735339859.png

Assuming this is your start data:

ggruccio_1-1587735391154.png

The Transpose tool will Stack them

ggruccio_2-1587735423172.png

 

Text to Columns will allow you to split to rows:

ggruccio_3-1587735492098.png

Summarize - Group By (on both columns) will let you remove the duplicates

 

ggruccio_4-1587735564369.png

 

 

And Crosstab - concatenate will let you put them back in the original table format:

 

ggruccio_5-1587735579752.png

 

 

This should work for as many columns as you have!

 

 

I've attached a workflow for reference:

AbhilashR
15 - Aurora
15 - Aurora

Hi @nathalie93, I took your regex_replace approach and included leading and lagging whitespace to come up with a feasible solution. Give it a try on your larger dataset and let us know if this works.

AbhilashR_0-1587750289640.png

REPLACE(
    TRIM(
	REGEX_Replace([_CurrentField_], "\b(.+),(?=.*\b\1,?)", "")
    )
  ," ",", "
)

 

  

nathalie93
6 - Meteoroid

Thanks @AbhilashR for the formula, I just had to add another one to remove any duplicate commas that remained, but works great now! 

nathalie93
6 - Meteoroid

Hi @ggruccio, I'm sure this works as well, but as I have quite a large dataset and workflow, I prefer to have a shorter solution, such as the RegEx formula. But thanks for the answer!

ggruccio
ACE Emeritus
ACE Emeritus

Hi @nathalie93, no problem!  Glad @AbhilashR was able to provide a solution that fit your need!

Priyam_Goel
5 - Atom

Hi, I tried this for one of my worksheets. But it creates unnecessary commas or spaces. 

 

sjpostlethwaite
5 - Atom

Great solution, thanks so much for sharing :)

Labels