Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
解決済み

Removing duplicates after concatenating column in summarize tool

Adriankp
アステロイド

Hey all,

 

I'm in a scenario where I want to group a table by approximately 10-15 columns, and concatenate the values of 4 other columns. This often results in duplicates present in the concatenated columns, which I would like to get rid of. Considering the size of data, I would like to avoid using transpose/crosstab to handle it if possible. Looking through the community, I found the below REGEX_Replace formula in this thread which achieves this. However, since it relies on word boundaries, it breaks if any of the concatenated values contain special characters.

 

Does anyone have any insight on how to work around this?

 

 

REPLACE(
    TRIM(
	REGEX_Replace([_CurrentField_], "\b(.+),(?=.*\b\1,?)", "")
    )
  ," ",", "
)

 

 

Thanks!

6件の返信6
FinnCharlton
パルサー

Hi @Adriankp , which special characters are breaking this? I'm struggling to recreate it

Adriankp
アステロイド

It works for the duplicated values, but breaks in case you have values with special characters that are not duplicates.

 

 

In the example below, row one and two contains duplicates, but row three contains three unique values with special characters. The desired outcome would be to de-duplicate rows one and two, but keep row three as is. Using the given formula however, the output of row three becomes CC-DC-C, when it should be C-C-C , C-D-C , C-C.

 

Considering \b only covers letters, numbers and underscores as far as I'm aware, any other special character would result in the current formula not working properly.

 

Skärmbild part 1.jpg

Skärmbild test 2.jpg

ed_hayter
クエーサー

What if you grouped by your values in a summarize tool before then concatenating so that the concatenation is only comprised of unique values?

 

image.png

 

(Not sure I fully grasp the context) Group by lots of columns then concatenate 4 columns. (Group by this concatenated field along with the other fields before then concatenating the now unique 4 column concats?)

 

May not be what you want at all but thought i'd give an out of the box idea

 

 

Adriankp
アステロイド

I've explored this too, but it unfortunately doesn't work with my data set, considering the large amount of fields I have to group by to retain the correct structure and that I have to concatenate multiple fields. I guess you could technically work around it by separating the data into four data streams, and handling each field that needs to be concatenated separately, and then joining it back together, but I would prefer a more dynamic approach.

FinnCharlton
パルサー

Hi @Adriankp , give this formula a go:

 

TRIM(

REGEX_Replace(','+[_CurrentField_], "(?<=,)(.+),(?=.*\1)", "")

,',')

 

Turns out it wasn't the special character doing it. Part of the C-C-C was matching the C-C later in the string. I've replaced the \b with a negative lookbehind to prevent this

 

image.png

Adriankp
アステロイド

This seems to have worked wonders!

 

Thanks a lot for the help!

ラベル