Hey all,
I'm in a scenario where I want to group a table by approximately 10-15 columns, and concatenate the values of 4 other columns. This often results in duplicates present in the concatenated columns, which I would like to get rid of. Considering the size of data, I would like to avoid using transpose/crosstab to handle it if possible. Looking through the community, I found the below REGEX_Replace formula in this thread which achieves this. However, since it relies on word boundaries, it breaks if any of the concatenated values contain special characters.
Does anyone have any insight on how to work around this?
REPLACE(
TRIM(
REGEX_Replace([_CurrentField_], "\b(.+),(?=.*\b\1,?)", "")
)
," ",", "
)
Thanks!
解決済! 解決策の投稿を見る。
Hi @Adriankp , which special characters are breaking this? I'm struggling to recreate it
It works for the duplicated values, but breaks in case you have values with special characters that are not duplicates.
In the example below, row one and two contains duplicates, but row three contains three unique values with special characters. The desired outcome would be to de-duplicate rows one and two, but keep row three as is. Using the given formula however, the output of row three becomes CC-DC-C, when it should be C-C-C , C-D-C , C-C.
Considering \b only covers letters, numbers and underscores as far as I'm aware, any other special character would result in the current formula not working properly.
What if you grouped by your values in a summarize tool before then concatenating so that the concatenation is only comprised of unique values?
(Not sure I fully grasp the context) Group by lots of columns then concatenate 4 columns. (Group by this concatenated field along with the other fields before then concatenating the now unique 4 column concats?)
May not be what you want at all but thought i'd give an out of the box idea
I've explored this too, but it unfortunately doesn't work with my data set, considering the large amount of fields I have to group by to retain the correct structure and that I have to concatenate multiple fields. I guess you could technically work around it by separating the data into four data streams, and handling each field that needs to be concatenated separately, and then joining it back together, but I would prefer a more dynamic approach.
Hi @Adriankp , give this formula a go:
TRIM(
REGEX_Replace(','+[_CurrentField_], "(?<=,)(.+),(?=.*\1)", "")
,',')
Turns out it wasn't the special character doing it. Part of the C-C-C was matching the C-C later in the string. I've replaced the \b with a negative lookbehind to prevent this
This seems to have worked wonders!
Thanks a lot for the help!