Alteryx Designer Desktop Discussions

TapasDas · ‎03-06-2021

Hi,

I am new to Alteryx and trying to find unique values in a particular column separated by comma. While I used available Regex formula for my case, I found two unique values are merged and reporting only one value. Attached example.. Appreciate any help on this.

clmc9601 · ‎03-06-2021

Hi @TapasDas,

Here is a solution that might be easier to understand than regex. If you have to use regex, it might not help. I hope it's helpful though!

TapasDas · ‎03-06-2021

Hi ,

Thanks for your suggestion.

As the actual workflow and data files are pretty big with many rows, RegEx formula solution will be more suitable.

Regards,

Tapas

clmc9601 · ‎03-06-2021

Hi @TapasDas,

How do you want to treat rows with multiple unique values? Do you want each unique value in a new row (like in my screenshot above), or do you want the unique values comma delimited in a single cell, or something else? This is helpful to know before building a regex solution. Thanks!

TapasDas · ‎03-06-2021

Hello clmc9601,

I want the unique values comma delimited in a single cell.

Thanks for your help.

clmc9601 · ‎03-06-2021

Hi @TapasDas,

This regex expression (which I adapted from an online forum) worked on the sample data on my end. I hope it helps with your full data!

REGEX_Replace([Mail],"([^,]+)(,[ ]*\1)+",'$1')

TapasDas · ‎03-07-2021

Hello,

It worked indeed! Thanks a lot for your help.

I wonder whether you can explain the working of the formula - particularly the adjustment logic you made. By that way I may be able to learn and can adjust in future situations.

Thanks a lot!

clmc9601 · ‎03-07-2021

Hi @TapasDas,

Sure, here's a breakdown of the regex expression:

REGEX_Replace([Mail],"([^,]+)(,[ ]*\1)+",'$1')

Starting with the second argument:

Part A. ([^,]+) <- this finds any consecutive group of characters that does not contain a comma and classifies it as "group 1".

Part B. (,[ ]*\1)+ <- this finds a comma and an optional space and a group of characters that matches the characters found in group 1. Essentially, this identifies a duplicate or many duplicates in a row.

Then the third argument instructs Alteryx to replace the set of duplicates with a single instance of those characters.

However, I realized this expression will only work when your duplicates are right next to each other. If your email addresses are in alphabetical order, this shouldn't be a problem.

I hope this helped!

TapasDas · ‎03-08-2021

Hi clmc9601,

Thanks for the explanation. Started learning the logics!

The solution is already helping - however, wondering whether any way to modify this to make it working even if the duplicates are not sorted alphabetically?

Thanks and best regards.

clmc9601 · ‎03-09-2021

Hi @TapasDas,

Any regex-only solutions I can think of right away involve complicated syntax lookup groups that are currently beyond the scope of my regex proficiency. It's a complicated solution in regex but a simple one in Alteryx if you have the performance bandwidth. Alteryx can handle millions and billions of rows, but I do realize some workflows are very slow already.

Hopefully one of the other Alteryx solutions masters will see this and jump in, or I will leave it on my to-do list for when I have some free time to explore this area of regex more. I wish I could help you more at the moment! Just posting this so I don't leave you unanswered.

Alteryx Designer Desktop Discussions

Unique value using RegEx formula not working