Unique value using RegEx formula not working
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi,
I am new to Alteryx and trying to find unique values in a particular column separated by comma. While I used available Regex formula for my case, I found two unique values are merged and reporting only one value. Attached example.. Appreciate any help on this.
Solved! Go to Solution.
- Labels:
- Common Use Cases
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @TapasDas,
Here is a solution that might be easier to understand than regex. If you have to use regex, it might not help. I hope it's helpful though!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi ,
Thanks for your suggestion.
As the actual workflow and data files are pretty big with many rows, RegEx formula solution will be more suitable.
Regards,
Tapas
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @TapasDas,
How do you want to treat rows with multiple unique values? Do you want each unique value in a new row (like in my screenshot above), or do you want the unique values comma delimited in a single cell, or something else? This is helpful to know before building a regex solution. Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello clmc9601,
I want the unique values comma delimited in a single cell.
Thanks for your help.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @TapasDas,
This regex expression (which I adapted from an online forum) worked on the sample data on my end. I hope it helps with your full data!
REGEX_Replace([Mail],"([^,]+)(,[ ]*\1)+",'$1')
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello,
It worked indeed! Thanks a lot for your help.
I wonder whether you can explain the working of the formula - particularly the adjustment logic you made. By that way I may be able to learn and can adjust in future situations.
Thanks a lot!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @TapasDas,
Sure, here's a breakdown of the regex expression:
REGEX_Replace([Mail],"([^,]+)(,[ ]*\1)+",'$1')
Starting with the second argument:
Part A. ([^,]+) <- this finds any consecutive group of characters that does not contain a comma and classifies it as "group 1".
Part B. (,[ ]*\1)+ <- this finds a comma and an optional space and a group of characters that matches the characters found in group 1. Essentially, this identifies a duplicate or many duplicates in a row.
Then the third argument instructs Alteryx to replace the set of duplicates with a single instance of those characters.
However, I realized this expression will only work when your duplicates are right next to each other. If your email addresses are in alphabetical order, this shouldn't be a problem.
I hope this helped!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi clmc9601,
Thanks for the explanation. Started learning the logics!
The solution is already helping - however, wondering whether any way to modify this to make it working even if the duplicates are not sorted alphabetically?
Thanks and best regards.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @TapasDas,
Any regex-only solutions I can think of right away involve complicated syntax lookup groups that are currently beyond the scope of my regex proficiency. It's a complicated solution in regex but a simple one in Alteryx if you have the performance bandwidth. Alteryx can handle millions and billions of rows, but I do realize some workflows are very slow already.
Hopefully one of the other Alteryx solutions masters will see this and jump in, or I will leave it on my to-do list for when I have some free time to explore this area of regex more. I wish I could help you more at the moment! Just posting this so I don't leave you unanswered.
