Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Unique value using RegEx formula not working

TapasDas
6 - Meteoroid

Hi,  

 

I am new to Alteryx and trying to find unique values in a particular column separated by comma.   While I used available Regex  formula for my case, I found two unique values are merged and reporting only one value.   Attached example..   Appreciate any help on this.

10 REPLIES 10
clmc9601
13 - Pulsar
13 - Pulsar

Hi @TapasDas,

 

Here is a solution that might be easier to understand than regex. If you have to use regex, it might not help. I hope it's helpful though!

 

Screen Shot 2021-03-06 at 10.31.09 AM.png

TapasDas
6 - Meteoroid

Hi ,

 

Thanks for your suggestion.

 

As the actual workflow and data files are pretty big with many rows, RegEx formula solution will be more suitable.

 

Regards,

 

Tapas

clmc9601
13 - Pulsar
13 - Pulsar

Hi @TapasDas,

 

How do you want to treat rows with multiple unique values? Do you want each unique value in a new row (like in my screenshot above), or do you want the unique values comma delimited in a single cell, or something else? This is helpful to know before building a regex solution. Thanks!

TapasDas
6 - Meteoroid

Hello clmc9601,

 

want the unique values comma delimited in a single cell.

 

Thanks for your help.

 

 

clmc9601
13 - Pulsar
13 - Pulsar

Hi @TapasDas,

 

This regex expression (which I adapted from an online forum) worked on the sample data on my end. I hope it helps with your full data!

 

REGEX_Replace([Mail],"([^,]+)(,[ ]*\1)+",'$1')

 

TapasDas
6 - Meteoroid

Hello,

 

It worked indeed! Thanks a lot for your help.

 

I wonder whether you can explain the working of the formula - particularly the adjustment logic you made. By that way I may be able to learn and can adjust in future situations.

 

Thanks a lot!

clmc9601
13 - Pulsar
13 - Pulsar

Hi @TapasDas,

 

Sure, here's a breakdown of the regex expression:

 

REGEX_Replace([Mail],"([^,]+)(,[ ]*\1)+",'$1')

 

Starting with the second argument:

Part A. ([^,]+)     <- this finds any consecutive group of characters that does not contain a comma and classifies it as "group 1".

Part B. (,[ ]*\1)+    <- this finds a comma and an optional space and a group of characters that matches the characters found in group 1. Essentially, this identifies a duplicate or many duplicates in a row.

 

Then the third argument instructs Alteryx to replace the set of duplicates with a single instance of those characters.

 

However, I realized this expression will only work when your duplicates are right next to each other. If your email addresses are in alphabetical order, this shouldn't be a problem.

 

I hope this helped!

TapasDas
6 - Meteoroid

Hi clmc9601,

 

Thanks for the explanation. Started learning the logics!

 

The solution is already helping - however, wondering whether any way to modify this to make it working even if the duplicates are not sorted alphabetically?

 

Thanks and best regards.

clmc9601
13 - Pulsar
13 - Pulsar

Hi @TapasDas,

 

Any regex-only solutions I can think of right away involve complicated syntax lookup groups that are currently beyond the scope of my regex proficiency. It's a complicated solution in regex but a simple one in Alteryx if you have the performance bandwidth. Alteryx can handle millions and billions of rows, but I do realize some workflows are very slow already.

 

Hopefully one of the other Alteryx solutions masters will see this and jump in, or I will leave it on my to-do list for when I have some free time to explore this area of regex more. I wish I could help you more at the moment! Just posting this so I don't leave you unanswered.

Labels