Alert: There is a planned Community maintenance outage October 16th from approximately 10 - 11 PM PST. During this time the Alteryx Community will be inaccessible. Thank you for your understanding!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Unique value using RegEx formula not working

TapasDas
6 - Meteoroid

Hi,  

 

I am new to Alteryx and trying to find unique values in a particular column separated by comma.   While I used available Regex  formula for my case, I found two unique values are merged and reporting only one value.   Attached example..   Appreciate any help on this.

10 REPLIES 10
clmc9601
13 - Pulsar
13 - Pulsar

Hi @TapasDas,

 

Here is a solution that might be easier to understand than regex. If you have to use regex, it might not help. I hope it's helpful though!

 

Screen Shot 2021-03-06 at 10.31.09 AM.png

TapasDas
6 - Meteoroid

Hi ,

 

Thanks for your suggestion.

 

As the actual workflow and data files are pretty big with many rows, RegEx formula solution will be more suitable.

 

Regards,

 

Tapas

clmc9601
13 - Pulsar
13 - Pulsar

Hi @TapasDas,

 

How do you want to treat rows with multiple unique values? Do you want each unique value in a new row (like in my screenshot above), or do you want the unique values comma delimited in a single cell, or something else? This is helpful to know before building a regex solution. Thanks!

TapasDas
6 - Meteoroid

Hello clmc9601,

 

want the unique values comma delimited in a single cell.

 

Thanks for your help.

 

 

clmc9601
13 - Pulsar
13 - Pulsar

Hi @TapasDas,

 

This regex expression (which I adapted from an online forum) worked on the sample data on my end. I hope it helps with your full data!

 

REGEX_Replace([Mail],"([^,]+)(,[ ]*\1)+",'$1')

 

TapasDas
6 - Meteoroid

Hello,

 

It worked indeed! Thanks a lot for your help.

 

I wonder whether you can explain the working of the formula - particularly the adjustment logic you made. By that way I may be able to learn and can adjust in future situations.

 

Thanks a lot!

clmc9601
13 - Pulsar
13 - Pulsar

Hi @TapasDas,

 

Sure, here's a breakdown of the regex expression:

 

REGEX_Replace([Mail],"([^,]+)(,[ ]*\1)+",'$1')

 

Starting with the second argument:

Part A. ([^,]+)     <- this finds any consecutive group of characters that does not contain a comma and classifies it as "group 1".

Part B. (,[ ]*\1)+    <- this finds a comma and an optional space and a group of characters that matches the characters found in group 1. Essentially, this identifies a duplicate or many duplicates in a row.

 

Then the third argument instructs Alteryx to replace the set of duplicates with a single instance of those characters.

 

However, I realized this expression will only work when your duplicates are right next to each other. If your email addresses are in alphabetical order, this shouldn't be a problem.

 

I hope this helped!

TapasDas
6 - Meteoroid

Hi clmc9601,

 

Thanks for the explanation. Started learning the logics!

 

The solution is already helping - however, wondering whether any way to modify this to make it working even if the duplicates are not sorted alphabetically?

 

Thanks and best regards.

clmc9601
13 - Pulsar
13 - Pulsar

Hi @TapasDas,

 

Any regex-only solutions I can think of right away involve complicated syntax lookup groups that are currently beyond the scope of my regex proficiency. It's a complicated solution in regex but a simple one in Alteryx if you have the performance bandwidth. Alteryx can handle millions and billions of rows, but I do realize some workflows are very slow already.

 

Hopefully one of the other Alteryx solutions masters will see this and jump in, or I will leave it on my to-do list for when I have some free time to explore this area of regex more. I wish I could help you more at the moment! Just posting this so I don't leave you unanswered.

Labels