SOLVED

RegEx Help: Remove duplicate string from single string, no separators

Winston
7 - Meteor

I'm trying to clean up a data field so that I can compare it to another field and join the data.  The issue is that the original data isn't very consistent in its length or in how many times data is repeated within it.  I have attached an Excel file showing the initial data and the ultimate result needed.

 

I can use a string formula to go from the initial data to Step 1.  But I'm trying to go from Step 1 to Step 2 using RegEx but I'm failing terribly.  Never been good coming up with RegEx formulas.  I found an example online that I'm trying to modify to my needs but not having any luck as it was originally designed for comma separated values.  What I have so far is: (\b[a-zA-Z0-9]+)(?=.* *\1(?:|$))  This isn't even close as it makes matches everywhere.

 

Once I get to Step 2, I can use some more string formula manipulation to get to the ultimate result in Step 3, but if everything could be done in one step, that would be ideal.

 

Thanks for the assistance.

 

-Winston

4 REPLIES
jdunkerley79
ACE Emeritus

A formula like:

REGEX_Replace([Step 1], "^(.*)\1$", "$1")

 

should do what you need.

 

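It will remove the duplicate when the whole field is one string repeated twice back to back: (.*) captures the first copy, \1 requires the same text again up to the end of the field, and $1 keeps just one copy. Values that don't fit that shape are left unchanged.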
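
For anyone who wants to try the pattern outside Designer, here is a minimal sketch of the same idea in Python. The function name and the sample values are made up for illustration (the attached Excel data isn't reproduced here), and Python's re module is a stand-in for Alteryx's regex engine, so treat it as an approximation rather than an exact equivalent of REGEX_Replace.

```python
import re

# Same pattern as the formula above: capture a prefix, then require the
# identical text again, with nothing else before the end of the string.
DOUBLED = re.compile(r"^(.*)\1$")


def collapse_doubled(value: str) -> str:
    """Return one copy of `value` if it is exactly some text repeated twice.

    Strings that are not of the form X + X come back unchanged,
    because the pattern does not match them.
    """
    return DOUBLED.sub(r"\1", value)


# Hypothetical sample values, not taken from the original attachment.
for sample in ["ABC123ABC123", "ABC123", "ABCABCABC"]:
    print(sample, "->", collapse_doubled(sample))
```

Note that the pattern only collapses an exact doubling. A value repeated three times, like the last sample, is not of the form "text followed by the same text", so it is returned as-is and would need a different pattern.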

Qiu
20 - Arcturus

@jdunkerley79 

It's brilliant!

MarqueeCrew
20 - Arcturus

Great solve @jdunkerley79 

 

cheers,

 

 mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Winston
7 - Meteor

@jdunkerley79 that works perfectly and proves when it comes to RegEx I'm out of my element.

 

Thanks

 

-Winston
