Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Identify and remove duplicate strings in a single cell

Rob48
8 - Asteroid

I’m trying to create a workflow that will identify and delete duplicate data strings within a single cell that exceed a certain length. There are no consistent delimiters within the data, it is formatted as in the below example:

 

Customer IDComments
John Smith12319Oct2023 customer ordered 45 widgets @ $20.50 per;  on September 17 John spoke with employee 123ABC to get an order changed from red to blue  *** September 1st the customer stopped in and spoke with Dave about adding more orders to an existing invoice, Dave said he would talk to the manager about it.....   July 20 customer paid in full ***   on July 7th '23 Customer called to request 20 red widgets adn 20 blue widgets, and paid $1,000 deposit up front**** 02Jun'23 updated previous order via website which was processed by employee ABCZYZ and fulfilled in 1 week ***

 

 

 

The “Comments” column is my issue.  Many of the cells will have duplicated comments, so for example the string “July 20 customer paid in full ***   ” could be repeated several times consecutively within the cell. 

 

The only things I can say for sure about the data is the repeated strings will always be consecutive, never broken up by other strings. Also they will generally be 20-30 characters or more in length.

 

I have two goals here:

 

1. Identify and isolate strings beyond a certain length that are repeated within a cell, maybe by creating a new column for each duplicated string

 

2. Remove the duplicate strings.

 

Thanks for any help you can provide.

 

2 REPLIES 2
ed_hayter
12 - Quasar

This regex might be too heavy to work.

 

I edited your input to repeat the pattern you mentioned in your comment - July 20 Paid in Full 3 times.

 

The regex expression seemed to only return that pattern once.

 

image.png 

Rob48
8 - Asteroid

Thanks ed_hayter, the RegEx formula gave me enough to work with that I was able to bring it home with some added formulas.

 

Much Appreciated! 

Labels