I’m trying to create a workflow that will identify and delete duplicate data strings within a single cell that exceed a certain length. There are no consistent delimiters within the data, it is formatted as in the below example:
| Customer ID | Comments | 
| John Smith123 | 19Oct2023 customer ordered 45 widgets @ $20.50 per; on September 17 John spoke with employee 123ABC to get an order changed from red to blue *** September 1st the customer stopped in and spoke with Dave about adding more orders to an existing invoice, Dave said he would talk to the manager about it..... July 20 customer paid in full *** on July 7th '23 Customer called to request 20 red widgets adn 20 blue widgets, and paid $1,000 deposit up front**** 02Jun'23 updated previous order via website which was processed by employee ABCZYZ and fulfilled in 1 week *** | 
The “Comments” column is my issue. Many of the cells will have duplicated comments, so for example the string “July 20 customer paid in full *** ” could be repeated several times consecutively within the cell.
The only things I can say for sure about the data is the repeated strings will always be consecutive, never broken up by other strings. Also they will generally be 20-30 characters or more in length.
I have two goals here:
1. Identify and isolate strings beyond a certain length that are repeated within a cell, maybe by creating a new column for each duplicated string
2. Remove the duplicate strings.
Thanks for any help you can provide.
Solved! Go to Solution.
Thanks ed_hayter, the RegEx formula gave me enough to work with that I was able to bring it home with some added formulas.
Much Appreciated!
 
					
				
				
			
		

