
Alteryx Designer Discussions

SOLVED

How to keep only the longest unique values and remove substrings

Atom
Is there a way to keep only the longest values and drop any value that is a substring of another value? For example, if we have the following list:

Test
Test123
Test12345
Test67
Test689
Example

We would want to be left with only Test12345, Test689, and Example. The others, which are substrings of longer values, would be filtered out.

With a large dataset, is there an automated way to check whether each value is a substring of any other value in the column, and use that to decide whether it should be kept or removed? I have been leaning towards using formulas, filters, and Fuzzy Match, but haven't figured out exactly how to do what I want.
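A minimal Python sketch of the check described above, outside Alteryx, assuming a case-sensitive, exact substring test (the function name `keep_longest_unique` is hypothetical). Duplicates are removed first so that a repeated value is not discarded as a "substring" of its own copy. Note that under this strict rule `Test67` is kept, because it does not appear inside `Test689`; dropping it, as in the expected list above, would need a different matching rule.

```python
def keep_longest_unique(values):
    """Keep only values that are not a substring of any other distinct value."""
    # dict.fromkeys removes duplicates while preserving first-seen order,
    # so a repeated value is not discarded as a "substring" of itself.
    unique = list(dict.fromkeys(values))
    return [
        v for v in unique
        # v is dropped if some *different* value w contains it (case-sensitive).
        if not any(v != w and v in w for w in unique)
    ]


data = ["Test", "Test123", "Test12345", "Test67", "Test689", "Example"]
print(keep_longest_unique(data))
# ['Test12345', 'Test67', 'Test689', 'Example']
```

Every value is compared against every other value, so the number of comparisons grows with the square of the number of distinct values.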

Thanks!
Alteryx Partner

Hi @annhood, as you said, fuzzy matching will probably solve your problem. I'm not very good with it, so I came up with this solution instead. It does involve a Cartesian join (every value is paired with every value, so n rows become n² pairs), so on a large dataset it might not be the most performant solution.
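The Cartesian-join idea can be sketched in Python as three steps: cross join, filter the pairs where one value is contained in a different value, then keep only the values that were never flagged. This is an illustration of the general technique, not a reproduction of the workflow in the screenshots; the function name `cross_join_filter` is hypothetical, and the containment test is assumed to be case-sensitive. It also counts the generated pairs to show the n² growth.

```python
from itertools import product


def cross_join_filter(values):
    """Mimic a Cartesian-join workflow: pair every value with every value,
    flag values contained in a different value, then drop the flagged ones."""
    unique = list(dict.fromkeys(values))  # de-duplicate, keep original order
    flagged = set()
    n_pairs = 0
    # Cartesian join: len(unique) ** 2 (left, right) pairs.
    for left, right in product(unique, repeat=2):
        n_pairs += 1
        # Filter step: left is a proper substring of a different value.
        if left != right and left in right:
            flagged.add(left)
    # Anti-join step: keep only values that were never flagged.
    kept = [v for v in unique if v not in flagged]
    return kept, n_pairs


kept, n_pairs = cross_join_filter(
    ["Test", "Test123", "Test12345", "Test67", "Test689", "Example"]
)
print(kept)     # ['Test12345', 'Test67', 'Test689', 'Example']
print(n_pairs)  # 36
```

Six distinct values already produce 36 pairs; 10,000 distinct values would produce 100 million, which is why this approach slows down on large datasets.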

[Workflow screenshots: clipboard_image_0.png, clipboard_image_1.png]
