Alteryx Designer Desktop Discussions

annhood · ‎09-12-2019

Is there a way to keep only the longest values of a unique sequence? For example, if we have the following list:

Test
Test123
Test12345
Test67
Test689
Example

We would want to be left with only Test12345, Test689, and Example. The other ones which are substrings would be filtered out.

With a large dataset, is there an automated way to check whether each value is a substring of any other value in the column, and then keep or remove it accordingly? I have been leaning towards using formulas, filters, and fuzzy match, but haven't figured out exactly how to do what I want.

Thanks!

OllieClarke · ‎09-13-2019

Hi @annhood, like you said, fuzzy matching will probably solve your problem. I'm not very good at it, so I came up with this solution instead. It does involve a Cartesian join, which compares every value against every other value, so it might not perform well on a large dataset.
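For anyone who wants to see the logic outside of a workflow, here is a rough Python sketch of the same cross-join idea: compare every value with every other value and drop any that is contained in a different one. This is only an illustration, not the attached workflow, and the function name `keep_longest` is made up for the example. It assumes a case-sensitive, literal substring check.

```python
def keep_longest(values):
    """Return the values that are not a substring of any other distinct value."""
    # Deduplicate while preserving the original order.
    unique = list(dict.fromkeys(values))

    kept = []
    for candidate in unique:
        # Equivalent of the Cartesian join: test the candidate against every other value.
        is_substring = any(
            candidate != other and candidate in other for other in unique
        )
        if not is_substring:
            kept.append(candidate)
    return kept


values = ["Test", "Test123", "Test12345", "Test67", "Test689", "Example"]
print(keep_longest(values))
# ['Test12345', 'Test67', 'Test689', 'Example']
```

Note that with a literal substring test, Test67 survives, because it is not actually contained in Test689. If it should be dropped as well, the matching rule would need to be looser than a plain substring check. Like the join, this does a pairwise comparison, so the work grows with the square of the number of distinct values.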

How to keep only the longest unique values and remove substrings