I think it would be incredibly helpful for Alteryx to include a "Fuzzy Join" operator, similar to what is described in this article: http://www.decisivedata.net/blog/alteryx-fuzzy-join-workflow/
On virtually every client/project I work on, there is a need to clean up data. Most of the time, that involves standardizing to some existing list of values. However, as we all know, data from different systems, or data that is collected manually, will not match perfectly in all cases. This is when I most often use the Fuzzy Match tool.
However, I have to use a lot of workaround steps to effectively create a "Fuzzy Join", which is something I've done with database functions in the past. I think it would be great if a new tool were created that would do the following:
- Accept two inputs, one for the "raw" data and another for the "list" of data to match to.
- Perform a fuzzy join using functionality similar to the Fuzzy Match tool: convert the data to metaphone keys and then run Jaro/Levenshtein matches. By default, return only the highest-scoring match.
- Expand the pre-processing functionality to include more words to exclude from the analysis (beyond just "and", "the" and "in").
- Match on the whole string. There is no need to try to join on partial words within a string.
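To make the requested behavior concrete, here is a rough Python sketch of the matching logic, for illustration only. It is not Alteryx's API or the Fuzzy Match tool's actual implementation. Assumptions: the function names (`normalize`, `fuzzy_join`) and the `threshold` parameter are hypothetical; the metaphone step is omitted for brevity; and only a Levenshtein similarity (edit distance scaled to 0..1) is used, although a Jaro score could be swapped in. It drops excluded words, compares whole normalized strings, and keeps only the best match per raw value.

```python
import re

# Hypothetical default exclusion list; the idea is that users could extend it.
DEFAULT_STOP_WORDS = frozenset({"and", "the", "in"})


def normalize(text, stop_words=DEFAULT_STOP_WORDS):
    """Lowercase, strip punctuation, and drop excluded words; returns one whole string."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(w for w in words if w not in stop_words)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Scale edit distance to 0..1, where 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def fuzzy_join(raw_values, master_list, stop_words=DEFAULT_STOP_WORDS, threshold=0.0):
    """Return (raw, best_match, score) per raw value, keeping only the top match.

    `threshold` is a hypothetical cutoff: below it, best_match is None.
    """
    normalized_master = [(m, normalize(m, stop_words)) for m in master_list]
    results = []
    for raw in raw_values:
        key = normalize(raw, stop_words)
        best, best_score = None, -1.0
        for original, norm in normalized_master:
            score = similarity(key, norm)
            if score > best_score:
                best, best_score = original, score
        if best_score < threshold:
            best = None
        results.append((raw, best, best_score))
    return results


if __name__ == "__main__":
    master = ["Acme Corporation", "Globex Inc", "Initech"]
    raw = ["ACME Corp.", "Globex, Inc.", "Inittech"]
    for row in fuzzy_join(raw, master):
        print(row)
```

Scoring every raw value against every list entry is quadratic; a real tool would likely block on phonetic keys (e.g. metaphone) first so that only candidates sharing a key get scored.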
This seems like a very common need (I've created a macro for it anyway) that could be made simpler for everyday use.
Thanks!