
Alteryx Designer Desktop Ideas


Create a fuzzy join operator

I think it would be incredibly helpful for Alteryx to include a "Fuzzy Join" operator, similar to what is described in this article: http://www.decisivedata.net/blog/alteryx-fuzzy-join-workflow/

 

On virtually every client project I work on, there is a need to clean up data.  Most of the time, that involves standardizing to some existing list of data.  However, as we all know, data from different systems, or data collected manually, will not match perfectly in all cases.  This is most often when I use the Fuzzy Match tool.

 

However, I have to use a lot of weird steps to effectively create a "Fuzzy Join", which is something I've done using database functions in the past.  I think it would be great if a new tool were created that would do the following:

  • Accept two inputs, one for the "raw" data and another for the "list" of data to match to.
  • Perform a fuzzy join using functionality similar to the Fuzzy Match tool: convert data to metaphone keys, then run Jaro/Levenshtein matches.  By default, return only the highest-scoring match.
  • Expand the pre-process functionality to include words to exclude from the analysis (beyond just "and", "the" and "in").  
  • Match on the whole string.  No need to try and do joins based on partial words within a string.
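
As a rough illustration of the behavior described in the list above, here is a minimal Python sketch. It is not how the Fuzzy Match tool works internally: it swaps the metaphone-key step for simple text normalization and scores with a normalized Levenshtein distance only. The function names, the 0.8 threshold and the default stop-word list are assumptions made for the example.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed default exclusions; the post mentions "and", "the" and "in".
DEFAULT_STOP_WORDS = frozenset({"and", "the", "in"})


def normalize(text: str, stop_words: Iterable[str] = DEFAULT_STOP_WORDS) -> str:
    """Lowercase, replace punctuation with spaces, and drop excluded words."""
    stop = set(stop_words)
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(word for word in cleaned.split() if word not in stop)


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the usual two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Score in [0, 1] on the whole string; 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # nothing left to compare after normalization
    return 1.0 - levenshtein(a, b) / longest


def fuzzy_join(
    raw: Iterable[str],
    reference: Iterable[str],
    threshold: float = 0.8,  # assumed cut-off, tune per data set
    stop_words: Iterable[str] = DEFAULT_STOP_WORDS,
) -> List[Tuple[str, Optional[str], float]]:
    """Return (raw value, best reference match or None, score) for each raw value."""
    reference_keys = [(ref, normalize(ref, stop_words)) for ref in reference]
    results = []
    for value in raw:
        key = normalize(value, stop_words)
        best_ref, best_score = None, 0.0
        for ref, ref_key in reference_keys:
            score = similarity(key, ref_key)
            if score > best_score:
                best_ref, best_score = ref, score
        if best_score < threshold:
            best_ref = None  # keep the row, but report no match
        results.append((value, best_ref, best_score))
    return results


if __name__ == "__main__":
    for row in fuzzy_join(["Microsft", "Acme Corp."], ["Microsoft", "ACME Corp", "Globex"]):
        print(row)
```

Every raw value is compared against every list value, so this is only practical for small reference lists; a real tool would need key-based blocking (for example the metaphone keys mentioned above) to scale.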

 

This seems like a very common thing (I've created a macro for this anyway) that could be made to be simpler for everyday use.

 

Thanks!

11 Comments
cbridges
11 - Bolide

Great idea, and thanks for the link to the blog article - hadn't seen that before, very helpful in the meantime.

Atabarezz
13 - Pulsar

Definitely need it...

pcatterson
11 - Bolide
I agree that this would be of great use to me. I regularly get lists, especially lists of names, that are not consistent. Joining data in this way would be of great value in my work.
MarshallG
8 - Asteroid

Agreed. I've wanted this one multiple times. Using the fuzzy match tool as a workaround is clunky and not intuitive.

analyticsninja
5 - Atom

I think functionality very similar to what is described above is found in the Microsoft Addin for Fuzzy match. It is quite easy to use as well, but it doesn't work well for large data sets. If Fuzzy Join can be made similar to or better than the MS Addin, it would be great!!

DultonM
11 - Bolide

@CSchrader, could you please share the macro you created to accomplish this task? It might help Alteryx staff understand your idea better and help other users who need similar functionality.

CSchrader
6 - Meteoroid

 @DultonM: Sure, the original link I posted has the macro that I made a few small modifications to (those modifications are unique to the data situation).  

 

I also just remembered it's on the Gallery: https://gallery.alteryx.com/#!app/Fuzzy-Join/559efa7e398a7111689361d3

AlexKo
Alteryx Alumni (Retired)
Status changed to: Under Review
 
marco_zara
8 - Asteroid
This would be a great feature to have!
KylieF
Alteryx Community Team
Status changed to: Revisit

Thank you for your post! We are really interested in this idea; however, we cannot fit it into the near-term road map. We’ll keep it in mind and update the status once we’re better able to speak to when and if we can implement it.