Alteryx Designer Desktop Discussions

SOLVED

Data cleansing matching similar or misspelled data

Joel_Mills
6 - Meteoroid

Dearest Alteryx Community... can you help me please?

I have a table in which certain columns contain a variety of misspellings that I need to consolidate into a single agreed spelling for all matches.
For example:

Business School

School of business

Busness

Business Schoool

Business Faculty

Business department

Business Enterprise

Business clients

 

There are many matches that need simplifying to just 'Business School'. To make matters harder, there are some departments like Business Enterprise and Business clients that are not part of the business school and so can be left alone.

 

Any idea how to go about this? I am thinking of something along the lines of fuzzy logic, but as a complete n00b to Alteryx I am unsure of the best way to proceed.

 

I also need to repeat this for several other departments that have been 'interestingly input' into the data and need cleansing.

TIA

Yoshiro_Fujimori
15 - Aurora

Hi @Joel_Mills ,

 

I suppose your case is related to Named Entity Recognition.

Alteryx has a Named Entity Recognition tool as part of the Alteryx Intelligence Suite.

 

Unfortunately I do not have the license and so cannot support you further.

You may check this page 

https://help.alteryx.com/20223/designer/named-entity-recognition

and consider purchasing the add-on license if you think it fits your case.

 

Good luck.

 

martinding
13 - Pulsar

Hi @Joel_Mills,

 

I have done a similar project before, and what we used was a combination of fuzzy matching and a lookup dictionary.

 

The hard part, as you suggested, is that some names such as "Business Enterprise" look very similar to "Business School", and fuzzy matching alone cannot tell them apart.

 

So my suggestion is to filter out the cases that you know don't belong to your group (in this case, Business School), and then fuzzy match on the remainder.
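For anyone who wants to see the filter-then-match idea outside of Alteryx, here is a rough Python sketch. It is not the workflow from the screenshot and it does not use the Alteryx Fuzzy Match tool; it swaps in `difflib.SequenceMatcher` from the standard library as a stand-in similarity score. The names `CANONICAL`, `EXCLUDE`, `THRESHOLD`, `similarity` and `clean` are made up for illustration, and the 0.6 threshold is an assumption you would need to tune on your own data.

```python
from difflib import SequenceMatcher

# Hypothetical names and values, chosen only for this illustration.
CANONICAL = "Business School"
# Values known NOT to belong to the group; these are left untouched.
EXCLUDE = {"business enterprise", "business clients"}
# Assumed cut-off for "similar enough"; tune it against real data.
THRESHOLD = 0.6


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def clean(value: str) -> str:
    """Return the agreed spelling for close matches, else the original value."""
    # Step 1: filter out the known exceptions before any fuzzy matching.
    if value.strip().lower() in EXCLUDE:
        return value
    # Step 2: fuzzy match whatever remains against the agreed spelling.
    if similarity(value, CANONICAL) >= THRESHOLD:
        return CANONICAL
    return value


if __name__ == "__main__":
    for raw in ["Business Schoool", "Busness", "Business Enterprise", "Business clients"]:
        print(f"{raw!r} -> {clean(raw)!r}")
```

The exclusion step matters because a plain character-based score rates "Business clients" above this threshold, so without it that value would be overwritten. Reordered variants such as "School of business" score low with this kind of measure, which is one reason to pair the fuzzy match with a lookup dictionary.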

 

[Attached screenshot: martinding_0-1680136804998.png]

 

Joel_Mills
6 - Meteoroid

I fixed this in my own way by using Find and Replace with a list of cleansed department data that I created and manually edited. With the original department value in one column, I manually matched every similar value to a standard, agreed new value in a 'clean' column. It was then easy to use Find and Replace against that file once I brought it into Alteryx.
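In case it helps anyone reading later, the lookup-table approach can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual file or the Alteryx Find Replace configuration: the `LOOKUP` entries are hypothetical, built from the examples in the original question, and `standardise` is a made-up helper name.

```python
# Hypothetical lookup table: original value (lower-cased) -> agreed 'clean' value.
# In practice this would be the manually edited two-column file.
LOOKUP = {
    "business school": "Business School",
    "school of business": "Business School",
    "busness": "Business School",
    "business schoool": "Business School",
    "business faculty": "Business School",
    "business department": "Business School",
}


def standardise(value: str) -> str:
    """Replace a value with its agreed spelling, or keep it if it is not listed."""
    return LOOKUP.get(value.strip().lower(), value)


if __name__ == "__main__":
    for raw in ["School of business", "Busness", "Business Enterprise"]:
        print(f"{raw!r} -> {standardise(raw)!r}")
```

Anything not in the table, such as "Business Enterprise", passes through unchanged, so the exceptions never need to be listed explicitly.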
