ALTERYX INSPIRE | Join us this May for for a multi-day virtual analytics + data science experience like no other! Register Now

Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Levenshtein distance in the Addresses

svergtanya
7 - Meteor

Hi, 

I have a data set joined on the IDs. I need to compate 2 cities fields(from different sources) and create flag based on the Levenshtein distance for those fields. If the distance < 3 then flag = 1, if the distance > 3 then flag = 0

Example, "Athena" and "Athens",  Levenshtein distance = 1, so flag = 1 
'ZURIH' and 'MUNICH', Levenshtein distance = 3, so flag = 0

Could you, please, advice me, what is the best way to implement it? 

Thank you in advance! 

BenMoss
17 - Castor
17 - Castor

Hi @svergtanya, there is a tool in Alteryx that allows you to create the Levenshtein distance between values (the fuzzy match tool) but unfortunately there is no way of extracting the values out of the tool, instead you are given a % match score, so this maybe worth looking into.

 

However, regardless of the above, you could make use of Alteryx's ability to interact with Python to solve this problem, specifically there is a package titled "Levenshtein" that allows us to get the information you wish fairly easy.

 

The solution I have shared is based on you passing in a two column table, one column titled 'Left' and one column titled 'Right'. The script will then compare the values on a line by line basis, passing the output to anchor ''#1" coming out of the python tool.

 

You'll also need to install the "Levenshtein" package in Alteryx's python environment 

 

from ayx import Alteryx
from Levenshtein import distance as lev
import pandas as pd

data = Alteryx.read("#1")

result = []

Right = data['Right'].tolist()

Left = data['Left'].tolist()

for i,k in zip(Right, Left):
  result.append(lev(i, k))

result = pd.DataFrame({'Distance': result})

Alteryx.write(result,1)

 

BenMoss_0-1617277886170.png

 

As mentioned the example workflow is attached. It's then a case of using a formula tool to create your new flag field...

 

IF [Distance] >= 3 THEN 1 ELSE 0 ENDIF

 

Ben

PhilipMannering
14 - Magnetar
14 - Magnetar

You can also do it like this,

 

 

from ayx import Alteryx
from Levenshtein import distance

# Read in dataframe
df = Alteryx.read("#1")

# Apply Levenshtein distance
df['Levenshtein'] = df.apply(lambda x: distance(x['Left'], x['Right']), axis=1)
df['Flag'] = df['Levenshtein'] >= 3

# Write out dataframe
Alteryx.write(df, 1)

 

svergtanya
7 - Meteor

Hi Ben,

Thank you for such detailed response. 
I have not used Python before in Alteryx, so it took me quite some time to install the package, as had issues with SSL and python version. 

Best regards, Tetiana

Labels