Missed the Q4 Fall Release Product Update? Watch the on-demand webinar for more info on the latest in Designer 24.2, Auto Insights Magic Reports, and more!
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Levenshtein distance in the Addresses

svergtanya
7 - Meteor

Hi, 

I have a data set joined on the IDs. I need to compate 2 cities fields(from different sources) and create flag based on the Levenshtein distance for those fields. If the distance < 3 then flag = 1, if the distance > 3 then flag = 0

Example, "Athena" and "Athens",  Levenshtein distance = 1, so flag = 1 
'ZURIH' and 'MUNICH', Levenshtein distance = 3, so flag = 0

Could you, please, advice me, what is the best way to implement it? 

Thank you in advance! 

4 REPLIES 4
BenMoss
ACE Emeritus
ACE Emeritus

Hi @svergtanya, there is a tool in Alteryx that allows you to create the Levenshtein distance between values (the fuzzy match tool) but unfortunately there is no way of extracting the values out of the tool, instead you are given a % match score, so this maybe worth looking into.

 

However, regardless of the above, you could make use of Alteryx's ability to interact with Python to solve this problem, specifically there is a package titled "Levenshtein" that allows us to get the information you wish fairly easy.

 

The solution I have shared is based on you passing in a two column table, one column titled 'Left' and one column titled 'Right'. The script will then compare the values on a line by line basis, passing the output to anchor ''#1" coming out of the python tool.

 

You'll also need to install the "Levenshtein" package in Alteryx's python environment 

 

from ayx import Alteryx
from Levenshtein import distance as lev
import pandas as pd

data = Alteryx.read("#1")

result = []

Right = data['Right'].tolist()

Left = data['Left'].tolist()

for i,k in zip(Right, Left):
  result.append(lev(i, k))

result = pd.DataFrame({'Distance': result})

Alteryx.write(result,1)

 

BenMoss_0-1617277886170.png

 

As mentioned the example workflow is attached. It's then a case of using a formula tool to create your new flag field...

 

IF [Distance] >= 3 THEN 1 ELSE 0 ENDIF

 

Ben

PhilipMannering
16 - Nebula
16 - Nebula

You can also do it like this,

 

 

from ayx import Alteryx
from Levenshtein import distance

# Read in dataframe
df = Alteryx.read("#1")

# Apply Levenshtein distance
df['Levenshtein'] = df.apply(lambda x: distance(x['Left'], x['Right']), axis=1)
df['Flag'] = df['Levenshtein'] >= 3

# Write out dataframe
Alteryx.write(df, 1)

 

svergtanya
7 - Meteor

Hi Ben,

Thank you for such detailed response. 
I have not used Python before in Alteryx, so it took me quite some time to install the package, as had issues with SSL and python version. 

Best regards, Tetiana

simonaubert_bd
13 - Pulsar

Hello @svergtanya 
Please note there is also an idea to make it native in formula

 

https://community.alteryx.com/t5/Alteryx-Designer-Ideas/Levenshtein-and-Jaro-Winkler-Distances-for-f...

 

Best regards,

Simon

Labels
Top Solution Authors