Hello! I recently used Fuzzy Match for the first time to run a rate variance analysis on MRO data (basically, looking at how my client pays different prices across suppliers for the same parts/tools, and what they could potentially save). Fuzzy Match worked great because it caught things like '18in Magnetic Level' versus 'Magnetic Level - 18"', since the item description is a free-text field.
Here's where I run into my problem: I now need to turn my output into a unique ID on the source file that denotes the 'cluster' each item belongs to. This is because a few items matched on more than two descriptions (e.g. building on my previous example, you might have '18in Magnetic Level', 'Magnetic Level - 18"', and '(18") Mag Level'). I need to be able to compare all three of these to see where we paid the highest versus the lowest rate. I also believe there are a few cases where items match to a second or third degree (e.g. serial number 1028 matches 2543 and 6785, but then 2543 also matches 8931).
Has anyone been able to do this in the past? My initial thought is to apply a cluster ID to the Fuzzy Match output (sorted by match score), then apply that back to my source file. From there, I'm not sure how to account for the scenarios I listed in the previous paragraph.
Welcome any input!
Rob
Look at the Text Mining palette, specifically Topic Modeling, if you have access to it.
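
For the second- and third-degree matches in the question (1028 matches 2543 and 6785, and 2543 also matches 8931), another way to think about it is as a graph problem: treat every matched pair as a link, and give everything that is connected, directly or through a chain, the same cluster ID. Inside Designer, the Make Group tool is meant for this kind of grouping. Outside Designer, a rough Python sketch might look like the one below. It assumes the Fuzzy Match output has been exported as plain pairs of record IDs; the function name `assign_cluster_ids` and the extra pair (4100, 4200) are made up for illustration.

```python
def assign_cluster_ids(pairs):
    """Map each record ID to a cluster ID, given (id_a, id_b) match pairs.

    Uses union-find, so chained matches (A-B, B-C) end up in one cluster.
    The cluster ID is the smallest record ID in the cluster.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own cluster, then walk up to the root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path straight at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller ID as the root so the result is deterministic.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    return {record: find(record) for record in list(parent)}


# Pairs from the question, plus one hypothetical unrelated pair.
pairs = [(1028, 2543), (1028, 6785), (2543, 8931), (4100, 4200)]
clusters = assign_cluster_ids(pairs)

for record, cluster in sorted(clusters.items()):
    print(record, "-> cluster", cluster)
```

Here 1028, 2543, 6785 and 8931 all land in cluster 1028, even though 8931 never matched 1028 directly. Joining the resulting record-to-cluster mapping back to the source file on record ID then allows grouping by cluster to compare the highest and lowest rate. One caveat: chaining can pull loosely related items together if the match threshold is low, so it is worth spot-checking the larger clusters.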