This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
on 08-30-201801:32 PM - edited on 06-19-201911:28 AM by ichand
2018 Excellence Awards Entry: Fuzzy Match Check - Row by Row
Name: Kiran Kumar
Title: Quality Assurance Engineer
Overview of Use Case:
Data from a number of Healthcare data vendors are used in our analytical products. This includes data from companies like Optum, BHI(Blue Health Intelligence), and IQVIA. The primary Alteryx use case was to check the correctness of the data. We wanted to validate which data sources are more valuable in terms of accuracy and completeness of data. For this scenario, we built a workflow using the awesome Fuzzy Match tool from Alteryx. The Fuzzy Match tool workflow has helped the team and the business make important and valid decisions based on the insightful results from the Fuzzy Match workflow. Ultimately, the results from this workflow give the Product Management Team an excellent overview on data decisions to be made.
Describe the business challenge or problem you needed to solve:
To make important business decisions on the data quality received from Healthcare data vendors, we needed to implement a fuzzy match check between different data sources on Address, City, State, and Zip code. Checking these sources through fuzzy match tells a lot about the data we are expecting. The Product Management Team was keen on a standard fuzzy logic algorithm and the Alteryx Fuzzy Matching tool is a perfect solution that works perfectly to making the decisions.
Describe your working solution:
We primarily used Alteryx Designer x64(2018.2). For this particular scenario, data format for the input files was in Excel/Text Input (Sample Addresses) and/or a database table from Microsoft SQL Server Management Studio. The SQL table has addresses from both data sources. Currently, this workflow is deployed in the Alteryx Gallery and can be run by the product team.
1. Input the excel from the Alteryx workflow or the Gallery. 2. RecordID Tool providers a unique ID for each Addresses. 3. Select Tool is used next to select the desired column (the workflow tackles only address). 4. Adding a Source field using the formula tool. 5. Adding a Union tool to combine the two data sources by Auto Config by Name. 6. Super useful - Data Cleansing tool in case any leading and Trailing Whitespace is present and for converting to upper case. 7. Adding a common column with Formula Tool. 8. Setting the Fuzzy Match tool to match Threshold 0% so as to obtain all the data and set the Address and Constant threshold as per workflow to obtain all the data needed. 9. Apply the Unique Tool for RecordID and RecordID2 (tool Tip - Always apply the Unique Tool after the Fuzzy Tool to eliminate the duplicates). 10. Filter Tool next, to validate the row by row comparison. 11. Join tool next to join the columns and data sources from both sources. 12. Sort tool to sort based on the Match score of the awesome Fuzzy logic. 13. Output the Results in output excel.
This is deployed in Alteryx Gallery as well and the product team is using to all Fuzzy match Checks.
Figure 1: Text input sample between two data sources from different vendors
Figure 2: Fuzzy tool details for address
Figure 3: Sample results from workflow
Figure 4: Deployed in Alteryx Gallery for the use of the team
Describe the benefits you have achieved:
Impact of the workflow have been huge as it impacts million-dollar data driven decisions. These decisions are now based on the Fuzzy Matching data quality checks from Healthcare vendors. Fuzzy logic algorithm has been perfectly designed by Alteryx Team and results are accurate when validating if we wanted to find the data above the threshold of 75%,for instance. This workflow has not only helped the team and business make effective data decisions, it has increased customer satisfaction and revenue for the company.