I've been doing some cleaning at the SQL query level on a data set we get from a data aggregator. Beyond the typical extra spaces and unexpected characters, the names field has many creative spellings for the same entity. Are there recommendations beyond the Alteryx Data Cleansing tool? I can already implement many of those rules in SQL. Hard rules are hitting their limits given the ongoing creative spellings, and users are screaming for the repeated variants to be re-cast as the same entity where applicable. The cleaned data has to be re-loaded to the server for Power BI reporting.
You can use the Data Cleansing tool to clean up a majority of your data quality issues.
For the spellings of names, you might want to look into the Fuzzy Match tool. Essentially, it lets you use different algorithms to identify names and terms that are closely related to each other. From there, you can use a Find Replace tool to normalize the matched values to a single spelling. To see it in action, click the Fuzzy Match tool in the palette and open the example linked at the bottom.
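If it helps to see the idea outside of Alteryx, here is a minimal Python sketch of the same two steps: score how similar two names are, then map close variants to one spelling. It is not the algorithm the Fuzzy Match tool uses; it swaps in `difflib.SequenceMatcher` from the standard library. The function names, the sample names in the usage note, and the 0.85 threshold are all assumptions for illustration, and the threshold in particular would need tuning against your own data.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0 on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def build_mapping(names, threshold=0.85):
    """Map each raw name to the first-seen spelling it closely matches.

    threshold is an assumed cut-off, not a recommended value.
    """
    canonical = []  # spellings kept as the "master" version
    mapping = {}    # raw name -> master spelling
    for name in names:
        # Closest master spelling seen so far, or None if there are none yet.
        best = max(canonical, key=lambda c: similarity(name, c), default=None)
        if best is not None and similarity(name, best) >= threshold:
            mapping[name] = best
        else:
            canonical.append(name)
            mapping[name] = name
    return mapping
```

Calling `build_mapping` on a list such as `["Acme Corporation", "Acme Corporaton", "Globex Inc", "Globex, Inc."]` gives a lookup you could load as a two-column table and join against in SQL, which plays the same role as the Find Replace step. It is worth reviewing borderline matches by hand before re-loading anything for Power BI, since a loose threshold can merge entities that are genuinely different.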