Good morning,
I have figured out a Data Cleansing and Fuzzy Match flow that works semi-decently at cleaning up some hand-keyed city names. However, I need help figuring out how to get it to output the results to a new column and leave the original column completely intact.
I'm also open to suggestions for improving the current workflow. The database we're dealing with is Intelligent Audit data built from hand-keyed input and, to put it lightly, it's an absolute mess. For example, a single location in San Cristobal, DO shows up as 120 unique values once you include the destination state and country.
Origin City |
HAINA SAN CRISTOBAL |
HAINA, SAN CRISTOBAL |
HAINA,SAN CRISTOBAL |
PSA SAN CRISTOBAL |
SAN CRISTIOBAL |
SAN CRISTOBA |
SAN CRISTOBAL |
SAN CRISTOBAL DOM REP |
SAN CRISTOBAL (PROVINCIA) |
SAN CRISTOBAL, DOM REP |
Do you have a postal code that could be cross-referenced against a list of cities, to ensure the same city name and format are used every time?
@hellyars Unfortunately, the zip code often needs to be added or corrected as well.
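Since the zip code can't be trusted as a key, one fallback is to match on the city name itself and write the result to a separate column. Outside of the workflow tool, that "new column, original left intact" idea could be sketched in Python roughly as below. This is only a sketch under assumptions: the canonical city list, the `Origin City Clean` column name, and the 0.6 cutoff are all hypothetical, and it uses the standard library's `difflib` as a stand-in for the Fuzzy Match tool, not a reproduction of it.

```python
import difflib
import re

# Hypothetical reference list of canonical city names; in practice this would
# come from a trusted master location table.
CANONICAL_CITIES = ["SAN CRISTOBAL", "SANTO DOMINGO", "SANTIAGO"]


def normalize(raw: str) -> str:
    """Uppercase, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"[^A-Z0-9 ]+", " ", raw.upper())
    return re.sub(r"\s+", " ", text).strip()


def best_match(raw: str, cutoff: float = 0.6):
    """Return the closest canonical city, or None if nothing clears the cutoff."""
    text = normalize(raw)
    # Containment first: "HAINA SAN CRISTOBAL" contains "SAN CRISTOBAL".
    for city in CANONICAL_CITIES:
        if city in text:
            return city
    # Otherwise fall back to similarity scoring (assumed cutoff of 0.6).
    matches = difflib.get_close_matches(text, CANONICAL_CITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def add_clean_column(rows, source="Origin City", target="Origin City Clean"):
    """Return new rows with `target` added; the `source` value is never modified."""
    return [{**row, target: best_match(row[source])} for row in rows]


rows = [
    {"Origin City": "HAINA,SAN CRISTOBAL"},
    {"Origin City": "SAN CRISTIOBAL"},
    {"Origin City": "SAN CRISTOBA"},
    {"Origin City": "SAN CRISTOBAL (PROVINCIA)"},
]
cleaned = add_clean_column(rows)
for row in cleaned:
    print(row["Origin City"], "->", row["Origin City Clean"])
```

Rows that don't clear the cutoff get `None` in the new column, so they can be filtered out and reviewed by hand. The containment check is deliberately crude and could mis-assign a value that happens to mention one city while referring to another, so the state and country fields would still be worth checking alongside it.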