How to clean the city and geographical attribute | Replace + Join
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi,
I have 2 questions
1. I'm wondering if there is any tool in Alteryx that detect the common name for the city. Maybe using AI, fuzzy matching, geocoding APIs or something? I don't know.
For example,
Correct: Los Angeles
Incorrect: Las Angelas
*Las Angelas is a typo. But, the tool will group them together to the correct one.
Second example:
Correct: Los Angeles
Incorrect: Los Angeles,CA
*I don't need to see ,CA there. Just the name of the city.
Third Example:
Correct: Edmonton
Incorrect: Edmonton Alberta
*I don't need to see Alberta there. Just the name of the city.
Currently, the Route , Route Departure, Route Arrival, and Via are to be corrected.
There are a lot of typos in my data and just wondering the quickest way to handle this and almost impossible to me to correct one by one.
2.
Context: I have 2 files: The Airlines Route Semi Clean will be the main file, which include: Route, Route Departure, Route Arrival, and Via.
There are a lot of fields that using 3 digits of airport code. For example: LAX, SFO, DEN, etc...
I have another file called "Airport Information" that contains 3 digits airport code and city located.
and I was just wondering if there is any way that we can replace those 3 digit codes with the city.
For example: LAX is replaced to Los Angeles
I'm assuming we need to do the JOIN, BUT the issue is that join will remove the city name that is already there in those 4 fields.
My current idea is: First, we need to make sure that Alteryx recognizes the 3 digits code, which will be the primary key for the joining. In my another file also has the 3 digits airport code. Then, those 3 digits code in the primary file will be replaced to the city from the secondary file. Is it possible?
I attached the excel files here. Could you please send the workflow here please? Thank you!
Solved! Go to Solution.
- Labels:
- Data Investigation
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
When I saw your post on the Tableau forums this morning I almost replied and said "use Alteryx"
The challenge is always the "free format text" that you're left to deal with.
1) The Fuzzy Match Tool https://help.alteryx.com/current/en/designer/tools/join/fuzzy-match-tool.html could solve some/all of your typo's but don't expect it to remove Alberta
2) Yes, you could join (or maybe create a macro) to make this work.
Either way, I think you should be prepared for records that just don't match at all - and you will have to fix them by modifying the workflow every time an unmatched value is found.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi, I run fuzzy match tool and it's been running for 2 hours and still keep going. What did I do wrong? what's the proper setting?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@Qiu Hi Qiu, do you know what's the best way to handle these cases? thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@rively90
For 1st question, it is always tricky and I dont really like "Fuzzy Match" 😂
Refer this post, and maybe we can have a "correct list" of cities to perform Fuzzy Match?
2nd question, I believe you are looking Find and Replace tool.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@Qiu Hi, thank you! What tool should you recommend using instead of fuzzy matching? It runs forever to my machine.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @Qiu
I noticed there are still codes in the Route and Via columns. I've updated the secondary file for the airport attached below. How to fix this? thank you!
please use 'iata_code' for the code and municipality for the city
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
data:image/s3,"s3://crabby-images/cfacb/cfacb216f7c3f145aa388f6fe78f3fdad0fb3400" alt=""