Through the use of this community I was able to get the address extracted but can't get the city, state, zip pulled from this file. Any help would be greatly appreciated.
Solved! Go to Solution.
You can get the zip and state fairly easily via (.*)(\u{2})\s(\d{5})\s(\w+) in a regex parse tool (with case insensitive unchecked)... Honestly splitting the address/city joint column is beyond my skill. There are a few strategies like searching for court/drive/road/street etc. and their abbreviations but nothing that I know of that's easy. There's also a numeric strategy where you split everything to rows on space and make judgement calls about how many rows above state/zip should be allotted to each.
Depending on the region and where this is - I might load in a secondary source of zip codes/cities and match them against the zip codes that I parsed out.
Why? because then I could remove the city section from the (.*) above and hypothetically I'd be left with the address...
@apathetichell, I almost have it. I used a join on zip with a zip code dataset to get city (instead of trying to pull it out of address+city). So, all I need now is to remove the name of the city from ADD+CITY. So, how can I use the CITY field to match against the ADD+CITY field and have it removed from ADD+CITY?
There are 3 ways that I know to parse address. Hope they are helpful to you.
1. Regex Tool : Regex is a powerful tool. You can easily parse data( address) my using meta characters and quantifiers in this tool.
2. Formula Tool: Use Regex functions in formula tool.
3. Parse Address Tool: I personally never used it but I learnt about it. Try it.
Thank you
Can you try?
left([Add+City],length([Add+City]-length([City])
trimright([add+city],[city]) should do the same I'd expect.
@bpayne73 I think I have it. My approach can probably stand for a few improvements. I ran with the idea of a ZIP lookup table that @apathetichell suggested. I used a free open-source free ZIP code dataset from https://simplemaps.com/data/us-zips.
I ran into 2 challenges: 1) I had to account for records whose address starts with a PO instead of a street #, and 2) ZIPs that did not appear in the ZIP database. Its just coincidence (I hope) that both issues were linked to the same record. Anyhow, I added a simple input table to add troublesome zip+city joins.
Oh I had not seen @apathetichell suggestion for removing city from ADD+CITY. I used a crude method where I looked for a match and replaced city with country then added a formula tool to remove it.
I hope this helps.
@apathetichell I just posted my brute force attempt. It works, but I want to try your suggestion.
@hellyarsGreat job! I do think the zip lookup table makes things MUCH easier. Ideally we could use the Parse Address Tool as @Alteryx_AR suggested but $$$$.
@bpayne73 Version 2 attached. This incorporates @apathetichell 1 step method for removing city from ADD+CITY. Also, this approach assumes you do not have access to the Parse Address Tool, which requires the purchase of the CASS dataset from Alteryx.
Super thanks for this great workflow help. On my larger dataset there were more uncontained zips but I was able to add as you did with the 76147. I really appreciate the help and quick turn. Alteryx Community is fantastic for solutions.
Thanks!