Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parse Problem

bpayne73
6 - Meteoroid

Through the use of this community I was able to get the address extracted but can't get the city, state, zip pulled from this file. Any help would be greatly appreciated. 

9 REPLIES 9
apathetichell
18 - Pollux

You can get the zip and state fairly easily via (.*)(\u{2})\s(\d{5})\s(\w+) in a regex parse tool (with case insensitive unchecked)... Honestly splitting the address/city joint column is beyond my skill. There are a few strategies like searching for court/drive/road/street etc. and their abbreviations but nothing that I know of that's easy. There's also a numeric strategy where you split everything to rows on space and make judgement calls about how  many rows above state/zip should be allotted to each.

 

Depending on the region and where this is - I might load in a secondary source of zip codes/cities and match them against the zip codes that I parsed out.

 

Why? because then I could remove the city section from the (.*) above and hypothetically I'd be left with the address...

 

hellyars
13 - Pulsar

@apathetichell, I almost have it. I used a join on zip with a zip code dataset to get city (instead of trying to pull it out of address+city).  So, all I need now is to remove the name of the city from ADD+CITY.  So, how can I use the CITY field to match against the ADD+CITY field and have it removed from ADD+CITY?

 

 

Screen Shot 2021-05-05 at 11.20.58 AM.png

Alteryx_AR
12 - Quasar

@bpayne73

 

There are 3 ways that I know to parse address. Hope they are helpful to you. 

1. Regex Tool : Regex is a powerful tool. You can easily parse data( address) my using meta characters and quantifiers in this tool.

2. Formula Tool: Use Regex functions in formula tool.

3. Parse Address Tool: I personally never used it but I learnt about it. Try it.

 

Thank you

 

 

apathetichell
18 - Pollux

@hellyars 

 

Can you try?

 

left([Add+City],length([Add+City]-length([City])

trimright([add+city],[city]) should do the same I'd expect.

 

 

 

 

hellyars
13 - Pulsar

@bpayne73  I think I have it.  My approach can probably stand for a few improvements.  I ran with the idea of a ZIP lookup table that @apathetichell suggested.  I used a free open-source free ZIP code dataset from https://simplemaps.com/data/us-zips.

 

I ran into 2 challenges:  1) I had to account for records whose address starts with a PO instead of a street #, and 2) ZIPs that did not appear in the ZIP database.  Its just coincidence (I hope) that both issues were linked to the same record.  Anyhow, I added a simple input table to add troublesome zip+city joins.  

 

Oh I had not seen @apathetichell  suggestion for removing city from ADD+CITY.  I used a crude method where I looked for a match and replaced city with country then added a formula tool to remove it. 

 

I hope this helps. 

 

 

Screen Shot 2021-05-05 at 11.57.27 AM.png

hellyars
13 - Pulsar

@apathetichell  I just posted my brute force attempt.  It works, but I want to try your suggestion.  

apathetichell
18 - Pollux

@hellyarsGreat job! I do think the zip lookup table makes things MUCH easier. Ideally we could use the Parse Address Tool as @Alteryx_AR suggested but $$$$.

hellyars
13 - Pulsar

@bpayne73   Version 2 attached.  This incorporates @apathetichell 1 step method for removing city from ADD+CITY.   Also, this approach assumes you do not have access to the Parse Address Tool, which requires the purchase of the CASS dataset from Alteryx. 

 

 

Screen Shot 2021-05-05 at 12.37.35 PM.png

bpayne73
6 - Meteoroid

Super thanks for this great workflow help. On my larger dataset there were more uncontained zips but I was able to add as you did with the 76147. I really appreciate the help and quick turn. Alteryx Community is fantastic for solutions.

Thanks!

Labels