ALTERYX INSPIRE | Join us this May for for a multi-day virtual analytics + data science experience like no other! Register Now

Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Parse Problem

bpayne73
6 - Meteoroid

Through the use of this community I was able to get the address extracted but can't get the city, state, zip pulled from this file. Any help would be greatly appreciated. 

apathetichell
13 - Pulsar

You can get the zip and state fairly easily via (.*)(\u{2})\s(\d{5})\s(\w+) in a regex parse tool (with case insensitive unchecked)... Honestly splitting the address/city joint column is beyond my skill. There are a few strategies like searching for court/drive/road/street etc. and their abbreviations but nothing that I know of that's easy. There's also a numeric strategy where you split everything to rows on space and make judgement calls about how  many rows above state/zip should be allotted to each.

 

Depending on the region and where this is - I might load in a secondary source of zip codes/cities and match them against the zip codes that I parsed out.

 

Why? because then I could remove the city section from the (.*) above and hypothetically I'd be left with the address...

 

hellyars
12 - Quasar

@apathetichell, I almost have it. I used a join on zip with a zip code dataset to get city (instead of trying to pull it out of address+city).  So, all I need now is to remove the name of the city from ADD+CITY.  So, how can I use the CITY field to match against the ADD+CITY field and have it removed from ADD+CITY?

 

 

Screen Shot 2021-05-05 at 11.20.58 AM.png

ArtiRajput
8 - Asteroid

@bpayne73

 

There are 3 ways that I know to parse address. Hope they are helpful to you. 

1. Regex Tool : Regex is a powerful tool. You can easily parse data( address) my using meta characters and quantifiers in this tool.

2. Formula Tool: Use Regex functions in formula tool.

3. Parse Address Tool: I personally never used it but I learnt about it. Try it.

 

Thank you

 

 

apathetichell
13 - Pulsar

@hellyars 

 

Can you try?

 

left([Add+City],length([Add+City]-length([City])

trimright([add+city],[city]) should do the same I'd expect.

 

 

 

 

hellyars
12 - Quasar

@bpayne73  I think I have it.  My approach can probably stand for a few improvements.  I ran with the idea of a ZIP lookup table that @apathetichell suggested.  I used a free open-source free ZIP code dataset from https://simplemaps.com/data/us-zips.

 

I ran into 2 challenges:  1) I had to account for records whose address starts with a PO instead of a street #, and 2) ZIPs that did not appear in the ZIP database.  Its just coincidence (I hope) that both issues were linked to the same record.  Anyhow, I added a simple input table to add troublesome zip+city joins.  

 

Oh I had not seen @apathetichell  suggestion for removing city from ADD+CITY.  I used a crude method where I looked for a match and replaced city with country then added a formula tool to remove it. 

 

I hope this helps. 

 

 

Screen Shot 2021-05-05 at 11.57.27 AM.png

hellyars
12 - Quasar

@apathetichell  I just posted my brute force attempt.  It works, but I want to try your suggestion.  

apathetichell
13 - Pulsar

@hellyarsGreat job! I do think the zip lookup table makes things MUCH easier. Ideally we could use the Parse Address Tool as @ArtiRajput suggested but $$$$.

hellyars
12 - Quasar

@bpayne73   Version 2 attached.  This incorporates @apathetichell 1 step method for removing city from ADD+CITY.   Also, this approach assumes you do not have access to the Parse Address Tool, which requires the purchase of the CASS dataset from Alteryx. 

 

 

Screen Shot 2021-05-05 at 12.37.35 PM.png

bpayne73
6 - Meteoroid

Super thanks for this great workflow help. On my larger dataset there were more uncontained zips but I was able to add as you did with the 76147. I really appreciate the help and quick turn. Alteryx Community is fantastic for solutions.

Thanks!

Labels