Free Trial

Weekly Challenges

Solve the challenge, share your solution and summit the ranks of our Community!

Also available in | Français | Português | Español | 日本語
IDEAS WANTED

Want to get involved? We're always looking for ideas and content for Weekly Challenges.

SUBMIT YOUR IDEA

Challenge #54: Data Prep Address Parsing

GeneR
Alteryx Alumni (Retired)

The link to last week’s challenge (exercise #53) is HERE.

 

This week’s challenge is to parse out City, State and ZIP code from some unformatted input data. 

 

The data is in a nonstandard format - it is missing commas and some city names are two words and some city name are only one word, making parsing a challenge.   You need to be able to parse out the city name, state, and zip code if available.

 

Your goal is to create a process that will transform the data into a data table with separated columns for City, State, and ZIP.

 

Enjoy and as always I look forward to seeing some creative solutions.

mceleavey
17 - Castor
17 - Castor

Seeing as we are Alteryxing here, I decided to go a little further and "repair" the gaps in the data, basically because it's easy in Alteryx, so why not?

 

Spoiler
Spoiler
I began by deciding I would take the existing address components and feed them into the open Google API and returning the full results.
First, I replaced the spaces in the Address Text with + signs  and appended this to the API url. I then used a formula to create the search string and gave each row an ID:

Prepare initial search string for download.PNG

This string was then fed into the download tool which brought back all of the full address information. I then used the Regex tool to parse to rows each search string, then text to columns to seperate the address components:

Download and parse.PNG

It was then a simple task of cleaning up the data, writing a quick formula to take the short version of the State and removing any unwanted columns:

Cleanup and sorting.PNG

Then using Crosstab to put the data back into rows, and renaming the columns:

Re-arrange accordingly and rename.PNG

This is the output:

Output.PNG


I've attached the workflow but you will need to input your own API key.


Bulien

GeneR
Alteryx Alumni (Retired)

Nice!

philip_ball
5 - Atom

Hi,

 

So my license doesn't have access to the address parsing tools, so I took a more manual approach.

 

Spoiler

ex54 flow.JPG

 

Basically 3 Regex parsers 'power' it. The Zip and State are simple enough and follow an easy to parse form (ie: 2 capital letters, 5 numbers at the end of the line).

In order to split the City name from Street, I viewed the street type (ie: Drive, Circle, St, etc.) as a type of delimiter between the two. To turn it into a single delimiter I created a 'collection' of street types and Find/Replaced them with '|' symbols in the original lines. I then separated out the street names fully, and then parsed out the City name with a regex.

 

 

 

TomWelgemoed
12 - Quasar

Great stuff Phil for doing this without address inputs.

andrewdatakim
12 - Quasar
12 - Quasar

Hi Everyone,

 

I built my model based on receiving the information in the two formats provided (w/ and w/o zip codes), which run through a parse tool to separate out the state and zip code.

Address Parse WorkflowAddress Parse Workflow

 

 

 

RegEx Address ParseRegEx Address Parse

 

The next step was to divide the two into those with and without zip codes. The ones with zip codes can be referenced to a zip code repository and joined giving you all of the information you are looking for with the correct formatting. Those without zip codes need to be cross referenced using an Advanced Join (this tool allows you to specify multiple criteria for a cross join https://gallery.alteryx.com/?_ga=1.45648100.198713632.1460495285#!app/Advanced-Join/547f8df96ac90f0f...)


Advance Join LogicAdvance Join Logic

The next set of formula tools removes the cities from the addresses and leaves the Selects and the Union for cleanup. The addition that I would make for further improvement is to add the Google API to run the address records where zip code was not given to add the zip code, at which point any zip codes that you were given that are either not in your repository or are invalid could be corrected/ stored.

 

The major weakness of this workflow is the initial parse which can currently only process the two options. I would want to create a repository for the most common submissions and insure that it stays case insensitive.

NicoleJohnson
ACE Emeritus
ACE Emeritus

My solution. I got a bit stuck (clearly, i need a crash course in RegEx, and don't have access to the address parse tools either), so I ended up deciding to create a text input of common street names (Street, St, Drive, Dr, etc.), which allowed me to split out the number & street after using the Find & Replace tool. From there, string formulas let me do the rest. For this to work for any address, I'd likely need to expand the common street names text input to include additional ones (Place, Close, Court, etc.) It's not elegant, and I'm not convinced yet that it would work 100% of the time... but it worked.

 

Spoiler
WeeklyChallenge54.JPG
SeanAdams
17 - Castor
17 - Castor

solution attached.    Bit bashful about this one 'cause it's just not pretty (in fact - went back and re-did it once I checked the solution afterwards :-))

Love Mark's idea ( @mceleavey to use the Google API to do some cleanup)

 

Spoiler
Did a series of formulas to replace the street 
Then found state and zip in regex pairs
- parse out the zip - then replace it in the original text
- parse out the state - then replace it in the original text

as you can see in the screenshot below - felt a bit silly afterwards so I went back and redid it and included that too :-)


challenge54.png

 

 

estherb47
15 - Aurora
15 - Aurora

Had some challenges with the RegEx when I was trying to do it with one RegEx parse tool (could not get it to select from the numbers through the Street/Road/Ave/etc. Still managed to clean this with 3 tools!

Spoiler
Spoiler

First RegEx tool replaces the Street/St/Avenue/Ave/Road/Rd/Circle/Drive/Dr etc, with a +

Then used a formula with RegEx_Replace to remove all of the characters preceding and including that +

Next RegEx parse with a Parse method to grab all of the text before the two uppercase letters as the city, the 2 uppercase letters as the state, and everything afterwards as zip. Used \s outside of the marked groups to remove extra spaces from the results
image.png

 

LordNeilLord
15 - Aurora

This was most definitely not a beginner exercise!

Spoiler
Weekly Challenge 54.png