Weekly Challenges

Solve the challenge, share your solution and summit the ranks of our Community!

Also available in | Français | Português | Español | 日本語

Want to get involved? We're always looking for ideas and content for Weekly Challenges.


Challenge #54: Data Prep Address Parsing

Alteryx Alumni (Retired)

The link to last week’s challenge (exercise #53) is HERE.


This week’s challenge is to parse out City, State and ZIP code from some unformatted input data. 


The data is in a nonstandard format - it is missing commas and some city names are two words and some city name are only one word, making parsing a challenge.   You need to be able to parse out the city name, state, and zip code if available.


Your goal is to create a process that will transform the data into a data table with separated columns for City, State, and ZIP.


Enjoy and as always I look forward to seeing some creative solutions.

17 - Castor
17 - Castor

Seeing as we are Alteryxing here, I decided to go a little further and "repair" the gaps in the data, basically because it's easy in Alteryx, so why not?


I began by deciding I would take the existing address components and feed them into the open Google API and returning the full results.
First, I replaced the spaces in the Address Text with + signs  and appended this to the API url. I then used a formula to create the search string and gave each row an ID:

Prepare initial search string for download.PNG

This string was then fed into the download tool which brought back all of the full address information. I then used the Regex tool to parse to rows each search string, then text to columns to seperate the address components:

Download and parse.PNG

It was then a simple task of cleaning up the data, writing a quick formula to take the short version of the State and removing any unwanted columns:

Cleanup and sorting.PNG

Then using Crosstab to put the data back into rows, and renaming the columns:

Re-arrange accordingly and rename.PNG

This is the output:


I've attached the workflow but you will need to input your own API key.


Alteryx Alumni (Retired)


5 - Atom



So my license doesn't have access to the address parsing tools, so I took a more manual approach.



ex54 flow.JPG


Basically 3 Regex parsers 'power' it. The Zip and State are simple enough and follow an easy to parse form (ie: 2 capital letters, 5 numbers at the end of the line).

In order to split the City name from Street, I viewed the street type (ie: Drive, Circle, St, etc.) as a type of delimiter between the two. To turn it into a single delimiter I created a 'collection' of street types and Find/Replaced them with '|' symbols in the original lines. I then separated out the street names fully, and then parsed out the City name with a regex.




12 - Quasar

Great stuff Phil for doing this without address inputs.

12 - Quasar
12 - Quasar

Hi Everyone,


I built my model based on receiving the information in the two formats provided (w/ and w/o zip codes), which run through a parse tool to separate out the state and zip code.

Address Parse WorkflowAddress Parse Workflow




RegEx Address ParseRegEx Address Parse


The next step was to divide the two into those with and without zip codes. The ones with zip codes can be referenced to a zip code repository and joined giving you all of the information you are looking for with the correct formatting. Those without zip codes need to be cross referenced using an Advanced Join (this tool allows you to specify multiple criteria for a cross join https://gallery.alteryx.com/?_ga=1.45648100.198713632.1460495285#!app/Advanced-Join/547f8df96ac90f0f...)

Advance Join LogicAdvance Join Logic

The next set of formula tools removes the cities from the addresses and leaves the Selects and the Union for cleanup. The addition that I would make for further improvement is to add the Google API to run the address records where zip code was not given to add the zip code, at which point any zip codes that you were given that are either not in your repository or are invalid could be corrected/ stored.


The major weakness of this workflow is the initial parse which can currently only process the two options. I would want to create a repository for the most common submissions and insure that it stays case insensitive.

ACE Emeritus
ACE Emeritus

My solution. I got a bit stuck (clearly, i need a crash course in RegEx, and don't have access to the address parse tools either), so I ended up deciding to create a text input of common street names (Street, St, Drive, Dr, etc.), which allowed me to split out the number & street after using the Find & Replace tool. From there, string formulas let me do the rest. For this to work for any address, I'd likely need to expand the common street names text input to include additional ones (Place, Close, Court, etc.) It's not elegant, and I'm not convinced yet that it would work 100% of the time... but it worked.


17 - Castor
17 - Castor

solution attached.    Bit bashful about this one 'cause it's just not pretty (in fact - went back and re-did it once I checked the solution afterwards :-))

Love Mark's idea ( @mceleavey to use the Google API to do some cleanup)


Did a series of formulas to replace the street 
Then found state and zip in regex pairs
- parse out the zip - then replace it in the original text
- parse out the state - then replace it in the original text

as you can see in the screenshot below - felt a bit silly afterwards so I went back and redid it and included that too :-)




15 - Aurora
15 - Aurora

Had some challenges with the RegEx when I was trying to do it with one RegEx parse tool (could not get it to select from the numbers through the Street/Road/Ave/etc. Still managed to clean this with 3 tools!


First RegEx tool replaces the Street/St/Avenue/Ave/Road/Rd/Circle/Drive/Dr etc, with a +

Then used a formula with RegEx_Replace to remove all of the characters preceding and including that +

Next RegEx parse with a Parse method to grab all of the text before the two uppercase letters as the city, the 2 uppercase letters as the state, and everything afterwards as zip. Used \s outside of the marked groups to remove extra spaces from the results


15 - Aurora

This was most definitely not a beginner exercise!

Weekly Challenge 54.png