Weekly Challenges

GeneR · ‎01-23-2017

The link to last week’s challenge (exercise #53) is HERE.

This week’s challenge is to parse out City, State and ZIP code from some unformatted input data.

The data is in a nonstandard format - it is missing commas and some city names are two words and some city name are only one word, making parsing a challenge. You need to be able to parse out the city name, state, and zip code if available.

Your goal is to create a process that will transform the data into a data table with separated columns for City, State, and ZIP.

Enjoy and as always I look forward to seeing some creative solutions.

mceleavey · ‎01-24-2017

Seeing as we are Alteryxing here, I decided to go a little further and "repair" the gaps in the data, basically because it's easy in Alteryx, so why not?

Spoiler

I began by deciding I would take the existing address components and feed them into the open Google API and returning the full results.
First, I replaced the spaces in the Address Text with + signs and appended this to the API url. I then used a formula to create the search string and gave each row an ID:

Prepare initial search string for download.PNG

Prepare initial search string for download.PNG

This string was then fed into the download tool which brought back all of the full address information. I then used the Regex tool to parse to rows each search string, then text to columns to seperate the address components:

Download and parse.PNG

It was then a simple task of cleaning up the data, writing a quick formula to take the short version of the State and removing any unwanted columns:

Cleanup and sorting.PNG

Then using Crosstab to put the data back into rows, and renaming the columns:

Re-arrange accordingly and rename.PNG

This is the output:

I've attached the workflow but you will need to input your own API key.

I began by deciding I would take the existing address components and feed them into the open Google API and returning the full results.First, I replaced the spaces in the Address Text with + signs and appended this to the API url. I then used a formula to create the search string and gave each row an ID:This string was then fed into the download tool which brought back all of the full address information. I then used the Regex tool to parse to rows each search string, then text to columns to seperate the address components:It was then a simple task of cleaning up the data, writing a quick formula to take the short version of the State and removing any unwanted columns:Then using Crosstab to put the data back into rows, and renaming the columns:This is the output:I've attached the workflow but you will need to input your own API key.

GeneR · ‎01-24-2017

Nice!

philip_ball · ‎01-27-2017

Hi,

So my license doesn't have access to the address parsing tools, so I took a more manual approach.

Spoiler

ex54 flow.JPG

Basically 3 Regex parsers 'power' it. The Zip and State are simple enough and follow an easy to parse form (ie: 2 capital letters, 5 numbers at the end of the line).

In order to split the City name from Street, I viewed the street type (ie: Drive, Circle, St, etc.) as a type of delimiter between the two. To turn it into a single delimiter I created a 'collection' of street types and Find/Replaced them with '|' symbols in the original lines. I then separated out the street names fully, and then parsed out the City name with a regex.

Basically 3 Regex parsers 'power' it. The Zip and State are simple enough and follow an easy to parse form (ie: 2 capital letters, 5 numbers at the end of the line).In order to split the City name from Street, I viewed the street type (ie: Drive, Circle, St, etc.) as a type of delimiter between the two. To turn it into a single delimiter I created a 'collection' of street types and Find/Replaced them with '|' symbols in the original lines. I then separated out the street names fully, and then parsed out the City name with a regex.

TomWelgemoed · ‎01-28-2017

Great stuff Phil for doing this without address inputs.

andrewdatakim · ‎01-30-2017

Hi Everyone,

I built my model based on receiving the information in the two formats provided (w/ and w/o zip codes), which run through a parse tool to separate out the state and zip code.

Address Parse Workflow

RegEx Address Parse

The next step was to divide the two into those with and without zip codes. The ones with zip codes can be referenced to a zip code repository and joined giving you all of the information you are looking for with the correct formatting. Those without zip codes need to be cross referenced using an Advanced Join (this tool allows you to specify multiple criteria for a cross join https://gallery.alteryx.com/?_ga=1.45648100.198713632.1460495285#!app/Advanced-Join/547f8df96ac90f0f...)

Advance Join Logic

The next set of formula tools removes the cities from the addresses and leaves the Selects and the Union for cleanup. The addition that I would make for further improvement is to add the Google API to run the address records where zip code was not given to add the zip code, at which point any zip codes that you were given that are either not in your repository or are invalid could be corrected/ stored.

The major weakness of this workflow is the initial parse which can currently only process the two options. I would want to create a repository for the most common submissions and insure that it stays case insensitive.

NicoleJohnson · ‎04-03-2017

My solution. I got a bit stuck (clearly, i need a crash course in RegEx, and don't have access to the address parse tools either), so I ended up deciding to create a text input of common street names (Street, St, Drive, Dr, etc.), which allowed me to split out the number & street after using the Find & Replace tool. From there, string formulas let me do the rest. For this to work for any address, I'd likely need to expand the common street names text input to include additional ones (Place, Close, Court, etc.) It's not elegant, and I'm not convinced yet that it would work 100% of the time... but it worked.

Spoiler

SeanAdams · ‎04-04-2017

solution attached. Bit bashful about this one 'cause it's just not pretty (in fact - went back and re-did it once I checked the solution afterwards :-))

Love Mark's idea ( @mceleavey to use the Google API to do some cleanup)

Spoiler

Did a series of formulas to replace the street
Then found state and zip in regex pairs
- parse out the zip - then replace it in the original text
- parse out the state - then replace it in the original text

as you can see in the screenshot below - felt a bit silly afterwards so I went back and redid it and included that too :-)

Did a series of formulas to replace the street Then found state and zip in regex pairs- parse out the zip - then replace it in the original text- parse out the state - then replace it in the original textas you can see in the screenshot below - felt a bit silly afterwards so I went back and redid it and included that too :-)

estherb47 · ‎05-03-2017

Had some challenges with the RegEx when I was trying to do it with one RegEx parse tool (could not get it to select from the numbers through the Street/Road/Ave/etc. Still managed to clean this with 3 tools!

Spoiler

Spoiler

First RegEx tool replaces the Street/St/Avenue/Ave/Road/Rd/Circle/Drive/Dr etc, with a +

Then used a formula with RegEx_Replace to remove all of the characters preceding and including that +

Next RegEx parse with a Parse method to grab all of the text before the two uppercase letters as the city, the 2 uppercase letters as the state, and everything afterwards as zip. Used \s outside of the marked groups to remove extra spaces from the results

SpoilerFirst RegEx tool replaces the Street/St/Avenue/Ave/Road/Rd/Circle/Drive/Dr etc, with a +Then used a formula with RegEx_Replace to remove all of the characters preceding and including that +Next RegEx parse with a Parse method to grab all of the text before the two uppercase letters as the city, the 2 uppercase letters as the state, and everything afterwards as zip. Used \s outside of the marked groups to remove extra spaces from the results

LordNeilLord · ‎06-02-2017

This was most definitely not a beginner exercise!

Spoiler

Weekly Challenges

IDEAS WANTED

Challenge #54: Data Prep Address Parsing