
Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
cwilliams

I recall a day in school when the professor was drilling into our heads: "select … from … where … Anyone? Anyone? Anyone seen this before?" Oh wait, that was somebody else's schooling. As we ran our database drills, I leaned over to my friend and said, "WHO in their right mind would do this for a living?"

Turns out I am, or at least was, that person.

I love normalization: putting messy data into neat rows and columns. When all your data lives in a single database (or in several databases on a shared server, or on linked servers), it's easier to find errant data, even with all the limitations of SQL (http://en.wikipedia.org/wiki/SQL). And by SQL, I mean old-school T-SQL. The newer reserved words in Microsoft SQL Server, such as PIVOT, are awesome and powerful, but I leave that functionality to reporting tools. Please don't pivot my raw data.

What do you do when your data doesn't come from a homogeneous source? When some of it can't be loaded into an RDBMS? And when you have, hypothetically, lost some block groups?

You use Alteryx.

My source data is a combination of ZIPs and block groups, with their associated polygons and centroids, stored in .csv and .tab files. Using Alteryx's Join tool, I can quickly locate the block groups that are in one file but not the other. Using formulas, I make the necessary computations and load the missing block groups into the other file. To confirm that my block group sources are now in sync, I run a second Join tool and check that its left and right (unjoined) outputs are empty.
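
For readers who think in code rather than in workflows, here is a minimal Python sketch of the same bookkeeping. It is not how Alteryx implements the Join tool; it only mimics the idea behind the tool's three outputs (joined, left-only and right-only, the J, L and R anchors) using sets of keys. The helper names `join_outputs` and `sync`, the `geoid` field and the sample rows are all hypothetical, and the `transform` parameter is a placeholder for whatever formula computations the target file actually needs.

```python
def join_outputs(left, right, key):
    """Split two record lists the way a keyed join does:
    keys found on both sides, left-only rows, and right-only rows."""
    left_keys = {row[key] for row in left}
    right_keys = {row[key] for row in right}
    joined = sorted(left_keys & right_keys)
    left_only = [row for row in left if row[key] not in right_keys]
    right_only = [row for row in right if row[key] not in left_keys]
    return joined, left_only, right_only


def sync(left, right, key, transform=dict):
    """Copy each side's unmatched rows into the other side.

    transform is a hypothetical hook standing in for the formula step
    (any field computations the target file needs); by default it just
    copies the row.
    """
    _, left_only, right_only = join_outputs(left, right, key)
    new_left = left + [transform(row) for row in right_only]
    new_right = right + [transform(row) for row in left_only]
    return new_left, new_right


# Hypothetical sample rows with made-up GEOID-style keys,
# standing in for the .csv and .tab sources.
csv_rows = [{"geoid": "010010201001"}, {"geoid": "010010201002"}]
tab_rows = [{"geoid": "010010201001"}]

_, l, r = join_outputs(csv_rows, tab_rows, "geoid")
print(len(l), len(r))  # 1 0: one block group is missing from the .tab side

csv_rows, tab_rows = sync(csv_rows, tab_rows, "geoid")

# Verification: after syncing, both unjoined outputs should be empty.
_, l, r = join_outputs(csv_rows, tab_rows, "geoid")
assert not l and not r
print(len(l), len(r))  # 0 0
```

Using sets keeps each membership test constant-time on average, so the comparison grows roughly linearly with the number of rows rather than comparing every row against every other row.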

Next, I use the Spatial Match tool to look for block groups that fall outside the ZIPs. A co-worker calls this a "retinally weighted" approach: I'm spot-checking that my block group centroids fall inside the ZIP polygons.
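
The check behind that spot check is a point-in-polygon test. The sketch below uses the classic even-odd ray-casting algorithm; it illustrates the idea and is not Alteryx's Spatial Match implementation. It assumes each ZIP is a single ring without holes and treats longitude/latitude as planar x/y, which real ZIP geometries (multi-part polygons, islands) will not always satisfy. The names `point_in_polygon` and `unmatched_centroids` and the toy shapes are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting: True if (x, y) lies inside the polygon ring.

    ring is a list of (x, y) vertices; the closing edge back to ring[0]
    is implied. Assumes a single ring with no holes.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings of a ray cast from (x, y) toward +x.
            if x < x_cross:
                inside = not inside
    return inside


def unmatched_centroids(centroids, zip_polygons):
    """Return IDs of centroids that fall inside none of the ZIP polygons."""
    return [
        geoid
        for geoid, (x, y) in centroids.items()
        if not any(point_in_polygon(x, y, ring) for ring in zip_polygons.values())
    ]


# Hypothetical toy data: one square "ZIP" and two block group centroids.
zip_polygons = {"ZIP_A": [(0, 0), (10, 0), (10, 10), (0, 10)]}
centroids = {"BG_1": (5, 5), "BG_2": (15, 5)}
flagged = unmatched_centroids(centroids, zip_polygons)
print(flagged)  # ['BG_2']
```

This brute-force version tests every centroid against every polygon, which is fine for a spot check; a tool working on national files would presumably prune candidates with bounding boxes or a spatial index first.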

I doubt a database would have supported this, even a spatially aware one, and certainly not at these speeds: my geography files cover the entire US and Puerto Rico.

So, again, Alteryx has made ad-hoc data analysis easy to do.