Free Trial

Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
robertlenius
7 - Meteor

Working in finance, I’m somewhat bummed that I don’t often get to explore the possibilities within location-based data. That’s why I was thrilled when a project fell into my lap to visually analyze taxes paid across German villages. However, this excitement was short-lived after seeing the dataset I would be working with. See, the problem is spatial tools are easy to work with if you have data points such as zip codes, states, or countries, but trying to locate German villages within these tools simply doesn’t work. For that, we would need the latitude and longitude of the villages themselves, and this, unfortunately, didn’t exist in my data set.

 

Luckily for me, I stumbled across the organization GeoNames which provides an online database that “…covers all countries and contains over eleven million placements that are available for download free of charge.” Better yet, they offer a free API from which you can query their database. So, using Alteryx, I was able to build a macro that would work with this API to do location-based searches and return exact latitudes and longitudes. If you’re interested in using this macro, you can find it available here in Alteryx Gallery. Through this tool, I was able to enrich my dataset with the latitude and longitude for these German Cities, making it possible to provide visual analysis.


The setup is straight forward; I first use a unique tool to grab only the unique villages from my dataset and then run this list through the macro. I then bring everything back together and output it to a file for Tableau to pick up.

 

robertlenius_0-1578354378553.png

 

Now that the data has been successfully enriched, a visualization can be generated. My example set had 102 villages in it; I’d hate to imagine looking up the latitude and longitude of each one and manually keying them in!

 

robertlenius_1-1578354378557.png


Here’s the visualized portion of that data set, but we have a problem. Immediately I can see that there are seven null values—meaning that these locations were not found within the GeoNames database. What’s more interesting though, is I have datapoints showing in the United States and Africa even though my data set was intended for only Germany. What gives? Well, the problem is multiple places around the world have the same name. For example, when booking a trip to Dublin, you’d probably want to go to Dublin, Ireland, not Dublin, Ohio. We can correct this in our data set by making some modifications to the Alteryx Workflow.

robertlenius_2-1578354378561.png

Above is the modified workflow. Essentially the results are now sent to an Excel file where the locations can be manually adjusted if needed. I’ve also made a join on this file as well so that only new villages are run through the GeoNames macro. This also speeds up the workflow as web calls like this can take a while compared to normal data processing.

 

robertlenius_3-1578354378568.png

 

And here’s the result! With my fully enriched and cleansed data, I can now visually analyze the tax data. And better yet, I have a scalable process that can easily be maintained and run again as more data arises.

 

The Nerd Stuff

 

If you’re interested in how the macro works under the hood, this section’s for you!

 

robertlenius_4-1578354378571.png


Here’s what the GeoNames macro looks like on the back end. At the top, you can see the basic interface tools being used to give the user options when working with it. Below you can see that while all the data is brought in, it’s limited to 15 records per minute. This is both out of respect for GeoNames as we don’t want to bombard their service with requests and because 15 records per minute is what each user is limited to. Next, a Formula tool is used to build a unique URL for each location, which is how the API is accessed. This URL is then fed to a Download tool that sends the API call to GeoNames and returns the results. Lastly, an XML Parse tool is used to parse the returned data into its relevant columns.

Comments