Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
alexandramannerings
8 - Asteroid

Manny and I want to go on a Foodcation, but how are we going to pick the best city to visit? We have to consider Manny’s city ratings and vegetarian demands, my inability to stand in a line, and the anti-car sentiments of a last-minute tag-along to our trip. Only one analytics program can help us crunch the numbers and select our destination… Check out the Alteryx solution developed with the help of the Denver User Group Grand Prix team and some adroitly deployed macros, the Spatial Info tool, predictive grouping, heat maps, Throttle, a Weather Underground API, and more!

 

“Busy” didn’t begin to describe this week at work. As I was frantically hammering away at my keyboard, my boss leaned into my cubicle.

 

Yeeeaaaaah, if you could get those workflows finished, that’d be great.”

 

I couldn’t wait for the weekend. Buzz. A text from Manny popped up on my phone.

 

d1.png

 

That sounded like fun, and a welcome break.

 

d2.jpg

d3.png

 

Visions of a steaming Louisiana crawfish boil started making me drool. New Orleans!

 

d4.png

 

Shoot, how was I going to find a city with vegetarian restaurants?

 

d5.png

 

Of course he did. I knew less about those than vegetarian restaurants.

 

d6.png

 

How was I going to find a city that would meet all Manny’s requirements and be similar to cities he liked, and fast?

 

There was only one tool--and one user group--that could help me solve a problem like this. Quick, send up the Data Signal!

 

Superhero Denver Banner.png

 

Luckily, there was a good wind blowing into the city, and the caped analysts of the Denver Alteryx User Group arrived quickly. I explained my problem, and we got to work at once.

 

d8.png

 

We did a quick data scrape for vegetarian restaurants across the US, parsed the data, geocoded the addresses, removed the duplicates using a PeanoKey from the Spatial Info tool and a Unique tool, and then summed the resulting restaurant counts for each CBSA. (I learned that a PeanoKey merges the two-dimensional coordinates of a point to create a single identifying number. Pretty nifty!)

 

Farmers’ markets were a little more complicated, as different markets are open at different times of the year. Time for the favorite tool of Alteryx Grand Prix qualifying rounds and power users everywhere… the Generate Rows tool! This tool allowed us to interpolate rows from a starting point until a given condition is met, so it could generate the missing rows starting the “open” date of any given farmers’ market and continue adding rows for each day until it reached the “closed” date of that farmers’ market, and the start again for the next market. We summed up open days for each month across up to four distinct seasons, so we could tally up the farmers’ markets available based on the timing of our visit.

 

d9.png

 

Since the small sample size of Manny’s visited cities led to a lack of statistical significance in any sort of regression model, we instead used the Find Nearest Neighbors tool based on the handful of demographic variables Manny provided to match  Manny’s ratings for the cities he’s already visited and identify similar cities he hasn’t visited. We also included weather data in the matching, retrieved from the Weather Underground API using a Download tool in a scheduled workflow (which I posted up to gallery.alteryx.com as a downloadable workflow if anyone else has pressing weather data needs!). Find Nearest Neighbors creates a distance score and ranking for a specified number of matches from a universe of options to a target population, based on a combined score of the similarity of all the selected variables to compare.

 

d10.png

 

Then, we converted the number of farmers’ markets, vegetarian restaurants and similarity measure into a standardized score, with the best value in each category worth 1 point and all other scores as a proportional fraction of 1. Done!

 

By now, it was lunchtime and, with all this thinking about food, I was starving. But the line at the cafeteria was terrible. In a hangry rage, I decided we had to go someplace where I wouldn’t have to wait for my food. We’d need lots of restaurants per capita to alleviate the demand and the growling in my stomach. I got back to the workflow and added in two new calculations, restaurants per capita and farmers’ markets per capita, converted them into the standardized 0 to 1 score by dividing each value by the best value in the category, and then added up the scores again.

 

BuzzBuzzBuzz. Manny was calling.  I answered, “Ahoy hoy.”

 

“So, I was telling Vihn all about our foodcation weekend, and he wants to come! And you know Vihn, if I say no, there’s the whining and the crying and sulking and the passive aggressive sticky notes on the bulletin board. It’s okay if he comes, right?”

 

Sigh. “Sure, it’s fine.” But Manny wasn’t done.

 

“Vihn also mentioned that he hates riding in cars and driving, so let’s go to a city that is easy to get around in other ways, okay?”

Back to the workflow. Fortunately, the Denver Alteryx User Group had stuck around to assist a colleague of mine with a nasty bit of data wrangling, so they were available to help me collect bike share facilities from United States Department of Transportation, Bureau of Transportation Statistics and scrape www.walkscore.com for walkability, bikability and public transport scores for many major cities. The Dynamic Replace tool was key for joining the ridiculously long and complicated official CBSA names to the normal city names in the walk and bike data, by finding any instance of the city name in the CBSA name

 

We decided that while “walk score” was a nice start, we could get even more precise with the analysis by using the Heat Map, Spatial Match, and Distance tools inside a batch macro to calculate the average walking time between restaurants in a city.

 

Now we were on a real roll and decided to see what other data we could use to refine the analysis. No one likes getting food poisoning, right? So we joined in data from the CDC Foodborne Outbreak Online Database (or “FOOD Tool”, because we all know data programs are only as good as their acronyms). And did anyone else see that terrifying 60 Minutes piece last night about foodies getting brutally mugged while on exotic eating adventures?!? We decided it would be prudent to add data from Public Engines on the likelihood of becoming a crime statistic in each CBSA. We then converted these data elements to the standardized score between 0 and 1.

 

We’d made quick work of Vihn’s preferences, plus pulled in even more city attributes, and it was at this point that it dawned on me: what happens if other people want to join us and they want different things? Or what if Manny and Vihn and I get in an argument about which of our requests was the most important? We needed a way to be able to look at all the possible options in a flexible, dynamic way. Time for an analytic app!

 

To appify the workflow, we added Numeric Up Down tools that would allow Many, Vinh, me, or any other self-service user to add custom weights to each model variable. A zero would exclude the variable from the model completely, while a ten would assign that variable maximum importance.

 

d11.png

 

Why stop there?  We decided to allow the app user to use a Drop Down tool to select the month of their foodcation (plans change after all...) and use List Box tools to exclude restaurants with questionable specialities (no fast food for me please!) or exclude cities they’d prefer not to visit.

 

With the data blending and analytics squared away, we decided to add some polish to the output to help support and justify our results to Manny and Vinh. Since we might want to do something other eat all day, I added a Google Attractions Search (available for you if you want to use it!) and incorporated the top attraction photo in the report, using a macro to connect to the Google Places API to retrieve tourist attractions and another macro containing some reporting tools to create a neat presentation-ready output. Speaking of, I’m very excited for the upcoming Reporting Tool Updates!

 

d12.png

 

Pleased with our app, I quickly published it up to gallery.alteryx.com and placed it in the public gallery. I shot the link to the app over to Manny and Vinh to let them tweak my analysis and find the best city. When I ran the app with Manny’s, my and Vinh’s variables weighted highly, I got…

d13.png

Chicago! But now that I’ve got them set up with a self-service app, they can play with the app on their own and I can get back to work without them blowing up my cell phone. Hopefully by the time I’m ready to call it a night, Manny and Vinh will have settled on the perfect parameters and we can get on to booking the perfect foodcation!

Comments
tnguyen
5 - Atom

Yeah, Seattle rocks!

 

Great work Denver User group! +1

DataG8
8 - Asteroid

Awesome work team Denver! I really enjoy the use of the API's, adding something slightly dynamic to the results. Inspired! :)  

jacob_barlow
7 - Meteor
Good job Denver team!
Sluskyd
5 - Atom

Good job denver 

ethunter
5 - Atom

Well done Denver Team, you have my vote. Integrating with Google API is a fun idea.

MartinK
5 - Atom

Nice job! Well done!

Great job Denver!

akchandarana
6 - Meteoroid

I vote for Team Denver!

arnold_slabbekoorn
7 - Meteor

I vote for team Denver

GQ_SHEN
5 - Atom
Great work, Team Denver, you got my vote!
rachellryan
5 - Atom

Great job Denver!

nathan87
5 - Atom

The Denver team solution is clearly superior in every dimension. 

Hollingsworth
12 - Quasar
12 - Quasar

Your preparation, data sources, analytics, and presentation were all excellent. You get my vote.

 

As someone who has 'trickle-scraped' the Weather Underground site, I also appreciated that method a lot.

 

 Great job!