Find the BEST place to visit for a great Foodcation!
The 2017 OC Alteryx Grand Prix Team—the members of which have never met in person—naturally decided we should go on a Foodcation Trip—for no ulterior reason whatsoever. We love food. We love getting out of the house. We love excuses to stay up late doing Alteryx. The idea quickly gained momentum. We couldn’t have been more excited…until…our BFF Manny invited our “friend” Vinh.
We learned early on why Vinh hates riding in cars – he gets carsick! Aside from the fact that Vinh vomits at the drop of a hat, the team doesn’t know that much about him. But I don’t trust him. He’s moody, irritable, irritating, and possibly psychotic. “So let’s invite him along on our trip, right? Great idea Manny!!!” (Manny swears it just slipped out and that Vinh really invited HIMSELF along. We’re all too scared to confront him.)
So we’re on a mission: find the best place to take a trip to maximize our foodcation fun and minimize Vinh’s short temper and tendency to mess up the car.
If you attended the actual OC 2017 Alteryx Grand Prix presentation, then you noticed that while Susan Haynie and Dan Jones were present, April Prak was suspiciously absent. She said she was going to Europe. I think she’s home in her pajamas eating veggie rolls just to avoid Vinh. We think April’s the smart one. But she’s remained integral in helping us on what has now become a mission of survival.
With our lives on the line, it’s been a challenge to appear normal, so we decided that we’d just treat this whole Foodcation saga thing like a challenge. Almost like a game. A contest. A race if you will. A race that sucks up all your free time.
That calmed us down just enough to get things done. So herein we describe our process for planning the epic survival Foodcation.
Why we should win (the 5 C’s):
Creativity: Creation of additional metrics such as variety of Farmers Markets and variety of Restaurant Cuisines. Need a little excitement? We added Night Time Lights data collected from space to show nighttime activity for the chosen city.
Computations: Lots of Predictive (Data correlation, Histograms, 4 predictive models!!!...not including some majorly failed ones we were too embarrassed to include in the blog)
Communication: Modern day interaction with our friends! Get your customized Trip Planner by Email or interact with Twitter.
Customization: You set your Foodcation priorities. We all know requirements can change and at the end of the day a user wants to choose their inputs and get something pretty and personalized just for them.
Cutting Edge: You can tweet your priorities to request a report from your Phone! And we use a Chained Application to make the city selection process dynamic and perfectly in sync with everyone’s preferences.
THE SHORT VERSION
General outline of how the OC Foodcation process works:Brief Explanations of Workflows:
Workflow 1: Data Prep Blending and Creation of Metrics brings together lots of types of locations and city level data (CBSA’s). Missing, incorrect and massive (real world!) data gave us plenty of challenges. Fortunately, Alteryx’ spatial, data parsing, cleanup and dynamic input tools got us past the data prep, and on to our Foodcation planning!
Workflow 2: Predictive Modeling is used to determine which cities are similar to the ones Manny has already rated. Pearson correlation finds the most correlated variables to Manny’s rating and those variables were then used to create a prediction of Manny’s rating for all cities. This predicted rating and variables from Workflow 1 along with custom weights from the user were then used to create a final score for each city.
Workflow 3: Based on final city chosen we created a custom report that includes a map indicating restaurant, transportation and farmers market options, night time excitement, and a table with hyperlinks to restaurants when available.
What the Foodcation Planner sees (so easy, anybody can do it – even Vinh!):
Custom Priority/”Moody Mode”:
Manny and our Team were all largely on the same page when it comes to our vacation priorities. We had a standard process predicting exactly where we should go. And then Vinh came along. Vinh said he’d accommodate whatever we want, but we’re not taking that chance. This state of panic led us to adding the Custom Priority mode. Which internally we call Vinh Moody Mode. Don’t tell Vinh that.
Stick with the Default Weights in Standard Mode, or change them to suit your interests.
After clicking Continue, the process runs and generates a report of the Top Three Cities according to the custom weights that were entered.
Make your final decision on the location of your next Foodcation. And select that destination in the 2nd app screen:
ABRACADABRA! Your Customized Tour Guide!
In case you’re wondering, the OC Team used the defaults, and on some separate Vinh-less occasion we’re going to Honolulu, Hawaii 😊
THE LONG VERSION….
Four data sets plus important add-ons: Who doesn’t need some night time excitement? And, Vinh “really” needs comfortable transportation! Poor Vinh, he gets carsick, which isn’t good for any of us ☹
Real world data is MESSY!
We used lots of Alteryx tools to parse the text, put points into their CBSA polygons, calculate the center city areas, summarize our city totals, merge and match and join and union, till we were completely “inspired”!
Once the data sets had their bath and were nice and clean, it was time to figure out what they really meant to us. In other words, we calculated new measures that told us the important things we were looking for - Easy access to restaurants and Farmers markets, lots of Vegetarian Variety, plus more!
The Night Time lights data are pretty interesting, and cover the entire Earth. We use satellite data from the National Oceaniic and Atmospheric Administration (NOAA). A new type of satellite sensor, VIIRS (Visible Infrared Imaging Radiometer Suite) has been sending beautiful higher resolution images of the DNB (Day Night Band) since 2012.
After converting the raster images to point data, we used the Alteryx spatial match tool to link the night light points to each city. With over 141 million points that fall within CBSA boundaries, Alteryx was critical in being able to produce average night time light intensity for each CBSA:
Now that the fun data prep and blending is done, let’s go on to another fun topic, Predictive Modeling. In an ideal world, Predictive Modeling would involve lots of data and known distributions, none of which we have in this case. In fact, we only have 10 lines of data, Manny’s rating for the 10 cities he visited.
Let’s not resort to hair pulling, we’ll have to apply a test and validate model approach and come up with a success criterion for how we pick this model. Now, we can contemplate success criteria and discuss AIC, BIC, and several other statistics for days but we’re going to go old fashioned and use MSE. MSE is the mean squared error and basically calculates the average of the errors (Actual – Predicted value) squared. We might want to note, due to only having 10 lines of data, overfitting could be an issue and won’t be solved by using MSE as a success criterion.
Our first step was to “investigate the data”….hopefully you see where we are going with this. We turn to our trusty data investigation tools to do a Pearson correlation with Manny’s variable as the target variable . The goal here is to figure out which variables impact Manny’s rating. We decide on the top 5 correlated variables (but not too correlated among themselves) to use in our series of predictive models, including vegetarian restaurants per capita, transportation options and Simmons dogs per household (Manny is dog lover). Whew… one major step done…quite a few to go.
Our first model attempt was k- means clustering which was a big fat failure. Without trying to go into too much detail it turns out that both the cities that Manny poorly rated and cities that Manny highly rated were being clustered into the same group by the model. After scratching our heads for a bit and changing quite a few parameters we realized the reason for this is because the poorly rated cities and the highly rated cities seemed similar to the cluster model in comparison to all the cities in general. Intuitively, you can think of it this way, while Dallas (Manny poorly rated) and NYC (Manny highly rated) might differ in number of vegetarian restaurants greatly when comparing to the whole dataset of cities like Akron, OH or Decatur, AL the differences might not seem so bad. If we had to cluster NYC with a city, I would say it’s more similar to Dallas than Akron, OH. It was a great first start and there was a little bit of hair pulling, but we moved on and stuck to our plan of test and validate.
Our Forest Model was the clear winner, with the model on average accurately predicting Manny’s rating within a range of sqrt(2.9) which is approximately 1.7. A quick sanity check of the distribution of the predicted ratings shows that most cities fall within the midrange of 5 (see histogram below). We’re okay with this, given the limited data it’s better for predictions to be on the modest side so we don’t skew the results too much. We also custom created a word cloud in R, the larger and darker the word the larger the predicted rating. You can see our model predicts that San Francisco, Los Angeles, Chicago and Honolulu are most like Manny’s favorite cities (New York, Philadelphia, Seattle).
Computing the Final Score
Now that we have the hard computations out of the way we need to bring everything back together. The goal of the analysis is to use all the variables we have (farmers markets, vegetarian restaurants, predicted Manny’s rating, transportation data) and come up with an overall Score. We need to compare apples to apples across all variables so we utilized Alteryx’s standardize z score and standardize unit interval macros to get everything to a final 0-1 range (tools can usually be found here C:\ProgramFiles\Alteryx\bin\RuntimeData\Macros\Predictive Tools\Supporting_Macros ) . The three cities with the highest final scores will be among the recommended picks for the foodcation. But wait that’s not all…
Choosing the Best City
Now is the time to decide! We look at the top three based on our standard weighting scheme, giving every factor an equal say in our final choice.
And here is our customized Trip Planner – We’re going to Honolulu, Hawaii!
Closest Vegetarian Restaurants to the center of the action
We have three workflows: one standard workflow for the data preparation, and two analytic apps to let the user interact and customize their choice of city for their next Foodcation.
An extra special app lets us receive tweets that include three weights (for variables) and hashtag #OCInspire. This “OCVacation Twitterized.yxwz” app is near-real-time magic.
Workflow 1: Grand Prix OC Foodcation Step1_Data Prep_Blending_and_Creation_of_Metrics.yxmd
You can find this app, with input and output data, on the Alteryx Gallery, once it’s made public.
Tweet your preferences! Include #OCInspire and how you rate Cuisine Variety, Restaurants per square mile, and Transportation Options. Enter 3 single digits separated by spaces. Example tweet:
#OCInspire 3 7 8 Alteryx is Awesome!
Our process uses a ‘chained app’ implementation. They’re frequently ‘the trick’ to getting something done in Alteryx that you couldn’t quite put into a single app. Chained apps are cool, you can learn more about them here. Basically, it lets the user interact as many times as needed with the app, using results from the prior apps. So in our case, the first app lets the user prioritize what is most important to them, and then sees the top three resulting cities. The second app in the chain lets the user select which of the top three they want, and then produces a customized guide about the market, a list of local restaurants, and optionally sends it via email.
We also made use of a lot of macros in our processes, including those in the toolbar, a couple found in the supporting_macros folder in the Alteryx installation, and a couple of new ones you may not have seen before. You can get the CREW macro package here. The OC Team LOVES MACROS and encourages you to build them and use them! We used all of these in our workflows/apps!
The R Predictive tools in Alteryx “R” awesome! We used a lot of them to help us find the perfect destination for our Foodcation, as we described earlier. , we used RColorBrewer, slam, tm and wordcloud. In case you don’t have them already, you can install additional R tools using the in the Alteryx Gallery here
Want to test the apps? Check them out in the public gallery!