Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
shaynie
8 - Asteroid

Our mission:

Find the BEST place to visit for a great Foodcation!

(And survive)

 

The 2017 OC Alteryx Grand Prix Team—the members of which have never met in person—naturally decided we should go on a Foodcation Trip—for no ulterior reason whatsoever.  We love food. We love getting out of the house. We love excuses to stay up late doing Alteryx. The idea quickly gained momentum. We couldn’t have been more excited…until…our BFF Manny invited our “friend” Vinh.

 

OC1.png

We learned early on why Vinh hates riding in cars – he gets carsick!  Aside from the fact that Vinh vomits at the drop of a hat, the team doesn’t know that much about him. But I don’t trust him. He’s moody, irritable, irritating, and possibly psychotic. “So let’s invite him along on our trip, right? Great idea Manny!!!” (Manny swears it just slipped out and that Vinh really invited HIMSELF along. We’re all too scared to confront him.)

 

So we’re on a mission: find the best place to take a trip to maximize our foodcation fun and minimize Vinh’s short temper and tendency to mess up the car.

 

If you attended the actual OC 2017 Alteryx Grand Prix presentation, then you noticed that while Susan Haynie and Dan Jones were present, April Prak was suspiciously absent. She said she was going to Europe. I think she’s home in her pajamas eating veggie rolls just to avoid Vinh. We think April’s the smart one. But she’s remained integral in helping us on what has now become a mission of survival.

 

With our lives on the line, it’s been a challenge to appear normal, so we decided that we’d just treat this whole Foodcation saga thing like a challenge. Almost like a game. A contest. A race if you will. A race that sucks up all your free time.

 

That calmed us down just enough to get things done. So herein we describe our process for planning the epic survival Foodcation.

 

Why we should win (the 5 C’s):

  • Creativity: Creation of additional metrics such as variety of Farmers Markets and variety of Restaurant Cuisines. Need a little excitement? We added Night Time Lights data collected from space to show nighttime activity for the chosen city.
  • Computations: Lots of Predictive (Data correlation, Histograms, 4 predictive models!!!...not including some majorly failed ones we were too embarrassed to include in the blog)
  • Communication: Modern day interaction with our friends!  Get your customized Trip Planner by Email or interact with Twitter.
  • Customization: You set your Foodcation priorities. We all know requirements can change and at the end of the day a user wants to choose their inputs and get something pretty and personalized just for them.
  • Cutting Edge: You can tweet your priorities to request a report from your Phone! And we use a Chained Application to make the city selection process dynamic and perfectly in sync with everyone’s preferences.

 

THE SHORT VERSION

General outline of how the OC Foodcation process works:OC2.pngBrief Explanations of Workflows:

Workflow 1: Data Prep Blending and Creation of Metrics brings together lots of types of locations and city level data (CBSA’s).  Missing, incorrect and massive (real world!) data gave us plenty of challenges.  Fortunately, Alteryx’ spatial, data parsing, cleanup and dynamic input tools got us past the data prep, and on to our Foodcation planning!

 

Workflow 2: Predictive Modeling is used to determine which cities are similar to the ones Manny has already rated. Pearson correlation finds the most correlated variables to Manny’s rating and those variables were then used to create a prediction of  Manny’s rating for all cities. This predicted rating and variables from Workflow 1 along with custom weights from the user were then used to create a final score for each city.

 

Workflow 3: Based on final city chosen we created a custom report that includes a map indicating restaurant, transportation and farmers market options, night time excitement, and a table with hyperlinks to restaurants when available.

 

What the Foodcation Planner sees (so easy, anybody can do it – even Vinh!): 

 

OC3.png 

 

Custom Priority/”Moody Mode”:

Manny and our Team were all largely on the same page when it comes to our vacation priorities. We had a standard process predicting exactly where we should go. And then Vinh came along. Vinh said he’d accommodate whatever we want, but we’re not taking that chance. This state of panic led us to adding the Custom Priority mode. Which internally we call Vinh Moody Mode. Don’t tell Vinh that.

 

Stick with the Default Weights in Standard Mode, or change them to suit your interests.

 

OC4.png

 

 

 After clicking Continue, the process runs and generates a report of the Top Three Cities according to the custom weights that were entered. 

 

OC5.png

 

Make your final decision on the location of your next Foodcation. And select that destination in the 2nd app screen:

 

OC6alt.jpg

 

  

ABRACADABRA!  Your Customized Tour Guide!

In case you’re wondering, the OC Team used the defaults, and on some separate Vinh-less occasion we’re going to Honolulu, Hawaii 😊

 

 OC7.png

THE LONG VERSION….

Data Prep/Blending

Four data sets plus important add-ons:  Who doesn’t need some night time excitement?  And, Vinh “really” needs comfortable transportation!  Poor Vinh, he gets carsick, which isn’t good for any of us ☹

OC8a.png

Real world data is MESSY!

We used lots of Alteryx tools to parse the text, put points into their CBSA polygons, calculate the center city areas, summarize our city totals, merge and match and join and union, till we were completely “inspired”!

 

Once the data sets had their bath and were nice and clean, it was time to figure out what they really meant to us.  In other words, we calculated new measures that told us the important things we were looking for -  Easy access to restaurants and Farmers markets, lots of Vegetarian Variety, plus more!

 

OC9.png

You can find our extra data sets here and here.

 

OC10.jpg

 

The Night Time lights data are pretty interesting, and cover the entire Earth.   We use satellite data from the National Oceaniic and Atmospheric Administration (NOAA).  A new type of satellite sensor, VIIRS (Visible Infrared Imaging Radiometer Suite) has been sending beautiful higher resolution images of the DNB (Day Night Band) since 2012.

 

After converting the raster images to point data, we used the Alteryx spatial match tool to link the night light points to each city.  With over 141 million points that fall within CBSA boundaries, Alteryx was critical in being able to produce average night time light intensity for each CBSA:

 

OC11.jpg

 

Predictive Modeling

Now that the fun data prep and blending is done, let’s go on to another fun topic, Predictive Modeling. In an ideal world, Predictive Modeling would involve lots of data and known distributions, none of which we have in this case. In fact, we only have 10 lines of data, Manny’s rating for the 10 cities he visited.

 oc12.jpg

Let’s not resort to hair pulling, we’ll have to apply a test and validate model approach and come up with a success criterion for how we pick this model. Now, we can contemplate success criteria and discuss AIC, BIC, and several other statistics for days but we’re going to go old fashioned and use MSE. MSE is the mean squared error and basically calculates the average of the errors (Actual – Predicted value) squared. We might want to note, due to only having 10 lines of data, overfitting could be an issue and won’t be solved by using MSE as a success criterion.

 

OC13.png

 

Our first step was to “investigate the data”….hopefully you see where we are going with this. We turn to our trusty data investigation tools to do a Pearson correlation with Manny’s variable as the target variable . The goal here is to figure out which variables impact Manny’s rating. We decide on the top 5 correlated variables (but not too correlated among themselves) to use in our series of predictive models,  including vegetarian restaurants per capita, transportation options and Simmons dogs per household (Manny is dog lover). Whew…  one major step done…quite a few to go.

 

Our first model attempt was k- means clustering which was a big fat failure. Without trying to go into too much detail it turns out that both the cities that Manny poorly rated and cities that Manny highly rated were being clustered into the same group by the model. After scratching our heads for a bit and changing quite a few parameters we realized the reason for this is because the poorly rated cities and the highly rated cities seemed similar to the cluster model in comparison to all the cities in general. Intuitively, you can think of it this way, while Dallas (Manny poorly rated) and NYC (Manny highly rated) might differ in number of vegetarian restaurants greatly when comparing to the whole dataset of cities like Akron, OH or Decatur, AL the differences might not seem so bad. If we had to cluster NYC with a city, I would say it’s more similar to Dallas than Akron, OH. It was a great first start and there was a little bit of hair pulling, but we moved on and stuck to our plan of test and validate.

 

At this point we decided to try 3 other models: Forest Model, Linear Regression, and Gamma Regression. We used MSE as the success criterion and below are the results.

 

Model

MSE

Forest Model

2.9

Linear Regression

5.0

Gamma Regression

5.7

 

Our Forest Model was the clear winner, with the model on average accurately predicting Manny’s rating within a range of sqrt(2.9) which is approximately 1.7.  A quick sanity check of the distribution of the predicted ratings shows that most cities fall within the midrange of 5 (see histogram below). We’re okay with this, given the limited data it’s better for predictions to be on the modest side so we don’t skew the results too much. We also custom created a word cloud in R, the larger and darker the word the larger the predicted rating. You can see our model predicts that San Francisco, Los Angeles, Chicago and Honolulu are most like Manny’s favorite cities (New York, Philadelphia, Seattle).

 

OC14a.png OC14b.png

 

Computing the Final Score

Now that we have the hard computations out of the way we need to bring everything back together. The goal of the analysis is to use all the variables we have (farmers markets, vegetarian restaurants, predicted Manny’s rating, transportation data) and come up with an overall Score. We need to compare apples to apples across all variables so we utilized Alteryx’s standardize z score and standardize unit interval macros to get everything to a final 0-1 range (tools can usually be found here C:\ProgramFiles\Alteryx\bin\RuntimeData\Macros\Predictive Tools\Supporting_Macros ) . The three cities with the highest final scores will be among the recommended picks for the foodcation.   But wait that’s not all… 

 

Choosing the Best City

Now  is the time to decide!   We look at the top three based on our standard weighting scheme, giving every factor an equal say in our final choice.

 

urban honolulu.png

 

OC16.png 

 

And here is our customized Trip Planner – We’re going to Honolulu, Hawaii!

 

OC17.png

 

Closest Vegetarian Restaurants to the center of the action

 

Engineering Notes:

WORKFLOWS

OC19.png

We have three workflows:  one standard workflow for the data preparation, and two analytic apps to let the user interact and customize their choice of city for their next Foodcation.

 An extra special app lets us receive tweets that include three weights (for variables) and hashtag #OCInspire.  This “OCVacation Twitterized.yxwz” app is near-real-time magic.

 

Workflow 1: Grand Prix OC Foodcation Step1_Data Prep_Blending_and_Creation_of_Metrics.yxmd

 

OC20.png

You can find this app, with input and output data, on the Alteryx Gallery, once it’s made public.

 

Workflow 2. OCFoodcationAppStep2_PredictiveScore.yxwz

 

OC21.png

 

Workflow 3. OCFoodcationAppStep3_FinalReport.yxwz

 

OC22.png

 

Extra App:  OCFoodcation Twitterized.yxwz

Tweet your preferences!  Include #OCInspire and how you rate Cuisine Variety, Restaurants per square mile, and Transportation Options.  Enter 3 single digits separated by spaces.   Example tweet:

#OCInspire 3 7 8 Alteryx is Awesome!

 

 OC23.png

 

OC24.jpgCHAINED APPS  

Our process uses a ‘chained app’ implementation. They’re frequently ‘the trick’ to getting something done in Alteryx that you couldn’t quite put into a single app. Chained apps are cool, you can learn more about them here. Basically, it lets the user interact as many times as needed with the app, using results from the prior apps.  So in our case, the first app lets the user prioritize what is most important to them, and then sees the top three resulting cities.  The second app in the chain lets the user select which of the top three they want, and then produces a customized guide about the market, a list of local restaurants, and optionally sends it via email.

 

 

 

OC25.png

 

MACROS

We also made use of a lot of macros in our processes, including those in the toolbar, a couple found in the supporting_macros  folder in the Alteryx installation, and a couple of new ones you may not have seen before. You can get the CREW macro package here.   The OC Team LOVES MACROS and encourages you to build them and use them!  We used all of these in our workflows/apps!

 

ToolName

Count

Append_Cluster.yxmc

1

Cleanse.yxmc

4

Forest_Model.yxmc

1

Gamma_Regression.yxmc

1

Histogram.yxmc

3

Imputation_v2.yxmc

1

K-Centroids_Cluster_Analysis.yxmc

1

K-Centroids_Diagnostics.yxmc

1

Linear_Regression/Supporting_Macros/Linear_Regression.yxmc

1

Score.yxmc

3

SequenceOutputs.yxmc

1

Standardize_unit-interval.yxmc

4

Standardize_z-score.yxmc

2

Twitter Search DJ2017.yxmc

1

 

R TOOLS

The R Predictive tools in Alteryx “R” awesome!  We used a lot of them to help us find the perfect destination for our Foodcation, as we described earlier.   , we used RColorBrewer, slam, tm and wordcloud.   In case you don’t have them already, you can install additional R tools using the in the Alteryx Gallery here

 

Want to test the apps?  Check them out in the public gallery!

OC Grand Prix Data Prep

https://gallery.alteryx.com/#!app/OC-Grand-Prix-Foodcation-Data-Prep/5931c48beffc2a0b7cc53cce

 

OC Grand Prix Foodcation 2017

https://gallery.alteryx.com/#!app/OC-Grand-Prix-Foodcation-2017/5931bb7deffc2a0b7cc53926

 

Enjoy the OC Foodcation Planner and be careful who you share your plans with!

 

April Prak

Alteryx alum, statistics and predictive expert

www.linkedin.com/in/april-prak

 

Dan Jones

Alteryx alum, loves pushing the envelope of Alteryx engineering—and common decency

www.linkedin.com/in/edanjones

 

Susan Haynie

Geospatial and demographic expert, China data, and happy grandma

www.linkedin.com/in/susan-haynie-365b503 

 

OC26.jpg

Comments
MelissaD
Alteryx Alumni (Retired)

Awesome content and demonstration of Alteryx's unique capabilities and strengths!

levell_x_dunn
10 - Fireball

Great Job OC!