alteryx Community

Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.

Dallas User Group | Grand Prix 2017 Finalist

6 - Meteoroid

Mateo, Manny and Vinh are planning a Foodcation! Follow the conversation as they use Alteryx to combine, calculate, analyze, and predict where the ultimate Foodcation destination can be found. Then use the same data but with your own rankings and priorities to find the perfect spot for your next trip!

Mateo Agee

FOODCATION!

To: Emmanuella Gowon

Manny,

What’s up?  Gotta make this quick, but I need to use some of my vacation days ASAP, so I’ve been thinking it’s time to go on a trip! I know you are all about the eats, and I’ve been craving some new Thai food so let’s do a Foodcation! I don’t know where you would want to go, but I just want to make sure we aren’t stuck waiting in lines since there are so many places I want to try. Hope you are up for it! Let’s kick around some ideas of places to visit.

Mateo Agee

Director of Restaurant Density

10 Minute Lane

Impatience, TX

No_Waiting.org

Emmanuella Gowon

RE: FOODCATION!

To: Mateo Agee, Vinh Gould

Mateo,

YES! Let’s do it! I’m adding Vinh to the thread.  He has some interesting ideas that are sure to add variation to our preferences.

I have very simple taste that I’m sure will be easy to accommodate.  Since I’m vegetarian, I’d want to go someplace with lots of vegetarian options. And I love to go to high quality Farmer’s Markets…But I don’t like village markets like you would find back in my home village (Umahia, Nigeria).  The proof that these are different is trivial and is left as an exercise to the reader!

To make this simple, I’ve already rated some cities that I’ve been to.  Let’s go somewhere I haven’t been!  And somewhere that would rate high on my list.  It’s simple, we’ll just turn my existing list into a regression equation, score the remaining cities, and then we will have some good options of places that we could go.

Manny Gowon

Manager of Vegetarian Preferences

K Means Cluster Analysis Group

Linear Regression, TX 75243

Emmanuella_Gowon@KMeans.com

Vinh Gould

RE: FOODCATION!

To: Mateo Agee, Emmanuella Gowon

Hey everyone!

Thanks for asking me to come!  I just got back into the office from looking over some preliminary station sites.  Of course I rode transit there and back, and took the camera for some railfanningThat’s just how I roll….  J

As you both know, I hate to sit in traffic, especially in an unfamiliar city.  Since Engineering assures me that we’re years off on our miniaturized reactor technology (something about coolant?), that means I think we really need to stick with the basics like walking, biking, buses and trains.

With Alteryx in mind, I took the liberty of grabbing some datasets to assemble my top cities for where I’d most like to go. The EPA puts out an awesome dataset that generates a Walkability Index (1-20) for every block group (found here); I ought to be able to aggregate that up to the CBSA level.  And since y’all know transit is important to me, I can grab the latest transit trip data from the APTA to work with! Found it here.

Keep me posted on what timeframe y’all have in mind. I’ll need to let my guild master on Ancient Parchments Online know so I don’t get kicked out for inactivity.

Vinh Gould

Director, Right-of-Way Analysis

NUKATRANSIT

111 Tech Drive

Fort Worth, TX  76129

Vg_trainman@nukatransit.com

Mateo Agee

RE: FOODCATION!

To: Emmanuella Gowon; Vinh Gould

Hola peeps!

About to head out the door to beat the rush hour for my favorite al pastor spot tonight, but wanted to drop a note letting you guys know that I used Alteryx to pull together some Census data from the census.gov county business patterns dataset . I used this to get restaurant counts by Metro CBSA as well as the latest Metro CBSA population data and then combined them to calculate restaurants per capita. I’m thinking that this is a good proxy for expected wait times throughout the metro areas that we would be visiting. I used this per capita variable to rank my preferences and attached a dataset with which cities that I would want to go to most.

Manny, I also pulled some details on Vegetarian restaurants as well, and those are included with my dataset. So, if you want to use those for your ranking algorithms feel free. I had to do some iterative clean up on address data and latitude/ longitude values to get good geospatial matches to the CBSA polygons, but it should be ready now. The Tom-Tom address geocoder  worked great, but there was still a bunch of data with bad street addresses in the Vegetarian Restaurant file I was working with. So, I checked for CBSA assignments using Zip Code and City Name as well for the Restaurants with missing street addresses. Thanks for sharing those CBSA polygons Vinh, they were super helpful!

Maybe y’all can send over your CBSA  rankings and we can average our preferences to find our top destination.

If you want to see what I ran to get this, see the attached workflow.

Get back to me ASAP!

XOXO to all my taco lovers, excited to see you soon!

Emmanuella Gowon

RE: FOODCATION!

To: Mateo Agee, Vinh Gould

Mateo, Vinh:

Ah, Sannu my dears… I know you both think I’m making this too complicated. Going anywhere with you guys means that I have to give up 2 degrees of freedom. Check out the workflow that I developed in Alteryx. Very simple and very straightforward! I’ve attached it for you both to admire. 😉

Ca c’est tres simple!

If you want the details, I’ll go through it.  I took the Farmer’s Market data that we got from Alteryx and the Vegetarian Restaurant data. Na gode (thank you) Vinh, for your help in attaching the CBSAs to the list of Farmer’s Markets. And Mateo, that list of Vegetarian restaurants was a mess; you are such a dear for cleaning it up so that we could aggregate the number of Vegetarian restaurants at the CBSA. So, from there I was able to get the number of Farmer’s Markets and the Vegetarian Restaurants for a CBSA. More significantly, I found the density of Farmer’s Markets and Vegetarian Restaurants.

Next, I found the doppelgangers to the cities that I’ve been to because I desperately needed a larger N! So, using the nearest neighbor tool, I located my doppelganger cities based on population, ethnic mix, Farmer’s Markets, and Vegetarian restaurants.

 My Rated City Doppelganger City Asheville, NC Metro Portland-South Portland, ME Metro Atlanta-Sandy Springs-Roswell, GA Metro Detroit-Warren-Dearborn, MI Metro Boston-Cambridge-Newton, MA-NH Metro Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Carson City, NV Metro Prescott, AZ Metro Columbus, OH Metro St. Louis, MO-IL Metro Dallas-Fort Worth-Arlington, TX Metro Houston-The Woodlands-Sugar Land, TX Metro Miami-Fort Lauderdale-West Palm Beach, FL Metro Houston-The Woodlands-Sugar Land, TX Metro New York-Newark-Jersey City, NY-NJ-PA Metro Chicago-Naperville-Elgin, IL-IN-WI Metro Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Metro Detroit-Warren-Dearborn, MI Metro Seattle-Tacoma-Bellevue, WA Metro Sacramento-Roseville-Arden-Arcade, CA Metro

Since these other cities were My Rated Cities’ doppelgangers, I decided to apply my ratings to them. Math doesn’t lie, so these have to be remarkably similar! After all that, I was able to run a regression   (Ah, a regression equation, doesn’t that warm your heart?!) .

Manny’s Regression for Highest Rated Cities

2

Basic Summary

3

Call:

lm(formula = rating ~ Farmer.s.Markets.by.population + Vegetarian.Restaurants.per.10000.Capita + Native.American.Population + Asian.Population + Black.Population + Hispanic.Population + Hawaiian.and.Pacific.Island.Poulation + Other.Race + Mixed.Race, data = inputs\$the.data)

4

Residuals:

5

 Min 1Q Median 3Q Max -0.86380 -0.28990 0.00805 0.30410 0.53530

6

Coefficients:

7

 Estimate Std. Error t value Pr(>|t|) (Intercept) -5.108e+00 1.161e+00 -4.401 0.00229 ** Farmer.s.Markets.by.population 1.367e+05 1.924e+04 7.104 1e-04 *** Vegetarian.Restaurants.per.10000.Capita 4.185e+00 5.118e-01 8.177 4e-05 *** Native.American.Population -6.615e-04 2.746e-04 -2.409 0.04256 * Asian.Population -2.025e-05 4.775e-06 -4.242 0.00283 ** Black.Population 9.758e-07 6.091e-07 1.602 0.14784 Hispanic.Population -2.158e-06 4.809e-07 -4.488 0.00203 ** Hawaiian.and.Pacific.Island.Poulation 6.288e-04 2.668e-04 2.357 0.04618 * Other.Race 2.731e-05 1.295e-05 2.108 0.06805 . Mixed.Race 9.046e-05 1.442e-05 6.275 0.00024 ***

Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

8

Residual standard error: 0.57108 on 8 degrees of freedom

Multiple R-squared: 0.9845, Adjusted R-Squared: 0.9671

F-statistic: 56.44 on 9 and 8 DF, p-value: 2.781e-06

Check it out! I have good significance -- wonderful p values for the equation!

Once I had the algorithm from the regression, I went back and scored all the CBSAs and produced my list.  It’s beautiful, isn’t it?

So, let’s be finding the good chop-chop!

Manny Gowon

Manager of Vegetarian Preferences

K Means Cluster Analysis Group

2416 Farm to Market 3.14 Road

Linear Regression, TX 75243

Emmanuella_Gowon@KMeans.com

Vinh Gould

RE: FOODCATION!

To: Mateo Agee, Emmanuella Gowon

M&M,

You know what I love about Alteryx?  You can mix and match even wildly disparate data sources without much care for what format they started in and get a totally awesome combination at the end of your process. It’s like we’re going multimodal up in here!

I gotta admit, I faced a roadblock when I first started picking apart my datasets (and thanks again Mateo for your ideas to look at fuzzy logic).  Once I dug deeper and took elements from each process along with some good-ol’ string fu, that really got the wheels turning.  We weren’t sitting in data traffic anymore!

It looks pretty complicated, but it’s really not. There was just a lot of work in fine-tuning the combination between the APTA data, which used an urbanized area (UZA), and the CBSAs that we were using as a base. Check out this little bite of elegance:

trim(substring([UZA], findstring([UZA],",")+1, 3))

That’s giving us a US State by looking for a comma in each UZA name, then taking the next three characters and trimming the results. Three characters gives us the space after the comma (later trimmed out), but also makes sure that we’re only taking one state at a time. So, those wild state-straddlers like “El Paso, TX-NM” will only give us the first state, in this case “TX” which of course we love anyway. Similar string searches got us to just one city as well.

And why do we want just one city and one state? Well, with just one city and one state, we can match it to Census place geography! But, even then there are a few that are difficult to match, like those city-counties such as Indianapolis (i.e. “Indianapolis city”). So, we’ll instead fix things by playing with the geocoder…you’d be surprised how many cities have a “100 Main” to which you can geocode!

At the end of the day, having all these UZAs distilled into single places makes it possible to aggregate the APTA Transit Trips data up to the CBSA level. We have to account for all of those smaller transit agencies that are a part of a larger area -- That’s often where they run the really cool stuff!!

Finally, I had some unexpected cleanup on my EPA walkability data. The data was apparently using a different CBSA definition than our base. Soon enough, however, everything was running on standard-gauge track and clicking right along. My top picks, drumroll please…

 CBSA Vinh Rank Walkability Rank Transit Rank Los Angeles-Long Beach-Anaheim, CA 1 3 2 San Francisco-Oakland-Hayward, CA 2 4 5 Boston-Cambridge-Newton, MA-NH 3 8 6 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD 4 7 7 New York-Newark-Jersey City, NY-NJ-PA 5 17 1

I’ve got to run to a meeting. Our team at work is looking at locomotive prototypes…

VG

Mateo Agee

RE: FOODCATION!

To: Emmanuella Gowon; Vinh Gould

Hey y’all,

This is perfect! I took your combined rankings and workflows and merged them with my rankings. It looks like we are headed west to San Francisco Bay! I used each of our final rankings and averaged them and then re-ranked them using the average to get the following top five:

 Overall Ranking CBSA Title Manny Ranking Vinh Rank Mateo Rank 1 San Francisco-Oakland-Hayward, CA 4 2 13 2 Urban Honolulu, HI 1 7 40 3 Los Angeles-Long Beach-Anaheim, CA 2 1 58 4 Eugene, OR 22 31 27 5 Portland-Vancouver-Hillsboro, OR-WA 39 10 38

Based on the data, San Francisco appears to be the perfect choice! According to the datasets feeding into our analysis, San Francisco has:

• 424 Vegetarian Restaurants
• 124 Farmers Markets
• 4th most Walkable City in the US
• 5th most used Transit System in the US
• 23 restaurants of any type per capita which is above the 90th percentile

So, San Francisco should have matches with 1) Vegetarian options, similar to Manny’s previous favorite cities, 2) public transit and easily walkable options for Vinh, and 3) relatively short wait times, per my preferences! I am so excited for the crazy Cali Burritos and hopefully we can find a good Pad Thai place. But I am even more excited that we have data to back up our FOODCATION EXTRAVAGANZA!

Just FYI, I took all of our workflows and merged them into one large workflow so that we can run the whole process simultaneously if we ever need to update the source data or adjust any calculations. Take a look if you want to check my work or see the output for yourself!

Let me know ASAP if you both are in for San Francisco! I think this will be THE BEST FOODCATION EVER!!!!

Best,

Mateo

Mateo Agee

RE: FOODCATION!

To: Emmanuella Gowon; Vinh Gould

Hey, another follow-up:

After I saw what we had collectively built to solve this problem, I decided that we needed to share our data collection and Alteryx skills with the WORLD! So, I took all of our workflows and parameterized them into an Alteryx App and published it to the public Alteryx Gallery here

In the app, you can set up all of your preferences for yourself as you try to decide on the ideal Foodcation destination. Some of the features that I built in are:

• Choose your ten cities and 1-10 rankings for each city (just like Manny).
• Search for specific types of vegetarian cuisines or menu items that you may be interested in trying.
• Choose specific Farmers Market products that you want to be able to purchase on your visit.
• Set the weightings of all these different ranking metrics (Including Vinh’s Transit and Walkability Scores) separately on a 0-10 scale (for any item that you set a 0 weighting to, the Final Ranking will ignore that factor).

The default settings are equivalent to what we did for our current search. Now with the app, anyone can try different combinations of factors in order to identify their perfect destination.

Hasta luego,

Mateo

Vinh Gould

RE: FOODCATION!

To: Mateo Agee, Emmanuella Gowon

San Francisco! AWESOME!! I saw an old photo just a few months ago of the inaugural run of a famous streamliner passenger train; it left from right there at the Embarcadero!

Not to mention those wicked-cool cable cars!  YES!!

Looks like I’ll have to purchase a couple extra memory cards for the cameras…

VG

Emmanuella Gowon

RE: FOODCATION!

To: Mateo Agee, Vinh Gould

Mateo, Vinh:

San Francisco, it is! Great team-work, everyone! Buying my tickets now…

Check out this awesome Farmers Market at Ferry Plaza! Can’t wait!!!

Cheers,

Manny