Data Science

Machine learning & data science for beginners and experts alike.
Register for the upcoming Live Community Q&A Session - and don't forget to submit your questions for @DeanS regarding the future role of analytics here.

Happy Halloween!  


To kick off our Halloween edition of Engine Works, here's a few gifs of kids getting scared.  Because we all know that traumatizing your neighbor's kids is the true spirit of Halloween.    






Now, on to the learning.  We're professionals, after all.  




What's In This Blog Post


  1. Background: A Programmer's View of Alteryx (or how I learned to stop worrying and love the GUI) 

  2. Alteryx As An API Platform 

  3. What Our App Will Do (It involves candy, because Halloween)

  4. Our App's Architecture (because bosses need Powerpoint)

  5. Our App's Key Components

    1. Demographic Clustering API Built With Alteryx

    2. Python Library for Accessing API

    3. Jupyter Notebook For Calling API and Visualizing Data


 Background: A Programmer's View of Alteryx


The first time I saw Alteryx, I was knee deep in a weeks long ETL project for a customer whose products take up large portions of aisles in every grocery store in America.  Sometime midway through the project, I received a phone call from my twin brother Josh.  He was calling me from a conference in Las Vegas to complain about the ultimate first world problem.   Here's how the conversation went:




Me: "Are you just trying to rub it in?  I'm in the middle of this insane ETL project. " (Insert long strings of whining here.)

Josh: "You need to check out this company called Alteryx.  They have this tool that will let you do all that waaaay faster.  I was using their tool today at the conference."  

Me:  "Googling them now.  Looks like it's a drag and drop GUI tool.  GUI tools are bad.  No way."  


My brother continued to pester me about Alteryx, and eventually I gave the tool a try.  I was wrong about Alteryx. (As a married man, I've gotten much better about the whole 'I was wrong' thing.)  It hit me that if many of my customers had Alteryx, they wouldn't need me.  My view in life is if there's a company making a product that can replace you, you should probably try to go work for them.  So here I am.  What I didn't know until I started working at Alteryx was that there's a whole level of functionality above what most users are aware of.  Typically, we build tools to get things done for our own needs, and maybe we share them with our colleagues to use on their own machines to get their own things done. This is great, and goes a long way towards furthering day-to-day efficiency.  


But the next level is within any Alteryx user's reach.  That next level is....  


Alteryx As An API Platform


Did you know that you can use Alteryx to build APIs that can be used by traditional programmers to connect them to other software?  Perhaps you're wondering what an API is? This article is a good run-down from a few years ago.  API stands for Application Programming Interface, which is a deep topic that I won't get into here.  Just think of it as an alternative to a visual user interface, that is designed for other software to use, rather than humans.  For the purposes of this article, we're going to talk specifically about what's known as a RESTful API. This article is one of the best explanations of REST I've seen. If you don't want to read the article, just know that a RESTful API allows a piece of software to be used by other pieces of software with the same type of internet protocol (HTTP) as what is used by your web browser.   


So why should you care?  Well here's why:  Alteryx lets you turn any workflow you build into a RESTful API, with no coding required.  


Ok, so maybe you're still wondering why you should care.




Here's why:  Anyone in your organization can build Alteryx applications which can be integrated with other software platforms with minimal effort.  You've probably heard this before from pretty much every enterprise software company you've dealt with.  You know the ones I'm talking about.  They have armies of lawyers, so for now we'll just call them FortuneTeller and SUC.  Unlike these two companies, you don't have to hire consultants who drive Ferraris to come in and charge you $1000/hour to integrate our software.  The whole point of RESTful APIs is they are both powerful and simple.




If you think like the skeptical young man above, I don't blame you.  But seeing is believing.


Today we're going to walk through an Alteryx-backed application which, in a few steps, allows us to analyze candy sales data from 2015.  This is dummy data, but it was created by modelling real candy sales data and producing a dataset that had similar trends.  This application has a back-end 100% powered by Alteryx, but on the front-end will leverage open-source, non-Alteryx software to perform some visualizations.  While Alteryx has some very cool existing and soon-to-be-released visualization capabilities, using these tools wouldn't prove my point that Alteryx can integrate with any other software.   Now on we go to the geekery.  


What The App Will Do


Candy sales (yep, here's my Halloween tie-in) are higher around Halloween.  You don't need a data scientist to tell you this.  The fear of eggs being thrown at your house tends to motivate a person to buy candy.  Not unlike Valentine's Day, Halloween is a holiday that is driven by the consequences of NOT buying things.  


But our app is going to help us with some harder questions.  Do areas with higher numbers of kids under the age of 14 have higher candy sales?  What about areas with larger numbers of senior citizens?  My completely anecdotal experience as a trick-or-treater in the mid to late 80's and early 90's was that folks who were grandparents always seemed to have tons of candy to give out.  We can also assume that the popularity of various products may change depending on the demographics of an area.  Retailers such as Wal-Mart have long known that demographics are the key factor for predicting demand at a given store location. We aren't going to answer all of these questions in this post, but instead will create an app that shows an example of how Alteryx CAN be used to do this, and simultaneously expose this capability as an integration point for other software.


High Level Architecture


Below is a diagram of the application's high-level structure:




Let's start with the most familiar parts from the diagram above, and move on to the more obscure components in that order.  


Key Components


Demographic Clustering API Built With Alteryx


Below is the application which we're exposing as a RESTful API using Alteryx Server.  We do this by creating the application and then saving it to the Gallery on an Alteryx Server instance.  See the links in the preceding sentence for a how to on building and deploying an application to Alteryx Server.  It's a simple and straightforward process, but walking through the steps is beyond the scope of this article.  



The above application (notice how it's essentially no different than a workflow), once deployed to the Alteryx Server Gallery, looks like this:




When we click the "Run" button, this screen will appear:




Upon making our selections and hitting the "Run" button again, the process will kick off and return our sales data, with the calculated clusters and demographic variables for each store also joined to each sales row.  Keep in mind that although the cluster was calculated using only the selected demographic variables, all available demographic fields are being returned with the data.  This was a design decision to ensure that users can observe variables they didn't choose to cluster on.  Below is a screenshot of the results:




 Ok, so this is all great, but keep in mind I vowed earlier in the article to show off an API that allows this application to be integrated with other software.  So let's move on to the next part of our stack.


Python Library for Accessing API


Python is a high-level, dynamic programming language that, for various reasons, is particularly popular in the scientific, start-up, and data science communities.  A big contributor to its popularity is that it is heavily used at Google.  (Just a heads up: A product manager here gave me permission to let you know that Alteryx has Python integration on our roadmap for next year. Woot!)


It's not nearly as fast as C++ or other lower level languages, but it makes up for that by being decently fast and allowing developers to build new tools orders of magnitude faster than lower level languages.  A big part of that is the plethora of libraries which are built on top of other libraries, contributing to a great network effect.  


I've now added to this paradigm by building a little library that we'll be using to conveniently connect to, execute, and retrieve data from our new Alteryx powered API.  


The library (I call it PyRyx) has been put up on Github with a friendly MIT license.  Feel free to download it, make it better, and/or use it in your own tools.  


It's located here:




Next, let's move on to the tool we'll be using to import our library and use it to interact wtih our API, retrieve the data, and visualize it.   



Jupyter Notebook For Calling API and Visualizing Data


Jupyter is, quite simply, a groundbreaking open source project that evolved from a previous project called Ipython.  Jupyter is a programming environment that allows users to create "Notebooks", which are collections of interactive pieces of code that can be mixed in with other segments of writing to create things like interactive textbooks, scientific studies with data and executable code embedded, nifty tutorials, and even business reports.  It's not a tool that is approachable for non-coders like Alteryx is, but it certainly meets the needs of advanced users for certain use-cases.   


A tutorial on using Jupyter is beyond the scope of this blog post, but there are thousands of tutorials available on the web.  A good start is here.


Now, let's move on to the Jupyter walkthrough.  


First, we import some Python libraries we'll need for connecting to our API, as well as visualizing and manipulating the data our API produces.





Next, we'll store our Alteryx Gallery API credentials in some convenient variables.  (Note: Enabling your own server's API endpoints is straightforward, but will require administrative permissions.  See the Alteryx server documentation for details on enabling the API and obtaining the credentials.)  

I've got Alteryx Server installed on my personal computer, so my gallery url is pointing at 'localhost' address.  Your url will be the same address as your current Gallery installation at your company/organization.  You can also leverage the official Alteryx public gallery website if you'd like, but you'll need to obtain permission through your sales rep.  




Now that we've stored our credentials and url in convenient variables, we'll simply create an Alteryx API connection instance and store it in the "ayx" variable.  We'll use this "ayx" instance for the rest of our walkthrough.  




 Here we're calling the PyRyx library's convenience function "getSubscriptionWorkflows", which returns all of the analytic applications which are available on the connected Alteryx Server.  Note the "id" value for the third application in the list, the "Halloween_Cluster.yxwz" application.  We'll be using that id value to connect directly to our Halloween Cluster application.  jup_4.JPG


 Next, we're connecting to the application to obtain the question information that would normally be displayed to the user if the application were being accessed via the Gallery website.  




Next, we're going to do two things:  First, we'll create an answers object which we'll pass as an argument to our application.  Next, we'll pass this answers object to the application using the "executeAndFetchResults" function.  This function connects to our application, submits our answers, waits for the application to finish creating the store clusters, and then returns the results in a convenient object called a Pandas DataFrame. Pandas is a library for doing advanced data manipulation, and is particularly popular with Wall Street quants and scientists who are proficient with traditional programming. The DataFrame object is similar to an R DataFrame, and is used to store and manipulate tabular data.  






Here, you see that our Alteryx API has completed its task and returned our data in a convenient Pandas DataFrame object.  Here we examine the first five rows of data.  




Next, we use Pandas' groupby function to perform some summarization on our data and store a grouped sum of quantity sold by item and day in a variable called 'grouped_dem'.  

We also create a second summarized data object called 'avg_sales_by_cluster', and then we have to convert the Cluster integer value to a string.  This is required to ensure our visualization tools in the next step treat the cluster value as a categorical, rather than continuous variable.  (If you're thinking that this looks a lot harder than just using a 'Summarize' tool in Alteryx, that's because it is.)  




 Now we'll use the super sweet (pun totally intended) Altair library to examine our first candy sales object, which shows the sum sold by day and item.  Notice that there's a huge spike in sales on Halloween, as expected.  You can also see smaller spikes in sales around the holiday time period.  




 Now here's a visualization of our second aggregated object, the average quantity sold for each demographic cluster.  Again, we see a big spike in sales around Halloween.  However, an interesting phenomenon we see is that certain clusters of stores (in this case cluster 7) have other spikes in sales at other dates.  






Remember, this is simulated data, so I'm not going to play candy sales analyst here, but I hope I've shed some light in this post on what can be done with Alteryx applications when you leverage them as RESTful APIs combined with other software.  This example only scratches the surface.  Using the same prinicpals above, building dynamic web applications which leverage Alteryx as a data source is trivial in any language.  For example, there are numerous web frameworks written in Python which could leverage the code featured in this article to achieve this goal.  Integration doesn't have to be scary if you choose the right tools.  


Thanks, and have a Happy Halloween.   I'm looking forward to any and all questions, comments, and suggestions.  




Halloween16.pngMake sure you check in here to get the Python Trick or Treat badge!