
Data Science Blog

Machine learning & data science for beginners and experts alike.

How about using Facebook's Prophet package for time series forecasting in Alteryx Designer?

 

Hmm, interesting that you ask! I have been trying to do that thing for ages now.

 

Facebook research lab's premise is that the Prophet package is to time series what the Forest Model is to classification and regression (almost, anyway). This bad boy is the Chuck Norris of time series, if you will. Throw just about anything at it and it will do the trick.

 

Alright, I am listening.

 

image.png

 

But man oh man, does it come with a painful deployment. I mean, it did. All solved now. As Prophet relies on PyStan and multiple non-Python dependencies (it's actually C++, I think), it is quite painful (read: virtually impossible without smashing your notebook against the wall) to deploy with pip.

 

So finally, when Alteryx released version 2019.3 and introduced package management with CONDA, I could use it to install the Prophet package with all its dependencies.

 

image.png

 

So What Is Prophet?

Prophet is a forecasting tool available in Python and R, developed by Facebook research labs as an open-source project. At its core, the Prophet procedure is an additive regression model with four main components (find more here).
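To make "additive" a bit more concrete, here is a toy sketch in plain Python. It is not Prophet's implementation, just the idea that the forecast is a sum of parts. I am assuming the four components are a trend, a yearly seasonality, a weekly seasonality and a holiday effect; every coefficient and the holiday date below are made-up illustration values.

```python
import math
from datetime import date, timedelta

# Hypothetical holiday list, for illustration only.
HOLIDAYS = {date(2017, 1, 1)}

def trend(t):
    """Linear growth: 50 posts/day at the start, plus 0.1 per elapsed day."""
    return 50.0 + 0.1 * t

def yearly(t):
    """A single sine wave per year (a real model would use a richer curve)."""
    return 5.0 * math.sin(2 * math.pi * t / 365.25)

def weekly(day):
    """Fewer posts on Saturday (weekday 5) and Sunday (weekday 6)."""
    return -8.0 if day.weekday() >= 5 else 3.2

def holiday(day):
    """Fixed dip on the listed holidays, zero otherwise."""
    return -10.0 if day in HOLIDAYS else 0.0

def additive_forecast(start, n_days):
    """Sum the four components for each day; returns (date, value) pairs."""
    out = []
    for t in range(n_days):
        day = start + timedelta(days=t)
        out.append((day, trend(t) + yearly(t) + weekly(day) + holiday(day)))
    return out

series = additive_forecast(date(2016, 12, 30), 7)
for day, value in series:
    print(day.isoformat(), round(value, 2))
```

Because the parts are simply added up, each one can be inspected (and tweaked) on its own, which is what the component plots later in this post rely on.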

 

 

Why bother? 

Forecasting is a data science task that is central to many activities within an organisation. For instance, large organizations like Facebook must engage in capacity planning to efficiently allocate scarce resources and goal setting in order to measure performance relative to a baseline.

 

 

So what's the problem?

Producing high-quality forecasts is not an easy problem for either machines or most analysts. Completely automated forecasting techniques can be brittle and are often inflexible. Plus, analysts who can deliver high-quality forecasts are quite rare because this data science skill requires substantial experience.

 

 

Any solution to that?

Facebook created Prophet to fix these problems, with the premise of making it easier for experts and non-experts to make high-quality forecasts that keep up with demand. Facebook has found that by combining automatic forecasting with analyst-in-the-loop forecasts for special cases, it is possible to cover a wide variety of business use cases. With Prophet, you are not stuck with the results of a completely automatic procedure if the forecast is not satisfactory: an analyst with no training in time series methods can improve or tweak forecasts using a variety of easily interpretable parameters.


 

image.png

 



Where does Prophet shine?

Prophet is optimised for the business forecast tasks which typically have any of the following characteristics:

  • hourly, daily, or weekly observations with at least a few months (preferably a year) of history
  • strong multiple “human-scale” seasonalities: day of week and time of year
  • important holidays that occur at irregular intervals that are known in advance (e.g. Super Bowl)
  • a reasonable number of missing observations or large outliers
  • historical trend changes, for instance due to product launches or logging changes
  • trends that are non-linear growth curves, where a trend hits a natural limit or saturates
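The last bullet is probably the least obvious one. A trend that saturates is typically written as a logistic curve, g(t) = C / (1 + exp(-k * (t - m))), where C is the natural limit (carrying capacity). The sketch below is plain Python with made-up values for C, k and m; in Prophet itself, as far as I know, you would ask for this with growth='logistic' and supply a cap column in the input data.

```python
import math

def logistic_trend(t, cap, k, m):
    """Saturating growth: approaches `cap`, with growth rate `k` and midpoint `m`."""
    return cap / (1.0 + math.exp(-k * (t - m)))

# Illustration values only: limit of 1000 posts/day, midpoint at day 100.
cap, k, m = 1000.0, 0.05, 100.0
values = [logistic_trend(t, cap, k, m) for t in (0, 100, 200, 400)]

for t, v in zip((0, 100, 200, 400), values):
    print(t, round(v, 2))
```

At the midpoint the curve sits at exactly half the cap, and from there it flattens out instead of growing forever the way a straight line would.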


What's the bottom line?

Prophet makes it much more straightforward to create a reasonable, accurate forecast. By contrast, R's forecast package includes many different forecasting techniques (ARIMA, exponential smoothing, etc.), each with its own strengths, weaknesses, and tuning parameters. Prophet forecasts are customisable in ways that are intuitive to non-experts. There are smoothing parameters for seasonality that allow you to adjust how closely to fit historical cycles, as well as smoothing parameters for trends that allow you to adjust how aggressively to follow historical trend changes.
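As far as I can tell, these two kinds of smoothing parameters correspond to the seasonality_prior_scale and changepoint_prior_scale arguments of the Prophet constructor. Here is a minimal sketch of how you might set them. The helper names model_params and build_model are my own (hypothetical), and the Prophet import is deferred so the snippet loads even where Prophet is not installed.

```python
# The two knobs discussed above, set to Prophet's documented defaults.
DEFAULT_PARAMS = {
    "changepoint_prior_scale": 0.05,  # higher = trend follows historical changes more aggressively
    "seasonality_prior_scale": 10.0,  # lower = smoother, more damped seasonal cycles
}

def model_params(**overrides):
    """Return the default parameters with any overrides applied."""
    params = dict(DEFAULT_PARAMS)
    params.update(overrides)
    return params

def build_model(**overrides):
    """Create a Prophet model (hypothetical helper; needs Prophet installed)."""
    try:
        from prophet import Prophet      # package name since version 1.0
    except ImportError:
        from fbprophet import Prophet    # older package name, as used in this post
    return Prophet(**model_params(**overrides))

# Example: a more flexible trend, seasonality left at its default.
flexible = model_params(changepoint_prior_scale=0.5)
print(flexible)
```

Calling build_model(changepoint_prior_scale=0.5) would then give a model whose trend bends more readily; whether that helps or just overfits depends on the data, so treat the value as a starting point.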



So Let's Alteryx that Bad Boy!

I have chosen a dataset of Medium posts over the past 5+ years to test Prophet in Alteryx.  The goal is to create predictions (a forecast) of how many posts will be generated over the next 60 days.

 

image.png



Install that thing.

First of all, you actually need to install Prophet using CONDA. Follow the instructions from this post to get this done. As CONDA reinstalls quite a big portion of the base packages, I would suggest you back up the Alteryx Python env directory first.
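Once the install has finished, a quick sanity check from inside the Python tool could look like the sketch below. It only asks Python whether a Prophet package is importable; the function name find_prophet is my own, and I am checking both fbprophet (the name used in this post) and prophet (the name used by newer releases).

```python
import importlib.util

def find_prophet():
    """Return the importable Prophet package name, or None if neither is installed."""
    for name in ("prophet", "fbprophet"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None

found = find_prophet()
print("Prophet package:", found or "not installed in this environment")
```

If this prints "not installed", the CONDA step did not land in the environment the Python tool is actually using.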



Alteryx handles the data prep (naturally).

The workflow and original datasets are attached. My workflow utilizes Alteryx to prepare the dataset of Medium posts, and streams the data directly to the Python Code Tool where all the Prophet magic happens.

 

A few points to note:

The input data streamed to the Python tool is simply DATE (labeled [ds]) and VALUE (labeled [y]). Notice that tons of data points are actually missing as Medium picked up its massive user base slowly over time. 

 

The data is sorted by DATE ascending. Both [ds] and [y] columns are strings. I had some issues with int data type conversions going into the Python tool, so I am doing the conversions in Python directly.

 

You could take just about any problem you are facing, do the same formatting of your data and stream that into the Python tool I am using. 
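If you want to check your own data against the points above before streaming it in, something like this sketch would do. It is plain Python on a made-up three-row sample (in the workflow itself the same steps happen in Alteryx and pandas): convert the strings, sort by date, and list the calendar days that have no observation.

```python
from datetime import date, timedelta

# Made-up sample; both fields are strings, as they arrive in this workflow.
raw_rows = [
    {"ds": "2017-01-03", "y": "12"},
    {"ds": "2017-01-01", "y": "7"},
    {"ds": "2017-01-06", "y": "15"},
]

def prepare(rows):
    """Convert ds to a date and y to a float, then sort ascending by date."""
    parsed = [{"ds": date.fromisoformat(r["ds"]), "y": float(r["y"])} for r in rows]
    return sorted(parsed, key=lambda r: r["ds"])

def missing_days(rows):
    """List the days between the first and last observation that have no row."""
    present = {r["ds"] for r in rows}
    gaps = []
    day = rows[0]["ds"]
    while day <= rows[-1]["ds"]:
        if day not in present:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps

rows = prepare(raw_rows)
gaps = missing_days(rows)
print([r["ds"].isoformat() for r in rows])
print("missing:", [d.isoformat() for d in gaps])
```

Gaps like these are fine for Prophet, which is one of the selling points listed earlier, but it is still good to know how many there are.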

 

image.png




Visualisations in Python Code Tool

I am using several plots directly within the Python Code tool. Just open it to check them out.  First, I am plotting the number of Daily Medium Posts over Time between 2010 and 2017.

 

It is hard to infer anything meaningful from this chart, apart from the prominent upward and accelerating trend.  Bucketing this a bit differently, maybe in weekly bins would be easier on the eye. Too bad I am lazy 🙂
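For the less lazy, weekly bucketing is only a few lines. The sketch below uses plain Python and a made-up handful of daily counts, grouping by ISO week (weeks starting on Monday); with a date-indexed pandas frame, resample("W").sum() should get you to roughly the same place.

```python
from collections import defaultdict
from datetime import date

def weekly_totals(daily):
    """Sum (date, count) pairs into buckets keyed by (ISO year, ISO week)."""
    totals = defaultdict(float)
    for day, count in daily:
        iso = day.isocalendar()
        totals[(iso[0], iso[1])] += count
    return dict(sorted(totals.items()))

# Made-up sample: 2017-01-02 is a Monday, 2017-01-09 is the following Monday.
daily = [
    (date(2017, 1, 2), 10),
    (date(2017, 1, 3), 12),
    (date(2017, 1, 8), 4),
    (date(2017, 1, 9), 20),
]

weekly = weekly_totals(daily)
for (year, week), total in weekly.items():
    print(year, week, total)
```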


 

image.png


And once modelling is done, Prophet allows me to easily plot the forecast data with things like outliers and confidence intervals. Straight out of the box with a simple one-line function call.  The plot below is simply showcasing the model learning based on historical data and then forecasting the future 60 days of posts.


 

image.png

 

There are also various plots for components of the model, like trend, weekly and yearly seasonality and others. Prophet did a good job by fitting the accelerated growth of new posts at the end of 2016.

 

The graph of weekly seasonality indicates that there are somewhat fewer new posts on Saturdays and Sundays than on the other days of the week. That lawn won't cut itself over the weekend, right? Yeah, I've been there.


 

image.png

 

I have actually tried to stream most of the important datasets out of the Python tool.

 

One is the dataset of forecasted data from Prophet's model:


image.png



Another output compares Forecast (yhat) with the actual historic data [y]:


image.png



And the last one with error measures:


image.png



Now a little step sideways:

 

How Does That All Compare to Alteryx Time Series Tools?

I have actually wondered: if I use the ARIMA and ETS tools that come pretty much out of the box with Alteryx, how will that compare to the Prophet package?

 

Building the workflow with the same dataset and same holdout sample, predicting 60 days of posts into the future took about 3 minutes with Alteryx.


 

image.png



These are the error measures for both the ARIMA and ETS models. Interestingly, I am getting lower error measures here than with Prophet. Truth be told, I have not really done any hyperparameter tuning for any of the models.

 

Note: this testing workflow is also attached.

 

image.png



Last thing, the code itself:

 

 

 

 

 

# List all non-standard packages to be imported by your 
# script here (only missing packages will be installed)
from ayx import Package
#Package.installPackages(['pandas','numpy'])
from ayx import Alteryx

# Pip won't work with Prophet -> install with CONDA
# https://github.com/facebook/prophet/issues/715
# There are issues with pystan C++ compiler not working correctly
Package.installPackages(['statsmodels','plotly','patsy','scipy'])       
import warnings
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
daily_df = Alteryx.read("#1")   #Load the pre-processed Medium data, aggregated by date, sorted ascending by date
Alteryx.readMetadata("#1")   #Read metadata from connection #1
#print(daily_df.dtypes)
daily_df[["y"]] = daily_df[["y"]].apply(pd.to_numeric) #Explicitly change the data type of value column "y" to numeric - had some issues when relying on the Select tool
daily_df["ds"] = pd.to_datetime(daily_df["ds"]) #Parse the date strings as well, so the later join with the forecast matches on real dates
daily_df.head(10)
#print(daily_df.dtypes)
#Simple pandas plot of historic daily figures
daily_df.plot(x ='ds', y='y', kind = 'line',figsize=(15,10), title='Number of Daily Medium Posts over Time')
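from fbprophet import Prophet   #Since version 1.0 the package is named prophet; on newer installs use: from prophet import Prophet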
import logging
logging.getLogger().setLevel(logging.ERROR)     #mute unimportant diagnostic messages

prediction_size = 60     #We want to predict 60 data points, i.e. 60 days
train_df = daily_df[:-prediction_size]     #Construct the train dataset, remove holdout for validation
print(train_df)
Alteryx.write(train_df,1)  #Train Dataframe output to connection #1
m = Prophet()      #Instantiate the Prophet model with default settings
m.fit(train_df);   #Train our model by invoking its fit method on our training dataset

future = m.make_future_dataframe(periods=prediction_size)  #create a dataframe with all dates from the history and also extend into the future   
future.head(n=10) 

forecast = m.predict(future)   #Predict values; pass in the dates for which we want to create a forecast
forecast.head(n=10)
#print(forecast.dtypes)
Alteryx.write(forecast,2)  #Forecasted Dataframe output to connection #2
m.plot(forecast); #The Prophet library has its own built-in functions for quickly visualizing the results

m.plot_components(forecast); #Observe different components of the model separately: trend, yearly and weekly seasonality
print(', '.join(forecast.columns))  #Check out all available columns in forecast
print(forecast.dtypes)
#Set indexes on forecast and daily_df before join
forecast.set_index('ds',inplace=True)
daily_df.set_index('ds',inplace=True)

#Join the forecast object with the actual values y from the original dataset
cmp_df=forecast.join(daily_df)[['yhat', 'yhat_lower', 'yhat_upper','y']]

#Reset index so the date -- ds value -- is written to Ayx output dataset
cmp_df.reset_index(level=0, inplace=True)
cmp_df.tail(n=10)

Alteryx.write(cmp_df,3)  #Comparison data to output #3
#Helper function that we will use to understand the quality of our forecasting with MAPE and MAE error measures
def calculate_forecast_errors(df, prediction_size):
    """Calculate MAPE and MAE of the forecast.
    
       Args:
           df: joined dataset with 'y' and 'yhat' columns.
           prediction_size: number of days at the end to predict.
    """
    
    # Make a copy
    df = df.copy()
    
    # Now we calculate the error e_i = y_i - yhat_i and the percentage error p_i = 100 * e_i / y_i.
    df['e'] = df['y'] - df['yhat']
    df['p'] = 100 * df['e'] / df['y']
    
    # Recall that we held out the values of the last `prediction_size` days
    # in order to predict them and measure the quality of the model. 
    
    # Now cut out the part of the data which we made our prediction for.
    predicted_part = df[-prediction_size:]
    
    # Define the function that averages absolute error values over the predicted part.
    error_mean = lambda error_name: np.mean(np.abs(predicted_part[error_name]))
    
    # Now we can calculate MAPE and MAE and return the resulting dictionary of errors.
    return {'MAPE': error_mean('p'), 'MAE': error_mean('e')}
#Use our function to get MAPE and MAE measures
errors = calculate_forecast_errors(cmp_df, prediction_size)
for err_name, err_value in errors.items():
    print(err_name, err_value)

#Build the measures output DF in one go (DataFrame.append was removed in pandas 2.0)
df_measures = pd.DataFrame({"Name": list(errors.keys()), "Value": list(errors.values())})
    
Alteryx.write(df_measures,4)  #Perf measures to connection 4
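If the two error measures feel abstract, here is a tiny worked example in plain Python, using three made-up actual and forecast values rather than the Medium data. The function names mae and mape are my own; the formulas are the same ones used in the helper above.

```python
def mae(actual, predicted):
    """Mean absolute error: average of |y - yhat|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Fails if any actual value is 0."""
    return sum(abs(100.0 * (a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Made-up holdout of three days.
y = [100.0, 200.0, 400.0]
yhat = [110.0, 180.0, 400.0]

mae_value = mae(y, yhat)
mape_value = mape(y, yhat)
print("MAE:", mae_value)
print("MAPE:", round(mape_value, 2))
```

The absolute errors are 10, 20 and 0, so the MAE is 10 posts per day; in percentage terms that is 10%, 10% and 0%, giving a MAPE of about 6.67%. Note that MAPE breaks down on days with zero actual posts, which is worth keeping in mind for sparse series.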

 

 

 

 

 

Wrap Up

Well, we have CONDA finally allowing us to install Prophet and all its non-Python dependencies.  We have done a brief intro to Prophet and built a nice sample workflow that can be easily reused for whatever data you want to throw at it. Just grab the code, or the sample workflow attached. And lastly, we have compared Prophet to the out-of-the-box ARIMA and ETS tools.

 

I think that Prophet is super nice and actually does not require much for you to be able to use it.  Even without any hyperparameter tuning (maybe next time) it produced pretty solid results.

 

I think there are certain use cases where it will shine (mentioned at the top of this article), and it is definitely worth considering when building forecast models.


 

image.png



More Resources

Facebook's Research team pages: Intro To Prophet

TDS: Implementing Prophet Effectively

TDS: Time Series with Prophet Effectively

Utilizing Prophet on Kaggle

 

Cheers,

DM

Comments
Alteryx Partner

Sounds great!

 

Any chance to see this implemented in a release any time soon?

Alteryx is losing steam on machine learning as it's missing a lot of stuff:

 

- multicore ML libraries (Ranger package for random forest for example)

- anything remotely modern for boosting (XGBoost, LightGBM, Catboost)

- Basic clustering options for non-numerical data (k-modes)

- Anything to help manage ML on non-structured data 

 

Please make Alteryx relevant for data science again! 

Alteryx Partner

Thank you, is there any chance we will see them officially implemented any time soon?