How about using Facebook's Prophet package for time series forecasting in Alteryx Designer?
Hmm, interesting that you ask! I have been trying to do that thing for ages now.
Facebook Research's pitch is that Prophet is to time series what the Forest Model is to classification and regression (almost, anyway). This bad boy is the Chuck Norris of time series, if you will. Throw just about anything at it and it will do the trick.
Alright, I am listening.
But man oh man, did it come with a painful deployment. All solved now, though. Because Prophet relies on PyStan and several non-Python dependencies (C++, I think), it is quite painful (read: virtually impossible without smashing your notebook against the wall) to deploy with pip.
So finally, when Alteryx released version 2019.3 and introduced package management with CONDA, I could use it to install the Prophet package with all its dependencies.
So What Is Prophet?
Prophet is a forecasting tool available in Python and R, developed by Facebook Research as an open-source project. At its core, the Prophet procedure is an additive regression model with four main components (find more here).
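If you like to see it written down, an additive model of this kind looks roughly like the line below. The symbols are my own shorthand and an assumption about how the four components split up, not something taken from the workflow: g(t) is the trend, s_year(t) and s_week(t) are the yearly and weekly seasonality, h(t) is the holiday effect, and ε_t is whatever noise is left over.

```latex
y(t) = g(t) + s_{\text{year}}(t) + s_{\text{week}}(t) + h(t) + \varepsilon_t
```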
Why bother?
Forecasting is a data science task that is central to many activities within an organisation. For instance, large organizations like Facebook must engage in capacity planning to efficiently allocate scarce resources and goal setting in order to measure performance relative to a baseline.
So what's the problem?
Producing high-quality forecasts is not an easy problem for either machines or most analysts. Completely automated forecasting techniques can be brittle and are often inflexible. Plus, analysts who can deliver high-quality forecasts are quite rare because this data science skill requires substantial experience.
Any solution to that?
Facebook created Prophet to fix these problems with the premise of making it easier for experts and non-experts to make high-quality forecasts that keep up with demand. Facebook has found that by combining automatic forecasting with analyst-in-the-loop forecasts for special cases, it is possible to cover a wide variety of business use-cases. With Prophet, you are not stuck with the results of a completely automatic procedure if the forecast is not satisfactory — an analyst with no training in time series methods can improve or tweak forecasts using a variety of easily-interpretable parameters.
Where does Prophet shine?
Prophet is optimised for the business forecast tasks which typically have any of the following characteristics:
What's the bottom line?
Prophet makes it much more straightforward to create a reasonable, accurate forecast. By comparison, R's forecast package includes many different forecasting techniques (ARIMA, exponential smoothing, etc.), each with its own strengths, weaknesses, and tuning parameters. Prophet forecasts are customisable in ways that are intuitive to non-experts. There are smoothing parameters for seasonality that let you adjust how closely to fit historical cycles, as well as smoothing parameters for trends that let you adjust how aggressively to follow historical trend changes.
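To make that a bit more concrete, here is a minimal sketch of where those two knobs live in the Python API. As far as I know they are the `seasonality_prior_scale` and `changepoint_prior_scale` arguments of the `Prophet` constructor. The values below are arbitrary illustrations, not tuned recommendations, and `make_model` is a hypothetical helper name of my own.

```python
# Illustrative settings only; Prophet's own defaults are
# changepoint_prior_scale=0.05 and seasonality_prior_scale=10.0.
PROPHET_PARAMS = {
    "changepoint_prior_scale": 0.5,  # larger -> trend follows historical changes more aggressively
    "seasonality_prior_scale": 1.0,  # smaller -> seasonal cycles are smoothed more heavily
}


def make_model(params=None):
    """Hypothetical helper: build an unfitted Prophet model from a dict of settings."""
    # Imported lazily so the settings can be inspected without Prophet installed.
    # Newer releases ship the package as `prophet` instead of `fbprophet`.
    from fbprophet import Prophet
    return Prophet(**(params or PROPHET_PARAMS))
```

Calling `make_model()` in place of the plain `Prophet()` constructor would be the only change needed to try different settings.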
So Let's Alteryx that Bad Boy!
I have chosen a dataset of Medium posts over the past 5+ years to test Prophet in Alteryx. The goal is to forecast how many posts will be published over the next 60 days.
Install that thing.
First of all, you actually need to install Prophet using CONDA. Follow the instructions from this post to get this done. As CONDA reinstalls quite a big portion of the base packages, I would suggest you back up the Alteryx Python env directory first.
Alteryx handles the data prep (naturally).
The workflow and original datasets are attached. My workflow utilizes Alteryx to prepare the dataset of Medium posts, and streams the data directly to the Python Code Tool where all the Prophet magic happens.
A few points to note:
The input data streamed to the Python tool is simply DATE (labeled [ds]) and VALUE (labeled [y]). Notice that tons of data points are actually missing as Medium picked up its massive user base slowly over time.
The data is sorted by DATE ascending. Both the [ds] and [y] columns are passed in as strings: I had some issues with int data type conversions on the way into the Python tool, so I convert [y] to numeric directly in Python instead.
You could take just about any problem you are facing, do the same formatting of your data and stream that into the Python tool I am using.
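As a rough sketch of that formatting step done in pandas rather than in Alteryx: assuming your raw data has one date column and one value column (the names `posted_on` and `posts` below are made up, as is the helper `to_prophet_frame`), something like this gets it into the ds / y shape.

```python
import pandas as pd


def to_prophet_frame(raw, date_col, value_col):
    """Return a two-column frame (ds, y), sorted by date ascending."""
    out = pd.DataFrame({
        "ds": pd.to_datetime(raw[date_col]),  # parse date strings
        "y": pd.to_numeric(raw[value_col]),   # parse numeric strings
    })
    return out.sort_values("ds").reset_index(drop=True)


# Hypothetical input: everything arrives as strings, in no particular order
raw = pd.DataFrame({
    "posted_on": ["2017-01-03", "2017-01-01", "2017-01-02"],
    "posts": ["12", "7", "9"],
})
prophet_df = to_prophet_frame(raw, "posted_on", "posts")
```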
Visualisations in Python Code Tool
I am using several plots directly within the Python Code tool. Just open it to check them out. First, I am plotting the number of daily Medium posts over time between 2010 and 2017.
It is hard to infer anything meaningful from this chart, apart from the prominent upward and accelerating trend. Bucketing this a bit differently, maybe in weekly bins would be easier on the eye. Too bad I am lazy 🙂
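For the less lazy, a weekly roll-up is a short pandas call. This is only a sketch on made-up numbers, assuming a frame shaped like the one streamed into the Python tool, with `ds` already converted to dates and `y` to numbers. By default `resample("W")` groups into weeks ending on Sunday.

```python
import pandas as pd

# Made-up stand-in for the daily data: 14 days, one post per day
daily = pd.DataFrame({
    "ds": pd.date_range("2017-01-02", periods=14, freq="D"),  # starts on a Monday
    "y": [1] * 14,
})

# Sum the daily counts into weekly bins (weeks end on Sunday by default)
weekly = daily.set_index("ds")["y"].resample("W").sum().reset_index()
```

The resulting frame can be plotted the same way as the daily one, for example with `weekly.plot(x="ds", y="y")`.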
And once modelling is done, Prophet allows me to easily plot the forecast data with things like outliers and confidence intervals. Straight out of the box with a simple one-line function call. The plot below is simply showcasing the model learning based on historical data and then forecasting the future 60 days of posts.
There are also various plots for components of the model, like trend, weekly and yearly seasonality and others. Prophet did a good job by fitting the accelerated growth of new posts at the end of 2016.
The graph of weekly seasonality indicates that there are somewhat fewer new posts on Saturdays and Sundays than on the other days of the week. That lawn won't cut itself over the weekend, right? Yeah, I've been there.
I have actually tried to stream most of the important datasets out of the Python tool.
One is the dataset of forecasted data from Prophet's model:
Another output compares Forecast (yhat) with the actual historic data [y]:
And the last one with error measures:
Now a little step sideways:
How Does That All Compare to Alteryx Time Series Tools?
I was actually wondering: if I use the ARIMA and ETS tools that come pretty much out of the box with Alteryx, how will they compare to the Prophet package?
Building the workflow with the same dataset and same holdout sample, predicting 60 days of posts into the future took about 3 minutes with Alteryx.
These are the error measures for both the ARIMA and ETS models. Interestingly, I am getting lower error measures here than with Prophet. Truth be told, I have not really done any hyperparameter tuning for any of the models.
Note: this testing workflow is also attached.
Last thing, the code itself:
# List all non-standard packages to be imported by your
# script here (only missing packages will be installed)
from ayx import Package
#Package.installPackages(['pandas','numpy'])
from ayx import Alteryx
# Pip won't work with Prophet -> install with CONDA
# https://github.com/facebook/prophet/issues/715
# There are issues with pystan C++ compiler not working correctly
Package.installPackages(['statsmodels','plotly','patsy','scipy'])
import warnings
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
daily_df = Alteryx.read("#1") #Load the pre-processed medium data, aggregated by date, ASC sorted by date
Alteryx.readMetadata("#1") #Read metadata from connection #1
#print(daily_df.dtypes)
daily_df[["y"]] = daily_df[["y"]].apply(pd.to_numeric) #Explicitly change the datatype of value column "y" to numeric - had some issues when relying on the Select tool
daily_df.head(10)
#print(daily_df.dtypes)
#Simple pandas plot of historic daily figures
daily_df.plot(x ='ds', y='y', kind = 'line',figsize=(15,10), title='Number of Daily Medium Posts over Time')
from fbprophet import Prophet
import logging
logging.getLogger().setLevel(logging.ERROR) #mute unimportant diagnostic messages
prediction_size = 60 #We want to predict 60 data points, i.e. 60 days
train_df = daily_df[:-prediction_size] #Construct the train dataset, remove holdout for validation
print(train_df)
Alteryx.write(train_df,1) #Train Dataframe output to connection #1
m = Prophet() #Instantiate the Prophet model
m.fit(train_df); #Train our model by invoking its fit method on our training dataset
future = m.make_future_dataframe(periods=prediction_size) #create a dataframe with all dates from the history and also extend into the future
future.head(n=10)
forecast = m.predict(future) #Predict values; pass in the dates for which we want to create a forecast
forecast.head(n=10)
#print(forecast.dtypes)
Alteryx.write(forecast,2) #Forecasted Dataframe output to connection #2
m.plot(forecast); #The Prophet library has its own built-in functions for quickly visualizing the results
m.plot_components(forecast); #Observe different components of the model separately: trend, yearly and weekly seasonality
print(', '.join(forecast.columns)) #Check out all available columns in forecast
print(forecast.dtypes)
#Set indexes on forecast and daily_df before join
forecast.set_index('ds',inplace=True)
daily_df.set_index('ds',inplace=True)
#Join the forecast object with the actual values y from the original dataset
cmp_df=forecast.join(daily_df)[['yhat', 'yhat_lower', 'yhat_upper','y']]
#Reset index so the date -- ds value -- is written to Ayx output dataset
cmp_df.reset_index(level=0, inplace=True)
cmp_df.tail(n=10)
Alteryx.write(cmp_df,3) #Comparison data to output #3
#Helper function that we will use to understand the quality of our forecasting with MAPE and MAE error measures
def calculate_forecast_errors(df, prediction_size):
    """Calculate MAPE and MAE of the forecast.
    Args:
        df: joined dataset with 'y' and 'yhat' columns.
        prediction_size: number of days at the end to predict.
    """
    # Make a copy
    df = df.copy()
    # Error e_i = y_i - yhat_i and percentage error p_i = 100 * e_i / y_i.
    df['e'] = df['y'] - df['yhat']
    df['p'] = 100 * df['e'] / df['y']
    # Recall that we held out the values of the last `prediction_size` days
    # in order to predict them and measure the quality of the model.
    # Now cut out the part of the data which we made our prediction for.
    predicted_part = df[-prediction_size:]
    # Define the function that averages absolute error values over the predicted part.
    error_mean = lambda error_name: np.mean(np.abs(predicted_part[error_name]))
    # Now we can calculate MAPE and MAE and return the resulting dictionary of errors.
    return {'MAPE': error_mean('p'), 'MAE': error_mean('e')}
#Collect the MAPE and MAE measures into a small output DF
rows = []
for err_name, err_value in calculate_forecast_errors(cmp_df, prediction_size).items():
    print(err_name, err_value)
    rows.append({"Name": err_name, "Value": err_value})
df_measures = pd.DataFrame(rows, columns=["Name", "Value"])
Alteryx.write(df_measures,4) #Perf measures to connection 4
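One caveat if you reuse this on your own data: any day where the actual value is 0 makes the percentage error `100 * e / y` blow up. A minimal sketch of one way around that, which is my own variation and not what the workflow does, is to leave the zero actuals out when averaging the percentage errors. `safe_forecast_errors` is a hypothetical name.

```python
import numpy as np


def safe_forecast_errors(y, yhat):
    """MAE over all points; MAPE only over points where the actual value is non-zero."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    e = y - yhat
    nonzero = y != 0
    if nonzero.any():
        mape = np.mean(np.abs(100 * e[nonzero] / y[nonzero]))
    else:
        mape = float("nan")  # no usable points for a percentage error
    return {"MAPE": float(mape), "MAE": float(np.mean(np.abs(e)))}


# Made-up actuals and forecasts, including one zero actual
errors = safe_forecast_errors([100, 0, 50], [90, 5, 55])
```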
Wrap Up
Well, CONDA finally allows us to install Prophet and all its non-Python dependencies. We have done a brief intro to Prophet and built a nice sample workflow that can easily be reused for whatever data you want to throw at it. Just grab the code or the attached sample workflow. And lastly, we have compared Prophet to the out-of-the-box ARIMA and ETS tools.
I think that Prophet is super nice and actually does not require much for you to be able to use it. Even without any hyperparameter tuning (maybe next time) it produced pretty solid results.
I think there are certain use cases where it will shine (mentioned at the top of this article), and it is definitely worth considering when building forecast models.
More Resources
Facebook's Research team pages: Intro To Prophet
TDS: Implementing Prophet Effectively
TDS: Time Series with Prophet Effectively
Cheers,
DM