Hi Community,
I'm trying to produce some aggregated forecast and was in need of some help, mainly around how to summarize the data before passing it through the TS models. The data is at the employee level but we'd like to have it summarized to the dept ID level. Below is my problem statement, and then a desired outcome. The dataset is also attached.
Problem Statement: We'd like to forecast out attrition in our org for the next three months by dept ID.
Desired Outcome: two columns, first column would have dept ID and the second column would have the forecasted attrition.
I know I need to summarize, but I imagine I need to summarize by date, and then add by Dept ID? Any help would be great.
Thank you.
I would probably start by creating more generic date buckets like weeks or months (using the Formula tool). From there, you can create your model with the ARIMA and ETS tools. For multiple groups, I'd take a look at this thread:
https://community.alteryx.com/t5/Alteryx-Designer-Discussions/TS-Forecasting/td-p/28857
Hi @dlopez ,
my approach would be to convert termination dates to a weekly or monthly level (if you want to predict "next three month", I'd use monthly data) and summarize - GROUP BY dept ID and period (week or month), UID count. You could use ARIMA or ETS to predict attrition based on the aggregated values, TS Model Factory may be used to build the model for multiple depts.
Have you considered using attrition rate (calculate attrition rate per month and dept first, predict future attrition rates and calculate absolute values for attrition as a last step)? It could make sense if the number of employees in a department is strongly varying).
What do you think?
Best,
Roland
Hi, @dlopez
Great question, and thank you for the problem statement and desired outcome.
Please like or mark an unacceptable solution if this works for you.
Here are some thoughts to explore:
1. Periodicity - break the Dates into the Weeks your HR recognizes (i.e., 52 weeks or 53 weeks), and you probably need only the last three years worth of data (for attrition recency + aged data effects might be more pronounced)
2. Descriptive Stats - slice and dice to understand Dept ID participation in the data, and generate an allocative factor (that should be tweaked further by internal conversations that are on-going + target strategies that have been deployed). At the end of this exercise, you want to be able to complete the following statement, "Dept X represents Y% of Total Attrition for period Z."
3. To start, run your TS as an aggregate = do not forecast per Dept ID. Just forecast based on Period and Attrition Count.
4. Once you have your TS forecast output range (there will be a number + low + high = the cone of uncertainty), ALLOCATE the forecast to Dept ID based on step 2.
5. Tweak (a little plus or minus) based on current ongoing initiatives or absence of initiatives
6. Optional: In step 2, you might also assess the time to attrition events/ clusters and also generate this statement, "Major/ Minor/ Average Attrition tends to happen X time (weeks) after Hire Date for Dept X."
7. Optional. Use step 6 to ALLOCATE which Dept ID are at risk for attrition events
Of course, by the time you're considering 6 and 7, you will be using more than TS. You will be augmenting TS with Predictive and Prescriptive treatments. Cheers!
Hi @echuong1
Got it, and that was my initial steps, bucket day level dates into months since that's what I want to forecast. And then I ran them through ETS and ARIMA models, working on that second part you mentioned.
Hi @RolandSchubert,
That's what my initial steps were (workflow has been shared), to group day level dates into months and then run them through the ETS and ARIMA models. Now I'm installing the TS Model tools to do the grouping of depts.
I mentioned this to our business partner and we've decided absolute numbers would work best for what they're using them for.
Thanks!
david
Hi @RobertOdera
Thanks for your feedback, was very helpful. My comments to your questions are below.
1. Exactly, this is was I did. Took the dates and aggregated them to the monthly level.
2. We've already done this piece and are aware, and the business exercise we really want to answer is very similar. "in the next three months these depts are forecasted to lose xxx amount of employees" that way we could allocated resources where we see higher levels of forecasted attrition
3. Did this, workflow has been attached.
4 thru 7 Not sure how I'd allocate if I didn't include dept in the forecast, maybe I'm missing something here.
Again, thanks for the feedback was very constructive.
Cheers,
david
Sure thing, @dlopez
Thanks for the feedback, and thank you for your attached workflow sample.
I will offer the following:
1. Great use of ETS and ARIMA.
Would you like me to share a consolidated approach/ flow which allows you to score the models in-stream?
2. Your question on how to allocate.
a. Run descriptive analytics (Data Investigation) on your cleaned data set (if you hadn't already prepped the set)
- what is the share/ mix over the past N periods by Dept ID, where N can range from 3 months to the entire data set period
- what is the mode, mean, median on the time to attrition event from the Start Date
- what is the time to attrition event share/ mix over the past N periods by Dept ID
In other words, you're trying to generate a Spread matrix by exploring the cleaned data set. Below is a quick Tableau snippet of an Alteryx'd scenario. Total Sales may be X for a given period N, but how does it mix (how is it allocated)? In the example below, you can see Sales X allocated across Month, Weekday, and Day of Month - this is the insight you want to tease of your data set.
In your use-case, your allocating Attrition (first actual attrition numbers, and second, time to attrition event) across Dept ID. I hope this makes sense.
b. IMHO, forecasting is a two-step. The forecast (which is the result of the flow) and meaningful accuracy (which should be the outcome of the process).
- Yes, you will ultimately provide an absolute number (but that number should be a guided pick from a ranged result based on elicitation - you, the team, risk averseness profile, business champions, etc.,)
- Yes, keep the ranged output because it is a function of your chosen "confidence", pun intended, and it allows you to frame and appreciate the cone of uncertainty - verus just generating a point forecast output (an absolute number, but without the benefit of the step above, which might be sub-optimal)
-Yes, keep the ranged output because the reality will lie within distribution in the range (for good forecast results), and keeping the ranged output allows you to have meaningful retrospectives, reflections, measurement, and look-backs.
It's a subtle benefit, but it's huge in terms of providing additional basis from which to build Trust.
Again, I appreciate your interest and engagement.
I hope it is useful and reach out if there's any way I can add value.
Hi @RobertOdera
I've updated my flow to get very close to what I want, it's attached to this post.
1. That'd be great, or even if you want to use the one I've attached and edit it that works.
2A. I get it now, yah we have a dashboard that shows those descriptive statistics to our business partners. These are all EDAish questions that we've explored.
2B. Right, I'm keeping my eye on the model score metrics to ensure we have a viable model and for sure would like to keep the eye on the "cone" as well.
Great discussion, greatly appreciated. Look forward to hearing back.
Cheers,
david
Sure thing, @dlopez
Sorry, my bandwidth evaporated last week.
You know...we plan, God laughs!
Okay, let's get back into it!
1. I will do (model using your shell) and send that out to you tomorrow AM
2A. Yes! EDA-ish (glad I'm tracking with your process/ thinking)
2B. Nice. This will be helpful in the future should you choose to mature/ evolve/ explore ML approaches (caveat = I'm new to ML, so take advice with a LUMP of salt versus grain!)
You don't work to over-fit, but improving prediction precision on the range should significantly reduce waste, which IMHO is huge in any scenario.
Cheers!