Alteryx Designer Desktop Discussions

dhavasa3 · 12-01-2019

I am trying to predict hospital sales based on certain hospital/physician traits and historical sales. I want to classify the records first based on the traits and then use time series (TS) prediction for each group. Can I use SVM, RF or similar models for the classification? And how do I apply TS forecasting to each group after classification?

RishiK · 12-01-2019

@dhavasa3 you can perform SVM via the Alteryx SVM tool

https://help.alteryx.com/2018.3/SVM.htm

For RF, use the Forest Model tool in Alteryx

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Tool-Mastery-Forest-Model/ta-p/3057...

@SandeepSK wrote an article on Time Series forecasting which may help you:

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Time-Series-Factory-Tools-vs-Batch-...

dhavasa3 · 12-01-2019

Sure, but my question is more about how to link the two: the classification and the time series forecast.

The model should first use SVM/RF to classify the records and then, in the next step, automatically perform a TS forecast for each group. Can we do that? If so, how?

RolandSchubert · 12-02-2019

Hi @dhavasa3,

I'll try to answer on a very general level first. You want to use SVM or RF to classify the hospitals based on their traits; I think the links provided by @RishiK will help you create the model. Once the model is trained to the required level (i.e. the classification is as good as expected), you can add a Score tool to apply the classification (the output of the SVM or Forest Model tool) to the data. Each record is then "classified", i.e. it receives a specific result that you can use to group the records. You can then either summarize the records by group and apply the time series prediction (ETS, ARIMA) to the "group sum", or apply it to individual records (perhaps using different models per group). I hope this helps you approach the problem.
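A minimal sketch of the "group sum" route, written in plain Python outside Alteryx purely for illustration. It assumes each scored record carries a hospital ID, a group label, a period and a sales value; that layout and the data are hypothetical. Sales are summed per group and period, and each group's series is forecast one period ahead with simple exponential smoothing, the simplest ETS model (additive error, no trend, no season), standing in here for Alteryx's ETS/ARIMA tools. The smoothing constant alpha=0.5 is an arbitrary assumption.

```python
from collections import defaultdict

# Hypothetical scored records: (hospital, group label from the model, period, sales)
records = [
    ("A", "g1", 1, 100.0), ("B", "g1", 1, 50.0),
    ("A", "g1", 2, 110.0), ("B", "g1", 2, 60.0),
    ("A", "g1", 3, 120.0), ("B", "g1", 3, 70.0),
    ("C", "g2", 1, 20.0), ("C", "g2", 2, 22.0), ("C", "g2", 3, 24.0),
]


def group_sums(rows):
    """Sum sales per group and period; return {group: [sales ordered by period]}."""
    totals = defaultdict(lambda: defaultdict(float))
    for _hospital, group, period, sales in rows:
        totals[group][period] += sales
    return {g: [per[p] for p in sorted(per)] for g, per in totals.items()}


def ses_forecast(series, alpha=0.5):
    """One-step-ahead forecast with simple exponential smoothing.

    The level starts at the first observation and is updated as
    level = alpha * y + (1 - alpha) * level; the final level is the forecast.
    alpha=0.5 is an assumed value, not a tuned one.
    """
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


series_by_group = group_sums(records)
forecasts = {g: ses_forecast(s) for g, s in series_by_group.items()}
print(forecasts)
```

Because simple exponential smoothing has no trend term, its forecast lags behind steadily growing series like these; a trended ETS model or ARIMA would be the natural next step.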

Regards

Roland

dhavasa3 · 12-03-2019

Thank you for the explanation.

Now, my predictor variables for the classification are the factors that could influence sales, and the target variable is average sales.

The Score tool applied after the classification gives an estimated average sales number for each hospital. At this step I am unsure how to form groups from this result. Am I proceeding in the right direction?

RolandSchubert · 12-03-2019

I'm not really sure. My understanding was that you want to create groups in the first step and forecast sales in the second; now it seems the forecast is done and grouping is needed...

What kind of groups do you want to create? Are they based on sales (e.g. large - medium - small) or on traits (e.g. group 1 tends to have a large number of beds, is specialized in heart diseases and is located close to the city centre; group 2 has a small number of beds, has no area of specialization and is located in a rural area)? If you go for the first option, it's simply a matter of dividing the hospitals into groups by predicted revenue. For the second option I would consider clustering, as you have no predefined groups available: the question is not "Which group from a given list does a hospital belong to?" but "Into which groups can the hospitals be divided?" What do you think?
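For the first option, a minimal Python sketch (hypothetical hospitals and predicted values) that splits hospitals into three equal-count bands by predicted sales. Rank-based equal-count bands are an assumption made here; fixed revenue thresholds would work just as well.

```python
# Hypothetical predicted average sales per hospital (e.g. Score tool output)
predicted = {"A": 120.0, "B": 45.0, "C": 300.0, "D": 80.0, "E": 210.0, "F": 15.0}


def size_bands(values, labels=("small", "medium", "large")):
    """Split hospitals into equal-count bands by predicted sales (rank-based)."""
    ranked = sorted(values, key=values.get)  # hospital IDs, lowest sales first
    n_bands = len(labels)
    return {
        hospital: labels[rank * n_bands // len(ranked)]
        for rank, hospital in enumerate(ranked)
    }


bands = size_bands(predicted)
print(bands)
```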

dhavasa3 · 12-04-2019

Hi,

Yes, I am talking about the second option. I have a few parameters, such as the hospital's specialization and access restrictions for promotion, on the basis of which I plan to cluster. After clustering, I plan to look at the sales trends in each group and decide the best way to predict the revenue for the next period using a TS technique.

The point where I get stuck is this: I use RF/SVM for clustering. My x-variables are parameters such as hospital specialty, and the target variable is average sales for a period. The result I get after using the Score tool is an estimated sales number for each hospital. How do I interpret this result to make the groups? And is this approach correct?

RolandSchubert · 12-04-2019

Hi @dhavasa3,

Both Random Forest and Support Vector Machine are classification approaches. That means (very simplified) that you need a list of groups and elements with specific parameters that are already assigned to these groups. RF/SVM then calculates the "group number" for other elements (i.e. hospitals). You could use this approach to find the right group number for a new hospital. (With a numeric target such as average sales, both run as regression models instead and return an estimated value rather than a group, which is why the Score tool gave you sales figures.)

Based on your description of the problem, I would use a clustering approach. You have a list of elements (= hospitals) and their parameters (e.g. specialization, restrictions, sales). If you pass these to the K-Centroids Cluster Analysis tool (K-Means method) and add an Append Cluster tool, the clustering algorithm creates groups (= clusters) based on the specific parameters. The hospitals within each group are similar to each other and different from the hospitals in other groups/clusters.
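A minimal Python sketch of the clustering step, outside Alteryx and with hypothetical data: it z-scores the parameters (so average sales does not swamp the 0/1 flags) and runs plain k-means (Lloyd's algorithm). The farthest-first initialization, the choice of k=2 and the 0/1 encoding of specialization and access restrictions are assumptions for illustration; the Alteryx tool's own initialization and standardization settings may differ.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so sales does not dominate the 0/1 flags."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) with farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(mean(col) for col in zip(*members))
    return labels


# Hypothetical hospitals: (specialized 0/1, access restricted 0/1, average sales)
hospitals = {
    "A": (1, 0, 120.0), "B": (1, 0, 110.0), "C": (0, 1, 30.0),
    "D": (0, 1, 25.0), "E": (1, 0, 130.0), "F": (0, 1, 35.0),
}
names = list(hospitals)
labels = kmeans(standardize([hospitals[n] for n in names]), k=2)
clusters = dict(zip(names, labels))
print(clusters)
```

The resulting cluster labels can then serve as the group column for the per-group time series forecast discussed earlier in the thread.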

The difference between the approaches is that classification is used to add new elements to a group when the rules/conditions for "group membership" are known or can be derived from existing assignments, while clustering is used to create groups from the parameters without any predefined rule. So your problem (if I understood it correctly) should be solved using clustering (=> Predictive Grouping tools).

Does this help?

Best

Roland

dhavasa3 · 12-04-2019

Thank you. That solves my problem.

However, I would like to understand a little more about the classification you are talking about. In order to use SVM/RF, should we first create the groups manually based on specific parameters and feed them into the system? In that case, what is the target variable in the RF tool? Is it the group number? Please explain if possible.

RolandSchubert · 12-04-2019

Hi @dhavasa3,

Basically, you need a data set with group numbers to train the classification model. You could assign the cluster numbers resulting from the cluster analysis to the hospitals (or manually create groups based on specific parameters). If the group number for a new hospital has to be determined, you can use RF/SVM: there is a list of "known" (already classified) elements that can be used to identify the logic behind the assignment (i.e. how the specific parameters are assessed to assign group number "1" to hospital "A" and group number "2" to hospital "B"), and this logic is then applied to the new hospital. The target variable would be the group number. So the requirement for classification is always that you already know the result (= target variable) for a number of elements in addition to their parameters, while clustering requires only the parameters. I hope this clarifies things a bit.
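A minimal Python sketch of that setup, with hypothetical data: the training set holds each hospital's parameters together with a known group number (for example taken from a cluster analysis), and the group number is the target. A nearest-centroid classifier stands in for RF/SVM here only because it fits in a few lines; it is not what the Forest Model or SVM tool does internally, but the inputs and output have the same shape: parameters plus known labels in, a group number for the new hospital out.

```python
import math
from statistics import mean, pstdev

# Hypothetical training set: (specialized 0/1, access restricted 0/1, average sales)
train_X = [(1, 0, 120.0), (1, 0, 110.0), (0, 1, 30.0),
           (0, 1, 25.0), (1, 0, 130.0), (0, 1, 35.0)]
# Known group numbers for those hospitals: this is the target variable
train_y = [1, 1, 2, 2, 1, 2]


def fit(X, y):
    """Learn column scaling from the training data and one centroid per group."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*X)]

    def scale(row):
        # z-score with the training means and standard deviations
        return tuple((v - m) / s for v, (m, s) in zip(row, stats))

    centroids = {}
    for group in sorted(set(y)):
        members = [scale(x) for x, g in zip(X, y) if g == group]
        centroids[group] = tuple(mean(col) for col in zip(*members))
    return scale, centroids


def predict(model, row):
    """Return the group number whose centroid is nearest to the scaled row."""
    scale, centroids = model
    z = scale(row)
    return min(centroids, key=lambda g: math.dist(z, centroids[g]))


model = fit(train_X, train_y)
print(predict(model, (1, 0, 95.0)))  # a new, specialized, unrestricted hospital
```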

Best

Roland


Classification and Forecasting in Alteryx