Alteryx Designer Desktop Discussions

Smcleod2 · 02-04-2016

Hello,

I need assistance with grouping and statistics.

End state (what good looks like): 3 or 4 employee-count segments, derived with a statistical methodology that takes into account vertical (SIC code), potential, and propensity to buy (higher-value customers are more likely to purchase). For example, one group for companies with 1-15 employees, and so on.

I have attached sample data. I am stuck on the best way forward. I have tried the Tile and K-Means clustering tools in Alteryx but still need help. Any methodology or process that would help me make progress is greatly appreciated. I am a new Alteryx user.

Data:

Company Sales, Sales revenue, Total employees, Employees_Here, Number_of_Family_Members, Industry Code, Potential, Propensity to Buy

Many thanks

JohnJPS · 02-05-2016

I'm not sure I completely understand the question, but could you use multiple Tile tools? For instance, send the data input into both a Tile on Potential and a Tile on PropensityToBuy; then join these back together by record position; then add the "TileNum" outputs together and sort descending on the sum. Just a thought; see the attached workflow for detail.
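To make the Tile idea concrete, here is a minimal Python sketch of the same logic outside Alteryx. It assumes "equal records" tiling (each tile gets roughly the same number of rows) and that higher values get higher tile numbers, so a high sum means both high potential and high propensity. The function names and sample rows are hypothetical, and the Tile tool's own tie handling and numbering direction may differ.

```python
def equal_records_tiles(values, n_tiles):
    """Give each value a tile number 1..n_tiles so every tile holds (roughly)
    the same number of records. Assumption: higher values -> higher tile."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    tiles = [0] * len(values)
    for rank, idx in enumerate(order):
        tiles[idx] = rank * n_tiles // len(values) + 1
    return tiles


def combined_tile_rank(records, fields, n_tiles=4):
    """Tile each field separately, line the results up by record position,
    add the tile numbers, and sort descending on the sum."""
    tile_cols = {f: equal_records_tiles([r[f] for r in records], n_tiles) for f in fields}
    scored = [dict(rec, TileSum=sum(tile_cols[f][pos] for f in fields))
              for pos, rec in enumerate(records)]
    # sorted() is stable, so ties keep their original input order.
    return sorted(scored, key=lambda r: r["TileSum"], reverse=True)


# Hypothetical sample rows; only the two scoring fields matter here.
companies = [
    {"Company": "A", "Potential": 10, "PropensityToBuy": 0.9},
    {"Company": "B", "Potential": 50, "PropensityToBuy": 0.2},
    {"Company": "C", "Potential": 30, "PropensityToBuy": 0.5},
    {"Company": "D", "Potential": 80, "PropensityToBuy": 0.7},
]
ranked = combined_tile_rank(companies, ["Potential", "PropensityToBuy"], n_tiles=2)
for r in ranked:
    print(r["Company"], r["TileSum"])
```

With two tiles this prints D 4, then A 3, B 3, then C 2: D is in the top half on both fields, while A and B are each top-half on only one.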

RodL · 02-05-2016

As @JohnJPS said, I'm not sure I understand your question.

Typically with clustering/segmentation, you are segmenting on the observations (e.g., stores, customers, or in the case of your data, companies).

The way your data is set up, you have individual companies with attributes for each one. Data like this would normally be used to segment the companies into like groups.

I have attached how you would do this in Alteryx with the clustering tools. The Cluster Diagnostics workflow tests the data to determine the optimum number of clusters based on the K-Means cluster method. A PDF of the results is attached. Based on those results, it looks like the "best" cluster solution would be 6 clusters. (BTW, it takes about 25 minutes to run with the settings in the workflow.) Then the Cluster Analysis workflow shows how you would use those settings to run a model and assign the appropriate cluster to each company. Another PDF shows the results of that.
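The diagnostics idea (fit K-Means for several values of k and compare a cluster-quality index) can be sketched in plain Python. This is a hypothetical illustration, not the Alteryx tool's implementation: it uses k-means++ seeding with a few restarts and scores each k with the Calinski-Harabasz index (between-cluster dispersion over within-cluster dispersion, each divided by its degrees of freedom; higher is better). As far as I know, the Alteryx diagnostics also repeat the fit over bootstrap samples and report more than one index, which this sketch does not do. The made-up 2-D points stand in for standardized fields such as Potential and Propensity to Buy.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest existing centre."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's K-Means with several k-means++ restarts; keeps the run with the
    lowest within-cluster sum of squares. Returns (centroids, labels)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = kmeans_pp_init(points, k, rng)
        for _ in range(max_iter):
            labels = assign(points, centroids)
            # Move each centroid to its members' mean (keep it if it has none).
            new = [mean([p for p, lab in zip(points, labels) if lab == c])
                   if c in labels else centroids[c] for c in range(k)]
            if new == centroids:
                break
            centroids = new
        labels = assign(points, centroids)
        wss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or wss < best[0]:
            best = (wss, centroids, labels)
    return best[1], best[2]


def calinski_harabasz(points, labels, k):
    """(between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))."""
    n, overall = len(points), mean(points)
    between = within = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        centre = mean(members)
        between += len(members) * sq_dist(centre, overall)
        within += sum(sq_dist(p, centre) for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical data: three well-separated groups of 20 points each.
rng = random.Random(1)
points = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
          for cx, cy in [(0, 0), (20, 0), (0, 20)] for _ in range(20)]

scores = {k: calinski_harabasz(points, kmeans(points, k)[1], k) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, {k: round(s, 1) for k, s in scores.items()})
```

Picking the k with the highest index is only one rule of thumb; on real company data the indices are rarely this clear-cut, which is why the diagnostics report is worth reading rather than just taking the maximum.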

Of course, looking at the results, I might filter out the 'zero employee' companies as "exceptions", but it really depends on what the purpose of your segmentation is.
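Setting the zero-employee companies aside before clustering can be as simple as a filter (in Alteryx, a Filter tool). A minimal Python sketch, assuming a column named "Total employees" as in the question's field list; treating a missing count as an exception too is an assumption, not something from this thread, and the function name and rows are hypothetical.

```python
def split_exceptions(records, field="Total employees"):
    """Route companies with a zero (or missing) employee count to an
    exceptions list so they don't distort the clusters."""
    keep, exceptions = [], []
    for rec in records:
        # rec.get() returns None when the field is absent; 0 and None are both falsy.
        (exceptions if not rec.get(field) else keep).append(rec)
    return keep, exceptions


# Hypothetical rows using the "Total employees" column from the question.
rows = [
    {"Company": "A", "Total employees": 12},
    {"Company": "B", "Total employees": 0},
    {"Company": "C", "Total employees": 240},
    {"Company": "D"},  # missing count, treated as an exception here (assumption)
]
to_cluster, exceptions = split_exceptions(rows)
print([r["Company"] for r in to_cluster], [r["Company"] for r in exceptions])
```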

Smcleod2 · 02-06-2016

Thanks John, this is great. Thank you for the reply and help.

Smcleod2 · 02-06-2016

Thank you, Rod, for the help. I did not know about the Render output. My one concern is the time it takes for clustering. This was just a very small data sample; I tried a larger sample and stopped it after an hour. Thank you for the reply.

RodL · 02-08-2016

Glad to help.

A couple other points to consider...

Time for the Diagnostics step will vary depending on the methodology. In my experience, the Neural Gas method takes MUCH longer than the other two methods. I think K-Means on the data set you provided took about 30 minutes to model on my machine.

Also, my experience with much larger data sets (e.g., 8 million customers) has been that you diagnose and create the cluster model from a random sample of your data (you wouldn't need to build a model from the full 8MM observations for it to be statistically valid). Then, once you determine the "best" model, you append the clusters to the full data set using the model built on the subset.
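The sample-then-append workflow can be sketched as follows. This is a hypothetical pure-Python illustration, not Alteryx's tools: it fits a plain K-Means (Lloyd's algorithm with farthest-point seeding) on a random subset, then assigns every record in the full data set to its nearest centroid, which is the role the "append the clusters" step plays. The data set, sample size, and function names are all made up for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the closest centroid; this is the 'append cluster' step."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def fit_kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-Means with farthest-point (maximin) seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        new = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
               for i, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids


# Hypothetical "full" data set: 30,000 records in three groups.
rng = random.Random(42)
full_data = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
             for cx, cy in [(0, 0), (20, 0), (0, 20)] for _ in range(10_000)]

# 1. Build the model on a random subset only.
sample = rng.sample(full_data, 1_000)
model = fit_kmeans(sample, k=3)

# 2. Append a cluster number to every record in the full data set.
labels = [nearest(p, model) for p in full_data]
print(len(sample), len(labels), sorted(set(labels)))
```

Fitting only iterates over the 1,000 sampled points; the append step is a single nearest-centroid pass over the full data, which is why this scales much better than running the diagnostics on every record.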


Grouping or Clustering by Employee Count