SusanCS
Alteryx Alumni (Retired)

Last week I posted a full introduction to factor analysis, plus a workflow demonstrating one use of this analytic approach in Designer. Now the process is even easier with a macro -- attached at the bottom of this post -- that you can drop straight into your own workflow!

 

Factor Analysis: What and Why

A quick summary: Exploratory factor analysis helps you find potential “latent” or hidden variables -- aka “factors” -- that represent combinations of your known, measured variables. Maybe there’s a particular subset of variables that are all correlated with each other and that together represent an unobserved influence on your dataset. 

 

 

[Image: SusanCS_0-1589910560405.gif]

Factor analysis is often used in survey analysis, market research and finance to look for previously unrecognized but meaningful patterns among variables. Here are some potential uses:

 

  • Analyzing customer or employee satisfaction survey results to find underlying patterns of responses along certain variables
  • Checking consumers’ responses to questions about a new product to look for latent beliefs and perceptions about the product
  • Examining opinion polls to look for unexpressed attitudes that shaped responses to different issues or candidates
  • Identifying patterns in consumer behavior and purchases that define previously unidentified market segments
  • Grouping taste testers’ responses to flavor elements in order to link related flavor perceptions
  • Reviewing test scores to look for relationships among different skill sets and test performance

 

Try the Factor Analysis Macro

To simplify your factor analysis, try the macro attached to this post! Be sure you're running Designer as an administrator so that the necessary Python package can be installed successfully. You’ll need to have your survey questions or other variable names in the field names of your dataset. Your data also needs to be fully numeric before it goes into the macro, so remove or convert any string fields first, even ones you don't plan to select. Rows containing null values will be dropped, as factor analysis does not perform well with missing values; you may want to impute values before bringing your data into the macro.
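If you tidy your data in Python before it reaches Designer, the preparation described above might look something like this minimal pandas sketch. It is not the macro's internal code: the function name `prepare_for_factor_analysis`, the `impute` flag, the choice of median imputation, and the sample columns are all assumptions made for illustration.

```python
import pandas as pd


def prepare_for_factor_analysis(df, impute=False):
    """Keep numeric columns only, then deal with nulls (hypothetical helper)."""
    # Drop string and other non-numeric columns.
    numeric = df.select_dtypes(include="number")
    if impute:
        # Assumed strategy: fill each column's nulls with that column's median.
        numeric = numeric.fillna(numeric.median())
    # Remove any rows that still contain nulls.
    return numeric.dropna()


# Made-up survey data: two numeric questions and one string field.
survey = pd.DataFrame({
    "q1": [4, 5, None, 2],
    "q2": [3, 4, 4, 1],
    "region": ["T", "N", "S", "T"],
})

dropped = prepare_for_factor_analysis(survey)               # loses the row with a null
imputed = prepare_for_factor_analysis(survey, impute=True)  # keeps every row
print(dropped.shape, imputed.shape)
```

Dropping rows is the simpler choice, but it can shrink a small survey dataset quickly; imputing keeps every respondent at the cost of some invented values.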

 

 

[Image: SusanCS_1-1589910560389.png]

All you’ll need to do to configure the macro is: 

 

  • Choose which numeric variables you want to use in the analysis; and 
  • Select the number of factors you want to look for in your data. That number should be bigger than 1 but smaller than the number of variables you are using, since the goal is to simplify and reduce your data. 
    • You may want to start with a small number -- say, 3 or 4 -- and then examine the scree plot provided in the macro’s output to see what your best choice might be. (Last week’s post talks about choosing the number of factors in more detail. The scree plot will not change based on the number of factors you select as a starting point; it will be the same every time you run the analysis.)
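To see why a scree plot would not depend on the number of factors you request, here is a rough NumPy sketch. It assumes the plot is built from the eigenvalues of the correlation matrix of your selected variables, which is the usual construction; the macro's own implementation may differ. The function name `scree_eigenvalues` and the simulated data are hypothetical.

```python
import numpy as np


def scree_eigenvalues(data):
    """Eigenvalues of the correlation matrix, largest first (hypothetical helper)."""
    corr = np.corrcoef(data, rowvar=False)   # variables are in columns
    eigenvalues = np.linalg.eigvalsh(corr)   # returned in ascending order
    return eigenvalues[::-1]


# Simulated data: six observed variables driven by two made-up hidden factors.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 2))
weights = rng.normal(size=(2, 6))
observed = hidden @ weights + 0.5 * rng.normal(size=(200, 6))

eigs = scree_eigenvalues(observed)
for i, value in enumerate(eigs, start=1):
    print(f"factor {i}: eigenvalue {value:.2f}")
```

Nothing in this calculation takes a number of factors as input, so the values you would plot stay the same from run to run. The eigenvalues also always sum to the number of variables, and the "elbow" where they level off is what you look for when choosing how many factors to keep.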

Add a Browse tool to each output anchor on the macro. 

 

 

[Image: SusanCS_2-1589910560391.png]

After running your analysis, check out your five Browse tools to find:

 

  1. The results of your Bartlett’s test of sphericity.
  2. The variables in each of the factors identified in your data, and the loadings on each of the variables within those factors.
  3. The variances explained by your factors; you may be most interested in the cumulative variance at the far right, which shows how much variance in your data your factors can together explain. 
  4. The communalities for the variables in your analysis.
  5. A scree plot for your data, like the one shown above, that can be used to determine a good number of factors to seek in the data.

Don’t worry -- there’s a complete explanation for all of these terms in last week’s post, so check it out as you explore your results.
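If it helps to see how a few of these outputs relate to one another, the sketch below computes them by hand with NumPy. It uses the standard textbook formulas, not the macro's own code. The function names, the simulated data, and the small loadings matrix are all made up for illustration.

```python
import numpy as np


def bartlett_statistic(data):
    """Chi-square statistic and degrees of freedom for Bartlett's test of sphericity."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, log_det = np.linalg.slogdet(corr)      # log of the determinant of the correlation matrix
    chi_square = -(n - 1 - (2 * p + 5) / 6) * log_det
    dof = p * (p - 1) // 2
    return chi_square, dof


def summarize_loadings(loadings):
    """Communalities and variance explained from a variables-by-factors loadings matrix."""
    squared = loadings ** 2
    communalities = squared.sum(axis=1)                 # one value per variable
    proportion = squared.sum(axis=0) / loadings.shape[0]  # one value per factor
    cumulative = np.cumsum(proportion)
    return communalities, proportion, cumulative


# Simulated data: six variables driven by two made-up hidden factors.
rng = np.random.default_rng(1)
hidden = rng.normal(size=(300, 2))
observed = hidden @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(300, 6))
chi_square, dof = bartlett_statistic(observed)
print(f"Bartlett chi-square: {chi_square:.1f} on {dof} degrees of freedom")

# Hypothetical loadings for four variables on two factors.
loadings = np.array([[0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.2, 0.6]])
communalities, proportion, cumulative = summarize_loadings(loadings)
print("communalities:", communalities)
print("cumulative variance explained:", cumulative)
```

A p-value for Bartlett's test would come from comparing the statistic to a chi-square distribution with those degrees of freedom, for example with `scipy.stats.chi2.sf`. The last entry of the cumulative array corresponds to the far-right cumulative variance figure mentioned in item 3.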

 

 

[Image: SusanCS_3-1589910560413.png]

That screenshot above shows all you have to do -- grab your data, tidy it up, and send it into the macro! Happy hunting for hidden factors.

Susan Currie Sivek
Senior Data Science Journalist

Susan Currie Sivek, Ph.D., is the data science journalist for the Alteryx Community. She explores data science concepts with a global audience through blog posts and the Data Science Mixer podcast. Her background in academia and social science informs her approach to investigating data and communicating complex ideas — with a dash of creativity from her training in journalism. Susan also loves getting outdoors with her dog and relaxing with some good science fiction. Twitter: @susansivek


Comments
chineeloh
8 - Asteroid

Hi @SusanCS ,

 

The macro seems to be missing the file in the file input; see the attached image.

 

 

[Image: factor analysis error.PNG]

SusanCS
Alteryx Alumni (Retired)

Hi @chineeloh, thanks for trying out the macro! You shouldn't need the dataset for the macro to function. However, if you would like to experiment with that dataset, you can download it from the original source. Hope that helps!

chineeloh
8 - Asteroid

Thank you @SusanCS for the help.

 

It seems that I need to put a Dynamic Select tool before the Factor Analysis macro so that only numeric fields are passed in. The Python tool behind the macro still seems to read string variables if they are not removed beforehand (even if we select only numeric variables in the F.A. macro), resulting in the following error:

 

Error: factor_analysis_macro (106): Tool #1: ---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
pandas\_libs\lib.pyx in pandas._libs.lib.maybe_convert_numeric()
ValueError: Unable to parse string "T"

 

So, the macro did run successfully. However, I can't seem to load the chart; see below:

[Image: factor chart.PNG]

SusanCS
Alteryx Alumni (Retired)

Hi @chineeloh, yes, you'll have to get your dataset into fully numeric form before it goes into the macro. If your scree plot isn't showing in the Browse Tool's Report tab, try looking in your results window for the field labeled "chart_path," which should show you where the image is stored; you may be able to access it there.

wiggles
5 - Atom

Hello! This is very useful. Thank you! Once I have the factors, though, do you have a trick for giving each record a factor score?