SOLVED

Interaction terms in regressions

davidb
5 - Atom

Hi all, new Alteryx user here. I am setting up a workflow around some predictive models I have - both a linear and logistic model. Normally (in R) I have interaction terms in these models, but I cannot see a way of implementing these conveniently, short of creating the variables separately. 

 

i.e. the logistic regression tool can happily make me:

 

lm(y ~ x1 + x2)

 

but not 

 

lm(y ~ x1 * x2)

 

Am I missing something or is there a good work around? I suppose I could make the interactions as separate fields but this will be very cumbersome for some of my larger models with hundreds of regressors.

 

Relatedly, is there a more convenient way of log-transforming regressors than simply making new fields?

 

Thanks

15 REPLIES
Charity_K_Wilson
10 - Fireball

@DrDan

OH MY FREAKIN' WORD!  This is awesome!  WOW!  I'm going to have to process this one.  Thank you so much!  

 

How many interactions can I push into this thing before it will break on me?  What's the syntax for Log, Exp, and Polynomials?  I'm used to writing poly(x,2,raw=TRUE) for a squared variable.  

DrDan
Alteryx Alumni (Retired)

Like logb and bs (b-splines), the poly function is not simple ("simple" is defined to mean functions with only a single argument). As a result, squared terms will still need to be calculated using a Formula tool (as I indicated, it is a limited tool). In a formula expression, the power terms (^2 and ^3) aren't actually exponents; they determine the level of the interactions. As an example, the expression y ~ (a + b + c)^2 will produce all main effects and two-way interactions, while y ~ (a + b + c)^3 will produce all main effects, two-way, and three-way interactions. As for the number of interaction terms, it is basically unlimited.
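To see how R itself expands these power terms, here is a minimal sketch in base R. It only inspects the formula and does not reproduce what the macro does internally; the variable names a, b, c, and y are taken from the example above and are purely illustrative.

```r
# List the terms a formula expands to, without fitting anything.
expand_terms <- function(f) attr(terms(f), "term.labels")

# ^2 on a sum: main effects plus all two-way interactions.
two_way <- expand_terms(y ~ (a + b + c)^2)
print(two_way)

# ^3 on a sum: additionally includes the three-way interaction.
three_way <- expand_terms(y ~ (a + b + c)^3)
print(three_way)

# ^2 on a single variable is NOT a square; it collapses back to the main effect.
not_a_square <- expand_terms(y ~ a^2)
print(not_a_square)

# In plain R, a literal square has to be protected with I().
literal_square <- expand_terms(y ~ a + I(a^2))
print(literal_square)
```

The I() wrapper is standard R formula syntax; whether the macro accepts it is not stated in this thread, so computing the squared field in a Formula tool remains the documented route.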

 

Dealing with more complex transformation functions should be possible by modifying the macro, but the parsing for the sanity check becomes a bit more challenging.

 

Dan

DrDan
Alteryx Alumni (Retired)

After some thought, I realized that more complicated transformation functions could be taken into account fairly easily. The one issue is that R will basically give the transformation function call as the field name, so poly(x, 2, raw = TRUE) results in the new fields "poly(x, 2, raw = TRUE)1" and "poly(x, 2, raw = TRUE)2". The first field is the original value of x, while the second is the square of x. Unfortunately, these complicated names can cause havoc downstream for Alteryx, so I highly recommend placing a Select tool after the Formula Transformation tool, running the workflow to force through the needed metadata, and then renaming columns whose names look like function calls to more standard names (e.g., rename "poly(x, 2, raw = TRUE)1" to "x"), after which you can place an appropriate regression modeling tool. The more complicated transformations supported are poly (for polynomials), logb (for logarithms with a user-specified base), and bs (for b-splines, which is provided by the splines package). For a simple transformation, say a natural logarithm, formulas look like y ~ log(x1) + x2*x3. Due to the need to move metadata through the macro, run the workflow once after configuring the macro, and then connect it to downstream tools.
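The column naming described in this post can be reproduced in plain R with model.matrix. This is only a sketch of the underlying R behavior, not the macro's code; the toy data frame and the replacement names "x" and "x_sq" are assumptions made for illustration.

```r
# Hypothetical toy data; field names are illustrative only.
d <- data.frame(x  = c(1, 2, 3, 4),
                x1 = c(1, 2, 4, 8),
                x2 = c(2, 1, 4, 3),
                x3 = c(1, 0, 1, 0))

# R names each generated column after the function call plus an index.
mm <- model.matrix(~ poly(x, 2, raw = TRUE), data = d)
poly_names <- colnames(mm)
print(poly_names)

# With raw = TRUE the first column is x and the second is x squared.
first_is_x  <- isTRUE(all.equal(unname(mm[, 2]), d$x))
second_is_sq <- isTRUE(all.equal(unname(mm[, 3]), d$x^2))

# Rename the call-like columns to plain names (the job the Select tool does).
colnames(mm)[2:3] <- c("x", "x_sq")
print(colnames(mm))

# A simple single-argument transformation combined with an interaction.
simple_names <- colnames(model.matrix(~ log(x1) + x2 * x3, data = d))
print(simple_names)
```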

deargle
7 - Meteor

Thanks @DrDan for this tool, I appreciate it.

DrDan
Alteryx Alumni (Retired)

I wasn't quite done; here is the latest rev. The problem with complex column names bothered me, and I've now fixed that as well. The passing of metadata still needs to be addressed, but it is just about ready for the Predictive District in the Alteryx Analytics Gallery. This time I'm just sending the macro.

Gualigee
8 - Asteroid

Hi @deargle, what do you mean by "interaction effect"? Thank you. 
