I'm creating a macro in which I have an R script node that is running a simple linear regression model. Yes, I know linear regression is available as a tool - I'm trying to stick with something simple here to scale up for other processes. Most of my R processes I need to select a single target and an unknown amount of predictor variables. Essentially I need to partially mimic the configuration window in the Linear Regression tool. I have dissected the macro of the linear regression tool but the predictor selection is throwing me off.
My R Script is simple:
data <- read.Alteryx("#1", mode="data.frame")
RGR <- lm(Target ~ IV1+IV2+IV3, data = data)
I have been successful in replacing "Target" with an Action where I have the action replace the string "Target" but I can't figure out how to replace IV1:3 dynamically. If in the future I have 8 variables instead of 3 that I'm using I need to replace "IV1+IV2+IV3" with "IV1+IV2+IV3+IV4+IV5+IV6+IV7+IV8" and so forth.
It seems like from the linear regression macro that I need to do something with XML (which I'm unfamiliar with) If anyone has any input on how to accomplish this with an action, I'd be eternally grateful!
Thanks!
Solved! Go to Solution.
Hi @Anthony_A - admittedly I have very little personal experience with R, but if I understood your question correctly, you want to be able to replace IV1, IV2, IV3, etc. variables within your R script with field references from an incoming data set (but please correct me if I am misunderstanding).
If so, one idea that may be worth trying is to place your R tool within a macro and leverage the Action tool's "Update Raw XML with Formula" setting along with a List Box that will allow you to select the fields from an incoming data set you want to assign as predictor variables.
If you have not worked with directly manipulating a tool's XML before, make sure to first go to Options > User Settings > Edit User Settings > Advanced > check the box for 'Display XML in Properties Window' and click Save. After doing this, you will notice a new option available within the tool configuration pane, which shows the following when displayed for the R tool loaded with your script:
This shows us the XML behind the tool configuration, which looks more intimidating than it actually is. If you want to replace the "IV1+IV2+IV3" string in your script with the same syntax for fields from your incoming dataset, after adding the R tool with your script to a macro workflow and connecting an Action (configured for 'Update Raw XML with Formula') and List Box (configured for 'Generate Custom List'), you should be able to pass the full string of your script into the R tool (updated with references to your dynamically selected fields as predictor variables) by adding all of your script text up to the "IV1+IV2+IV3" part in the 'Start Text' field, adding + to the 'Separator' field, and all of your script text following "IV1+IV2+IV3" to your 'End Text' field.
By doing this, your macro should essentially re-write the script each time you run it, substituting "IV1+IV2+IV3" for the fields you select in the macro tool's configuration interface from your main workflow, delimited by +. For example, if you select Field1, Field2, Field3, and Field4 in your macro config, your macro should pass the following string into your R tool:
data <- read.Alteryx("#1", mode="data.frame")
RGR <- lm(Target ~ Field1+Field2+Field3+Field4, data = data)
To force the new line / carriage return in your script between lines 1 and 2, you could place a placeholder character (e.g., "|") at the end of the string you added to the List Box's 'Start Text' field and then update your Action formula expression to something like:
regex_replace([#1], "\|", "\n")
Hope that helps! Let me know if that doesn't work for you or if you have any clarifying questions.
Hi @Anthony_A
If you only need to replace IV1+IV2+IV3, you can use a simple "Update Value".
Please refer attached file.
Hi @AkimasaKajitani
Thanks for your solution! That is a good work around but I think would be better for a user to select the variables with a List Box or another easy selection function. Do you know of a method of using a list box to dynamically replace "IV1+IV2+IV3" with selected fields from the List Box? So then "IV1+IV2+IV3" would be replaced with "Selectedfield1+Selectedfield2+Selectedfield3+Selected4... and so forth" from the selected columns in the List Box?
It looks like the linear regression macro updates the XML and I have been trying to take a similar approach with a list box connected to an action replacing the outer xml code with:
' <SelectFields>
<SelectField field="' + replace(EscapeXMLMetacharacters([IV1+IV2+IV3]), ",", '" selected="True" />
<SelectField field="') + '" selected="True" />
<SelectField field="*Unknown" selected="False" />
</SelectFields>'
but I haven't had any luck. Let me know if you have any thoughts!
Really appreciate the help!
Hi @Anthony_A
You can use "Generate Custom List" option at List Box tool.
Setting is below.
Start Text and End Text is Null and Separator is "+".
Hi @Anthony_A
There might be a simpler more elegant solution here, but below is how I'm thinking about this one (but let me know if I'm misunderstanding, given I have limited experience with R):
You have an R script which contains both static syntax and dynamic syntax, where the dynamic syntax consists of "Target" (i.e., the field in your incoming dataset representing the target variable) and "IV1+IV2+IV3" (i.e., 1+ fields in your incoming dataset representing the predictor variables), and where the static syntax consists of everything else.
To select your target and predictor variables from the macro's configuration within your main workflow, I think you are on the right path with trying to use a List Box tool for your predictor variables (configured for 'Generate Custom List' with a Separator of +). I would recommend using a Drop Down tool to select your target variable, given you'd want to limit this to a single field.
One way to do this would be to have a macro that used the above interface tools to update the dynamic portions of your script that you place within a Text Input tool. You can then concatenate the columns (using a Formula) and rows (using a Summarize) into a single record which would contain an updated version of your script with substituted values for the target and predictor variables.
Once you have that, you can create another separate batch macro that contains the R tool connected to a Control Parameter via an Action, along with a Macro Input / Macro Output. The Control Parameter passed into this batch macro will be your updated script from the first macro. Save this macro and embed it in your first macro (see below screenshots and attached workflow package for details).
The result of this should be that you would first parameterize your script from your main workflow by selecting the desired target and predictor variables > then your first macro will re-create your script including the updated variable references in place > which then get passed it into the nested batch macro where it is injected into the R tool and executed.
While I believe this method should work given I have used it successfully in different but similar scenarios, I will note that I got the below errors when I attempted this on a dummy dataset. Granted, these errors appear to be driven by syntax issues in the R script versus an issue with the macro setup (to test this, I added a second output to test what text string was being passed into the embedded batch macro and it does appear to align with your expected script). Given my R experience is limited, I didn't spend time trying to resolve the error, but hopefully you are more familiar with that piece and can more effectively troubleshoot.
Let me know if this suggested approach works for you.
Macro 1:
Macro 2:
Thanks @joshbennet and @AkimasaKajitani for all your help!
AkimasaKajitani's solution worked for me. At first it gave some really weird behavior where the variables in the list box would not reflect a different data source. A save and restart seemed to fix this behavior.
Thanks!!!