
Guide to Creating Your Own R-Based Macro - Create and Test a Basic Macro

Alteryx

This post is part of the "Guide to Creating Your Own R-Based Macro" series.

 

The workflow is now ready to be converted into a macro. To do this, click on the canvas, go to the Workflow tab of the Properties window, and select the Macro radio button to convert the workflow into a Standard Macro. Then use the menu option File > Save as... to save the file in yxmc format. My original workflow was saved to the file Entropy_Importance.yxmd, and I saved the macro to the file Entropy_Importance.yxmc.

 

We are now ready to add the user interface elements to the macro, and make several other changes. Figure 1 shows the final version of the basic macro.

 

Figure 1: The basic macro

As the figure suggests, the major changes are that a number of interface tools have been added, the Text Input tool has been converted to a Macro Input tool, and a Macro Output tool has replaced the Browse tool of the original workflow. All of these tools fall under the Interface tool group. At the time of the 9.0 release, Chad Martin wrote an excellent blog post that gives an overview of what these tools provide and how to work with them. Reading that post will likely be helpful if you have not yet used these tools.

 

I won't go into great detail, but I do want to give an overview of what is going on with the interface tools in the macro. Starting from the top left of the canvas, the first interface tool is a Drop Down tool that allows the user to select the target variable for the analysis. It is configured to allow only string-type fields (which are converted to categorical variables in R) to be selected. The Action tool that it connects to modifies the upper Select tool to filter out all fields except the target field.

 

Moving to the right, the List Box tool allows the user to select a set of predictors. Within the tool's configuration, only numeric fields (the various integer, float, fixed decimal, and double types) are allowed to appear in the user interface. The Action tool associated with it modifies the lower Select tool based on the user's selection.

 

The final three tools as you move to the right on the canvas are Check Box tools; each one, when checked, indicates that a particular measure will be calculated. As you may have guessed, the macro provides not only the information gain measure, but also the option of including the gain ratio and symmetrical uncertainty entropy-based measures.
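To make the three measures concrete, here is a minimal base R sketch of how they relate to one another for a single categorical predictor, using the definitions given in the FSelector documentation. The toy data and the helper `entropy()` are made up for illustration. The sketch uses the natural log (the unit only rescales information gain and cancels in the two ratios) and does not attempt the discretization step that, as I understand it, FSelector applies to numeric predictors.

```r
# Shannon entropy (natural log) of one or more categorical vectors,
# computed from their joint frequency table.
entropy <- function(...) {
  counts <- as.vector(table(...))
  p <- counts / sum(counts)
  p <- p[p > 0]  # treat 0 * log(0) as 0
  -sum(p * log(p))
}

# Made-up toy data: a string target and one categorical predictor
target    <- c("yes", "yes", "no", "no", "yes", "no", "no", "yes")
predictor <- c("a",   "a",   "b",  "b",  "a",   "b",  "a",  "b")

h_class <- entropy(target)             # H(Class)
h_attr  <- entropy(predictor)          # H(Attribute)
h_joint <- entropy(target, predictor)  # H(Class, Attribute)

# The three entropy-based measures
info_gain  <- h_class + h_attr - h_joint
gain_ratio <- info_gain / h_attr
symm_unc   <- 2 * info_gain / (h_attr + h_class)

measures <- c(info_gain = info_gain, gain_ratio = gain_ratio, symm_unc = symm_unc)
print(round(measures, 4))
```

Gain ratio and symmetrical uncertainty are both normalized versions of information gain, which is why offering them as optional extras costs little once information gain is available.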

 

Given the above, the code within the R tool (provided below) has gone through some alterations to allow for this additional functionality. In addition, the code example also illustrates how the user's input to the Check Box tools can be used as "question constants" in an R tool's code:

# Load the FSelector package
suppressWarnings(library(FSelector))
# Read in the data from Alteryx into R
the_data <- read.Alteryx("#1")
# Create a string of the potential predictors separated by plus signs
the_preds <- paste(names(the_data)[-1], collapse = " + ")
# Get the name of the target field
the_target <- names(the_data)[1]
# Create a formula expression from the names of the target and predictors
the_form <- as.formula(paste(the_target, the_preds, sep = " ~ "))
# Initialize the output data frame
the_output <- data.frame(Field = names(the_data[-1]))
col_names <- "Field"
# Calculate the entropy-based measure(s) selected by the user
# via the "question constants"
if ('%Question.info.gain%' == "True") {
    out <- information.gain(the_form, the_data)
    the_output <- cbind(the_output, out[[1]])
    col_names <- c(col_names, "Information Gain")
}
if ('%Question.gain.ratio%' == "True") {
    out <- gain.ratio(the_form, the_data)
    the_output <- cbind(the_output, out[[1]])
    col_names <- c(col_names, "Gain Ratio")
}
if ('%Question.symm.uncertainty%' == "True") {
    out <- symmetrical.uncertainty(the_form, the_data)
    the_output <- cbind(the_output, out[[1]])
    col_names <- c(col_names, "Symmetrical Uncertainty")
}
# Prepare the final output
names(the_output) <- col_names
# Output the results
write.Alteryx(the_output)
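
The quoted `'%Question.info.gain%'` tokens in the code above suggest that the question constants are filled in as plain text before R ever sees the code. The following sketch imitates that substitution in base R so the mechanism can be tried outside Alteryx. The helper `fill_constants()` is hypothetical, and the value "False" for an unchecked box is an assumption; Alteryx performs the real replacement itself.

```r
# A one-line stand-in for the R tool's code; the %Question.*% token is a placeholder
template <- "if ('%Question.info.gain%' == \"True\") 'selected' else 'skipped'"

# Hypothetical helper that mimics the substitution: swap each token for the
# text value of the matching interface tool
fill_constants <- function(code, answers) {
  for (nm in names(answers)) {
    token <- paste0("%Question.", nm, "%")
    code <- gsub(token, answers[[nm]], code, fixed = TRUE)
  }
  code
}

# Assumed values: "True" when the box is checked, "False" when it is not
checked   <- fill_constants(template, list(info.gain = "True"))
unchecked <- fill_constants(template, list(info.gain = "False"))

print(checked)  # the code text R would receive after substitution

result_checked   <- eval(parse(text = checked))
result_unchecked <- eval(parse(text = unchecked))
```

Because the token becomes literal text inside the quotes, the comparison in the macro is a string comparison against "True" rather than a test of a logical value.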

 

It is now time to test whether the basic macro works as expected in a workflow that uses different data. For the test workflow I decided to work with the Bank Marketing dataset from the UC Irvine Machine Learning Repository. The full dataset was used, which comes in CSV format, so the Auto Field tool was used to set appropriate field types. In addition, one of the predictor fields (pdays) is the number of days since a prospective customer was previously contacted with an offer to invest in a term savings account. Those who were never contacted for this product were given a code of -1. Given this, a Filter tool separates the data into those who have, and those who have not, received a past telemarketing offer for a term savings account. Finally, the basic macro was inserted into the workflow twice (by right-clicking on the canvas and inserting the macro each time) and applied to both of the data streams coming from the Filter tool, with a Browse tool attached to each macro. The completed version of the test workflow is shown in Figure 2.

 

Figure 2: Test workflow
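
For readers who want to reproduce the split outside Alteryx, the logic of the Filter tool on pdays can be sketched in base R as follows. The rows are made up and stand in for the Bank Marketing data; only the -1 code for "never contacted" comes from the description above.

```r
# Made-up rows standing in for the Bank Marketing data; only pdays matters here
bank <- data.frame(
  age   = c(30, 45, 52, 27, 61),
  pdays = c(-1, 12, -1, 180, -1),
  y     = c("no", "yes", "no", "no", "yes"),
  stringsAsFactors = FALSE
)

# pdays == -1 marks prospects with no earlier contact for this product
never_contacted      <- bank[bank$pdays == -1, ]
previously_contacted <- bank[bank$pdays != -1, ]

print(nrow(never_contacted))
print(nrow(previously_contacted))
```

Each of the two data frames corresponds to one of the streams leaving the Filter tool, and each would be passed to its own copy of the macro.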

Frequently, things will work as expected in the workflow contained in the macro but not when the macro is used in a new workflow. The test workflow should allow you to find any major errors in your macro.