
Alteryx Designer Knowledge Base

Definitive answers from Designer experts.
My code runs in R, but not in the R Tool?
View full article
This post is part of the "Guide to Creating Your Own R-Based Macro" series.

The workflow is now ready to be converted into a macro. To do this, click on the canvas, open the Workflow tab of the Workflow - Configuration menu, and click the Macro radio button to convert the workflow into a Standard Macro. Then use the menu option File > Save as... to save the file in yxmc format. My original workflow was saved to the file Entropy_Importance.yxmd, and I saved the macro to the file Entropy_Importance.yxmc.

We are now ready to add the user interface elements to the macro and make several other changes. Figure 1 shows the final version of the basic macro.

Figure 1: The basic macro

As the figure suggests, the major changes are the addition of a number of interface tools, the conversion of the Text Input tool to a Macro Input tool, and the replacement of the original workflow's Browse tool with a Macro Output tool. All of these tools fall under the Interface tool group.

I won't go into great detail, but I do want to give an overview of what the interface tools in the macro are doing. Starting from the top left of the canvas, the first interface tool is a Drop Down tool that allows the user to select the target variable for the analysis. It is configured to allow only string type fields (which are converted to categorical variables in R) to be selected. The Action tool that it connects to modifies the upper Select tool to filter out all fields except the target field.

Moving to the right, the List Box tool allows the user to select a set of predictors. Within the tool's configuration, only numeric variables (the various integer, float, fixed decimal, and double types) are allowed to appear in the user interface. The Action tool associated with it modifies the lower Select tool based on the user's selection.
The final three tools as you move to the right in the canvas are Check Box tools; checking one indicates that the corresponding measure should be calculated. As you may have guessed, the macro not only provides the information gain measure, but also offers the option of including the gain ratio and symmetrical uncertainty entropy-based measures.

Given the above, the code within the R tool (provided below) has been altered to allow for this additional functionality. The code also illustrates how the user's input to the Check Box tools can be used as "question constants" in an R tool's code:

# Load the FSelector package
suppressWarnings(library(FSelector))
# Read in the data from Alteryx into R
the_data <- read.Alteryx("#1")
# Create a string of the potential predictors separated by plus signs
the_preds <- paste(names(the_data)[-1], collapse = " + ")
# Get the name of the target field
the_target <- names(the_data)[1]
# Create a formula expression from the names of the target and predictors
the_form <- as.formula(paste(the_target, the_preds, sep = " ~ "))
# Initialize the output data frame
the_output <- data.frame(Field = names(the_data[-1]))
col_names <- "Field"
# Calculate the entropy based measure(s) selected by the user
# via the "question constants"
if ('%Question.info.gain%' == "True") {
    out <- information.gain(the_form, the_data)
    the_output <- cbind(the_output, out[[1]])
    col_names <- c(col_names, "Information Gain")
}
if ('%Question.gain.ratio%' == "True") {
    out <- gain.ratio(the_form, the_data)
    the_output <- cbind(the_output, out[[1]])
    col_names <- c(col_names, "Gain Ratio")
}
if ('%Question.symm.uncertainty%' == "True") {
    out <- symmetrical.uncertainty(the_form, the_data)
    the_output <- cbind(the_output, out[[1]])
    col_names <- c(col_names, "Symmetrical Uncertainty")
}
# Prepare the final output
names(the_output) <- col_names
# Output the results
write.Alteryx(the_output)
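The way the question constants behave can be tried outside Alteryx. The quoting in the code above suggests that Alteryx substitutes text for each %Question...% placeholder before R runs the script, so every if statement ends up comparing two string literals. The sketch below mimics that pattern in plain R; the flag variables, field names, and measure values are all made up for illustration and are not output from FSelector.

```r
# Hypothetical stand-ins for the text Alteryx would substitute for
# '%Question.info.gain%' and '%Question.gain.ratio%'.
info_gain_flag <- "True"
gain_ratio_flag <- "False"

# Made-up field names and measure values (illustration only).
fields <- c("age", "balance")
fake_info_gain <- c(0.12, 0.03)
fake_gain_ratio <- c(0.08, 0.02)

# Start with the field names and append one column per selected measure.
the_output <- data.frame(Field = fields, stringsAsFactors = FALSE)
col_names <- "Field"
if (info_gain_flag == "True") {
  the_output <- cbind(the_output, fake_info_gain)
  col_names <- c(col_names, "Information Gain")
}
if (gain_ratio_flag == "True") {
  the_output <- cbind(the_output, fake_gain_ratio)
  col_names <- c(col_names, "Gain Ratio")
}

# Apply the accumulated, user-friendly column names in one step.
names(the_output) <- col_names
print(the_output)
```

Collecting the column names in a separate vector and assigning them at the end keeps the names in step with whichever columns were actually added.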
It is now time to test whether the basic macro works as expected in a workflow using different data. For the test workflow I decided to work with the Bank Marketing dataset from the UC Irvine Machine Learning Repository. The full dataset was used, which comes in CSV file format. As a result, the Auto Field tool was used to set appropriate field types.

One of the predictor fields (pdays) is the number of days since a prospective customer was previously contacted with an offer to invest in a term savings account. Those who were never contacted for this product were given a code of -1. Given this, a Filter tool separates the data into those who have, and those who have not, received a past telemarketing offer for a term savings account. Finally, the basic macro was inserted into the workflow twice (by right-clicking on the canvas and inserting the macro each time) and applied to both of the data streams coming from the Filter tool, with a Browse tool attached to each. The completed version of the test workflow is shown in Figure 2.

Figure 2: Test workflow

Frequently, things will work as expected in the workflow contained in the macro, but not when the macro is used in a new workflow. The test workflow should allow you to find any major errors in your macro.
View full article
This post is part of the "Guide to Creating Your Own R-Based Macro" series.

Now that we have the needed R packages installed, we can use them in an Alteryx workflow. The real purpose of this workflow is to begin to put together the macro itself. As a result, there will be some minor differences between this workflow and the one you would likely create if you didn't plan on using it as the basis for developing a macro. The starting workflow of the macro is shown in Figure 1.

Figure 1: The Initial Workflow

The data used in this macro (contained in a Text Input tool) is Fisher's well known Iris data set. This data consists of the length and width of both the petals and sepals of individuals from three species of the Iris flower family. In this instance we want to know how important these four measures are in determining which species a particular flower belongs to. While this dataset is pretty far afield from a business application, it is a nice dataset to work with for creating this macro since it is small (150 rows and five fields) and represents the correct case (a categorical target, species, and numeric predictors, the length and width measurements).

The basic workflow consists of only six tools. A Text Input tool contains the Iris data, which feeds into two Select tools. The upper Select tool selects the target field (the field Species), while the lower one selects the potential predictor fields to be examined. The downstream Join tool brings the data back together so that the first column contains the target and the subsequent columns contain the potential predictors to be examined.

This combination of three tools would be somewhat out of place in a standard (non-macro) workflow. In general, column position does not matter. Moreover, even if it did, a single Select tool could be used to alter column position.
However, in this case we will alter the position of columns based on a user's choices in the final macro's user interface, and the use of two Select tools allows us to accomplish this task.

The data flowing into the R tool now consists of only the target field (the first column) and the selected numeric predictors in the remaining columns. The R tool contains the following lines of code:

# Load the FSelector package
suppressWarnings(library(FSelector))
# Read in the data from Alteryx into R
the_data <- read.Alteryx("#1")
# Create a string of the potential predictors separated by plus signs
the_preds <- paste(names(the_data)[-1], collapse = " + ")
# Get the name of the target field
the_target <- names(the_data)[1]
# Create a formula expression from the names of the target and predictors
the_form <- as.formula(paste(the_target, the_preds, sep = " ~ "))
# Get the information gain measures
out1 <- information.gain(the_form, the_data)
# Prepare the results for output
out <- data.frame(a = names(the_data)[-1], b = out1[[1]])
names(out) <- c("Field", "Information Gain")
# Output the results
write.Alteryx(out)

The R code is fairly straightforward, with the possible exception of how the locations of values are indexed. For example, the code snippet names(the_data)[-1] takes all the provided field names except the first one (the [-1] index), which is the target field. The code snippet out1[[1]] obtains the first (and only) column of the data frame returned by the information.gain R function.

The contents of the Browse tool (the sixth and last tool in the workflow) are the results of the analysis.
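The indexing and formula-building steps can be tried in a plain R session, without Alteryx or FSelector. The sketch below uses R's built-in iris data frame (whose column names differ from whatever the Text Input tool holds) and a placeholder data frame standing in for the result of a measure function; the placeholder's column name and values are assumptions made only for illustration.

```r
# Use R's built-in iris data and move the target (Species) to column 1,
# mirroring the column order the Join tool produces in the workflow.
the_data <- iris[, c("Species", setdiff(names(iris), "Species"))]

the_target <- names(the_data)[1]     # the first name only
pred_names <- names(the_data)[-1]    # every name except the first
the_preds <- paste(pred_names, collapse = " + ")
the_form <- as.formula(paste(the_target, the_preds, sep = " ~ "))
print(the_form)

# A stand-in for a one-column result data frame: values in the single
# column, predictor names in the row names. The numbers are placeholders,
# not real information gain values.
fake_result <- data.frame(measure = c(0.4, 0.2, 0.9, 0.8),
                          row.names = pred_names)

# [[1]] extracts the first column as a plain numeric vector.
values <- fake_result[[1]]
out <- data.frame(Field = pred_names, values, stringsAsFactors = FALSE)
names(out) <- c("Field", "Information Gain")
print(out)
```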
View full article
This post is part of the "Guide to Creating Your Own R-Based Macro" series.

There are two major repositories of R packages, CRAN (the Comprehensive R Archive Network) and Bioconductor. The Bioconductor repository has over 1000 packages, which are focused specifically on bioinformatics related applications, while CRAN does not focus on a specific application area and has over 6000 contributed packages. In general, the functionality you will want to bring to Alteryx via R will be from a package that is on the CRAN repository.

With over 6000 packages, searching for a CRAN package with specific functionality by browsing through the contents of the CRAN repository is not very practical. The two ways I recommend for finding a relevant package are looking at the appropriate "Task View" (a description of available packages that address a particular application), or doing a web search on the feature you are hoping to obtain, with "R" added to the search string.

For this macro, I used the web search approach, and entered the search string "entropy information gain R" into my preferred search engine. The first hit on this search was a link to the CRAN package FSelector. Examining the documentation for this package revealed that it delivers the desired functionality through a function called information.gain, and that this is one of three entropy based measures the package provides (the other two measures are the gain ratio and symmetrical uncertainty). All three of these functions take as arguments a formula of the form

target ~ predictor1 + predictor2 +...+ predictorN

and an R data frame (R's equivalent of a data table) containing the data. The output of each of these functions is a data frame that contains a single column with the value of the selected measure, with one row for each of the predictor fields. The predictor field names are contained in the row.names metadata element of the data frame.
We will make use of this information in creating an Alteryx macro to wrap this functionality.

The FSelector package provides exactly what we need, so it is time to install the package. There are a number of ways to install an R package so that it can be used with Alteryx. The one complication that can arise is on user machines where multiple copies of R are installed. For users not using Microsoft R, the Alteryx predictive installer places the R executables within the Alteryx installation (usually C:\Program Files\Alteryx). To make sure you are installing packages into the version of R Alteryx is using, open a command prompt and enter the command

"C:\Program Files\Alteryx\R-3.3.2\bin\x64\Rgui.exe"

making sure to use the quotes. This will bring up the R console program. In the console window, type the command

install.packages("FSelector")

This will bring up a GUI asking you to select a CRAN mirror from which to download the package, along with its dependencies (there are several). Select a mirror that is geographically close to you for best performance. In addition, the FSelector package makes use of several other packages that call Java, so you also need to have a JVM installed on your computer to create and use this macro (I'd recommend the Windows x64 Offline version available here).

Once R is done downloading and installing the packages, make sure that FSelector and all its dependencies were correctly installed. To do this, in the R console enter the command

library(FSelector)

This will cause R to load the FSelector package. If you get an error message that some packages are not available (one possibility is the RWekajars package), install them using the install.packages command in the R console. Once the needed packages have been installed, you can exit the R console program.
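When several copies of R are present, it can help to confirm from inside the console which installation and library folders the session is using, and which of the needed packages are already there. The following is a minimal sketch using only base R and utils functions; the helper name missing_packages is made up for this illustration, and the check only looks for installed packages, it does not test that they load.

```r
# Show which R installation and library folders this session is using.
# In a console started from the Alteryx folder, these would be expected
# to point into the Alteryx installation.
print(R.home())
print(.libPaths())

# Return the subset of `pkgs` that is not installed in the current
# library folders (hypothetical helper, for illustration).
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  pkgs[!(pkgs %in% installed)]
}

to_install <- missing_packages(c("FSelector", "RWekajars"))
print(to_install)

# To install whatever is missing, one could then run:
# install.packages(to_install)
```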
View full article
Most of the Alteryx advanced analytics capabilities - including most of the tools in the Predictive, AB Testing, Time Series, Predictive Grouping, and Prescriptive categories - are built as R-based macros under the hood. If there's a piece of functionality that you're looking for that's lacking in Alteryx but is available in R and you have modest R coding abilities, you can extend Alteryx by creating your own R-based Alteryx tool.

The macro creation process involves four steps (quick links to the guides in the series):

1. Find and install an appropriate R package to provide the needed functionality.
2. Develop an Alteryx workflow that makes use of the relevant R functions via the use of an R tool. This workflow becomes the basis of the macro.
3. Create a macro that provides the basic functionality you want, and test it in a new workflow.
4. Polish the macro by documenting it, giving it the ability to generate a report, and doing other things to make it more polished.

The various Alteryx files created in this tutorial are attached to this post.

Once you've created the new tool, don't forget to share it with the wider community by publishing it to the Alteryx Analytics Gallery.

Background

Recently I have been working with an existing customer that is considering expanding the use of Alteryx within their organization to include other groups. Some of those groups are focused on developing predictive analytics models, and currently its members are using a number of different software products. Based on this, there are certain features that they often use in some of those products that are not available "out of the box" in Alteryx. While these features are heavily used by some members of this group, they aren't as widely used in general. A trade-off we face in developing Alteryx is to provide generally needed functionality without blowing up the number of available tools to where their sheer number becomes overwhelming to new Alteryx users.
In a number of instances we have developed new tools at the request of customers to address their needs, providing them with the tools immediately, and then folding them into a subsequent release of the product or publishing them to the Predictive District on the Alteryx Analytics Gallery. A particular case in point is the MB Affinity tool, which was part of the 10.0 release of Alteryx. The MB Affinity tool provides cosine similarity/distance measures for items. This is a common method used in creating recommendation systems of the "people who bought this item also bought" variety.

Getting back to the issue faced by the predictive analytics team of our current customer, one feature of another product that they currently use, which isn't currently pre-packaged in Alteryx, is a tool that examines the importance of potential numeric predictors for a categorical target field using an entropy based measure known as information gain or Kullback–Leibler divergence. In this series, I illustrate how to create an Alteryx macro that provides this measure.
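To make the measure concrete, here is a minimal base R sketch of information gain for a categorical predictor, computed as the entropy of the target minus its conditional entropy given the predictor. The function names entropy and info_gain and the tiny example vectors are made up for this illustration; this is not the FSelector implementation, which also has to deal with numeric predictors.

```r
# Shannon entropy (in bits) of a categorical vector.
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]                 # drop empty levels so log2(0) never occurs
  -sum(p * log2(p))
}

# Information gain: H(target) - H(target | predictor).
info_gain <- function(target, predictor) {
  groups <- split(target, predictor, drop = TRUE)
  weights <- sapply(groups, length) / length(target)
  # Conditional entropy is the weighted average of the within-group entropies.
  h_conditional <- sum(weights * sapply(groups, entropy))
  entropy(target) - h_conditional
}

target  <- c("a", "a", "b", "b")
perfect <- c("x", "x", "y", "y")   # fully determines the target
useless <- c("x", "y", "x", "y")   # tells us nothing about the target

print(info_gain(target, perfect))
print(info_gain(target, useless))
```

A predictor that fully determines the target recovers the whole entropy of the target, while one that is unrelated to it yields a gain of zero; real predictors fall somewhere in between.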
View full article
Alteryx has a full set of integrated predictive tools, but even with developers working at full speed, it is hard to keep up with the R community. Sometimes users want to install and utilize their favorite R packages. This post demonstrates how to install and use additional R packages.
View full article
With the release of 11.0, we see numerous changes to many tools in the Designer. The Linear Regression Tool gets a UI makeover and some cool new features are added that we will explore in this article. If you are new to performing regression analysis in Alteryx, I highly recommend checking out the Tool Mastery article which goes into everything there is to the old tool. Everything presented in that article remains valid as no features were removed. In this article, we will delve into the changes and new features.
View full article
This tool provides a number of different univariate time series plots that are useful in both better understanding the time series data and determining how to proceed in developing a forecasting model.
View full article
The Association Analysis Tool allows you to choose any numerical fields and assesses the level of correlation between those fields. You can use the Pearson product-moment correlation, the Spearman rank-order correlation, or Hoeffding's D statistic to perform your analysis. You also have the option of doing an in-depth analysis of your target variable in relation to the other numerical fields. After you’ve run through the tool, you will have two outputs:
View full article
Question

I am building a forecast for my company using the Time Series forecasting model. The sample workflow that Alteryx currently provides uses one product to forecast. I have multiple products I need to forecast - is there a way I can add a product column so I could forecast for all the products at one time?

Answer

The tools you're looking for are the TS Factory Tools, available in the Predictive District in the Gallery.

These tools estimate time series forecasting models for multiple groups at once using the autoregressive integrated moving average (ARIMA) method or the exponential smoothing (ETS) method; they also provide forecasts from groups of either ARIMA or ETS models for a user-specified number of future periods.

Just like the original tools, the ETS method in the TS Model Factory does not allow fields related to the target variable (covariate fields) to be used in the model creation. However, the ARIMA method does allow the use of covariates.

There's a sample workflow that demonstrates these tools with a use case involving bookings and website traffic for a hotel chain with four locations in the Denver metro area.

Happy Alteryx-ing!
View full article
Linear regression is a statistical approach that seeks to model the relationship between a dependent (target) variable and one or more predictor variables. It is one of the oldest forms of regression and its applications throughout history have been endless for modeling all kinds of phenomena. In linear regression, a line of best fit is calculated using the least squares method. This linear equation is then used to calculate projected values for the target variable given a set of new values for the predictor variables.
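As a small illustration of the fit-then-project idea in plain R (not the Alteryx Linear Regression Tool itself), the sketch below fits a least squares line to R's built-in cars data and projects values for two new speeds. The choice of dataset and of the new speed values is arbitrary.

```r
# Fit a least squares line to R's built-in cars data:
# stopping distance (dist) as a function of speed.
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))

# Projected stopping distances for new values of the predictor.
new_data <- data.frame(speed = c(10, 20))
projected <- predict(fit, newdata = new_data)
print(projected)
```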
View full article
Sampling weights, also known as survey weights, are positive values associated with the observations (rows) in your dataset (sample), used to ensure that metrics derived from a data set are representative of the population (the set of observations).
View full article
Neural Networks are frequently referred to as "black box" predictive models. This is because the actual inner workings of why a Neural Network sorts data the way it does are not explicitly available for interpretation. A wide variety of work has been conducted to make Neural Networks more transparent, ranging from visualization methods to developing a Neural Network model that can “show its work”. This article demonstrates how to leverage the NeuralNetTools R package to create a plot of the Neural Network trained by the Alteryx Neural Net tool.
View full article
R-based tools available to download from Gallery.
View full article
The Alteryx Forest Tool implements a random forest model using functions in the randomForest R package. Random forest models are an ensemble learning method that leverages the individual predictive power of decision trees into a more robust model by creating a large number of decision trees (i.e., a "forest") and combining all of the individual estimates of the trees into a single model estimate.  In this Tool Mastery, we will be reviewing the configuration of the Forest Model Tool, as well as its outputs. 
View full article
Typically the first step of Cluster Analysis in Alteryx Designer, the K-Centroids Diagnostics Tool assists you in determining an appropriate number of clusters to specify for a clustering solution in the K-Centroids Cluster Analysis Tool, given your data and specified clustering algorithm. Cluster analysis is an unsupervised learning algorithm, which means that there are no provided labels or targets for the algorithm to base its solution on. In some cases, you may know how many groups your data ought to be split into, but when this is not the case, you can use this tool to guide the number of target clusters your data most naturally divides into.
View full article
In statistics, standardization (sometimes called data normalization or feature scaling) refers to the process of rescaling the values of the variables in your data set so they share a common scale. Often performed as a pre-processing step, particularly for cluster analysis, standardization may be important to getting the best result in your analysis depending on your data.
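One common form of standardization is the z-score, which subtracts each variable's mean and divides by its standard deviation. The sketch below shows it in plain R with the base scale() function; the data values and column names are made up for illustration.

```r
# Two variables on very different scales (made-up values).
df <- data.frame(income = c(30000, 45000, 60000, 90000),
                 age = c(25, 32, 47, 51))

# z-score standardization: center each column on its mean and divide by
# its standard deviation. scale() returns a numeric matrix.
z <- scale(df)

# After rescaling, every column has mean 0 and standard deviation 1.
print(round(colMeans(z), 10))
print(apply(z, 2, sd))
```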
View full article
Clustering analysis has a wide variety of use cases, including harnessing spatial data for grouping stores by location, performing customer segmentation, or even insurance fraud detection. Clustering analysis groups individual observations in a way that each group (cluster) contains data that are more similar to one another than the data in other groups. Included with the Predictive Tools installation, the K-Centroids Cluster Analysis Tool allows you to perform cluster analysis on a data set with the option of using three different algorithms: K-Means, K-Medians, and Neural Gas. In this Tool Mastery, we will go through the configuration and outputs of the tool.
View full article
The Neural Network Tool in Alteryx implements functions from the nnet package in R to generate a type of neural network called a multilayer perceptron. By definition, neural network models generated by this tool are feed-forward (meaning data only flows in one direction through the network) and include a single Hidden Layer. In this Tool Mastery, we will review the configuration of the tool, as well as what is included in the Object and Report outputs.
View full article
With the introduction of the Predictive Analytics Starter Kit, you can enhance your analytic skills through an interactive, guided starter kit that teaches core predictive modeling techniques (A/B testing, linear regression, and logistic regression).
View full article