There are two major repositories of R packages: CRAN (the Comprehensive R Archive Network) and Bioconductor. Bioconductor has over 1000 packages, all focused on bioinformatics applications, while CRAN is not tied to a specific application area and has over 6000 contributed packages. In general, the functionality you will want to bring to Alteryx via R will come from a package on CRAN.
With over 6000 packages, finding a CRAN package with specific functionality by browsing the repository is not very practical. I recommend two ways to find a relevant package: look at the appropriate "Task View" (a description of the available packages that address a particular application area), or do a web search on the feature you want, adding "R" to the search string.
For this macro, I used the web search approach and entered the search string "entropy information gain R" into my preferred search engine. The first hit was a link to the CRAN package FSelector. Its documentation showed that the package delivers the desired functionality through a function called information.gain, one of three entropy-based measures the package provides (the other two are the gain ratio and symmetrical uncertainty, computed by gain.ratio and symmetrical.uncertainty). All three functions take as arguments a formula of the form
target ~ predictor1 + predictor2 + ... + predictorN
and an R data frame (R's equivalent of a data table) containing the data. Each function returns a data frame with a single column holding the value of the selected measure, and one row for each predictor field. The predictor field names are not stored in a regular column; they are held in the data frame's row names (its row.names attribute). We will make use of this when creating an Alteryx macro to wrap this functionality.
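To make the three measures concrete, here is a minimal Python sketch of what they compute. It is not FSelector's code: it assumes every field is already categorical (it does not attempt whatever FSelector does with numeric fields), it uses natural logarithms, and the helper names build_formula and score_table are hypothetical. It follows the standard definitions: information gain is H(target) - H(target | predictor), the gain ratio divides that gain by H(predictor), and symmetrical uncertainty is 2 * gain / (H(target) + H(predictor)). score_table also returns each predictor name as an ordinary column, which is the kind of reshaping a wrapper needs because FSelector returns the names as row names rather than as data.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in nats (natural log), of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def conditional_entropy(target, predictor):
    """H(target | predictor): target entropy within each predictor level, weighted by level frequency."""
    n = len(target)
    groups = {}
    for t, p in zip(target, predictor):
        groups.setdefault(p, []).append(t)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def entropy_measures(target, predictor):
    """Information gain, gain ratio and symmetrical uncertainty for one categorical predictor."""
    h_target = entropy(target)
    h_pred = entropy(predictor)
    gain = h_target - conditional_entropy(target, predictor)
    return {
        "information.gain": gain,
        # Gain ratio normalizes by the predictor's own entropy (guard against a constant predictor).
        "gain.ratio": gain / h_pred if h_pred > 0 else 0.0,
        "symmetrical.uncertainty": 2 * gain / (h_target + h_pred) if h_target + h_pred > 0 else 0.0,
    }


def build_formula(target_name, predictor_names):
    """Hypothetical helper: assemble an R-style formula string 'target ~ p1 + p2 + ...'."""
    return f"{target_name} ~ " + " + ".join(predictor_names)


def score_table(data, target_name, predictor_names, measure="information.gain"):
    """Hypothetical helper: one (field name, score) row per predictor, with the name as an ordinary column."""
    target = data[target_name]
    return [(name, entropy_measures(target, data[name])[measure]) for name in predictor_names]


if __name__ == "__main__":
    # Toy data: "outlook" fully determines "play", while "windy" says nothing about it.
    data = {
        "play": ["no", "no", "yes", "yes"],
        "outlook": ["sunny", "sunny", "rain", "rain"],
        "windy": ["yes", "no", "yes", "no"],
    }
    print(build_formula("play", ["outlook", "windy"]))  # play ~ outlook + windy
    for field, score in score_table(data, "play", ["outlook", "windy"]):
        print(field, round(score, 4))  # outlook 0.6931, then windy 0.0
```

In this toy data, outlook scores ln 2 (the full entropy of play) on information gain and 1.0 on both normalized measures, while windy scores 0 on all three.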
The FSelector package provides exactly what we need, so it is time to install it. There are several ways to install an R package so that Alteryx can use it. The one complication arises on machines where multiple copies of R are installed. For users not using Microsoft R, the Alteryx predictive installer places the R executables within the Alteryx installation (usually C:\Program Files\Alteryx). To make sure you are installing packages into the copy of R that Alteryx uses, open a command prompt and enter the command
making sure to include the quotes (the path contains a space). This opens the R console program. In the console window, type the command
This brings up a window asking you to select a CRAN mirror from which to download the package and its dependencies (there are several). Select a mirror that is geographically close to you for best performance. In addition, FSelector relies on several other packages that call Java, so you also need a Java Virtual Machine (JVM) installed on your computer to create and use this macro (I'd recommend the Windows x64 Offline version available here).
Once R has finished downloading and installing the packages, check that FSelector and all its dependencies were installed correctly. To do this, enter the following command in the R console
This loads the FSelector package. If you get an error message saying that some packages are not available (one possibility is the RWekajars package), install them with the install.packages command in the R console. Once the needed packages are installed, you can exit the R console.