Neural Networks are frequently referred to as "black box" predictive models. This is because the actual inner workings of why a Neural Network sorts data the way it does are not explicitly available for interpretation. A wide variety of work has been conducted to make Neural Networks more transparent, ranging from visualization methods to developing a Neural Network model that can "show its work". This article demonstrates how to leverage the NeuralNetTools R package to create a plot of the Neural Network trained by the Alteryx Neural Net tool.
The Alteryx Forest Tool implements a random forest model using functions in the randomForest R package. Random forest models are an ensemble learning method that leverages the individual predictive power of decision trees into a more robust model by creating a large number of decision trees (i.e., a "forest") and combining all of the individual estimates of the trees into a single model estimate. In this Tool Mastery, we will be reviewing the configuration of the Forest Model Tool, as well as its outputs.
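As a simplified sketch of the "combine the trees" idea (illustrative only, not the randomForest package's internals), a classification forest can aggregate its trees' individual predictions by majority vote:

```python
from collections import Counter

def forest_predict(tree_votes):
    """Combine the class predicted by each individual tree into a single
    forest-level estimate via majority vote."""
    return Counter(tree_votes).most_common(1)[0][0]

# Five hypothetical trees vote on one record:
print(forest_predict(["yes", "no", "yes", "yes", "no"]))  # -> yes
```

For a regression forest, the analogous step is averaging the trees' numeric predictions instead of voting.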
Typically the first step of cluster analysis in Alteryx Designer, the K-Centroids Diagnostics Tool assists you in determining an appropriate number of clusters to specify in the K-Centroids Cluster Analysis Tool, given your data and your chosen clustering algorithm. Cluster analysis is an unsupervised learning method, which means that no labels or targets are provided for the algorithm to base its solution on. In some cases, you may know how many groups your data ought to be split into; when you don't, this tool can help you identify the number of clusters your data most naturally divides into.
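To illustrate the underlying idea (a deliberately simplified sketch, not the tool's actual diagnostics), one way to compare candidate cluster counts is the total within-cluster sum of squares, which stops improving much once the "natural" number of clusters has been reached:

```python
def wss(points, centroids):
    """Total within-cluster sum of squares: each point contributes its
    squared distance to the nearest centroid. Lower means tighter clusters."""
    return sum(min((p - c) ** 2 for c in centroids) for p in points)

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]  # two obvious groups
print(wss(data, [3.0]))                # k=1: poor fit, large WSS
print(wss(data, [1.0, 5.0]))           # k=2: WSS drops sharply
print(wss(data, [0.9, 1.2, 5.05]))     # k=3: only a marginal further gain
```

The sharp drop from one to two centroids, followed by only a marginal gain at three, suggests this data naturally divides into two clusters.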
In statistics, standardization (sometimes called data normalization or feature scaling) refers to the process of rescaling the values of the variables in your data set so they share a common scale. Often performed as a pre-processing step, particularly for cluster analysis, standardization may be important to getting the best result in your analysis depending on your data.
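A minimal sketch of the most common form of standardization, the z-score (this illustrates the statistic itself, not Alteryx's implementation; the population standard deviation is used here, while some tools use the sample n-1 version):

```python
import math

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1:
    z = (x - mean) / sd."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

print(standardize([10, 20, 30]))  # -> approx [-1.22, 0.0, 1.22]
```

After this rescaling, every variable shares the same scale, so no single variable dominates a distance-based method like cluster analysis merely because its raw units are larger.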
Cluster analysis has a wide variety of use cases, including harnessing spatial data to group stores by location, performing customer segmentation, or even detecting insurance fraud. Cluster analysis groups individual observations so that each group (cluster) contains data that are more similar to one another than to the data in other groups. Included with the Predictive Tools installation, the K-Centroids Cluster Analysis Tool allows you to perform cluster analysis on a data set using one of three algorithms: K-Means, K-Medians, and Neural Gas. In this Tool Mastery, we will go through the configuration and outputs of the tool.
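As an illustrative sketch of what K-Means does under the hood (simplified to one-dimensional data; the tool's actual implementation lives in R), each iteration assigns points to their nearest centroid and then moves each centroid to the mean of its assigned points:

```python
def kmeans_step(points, centroids):
    """One K-Means iteration: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: (p - centroids[i]) ** 2)
        clusters[nearest].append(p)
    return [sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)]

# Repeating this step until the centroids stop moving yields the final clusters:
print(kmeans_step([1.0, 2.0, 9.0, 10.0], [0.0, 5.0]))  # -> [1.5, 9.5]
```

K-Medians follows the same loop but moves centroids to the median rather than the mean, making it less sensitive to outliers.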
The Neural Network Tool in Alteryx implements functions from the nnet package in R to generate a type of neural network called a multilayer perceptron. By definition, neural network models generated by this tool are feed-forward (meaning data flows only one direction through the network) and include a single hidden layer. In this Tool Mastery, we will review the configuration of the tool, as well as what is included in the Object and Report outputs.
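To make "feed-forward with a single hidden layer" concrete, here is a bare-bones sketch of one forward pass (the weights are hypothetical, not taken from a trained nnet model, and bias terms are omitted for brevity):

```python
import math

def mlp_forward(x, hidden_weights, output_weights):
    """One feed-forward pass: inputs flow one way through a single hidden
    layer (logistic activations) to a single output node."""
    hidden = [1 / (1 + math.exp(-sum(w * v for w, v in zip(ws, x))))
              for ws in hidden_weights]
    return sum(w * h for w, h in zip(output_weights, hidden))

# Two inputs, two hidden nodes, one output:
out = mlp_forward([1.0, 2.0], [[0.5, -0.5], [1.0, 1.0]], [1.0, -1.0])
```

Because data only ever moves from inputs to hidden nodes to output, there are no cycles, which is exactly what "feed-forward" means.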
With the introduction of the Predictive Analytics Starter Kit, you can enhance your analytic skills through an interactive, guided starter kit that teaches core predictive modeling techniques (A/B testing, linear regression, and logistic regression).
R is an open-source programming language and software environment, specifically intended for statistical computing and graphics. The Alteryx Predictive Tools install includes an installation of R, along with a set of R packages used by the Predictive Tools. This article describes how to determine which R packages (and versions) are installed for use with your Alteryx R Tool, as well as a few Alteryx-specific packages on GitHub.
You want to impress your managers, so you decide to try some predictions on your data – forecasting, scoring potential marketing campaigns, finding new customers… That's great! Welcome to the addictive world of predictive analytics. We have the perfect platform for you to start exploring your data.
I know you want to dive right in and start testing models. It's tempting to just pull some data and start trying out tools, but the first and fundamentally most important part of all statistical analysis is the data investigation.
Your models won't mean much unless you understand your data. Here's where the Data Investigation Tools come in! You can get a statistical breakdown of each of your variables, both string and numeric, check for outliers (categorical and continuous), test correlations to slim down your predictors, and visualize the frequency and dispersion within each of your variables.
Part 1 of this article will give you an overview of the Field Summary Tool (never leave home without it!); Part 2 will touch on the Contingency and Frequency Tables, and Distribution Analysis; Part 3 will cover the Association Analysis Tool and the Pearson and Spearman Correlations; and Part 4 will be all the cool plotting tools.
Always, every day, literally every time you acquire a new data set, you will start with the Field Summary Tool. I cannot emphasize this enough, and I promise it will save you headaches.
There are three outputs to this tool: a data table containing your fields and their descriptive statistics, a static report, and the interactive visualization dashboard that provides a visual profile of your variables. From this output, you can select subsets to view, sort each of the panels, view and zoom in on specific values, and it even includes a visual indicator of data quality.
You'll get a nifty report with plots and descriptive statistics for each of your variables. Likely the most important part of this report is '% Missing' – ideally, you want 0.0% missing. If you are missing values, don't fret. You can remove these records or impute those values (another reason knowing your data is so important).
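The '% Missing' statistic is simple enough to sanity-check by hand; a minimal sketch (treating None as a missing value, purely for illustration):

```python
def pct_missing(column):
    """The '% Missing' idea: share of null values in a field,
    expressed as a percentage."""
    return 100.0 * sum(v is None for v in column) / len(column)

print(pct_missing([3, None, 7, None]))  # -> 50.0
```

A field at 50% missing like this one usually needs a decision, record removal or imputation, before it can feed a model.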
Also check 'Unique Values' – if you have a single unique value in one of your variables, that won't add anything useful to your model, so consider deselecting that variable.
The Remarks field is also very useful. It will suggest field-type changes for fields with a small number of unique values (perhaps a numeric field should really be a string). Or, if some values of your field occur only a handful of times, you may consider combining those value levels together.
The better YOU know your data, the more efficient and accurate your models will be. Only you know your data, your use case, and how your results are going to be applied. But we're here to help you get as familiar as you can with whatever data you have.
Stay tuned for subsequent articles – these tools will be your new best friends. Happy Alteryx-ing!
The Append Cluster Tool is effectively a Score Tool for the K-Centroids Cluster Analysis Tool. It takes the O anchor output (the model object) of the K-Centroids Cluster Analysis Tool, and a data stream (either the same data used to create the clusters, or a different data set with the same fields), and appends a cluster label to each incoming record. This Tool Mastery reviews its use.
As most of us can agree, predictive models can be extremely useful. Predictive models can help companies allocate their limited marketing budget to the most profitable group of customers, help non-profit organizations find the donors most willing to give to their cause, or even determine the probability that a student will be admitted to a given school. A well-designed predictive model can help us make smart and cost-effective business decisions.
You may have run across this error when using the HTML-plugin predictive tools (Linear Regression, Logistic Regression, Decision Tree):
Logistic Regression: Error in searchDir(dbDir, lang) : Logistic Regression: Expecting a single string value: [type=NULL; extent=0]
In 2018.2, this can happen when you previously had an Admin version of Designer installed but have since uninstalled it. Once you've installed the 2018.2 non-Admin version with Predictive Tools, these errors can occur.
Help is on the way! (In the form of suggestions and an upcoming stable release.) You have several options. First, you can install an Admin version of Designer concurrently - 11.8, 2018.1, 2018.2, etc.
Last-ditch effort: delete registry keys. This is not recommended; only delete keys if you cannot install a current version, or cannot wait until the next stable update.
Step 0) Save your license key somewhere easy to find: Options > Manage Licenses
Step 1) Open the Registry Editor (type regedit into your Windows search bar) and delete the following directory:
Now, go predict stuff! Happy Alteryx-ing.
Logistic regression differs from other types of regression in two ways: it creates predictions within a range of 0-1, and it does not assume that the predictor variables have a constant marginal effect on the target variable. This makes it applicable to many dichotomous problems, such as estimating the probability that a student will graduate, that a voter will vote for a specific candidate, or that someone will respond to a marketing campaign.
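A sketch of why logistic regression's predictions always stay in the 0-1 range: the linear score is passed through the logistic (sigmoid) function. The intercept and coefficient below are hypothetical, not a fitted model:

```python
import math

def predict_probability(x, intercept=-4.0, coefficient=0.05):
    """The linear score is squashed through the logistic (sigmoid)
    function, which maps any real number into the open interval (0, 1)."""
    score = intercept + coefficient * x
    return 1 / (1 + math.exp(-score))

# Probability of graduating as a function of a (hypothetical) exam score:
print(round(predict_probability(80), 3))  # linear score 0.0 -> probability 0.5
```

Because the sigmoid flattens out at both tails, each extra point of the predictor moves the probability by less near 0 or 1 than near 0.5, which is the non-constant marginal effect described above.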
A common concern in predictive modeling is whether a model has been overfit. In statistics, overfitting refers to the phenomenon where an analytical model corresponds too closely (or exactly) to a specific data set, and therefore may fail when applied to additional data or future observations. One common method used to mitigate overfitting is regularization. Regularization places controls on how large the coefficients of the predictor variables can grow. In Alteryx, the option of implementing regularized regression is available for the Linear Regression and Logistic Regression Tools.
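A sketch of the L2 ("ridge") flavor of this idea: the fitting objective becomes the usual residual sum of squares plus a penalty on coefficient size (illustrative only; consult the tool documentation for the exact objective and penalty the tools use):

```python
def ridge_loss(residuals, coefficients, lam):
    """Regularized fitting objective: residual sum of squares plus an
    L2 penalty, lam * sum(b^2), that discourages large coefficients."""
    rss = sum(r ** 2 for r in residuals)
    return rss + lam * sum(b ** 2 for b in coefficients)

residuals, coefs = [0.5, -1.0, 0.2], [3.0, -2.0]
print(ridge_loss(residuals, coefs, 0.0))  # lam = 0: plain least squares
print(ridge_loss(residuals, coefs, 0.1))  # penalty favors smaller coefficients
```

With the penalty in place, shrinking a coefficient can lower the total loss even if it raises the residual error slightly, which is exactly the control on coefficient growth described above.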
The subtitle of this article should be "a short novel on configuring the Decision Tree Tool in Alteryx". The initial configuration of the tool is very simple, but if you choose to customize it at all, things can get complicated quickly. In this article, I am focusing on the configuration of the tool; because this is a Tool Mastery, I am covering everything within the configuration.
Overview: I wrote this as a short example into how one might use Alteryx to write a further Alteryx module to do complicated or repetitive tasks dynamically that would be difficult to do through the front end.
This module will automatically produce another Alteryx module that computes frequency statistics for a file. This saves the manual effort of adding a Summarize tool for each column (a real time sink for files with lots of columns). It also avoids transposing the file, which is very slow to run for large files. Instructions:
Change the input to that module to whichever file you like (or use Testing.yxmd which is provided)
Run it – this will create the Result.yxmd module
Open Result.yxmd – and change the input in the module to be the same file you used in step 2
Change the output if necessary (it defaults to an Alteryx database)
At the moment it does handle &'s and single quotes in files, but it won't do anything clever like computing stats on substrings of long fields.
I hope this inspires people to use this technique and build on the module I’ve built.
The humble histogram is something many people are first exposed to in grade school. Histograms are a type of bar graph that display the distribution of continuous numerical data. Histograms are sometimes confused with bar charts, which are plots of categorical variables.
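The distinction is easy to see in code: a histogram's bars come from counting continuous values into equal-width bins along a numeric range, rather than counting categories. A minimal sketch:

```python
def histogram(values, n_bins, lo, hi):
    """Count continuous values into equal-width bins over [lo, hi];
    the counts are the bar heights of a histogram."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)  # clamp hi edge into last bin
        counts[i] += 1
    return counts

print(histogram([0.1, 0.4, 0.5, 0.9, 1.0], 2, 0.0, 1.0))  # -> [2, 3]
```

Because the x-axis is a continuous range carved into bins (not a set of discrete categories), histogram bars are conventionally drawn touching, while bar-chart bars are separated.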