Welcome to Part 2 of the Pre-Predictive series! After a strong start but long hiatus, we will be resuming our tour of the Data Investigation Tools. This section will cover the Frequency Table, Contingency Table and Distribution Analysis Tools.
Welcome to Part 3 (out of 4) of the Pre-Predictive series. In this article series, we are introducing you to the very exciting world of data investigation. This section covers the Association Analysis Tool, the Pearson Correlation Tool, and the Spearman Correlation Tool!
Welcome to the closing chapter of our voyage through the Pre-Predictive series! This has been a four-part journey introducing you to the thrilling world of data investigation. This section covers the plotting tools included in the Data Investigation Toolbox.
This article was put together to resolve a common data-cleansing issue and to show tools and techniques that newer users don't normally reach for. The goal is to get newer users comfortable with these tools, spark some creativity, and hopefully take you to the next level!
In this use case, the data in the attached workflow is messy with capitalized strings all over the place. We want to format the data by removing some of the capitalization, but not all of it.
Note: If we wanted to capitalize the first letter of every word, we could use the Formula Tool and the TitleCase(String) function. This would turn BEAR the WEIGHT into Bear The Weight. See the difference?
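For readers who want to see the same idea outside of Designer, here is a minimal Python sketch of title-casing. It is an analogue, not the Formula Tool's implementation: the function name title_case is hypothetical, and it only splits on single spaces, so punctuation handling may differ from TitleCase(String).

```python
def title_case(text: str) -> str:
    """Capitalize the first letter of each space-separated word, lowercase the rest."""
    # Hypothetical helper for illustration; not the Alteryx TitleCase function.
    return " ".join(word[:1].upper() + word[1:].lower() for word in text.split(" "))


print(title_case("BEAR the WEIGHT"))  # prints "Bear The Weight"
```

Notice that every word is capitalized, which is exactly why this approach is too blunt for the use case above, where only some of the capitalization should change.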
The tools that we will be using in this exercise are the Record ID, Text to Columns, RegEx, Formula, Tile, and Cross Tab Tools.
The exercise will show you the importance of the Record ID Tool, the flexibility of the Text to Columns and RegEx Tools, the under-used Tile Tool, the creativity of the Formula Tool, and the not-so-scary Cross Tab Tool when the data is configured properly.
We hope that these exercises and use cases open up your mind to the greatness of Alteryx!
See attached workflow and enjoy!
Inside the Laboratory tool set you'll find the Basic Data Profile Tool. This tool is similar to the Field Summary Tool in that it provides information about each field within your data, such as length, type, source, shortest and longest values, and more. It differs from the Field Summary Tool, however, in the missing data details. The Field Summary Tool gives you a single value for Percent Missing but makes no distinction between Null and Empty values. The Basic Data Profile Tool gives you a count of records that have Null values and a count of records that are blank.
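To make the Null-versus-Empty distinction concrete, here is a small Python sketch that reports both counts alongside a single combined percentage. It is only an illustration: the function name and sample values are made up, and it assumes that "missing" means either None or an empty string, which may not match exactly how either tool defines it.

```python
from typing import Optional, Sequence


def count_null_and_empty(values: Sequence[Optional[str]]) -> dict:
    """Count Null (None) and Empty ("") values separately, plus a combined percentage."""
    nulls = sum(1 for v in values if v is None)
    empties = sum(1 for v in values if v is not None and v == "")
    total = len(values)
    # Assumption: the combined figure treats both Null and Empty as missing.
    pct_missing = 100.0 * (nulls + empties) / total if total else 0.0
    return {"null": nulls, "empty": empties, "percent_missing": pct_missing}


sample = ["NORTH", None, "", "SOUTH", None]  # made-up field values
print(count_null_and_empty(sample))
```

The single percentage hides the fact that two of the three missing values are Null and one is blank, which is the detail the separate counts expose.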
This article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools. Here we’ll delve into uses of the Pearson Correlation Tool on our way to mastering the Alteryx Designer.
We love helping users be successful with Alteryx, and this means providing a ton of great resources for getting started, learning more, and keeping you up to date with all the amazing stuff we're doing here at Alteryx… and the most compelling is Predictive!
Check out the Predictive District on the Gallery. There are great macros, apps, and sample workflows to demonstrate some nifty new tools. This post by DrDan on the Analytics Blog gives an overview of what's currently available – stay tuned for additions!
One of my favorites is the Predictive Analytics Starter Kit Volume 1. It enables you to learn the fundamentals of key predictive models with an interactive guided experience. Examples include Linear Regression, Logistic Regression, and AB Testing; each demonstrates the steps necessary to develop the dataset needed for analysis, and then how to actually build these predictive models yourself.
With v10.6, we introduced the Prescriptive Tool Category, comprising the Optimization and Simulation tools, to assist with determining the best course of action or outcome for a particular situation or set of scenarios. The Engine Works Blog has an introduction to this toolset, plus an extensive use case demonstration.
If you need more Optimization and Simulation action, there are several sample workflows, including Fantasy Sports Lineups (hey, sports fans – blog post here!), a mixing problem, workforce scheduling, and more!
Speaking of use cases, the software itself contains a plethora of predictive sample workflows - and the installed Starter Kits show up here, too! Help > Sample Workflows > Predictive Analytics.
Of course, don't forget the Predictive Analytics help pages, for overviews and configuration tips.
Visit our Product Training page for On-Demand and Virtual webinars on everything Predictive – regression modelling, cluster analysis, time series… As always, please begin with Data Prep and Investigation! Can I mention the Field Summary Tool enough times?
Want to show off the interactive visualizations from the models you've built? This Knowledge Base post shows you how. Another Engine Works post outlines how to build your own Custom Interactive Visualizations (Part 1 and counting…)
For the most in-depth, resource-rich training on leveraging predictive analytics to answer your business questions, consider the Udacity Predictive Analytics for Business NanoDegree. It consists of seven courses focused on selecting the right methodology, data preparation, and data visualization as well as four courses that will equip you to use predictive analytics to answer your business problems.
But really, it all starts with the Community. Cruise the Knowledge Base posts, search for Predictive or other favorite keywords, follow the blogs… and for the love of Ned, just play with the software! It's how we learn 🙂
A common task for analysts (and a good practice when analyzing data) is to determine whether the means of two sampled groups are significantly different. When this question arises, the Test of Means Tool is right for you! To demonstrate how to configure this tool and how to interpret the results, a workflow has been attached. The attached workflow (v. 11.7) compares the amount of money that customers spent across different regions in the US. The Dollars_Spent field identifies the amount of money an individual spent, and the Region field identifies the region that the individual resides in (NORTH, SOUTH, EAST, WEST).
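If you'd like a feel for the arithmetic behind comparing two group means, the sketch below computes a Welch's t statistic and its approximate degrees of freedom in plain Python. This is an assumption on our part: the text above does not say which test the Test of Means Tool runs, and the Dollars_Spent figures for the two regions are made up for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two samples."""
    # Each term is the sample variance divided by the sample size.
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical Dollars_Spent values for two regions.
north = [10, 12, 14]
south = [20, 22, 24]
t_stat, dof = welch_t(north, south)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

A large absolute t value relative to the degrees of freedom suggests the two means differ; turning it into a p-value requires the t distribution, which is left out here to keep the sketch dependency-free.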
The Association Analysis Tool allows you to choose any numerical fields and assesses the level of correlation between those fields. You can use the Pearson product-moment correlation, the Spearman rank-order correlation, or Hoeffding's D statistic to perform your analysis. You also have the option of doing an in-depth analysis of your target variable in relation to the other numerical fields. After you’ve run through the tool, you will have two outputs.
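The tool's outputs aside, the first two correlation measures named above can be sketched in a few lines of plain Python. This is a rough illustration rather than the tool's implementation: the function names and the sample data are made up, and Hoeffding's D is left out.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear in x
print(round(pearson(x, y), 4), round(spearman(x, y), 4))
```

Because y always rises with x, the rank-based Spearman value is a perfect 1, while Pearson comes out slightly lower because the relationship is curved rather than linear.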
The Contingency Table tool is a part of the Data Investigation category in Alteryx Designer, which comes as a part of the predictive tools installation. Intuitively, you can use the Contingency Table tool to create a contingency table.
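If contingency tables are new to you, the idea is simply a cross-count of two categorical fields. Here is a minimal Python sketch; the field names region and segment and the records are hypothetical, and the tool's own report layout will look different.

```python
from collections import Counter


def contingency_table(rows, row_field, col_field):
    """Count how often each (row_field, col_field) pair of values occurs."""
    counts = Counter((r[row_field], r[col_field]) for r in rows)
    row_levels = sorted({pair[0] for pair in counts})
    col_levels = sorted({pair[1] for pair in counts})
    # Nested dict: table[row_level][col_level] -> count (0 when the pair never occurs).
    return {
        rl: {cl: counts.get((rl, cl), 0) for cl in col_levels}
        for rl in row_levels
    }


# Made-up records for illustration.
records = [
    {"region": "NORTH", "segment": "Retail"},
    {"region": "NORTH", "segment": "Online"},
    {"region": "SOUTH", "segment": "Retail"},
    {"region": "NORTH", "segment": "Retail"},
]
table = contingency_table(records, "region", "segment")
for region, row in table.items():
    print(region, row)
```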
Typically the first step of Cluster Analysis in Alteryx Designer, the K-Centroids Diagnostics Tool helps you determine an appropriate number of clusters to specify for a clustering solution in the K-Centroids Cluster Analysis Tool, given your data and specified clustering algorithm. Cluster analysis is an unsupervised learning technique, which means that there are no provided labels or targets for the algorithm to base its solution on. In some cases, you may know how many groups your data ought to be split into, but when this is not the case, you can use this tool to estimate the number of clusters your data most naturally divides into.
Clustering analysis has a wide variety of use cases, including harnessing spatial data for grouping stores by location, performing customer segmentation, or even insurance fraud detection. Clustering analysis groups individual observations so that each group (cluster) contains data that are more similar to one another than to the data in other groups. Included with the Predictive Tools installation, the K-Centroids Cluster Analysis Tool allows you to perform cluster analysis on a data set with the option of using three different algorithms: K-Means, K-Medians, and Neural Gas. In this Tool Mastery, we will go through the configuration and outputs of the tool.
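To give a sense of what "grouping similar observations" means in practice, here is a bare-bones K-Means sketch on one-dimensional data. It is a teaching toy under several assumptions: the function name, the data, and the fixed starting centroids are made up, and it is not how the K-Centroids Cluster Analysis Tool is implemented (nor does it cover K-Medians or Neural Gas).

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it in place if empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [1, 2, 3, 10, 11, 12]  # made-up values
centers, groups = k_means_1d(points, centroids=[0.0, 5.0])
print(centers, groups)
```

The two groups that come back are the low values and the high values, each closer to its own centroid than to the other, which is the "more similar to one another" property described above.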
You want to impress your managers, so you decide to try some predictions on your data – forecasting, scoring potential marketing campaigns, finding new customers… That's great! Welcome to the addictive world of predictive analytics. We have the perfect platform for you to start exploring your data.
I know you want to dive right in and start testing models. It's tempting to just pull some data and start trying out tools, but the first and fundamentally most important part of all statistical analysis is the data investigation.
Your models won't mean much unless you understand your data. Here's where the Data Investigation Tools come in! You can get a statistical breakdown of each of your variables, both string and numeric, check for outliers (categorical and continuous), test correlations to slim down your predictors, and visualize the frequency and dispersion within each of your variables.
Part 1 of this article will give you an overview of the Field Summary Tool (never leave home without it!). Part 2 will touch on the Contingency and Frequency Tables, and Distribution Analysis; Part 3 will be the Association Analysis Tool, and the Pearson and Spearman Correlations; and Part 4 will be all the cool plotting tools.
Always, every day, literally every time you acquire a new data set, you will start with the Field Summary Tool. I cannot emphasize this enough, and I promise it will save you headaches.
There are three outputs to this tool: a data table containing your fields and their descriptive statistics, a static report, and the interactive visualization dashboard that provides a visual profile of your variables. From this output, you can select subsets to view, sort each of the panels, view and zoom in on specific values, and it even includes a visual indicator of data quality.
You'll get a nifty report with plots and descriptive statistics for each of your variables. Likely the most important part of this report is '% Missing' – ideally, you want 0.0% missing. If you are missing values, don't fret. You can remove these records or impute those values (another reason knowing your data is so important).
Also check 'Unique Values' – if you have a single unique value in one of your variables, that won't add anything useful to your model, so consider deselecting that variable.
The Remarks field is also very useful – it will suggest field-type changes for fields with a small number of unique values (perhaps such a field should be a string field). Or, if some values of your field occur only a few times, you may consider combining some of those value levels together.
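The unique-value and rare-level checks described above are easy to picture in code. The following Python sketch is only an illustration of the idea: the function name is hypothetical, and the min_count threshold is an assumption, since the text does not say how few occurrences count as "a small number".

```python
from collections import Counter


def summarize_levels(values, min_count=2):
    """Report unique-value count, whether the field is constant, and rare levels."""
    # Ignore missing entries (None or empty string) when counting levels.
    counts = Counter(v for v in values if v is not None and v != "")
    return {
        "unique_values": len(counts),
        "is_constant": len(counts) <= 1,
        # Assumption: a level is "rare" if it appears fewer than min_count times.
        "rare_levels": sorted(level for level, n in counts.items() if n < min_count),
    }


print(summarize_levels(["A", "A", "B", "C", "A", None]))
print(summarize_levels(["X", "X", "X"]))
```

A field flagged as constant is a candidate for deselecting, and the rare levels are candidates for combining.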
The better YOU know your data, the more efficient and accurate your models will be. Only you know your data, your use case, and how your results are going to be applied. But we're here to help you get as familiar as you can with whatever data you have.
Stay tuned for subsequent articles – these tools will be your new best friends. Happy Alteryx-ing!
Far more than just a window to your data, the Browse Tool has a catalog of features to best view, investigate, and copy/save data at any checkpoint you place it. That view into your data at any point in your blending gives valuable feedback that often speeds workflow development and makes it easier to learn tools by readily visualizing their transforms. Be equipped, and browse through the catalog of useful applications below!
The humble histogram is something many people are first exposed to in grade school. Histograms are a type of bar graph that display the distribution of continuous numerical data. Histograms are sometimes confused with bar charts, which are plots of categorical variables.
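Under the hood, a histogram is just a count of how many values fall into each equal-width bin. Here is a minimal Python sketch of that binning step; the function name and sample values are made up, and real plotting tools may choose bin edges differently.

```python
def histogram(values, bin_count):
    """Return (bin edges, counts) for equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: a single bin holds everything.
        return [lo, hi], [len(values)]
    width = (hi - lo) / bin_count
    counts = [0] * bin_count
    for v in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        index = min(int((v - lo) / width), bin_count - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bin_count + 1)]
    return edges, counts


edges, counts = histogram([1, 2, 2, 3, 3, 3, 4, 4, 5, 10], bin_count=3)
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:>5.1f} to {right:>5.1f}: {'#' * n}")
```

The bins only make sense because the data is continuous and ordered, which is the key difference from a bar chart of categories.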
Time series forecasting is using a model to predict future values based on previously observed values. In a time series forecast, the prediction is based on history and we are assuming the future will resemble the past. We project current trends using existing data.
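As a tiny illustration of "the prediction is based on history", here is simple exponential smoothing in Python. It is just one of many forecasting methods and is picked here for brevity; the function name, the alpha value, and the sample history are assumptions, and this is not presented as the method any particular Alteryx tool uses.

```python
def exponential_smoothing_forecast(history, alpha=0.5):
    """One-step-ahead forecast: a weighted blend of each new value and the running level."""
    level = history[0]
    for observed in history[1:]:
        # alpha near 1 trusts recent observations; alpha near 0 trusts the past.
        level = alpha * observed + (1 - alpha) * level
    return level


history = [10, 20, 30]  # made-up observations
print(exponential_smoothing_forecast(history, alpha=0.5))  # prints 22.5
```

The forecast lags behind the upward trend in the sample data, a reminder that a model like this assumes the future will resemble a smoothed version of the past.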
The Field Summary Tool analyzes data and creates a summary report containing descriptive statistics of data in selected columns. It’s a great tool to use when you want to make sure your data is structured correctly before running any further analysis, most notably with the suite of models that can be generated with the Predictive Tools.