Welcome to Part 2 of the Pre-Predictive series! After a strong start but long hiatus, we will be resuming our tour of the Data Investigation Tools. This section will cover the Frequency Table, Contingency Table and Distribution Analysis Tools.
Welcome to Part 3 (out of 4) of the Pre-Predictive series. In this article series, we are introducing you to the very exciting world of data investigation. This section covers the Association Analysis Tool, the Pearson Correlation Tool, and the Spearman Correlation Tool!
Welcome to the closing chapter of our voyage through the Pre-Predictive series! This has been a four-part journey introducing you to the thrilling world of data investigation. This section covers the plotting tools included in the Data Investigation Toolbox.
The humble histogram is something many people are first exposed to in grade school. Histograms are a type of bar graph that display the distribution of continuous numerical data. Histograms are sometimes confused with bar charts, which are plots of categorical variables.
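To make the binning idea concrete, here is a minimal sketch in plain Python that counts continuous values into equal-width bins, which is what a histogram draws as bar heights. The function name and the sample values are made up for illustration and are not taken from any Alteryx tool.

```python
def histogram_counts(values, bin_count):
    """Count how many values fall into each of bin_count equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be identical")
    width = (hi - lo) / bin_count
    counts = [0] * bin_count
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        idx = min(int((v - lo) / width), bin_count - 1)
        counts[idx] += 1
    return counts


# Illustrative (made-up) measurements.
measurements = [1, 2, 2, 3, 3, 3, 4, 4, 5, 10]
for i, count in enumerate(histogram_counts(measurements, 3)):
    print(f"bin {i}: {'#' * count}")
```

A bar chart, by contrast, would count occurrences of each category label rather than slicing a numeric range into intervals.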
Typically the first step of Cluster Analysis in Alteryx Designer, the K-Centroids Diagnostics Tool helps you determine an appropriate number of clusters to specify for a clustering solution in the K-Centroids Cluster Analysis Tool, given your data and specified clustering algorithm. Cluster analysis is an unsupervised learning technique, which means that there are no provided labels or targets for the algorithm to base its solution on. In some cases, you may know how many groups your data ought to be split into, but when this is not the case, you can use this tool to estimate the number of clusters your data most naturally divides into.
Clustering analysis has a wide variety of use cases, including harnessing spatial data for grouping stores by location, performing customer segmentation, or even detecting insurance fraud. Clustering analysis groups individual observations so that each group (cluster) contains data that are more similar to one another than to the data in other groups. Included with the Predictive Tools installation, the K-Centroids Cluster Analysis Tool allows you to perform cluster analysis on a data set with the option of using three different algorithms: K-Means, K-Medians, and Neural Gas. In this Tool Mastery, we will go through the configuration and outputs of the tool.
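As a rough illustration of the K-Means idea only, the sketch below clusters one-dimensional values by repeatedly assigning each point to its nearest centroid and then moving each centroid to the mean of its points. It is a toy written for this page, not the tool's implementation; the function name, the starting centroids, and the data are all assumptions.

```python
def kmeans_1d(points, initial_centroids, iterations=10):
    """Toy 1-D K-Means: alternate assignment and centroid-update steps."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assignment step: pick the index of the closest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: an empty cluster keeps its previous centroid.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


# Made-up values that form two obvious groups.
print(kmeans_1d([1, 2, 3, 10, 11, 12], initial_centroids=[1, 12]))
```

Swapping the mean for the median in the update step would give a K-Medians variant of the same loop.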
Time series forecasting is using a model to predict future values based on previously observed values. In a time series forecast, the prediction is based on history and we are assuming the future will resemble the past. We project current trends using existing data.
The Association Analysis Tool allows you to choose any numerical fields and assesses the level of correlation between those fields. You can use the Pearson product-moment correlation, the Spearman rank-order correlation, or Hoeffding's D statistic to perform your analysis. You also have the option of doing an in-depth analysis of your target variable in relation to the other numerical fields. After you’ve run through the tool, you will have two outputs:
The Field Summary Tool analyzes data and creates a summary report containing descriptive statistics of data in selected columns. It’s a great tool to use when you want to make sure your data is structured correctly before running any further analysis, most notably with the suite of models that can be generated with the Predictive Tools.
This article is part of the Tool Mastery Series, a compilation of Knowledge Base contributions to introduce diverse working examples for Designer Tools. Here we’ll delve into uses of the Pearson Correlation Tool on our way to mastering the Alteryx Designer.
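For readers who want to see what a Pearson coefficient measures, here is a minimal sketch in plain Python. It computes the Pearson product-moment correlation directly from its definition, and a Spearman rank-order correlation by ranking the values first (ties get their average rank). The helper names and data are illustrative assumptions, not part of Designer.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)  # undefined if either field is constant


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up data: y grows with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print("Pearson: ", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```

Because the relationship is monotonic but curved, the rank-based measure reports a perfect association while the Pearson value falls slightly short of 1.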
Far more than just a window to your data, the Browse Tool has a catalog of features to best view, investigate, and copy/save data at any checkpoint you place it. That introspection to your data anywhere in your blending gives valuable feedback that often speeds workflow development and makes it easier to learn tools by readily visualizing their transforms. Be equipped, and browse through the catalog of useful applications below!
Inside the Laboratory tool set you'll find the Basic Data Profile Tool. This tool is similar to the Field Summary Tool in that it provides information about each field within your data, such as length, type, source, shortest and longest values, and more. It differs from the Field Summary Tool, however, when you get to the missing data details. The Field Summary Tool gives you a single value for Percent Missing, but makes no distinction between Null and Empty values within that percentage. The Basic Data Profile Tool gives you a count of records that have Null values, and a count of records that are blank.
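The difference between a single missing percentage and separate Null and blank counts can be sketched in a few lines of Python. This is only an analogy: `None` stands in for Null, the empty string stands in for a blank value, and the function and key names are made up for the example.

```python
def profile_missing(values):
    """Report Null and blank counts separately, plus a combined percentage."""
    nulls = sum(1 for v in values if v is None)
    blanks = sum(1 for v in values if v == "")
    return {
        "null_count": nulls,
        "blank_count": blanks,
        # A single combined figure hides which kind of gap dominates.
        "percent_missing": 100 * (nulls + blanks) / len(values),
    }


# Made-up column with two Nulls and one blank out of five records.
print(profile_missing(["a", None, "", "b", None]))
```

Knowing the split matters in practice, because a Null often signals that a value was never captured, while a blank may be the result of trimming or a failed parse.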
A common task that analysts run into (and a good practice when analyzing data) is determining whether the means of two sampled groups are significantly different. When this question arises, the Test of Means Tool is right for you! To demonstrate how to configure this tool and how to interpret the results, a workflow has been attached. The attached workflow (v. 11.7) compares the amount of money that customers spent across different regions in the US. The Dollars_Spent field identifies the amount of money an individual spent and the Region field identifies the region that the individual resides in (NORTH, SOUTH, EAST, WEST).
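The excerpt does not say which statistical test the tool runs, so the sketch below uses Welch's t statistic, one common way to compare two group means, purely as an illustration. The spending figures are invented and do not come from the attached workflow; only the NORTH and SOUTH region labels are borrowed from the description above.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    var_term_a = variance(group_a) / len(group_a)
    var_term_b = variance(group_b) / len(group_b)
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(var_term_a + var_term_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (var_term_a + var_term_b) ** 2 / (
        var_term_a ** 2 / (len(group_a) - 1)
        + var_term_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, dof


# Invented Dollars_Spent samples for two regions.
north = [1, 2, 3, 4, 5]
south = [2, 3, 4, 5, 6]
t_stat, dof = welch_t(north, south)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof:.1f}")
```

A t statistic far from zero, judged against the t distribution with the computed degrees of freedom, is what indicates that the difference in means is unlikely to be due to sampling alone.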
Characters that are not on a standard English keyboard may need translation into Unicode or a language-specific code page for Designer and database drivers to read them correctly.
Characters with incorrect encoding will often appear as boxes or question marks in the Designer Results screen and error messages.
Non-ASCII Unicode characters generally take more bytes to store than English ASCII characters, so you may need to change the column type and increase the column size. In Designer, the column size is the number of characters, not the number of bytes.
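The gap between character count and byte count is easy to see in Python, shown here as a small sketch that assumes UTF-8 as the target encoding. The helper name is made up, and a database or driver using a different encoding will produce different byte counts.

```python
def char_and_byte_lengths(text, encoding="utf-8"):
    """Return (number of characters, number of bytes once encoded)."""
    return len(text), len(text.encode(encoding))


samples = [
    "cafe",          # plain ASCII
    "caf\u00e9",     # "café" with a precomposed e-acute
    "\u6771\u4eac",  # "Tokyo" written in two kanji
]
for text in samples:
    chars, size = char_and_byte_lengths(text)
    print(f"{text!r}: {chars} characters, {size} bytes in UTF-8")
```

A column sized in characters can therefore hold a value that overflows a downstream column sized in bytes, which is one reason the sizes sometimes need to be increased.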
This article was put together to resolve a common data cleansing issue and to show tools and techniques that newer users do not normally use. The goal of the article is to get newer users into these tools, open up their creativity, and hopefully take them to the next level!
In this use case, the data in the attached workflow is messy with capitalized strings all over the place. We want to format the data by removing some of the capitalization, but not all of it.
Note: If we wanted to capitalize the first letter of every word, we could use the Formula Tool and the TitleCase(String) function. This would turn BEAR the WEIGHT into Bear The Weight. See the difference?
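For comparison outside Designer, Python's built-in `str.title()` behaves in a similar way on this example. This is only an analogy to the TitleCase(String) function mentioned in the note, not the Formula Tool itself, and the two may differ on edge cases such as apostrophes.

```python
def to_title_case(text):
    """Capitalize the first letter of each word and lowercase the rest."""
    return text.title()


print(to_title_case("BEAR the WEIGHT"))  # prints "Bear The Weight"
```

Title-casing everything is exactly what this use case wants to avoid, since only some of the capitalization should be removed.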
The tools that we will be using in this exercise are the Record ID, Text to Columns, RegEx, Formula, Tile, and Cross Tab Tools.
The exercise will show you the importance of using the Record ID Tool, the flexibility of the Text to Columns and RegEx Tools, the under-used Tile Tool, the creativity of the Formula Tool, and the not-so-scary Cross Tab Tool when the data is configured properly.
We hope that this exercise and use case open your mind to the greatness of Alteryx!
See attached workflow and enjoy!
The Contingency Table Tool is part of the Data Investigation category in Alteryx Designer, which comes with the Predictive Tools installation. As the name suggests, you can use the Contingency Table Tool to create a contingency table.