
Alteryx Designer Knowledge Base

Definitive answers from Designer experts.

Pre-Predictive: Using the Data Investigation Tools - Part 1 of 4

Community Content Engineer

You want to impress your managers, so you decide to try some predictions on your data – forecasting, scoring potential marketing campaigns, finding new customers…  That's great! Welcome to the addictive world of predictive analytics.  We have the perfect platform for you to start exploring your data.

 

I know you want to dive right in and start testing models.  It's tempting to just pull some data and start trying out tools, but the first and fundamentally most important part of all statistical analysis is the data investigation.

 

Your models won't mean much unless you understand your data.  Here's where the Data Investigation Tools come in!  You can get a statistical breakdown of each of your variables, both string and numeric, check for outliers (categorical and continuous), test correlations to slim down your predictors, and visualize the frequency and dispersion within each of your variables.

 

Part 1 of this article gives you an overview of the Field Summary Tool (never leave home without it!).  Part 2 will touch on the Contingency and Frequency Tables, and Distribution Analysis; Part 3 will cover the Association Analysis Tool, and the Pearson and Spearman Correlations; and Part 4 will cover all the cool plotting tools.

 

[Image: Field Summary.jpg]

 

Always, every day, literally every time you acquire a new data set, you will start with the Field Summary Tool.  I cannot emphasize this enough, and I promise it will save you headaches.

 

The tool has three outputs: a data table containing your fields and their descriptive statistics, a static report, and an interactive visualization dashboard that provides a visual profile of your variables.  In the interactive output, you can select subsets to view, sort each of the panels, and view and zoom in on specific values; it even includes a visual indicator of data quality.

 

You'll get a nifty report with plots and descriptive statistics for each of your variables.  Likely the most important part of this report is '% Missing' – ideally, you want 0.0% missing.  If you are missing values, don't fret.  You can remove these records or impute those values (another reason knowing your data is so important).
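
If it helps to see what '% Missing' and the two fixes mean outside the tool, here is a minimal Python sketch.  It is not how the Field Summary Tool is implemented; the sample records, field names, and helper functions are all made up for illustration, and None stands in for a missing value.

```python
from statistics import median

# Hypothetical sample records; None marks a missing value.
records = [
    {"age": 34, "region": "West"},
    {"age": None, "region": "East"},
    {"age": 51, "region": None},
    {"age": 29, "region": "West"},
]


def percent_missing(rows, field):
    """Return the share of rows (0-100) where `field` is None or absent."""
    missing = sum(1 for row in rows if row.get(field) is None)
    return 100.0 * missing / len(rows)


def drop_missing(rows, field):
    """Option 1: remove the records that lack a value for `field`."""
    return [row for row in rows if row.get(field) is not None]


def impute_median(rows, field):
    """Option 2: fill missing numeric values with the field's median."""
    fill = median(row[field] for row in rows if row.get(field) is not None)
    return [
        {**row, field: fill if row.get(field) is None else row[field]}
        for row in rows
    ]


for name in ("age", "region"):
    print(f"{name}: {percent_missing(records, name):.1f}% missing")
```

Whether you drop or impute depends on your data: dropping shrinks the sample, while a median fill keeps every record but flattens the variation in that field.  The sketch assumes at least one record and, for the median fill, a numeric field with at least one known value.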

 

Also check 'Unique Values' – if one of your variables has only a single unique value, that variable won't add anything useful to your model, so consider deselecting it.
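
The same check is easy to picture in a few lines of Python.  This is only a sketch under assumptions: the rows and field names are invented, missing values are represented as None and ignored, and the function name is hypothetical rather than anything the tool exposes.

```python
def constant_fields(rows):
    """Return names of fields with at most one distinct non-missing value."""
    fields = {name for row in rows for name in row}
    result = []
    for name in sorted(fields):
        distinct = {row.get(name) for row in rows if row.get(name) is not None}
        if len(distinct) <= 1:
            result.append(name)
    return result


# Hypothetical data: every record has the same country.
rows = [
    {"country": "US", "spend": 120.0},
    {"country": "US", "spend": 87.5},
    {"country": "US", "spend": 240.0},
]

print(constant_fields(rows))  # ['country']
```

A field like country here carries no information that separates one record from another, which is why it is a candidate for deselection.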

 

The Remarks field is also very useful: it will suggest field-type changes for fields with a small number of unique values (perhaps a numeric field should really be a string field).  And if some values of a field occur only a few times, you may consider combining those value levels together.
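
To make those two kinds of remarks concrete, here is a rough Python sketch.  The thresholds (max_unique, min_count), the "Other" label, and both function names are assumptions chosen for illustration; they are not the rules the Field Summary Tool actually applies.

```python
from collections import Counter


def suggest_string_type(values, max_unique=5):
    """Flag a numeric field whose few distinct values hint at categories."""
    present = [v for v in values if v is not None]
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    return bool(present) and numeric and len(set(present)) <= max_unique


def combine_rare_levels(values, min_count=2, other="Other"):
    """Replace levels that occur fewer than `min_count` times with `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Hypothetical fields: a numeric store code and a color with rare levels.
store_type = [1, 2, 1, 3, 2, 1]
colors = ["red", "red", "blue", "blue", "teal", "plum"]

print(suggest_string_type(store_type))  # True
print(combine_rare_levels(colors))
```

In practice the right cutoffs depend on your record count and use case, so treat the defaults above as placeholders to tune.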

 

The better YOU know your data, the more efficient and accurate your models will be.  Only you know your data, your use case, and how your results are going to be applied.  But we're here to help you get as familiar as you can with whatever data you have.

 

Stay tuned for subsequent articles – these tools will be your new best friends.  Happy Alteryx-ing!

Comments
Atom

Excuse me in advance if this is well-known!

 

Can the output of the data investigation tools be written to a PDF file?

 

Could a long PDF file of all of the investigations for every field be generated?

 

I found a recommendation to use the Render function to render R tables.  What about flattening the graphical output (HTML)?

Alteryx

@akrinsky - you can use the Render tool to create PDF and HTML!

Meteoroid

Very helpful. Thank you! It prompted me to create an infographic 🙂

[Image: 02.Data Investigation Infographics.001.jpeg]