Data Science

Machine learning & data science for beginners and experts alike.
bingqian_gao
7 - Meteor

Are you new to data science? Are you transitioning from a business analyst to a citizen data scientist? Or are you a seasoned data scientist with a relevant degree from university?

Regardless of when you started out on the journey of machine learning, have you ever felt lost when facing the long list of models, not knowing which one you should choose for your problem? Or maybe you are familiar with Logistic Regression and Linear Regression, but have always wondered what those other algorithms can be used for?

 

I would like to share with you the Alteryx Predictive Flowcharts that our Data Science practice created here at TrueCue.

 

The Predictive Flowcharts visualise some common considerations that analysts face when choosing a predictive algorithm. They offer guidance on which model to use by prompting you to consider, for instance, what kind of data you are trying to predict, what volume of data you have, and how important it is for the model to be interpretable.

 

 

Click image for an interactive version!
The flowcharts are designed to accompany TrueCue’s Predictive Analytics Alteryx training for novice data analysts and provide a starting point for learning the Predictive Analytics toolbox in Alteryx. A trained Data Scientist will spot some simplifications and generalisations.

 

The Data Investigation tool category includes tools for understanding the data to be used in a predictive analytics project, and tools for conducting specialised data sampling tasks for predictive analytics. Understanding what your data looks like is the first step of designing a machine learning solution.
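For readers who also work outside Alteryx, the kind of summary a data investigation step produces can be sketched in a few lines of plain Python. This is only an illustration and not how the Alteryx tools are implemented; the `summarise` helper and the `loan_amounts` column are hypothetical.

```python
import statistics

def summarise(values):
    """Return basic descriptive statistics for one numeric column.

    None marks a missing value: it is counted, then excluded from the statistics.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present) if len(present) > 1 else 0.0,
    }

# Hypothetical column (assumption): loan amounts with one missing value.
loan_amounts = [1200, 1500, None, 900, 3000, 1500]
summary = summarise(loan_amounts)
print(summary)
```

A quick look at the missing count, the range and the gap between mean and median is often enough to spot columns that need cleaning before any model is trained.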


Click the image for an interactive version!

 

 

Model selection plays a crucial role in a predictive project. When we get our data, we typically start with some basic descriptive analysis to investigate and understand the data we are dealing with. Then based on the predictive goal, we determine if we have a Classification problem (where we want to classify data into groups or categories, e.g. predicting if a loan applicant will default), or a Regression problem (where we want to predict numbers, e.g. predicting how many software licenses we are going to sell next quarter).
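The Classification versus Regression decision can be sketched as a small check on the target column. This is a deliberately simplified heuristic under stated assumptions: the `suggest_problem_type` helper is hypothetical, and it treats text or boolean labels as categories and numbers as quantities. Categories stored as numeric codes (for example 0/1 for default) would be misread by it, so the analyst still makes the final call.

```python
def suggest_problem_type(target_values):
    """Guess the problem type from the values of the target column.

    Simplifying assumption: numeric targets are quantities to predict
    (Regression); anything else is a label (Classification).
    """
    def is_number(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    if all(is_number(v) for v in target_values):
        return "Regression"
    return "Classification"

# Hypothetical targets (assumptions) mirroring the two examples in the text.
loan_outcomes = ["default", "repaid", "repaid", "default"]
licences_sold = [120, 135, 98, 143]

print(suggest_problem_type(loan_outcomes))   # Classification
print(suggest_problem_type(licences_sold))   # Regression
```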

 

After deciding whether we have a Classification or Regression problem, we can move on to model selection. You will see in the flowcharts that some models can be used for both Classification and Regression, while some can only be used for one of the two. Several models may be suitable in a given situation, and you may not know in advance which one will perform better.


Click image for an interactive version!

Click image for an interactive version!

This is why we have a Validation process, where we split the data, train the selected models and validate their performance with a “hold-out” dataset (which was hidden from all the models during the training stage) so that we can compare the model performance. Sometimes we split the data into multiple folds so that we can test the performance multiple times and get a more robust estimate; this is called cross-validation.
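The validation process described above can be sketched in plain Python. This is a minimal illustration, not the Alteryx implementation: the two candidate models (a mean baseline and a least-squares line), the synthetic data and all function names are assumptions chosen to keep the example self-contained.

```python
import random

def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def rmse(model, xs, ys):
    """Root mean squared error of a fitted model on the given records."""
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)) ** 0.5

def holdout_score(fit, xs, ys, test_fraction=0.25):
    """Train on the first part of the data, score on the hidden remainder."""
    cut = int(len(xs) * (1 - test_fraction))
    model = fit(xs[:cut], ys[:cut])
    return rmse(model, xs[cut:], ys[cut:])

def cross_val_score(fit, xs, ys, k=5):
    """Average RMSE over k folds; each fold serves as the hold-out set once."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(xs), k))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        test_x = [x for i, x in enumerate(xs) if i in test_idx]
        test_y = [y for i, y in enumerate(ys) if i in test_idx]
        scores.append(rmse(fit(train_x, train_y), test_x, test_y))
    return sum(scores) / k

# Synthetic data (assumption): a noisy linear relationship, already in random order.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

candidates = {"mean baseline": fit_mean, "least-squares line": fit_line}
holdout = {name: holdout_score(fit, xs, ys) for name, fit in candidates.items()}
cv = {name: cross_val_score(fit, xs, ys) for name, fit in candidates.items()}
winner = min(cv, key=cv.get)  # lowest average error wins
print(holdout, cv, winner)
```

The hold-out score comes from a single split, while the cross-validated score averages five, so it is less sensitive to which records happen to land in the test set.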

Once we have a winning model, we can then use this model to create predictions; this is called Inference (or scoring).
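Scoring can be illustrated in the same spirit: fit once on historical records, then apply the fitted model to records it has never seen. The sketch below is an assumption-laden toy example rather than the Alteryx Score tool; `fit_line`, `score` and the spend/licence figures are all hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by least squares; return (intercept, slope)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def score(model, new_xs):
    """Apply an already fitted model to new records (no retraining)."""
    intercept, slope = model
    return [intercept + slope * x for x in new_xs]

# Hypothetical history (assumption): marketing spend vs. licences sold.
spend = [1, 2, 3, 4, 5]
sold = [12, 14, 16, 18, 20]

model = fit_line(spend, sold)        # training happens once
predictions = score(model, [6, 7])   # inference on unseen spend levels
print(predictions)
```

Keeping fitting and scoring as separate steps mirrors the workflow in the text: the model is chosen during validation and only afterwards used to produce predictions.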

Click image for an interactive version!

 

 

If you find the charts useful or would like to share your thoughts or comments, please drop us a line or reply to this blog. We would love to hear from you!

 

The flowcharts were created by Katelyn Weber (analytics) and Jakub Szepietowski (design).

 

 

Bingqian Gao

Data Science Lead

Bingqian believes in the power of Analytics and Data Science in uncovering insights and helping to better inform decision making. As a Senior Consultant and Data Science Lead at TrueCue, she enjoys finding solutions for challenges in data consolidation, modelling, visualisation and Advanced Analytics. She leverages modern technology such as Alteryx, Tableau, DataRobot, and Microsoft Azure Machine Learning, and is one of the 17 Certified Alteryx Experts in the world. Outside of work, she enjoys a wide range of activities, from oil painting, poetry reading, scuba diving, to boxing and krav maga. Find @bingqian_gao on LinkedIn, or reach out via email.

Comments
TedW
Alteryx Alumni (Retired)

This is a great article, thank you for sharing.

Kartar_Singh
6 - Meteoroid

One of the best articles and a great way to explain which tools can be used for predictive analytics.

Tatiana2
5 - Atom

Hi 

I have a question, how can I import all these tools to my tool bar?

 

mceleavey
17 - Castor

@Tatiana2 ,

 

You need to check that you have downloaded the R Tools from the Alteryx download site.

Also, ensure you have selected them to be shown on your toolbar by clicking the + sign at the top right of Alteryx which says "Add/Remove tools" when you hover over it, and making sure you have checked the predictive tools sections:

 

mceleavey_0-1616779264097.png

 

M.

Tatiana2
5 - Atom

Hi 

Sorry if I ask st... questions, I am new and I want to use this software to analyze data for one project.

If you look here, I have only a few tools on the right. How can I add more? I need the logistic regression...

Thank you very much for support

Tatiana2_0-1616781791382.png

 

mceleavey
17 - Castor

Yes, it appears you have not downloaded the R predictive tools from the Alteryx download site.

You need to download and install them first.

 

M.

 

Tatiana2
5 - Atom

Sorry mceleavey, what's the right link to download? I think I downloaded something else 🤔

Thank you very much

Joe_Lipski
13 - Pulsar

@Tatiana2 you will need to log into: http://licenses.alteryx.com/ with your community credentials > Click on Alteryx Designer > Click on the correct version of Alteryx that you have installed > And they should show up there: Make sure you choose admin or non-admin (to match the designer install):

 

joe_lipski_0-1617014740527.png

 

DawnDuong
13 - Pulsar

This is a great article, very useful flowcharts. Thanks for sharing!

sireeshagandam
8 - Asteroid

good