Alteryx Machine Learning Discussions

I don't understand this Kaggle Titanic example

nyck3333
8 - Asteroid

It's blogged about here, and the post comes with the solution workflows: https://community.alteryx.com/t5/Data-Science/Tackling-the-Titanic-Kaggle-Competition-with-Assisted-...

But when I open the solved classification workflow and follow along with the part about filling null values in the fare column with the median, the solution workflow uses a formula with what looks like an arbitrary hardcoded value.

[Screenshot: nyck3333_0-1684137528354.png]

Can somebody please show me, step by step, everything in the bottom-left part of that workflow, where the null values in test.csv's fare column are filled with the median?

When I open the Select and Summarize tools, the default setting is average. I tried changing it to median, but the numbers in the fare column don't change at all to reflect that: null stays null.

In Python Pandas this would be trivial.

2 REPLIES
acarter881
12 - Quasar

Hello, @nyck3333.

I'm not sure why the example is so verbose and hardcoded.

This can be accomplished using the Imputation tool, which can replace the null values in a selected field with that field's median.

Please see the attached PNG file and workflow.

Eden60
7 - Meteor

I checked your URL: https://community.alteryx.com/t5/Data-Science/Tackling-the-Titanic-Kaggle-Competition-with-Assisted-...

This URL is really helpful for me. Thanks for sharing it.

You can follow these steps: 

Step 1: Import the CSV file

Step 2: Locate the "fare" column

Step 3: Calculate the median of the non-null fare values

Step 4: Replace each null fare with that median

Step 5: Save or apply changes
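
For comparison, here is a minimal sketch of those five steps in plain Python, using only the standard library. It is not the blog's workflow. The inline sample data is made up and only stands in for test.csv, the function name fill_fare_with_median is my own, and I'm assuming nulls appear as empty strings in a column named Fare.

```python
import csv
import io
from statistics import median

# Hypothetical stand-in for test.csv: invented rows, one of them with a missing Fare.
SAMPLE_CSV = """PassengerId,Pclass,Fare
1,3,7.25
2,1,71.5
3,3,
4,2,13.0
5,3,8.0
"""


def fill_fare_with_median(rows, column="Fare"):
    """Return (filled_rows, median), replacing empty values in `column` with the median.

    Assumes nulls are represented as empty strings, as they are in a CSV file.
    """
    # Step 3: compute the median from the non-null values only.
    known = [float(r[column]) for r in rows if r[column].strip() != ""]
    column_median = median(known)
    # Step 4: build new rows, swapping each empty value for the median.
    filled = [
        {**r, column: r[column] if r[column].strip() != "" else str(column_median)}
        for r in rows
    ]
    return filled, column_median


# Step 1: import the CSV (from a string here; use open("test.csv") for a real file).
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Steps 2-4: locate the Fare column, compute its median, fill the gaps.
filled_rows, fare_median = fill_fare_with_median(rows, column="Fare")

# Step 5: write the result back out as CSV.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(filled_rows)

print("median fare:", fare_median)
print(out.getvalue())
```

In pandas the same idea is usually written in one line: df["Fare"] = df["Fare"].fillna(df["Fare"].median()).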

 

I hope this will help you.