During a visit with a customer, I came across a feature normalization request from a data science team in the audience.
To be honest, this was the first time I had tried data normalization in Alteryx, and it turns out there are multiple super fast and nifty ways to do it.
The fact that you can find three different solutions on the Alteryx Community in practically no time, plus write your own tool in Python if you feel like getting your hands dirty with a bit of code, just shows how powerful our platform is!
So what is data normalization?
For those new to the concept of data normalization, let me include the greatest explanation of the concept I have read. Ever. Credit to my Fight Club colleague @FadiB:
"Data normalization is a way to bring the data to the same scale so that scale-sensitive models like regressions create better models (not give more weight to larger scaled data). Random Forests and Boosted models are less sensitive to this issue. To oversimplify - normalization is a feature transformation where you’re taking numeric columns and bringing all the values between -1 and 1 or some other standard or normalized scale."
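To make the two standard scales concrete, here is a quick illustration in plain Python with NumPy (not one of the Alteryx macros): z-score standardization subtracts the mean and divides by the standard deviation, while unit-interval (min-max) scaling maps the smallest value to 0 and the largest to 1. The sample values are made up for demonstration.

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# z-score: result has mean 0 and standard deviation 1
z_scores = (values - values.mean()) / values.std()

# unit interval (min-max): smallest value -> 0, largest -> 1
unit = (values - values.min()) / (values.max() - values.min())

print(z_scores)  # centered around 0
print(unit)      # [0.   0.25 0.5  0.75 1.  ]
```

Either way, the columns end up on a comparable scale, which is exactly what scale-sensitive models like regressions need.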
How to do that in Alteryx then?
• Using two macros readily available as part of the Designer predictive toolset (z-score and unit-interval standardization). By default, you can find these in
• Using the feature normalization macro from the Gallery (note: this one will also normalize new datasets based on the parameters from your initial model-building normalization metrics, so you can apply an old model to new data)
• And, as part of my recent Python marathon, putting together a tool using the scikit-learn package and its MinMaxScaler (I think Normalizer could be used as well) to demonstrate yet another way to use the Python Code tool
Note: All of these approaches, including the macros and sample data, are packaged up and attached to this post.
The code used in the Python Code tool:
# Transforms features by scaling each feature to a given range.
# This estimator scales and translates each feature individually so that it
# falls within the given range on the training set, i.e. between zero and one.
from ayx import Alteryx
from sklearn import preprocessing

df = Alteryx.read("#1")  # read the incoming data stream into a pandas DataFrame
scaler = preprocessing.MinMaxScaler()
df[df.columns] = scaler.fit_transform(df[df.columns])
Alteryx.write(df, 1)  # send the normalized data to output anchor 1
Hopefully, this gives you a good idea of how to approach data normalization challenges.
Feeling challenged to try our Python Code tool and expand on its use? Try adding the ability to normalize new datasets based on the parameters from your initial model-building normalization metrics, so that you can use an old model on new data. I would be super happy to see this in the replies.
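As a hint for that challenge, the key idea in scikit-learn is to call fit_transform only on the training data, then reuse the fitted scaler's transform on any new data so the same min/max parameters are applied. A minimal sketch (the column name "x" and the sample values are made up for illustration):

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical training and new datasets (column "x" is illustrative)
train = pd.DataFrame({"x": [10.0, 20.0, 30.0, 40.0, 50.0]})
new = pd.DataFrame({"x": [15.0, 60.0]})

scaler = preprocessing.MinMaxScaler()
# Learn min/max from the training data and scale it
train[train.columns] = scaler.fit_transform(train[train.columns])
# Reuse those same parameters on the new data (no refitting)
new[new.columns] = scaler.transform(new[new.columns])

print(new)  # 15 -> 0.125; 60 lies outside the training range, so it maps above 1.0
```

In an Alteryx workflow, you would persist the fitted scaler (for example with pickle) between the model-building run and the scoring run.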