During a visit with a customer, I came across a feature normalization request from a data science team in the audience.
To be honest, this was the first time I had tried data normalization in Alteryx, and it turns out there are multiple super fast and nifty ways to do it.
The fact that you can find three different solutions on the Alteryx Community in practically no time, plus write your own tool in Python if you feel like getting your hands dirty with a bit of code, just shows how powerful our platform is!
So what is data normalization?
For those new to the concept of data normalization, let me include the greatest explanation of the concept I have read. Ever. Credit to my Fight Club colleague @FadiB:
"Data normalization is a way to bring the data to the same scale so that scale-sensitive models like regressions create better models (not give more weight to larger scaled data). Random Forests and Boosted models are less sensitive to this issue. To oversimplify - normalization is a feature transformation where you’re taking numeric columns and bringing all the values between -1 and 1 or some other standard or normalized scale."
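To make the two standard scales concrete, here is a quick illustration in plain Python with NumPy (not one of the Alteryx macros): z-score standardization subtracts the mean and divides by the standard deviation, while unit-interval (min-max) scaling maps the smallest value to 0 and the largest to 1. The sample values are made up for demonstration.

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# z-score: result has mean 0 and standard deviation 1
z_scores = (values - values.mean()) / values.std()

# unit interval (min-max): smallest value -> 0, largest -> 1
unit = (values - values.min()) / (values.max() - values.min())

print(z_scores)  # centered around 0
print(unit)      # [0.   0.25 0.5  0.75 1.  ]
```

Either way, the columns end up on a comparable scale, which is exactly what scale-sensitive models like regressions need.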
How to do that in Alteryx then?
• Using two macros readily available as part of the Designer predictive toolset (z-score and unit-interval standardization). By default, you can find these in
• Using the feature normalization macro from the Gallery (note: this one will also normalize new datasets based on the parameters from your initial model-building normalization metrics, so you can apply an old model to new data)
• And, as part of my recent Python marathon, putting together a tool using the scikit-learn package and its MinMaxScaler (I think Normalizer could be used as well) to demonstrate yet another way to use the Python Code tool
Note: All of these approaches, including the macros and sample data, are packaged up and attached to this post.
The code used in the Python Code tool:
# Transforms features by scaling each feature to a given range.
# This estimator scales and translates each feature individually so that it
# falls within the given range on the training set, i.e. between zero and one.
from ayx import Alteryx
from sklearn import preprocessing

df = Alteryx.read("#1")  # read the incoming data stream into a pandas DataFrame
scaler = preprocessing.MinMaxScaler()
df[df.columns] = scaler.fit_transform(df[df.columns])
Alteryx.write(df, 1)  # send the normalized data to output anchor 1
Hopefully, this gives you a good idea of how to approach data normalization challenges.
Feeling challenged to try our Python Code tool and expand on its use? Try adding the ability to normalize new datasets based on the parameters from your initial model-building normalization metrics, so that you can use an old model on new data. I would be super happy to see this in the replies.
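As a hint for that challenge, the key idea in scikit-learn is to call fit_transform only on the training data, then reuse the fitted scaler's transform on any new data so the same min/max parameters are applied. A minimal sketch (the column name "x" and the sample values are made up for illustration):

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical training and new datasets (column "x" is illustrative)
train = pd.DataFrame({"x": [10.0, 20.0, 30.0, 40.0, 50.0]})
new = pd.DataFrame({"x": [15.0, 60.0]})

scaler = preprocessing.MinMaxScaler()
# Learn min/max from the training data and scale it
train[train.columns] = scaler.fit_transform(train[train.columns])
# Reuse those same parameters on the new data (no refitting)
new[new.columns] = scaler.transform(new[new.columns])

print(new)  # 15 -> 0.125; 60 lies outside the training range, so it maps above 1.0
```

In an Alteryx workflow, you would persist the fitted scaler (for example with pickle) between the model-building run and the scoring run.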