Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Imputing missing values by a random sample of the available values

rmelchiotti
7 - Meteor

Hi all,

is there a way to fill missing values in a column with a random choice from available values from the same column (with probabilities matching the original probabilities)?

 

For example

OriginalImputed
AA
AA
AA
[Null]A
CC
[Null]A
BB
AA
BB
AA

 

In this case for example missing values would be more often substituted by A than C.

 

Any help would be greatly appreciated.

6 REPLIES 6
MarqueeCrew
20 - Arcturus
20 - Arcturus
I would use a summarize tool to calculate the mode (most frequently occurring value) and then use the append fields tool to attach that field to each record. The formula tool would then have an expression to update the field value if it was null.

IIF(IsNull([original]),[mode_original],[original])

Cheers,

Mark
Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
rmelchiotti
7 - Meteor

Thanks for your prompt reply.

Unfortunately in this case mode would not work. Because values are not missing at random using mode (same value for all missing) introduces problems in the modelling phase.

I need to introduce some randomness.

DavidP
17 - Castor
17 - Castor

This workflow would pick a random row from the dataset based on the probability you mentioned. The next step is to wrap it in a batch macro so that it dows it for each null value.

 

random row.png

 

 

MarqueeCrew
20 - Arcturus
20 - Arcturus

@rmelchiotti,

 

Now having returned to my office (not on my iPhone), I have read your challenge more carefully and am prepared to discuss a solution.  In the following picture/workflow I find the domain values that do exist and have created a random replacement.  Based upon the number of existing values found, a number is chosen between 1 and that number.  In your example, there are 8 non-null values.  When a NULL is encountered, it finds the random # value from a replacement table.

 

capture.jpg

Cheers,

 

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
rmelchiotti
7 - Meteor

This could work, thanks a lot

TimothyL
Alteryx Alumni (Retired)

Hi @rmelchiotti @MarqueeCrew @DavidP 

 

We have built some new missing value imputation macros here: https://community.alteryx.com/t5/Data-Science/Expand-Your-Predictive-Palette-IV-Imputation-Beyond-Me...

 

In this scenario, the MICE macro would be a great fit for random sample imputation, you could simply click the Random method under the configuration. Give it a try! 

 

TL

Labels
Top Solution Authors