Alteryx Designer Desktop Discussions

rmelchiotti · ‎02-20-2019

Hi all,

is there a way to fill missing values in a column with a random choice from available values from the same column (with probabilities matching the original probabilities)?

For example

Original	Imputed
A	A
A	A
A	A
[Null]	A
C	C
[Null]	A
B	B
A	A
B	B
A	A

In this case for example missing values would be more often substituted by A than C.

Any help would be greatly appreciated.

MarqueeCrew · ‎02-20-2019

I would use a summarize tool to calculate the mode (most frequently occurring value) and then use the append fields tool to attach that field to each record. The formula tool would then have an expression to update the field value if it was null.

IIF(IsNull([original]),[mode_original],[original])

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.

rmelchiotti · ‎02-20-2019

Thanks for your prompt reply.

Unfortunately in this case mode would not work. Because values are not missing at random using mode (same value for all missing) introduces problems in the modelling phase.

I need to introduce some randomness.

DavidP · ‎02-20-2019

This workflow would pick a random row from the dataset based on the probability you mentioned. The next step is to wrap it in a batch macro so that it dows it for each null value.

MarqueeCrew · ‎02-20-2019

@rmelchiotti,

Now having returned to my office (not on my iPhone), I have read your challenge more carefully and am prepared to discuss a solution. In the following picture/workflow I find the domain values that do exist and have created a random replacement. Based upon the number of existing values found, a number is chosen between 1 and that number. In your example, there are 8 non-null values. When a NULL is encountered, it finds the random # value from a replacement table.

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.

rmelchiotti · ‎02-20-2019

This could work, thanks a lot

TimothyL · ‎08-31-2020

Hi @rmelchiotti @MarqueeCrew @DavidP

We have built some new missing value imputation macros here: https://community.alteryx.com/t5/Data-Science/Expand-Your-Predictive-Palette-IV-Imputation-Beyond-Me...

In this scenario, the MICE macro would be a great fit for random sample imputation, you could simply click the Random method under the configuration. Give it a try!

TL

Alteryx Designer Desktop Discussions

Imputing missing values by a random sample of the available values

Re: Row creation

Re: How to select columns dynamically using number...

Re: Batch macro to read 1000+ .xlsx files with var...

Re: Issue when using Block Until Done and Power BI...

Example workflow for setting up a custom list to u...