Hi all,
Is there a way to fill missing values in a column with a random choice from the available values in the same column (with probabilities matching the original frequencies)?
For example:
Original | Imputed |
---|---|
A | A |
A | A |
A | A |
[Null] | A |
C | C |
[Null] | A |
B | B |
A | A |
B | B |
A | A |
In this case, for example, missing values would be substituted by A more often than by C.
Any help would be greatly appreciated.
Thanks for your prompt reply.
Unfortunately, the mode would not work in this case. Because the values are not missing at random, using the mode (the same value for every missing entry) introduces problems in the modelling phase.
I need to introduce some randomness.
Now that I am back in my office (and off my iPhone), I have read your challenge more carefully and can offer a solution. In the following picture/workflow, I collect the values that do exist into a replacement table. A random number is then chosen between 1 and the number of existing values found; in your example, there are 8 non-null values. When a NULL is encountered, it is replaced with the value at that random position in the replacement table.
Cheers,
Mark
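For readers outside Alteryx, the same idea can be sketched in plain Python. This is only an illustration, not the workflow above: the function name `impute_random` and the `seed` parameter are made up for the example, and it assumes the replacement table keeps duplicates, so that a uniform draw over its positions follows the observed frequencies.

```python
import random
from collections import Counter


def impute_random(values, seed=None):
    """Replace each None with a value drawn at random from the non-null entries.

    Hypothetical helper for illustration; not part of any Alteryx tool.
    """
    rng = random.Random(seed)
    # Replacement table: every observed value, duplicates kept (assumption),
    # so a uniform draw over positions reproduces the observed frequencies.
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no non-null values to sample from")
    # Non-null entries pass through unchanged; each None gets its own draw.
    return [v if v is not None else rng.choice(observed) for v in values]


original = ["A", "A", "A", None, "C", None, "B", "A", "B", "A"]
imputed = impute_random(original, seed=42)
print(imputed)
print(Counter(imputed))
```

With the example column, the table holds 5 A's, 2 B's and 1 C, so each missing value has a 5/8 chance of becoming A and a 1/8 chance of becoming C. Each NULL is drawn independently, which is what separates this from filling with the mode.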
This could work, thanks a lot
Hi @rmelchiotti @MarqueeCrew @DavidP
We have built some new missing value imputation macros here: https://community.alteryx.com/t5/Data-Science/Expand-Your-Predictive-Palette-IV-Imputation-Beyond-Me...
In this scenario, the MICE macro would be a great fit for random sample imputation: you can simply select the Random method in the configuration. Give it a try!
TL