Dear Community,
I would like to introduce some random values into a dataset for research purposes (i.e. to determine the impact of abnormal values on different algorithms robustness).
The idea is that I have a raw dataset with a few columns (Xs) and the predicted column (Y).
I would like to introduce abnormal values within the range of each column, like this:
Original dataset:
X1 | X2 | X3 | X4 | X5 | Y |
1 | 11 | 403 | 55 | 19 | 641 |
2 | 42 | 361 | 65 | 17 | 1110 |
6 | 6 | 206 | 71 | 17 | 350 |
2 | 46 | 267 | 20 | 17 | 1061 |
3 | 40 | 261 | 70 | 17 | 979 |
4 | 40 | 148 | 55 | 17 | 860 |
8 | 33 | 350 | 74 | 20 | 1055 |
4 | 29 | 484 | 59 | 17 | 1011 |
9 | 38 | 308 | 22 | 16 | 936 |
Modified dataset:
X1 | X2 | X3 | X4 | X5 | Y |
1 | 11 | 403 | 55 | 19 | 641 |
2 | 42 | 122 | 65 | 17 | 1110 |
6 | 6 | 206 | 71 | 20 | 350 |
10 | 46 | 267 | 64 | 17 | 1061 |
3 | 40 | 261 | 70 | 17 | 979 |
4 | 40 | 148 | 55 | 17 | 860 |
8 | 5 | 350 | 74 | 20 | 1055 |
4 | 29 | 298 | 59 | 17 | 1011 |
9 | 38 | 308 | 22 | 16 | 936 |
So far, I am calculating the min and max of each column, add them to the data set and create the updated “X” with a formula:
X1 | X1_Min | X1_Max | X2 | X2_Min | X2_Max | X3 | … | Y |
1 | 1 | 9 | 11 | 6 | 46 | 403 | … | 641 |
2 | 1 | 9 | 42 | 6 | 46 | 361 | … | 1110 |
6 | 1 | 9 | 6 | 6 | 46 | 206 | … | 350 |
2 | 1 | 9 | 46 | 6 | 46 | 267 | … | 1061 |
3 | 1 | 9 | 40 | 6 | 46 | 261 | … | 979 |
4 | 1 | 9 | 40 | 6 | 46 | 148 | … | 860 |
8 | 1 | 9 | 33 | 6 | 46 | 350 | … | 1055 |
4 | 1 | 9 | 29 | 6 | 46 | 484 | … | 1011 |
9 | 1 | 9 | 38 | 6 | 46 | 308 | … | 936 |
For a 1% introduction of abnormal values in X1, I proceed like this with a formula:
IF RandInt(100) = 100 THEN
[Min_X1]+[Max_X1]*Rand()
ELSE
[X1]
ENDIFSo it means that I have to do it for each column... (I have some datasets with more than 50 "X" columns...)
Is there a way to use the multi-field formula tool like this:
IF RandInt(100) = 100 THEN
[Next Column]+[Next+1 Column]*Rand()
ELSE
[_CurrentField_]
ENDIFI have attached the dummy dataset and workflow it this can help J
Thanks,
Pierre-Louis