Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Iterative Randomness within Columns Range

pierrelouisbescond
8 - Asteroid

Dear Community,

 

I would like to introduce some random values into a dataset for research purposes (i.e. to determine the impact of abnormal values on different algorithms robustness).

The idea is that I have a raw dataset with a few columns (Xs) and the predicted column (Y).

 

I would like to introduce abnormal values within the range of each column, like this:

 

Original dataset:

X1

X2

X3

X4

X5

Y

1

11

403

55

19

641

2

42

361

65

17

1110

6

6

206

71

17

350

2

46

267

20

17

1061

3

40

261

70

17

979

4

40

148

55

17

860

8

33

350

74

20

1055

4

29

484

59

17

1011

9

38

308

22

16

936

 

Modified dataset:

X1

X2

X3

X4

X5

Y

1

11

403

55

19

641

2

42

122

65

17

1110

6

6

206

71

20

350

10

46

267

64

17

1061

3

40

261

70

17

979

4

40

148

55

17

860

8

5

350

74

20

1055

4

29

298

59

17

1011

9

38

308

22

16

936

 

So far, I am calculating the min and max of each column, add them to the data set and create the updated “X” with a formula:

 

X1

X1_Min

X1_Max

X2

X2_Min

X2_Max

X3

Y

1

1

9

11

6

46

403

641

2

1

9

42

6

46

361

1110

6

1

9

6

6

46

206

350

2

1

9

46

6

46

267

1061

3

1

9

40

6

46

261

979

4

1

9

40

6

46

148

860

8

1

9

33

6

46

350

1055

4

1

9

29

6

46

484

1011

9

1

9

38

6

46

308

936

 

For a 1% introduction of abnormal values in X1, I proceed like this with a formula:

 

 

IF RandInt(100) = 100 THEN

    [Min_X1]+[Max_X1]*Rand()

ELSE

    [X1]

ENDIF

 

 

So it means that I have to do it for each column... (I have some datasets with more than 50 "X" columns...)

 

Is there a way to use the multi-field formula tool like this:

 

 

IF RandInt(100) = 100 THEN

    [Next Column]+[Next+1 Column]*Rand()

ELSE

    [_CurrentField_]

ENDIF

 

 

I have attached the dummy dataset and workflow it this can help J

Thanks,
Pierre-Louis

7 REPLIES 7
andyuttley
11 - Bolide
11 - Bolide

Hey @pierrelouisbescond 

 

There's loads of ways to do this. My first instinct was to take your current logic and wrap it up into a macro that runs through every column. To do this, you just need to transpose the data first, then shift it back at the end in a cross tab.

 

Transposing means we can refer to [Name] and [Value] as opposed to specific field names, making it dynamic. Probably could be done through a multi field too though - like i say, this would just be my default method. 

Random1.PNG

Rand2.PNG

 (note that i've switched to "IF RandInt(10) = 10" just for testing, you can switch that back... 

 

Example attached.

Hope this helps

 

Andy

pierrelouisbescond
8 - Asteroid

Thanks a lot @andyuttley !

 

I am happy to see that there was no "obvious" answer which I have missed... and understand that the "transpose" option is regularly an answer to the questions I am raising. I need to deep dive on these possible applications!

 

I am going to "mimic" your solution on real and large datasets 🙂

JoeS
Alteryx
Alteryx

Bit slow to reply as I got stuck on a call at work.

 

But I went down the Dynamic replace method, which skips the need to create a macro, and hopefully you agree its a pretty neat solution:

 

2019-06-27_10-58-48.jpg

 

It should be fully dynamic too🙂

pierrelouisbescond
8 - Asteroid

Hi @JoeS !

Wow, I had never touched the dynamic replace tool and this is a very powerful one!

I will do my best to transpose it to my real datasets... the hardest challenge will be to keep it in mind for future problems 🙂

JoeS
Alteryx
Alteryx

Indeed, it's definitely an under utilised tool!

andyuttley
11 - Bolide
11 - Bolide

@JoeS wrote:

Indeed, it's definitely an under utilised tool!


Yes, was great to see it in a relevant use case here; definitely one I want to get using 

JoeS
Alteryx
Alteryx

@andyuttley wrote:

 

Yes, was great to see it in a relevant use case here; definitely one I want to get using 

Yeah, a clue to remind me to consider it is if the data itself is needed to write the formula, then it may be the right way to go.

 

Like in this instance you needed the min and max values to write the formula then it works. 

 

I was about to write other reasons, then thought "I should check to see if there is a mastery of it" and lo and behold, its here

Labels