Alteryx Designer Desktop Discussions

pierrelouisbescond · ‎06-27-2019

Dear Community,

I would like to introduce some random values into a dataset for research purposes (i.e. to determine the impact of abnormal values on different algorithms robustness).

The idea is that I have a raw dataset with a few columns (Xs) and the predicted column (Y).

I would like to introduce abnormal values within the range of each column, like this:

Original dataset:

X1	X2	X3	X4	X5	Y
1	11	403	55	19	641
2	42	361	65	17	1110
6	6	206	71	17	350
2	46	267	20	17	1061
3	40	261	70	17	979
4	40	148	55	17	860
8	33	350	74	20	1055
4	29	484	59	17	1011
9	38	308	22	16	936

Modified dataset:

X1	X2	X3	X4	X5	Y
1	11	403	55	19	641
2	42	122	65	17	1110
6	6	206	71	20	350
10	46	267	64	17	1061
3	40	261	70	17	979
4	40	148	55	17	860
8	5	350	74	20	1055
4	29	298	59	17	1011
9	38	308	22	16	936

So far, I am calculating the min and max of each column, add them to the data set and create the updated “X” with a formula:

X1	X1_Min	X1_Max	X2	X2_Min	X2_Max	X3	…	Y
1	1	9	11	6	46	403	…	641
2	1	9	42	6	46	361	…	1110
6	1	9	6	6	46	206	…	350
2	1	9	46	6	46	267	…	1061
3	1	9	40	6	46	261	…	979
4	1	9	40	6	46	148	…	860
8	1	9	33	6	46	350	…	1055
4	1	9	29	6	46	484	…	1011
9	1	9	38	6	46	308	…	936

For a 1% introduction of abnormal values in X1, I proceed like this with a formula:

IF RandInt(100) = 100 THEN

    [Min_X1]+[Max_X1]*Rand()

ELSE

    [X1]

ENDIF

So it means that I have to do it for each column... (I have some datasets with more than 50 "X" columns...)

Is there a way to use the multi-field formula tool like this:

IF RandInt(100) = 100 THEN

    [Next Column]+[Next+1 Column]*Rand()

ELSE

    [_CurrentField_]

ENDIF

I have attached the dummy dataset and workflow it this can help J

Thanks,
Pierre-Louis

andyuttley · ‎06-27-2019

Hey @pierrelouisbescond

There's loads of ways to do this. My first instinct was to take your current logic and wrap it up into a macro that runs through every column. To do this, you just need to transpose the data first, then shift it back at the end in a cross tab.

Transposing means we can refer to [Name] and [Value] as opposed to specific field names, making it dynamic. Probably could be done through a multi field too though - like i say, this would just be my default method.

(note that i've switched to "IF RandInt(10) = 10" just for testing, you can switch that back...

Example attached.

Hope this helps

Andy

pierrelouisbescond · ‎06-27-2019

Thanks a lot @andyuttley !

I am happy to see that there was no "obvious" answer which I have missed... and understand that the "transpose" option is regularly an answer to the questions I am raising. I need to deep dive on these possible applications!

I am going to "mimic" your solution on real and large datasets 🙂

JoeS · ‎06-27-2019

Bit slow to reply as I got stuck on a call at work.

But I went down the Dynamic replace method, which skips the need to create a macro, and hopefully you agree its a pretty neat solution:

It should be fully dynamic too🙂

pierrelouisbescond · ‎06-27-2019

Hi @JoeS !

Wow, I had never touched the dynamic replace tool and this is a very powerful one!

I will do my best to transpose it to my real datasets... the hardest challenge will be to keep it in mind for future problems 🙂

JoeS · ‎06-27-2019

Indeed, it's definitely an under utilised tool!

andyuttley · ‎06-27-2019

@JoeS wrote:
Indeed, it's definitely an under utilised tool!

Yes, was great to see it in a relevant use case here; definitely one I want to get using

JoeS · ‎06-27-2019

@andyuttley wrote:

Yes, was great to see it in a relevant use case here; definitely one I want to get using

Yeah, a clue to remind me to consider it is if the data itself is needed to write the formula, then it may be the right way to go.

Like in this instance you needed the min and max values to write the formula then it works.

I was about to write other reasons, then thought "I should check to see if there is a mastery of it" and lo and behold, its here

Alteryx Designer Desktop Discussions

Iterative Randomness within Columns Range

Re: Row creation

Re: How to select columns dynamically using number...

Re: Batch macro to read 1000+ .xlsx files with var...

Re: Issue when using Block Until Done and Power BI...

Example workflow for setting up a custom list to u...