Alteryx Designer Desktop Discussions

Inactive User · ‎10-24-2016

1. When I use the Sample tool to create training and testing data sets, is the split fixed, without any K-Fold Cross-Validation? Basically, is it just simple validation?

2. The Logistic Regression tool has no option for setting up K-Fold. Do all of the predictive tools lack this function?

3. In the Sample tool, there is an option called "random seed". What does it mean? In what situations do I need to change the default value? Thank you.

BridgetT · ‎10-24-2016

Hi @Inactive User,

I'll respond to your questions in order:

1. The Sample tool is deterministic under all of its configuration options except "Random 1 in N chance for each Record." Also, it only outputs a single stream of data, not the two you'd need for any sort of validation. The Create Samples tool, however, is explicitly intended for separating data into an estimation dataset and a holdout dataset. Thus, the Create Samples tool can be used for simple validation. Neither tool is intended for K-Fold Cross-Validation, though you could use multiple Create Samples tools to perform it.

2. You're correct that the Logistic Regression tool does not support built-in Cross-Validation. At this time, a few Predictive tools (such as the Boosted Model and the Decision Tree) do Cross-Validation internally to choose certain hyperparameters. However, this Cross-Validation is different from the Cross-Validation used in model comparison/selection. You can expect to see a tool for model selection Cross-Validation on the Gallery in the relatively near future.

3. The Sample tool does not have a Random Seed, but the Create Samples tool does. You should change the default value if you'd like to run your workflow again with different selections for Estimation, Validation, and Holdout data.
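To make the difference between a seeded holdout split and K-Fold Cross-Validation concrete, here is a minimal Python sketch. It is not how the Create Samples tool is implemented; the function names `holdout_split` and `k_fold_splits` and their parameters are made up for illustration. It only shows the general idea: the same seed reproduces the same split, a different seed will generally give a different one, and K-Fold rotates the held-out part so that every record is tested exactly once.

```python
import random

def holdout_split(n_records, estimation_pct=70, seed=1):
    """Shuffle record indices with a fixed seed and cut them into two parts."""
    rng = random.Random(seed)          # same seed -> same shuffle
    indices = list(range(n_records))
    rng.shuffle(indices)
    cut = n_records * estimation_pct // 100
    return indices[:cut], indices[cut:]

def k_fold_splits(n_records, k=5, seed=1):
    """Return a list of (train, test) index lists, one pair per fold."""
    rng = random.Random(seed)
    indices = list(range(n_records))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits

# Simple validation: one fixed split, reproducible through the seed.
est, hold = holdout_split(10, estimation_pct=70, seed=1)
est_again, hold_again = holdout_split(10, estimation_pct=70, seed=1)
print("same seed, same split:", est == est_again and hold == hold_again)

# K-Fold: every record lands in exactly one test fold.
splits = k_fold_splits(10, k=5, seed=1)
tested = sorted(idx for _, test in splits for idx in test)
print("each record tested once:", tested == list(range(10)))
```

Changing `seed` in either function reshuffles the records, which is the same reason you would change the Random Seed in Create Samples: to rerun the workflow with a different selection of records.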

Best,

Bridget

Bridget Toomey

Research Scientist, Analytic Products

Alteryx

Inactive User · ‎10-24-2016

@BridgetT Thank you so much. This information is helpful.

Also, where I wrote "Sample tool" in my questions above, I actually meant the "Create Samples" tool. You get my point. Thanks.

BridgetT · ‎10-27-2016

@Inactive User: You're welcome! Glad I could help!

gmerce · ‎01-09-2017

Hi,

Does the Count Regression tool include built-in cross-validation? I don't think so.

Do you have an example of how to train a count regression model using cross-validation instead of training it on a sample dataset?

Thanks a lot.

NeilR · ‎01-11-2017

@gmerce You can use the Cross Validation tool, available from the Predictive District, after a Count Regression tool. While the Cross Validation tool doesn't alter the model generated by the Count Regression tool, it is designed to generate more accurate performance measures without the need to train your model on a sample of the data.
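As a rough illustration of that idea (and not a description of how the Cross Validation tool works internally), the Python sketch below fits a small Poisson regression, one common kind of count model, on all records, and separately estimates its out-of-sample error with K-Fold Cross-Validation. The data, the helper names `fit_poisson` and `cross_validate`, and the choice of mean absolute error are all assumptions made for the example.

```python
import math
import random

def fit_poisson(xs, ys, iterations=25):
    """Fit log(mu) = a + b*x by Newton's method; returns (a, b)."""
    a, b = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        mus = [math.exp(a + b * x) for x in xs]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(y - m for y, m in zip(ys, mus))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mus))
        # Entries of the negative Hessian (Fisher information).
        h00 = sum(mus)
        h01 = sum(m * x for x, m in zip(xs, mus))
        h11 = sum(m * x * x for x, m in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def cross_validate(xs, ys, k=4, seed=1):
    """Mean absolute error of held-out predictions across k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    errors = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        a, b = fit_poisson([xs[t] for t in train], [ys[t] for t in train])
        errors += [abs(ys[t] - math.exp(a + b * xs[t])) for t in test]
    return sum(errors) / len(errors)

# Made-up count data: one predictor, counts that grow with it.
xs = list(range(12))
ys = [2, 1, 3, 3, 4, 4, 6, 7, 8, 10, 12, 15]

final_a, final_b = fit_poisson(xs, ys)   # the model you keep uses all records
cv_mae = cross_validate(xs, ys)          # performance estimate from held-out folds
print("coefficients:", final_a, final_b)
print("cross-validated MAE:", cv_mae)
```

The final coefficients come from a fit on the full dataset and are not changed by the cross-validation step; the fold-level fits are used only to score predictions on records each of them did not see.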

K Fold Cross Validation