1. The Sample tool is deterministic under all of its configuration options except "Random 1 in N chance for each Record." It also outputs only a single stream of data, not the two you would need for any sort of validation. The Create Samples tool, by contrast, is explicitly intended for separating data into an estimation dataset and a holdout dataset, so it can be used for simple validation. Neither tool is intended for K-Fold Cross-Validation, though you could chain multiple Create Samples tools to perform it (see the sampling/K-Fold sketch after this list).
2. You're correct that the Logistic Regression tool does not support built-in Cross-Validation. At this time, a few Predictive tools (such as the Boosted Model and the Decision Tree) use Cross-Validation internally to choose certain hyperparameters. That internal Cross-Validation is different from the Cross-Validation used for model comparison and selection (the tuning-vs-selection sketch after this list illustrates the distinction). You can expect to see a tool for model-selection Cross-Validation on the Gallery in the relatively near future.
3. The Sample tool does not have a Random Seed, but the Create Samples tool does. Change the default seed value if you'd like to rerun your workflow with different selections for the Estimation, Validation, and Holdout data; keeping the same seed reproduces the same selections (see the seed sketch after this list).
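To make point 1 concrete outside of Alteryx, here is a minimal Python/scikit-learn sketch of the same two ideas: a single estimation/holdout split (what Create Samples does) and a K-Fold split (what chaining several Create Samples tools would emulate). The synthetic data and split sizes are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))       # 100 records, 3 predictor fields
y = rng.integers(0, 2, size=100)    # a binary target

# Simple validation: one estimation set and one holdout set.
X_est, X_hold, y_est, y_hold = train_test_split(
    X, y, test_size=0.3, random_state=1)

# K-Fold Cross-Validation: every record lands in a holdout fold exactly once.
kf = KFold(n_splits=5, shuffle=True, random_state=1)
for fold, (est_idx, hold_idx) in enumerate(kf.split(X)):
    print(f"Fold {fold}: {len(est_idx)} estimation rows, "
          f"{len(hold_idx)} holdout rows")
```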
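For point 2, here is a sketch of the two different uses of Cross-Validation, with scikit-learn standing in for the Alteryx tools. Internal CV picks a hyperparameter for one model; selection CV scores competing models the same way so you can compare them. The grid and the candidate models are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Internal CV to choose a hyperparameter (analogous to what Boosted Model
# and Decision Tree do under the hood for certain settings):
tuner = GridSearchCV(DecisionTreeClassifier(random_state=0),
                     param_grid={"max_depth": [2, 4, 8]}, cv=5)
tuner.fit(X, y)

# CV for model comparison/selection: score each candidate model on the
# same folds and compare average out-of-fold performance.
for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                    ("tuned tree", tuner.best_estimator_)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())
```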
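And for point 3, a tiny sketch of why the seed matters: the same seed reproduces the same Estimation/Validation/Holdout assignment, while a different seed produces a different one. The proportions and the helper function here are hypothetical, not the Create Samples implementation.

```python
import numpy as np

def assign_records(n_records, seed, est=0.5, val=0.25):
    """Randomly tag each record as Estimation, Validation, or Holdout."""
    rng = np.random.default_rng(seed)
    draws = rng.random(n_records)
    return np.where(draws < est, "E",
                    np.where(draws < est + val, "V", "H"))

print(assign_records(10, seed=1))   # same seed -> same assignment
print(assign_records(10, seed=1))
print(assign_records(10, seed=2))   # different seed -> different assignment
```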
@gmerce You can use the Cross Validation tool, available from the Predictive District, after a Count Regression tool. The Cross Validation tool doesn't alter the model generated by the Count Regression tool; it is designed to produce more accurate performance measures without requiring you to train your model on only a sample of the data.
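As a hedged Python sketch of that idea: Cross-Validation re-fits copies of the model on folds to estimate out-of-sample performance, leaving the final model free to train on all of the data. scikit-learn's PoissonRegressor stands in here for the Count Regression tool; the synthetic data and scoring choice are assumptions.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = rng.poisson(lam=np.exp(0.3 * X[:, 0] - 0.2 * X[:, 1] + 1.0))

model = PoissonRegressor()

# 5-fold CV: each fold is scored on records that fold's copy of the model
# never saw, giving a less optimistic performance estimate.
scores = cross_val_score(model, X, y, cv=5,
                         scoring="neg_mean_poisson_deviance")
print("Mean CV Poisson deviance:", -scores.mean())

# The model itself is then fit on the full dataset, unaltered by the CV step.
model.fit(X, y)
```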