I have been working through small datasets with success. I am now working on large production datasets of 2 to 20 million records and am not having success getting models to run on them.
Can anyone provide some advice on the best techniques for doing this?
By this I mean linear regressions, decision trees, etc.
Thanks,
Marc.
I would suggest building the model on a sample of the data. Once the model is generated and validated, you can save the model object as a yxdb file. Then, when you need to score the full dataset, you can input the stored object and use the Score tool to apply the model.
If you are currently trying to build the model from all 20M input records, that is likely the source of your issue.
Cheers,
Mark
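In Alteryx this pattern is train on a sample, save the model object, then apply it with the Score tool. A minimal sketch of the same idea in Python with scikit-learn and pickle (an analogy for illustration, not the Alteryx workflow — the data here is synthetic and much smaller than a real production set):

```python
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for a large production dataset (in practice, millions of rows).
X_full = rng.normal(size=(100_000, 5))
y_full = (X_full[:, 0] + X_full[:, 1] > 0).astype(int)

# 1. Build and validate the model on a manageable random sample.
idx = rng.choice(len(X_full), size=10_000, replace=False)
model = LogisticRegression().fit(X_full[idx], y_full[idx])

# 2. Persist the fitted model (analogous to writing the model object
#    out to a yxdb file in Alteryx).
blob = pickle.dumps(model)

# 3. Later, load the stored model and score the full dataset
#    (analogous to the Score tool applying the saved model).
scorer = pickle.loads(blob)
preds = scorer.predict(X_full)
print(preds.shape)  # one prediction per input record
```

The expensive step (fitting) only ever sees the sample; scoring the full 20M rows is a single cheap pass.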
Agreed. Finding a representative sample that will not take days to run is definitely a bit of a trick.