I have been working through small datasets with success. I am now working on large production datasets of 2 to 20 million records and am not having success getting models to run on them.
Can anyone provide some advice on the best techniques for doing this?
By this I mean linear regressions, decision trees, etc.
Thanks,
Marc.
I would suggest building the model on a sample of the data. Once the model is generated and validated, you can save the model object as a yxdb file. Then, when you need to score the full dataset, you can input the stored object and use the Score tool to apply the model.
If you are currently trying to build the model from all 20M input records, that is likely the source of your issue.
Cheers,
Mark
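In Alteryx this pattern is train on a sample, save the model object, then apply it with the Score tool. A minimal sketch of the same idea in Python with scikit-learn and pickle (an analogy for illustration, not the Alteryx workflow — the data here is synthetic and much smaller than a real production set):

```python
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for a large production dataset (in practice, millions of rows).
X_full = rng.normal(size=(100_000, 5))
y_full = (X_full[:, 0] + X_full[:, 1] > 0).astype(int)

# 1. Build and validate the model on a manageable random sample.
idx = rng.choice(len(X_full), size=10_000, replace=False)
model = LogisticRegression().fit(X_full[idx], y_full[idx])

# 2. Persist the fitted model (analogous to writing the model object
#    out to a yxdb file in Alteryx).
blob = pickle.dumps(model)

# 3. Later, load the stored model and score the full dataset
#    (analogous to the Score tool applying the saved model).
scorer = pickle.loads(blob)
preds = scorer.predict(X_full)
print(preds.shape)  # one prediction per input record
```

The expensive step (fitting) only ever sees the sample; scoring the full 20M rows is a single cheap pass.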
Agreed. Finding a representative sample that will not take days to run is definitely a bit of a trick.