
Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Modelling Large Datasets

nbt1032
8 - Asteroid

I have been working through small data sets with success. I am now working on large production data sets of between 2 and 20 million records and am not having success getting these to run.

 

Can anyone provide some advice on the best techniques for doing this?

 

By this I mean linear regressions, decision trees, etc...

 

Thanks,

Marc.

4 REPLIES
MarqueeCrew
20 - Arcturus

I would suggest constructing a model based upon a sample of the data. Once it is generated and validated, you can save the model output as a yxdb file. Then, when you run your model, you can input the stored object and use the Score tool to apply the model.

 

If you are currently trying to build the model on all 20M input records, working from a sample instead could solve your issue.
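To illustrate the sample, store, and score pattern outside of Alteryx, here is a minimal Python sketch. Everything in it is an assumption for illustration: the data is synthetic, the model is a one-variable least-squares line, and a JSON string stands in for the saved yxdb model object. It is not how the Alteryx predictive tools work internally.

```python
import json
import random


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"intercept": mean_y - slope * mean_x, "slope": slope}


def score(model, xs):
    """Apply a fitted model to new records."""
    return [model["intercept"] + model["slope"] * x for x in xs]


rng = random.Random(42)

# Synthetic stand-in for the large production table (hypothetical data).
full_x = [rng.uniform(0, 100) for _ in range(200_000)]
full_y = [3.0 + 2.0 * x + rng.gauss(0, 5) for x in full_x]

# 1. Build the model on a small random sample rather than on every record.
sample_idx = rng.sample(range(len(full_x)), 5_000)
model = fit_simple_linear(
    [full_x[i] for i in sample_idx],
    [full_y[i] for i in sample_idx],
)

# 2. Store the fitted model (a JSON string stands in for the saved file).
stored = json.dumps(model)

# 3. Later: reload the stored model and score the full data set.
reloaded = json.loads(stored)
predictions = score(reloaded, full_x)
```

Fitting touches only the 5,000 sampled rows, while scoring is a single cheap pass over all records, which is why this split tends to help when the fit itself is the bottleneck.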

 

Cheers,

Mark

Alteryx ACE & Top Community Contributor

nbt1032
8 - Asteroid

I

JohnJPS
15 - Aurora
You could also do a pseudo cross validation by generating several such (random, stratified) samples and selecting the model that appears to match all records best.
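A rough Python sketch of this idea, under stated assumptions: the rows are synthetic, the `segment` field is a hypothetical stratification variable, the model is a simple least-squares line, and "matches all records best" is interpreted as lowest mean squared error over the full data set.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, rng):
    """Draw the same fraction of rows from every stratum."""
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[key(row)].append(row)
    sample = []
    for members in by_stratum.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def fit_line(rows):
    """Least-squares fit of y = slope * x + intercept on (segment, x, y) rows."""
    n = len(rows)
    mean_x = sum(r[1] for r in rows) / n
    mean_y = sum(r[2] for r in rows) / n
    sxx = sum((r[1] - mean_x) ** 2 for r in rows)
    sxy = sum((r[1] - mean_x) * (r[2] - mean_y) for r in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, rows):
    """Mean squared error of the model over the given rows."""
    slope, intercept = model
    return sum((r[2] - (slope * r[1] + intercept)) ** 2 for r in rows) / len(rows)


rng = random.Random(7)

# Synthetic stand-in for the full data set (hypothetical data).
rows = []
for _ in range(50_000):
    segment = rng.choice(["A", "B", "C"])
    x = rng.uniform(0, 10)
    rows.append((segment, x, 1.5 * x + 4.0 + rng.gauss(0, 1)))

# Fit one model per stratified sample, then evaluate each on all records.
candidates = []
for _ in range(5):
    sample = stratified_sample(rows, key=lambda r: r[0], fraction=0.02, rng=rng)
    model = fit_line(sample)
    candidates.append((mse(model, rows), model))

# Keep the model with the lowest error over the full data set.
best_error, best_model = min(candidates, key=lambda c: c[0])
```

Each fit sees only about 2% of the rows; the evaluation pass over all records is the expensive step here, and on very large data it could itself be run against a larger holdout sample instead.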
nbt1032
8 - Asteroid

Agreed.  Finding a representative sample that will not take days to run is definitely a bit of a trick. 
