Weekly Challenges


Challenge #131: Think Like a CSE... The R Error Message: cannot allocate vector of size...

MAAbdullahAlMubarah
8 - Asteroid

 

Sorry, I sent this by mistake; I meant to send it to someone else. Please accept my apology.

MAAbdullahAlMubarah
8 - Asteroid

 

 

Hi @PhilipMannering

 

I tried to practice your solution because I have the same problem, but related to linear regression. I noticed you used the Generate Rows tool; as I understand it, it creates a new field when the number of rows reaches 1,000,000, but what does this mean? What is it useful for? How does it solve the memory limitation problem? Please explain further.

 

Note: after running your workflow for a while (around 7 minutes), my device froze and stopped responding!?

 

ipeng
8 - Asteroid
Spoiler
131A.PNG
131B.PNG
131C.PNG
131D.PNG

For the coworker's particular problem, the data size seems to be too big to handle. Connecting from the Select tool to remove some fields may solve the problem.

For my own R error, the problem was a field type mismatch. Connecting from the Select tool also solved that problem.
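
As a rough illustration of why dropping unused fields helps (a minimal sketch in R with made-up columns and row counts, not ipeng's actual data):

# Illustrative only: the same rows take far less memory once unused fields are dropped.
n <- 1e6
wide   <- data.frame(id = 1:n, x = rnorm(n), y = rnorm(n),
                     label = sample(letters, n, replace = TRUE))
narrow <- wide[, c("x", "y")]                  # keep only the fields the model needs
format(object.size(wide), units = "MB")        # roughly double the size of `narrow`
format(object.size(narrow), units = "MB")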

RichoBsJ
11 - Bolide

Hi! Here's my solution :)

 

Spoiler
So, if 3M rows were a problem...let's divide the problem by 15 ;)
RolandSchubert
16 - Nebula

Just had a very similar problem, and even after reducing the number of fields and optimizing data types, the memory limit was exceeded, so I finally tried to solve this challenge ...

JoshuaGostick
11 - Bolide

I chose to get the necessary data directly from the K-Centroids Cluster Analysis tool instead of using an Append Cluster tool. Assuming that the K-Centroids Cluster Analysis tool runs successfully, my solution should hopefully be a more optimised way of obtaining the data.

 

Spoiler
challenge_131.PNG
Kenda
16 - Nebula
Spoiler
I've never run into this particular problem before, but I do have a tiny bit of background on the tools being used (you can see I already picked @SydneyF's brain here: https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Tool-Mastery-Append-Cluster/ta-p/194965)
Spoiler
Capture.PNG
kelly_gilbert
13 - Pulsar

Big hint in the help page for this one!

 

Spoiler

First, I read the help on the Append Cluster tool, and there was a big hint here!

challenge_131_help.png


Knowing that 1) the actual cluster analysis runs, and 2) I don't need to send the full original dataset into the Append Cluster tool, maybe the appending can be done in batches. I generated 5M fake records for testing.

My solution is pretty similar to the given solution:

Batch macro:


challenge_131_macro.PNG


Outer workflow:
challenge_131_outer_workflow.PNG

There may also be an opportunity to use more efficient data types. That may already be happening in the Select tool in the original workflow, in which case we should connect the Select output to the Append Cluster tool rather than the raw input.
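
For anyone curious what the batch macro is doing conceptually, here is a minimal R sketch of the same idea (assumptions: synthetic data, base-R kmeans standing in for the K-Centroids model, and a simple nearest-centroid assignment; the Alteryx batch macro handles the chunking for you):

set.seed(1)
full_data <- as.data.frame(matrix(rnorm(2e5 * 4), ncol = 4))   # stand-in for the big table
fit <- kmeans(full_data[sample(nrow(full_data), 1e4), ], centers = 3)   # cluster on a sample

assign_chunk <- function(chunk, centers) {
  # squared Euclidean distance from each row to each centre, then pick the nearest
  x  <- as.matrix(chunk)
  d2 <- outer(rowSums(x^2), rowSums(centers^2), "+") - 2 * x %*% t(centers)
  max.col(-d2)
}

chunk_size <- 25000
idx <- split(seq_len(nrow(full_data)),
             ceiling(seq_len(nrow(full_data)) / chunk_size))
clusters <- unlist(lapply(idx, function(i) assign_chunk(full_data[i, ], fit$centers)))

Each chunk only ever needs a chunk_size-by-k distance matrix in memory, which is essentially why batching avoids the single giant allocation.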

TimothyManning
8 - Asteroid
Spoiler
131. Data Analysis 2.PNG

131. Data Analysis.PNG

Unfortunately, I couldn't replicate the error. I kept getting this error instead, which, upon googling, seems to be a false error, since all the data still flows through. I then looked at some spoilers and tried to copy their approach with a Generate Rows tool, but it still didn't give me the error. I then looked on Google for solutions, as others had suggested, and found this awesome post:

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Cluster-Analysis-Customer-R-tool-Error...

This is what JohnJPS said:

"The "cannot allocate vector" error is what R gives when it runs out of memory; as you can see it's trying to allocate quite a large 5+ GB chunk of RAM.  This is not unexpected for large datasets, as R can be somewhat memory inefficient. Potential solutions:

  • Increase RAM on your workstation. (Not a convenient option, but not a joke: more RAM helps).
  • Find a comfortable data size, and run through only that much, in chunks."
I saw that a batch macro was the answer, both from that post and from the solution given for the original challenge, so I resorted to just trying to create one myself to see if I could do it. The result is below! I no longer got that error, so it did solve the problem, but it took a very long time to complete.
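
As a back-of-the-envelope check on why the message reports such a large allocation (illustrative numbers only, not the actual dimensions from the post):

# A dense numeric matrix in R costs 8 bytes per cell, so the failed allocation
# size hints at how many rows x columns R tried to materialise at once.
rows <- 3e6; cols <- 230            # hypothetical dimensions
rows * cols * 8 / 1024^3            # ~5.1 GiB, in the range of the quoted 5+ GB error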



131. Data Analysis 3.PNG
RWvanLeeuwen
11 - Bolide

Many issues to be thought of, but I could not replicate the same error message with any test dataset I had (probably due to their smaller size).

 

Anyway, here are my workflow and thoughts:

Spoiler
Problem + solution combinations:
1) Out of memory: append the cluster groupings in chunks using a batch macro, set data types optimally to ensure minimal size, increase the memory on the machine, or try the parallel or future packages in R to allow multithreading.
2) Perhaps the data forms a sparse matrix, so check whether you still run out of memory after imputing the missing data first (mean imputation will suffice just for testing purposes; see the sketch below).
3) Turn the workflow into an iterative version that performs a (stratified) sampling manoeuvre in each iteration, finds the clusters, and appends them to the dataset; in a later step you can analyse all the labels added to the data and find an average or mode / best-fitting label for each item.
Anyway, just document your process properly and discuss with your peers whether this would be appropriate.
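
A quick test-only mean-imputation pass for option 2 above might look like this (a sketch; `df` and its columns are placeholders, not the challenge data):

impute_mean <- function(df) {
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], function(col) {
    col[is.na(col)] <- mean(col, na.rm = TRUE)   # fill NAs with the column mean
    col
  })
  df
}

df <- data.frame(a = c(1, NA, 3), b = c(NA, 2, 4))
impute_mean(df)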