Weekly Challenges

Solve the challenge, share your solution and summit the ranks of our Community!


Challenge #131: Think Like a CSE... The R Error Message: cannot allocate vector of size...

atcodedog05
22 - Nova

Learnt a new lesson on how to solve tough problems 🙂

Spoiler
atcodedog05_0-1586019732182.png

hanykowska
11 - Bolide

I have successfully avoided this challenge for over a year, but today is the day I end it.

Spoiler
I tried to reproduce this error a number of times, but sadly couldn't manage it. After several attempts I checked the comments to see if anyone else had, and in the process I saw some of the responses... so I waited a while, trying to forget what I'd seen, but couldn't. Today I decided to look through the Community for resources I could base my solution on, and I found one! It wasn't exactly the same issue, but the memory allocation problem was present (https://community.alteryx.com/t5/Alteryx-Designer/Cluster-Analysis-Customer-R-tool-Errors-out/td-p/3...), so I decided to suggest splitting the data into chunks before appending the clusters, and mocked up a short example below (I didn't bother to make an actual batch macro, though...)

image.png
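The chunking idea above can also be sketched outside Alteryx. Below is a minimal illustration, with scikit-learn's KMeans standing in for the Alteryx K-Centroids / Append Cluster tools (the model, sizes, and chunk width are all invented for the example):

```python
# Sketch: fit the clusters once, then assign ("append") cluster labels in
# fixed-size chunks so no single allocation has to cover the whole dataset.
# KMeans is an illustrative stand-in for the Alteryx tools, not their
# actual implementation.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 4))   # pretend this is the "too big" dataset

# Modeling succeeds on a sample; it's the full append that runs out of memory.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data[:1_000])

chunk_size = 500                      # each batch stays comfortably in memory
labels = np.concatenate([
    model.predict(data[i:i + chunk_size])
    for i in range(0, len(data), chunk_size)
])
```

The same shape maps onto a batch macro: the fitted model is the cluster object coming in, and each macro iteration plays the role of one `predict` call on a chunk.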
JennyMartin
9 - Comet
Spoiler
The memory issue would be solved by breaking the data into smaller chunks (as I found out by making the dataset incrementally larger).
AWC131.PNG

Spent hours trying to recreate the exact error but figured this was close enough!

JethroChen
10 - Fireball
Spoiler
challenge_131_jc.PNG
AngelosPachis
16 - Nebula

My solution for challenge #131.

Workflow

Spoiler

I didn't spend time on creating a 7,000 GB dataset, but I created a workflow as a proof of concept.

Using the Tableau Superstore dataset, I broke it down into chunks of 500 rows and fed them into the macro containing the Append Cluster tool. Then, to validate that my macro works as expected, I simply compared its output to that of a standard Append Cluster tool.

Screenshot 2020-09-16 190758.png
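The validation step described above, checking that the chunked path matches a single full pass, can be mimicked with a short sketch (scikit-learn's KMeans is a stand-in for the Alteryx tools; all names and sizes are invented):

```python
# Sketch of the validation idea: assigning clusters chunk-by-chunk must give
# exactly the same labels as one full Append Cluster pass, since each row's
# assignment depends only on the fitted centroids, not on its neighbours.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
data = rng.normal(size=(5_000, 3))
model = KMeans(n_clusters=4, n_init=10, random_state=1).fit(data)

full = model.predict(data)                 # one big pass over everything
chunked = np.concatenate([                 # 500-row batches, like the macro
    model.predict(data[i:i + 500])
    for i in range(0, len(data), 500)
])

assert np.array_equal(full, chunked)       # chunking changes nothing
```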

Macro

Spoiler
Screenshot 2020-09-16 191835.png
Jonathan-Sherman
15 - Aurora

Challenge 131 is done! I took a similar approach to others, pushing the dataset through a batch macro to reduce the load when appending the clusters, and also created a report to check the cluster assignments.

Workflow:

Spoiler

image.png

Report:

Spoiler
challenge 131 JMS report.PNG

johnemery
11 - Bolide
Spoiler
The error shown occurred on the Append Cluster tool.  This tells us that the cluster tool itself was able to handle the size of the data.

Searching for the error, we can quickly find that the error is memory-related. Turns out, ~7500 GB of memory is a bit more than we have available.

By placing the Append Cluster tool workflow in a batch macro, we can apply the clusters to the data set in chunks, thereby working around the memory limitations.

Capture.PNG
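To get a feel for where a number like ~7500 GB can come from: one possibility (an assumption for illustration, not necessarily what the R tool actually allocates) is a dense square matrix over the whole dataset, which grows quadratically with row count. A back-of-the-envelope check:

```python
# Back-of-the-envelope: size of a dense N x N matrix of 8-byte doubles.
# The N x N shape is an illustrative assumption, not necessarily the exact
# allocation the R tool attempts when it fails.
def dense_matrix_gib(n_rows: int, bytes_per_value: int = 8) -> float:
    """Size in GiB of an n_rows x n_rows dense matrix."""
    return n_rows * n_rows * bytes_per_value / 2**30

# Around a million rows, a single dense square matrix is already ~7.5 TB,
# the same order as the error message in the challenge:
print(f"{dense_matrix_gib(1_000_000):,.0f} GiB")   # ~7,451 GiB
```

Chunking sidesteps this because each batch only ever needs memory proportional to its own row count, never to the square of the full dataset.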
Jean-Balteryx
16 - Nebula
Spoiler
As the problem is a memory problem, and it's happening on cluster appending rather than the clustering itself, the user should split the data into small chunks and perform the append on each chunk. A batch macro should do the trick.
AkimasaKajitani
17 - Castor

My solution.

Spoiler
The cause of this problem is that Designer cannot allocate memory because the data is too large.

But the modeling itself has finished, so the solution is to put the cluster-appending part into a batch macro.

This time, I made a workflow that compares the result of the normal workflow with that of the batch macro approach.
To decide how many groups to make, the user has to investigate the original data.

Workflow:
AkimasaKajitani_0-1606618833640.png

Batch Macro:

AkimasaKajitani_1-1606618846762.png

Jean-Balteryx
16 - Nebula

Workflow attached.