Weekly Challenges

Challenge #131: Think Like a CSE... The R Error Message: cannot allocate vector of size...

gawa
16 - Nebula

I cannot reproduce the issue (version 2022.3), but

Spoiler
it should be a data-size problem, and it can be solved by deselecting unnecessary fields, because cluster tools usually need only a couple of fields.
Also, a CSV file always loads its data as strings, so using the Auto Field tool would help reduce the size.
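
The effect of field types on size can be illustrated with a rough Python sketch. It is only an analogy: it measures Python objects, not how Alteryx or R store fields, and the column of 100,000 readings is made up for the example.

```python
import sys
from array import array

# Hypothetical column of 100,000 numeric readings, held as text the way a CSV reader returns them.
values_as_text = [f"{i * 0.5:.1f}" for i in range(100_000)]

# Memory used while every value stays a separate string object
# (the list's pointer slots plus each string).
text_bytes = sys.getsizeof(values_as_text) + sum(sys.getsizeof(s) for s in values_as_text)

# Memory used after converting the same column to a packed array of 8-byte doubles.
values_as_double = array("d", (float(s) for s in values_as_text))
double_bytes = sys.getsizeof(values_as_double)

print(f"as text:   {text_bytes:,} bytes")
print(f"as double: {double_bytes:,} bytes")
```

The typed column should come out several times smaller, which is the same kind of saving a narrower field type gives before the data reaches the clustering tools.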
olga_strubbe
11 - Bolide

Thank you, @SydneyF!

This was my first time trying out predictive grouping tools. R code is not supported in my workplace, but I was able to follow the logic and see how the batch macro needs to be configured to split the workflow into smaller chunks to help solve the size error. Thanks again!

 

Batch macro:

2.png

 

Workflow using the batch macro:

2024-02-05_14-01-19.png

Reesetrain2
9 - Comet

All,

 

Well, only the solution was provided, so I used the Decision Tree tool to check the clustering. It fit with an R of .835, so very well done!

Spoiler
Screenshot 2024-02-09 180912.png

Erin
11 - Bolide

R errors are the worst. In my humble opinion....

Bobbyt23
13 - Pulsar
Spoiler
I googled what the error meant and it indicated that the data was probably too big and the process was running out of RAM.

Two possible solutions:
Get more RAM
Reduce the data: you could get rid of anything unnecessary for the process, or batch the data up into chunks so that smaller datasets run through the process.
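
The batching idea can be sketched in a few lines of Python. This is an illustration only, not how it is built in Alteryx (other replies in the thread use a batch macro for that); the `chunked` helper and the ten-record dataset are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists holding at most `size` records each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical stand-in for the real dataset: ten record IDs.
records = range(10)
batches = list(chunked(records, 4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each batch is small enough to process on its own, so only one chunk needs to be in memory at a time.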



ggruccio
ACE Emeritus

I applied the Principal Components solution that I originally posted on the thread in 2018.

 

Spoiler
Looks like the Factors (PC) did too good a job of identifying the clusters!  5PCs = 5 Clusters this time...

Screenshot 2024-10-30 084032.png
alexnajm
18 - Pollux

Interesting challenge. The issue is related to memory, so I am posting the predictive sample workflow as a placeholder for how it should be built. A batch macro that chunks through the records could be a viable solution.

JBevan89
8 - Asteroid

My solution is attached.

DaisukeTsuchiya
14 - Magnetar

 

Spoiler
Since I wasn't sure, I asked ChatGPT. However, it doesn't seem to be the correct answer.

======================================================================================

The error "Append Cluster: Error: cannot allocate vector of size 7531.1 Gb" arises from R trying to allocate an extremely large object—likely due to distance matrix calculations or feature expansion within the Predictive Clustering Tools. With 3 million records and 30 fields, operations such as pairwise distance computation can easily exceed memory limits (e.g., 3M × 3M = 9 trillion elements).

Root causes include:

  • Excessive record volume and feature dimensions.

  • One-hot encoding of categorical variables expanding the data drastically.

  • Tools like hierarchical clustering (e.g., Ward’s method) requiring O(n²) memory.

Recommended solutions:

  1. Use a Select Tool to reduce columns to only essential numeric fields.

  2. Apply Sampling to reduce the data to a manageable subset (e.g., 300k rows).

  3. Switch from hierarchical to scalable algorithms like K-Means.

  4. Use PCA before clustering to reduce dimensionality.

  5. Break processing into chunks and aggregate results.

  6. Enable the Predictive Data Environment (PDEP) in Alteryx to handle large memory operations more robustly.

This combination ensures memory-efficient processing and avoids R’s vector allocation errors.
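
The pairwise-distance arithmetic quoted above can be checked with a short Python sketch. Two assumptions are made here that the thread does not confirm: each matrix element is an 8-byte double, and "Gb" in the R message means 2**30 bytes. The function names are made up for the example.

```python
BYTES_PER_DOUBLE = 8          # assumption: distances stored as 8-byte doubles
BYTES_PER_GB = 1024 ** 3      # assumption: "Gb" in the R message means 2**30 bytes


def full_matrix_gb(n_rows: int) -> float:
    """Size of a full n x n matrix of doubles, in Gb."""
    return n_rows * n_rows * BYTES_PER_DOUBLE / BYTES_PER_GB


def condensed_matrix_gb(n_rows: int) -> float:
    """Size of only the n * (n - 1) / 2 distinct pairs, in Gb."""
    return n_rows * (n_rows - 1) // 2 * BYTES_PER_DOUBLE / BYTES_PER_GB


n = 3_000_000
elements = n * n
print(f"{elements:,} elements")                       # 9,000,000,000,000 elements
print(f"full matrix:      {full_matrix_gb(n):,.0f} Gb")
print(f"condensed matrix: {condensed_matrix_gb(n):,.0f} Gb")
```

Under these assumptions the full matrix comes to roughly 67,000 Gb and the condensed form to about half of that. Both are different figures from the 7531.1 Gb in the error message, so a 3M × 3M distance matrix may not be the exact object R was trying to allocate.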

 

 

Carolyn
12 - Quasar

Solved

 

Spoiler
I generated 30 rows to build and test the macro. Then when it felt like it was at a good spot, I expanded to the full dummy data.

2025-07-19 16_44_54-Alteryx Designer x64 - _Challange_131_Carolyn.yxmd.png

 

2025-07-19 16_44_15-Alteryx Designer x64 - BatchMacro.yxmc.png