Thank you, @SydneyF!
This was my first time trying out the predictive grouping tools. The R code is not supported in my workplace, but I was able to follow the logic and see how the batch macro needs to be configured to split the workflow into smaller chunks to help solve the size error. Thanks again!
Batch macro:
Workflow using the batch macro:
All,
Well, since there was only the one solution posted, I used a Decision Tree to check the clustering, and it fit with an R of .835, so very well done!
I applied the Principal Components solution that I originally posted on the thread in 2018.
The error "Append Cluster: Error: cannot allocate vector of size 7531.1 Gb" arises from R trying to allocate an extremely large object—likely due to distance matrix calculations or feature expansion within the Predictive Clustering Tools. With 3 million records and 30 fields, operations such as pairwise distance computation can easily exceed memory limits (e.g., 3M × 3M = 9 trillion elements).
Root causes include:
Excessive record volume and feature dimensions.
One-hot encoding of categorical variables expanding the data drastically (see the small illustration after this list).
Tools like hierarchical clustering (e.g., Ward’s method) requiring O(n²) memory.
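As a small illustration of the encoding blow-up (hypothetical field names, not from the original workflow), one-hot encoding just two categorical fields with many levels multiplies the column count:

```r
# Illustrative only: two categorical fields with many levels expand
# into one indicator column per level after one-hot encoding.
set.seed(1)
df <- data.frame(
  amount  = runif(1000),
  region  = factor(sample(paste0("R", 1:50),  1000, replace = TRUE)),
  product = factor(sample(paste0("P", 1:200), 1000, replace = TRUE))
)
dim(model.matrix(~ ., data = df))   # 1000 rows x ~250 columns instead of 3
```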
Recommended solutions:
Use a Select Tool to reduce columns to only essential numeric fields.
Apply Sampling to reduce the data to a manageable subset (e.g., 300k rows).
Switch from hierarchical to scalable algorithms like K-Means.
Use PCA before clustering to reduce dimensionality.
Break processing into chunks and aggregate results (the R sketch after this list combines sampling, PCA, K-Means, and chunked scoring).
Enable the Predictive Data Environment (PDEP) in Alteryx to handle large memory operations more robustly.
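A minimal R sketch of that combination: sample, reduce with PCA, cluster with K-Means, then assign every record to a cluster one chunk at a time. It assumes `df` is already a plain data frame of the selected numeric fields (inside an Alteryx R tool you would read the incoming stream instead); the object names, component count, cluster count, and chunk size are illustrative, not prescriptive:

```r
set.seed(42)

# 1. Sample a manageable subset (e.g., 300k of ~3M rows) and standardise it.
idx     <- sample(nrow(df), size = min(300000, nrow(df)))
sampled <- scale(df[idx, ])

# 2. PCA to reduce dimensionality; keep the first few components.
pca    <- prcomp(sampled)
n_comp <- 5                                   # illustrative choice
scores <- pca$x[, 1:n_comp]

# 3. K-Means needs O(n * k) memory, unlike hierarchical clustering's O(n^2).
km <- kmeans(scores, centers = 6, nstart = 10)

# 4. Assign every record to its nearest centre, one chunk at a time,
#    so the full 3M-row table is never held in a distance matrix.
assign_chunk <- function(chunk) {
  z <- scale(chunk,
             center = attr(sampled, "scaled:center"),
             scale  = attr(sampled, "scaled:scale"))
  p <- predict(pca, newdata = z)[, 1:n_comp, drop = FALSE]
  # squared Euclidean distance from each row to each centre (chunk x k matrix)
  d <- outer(rowSums(p^2), rowSums(km$centers^2), "+") - 2 * p %*% t(km$centers)
  max.col(-d)                                 # index of the nearest centre per row
}

chunk_size <- 100000
cluster_id <- integer(nrow(df))
for (start in seq(1, nrow(df), by = chunk_size)) {
  rows <- start:min(start + chunk_size - 1, nrow(df))
  cluster_id[rows] <- assign_chunk(df[rows, , drop = FALSE])
}
```

In Alteryx itself, the same pattern maps onto the batch macro approach described earlier in the thread: the macro feeds one chunk through at a time, and a final union/aggregation stitches the cluster assignments back together.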
Combined, these steps keep processing within available memory and help avoid R's vector-allocation errors.