Thank you, @SydneyF!
This was my first time trying out the predictive grouping tools. The R code is not supported in my workplace, but I was able to follow the logic and see how the batch macro needs to be configured to split the workflow into smaller chunks to help solve the size error. Thanks again!
Batch macro:
Workflow using the batch macro:
All,
Well, since there was only the one solution posted, I used a Decision Tree to check the clustering, and it fit with an R of .835, so very well done!
I applied the Principal Components solution that I originally posted on the thread in 2018.
The error "Append Cluster: Error: cannot allocate vector of size 7531.1 Gb" arises from R trying to allocate an extremely large object—likely due to distance matrix calculations or feature expansion within the Predictive Clustering Tools. With 3 million records and 30 fields, operations such as pairwise distance computation can easily exceed memory limits (e.g., 3M × 3M = 9 trillion elements).
Root causes include:
Excessive record volume and feature dimensions.
One-hot encoding of categorical variables expanding the data drastically (see the small illustration after this list).
Tools like hierarchical clustering (e.g., Ward’s method) requiring O(n²) memory.
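As a small illustration of the encoding blow-up (hypothetical field names, not from the original workflow), one-hot encoding just two categorical fields with many levels multiplies the column count:

```r
# Illustrative only: two categorical fields with many levels expand
# into one indicator column per level after one-hot encoding.
set.seed(1)
df <- data.frame(
  amount  = runif(1000),
  region  = factor(sample(paste0("R", 1:50),  1000, replace = TRUE)),
  product = factor(sample(paste0("P", 1:200), 1000, replace = TRUE))
)
dim(model.matrix(~ ., data = df))   # 1000 rows x ~250 columns instead of 3
```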
Recommended solutions:
Use a Select Tool to reduce columns to only essential numeric fields.
Apply Sampling to reduce the data to a manageable subset (e.g., 300k rows).
Switch from hierarchical to scalable algorithms like K-Means.
Use PCA before clustering to reduce dimensionality.
Break processing into chunks and aggregate results (the R sketch after this list combines sampling, PCA, K-Means, and chunked scoring).
Enable the Predictive Data Environment (PDEP) in Alteryx to handle large memory operations more robustly.
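A minimal R sketch of that combination: sample, reduce with PCA, cluster with K-Means, then assign every record to a cluster one chunk at a time. It assumes `df` is already a plain data frame of the selected numeric fields (inside an Alteryx R tool you would read the incoming stream instead); the object names, component count, cluster count, and chunk size are illustrative, not prescriptive:

```r
set.seed(42)

# 1. Sample a manageable subset (e.g., 300k of ~3M rows) and standardise it.
idx     <- sample(nrow(df), size = min(300000, nrow(df)))
sampled <- scale(df[idx, ])

# 2. PCA to reduce dimensionality; keep the first few components.
pca    <- prcomp(sampled)
n_comp <- 5                                   # illustrative choice
scores <- pca$x[, 1:n_comp]

# 3. K-Means needs O(n * k) memory, unlike hierarchical clustering's O(n^2).
km <- kmeans(scores, centers = 6, nstart = 10)

# 4. Assign every record to its nearest centre, one chunk at a time,
#    so the full 3M-row table is never held in a distance matrix.
assign_chunk <- function(chunk) {
  z <- scale(chunk,
             center = attr(sampled, "scaled:center"),
             scale  = attr(sampled, "scaled:scale"))
  p <- predict(pca, newdata = z)[, 1:n_comp, drop = FALSE]
  # squared Euclidean distance from each row to each centre (chunk x k matrix)
  d <- outer(rowSums(p^2), rowSums(km$centers^2), "+") - 2 * p %*% t(km$centers)
  max.col(-d)                                 # index of the nearest centre per row
}

chunk_size <- 100000
cluster_id <- integer(nrow(df))
for (start in seq(1, nrow(df), by = chunk_size)) {
  rows <- start:min(start + chunk_size - 1, nrow(df))
  cluster_id[rows] <- assign_chunk(df[rows, , drop = FALSE])
}
```

In Alteryx itself, the same pattern maps onto the batch macro approach described earlier in the thread: the macro feeds one chunk through at a time, and a final union/aggregation stitches the cluster assignments back together.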
Combined, these steps keep processing within available memory and help avoid R's vector-allocation errors.