Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

MB Affinity - optimum number of records per data chunk?

Atom

I'm trying to run an MB Analysis on a very large dataset: 500+ million rows and ~2,000 unique item identifiers. It's probably a dead end with the MB Inspect / MB Rules tools, as it seems like too much data to get them to work whenever I've tried! So I'm now attempting the MB Affinity tool, which according to a few other forum posts is a lot quicker. Is there an optimum number of records per data chunk in the tool's configuration, though? (Is it counterintuitive, like the sort/join memory setting, where a lower number is better?!)

Mine is currently set at 256,000 records per chunk, but should this be higher or lower to optimise the workflow? I'm running on a server with 256 GB of RAM, so memory isn't too much of an issue, I hope!

Nebula

Hey @atan87,

Sorry if this is a ridiculous question, but can you confirm that you are referring to a Market Basket analysis when you mention "MB Analysis"? I've not used these tools much, so I just wanted to make sure I'm on the right track.

There are a few people who frequent this community and are skilled in R and statistical analysis, so I've tagged them below:

@chris_love @MarqueeCrew @patrick_digan @BenMoss @JohnJPS @Patrick


Community Content Engineer

Market Basket Analysis is a modelling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items. It's a subset of affinity analysis.

It's an iterative process that has to build matrices of item/transaction combinations to derive the association rules or frequent itemsets, so it can be quite memory- and processor-intensive, depending on your machine specs.
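
To put rough numbers on that: with the ~2,000 unique items from the original post, an apriori-style search already has about two million candidate item pairs, and over a billion candidate triples, to potentially count. A back-of-the-envelope check in Python (the item count comes from this thread; everything else is just arithmetic):

```python
from math import comb

n_items = 2_000  # unique item identifiers, per the original post

# Candidate itemsets an apriori-style search may have to count:
print(f"2-item candidates: {comb(n_items, 2):,}")  # ~2.0 million
print(f"3-item candidates: {comb(n_items, 3):,}")  # ~1.3 billion
```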

There is not necessarily an "optimum" number of records (it depends on the number of transactions and/or the number of items per transaction), so you're definitely on the right track with chunking.
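
For intuition on why chunking helps, here is a minimal pure-Python sketch of the general pattern: count co-occurrences per chunk and merge the partial counts, so only one chunk of transactions is held in memory at a time. This illustrates the idea only; it is not what the MB Affinity tool does internally:

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Count how often each item pair co-occurs within one chunk."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(set(basket)), 2))
    return counts

def chunked_pair_support(chunks):
    """Merge per-chunk counts; only one chunk is in memory at a time."""
    total = Counter()
    for chunk in chunks:
        total.update(pair_support(chunk))
    return total

# Toy usage with two hypothetical "chunks" of baskets:
chunks = [
    [["milk", "bread"], ["milk", "eggs", "bread"]],
    [["bread", "eggs"], ["milk", "bread"]],
]
print(chunked_pair_support(chunks).most_common(3))
```

Raw co-occurrence counts are additive across chunks, so in this sketch the chunk size affects only memory and speed, never the merged result; whether the MB Affinity tool's output is similarly insensitive to chunk size is exactly the kind of thing worth testing.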

I'm interested in what the tagged Community users have experienced with these tools. Please post your findings back to this thread; it will undoubtedly help other users.

Alteryx Certified Partner

I can't offer too much advice here, as I have never used the Alteryx MB Affinity tool myself. I can, however, point you to a blog post I've written on Market Basket analysis, which explains why it can be so memory-intensive with large datasets:

https://benjnmoss.wordpress.com/2017/02/13/market-basket-analysis-in-alteryx/


My gut feeling is that chunking your dataset into smaller segments is likely to improve performance (please don't quote me on this, as it's just based on my understanding).


It would be interesting to see, using a subset of the data, whether the size of the chunks does in fact affect the output, and of course to identify which setting is quickest. Perhaps this is something you could have a go at before, as Criston says, feeding back to the community so others can use your knowledge!
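
If anyone does take that on, the experiment has a simple shape: hold the data fixed, sweep the chunk size, and record the runtime and output for each run. A hypothetical harness sketched in Python (the toy data, the chunk sizes, and the stand-in pair-counting function are all illustrative; a real test would execute the Alteryx workflow at each setting):

```python
import time
from collections import Counter
from itertools import combinations

def run_chunked(baskets, chunk_size):
    """Stand-in for one workflow run at a given chunk size."""
    total = Counter()
    for i in range(0, len(baskets), chunk_size):
        for basket in baskets[i:i + chunk_size]:
            total.update(combinations(sorted(set(basket)), 2))
    return total

baskets = [["a", "b"], ["b", "c"], ["a", "b", "c"]] * 50_000  # toy data

for chunk_size in (64_000, 128_000, 256_000, 512_000):
    start = time.perf_counter()
    result = run_chunked(baskets, chunk_size)
    elapsed = time.perf_counter() - start
    print(f"chunk_size={chunk_size:>7,}  pairs={len(result):,}  time={elapsed:.3f}s")
```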


Ben
