Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

MB Affinity - optimum number of records per data chunk?

5 - Atom

I'm trying to run an MB Analysis on a very large dataset (500+ million rows, ~2,000 unique item identifiers). It's probably a dead end with the MB Inspect / MB Rules tools, as it seems like too much data to get them to work whenever I've tried! So I'm now attempting to use the MB Affinity tool, as according to a few other forum posts it's a lot quicker. Is there an optimum number of records per data chunk in the tool's configuration, though? (Or is it counter-intuitive like the sort/join memory setting, where a lower number is better?!)

Mine is currently set at 256,000 records per chunk, but should this be higher or lower to optimise the workflow? I'm running on a server with 256 GB of RAM, so memory isn't too much of an issue, I hope!

16 - Nebula

Hey @atan87,

Sorry if this is a ridiculous question, but can you confirm that you are referring to Market Basket analysis when you mention "MB Analysis"? I've not used these tools much, so I just wanted to make sure I'm on the right track.

There are a few people who frequent this community who are skilled in R or statistical analysis, so I've tagged them below.

@chris_love @MarqueeCrew @patrick_digan @BenMoss @JohnJPS @Patrick
Alteryx Community Team

Market Basket Analysis is a modelling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items. It's a subset of affinity analysis.
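To make that concrete, here is a minimal sketch of the same idea outside of Designer, using Python's mlxtend package on a toy set of baskets (the items and thresholds are made up purely for illustration, and this isn't what the Alteryx tools run internally):

```python
# Minimal sketch of association-rule mining (the idea behind Market Basket
# Analysis) with mlxtend. The transactions and thresholds are toy values
# chosen only to illustrate the concept.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "crisps"],
    ["bread", "butter"],
    ["beer", "crisps", "milk"],
]

# One-hot encode the baskets into a boolean item matrix
te = TransactionEncoder()
item_matrix = pd.DataFrame(te.fit(transactions).transform(transactions),
                           columns=te.columns_)

# Frequent itemsets above a minimum support, then rules above a confidence cut-off
frequent = apriori(item_matrix, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)

# e.g. {butter} -> {bread}: "if you buy butter you are more likely to buy bread"
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```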

It's an iterative process that needs to build matrices of combinations of the items/transactions in order to create the association rules or frequent itemsets, so it can be quite memory- and processor-intensive, depending on your machine specs.
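As a rough back-of-the-envelope illustration of that growth, with the ~2,000 distinct items mentioned in the original post the candidate combinations pile up very quickly:

```python
# Rough illustration of the combinatorial growth of candidate itemsets
# for ~2,000 distinct items (the figure mentioned in the original post).
from math import comb

n_items = 2000
print(f"candidate pairs:   {comb(n_items, 2):,}")   # 1,999,000
print(f"candidate triples: {comb(n_items, 3):,}")   # 1,331,334,000
```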

There is not necessarily an "optimum" number of records (it depends on the number of transactions and/or the number of items per transaction), so you're definitely on the right track with chunking.

I'm interested in what the tagged Community users have experienced with these tools. Please post your findings back to this thread; it will undoubtedly help other users.

Alteryx Certified Partner

I can't offer too much advice here, as I have never used the Alteryx MB Affinity tool myself. I can, however, point you to a blog post I have written on Market Basket analysis, which explains why it can be so memory intensive with large datasets.

https://benjnmoss.wordpress.com/2017/02/13/market-basket-analysis-in-alteryx/

 

My gut feeling is that chunking your dataset into smaller segments is likely to improve performance (please don't quote me on this, as it's just based on my understanding).

It would be interesting to see, using a subset of your data, whether the size of the chunks does in fact affect the output, and of course to identify which chunk size is quickest. Perhaps this is something you could have a go at before, as Criston says, feeding back to the community so others can use your knowledge!
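If it helps, a rough outline of that experiment in plain Python (using mlxtend rather than the MB Affinity tool itself; the chunk sizes and sample data below are placeholders) could look something like this:

```python
# Rough sketch of the experiment suggested above: take a sample of
# transactions, try a few chunk sizes, and time a rule-mining pass per chunk.
# This is plain Python/mlxtend, not the Alteryx MB Affinity tool, and the
# chunk sizes / sample data are placeholders for illustration only.
import time
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

def mine_in_chunks(transactions, chunk_size, min_support=0.01):
    """Run apriori separately on each chunk and collect the frequent itemsets."""
    results = []
    for start in range(0, len(transactions), chunk_size):
        chunk = transactions[start:start + chunk_size]
        te = TransactionEncoder()
        item_matrix = pd.DataFrame(te.fit(chunk).transform(chunk),
                                   columns=te.columns_)
        results.append(apriori(item_matrix, min_support=min_support,
                               use_colnames=True))
    return pd.concat(results, ignore_index=True)

# transactions = ...  # a sample of your real baskets, as lists of item IDs
transactions = [["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]] * 1000

for chunk_size in (500, 1000, 2000):  # placeholder chunk sizes to compare
    started = time.perf_counter()
    itemsets = mine_in_chunks(transactions, chunk_size)
    elapsed = time.perf_counter() - started
    print(f"chunk_size={chunk_size}: {len(itemsets)} itemsets in {elapsed:.2f}s")
```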

 

Ben
