Alteryx Designer Desktop Discussions

Join Tools are taking too long

WishIKnewHowToCode
7 - Meteor

To preface this: I cannot share my raw data. It is several gigabytes of text files, and it is also protected health information, so I don't know of any way to upload a replication of my workflow for others to reproduce the issue.

Below is a screenshot of what I am experiencing. One of my inputs is transaction data on a line-by-line basis; the other is demographic information. When I join the two together, the size of the data jumps from 5 GB to 7.8 TB. At that point Alteryx basically chokes and can't continue processing, hanging there for hours. I tried letting it run overnight with no luck. What can I do?

[Screenshot: the workflow hung at the Join tool, with the joined output at 7.8 TB]

apathetichell
18 - Pollux

You are joining on non-unique columns (or on empty-to-empty / null-to-null matches), which creates the dreaded many-to-many join situation. Figure out which column you would expect to be unique and add a Summarize tool in group-by mode before your Join to prevent the ever-increasing row count...

 

Basically, think of it this way: your line-item data has zip code 11222.

You are matching against a file that should have one entry for 11222 (with its geographic data). It doesn't have one entry. It has ten.

Alteryx creates 10x each of your line items. This is normal Join behavior.
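
As a rough illustration of that blow-up (a minimal pandas sketch, not Alteryx itself; the zip/charge/region columns are invented for the example), deduplicating the lookup side before the join, the way a Summarize tool in group-by mode would, brings the row count back down:

```python
import pandas as pd

# Hypothetical line-item data: three transactions in ZIP 11222
line_items = pd.DataFrame({"zip": ["11222"] * 3, "charge": [100, 250, 75]})

# Lookup file that should have one row per ZIP, but actually has ten
geo = pd.DataFrame({"zip": ["11222"] * 10, "region": ["Brooklyn"] * 10})

# Many-to-many join: 3 line items x 10 lookup rows = 30 output rows
print(len(line_items.merge(geo, on="zip")))          # 30

# Dedupe the lookup side first (analogous to Summarize grouping by the key)
geo_unique = geo.drop_duplicates(subset="zip")
print(len(line_items.merge(geo_unique, on="zip")))   # 3
```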

 

WishIKnewHowToCode
7 - Meteor

@apathetichell So unfortunately, I am looking at remittances, which can have multiple lines with different bits of info, and in my situation I actually need all of those lines. Patient 123 might have 6 remits, going something like this: Remit 1 - paid, code 1; Remit 2 - denied, code 16; etc.

I need the full context of all of the lines for my end analysis.

apathetichell
18 - Pollux

OK, then think of it this way: you are running 7 billion rows of data. Either get more memory or partition the work using a batch macro.

 

And don't use the Data Cleansing tool. Use a Multi-Field Formula to cleanse whatever you need; Data Cleansing is a resource hog.
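
To illustrate the idea of cleansing only what you need (a minimal pandas sketch, not Alteryx; the column names are made up), trim and fill just the join-relevant fields instead of running every column through a blanket cleanse:

```python
import pandas as pd

df = pd.DataFrame({
    "patient_id": [" 123 ", "456", None],
    "remit_code": ["01 ", " 16", "22"],
    "notes": ["long free text we never join on"] * 3,
})

# Cleanse only the fields the join and analysis actually use,
# leaving the wide free-text columns untouched
for col in ["patient_id", "remit_code"]:
    df[col] = df[col].fillna("").str.strip()
```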

WishIKnewHowToCode
7 - Meteor

That's the kind of help I was asking for. Would you know where to point me for a batch macro that partitions data? That sounds really useful.

Qiu
20 - Arcturus

@WishIKnewHowToCode 
I think by "partition" we mean using a certain column of your data that represents a group name or something similar.
We then use the unique group names as the control parameter and feed them to the macro.
Each group is processed by the batch macro, and the results are unioned as the output.
There are batch macro samples in the community; maybe you can take a look.
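
A rough Python analogue of that batch-macro pattern (a sketch only; the file names, the facility_id partition column, and the patient_id join key are assumptions for the example) looks like this:

```python
import pandas as pd

transactions = pd.read_csv("transactions.csv")   # assumed file name
demographics = pd.read_csv("demographics.csv")   # assumed file name

group_col = "facility_id"                         # assumed partition column
results = []

# One "batch" per unique group value, like the control parameter feeding the macro
for group_value in transactions[group_col].unique():
    batch = transactions[transactions[group_col] == group_value]
    # Each batch joins only a slice of the transactions,
    # so the intermediate join stays small
    joined = batch.merge(demographics, on="patient_id", how="left")
    results.append(joined)

# Union the per-batch outputs, like the macro's output union
final = pd.concat(results, ignore_index=True)
```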
