SOLVED

Join Tool stuck on 50%

KMadamba
8 - Asteroid

I have an Alteryx database (.yxdb) that's about 74 MB, and I'm trying to join it on two columns. The Join tool gets stuck at 50%. I figured it's because the join is producing a very large output.

 

I saw that the Find Replace tool can be a good alternative to the Join tool and should make the module run faster. So after the join I deselected the other columns and kept just the Record ID column (which is also my unique field), so I can append the other fields back later.
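(Outside of Designer, the idea looks roughly like this in pandas; the frames and column names below, including "KeyA", are made up for illustration.)

import pandas as pd

# Hypothetical stand-ins for the two sides of the Join tool;
# "RecordID" plays the role of the unique field mentioned above.
left = pd.DataFrame({"RecordID": [1, 2, 3], "KeyA": ["x", "y", "z"], "Other": [10, 20, 30]})
right = pd.DataFrame({"KeyA": ["x", "y", "z"], "Lookup": ["a", "b", "c"]})

# Join on the key but carry only RecordID from the left side, so the joined output stays narrow.
narrow = left[["RecordID", "KeyA"]].merge(right, on="KeyA", how="inner")

# Append the remaining fields back later using the unique RecordID.
full = narrow.merge(left.drop(columns=["KeyA"]), on="RecordID", how="left")
print(full)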

 

However, by doing this, I received this warning:

Warning: Find Replace (87): The source input is optimized for datasets of ~100,000 or less. Performance may be improved by using a join instead.

 

I've run out of ideas on how to get around this. Please help.

Thank you in advance.

3 REPLIES
cmcclellan
13 - Pulsar

It's probably creating a cartesian product, although there's not really enough information to go on.

 

Do a sanity check on what you're joining, how many records you expect out of the join, and any potential "oops, it shouldn't be doing that".

 

I was doing a join a few hours ago that creates just over 11 million records, but I was expecting that before I ran the flow.  Luckily for me each side only has a few fields so the join happens very quickly.
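A quick way to do that expected-record-count check outside of Designer is to count how often each join key appears on both sides and sum the products. A small pandas sketch (the key name "KeyA" is hypothetical):

import pandas as pd

def expected_join_rows(left: pd.DataFrame, right: pd.DataFrame, key: str) -> int:
    """Estimate inner-join output size: sum over shared keys of count_left * count_right."""
    left_counts = left[key].value_counts()
    right_counts = right[key].value_counts()
    # Index alignment pairs up shared keys; keys missing on one side become NaN and are dropped.
    return int((left_counts * right_counts).dropna().sum())

# Duplicate keys on both sides multiply out quickly:
l = pd.DataFrame({"KeyA": ["x"] * 1000 + ["y"] * 10})
r = pd.DataFrame({"KeyA": ["x"] * 1000 + ["y"] * 10})
print(expected_join_rows(l, r, "KeyA"))  # 1000*1000 + 10*10 = 1,000,100

If that number comes out far larger than either input, the join key isn't as unique as assumed.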

danilang
19 - Altair

Hi @KMadamba 

 

Like @cmcclellan said, your join is probably resulting in a Cartesian product. How many rows are in your input DB? Alteryx DBs are highly compressed, and a 74 MB .yxdb can easily contain millions of rows, resulting in self-joined data sets with 10^10 or more rows.

 

I'm not certain exactly how the Join tool functions internally, but it must do something equivalent to sorting the data on one side and then performing an index or binary search on that side for each record from the other. These can be expensive operations even with the most efficient algorithms, so limiting the number of records (and, to a lesser extent, the number of columns) on both sides is key.
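(For what it's worth, here is that sort-and-search idea sketched in plain Python; this is not Alteryx's actual implementation, just the equivalent logic: sort one side once, then binary-search it for each record on the other side.)

from bisect import bisect_left, bisect_right

def sort_search_join(left, right, key):
    """Join two lists of dicts on `key` by sorting one side and binary-searching it."""
    right_sorted = sorted(right, key=lambda row: row[key])   # sort once, O(n log n)
    keys = [row[key] for row in right_sorted]
    for lrow in left:                                         # O(log n) lookup per left record
        lo, hi = bisect_left(keys, lrow[key]), bisect_right(keys, lrow[key])
        for rrow in right_sorted[lo:hi]:                      # every match adds an output row
            yield {**lrow, **rrow}

left = [{"id": 1, "k": "x"}, {"id": 2, "k": "y"}]
right = [{"k": "x", "v": 10}, {"k": "x", "v": 11}, {"k": "y", "v": 20}]
print(list(sort_search_join(left, right, "k")))

The slice between lo and hi is where the Cartesian blow-up happens: if a key repeats on both sides, every left occurrence pairs with every right occurrence.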

 

Another point to consider is your available RAM. Is your data set ballooning to the point where it needs to start swapping to disk?

 

Dan

KMadamba
8 - Asteroid

Thank you @cmcclellan and @danilang.

 

I split the big data into 4 .yxdb files based on the file path they were originally taken from and was able to narrow the issue down: it only gets stuck on one particular batch. That batch is also the biggest file of them all, at 27 MB with 600k records and 32 columns. The other 3 batches ranged from only 5 MB to 14 MB with the same number of columns and ran without problems.
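(For reference, the batch-by-batch approach looks roughly like this in pandas; the "FilePath" grouping column and the lookup frame are hypothetical.)

import pandas as pd

# One big frame split into batches by a FilePath column, mirroring the four .yxdb files above.
big = pd.DataFrame({
    "RecordID": range(6),
    "FilePath": ["a.csv", "a.csv", "b.csv", "b.csv", "c.csv", "c.csv"],
    "KeyA": ["x", "y", "x", "y", "x", "y"],
})
lookup = pd.DataFrame({"KeyA": ["x", "y"], "Lookup": [1, 2]})

# Join one batch at a time so an oversized batch is easy to spot and isolate.
parts = []
for path, batch in big.groupby("FilePath"):
    joined = batch.merge(lookup, on="KeyA", how="inner")
    print(path, len(batch), "->", len(joined))
    parts.append(joined)

result = pd.concat(parts, ignore_index=True)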

 

Also, I currently have 4 GB of RAM. I will look into upgrading if all else fails.
