community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Order of operations for Join tool (question on the internals of the join)

Aurora
Aurora

@BenG did a presentation at Inspire on some best practices - one of which is removing unnecessary fields from data BEFORE a join, rather than throwing them away using the join, since join requires a sort which is an expensive operation if you're carrying unnecessary data.

 

Questions on the internals of the join, so that we can all get better at using this well:

- In cases like this - is the join tool smart enough to throw away the extraneous data before doing the sort, or does it carry all this info through a sort?   If it's carrying all this data into the sort (to enable the join), then this feels like a relatively quick optimization to do (i.e. trim out any non-join fields that are discarded in the join as a first step), and I'll log the idea

- does Alteryx actually sort all the data, or does it just sort a list of pointers to memory addresses?    If the former, then we should be aggressive in trimming data columns going into a sort of any kind.   If the latter, then extra columns really should not make much of an impact?

 

 

 

Example below:

 

2017-09-26_8-28-29.png

 

 

2017-09-26_8-33-01.png

 

cc: @AdamR

Select is done post Join - the L and R outputs contain unmodified data sets

 

Rough sequence is (I think - but need someone at Alteryx to confirm):

1. Sort L input by left set of fields in join

2. Sort R input by right set of fields in join

3. Perform join (scan down both sorted sets)

4. Apply select on O output as record pushed into it

 

Worth noting that a Find and Replace does not sort the L input only the R.

Aurora
Aurora

Thanks James - if this is correct then there's an easy technical optimization that can be done with the Join tool to include a pre-processing step removing unused fields.

I'll keep this unsolved for now to allow the internal dev team at Alteryx to opine.

Highlighted
Founder
Founder
The join does remove unneeded field before the sort. As you say, it is an obvious optimization.
Aurora
Aurora

Thanks Ned - appreciate you confirming, so there's no overhead in filtering in the join (as opposed to filtering them out with a select before the tool).

 

Founder
Founder

I should have added - not only does it filter out unneeded fields ahead of time, if there is enough memory it will also just do the sort by moving pointers around, not the whole records.  Of course if there isn't room in memory, it does require that the whole records get written to disk.

 

Also, @jdunkerley79 is correct about the basic order - the select does happen after because often you won't output the join fields from 1 side, even though they are needed in the original sort to do the join.  That means there are 2 different filtering of the output fields.

Labels