@BenG did a presentation at Inspire on some best practices - one of which is removing unnecessary fields from data BEFORE a join, rather than throwing them away using the join, since join requires a sort which is an expensive operation if you're carrying unnecessary data.
Questions on the internals of the join, so that we can all get better at using this well:
- In cases like this - is the join tool smart enough to throw away the extraneous data before doing the sort, or does it carry all this info through a sort? If it's carrying all this data into the sort (to enable the join), then this feels like a relatively quick optimization to do (i.e. trim out any non-join fields that are discarded in the join as a first step), and I'll log the idea
- does Alteryx actually sort all the data, or does it just sort a list of pointers to memory addresses? If the former, then we should be aggressive in trimming data columns going into a sort of any kind. If the latter, then extra columns really should not make much of an impact?
Example below:


cc: @AdamR_AYX