Alteryx Designer Desktop Discussions

SOLVED

S vs T in the Append Fields Tool

aalayid7
5 - Atom

Hello there friends!

Please excuse my limited understanding of the tool, and apologies if this question has already been answered elsewhere. I searched online but couldn't find an answer.

Why does the smaller dataset need to be connected to the S (Source) anchor and the larger dataset to the T (Target) anchor? What is the significance of that?

Let's say I have the following datasets.

In the S:

aalayid7_0-1654442596588.png

and in the T:

aalayid7_1-1654442644030.png

I get 6 records as output:

aalayid7_2-1654442692242.png

The same thing happens if I flip it and make my larger dataset the Source instead of the Target: I still get 6 records, and each row is repeated the same number of times.

aalayid7_3-1654442846253.png

So I'm not sure why the Alteryx documentation recommends connecting the smaller dataset to the S anchor. Any explanation is greatly appreciated! Thanks!

5 REPLIES
DataNath
17 - Castor

I'm not 100% sure, but I would imagine it has to do with how computationally expensive the operation is. As you're joining all to all, this will always 'blow up' your data. In this small example it essentially matters less, but if you were conducting a full Cartesian join of a dataset with 10 fields/100,000 rows against one with 2 fields/4 rows, I'd imagine it would be less resource-heavy to append the smaller of the two! In a real use case, I'd guess that the ordering of your large dataset is more important, so doing it this way round would maintain that (I'd have to double-check that, though; you may need to disable AMP!)
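
To make the numbers in that example concrete, here is a minimal Python sketch of a nested-loop append. It is only an illustration under stated assumptions: `append_stream` is a hypothetical helper, and buffering the Source while streaming the Target is a guess at one possible strategy, not a description of how the Append Fields tool is actually implemented.

```python
def append_stream(target_rows, source_rows):
    """Hypothetical nested-loop append (an assumption, not Alteryx's code):
    buffer the source side, then stream the target side row by row."""
    buffered = list(source_rows)      # only the source is held in memory
    for t in target_rows:             # the target is read once, in order
        for s in buffered:
            yield t + s               # concatenate the fields of both rows

# Made-up data with the shapes from the example above.
large = ((i,) * 10 for i in range(100_000))   # 10 fields, 100,000 rows
small = [(j, j) for j in range(4)]            # 2 fields, 4 rows

rows = 0
width = 0
for row in append_stream(large, small):
    rows += 1
    width = len(row)

print(rows, width)  # 400000 12
```

Under this assumed strategy, the buffer holds 4 rows when the small dataset is the Source and 100,000 rows when the inputs are swapped, while the output is 400,000 rows of 12 fields either way.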

Qiu
21 - Polaris

@aalayid7 

I agree with @DataNath's comments.
I did some digging on "Cartesian join" but did not find anything directly suggesting that the dataset order is important.

These articles do not say anything about it either.

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Tool-Mastery-Append-Fields/ta-p/874...

https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Cartesian-Join-Cartesian-Product/ta...

mark007
8 - Asteroid

The number of output rows will always be the same regardless of the order of S and T. Joins are very CPU-intensive, and I believe the order does impact performance.

One other difference will be the order of the output. This can be resolved by changing the column order in the Append Fields tool and using the Sort tool to restore the row order, but in practice, if you want a particular order, use that input as the Target and avoid the extra complexity. This becomes particularly important if the order matters and the data cannot easily be re-sorted.
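
Both points can be sketched in plain Python with a Cartesian product. The inputs are made-up toy data (not the data from the screenshots), `append_fields` is a hypothetical stand-in for the tool, and iterating the Target in the outer loop is an assumption rather than confirmed Alteryx behavior.

```python
from itertools import product

# Hypothetical toy inputs.
small = ["a", "b"]     # 2 rows
large = [1, 2, 3]      # 3 rows

def append_fields(target, source):
    """Plain Cartesian product: every target row paired with every source row,
    with the target varying slowest (an assumed iteration order)."""
    return [(t, s) for t, s in product(target, source)]

large_as_target = append_fields(large, small)
small_as_target = append_fields(small, large)

# Same number of rows either way.
print(len(large_as_target), len(small_as_target))  # 6 6

# Different row and column order.
print(large_as_target)  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')]
print(small_as_target)  # [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3)]
```

In this sketch the rows of whichever input is the Target stay in their original sequence, which is why swapping the inputs would need a column reorder and a sort to get back to the same layout.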

aalayid7
5 - Atom

Mark, I totally agree; this is a very legitimate reason to follow this practice. Fewer tools in the workflow make for a better workflow.

aalayid7
5 - Atom

Thanks, all, for the great insights you shared with me!
