Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Weirdness in Full Outer Join

KaiLarsen
9 - Comet

We came across a weird "feature" in the Full Outer Join functionality. Depending on whether the join is conducted on fields that have the same name or not, it either behaves like I'm used to SQL functioning or quite differently.

 

Please see attached workflow with comments embedded. If interested, I can also provide examples with similar functionality from SQL Server and very different results. Don't want to suggest that something is definitely wrong, but am a bit worried about consistency between two approaches that most people will not realize result in very different results.

 

Kai :-)

7 REPLIES 7
michael_treadwell
ACE Emeritus
ACE Emeritus

The weirdness you are seeing is based on the field rename in the top join. Alteryx only produces the renamed fields (Right_TeacherId) from the J output. In this case, the Right_TeacherId is NULL except for the value 1 because the Union Tool is configured to align fields by name and no fields from the L or R output have the same name. Removing this column by unselecting it in the Join Tool or manually configuring your field positions in the Union Tool should give you the output you expect from a full outer join.

 

I will say that your bottom case is more interesting. Because of the different field names, you will have to manually configure your field positions in the Union Tool. Union cannot 'know' which fields to match so you get an erroneous join.

KaiLarsen
9 - Comet
Thank you so much for your prompt answer, Michael.

Let me see if I follow you here:
Are you arguing that because Alteryx developed the full outer join using two nodes that have different functionality, it is not only ok for the results to be different from what SQL would generate (under certain circumstances), but that Alteryx full outer join functionality should in fact be redefined away from what the (hopefully) hordes of future SQL-saavy users would expect so as to stay consistent with the individual tools used to deliver join functionality (unless the user starts really digging into the Union tool to somehow combine fields in such a way that they can recreate the results expected from the 50-year-old+ approach that the functionality was named after?

I notice that Tara and Rod gave the answer kudos. Does anyone on the Alteryx crew with historical context of the development of the Full Outer Join functionality have a sense of whether what we find is what you wanted to accomplish? It doesn't seem to be in line with Alteryx's reputation for making things easy. Perhaps rename functionality to "Full Outeryx Join" if it isn't meant to mimic SQL joins?
michael_treadwell
ACE Emeritus
ACE Emeritus

I am not arguing anything at all. I am simply explaining why you are seeing the outputs that you asked about. 

TaraM
Alteryx Alumni (Retired)

I assume that the functionality of the "Full Outer Join" you reference is in the help file for the Join Tool 

 

The reason we built this table was to show how you go about combining output of the Join tool to get other types of output because our Join tool is different than SQL Joins. The table probably needs some ammending to address duplicate and L and R field names. Your example is a bit confusing because the incoming field names are a mix of L and R and then you have a rename happening. The field list you see in the Join Tool includes all the fields from left and right inputs and these are what get output through the J output. Only UN-joined records come out of the R output and ONLY Rinput fields. Likewise with the L output - it only includes UN-joined records from the L input and only L input fields. If you simply de-select the redundant field, TeacherID, you get what you want here. We should be recommending to de-select redundant fields in the configuration in our documentation.

 

In your workflow you say: "There is also a column named TeacherId. One would logically assume that this is the TeacherId from the left table. However, this is not correct as it is introducing the TeacherId for Jin, which was not in the left table." 

 

You will notice going into the Union tool the connections #1, #2, #3...this is the order the field schema is being created which is why you see TeacherID and Jin.

 

I hope this helps clear things up. 

 

 

Tara McCoy
RodL
Alteryx Alumni (Retired)

Thanks for the comments!

 

To your point I gave kudos to @michael_treadwell because his response showed an understanding of the nuances of a Join in a coding language like SQL and how it works differently within the Alteryx platform.

 

While I can’t answer for the original rationale in how the Join tool was developed (since it has been one of the basic tools in the product for over 15 years), I do know that Alteryx is not trying to emulate the join functionality in SQL. This is apparent from the tool Help article that Tara references which refers to a Left and Right Unjoin…which has no corresponding function within SQL…unless you want to write the code with something like…

 

“Select * from A RIGHT JOIN B on A.ID = B.ID where A.ID is NULL”

 

Instead the Join tool provides a way to easily determine whether orphaned records are held within your data. (I have had customers tell me that this functionality is much appreciated as it provides an easy way to create exception reports back to their IT departments for inconsistent data.)

 

As Tara mentioned, the L and R sides of the Join tool are intended to provide the unmatched records. That’s clear from the diagrams in the configuration of the tool and the Help article. The user needs to keep in mind that what comes through the Join tool and out of the L or R sides is in reality unaffected by the Join tool…so the field names and data for the unmatched records will remain the same. The output from the Inner Join is the only output where field names are renamed if duplicates exist. Thus if you want to construct a Full Outer Join, one needs to account for the duplicate names coming from all three anchors. So in the example you provide, it’s actually not the Join where the misconfiguration is, but rather the Union tool. You have it set on “Auto Configure by Name”. In your configuration, with the Right source not being affected by the Join, it is automatically matching up with the Inner and Left outputs from the Join. Instead it should be matched up with the ‘right-sourced’ field name from the inner join. So you would need to “Manually Configure Fields” as below…

 

Manually Configure Union.png

 

So just as there’s a way to determine unmatched records in SQL that may not be quite obvious to someone who doesn’t understand SQL, the same is true that creating a Full Outer Join within Alteryx has its own unique way to create it (and if configured correctly, it is by definition a "Full Outer Join".)

 

Many “SQL-savvy” Alteryx users have found that knowing SQL can be extremely helpful when working with Alteryx. But they have also found that Alteryx is its own software platform with its own strengths and ways to approach data blending (and can be much easier than writing loads of SQL code Smiley Happy ). Your comment that “two approaches that most people will not realize result in very different results” will be true if you expect one software to act like another. (Which is true even across SQL platforms…for while most SQL platforms are SQL-92 compliant, there are still nuances between MSSQL, PostgreSQL, MySQL, Oracle PLSQL, and others that users of those platforms need to understand.) Trying to impose concepts from one programming platform to another without understanding how both work can indeed be problematic.

 

KaiLarsen
9 - Comet

Fair enough, @michael_treadwell. I definitely agree with @RodL that you your answer showed a strong understanding of the tools.  AND you answered the question of what happened beautifully.  As soon as I've determined that Alteryx (the company) is satisfied with this functionality, I'll mark your answer as solution. Thanks again.

KaiLarsen
9 - Comet

Dear @RodL,

 

Thanks for the answer. I have absolutely no problem with any of what you are stating. Love the unjoin functionality, and my message should not in any way be taken as a suggestion that SQL in some way is better than Alteryx for blending, etc. I love Alteryx, and if the choice is between SQL and Alteryx unchanged, I would go for Alteryx. To be clear, I was not comparing SQL and Alteryx as your comment suggested that you thought. Sorry for not being clearer about that.

 

What I was saying was this:

1. If I am doing a full outer join on two tables within Alteryx, Alteryx will create completely different results simply based on the column names I join on.  For example, if I join on FacultyId = FacultyId this will create dramatically different results than FacultyId = TeacherId. Because the problem arises from another node than the one I configure, I suspect many will be confused. Most databases will contain instances where one will have to join on fields that sometimes have the same names and sometimes on field that have different names. I wonder how many of your new user base (within last year or two) are aware?

2. This is not properly documented in the documentation I've seen, including the link provided by Tara.

3. When one appropriates a name ("Full Outer Join") from a standard as widely accepted and used as SQL, that obligates. It could be argued that having extra functionality (such as unjoins) does not change that.

 

Anyway, if there are no worries over having a documented functionality in Alteryx named "Full Outer Join" that sometimes creates results identical to a SQL "Full Outer Join" and sometimes creates incorrect results unless the user goes in and carefully changes the functionality of the union node, that is fine by me. I'm just a bit surprised that warnings about this is not plastered all over the join documentation (the problem applies to more than just the full outer join).

 

Thanks again for helping me understand!

Labels