I couldn't think of a proper title, but I'll try to explain here.
Let's take three data sets:
Input 1
Fruit | Color |
Apple | Red |
Pear | Green |
Input 2
Fruit | Ripe? |
Apple | Yes |
Pear | Yes |
Input 3
Fruit | Cost | Amount |
Apple | 1.00 | 5 |
Pear | 1.50 | 2 |
Peach | 0.50 | 1 |
I'd like to join all three data sets so it looks like this:
Desired Result:
Fruit | Cost | Amount | Ripe? | Color |
Apple | 1.00 | 5 | Yes | Red |
Pear | 1.50 | 2 | Yes | Green |
Peach | 0.50 | 1 |
I set "Join by Specific Fields" and list "Fruit" as all 3 inputs.
On the outputs, I select all fields including Cost, Amount, Ripe and Color. I also need to select "Fruit" from one of the inputs. The problem is, if I select from input #1 or input #2, peach does not show up. if I select from #3, Apple/Pear do not show up. If I select from more than one, we end up with a result like this:
Fruit#1 | Cost | Amount | Ripe? | Color | Fruit #2 |
Apple | 1.00 | 5 | Yes | Red | |
Pear | 1.50 | 2 | Yes | Green | |
0.50 | 1 | Peach |
I thought about a union tool, but then will simply stack them on top of one another rather than layer them in as columns.
Any idea how to get this one to work without some cumbersome workaround?
Solved! Go to Solution.
Hi @CGIQV,
Whenever I want to control the records that do not join, I tend to shy away from using the multiple join tool and just bring multiple regular joins on the canvas so that I can union after each individual join. In the attached example, it would give you the desired output no matter which combination of fruits are in each file.
Best,
mmenth
The results look a bit incomplete at first because of not having a direct match for a join across all three inputs. Attaching a workflow with a couple ways to get the desired output - one join at a time and an outer join, or using that Join Multiple with a select tool to re-organize the fields.
Yep, that's exactly what I was thinking.
I was literally typing out a response to my own post and clicked back quickly and saw your workflow which appears to do exactly what I was trying to explain below....
Option 2 on this one seems to be the most straight forward.
Thank you.
EDIT: In hindsight, this would only work if there was a single input that encapsulated all fruits in the dataset. In this case, input #3 does, but this may not always be the case.
@CGIQV - Totally correct, it's definitely the less dynamic of the options. To be the most flexible you'd want to use joins/unions to do complete outer joins to make sure you're not missing anything from the Join Multiple.
You can also add a formula after the union that say if fruit field is null, use right fruit field which would take care of those situations.