Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Join tool duplicate confusion

Alteryxuserhere
8 - Asteroid

Hi there

Can someone explain to me in simple term how I can use the join tool to check what does not have a match? The right output seems to generate more duplicates that what was in the Right input file. 

How do I just get it to show the exact rows that dont match? The input file has duplicates which I would like to not remove and keep in the output file.

When I try to link the R output to another join, it is generating alot of rows? How do I avoid it? 

 

Thanks!

 

 

13 REPLIES 13
OTrieger
12 - Quasar

As you are using the same data, the one the comes from the Left side after the first Join is the same data without the matches. But still the same data, so the minute that you joining again you are getting Null items on the L, J you will get all the items that comes from L and on R you will have the items that were joined on the first Join

OTrieger
12 - Quasar

Yes as you are using only the right side data.

If you would like to find additional matches that you did not find in the first time, then you will need to connect the Left side of the first Join to the Left side of the 2nd Join.

OTrieger
12 - Quasar

Actually it will be even better to Connect the L from 1st Join with the R from the 1st Join. In this way you will find additional matches from the parts that still not matching. The minute that you are taking the original R then you might have duplication that matched on the first time and also on the 2nd time. But I'm not sure exactly what you would like to achieve here.

ChrisTX
16 - Nebula
16 - Nebula

@Alteryxuserhere a little suggestion:

 

  • identify your data sources going into each Join:
    • your "primary" original input A:  feed this data into the first Join, Left input
    • your secondary input B:  feed this data into the first Join, Right input

The records not matched from the first join will be data coming OUT of the first Join, Left output.

 

For the second Join tool, I think you 'll want to use:

  • Left input: data coming OUT of the first Join, Left output
  • Right input option 1: do you want to go back to the original secondary input B?  this may create duplicates.
  • Right input option 2: you likely want to use the data coming OUT of the first Join, Right output.

See example of multiple Join tools, using a "waterfall" approach, here:

  • Inside Alteryx Designer: Help > Sample Workflows > Use scripting and automation tools > Build a macro > Merge to master file with fuzzy matching
  • You don't have to use the Fuzzy Match tool, just look at the "waterfall" flow and see where the Left and Right inputs come from
  • The "waterfall" approach is:  try matching using first criteria; for those records that don't match try matching using second criteria

 

Chris

Labels