Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Join to Find Difference Instead Creates Thousands of Duplicate Records

hellyars
13 - Pulsar

I am having a join issue.

 

1,054 records from Input A.  Input A = 2021

3,222 inputs from Input B.  Input B = 2018-2021. 

Every record in A has a corresponding record in B.  But, B has a few records missing from A.

I want to find the missing records in B and join them to A.  To do this I have to ignore prior years.

Right now my join is outputting 1.7 million records.  

 

I have attached a poor attempt to replicate my issue at a much smaller scale.

 

UPDATE/CLARIFICATION:

In current example, A only represents 1 year of data.  In the real world, it could represent multiple years of data.  In both cases I am trying to find the missing records in B and join them to A.  

 

I am also trying to avoid manual filters.  Ideally, I want A to filter B or align B based on year in A without human interaction.  

 

 

Help. Thanks!

 

 

5 REPLIES 5
Emil_Kos
17 - Castor
17 - Castor

Hi @hellyars,

 

I am not sure what you want to achieve.

 

This is the filter that you can use in order to keep only 2021 data. 

 

Emil_Kos_0-1611077498983.png

What columns do you want to use in order to join the data that you weren't able to join?


If ID I just used the data that I filtered out based on the year column.

 

If this is now what you are looking for can you provide us with a little bit more details?

 

Emil_Kos_1-1611077693571.png

 

 

hellyars
13 - Pulsar

@Emil_Kos   I am trying to avoid the pre-filters because ultimately I want to run all years simultaneously OR I want the input A to tell Input B to filter on Input A's year (without human interaction).

 

Emil_Kos
17 - Castor
17 - Castor

Hi @hellyars,

 

can you add year to the join as an additional joining field?

hellyars
13 - Pulsar

@Emil_Kos   In the real data I already knew how join (similar to what you suggested) if the Inputs were 1 year each and I could throw filters in here and there.   Things got complicated when B became multiple years (A can to).   I have 1,054 records turning into 1.7 million.  And I am only joining on Year and ID. 

Emil_Kos
17 - Castor
17 - Castor

Hi @hellyars,

 

Now I get it. 


If you are having duplicates in the data set you need to use sample tool or find and replace instead of joining. 

 

I am not sure what another approach to propose. Maybe you can summarize the data for the duplicated positions in the data set B in order to have one line per ID? 

 

Labels