I am having a join issue.
1,054 records from Input A. Input A = 2021
3,222 inputs from Input B. Input B = 2018-2021.
Every record in A has a corresponding record in B. But, B has a few records missing from A.
I want to find the missing records in B and join them to A. To do this I have to ignore prior years.
Right now my join is outputting 1.7 million records.
I have attached a poor attempt to replicate my issue at a much smaller scale.
UPDATE/CLARIFICATION:
In current example, A only represents 1 year of data. In the real world, it could represent multiple years of data. In both cases I am trying to find the missing records in B and join them to A.
I am also trying to avoid manual filters. Ideally, I want A to filter B or align B based on year in A without human interaction.
Help. Thanks!
Hi @hellyars,
I am not sure what you want to achieve.
This is the filter that you can use in order to keep only 2021 data.
What columns do you want to use in order to join the data that you weren't able to join?
If ID I just used the data that I filtered out based on the year column.
If this is now what you are looking for can you provide us with a little bit more details?
@Emil_Kos I am trying to avoid the pre-filters because ultimately I want to run all years simultaneously OR I want the input A to tell Input B to filter on Input A's year (without human interaction).
@Emil_Kos In the real data I already knew how join (similar to what you suggested) if the Inputs were 1 year each and I could throw filters in here and there. Things got complicated when B became multiple years (A can to). I have 1,054 records turning into 1.7 million. And I am only joining on Year and ID.
Hi @hellyars,
Now I get it.
If you are having duplicates in the data set you need to use sample tool or find and replace instead of joining.
I am not sure what another approach to propose. Maybe you can summarize the data for the duplicated positions in the data set B in order to have one line per ID?