Hi All,
I have an issue with trying to use the join tool like vlookup.
Here's the issue:
Dataset A contains "Identifier1"
Dataset B contains "Identifier2"
(most of the customers overlap, but their two identifiers, 1 & 2, are different from each other)
Legacy Dataset - contains "Identifier1" and "Identifier2"
I am trying to find "new" customers where they do not match on either of their identifiers in the Legacy Dataset.
I perform a join where I am using both criteria in the join - then I can export the file and do vlookups to see if there are any matches remaining- there are.
How can I get the join to omit where there is any match between the datasets?
Solved! Go to Solution.
One way I can think of doing this would be to have a Join tool with the legacy dataset in the R input and Dataset A in the L input, then have a second join tool that has Legacy Dataset in R input and Dataset B in the L input, Then you can union the Two L output anchors (unmatched records from A and B) or you can join them together by some other identifier you have. From there you can filter for the uniques using the Unique tool and output
Hi Tessa,
This is a little hard to follow.
Sounds like perhaps matching to the first dataset and taking the unmatched (left) output and then matching against the second and again taking the unmatched output could work.
But its not clear to me if your datasets A and B are both discreet. Or is there a customer_id that they share?
If you can post some sample files or images that would clear things up i think.
Oh, how I wish that would work! That's what I was initially trying to do and getting nowhere. I'd still find matches outside, when validating with a vlookup.
I'm just not sure what I am doing wrong here!
It's hard even for me to follow! I'll try my best to clarify...
I am trying to find "new customers" based on their IDs and whether they've shown up in our database before - there are two possible IDs that they could have- ID#1 and ID#2. We'll call this "New Customer DB"
In the "legacy" database, each customer has both an ID#1 and an ID#2.
1. "New Customer DB" contains both ID#1 and ID#2
2. Join "New Customer DB" to "Legacy" on ID#1
3. Join "New Customer DB" to "Legacy" on ID#2
A customer can match on one but not the other - so in some instances you may see a customer who doesn't match on ID#1 but does match on ID#2. Same in reverse- match on ID#2 but not on ID#1. Neither of those customers are "new."
After I complete steps 1-3, I need to bring this data together in an additional join; my attempts at this have been fruitless.
I am able to export steps #2 & #3 and find all new customers using 2 vlookups and some filters, but I would like to be able to accomplish that in the workflow....
Hi @Tessa
This is one those issues where sample data is extremely useful. I've built something based on what I think your 2 inputs look like, but your statement ""New Customer DB" contains both ID#1 and ID#2" is ambiguous. Does it mean that New Customer DB has one ID column that can contain values from both legacy ID columns, or does New Customer DB have an ID1 and an ID2 column.
This solution assumes the latter, both tables contains an ID1 and and ID2 column
It uses 2 joins and then unions the results and performs a unique on New customer name. This gives you the new customers that match one of 2 legacy ids. This list is joined with the New Customers and the R output is used to give you the New Customers that don't match either.
If this isn't what you're looking for, please provide some sample data that covers all the possible cases and what the output will be. Modify the contents of the the legacy and new customers text inputs and post the workflow back here
Dan
Dan- thank you for this! This is a great perspective - I was stuck my approach and your angle on the problem helped out. I believe your solution works!
Thanks so much!