Join with multiple conditions
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I need to do a blend that looks like a join with multiple conditions.
Here is source 1, my demographics:
primary id | secondary id | last name | first name | dob |
123456 | ABCDEF | smith | jacob | 1/1/2000 |
789012 | GHIJKL | anderson | michael | 6/27/2001 |
We get data back from a test vendor that looks like this:
secondary id | last name | first name | dob | test date | test score |
ABCDEF | smith | jacob | 1/1/2000 | 5/1/2017 | 20 |
anderson | michael | 6/27/2001 | 5/1/2017 | 25 |
When I import it, I need to use the primary key. No big deal. The problem comes when we have a student like in the second row above, where the secondary id didn't come over. Logically, I can match on name, but I am not sure how to implement. Is this a fuzzy match?
I come from SQL, and this is the type of logic I want to implement:
from testscores left outer join demographics on testscores.secondary_id = demographics.secondary_id or (testscores.last_name = demographics.last_name and testscores.first_name = demographics.first_name and testscores.dob = demographics.dob)
We are massaging this file to be imported into a data system, and I can't have any records match that shouldn't. I'd rather see a blank primary_id in the output file than a mismatched student. Fuzzy match is intimidating, but I haven't used it yet.
Solved! Go to Solution.
- Labels:
- Fuzzy Match
- Join
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
If there are variations in the names between the two tables, eg the same person is named "Mike" in one and "Michael" in the other, then you may need Fuzzy Matching.
To start with, try something like the attached with two Joins and a Union, and see what falls out, with some sample representative data of your problem situations, both the input data, and what you expect for output, then an example can be made. The dynamic rename is used to add a prefix instead of the default "Right_".
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Joe, this is great. I understood that very easily.
I'm not going to worry about fuzzy matching on nicknames yet, because the process already pretty much guarantees that legal names are used when it is gridded in. The ID number, however, or lack of one, depends on when they took the test. The big day uses a 9-digit ID, but if the kid takes it on his own, then very likely no ID number comes over.
I've realized that a mind shift I need to make is to stop thinking about doing everything in one step. Rather, it's about piping the output into different handler(s) based on results. Piping the Left into a second join also makes sure that the order of operations is preserved.
Thank you very much for the help.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hey Joe!
You're a beast!
I love you dude haha
