Hi All,
Hoping to pick you brains on best approach
Aim : For given ID I need to check if the values match and if its matches then then flag them as "Yes" if not equal then "No".
Currently : I have two data inputs, in both Inputs i have two columns (Id & ,Name), I have union'd the two data sets together
Input A
ID | Name |
11111 | Masond3 |
22222 | Apple Inc |
Input B
ID | Name |
11111 | Masond30 |
22222 | Apple Inc |
Union Output
ID | Name |
11111 | Masond3 |
11111 | Masond30 |
22222 | Apple Inc |
22222 | Apple Inc |
Expected outcome
ID | Name | Name Match |
11111 | Masond3 | |
11111 | Masond30 | No |
22222 | Apple Inc | Yes |
22222 | Apple Inc |
Solved! Go to Solution.
@Christina_HThis gets me out of a jam, so i can least proceed.
Do you know if there a way where where given the input you could set the rules ?
The rules to select the correct name are in the sort tool, the following unique tool takes the first name for each ID. So to change the rules you would need to adjust the sort.
What kind of rules were you thinking of?
So i am thinking that "Input A", is my "golden version of the Truth", therefore any other Input .i. B,C,Z etc, i want to compare those values against "Input A"
Now for scalability there also be other attributes that i would need to compare.
Example,
Address,
Website,
ID | Input | Name | Website | Address | Name Match | Website match | Address match |
11111 | A | Masond3 | www.masond3.ac.uk | 1 High St | Yes | Yes | |
11111 | B | Masond30 | www.masond3.ac.uk | 1 High St | No | Yes | Yes |
11111 | C | Masond3 | www.masond3.ac.uk | 1 high Stree | Yes | No | |
11111 | D | Masond2 | www.masond3.ac.uk | 1 High Street | No | Yes | No |
Updated workflow attached that will handle multiple columns. I've left the output transposed - it can be cross-tabbed to the output you've provided, but the table will get wider and harder to read with every new attribute you want to compare.
I've also removed the rule that took the majority version of the name (or address/website), the "correct" version will be from the earliest input. Depending on your data, you might want to update this to take the earliest non-null value?