I regularly get data sets that I need to join (on 1 or more fields) and the first thing I want to know is how much of the data is common and will join, and how much will fall out of the process.
The fall out may be bad data that needs cleaning or is legitimately dissimilar. For Example, take two data sets to compare:
| Data Set One | | | | | Data Set 2 | | | |
| ROLE | LEVEL | TEAM | Classification | | Type | Category | Experience | Scrum |
| Analyst | Senior | Full Stack | Big Data | | Java SE | Cloud Native | Junior | Digital |
| Analyst | Junior | Waterfall | Blockchain | | Java SE | Java, BA | Junior | Niche |
| DEV | Mid | Standard | Design Systems Designer | | PM | Big Data | Intermediate | Standard |
| DEV | Senior | Full Stack | Design Systems Engineer | | Tester | Cloud | Senior | Standard |
| Java | Junior | Waterfall | Cloud Native | | Data Scientist | Azure | +5 Years | Digital |
| Java | Mid | Standard | Cloud Native | | Data Modeller | Big Data | Senior | Waterfall |
Project Manager | Senior | Standard | .Net | | Developer | Design Systems Designer | +5 Years | Niche |
I want to make a macro where I can take the two data sets, select the Fields that I want to compare, so for the data above, I want a selection similar to a Join tool, where I select fields:
- Data Set One - ROLE <-> Data Set 2 - Type
- Date Set One - LEVEL <-> Date Set 2 - Experience
- Date Set One - TEAM <-> Date Set 2 - Scrum
- Date Set One - Classification <-> Date Set 2 - Category
Each field value comparison will be independent. I've seen examples of how to create a single list of the field names, but not one that allows me to select and match up both sets of field names. Has any one got a solution here?