I have been tasked with reviewing two similar but different HR source data files. The analytics I need are:
The unique values available in certain fields, the % of rows that match between the sources, and the % null.
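To make the per-field part concrete, here is a rough sketch of what I have in mind, using only the Python standard library. The `profile_field` helper, the `Dept` field, the demo rows, and the set of values treated as null are all made up for illustration; the real rows would come from something like `csv.DictReader`.

```python
from collections import Counter

# Values treated as null. This set is an assumption; adjust it to the real files.
NULL_VALUES = {None, "", "NULL", "N/A"}

def profile_field(rows, field):
    """Return the unique non-null values (with counts) and the % null for one field."""
    values = [row.get(field) for row in rows]
    nulls = sum(1 for v in values if v in NULL_VALUES)
    counts = Counter(v for v in values if v not in NULL_VALUES)
    pct_null = 100.0 * nulls / len(values) if values else 0.0
    return {"unique": counts, "pct_null": pct_null}

# Made-up demo rows standing in for one of the HR files.
demo = [
    {"UniqueID": "1", "Dept": "HR"},
    {"UniqueID": "2", "Dept": ""},
    {"UniqueID": "3", "Dept": "IT"},
    {"UniqueID": "4", "Dept": "HR"},
]
result = profile_field(demo, "Dept")
print(sorted(result["unique"]), result["pct_null"])
```

Looping this over every column name in each file would give the unique values and % null for all 100+ fields.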
The only field that links the two datasets is a UniqueID. The fields in each are similar but not identical. Each dataset has over 100 fields, and each file is around 200k rows (all users in the company).
Would it help if I manually mapped the fields first? If so, how could I iterate through every field pair in that mapping to run the analysis?
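As a sketch of the mapping idea, assuming the mapping is a plain dict from a field name in the first file to the corresponding field name in the second: the field names, the `FIELD_MAP` dict, the helper functions, and the demo rows below are hypothetical. It also assumes each UniqueID appears at most once per file and only compares IDs present in both.

```python
# Hypothetical manual mapping: field name in source A -> field name in source B.
FIELD_MAP = {"Dept": "Department", "Title": "JobTitle"}

def index_by_id(rows, key="UniqueID"):
    """Index rows by the linking ID (assumes IDs are unique within a file)."""
    return {row[key]: row for row in rows}

def match_rates(rows_a, rows_b, field_map, key="UniqueID"):
    """For each mapped field pair, return the % of shared IDs whose values are equal."""
    a, b = index_by_id(rows_a, key), index_by_id(rows_b, key)
    shared = a.keys() & b.keys()  # only IDs present in both sources
    rates = {}
    for field_a, field_b in field_map.items():
        same = sum(1 for uid in shared if a[uid].get(field_a) == b[uid].get(field_b))
        rates[(field_a, field_b)] = 100.0 * same / len(shared) if shared else 0.0
    return rates

# Made-up demo rows standing in for the two HR files.
rows_a = [
    {"UniqueID": "1", "Dept": "HR", "Title": "Analyst"},
    {"UniqueID": "2", "Dept": "IT", "Title": "Engineer"},
    {"UniqueID": "3", "Dept": "IT", "Title": "Manager"},
]
rows_b = [
    {"UniqueID": "1", "Department": "HR", "JobTitle": "Analyst"},
    {"UniqueID": "2", "Department": "Finance", "JobTitle": "Engineer"},
    {"UniqueID": "4", "Department": "IT", "JobTitle": "Manager"},
]
rates = match_rates(rows_a, rows_b, FIELD_MAP)
for pair, pct in rates.items():
    print(pair, pct)
```

This compares values with exact equality, so differences in case, whitespace, or date formats would count as mismatches unless the values are normalised first.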
Appreciate any guidance on how to proceed. I can provide some demo data if that would help.
Adam