Comparing Two Large HR data files

Question

I have been tasked with reviewing two similar but different HR source data files.  Some analytics needed:

The unique values available in certain fields... % of rows that match between the sources, % Null

The only field that links these two data sets is a UniqueID.  The fields for each are similar but not identical.  There are over 100 fields for each dataset and each file is around 200k rows (all users in the company)

Would it help if I could manually map the fields first?  How could I iterate through every combination of that mapping to do the analysis for the data?

Appreciate any guidance on how to proceed.  I can provide some demo data if that would help.

Adam

JordanB · Accepted Answer

Hi @aehrenwo

Here are some god resources that can be used to answer a few of your scenarios below.

Field Summary tool will give you the unique values available in certain fields and % Null

Join Tool on Unique ID will give you those records which match and un-match. You can then use the count records tool on the J (Joined records) and take that value away from the total for all records to get the % match on unique ID.

To compare everything you could do a similar process to above, but before the Join you could look at transposing the data and then joining. An example can be found in this article.

Hope this provides more clarity as to how you can go about this.

Best,

Jordan Barker

Solutions Consultant