I have been trying to find an efficient way to compare 2 data sets to find missing or extra information, and was wondering if anyone would be able to help me out.
I have the below example. Dataset 1 is sort of a "key", where I have a set of values in column B that should be associated with a set value in column A. In dataset 2, I can have multiple records that contain the set value in column B, and then the values I want to compare are in column C. So I want to use the relationships of column A-B on Dataset 1 to compare to Column B-C on dataset 2. I have included my ideal output as well.
Solved! Go to Solution.
Hi @IMensink,
The Join tool is your friend for this. By Joining data and using the left and right unjoined rows you can find where each data set doesn't match the other. Additional data from Dataset 2 (Column A) must be added to Dataset 1 (Summarize and Join) to allow for all fields to be compared in the 2nd Join. The remaining tools rename, change field type, summarize, concatenate, and Join/Union the data together in the format you are looking for.
Thank you @T_Willins, very helpful!
Hi @T_Willins
Please could you help me if this solution will work for my requirement which is similar to this.
I have to find the missing Type Group (rows) with the Lookup.
Lookup Table columns Seq Code, Type Code matched with the sample Data .. and any missing Type Group must be reported for each Seq# for the Seq Code, Type Code combination.
Sheet Unit_Test contain the example which is expected to be in output.
Thanks
Meena
Hi @Meena,
From your description and your data it looks like you need to validate the data from the lookup rules is included in the Input_Entire Volume for each Seq #. The attached workflow produces the results on the Macro Output_Entire Volume tab, with what looks like additional missing records. For example, your Excel file indicates Seq Code: 300; Seq #: 9450645163; Type: 4 is missing Child, but the Lookup_Rules indicates this Seq Code/Type Code combination should have Type Groups Child, Father, and Mother. The input data only has Type Group Father, so the Output should have both Child and Mother as missing.
Thank you @T_Willins.
This solution worked.