HI
I am looking to compare data between 2 files - Pre and Post.
1.Col A to Col D in both files are columns which form key to compare data
2. Col E and Col F are being compared
3. When data matches , i am looking to populate "Match" in a new column for Col E and Col F
4.When data dsnt match, i am looking to populate "MisMatch"
5. When there is new key in Post, i am looking to populate N/A
6. Keys in both Pre and Post can be repetitive.
The bigger problem in this situation is that instead of 2 target columns that I need to compare, I actually have 100 columns that need to be compared. With little experience that I have, it looks like a batch macro might have to be used here but I have no idea on how to implement this. Any help is appreciated
Solved! Go to Solution.
There's the V3. But with a different result. The difference is based on the number of duplications. There are 3 rows with a1 b1 c1 d1 in Post and 2 rows in Pre. Comparing 3x2 gave me 6 possibilities. Your output is showing only 3. Please check what would be the correct output, and in case yours is correct, what is the rule to compare multiple x multiple.
Hi Andre
Thanks for your help so far.
The output I gave earlier is the correct output. There should only be 9 records in total in the output data.
Rule for comparison:
1. We are searching for the data in Post in the data in Pre , so the output data will always have same number of rows as in the Post data.
2. The data in Pre and Post is being compared basis combination of Col A,ColB,ColC,ColD as the key.
So, for a1b1c1d1 combination we have below values for ColE in Pre data
Pre : e1, e15 ( lets say Set1)
While comparing Post data now:
2nd row in Post - we have combination a1b1c1d1 and we check the value of ColE which is e1 and search if this e1 is present in the Set1. It is present and hence "Match"
9th row in Post - we have combination a1b1c1d1 and we check the value of ColE which is e17 and search if this e17 is present in the Set1. It is not present and hence "Mismatch"
10th row in Post - we have combination a1b1c1d1 and we check the value of ColE which is e15 and search if this e15 is present in the Set1. It is present and hence "Match"
Similarly for ColF also.
There will be duplicates in both Pre and Post. While displaying results in Output data, if Post data has any duplicates they will have to still be shown.
hmmm, per your description it looks like we need to compare all "sets" in post to only the very first set in Pre. Is it right?
Nope. Basically all sets in Post will have to be searched in all sets in Pre.
If a combination in Post is available in Pre, then check the corresponding (eg ColE , ColF) value in the Pre sets and if matches then "match" else "mismatch"
If a combination in Post is not available in Pre, then NA
Ok. But let's say I compare a set in Post with 2+ sets in Pre for a given column (E, for instance). 1 matches the other don't. Which one I pick?
Ok, let me explain this with example.
Scenario 1:
Post
Combination is a5b5c5d5
ColE value is e20
Pre
Combination is a5b5c5d5 and ColE value is e20
Combination is a5b5c5d5 and ColE value is e8
Combination is a5b5c5d5 and ColE value is e7
In this case ColE status in Post will be match
Scenario 2:
Post
Combination is a5b5c5d5
ColE value is e20
Pre
Combination is a5b5c5d5 and ColE value is e21
Combination is a5b5c5d5 and ColE value is e8
Combination is a5b5c5d5 and ColE value is e11
In this case ColE status in Post will be Mismatch
Scenario 3:
Post
Combination is a5b5c5d5
ColE value is e20
Pre
Combination a5b5c5d5 is not available
In this case ColE status in Post will be NA
Hope this helps
Ok. I wrote in a different way but it looks like in case of conflict, match "wins". I'll adjust the WF and send you
Post
Set1 (row1) e1 f1
Set2 (row8) e17 f1
Set3 (row9) e15 f2
Pre
Set1 (row1) e1 f1
Set2 (row7) e15 f1
For colE
Post1 x Pre1 = e1 x e1 then match (fits to your output)
Post1 x Pre2 = e1 x e15 then mismatch
Post2 x Pre1 = e17 x e1 then mismatch (fits to your output)
Post2 x Pre2 = e17 x e15 then mismatch (fits to your output)
Post3 x Pre1 = e15 x e1 then mismatch
Post3 x Pre2 = e15 x e15 then match (fits to your output)
For colF (all fits to your output)
Post1 x Pre1 = f1 x f1 then match
Post1 x Pre2 = f1 x f1 then match
Post2 x Pre1 = f1 x f1 then match
Post2 x Pre2 = f1 x f1 then match
Post3 x Pre1 = f2 x f1 then mismatch
Post3 x Pre2 = f2 x f1 then mismatch
Thanks Andre. I am testing it but it looks great as of now. Thanks a lot for your support on this so far and appreciate your patience on this. Solution looks great anyways. Cheers
Hi Andre
The workflow has been working well so far.However in case of huge records (300k) where combinations of columns in "Pre" are repeated multiple times within "Pre" (with different values in column e and column f - the workflow got stuck and processing temp data ran into 5000gb.
Regards