Alteryx Designer Desktop Discussions

Kb7980 · ‎11-01-2021

HI

I am looking to compare data between 2 files - Pre and Post.

1.Col A to Col D in both files are columns which form key to compare data

2. Col E and Col F are being compared

3. When data matches , i am looking to populate "Match" in a new column for Col E and Col F

4.When data dsnt match, i am looking to populate "MisMatch"

5. When there is new key in Post, i am looking to populate N/A

6. Keys in both Pre and Post can be repetitive.

The bigger problem in this situation is that instead of 2 target columns that I need to compare, I actually have 100 columns that need to be compared. With little experience that I have, it looks like a batch macro might have to be used here but I have no idea on how to implement this. Any help is appreciated

AZuc · ‎11-03-2021

@Kb7980

There's the V3. But with a different result. The difference is based on the number of duplications. There are 3 rows with a1 b1 c1 d1 in Post and 2 rows in Pre. Comparing 3x2 gave me 6 possibilities. Your output is showing only 3. Please check what would be the correct output, and in case yours is correct, what is the rule to compare multiple x multiple.

André Zuccatti

Sales Engineer - LATAM -
Alteryx, Inc.

Kb7980 · ‎11-05-2021

Hi Andre

Thanks for your help so far.

The output I gave earlier is the correct output. There should only be 9 records in total in the output data.

Rule for comparison:

1. We are searching for the data in Post in the data in Pre , so the output data will always have same number of rows as in the Post data.

2. The data in Pre and Post is being compared basis combination of Col A,ColB,ColC,ColD as the key.

So, for a1b1c1d1 combination we have below values for ColE in Pre data

Pre : e1, e15 ( lets say Set1)

While comparing Post data now:

2nd row in Post - we have combination a1b1c1d1 and we check the value of ColE which is e1 and search if this e1 is present in the Set1. It is present and hence "Match"

9th row in Post - we have combination a1b1c1d1 and we check the value of ColE which is e17 and search if this e17 is present in the Set1. It is not present and hence "Mismatch"

10th row in Post - we have combination a1b1c1d1 and we check the value of ColE which is e15 and search if this e15 is present in the Set1. It is present and hence "Match"

Similarly for ColF also.

There will be duplicates in both Pre and Post. While displaying results in Output data, if Post data has any duplicates they will have to still be shown.

AZuc · ‎11-05-2021

@Kb7980

hmmm, per your description it looks like we need to compare all "sets" in post to only the very first set in Pre. Is it right?

André Zuccatti

Sales Engineer - LATAM -
Alteryx, Inc.

Kb7980 · ‎11-05-2021

Nope. Basically all sets in Post will have to be searched in all sets in Pre.

If a combination in Post is available in Pre, then check the corresponding (eg ColE , ColF) value in the Pre sets and if matches then "match" else "mismatch"

If a combination in Post is not available in Pre, then NA

AZuc · ‎11-05-2021

Ok. But let's say I compare a set in Post with 2+ sets in Pre for a given column (E, for instance). 1 matches the other don't. Which one I pick?

André Zuccatti

Sales Engineer - LATAM -
Alteryx, Inc.

Kb7980 · ‎11-05-2021

Ok, let me explain this with example.

Scenario 1:

Post

Combination is a5b5c5d5

ColE value is e20

Pre

Combination is a5b5c5d5 and ColE value is e20

Combination is a5b5c5d5 and ColE value is e8

Combination is a5b5c5d5 and ColE value is e7

In this case ColE status in Post will be match

Scenario 2:

Post

Combination is a5b5c5d5

ColE value is e20

Pre

Combination is a5b5c5d5 and ColE value is e21

Combination is a5b5c5d5 and ColE value is e8

Combination is a5b5c5d5 and ColE value is e11

In this case ColE status in Post will be Mismatch

Scenario 3:

Post

Combination is a5b5c5d5

ColE value is e20

Pre

Combination a5b5c5d5 is not available

In this case ColE status in Post will be NA

Hope this helps

AZuc · ‎11-05-2021

Ok. I wrote in a different way but it looks like in case of conflict, match "wins". I'll adjust the WF and send you

Post

Set1 (row1) e1 f1
Set2 (row8) e17 f1
Set3 (row9) e15 f2

Pre
Set1 (row1) e1 f1
Set2 (row7) e15 f1

For colE
Post1 x Pre1 = e1 x e1 then match (fits to your output)
Post1 x Pre2 = e1 x e15 then mismatch

Post2 x Pre1 = e17 x e1 then mismatch (fits to your output)
Post2 x Pre2 = e17 x e15 then mismatch (fits to your output)

Post3 x Pre1 = e15 x e1 then mismatch
Post3 x Pre2 = e15 x e15 then match (fits to your output)

For colF (all fits to your output)
Post1 x Pre1 = f1 x f1 then match
Post1 x Pre2 = f1 x f1 then match

Post2 x Pre1 = f1 x f1 then match
Post2 x Pre2 = f1 x f1 then match

Post3 x Pre1 = f2 x f1 then mismatch
Post3 x Pre2 = f2 x f1 then mismatch

André Zuccatti

Sales Engineer - LATAM -
Alteryx, Inc.

AZuc · ‎11-05-2021

@Kb7980

André Zuccatti

Sales Engineer - LATAM -
Alteryx, Inc.

Kb7980 · ‎11-05-2021

Thanks Andre. I am testing it but it looks great as of now. Thanks a lot for your support on this so far and appreciate your patience on this. Solution looks great anyways. Cheers

Kb7980 · ‎12-07-2021

Hi Andre

The workflow has been working well so far.However in case of huge records (300k) where combinations of columns in "Pre" are repeated multiple times within "Pre" (with different values in column e and column f - the workflow got stuck and processing temp data ran into 5000gb.

Regards

Alteryx Designer Desktop Discussions

Iterative Data Comparison