Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Iterative Data Comparison

Kb7980
7 - Meteor

HI

I am looking to compare data between 2 files - Pre and Post. 

1.Col A to Col D in both files are columns which form key to compare data 

2. Col E and Col F are being compared 

3. When data matches , i am looking to populate "Match" in a new column for Col E and Col F

4.When data dsnt match, i am looking to populate "MisMatch"

5. When there is new key in Post, i am looking to populate N/A

6. Keys in both Pre and Post can be repetitive.

 

The bigger problem in this situation is that instead of 2 target columns that I need to compare, I actually have 100 columns that need to be compared. With little experience that I have, it looks like a batch macro might have to be used here but I have no idea on how to implement this. Any help is appreciated

16357866610476559983347706612739.jpg

 

 

19 REPLIES 19
AZuc
Alteryx
Alteryx

@Kb7980 

 

There's the V3. But with a different result. The difference is based on the number of duplications. There are 3 rows with a1 b1 c1 d1 in Post and 2 rows in Pre. Comparing 3x2 gave me 6 possibilities. Your output is showing only 3. Please check what would be the correct output, and in case yours is correct, what is the rule to compare multiple x multiple.

André Zuccatti

Sales Engineer - LATAM -
Alteryx, Inc.


Kb7980
7 - Meteor

Hi Andre

Thanks for your help so far.

The output I gave earlier is the correct output. There should only be 9 records in total in the output data.

Rule for comparison:

1. We are searching for the data in Post in the data in Pre , so the output data will always have same number of rows as in the Post data.

2. The data in Pre and Post is being compared basis combination of Col A,ColB,ColC,ColD as the key.

 

So, for a1b1c1d1 combination we have below values for ColE in Pre data

Pre : e1, e15  ( lets say Set1) 

While comparing Post data now:

2nd row in Post - we have combination a1b1c1d1 and we check the value of ColE which is e1 and search if this e1 is present in the Set1. It is present and hence "Match"

9th row in Post - we have combination a1b1c1d1 and we check the value of ColE which is e17 and search if this e17 is present in the Set1. It is not present and hence "Mismatch"

10th row in Post - we have combination a1b1c1d1 and we check the value of ColE which is e15 and search if this e15 is present in the Set1. It is present and hence "Match"

 

Similarly for ColF also.

 

There will be duplicates in both Pre and Post. While displaying results in Output data, if Post data has any duplicates they will have to still be shown.

AZuc
Alteryx
Alteryx

@Kb7980 

 

hmmm, per your description it looks like we need to compare all "sets" in post to only the very first set in Pre. Is it right?

André Zuccatti

Sales Engineer - LATAM -
Alteryx, Inc.


Kb7980
7 - Meteor

Nope. Basically all sets in Post will have to be searched in all sets in Pre.

If a combination in Post is available in Pre, then check the corresponding (eg ColE , ColF) value in the Pre sets and if matches then "match" else "mismatch" 

 

If a combination in Post is not available in Pre, then NA

 

 

AZuc
Alteryx
Alteryx

Ok. But let's say I compare a set in Post with 2+ sets in Pre for a given column (E, for instance). 1 matches the other don't. Which one I pick? 

 

André Zuccatti

Sales Engineer - LATAM -
Alteryx, Inc.


Kb7980
7 - Meteor

Ok, let me explain this with example.

Scenario 1:

Post

Combination is a5b5c5d5

ColE value is e20

 

Pre

Combination is a5b5c5d5 and ColE value is e20

Combination is a5b5c5d5 and ColE value is e8

Combination is a5b5c5d5 and ColE value is e7

 

 

In this case ColE status in Post will be match

 

Scenario 2:

Post

Combination is a5b5c5d5

ColE value is e20

 

Pre

Combination is a5b5c5d5 and ColE value is e21

Combination is a5b5c5d5 and ColE value is e8

Combination is a5b5c5d5 and ColE value is e11

 

 

In this case ColE status in Post will be Mismatch

 

Scenario 3:

Post

Combination is a5b5c5d5

ColE value is e20

 

Pre

Combination a5b5c5d5 is not available

 

In this case ColE status in Post will be NA

 

Hope this helps

 

AZuc
Alteryx
Alteryx

Ok. I wrote in a different way but it looks like in case of conflict, match "wins". I'll adjust the WF and send you

 

Post

Set1 (row1) e1 f1
Set2 (row8) e17 f1
Set3 (row9) e15 f2

 

Pre
Set1 (row1) e1 f1
Set2 (row7) e15 f1

 

For colE
Post1 x Pre1 = e1 x e1 then match (fits to your output)
Post1 x Pre2 = e1 x e15 then mismatch

Post2 x Pre1 = e17 x e1 then mismatch (fits to your output)
Post2 x Pre2 = e17 x e15 then mismatch (fits to your output)

Post3 x Pre1 = e15 x e1 then mismatch
Post3 x Pre2 = e15 x e15 then match (fits to your output)

 

For colF (all fits to your output)
Post1 x Pre1 = f1 x f1 then match
Post1 x Pre2 = f1 x f1 then match

Post2 x Pre1 = f1 x f1 then match
Post2 x Pre2 = f1 x f1 then match

Post3 x Pre1 = f2 x f1 then mismatch
Post3 x Pre2 = f2 x f1 then mismatch

André Zuccatti

Sales Engineer - LATAM -
Alteryx, Inc.


AZuc
Alteryx
Alteryx

@Kb7980 

André Zuccatti

Sales Engineer - LATAM -
Alteryx, Inc.


Kb7980
7 - Meteor

Thanks Andre. I am testing it but it looks great as of now. Thanks a lot for your support on this so far and appreciate your patience on this. Solution looks great anyways. Cheers

Kb7980
7 - Meteor

Hi Andre

The workflow has been working well so far.However in case of huge records (300k) where  combinations of columns in "Pre" are repeated multiple times within "Pre" (with different values in column e and column f - the workflow got stuck and processing temp data ran into 5000gb. 

 

Regards

 

 

Labels