Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Knowledge Base

Definitive answers from Designer Desktop experts.

How To: Compare Data from Two Data sets

TonyM
Alteryx Alumni (Retired)
Created

Data Integrity refers to the accuracy and consistency of data stored in a database, data warehouse, data mart or other construct, and it is a fundamental component of any analytic workflow. In Alteryx, creating a macro to compare expected values to actual values in your data is quite simple and provides a quality control check before producing a visual report. Let me show you how to build this.

 

compare values.png 

 

The two inputs represent the actual and expected values in your data. These data streams are passed through a Record ID tool to keep positional integrity and then passed on to the Transpose tool to create two columns. The first column contains the field names and the second column shows the values within each field. This data is then passed on to a join, matching on Record ID and the Name of the field, in order to compare each value. Lastly, if the data does not match from expected to actual, a custom message will appear in the results messages alerting the user where the mismatch happened within the dataset. The image below shows the error message produced if values differ across datasets.

 

compare values 2.png

 

 

Attachments
Comments
MandeB
5 - Atom

Hello,

 

I'm trying to identify a way to build in a workdlow step to compare two data sets based on a percent variation or a number variation and notify me if the comparison shows that the variation is greater than a set threshold. For example: I have an open accounts receivable file from May that has 500,000 accounts listed and I have an open accounts receivable file from April that has 700,000 accounts listed. The variation between them being 200,000 accounts. I want to build a compare function that will automatically alert me if the variation between the two is greater than say 50,000 accounts. How would I go about doing that?

 

Thanks!

TonyM
Alteryx Alumni (Retired)

Hi @MandeB,

 

First you will need to create a count for each month which can be done in the Summarize tool. Then I would use the formula tool to create a new field that calculates the difference between the two months. Once you have this new field, you can then use an if/then statement to flag any counts greater that the 50,000 account threshold. 

 

formula_flag.jpg

Rohit_Bajaj
9 - Comet

Hi Tony,

 

I have a requirement where in some business logic has been applied when moving data from source to target.

Source is denormalized i.e. flat structure and target is normalized i.e. Name vaue pairs.

Can I use this macro for such cases as well, main query is if there is a formula applied when moving data from source to target, can a reverse formula be applied when comparing fields.

 

Thanks

Rohit

ulrich_schumann
8 - Asteroid
Thanks for sharing the macro Tony - this is very helpful for me.
raamuk
5 - Atom

Hi @

 

 

 

 

raamuk
5 - Atom

Hi @

 

 

 

 

 

 

kirangottu
7 - Meteor

Pls tell me How to perform MINUS using Altreyx tool...

I am a ETL tester,this function is very much needed...Currently I am doing join and comparing every column whcih is little bit tideous job. I want all the source and target columns shold be compared in a one go like MINUS .....Plsss help me....

ThomasL
Alteryx Alumni (Retired)

This is a great write-up! Thanks! Smiley Happy

Kim_Jensen
5 - Atom

I would like to create a message when I have in-joined records. If I have more than 0 right or left in-joined I would like to create a flag for myself. What is the "message expression" I should use? Do I need a message tool after each join? Do I need separate tools for left joins vs. right joins? 

Kim_Jensen
5 - Atom

My previous comment was asking about un-joined records not in-joined - the helpfulness of spell-checkers :(

r4upadhye
11 - Bolide
There need to be a rather simple tool for comparing two data servers: A) Prod DataServer vs. B) Dev DataServer which would compare all the dbs, tables, metadata, attributes, etc... Previously I was using Micosoft Database Compare 2013, but I was looking for a robust solution in Alteryx. Any thoughts?
AlexSantos
5 - Atom

Thank you very much, it helped a lot and I learned to use a new tool

oliverd
7 - Meteor

Helped a lot thanks.