Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Dynamic Dataset Variance Comparison- 2 Datasets

Lili1
8 - Asteroid

Hello everyone, I feel like there's an easy answer to this but I have spent hours and cannot find one. I have two datasets being produced from two different sources, that should in theory be exactly the same if they both worked correctly, but they do not.

My goal is to make a workflow that can compare the numeric fields in both based on the unique identifiers (in my very simple example below they are name and city), and at the end have the columns ordered as;

ProdField1, TestField1, VarianceField1, ProdField2, TestField2, VarianceField2, and so on.


I put together a very simple workflow attached that achieves this. My issue is I have to do this comparison for about 10 very different datasets and I have up to 30 numeric fields in some. The dataset records and field names change between pulls so it makes writing/changing the formula that creates the variance as well as sorting them to fit the order in the end result each time really inefficient. Additionally, for some datasets it can be 5-10 fields that create the Unique ID, and the records are never sorted the same in both datasets (each one can include records the other does not). Between the two datasets I am comparing, the columns will always be named the same thing and be ordered the same way, so that's a plus. 

Any help/suggestions you may have are very, very appreciated as it has taken me nearly 3 hours to do only one of them. Thank you so much!

4 REPLIES 4
Qiu
21 - Polaris
21 - Polaris

@Lili1 
For multiple column issue alway think about the combination of Transpose and CrossTab. 😂

Hope you dont mind below

  • The order of Fields might have changed
  • Hope no special chracters such as "_" included in the original field names.
  • I use the "PROD", "TEST" as sufix instead of postfix, since it is more convenient.
    0815-Lili1.png
Lili1
8 - Asteroid

Thank you so much Qiu! This worked for the variance creations but I'm now stuck adding back in in an aspect from the initial workflow, if you have a few minutes I would really appreciate any suggestions you have!

I want to add a dynamic Variance Check at the end, where I can then filter if any variance exists in a record. The way I was doing this before was with a formula tool;

add column
if abs([var_tax]+[var_expense]+[var_tip])=0 then 0 else 1 endif

 

When the column names changed for the datasets with 30+columns, it's very prone to error for me to go in and change the entire formula to account for all variances in that record. Do you have any suggestions of where I could incorporate this in a dynamic way in this workflow? Thank you so very much for this solution, I would have never thought of this approach!

note. the added append and filter below the multi-row formulas are because I want to have a text input variance threshold and then specify if I want a column to not have a threshold. I also changed the filter after the union to be if Var=0, as the filters before made the null specification invalid. 

Qiu
21 - Polaris
21 - Polaris

Let me check later

You mean you want to check all the columns with prefix "Var" to sum up and check if  it is 0 or not?

Lili1
8 - Asteroid

Yes exactly- I tried using a summarize tool or a formula tool, but because the variance columns are only produced for the fields where one or more variances exist, I couldn't find something to work. I was thinking maybe a way to add a variance column for all the fields and then populate it as zeroes if one doesn't exist, but I don't have a working idea to make it dynamic. Thank you so much again for your help Qiu!

Labels
Top Solution Authors