Engine Works

SeanAdams · ‎01-16-2018

One of our most common challenges is comparing 2 data sets - this comes in many forms, either in understanding what fields are different across data sets (especially when moving data from one location to another); checking how field types are different (diagnosing issues with truncation or text field issues) or even in reconciling two data sets.

There's no single tool to do this directly in Alteryx, but with some ingenuity you can do this yourself - this blog tells you how!

Comparing columns

The first challenge is comparing the column names - the most simple comparison is below.

Field info gives the names & types of each field on the two macro inputs. The join matches the two sets by name to provide the 3 outputs.

However - if you want to allow the user the option to ignore column order, the best way to do this is with checkboxes attached to detours.

For each additional option you add, you will add another interface tool; and another detour – in this case we have 4 transformations for the four options.

From there – split this back into 2 streams, and use a dynamic rename to rename the fields “Left” and “Right”. This becomes the first output.

Report output

Wouldn't it be nice if the tool had a specific output that contained a nicely formatted report showing the differences and matches?

This is very quick to do - just take your matches or differences, pop them into a table; use a layout tool to structure and then send to a macro output.

This is the simple version – basic table, with a heading, and a layout tool. However you can add multiple sections to a report.

Here you can see there are 5 different sections in the report output – each of the sections has a table; a header; a layout. These sections are joined together using a multi-join tool.

Compare row data

The simple version of row compare data uses a transpose to change columns into rows; and then the data is compared based on row ID and column name.

... but as we've seen above - sometimes the columns are ordered differently or have spaces or _ characters etc.

So the challenge here is how to compare the row data when the columns are in different orders, cases, or variants of a name.

The way we tackled this was to use a normalized name for each column which is generated in the column checker section above, and then use this to run the comparison. Because the checker above has already figured out which columns match - you can use this to make the simple solution above useful in this complex case.

The final result looks fairly complex, but hopefully the summary above makes this a little easier to understand.

In Use

So - what does this look like in-use?

If you have any 2 data-sets, in the same sorted order - just drop this macro onto your canvas, and away you go:

The third output gives a nicely formatted report that you can then use to find the differences.

Tools used

The tools that are used are:

Developer: FieldInfo; Detour; Dynamic Rename; Message
Interface: Checkbox; Action
Join: Join; Union
Preparation: Formula; RecordID; Select; Sort
Reporting: Table; Report Text; Layout
Transform: Transpose

Future work

There's lots we can still do with this as a community, areas that I'm thinking about are how to deal with big tables gracefully; a UI to allow users to select which columns they wish to compare; etc. The great thing about macro development is that you have the power to build this yourself, to add & change based on your need and to make it your own.

Please feel free to use this, and improve it - as long as you contribute any improvements back to the community 🙂

vepp · ‎07-22-2019

Amazing Macro. Saved my time. Thanks for sharing here.

stapuff · ‎03-24-2020

This is amazing.... so much could be added to normalize the differences.

This will be going into my tool box.

Thanks,

Puff

terry10 · ‎05-23-2020

Very comprehensive and incredibly useful. Thanks @SeanAdams for posting this!

Engine Works

Compare 2 Data Sets

Comparing columns

Report output

Compare row data

In Use

Tools used

Future work

Comparing data sets

Comparing Data

Comparing 2 different files (same field)

Compare Two Data-sets

Compare numbers in 2 columns