One of our most common challenges is comparing 2 data sets - this comes in many forms, either in understanding what fields are different across data sets (especially when moving data from one location to another); checking how field types are different (diagnosing issues with truncation or text field issues) or even in reconciling two data sets.
There's no single tool to do this directly in Alteryx, but with some ingenuity you can do this yourself - this blog tells you how!
Comparing columns
The first challenge is comparing the column names - the most simple comparison is below.
Field info gives the names & types of each field on the two macro inputs. The join matches the two sets by name to provide the 3 outputs.
However - if you want to allow the user the option to ignore column order, the best way to do this is with checkboxes attached to detours.
For each additional option you add, you will add another interface tool; and another detour – in this case we have 4 transformations for the four options.
From there – split this back into 2 streams, and use a dynamic rename to rename the fields “Left” and “Right”. This becomes the first output.
Report output
Wouldn't it be nice if the tool had a specific output that contained a nicely formatted report showing the differences and matches?
This is very quick to do - just take your matches or differences, pop them into a table; use a layout tool to structure and then send to a macro output.
This is the simple version – basic table, with a heading, and a layout tool. However you can add multiple sections to a report.
Here you can see there are 5 different sections in the report output – each of the sections has a table; a header; a layout. These sections are joined together using a multi-join tool.
Compare row data
The simple version of row compare data uses a transpose to change columns into rows; and then the data is compared based on row ID and column name.
... but as we've seen above - sometimes the columns are ordered differently or have spaces or _ characters etc.
So the challenge here is how to compare the row data when the columns are in different orders, cases, or variants of a name.
The way we tackled this was to use a normalized name for each column which is generated in the column checker section above, and then use this to run the comparison. Because the checker above has already figured out which columns match - you can use this to make the simple solution above useful in this complex case.
The final result looks fairly complex, but hopefully the summary above makes this a little easier to understand.
In Use
So - what does this look like in-use?
If you have any 2 data-sets, in the same sorted order - just drop this macro onto your canvas, and away you go:
The third output gives a nicely formatted report that you can then use to find the differences.
Tools used
The tools that are used are:
Future work
There's lots we can still do with this as a community, areas that I'm thinking about are how to deal with big tables gracefully; a UI to allow users to select which columns they wish to compare; etc. The great thing about macro development is that you have the power to build this yourself, to add & change based on your need and to make it your own.
Please feel free to use this, and improve it - as long as you contribute any improvements back to the community 🙂
Compare2SetsColumnsBasic.yxmc
Compare2SetsRowBasic.yxmc