How to check quickly two tables are identical and identify fields that differ
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I have two tables, both consisting of 50m records and 26 fields. I want to a quick way to check they are identical, and if not, then which field has differed.
The best option seems to be to join the tables on all 26 fields which takes about an hour to complete. The problem is then that if any records come out the left and right output, I am not sure which field or fields the record differed on.
I would like to check the actual values in the fields are identical.
Anyone have any suggestions on the quickest way to do this or a pre built macro?
Solved! Go to Solution.
- Labels:
- Join
- Tips and Tricks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Are you just looking to check that the data schema and types are identical across all fields? The Field Info Tool on the Developer tab can do that pretty efficiently - just throw one of those on each of your inputs then join on the results of the Field Info tool.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Apologies for not being clearer - I would like to check the actual values in the all fields are identical.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
When you say identical, are you expecting every record to find a match, or for records to be in the same position as well?
If you join on Record Position that would eliminate a lot of the Join performance issues that you may be encountering.
From there, there are a number of ways to go about creating the comparison effectively.
I've attached a zipped macro (which actually uses two other macros inside of it) which I have found to be effective for this.
It takes advantage of the in-built Alteryx functionality of a Join tool, where it renames fields on the right side as RIGHT_FIELD. For each field that has a corresponding RIGHT_FIELD, a new field named MATCH_FIELD is created, which checks if the related fields are equivalent, and outputs a "Yes"/"No" value (which could probably be improved by making the whole thing use an actual binary field). From there, another process counts all of the "No" values in MATCH_ fields on each line, and creates a CountDiscrepancies field.
If you use a join either on record position or on only the fields that make a record unique (primary key), then you can output the J output into this macro. From there, I typically add a Filter tool, and filter for records where CountDiscrepancies > 0. This should help identify exactly which records are not matching.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thanks a lot for this!
I mean in short, it answers my question that no there doesn't appear to be an easy/quick answer to this. A tool to simply check two or more tables are identical where you could set certain parameters - same values, same field order etc. would be great!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi,
Follow below steps for each table:
1. First convert all columns to String data type
2. Create new column concat_col and in the formula "concatenate all the columns"
3. Create one more new column key and in the formula use MD5_unicode function in String group available in formulas. ex. MD5_unicode(concat_col)
Join the the two streams of two tables on the key column created in above step.
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I found the fastest way to do it is just to transpose all the data in both tables first, selecting the unique field in the "Key Fields" option. And then you can Join these two outputs on the key field, the Name field and the Value field.
This was much quicker and avoided having to select all fields to join on. Of course, this will also identify column names that are no the same (case sensitive), not just the value of fields.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Magic!
Thanks a lot!