I have two tables, both consisting of 50m records and 26 fields. I want to a quick way to check they are identical, and if not, then which field has differed.
The best option seems to be to join the tables on all 26 fields which takes about an hour to complete. The problem is then that if any records come out the left and right output, I am not sure which field or fields the record differed on.
I would like to check the actual values in the fields are identical.
Anyone have any suggestions on the quickest way to do this or a pre built macro?
Solved! Go to Solution.
Are you just looking to check that the data schema and types are identical across all fields? The Field Info Tool on the Developer tab can do that pretty efficiently - just throw one of those on each of your inputs then join on the results of the Field Info tool.
Apologies for not being clearer - I would like to check the actual values in the all fields are identical.
When you say identical, are you expecting every record to find a match, or for records to be in the same position as well?
If you join on Record Position that would eliminate a lot of the Join performance issues that you may be encountering.
From there, there are a number of ways to go about creating the comparison effectively.
I've attached a zipped macro (which actually uses two other macros inside of it) which I have found to be effective for this.
It takes advantage of the in-built Alteryx functionality of a Join tool, where it renames fields on the right side as RIGHT_FIELD.  For each field that has a corresponding RIGHT_FIELD, a new field named MATCH_FIELD is created, which checks if the related fields are equivalent, and outputs a "Yes"/"No" value (which could probably be improved by making the whole thing use an actual binary field).  From there, another process counts all of the "No" values in MATCH_ fields on each line, and creates a CountDiscrepancies field.
If you use a join either on record position or on only the fields that make a record unique (primary key), then you can output the J output into this macro.  From there, I typically add a Filter tool, and filter for records where CountDiscrepancies > 0.  This should help identify exactly which records are not matching.
Thanks a lot for this!
I mean in short, it answers my question that no there doesn't appear to be an easy/quick answer to this. A tool to simply check two or more tables are identical where you could set certain parameters - same values, same field order etc. would be great!
Hi,
Follow below steps for each table:
1. First convert all columns to String data type
2. Create new column concat_col and in the formula "concatenate all the columns"
3. Create one more new column key and in the formula use MD5_unicode function in String group available in formulas. ex. MD5_unicode(concat_col)
Join the the two streams of two tables on the key column created in above step.
Thanks.
I found the fastest way to do it is just to transpose all the data in both tables first, selecting the unique field in the "Key Fields" option. And then you can Join these two outputs on the key field, the Name field and the Value field.
This was much quicker and avoided having to select all fields to join on. Of course, this will also identify column names that are no the same (case sensitive), not just the value of fields.
Magic!
Thanks a lot!
 
					
				
				
			
		
