This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
I have two tables, both consisting of 50m records and 26 fields. I want to a quick way to check they are identical, and if not, then which field has differed.
The best option seems to be to join the tables on all 26 fields which takes about an hour to complete. The problem is then that if any records come out the left and right output, I am not sure which field or fields the record differed on.
I would like to check the actual values in the fields are identical.
Anyone have any suggestions on the quickest way to do this or a pre built macro?
Are you just looking to check that the data schema and types are identical across all fields? The Field Info Tool on the Developer tab can do that pretty efficiently - just throw one of those on each of your inputs then join on the results of the Field Info tool.
When you say identical, are you expecting every record to find a match, or for records to be in the same position as well?
If you join on Record Position that would eliminate a lot of the Join performance issues that you may be encountering.
From there, there are a number of ways to go about creating the comparison effectively.
I've attached a zipped macro (which actually uses two other macros inside of it) which I have found to be effective for this.
It takes advantage of the in-built Alteryx functionality of a Join tool, where it renames fields on the right side as RIGHT_FIELD. For each field that has a corresponding RIGHT_FIELD, a new field named MATCH_FIELD is created, which checks if the related fields are equivalent, and outputs a "Yes"/"No" value (which could probably be improved by making the whole thing use an actual binary field). From there, another process counts all of the "No" values in MATCH_ fields on each line, and creates a CountDiscrepancies field.
If you use a join either on record position or on only the fields that make a record unique (primary key), then you can output the J output into this macro. From there, I typically add a Filter tool, and filter for records where CountDiscrepancies > 0. This should help identify exactly which records are not matching.
I mean in short, it answers my question that no there doesn't appear to be an easy/quick answer to this. A tool to simply check two or more tables are identical where you could set certain parameters - same values, same field order etc. would be great!
I found the fastest way to do it is just to transpose all the data in both tables first, selecting the unique field in the "Key Fields" option. And then you can Join these two outputs on the key field, the Name field and the Value field.
This was much quicker and avoided having to select all fields to join on. Of course, this will also identify column names that are no the same (case sensitive), not just the value of fields.