So I have a file with around 90 columns that I received from a vendor. A week later the vendor revised the file and sent me a new one. The vendor called out what the changes were, but I want to make sure those are the only changes in the file.
I know the long way to do this is to put both files into Alteryx, do a Join on the common identifier, select the columns I want to compare, and apply a Formula to check whether the data in each column is the same. However, I have 90 columns and this would take forever, so I'm wondering if someone has a better solution where I can compare all columns quickly and get a result showing which fields have differences?
Transpose all of the columns with potential changes and then join.
If all of the columns are potential changes, you need some sort of key field: add a Record ID, then transpose with the Record ID as the key column.
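If it helps to see the idea outside Alteryx, here is a minimal sketch of the transpose step in pandas. The file names and the `record_id` identifier column are placeholder assumptions, not anything from your actual files:

```python
import pandas as pd

# Placeholder file names for the original and revised vendor deliveries.
old = pd.read_csv("vendor_file_v1.csv")
new = pd.read_csv("vendor_file_v2.csv")

# Reshape each file from ~90 wide columns into long name/value pairs,
# keeping the common identifier (assumed here to be "record_id") as the key.
old_long = old.melt(id_vars="record_id", var_name="field", value_name="value")
new_long = new.melt(id_vars="record_id", var_name="field", value_name="value")
```

After this step each file is just three columns (identifier, field name, value), no matter how many columns the original had, which is what makes the join cheap.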
There are more than 400,000 rows, so I'm assuming transposing won't be the best solution here?
That's fine - you can compare 400,000 rows. 400,000 rows with one or two columns (two if you're matching which fields have the same data) is much easier to handle than 100 columns.
Also note: if the field name and value both match in the join, the joined (J) anchor will contain the values that are the same in both data sources. Your new/updated values will come out of the unjoined anchor for whichever input has the updated values.
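Continuing the pandas sketch above, the same join-anchor logic looks like this: joining on identifier, field name, AND value means rows that match on all three are unchanged, and everything else is a difference. The column names carry over from the earlier (assumed) sketch:

```python
# Outer join with an indicator column: "_merge" == "both" means the cell
# is identical in both files; "left_only"/"right_only" rows are changes.
diff = old_long.merge(
    new_long,
    on=["record_id", "field", "value"],
    how="outer",
    indicator=True,
)
changed = diff[diff["_merge"] != "both"]

# Summarize which of the ~90 fields actually changed, and how often.
print(changed["field"].value_counts())
```

The unmatched rows here play the role of the L and R anchors in the Alteryx Join: that's where the vendor's revisions will fall out.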
And just as an FYI - I generated two data sources with 400,000 random integers and joined on three fields as a test (about 9.5% matched with a 1-10 random range). Join isn't all that memory-intensive unless you're matching huge string fields or spatial objects...