We are using Alteryx to compare two very large datasets, with file sizes around 20-30GB and approximately 150-200 columns. Is it possible to shorten the execution time or reduce memory usage by changing the comparison method (for example, by converting the data using a hash function first)? Any suggestions or ideas would be greatly appreciated!
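As you mentioned, you can compute an MD5 hash over the columns that need to be compared, then compare the single hash value per row instead of comparing every column individually.
Alteryx has three MD5 functions (MD5_ASCII, MD5_UNICODE, and MD5_UTF8). Search for "MD5" on this page: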
https://help.alteryx.com/current/en/designer/functions/string-functions.html#example-6846024-12
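
To illustrate the idea outside of Alteryx, here is a minimal Python sketch of the hash-then-compare approach. It is not an Alteryx workflow. The names `row_hash` and `diff_by_hash`, the `id` key column, and the separator character are all assumptions made up for this example. It also assumes each row has a unique key and that the key-to-hash map for one side fits in memory.

```python
import hashlib

# Assumed separator: a control character that should not occur in the data.
# Without a separator, ("ab", "c") and ("a", "bc") would hash identically.
SEP = "\x1f"


def row_hash(row, columns):
    """Return the MD5 hex digest of the selected columns of one row."""
    joined = SEP.join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def diff_by_hash(left_rows, right_rows, key, columns):
    """Compare two row collections by key, using one hash per row."""
    # Keep only key -> hash for the left side, not the full rows.
    left = {r[key]: row_hash(r, columns) for r in left_rows}
    only_right, changed = [], []
    for r in right_rows:
        k = r[key]
        left_hash = left.pop(k, None)
        if left_hash is None:
            only_right.append(k)      # key exists only on the right
        elif left_hash != row_hash(r, columns):
            changed.append(k)         # same key, different content
    # Whatever was never popped exists only on the left.
    return {"only_left": sorted(left), "only_right": only_right, "changed": changed}


if __name__ == "__main__":
    a = [{"id": 1, "x": "foo", "y": "bar"}, {"id": 2, "x": "p", "y": "q"}]
    b = [{"id": 1, "x": "foo", "y": "bar"}, {"id": 2, "x": "p", "y": "changed"}]
    print(diff_by_hash(a, b, "id", ["x", "y"]))
```

The saving comes from joining and comparing on two short fields (key and hash) rather than 150-200 wide columns. Computing the hash still requires reading every row once, so the total run time may not drop as much as the memory footprint does. MD5 is only used here to detect differences, not for security.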