This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Today I had a case where I needed to compare the values of two datasets. We optimized some alteryx workflows but needed to check that the values did not change.
So I ended up with this kind of logic for a macro:
1. Generating 500 "blank" columns, starting with column name 1,2,3....N. This should cover any amount of columns for our needs.
2. I make use of the MD5 Unicode function, mimicing the checksum function of a databse. MD5_UNICODE(+...[n]). This concatenates all columns into one, then applies MD5 encryption to it to get one unique value per row. I also create column X with value "x".
3. I flip my MD5 value from rows to columns with X as header and "MD5" as value. This places all the MD5 values into a single cell.
4. I then apply the MD5_UNICODE on the cell again for a shorter yet equally unique value.
5. We compare the MD5 value of dataset 1 with dataset 2. If we get the same value then the content of the datasets are identical.
Currently i am aware of the following restrictions:
1. The column header names are not compared.
2. The order of the columns must be the same for both datasets. A column sorting can fix this.
3. Both datasets are converted into V_W string before MDA logic is applied. This just made the MD5 formula more simple.
4. Lots of exessive columns are crerated.
Any feedback on optimizing this, or any issues I should pay attention to with this method is greatly appreciated.
Hope this macro can benifit the community as well.