I'm trying to compare hashes to calculate a number that indicates how similar they are. Any thoughts on how to compare these strings? The strings are all the same length, and an identical hex digit at the same position indicates similarity. For example, hashes 3 and 4 both start with 'e', so +1 for their similarity. Their positions 7 and 8 are also identical, so +2.
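To make the scoring rule concrete, here is a minimal Python sketch of the position-by-position count described above. The function name `hash_similarity` and the two short sample strings are made up for illustration; they are not the actual hashes from the data set.

```python
def hash_similarity(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex strings agree."""
    if len(a) != len(b):
        raise ValueError("hashes must be the same length")
    # Hex digits may differ in case ('E' vs 'e'), so normalise first.
    a, b = a.lower(), b.lower()
    # zip pairs up characters by position; True counts as 1 in sum().
    return sum(x == y for x, y in zip(a, b))


# Hypothetical 10-character samples: they agree at positions 1, 7 and 8.
hash_3 = "e1a2b3ff00"
hash_4 = "e9c8d7ff11"
score = hash_similarity(hash_3, hash_4)
```

With these samples the score is 3: +1 for the shared leading 'e' and +2 for positions 7 and 8.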
Alas, while elegant, the solution is not practical in my case. I have ten million hashes. Multiplied by 64, that gives me more than half a billion records to join with the same number of records. Do you have any ideas for a more direct comparison of a hash pair? I'm wondering if the Fuzzy Match tool can be used somehow?
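For what it's worth, one way to compare a single pair directly, without first exploding each hash into 64 rows, is to XOR the two values and count the hex digits of the result that are zero. The sketch below is plain Python rather than an Alteryx tool, and `matching_hex_digits` is a hypothetical name; it only illustrates the idea and says nothing about how it would perform on ten million hashes.

```python
def matching_hex_digits(a_hex: str, b_hex: str) -> int:
    """Count positions where two equal-length hex strings have the same digit."""
    if len(a_hex) != len(b_hex):
        raise ValueError("hashes must be the same length")
    # XOR leaves a zero nibble (4 bits) wherever the two digits were equal.
    diff = int(a_hex, 16) ^ int(b_hex, 16)
    mismatches = 0
    while diff:
        if diff & 0xF:  # non-zero nibble means the digits differ here
            mismatches += 1
        diff >>= 4
    # Nibbles above the highest set bit are zero, i.e. they match.
    return len(a_hex) - mismatches


# Hypothetical samples, not taken from the real data.
pair_score = matching_hex_digits("e1a2b3ff00", "e9c8d7ff11")
```

This gives the same count as a character-by-character comparison, but treats each hash as one number instead of 64 separate records.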
I came up with a comparison that essentially doesn't require a JOIN, but does explode columns to rows. It is fairly efficient in that the join that I do use is based upon record position. I understand the simplicity of the array loop. Please do check the timing for this module as I'm interested in knowing if it does improve upon my colleague's solution.
Thank you so much, @jdunkerley79. I was able to optimize it down to 35 GB and 3 hours 9 minutes for the whole workflow, which is almost reasonable for this one-off task. As soon as I'm done with this task on Monday evening, I'm going to try your solution out.
Joining based on position sounds promising, though.