I have 2 columns with 2 strings as below:
Col1 Col2 Match %
I am fine I am fine 100
Great Wall of China The great wall of China 80
In the first case, all the words match and therefore the %age is 100 however, in the 2nd case 4 out of 5 words match and the %age is 80. Col 1 can be a substring of Col2 and vice versa and there can be cases where strings are not sub string of each other/do not contain each other. The match is case insensitive.
Thanks in Advance!
@Nikita912 one way of doing this
The logic is to transpose the Col1 and Col2 then split it into words and find the length of each words. Group based on record ID and find the sum of the word length and calculate the percentage.
Note: The percentage I used the Round([field],10), so that it will round the result to nearest 10
@Nikita912 Just noticed that the requirement is exact word match not length match. My solution only looking for the word length match, I guess which is not what you are looking for?
I've a huge dataset and applying this logic might be a little challenging, is there any other way of doing it?