Hi there,
I have an Excel file with two columns of text and would like to measure, row by row (i.e. A2 to B2, A3 to B3, ...), how closely (in %) the texts match each other based on common words. Punctuation should not be a matching criterion, and case should be ignored: a word should match whether it begins with a capital letter, is written fully in capital letters, or is written only in small letters.
Many thanks
Hi @AndreasN
Fuzzy matching would be the closest thing to giving you a score for a text comparison.
However, you can build out logic to compare the texts word by word, check whether the words match, sum up the matches, and compare that sum to the total to get a percentage.
Besides this, I don't know if there is an easy way to calculate this.
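For readers who want to prototype the word-by-word idea outside a workflow, here is a minimal Python sketch. It is not the attached solution; the function names `words` and `word_match_percent` are made up for illustration, and the choice of "total" is an assumption: the score is the number of distinct words the two texts share divided by the number of distinct words across both texts.

```python
import re


def words(text):
    """Return the set of distinct words in text, ignoring case and punctuation."""
    # \w+ keeps runs of letters, digits and underscores, so punctuation is dropped.
    # Note that an apostrophe splits a word: "don't" becomes "don" and "t".
    return set(re.findall(r"\w+", text.lower()))


def word_match_percent(a, b):
    """Percentage of distinct words shared by a and b (assumed definition)."""
    words_a, words_b = words(a), words(b)
    all_words = words_a | words_b
    if not all_words:
        # Assumption: two empty texts score 0 rather than 100.
        return 0.0
    return 100.0 * len(words_a & words_b) / len(all_words)


# Each tuple stands for one spreadsheet row (column A, column B).
rows = [
    ("Hello, World!", "hello WORLD"),
    ("The quick brown fox", "the slow brown dog"),
]
for a, b in rows:
    print(f"{word_match_percent(a, b):.1f}%  {a!r} vs {b!r}")
```

If you would rather measure how much of column A is covered by column B, divide by `len(words_a)` instead of the combined word count.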
Pedro.
I attached an example that builds a manual calculation and scores both string values.
In the example I did only one sentence.
You can try using fuzzy match.
I started by pre-processing the data with a data cleansing step to remove duplicate whitespace and change everything to uppercase, which makes the comparison case-insensitive. I compared all values from the text field to itself, but you can do this with two separate lists as well.
I added a make group and a find and replace step to append the "normalized" value to the end, just in case you need it.
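The same pre-processing can be sketched in Python for comparison. This is only an illustration under stated assumptions: `normalize` and `fuzzy_percent` are hypothetical names, and the standard library's `difflib.SequenceMatcher` stands in for the fuzzy match step. It scores character-level similarity, so its numbers will not match the tool used in the workflow.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Collapse runs of whitespace, trim the ends, and convert to uppercase."""
    return re.sub(r"\s+", " ", text).strip().upper()


def fuzzy_percent(a, b):
    """Character-level similarity of the normalized texts, as a percentage."""
    # ratio() returns a float between 0.0 and 1.0.
    return 100.0 * SequenceMatcher(None, normalize(a), normalize(b)).ratio()


pairs = [
    ("Hello   World", "hello world"),
    ("Annual report 2020", "Annual reprot 2020"),
]
for a, b in pairs:
    print(f"{fuzzy_percent(a, b):.1f}%  {normalize(a)!r} vs {normalize(b)!r}")
```

Unlike a word-based score, this approach still gives partial credit to misspelled words, which may or may not be what you want.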
Hi Pedro,
Many thanks for the tips, helped me a lot to understand how to tackle my work.
Best,
Andreas
@echoung1
Many thanks for the tips, helped me a lot to understand how to tackle my work.
Best,
Andreas