Hello guys, I have a large dataset that contains expense (cost) codes and a description for each expense, but the description is a free-text entry that contains more than what's needed. Since the codes are not accurate enough to rely on, I want to relate the descriptions to each other by the similarity between sentences. Some example descriptions:
medicine bought from a pharmacy
train tickets to London
bonus for an employee getting ....
employee target reached expenses
The desired outcome is that every related cost has the same code. The actual data is huge and has full sentences as descriptions, but related descriptions always share similar keywords.
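To make the goal concrete, here is a rough sketch of the kind of grouping I mean, using only the Python standard library. It is not a finished solution: the stopword list, the `assign_codes` helper, the Jaccard keyword overlap, and the `0.15` threshold are all assumptions picked for the four sample descriptions above and would need tuning on the real data.

```python
import re

# Hypothetical stopword list; it would need extending for the real data.
STOPWORDS = {"a", "an", "the", "for", "from", "to", "of", "and", "in", "on"}

def tokens(text):
    """Lowercase the text and return its set of non-stopword words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def assign_codes(descriptions, threshold=0.15):
    """Greedy grouping: each description joins the first group whose
    first member is similar enough, otherwise it starts a new group.
    The threshold is an assumed value, not a recommended one."""
    representatives = []  # token set of the first member of each group
    codes = []
    for text in descriptions:
        toks = tokens(text)
        for code, rep in enumerate(representatives):
            if jaccard(toks, rep) >= threshold:
                codes.append(code)
                break
        else:
            # No existing group was similar enough: open a new one.
            representatives.append(toks)
            codes.append(len(representatives) - 1)
    return codes

descriptions = [
    "medicine bought from a pharmacy",
    "train tickets to London",
    "bonus for an employee getting ....",
    "employee target reached expenses",
]
codes = assign_codes(descriptions)
for code, text in zip(codes, descriptions):
    print(code, text)
```

With these samples the two employee-related descriptions end up with the same code because they share the keyword "employee", while the other two each get their own. Comparing every description against every group will probably be too slow on the full dataset, and plain keyword overlap ignores how common a word is, so something like TF-IDF vectors with cosine similarity may work better at scale.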