Start your learning journey with Alteryx Machine Learning Interactive Lessons
Go to LessonsHello guys I have a large dataset set that contains codes for (costs) expenses and a description but the description for the expenses is a free text entry that contains more than what's needed. I want to relate the description to each other by the similarity between sentences since the codes are not accurate to rely on.
| 4455 | travel expenses | 
| 4466 | flight expenses | 
| 7788 | medical expenses | 
| 7788 | injury costs | 
| 9900 | medicine bought from a pharmacy | 
| 4455 | train tickets to London | 
| 8822 | bonus for an employee getting .... | 
| 8822 | employee target reached expenses | 
The desired outcome is every related cost has the same code. The actual data is huge and have full sentences as a description but always has similar keywords.
| 1100 | travel expenses | 
| 1100 | flight expenses | 
| 1100 | train tickets to London | 
| 2200 | medical expenses | 
| 2200 | injury costs | 
| 2200 | medicine bought from a pharmacy | 
| 9999 | bonus for an employee getting .... | 
| 9999 | employee target reached expenses | 
it sounds like you just need some pattern matching.. also some data validation to control user input..
A very awesome blog post. We are really grateful for your blog post. You will find a lot of approaches after visiting your post.
Thanks for great share.
exploring more content on this forum... techzpod download mobdro
 
					
				
				
			
		
