I'm currently trying to solve a problem, and I feel like I'm spending way too much time on it.
I have a data set containing values in multiple languages, and I only want the English ones. I was able to get rid of the words with non-English letters using a small regular expression and a filter. However, there are some words that contain only English letters but aren't English. What I'm trying to do is bring in an English dictionary, compare each word against it, and see which rows contain words the dictionary doesn't recognize. This is proving to be a bit harder than I thought. I think I can do it, but it feels like it should be much simpler than it is.
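For anyone trying the same thing outside the tool, here is a rough Python sketch of the dictionary comparison described above. It is only an illustration under assumptions: the tiny `ENGLISH_WORDS` set stands in for a real English word list (which you would load from a file, one word per line), and the names `non_english_words` and `split_rows` are made up for this example.

```python
import re

# Hypothetical stand-in for a real English word list; in practice this set
# would be loaded from a dictionary file with one word per line.
ENGLISH_WORDS = {"the", "red", "house", "is", "big", "a", "small", "dog"}

# Matches runs of unaccented Latin letters, i.e. the words that already
# passed the "English letters only" regex filter.
WORD_RE = re.compile(r"[A-Za-z]+")


def non_english_words(text, dictionary=ENGLISH_WORDS):
    """Return the lowercased words in text that are not in the dictionary."""
    words = (w.lower() for w in WORD_RE.findall(text))
    return [w for w in words if w not in dictionary]


def split_rows(rows, dictionary=ENGLISH_WORDS):
    """Split rows into (english, flagged) lists using dictionary lookup."""
    english, flagged = [], []
    for row in rows:
        if non_english_words(row, dictionary):
            flagged.append(row)   # at least one word is not in the dictionary
        else:
            english.append(row)   # every word was found in the dictionary
    return english, flagged


rows = ["The house is big", "Das Haus ist gross", "A small red dog"]
english, flagged = split_rows(rows)
print("English rows:", english)
print("Flagged rows:", flagged)
```

With a real word list, proper nouns, abbreviations, and typos would also get flagged, so it is probably safer to review the flagged rows (or require more than one unknown word per row) than to drop them automatically.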
It would be great to have a tool that runs a "spell check" on fields (dictionaries for almost every language are available free online). This could also be useful for cleaning up open-text data where people type things in quickly and don't re-read them! 🙂