I'm working with a dataset that includes comments in multiple languages. To send translation work in the right direction, we need to identify the language each comment is written in.
Is there a way to do this using, for example, existing character sets for particular languages, e.g. Japanese or Korean character sets, or particular words for other languages, e.g. Danish, Spanish?
Or is there maybe an API that can be called to check on the language?
Thanks.
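On the character-set idea: here is a rough sketch, assuming Python is available, that guesses the dominant script of a comment from Unicode character names using only the standard library. The function name `guess_script` and the label table are made up for illustration. It can separate Japanese or Korean from Latin-script text, but it cannot tell Danish from Spanish, since both use the Latin alphabet; those need a word-based or statistical detector.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from Unicode character-name prefixes to coarse labels.
SCRIPT_PREFIXES = {
    "HANGUL": "Korean",
    "HIRAGANA": "Japanese",
    "KATAKANA": "Japanese",
    "CJK": "CJK ideographs",  # shared by Chinese and Japanese kanji
    "CYRILLIC": "Cyrillic",
    "LATIN": "Latin",
}


def guess_script(text):
    """Return the most common script label among the letters in text, or None."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        for prefix, label in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                counts[label] += 1
                break
        else:
            counts["other"] += 1
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    for comment in ["こんにちは", "안녕하세요", "Hej verden"]:
        print(comment, "->", guess_script(comment))
```

This only narrows things down by script, so it works best as a cheap first pass before a proper language detector.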
https://detectlanguage.com has an API that looks like it does what you want.
I built a quick macro that uses it. You will need to sign up and get an API key to use it, but they have a free tier you could start with.
Thanks Adam, I'll give that a try.
Hey @AlexCUK
There's also a Python library:
https://pypi.org/project/langdetect/
I've used it before; if you're comfortable with the Python tool, this may be a suitable solution.
Cheers,
TheOC
Thanks, I'll take a look at the Python solution too.