I'm a bit more fussy about what I consider a word, but some tokens are obviously being excluded that probably shouldn't be (221B being a prime example). There are a lot of rules that could be added to make the extraction less flawed: for example, splitting the data on null, newline and space delimiters, and using unusual whitespace (e.g. multiple spaces or tabs) to detect where a token is unusual and so not acceptable as body text.
Anyway, here is a lazy example based on some additional rules as to what might be considered a word.
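For anyone who prefers to see the rules written out as code, here is a rough Python sketch of the same kind of idea. It is not the workflow from this post, just an illustration under my own assumptions: `extract_words` and `WORD_RE` are hypothetical names, a line containing a tab or a run of two or more spaces is treated as layout rather than body text, and a token counts as a word if it is all letters (with optional internal apostrophes or hyphens) or digits followed by letters, so that 221B survives.

```python
import re

# Assumed rule set (not from the original post): letters with optional
# internal apostrophes/hyphens, or digits followed by letters (e.g. "221B").
WORD_RE = re.compile(r"^(?:[A-Za-z]+(?:['-][A-Za-z]+)*|\d+[A-Za-z]+)$")


def extract_words(text):
    """Return the tokens in `text` that pass the assumed word rules."""
    words = []
    # Split into lines on null characters and line breaks.
    for line in re.split(r"[\x00\r\n]+", text):
        # Assumption: tabs or multiple spaces signal layout, not body text,
        # so the whole line is skipped.
        if re.search(r"\t| {2,}", line):
            continue
        for token in line.split(" "):
            # Trim surrounding punctuation before testing the token.
            token = token.strip(".,;:!?\"'()[]")
            if WORD_RE.match(token):
                words.append(token)
    return words


sample = "I called at 221B Baker Street.\nTable\t42\x00Holmes  said\nDon't go"
print(extract_words(sample))
```

With this sample, the tab-separated and double-spaced lines are dropped entirely, while 221B and Don't are kept. The pattern is deliberately crude; pure numbers are rejected, and real text would need more rules than this.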