Challenge #182: Word Sleuthing
ultrarunner
8 - Asteroid
09-29-2019
02:42 PM
agrawaluk
8 - Asteroid
09-29-2019
05:01 PM
16 - Nebula
09-30-2019
05:37 AM
ch12345
7 - Meteor
09-30-2019
05:37 AM
I'm a bit fussier about what I consider a word, although my rules make some obvious exclusions that probably shouldn't be excluded (221B being a prime example). A lot of rules could be added to make the extraction less flawed: for example, splitting the data on null, newline and space delimiters, and picking out unusual formatting (e.g. multiple spaces or tabs) to detect where a word is unusual and so not acceptable in body text.
Anyway, here is a lazy example based on some additional rules as to what might be considered a word.
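For anyone following along outside a workflow, the kind of rule set described above might be sketched in Python roughly as follows. This is only an illustration: the `extract_words` name, the letters-only pattern and the punctuation that gets stripped are all assumptions, not the rules used in the attached example.

```python
import re

# Assumed rule (not from the post): a "word" is letters only, optionally
# with internal apostrophes or hyphens, e.g. "don't" or "well-known".
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def extract_words(text):
    """Split on null, newline, tab and space, then keep word-like tokens."""
    tokens = re.split(r"[\x00\s]+", text)
    words = []
    for token in tokens:
        # Strip surrounding punctuation such as quotes, commas and full stops.
        core = token.strip(".,;:!?\"'()[]")
        # Tokens containing digits (e.g. "221B") fail the pattern and are dropped.
        if core and WORD_RE.fullmatch(core):
            words.append(core)
    return words

sample = "He lived at 221B Baker Street,\nand it wasn't far."
print(extract_words(sample))
```

With these assumed rules, 221B is dropped, which is exactly the sort of exclusion that arguably shouldn't happen; loosening the pattern to allow digits would keep it at the cost of letting page numbers and dates through.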
GBGguilbert
5 - Atom
09-30-2019
05:37 AM
19 - Altair
09-30-2019
05:41 AM
Fun.
Big.txt is a compendium of multiple works, though. Try Remembrance of Things Past for a single monumental work, 700K words in 7 volumes. Proust had a lot of free time on his hands!
Dan
wdavis
Alteryx Alumni (Retired)
09-30-2019
07:21 AM
nini
8 - Asteroid
09-30-2019
08:28 AM
14 - Magnetar
09-30-2019
02:58 PM
Spoiler
My answer differs from the posted solution, in part because I cleaned the data up by making all words lowercase.
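As a rough illustration of why lowercasing changes the totals, here is a minimal Python sketch. The `word_counts` helper and the sample list are hypothetical and are not taken from the workflow.

```python
from collections import Counter

def word_counts(words, lowercase=False):
    """Count occurrences, optionally folding case first."""
    if lowercase:
        words = [w.lower() for w in words]
    return Counter(words)

words = ["The", "the", "THE", "cat"]
print(len(word_counts(words)))                  # distinct words, case-sensitive
print(len(word_counts(words, lowercase=True)))  # distinct words after lowercasing
```

Case variants that count as separate words in a case-sensitive tally collapse into one entry once everything is lowercased, so both the distinct-word count and the per-word frequencies shift.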

Chris
Check out my collaboration with fellow ACE Joshua Burkhow at AlterTricks.com
LordNeilLord
15 - Aurora
10-01-2019
12:33 AM