Weekly Challenge

Solve the challenge, share your solution and summit the ranks of our Community!

Challenge #182: Word Sleuthing

Asteroid
 

Mine didn't have the same result either.

 

 

Capture.JPG

 
Meteor
Spoiler
77.PNG
Magnetar
Spoiler
Capture.PNG
Meteor

I'm a bit more fussy about what I consider words, but there are some obvious exclusions that probably shouldn't be excluded (221B being a prime example). There are a lot of rules that could be added to make the extraction less flawed, for example splitting the data on null, newline and space delimiters, and picking out unusual patterns (e.g. multiple spaces or tabs) to detect where a word is unusual and so not acceptable in body text.
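
To make that kind of rule set concrete outside of a workflow, here is a rough Python sketch. It is only an illustration under assumptions of my own, not the workflow from this post: the function name `extract_words`, the punctuation list and the "letters or digits with an optional inner apostrophe or hyphen" rule are all hypothetical choices. It splits on null, newline and space, and rejects any token that still contains something unusual, such as a tab.

```python
import re

# Hypothetical rules (assumptions, not the poster's workflow):
# split on NUL, newline/carriage return and space only.
SPLIT_RE = re.compile(r"[\x00\n\r ]+")
# A "word" is letters/digits with optional inner apostrophes or hyphens,
# so tokens such as "221B" and "wasn't" are kept.
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")
# Punctuation trimmed from both ends of a token before testing it.
STRIP_CHARS = ".,;:!?\"()[]"


def extract_words(text):
    """Return the tokens of `text` that pass the word rules above."""
    words = []
    for token in SPLIT_RE.split(text):
        token = token.strip(STRIP_CHARS)
        # Tokens with embedded tabs or other odd characters fail fullmatch
        # and are treated as not acceptable body text.
        if token and WORD_RE.fullmatch(token):
            words.append(token)
    return words


sample = "Holmes lived at 221B  Baker Street.\nIt wasn't far\x00away."
print(extract_words(sample))
```

Tabs are deliberately left out of the delimiters, so a token like `a<TAB>b` is dropped rather than split. That is one way to treat unusual spacing as a warning sign, and it can easily be loosened.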

 

Anyway, here is a lazy example based on some additional rules as to what might be considered a word.

 

Spoiler
Capture.PNG
Highlighted

Like many others in this thread, I can't quite get the total to line up, but I seem to be going in the right direction with my logic.

 

Does anyone have a good workaround for splitting the concatenated words?

Castor

Fun.

 

Big.txt is a compendium of multiple works, though. Try Remembrance of Things Past for a single monumental work: 700K words in 7 volumes. Proust had a lot of free time on his hands!

 

Spoiler
w.png

 Dan

Alteryx

Solution attached

Asteroid

I got the exact results after I fixed the truncation issue when importing the .txt file.

Fireball
Spoiler
A different answer from the posted solution, in part due to cleaning up the data to make all words lowercase.
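
As a small illustration of why lowercasing can shift the answer, here is a Python sketch with made-up sample words (the `count_words` helper is hypothetical and not part of the workflow). The total number of words stays the same, but the number of distinct words drops once case variants are merged.

```python
from collections import Counter


def count_words(words, lowercase=False):
    """Count occurrences of each word, optionally folding case first."""
    if lowercase:
        words = [w.lower() for w in words]
    return Counter(words)


# Made-up sample, purely for illustration.
words = ["The", "the", "THE", "Holmes", "holmes", "Watson"]

as_is = count_words(words)
folded = count_words(words, lowercase=True)

# Distinct words before and after folding case.
print(len(as_is), len(folded))
# Total words is unchanged by lowercasing.
print(sum(as_is.values()), sum(folded.values()))
```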

Challenge 182.PNG
Alteryx Certified Partner

Love the dynamic input tool!!

 

Spoiler
Capture.PNG