Great Sunday morning Coffee challenge.
A few interesting recipes here, @Hollingsworth: finding word boundaries, checking for differences against a second data set, etc.
I've added both a simple and a more complex way to compare your answer to the solution. Many people will use the CREW macros (@MarqueeCrew); here's a way to build your own: https://community.alteryx.com/t5/Engine-Works-Blog/Compare-2-Data-Sets/ba-p/88853
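For anyone who wants to see the comparison idea outside of a workflow, here is a rough Python sketch. It is not the linked blog's method or the CREW macro logic, just one way to list the rows that appear in only one of two data sets. The function name `compare_rows` and the sample rows are made up for illustration.

```python
from collections import Counter

def compare_rows(answer, solution):
    """Return (rows only in answer, rows only in solution).

    Rows are compared as whole tuples, and duplicates are counted,
    so a row that appears twice on one side and once on the other
    is reported once as a difference.
    """
    answer_counts = Counter(tuple(row) for row in answer)
    solution_counts = Counter(tuple(row) for row in solution)
    only_answer = sorted((answer_counts - solution_counts).elements())
    only_solution = sorted((solution_counts - answer_counts).elements())
    return only_answer, only_solution

# Hypothetical sample data: (word, count) pairs.
answer = [("alpha", 3), ("beta", 2)]
solution = [("alpha", 3), ("beta", 5)]

extra, missing = compare_rows(answer, solution)
print(extra)    # rows in your answer that the solution does not have
print(missing)  # rows in the solution that your answer does not have
```

If both lists come back empty, the two data sets match row for row.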
For the bonus, I picked The Count of Monte Cristo by Alexandre Dumas (what a great story), sourced from Project Gutenberg.
Its top 10 words, filtered to >= 8 characters (to eliminate the The's, A's, I's, etc.), are mostly the names of its characters.
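The same top-words idea can be sketched in a few lines of Python, in case it helps to see the logic written out. This is only an approximation of what the workflow does, not a copy of it: it uses the regex `\b` word boundary to split words, lowercases everything, and keeps words of at least 8 characters. The name `top_long_words` and the sample sentence are placeholders; you would pass in the full book text instead.

```python
import re
from collections import Counter

def top_long_words(text, min_len=8, n=10):
    """Return the n most frequent words with at least min_len characters."""
    # \b marks a word boundary; [^\W\d_]+ matches runs of letters only,
    # so digits, underscores and punctuation (including apostrophes) split words.
    words = re.findall(r"\b[^\W\d_]+\b", text.lower())
    counts = Counter(word for word in words if len(word) >= min_len)
    return counts.most_common(n)

# Placeholder text, not taken from the book.
sample = "The elephant saw another elephant; the umbrella stayed home."
print(top_long_words(sample))
```

The length filter is a blunt substitute for a stop-word list, which is why the result is only mostly character names.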