After adjusting for the number of characters allowed per line item, I was finally able to find the very simple solution. I am 100% not surprised that the most used word was "the".
Managed to get the exact match with a little research.
The final output seems incorrect: it contains punctuation and Roman numerals, and some contractions have been split at the apostrophe, e.g. "wasn" and "t" are counted as separate words.
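One way the apostrophe split might be avoided, as a rough sketch only: match runs of letters and allow an apostrophe inside a word. The names `WORD_RE` and `tokenize` are hypothetical and not from the original exercise, and this assumes English text. It drops punctuation but does nothing about Roman numerals, which would need separate handling.

```python
import re

# Hypothetical pattern: one or more letters, optionally followed by
# apostrophe + letters (straight or curly), so "wasn't" stays one token.
WORD_RE = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")

def tokenize(text):
    """Return the words in text, dropping punctuation but keeping contractions whole."""
    return WORD_RE.findall(text)

words = tokenize("It wasn't there, was it?")
```

Here `words` keeps "wasn't" as a single entry, and the comma and question mark are not part of any token.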
Close enough... it seems words are case sensitive, so "the" is counted separately from "The" and so forth. Also re-pulled the input data from the source text file to remove truncated lines.
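If the case-sensitive counts are not what is wanted, one possible fix is to lowercase each word before counting. A minimal sketch, assuming the words are already split into a list; `count_words` is a hypothetical helper name, not part of the original solution.

```python
from collections import Counter

def count_words(words):
    """Count words case-insensitively by lowercasing each one first."""
    return Counter(w.lower() for w in words)

counts = count_words(["The", "the", "THE", "cat"])
top_word, top_count = counts.most_common(1)[0]
```

With this, "The", "the" and "THE" all land in the same bucket, so the top entry reflects the combined total.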