The first thing to do is to widen the field length on the input tool so that lines are not truncated. On my first try, all quantities matched except the null value, so I moved the null filter to after the step that identifies the words. With that change, all counts and percentages match.
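For anyone who wants to check the ordering outside the workflow, here is a rough Python sketch of the same idea: identify the words first, then drop the nulls, then count. It is not the workflow itself. The function name `word_percentages`, the sample lines, and the choice to treat empty tokens and missing lines as "null" are all assumptions made for illustration.

```python
from collections import Counter


def word_percentages(lines):
    """Return {word: (count, percent)} with nulls removed after tokenizing."""
    # Step 1: identify the words. Splitting on a single space means a
    # missing line or a double space produces an empty token (our "null").
    tokens = []
    for line in lines:
        tokens.extend((line or "").lower().split(" "))

    # Step 2: apply the null filter only after the words are identified,
    # so empty tokens affect neither the counts nor the denominator.
    words = [tok for tok in tokens if tok]

    # Step 3: count each word and express it as a share of all real words.
    counts = Counter(words)
    total = len(words)
    return {word: (n, 100.0 * n / total) for word, n in counts.items()}


# Hypothetical sample data: one double space and one missing line.
sample = ["the cat  sat", None, "the dog"]
result = word_percentages(sample)
```

Because the filter runs after the split, the empty tokens never reach the total, which is the behaviour that made the percentages line up in my case.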
Attached are my results. I can tie the provided data to the results, except that I excluded null results, so my percentages are close but do not tie exactly. The spoiler shows only the provided-data workflow; all workflows are included in the attachment.
Bonus #1 - The downloaded results differ because the provided data includes truncated lines that are complete in the downloaded data set.
Bonus #2 - Without opening the workflow and looking at the site, can you guess the source from the top 10 words?
Top Results of Bonus #2: