Challenge #182: Word Sleuthing
A solution to last week's challenge can be found here!
This week's challenge comes to us from @BuQu - thanks so much for the contribution!
Using the provided input, count the number of times each word appears in the text. Then sort the words by count in descending order and show each word's share of the total word count as a percentage.
Hint: Make sure there are no truncated fields in your input.
Bonus: Use the download tool to get text directly from the website instead of using the text file.
- Labels:
- Basic
- Core
- Data Analysis
- Join
- Parse
- Preparation
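For anyone who wants to sanity-check their workflow output outside the tool, here is a minimal Python sketch of the task as stated above. It rests on assumptions the challenge does not spell out: a word is taken to be a run of letters (optionally with internal apostrophes), case is folded, and "descending order" is read as descending by count. The function name `word_frequencies` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Return (word, count, percent) rows sorted by count, highest first."""
    # Assumption: a word is a run of letters, optionally with internal
    # apostrophes, compared case-insensitively.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    total = len(words)
    counts = Counter(words)
    # Sort by count descending, then alphabetically so ties are stable.
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, n, round(100 * n / total, 2)) for word, n in rows]

# Made-up sample text, not the challenge input.
sample = "The cat saw the dog. The dog ran."
for row in word_frequencies(sample):
    print(row)
```

A different word definition (keeping hyphens, keeping case) will shift both the counts and the percentages, so treat the output as a cross-check rather than the answer key.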
I seem to be getting a different set of results from the challenge answer file. Maybe someone can help me figure out where I went wrong.
I also got ever so slightly different results. That holds whether or not I include concatenated words and whether or not I convert everything to the same case.
But here's my attempt.
I'm with everyone else: I could not get the totals to match. I kept going back and forth, removing punctuation, adding it back in, changing to all lowercase, etc. Good exercise, but the result may depend on how words are defined as the same. For instance, is "the" = "The"?
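To make the "the" versus "The" question concrete, here is a small Python sketch that counts the same sentence under four definitions of "same word". The function name `count_words`, its two flags, and the sample sentence are hypothetical and only illustrate the idea; this is not the method behind the answer file.

```python
import re
from collections import Counter

def count_words(text, fold_case, strip_punctuation):
    """Count whitespace-separated tokens under a chosen definition of 'same word'."""
    if fold_case:
        text = text.lower()
    tokens = text.split()
    if strip_punctuation:
        # Strip non-word characters from both ends of each token only.
        tokens = [re.sub(r"^\W+|\W+$", "", t) for t in tokens]
        # Drop tokens that were nothing but punctuation.
        tokens = [t for t in tokens if t]
    return Counter(tokens)

# Made-up sample sentence, not the challenge input.
sample = 'The end. Is "the" the same as The?'
for fold in (False, True):
    for strip in (False, True):
        counts = count_words(sample, fold, strip)
        print(f"fold_case={fold} strip_punctuation={strip} "
              f"the={counts['the']} distinct={len(counts)}")
```

Each combination gives a different count for "the" and a different number of distinct words, which is one way totals can drift from an answer file built with another definition.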
Great sleuthing, all! In creating the start file, I didn't follow THE HINT! An updated file is provided.
After solving this I see that the files have been updated. But here's my response as-is.
I had one minor variation between my solution and the provided answer, in that the answer had an additional line of data. The answer contained a row where the Word was a Null value.
There was much variation between my results from the provided data set and my results from downloading the data directly from the provided URL. A cursory examination revealed that in some lines, words had been hyphenated in the provided data but not in the downloaded data.
Anywho, lunch break is over.
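For comparing two versions of the text, such as the provided file against the downloaded copy described above, a count-by-count diff can show which words differ. The Python sketch below uses made-up sample lines and a hypothetical helper named `count_diff`. It also shows one possible source of a blank word row: splitting on a single space leaves an empty token wherever two spaces sit together. That is a guess at the cause, not something confirmed by the answer file.

```python
from collections import Counter

def count_diff(a, b):
    """Return {word: (count_in_a, count_in_b)} for words whose counts differ."""
    return {w: (a[w], b[w]) for w in sorted(set(a) | set(b)) if a[w] != b[w]}

# Hypothetical sample lines, not taken from the challenge data.
provided = "a well-known  story"   # hyphenated, with a doubled space
downloaded = "a well known story"  # same words without the hyphen

# Splitting on a single space keeps an empty token for the doubled space.
file_counts = Counter(provided.split(" "))
web_counts = Counter(downloaded.split(" "))

for word, (in_file, in_web) in count_diff(file_counts, web_counts).items():
    print(repr(word), in_file, in_web)
```

Here the hyphenated token counts as one word in the first version and as two separate words in the second, and the empty string shows up as its own row in the first version only.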