Challenge #182: Word Sleuthing

AYXAcademy
Alteryx

A solution to last week's challenge can be found here

 

This week's challenge comes to us from @BuQu - thanks so much for the contribution! 

 

Using the provided input, count the number of times each word appears in the text. Then sort the words in descending order of count and show each word's use as a percentage of the total word count.
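For anyone who wants to check their numbers outside Alteryx, here is a minimal Python sketch of the same count, sort, and percentage logic. It assumes a "word" is a run of letters or digits (optionally with internal apostrophes) and that case is ignored; the function name `word_frequencies` is hypothetical. Ties are broken alphabetically so the output order is stable.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (word, count, percent_of_total) tuples, most frequent first."""
    # Assumption: a word is letters/digits, optionally with internal apostrophes.
    words = re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())
    total = len(words)
    counts = Counter(words)
    # Sort by count descending, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, n, 100.0 * n / total) for word, n in ranked]


if __name__ == "__main__":
    sample = "The dog and the cat. The end."
    for word, n, pct in word_frequencies(sample):
        print(f"{word:<6}{n:>3}{pct:>8.2f}%")
```

Changing the regular expression or dropping `.lower()` changes the counts, which is one reason solutions to this challenge can legitimately differ.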

 

Hint: Make sure there are no truncated fields in your input.

 

Bonus: Use the Download tool to get the text directly from the website instead of using the provided text file.
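Outside Alteryx, the same download step can be sketched with Python's standard `urllib.request`. The challenge's actual URL isn't reproduced here, so the function below (hypothetically named `fetch_text`) accepts any URL; it decodes the body using the charset the server declares and falls back to UTF-8, which is an assumption.

```python
from urllib.request import urlopen


def fetch_text(url, fallback_encoding="utf-8", timeout=30):
    """Download `url` and return the response body as a string."""
    with urlopen(url, timeout=timeout) as response:
        # Prefer the charset declared in the Content-Type header, if any.
        charset = response.headers.get_content_charset() or fallback_encoding
        return response.read().decode(charset)


# Usage (placeholder URL, not the challenge's real one):
# text = fetch_text("https://example.com/sherlock-holmes.txt")
```

If the page is HTML rather than plain text, the markup would still need to be stripped before counting words.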

 

sherlock-holmes-copywriting.jpg

 

 

 

kshashank03
7 - Meteor

I'm getting a different set of results from the challenge answer file. Maybe someone can help me figure out where I went wrong. 

 

Spoiler
I first removed empty and null rows with a Filter tool, then used the Data Cleansing tool to remove anything that wasn't a letter and to convert the entire dataset to lowercase. From there, I used a Text to Columns tool to split the fields into rows, using the space character as the delimiter. Then I grouped by the new split word field, and counted and summed.
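As a rough cross-check, the steps described above might translate to Python like this. The regular expression only approximates the Data Cleansing tool's behaviour (an assumption), and `sleuth_rows` is a hypothetical name. Note the explicit filter on empty strings: splitting on a single space turns double spaces into empty tokens, which would otherwise be counted as a "word".

```python
import re
from collections import Counter


def sleuth_rows(lines):
    """Count words across input rows: Filter, Cleanse, Split to rows, Summarize."""
    counts = Counter()
    for line in lines:
        # Filter: skip null and empty rows.
        if line is None or not line.strip():
            continue
        # Cleanse: keep only letters and spaces, then lowercase.
        cleaned = re.sub(r"[^A-Za-z ]", "", line).lower()
        # Split to rows on the space character; drop empty tokens from repeated spaces.
        words = [w for w in cleaned.split(" ") if w]
        counts.update(words)
    return counts


rows = ["The dog's bone.", None, "", "the  DOG ran"]
counts = sleuth_rows(rows)
total = sum(counts.values())
for word, n in counts.most_common():
    print(word, n, f"{100 * n / total:.1f}%")
```

Deleting punctuation this way glues pieces together ("dog's" becomes "dogs", "well-known" would become "wellknown"), so counts can drift from an answer key that treated punctuation differently.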

 

Spoiler
Shashank Alteryx Challenge 182.png
PhilipMannering
16 - Nebula

I also got ever so slightly different results, depending on whether I include concatenated words and whether I treat upper- and lowercase as the same.
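To see how much these choices matter, here is a small Python sketch that counts the same text under different rules. Reading "concatenated words" as hyphen-joined compounds is an assumption, as is the hypothetical function name `count_words`.

```python
import re
from collections import Counter


def count_words(text, fold_case=True, split_hyphens=True):
    """Count words under two tunable rules: case folding and hyphen handling."""
    if fold_case:
        text = text.lower()
    if split_hyphens:
        # "well-known" counts as two words: "well" and "known".
        pattern = r"[A-Za-z0-9]+"
    else:
        # "well-known" counts as a single word.
        pattern = r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*"
    return Counter(re.findall(pattern, text))


sample = "The well-known detective. the Detective knew."
for fold in (True, False):
    for split in (True, False):
        c = count_words(sample, fold_case=fold, split_hyphens=split)
        print(f"fold_case={fold}, split_hyphens={split}: "
              f"total={sum(c.values())}, the={c['the']}")
```

On this sample, folding case merges "The" and "the", and keeping hyphenated compounds together reduces the total from 7 words to 6.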

 

But here's my attempt:

 

Spoiler
 
 
wf

output

 

 

kat
12 - Quasar

I also can't get the exact results...

 

Spoiler
Challenge #182.PNG
Adam_Dooley
8 - Asteroid

Close...but my solution is high by 2,000 "the"s

 

Spoiler
Capture.PNG
ggruccio
ACE Emeritus

I'm with everyone else: I could not get the totals to match. I kept going back and forth, removing punctuation, adding it back in, changing everything to lowercase, etc. Good exercise, but the result may depend on how words are defined as the same. For instance, is "the" = "The"?

patrick_digan
17 - Castor
Spoiler
Here's my go. I replaced all characters that aren't letters or numbers. I also ignored case.
Annotation 2019-09-23 124453.jpg
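The post doesn't say what the non-alphanumeric characters were replaced with, and that choice affects the counts. Here is a Python sketch of the two obvious options; the function name `tokens` and its `replacement` parameter are hypothetical. Replacing with a space splits "well-known" into two words, while replacing with an empty string glues them into one. Whitespace is deliberately preserved in both cases so ordinary word boundaries survive, and case is ignored via `casefold()`.

```python
import re


def tokens(text, replacement=" "):
    """Swap each non-alphanumeric, non-whitespace character for `replacement`, ignore case, split."""
    cleaned = re.sub(r"[^A-Za-z0-9\s]", replacement, text)
    return cleaned.casefold().split()


print(tokens("Holmes's well-known pipe"))
print(tokens("Holmes's well-known pipe", replacement=""))
```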
AYXAcademy
Alteryx

Great sleuthing, all!  In creating the start file, I didn't follow THE HINT! An updated file is provided. 

benakesh
12 - Quasar

The counts match the revised solution output!

David-Carnes
12 - Quasar

After solving this, I see that the files have been updated. But here's my response as-is.

 

I had one minor difference between my solution and the provided answer: the answer had an additional row of data in which the Word was a Null value.

 

There was a lot of variation between my solution using the provided dataset and my solution using data downloaded directly from the provided URL. A cursory examination revealed that in some lines, words had been hyphenated in the provided data but not in the downloaded data.
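If the discrepancy comes from words broken across lines with a hyphen, one way to normalize the provided file before counting is to rejoin them. This Python sketch (hypothetical function `dehyphenate`) assumes that a line-ending hyphen directly after a letter always marks a split word; that assumption would wrongly merge a genuine compound such as "well-" + "known" into "wellknown" if it happened to break at its own hyphen.

```python
import re


def dehyphenate(lines):
    """Join text lines, rejoining words split across a line break with a hyphen."""
    text = "\n".join(lines)
    # letter + hyphen + optional spaces/tabs + newline + optional spaces/tabs + letter
    text = re.sub(r"([A-Za-z])-[ \t]*\n[ \t]*([A-Za-z])", r"\1\2", text)
    # Any remaining line breaks are ordinary word separators.
    return text.replace("\n", " ")


print(dehyphenate(["It was the detec-", "tive who spoke."]))
```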

 

Anywho, lunch break is over.

 

Spoiler
182.png