We posted the solution JSON file to Cloud Quest #26. Check it out and let us know what you think! Send suggestions to academy@alteryx.com or leave a comment below!
For more detailed instructions on how to import and export Designer Cloud workflow files, check out the pinned article Cloud Quest Submission Process Update.
This week's Cloud Quest was inspired by a submission from Abubakar Mahmood (@BuQu). Thank you for the contribution!
The starting dataset for this quest is a TXT file that contains the transcript of The Project Gutenberg eBook of The Adventures of Sherlock Holmes by Sir Arthur Conan Doyle. TXT files can be easily converted into CSV format for use in Designer Cloud.
Using the provided input, count the number of times each word appears in the text. Then, sort the words in descending order of count and calculate each word's percentage of the total word count.
Hint: You might notice a different number of output rows in your solution, as it largely depends on how specifically you define a valid word. In our solution, we excluded numbers, empty or null cells, and single-character cells (except for "a" and "I"). If you're unsure where to begin, start by tokenizing words into individual rows.
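For readers who want to sanity-check their row counts outside Designer Cloud, the steps above can be sketched in Python. This is only an illustrative sketch, not the official solution: the function names (`tokenize`, `is_valid`, `word_frequencies`) are hypothetical, and lowercasing every token and splitting on non-alphanumeric characters are assumptions that the quest does not specify, so your counts may still differ.

```python
import re
from collections import Counter

# Single-character tokens to keep ("I" appears as "i" because we lowercase).
KEEP_SINGLE = {"a", "i"}


def tokenize(text):
    """Split text into lowercase tokens, one per word (assumed rule)."""
    raw = re.split(r"[^a-z0-9']+", text.lower())
    # Trim stray apostrophes (quote marks) and drop empty tokens.
    return [t.strip("'") for t in raw if t.strip("'")]


def is_valid(token):
    """Apply the filters from the hint: no numbers, no stray single characters."""
    if token.isdigit():
        return False
    if len(token) == 1 and token not in KEEP_SINGLE:
        return False
    return True


def word_frequencies(text):
    """Return (word, count, percent) rows sorted by count, highest first."""
    words = [t for t in tokenize(text) if is_valid(t)]
    counts = Counter(words)
    total = sum(counts.values())
    # Ties are broken alphabetically so the output order is deterministic.
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, count, 100 * count / total) for word, count in rows]


if __name__ == "__main__":
    sample = "I saw a dog. A dog saw 2 cats, x"
    for word, count, pct in word_frequencies(sample):
        print(f"{word}\t{count}\t{pct:.1f}%")
```

Changing any one of these choices (for example, keeping case or treating hyphenated words as one token) changes the number of output rows, which is the effect the hint describes.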
If you find yourself struggling with any of the tasks, feel free to explore these interactive lessons in Alteryx Academy for guidance:
Once you have completed your quest, go back to your Analytics Cloud library.
Solved!
I had trouble getting my solution to match the provided solution. Some of it is because the solution counts empty cells as values, whereas I excluded them. But even adjusting for that, I'm still off a bit. I know it's going to come down to my cleansing/parsing being different ... or just a mistake :D
@Carolyn @alexnajm – thank you both for highlighting the inconsistent output rows. You may find that your solution has a different number of output rows, as this depends on how strictly you define a valid word upstream. In the original solution, we applied minimal filtering of words before counting them, as the primary focus of the exercise was on parsing and calculating rather than dictionary-level accuracy for the words themselves.
To address this, we’ve updated the start file with a new Cloud Quest 27 Output.csv. This time, we’ve excluded numbers, empty or null cells, and single-character cells (with the exception of "a" and "I"). While this filtering is not exhaustive, it should align more closely with your workflow results.
Thanks again!
@AYXAcademy I corrected mine and got it working 😊
I am still a bit off from the provided answer, but I decided to live with it. 😁
Nice and quick challenge!
I wrote a blog post (in Dutch) on Zipf's law back in 2017, which helps filter out these filler words.
Matched exactly.