

Cloud Quest #27: Word Sleuth

AYXAcademy
Alteryx

Hi Community,

 

We posted the solution JSON file for Cloud Quest #26. Check it out and let us know what you think! Send suggestions to academy@alteryx.com or leave a comment below!

 

Let’s dive into this week's quest!

 

  1. Download and extract the provided ZIP file containing your starting data and workflow files.
  2. Upload the provided Cloud Quest 27 Start.json file into your Analytics Cloud library.
  3. Reconnect the provided Cloud Quest 27 Input.csv and Cloud Quest 27 Output.csv datasets to your starting workflow file.

For more detailed instructions on how to import and export Designer Cloud workflow files, check out the pinned article Cloud Quest Submission Process Update.

 

 

Scenario:

 

This week's Cloud Quest was inspired by a submission from Abubakar Mahmood (@BuQu). Thank you for the contribution! 

 

The starting dataset for this quest is a TXT file that contains the full text of The Project Gutenberg eBook of The Adventures of Sherlock Holmes by Sir Arthur Conan Doyle. TXT files can be easily converted into CSV format for use in Designer Cloud.

 

Using the provided input, count the number of times each word appears in the text. Then, sort the words in descending order of count and calculate each word's share of the total word count as a percentage.
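
For anyone who wants to sanity-check their workflow output outside Designer Cloud, the counting and percentage logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the official solution: the function name `word_frequencies`, the regular expression used to tokenize, and the choice to lowercase every word are all hypothetical, and a different definition of a word will give different counts.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (word, count, percent) rows, highest count first."""
    # Assumption: a word is a run of ASCII letters, optionally with
    # internal straight apostrophes (e.g. "don't"); case is ignored.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return []
    # Percentage is each word's count relative to the total word count.
    rows = [(word, n, 100.0 * n / total) for word, n in counts.items()]
    # Sort by count descending; break ties alphabetically for stable output.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


sample = "The dog saw the cat. The cat ran."
rows = word_frequencies(sample)
for word, n, pct in rows:
    print(f"{word}\t{n}\t{pct:.2f}%")
```

In workflow terms, the `re.findall` call plays the role of tokenizing words into rows, `Counter` stands in for a group-by count, and the division by `total` corresponds to appending the overall total to every row before applying a formula.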

 

Hint: You might notice a different number of output rows in your solution, as it largely depends on how specifically you define a valid word. In our solution, we excluded numbers, empty or null cells, and single-character cells (except for "a" and "I"). If you're unsure where to begin, start by tokenizing words into individual rows.
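
The exclusion rules in the hint could be written as a small predicate like the one below. This is a sketch, not the logic used in the provided solution: the name `is_valid_word` is hypothetical, and two details are assumptions on my part, namely that "a" and "I" are accepted in either case and that a "number" means digits with optional commas or periods.

```python
def is_valid_word(token):
    """Return True if a token should be counted as a word."""
    # Exclude null cells.
    if token is None:
        return False
    token = token.strip()
    # Exclude empty cells (including whitespace-only cells).
    if not token:
        return False
    # Exclude numbers. Assumption: digits with optional "," or "." separators.
    if token.replace(",", "").replace(".", "").isdigit():
        return False
    # Exclude single characters, except "a" and "I".
    # Assumption: the exception is applied case-insensitively.
    if len(token) == 1 and token.lower() not in {"a", "i"}:
        return False
    return True


tokens = ["I", "saw", "a", "b", "1887", "", None, "Holmes"]
kept = [t for t in tokens if is_valid_word(t)]
```

Tightening or loosening any of these checks changes the number of output rows, which is why row counts can differ between solutions that are otherwise correct.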

 

Spoiler
A combination of the Append Fields, Sort, Formula, RegEx, Summarize, and Data Cleansing tools should solve your problem, but not necessarily in this sequence.

 

If you find yourself struggling with any of the tasks, feel free to explore these interactive lessons in Alteryx Academy for guidance:

 

Once you have completed your quest, go back to your Analytics Cloud library.

  • Download your workflow solution file.
  • Include your JSON file and a screenshot of your workflow as attachments to your comment.

 

Here’s to a successful quest!

 


 

 

 

 

Download Start File | Download Solution File

10 REPLIES
Carolyn
12 - Quasar

Solved!

 

I had trouble getting my solution to match the provided solution. Some of the difference is because the solution counts empty cells as values, whereas I excluded them. But even after adjusting for that, I'm still off a bit. I know it's going to come down to my cleansing/parsing being different ... or just a mistake :D

 

Spoiler
2025-02-05 11_06_53-Cloud Quest 27 - Carolyn _ Workflows.png

I was getting 1,094,890 rows, with nulls and empties removed

2025-02-05 11_09_49-.png

With only nulls removed, I get 1,164,765

 

alexnajm
18 - Pollux

Finally complete through sheer trial and error!

 

Spoiler
If you need inspiration, try Challenge #182
Quest 27.png

 

AYXAcademy
Alteryx

@Carolyn  @alexnajm – thank you both for highlighting the inconsistent output rows. You may find that your solution has a different number of output rows, as this depends on how strictly you define a valid word upstream. In the original solution, we applied minimal filtering of words before counting them, as the primary focus of the exercise was on parsing and calculating rather than dictionary-level accuracy for the words themselves.

 

To address this, we’ve updated the start file with a new Cloud Quest 27 Output.csv. This time, we’ve excluded numbers, empty or null cells, and single-character cells (with the exception of "a" and "I"). While this filtering is not exhaustive, it should align more closely with your workflow results.

 

Thanks again!

alexnajm
18 - Pollux

@AYXAcademy I corrected mine and got it working 😊

Qiu
21 - Polaris

I am still a bit off from the provided answer, but I've decided to leave it as is. 😁

Spoiler
Cloud_Quest_27.png
RWvanLeeuwen
11 - Bolide

Nice and quick challenge!

Spoiler
tokenise, format and count - nice and quick

 

I wrote a blog post (in Dutch) on Zipf's law back in 2017, which helps filter out these filler words.

Spoiler
nice stuff about the predictability of word frequencies
patrick_digan
17 - Castor

image.png

ggruccio
ACE Emeritus

I got close!  At least it matches the top words.

 

Spoiler
Screenshot 2025-02-12 112748.png
JeffF
Alteryx

Matched exactly.

Spoiler
CloudQuest27_JeffF.png