This is a cool dataset! My answers varied from the sample, but I was able to reverse engineer it to get things to match.
Part 1: The top stream of data matches the sample output. The bottom stream is how I would have approached it without the sample output guiding me. Both are equally correct but have different results because of sorting behavior. This highlights some interesting topics:
1) Using a Summarize tool alters the order of your dataset, which makes subsequent sorts behave differently. The Summarize here could be useful if there are multiple entries per movie id. That isn't the case this time, but summarizing would handle dupes should they ever come through.
2) When doing a Top X analysis in the business world, it can be really valuable to have secondary sort criteria to "break ties" between equal values (a quick sketch of this follows below).
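Outside of Alteryx, the same dedupe-then-tie-break idea is easy to sketch in pandas. The column names (`movie_id`, `title`, `profit`) and the toy rows below are my own assumptions for illustration, not the challenge's actual schema:

```python
import pandas as pd

# Toy data; the real challenge columns are likely named differently.
movies = pd.DataFrame({
    "movie_id": [1, 2, 3, 4, 5],
    "title":    ["A", "B", "C", "D", "E"],
    "profit":   [500, 500, 300, 900, 900],
})

# "Summarize" equivalent: collapse any duplicate movie_id rows first.
deduped = movies.groupby("movie_id", as_index=False).agg(
    title=("title", "first"),
    profit=("profit", "max"),
)

# Top 3 by profit, with title as a secondary sort key so ties break
# deterministically instead of depending on incoming row order.
top = deduped.sort_values(["profit", "title"], ascending=[False, True]).head(3)
print(top)
```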
Part 2 was pretty straightforward.
Part 3 was a can of worms! It looks like the sample output was looking for distinct "original languages" rather than distinct "languages spoken," and also only for movies that had more than 0 profit. The requirements weren't very clear about this, so it was really good practice in evaluating the requirements versus what you're seeing in the data. In the business world, I'd go back for more info and clarifying questions. There's also a sneaky language in here coded as "xx", which upon further investigation means "No Language." I am not sure whether or not it's appropriate to include that in our number - another good question for my stakeholder!
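For anyone curious how that interpretation plays out, here is a minimal pandas sketch of the same logic. The column names `original_language` and `profit` and the toy values are assumptions, not the real dataset:

```python
import pandas as pd

# Toy data with assumed column names.
movies = pd.DataFrame({
    "original_language": ["en", "fr", "xx", "en", "ja"],
    "profit":            [100, -5, 40, 900, 10],
})

# Keep only movies with positive profit, as the sample output seems to expect.
profitable = movies[movies["profit"] > 0]

# Distinct original languages, with and without the "xx" (No Language) code,
# which is exactly the question I'd take back to the stakeholder.
with_xx = profitable["original_language"].nunique()
without_xx = profitable.loc[profitable["original_language"] != "xx",
                            "original_language"].nunique()
print(with_xx, without_xx)  # 3 vs 2 on this toy data
```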
Overall, this was some good practice in data investigation when the results weren't what I expected!
Great! Thank you so much @Qiu for researching this. I am surprised that the AMP Engine did not pull all the records. I thought it was supposed to be much faster and more efficient by running multi-threaded, but I agree that losing records is not good. Good to know!
My workflow looks a bit different than most others.
Q1 should, in my opinion, include ties as well. If you include ties, you end up with 18 movies; otherwise, what decides which tied movie to include and which one to leave out? (See the sketch after this post.)
Q2: Since none of the top 10 have the same Profit, we don't need to account for ties the way we did in Q1.
Q3: Since the question frames it as spoken languages, it should be 86 and not 87. I think we should include all ISO languages, but it is also important to look at your data. In this data there is a "No Language" entry, and I would not count silent movies toward the languages spoken in a movie. Maybe this is up for debate.
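To make the Q1 ties point concrete, here is a small pandas sketch on made-up data (hypothetical `title` and `profit` columns): a strict Top N and a Top N that keeps every row tied at the cutoff can return different row counts.

```python
import pandas as pd

# Toy data: two movies tie at the cutoff profit value.
movies = pd.DataFrame({
    "title":  ["A", "B", "C", "D", "E"],
    "profit": [900, 700, 500, 500, 100],
})

strict_top3 = movies.nlargest(3, "profit")                  # keeps only the first tied row
top3_with_ties = movies.nlargest(3, "profit", keep="all")   # keeps every row tied at the cutoff
print(len(strict_top3), len(top3_with_ties))                # 3 vs 4
```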
group effort for our entry lol ..
we used this weekly challenge for our recent UG virtual session:
https://community.alteryx.com/t5/Philippines/2025-Q3-Virtual-Session-Weekly-Challenge-491/m-p/141547...
Like many others, we saw lots of differences for task 3 depending on the approach.
Decided to use the ISO codes and ended up with 86. Checked a few of them, and every one we checked was valid except 'xx'. So maybe it is 85 in total?
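A rough pandas sketch of that check, with a deliberately tiny placeholder ISO 639-1 set standing in for the full reference list (in practice you'd check against the complete ISO table or a library such as pycountry):

```python
import pandas as pd

# Hypothetical distinct language codes pulled from the dataset.
codes = pd.Series(["en", "fr", "ja", "xx", "de"])

# Placeholder reference set; a real check would use the full ISO 639-1 list.
iso_639_1 = {"en", "fr", "ja", "de", "es", "it"}

valid = codes[codes.isin(iso_639_1)]
print(codes.nunique(), valid.nunique())  # 5 before filtering, 4 after dropping "xx"
```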