This is a cool dataset! My answers varied from the sample, but I was able to reverse engineer it to get things to match.
Part 1: The top stream of data matches the sample output. The bottom stream is how I would have approached it without the sample output guiding me. Both are equally correct but have different results because of sorting behavior. This highlights some interesting topics:
1) Using a Summarize tool alters the order of your dataset, which makes subsequent sorts behave differently. The Summarize here could be useful if there are multiple entries per movie id. That isn't the case this time, but summarizing would handle dupes should they ever come through.
2) When doing a Top X analysis in the business world, it can be really valuable to have secondary sort criteria to "break ties" between equal values (a quick sketch of this follows below).
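Outside of Alteryx, the same dedupe-then-tie-break idea is easy to sketch in pandas. The column names (`movie_id`, `title`, `profit`) and the toy rows below are my own assumptions for illustration, not the challenge's actual schema:

```python
import pandas as pd

# Toy data; the real challenge columns are likely named differently.
movies = pd.DataFrame({
    "movie_id": [1, 2, 3, 4, 5],
    "title":    ["A", "B", "C", "D", "E"],
    "profit":   [500, 500, 300, 900, 900],
})

# "Summarize" equivalent: collapse any duplicate movie_id rows first.
deduped = movies.groupby("movie_id", as_index=False).agg(
    title=("title", "first"),
    profit=("profit", "max"),
)

# Top 3 by profit, with title as a secondary sort key so ties break
# deterministically instead of depending on incoming row order.
top = deduped.sort_values(["profit", "title"], ascending=[False, True]).head(3)
print(top)
```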
Part 2 was pretty straightforward.
Part 3 was a can of worms! It looks like the sample output was looking for distinct "original languages" rather than distinct "languages spoken," and also only for movies that had more than 0 profit. The requirements weren't very clear about this, so it was really good practice in evaluating the requirements versus what you're seeing in the data. In the business world, I'd go back for more info and clarifying questions. There's also a sneaky language in here coded as "xx", which upon further investigation means "No Language." I am not sure whether or not it's appropriate to include that in our number - another good question for my stakeholder!
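For anyone curious how that interpretation plays out, here is a minimal pandas sketch of the same logic. The column names `original_language` and `profit` and the toy values are assumptions, not the real dataset:

```python
import pandas as pd

# Toy data with assumed column names.
movies = pd.DataFrame({
    "original_language": ["en", "fr", "xx", "en", "ja"],
    "profit":            [100, -5, 40, 900, 10],
})

# Keep only movies with positive profit, as the sample output seems to expect.
profitable = movies[movies["profit"] > 0]

# Distinct original languages, with and without the "xx" (No Language) code,
# which is exactly the question I'd take back to the stakeholder.
with_xx = profitable["original_language"].nunique()
without_xx = profitable.loc[profitable["original_language"] != "xx",
                            "original_language"].nunique()
print(with_xx, without_xx)  # 3 vs 2 on this toy data
```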
Overall, this was some good practice in data investigation when the results weren't what I expected!
Great! Thank you so much @Qiu for researching this. I am surprised that the AMP Engine did not pull all the records. I thought it was supposed to be much faster and more efficient by running multi-threaded, but I agree that losing records is not good. Good to know!
My workflow looks a bit different than most others.
Q1 should, in my opinion, include ties as well. If you include ties, you end up with 18 movies; otherwise, what decides which tied movie to include and which one to leave out? (See the sketch after this post.)
Q2: Since none of the top 10 have the same Profit, we don't need to account for ties the way we did in Q1.
Q3: Since the question frames it as spoken languages, it should be 86 and not 87. I think we should include all ISO languages, but it is also important to look at your data. In this data there is a "No Language" entry, and I would not count silent movies toward the languages spoken in a movie. Maybe this is up for debate.
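To make the Q1 ties point concrete, here is a small pandas sketch on made-up data (hypothetical `title` and `profit` columns): a strict Top N and a Top N that keeps every row tied at the cutoff can return different row counts.

```python
import pandas as pd

# Toy data: two movies tie at the cutoff profit value.
movies = pd.DataFrame({
    "title":  ["A", "B", "C", "D", "E"],
    "profit": [900, 700, 500, 500, 100],
})

strict_top3 = movies.nlargest(3, "profit")                  # keeps only the first tied row
top3_with_ties = movies.nlargest(3, "profit", keep="all")   # keeps every row tied at the cutoff
print(len(strict_top3), len(top3_with_ties))                # 3 vs 4
```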
group effort for our entry lol ..
we used this weekly challenge for our recent UG virtual session:
https://community.alteryx.com/t5/Philippines/2025-Q3-Virtual-Session-Weekly-Challenge-491/m-p/141547...
Like many others, we saw lots of differences for task 3 depending on the approach.
Decided to use the ISO codes and ended up with 86. Checked a few of them, and every one we checked was valid except 'xx'. So maybe it is 85 in total?
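A rough pandas sketch of that check, with a deliberately tiny placeholder ISO 639-1 set standing in for the full reference list (in practice you'd check against the complete ISO table or a library such as pycountry):

```python
import pandas as pd

# Hypothetical distinct language codes pulled from the dataset.
codes = pd.Series(["en", "fr", "ja", "xx", "de"])

# Placeholder reference set; a real check would use the full ISO 639-1 list.
iso_639_1 = {"en", "fr", "ja", "de", "es", "it"}

valid = codes[codes.isin(iso_639_1)]
print(codes.nunique(), valid.nunique())  # 5 before filtering, 4 after dropping "xx"
```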