Weekly Challenges

AYXAcademy · ‎09-17-2025

Hi Community members,

A solution to last week’s challenge can be found here.

This challenge was submitted by James Bevan (@JBevan89). Thank you, James, for your submission!

For this week’s challenge, we will be working with the TMDB Movie Metadata dataset from Kaggle, a rich collection of information about modern films, including revenues, budgets, and production details.

Your role as a data analyst is to dig into the data and uncover insights by answering the following questions:

Which 10 movies had the highest budget?
Which 10 movies generated the highest profit? (Profit = Revenue – Budget)
How many different languages are spoken across all movies in the dataset?
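
For readers working outside Designer, the first two questions can be sketched in plain Python. This is a rough sketch, not the official solution: the three rows are made-up sample data, and the field names `title`, `budget`, and `revenue` are assumed to match the start file's columns.

```python
from operator import itemgetter

# Made-up sample rows; the real start file has far more movies.
movies = [
    {"title": "A", "budget": 200, "revenue": 900},
    {"title": "B", "budget": 250, "revenue": 300},
    {"title": "C", "budget": 50, "revenue": 400},
]

def top_n(rows, key, n=10):
    """Return the n rows with the largest value of `key`, largest first."""
    return sorted(rows, key=itemgetter(key), reverse=True)[:n]

# Profit = Revenue - Budget, as defined in the challenge.
for row in movies:
    row["profit"] = row["revenue"] - row["budget"]

top_budget = [m["title"] for m in top_n(movies, "budget", n=2)]
top_profit = [m["title"] for m in top_n(movies, "profit", n=2)]
print(top_budget)  # ['B', 'A']
print(top_profit)  # ['A', 'C']
```

With the real data, `n` would stay at its default of 10. Ties at the cut-off are not handled here.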

Once you have completed your challenge, include your solution file and a screenshot of your workflow as attachments to your comment.

Good Luck!

The Academy Team

Source: https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata/data

Download Start File

Download Solution File

Hub119 · ‎09-17-2025

For Task 1, there are a few ties, so I did my best to apply some sorts to make this mostly match the desired output. For Task 3, my answer ended up being one less than the provided solution. I wonder if it counted the movies that didn't have a listed spoken language as a separate language? (I didn't count those in mine.)
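
One way such an off-by-one could arise, sketched in Python as a guess rather than a confirmed diagnosis of the solution file: if movies with an empty language list are kept as a blank value while flattening, the blank gets counted as one extra "language". The sample rows and the JSON layout of `spoken_languages` (a list of objects with a `name` key) are assumptions.

```python
import json

# Hypothetical cells: the second movie lists no spoken language at all.
rows = [
    '[{"iso_639_1": "en", "name": "English"}]',
    '[]',
    '[{"iso_639_1": "fr", "name": "French"}, {"iso_639_1": "en", "name": "English"}]',
]

def language_names(raw, keep_blank):
    """Flatten one JSON cell into a list of language names.

    With keep_blank=True an empty list yields a single blank placeholder,
    mimicking a flatten step that keeps rows with no languages.
    """
    names = [entry["name"] for entry in json.loads(raw)]
    if not names and keep_blank:
        return [""]
    return names

with_blank = {n for raw in rows for n in language_names(raw, keep_blank=True)}
without_blank = {n for raw in rows for n in language_names(raw, keep_blank=False)}

print(len(with_blank), len(without_blank))  # 3 2
```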


Qiu · ‎09-17-2025

@Hub119
You beat me on speed. 😁
I made the same observation as you.

Spoiler

For Q1, there are many ties around the 10th rank.
For Q3, I would argue that we should use the ISO 639-1 language code rather than the language name, since the name column contains Unicode.

With the codes, I came up with 87 distinct languages.
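
A small Python sketch of why counting by code and counting by name can disagree. The entries are invented for illustration (`xx` and `yy` are placeholder codes, not real ISO 639-1 values), and the `iso_639_1` and `name` keys are assumed from the layout of the `spoken_languages` column. The point is only that different codes with a blank name collapse into a single value when names are counted.

```python
import json

# Invented sample cell: two different codes share an empty name.
raw = json.dumps([
    {"iso_639_1": "en", "name": "English"},
    {"iso_639_1": "es", "name": "Español"},
    {"iso_639_1": "xx", "name": ""},
    {"iso_639_1": "yy", "name": ""},
])

entries = json.loads(raw)
codes = {e["iso_639_1"] for e in entries}
names = {e["name"] for e in entries}
non_blank_names = names - {""}

print(len(codes))            # 4
print(len(names))            # 3 (both blank names count as one value)
print(len(non_blank_names))  # 2
```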


alineruizcampos · ‎09-17-2025

I got a different solution for Task 3!

Spoiler

I agree with you, @Qiu. Languages like Portuguese and Spanish were missing if we use @Hub119's method.
Screenshot 2025-09-18 140706.png


Qiu · ‎09-17-2025

@alineruizcampos
Thank you for agreeing with me.
I checked our results and suspect that the JSON Parse may have failed on row 2087, where the name is empty.

That is the only difference between our results for Q3.


Hub119 · ‎09-17-2025

@Qiu okay, glad I'm not crazy... I was also getting a count of 80+ when looking at unique two-letter language codes. I switched to pulling the listed name to try to match the provided solution.

Qiu · ‎09-17-2025

@Hub119
Given what you have done in Advent of Code, it's very difficult for me to be convinced that you are not crazy. 😁

Hub119 · ‎09-17-2025

@Qiu totally fair statement 🤣

DaisukeTsuchiya · ‎09-18-2025

My answer did not match, just like everyone else's.

Spoiler

For Q1, I couldn't figure out how to sort within the same ranking, so my answer didn't match.
For Q3, I tried calculating with two different methods, but the result was either 62 or 86, which deviates from the correct answer.
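
The tie problem can be made visible with a short Python sketch. Everything here is illustrative: the rows are made up, and breaking ties alphabetically by title is just one possible rule, not necessarily the one used in the solution file.

```python
# Made-up (title, budget) rows: three movies tie on budget at the cut-off.
movies = [
    ("Delta", 250),
    ("Alpha", 300),
    ("Charlie", 250),
    ("Bravo", 250),
    ("Echo", 100),
]

def top_n_with_ties(rows, n):
    """Return (top n rows, rows tied with the last kept row but cut off).

    Sorts by budget descending, then title ascending, so the result is
    deterministic even when budgets are equal. Assumes rows is non-empty.
    """
    ordered = sorted(rows, key=lambda r: (-r[1], r[0]))
    kept = ordered[:n]
    cutoff = kept[-1][1]
    dropped_ties = [r for r in ordered[n:] if r[1] == cutoff]
    return kept, dropped_ties

kept, dropped = top_n_with_ties(movies, n=2)
print(kept)     # [('Alpha', 300), ('Bravo', 250)]
print(dropped)  # [('Charlie', 250), ('Delta', 250)]
```

A non-empty `dropped` list means the top-N result depends on the tie-break rule, which would explain answers that differ only in the order or membership of the last rows.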

スクリーンショット 2025-09-18 165100.jpg


Pilsner · ‎09-18-2025

Fun challenge. Like others, my answer to part 3 didn't quite match, but I got the other parts correct.



Challenge #491: Behind the Blockbusters
