Challenge #491: Behind the Blockbusters

AYXAcademy
Alteryx

Hi Community members,

 

A solution to last week’s challenge can be found here.

 

This challenge was submitted by James Bevan (@JBevan89). Thank you, James, for your submission!

 

For this week’s challenge, we will be working with the TMDB Movie Metadata dataset from Kaggle, a rich collection of information about modern films, including revenues, budgets, and production details.

 

Your role as a data analyst is to dig into the data and uncover insights by answering the following questions:

  1. Which 10 movies had the highest budget?
  2. Which 10 movies generated the highest profit? (Profit = Revenue – Budget)
  3. How many different languages are spoken across all movies in the dataset?
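
For anyone who wants to sanity-check the three questions outside of Designer, here is a minimal Python sketch. It assumes the fields are named `title`, `budget`, `revenue`, and `spoken_languages`, and that `spoken_languages` holds a JSON array of objects with `iso_639_1` and `name` keys. The rows below are made up for illustration and are not taken from the dataset, and `n=2` stands in for the challenge's top 10.

```python
import json

# Hypothetical sample rows. Field names are assumed to match the Kaggle CSV;
# every value here is invented for illustration.
movies = [
    {"title": "Movie A", "budget": 300, "revenue": 900,
     "spoken_languages": '[{"iso_639_1": "en", "name": "English"}]'},
    {"title": "Movie B", "budget": 250, "revenue": 200,
     "spoken_languages": '[{"iso_639_1": "en", "name": "English"}, '
                         '{"iso_639_1": "fr", "name": "Français"}]'},
    {"title": "Movie C", "budget": 100, "revenue": 1000,
     "spoken_languages": '[{"iso_639_1": "ja", "name": "日本語"}]'},
    {"title": "Movie D", "budget": 50, "revenue": 60,
     "spoken_languages": "[]"},
]


def profit(row):
    """Profit as defined in the challenge: revenue minus budget."""
    return row["revenue"] - row["budget"]


def top_n(rows, key, n=10):
    """Return the n rows with the largest key value (ties keep input order)."""
    return sorted(rows, key=key, reverse=True)[:n]


def distinct_language_codes(rows):
    """Collect every language code that appears in any movie's JSON list."""
    codes = set()
    for row in rows:
        for entry in json.loads(row["spoken_languages"]):
            codes.add(entry["iso_639_1"])
    return codes


top_budget = [m["title"] for m in top_n(movies, key=lambda m: m["budget"], n=2)]
top_profit = [m["title"] for m in top_n(movies, key=profit, n=2)]
language_count = len(distinct_language_codes(movies))
```

Counting by code rather than by name is a choice made for this sketch, not something the challenge statement prescribes.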

 

Once you have completed your challenge, include your solution file and a screenshot of your workflow as attachments to your comment.

 

Good Luck!

The Academy Team

 

Source: https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata/data

 

Download Start File

Download Solution File

Hub119
11 - Bolide

For Task 1, there are a few ties, so I did my best to apply some sorts to make this mostly match the desired output. For Task 3, my answer ended up being one less than the provided solution. I'm wondering if perhaps it counted the movies that didn't have a listed spoken language as a separate language? (I didn't count that in mine.)
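
To illustrate why ties at the cutoff can make a top-N list differ between solutions, here is a small Python sketch with made-up budgets. It shows two possible conventions: breaking ties with a secondary sort key (title is used here purely as an example; the expected output's actual tie-break rule is not stated in the thread), or keeping every row tied with the last place.

```python
# Hypothetical (title, budget) pairs with a tie at the cutoff; values are made up.
budgets = [
    ("Movie A", 300),
    ("Movie B", 250),
    ("Movie D", 200),
    ("Movie C", 200),
    ("Movie E", 150),
]


def top_n_with_tiebreak(rows, n):
    """Sort by budget descending, then title ascending, and keep n rows.

    The secondary key makes the result the same on every run,
    regardless of the input order.
    """
    return sorted(rows, key=lambda r: (-r[1], r[0]))[:n]


def top_n_keep_ties(rows, n):
    """Keep every row whose budget is at least the nth-largest budget."""
    ranked = sorted(rows, key=lambda r: -r[1])
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]
    return [r for r in ranked if r[1] >= cutoff]


strict = top_n_with_tiebreak(budgets, 3)
inclusive = top_n_keep_ties(budgets, 3)
```

With this sample, `strict` has exactly three rows while `inclusive` has four, because two movies share the third-highest budget.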

Spoiler
C491 Pic.png
Qiu
21 - Polaris

@Hub119 
You beat me on speed. 😁
I came to the same observation as you.

Spoiler

For Q1, there are many ties around the 10th rank.
For Q3, I would argue that we should use the ISO 639-1 language code rather than the language name, since the name column contains Unicode characters.

With that approach, I came up with 87 distinct languages.
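
One way a code-based count and a name-based count can diverge is sketched below in Python. It assumes each `spoken_languages` cell is a JSON array of objects with `iso_639_1` and `name` keys, and that some entries carry an empty name. The data is invented, and `xx` and `yy` are placeholder codes rather than real ISO 639-1 values, so this is an illustration and not a diagnosis of the actual dataset.

```python
import json

# Hypothetical spoken_languages cells, shaped like the dataset's JSON strings.
spoken_languages = [
    '[{"iso_639_1": "en", "name": "English"}]',
    '[{"iso_639_1": "pt", "name": "Português"}, '
    '{"iso_639_1": "es", "name": "Español"}]',
    '[{"iso_639_1": "xx", "name": ""}]',   # placeholder code, blank name
    '[{"iso_639_1": "yy", "name": ""}]',   # placeholder code, blank name
    "[]",                                  # movie with no listed language
]


def distinct_values(cells, field, drop_empty=False):
    """Collect the distinct values of one field across all JSON cells."""
    values = set()
    for cell in cells:
        for entry in json.loads(cell):
            value = entry[field]
            if drop_empty and value == "":
                continue
            values.add(value)
    return values


by_code = distinct_values(spoken_languages, "iso_639_1")
by_name = distinct_values(spoken_languages, "name")
by_name_no_blank = distinct_values(spoken_languages, "name", drop_empty=True)
```

Here the two blank names collapse into a single empty string, so counting by name yields fewer languages than counting by code, and dropping the blank lowers the count by one more.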


Challenge-491.png

 

alineruizcampos
8 - Asteroid

I got a different solution for Task 3!

 

Spoiler
I agree with you, @Qiu. Languages like Portuguese and Spanish were missing if we use @Hub119's method.
Screenshot 2025-09-18 140706.png
Qiu
21 - Polaris

@alineruizcampos 
Thank you for agreeing with me.
I checked our results, and I suspect the JSON Parse tool may have failed on row #2087, where the name is empty.

That is the only difference between our results for Q3.
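
A quick way to locate rows like this outside of Designer is to scan for language entries whose name is blank. The Python sketch below assumes the same JSON shape as the dataset's `spoken_languages` column; the cells are invented, `xx` is a placeholder code, and the row numbers refer only to this sample.

```python
import json

# Hypothetical spoken_languages cells; row numbers below are 1-based
# positions in this list, not positions in the real file.
cells = [
    '[{"iso_639_1": "en", "name": "English"}]',
    '[{"iso_639_1": "xx", "name": ""}]',
    '[{"iso_639_1": "en", "name": "English"}, {"iso_639_1": "xx", "name": " "}]',
    "[]",
]


def entries_with_empty_name(rows):
    """Return (row_number, code) for each language entry whose name is blank."""
    hits = []
    for row_number, cell in enumerate(rows, start=1):
        for entry in json.loads(cell):
            # Treat a missing, null, or whitespace-only name as blank.
            name = (entry.get("name") or "").strip()
            if not name:
                hits.append((row_number, entry.get("iso_639_1")))
    return hits


blank_hits = entries_with_empty_name(cells)
```

Comparing the flagged rows against a workflow's parsed output would show whether those entries were dropped or kept.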

Spoiler
Challenge-491-A.png
Challenge-491-B.png
Hub119
11 - Bolide

@Qiu okay, glad I'm not crazy... I was also getting that count of 80+ records when looking at unique two-letter language codes. I switched to pulling the listed name to try to match the provided solution.

Qiu
21 - Polaris

@Hub119 
Given what you have done in Advent of Code, it's very difficult for me to be convinced that you are not crazy. 😁

Hub119
11 - Bolide

@Qiu totally fair statement 🤣

DaisukeTsuchiya
14 - Magnetar

My answer did not match, just like everyone else's.

Spoiler
For Q1, I couldn't figure out how to break ties within the same rank, so my answer didn't match.
For Q3, I tried calculating with two different methods, but the result was either 62 or 86, neither of which matches the provided answer.

スクリーンショット 2025-09-18 165100.jpg

 

Pilsner
13 - Pulsar

Fun challenge. Like others, my answer to part 3 didn't quite match, but I got the other parts correct. 

Spoiler
491.png