Want to get involved? We're always looking for ideas and content for Weekly Challenges.
SUBMIT YOUR IDEAMy solution:
LEGO Dataset Explanations
To identify a LEGO model based on an inventory ID, we are using four related datasets. Each one plays a unique role in helping us connect the inventory to the final LEGO set information, such as name, year, number of parts, and the theme it belongs to.
Dataset 1 – Inventory Sets
id | Unique inventory ID for a specific LEGO box or set of bricks |
version | Version of the inventory (not used in our joins) |
set_num | LEGO set number that this inventory ID belongs to |
Purpose:
This dataset tells us which LEGO set each inventory ID refers to.
Example: Inventory ID 11148 belongs to set_num = 75911-1.
Dataset 2 – Sets
set_num | LEGO set number (primary key) |
name | Name of the LEGO set |
year | Year the set was released |
theme_id | Reference to the theme or collection the set belongs to |
Purpose:
This dataset provides the actual details of each LEGO set, including its name, release year, and the theme it belongs to (through theme_id).
Dataset 3 – Set Parts Info
model_num | Same as set_num — identifies the LEGO set |
num_parts | Total number of parts in the set |
Purpose:
This dataset tells us how many bricks or parts are in each LEGO set.
We join it using set_num = model_num.
Dataset 4 – Themes
id | Unique identifier for the LEGO theme |
name | Name of the theme (e.g., "Star Wars", "Technic", "City") |
parent_id | Used for nested themes (not used here) |
Purpose:
This dataset gives us the theme name for each theme_id.
We use it to convert theme IDs into readable theme labels.
Summary of Data Flow
Start with Dataset 1 (inventory ID)
Join with Dataset 2 to get set details (name, year, theme ID)
Join with Dataset 3 to get number of parts
Join with Dataset 4 to get the theme name
Loved it!!