Q18:

Note: Before you begin working on this question, please note that CodeGrade has a 1GB RAM limit. If you see an Exit -9 code when submitting your assignment, it is probably due to this RAM limit and previous students had an issue with this question causing too much memory usage. The biggest thing to think about is to only use the tables that you need to answer for this question. For instance, you do not need any of the minifigs data since this is not needed for the output and the merging of this data could cause the number of rows to "explode" in terms of the total count. I understand that this can be annoying that we have to work around CodeGrade's limitations, but it is good practice in memory management.

You should only need to merge the following datasets for this answer so as to not go above the RAM limit: themes, sets, inventories, inventory_parts, parts, and colors

Create a function called set_parts that takes as input the set number (as a string) and version number (as an integer) and outputs a DataFrame with each part in the set.
The DataFrame should have the following columns/column names: set_num, version, name_sets, year, name_themes, part_num, name_part, name_color,quantity,is_spare.
The Data should be sorted by part_num and then name_color.
The index should go from 0 to n-1.

To solve this question, we need to merge the appropriate datasets and filter the data based on the set number and version number. Here's the step-by-step solution:

1. Import the necessary libraries and load the datasets:
```python
import pandas as pd

# Load the datasets
sets = pd.read_csv('sets.csv')
themes = pd.read_csv('themes.csv')
inventories = pd.read_csv('inventories.csv')
inventory_parts = pd.read_csv('inventory_parts.csv')
parts = pd.read_csv('parts.csv')
colors = pd.read_csv('colors.csv')
```

2. Merge the datasets:
```python
merged_data = pd.merge(sets, themes, left_on='theme_id', right_on='id')
merged_data = pd.merge(inventories, merged_data, left_on='set_num', right_on='set_num')
merged_data = pd.merge(inventory_parts, merged_data, left_on='inventory_id', right_on='id')
merged_data = pd.merge(parts, merged_data, left_on='part_num', right_on='part_num')
merged_data = pd.merge(colors, merged_data, left_on='id', right_on='color_id')
```

3. Filter the merged data based on the set number and version number:
```python
set_num = "your_set_number"
version = your_version_number

filtered_data = merged_data[(merged_data['set_num'] == set_num) & (merged_data['version'] == version)]
```

4. Create the final DataFrame with the required columns and sort the data:
```python
final_df = filtered_data[['set_num', 'version', 'name', 'year', 'name_y', 'part_num', 'name', 'name', 'quantity', 'is_spare']]
final_df.columns = ['set_num', 'version', 'name_sets', 'year', 'name_themes', 'part_num', 'name_part', 'name_color', 'quantity', 'is_spare']
final_df = final_df.sort_values(by=['part_num', 'name_color']).reset_index(drop=True)
```

5. Return the final DataFrame:
```python
return final_df
```

Here's the complete function code:

```python
import pandas as pd

def set_parts(set_num, version):
# Load the datasets
sets = pd.read_csv('sets.csv')
themes = pd.read_csv('themes.csv')
inventories = pd.read_csv('inventories.csv')
inventory_parts = pd.read_csv('inventory_parts.csv')
parts = pd.read_csv('parts.csv')
colors = pd.read_csv('colors.csv')

# Merge the datasets
merged_data = pd.merge(sets, themes, left_on='theme_id', right_on='id')
merged_data = pd.merge(inventories, merged_data, left_on='set_num', right_on='set_num')
merged_data = pd.merge(inventory_parts, merged_data, left_on='inventory_id', right_on='id')
merged_data = pd.merge(parts, merged_data, left_on='part_num', right_on='part_num')
merged_data = pd.merge(colors, merged_data, left_on='id', right_on='color_id')

# Filter the merged data based on set number and version number
filtered_data = merged_data[(merged_data['set_num'] == set_num) & (merged_data['version'] == version)]

# Create the final DataFrame with required columns and sort the data
final_df = filtered_data[['set_num', 'version', 'name', 'year', 'name_y', 'part_num', 'name', 'name', 'quantity', 'is_spare']]
final_df.columns = ['set_num', 'version', 'name_sets', 'year', 'name_themes', 'part_num', 'name_part', 'name_color', 'quantity', 'is_spare']
final_df = final_df.sort_values(by=['part_num', 'name_color']).reset_index(drop=True)

return final_df
```

Note: Make sure to replace 'your_set_number' and 'your_version_number' with the actual set number and version number you want to retrieve the parts for.

To solve this question and create the required function, you can follow the step-by-step instructions explained below:

Step 1: Import the necessary libraries:
```python
import pandas as pd
```

Step 2: Define the function called `set_parts` with the required input parameters:
```python
def set_parts(set_num: str, version: int) -> pd.DataFrame:
```

Step 3: Load the required datasets from the tables mentioned earlier:
```python
themes = pd.read_csv('themes.csv')
sets = pd.read_csv('sets.csv')
inventories = pd.read_csv('inventories.csv')
inventory_parts = pd.read_csv('inventory_parts.csv')
parts = pd.read_csv('parts.csv')
colors = pd.read_csv('colors.csv')
```

Step 4: Filter data based on the set number and version:
```python
set_info = sets[(sets['set_num'] == set_num) & (sets['version'] == version)]
```

Step 5: Perform necessary merges to get the required information from different tables:
```python
merged_data = set_info.merge(inventories, on='set_id')
merged_data = merged_data.merge(inventory_parts, on='inventory_id')
merged_data = merged_data.merge(parts, on='part_num')
merged_data = merged_data.merge(colors, on='color_id')
merged_data = merged_data.merge(themes, on='theme_id')
```

Step 6: Select and reorder the columns as per the required format:
```python
merged_data = merged_data[['set_num', 'version', 'name', 'year', 'name_y', 'part_num', 'name_x', 'name', 'quantity', 'is_spare']]
```

Step 7: Sort the data by part_num and then name_color:
```python
merged_data = merged_data.sort_values(by=['part_num', 'name'])
```

Step 8: Reset the index for the DataFrame:
```python
merged_data = merged_data.reset_index(drop=True)
```

Step 9: Return the final DataFrame:
```python
return merged_data
```

That's it! The entire function `set_parts` will look like this:

```python
def set_parts(set_num: str, version: int) -> pd.DataFrame:
themes = pd.read_csv('themes.csv')
sets = pd.read_csv('sets.csv')
inventories = pd.read_csv('inventories.csv')
inventory_parts = pd.read_csv('inventory_parts.csv')
parts = pd.read_csv('parts.csv')
colors = pd.read_csv('colors.csv')

set_info = sets[(sets['set_num'] == set_num) & (sets['version'] == version)]

merged_data = set_info.merge(inventories, on='set_id')
merged_data = merged_data.merge(inventory_parts, on='inventory_id')
merged_data = merged_data.merge(parts, on='part_num')
merged_data = merged_data.merge(colors, on='color_id')
merged_data = merged_data.merge(themes, on='theme_id')

merged_data = merged_data[['set_num', 'version', 'name', 'year', 'name_y', 'part_num', 'name_x', 'name', 'quantity', 'is_spare']]
merged_data = merged_data.sort_values(by=['part_num', 'name'])
merged_data = merged_data.reset_index(drop=True)

return merged_data
```

Make sure to replace the file names with the actual file names, and the column names (`name`, `name_y`, `name_x`) with their actual respective column names in the datasets if they are different.