- Import the necessary libraries:
import pandas as pd: Imports the pandas library under the alias ‘pd’.
import matplotlib.pyplot as plt: Imports Matplotlib’s pyplot module under the alias ‘plt’, which is used to create the plots and visualizations.
- Load the Excel file into a DataFrame:
directory_path: Holds the path to the Excel file you want to load (despite its name, it points to a file, not a directory). Update it to your own file’s location.
sheet_name: Holds the name of the sheet within the Excel file from which data should be read.
df = pd.read_excel(directory_path, sheet_name=sheet_name): Reads the data from that sheet into a pandas DataFrame named ‘df’.
- Drop rows with missing ‘race’, ‘age’, or ‘gender’ values:
df = df.dropna(subset=['race', 'age', 'gender']): Removes every row in which at least one of these three columns (race, age, gender) has a missing value.
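To make the effect of the ‘subset’ argument concrete, here is a minimal sketch on a small made-up DataFrame. The sample names and values are hypothetical stand-ins for the data that would come from the Excel file.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real DataFrame comes from the Excel file.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee"],
    "race": ["W", None, "B", "H"],
    "age": [34, 51, np.nan, 27],
    "gender": ["female", "male", "male", "female"],
})

# Keep only rows where race, age, and gender are all present.
# Bob is dropped (missing race) and Cy is dropped (missing age).
cleaned = df.dropna(subset=["race", "age", "gender"])
print(cleaned)
```

Missing values in columns that are not listed in ‘subset’ do not cause a row to be dropped.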
- Create age groups:
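age_bins: Defines the boundaries (bin edges) for the age groups, similar to the previous code snippet.
age_labels: Provides one label per age group. Because each group lies between two consecutive edges, there must be exactly one fewer label than there are edges in ‘age_bins’.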
- Cut the age data into age groups:
df['Age Group'] = pd.cut(df['age'], bins=age_bins, labels=age_labels): Creates a new column ‘Age Group’ by assigning each individual’s age to one of the intervals defined in ‘age_bins’ and naming it with the matching entry from ‘age_labels’. This is done row by row, independently of race.
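The following sketch shows how pd.cut assigns ages to groups. The bin edges and labels below are assumptions chosen for illustration, since the actual values of ‘age_bins’ and ‘age_labels’ are not listed in this walkthrough.

```python
import pandas as pd

# Hypothetical bin edges and labels (5 groups need 6 edges).
age_bins = [0, 18, 30, 45, 60, 120]
age_labels = ["0-18", "19-30", "31-45", "46-60", "61+"]

df = pd.DataFrame({"age": [18, 19, 45, 46, 75]})

# By default each interval is closed on the right: (0, 18], (18, 30], ...
df["Age Group"] = pd.cut(df["age"], bins=age_bins, labels=age_labels)
print(df)
```

With these default settings an age that equals a bin edge falls into the lower group, and an age of exactly 0 would get no group at all unless include_lowest=True is passed.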
- Count the number of individuals in each age group by race and gender:
age_group_counts_by_race_gender = df.groupby(['race', 'gender', 'Age Group'])['name'].count().unstack().fillna(0): Groups the data by race, gender, and age group, then counts the non-missing ‘name’ values in each combination. The ‘unstack()’ method moves the innermost grouping level (‘Age Group’) from the row index into the columns, giving one row per race-gender pair and one column per age group, which is the layout a grouped bar chart needs. ‘fillna(0)’ replaces any combination that has no data with 0.
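As a rough illustration of the reshaping, the sketch below runs the same chain of calls on a few made-up rows. The data, bins, and labels are hypothetical, and observed=False is an addition that is not in the original line: it makes the handling of empty age groups explicit across pandas versions.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee", "Eve"],
    "race": ["W", "W", "B", "B", "W"],
    "gender": ["female", "male", "male", "female", "female"],
    "age": [25, 40, 28, 52, 29],
})
age_bins = [0, 30, 45, 120]
age_labels = ["0-30", "31-45", "46+"]
df["Age Group"] = pd.cut(df["age"], bins=age_bins, labels=age_labels)

counts = (
    df.groupby(["race", "gender", "Age Group"], observed=False)["name"]
    .count()      # number of non-missing names per combination
    .unstack()    # move 'Age Group' from the row index to the columns
    .fillna(0)    # combinations without data become 0
)
print(counts)
```

The result has a (race, gender) row index and one column per age group, so each row can be drawn as one cluster of bars.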
- Calculate the median age for each race and gender combination:
median_age_by_race_gender = df.groupby(['race', 'gender'])['age'].median(): Groups the data by race and gender and calculates the median age for each combination.
- Print the median age for each race and gender combination:
print("Median Age by Race and Gender:"): Prints a header.
print(median_age_by_race_gender): Prints the calculated median age for each race and gender combination.
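A small sketch of the median step on made-up data (the rows below are hypothetical) shows why some medians are not whole numbers: for a group with an even number of members, the median is the mean of the two middle ages.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "race": ["W", "W", "W", "B", "B"],
    "gender": ["female", "female", "male", "male", "male"],
    "age": [30, 40, 50, 20, 31],
})

# One median per (race, gender) pair; the result is a Series
# with a two-level index.
median_age_by_race_gender = df.groupby(["race", "gender"])["age"].median()

print("Median Age by Race and Gender:")
print(median_age_by_race_gender)
```

Only race-gender pairs that actually occur in the data appear in the result, which is why a pair can be missing from the printed output.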
- Create grouped bar charts for different genders:
- The code iterates over unique gender values in the DataFrame and creates separate bar charts for each gender.
- For each gender:
- Subset the DataFrame to include only data for that gender.
- Create a grouped bar chart, displaying the number of individuals in different age groups for each race-gender combination.
- Set various plot properties such as the title, labels, legend, and rotation of x-axis labels.
- Display the plot using plt.show().
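The plotting loop itself is not shown above, so the following is only a sketch of what such a loop could look like, not the original code. The function name plot_age_groups_by_gender, its ‘show’ parameter, the titles and axis labels, and the sample data are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_age_groups_by_gender(df, show=False):
    """Draw one grouped bar chart per gender and return {gender: Axes}."""
    axes_by_gender = {}
    for gender in df["gender"].unique():
        # Subset the DataFrame to the current gender.
        subset = df[df["gender"] == gender]
        # Rows are races, columns are age groups.
        counts = (
            subset.groupby(["race", "Age Group"], observed=False)["name"]
            .count()
            .unstack()
            .fillna(0)
        )
        # Each call without an 'ax' argument draws on a new figure.
        ax = counts.plot(kind="bar", figsize=(10, 6), rot=45)
        ax.set_title(f"Age Group Counts by Race ({gender})")
        ax.set_xlabel("Race")
        ax.set_ylabel("Number of individuals")
        ax.legend(title="Age Group")
        axes_by_gender[gender] = ax
        if show:
            plt.show()
    return axes_by_gender


# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee", "Eve"],
    "race": ["W", "W", "B", "B", "W"],
    "gender": ["female", "male", "male", "female", "female"],
    "age": [25, 40, 28, 52, 29],
})
df["Age Group"] = pd.cut(
    df["age"], bins=[0, 30, 45, 120], labels=["0-30", "31-45", "46+"]
)

# Pass show=True to display each chart as it is drawn.
axes = plot_age_groups_by_gender(df)
```

Returning the Axes objects instead of only showing the figures makes it possible to adjust or save each chart afterwards.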
This code generates grouped bar charts that visualize the distribution of individuals in different age groups for each race-gender combination, helping to analyze the age distribution within these subgroups.
The output is:

    Median Age by Race and Gender:
    race  gender
    A     female    47.0
          male      34.0
    B     female    31.0
          male      31.0
    B;H   male      27.0
    H     female    31.0
          male      33.0
    N     female    32.0
          male      31.5
    O     female    24.5
          male      36.0
    W     female    39.0
          male      38.0
    Name: age, dtype: float64