Monday – November 6, 2023

  1. Import the necessary libraries:
    • import pandas as pd: Imports the Pandas library and assigns it the alias ‘pd.’
    • import matplotlib.pyplot as plt: Imports Matplotlib’s pyplot module and assigns it the alias ‘plt,’ which will be used to create plots and visualizations.
  2. Load the Excel file into a DataFrame:
    • directory_path: Specifies the file path to the Excel file you want to load. You should update this path to your Excel file’s location.
    • sheet_name: Specifies the name of the sheet within the Excel file from which data should be read.
    • df = pd.read_excel(directory_path, sheet_name=sheet_name): Reads the data from the Excel file into a Pandas DataFrame named ‘df.’
  3. Drop rows with missing ‘race,’ ‘age,’ or ‘gender’ values:
    • df = df.dropna(subset=['race', 'age', 'gender']): Removes every row in which at least one of these three columns (race, age, gender) has a missing value.
  4. Create age groups:
    • age_bins: Defines the boundaries for age groups, similar to the previous code snippet.
    • age_labels: Provides labels for each age group, corresponding to ‘age_bins.’
  5. Cut the age data into age groups:
    • df['Age Group'] = pd.cut(df['age'], bins=age_bins, labels=age_labels): Creates a new column ‘Age Group’ in the DataFrame by categorizing individuals’ ages into the age groups defined in ‘age_bins’ and labeling them with ‘age_labels.’
  6. Count the number of individuals in each age group by race and gender:
    • age_group_counts_by_race_gender = df.groupby(['race', 'gender', 'Age Group'])['name'].count().unstack().fillna(0): Groups the data by race, gender, and age group, and then counts the non-missing ‘name’ values (that is, the individuals) in each combination. The ‘unstack()’ function moves the innermost index level (‘Age Group’) into columns, leaving one row per race-gender pair, which is the shape a grouped bar chart needs, and ‘fillna(0)’ replaces any missing combinations with 0.
  7. Calculate the median age for each race and gender combination:
    • median_age_by_race_gender = df.groupby(['race', 'gender'])['age'].median(): Groups the data by race and gender and calculates the median age for each combination.
  8. Print the median age for each race and gender combination:
    • print("Median Age by Race and Gender:"): Prints a header.
    • print(median_age_by_race_gender): Prints the calculated median age for each race and gender combination.
  9. Create grouped bar charts for different genders:
    • The code iterates over unique gender values in the DataFrame and creates separate bar charts for each gender.
    • For each gender:
      • Subset the DataFrame to include only data for that gender.
      • Create a grouped bar chart, displaying the number of individuals in different age groups for each race-gender combination.
      • Set various plot properties such as the title, labels, legend, and rotation of x-axis labels.
      • Display the plot using plt.show().

This code generates grouped bar charts that visualize the distribution of individuals in different age groups for each race-gender combination, helping to analyze the age distribution within these subgroups.
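The steps above can be sketched end to end on a small made-up table. This is only an illustration under stated assumptions: the rows, the age_bins and age_labels values, and the helper name plot_counts_by_gender are hypothetical and not taken from the original script, and the sketch selects each gender with xs() on the counts table rather than subsetting the DataFrame.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the Excel sheet (the real data would come from pd.read_excel).
df = pd.DataFrame({
    "name":   ["a", "b", "c", "d", "e", "f", "g"],
    "race":   ["W", "W", "B", "B", "W", "B", None],
    "gender": ["male", "female", "male", "male", "female", "female", "male"],
    "age":    [25, 40, 31, 55, 38, None, 60],
})

# Step 3: drop rows with a missing race, age, or gender (rows "f" and "g" here).
df = df.dropna(subset=["race", "age", "gender"])

# Steps 4-5: assumed bin edges and labels; the original values are not shown in the post.
age_bins = [0, 18, 30, 45, 60, 120]
age_labels = ["0-18", "19-30", "31-45", "46-60", "61+"]
df["Age Group"] = pd.cut(df["age"], bins=age_bins, labels=age_labels)

# Step 6: count individuals per (race, gender) row and age-group column.
counts = (
    df.groupby(["race", "gender", "Age Group"], observed=False)["name"]
      .count()
      .unstack()
      .fillna(0)
)

# Steps 7-8: median age for each race and gender combination.
median_age = df.groupby(["race", "gender"])["age"].median()
print("Median Age by Race and Gender:")
print(median_age)


def plot_counts_by_gender(counts):
    """Step 9: draw one grouped bar chart per gender and return the figures."""
    figures = []
    for gender in counts.index.get_level_values("gender").unique():
        # Rows become races, columns stay as age groups.
        subset = counts.xs(gender, level="gender")
        ax = subset.plot(kind="bar", rot=45, figsize=(8, 4))
        ax.set_title(f"Age group counts by race ({gender})")
        ax.set_xlabel("Race")
        ax.set_ylabel("Number of individuals")
        ax.legend(title="Age Group")
        figures.append(ax.figure)
    return figures
```

Calling plot_counts_by_gender(counts) followed by plt.show() would display one chart per gender. Passing observed=False keeps age groups that contain no one, so every chart shows the same set of bars.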

The output is:

Median Age by Race and Gender:
race  gender
A     female    47.0
      male      34.0
B     female    31.0
      male      31.0
B;H   male      27.0
H     female    31.0
      male      33.0
N     female    32.0
      male      31.5
O     female    24.5
      male      36.0
W     female    39.0
      male      38.0
Name: age, dtype: float64
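
The printed result is a Series indexed by two levels, race and gender. As a rough sketch of how such a result can be queried or reshaped, the snippet below rebuilds a few of the values shown above by hand (the variable names median_age and table are illustrative) and then pivots the gender level into columns.

```python
import pandas as pd

# A handful of values copied from the printed output above.
median_age = pd.Series(
    [47.0, 34.0, 27.0, 39.0, 38.0],
    index=pd.MultiIndex.from_tuples(
        [("A", "female"), ("A", "male"), ("B;H", "male"),
         ("W", "female"), ("W", "male")],
        names=["race", "gender"],
    ),
    name="age",
)

# Look up one race-gender combination.
print(median_age.loc[("W", "female")])

# Move gender into columns: one row per race, NaN where a combination is absent.
table = median_age.unstack("gender")
print(table)
```

In the reshaped table, a race with no rows for one gender (such as B;H, which only has a male entry) shows NaN in the other column.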
