Wednesday – December 6,2023.

In this Python code, I use the pandas library to analyze an Excel dataset containing information about offenses. I read the data into a DataFrame and then clean it by excluding rows where either the ‘OFFENSE_CODE_GROUP’ or ‘STREET’ columns contain integers, as well as dropping any missing values in these columns. Next, I group the cleaned data by street, counting the unique types of crimes for each location and sorting the results in descending order. I print the output, which displays street names and the corresponding counts of unique offenses, from the highest to the lowest offenses. Additionally, I identify and print the top 5 offense categories based on their frequency in the dataset.

Code:

import pandas as pd

# Read the data from an Excel file
df = pd.read_excel(r’D:\General\UMass Dartmouth\Subjects\Fall 2023 – MTH 522 – Mathematical Statistics\Project 3\customdataset.xlsx’)

# Remove rows where either ‘OFFENSE_CODE_GROUP’ or ‘STREET’ contains integers
# Also, drop rows with missing values in ‘OFFENSE_CODE_GROUP’ or ‘STREET’ columns
df_cleaned = df[
~df.applymap(lambda x: isinstance(x, (int, float)))[‘OFFENSE_CODE_GROUP’] &
~df.applymap(lambda x: isinstance(x, (int, float)))[‘STREET’]
].dropna(subset=[‘OFFENSE_CODE_GROUP’, ‘STREET’])

# Group by street and count unique types of crimes
result = df_cleaned.groupby(‘STREET’)[‘OFFENSE_CODE_GROUP’].nunique().sort_values(ascending=False)

# Optionally, reset the index if desired
# result = result.reset_index()

# Print the result, including the highest to the lowest offenses
print(result.to_frame().reset_index().to_string(index=False))

# Get the top 5 offense categories
top5_offenses = df_cleaned[‘OFFENSE_CODE_GROUP’].value_counts().nlargest(5)

# Print the top 5 offense categories
print(“\nTop 5 Offense Categories:”)
print(top5_offenses)

Leave a comment

Leave a Reply

Your email address will not be published. Required fields are marked *