In this Python code, I use the pandas library to analyze an Excel dataset containing information about offenses. I read the data into a DataFrame and then clean it by excluding rows where either the ‘OFFENSE_CODE_GROUP’ or ‘STREET’ columns contain integers, as well as dropping any missing values in these columns. Next, I group the cleaned data by street, counting the unique types of crimes for each location and sorting the results in descending order. I print the output, which displays street names and the corresponding counts of unique offenses, from the highest to the lowest offenses. Additionally, I identify and print the top 5 offense categories based on their frequency in the dataset.
Code:
import pandas as pd
# Read the data from an Excel file
df = pd.read_excel(r’D:\General\UMass Dartmouth\Subjects\Fall 2023 – MTH 522 – Mathematical Statistics\Project 3\customdataset.xlsx’)
# Remove rows where either ‘OFFENSE_CODE_GROUP’ or ‘STREET’ contains integers
# Also, drop rows with missing values in ‘OFFENSE_CODE_GROUP’ or ‘STREET’ columns
df_cleaned = df[
~df.applymap(lambda x: isinstance(x, (int, float)))[‘OFFENSE_CODE_GROUP’] &
~df.applymap(lambda x: isinstance(x, (int, float)))[‘STREET’]
].dropna(subset=[‘OFFENSE_CODE_GROUP’, ‘STREET’])
# Group by street and count unique types of crimes
result = df_cleaned.groupby(‘STREET’)[‘OFFENSE_CODE_GROUP’].nunique().sort_values(ascending=False)
# Optionally, reset the index if desired
# result = result.reset_index()
# Print the result, including the highest to the lowest offenses
print(result.to_frame().reset_index().to_string(index=False))
# Get the top 5 offense categories
top5_offenses = df_cleaned[‘OFFENSE_CODE_GROUP’].value_counts().nlargest(5)
# Print the top 5 offense categories
print(“\nTop 5 Offense Categories:”)
print(top5_offenses)
Leave a comment