Sunday – December 10, 2023.

This is Project 3 for MTH 522 at the University of Massachusetts Dartmouth.

Project Title:

Analysis of Boston Crime Incident Data: Exploring Crime Patterns and Trends 

The dataset is examined in full, and the findings are reported in the project document.

The contribution report has been added to the final page of the report.

Report-PRJ3

Friday – December 8, 2023.

In my analysis of the crime dataset, I first identified the three streets with the most shootings, “WASHINGTON ST,” “BOYLSTON ST,” and “BLUE HILL AVE,” along with the most prevalent offenses on each. I then looked at when shootings occur and found that they were most frequent in June, on Saturdays, and at midnight. Turning to the UCR categories, “Part Three” crimes were the most common, and the top offenses differed from one UCR part to another. When I examined the streets associated with each UCR part, “WASHINGTON ST” appeared near the top across parts. At the district level, I identified the districts with the most incidents for each UCR part. Finally, I identified the five streets with the widest variety of crime types, among them “CENTRE ST” and “WASHINGTON ST,” and plotted the findings as bar graphs. Together, these results summarize the dataset’s crime patterns by street, time, and UCR category.
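The steps summarized above can be sketched in pandas roughly as follows. This is a minimal illustration on a tiny invented table, not the code used in the project. The column names `SHOOTING`, `MONTH`, `DAY_OF_WEEK`, `HOUR`, and `UCR_PART`, and the `"Y"` flag for shootings, are assumptions about the dataset's layout.

```python
import pandas as pd

# Invented sample rows; column names other than STREET and
# OFFENSE_CODE_GROUP are assumptions about the dataset's layout.
df = pd.DataFrame({
    "STREET": ["WASHINGTON ST", "WASHINGTON ST", "BOYLSTON ST",
               "BLUE HILL AVE", "CENTRE ST", "WASHINGTON ST"],
    "OFFENSE_CODE_GROUP": ["Aggravated Assault", "Larceny", "Aggravated Assault",
                           "Homicide", "Towed", "Aggravated Assault"],
    "SHOOTING": ["Y", None, "Y", "Y", None, "Y"],
    "MONTH": [6, 6, 7, 6, 1, 6],
    "DAY_OF_WEEK": ["Saturday", "Monday", "Saturday", "Friday", "Sunday", "Saturday"],
    "HOUR": [0, 14, 0, 23, 9, 1],
    "UCR_PART": ["Part One", "Part One", "Part One", "Part One", "Part Three", "Part One"],
})

# Keep only incidents flagged as shootings (assumes "Y" marks a shooting).
shootings = df[df["SHOOTING"] == "Y"]

# Streets with the most shootings.
top_streets = shootings["STREET"].value_counts().head(3)

# Most frequent month, weekday, and hour among shootings.
peak = {col: shootings[col].mode().iloc[0] for col in ["MONTH", "DAY_OF_WEEK", "HOUR"]}

# Incident counts per UCR part, and the most frequent offense within each part.
ucr_counts = df["UCR_PART"].value_counts()
top_offense_by_part = df.groupby("UCR_PART")["OFFENSE_CODE_GROUP"].agg(
    lambda s: s.value_counts().idxmax()
)

print(top_streets)
print(peak)
print(ucr_counts)
print(top_offense_by_part)
```

Replacing the invented table with the real DataFrame (and adjusting the column names if they differ) would give the corresponding rankings for the full dataset.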

Wednesday – December 6, 2023.

In this Python code, I use the pandas library to analyze an Excel dataset of offenses. I read the data into a DataFrame and clean it by excluding rows where either the ‘OFFENSE_CODE_GROUP’ or ‘STREET’ column contains a numeric value (integer or float) and by dropping rows with missing values in these columns. Next, I group the cleaned data by street, count the unique offense types for each street, and sort the results in descending order. The printed output lists each street with its count of unique offense types, from highest to lowest. Finally, I identify and print the five most frequent offense categories in the dataset.

Code:

import pandas as pd

# Read the data from an Excel file
df = pd.read_excel(r'D:\General\UMass Dartmouth\Subjects\Fall 2023 – MTH 522 – Mathematical Statistics\Project 3\customdataset.xlsx')

def is_number(value):
    # True for ints and floats; NaN is a float, so it is flagged as well
    return isinstance(value, (int, float))

# Remove rows where either 'OFFENSE_CODE_GROUP' or 'STREET' contains a number.
# Only these two columns are checked, rather than applying the test to the whole DataFrame.
# Also drop rows with missing values in 'OFFENSE_CODE_GROUP' or 'STREET'.
df_cleaned = df[
    ~df['OFFENSE_CODE_GROUP'].map(is_number) &
    ~df['STREET'].map(is_number)
].dropna(subset=['OFFENSE_CODE_GROUP', 'STREET'])

# Group by street and count unique types of crimes
result = df_cleaned.groupby('STREET')['OFFENSE_CODE_GROUP'].nunique().sort_values(ascending=False)

# Optionally, reset the index if desired
# result = result.reset_index()

# Print every street with its count of unique offense types, highest first
print(result.to_frame().reset_index().to_string(index=False))

# Get the top 5 offense categories by frequency
top5_offenses = df_cleaned['OFFENSE_CODE_GROUP'].value_counts().nlargest(5)

# Print the top 5 offense categories
print("\nTop 5 Offense Categories:")
print(top5_offenses)
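
To check the cleaning and counting logic without the Excel file, the same steps can be run on a small in-memory table. The rows below are invented for illustration; only the column names ‘OFFENSE_CODE_GROUP’ and ‘STREET’ come from the code above, and the helper name `is_number` is my own.

```python
import pandas as pd

# Invented rows: one numeric street, one numeric offense, and one missing street.
df = pd.DataFrame({
    "STREET": ["WASHINGTON ST", "WASHINGTON ST", 123, "CENTRE ST", None, "CENTRE ST"],
    "OFFENSE_CODE_GROUP": ["Larceny", "Vandalism", "Larceny", "Larceny", "Fraud", 42],
})

def is_number(value):
    """Return True for ints and floats (hypothetical helper for this sketch)."""
    return isinstance(value, (int, float))

# Keep rows where neither column holds a number, then drop missing values.
mask = ~df["STREET"].map(is_number) & ~df["OFFENSE_CODE_GROUP"].map(is_number)
df_cleaned = df[mask].dropna(subset=["OFFENSE_CODE_GROUP", "STREET"])

# Count distinct offense types per street, largest first.
result = (
    df_cleaned.groupby("STREET")["OFFENSE_CODE_GROUP"]
    .nunique()
    .sort_values(ascending=False)
)
print(result)
```

Three of the six rows survive the cleaning step, which makes it easy to confirm by eye that numeric and missing entries are excluded before the grouping.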

Monday – December 4, 2023.

So, for the final project, I have decided to work on this dataset: https://data.boston.gov/dataset/crime-incident-reports-august-2015-to-date-source-new-system

These are the steps I will follow for the analysis:

  1. Variety of Crimes in Different Areas:
    1. Group the data by street and analyze the count of unique types of crimes on each street.
    2. Visualize the results using bar charts or other appropriate plots.
  2. Most Common Crime Types, Time, and Day on Specific Streets:
    1. Filter the data for each street and analyze the most common crime types, days, and hours.
    2. Use bar charts, pie charts, or heatmaps for visualization.
  3. Rise in Certain Crimes in Specific Areas:
    1. Perform a temporal analysis to identify trends in specific types of crimes over time.
    2. Use line charts or other time series visualizations.
  4. Common Crimes Rising Over Time:
    1. Analyze the overall trend of common crimes over the entire dataset.
    2. Consider creating a time series plot to visualize the changes.
  5. Common Neighborhoods with Crime:
    1. Group the data by neighborhood to identify areas with higher crime rates.
    2. Visualize the results using maps or bar charts.
  6. Time Analysis:
    1. Analyze the data based on time factors such as month, day of the week, and hour.
    2. Identify patterns and trends over time using appropriate visualizations.
  7. Map Chart Visualization:
    1. Utilize the latitude and longitude information to create a map chart.
    2. Color-code or size-code data points based on the frequency of crimes in each location.
  8. Correlation Analysis:
    1. Use statistical methods to identify correlations between different variables (e.g., time, day, month) and types of crimes.
    2. Visualize correlations using correlation matrices or scatter plots.
  9. Shooting Data Analysis:
    1. Analyze shooting data separately, identifying patterns and correlations with other variables.
    2. Visualize shooting incidents on a map and explore temporal patterns.
  10. Predictive Models:
    1. Depending on the nature of the dataset, build predictive models to forecast future crime incidents or classify incidents into different categories.
    2. Common algorithms include decision trees, random forests, or neural networks.
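
As a starting point for the time-based steps in this plan (steps 2, 3, 4, and 6), the sketch below derives month, weekday, and hour from a timestamp and builds two count tables. It is only an illustration on invented rows; the column name `OCCURRED_ON_DATE` is an assumption about the dataset, and the derived column names are my own.

```python
import pandas as pd

# Invented incidents; OCCURRED_ON_DATE is an assumed column name.
df = pd.DataFrame({
    "OCCURRED_ON_DATE": pd.to_datetime([
        "2023-01-07 00:15", "2023-01-14 23:40", "2023-02-04 00:05",
        "2023-02-11 13:30", "2023-02-18 00:45", "2023-03-06 09:00",
    ]),
    "OFFENSE_CODE_GROUP": ["Larceny", "Larceny", "Vandalism",
                           "Larceny", "Larceny", "Vandalism"],
})

# Derive the time factors used in the time analysis.
df["YEAR_MONTH"] = df["OCCURRED_ON_DATE"].dt.strftime("%Y-%m")
df["DAY_OF_WEEK"] = df["OCCURRED_ON_DATE"].dt.day_name()
df["HOUR"] = df["OCCURRED_ON_DATE"].dt.hour

# Monthly incident counts per offense type (rows: month, columns: offense).
monthly = pd.crosstab(df["YEAR_MONTH"], df["OFFENSE_CODE_GROUP"])

# Overall incidents per month, for a simple trend line.
monthly_total = monthly.sum(axis=1)

# Weekday-by-hour counts, a table that could feed a heatmap.
day_hour = pd.crosstab(df["DAY_OF_WEEK"], df["HOUR"])

print(monthly)
print(monthly_total)
print(day_hour)
```

Tables like `monthly` and `day_hour` can be passed to a plotting library for the line charts and heatmaps mentioned in the plan.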

Friday – December 1, 2023.

Geospatial Analysis of Violations

A geospatial analysis of the dataset can offer valuable insights into the distribution of health violations across different locations. By leveraging the latitude and longitude information provided for each establishment, a map can be created to visualize the concentration of violations in specific geographical areas. This analysis could help identify clusters of non-compliant establishments or areas with consistently high or low compliance rates. Furthermore, overlaying demographic or economic data onto the map may reveal correlations between the socio-economic context of an area and the adherence to health and safety standards by food establishments. Geospatial tools and visualizations, such as heatmaps or choropleth maps, can be employed for a comprehensive representation of the spatial distribution of violations.
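
The counting behind a heatmap can be sketched in pandas before any mapping tool is involved: snap each coordinate to a grid cell and count violations per cell. The records and the column names `establishment`, `latitude`, and `longitude` below are hypothetical, and rounding to two decimals (cells of about 0.01 degrees, roughly 1 km in latitude) is just one possible cell size.

```python
import pandas as pd

# Invented violation records; all column names here are hypothetical.
violations = pd.DataFrame({
    "establishment": ["A", "A", "B", "C", "D"],
    "latitude": [42.3521, 42.3524, 42.3529, 42.3310, 42.3312],
    "longitude": [-71.0571, -71.0568, -71.0574, -71.0890, -71.0893],
})

# Snap each point to a grid cell by rounding to two decimal places.
violations["lat_cell"] = violations["latitude"].round(2)
violations["lon_cell"] = violations["longitude"].round(2)

# Count violations per cell, densest cells first.
density = (
    violations.groupby(["lat_cell", "lon_cell"])
    .size()
    .sort_values(ascending=False)
    .reset_index(name="count")
)
print(density)
```

The resulting per-cell counts are the values a heatmap or choropleth layer would color by.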