Wednesday – November 29, 2023.

My second approach to analyzing this data is:

Temporal Analysis of Violations

Another insightful approach to analyzing the dataset is to conduct a temporal analysis of the recorded violations. This involves exploring how the frequency and nature of violations change over time. By grouping the data based on inspection dates, trends in compliance and non-compliance can be identified. For example, one could investigate whether there are specific months or seasons when certain types of violations are more prevalent. Additionally, examining the time lapse between consecutive inspections for each establishment can provide insights into the effectiveness of corrective actions taken by businesses. Utilizing line charts or heatmaps can be effective in visualizing temporal patterns in violation occurrences.
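A minimal pandas sketch of this idea, using made-up rows; the column names (`business_name`, `inspection_date`) are assumptions about the schema, not taken from the dataset documentation:

```python
import pandas as pd

# Illustrative rows standing in for the real dataset; the column names
# (business_name, inspection_date) are assumptions about the schema.
df = pd.DataFrame({
    "business_name": ["Cafe A", "Cafe A", "Diner B", "Diner B"],
    "inspection_date": pd.to_datetime(
        ["2023-01-10", "2023-04-02", "2023-01-15", "2023-02-20"]),
})

# Inspections per calendar month, to surface seasonal patterns.
monthly = df.groupby(df["inspection_date"].dt.to_period("M")).size()

# Days between consecutive inspections of the same establishment,
# a rough proxy for how quickly follow-ups happen.
gaps = (df.sort_values("inspection_date")
          .groupby("business_name")["inspection_date"]
          .diff()
          .dt.days)

print(monthly)
print("Median gap (days):", gaps.median())
```

On the real data, `monthly` would feed directly into a line chart or a month-by-violation-type heatmap.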

Monday – November 27, 2023.

This week I am looking to do analysis on this dataset:

https://data.boston.gov/dataset/active-food-establishment-licenses

Data Analysis Approach 1: Overview of Inspection Results

In the provided dataset containing information about various food establishments, particularly focusing on restaurants, a comprehensive analysis can be conducted to gain insights into their compliance with health and safety standards. The dataset includes details such as business name, license information, inspection outcomes, and specific violations noted during inspections. One approach to analyzing this data is to generate an overall overview of the inspection results for each establishment. This could involve calculating the percentage of inspections that resulted in a pass, fail, or other status. Additionally, identifying patterns in the types of violations recorded and their frequency across different establishments can provide valuable information. Visualizations such as pie charts or bar graphs can be employed to effectively communicate the distribution of inspection outcomes and the most common violations.
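As a sketch, the per-establishment outcome breakdown could be computed like this; the rows are illustrative and the result codes (`HE_Pass`, `HE_Fail`) are an assumption about how outcomes are recorded:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative rows; the outcome codes (HE_Pass, HE_Fail) are an
# assumption about how results are recorded in the dataset.
df = pd.DataFrame({
    "business_name": ["Cafe A"] * 3 + ["Diner B"] * 2,
    "result": ["HE_Pass", "HE_Fail", "HE_Pass", "HE_Pass", "HE_Fail"],
})

# Percentage of each outcome per establishment.
outcome_pct = (df.groupby("business_name")["result"]
                 .value_counts(normalize=True)
                 .mul(100).round(1))
print(outcome_pct)

# Overall distribution of outcomes as a pie chart.
df["result"].value_counts().plot(
    kind="pie", autopct="%1.0f%%", title="Inspection Outcomes")
plt.ylabel("")
plt.show()
```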


Friday – November 24, 2023

My final analysis of this data is:

Business Growth and Collaboration Analysis

To support business growth, understanding key factors such as business size, service offerings, and collaborative opportunities is crucial. Analyzing businesses like “IMMAD, LLC” in Forensic Science or “Sparkle Clean Boston LLC” in Clean-tech/Green-tech reveals specific niches that may have growth potential. Implementing targeted marketing and innovation in these niches can be strategic for expansion.

Moreover, identifying businesses open to collaboration can foster a mutually beneficial environment. For instance, “Boston Property Buyers” and “Presidential Properties” both operate in Real Estate. Recognizing such connections can lead to collaborative ventures, shared resources, and a stronger market presence.

Finally, businesses with no digital presence or with incomplete records, such as those whose website field reads “Not yet” or “N/A,” present opportunities for improvement. Implementing digital strategies, such as creating a website or optimizing contact information, can enhance visibility and accessibility, contributing to overall business success.

Wednesday – November 22, 2023

I continued my analysis of the same dataset.

Digital Presence and Communication Analysis

The dataset includes businesses’ online presence through websites, email addresses, and phone numbers. Analyzing the online landscape is crucial for understanding the modern business environment. For instance, businesses like “Boston Chinatown Tours” and “Interactive Construction Inc.” have websites, providing opportunities for digital marketing, customer engagement, and e-commerce. Evaluating the effectiveness of these online platforms and optimizing them for user experience can enhance business visibility and customer interaction.

Furthermore, analyzing contact information such as email addresses and phone numbers is vital for communication strategies. “Eye Adore Threading” and “Alexis Frobin Acupuncture” have multiple contact points, ensuring accessibility for potential clients. Utilizing data-driven communication strategies, such as email marketing or SMS campaigns, can enhance customer engagement and retention.

The “Other Information” field, specifying if a business is “Minority-owned” or “Immigrant-owned,” can influence marketing narratives. Highlighting these aspects in digital communication can resonate positively with diverse audiences, fostering a sense of community and inclusivity.


Monday – November 20, 2023

Today I started looking at a new dataset, which can be found here: https://data.boston.gov/dataset/women-owned-businesses

Business Type and Location Analysis

In this dataset, businesses’ key attributes include Business Name, Business Type, Physical Location/Address, Business Zipcode, Business Website, Business Phone Number, Business Email, and Other Information. The initial step in data analysis involves categorizing businesses based on their types. This classification facilitates a comprehensive understanding of the diverse industries present. For instance, businesses like “Advocacy for Special Kids, LLC” and “HAI Analytics” fall under the Education category, while “Alexis Frobin Acupuncture” and “Eye Adore Threading” belong to the Healthcare sector. “CravenRaven Boutique” and “All Fit Alteration” represent the Retail industry, showcasing a variety of business types.

Next, examining the geographical distribution of businesses is essential. The physical locations and zip codes reveal clusters of businesses within specific regions, offering insights into the economic landscape of different areas. Businesses such as “Boston Sports Leagues” and “All Things Visual” in the 02116 zip code highlight concentrations of services in that region. Understanding the spatial distribution enables targeted marketing and resource allocation for business growth.

Additionally, analyzing the “Other Information” field, which includes details like “Minority-owned” and “Immigrant-owned,” provides valuable socio-economic insights. This information aids in identifying businesses contributing to diversity and inclusivity within the entrepreneurial landscape. Focusing on supporting minority and immigrant-owned businesses could be a strategic approach for community development and economic empowerment.

Friday – November 17, 2023

Today I looked at the “Hyde Park” data. To analyze the provided data for Hyde Park across different decades, several data analysis techniques can be employed. Firstly, a temporal trend analysis can be conducted to observe population changes over time, identifying peaks and troughs in each demographic category. Age distribution patterns can be explored through bar charts, highlighting shifts in the population structure. Additionally, educational attainment trends can be visualized using pie charts or bar graphs to understand changes in the level of education within the community. The nativity and race/ethnicity data can be further examined using percentage distribution analysis to track variations in the composition of the population. Labor force participation rates, divided by gender, can be visualized to discern patterns in workforce dynamics. Housing tenure analysis, using pie charts or bar graphs, can reveal shifts in the proportion of owner-occupied and renter-occupied units, providing insights into housing trends. Overall, a combination of graphical representation and statistical measures would facilitate a comprehensive understanding of the demographic, educational, labor, and housing dynamics in Hyde Park over the specified decades.
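A sketch of the temporal trend step; the population figures below are invented purely to illustrate the technique, since the real numbers come from the “Hyde Park” sheet:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented population figures, purely to illustrate the technique; the
# real numbers come from the "Hyde Park" sheet.
decades = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
population = [35000, 33000, 31000, 30000, 29000, 31000, 32000]
pop = pd.Series(population, index=decades, name="population")

# Decade-over-decade change highlights peaks and troughs.
change = pop.diff()
print(change)

pop.plot(marker="o", title="Hyde Park Population by Decade (illustrative)")
plt.xlabel("Decade")
plt.ylabel("Population")
plt.show()
```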

Wednesday – November 15, 2023.

Today I looked at the second sheet, “Back Bay,” of the Excel workbook at https://data.boston.gov/dataset/neighborhood-demographics

The dataset on Back Bay offers insights into the neighborhood’s evolution across different decades, allowing for a comprehensive analysis of various demographic aspects. Notable patterns include population fluctuations, with a decline until 1990 followed by relative stability. Age distribution highlights shifts in the percentage of residents across different age groups, particularly a substantial increase in the 20-34 age bracket from 32% in 1950 to 54% in 1980. Educational attainment displays changing proportions of individuals with varying levels of education, notably showcasing a significant rise in those with a Bachelor’s Degree or Higher from 20% in 1950 to 81% in 2010. Nativity data reveals fluctuations in the percentage of foreign-born residents, while the race/ethnicity distribution indicates a decrease in the white population and a rise in the Asian/PI category. Labor force participation demonstrates gender-based variations, and housing tenure data underscores changes in the ratio of owner-occupied to renter-occupied units. Collectively, this dataset provides a nuanced understanding of the socio-demographic landscape in Back Bay over the decades.

Monday – November 13, 2023

I am currently examining the dataset on Analyze Boston, specifically focusing on the “Allston” sheet within the “neighborhoodsummaryclean_1950-2010” Excel file, which is available at https://data.boston.gov/dataset/neighborhood-demographics. The dataset provides a comprehensive overview of demographic and socioeconomic trends in Allston spanning several decades. Notably, there is evident population growth from 1950 to 2010. The age distribution data reveals intriguing patterns, including shifts in the percentage of residents across various age groups over the years. Educational attainment data reflects changes in the population’s education levels, notably showcasing a significant increase in the percentage of individuals holding a Bachelor’s degree or higher. The nativity data sheds light on the proportion of foreign-born residents, indicating shifts in immigration patterns. Changes in the racial and ethnic composition are apparent, with a declining percentage of White residents and an increase in Asian/PI residents. The labor force participation data by gender is noteworthy, illustrating fluctuations in male and female employment rates. Housing tenure data suggests a rise in the number of renter-occupied units over the years. Potential data analysis avenues may involve exploring correlations between demographic shifts, educational attainment, and housing tenure to gain deeper insights into the socio-economic dynamics of Allston.

Sunday – November 12, 2023

This is Project 2 for MTH 522 at the University of Massachusetts Dartmouth.

Project Title:

Analysis of Fatal Police Shootings in the United States Using Washington Post Data 

The provided dataset has been thoroughly examined and comprehensively reported in the project document.

The contribution report has been added to the final page of the report.

Project 2


Friday – November 10, 2023.

In today’s analysis, I loaded police shooting data from an Excel file into a Pandas DataFrame and aimed to investigate the distribution of justified and unjustified use of force by police across different racial groups, focusing on both male and female incidents. To achieve this, I defined a function to determine whether force was justified based on threat types and weapons involved. I then applied this function to the dataset, creating a new column indicating the justification of force. Subsequently, I filtered the data to include only incidents involving Black, White, Hispanic, and Asian individuals. After separating the data by gender, I calculated the occurrences and percentages of ‘False’ justified force cases for each race. Using Seaborn and Matplotlib, I created bar plots to visually represent these percentages for both male and female incidents. The analysis provides insights into potential disparities in the perceived justification of police force across different racial groups and genders, as visualized in the generated bar plots.
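A simplified stand-in for this pipeline; the justification rule and the category values here are assumptions for illustration, not the rules or codes actually used in the analysis:

```python
import pandas as pd

# Simplified stand-in for the justification rule described above; the
# actual threat/weapon categories used in the analysis differ.
def force_justified(row):
    threatening = row["threat_type"] in {"attack", "shoot", "point"}
    armed = row["armed_with"] not in {"unarmed", "undetermined"}
    return threatening and armed

# Illustrative incidents only.
df = pd.DataFrame({
    "threat_type": ["attack", "flee", "shoot"],
    "armed_with": ["gun", "unarmed", "knife"],
    "race": ["B", "W", "H"],
    "gender": ["male", "male", "female"],
})
df["justified"] = df.apply(force_justified, axis=1)

# Share of incidents judged not justified, per race.
unjust_pct = (df[~df["justified"]].groupby("race").size()
              / df.groupby("race").size() * 100).fillna(0)
print(unjust_pct)
```

On the real data, `unjust_pct` computed separately for each gender is what feeds the Seaborn/Matplotlib bar plots described above.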

Wednesday – November 8, 2023.

In today’s analysis, I wrote code to perform text analysis on specific columns of an Excel dataset, counting the frequency of words in those columns. Here’s a step-by-step explanation of the code:

  1. Import the necessary libraries:
    • import pandas as pd: Imports the Pandas library and assigns it the alias ‘pd’ for working with data.
    • from collections import Counter: Imports the Counter class from the collections module, which is used to count the frequency of words.
  2. Define the column names you want to analyze:
    • columns_to_analyze: A list containing the names of the columns you want to analyze for word frequencies. In this code, the columns specified are ‘threat_type’, ‘flee_status’, ‘armed_with’, and ‘body_camera.’
  3. Specify the file path to your Excel document:
    • directory_path: Specifies the file path to the Excel file you want to analyze. Make sure to update this path to your Excel file’s location.
  4. Load your data into a DataFrame:
    • df = pd.read_excel(directory_path): Reads the data from the Excel file specified by ‘directory_path’ into a Pandas DataFrame named ‘df.’
  5. Initialize a dictionary to store word counts for each column:
    • word_counts = {}: Creates an empty dictionary named ‘word_counts’ to store the word counts for each specified column.
  6. Iterate through the specified columns:
    • The code uses a for loop to go through each column specified in the columns_to_analyze list.
  7. Retrieve and preprocess the data from the column:
    • column_data = df[column_name].astype(str): Retrieves the data from the current column, converts it to strings to ensure consistent data type, and stores it in the ‘column_data’ variable.
  8. Tokenize the text and count the frequency of each word:
    • The code tokenizes the text within each column using the following steps:
      • words = ' '.join(column_data).split(): Joins all the text in the column into a single string, then splits it into individual words. This step prepares the data for word frequency counting.
      • word_counts[column_name] = Counter(words): Uses the Counter class to count the frequency of each word in the ‘words’ list and stores the results in the ‘word_counts’ dictionary under the column name as the key.
  9. Print the words and their frequencies for each column:
    • The code iterates through the ‘word_counts’ dictionary and prints the word frequencies for each column. It displays the column name, followed by the individual words and their counts for that column.

The code provides a word frequency analysis for the specified columns in your dataset, making it easier to understand the distribution of words in those columns. This can be useful for identifying common terms or patterns in the data.
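Putting the steps above together, the script looks roughly like this; a tiny inline DataFrame stands in for the Excel file, which the original loads with `pd.read_excel(directory_path)`:

```python
import pandas as pd
from collections import Counter

# The steps above, condensed; the original reads the data with
# pd.read_excel(directory_path), so this inline frame is a stand-in.
df = pd.DataFrame({
    "threat_type": ["attack", "flee", "attack"],
    "armed_with": ["gun", "unarmed", "gun"],
})
columns_to_analyze = ["threat_type", "armed_with"]

word_counts = {}
for column_name in columns_to_analyze:
    column_data = df[column_name].astype(str)   # uniform string dtype
    words = " ".join(column_data).split()       # whitespace tokenization
    word_counts[column_name] = Counter(words)

for column_name, counts in word_counts.items():
    print(column_name, dict(counts))
```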

Monday – November 6, 2023

Today I extended the age analysis to break down age groups by race and gender. Here’s a step-by-step explanation of the code:

  1. Import the necessary libraries:
    • import pandas as pd: Imports the Pandas library and assigns it the alias ‘pd.’
    • import matplotlib.pyplot as plt: Imports the Matplotlib library and assigns it the alias ‘plt,’ which will be used to create plots and visualizations.
  2. Load the Excel file into a DataFrame:
    • directory_path: Specifies the file path to the Excel file you want to load. You should update this path to your Excel file’s location.
    • sheet_name: Specifies the name of the sheet within the Excel file from which data should be read.
    • df = pd.read_excel(directory_path, sheet_name=sheet_name): Reads the data from the Excel file into a Pandas DataFrame named ‘df.’
  3. Drop rows with missing ‘race,’ ‘age,’ or ‘gender’ values:
    • df = df.dropna(subset=['race', 'age', 'gender']): Removes rows from the DataFrame where any of these three columns (race, age, gender) have missing values.
  4. Create age groups:
    • age_bins: Defines the boundaries for age groups, similar to the previous code snippet.
    • age_labels: Provides labels for each age group, corresponding to ‘age_bins.’
  5. Cut the age data into age groups for each race category:
    • df['Age Group'] = pd.cut(df['age'], bins=age_bins, labels=age_labels): Creates a new column ‘Age Group’ in the DataFrame by categorizing individuals’ ages into the age groups defined in ‘age_bins’ and labeling them with ‘age_labels.’
  6. Count the number of individuals in each age group by race and gender:
    • age_group_counts_by_race_gender = df.groupby(['race', 'gender', 'Age Group'])['name'].count().unstack().fillna(0): Groups the data by race, gender, and age group, and then counts the number of individuals in each combination. The ‘unstack()’ function reshapes the data to make it more suitable for visualization, and ‘fillna(0)’ fills missing values with 0.
  7. Calculate the median age for each race and gender combination:
    • median_age_by_race_gender = df.groupby(['race', 'gender'])['age'].median(): Groups the data by race and gender and calculates the median age for each combination.
  8. Print the median age for each race and gender combination:
    • print("Median Age by Race and Gender:"): Prints a header.
    • print(median_age_by_race_gender): Prints the calculated median age for each race and gender combination.
  9. Create grouped bar charts for different genders:
    • The code iterates over unique gender values in the DataFrame and creates separate bar charts for each gender.
    • For each gender:
      • Subset the DataFrame to include only data for that gender.
      • Create a grouped bar chart, displaying the number of individuals in different age groups for each race-gender combination.
      • Set various plot properties such as the title, labels, legend, and rotation of x-axis labels.
      • Display the plot using plt.show().

This code generates grouped bar charts that visualize the distribution of individuals in different age groups for each race-gender combination, helping to analyze the age distribution within these subgroups.

The output is:

Median Age by Race and Gender:
race  gender
A     female    47.0
      male      34.0
B     female    31.0
      male      31.0
B;H   male      27.0
H     female    31.0
      male      33.0
N     female    32.0
      male      31.5
O     female    24.5
      male      36.0
W     female    39.0
      male      38.0
Name: age, dtype: float64
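The pipeline described above can be sketched as follows; the rows, bin edges, and labels here are illustrative stand-ins, since the post loads the real data with `pd.read_excel` and does not list its bins:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative rows and assumed bin edges; the real script loads the
# data with pd.read_excel and uses its own age_bins/age_labels.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "race": ["B", "B", "W", "W", "W"],
    "gender": ["male", "female", "male", "male", "female"],
    "age": [22, 35, 41, 19, 60],
})
age_bins = [0, 18, 30, 45, 60, 100]
age_labels = ["0-18", "19-30", "31-45", "46-60", "61+"]
df["Age Group"] = pd.cut(df["age"], bins=age_bins, labels=age_labels)

age_group_counts = (df.groupby(["race", "gender", "Age Group"],
                               observed=False)["name"]
                      .count().unstack().fillna(0))

median_age = df.groupby(["race", "gender"])["age"].median()
print("Median Age by Race and Gender:")
print(median_age)

# One grouped bar chart per gender.
for gender in df["gender"].unique():
    subset = age_group_counts.xs(gender, level="gender")
    subset.plot(kind="bar", title=f"Age Groups by Race ({gender})")
    plt.xlabel("Race")
    plt.ylabel("Number of Individuals")
    plt.xticks(rotation=45)
    plt.show()
```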

Friday – November 3, 2023.

Today I worked on a Python script that uses the Pandas library to load data from an Excel file, perform some data analysis on the age distribution of individuals, and then create a bar graph to visualize the distribution of individuals in different age groups. Here’s a step-by-step explanation of the code:

  1. Import the necessary libraries:
    • import pandas as pd: Imports the Pandas library and assigns it the alias ‘pd.’
    • import matplotlib.pyplot as plt: Imports the Matplotlib library, specifically the ‘pyplot’ module, and assigns it the alias ‘plt.’ Matplotlib is used for creating plots and visualizations.
  2. Load the Excel file into a DataFrame:
    • directory_path: Specifies the file path to the Excel file you want to load. Make sure to update this path to the location of your Excel file.
    • sheet_name: Specifies the name of the sheet within the Excel file from which data should be read.
    • df = pd.read_excel(directory_path, sheet_name=sheet_name): Uses the pd.read_excel function to read the data from the Excel file into a Pandas DataFrame named ‘df.’
  3. Calculate the median age of all individuals:
    • median_age = df['age'].median(): Calculates the median age of all individuals in the ‘age’ column of the DataFrame and stores it in the ‘median_age’ variable.
    • print("Median Age of All Individuals:", median_age): Prints the calculated median age to the console.
  4. Create age groups:
    • age_bins: Defines the boundaries for age groups. In this case, individuals will be grouped into the specified age ranges.
    • age_labels: Provides labels for each age group, corresponding to the ‘age_bins.’
  5. Cut the age data into age groups:
    • df['Age Group'] = pd.cut(df['age'], bins=age_bins, labels=age_labels): Creates a new column ‘Age Group’ in the DataFrame by categorizing individuals’ ages into the age groups defined in ‘age_bins’ and labeling them with ‘age_labels.’
  6. Count the number of individuals in each age group:
    • age_group_counts = df['Age Group'].value_counts().sort_index(): Counts the number of individuals in each age group and sorts them by the age group labels. The result is stored in the ‘age_group_counts’ variable.
  7. Create a bar graph to analyze age groups:
    • plt.figure(figsize=(10, 6)): Sets the size of the figure for the upcoming plot.
    • age_group_counts.plot(kind='bar', color='skyblue'): Plots a bar graph using the ‘age_group_counts’ data, where each bar represents an age group. ‘skyblue’ is the color of the bars.
    • plt.title('Age Group Analysis'): Sets the title of the plot.
    • plt.xlabel('Age Group'): Sets the label for the x-axis.
    • plt.ylabel('Number of Individuals'): Sets the label for the y-axis.
    • plt.xticks(rotation=45): Rotates the x-axis labels by 45 degrees for better readability.
    • plt.show(): Displays the bar graph on the screen.

After running this code, you will get a bar graph showing the distribution of individuals in different age groups based on the data from the Excel file.
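A runnable sketch of these steps, with an inline frame in place of the Excel file; the bin edges and labels are assumptions, since the post does not list the ones it used:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Inline ages in place of the Excel file; the bin edges and labels
# are assumptions about the grouping the original script used.
df = pd.DataFrame({"age": [15, 22, 37, 41, 58, 63]})

median_age = df["age"].median()
print("Median Age of All Individuals:", median_age)

age_bins = [0, 18, 30, 45, 60, 100]
age_labels = ["0-18", "19-30", "31-45", "46-60", "61+"]
df["Age Group"] = pd.cut(df["age"], bins=age_bins, labels=age_labels)

age_group_counts = df["Age Group"].value_counts().sort_index()

plt.figure(figsize=(10, 6))
age_group_counts.plot(kind="bar", color="skyblue")
plt.title("Age Group Analysis")
plt.xlabel("Age Group")
plt.ylabel("Number of Individuals")
plt.xticks(rotation=45)
plt.show()
```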

Wednesday – November 1, 2023.

Today, I wrote Python code that uses the pandas and collections libraries to analyze data from an Excel file. Here’s a simple explanation of what it does:

  1. It starts by importing two libraries: “pandas” (commonly used for data analysis) and “Counter” from “collections” (used for counting elements in a list).
  2. The code specifies the names of the columns you want to analyze from an Excel file. These columns include information like “threat_type,” “flee_status,” “armed_with,” and others.
  3. It sets the file path to the location of your Excel document. You need to replace this path with the actual path to your Excel file.
  4. The code uses “pd.read_excel” to load the data from the Excel file into a DataFrame (a table-like structure for data).
  5. It initializes a dictionary called “word_counts” to store word frequencies for each of the specified columns.
  6. The code then goes through each of the specified columns one by one. For each column:
    • It retrieves the data from that column and converts it to strings to ensure uniform data type.
    • It breaks the text into individual words (tokenizes it) and counts how many times each word appears in that column.
    • These word counts are stored in the “word_counts” dictionary under the column’s name.
  7. Finally, the code prints the words and their frequencies for each of the specified columns. It goes through the “word_counts” dictionary and displays the words and how many times they appear in each column.

In summary, this code reads data from an Excel file, tokenizes the text in specific columns, and counts the frequency of each word in those columns. It then prints out the word frequencies for each column, which can be useful for understanding the data in those columns.