Monday – September 18, 2023.

Import Necessary Libraries: The code imports essential libraries for data handling, analysis, and plotting, including pandas, numpy, scikit-learn, and matplotlib.

Load Data: It reads the data from an Excel file stored at a specified path on my laptop.

Data Cleaning: The code ensures data cleanliness by removing rows with missing values (NaN) in the “Inactivity” column.
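The loading and cleaning steps could look roughly like the sketch below. It is not the notebook's actual code: the helper names `load_data` and `clean_inactivity` are my own, the column names are taken from this write-up, and the rows are made up so the cleaning step can be run without the Excel file.

```python
import numpy as np
import pandas as pd


def load_data(path: str) -> pd.DataFrame:
    """Read the Excel sheet into a DataFrame (path is supplied by the caller)."""
    return pd.read_excel(path)


def clean_inactivity(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows whose "Inactivity" value is missing (NaN)."""
    return df.dropna(subset=["Inactivity"]).reset_index(drop=True)


# Made-up rows standing in for the real spreadsheet (assumption).
raw = pd.DataFrame({
    "% Diabetes": [7.1, 8.3, 9.0, 10.2],
    "% Obesity": [17.0, 18.5, 19.2, 20.1],
    "Inactivity": [14.5, np.nan, 16.8, 17.9],
})

cleaned = clean_inactivity(raw)
print(f"Rows before cleaning: {len(raw)}, after: {len(cleaned)}")
```

Passing `subset=["Inactivity"]` keeps rows that are missing values in other columns, which matches the description of cleaning only on that one column.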

Data Setup: After cleaning, the data is split into two parts:

Independent variables (X): These are features that might affect “Inactivity,” like “% Diabetes” and “% Obesity.”

Dependent variable (y): This is the variable we want to predict, which is “Inactivity.”
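As a small illustration of that split, here is a sketch using the column names mentioned above and a few made-up rows (the real notebook works on the cleaned Excel data instead).

```python
import pandas as pd

# Made-up rows; the real data comes from the cleaned spreadsheet (assumption).
df = pd.DataFrame({
    "% Diabetes": [7.1, 8.3, 9.0, 10.2],
    "% Obesity": [17.0, 18.5, 19.2, 20.1],
    "Inactivity": [14.5, 15.9, 16.8, 17.9],
})

feature_cols = ["% Diabetes", "% Obesity"]

X = df[feature_cols]   # independent variables, one column per feature
y = df["Inactivity"]   # dependent variable we want to predict

print(X.shape, y.shape)
```

Keeping `X` as a two-column DataFrame (rather than a single Series) gives scikit-learn the 2D input it expects.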

Linear Regression Model: The code constructs a linear regression model, which expresses the dependent variable (inactivity percentage) as an intercept plus a weighted sum of the independent variables (diabetes and obesity percentages).

Model Training: The model is trained on the data to learn how changes in the independent variables relate to the dependent variable. With two independent variables, it identifies the best-fit plane (the two-variable analogue of a best-fit line) that minimizes the sum of squared differences between predicted and actual “Inactivity” percentages.
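Written out, the model and the quantity that training minimizes look like this. I am assuming ordinary least squares, which is what scikit-learn's `LinearRegression` fits; the symbols are my own notation, with β₀ as the intercept and β₁, β₂ as the coefficients for diabetes and obesity.

```latex
\hat{y}_i = \beta_0 + \beta_1 \, x_{i,\text{diabetes}} + \beta_2 \, x_{i,\text{obesity}}

\min_{\beta_0,\, \beta_1,\, \beta_2} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

Here y_i is the actual “Inactivity” percentage for row i and ŷ_i is the model's prediction for that row.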

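Print Results: The code displays the outcomes of the linear regression analysis, including the intercept (the predicted “Inactivity” when both independent variables are zero) and the coefficients (the slope for each independent variable, meaning the change in predicted “Inactivity” for a one-point increase in that variable while the other is held fixed). These values help interpret the relationship between the variables.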
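Here is a minimal sketch of the fit-and-print step. The data is made up and deliberately built from a known formula (Inactivity = 2 + 0.5 × diabetes + 0.3 × obesity), so it is easy to check that the printed intercept and coefficients match it; the real data will of course give different numbers.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data that follows Inactivity = 2 + 0.5 * diabetes + 0.3 * obesity exactly.
df = pd.DataFrame({
    "% Diabetes": [7.0, 8.0, 9.0, 10.0, 11.0],
    "% Obesity": [17.0, 19.0, 18.0, 21.0, 20.0],
})
df["Inactivity"] = 2.0 + 0.5 * df["% Diabetes"] + 0.3 * df["% Obesity"]

X = df[["% Diabetes", "% Obesity"]]
y = df["Inactivity"]

# Fit ordinary least squares on the two features.
model = LinearRegression()
model.fit(X, y)

print(f"Intercept: {model.intercept_:.3f}")
for name, coef in zip(X.columns, model.coef_):
    print(f"Coefficient for {name}: {coef:.3f}")
```

`model.coef_` holds one slope per column of `X`, in the same order as the columns, which is why it can be zipped with `X.columns`.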

Make Predictions: Using the trained model, the code predicts “Inactivity” percentages based on new values of independent variables (diabetes and obesity percentages).
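A prediction step along these lines might look like the sketch below. Both the training rows and the new values are made up for illustration; the training data follows a known formula so the predictions are easy to verify by hand.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

feature_cols = ["% Diabetes", "% Obesity"]

# Made-up training data following Inactivity = 2 + 0.5 * diabetes + 0.3 * obesity.
train = pd.DataFrame({
    "% Diabetes": [7.0, 8.0, 9.0, 10.0, 11.0],
    "% Obesity": [17.0, 19.0, 18.0, 21.0, 20.0],
})
train["Inactivity"] = 2.0 + 0.5 * train["% Diabetes"] + 0.3 * train["% Obesity"]

model = LinearRegression().fit(train[feature_cols], train["Inactivity"])

# Hypothetical new diabetes and obesity percentages to predict for.
new_values = pd.DataFrame({
    "% Diabetes": [8.5, 10.5],
    "% Obesity": [18.0, 20.5],
})

predictions = model.predict(new_values[feature_cols])
for (_, row), pred in zip(new_values.iterrows(), predictions):
    print(f"Diabetes {row['% Diabetes']}%, Obesity {row['% Obesity']}% "
          f"-> predicted Inactivity {pred:.2f}%")
```

The new values are passed as a DataFrame with the same column names and order used for training, so scikit-learn can match the features up correctly.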

Plot Results: To visualize the model’s performance, a scatter plot is created. It compares actual “Inactivity” percentages (X-axis) with predicted percentages (Y-axis). A well-fitted model will have points lying close to the diagonal line where the predicted value equals the actual value.
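The actual-versus-predicted plot can be sketched as follows. The data values, axis labels, and the dashed reference line are my own choices for illustration, not necessarily what the notebook uses.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data for illustration only.
df = pd.DataFrame({
    "% Diabetes": [7.0, 8.0, 9.0, 10.0, 11.0],
    "% Obesity": [17.0, 19.0, 18.0, 21.0, 20.0],
    "Inactivity": [10.8, 11.5, 12.1, 13.2, 13.6],
})

X = df[["% Diabetes", "% Obesity"]]
y = df["Inactivity"]

model = LinearRegression().fit(X, y)
predicted = model.predict(X)

fig, ax = plt.subplots()
ax.scatter(y, predicted)  # actual on the X-axis, predicted on the Y-axis

# Dashed diagonal where predicted == actual; points near it are well predicted.
low = min(y.min(), predicted.min())
high = max(y.max(), predicted.max())
ax.plot([low, high], [low, high], linestyle="--", color="gray")

ax.set_xlabel("Actual Inactivity (%)")
ax.set_ylabel("Predicted Inactivity (%)")
ax.set_title("Actual vs. predicted Inactivity")
```

In a Jupyter Notebook the figure is displayed when the cell finishes; in a plain script, add `plt.show()` at the end.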

In summary, this code loads, cleans, and prepares data, trains a linear regression model to understand relationships, and visualizes the model’s predictions, all aimed at explaining “Inactivity” percentages based on diabetes and obesity percentages.

Project 1 - Progress report - Jupyter Notebook

 
