Wednesday – September 20, 2023

Today’s analysis focused on building a linear regression model that predicts ‘% Obesity’ from ‘% Diabetes’ and ‘% Inactivity’. We also built a second linear regression model that predicts ‘% Diabetes’ from ‘% Obesity’ and ‘% Inactivity’.

Linear regression is a statistical method for modeling the relationship between a dependent variable and one or more independent variables. Linear regression aims to find the best-fitting line through the data points. This line can then be used to make predictions about the dependent variable given the values of the independent variables.

For a single independent variable, the equation for a linear regression model is as follows:

y = mx + b

Where:

  • y is the dependent variable
  • x is the independent variable
  • m is the slope of the line
  • b is the y-intercept of the line

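The slope of the line tells us how much the dependent variable changes for every one-unit change in the independent variable. The y-intercept of the line tells us the value of the dependent variable when the independent variable is equal to zero. With two independent variables, as in both of today’s models, the equation becomes y = m1x1 + m2x2 + b, with one slope (coefficient) for each independent variable.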
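To make the roles of the slope and the intercept concrete, here is a minimal sketch on made-up numbers (not the project data). The points are generated from y = 2x + 1, so the fitted model should recover m = 2 and b = 1 up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up illustration data that lies exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)  # sklearn expects a 2-D feature array
y = 2.0 * x.ravel() + 1.0

model = LinearRegression().fit(x, y)
m = model.coef_[0]      # slope: change in y per one-unit change in x
b = model.intercept_    # intercept: predicted y when x equals zero
print(f"slope m = {m:.2f}, intercept b = {b:.2f}")

# The prediction at x = 0 equals the intercept, and moving one unit
# along x changes the prediction by the slope.
at_zero = model.predict(np.array([[0.0]]))[0]
at_one = model.predict(np.array([[1.0]]))[0]
print("prediction at x = 0:", at_zero)
print("change from x = 0 to x = 1:", at_one - at_zero)
```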

The notebook code performs the following steps. It begins by importing the necessary libraries:

    • pandas: A library for data manipulation and analysis.
    • numpy: A library for scientific computing.
    • sklearn: A library for machine learning.
    • matplotlib: A library for data visualization.
  1. Loads the data from an Excel file. The file path is specified by the variable file_path.
  2. Removes rows with missing values in the dependent variable. The dependent variable is the variable that we want to predict. In the first model it is ‘% Obesity’; in the second model it is ‘% Diabetes’.
  3. Defines the independent variables and the dependent variable. The independent variables are the variables that we use to predict the dependent variable. In the first model they are ‘% Diabetes’ and ‘% Inactivity’; in the second model they are ‘% Obesity’ and ‘% Inactivity’.
  4. Creates a linear regression model. This is done using the LinearRegression() class from the sklearn library.
  5. Fits the model to the data. This is done using the fit() method of the LinearRegression class.
  6. Prints the intercept and coefficients. The intercept is the value of the predicted dependent variable when all independent variables equal zero. The coefficients are the values that multiply the independent variables in the linear regression equation.
  7. Makes predictions using the model. This is done using the predict() method of the LinearRegression class.
  8. Plots the actual vs. predicted values. This is done using the matplotlib library.
  9. Calculates the regression line. This is done using the predict() method of the LinearRegression class.
  10. Adds the regression line to the plot. This is done using the plot() method of the matplotlib library.
  11. Displays the legend. This is done using the legend() method of the matplotlib library.
  12. Shows the plot. This is done using the show() method of the matplotlib library.
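The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the data is synthetic (the notebook loads the real data from the Excel file at file_path), the helper fit_and_plot is a hypothetical name, and the line drawn on the plot is a "perfect prediction" reference line, which may differ from how the notebook draws its regression line. Only the column names ‘% Diabetes’, ‘% Inactivity’ and ‘% Obesity’ come from the text.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; not needed inside Jupyter
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (NOT the project data). In the notebook, the
# DataFrame would instead be loaded with something like pd.read_excel(file_path).
rng = np.random.default_rng(0)
n = 200
diabetes = rng.uniform(5, 15, n)
inactivity = rng.uniform(10, 30, n)
obesity = 2.0 + 0.5 * diabetes + 0.8 * inactivity + rng.normal(0, 0.5, n)
df = pd.DataFrame({
    "% Diabetes": diabetes,
    "% Inactivity": inactivity,
    "% Obesity": obesity,
})
df.loc[0, "% Obesity"] = np.nan  # one missing value to exercise the cleanup step


def fit_and_plot(data, predictors, target):
    """Hypothetical helper: clean, fit, report, and plot actual vs. predicted."""
    # Drop rows with a missing target; predictors are included as well
    # because LinearRegression does not accept NaN inputs.
    clean = data.dropna(subset=[target] + predictors)
    X = clean[predictors]
    y = clean[target]

    model = LinearRegression()
    model.fit(X, y)
    print(f"Target: {target}")
    print("  Intercept:", model.intercept_)
    print("  Coefficients:", dict(zip(predictors, model.coef_)))

    y_pred = model.predict(X)

    fig, ax = plt.subplots()
    ax.scatter(y, y_pred, alpha=0.6, label="Actual vs. predicted")
    # Reference line where predicted equals actual (an assumption of this sketch).
    lims = [min(y.min(), y_pred.min()), max(y.max(), y_pred.max())]
    ax.plot(lims, lims, color="red", label="Perfect prediction")
    ax.set_xlabel(f"Actual {target}")
    ax.set_ylabel(f"Predicted {target}")
    ax.legend()
    return model, clean, fig


# First model: predict % Obesity from % Diabetes and % Inactivity.
obesity_model, obesity_data, obesity_fig = fit_and_plot(
    df, ["% Diabetes", "% Inactivity"], "% Obesity")

# Second model: predict % Diabetes from % Obesity and % Inactivity.
diabetes_model, diabetes_data, diabetes_fig = fit_and_plot(
    df, ["% Obesity", "% Inactivity"], "% Diabetes")
```

In a Jupyter notebook, the backend line can be removed and plt.show() called after each fit to display the figure. Because the synthetic ‘% Obesity’ values were generated with coefficients 0.5 and 0.8, the first model's printed coefficients should land close to those values, which is a quick way to check that the pipeline is wired correctly before pointing it at real data.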
Project 1 - Progress report - Jupyter Notebook