Wednesday – 13 September, 2023.

Today I analyzed the CDC's 2018 health data. Using Python, I calculated the standard deviation and kurtosis for three variables: the percentage of diabetic cases, the obesity rate, and the inactivity level. The dataset contains missing (“NaN”) values, so I spent a significant portion of my time on data cleaning and preparation to ensure the accuracy of the subsequent statistical analyses.
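As a minimal sketch of the cleaning step, the snippet below drops NaN entries from a list and then computes the standard deviation. The sample values and the names `pct_diabetes` and `drop_nan` are made up for illustration and are not taken from the CDC file or from my notebook. It uses only the Python standard library.

```python
import math
import statistics

# Hypothetical sample of percentages; NaN marks a missing value.
pct_diabetes = [8.1, 9.4, float("nan"), 7.6, 10.2, float("nan"), 8.8]

def drop_nan(values):
    """Return only the entries that are real numbers (not NaN)."""
    return [v for v in values if not math.isnan(v)]

clean = drop_nan(pct_diabetes)

# Sample standard deviation (divides by n - 1).
sample_sd = statistics.stdev(clean)
# Population standard deviation (divides by n).
population_sd = statistics.pstdev(clean)

print(len(pct_diabetes) - len(clean), "missing values dropped")
print(f"sample SD = {sample_sd:.4f}, population SD = {population_sd:.4f}")
```

Whether to report the sample or the population figure depends on what the reference material uses, so it is worth checking which divisor it assumes.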

Unfortunately, the kurtosis values I obtained differ from those in the reference material provided by our professor. I intend to raise this discrepancy with our instructor and teaching assistants during our upcoming class session.
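One possible explanation, which I have not confirmed, is that "kurtosis" has several common conventions. Pearson's definition gives 3 for a normal distribution, Fisher's "excess" definition subtracts 3, and some tools also apply a small-sample bias correction. For example, `scipy.stats.kurtosis` defaults to the biased Fisher value, while pandas `Series.kurt` returns the bias-corrected Fisher value. The sketch below computes all three on a made-up sample; the function names are my own and the data is illustrative only.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n (population form)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n

def pearson_kurtosis(values):
    """Fourth standardized moment; a normal distribution gives 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

def excess_kurtosis(values):
    """Fisher definition: Pearson kurtosis minus 3; a normal gives 0."""
    return pearson_kurtosis(values) - 3.0

def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (often written G2); needs n >= 4."""
    n = len(values)
    g2 = excess_kurtosis(values)
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

# Made-up sample, only to show how far the conventions can diverge.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print("Pearson:        ", pearson_kurtosis(sample))
print("Fisher (excess):", excess_kurtosis(sample))
print("Bias-corrected: ", sample_excess_kurtosis(sample))
```

On this small sample the three conventions give roughly 2.78, -0.22, and 0.94, so comparing a value against a reference only makes sense once both sides use the same definition.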

I have attached a PDF of the code I wrote today for you to look over.

Looking ahead, my next objective is to calculate p-values. I plan to run a t-test to support hypothesis testing and help draw informed conclusions about the dataset.

Project 1 - Progress report - Jupyter Notebook
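For the planned t-test, here is a rough sketch of what the calculation involves. I have not yet decided which variant to use, so this assumes a two-sample Welch t-test (unequal variances) purely as an illustration. The two groups and the function name `welch_t` are hypothetical.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each group's mean.
    va = statistics.variance(sample_a) / na
    vb = statistics.variance(sample_b) / nb
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t_stat, df

# Hypothetical inactivity percentages for two made-up groups.
group_a = [14.0, 15.0, 16.0, 17.0, 18.0]
group_b = [15.0, 16.0, 17.0, 18.0, 19.0]

t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The two-sided p-value then comes from Student's t distribution with `df` degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)` runs the same test and returns the p-value directly.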

 
