Friday – 15 September, 2023.

I performed a correlation analysis on three datasets: diabetes, obesity, and inactivity. The analysis showed a strong correlation among all three, and the FIPS code is the key they have in common. To examine them together rather than one pair at a time, I merged the three datasets into a single Excel spreadsheet.
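As a rough illustration of the kind of check described above, here is a minimal sketch of a Pearson correlation computed only over the FIPS codes two datasets share. The function name, the FIPS codes, and the percentages are all made up for the example; this is not the exact code or data I used.

```python
import math

def pearson_on_shared_fips(a, b):
    """Pearson correlation between two {FIPS: value} mappings,
    using only the FIPS codes present in both."""
    shared = sorted(set(a) & set(b))
    xs = [a[fips] for fips in shared]
    ys = [b[fips] for fips in shared]
    n = len(shared)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values, keyed by placeholder FIPS codes.
obesity = {"00001": 30.0, "00003": 32.0, "00005": 34.0, "99999": 20.0}
inactivity = {"00001": 15.0, "00003": 16.0, "00005": 17.0}

r = pearson_on_shared_fips(obesity, inactivity)
print(round(r, 3))
```

The code "99999" appears in only one mapping, so it is ignored. That is the reason a shared key such as FIPS matters before any correlation is computed.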

I wrote code to combine the three datasets and found 356 data points in common. I then cleaned the Excel sheet. The merge had left redundant columns repeating the county, state, and year information, so I removed them to make the data clearer. I also made the dataset easier to read by renaming some columns and adjusting column widths, which helps when visualizing the data.
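The merge step can be sketched as an inner join on the FIPS code: only counties present in all three datasets survive. The column names (`FIPS`, `COUNTY`, `STATE`, and the `_PCT` fields), the function name, and the sample rows below are assumptions for illustration, not my actual spreadsheet.

```python
def merge_on_fips(*tables):
    """Inner-join lists of row dicts on the 'FIPS' key."""
    # Index each table by FIPS so rows can be looked up directly.
    indexed = [{row["FIPS"]: row for row in table} for table in tables]
    # Keep only the FIPS codes that appear in every table.
    common = set(indexed[0]).intersection(*indexed[1:])
    merged = []
    for fips in sorted(common):
        row = {}
        for index in indexed:
            # Shared columns (COUNTY, STATE) collapse into one copy.
            row.update(index[fips])
        merged.append(row)
    return merged

# Hypothetical sample rows with placeholder codes and values.
diabetes = [
    {"FIPS": "00001", "COUNTY": "County A", "STATE": "State X", "DIABETES_PCT": 9.5},
    {"FIPS": "00003", "COUNTY": "County B", "STATE": "State X", "DIABETES_PCT": 8.1},
    {"FIPS": "00005", "COUNTY": "County C", "STATE": "State Y", "DIABETES_PCT": 7.7},
]
obesity = [
    {"FIPS": "00001", "COUNTY": "County A", "STATE": "State X", "OBESITY_PCT": 31.0},
    {"FIPS": "00003", "COUNTY": "County B", "STATE": "State X", "OBESITY_PCT": 29.4},
]
inactivity = [
    {"FIPS": "00001", "COUNTY": "County A", "STATE": "State X", "INACTIVITY_PCT": 16.2},
    {"FIPS": "00003", "COUNTY": "County B", "STATE": "State X", "INACTIVITY_PCT": 15.0},
    {"FIPS": "00005", "COUNTY": "County C", "STATE": "State Y", "INACTIVITY_PCT": 14.8},
]

merged = merge_on_fips(diabetes, obesity, inactivity)
print(len(merged), sorted(merged[0]))
```

Because each merged row is a single dict, the county and state fields appear once instead of three times. That mirrors the cleanup of redundant columns described above.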

Next, I focused on a geographical analysis, specifically counting the number of counties within each state. I found that Texas has 138 counties in the dataset, while several states have only one county entry. This makes the data unreliable for state-level research, as it is heavily skewed towards certain states. For example, Wyoming has only one county in the dataset, so any analysis of Wyoming would be skewed towards that one particular county and would not give us a general view of the entire state, which is our objective.
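Counting counties per state and flagging the thinly covered states can be sketched as follows. The `STATE` column name, the `min_counties` threshold, and the sample rows are assumptions chosen for the example; the real dataset has far more rows.

```python
from collections import Counter

def counties_per_state(rows, min_counties=2):
    """Return (counts per state, sorted list of states below the threshold)."""
    counts = Counter(row["STATE"] for row in rows)
    sparse = sorted(state for state, n in counts.items() if n < min_counties)
    return counts, sparse

# Hypothetical merged rows: one row per county.
rows = [
    {"FIPS": "00001", "STATE": "Texas"},
    {"FIPS": "00003", "STATE": "Texas"},
    {"FIPS": "00005", "STATE": "Texas"},
    {"FIPS": "00007", "STATE": "Wyoming"},
]

counts, sparse = counties_per_state(rows)
print(counts.most_common())
print(sparse)
```

States that land in `sparse` are the ones where a single county would stand in for the whole state, so they are the ones to treat with caution.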
