Exploration of World Health Statistics
This is an exploratory analysis of the 2012 World Health Statistics published by the World Health Organization.
Sample of Unclean Data
A quick look at the following sample of the unclean data reveals a number of problems. First, there are too many null values: out of 129 countries, 112 are missing entries for contraceptive prevalence, antenatal care coverage, and births attended by skilled health personnel. Second, the column names are coded and too long, which makes them inconvenient to interpret during exploration. As a result, I cleaned the data by removing the columns that were significantly incomplete and focused on the under-5 mortality rates and maternal mortality ratios for 1990, 2000 and 2010.
I also noted that the names “rate” and “ratio” are misleading, as both columns have values over 100. According to the source website, the under-5 mortality rate is the number of deaths by age 5 per 1000 live births, and the maternal mortality ratio is the number of maternal deaths per 100,000 live births. I considered rescaling both variables to percentages, but did not proceed because each variable already has its own reasonable scale. For example, no more than 1600 mothers die giving birth per 100,000 live births, so every maternal mortality ratio would be below 1.6%, which would make the differences between countries harder to read.
|Country||MDG4: Under-5 mortality rate 1990||MDG4: Under-5 mortality rate 2000||MDG4: Under-5 mortality rate 2010||MDG5: Maternal mortality ratio 1990||MDG5: Maternal mortality ratio 2000||MDG5: Maternal mortality ratio 2010||MDG5: Contraceptive prevalence (%) – rural||MDG5: Contraceptive prevalence (%) – urban||MDG5: Contraceptive prevalence (%) – poorest||MDG5: Contraceptive prevalence (%) – wealthiest||MDG5: Contraceptive prevalence (%) – no education||MDG5: Contraceptive prevalence (%) – educated||MDG5: Antenatal care coverage - rural||MDG5: Antenatal care coverage - urban||MDG5: Antenatal care coverage – poorest||MDG5: Antenatal care coverage – wealthiest||MDG5: Antenatal care coverage – no education||MDG5: Antenatal care coverage – educated||MDG5: Births attended by skilled health personnel (%) - rural||MDG5: Births attended by skilled health personnel (%) - urban||MDG5: Births attended by skilled health personnel (%) - poorest||MDG5: Births attended by skilled health personnel (%) - wealthiest||MDG5: Births attended by skilled health personnel (%) - no education||MDG5: Births attended by skilled health personnel (%) - educated||Children aged <5 years who are stunted (%) - rural||Children aged <5 years who are stunted (%) - urban||Children aged <5 years who are stunted (%) - poorest||Children aged <5 years who are stunted (%) - wealthiest||Children aged <5 years who are stunted (%) - no education||Children aged <5 years who are stunted (%) - educated||MDG4: Under-5 mortality rate - rural||MDG4: Under-5 mortality rate – urban||MDG4: Under-5 mortality rate – poorest||MDG4: Under-5 mortality rate – wealthiest||MDG4: Under-5 mortality rate – no education||MDG4: Under-5 mortality rate – educated|
|Antigua & Barbuda||26||15||8||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN||NaN|
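The cleaning step described above can be sketched with pandas. This is only an illustration under assumptions: the three-row frame, the 50% missing-value threshold, the helper `drop_sparse_columns`, and the short names such as `u5_1990` are hypothetical and are not taken from the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical three-country sample; the values are illustrative, not WHO figures.
raw = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "MDG4: Under-5 mortality rate 1990": [26, 120, 9],
    "MDG5: Maternal mortality ratio 1990": [np.nan, 600, 12],
    "MDG5: Births attended by skilled health personnel (%) - rural": [np.nan, np.nan, np.nan],
    "MDG5: Antenatal care coverage - urban": [np.nan, 71.0, np.nan],
})


def drop_sparse_columns(df, max_missing=0.5):
    """Keep only columns whose share of missing values is at most max_missing."""
    missing_share = df.isna().mean()  # fraction of NaN per column
    return df.loc[:, missing_share <= max_missing]


# Short, readable names chosen for this sketch (an assumption, not the author's names).
SHORT_NAMES = {
    "MDG4: Under-5 mortality rate 1990": "u5_1990",
    "MDG5: Maternal mortality ratio 1990": "mmr_1990",
}

clean = drop_sparse_columns(raw).rename(columns=SHORT_NAMES)
print(list(clean.columns))
```

A threshold on the share of missing values keeps the decision explicit and easy to tune, instead of listing the dropped columns by hand.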
My intuition is that the mortality rates follow a Poisson distribution, since each death is an independent event, the average frequency during our time period is known, and it is only meaningful to count how many times deaths have occurred, not how many times they have not. The following histograms are consistent with that intuition in that they are strongly right-skewed. Most countries have very low mortality rates, while only a few have over 250 under-5 deaths per 1000 births or over 100 maternal deaths per 100,000 births.
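One rough way to probe the Poisson intuition numerically, beyond the shape of a histogram, is to compare the variance with the mean: for Poisson-distributed data the two are about equal. The sketch below uses only the standard library and a hypothetical list of rates, not the WHO data; `dispersion_index` is a helper name invented here. Because the values are per-country rates and not raw counts, the result is only indicative.

```python
from statistics import mean, pvariance


def dispersion_index(values):
    """Variance-to-mean ratio: about 1 for Poisson data, well above 1 if overdispersed."""
    return pvariance(values) / mean(values)


# Hypothetical under-5 mortality rates (deaths per 1000 live births), not WHO data.
rates = [4, 6, 8, 12, 20, 35, 60, 110, 180, 250]

index = dispersion_index(rates)
print(f"dispersion index: {index:.1f}")
```

An index far above 1 would suggest that the spread between countries is wider than a single Poisson process would produce, in which case a right-skewed alternative such as a negative binomial might describe the data better.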
Note: The bin sizes and grid scales of the histograms differ, which may obscure the decreasing trend in mortality rates.
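The comparability issue in the note can be avoided by computing one set of bin edges over all years and reusing it for every histogram. The following is a minimal sketch with NumPy; the per-year values are hypothetical and the choice of four bins is arbitrary.

```python
import numpy as np

# Hypothetical rates per year (deaths per 1000 live births), not WHO data.
rates_by_year = {
    1990: [9, 26, 55, 120, 210, 310],
    2000: [7, 15, 40, 95, 160, 230],
    2010: [5, 8, 28, 60, 110, 170],
}

# One set of bin edges spanning all years, so the bars are comparable across panels.
all_values = np.concatenate([np.asarray(v) for v in rates_by_year.values()])
edges = np.histogram_bin_edges(all_values, bins=4)

# Count each year's values against the same edges.
counts_by_year = {
    year: np.histogram(values, bins=edges)[0]
    for year, values in rates_by_year.items()
}

for year, counts in counts_by_year.items():
    print(year, counts.tolist())
```

When plotting, the same `edges` array can be passed as the `bins` argument of Matplotlib's `hist`, and creating the panels with `plt.subplots(..., sharex=True, sharey=True)` keeps the axes on one scale.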
The following box plots also show that overall mortality decreased from 1990 to 2000 and 2010. The mean and range dropped consistently over time for both the under-5 mortality rate and the maternal mortality ratio. The mean under-5 mortality rate dropped from around 50 deaths per 1000 births in 1990 to fewer than 25 deaths per 1000 births in 2010. The maternal mortality ratio decreased from 100 deaths per 100,000 births in 1990 to around 60 deaths per 100,000 births in 2010, and its range shrank by half over the same period.
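The figures read off the box plots can be cross-checked by computing the summary statistics directly. Note that a box plot's centre line is the median by default, so the mean is worth computing separately. The sketch below uses the standard library and hypothetical per-year values, not the WHO data; `summarize` is a helper name invented here.

```python
from statistics import mean, median

# Hypothetical rates per year (deaths per 1000 live births), not WHO data.
rates_by_year = {
    1990: [9, 26, 55, 120, 210, 310],
    2000: [7, 15, 40, 95, 160, 230],
    2010: [5, 8, 28, 60, 110, 170],
}


def summarize(values):
    """Return the mean, median and range (max minus min) of a list of rates."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
    }


summary = {year: summarize(values) for year, values in rates_by_year.items()}
for year, stats in summary.items():
    print(year, stats)
```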
In addition, I wanted to see whether the two mortality measures are correlated, and how strongly. Using Seaborn's jointplot, I found a high Pearson correlation coefficient between the maternal mortality ratio and the under-5 mortality rate. This makes sense because both are higher in less developed countries with less advanced medical infrastructure, knowledge and awareness.
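The coefficient itself can also be computed without a plot. The sketch below uses NumPy's `corrcoef` on hypothetical paired values, not the WHO data, so the resulting number only illustrates the calculation.

```python
import numpy as np

# Hypothetical paired values per country, not WHO data.
under5 = np.array([5, 8, 28, 60, 110, 170])      # deaths per 1000 live births
maternal = np.array([6, 10, 45, 150, 400, 800])  # deaths per 100,000 live births

# Pearson's r is the off-diagonal entry of the 2x2 correlation matrix.
r = np.corrcoef(under5, maternal)[0, 1]
print(f"Pearson r = {r:.3f}")
```

Depending on the Seaborn version, the joint plot may not print the coefficient on the figure, so computing it separately keeps the number reproducible.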
The heat maps show that both variables are concentrated in countries across Africa, South Asia and South America. The striking resemblance between the two maps shows how closely the two mortality measures are correlated, confirming our previous analysis.
Heatmap of Under-5 Mortality Rate across the world
Heatmap of Maternal Mortality Ratio across the world