Exploration of World Health Statistics

This is an exploratory analysis of the 2012 World Health Statistics published by the World Health Organization.

Sample of Unclean Data

A quick look at the following unclean data sample reveals a number of problems. First, there are too many null values. Of the 129 countries, 112 are missing entries for contraceptive prevalence, antenatal care coverage, and births attended by skilled health personnel. Second, the column names are coded and long, which makes them inconvenient to read during exploration. I therefore cleaned the data by removing the columns that were substantially incomplete and focused on the under-5 mortality rate and the maternal mortality ratio for 1990, 2000 and 2010.

I also noted that the names “rate” and “ratio” can be misleading, as both columns contain values over 100. According to the source website, the under-5 mortality rate is the number of deaths before age 5 per 1,000 live births, and the maternal mortality ratio is the number of maternal deaths per 100,000 live births. I considered rescaling both variables to percentages but did not, because each variable already has a sensible scale of its own. For example, no more than 1,600 mothers die per 100,000 live births, so every maternal mortality ratio would be at most 1.6%, and differences between countries would be harder to read at that scale.

Country MDG4: Under-5 mortality rate 1990 MDG4: Under-5 mortality rate 2000 MDG4: Under-5 mortality rate 2010 MDG5: Maternal mortality ratio 1990 MDG5: Maternal mortality ratio 2000 MDG5: Maternal mortality ratio 2010 MDG5: Contraceptive prevalence (%) – rural MDG5: Contraceptive prevalence (%) – urban MDG5: Contraceptive prevalence (%) – poorest MDG5: Contraceptive prevalence (%) – wealthiest MDG5: Contraceptive prevalence (%) – no education MDG5: Contraceptive prevalence (%) – educated MDG5: Antenatal care coverage - rural MDG5: Antenatal care coverage - urban MDG5: Antenatal care coverage – poorest MDG5: Antenatal care coverage – wealthiest MDG5: Antenatal care coverage – no education MDG5: Antenatal care coverage – educated MDG5: Births attended by skilled health personnel (%) - rural MDG5: Births attended by skilled health personnel (%) - urban MDG5: Births attended by skilled health personnel (%) - poorest MDG5: Births attended by skilled health personnel (%) - wealthiest MDG5: Births attended by skilled health personnel (%) - no education MDG5: Births attended by skilled health personnel (%) - educated Children aged <5 years who are stunted (%) - rural Children aged <5 years who are stunted (%) - urban Children aged <5 years who are stunted (%) - poorest Children aged <5 years who are stunted (%) - wealthiest Children aged <5 years who are stunted (%) - no education Children aged <5 years who are stunted (%) - educated MDG4: Under-5 mortality rate - rural MDG4: Under-5 mortality rate – urban MDG4: Under-5 mortality rate – poorest MDG4: Under-5 mortality rate – wealthiest MDG4: Under-5 mortality rate – no education MDG4: Under-5 mortality rate – educated
Afghanistan NaN 151 149 1300 1000 460 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Albania 41 29 18 48 39 27 10 12 10 14 NaN 13 57 82 49 91 NaN 80 99 100 98 100 NaN 100 19 20 27 13 NaN 17 28 13 34 13 NaN 19
Algeria 68 49 36 220 140 97 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Andorra 9 5 4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Angola 243 200 161 1200 890 450 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Antigua & Barbuda 26 15 8 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Argentina 27 20 14 71 63 77 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Armenia 55 33 20 46 38 30 16 22 12 29 NaN 19 56 82 51 88 NaN 72 98 99 96 100 NaN 98 17 19 20 19 NaN 17 41 26 51 23 NaN 33
Australia 9 6 5 10 9 7 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Austria 9 6 4 10 5 4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
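
The cleaning step described above can be sketched with pandas. This is only an illustration on five rows and five measure columns copied from the sample, not the original cleaning code: the 50% missing-value cutoff (MAX_MISSING), the short column names and the helper per_100k_to_percent are all assumptions introduced here.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Five rows and five measure columns copied from the sample above.
raw = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
        "MDG4: Under-5 mortality rate 1990": [nan, 41, 68, 9, 243],
        "MDG4: Under-5 mortality rate 2010": [149, 18, 36, 4, 161],
        "MDG5: Maternal mortality ratio 2010": [460, 27, 97, nan, 450],
        "MDG5: Contraceptive prevalence (%) – rural": [nan, 10, nan, nan, nan],
        "MDG5: Antenatal care coverage - rural": [nan, 57, nan, nan, nan],
    }
).set_index("Country")

MAX_MISSING = 0.5  # assumed cutoff, not taken from the original analysis

# Fraction of missing values per column; keep columns at or below the cutoff.
missing_share = raw.isna().mean()
kept = raw.loc[:, missing_share <= MAX_MISSING]

# Hypothetical short names for the columns that remain.
short_names = {
    "MDG4: Under-5 mortality rate 1990": "u5_1990",
    "MDG4: Under-5 mortality rate 2010": "u5_2010",
    "MDG5: Maternal mortality ratio 2010": "mmr_2010",
}
clean = kept.rename(columns=short_names)


def per_100k_to_percent(value):
    """Convert a count per 100,000 live births to a percentage."""
    return value / 1000


print(clean)
print(per_100k_to_percent(1600))  # the 1,600-per-100,000 ceiling as a percent
```

Selecting columns by their share of missing values keeps the cutoff explicit and easy to change. The percentage helper shows why rescaling was not pursued: dividing by 1,000 squeezes every maternal mortality ratio into a very narrow band.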

Data Exploration

My intuition was that the mortality rates would follow a Poisson distribution: each death can be treated as an independent event, the average frequency over the period is known, and it is only meaningful to count how many deaths occurred, not how many did not. The following histograms are consistent with this in shape, since they are heavily right-skewed. Most countries have very low mortality rates, while only a few have over 250 under-5 deaths per 1,000 births or over 100 maternal deaths per 100,000 births.
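
One quick numeric check of the Poisson intuition is to compare the variance with the mean, since a Poisson variable has the two equal. The sketch below is not part of the original analysis. It uses only the ten 2010 under-5 values from the sample table, and the helper name dispersion_index is an assumption. Because the values are rates per 1,000 births rather than raw counts, the result is a rough sanity check, not a formal test.

```python
from statistics import mean, variance

# Under-5 mortality rate in 2010 for the ten countries in the sample table.
u5_2010 = [149, 18, 36, 4, 161, 8, 14, 20, 5, 4]


def dispersion_index(values):
    """Return the sample variance divided by the mean.

    A value near 1 is what a Poisson model predicts; a value well above 1
    points to overdispersion.
    """
    return variance(values) / mean(values)


print(f"mean = {mean(u5_2010):.1f}")
print(f"dispersion index = {dispersion_index(u5_2010):.1f}")
```

If the index comes out far above 1 on the full data set, a right-skewed but overdispersed model may describe the rates better than a plain Poisson.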

hist

Note: The bin size and grid size differ between the histograms, so the decreasing trend in mortality rates may not be obvious from them.
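
One way to make the histograms comparable across years is to compute a single set of bin edges from the pooled data and reuse it for every year. The following is a minimal sketch on the under-5 values from the sample table. The bin width of 50 and the helper shared_bin_edges are assumptions chosen for illustration only.

```python
import math

import numpy as np

# Under-5 mortality rate per year for the ten countries in the sample table.
u5 = {
    1990: [np.nan, 41, 68, 9, 243, 26, 27, 55, 9, 9],
    2000: [151, 29, 49, 5, 200, 15, 20, 33, 6, 6],
    2010: [149, 18, 36, 4, 161, 8, 14, 20, 5, 4],
}

BIN_WIDTH = 50  # assumed width, chosen only for illustration


def shared_bin_edges(groups, width):
    """Build one set of equal-width bin edges covering every group."""
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    upper = math.ceil(np.nanmax(pooled) / width) * width
    return np.arange(0, upper + width, width)


edges = shared_bin_edges(u5.values(), BIN_WIDTH)

counts = {}
for year, values in u5.items():
    arr = np.asarray(values, dtype=float)
    arr = arr[~np.isnan(arr)]  # drop missing entries before counting
    counts[year], _ = np.histogram(arr, bins=edges)
    print(year, counts[year])
```

The same edges array can be passed as the bins argument of a plotting call such as matplotlib's hist, so that every panel uses identical bins and the shift toward lower rates is easier to see.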

The following box plots also show that overall mortality decreased from 1990 to 2000 and 2010. The mean and range dropped consistently over time for both the under-5 mortality rate and the maternal mortality ratio. The mean under-5 mortality rate dropped from around 50 deaths per 1,000 births in 1990 to fewer than 25 deaths per 1,000 births in 2010. The maternal mortality ratio decreased from 100 deaths per 100,000 births in 1990 to around 60 deaths per 100,000 births in 2010, and its range shrank by about half over the same period.

box
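
Box plots drawn with matplotlib or seaborn mark the median by default, so figures such as the mean and the range are easier to confirm by computing them directly. The sketch below does this for the under-5 values in the sample table. It covers only those ten rows, so its numbers are not the ones quoted above for the full data set.

```python
import numpy as np
import pandas as pd

# Under-5 mortality rate per year for the ten countries in the sample table.
u5 = pd.DataFrame(
    {
        "1990": [np.nan, 41, 68, 9, 243, 26, 27, 55, 9, 9],
        "2000": [151, 29, 49, 5, 200, 15, 20, 33, 6, 6],
        "2010": [149, 18, 36, 4, 161, 8, 14, 20, 5, 4],
    }
)

# One row per year; missing values are skipped by default.
summary = u5.agg(["mean", "median", "min", "max"]).T
summary["range"] = summary["max"] - summary["min"]

print(summary.round(1))
```

Printing the mean next to the median also shows how far the few very high values pull the mean upward in a right-skewed distribution.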

In addition, I wanted to see whether the two mortality measures are correlated and, if so, how strongly. Using Seaborn's jointplot, I found a high Pearson correlation coefficient between the maternal mortality ratio and the under-5 mortality rate. This makes sense because both are higher in less developed countries with less advanced medical infrastructure, knowledge and awareness.

correlation
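
The Pearson coefficient itself can be computed without plotting. The following sketch uses the 2010 values from the sample table and drops the two countries with no maternal mortality entry. The column names u5_2010 and mmr_2010 are hypothetical, and the result describes only these rows, not the full data set.

```python
import numpy as np
import pandas as pd

# 2010 values for the ten countries in the sample table.
sample = pd.DataFrame(
    {
        "u5_2010": [149, 18, 36, 4, 161, 8, 14, 20, 5, 4],
        "mmr_2010": [460, 27, 97, np.nan, 450, np.nan, 77, 30, 7, 4],
    }
)

# Keep only countries that report both measures.
complete = sample.dropna()

r = complete["u5_2010"].corr(complete["mmr_2010"], method="pearson")
print(f"n = {len(complete)}, Pearson r = {r:.3f}")
```

The same frame could be passed to seaborn's jointplot with x and y set to the two column names to draw the scatter plot alongside the marginal distributions.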

The heat maps show that both variables are concentrated in countries across Africa, South Asia and South America. The striking resemblance between the two maps reflects how strongly the two mortality measures are correlated, in line with the previous analysis.

Heatmap of Under-5 Mortality Rate across the world

Heatmap of Maternal Mortality Ratio across the world