Introduction:
For this project I explored the data in the 2015 World Happiness Report because I’m interested in which factors make up a “happy” society and which nations have the happiest citizens or the highest quality of life. The dataset has several variables that each bear differently on a nation’s happiness: country, region, happiness rank, happiness score, economy, family, health, freedom, trust, generosity and dystopia.
The data that I used for this assignment can be found here.
Loading the Packages and Datasets:
library(tidyverse)
library(knitr)
library(plotly)
library(xtable)
df <- read_csv("2015.csv")
Data Visualization 1:
To begin, I created a scatterplot of all 158 countries’ happiness scores, color-coded by region. If you hover your cursor over a point, the name of the country it represents and its happiness score pop up.
# Build the hover label once through the `text` aesthetic that plotly understands
p <- ggplot(df, aes(x = Region, y = `Happiness Score`,
                    text = paste0(Country, "\nHappiness Score: ", `Happiness Score`))) +
  geom_point(aes(color = Region), alpha = 0.5, position = "jitter") +
  coord_flip() +
  labs(title = "Happiness Score By Region")
# Show only the custom label (country name and score) on hover
ggplotly(p, tooltip = "text")
Let’s have a look at the provided factors that affect world happiness levels by examining the top 5 happiest and unhappiest countries in the world.
Top 5 Happiest Countries:
kable(df %>% arrange(`Happiness Rank`) %>%
head(n = 5))
| Country | Region | Happiness Rank | Happiness Score | Standard Error | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual |
|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| Switzerland | Western Europe | 1 | 7.587 | 0.03411 | 1.39651 | 1.34951 | 0.94143 | 0.66557 | 0.41978 | 0.29678 | 2.51738 |
| Iceland | Western Europe | 2 | 7.561 | 0.04884 | 1.30232 | 1.40223 | 0.94784 | 0.62877 | 0.14145 | 0.43630 | 2.70201 |
| Denmark | Western Europe | 3 | 7.527 | 0.03328 | 1.32548 | 1.36058 | 0.87464 | 0.64938 | 0.48357 | 0.34139 | 2.49204 |
| Norway | Western Europe | 4 | 7.522 | 0.03880 | 1.45900 | 1.33095 | 0.88521 | 0.66973 | 0.36503 | 0.34699 | 2.46531 |
| Canada | North America | 5 | 7.427 | 0.03553 | 1.32629 | 1.32261 | 0.90563 | 0.63297 | 0.32957 | 0.45811 | 2.45176 |
Observations:
As Data Visualization 1 suggests, 4 of the 5 countries here are in Western Europe. This is not surprising to me, as Western Europe is an affluent region. It’s interesting, however, that Switzerland placed #1 even though it is landlocked, while every other country in the top 5 has an ocean coastline. I also notice that despite placing 4th, Norway has the highest GDP per capita score of the 5 nations (1.459), which suggests that factors other than GDP also affect a nation’s happiness.
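The remark about Norway can be checked directly. The sketch below is a minimal base R example that retypes the GDP per capita scores of the five countries from the table above (rather than reading 2015.csv) and re-ranks them by that score; the object names `top5`, `by_gdp` and `richest` are made up for this illustration.

```r
# GDP per capita scores copied from the top-5 table above
top5 <- data.frame(
  Country = c("Switzerland", "Iceland", "Denmark", "Norway", "Canada"),
  Rank    = 1:5,
  Economy = c(1.39651, 1.30232, 1.32548, 1.45900, 1.32629),
  stringsAsFactors = FALSE
)

# Re-rank the same five countries by GDP per capita score, highest first
by_gdp <- top5[order(-top5$Economy), ]
print(by_gdp)

# Country with the highest GDP per capita score among the five
richest <- by_gdp$Country[1]
print(richest)
```

The happiness rank and the GDP order do not line up, which is the point of the observation: GDP alone does not determine the ranking.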
Top 5 Unhappiest Countries:
kable(df %>% arrange(desc(`Happiness Rank`)) %>%
head(n = 5))
| Country | Region | Happiness Rank | Happiness Score | Standard Error | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual |
|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| Togo | Sub-Saharan Africa | 158 | 2.839 | 0.06727 | 0.20868 | 0.13995 | 0.28443 | 0.36453 | 0.10731 | 0.16681 | 1.56726 |
| Burundi | Sub-Saharan Africa | 157 | 2.905 | 0.08658 | 0.01530 | 0.41587 | 0.22396 | 0.11850 | 0.10062 | 0.19727 | 1.83302 |
| Syria | Middle East and Northern Africa | 156 | 3.006 | 0.05015 | 0.66320 | 0.47489 | 0.72193 | 0.15684 | 0.18906 | 0.47179 | 0.32858 |
| Benin | Sub-Saharan Africa | 155 | 3.340 | 0.03656 | 0.28665 | 0.35386 | 0.31910 | 0.48450 | 0.08010 | 0.18260 | 1.63328 |
| Rwanda | Sub-Saharan Africa | 154 | 3.465 | 0.03464 | 0.22208 | 0.77370 | 0.42864 | 0.59201 | 0.55191 | 0.22628 | 0.67042 |
Observations:
Here, 4 of the 5 countries are in Sub-Saharan Africa, while one is in the Middle East. Togo and Benin share a border, as do Burundi and Rwanda. Apart from Syria, the health values for the 4 African countries are rather low, along with their GDP. Although GDP is definitely not the sole factor in determining whether a country’s citizens are happy, it seems to be a large one.
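To put rough numbers on these observations, the sketch below compares the average of four factors between the five happiest and five unhappiest countries. It is a base R illustration that retypes values from the two tables above, assuming the column order Economy, Family, Health, Trust of the 2015 dataset; the names `top5`, `bottom5` and `gap` are invented here. With only ten countries this is a descriptive comparison, not a statistical test.

```r
# Factor values copied from the top-5 table
# (Switzerland, Iceland, Denmark, Norway, Canada)
top5 <- data.frame(
  Economy = c(1.39651, 1.30232, 1.32548, 1.45900, 1.32629),
  Family  = c(1.34951, 1.40223, 1.36058, 1.33095, 1.32261),
  Health  = c(0.94143, 0.94784, 0.87464, 0.88521, 0.90563),
  Trust   = c(0.41978, 0.14145, 0.48357, 0.36503, 0.32957)
)

# Factor values copied from the bottom-5 table
# (Togo, Burundi, Syria, Benin, Rwanda)
bottom5 <- data.frame(
  Economy = c(0.20868, 0.01530, 0.66320, 0.28665, 0.22208),
  Family  = c(0.13995, 0.41587, 0.47489, 0.35386, 0.77370),
  Health  = c(0.28443, 0.22396, 0.72193, 0.31910, 0.42864),
  Trust   = c(0.10731, 0.10062, 0.18906, 0.08010, 0.55191)
)

# Difference in group means for each factor, largest gap first
gap <- colMeans(top5) - colMeans(bottom5)
gap_sorted <- sort(gap, decreasing = TRUE)
print(round(gap_sorted, 3))
```

In this small sample the gap is largest for Economy, followed by Family and Health, and smallest for Trust, partly because Rwanda's Trust value is higher than that of any top-5 country.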
Data Visualization 2:
Because we determined that most of the countries in the top 5 were from Western Europe, I decided to create a visualization where I could see all of the Western European nations and their corresponding happiness scores to compare.
# Keep only the Western European countries
Western_Europe <- df %>%
  filter(Region == "Western Europe")
Western_Europe %>% ggplot(aes(reorder(Country, `Happiness Score`),
`Happiness Score`)) +
geom_col(aes(fill = `Happiness Score`)) +
scale_fill_gradient(low = "brown2", high = "darkseagreen1") +
coord_flip() +
labs(title = "Happiness Scores of Countries in Western Europe", x = "Western European Countries", y = "Happiness Scores")
Data Visualization 3:
To compare GDP per capita across regions of the world, I created a boxplot with the individual countries layered on top. Each pink dot represents one country within its region.
ggplot(df, aes(x = Region, y = `Economy (GDP per Capita)`)) +
geom_boxplot(col = "slategrey") +
geom_point(col = "hotpink2", alpha = 0.3, position = "jitter") +
coord_flip() +
labs(title="Regions of the World and the GDPs of Their Nations")
Observations:
As one would expect, the GDPs in North America, Western Europe, Australia and New Zealand are very high. I would predict that the happiness scores in the countries within these regions would be high as well.
To test whether high GDP and high levels of happiness are in fact related, we fit a linear model.
fit <- lm(`Happiness Score` ~ `Economy (GDP per Capita)`, data = df)
summary(fit) %>% xtable() %>% kable()
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|:--|--:|--:|--:|--:|
| (Intercept) | 3.498810 | 0.1330458 | 26.29779 | 0 |
| Economy (GDP per Capita) | 2.218227 | 0.1420352 | 15.61745 | 0 |
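Looking at the last column, we see that the p-value for GDP is displayed as 0, meaning it is too small to show at this precision and far below the usual 0.05 threshold. This leads us to conclude that GDP per capita and happiness are in fact related: the slope of about 2.22 means a country’s predicted happiness score rises by roughly 2.22 points for each one-unit increase in its GDP per capita score. A country with a high GDP per capita is therefore likely to rank as a “happy” country in the World Happiness Report, although this association does not by itself show that GDP causes happiness.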
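The 0 in the last column is a rounded display, not an exact zero. As a minimal sketch, the t value and p-value can be recomputed from the estimate and standard error in the table, assuming all 158 countries entered the fit (so 156 residual degrees of freedom). The helper `predict_happiness` is a hypothetical name introduced only for this example.

```r
# Coefficients copied from the regression table above
intercept <- 3.498810
slope     <- 2.218227
slope_se  <- 0.1420352
n         <- 158  # assumption: no countries were dropped from the fit

# t statistic and two-sided p-value for the GDP slope
t_value <- slope / slope_se
p_value <- 2 * pt(-abs(t_value), df = n - 2)

cat("t value:", round(t_value, 3), "\n")
cat("p-value:", format(p_value, digits = 3), "\n")

# Hypothetical helper: predicted happiness score for a given GDP per capita score
predict_happiness <- function(gdp) intercept + slope * gdp
cat("Predicted score at GDP = 1.0:", predict_happiness(1.0), "\n")
```

The recomputed t value matches the table, and the p-value is a tiny positive number rather than zero.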
Data Visualization 4:
We see a clear positive correlation between GDP and happiness. Each purple dot represents a country.
ggplot(data = df, aes(x = `Happiness Score`, y = `Economy (GDP per Capita)`)) +
geom_point(color = "#9900ff", alpha = 0.3) +
geom_smooth(method = "lm", se = FALSE, color = "#99bbff") +
labs(title = "GDP and Happiness Levels around the World")
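The strength of this correlation can be recovered from the regression output without refitting anything. For a simple linear regression with one predictor, the correlation coefficient satisfies r = t / sqrt(t^2 + df), where t is the slope's t value and df is the residual degrees of freedom. The sketch below applies that identity, again assuming 158 countries and therefore df = 156.

```r
# t value of the GDP slope, copied from the regression table
t_value  <- 15.61745
df_resid <- 158 - 2  # assumption: all 158 countries were used

# Correlation between GDP per capita and happiness score
# (it takes the sign of the slope, which is positive here)
r <- t_value / sqrt(t_value^2 + df_resid)
r_squared <- r^2

cat("r:", round(r, 3), "\n")
cat("R-squared:", round(r_squared, 3), "\n")
```

Under these assumptions r is about 0.78, so GDP per capita accounts for roughly 61% of the variance in happiness scores in this model.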
Conclusion:
This was a very interesting project: although the data is slightly dated (2015), I learned that the size of a country doesn’t necessarily affect its GDP per capita. Yes, Togo, Benin, Burundi and Rwanda (four of the top 5 unhappiest countries) are very small area-wise, but Luxembourg is also one of the smallest nations in the world, yet it places 17th in the World Happiness Report with a high GDP per capita score of ~1.56.
Of the 9 categories that affect happiness according to this data, I would have thought that GDP, Trust (Government Corruption) and Freedom were the 3 most important factors, but after comparing the data for the top 5 happiest and unhappiest countries, I was surprised to find that Trust was much less important than Health and Family.
I also think it would be interesting to compare the categories against one another, as I would think that a high GDP would lead to longer lifespans (Health) and even higher values in Family. This could be an interesting extension of this project.
In terms of deviations from the project proposal, I thought that I could do Data Visualization 2 with all 158 countries in the world, but as Kenneth suggested, that would be too much data and would make the graph far too long to read conveniently. I decided to condense my dataset to just Western Europe. I also did not originally plan to have 4 visualizations, but after learning about linear models and regressions in our last class session, I wanted to try applying what I learned to my project.