Introduction:

For this project, I explored the data in the 2015 World Happiness Report, as I’ve long been interested in which factors make up a “happy” society and which nations have the happiest citizens or the highest quality of life. The dataset contains several columns that each relate differently to a nation’s well-being: country, region, happiness rank, happiness score, economy, family, health, freedom, trust, generosity and dystopia.

The data that I used for this assignment can be found here

Loading the Packages and Datasets:

library(tidyverse)
library(knitr)
library(plotly)
library(xtable)

df <- read_csv("2015.csv")

Data Visualization 1:

To begin, I created a scatterplot of all 158 countries’ happiness scores with each region color-coded. If you hover your cursor over a point, the name of the country it represents, along with its happiness score, will pop up.

p <- ggplot(df, aes(x = Region, y=`Happiness Score`)) +
  geom_point(aes(z=Country, y1=`Happiness Score`, x1 = Region,  color=Region), alpha=0.5, position="jitter") + 
  coord_flip() +
  labs(title="Happiness Score By Region")
## Warning: Ignoring unknown aesthetics: z, y1, x1
ggplotly(p, tooltip = c("x1", "y1", "z"))

Let’s have a look at the provided factors that affect world happiness levels by examining the top 5 happiest and unhappiest countries in the world.

Top 5 Happiest Countries:

kable(df %>% arrange(`Happiness Rank`) %>%
  head(n = 5))
| Country | Region | Happiness Rank | Happiness Score | Standard Error | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual |
|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| Switzerland | Western Europe | 1 | 7.587 | 0.03411 | 1.39651 | 1.34951 | 0.94143 | 0.66557 | 0.41978 | 0.29678 | 2.51738 |
| Iceland | Western Europe | 2 | 7.561 | 0.04884 | 1.30232 | 1.40223 | 0.94784 | 0.62877 | 0.14145 | 0.43630 | 2.70201 |
| Denmark | Western Europe | 3 | 7.527 | 0.03328 | 1.32548 | 1.36058 | 0.87464 | 0.64938 | 0.48357 | 0.34139 | 2.49204 |
| Norway | Western Europe | 4 | 7.522 | 0.03880 | 1.45900 | 1.33095 | 0.88521 | 0.66973 | 0.36503 | 0.34699 | 2.46531 |
| Canada | North America | 5 | 7.427 | 0.03553 | 1.32629 | 1.32261 | 0.90563 | 0.63297 | 0.32957 | 0.45811 | 2.45176 |

Observations:

As you could conclude from Data Visualization 1, 4 out of the 5 countries here are located in Western Europe. This is not surprising to me, as Western Europe is an affluent region. It’s interesting, however, that Switzerland placed #1 even though it’s landlocked, while all of the other countries in the top 5 have a coastline. I also notice that despite placing 4th, Norway has the highest GDP per capita score among the 5 nations, which suggests that factors other than GDP also affect a nation’s happiness.

Top 5 Unhappiest Countries:

kable(df %>% arrange(desc(`Happiness Rank`)) %>%
  head(n = 5))
| Country | Region | Happiness Rank | Happiness Score | Standard Error | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual |
|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| Togo | Sub-Saharan Africa | 158 | 2.839 | 0.06727 | 0.20868 | 0.13995 | 0.28443 | 0.36453 | 0.10731 | 0.16681 | 1.56726 |
| Burundi | Sub-Saharan Africa | 157 | 2.905 | 0.08658 | 0.01530 | 0.41587 | 0.22396 | 0.11850 | 0.10062 | 0.19727 | 1.83302 |
| Syria | Middle East and Northern Africa | 156 | 3.006 | 0.05015 | 0.66320 | 0.47489 | 0.72193 | 0.15684 | 0.18906 | 0.47179 | 0.32858 |
| Benin | Sub-Saharan Africa | 155 | 3.340 | 0.03656 | 0.28665 | 0.35386 | 0.31910 | 0.48450 | 0.08010 | 0.18260 | 1.63328 |
| Rwanda | Sub-Saharan Africa | 154 | 3.465 | 0.03464 | 0.22208 | 0.77370 | 0.42864 | 0.59201 | 0.55191 | 0.22628 | 0.67042 |

Observations:

Here, 4 out of the 5 countries are in Sub-Saharan Africa, while one (Syria) is in the Middle East and Northern Africa region. Togo and Benin share a border, as do Burundi and Rwanda. Apart from Syria, the health values for the 4 African countries are rather low, along with their GDP. Although GDP is definitely not the sole factor in determining whether a country’s citizens are happy, it seems to be a large one.
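
The comparison between the two tables can be made more concrete by averaging each factor over the five happiest and the five unhappiest countries. The sketch below is not part of the original analysis: it retypes the values printed in the two tables above into small data frames (the short column names and the `measure` and `gap` names are my own shorthand) and ranks the factors by the gap between the group means.

```r
# Values retyped from the top-5 and bottom-5 tables in this report.
# Column names are shortened for illustration.
top5 <- data.frame(
  Country = c("Switzerland", "Iceland", "Denmark", "Norway", "Canada"),
  Economy = c(1.39651, 1.30232, 1.32548, 1.45900, 1.32629),
  Family  = c(1.34951, 1.40223, 1.36058, 1.33095, 1.32261),
  Health  = c(0.94143, 0.94784, 0.87464, 0.88521, 0.90563),
  Freedom = c(0.66557, 0.62877, 0.64938, 0.66973, 0.63297),
  Trust   = c(0.41978, 0.14145, 0.48357, 0.36503, 0.32957)
)
bottom5 <- data.frame(
  Country = c("Togo", "Burundi", "Syria", "Benin", "Rwanda"),
  Economy = c(0.20868, 0.01530, 0.66320, 0.28665, 0.22208),
  Family  = c(0.13995, 0.41587, 0.47489, 0.35386, 0.77370),
  Health  = c(0.28443, 0.22396, 0.72193, 0.31910, 0.42864),
  Freedom = c(0.36453, 0.11850, 0.15684, 0.48450, 0.59201),
  Trust   = c(0.10731, 0.10062, 0.18906, 0.08010, 0.55191)
)

measures <- c("Economy", "Family", "Health", "Freedom", "Trust")

# Mean of each factor within each group, and the difference between them.
comparison <- data.frame(
  measure      = measures,
  top5_mean    = colMeans(top5[, measures]),
  bottom5_mean = colMeans(bottom5[, measures])
)
comparison$gap <- comparison$top5_mean - comparison$bottom5_mean

# Largest gap first.
comparison <- comparison[order(-comparison$gap), ]
print(comparison, row.names = FALSE)
```

On these ten rows, the gap is widest for Economy and narrowest for Trust. With only five countries per group, this is a rough indication rather than a statistical result.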

Data Visualization 2:

Because we determined that most of the countries in the top 5 were from Western Europe, I decided to create a visualization where I could see all of the Western European nations and their corresponding happiness scores to compare.

# Keep only the Western European countries
Western_Europe <- df %>% 
  filter(Region == "Western Europe")

Western_Europe %>% ggplot(aes(reorder(Country, `Happiness Score`), 
                       `Happiness Score`)) + 
  geom_col(aes(fill = `Happiness Score`)) + 
  scale_fill_gradient(low = "brown2", high = "darkseagreen1") + 
  coord_flip() + 
  labs(title = "Happiness Scores of Countries in Western Europe", x = "Western European Countries", y = "Happiness Scores")

Luxembourg is one of the smallest countries in Europe, yet it still placed very high (17th) in the happiness ranking with a GDP per capita score of 1.56391, suggesting that a small country is not necessarily a poor one. Luxembourg scores very high in the categories of GDP, Family, Health and Freedom, on which the top 5 unhappiest countries score low, indicating that those are among the more important qualities affecting a nation’s overall happiness. Below is Luxembourg’s data for reference:
kable(df %>%
  filter(Country == "Luxembourg"))
| Country | Region | Happiness Rank | Happiness Score | Standard Error | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual |
|:--|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| Luxembourg | Western Europe | 17 | 6.946 | 0.03499 | 1.56391 | 1.21963 | 0.91894 | 0.61583 | 0.37798 | 0.28034 | 1.96961 |

Data Visualization 3:

To compare the GDPs of different regions around the world, I created a layered boxplot. Each pink dot represents a country, plotted within its region.

ggplot(df, aes(x = Region, y = `Economy (GDP per Capita)`)) +
  geom_boxplot(col = "slategrey") +
  geom_point(col = "hotpink2", alpha = 0.3, position = "jitter") +
  coord_flip() +
  labs(title="Regions of the World and the GDPs of Their Nations")

Observations:

As one would expect, the GDPs in North America, Western Europe, Australia and New Zealand are very high. I would predict that the happiness scores in the countries within these regions would be high as well.

To test whether high GDP and high happiness scores are in fact related, we fit a linear model.

fit <- lm(`Happiness Score` ~ `Economy (GDP per Capita)`, data = df)
summary(fit) %>% xtable() %>% kable()
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|:--|--:|--:|--:|--:|
| (Intercept) | 3.498810 | 0.1330458 | 26.29779 | 0 |
| Economy (GDP per Capita) | 2.218227 | 0.1420352 | 15.61745 | 0 |
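
One way to read the table above is as the line Happiness Score ≈ 3.50 + 2.22 × Economy. As a small sketch (the helper name `predict_happiness` is hypothetical, and the coefficients are simply retyped from the table), the code below checks that each t value equals the estimate divided by its standard error, and computes the score the model would predict for Luxembourg, whose Economy value of 1.56391 and actual score of 6.946 appear earlier in this report.

```r
# Estimates and standard errors retyped from the regression table above.
intercept    <- 3.498810
slope        <- 2.218227
se_intercept <- 0.1330458
se_slope     <- 0.1420352

# Each t value is the estimate divided by its standard error.
t_intercept <- intercept / se_intercept
t_slope     <- slope / se_slope

# Hypothetical helper: happiness score predicted by the fitted line
# for a given Economy (GDP per Capita) value.
predict_happiness <- function(gdp) {
  intercept + slope * gdp
}

# Luxembourg's values as printed earlier in this report.
lux_gdp       <- 1.56391
lux_actual    <- 6.946
lux_predicted <- predict_happiness(lux_gdp)
lux_residual  <- lux_actual - lux_predicted

cat("t value (intercept):", round(t_intercept, 3), "\n")
cat("t value (slope):    ", round(t_slope, 3), "\n")
cat("Luxembourg predicted:", round(lux_predicted, 3),
    "actual:", lux_actual,
    "residual:", round(lux_residual, 3), "\n")
```

The predicted score comes out at roughly 6.97, close to Luxembourg’s actual 6.946, so for this country the GDP-only line is a good approximation.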

Data Visualization 4:

We see a clear positive correlation between GDP and happiness. Each purple dot represents a country.

# GDP on the x-axis and happiness on the y-axis, matching the fitted model
ggplot(data = df, aes(x = `Economy (GDP per Capita)`, y = `Happiness Score`)) +
  geom_point(color = "#9900ff", alpha = 0.3) +
  geom_smooth(method = "lm", se = FALSE, color = "#99bbff") +
  labs(title = "GDP and Happiness Levels around the World")

Conclusion:

This was a very interesting project: although the data is slightly dated (2015), I learned that the size of a country doesn’t necessarily determine its GDP per capita. Yes, Togo, Benin, Burundi and Rwanda, four of the top 5 unhappiest countries, are fairly small area-wise, but Luxembourg is one of the smallest nations in Europe yet places 17th in the happiness ranking with a high GDP per capita score of ~1.56.
Of the factors that contribute to the happiness score in this data, I would have thought that GDP, Trust (Government Corruption) and Freedom were the 3 most important, but after comparing the top 5 happiest and unhappiest countries, I was surprised to find that Trust was much less important than Health and Family.
I also think it would be interesting to compare the categories against one another, as I would think that a high GDP would lead to longer lifespans (Health) and even higher values in Family. This could be an interesting extension of this project.
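
As a starting point for that extension, a pairwise correlation matrix is one simple way to compare categories against one another. The sketch below is an assumption about how this could be done, not something from the original analysis. To stay self-contained it uses only the ten rows printed in the top-5 and bottom-5 tables (with shortened column names), so the numbers it produces describe those extreme cases, not the full dataset.

```r
# Economy, Family and Health values retyped from the top-5 and bottom-5
# tables in this report (first five rows: happiest; last five: unhappiest).
shown <- data.frame(
  Economy = c(1.39651, 1.30232, 1.32548, 1.45900, 1.32629,
              0.20868, 0.01530, 0.66320, 0.28665, 0.22208),
  Family  = c(1.34951, 1.40223, 1.36058, 1.33095, 1.32261,
              0.13995, 0.41587, 0.47489, 0.35386, 0.77370),
  Health  = c(0.94143, 0.94784, 0.87464, 0.88521, 0.90563,
              0.28443, 0.22396, 0.72193, 0.31910, 0.42864)
)

# Pearson correlation between every pair of columns.
cor_matrix <- cor(shown)
print(round(cor_matrix, 2))

# With the full dataset loaded as `df`, the same idea would be:
# cor(df[, c("Economy (GDP per Capita)", "Family", "Health (Life Expectancy)")])
```

Because these rows were picked from the two ends of the ranking, the correlations here will look stronger than they would across all 158 countries; the commented-out line is the version to run on the full data.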
In terms of deviations from the project proposal, I thought that I could do Data Visualization 2 with all 158 countries, but as Kenneth suggested, that would be too much data and would make the graph too long to read conveniently, so I condensed the dataset to just Western Europe. I also originally did not plan to have 4 visualizations, but after learning about linear models and regression in our last class session, I wanted to try applying what I learned to my project.