Economic disparity has long been a prominent issue in the United States, resulting in unequal distribution of resources across populations, demographics, and regions. Over the past decades, already large socioeconomic differences have been widened further by the rise and fall of particular industries and technologies in particular geographic areas. We present data visualizations of the degree of economic disparity across states in order to see how it relates to each state's general economic progress.
We use the following datasets in our analysis:
Using the data above, we answer the following questions:
By drawing these comparisons, we hope to obtain a visual representation of, and observe trends in, how America is progressing with regard to combating socioeconomic inequality.
Package imports:
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0 ✓ purrr 0.3.3
## ✓ tibble 3.0.0 ✓ dplyr 0.8.5
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ─────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2)
library(readr)
library(dplyr)
library(usmap)
library(knitr)
We started with five separate files that needed to be read into R.
setwd("~/Desktop/Stats 32")
Party <- read_csv("PartyInfo.csv")
## Parsed with column specification:
## cols(
## State = col_character(),
## `Democrat/Lean Democratic` = col_double(),
## `Republican/Lean Republican` = col_double(),
## `Democratic advantage` = col_double(),
## N = col_number(),
## Classification = col_character()
## )
Unemployment <- read_csv("UnemploymentRate.csv")
## Parsed with column specification:
## cols(
## State = col_character(),
## Unemployment_Rate = col_double()
## )
Poverty <- read_csv("PovertyRate.csv")
## Parsed with column specification:
## cols(
## State = col_character(),
## `Poverty Rate` = col_double()
## )
GDP <- read_csv("GDP.csv")
## Parsed with column specification:
## cols(
## State = col_character(),
## GDP = col_number()
## )
CPoverty <- read_csv("ChildPoverty.csv")
## Parsed with column specification:
## cols(
## State = col_character(),
## Rate = col_double()
## )
We felt it would be useful to merge the data by its common feature, “State.” We also knew that it would help us in the long run to make all feature names lowercase and to replace all spaces with underscores: some functions, such as plot_usmap(), expect a column named state, and column names containing spaces must be wrapped in backticks whenever they are referenced.
Merge1 <- left_join(Party, Unemployment, by = "State")
Merge2 <- left_join(Merge1, Poverty, by = "State")
Merge3 <- left_join(Merge2, CPoverty, by = "State")
Statesext <- left_join(Merge3, GDP, by = "State")
Statesext <- Statesext[,c("State", "Classification", "Unemployment_Rate", "Poverty Rate", "Rate", "GDP")]
names(Statesext) <- c("state", "classification", "unemployment_rate", "poverty_rate", "child_poverty_rate", "gdp")
head(Statesext)
## # A tibble: 6 x 6
## state classification unemployment_ra… poverty_rate child_poverty_r… gdp
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Massach… Strong Democra… 2.9 10 12.2 596.
## 2 Vermont Strong Democra… 2.4 11 12.1 34.8
## 3 Hawaii Strong Democra… 2.7 8.8 11.9 97.3
## 4 New York Strong Democra… 4 13.6 18.6 1732.
## 5 Maryland Strong Democra… 3.6 9 11.6 428.
## 6 Califor… Strong Democra… 4 12.8 17.4 3137.
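The chain of joins and the renaming step above can also be written more compactly. The following is only a sketch under stated assumptions: the three small tibbles are stand-ins built from two rows of the output above, not the full CSV files, and the object names party, unemployment, poverty, and merged are hypothetical.

```r
library(dplyr)
library(purrr)

# Stand-in tables (assumption: two rows copied from the head() output above,
# used here only so the example runs on its own)
party <- tibble(State = c("Vermont", "Hawaii"),
                Classification = c("Strong Democratic", "Strong Democratic"))
unemployment <- tibble(State = c("Vermont", "Hawaii"),
                       Unemployment_Rate = c(2.4, 2.7))
poverty <- tibble(State = c("Vermont", "Hawaii"),
                  `Poverty Rate` = c(11, 8.8))

# Fold left_join() over the list of tables, always joining on "State"
merged <- reduce(list(party, unemployment, poverty), left_join, by = "State")

# Lowercase every column name and replace spaces with underscores
names(merged) <- tolower(gsub(" ", "_", names(merged)))
names(merged)
```

Deriving the new names from the old ones avoids retyping them by hand, although it would keep a generic name such as Rate as rate, so a manual rename may still be needed for columns like the child poverty rate.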
Below we see the distribution of political party affiliation of the United States in 2018:
state_party <- Statesext %>%
group_by(classification) %>%
summarise(
n_states = n())
names(state_party) <- c("Party Affiliation", "Number of States")
state_party
## # A tibble: 5 x 2
## `Party Affiliation` `Number of States`
## <chr> <int>
## 1 Competitive 10
## 2 Lean Democratic 8
## 3 Lean Republican 5
## 4 Strong Democratic 14
## 5 Strong Republican 13
plot_usmap(data = Statesext, values = "classification") +
scale_fill_manual(values = c("mediumorchid3", "cadetblue2", "pink", "blue4", "red4"), name= "classification") +
theme(legend.position = "right") + labs(title = "Political Leaning by State")
Above, we see a map of how each state in the US leans politically. We wanted to investigate if this distribution may track the unemployment rate of each state:
plot_usmap(data = Statesext, values = "unemployment_rate") +
scale_fill_continuous(low = "white", high = "orchid4", name = "Unemployment Rate (%)", labels = scales::comma) +
theme(legend.position = "right") + labs(title = "Unemployment Rate by State")
To the naked eye, there is only a very slight difference between states that lean Democratic and states that lean Republican.
In fact, if we look at the median unemployment rate broken down by political affiliation, we start to see a trend: Republican-leaning states have a lower median unemployment rate than Democratic-leaning states.
party_unemployment <- Statesext %>% group_by(classification) %>% summarise(me = median(unemployment_rate))
party_unemployment
## # A tibble: 5 x 2
## classification me
## <chr> <dbl>
## 1 Competitive 3.45
## 2 Lean Democratic 3.65
## 3 Lean Republican 3.3
## 4 Strong Democratic 3.65
## 5 Strong Republican 3.3
We look to a bar chart to visually demonstrate this difference:
ggplot(party_unemployment, aes(x = classification, y = me, fill=classification)) + geom_bar(stat= "identity") + coord_flip() + scale_fill_manual(values=c("purple", "cadetblue2", "pink", "blue", "red")) + labs(title="Political Affiliation vs Unemployment Rate", x="Political Affiliation", y="Unemployment Rate(Median)") + theme(legend.position = "none")
In the bar chart above, we have plotted the median unemployment rate for each political affiliation. Because the differences seemed small, we wanted to investigate the distribution further through a box plot.
ggplot(Statesext, aes(x = classification, y = `unemployment_rate`)) +
geom_boxplot(col = "slategrey") +
geom_point(col = "hotpink2", alpha = 0.3, position = "jitter") +
coord_flip() +
labs(title="Political Leaning vs Unemployment Rate (2018)", x = "Political Affiliation", y = "Unemployment Rate (Percentage)")
The plot shows that strongly and leaning Republican states have lower median unemployment rates than strongly and leaning Democratic states. However, there appears to be a wider range of data for Democratic states.
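The spread that the box plot shows visually can also be put into numbers. The sketch below is illustrative only: the toy tibble and its values are made up, and only the column names mirror Statesext. It reports the median, interquartile range, and full range for each group.

```r
library(dplyr)

# Hypothetical toy data (made-up values; column names mirror Statesext)
toy <- tibble(
  classification = c("Group A", "Group A", "Group A",
                     "Group B", "Group B", "Group B"),
  unemployment_rate = c(3.0, 3.5, 4.0, 3.2, 3.3, 3.4)
)

# Median plus two measures of spread for each group
spread_by_group <- toy %>%
  group_by(classification) %>%
  summarise(
    med = median(unemployment_rate),
    iqr = IQR(unemployment_rate),
    range_width = max(unemployment_rate) - min(unemployment_rate)
  )
spread_by_group
```

Applied to Statesext, a larger iqr or range_width for the Democratic groups would support the visual impression of a wider spread.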
Given these observations, we were curious whether state political leaning is also related to poverty rate, so we followed a similar process to compare party affiliation with poverty rate.
Below is the breakdown of the poverty rate by political distribution:
party_poverty <- Statesext %>% group_by(classification) %>% summarise(med = median(poverty_rate))
party_poverty
## # A tibble: 5 x 2
## classification med
## <chr> <dbl>
## 1 Competitive 14.0
## 2 Lean Democratic 12.4
## 3 Lean Republican 13.2
## 4 Strong Democratic 10.7
## 5 Strong Republican 13.1
ggplot(party_poverty, aes(x = classification, y = med, fill=classification)) + geom_bar(stat= "identity") + coord_flip() + scale_fill_manual(values=c("purple", "cadetblue2", "pink", "blue", "red")) + labs(title="Political Affiliation vs Poverty Rate", x="Political Affiliation", y="Poverty Rate(Median)") + theme(legend.position = "none")
As before, we decided that a box plot would allow us to get a better idea of the distribution of the data.
ggplot(Statesext, aes(x = classification, y = `poverty_rate`)) +
geom_boxplot(col = "slategrey") +
geom_point(col = "hotpink2", alpha = 0.3, position = "jitter") +
coord_flip() +
labs(title="Political Leaning vs Poverty Rate", x = "Political Affiliation", y = "Poverty Rate (Percentage)")
There appears to be more variability within every political affiliation than in Visualization 4 (political leaning vs. unemployment rate). Both strongly and leaning Republican states have higher median poverty rates than strongly and leaning Democratic states.
Shown below is the comparison between state GDP levels and state unemployment rate.
ggplot(data = Statesext, aes(x = `unemployment_rate`, y = `gdp`)) +
geom_point(color = "hotpink2", alpha = 0.3) +
geom_smooth(method = "lm", se = FALSE, col = "slategrey") +
labs(title = "State GDP vs. State Unemployment Rate", x = "Unemployment Rate (Percentage)", y = "GDP (Billions of US Dollars)") +
geom_text(aes(label=ifelse(gdp>1000,as.character(state),'')),hjust=0,vjust=0)
## `geom_smooth()` using formula 'y ~ x'
As we can see, there is a slight upward trend, but the relationship between unemployment and GDP appears weak. There are notable outliers in the graph, such as California, Texas, New York, and Florida, which share characteristics such as large, diverse populations and wide socioeconomic disparities.
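One way to quantify the strength of a trend like the one in the scatter plot is a Pearson correlation test. The sketch below is illustrative only: it uses just the six rows printed by head(Statesext) earlier, all of them strongly Democratic states, so its result says nothing about the full set of states. The names toy and result are hypothetical.

```r
# Six rows copied from the head(Statesext) output shown earlier
# (assumption: a small, non-representative subset used only for illustration)
toy <- data.frame(
  state = c("Massachusetts", "Vermont", "Hawaii",
            "New York", "Maryland", "California"),
  unemployment_rate = c(2.9, 2.4, 2.7, 4.0, 3.6, 4.0),
  gdp = c(596, 34.8, 97.3, 1732, 428, 3137)
)

# Pearson correlation with a test of the null hypothesis of zero correlation
result <- cor.test(toy$unemployment_rate, toy$gdp, method = "pearson")

result$estimate  # correlation coefficient, between -1 and 1
result$p.value   # small values suggest the correlation differs from zero
```

Running the same call on the full Statesext columns would show whether the slight upward trend is distinguishable from no relationship at all.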
We are also interested in comparing the state unemployment rate with the state child poverty rate to see how unemployment relates to the wellbeing not only of adults but also of families and children.
ggplot(data = Statesext, aes(x = `unemployment_rate`, y = `child_poverty_rate`)) +
geom_point(col = "hotpink2", alpha = 0.3) +
geom_smooth(method = "lm", se = FALSE, col = "slategrey") +
labs(title = "State Unemployment Rate vs. State Child Poverty Rate", x = "Unemployment Rate (Percentage)", y = "Child Poverty Rate (Percentage)") +
geom_text(aes(label=ifelse(unemployment_rate>6,as.character(state),'')),hjust=0.4,vjust=2)
## `geom_smooth()` using formula 'y ~ x'
There is a clear positive association between unemployment rate and child poverty rate. One notable outlier is Alaska, which has a high unemployment rate yet a low child poverty rate.
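An outlier like this can also be flagged numerically, using standardized residuals from the fitted line. The sketch below uses made-up values throughout: the toy data frame, its state labels, and the final "Outlier state" row are hypothetical and do not come from our dataset.

```r
# Hypothetical toy data: nine points near a rising line, plus one point with
# high unemployment but low child poverty (all values are made up)
toy <- data.frame(
  state = c(paste("State", 1:9), "Outlier state"),
  unemployment_rate = c(2.5, 3, 3, 3.5, 3.5, 4, 4, 4.5, 5, 6.5),
  child_poverty_rate = c(11, 13, 13.5, 15, 14.5, 17, 17.5, 19, 21, 12)
)

# Same linear model that geom_smooth(method = "lm") draws
fit <- lm(child_poverty_rate ~ unemployment_rate, data = toy)

# Standardized residuals adjust each residual for the leverage of its point
toy$std_resid <- rstandard(fit)

# The state whose point lies furthest from the fitted line
flagged <- toy$state[which.max(abs(toy$std_resid))]
flagged
```

Standardized residuals are used instead of raw residuals because a point at an extreme unemployment rate pulls the fitted line toward itself, which shrinks its raw residual.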
Below are summaries of our results for each query.
Through queries 1 and 2, we observe an interesting phenomenon: strongly Democratic states (e.g. California, New York) have relatively high median unemployment rates and yet low median poverty rates. One possible explanation is their diverse socioeconomic makeup, in which large portions of the population are at both the top and bottom of the economic ladder. Further data analysis should be conducted to test this explanation.
Through queries 3 and 4, our expectations about unemployment rates were partly rejected and partly confirmed. Unemployment rates show no clear connection with state GDP, yet they are positively associated with child poverty rate, our proxy for the wellbeing of children and families. Further nuances remain to be investigated, such as how gross income within different areas of a state relates to unemployment rates.
Through these comparisons, we hope to have obtained a more complete visual representation of how America is progressing with regard to combating socioeconomic inequality. By observing these recent trends in disparities, we can begin to identify factors that may be improved to increase socioeconomic mobility in the future.