This is an analysis of US presidential election data for 2016 at the county level. Since only a small percentage of votes went to independent candidates, we will only compare Democrat and Republican vote share.
The data for this analysis is taken from https://github.com/tonmcg/County_Level_Election_Results_12-16.
Library imports:
library(tidyverse)
library(knitr)
Read in the data:
df <- read_csv("http://web.stanford.edu/class/stats32/assets/lecture-2/2016-presidential-election-county-results.csv")
kable(head(df))
fips_cod | county | total | dem | gop | other | dem_prop | gop_prop | other_prop | state | state_name |
---|---|---|---|---|---|---|---|---|---|---|
26041 | Delta County | 18467 | 6431 | 11112 | 924 | 0.3482428 | 0.6017220 | 0.0500352 | MI | Michigan |
48295 | Lipscomb County | 1322 | 135 | 1159 | 28 | 0.1021180 | 0.8767020 | 0.0211800 | TX | Texas |
01127 | Walker County | 29243 | 4486 | 24208 | 549 | 0.1534042 | 0.8278220 | 0.0187737 | AL | Alabama |
48389 | Reeves County | 3184 | 1659 | 1417 | 108 | 0.5210427 | 0.4450377 | 0.0339196 | TX | Texas |
56017 | Hot Springs County | 2535 | 400 | 1939 | 196 | 0.1577909 | 0.7648915 | 0.0773176 | WY | Wyoming |
20043 | Doniphan County | 3366 | 584 | 2601 | 181 | 0.1734997 | 0.7727273 | 0.0537730 | KS | Kansas |
There are 3112 rows in total, roughly matching the number of counties in the US (Sources: http://www.snopes.com/trump-won-3084-of-3141-counties-clinton-won-57/ and http://www.wnd.com/2016/12/trumps-landslide-2623-to-489-among-u-s-counties/).
The dataset contains the following columns:
names(df)
## [1] "fips_cod" "county" "total" "dem" "gop"
## [6] "other" "dem_prop" "gop_prop" "other_prop" "state"
## [11] "state_name"
dem_prop and gop_prop are the proportions of votes going to Democrats and Republicans respectively.
fips_cod is a 5-digit code identifying the county. (From Wikipedia: The FIPS county code is a five-digit Federal Information Processing Standards (FIPS) code (FIPS 6-4) which uniquely identifies counties and county equivalents in the United States, certain U.S. possessions, and certain freely associated states.)
Since we are interested in whether a given county had more Republican or Democrat votes, we compute two new columns, diff and per_point_diff.
diff is the number of Republican votes minus the number of Democrat votes.
per_point_diff expresses this difference as a percentage of total votes.
Both diff and per_point_diff will be positive if there are more Republican votes than Democrat votes, and negative if there are more Democrat votes.
df <- df %>% mutate(diff = gop - dem,
per_point_diff = diff / total * 100)
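As a quick check of the sign convention, the sketch below recomputes both columns by hand for two rows copied from the head(df) output above (Delta County, MI and Reeves County, TX). The data frame name toy is illustrative and not part of the original analysis, and base R is used so the snippet runs without the tidyverse.

```r
# Two rows copied from the head(df) output; `toy` is a hypothetical name.
toy <- data.frame(
  county = c("Delta County", "Reeves County"),
  total  = c(18467, 3184),
  dem    = c(6431, 1659),
  gop    = c(11112, 1417)
)

# Same formulas as the mutate() call, written in base R.
toy$diff <- toy$gop - toy$dem
toy$per_point_diff <- toy$diff / toy$total * 100

print(toy)
```

Delta County, where Republicans had more votes, gets a positive diff (4681 votes, roughly 25.3 points), while Reeves County, where Democrats had more votes, gets a negative one (-242 votes, roughly -7.6 points).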
Compute percentage of popular vote won by each party:
paste0("Republican % of popular vote: ",
round(sum(df$gop) / sum(df$total) * 100, digits = 1),
"%")
## [1] "Republican % of popular vote: 47.3%"
paste0("Democrat % of popular vote: ",
round(sum(df$dem) / sum(df$total) * 100, digits = 1),
"%")
## [1] "Democrat % of popular vote: 47.8%"
Although Clinton lost the presidential election, she actually won the popular vote!
Compute number of counties won by each party:
df %>% transmute(gop_won = gop > dem) %>%
summarize(gop_won = sum(gop_won))
## # A tibble: 1 x 1
## gop_won
## <int>
## 1 2625
Painting a completely different picture, Trump won 2625 of the 3112 counties in this dataset (or 84.4% of them), leaving Clinton with at most 487 counties. This suggests that Clinton won in counties with large populations, or that the margin of victory was slimmer in the counties that Trump won compared with the counties that Clinton won.
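The summarize() call above only tallies Republican wins, so the Democrat count has to be inferred by subtraction, which silently assumes no county was tied. A minimal sketch of a tally that keeps ties separate is shown here in base R; county_winner is a hypothetical helper and the numbers in toy are made up for illustration.

```r
# Hypothetical helper: label each county by its winner, keeping ties separate.
county_winner <- function(gop, dem) {
  ifelse(gop > dem, "gop", ifelse(dem > gop, "dem", "tie"))
}

# Made-up vote counts, including one exact tie.
toy <- data.frame(
  gop = c(11112, 1159, 1417, 500),
  dem = c(6431, 135, 1659, 500)
)

winners <- county_winner(toy$gop, toy$dem)
print(table(winners))
```

On the real data the same idea would be table(county_winner(df$gop, df$dem)), assuming df has the gop and dem columns shown earlier.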
We have Clinton winning the popular vote on one hand, but Trump winning many more counties. How can we reconcile these two facts?
One theory is that Clinton won her counties by a huge margin percentage-wise, while Trump won his counties by a slim margin percentage-wise. To test this theory, we could plot a histogram of per_point_diff:
ggplot() +
geom_histogram(data = df, mapping = aes(x = per_point_diff)) +
labs(title = "Histogram of % vote margin",
x = "% Republicans won by", y = "Frequency")
The chart does not support the theory that Trump had narrower margins of victory in the counties that he won: he won a sizeable number of counties by a margin of more than 50 percentage points.
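A histogram can be backed up with a number. The sketch below computes, among counties with a positive margin (Republican wins), the fraction won by more than a chosen threshold. The function name share_large_gop_margins and the margins in toy_margins are assumptions made for illustration; they are not real county data.

```r
# Hypothetical helper: among counties won by Republicans (margin > 0),
# what fraction were won by more than `threshold` percentage points?
share_large_gop_margins <- function(per_point_diff, threshold = 50) {
  gop_margins <- per_point_diff[per_point_diff > 0]
  mean(gop_margins > threshold)
}

# Made-up margins for illustration only (not real county data).
toy_margins <- c(75.2, 61.0, 12.5, 3.1, -40.0, -8.7)
large_share <- share_large_gop_margins(toy_margins)
print(large_share)
```

With the real data, share_large_gop_margins(df$per_point_diff) would give the corresponding fraction, assuming per_point_diff has been computed as above.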
Let’s try plotting a histogram of diff to look at absolute differences instead:
ggplot() +
geom_histogram(data = df, mapping = aes(x = diff)) +
labs(title = "Histogram of absolute vote margin",
x = "No. of votes Republicans won by", y = "Frequency")
This chart is very different! In the counties that Clinton won, she won them by extremely large margins in terms of absolute votes. Thus, even though she won very few counties compared to Trump, these large margins meant that she could actually win the popular vote.
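A toy example makes the arithmetic concrete. In the sketch below, with entirely made-up numbers, Republicans win two small counties by wide percentage margins and Democrats win one large county by a narrower percentage margin, yet the Democrats come out ahead in total votes.

```r
# Made-up counties: two small GOP wins, one large Democrat win.
toy <- data.frame(
  county = c("Small 1", "Small 2", "Large"),
  gop    = c(800, 700, 400000),
  dem    = c(200, 300, 600000)
)

toy$diff <- toy$gop - toy$dem
toy$per_point_diff <- toy$diff / (toy$gop + toy$dem) * 100

counties_won_by_gop <- sum(toy$diff > 0)  # counties with more GOP votes
popular_margin_gop  <- sum(toy$diff)      # negative means Democrats lead overall

print(toy)
print(counties_won_by_gop)
print(popular_margin_gop)
```

Here the Republican percentage margins (60 and 40 points) are larger than the Democrat one (20 points), but the single large county contributes an absolute margin that outweighs both small counties combined.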
The code below shows that the top 44 counties with the largest absolute vote difference were all won by Clinton (number 45 was Montgomery County, TX, which went to Trump).
df %>% select(State = state, County = county, diff) %>%
mutate(abs_diff = abs(diff)) %>%
arrange(desc(abs_diff)) %>%
select(State, County, `Vote difference` = diff) %>%
head(n = 50) %>%
kable()
State | County | Vote difference |
---|---|---|
CA | Los Angeles County | -1112035 |
IL | Cook County | -1088369 |
NY | Kings County | -461433 |
NY | New York County | -456546 |
PA | Philadelphia County | -455124 |
WA | King County | -446227 |
NY | Queens County | -334839 |
MA | Middlesex County | -292756 |
CA | Santa Clara County | -289853 |
FL | Miami-Dade County | -289340 |
MI | Wayne County | -288709 |
FL | Broward County | -288435 |
MD | Prince George’s County | -284337 |
NY | Bronx County | -283979 |
CA | Alameda County | -273514 |
DC | District of Columbia | -248670 |
MN | Hennepin County | -237515 |
MD | Montgomery County | -226776 |
CA | San Francisco County | -211139 |
OR | Multnomah County | -208699 |
OH | Cuyahoga County | -204080 |
TX | Dallas County | -196980 |
VA | Fairfax County | -196648 |
GA | DeKalb County | -191600 |
MA | Suffolk County | -191170 |
TX | Travis County | -179725 |
GA | Fulton County | -171503 |
NJ | Essex County | -168972 |
WI | Milwaukee County | -162895 |
TX | Harris County | -161511 |
MD | Baltimore City | -155836 |
WI | Dane County | -146236 |
CA | Contra Costa County | -144757 |
OH | Franklin County | -143633 |
NC | Mecklenburg County | -137955 |
FL | Orange County | -134488 |
CO | Denver County | -130974 |
CA | San Diego County | -130817 |
NY | Westchester County | -124027 |
CA | San Mateo County | -122275 |
LA | Orleans Parish | -109566 |
MN | Ramsey County | -106151 |
PA | Allegheny County | -105529 |
NC | Wake County | -104746 |
TX | Montgomery County | 104444 |
NJ | Hudson County | -104365 |
FL | Palm Beach County | -100649 |
MA | Norfolk County | -99958 |
TN | Shelby County | -91692 |
TX | El Paso County | -90942 |
When analyzing elections, we have to examine the data from many different perspectives in order to get the full story.