County-level Data for 2016 Presidential Elections

Introduction

This is an analysis of US presidential election data for 2016 at the county level. Since only a small percentage of votes went to candidates outside the two major parties, we will only compare Democrat and Republican vote shares.

The data for this analysis is taken from https://github.com/tonmcg/County_Level_Election_Results_12-16.

Data import and checking

Library imports:

library(tidyverse)
library(knitr)

Read in the data:

df <- read_csv("http://web.stanford.edu/class/stats32/assets/lecture-2/2016-presidential-election-county-results.csv")
kable(head(df))

fips_cod	county	total	dem	gop	other	dem_prop	gop_prop	other_prop	state	state_name
26041	Delta County	18467	6431	11112	924	0.3482428	0.6017220	0.0500352	MI	Michigan
48295	Lipscomb County	1322	135	1159	28	0.1021180	0.8767020	0.0211800	TX	Texas
01127	Walker County	29243	4486	24208	549	0.1534042	0.8278220	0.0187737	AL	Alabama
48389	Reeves County	3184	1659	1417	108	0.5210427	0.4450377	0.0339196	TX	Texas
56017	Hot Springs County	2535	400	1939	196	0.1577909	0.7648915	0.0773176	WY	Wyoming
20043	Doniphan County	3366	584	2601	181	0.1734997	0.7727273	0.0537730	KS	Kansas

There are 3112 rows in total, in line with published county tallies for this election, which range from 3112 to 3141 (Sources: http://www.snopes.com/trump-won-3084-of-3141-counties-clinton-won-57/ and http://www.wnd.com/2016/12/trumps-landslide-2623-to-489-among-u-s-counties/).
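
A quick sanity check is worth running before going further. The sketch below is illustrative rather than part of the original analysis: sample_df is a hypothetical stand-in built from the six preview rows above, and the same checks could be run on df itself.

```r
library(tidyverse)

# Six rows copied from the preview above (stand-in for the full df)
sample_df <- tribble(
  ~fips_cod, ~county,              ~total, ~dem,  ~gop,  ~other,
  "26041",   "Delta County",        18467,  6431, 11112,    924,
  "48295",   "Lipscomb County",      1322,   135,  1159,     28,
  "01127",   "Walker County",       29243,  4486, 24208,    549,
  "48389",   "Reeves County",        3184,  1659,  1417,    108,
  "56017",   "Hot Springs County",   2535,   400,  1939,    196,
  "20043",   "Doniphan County",      3366,   584,  2601,    181
)

# One row per county: the number of distinct FIPS codes should equal the row count
n_rows <- nrow(sample_df)
n_fips <- n_distinct(sample_df$fips_cod)

# The three vote counts should add up to the reported total in every row
totals_ok <- all(sample_df$dem + sample_df$gop + sample_df$other == sample_df$total)

c(rows = n_rows, distinct_fips = n_fips, totals_ok = totals_ok)
```

Note that fips_cod is kept as text here so that leading zeros (as in 01127) are preserved.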

The dataset contains the following columns:

names(df)

##  [1] "fips_cod"   "county"     "total"      "dem"        "gop"       
##  [6] "other"      "dem_prop"   "gop_prop"   "other_prop" "state"     
## [11] "state_name"

dem, gop and other are the numbers of votes cast for the Democrat candidate, the Republican candidate and all other candidates respectively. total is the total number of votes cast.
dem_prop, gop_prop and other_prop express these counts as proportions of total (values between 0 and 1, not percentages).
fips_cod is a 5-digit code identifying the county. (From Wikipedia: The FIPS county code is a five-digit Federal Information Processing Standards (FIPS) code (FIPS 6-4) which uniquely identifies counties and county equivalents in the United States, certain U.S. possessions, and certain freely associated states.)

Since we are interested in whether a given county had more Republican or Democrat votes, we compute two new columns. diff is the number of Republican votes minus the number of Democrat votes, and per_point_diff expresses this difference as a percentage of total votes. Both will be positive if there are more Republican votes than Democrat votes, and negative if there are fewer.

df <- df %>% mutate(diff = gop - dem,
                    per_point_diff = diff / total * 100)
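
To make the sign convention concrete, here is a small worked example. It applies the same mutate() call to two of the preview rows, one won by each party; sample_df is a hypothetical name used only for this illustration.

```r
library(tidyverse)

# Two rows copied from the preview above: one Republican win, one Democrat win
sample_df <- tribble(
  ~county,          ~total, ~dem,  ~gop,
  "Delta County",    18467,  6431, 11112,
  "Reeves County",    3184,  1659,  1417
)

# Same computation as for the full data
sample_df <- sample_df %>%
  mutate(diff = gop - dem,
         per_point_diff = diff / total * 100)

sample_df
```

Delta County comes out at diff = 4681 (about 25.3 percentage points), while Reeves County comes out at diff = -242 (about -7.6 percentage points).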

Summary statistics

Compute percentage of popular vote won by each party:

paste0("Republican % of popular vote: ", 
       round(sum(df$gop) / sum(df$total) * 100, digits = 1),
       "%")

## [1] "Republican % of popular vote: 47.3%"

paste0("Democrat % of popular vote: ", 
       round(sum(df$dem) / sum(df$total) * 100, digits = 1),
       "%")

## [1] "Democrat % of popular vote: 47.8%"

Although Clinton lost the presidential election, she actually won the popular vote!

Compute number of counties won by each party:

df %>% transmute(gop_won = gop > dem) %>%
    summarize(gop_won = sum(gop_won))

## # A tibble: 1 x 1
##   gop_won
##     <int>
## 1    2625

Painting a completely different picture, Trump won 2625 out of the 3112 counties in the dataset (or 84.4% of them). Clinton only won 487 counties. This suggests that Clinton won in counties with large populations, or that the margin of victory was slimmer in the counties that Trump won compared with the counties that Clinton won.
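
The count above only covers Republican wins. As a minimal sketch, the code below tallies wins for both parties (plus any ties) and also totals the votes cast in each group, which is one way to probe the large-population explanation. It runs on the six preview rows as a stand-in for df; the names sample_df, winner and by_winner are assumptions made for this illustration.

```r
library(tidyverse)

# Vote counts copied from the six preview rows (stand-in for the full df)
sample_df <- tribble(
  ~county,              ~total, ~dem,  ~gop,
  "Delta County",        18467,  6431, 11112,
  "Lipscomb County",      1322,   135,  1159,
  "Walker County",       29243,  4486, 24208,
  "Reeves County",        3184,  1659,  1417,
  "Hot Springs County",   2535,   400,  1939,
  "Doniphan County",      3366,   584,  2601
)

by_winner <- sample_df %>%
  # Label each county by the party with more votes; exact ties get their own label
  mutate(winner = case_when(gop > dem ~ "gop",
                            dem > gop ~ "dem",
                            TRUE      ~ "tie")) %>%
  group_by(winner) %>%
  # Number of counties won and total votes cast in those counties
  summarize(counties = n(), total_votes = sum(total))

by_winner
```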

Histograms

We have Clinton winning the popular vote on one hand, but Trump winning many more counties. How can we reconcile these two facts?

One theory is that Clinton won her counties by a huge margin percentage-wise, while Trump won his counties by a slim margin percentage-wise. To test this theory, we could plot a histogram of the per_point_diff:

ggplot() +
    geom_histogram(data = df, mapping = aes(x = per_point_diff)) + 
    labs(title = "Histogram of % vote margin", 
         x = "% Republicans won by", y = "Frequency")

The chart does not support the theory that Trump had narrower margins of victory in the counties that he won: he won a sizeable number of counties with > 50% vote difference.
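
When no bin width is given, geom_histogram() falls back to 30 bins, which may not line up with the zero point that separates the two parties. The sketch below sets an explicit bin width and adds a reference line at zero. The 5-point bin width is an assumption chosen for illustration, and sample_df is a hypothetical stand-in built from the six preview rows.

```r
library(tidyverse)

# Vote counts copied from the six preview rows (stand-in for the full df)
sample_df <- tibble(
  total = c(18467, 1322, 29243, 3184, 2535, 3366),
  dem   = c(6431, 135, 4486, 1659, 400, 584),
  gop   = c(11112, 1159, 24208, 1417, 1939, 2601)
) %>%
  mutate(per_point_diff = (gop - dem) / total * 100)

p <- ggplot(sample_df, aes(x = per_point_diff)) +
  # 5-point bins with a bin edge placed exactly at zero
  geom_histogram(binwidth = 5, boundary = 0) +
  # Dashed line separating Democrat wins (left) from Republican wins (right)
  geom_vline(xintercept = 0, linetype = "dashed") +
  labs(title = "Histogram of % vote margin",
       x = "% Republicans won by", y = "Frequency")

p
```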

Let’s try plotting a histogram of diff to look at absolute differences instead:

ggplot() +
    geom_histogram(data = df, mapping = aes(x = diff)) + 
    labs(title = "Histogram of absolute vote margin", 
         x = "No. of votes Republicans won by", y = "Frequency")

This chart is very different! In the counties that Clinton won, she won them by extremely large margins in terms of absolute votes. Thus, even though she won very few counties compared to Trump, these large margins meant that she could actually win the popular vote.

The code below shows that the top 44 counties with the largest absolute vote difference were all won by Clinton (number 45 was Montgomery, TX, which went to Trump).

df %>% select(State = state, County = county, diff) %>%
    mutate(abs_diff = abs(diff)) %>%
    arrange(desc(abs_diff)) %>%
    select(State, County, `Vote difference` = diff) %>%
    head(n = 50) %>%
    kable()

State	County	Vote difference
CA	Los Angeles County	-1112035
IL	Cook County	-1088369
NY	Kings County	-461433
NY	New York County	-456546
PA	Philadelphia County	-455124
WA	King County	-446227
NY	Queens County	-334839
MA	Middlesex County	-292756
CA	Santa Clara County	-289853
FL	Miami-Dade County	-289340
MI	Wayne County	-288709
FL	Broward County	-288435
MD	Prince George’s County	-284337
NY	Bronx County	-283979
CA	Alameda County	-273514
DC	District of Columbia	-248670
MN	Hennepin County	-237515
MD	Montgomery County	-226776
CA	San Francisco County	-211139
OR	Multnomah County	-208699
OH	Cuyahoga County	-204080
TX	Dallas County	-196980
VA	Fairfax County	-196648
GA	DeKalb County	-191600
MA	Suffolk County	-191170
TX	Travis County	-179725
GA	Fulton County	-171503
NJ	Essex County	-168972
WI	Milwaukee County	-162895
TX	Harris County	-161511
MD	Baltimore City	-155836
WI	Dane County	-146236
CA	Contra Costa County	-144757
OH	Franklin County	-143633
NC	Mecklenburg County	-137955
FL	Orange County	-134488
CO	Denver County	-130974
CA	San Diego County	-130817
NY	Westchester County	-124027
CA	San Mateo County	-122275
LA	Orleans Parish	-109566
MN	Ramsey County	-106151
PA	Allegheny County	-105529
NC	Wake County	-104746
TX	Montgomery County	104444
NJ	Hudson County	-104365
FL	Palm Beach County	-100649
MA	Norfolk County	-99958
TN	Shelby County	-91692
TX	El Paso County	-90942
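
The position of the first Republican-won county can also be computed rather than read off the table. The sketch below uses a hypothetical helper, first_gop_rank(), and applies it to four margins copied from the table (Hudson, Montgomery TX, Allegheny and Wake), deliberately given out of order.

```r
# Hypothetical helper: rank (1 = largest absolute margin) of the first
# county with a positive diff, i.e. the first Republican win
first_gop_rank <- function(diff) {
  # Sort margins from largest to smallest absolute value
  ranked <- diff[order(abs(diff), decreasing = TRUE)]
  # Position of the first positive (Republican) margin, NA if there is none
  which(ranked > 0)[1]
}

# Four margins copied from the table above, in arbitrary order
diffs <- c(-104365, 104444, -105529, -104746)

first_gop_rank(diffs)
```

On these four values the result is 3, because only Allegheny and Wake have larger absolute margins than Montgomery. Calling first_gop_rank(df$diff) on the full data would give the rank within the complete ordering.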

Conclusion

When analyzing elections, we have to examine the data from many different perspectives in order to get the full story.