Let’s start by drawing a map of the USA. The maps
package contains a lot of outlines of continents, countries, states, and counties. ggplot2
’s map_data
function puts these outlines in data frame format, which then allows us to plot them with ggplot
.
Let’s load the packages that we need:
library(tidyverse)
library(maps)
## Warning: package 'maps' was built under R version 4.0.5
Type in the following code:
map_USA <- map_data("usa")
head(map_USA)
## long lat group order region subregion
## 1 -101.4078 29.74224 1 1 main <NA>
## 2 -101.3906 29.74224 1 2 main <NA>
## 3 -101.3620 29.65056 1 3 main <NA>
## 4 -101.3505 29.63911 1 4 main <NA>
## 5 -101.3219 29.63338 1 5 main <NA>
## 6 -101.3047 29.64484 1 6 main <NA>
(If the code above doesn’t work, it could be that the maps
package has not been installed yet. Install the maps
package and try the code above again.) Each row in the map_USA
dataset is one point on the outline of the USA. We are going to use ggplot2
’s geom_polygon
to connect these points.
It terms out that you can’t draw a map of the US with just one polygon: there are islands (e.g. Long Island) which form separate polygons. To draw just the “main” mainland, filter as follows:
map_USA_main <- map_USA %>% filter(region == "main")
In the same code chunk, type the following code:
ggplot() +
geom_polygon(data = map_USA_main, mapping = aes(x = long, y = lat))
We get a map! By default, the fill of the polygon is black. Click on the “Zoom” button and resize the window. Do you notice what happens to the map?
In order to keep the correct aspect ratio, amend the code for the map by adding on an extra line:
ggplot() +
geom_polygon(data = map_USA_main, mapping = aes(x = long, y = lat)) +
coord_quickmap()
Try resizing the window and see the difference in behavior from before.
Let’s try to draw a county-level map. For the later part of this lab, we want to label these counties by FIPS codes (you can read more about FIPS county codes here). It turns out that it’s not so easy to get mapping data with FIPS codes, so I’ve put together a dataset to save you this trouble. Load it with the following code:
map_county_fips <- readRDS("county_map_fips.rds")
head(map_county_fips)
## long lat group order region subregion fips
## 1 -86.50517 32.34920 1 1 alabama autauga 1001
## 2 -86.53382 32.35493 1 2 alabama autauga 1001
## 3 -86.54527 32.36639 1 3 alabama autauga 1001
## 4 -86.55673 32.37785 1 4 alabama autauga 1001
## 5 -86.57966 32.38357 1 5 alabama autauga 1001
## 6 -86.59111 32.37785 1 6 alabama autauga 1001
Let’s draw a county map using code that’s very similar to what we had for drawing the map of the USA:
ggplot() +
geom_polygon(data = map_county_fips,
mapping = aes(x = long, y = lat, group = group)) +
coord_quickmap()
Now, let’s draw a map with black outlines for the counties, and different colors for each state:
ggplot() +
geom_polygon(data = map_county_fips,
mapping = aes(x = long, y = lat, group = group, fill = region),
col = "black") +
coord_quickmap() +
theme(legend.position="none")
coord_quickmap()
do? Find some other map coord_
commands and experimant with them.group
argument in geom_polygon()
? Test is out on the data frame map_USA
.The goal of the rest of this lab is to visualize the 2016 US presidential elections results on a county-level map. Let’s load the elections data and have a peek at it:
df <- read_csv("2016_US_Presdential_Results_for_class.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## fips = col_double(),
## percent_diff = col_double(),
## state = col_character(),
## county = col_character(),
## votes_dem = col_double(),
## votes_gop = col_double(),
## total_votes = col_double()
## )
head(df)
## # A tibble: 6 x 7
## fips percent_diff state county votes_dem votes_gop total_votes
## <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 2013 15.2 AK Alaska 93003 130413 246588
## 2 2016 15.2 AK Alaska 93003 130413 246588
## 3 2020 15.2 AK Alaska 93003 130413 246588
## 4 2050 15.2 AK Alaska 93003 130413 246588
## 5 2060 15.2 AK Alaska 93003 130413 246588
## 6 2068 15.2 AK Alaska 93003 130413 246588
In order to have the fills of the counties depend on our elections data, we need to somehow get information from our df
dataset to the map_county_fips
dataset. We can achieve this by using dplyr
’s left-join.
The code below joins the two data frames on the fips
column (map_county_fips
is on the left, df
is on the right):
# join elections data to mapping data
map_county_per_diff <- map_county_fips %>%
left_join(df, by = "fips")
Next, we use ggplot
to plot the data:
ggplot(data = map_county_per_diff, mapping = aes(x = long, y = lat, group = group)) +
geom_polygon(aes(fill = percent_diff)) +
coord_quickmap() +
labs(title = "Election results by county")
Great, we got a map! There are two things we can do to improve on it:
Let’s change the color scale first. Amend the map plotting code to the following (see ?scale_fill_gradient2
to understand how it works):
ggplot(data = map_county_per_diff, mapping = aes(x = long, y = lat, group = group)) +
geom_polygon(aes(fill = percent_diff)) +
scale_fill_gradient2(low = "blue", high = "red") +
coord_quickmap() +
labs(title = "Election results by county")
To remove parts of the plot which are not helpful, type in the following code in the same chunk before the ggplot
call:
map_theme <- theme(
axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.background = element_rect(fill = "white")
)
Next, add map_theme
to the ggplot
function call:
ggplot(data = map_county_per_diff, mapping = aes(x = long, y = lat, group = group)) +
geom_polygon(aes(fill = percent_diff)) +
scale_fill_gradient2(low = "blue", high = "red") +
coord_quickmap() +
labs(title = "Election results by county") +
map_theme
Play around with the labels, theme and color scales. Do any of the counties stand out to you? How can we modify the code above to draw a map of just one state?
We can see that large parts of the country voted more Republican than Democratic. At the same time, it doesn’t look like 85% of the map is red. This suggests that Trump won many counties which are geographically small.
group_by()
and summarize()
, create a map of the state-by-state vote totals (this is demonstrated in the next section, but you should try to do it yourself before moving on).To make the map better, we could draw state boundaries as well. Create a new code chunk and type in the following code. (Compare this code with the code in the previous chunk and try to figure out how the state boundaries were drawn.)
map_state <- map_data("state")
ggplot(data = map_county_per_diff, mapping = aes(x = long, y = lat, group = group)) +
geom_polygon(aes(fill = percent_diff)) +
geom_polygon(data = map_state, fill = NA, color = "black") +
scale_fill_gradient2(low = "blue", high = "red") +
coord_quickmap() +
labs(title = "Election results by county") +
map_theme
(If you’re curious, that very blue spot in the middle of the country is Oglala County, South Dakota. From Wikipedia: “The counties surrounding Oglala Lakota County are predominantly Republican, but, like most Native American counties, Oglala Lakota is heavily Democratic, giving over 75 percent of the vote to every Democratic presidential nominee in every election back to 1984, making it one of the most Democratic counties in the United States. No Republican has carried the county in a presidential election since 1952.”)
Let’s try to make the same map but at the state level, instead of at the county level. Make a new dataframe containing state-level data:
state_df <- df %>% group_by(state) %>%
summarize(votes_dem = sum(votes_dem),
votes_gop = sum(votes_gop),
total_votes = sum(total_votes)) %>%
mutate(diff = votes_gop - votes_dem,
percent_diff = diff / total_votes)
Notice that state_df
and map_state
contain their data on state differently: it’s abbreviated in state_df
, while its the full name in map_state
. We’ll need to do some data wrangling/transformation to get them to match.
R has 2 built-in variables, state.abb
and state.name
, which can help us. First, let’s augment these variables with “District of Columbia”:
state_abb <- c(state.abb, "DC")
state_name <- c(state.name, "District of Columbia")
Next, we use the following line of code to add a new column to state_df
which has the state name in full. (See ?match
to figure out what is going on here. We also have to use tolower()
as the state names are all in small letters in the map_state
dataset.)
state_df$region <- tolower(state_name[match(state_df$state, state_abb)])
We are now in a position to join the datasets:
combined_state_df <- map_state %>% left_join(state_df, by = "region")
The commands for plotting are almost the same as for the county map:
ggplot(data = combined_state_df, mapping = aes(x = long, y = lat, group = group)) +
geom_polygon(aes(fill = percent_diff)) +
geom_polygon(fill = NA, color = "black") +
scale_fill_gradient2(low = "blue", high = "red") +
coord_quickmap() +
labs(title = "Election results by state") +
map_theme
We can also plot the state labels on the map (notice that the group
can no longer be in the ggplot()
call):
state_names <- combined_state_df %>% group_by(state) %>%
summarize(lat = 0.5 * (max(lat) + min(lat)),
long = 0.5 * (max(long) + min(long)))
ggplot(data = combined_state_df, mapping = aes(x = long, y = lat)) +
geom_polygon(aes(group = group, fill = percent_diff)) +
geom_polygon(aes(group = group), fill = NA, color = "black") +
geom_text(data = state_names,
mapping = aes(x = long, y = lat, label = state)) +
scale_fill_gradient2(low = "blue", high = "red") +
coord_quickmap() +
labs(title = "Election results by state") +
map_theme
sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19041)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] maps_3.3.0 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.5
## [5] purrr_0.3.4 readr_1.4.0 tidyr_1.1.3 tibble_3.1.0
## [9] ggplot2_3.3.3 tidyverse_1.3.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.0 xfun_0.22 haven_2.3.1 colorspace_2.0-0
## [5] vctrs_0.3.6 generics_0.1.0 htmltools_0.5.1.1 yaml_2.2.1
## [9] utf8_1.2.1 rlang_0.4.10 pillar_1.5.1 glue_1.4.2
## [13] withr_2.4.1 DBI_1.1.1 dbplyr_2.1.0 modelr_0.1.8
## [17] readxl_1.3.1 lifecycle_1.0.0 munsell_0.5.0 gtable_0.3.0
## [21] cellranger_1.1.0 rvest_1.0.0 evaluate_0.14 labeling_0.4.2
## [25] knitr_1.31 fansi_0.4.2 highr_0.8 broom_0.7.5
## [29] Rcpp_1.0.6 scales_1.1.1 backports_1.2.1 jsonlite_1.7.2
## [33] farver_2.1.0 fs_1.5.0 hms_1.0.0 digest_0.6.27
## [37] stringi_1.5.3 grid_4.0.4 cli_2.3.1 tools_4.0.4
## [41] magrittr_2.0.1 crayon_1.4.1 pkgconfig_2.0.3 ellipsis_0.3.1
## [45] xml2_1.3.2 reprex_1.0.0 lubridate_1.7.10 assertthat_0.2.1
## [49] rmarkdown_2.7 httr_1.4.2 rstudioapi_0.13 R6_2.5.0
## [53] compiler_4.0.4