Let’s start by drawing a map of the USA. The maps package contains a lot of outlines of continents, countries, states, and counties. ggplot2’s map_data function puts these outlines in data frame format, which then allows us to plot them with ggplot.
Let’s load the packages that we need:
library(tidyverse)
library(maps)
## Warning: package 'maps' was built under R version 4.0.5
Type in the following code:
map_USA <- map_data("usa")
head(map_USA)
##        long      lat group order region subregion
## 1 -101.4078 29.74224     1     1   main      <NA>
## 2 -101.3906 29.74224     1     2   main      <NA>
## 3 -101.3620 29.65056     1     3   main      <NA>
## 4 -101.3505 29.63911     1     4   main      <NA>
## 5 -101.3219 29.63338     1     5   main      <NA>
## 6 -101.3047 29.64484     1     6   main      <NA>
(If the code above doesn’t work, it could be that the maps package has not been installed yet. Install the maps package and try the code above again.) Each row in the map_USA dataset is one point on the outline of the USA. We are going to use ggplot2’s geom_polygon to connect these points.
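To see that each row really is a single point on the outline, one option is to plot the rows as points before connecting them. This is only a sketch: the object name p_points and the point size are arbitrary choices, not part of the lab.

```r
library(ggplot2)
library(maps)

map_USA <- map_data("usa")

# Draw every row of map_USA as a small dot; together the dots
# trace the outline that geom_polygon() will later connect.
p_points <- ggplot() +
  geom_point(data = map_USA, mapping = aes(x = long, y = lat), size = 0.1)
p_points
```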
It turns out that you can’t draw a map of the US with just one polygon: there are islands (e.g. Long Island) which form separate polygons. To draw just the mainland (the region labeled "main"), filter as follows:
map_USA_main <- map_USA %>% filter(region == "main")
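To check which separate pieces the outline contains, one could count the rows per region. A minimal sketch, where the names region_sizes and n_points are made up for illustration:

```r
library(dplyr)
library(ggplot2)
library(maps)

map_USA <- map_data("usa")

# One row per region, with the number of outline points in each,
# largest region first.
region_sizes <- map_USA %>%
  count(region, name = "n_points") %>%
  arrange(desc(n_points))
region_sizes
```

The "main" region should appear alongside several island regions, each of which is drawn as its own polygon.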
In the same code chunk as the filter() call above, type the following code:
ggplot() +
  geom_polygon(data = map_USA_main, mapping = aes(x = long, y = lat))
We get a map! By default, the fill of the polygon is black. Click on the “Zoom” button and resize the window. Do you notice what happens to the map?
In order to keep the correct aspect ratio, amend the code for the map by adding on an extra line:
ggplot() +
  geom_polygon(data = map_USA_main, mapping = aes(x = long, y = lat)) +
  coord_quickmap()
Try resizing the window and see the difference in behavior from before.
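One way to think about the aspect ratio: away from the equator, a degree of longitude covers less ground than a degree of latitude, by a factor of roughly the cosine of the latitude. The sketch below fixes the aspect ratio by hand using that idea. It is an approximation for illustration and not necessarily the exact computation coord_quickmap() performs; the names mid_lat, ratio and p_fixed are made up.

```r
library(dplyr)
library(ggplot2)
library(maps)

map_USA_main <- map_data("usa") %>% filter(region == "main")

# At latitude phi, one degree of longitude spans about cos(phi) times
# the distance of one degree of latitude, so one unit on the y axis
# should be drawn 1 / cos(phi) times as long as one unit on the x axis.
mid_lat <- mean(range(map_USA_main$lat))
ratio <- 1 / cos(mid_lat * pi / 180)

p_fixed <- ggplot() +
  geom_polygon(data = map_USA_main, mapping = aes(x = long, y = lat)) +
  coord_fixed(ratio = ratio)
p_fixed
```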
Let’s try to draw a county-level map. For the later part of this lab, we want to label these counties by FIPS codes (you can read more about FIPS county codes here). It turns out that it’s not so easy to get mapping data with FIPS codes, so I’ve put together a dataset to save you this trouble. Load it with the following code:
map_county_fips <- readRDS("county_map_fips.rds")
head(map_county_fips)
##        long      lat group order  region subregion fips
## 1 -86.50517 32.34920     1     1 alabama   autauga 1001
## 2 -86.53382 32.35493     1     2 alabama   autauga 1001
## 3 -86.54527 32.36639     1     3 alabama   autauga 1001
## 4 -86.55673 32.37785     1     4 alabama   autauga 1001
## 5 -86.57966 32.38357     1     5 alabama   autauga 1001
## 6 -86.59111 32.37785     1     6 alabama   autauga 1001
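For the curious, a dataset with this shape could be assembled along the following lines. This is a sketch under assumptions: it uses the county.fips lookup table that ships with the maps package, and it is not necessarily how county_map_fips.rds was actually built. The names fips_lookup and county_sketch are made up.

```r
library(dplyr)
library(ggplot2)
library(maps)

# county.fips has two columns: fips and polyname ("state,county").
data("county.fips", package = "maps")

# Some polynames carry a ":part" suffix for counties made of several
# polygons; strip it and keep one row per county name.
fips_lookup <- county.fips %>%
  mutate(polyname = sub(":.*$", "", as.character(polyname))) %>%
  distinct(polyname, .keep_all = TRUE)

# Build the same "state,county" key on the map data and join on it.
county_sketch <- map_data("county") %>%
  mutate(polyname = paste(region, subregion, sep = ",")) %>%
  left_join(fips_lookup, by = "polyname")
head(county_sketch)
```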
Let’s draw a county map using code that’s very similar to what we had for drawing the map of the USA:
ggplot() +
  geom_polygon(data = map_county_fips,
               mapping = aes(x = long, y = lat, group = group)) +
  coord_quickmap()
Now, let’s draw a map with black outlines for the counties, and different colors for each state:
ggplot() +
  geom_polygon(data = map_county_fips,
               mapping = aes(x = long, y = lat, group = group, fill = region),
               col = "black") +
  coord_quickmap() +
  theme(legend.position = "none")
What does coord_quickmap() do? Find some other map coord_ commands and experiment with them.
What does the group argument in geom_polygon() do? Test it out on the data frame map_USA.
The goal of the rest of this lab is to visualize the 2016 US presidential elections results on a county-level map. Let’s load the elections data and have a peek at it:
df <- read_csv("2016_US_Presdential_Results_for_class.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
##   fips = col_double(),
##   percent_diff = col_double(),
##   state = col_character(),
##   county = col_character(),
##   votes_dem = col_double(),
##   votes_gop = col_double(),
##   total_votes = col_double()
## )
head(df)
## # A tibble: 6 x 7
##    fips percent_diff state county votes_dem votes_gop total_votes
##   <dbl>        <dbl> <chr> <chr>      <dbl>     <dbl>       <dbl>
## 1  2013         15.2 AK    Alaska     93003    130413      246588
## 2  2016         15.2 AK    Alaska     93003    130413      246588
## 3  2020         15.2 AK    Alaska     93003    130413      246588
## 4  2050         15.2 AK    Alaska     93003    130413      246588
## 5  2060         15.2 AK    Alaska     93003    130413      246588
## 6  2068         15.2 AK    Alaska     93003    130413      246588
In order to have the fills of the counties depend on our elections data, we need to get information from our df dataset into the map_county_fips dataset. We can achieve this by using dplyr’s left_join().
The code below joins the two data frames on the fips column (map_county_fips is on the left, df is on the right):
# join elections data to mapping data
map_county_per_diff <- map_county_fips %>%
  left_join(df, by = "fips")
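A left join keeps every row of the left data frame and fills in NA where the right data frame has no match. The toy example below illustrates this; all values and the names map_toy, results_toy, joined_toy and unmatched are made up for illustration.

```r
library(dplyr)

# Toy "map" data: two outline points for one county, one for another.
map_toy <- tibble(
  fips = c(1001, 1001, 1003),
  long = c(-86.5, -86.6, -87.9),
  lat  = c(32.3, 32.4, 30.7)
)

# Toy "results" data: one county that is on the map, one that is not.
results_toy <- tibble(
  fips = c(1001, 9999),
  percent_diff = c(49.5, -10)
)

# Every row of map_toy is kept; fips 1003 gets NA for percent_diff,
# and fips 9999 is dropped because it is not in map_toy.
joined_toy <- map_toy %>% left_join(results_toy, by = "fips")
joined_toy

# Counties on the map that found no match in the results.
unmatched <- joined_toy %>% filter(is.na(percent_diff)) %>% distinct(fips)
unmatched
```

A check like unmatched can be worth running on the real join, since counties without a match will be drawn with the scale's missing-value fill.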
Next, we use ggplot to plot the data:
ggplot(data = map_county_per_diff, mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = percent_diff)) +
  coord_quickmap() +
  labs(title = "Election results by county")
Great, we got a map! There are two things we can do to improve on it: change the color scale, and remove parts of the plot which are not helpful.
Let’s change the color scale first. Amend the map plotting code to the following (see ?scale_fill_gradient2 to understand how it works):
ggplot(data = map_county_per_diff, mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = percent_diff)) +
  scale_fill_gradient2(low = "blue", high = "red") +
  coord_quickmap() +
  labs(title = "Election results by county")
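To get a feel for the diverging scale, it can help to try it on a tiny dataset first. In the sketch below the data frame toy and its values are made up; mid = "white" and midpoint = 0 are the documented defaults of scale_fill_gradient2(), written out only to make them visible.

```r
library(ggplot2)

# Five tiles with made-up margins, from strongly negative to strongly positive.
toy <- data.frame(x = 1:5, y = 1, percent_diff = c(-80, -20, 0, 20, 80))

# Values below the midpoint shade toward blue, values above toward red.
p_scale <- ggplot(toy, aes(x = x, y = y, fill = percent_diff)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0)
p_scale
```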
To remove parts of the plot which are not helpful, type the following code in the same chunk as the map, before the ggplot() call:
map_theme <- theme(
  axis.title.x = element_blank(),
  axis.text.x = element_blank(),
  axis.ticks.x = element_blank(),
  axis.title.y = element_blank(),
  axis.text.y = element_blank(),
  axis.ticks.y = element_blank(),
  panel.background = element_rect(fill = "white")
)
Next, add map_theme to the ggplot function call:
ggplot(data = map_county_per_diff, mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = percent_diff)) +
  scale_fill_gradient2(low = "blue", high = "red") +
  coord_quickmap() +
  labs(title = "Election results by county") +
  map_theme
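As an aside, ggplot2 also ships a built-in theme, theme_void(), which strips axes, gridlines and the panel background in one call. The sketch below is an alternative to map_theme, not what this lab uses; the object name p_void and the fill color are arbitrary.

```r
library(dplyr)
library(ggplot2)
library(maps)

map_USA_main <- map_data("usa") %>% filter(region == "main")

# theme_void() removes axis titles, text, ticks and the background.
p_void <- ggplot() +
  geom_polygon(data = map_USA_main, mapping = aes(x = long, y = lat),
               fill = "grey80") +
  coord_quickmap() +
  theme_void()
p_void
```

Defining the theme by hand, as in map_theme, gives finer control over exactly which elements are removed.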
Play around with the labels, theme and color scales. Do any of the counties stand out to you? How can we modify the code above to draw a map of just one state?
We can see that large parts of the country voted more Republican than Democratic. At the same time, it doesn’t look like 85% of the map is red. This suggests that Trump won many counties which are geographically small.
Using group_by() and summarize(), create a map of the state-by-state vote totals (this is demonstrated in the next section, but you should try to do it yourself before moving on).
To make the map better, we could draw state boundaries as well. Create a new code chunk and type in the following code. (Compare this code with the code in the previous chunk and try to figure out how the state boundaries were drawn.)
map_state <- map_data("state")
ggplot(data = map_county_per_diff, mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = percent_diff)) +
  geom_polygon(data = map_state, fill = NA, color = "black") +
  scale_fill_gradient2(low = "blue", high = "red") +
  coord_quickmap() +
  labs(title = "Election results by county") +
  map_theme
(If you’re curious, that very blue spot in the middle of the country is Oglala Lakota County, South Dakota. From Wikipedia: “The counties surrounding Oglala Lakota County are predominantly Republican, but, like most Native American counties, Oglala Lakota is heavily Democratic, giving over 75 percent of the vote to every Democratic presidential nominee in every election back to 1984, making it one of the most Democratic counties in the United States. No Republican has carried the county in a presidential election since 1952.”)
Let’s try to make the same map but at the state level, instead of at the county level. Make a new data frame containing state-level data:
state_df <- df %>%
  group_by(state) %>%
  summarize(votes_dem = sum(votes_dem),
            votes_gop = sum(votes_gop),
            total_votes = sum(total_votes)) %>%
  # express the margin in percent, matching the scale of df$percent_diff
  mutate(diff = votes_gop - votes_dem,
         percent_diff = 100 * diff / total_votes)
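One caveat is visible in the head(df) output earlier: every Alaska row carries the same statewide totals, so summing over those rows counts the same votes several times. The ratio percent_diff is unaffected, but the raw totals are inflated. The sketch below reproduces this with three rows copied from that output; the helper summarize_state() and the use of distinct() are illustrative assumptions, not part of the lab.

```r
library(dplyr)

# Three of the Alaska rows shown by head(df): identical statewide totals.
toy_df <- tibble(
  fips = c(2013, 2016, 2020),
  state = "AK",
  votes_dem = 93003,
  votes_gop = 130413,
  total_votes = 246588
)

# Hypothetical helper: state-level sums and percentage margin.
summarize_state <- function(d) {
  d %>%
    group_by(state) %>%
    summarize(votes_dem = sum(votes_dem),
              votes_gop = sum(votes_gop),
              total_votes = sum(total_votes)) %>%
    mutate(percent_diff = 100 * (votes_gop - votes_dem) / total_votes)
}

# Summing the repeated rows triples the totals ...
naive <- summarize_state(toy_df)

# ... while dropping exact repeats of the vote columns first does not.
deduped <- toy_df %>%
  distinct(state, votes_dem, votes_gop, total_votes) %>%
  summarize_state()

naive$total_votes
deduped$total_votes
```

Since the maps in this lab are colored by percent_diff only, the repeated rows do not change the picture, but they would matter for any plot of raw vote counts.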
Notice that state_df and map_state identify states differently: state_df uses abbreviations, while map_state uses full names. We’ll need to do some data wrangling/transformation to get them to match.
R has 2 built-in variables, state.abb and state.name, which can help us. First, let’s augment these variables with “District of Columbia”:
state_abb <- c(state.abb, "DC")
state_name <- c(state.name, "District of Columbia")
Next, we use the following line of code to add a new column to state_df which has the state name in full. (See ?match to figure out what is going on here. We also have to use tolower() as the state names are all in lowercase in the map_state dataset.)
state_df$region <- tolower(state_name[match(state_df$state, state_abb)])
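To see what match() is doing, it can help to run it on a short vector. In this sketch the vector abbs is made up, and "ZZ" is a deliberately invalid abbreviation to show what happens when there is no match.

```r
state_abb <- c(state.abb, "DC")
state_name <- c(state.name, "District of Columbia")

abbs <- c("CA", "DC", "ZZ")

# match() returns, for each element of abbs, its position in state_abb
# (NA if it is not found).
idx <- match(abbs, state_abb)
idx
# 5 51 NA

# Indexing state_name by those positions looks up the full names.
tolower(state_name[idx])
# "california" "district of columbia" NA
```

An NA in the result is a useful signal that an abbreviation in the data has no counterpart in the lookup vectors.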
We are now in a position to join the datasets:
combined_state_df <- map_state %>% left_join(state_df, by = "region")
The commands for plotting are almost the same as for the county map:
ggplot(data = combined_state_df, mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = percent_diff)) +
  geom_polygon(fill = NA, color = "black") +
  scale_fill_gradient2(low = "blue", high = "red") +
  coord_quickmap() +
  labs(title = "Election results by state") +
  map_theme
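We can also plot the state labels on the map. Notice that group can no longer be set in the ggplot() call: the state_names data frame used by geom_text() has no group column, so the group aesthetic is set only in the geom_polygon() layers: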
state_names <- combined_state_df %>%
  group_by(state) %>%
  summarize(lat = 0.5 * (max(lat) + min(lat)),
            long = 0.5 * (max(long) + min(long)))
ggplot(data = combined_state_df, mapping = aes(x = long, y = lat)) +
  geom_polygon(aes(group = group, fill = percent_diff)) +
  geom_polygon(aes(group = group), fill = NA, color = "black") +
  geom_text(data = state_names,
            mapping = aes(x = long, y = lat, label = state)) +
  scale_fill_gradient2(low = "blue", high = "red") +
  coord_quickmap() +
  labs(title = "Election results by state") +
  map_theme
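The label positions above are the midpoints of each state's bounding box. The sketch below repeats that computation using only map_data("state"), so it runs without the elections data; it labels states by their lowercase region name instead of the abbreviation. The names label_pos and p_labels and the text size are made up for illustration.

```r
library(dplyr)
library(ggplot2)
library(maps)

map_state <- map_data("state")

# Midpoint of the bounding box of each state's outline points.
label_pos <- map_state %>%
  group_by(region) %>%
  summarize(long = 0.5 * (max(long) + min(long)),
            lat = 0.5 * (max(lat) + min(lat)))

# group is set only in the polygon layer, because label_pos
# has no group column for geom_text() to inherit.
p_labels <- ggplot(data = map_state, mapping = aes(x = long, y = lat)) +
  geom_polygon(aes(group = group), fill = NA, color = "black") +
  geom_text(data = label_pos, mapping = aes(label = region), size = 2) +
  coord_quickmap()
p_labels
```

Bounding-box midpoints are simple but can land outside oddly shaped states, so some labels may need manual adjustment.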
sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19041)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] maps_3.3.0 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.5
## [5] purrr_0.3.4 readr_1.4.0 tidyr_1.1.3 tibble_3.1.0
## [9] ggplot2_3.3.3 tidyverse_1.3.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.0 xfun_0.22 haven_2.3.1 colorspace_2.0-0
## [5] vctrs_0.3.6 generics_0.1.0 htmltools_0.5.1.1 yaml_2.2.1
## [9] utf8_1.2.1 rlang_0.4.10 pillar_1.5.1 glue_1.4.2
## [13] withr_2.4.1 DBI_1.1.1 dbplyr_2.1.0 modelr_0.1.8
## [17] readxl_1.3.1 lifecycle_1.0.0 munsell_0.5.0 gtable_0.3.0
## [21] cellranger_1.1.0 rvest_1.0.0 evaluate_0.14 labeling_0.4.2
## [25] knitr_1.31 fansi_0.4.2 highr_0.8 broom_0.7.5
## [29] Rcpp_1.0.6 scales_1.1.1 backports_1.2.1 jsonlite_1.7.2
## [33] farver_2.1.0 fs_1.5.0 hms_1.0.0 digest_0.6.27
## [37] stringi_1.5.3 grid_4.0.4 cli_2.3.1 tools_4.0.4
## [41] magrittr_2.0.1 crayon_1.4.1 pkgconfig_2.0.3 ellipsis_0.3.1
## [45] xml2_1.3.2 reprex_1.0.0 lubridate_1.7.10 assertthat_0.2.1
## [49] rmarkdown_2.7 httr_1.4.2 rstudioapi_0.13 R6_2.5.0
## [53] compiler_4.0.4