ggplot2
Note: There are often multiple ways to answer each question.
Load the ggplot2 and fueleconomy packages, as well as the vehicles dataset.
library(ggplot2)
library(fueleconomy)
## Warning: package 'fueleconomy' was built under R version 4.0.5
data(vehicles)
Make a scatterplot of hwy vs. cty.
ggplot(vehicles, aes(x = cty, y = hwy)) +
  geom_point()
Convert the cyl column to a factor.
vehicles$cyl <- factor(vehicles$cyl)
Make the same scatterplot, with the color of the point depending on its cyl value. Also, change the color scale to “YlOrRd”.
ggplot(vehicles, aes(x = cty, y = hwy, col = cyl)) +
  geom_point() +
  scale_color_brewer(palette = "YlOrRd")
## Warning: Removed 58 rows containing missing values (geom_point).
Notice how the NAs got removed!
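One way to see where the removed rows come from is to count the rows that have a missing value in any of the plotted columns. This is a minimal sketch, assuming the dropped rows are exactly those with an NA in cty, hwy, or cyl; the names plotted, n_missing, and vehicles_complete are introduced here for illustration only.

```r
library(fueleconomy)
data(vehicles)

# Keep only the columns that the plot maps to aesthetics
plotted <- vehicles[, c("cty", "hwy", "cyl")]

# Count rows with at least one missing value among those columns
n_missing <- sum(!complete.cases(plotted))
n_missing

# Drop those rows up front (assumed to be the ones ggplot2 removes)
vehicles_complete <- vehicles[complete.cases(plotted), ]
```

If the assumption holds, n_missing should match the row count reported in the warning, and plotting vehicles_complete should no longer trigger it.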
Remake the scatterplot, giving the points alpha = 0.1.
ggplot(vehicles, aes(x = cty, y = hwy, col = cyl)) +
  geom_point(alpha = 0.1)
Make a histogram of year.
ggplot(vehicles, aes(x = year)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Make a histogram of year with just 5 bins.
ggplot(vehicles, aes(x = year)) +
  geom_histogram(bins = 5)
For each value of cyl, make a violin plot of hwy values.
ggplot(vehicles, aes(x = cyl, y = hwy)) +
  geom_violin()
For each value of cyl, make a boxplot of hwy values.
ggplot(vehicles, aes(x = cyl, y = hwy)) +
  geom_boxplot()
Make a barplot showing how many vehicles of each fuel there are in the dataset. (Hint: Use the geom_bar geom.)
ggplot(vehicles, aes(x = fuel)) +
  geom_bar()
Add a coord_flip() layer to the previous plot to make a horizontal barplot.
ggplot(vehicles, aes(x = fuel)) +
  geom_bar() +
  coord_flip()
vehicles <- vehicles[1:2000, ]
Make a scatterplot of hwy vs. cty. Give axis titles and a main title to the plot to make it more interpretable.
ggplot(vehicles, aes(x = cty, y = hwy)) +
  geom_point() +
  labs(title = "Scatterplot of highway mpg vs. city mpg", x = "City mpg",
       y = "Highway mpg")
Make the color of each point depend on its cyl value. Also reduce the alpha of the points to an appropriate level and introduce jitter.
ggplot(vehicles, aes(x = cty, y = hwy, col = cyl)) +
  geom_jitter(alpha = 0.2) +
  labs(title = "Scatterplot of highway mpg vs. city mpg", x = "City mpg",
       y = "Highway mpg")
Facet the plot so that each value of cyl is in its own plot.
ggplot(vehicles, aes(x = cty, y = hwy, col = cyl)) +
  geom_jitter(alpha = 0.2) +
  labs(title = "Scatterplot of highway mpg vs. city mpg", x = "City mpg",
       y = "Highway mpg") +
  facet_wrap(~ cyl)
While the faceted plot makes it easy to see how the different cyl values compare with each other, a lot of the plot space is wasted. Modify the plot so that each little plot has its own x and y scale. (Hint: This website might be helpful.)
ggplot(vehicles, aes(x = cty, y = hwy, col = cyl)) +
  geom_jitter(alpha = 0.2) +
  labs(title = "Scatterplot of highway mpg vs. city mpg", x = "City mpg",
       y = "Highway mpg") +
  facet_wrap(~ cyl, scales = "free")
Make a barplot showing how many vehicles of each fuel there are in the dataset. (Use the geom_bar geom.) Change the theme to ggplot’s black and white theme.
ggplot(vehicles, aes(x = fuel)) +
  geom_bar() +
  theme_bw()
Make a violin plot of displ for each value of drive. Overlay that with a scatterplot of displ vs. drive (with jitter and alpha). How does the scatterplot give the reader more information?
ggplot(vehicles, aes(x = drive, y = displ)) +
  geom_violin() +
  geom_jitter(alpha = 0.2)
## Warning: Removed 2 rows containing non-finite values (stat_ydensity).
## Warning: Removed 2 rows containing missing values (geom_point).
The scatterplot shows us how many observations there really are for each value of drive. The violin plot doesn’t convey that information well. (For example, there are very few observations with 2-Wheel Drive.)
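The per-group counts that the jittered points hint at can also be read off directly. A minimal sketch, using the same first 2000 rows as above; drive_counts is a name introduced here for illustration.

```r
library(fueleconomy)
data(vehicles)
vehicles <- vehicles[1:2000, ]  # same subset as used above

# Number of observations for each value of drive (NA counted if present)
drive_counts <- table(vehicles$drive, useNA = "ifany")
drive_counts
```

Groups with small counts are the ones where the shape of the violin is least trustworthy.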
Make a scatterplot (with jitter) of hwy against year with alpha value 0.5. Add a geom_smooth layer with option method = "lm" and without the SE bands.
ggplot(vehicles, aes(x = year, y = hwy)) +
  geom_jitter(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
Color the points by fuel. Also, change the theme to ggplot’s minimal theme and move the legend to the bottom of the plot. What happens to the geom_smooth layer?
ggplot(vehicles, aes(x = year, y = hwy, col = fuel)) +
  geom_jitter(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal() +
  theme(legend.position = "bottom")
## `geom_smooth()` using formula 'y ~ x'
The geom_smooth layer gives a separate smoothed estimate for each value of fuel.
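The split happens because the color mapping in ggplot() is inherited by every layer, which also groups the smoother by fuel. If a single overall line is wanted instead, one option is to map color only in the point layer. A minimal sketch, not part of the original exercise; p_single is a name introduced here for illustration.

```r
library(ggplot2)
library(fueleconomy)
data(vehicles)
vehicles <- vehicles[1:2000, ]  # same subset as used above

# Color is mapped only in the point layer, so geom_smooth sees no grouping
p_single <- ggplot(vehicles, aes(x = year, y = hwy)) +
  geom_jitter(aes(col = fuel), alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal() +
  theme(legend.position = "bottom")
p_single
```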
Make a scatterplot (with jitter) of hwy vs. cty, with the color of the point depending on year. Change the color scale to “Spectral”. Do you see a trend?
ggplot(vehicles, aes(x = cty, y = hwy, col = year)) +
  geom_jitter() +
  scale_color_distiller(palette = "Spectral")
As time goes on, we tend to see higher values of both highway and city mpg. This makes sense, since we expect the cars to be more fuel-efficient as time goes on.
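The visual impression can be cross-checked numerically by averaging both mpg columns within each year. A minimal sketch in base R, using the same first 2000 rows as above; yearly_means is a name introduced here for illustration.

```r
library(fueleconomy)
data(vehicles)
vehicles <- vehicles[1:2000, ]  # same subset as used above

# Mean city and highway mpg for each model year
# (rows with a missing value in any of the three columns are dropped)
yearly_means <- aggregate(cbind(cty, hwy) ~ year, data = vehicles, FUN = mean)
yearly_means
```

If the trend in the plot is real, the cty and hwy columns of yearly_means should tend to increase with year.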
ggplot(vehicles, aes(x = cty, y = hwy, col = year)) +
  geom_jitter() +
  scale_color_distiller(palette = "Reds") +
  labs(title = "Plot of highway mpg vs. city mpg") +
  theme_bw() +
  theme(plot.title = element_text(size = rel(1.5), face = "bold", hjust = 0.5))