Part 1

Note: There are often multiple ways to answer each question.

Load the ggplot2 and fueleconomy packages, as well as the vehicles dataset.

library(ggplot2)
library(fueleconomy)
## Warning: package 'fueleconomy' was built under R version 4.0.5
data(vehicles)
  1. Make a scatterplot of hwy vs. cty.
ggplot(vehicles, aes(x = cty, y = hwy)) +
    geom_point()

  2. Convert the cyl column to a factor.
vehicles$cyl <- factor(vehicles$cyl)
  3. Modify the plot from Question 1 so that the color of each point represents its cyl value. Also, change the color scale to “YlOrRd”.
ggplot(vehicles, aes(x = cty, y = hwy, col = cyl)) +
    geom_point() +
    scale_color_brewer(palette = "YlOrRd")
## Warning: Removed 58 rows containing missing values (geom_point).

Notice that the 58 rows with missing values (NAs) were dropped from the plot, as the warning above reports.
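
To see where such a warning comes from, it can help to count the missing values in the column mapped to color. The sketch below uses a small made-up data frame (toy, not part of the exercise) in place of vehicles so that it runs on its own. It also shows one possible way to keep those points visible: passing na.value to scale_color_brewer(), which, as far as we know, forwards it to the underlying discrete scale.

```r
library(ggplot2)

# Made-up stand-in for vehicles: the last row has no cyl value
toy <- data.frame(
    cty = c(15, 20, 25),
    hwy = c(22, 28, 33),
    cyl = factor(c(4, 6, NA))
)

# Count the rows with a missing cyl value
n_missing <- sum(is.na(toy$cyl))
n_missing

# Draw points with a missing cyl value in grey instead of dropping them
p_na <- ggplot(toy, aes(x = cty, y = hwy, col = cyl)) +
    geom_point() +
    scale_color_brewer(palette = "YlOrRd", na.value = "grey50")
```

Running sum(is.na(vehicles$cyl)) on the real data is the corresponding check for the plot above.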

  4. There is a lot of overplotting in the plot above. Remove the color scale and modify the previous plot so that alpha = 0.1.
ggplot(vehicles, aes(x = cty, y = hwy, col = cyl)) +
    geom_point(alpha = 0.1)

  5. Make a histogram of year.
ggplot(vehicles, aes(x = year)) +
    geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

  6. Make a histogram of year with just 5 bins.
ggplot(vehicles, aes(x = year)) +
    geom_histogram(bins = 5)
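
The message in the previous question suggests binwidth as an alternative to bins. For a whole-number variable such as year, binwidth = 1 should give one bar per year. A minimal sketch, using a made-up data frame (toy_years, not part of the exercise) so that it runs on its own:

```r
library(ggplot2)

# Made-up stand-in for vehicles: 1984 appears once, 1985 twice, and so on
toy_years <- data.frame(year = rep(1984:1988, times = 1:5))

p_year <- ggplot(toy_years, aes(x = year)) +
    geom_histogram(binwidth = 1)

# layer_data() returns the bin counts that ggplot2 computed for the bars
bin_counts <- layer_data(p_year)$count

# Every row falls into exactly one bin, so the counts add up to nrow(toy_years)
sum(bin_counts)
```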

  7. For each value of cyl, make a violin plot of hwy values.
ggplot(vehicles, aes(x = cyl, y = hwy)) +
    geom_violin()

  8. For each value of cyl, make a boxplot of hwy values.
ggplot(vehicles, aes(x = cyl, y = hwy)) +
    geom_boxplot()

  9. Make a barplot to show how many cars of each type of fuel there are in the dataset. (Hint: Use the geom_bar geom.)
ggplot(vehicles, aes(x = fuel)) +
    geom_bar()

  10. Add a coord_flip() layer to the previous plot to make a horizontal barplot.
ggplot(vehicles, aes(x = fuel)) +
    geom_bar() +
    coord_flip()

Part 2

# Keep only the first 2000 rows of vehicles for the questions in this part
vehicles <- vehicles[1:2000, ]
  1. Make a scatterplot of hwy vs. cty. Give axis titles and a main title to the plot to make it more interpretable.
ggplot(vehicles, aes(x = cty, y = hwy)) +
    geom_point() +
    labs(title = "Scatterplot of highway mpg vs. city mpg", x = "City mpg",
         y = "Highway mpg")

  2. Modify the plot above so that the color of each point represents its cyl value. Also reduce the alpha of the points to an appropriate level and introduce jitter.
ggplot(vehicles, aes(x = cty, y = hwy, col = cyl)) +
    geom_jitter(alpha = 0.2) +
    labs(title = "Scatterplot of highway mpg vs. city mpg", x = "City mpg",
         y = "Highway mpg")

  3. Modify the plot above so that each value of cyl is in its own plot.
ggplot(vehicles, aes(x = cty, y = hwy, col = cyl)) +
    geom_jitter(alpha = 0.2) +
    labs(title = "Scatterplot of highway mpg vs. city mpg", x = "City mpg",
         y = "Highway mpg") +
    facet_wrap(~ cyl)

  4. While the plot above gives us a good idea of how cars with different cyl values compare with each other, a lot of the plot space is wasted. Modify the plot so that each little plot has its own x and y scale. (Hint: This website might be helpful.)
ggplot(vehicles, aes(x = cty, y = hwy, col = cyl)) +
    geom_jitter(alpha = 0.2) +
    labs(title = "Scatterplot of highway mpg vs. city mpg", x = "City mpg",
         y = "Highway mpg") +
    facet_wrap(~ cyl, scales = "free")
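
Besides "free", the scales argument of facet_wrap() also accepts "free_x" and "free_y", which release only one axis and keep the other shared across panels. A short sketch, again with a made-up data frame (toy, not part of the exercise) so that it runs on its own:

```r
library(ggplot2)

# Made-up stand-in for vehicles with two cyl values
toy <- data.frame(
    cty = c(10, 12, 14, 25, 28, 31),
    hwy = c(15, 18, 20, 33, 36, 40),
    cyl = factor(c(8, 8, 8, 4, 4, 4))
)

base_plot <- ggplot(toy, aes(x = cty, y = hwy, col = cyl)) +
    geom_point()

# Only the y-axis varies between panels; the x-axis stays shared
p_free_y <- base_plot + facet_wrap(~ cyl, scales = "free_y")

# Only the x-axis varies between panels; the y-axis stays shared
p_free_x <- base_plot + facet_wrap(~ cyl, scales = "free_x")

# One panel per cyl value
n_panels <- length(unique(layer_data(p_free_y)$PANEL))
```

Keeping one axis shared can make panels easier to compare along that axis, at the cost of some wasted space.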

  5. Make a barplot to show how many cars of each type of fuel there are in the dataset. (Use the geom_bar geom.) Change the theme to ggplot’s black and white theme.
ggplot(vehicles, aes(x = fuel)) +
    geom_bar() +
    theme_bw()

  6. Make a violin plot to show the distribution of displ for each value of drive. Overlay that with a scatterplot of displ vs. drive (with jitter and alpha). How does the scatterplot give the reader more information?
ggplot(vehicles, aes(x = drive, y = displ)) +
    geom_violin() +
    geom_jitter(alpha = 0.2)
## Warning: Removed 2 rows containing non-finite values (stat_ydensity).
## Warning: Removed 2 rows containing missing values (geom_point).

The scatterplot shows us how many observations there really are for each value of drive. The violin plot doesn’t convey that information well. (For example, there are very few observations with 2-Wheel Drive.)
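
The number of observations per group can also be checked numerically with table(). The sketch below uses a made-up drive column (toy, not the real data) purely to show the call:

```r
# Made-up stand-in for the drive column of vehicles
toy <- data.frame(
    drive = c("Front-Wheel Drive", "Front-Wheel Drive", "Rear-Wheel Drive",
              "Front-Wheel Drive", "2-Wheel Drive")
)

# Count how many rows there are for each value of drive
drive_counts <- table(toy$drive)
drive_counts
```

On the real data, table(vehicles$drive) gives the counts behind the jittered points.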

  7. Make a (jittered) scatterplot of hwy against year with alpha value 0.5. Add a geom_smooth layer with option method = "lm" and without the SE bands.
ggplot(vehicles, aes(x = year, y = hwy)) +
    geom_jitter(alpha = 0.5) +
    geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'

  8. Modify the previous plot so that the color of the points depends on fuel. Also, change the theme to ggplot’s minimal theme and move the legend to the bottom of the plot. What happens to the geom_smooth layer?
ggplot(vehicles, aes(x = year, y = hwy, col = fuel)) +
    geom_jitter(alpha = 0.5) +
    geom_smooth(method = "lm", se = FALSE) +
    theme_minimal() +
    theme(legend.position = "bottom")
## `geom_smooth()` using formula 'y ~ x'

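The geom_smooth layer now fits a separate regression line for each value of fuel. This happens because the color aesthetic set in ggplot() is inherited by every layer, so it also splits the data into groups for geom_smooth.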
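
If a single overall trend line is wanted while the points stay colored by fuel, one option is to map color only in the point layer. A minimal sketch, using a made-up data frame (toy, not part of the exercise) so that it runs on its own:

```r
library(ggplot2)

# Made-up stand-in for vehicles with two fuel types
toy <- data.frame(
    year = c(1990, 1995, 2000, 2005, 1990, 1995, 2000, 2005),
    hwy  = c(20, 22, 25, 27, 24, 26, 28, 31),
    fuel = rep(c("Regular", "Premium"), each = 4)
)

# Color is mapped only in geom_jitter(), so geom_smooth() sees a single group
p_one_line <- ggplot(toy, aes(x = year, y = hwy)) +
    geom_jitter(aes(col = fuel), alpha = 0.5) +
    geom_smooth(method = "lm", se = FALSE)

# Number of distinct groups in the smooth layer (the second layer)
n_smooth_groups <- length(unique(layer_data(p_one_line, i = 2)$group))
n_smooth_groups
```

Another option is to keep col = fuel in ggplot() and add aes(group = 1) to geom_smooth().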

  9. Make a (jittered) scatterplot of hwy vs. cty, with the color of each point depending on year. Change the color scale to “Spectral”. Do you see a trend?
ggplot(vehicles, aes(x = cty, y = hwy, col = year)) +
    geom_jitter() +
    scale_color_distiller(palette = "Spectral")

Newer cars tend to have higher highway and city mpg. This makes sense, since we expect cars to become more fuel-efficient over time.

  10. Modify the theme of the plot above to a theme you like and try a different color scale. Also, give the plot a title and make it bigger, bold and centered.
ggplot(vehicles, aes(x = cty, y = hwy, col = year)) +
    geom_jitter() +
    scale_color_distiller(palette = "Reds") +
    labs(title = "Plot of highway mpg vs. city mpg") +
    theme_bw() +
    theme(plot.title = element_text(size = rel(1.5), face = "bold", hjust = 0.5))