Note: There are often multiple ways to answer each question.

Part 1: Basic plotting grammar

Load the ggplot2 and fueleconomy packages, as well as the vehicles dataset.

library(ggplot2)
library(fueleconomy)
data(vehicles)

  1. Make a scatterplot of hwy vs. cty.

  2. Convert the cyl column to a factor.

  3. Modify the plot from question 1 so that the color of each point represents its cyl value. Also, change the color scale to the “YlOrRd” palette.

  4. The plot above suffers from heavy overplotting. Remove the color mapping and modify the plot so that the points have alpha = 0.1.

  5. Make a histogram of year.

  6. Make a histogram of year with just 5 bins.

  7. For each value of cyl, make a violin plot of hwy values.

  8. For each value of cyl, make a boxplot of hwy values.

  9. Make a barplot to show how many cars of each type of fuel there are in the dataset. (Hint: Use the geom_bar geom.)

  10. Add a coord_flip() layer to the previous plot to make a horizontal barplot.
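
One possible set of answers for Part 1 is sketched below; there are other valid ways to write each plot. It assumes the ggplot2 and fueleconomy packages are installed, and the object names p1 to p10 are arbitrary labels chosen here, not part of the exercise.

```r
library(ggplot2)
library(fueleconomy)
data(vehicles)

# 1. Scatterplot with cty on the x-axis and hwy on the y-axis
p1 <- ggplot(vehicles, aes(x = cty, y = hwy)) +
  geom_point()

# 2. Make cyl a factor so ggplot2 treats it as a discrete variable
vehicles$cyl <- factor(vehicles$cyl)

# 3. Rebuild the plot from the updated data (p1 still holds the numeric cyl),
#    color by cyl, and use the ColorBrewer "YlOrRd" palette
p3 <- ggplot(vehicles, aes(x = cty, y = hwy, color = cyl)) +
  geom_point() +
  scale_color_brewer(palette = "YlOrRd")

# 4. No color mapping; a low alpha reveals where points pile up
p4 <- ggplot(vehicles, aes(x = cty, y = hwy)) +
  geom_point(alpha = 0.1)

# 5. Histogram of year (ggplot2 uses 30 bins unless told otherwise)
p5 <- ggplot(vehicles, aes(x = year)) +
  geom_histogram()

# 6. The same histogram with only 5 bins
p6 <- ggplot(vehicles, aes(x = year)) +
  geom_histogram(bins = 5)

# 7. One violin of hwy values per cyl value
p7 <- ggplot(vehicles, aes(x = cyl, y = hwy)) +
  geom_violin()

# 8. One boxplot of hwy values per cyl value
p8 <- ggplot(vehicles, aes(x = cyl, y = hwy)) +
  geom_boxplot()

# 9. geom_bar() counts the rows for each fuel type
p9 <- ggplot(vehicles, aes(x = fuel)) +
  geom_bar()

# 10. Swap the axes to get a horizontal barplot
p10 <- p9 + coord_flip()
```

Each plot is only stored in an object; type its name (for example, p3) at the console or call print(p3) to draw it. Converting cyl before building p3 is what makes scale_color_brewer() work, because that scale only accepts discrete values.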

Part 2: Overlays and customization

Run the code below to extract just the first 2,000 rows of the dataset.

vehicles <- vehicles[1:2000, ]

  1. Make a scatterplot of hwy vs. cty. Add axis titles and a main title to make the plot more interpretable.

  2. Modify the plot above so that the color of each point represents its cyl value. Also, reduce the alpha of the points to an appropriate level and add jitter.

  3. Modify the plot above so that each value of cyl is in its own panel.

  4. While the plot above gives a good idea of how cars with different cyl values compare with each other, much of the plot space is wasted. Modify the plot so that each panel has its own x and y scale. (Hint: This website might be helpful.)

  5. Make a barplot to show how many cars of each type of fuel there are in the dataset. (Use the geom_bar geom.) Change the theme to ggplot’s black and white theme.

  6. Make a violin plot to show the distribution of displ for each value of drive. Overlay that with a scatterplot of displ vs. drive (with jitter and alpha). How does the scatterplot give the reader more information?

  7. Make a (jittered) scatterplot of hwy against year with alpha value 0.5. Add a geom_smooth layer with option method = "lm" and without the SE bands.

  8. Modify the previous plot so that the color of the points depends on fuel. Also, change the theme to ggplot’s minimal theme and move the legend to the bottom of the plot. What happens to the geom_smooth layer?

  9. Make a (jittered) scatterplot of hwy vs. cty, with the color of the point depending on year. Change the color scale to “Spectral”. Do you see a trend?

  10. Modify the theme of the plot above to a theme you like and try a different color scale. Also, give the plot a title and make the title larger, bold, and centered.
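
A similar sketch for Part 2 follows. It reloads the data and repeats the 2,000-row subset and the factor conversion of cyl so that it runs on its own. The q1 to q10 names, the title and axis text, the alpha levels, and the choice of theme_classic() and scale_color_viridis_c() in question 10 are illustrative picks rather than required answers.

```r
library(ggplot2)
library(fueleconomy)
data(vehicles)
vehicles <- vehicles[1:2000, ]
vehicles$cyl <- factor(vehicles$cyl)  # discrete colors for cyl

# Shared labels, reused by the first few plots
mpg_labels <- labs(title = "Highway vs. city fuel economy",
                   x = "City fuel economy (mpg)",
                   y = "Highway fuel economy (mpg)")

# 1. Scatterplot with axis titles and a main title
q1 <- ggplot(vehicles, aes(x = cty, y = hwy)) +
  geom_point() +
  mpg_labels

# 2. Color by cyl, lower alpha, and jitter to separate overlapping points
q2 <- ggplot(vehicles, aes(x = cty, y = hwy, color = cyl)) +
  geom_jitter(alpha = 0.3) +
  mpg_labels

# 3. One panel per cyl value, all sharing the same axes
q3 <- q2 + facet_wrap(~ cyl)

# 4. scales = "free" lets each panel pick its own x and y ranges
q4 <- q2 + facet_wrap(~ cyl, scales = "free")

# 5. Barplot of fuel types with the black-and-white theme
q5 <- ggplot(vehicles, aes(x = fuel)) +
  geom_bar() +
  theme_bw()

# 6. Violins of displ by drive, with the individual cars overlaid
q6 <- ggplot(vehicles, aes(x = drive, y = displ)) +
  geom_violin() +
  geom_jitter(alpha = 0.2, width = 0.2)

# 7. Jittered hwy vs. year with a straight-line fit and no SE band
q7 <- ggplot(vehicles, aes(x = year, y = hwy)) +
  geom_jitter(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE)

# 8. color = fuel is set in ggplot(), so geom_smooth() inherits it too.
#    theme_minimal() comes first so that theme() can then adjust it.
q8 <- ggplot(vehicles, aes(x = year, y = hwy, color = fuel)) +
  geom_jitter(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal() +
  theme(legend.position = "bottom")

# 9. year is continuous, so the distiller scale applies the Brewer palette
q9 <- ggplot(vehicles, aes(x = cty, y = hwy, color = year)) +
  geom_jitter() +
  scale_color_distiller(palette = "Spectral")

# 10. Another theme and color scale, plus a larger, bold, centered title
q10 <- ggplot(vehicles, aes(x = cty, y = hwy, color = year)) +
  geom_jitter() +
  scale_color_viridis_c() +
  labs(title = "Highway vs. city fuel economy by model year") +
  theme_classic() +
  theme(plot.title = element_text(size = 16, face = "bold", hjust = 0.5))
```

Because the color mapping lives in ggplot() in q8, geom_smooth() fits a separate line for each fuel type; moving color = fuel into geom_jitter() would instead keep a single overall trend line.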