GitHub Classroom Assignment

# R
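library(pak)  # install once with install.packages("pak") if it is missing
packages <- c("ggplot2", "cowplot", "ggthemes", "patchwork")
pak::pkg_install(packages)

# Attach every package; invisible() hides the list of TRUE/FALSE results
invisible(lapply(packages, require, character.only = TRUE))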
# Python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

mpg = pd.read_csv("mpg.csv")

Exercises

Exercise 1 – Bar Plot Modification

  • Task:
    • Add a title to the plot: “Distribution of Cars by Class”.
    • Change the x-axis label to “Type of Car”.
    • Color the bars in blue.
    • Rotate the x-axis labels by 45 degrees.
  • Expected Output: An updated plot with the above specifications.

Initial Plot: A simple bar plot displaying the number of cars for each class in the mpg dataset.

# R
ggplot(data = mpg, aes(x = class)) +
  geom_bar()
# Python
sorted_mpg = mpg['class'].value_counts().sort_index()
plt.bar(sorted_mpg.index, sorted_mpg.values)
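
The four requested changes each map onto one Matplotlib call. The sketch below is one possible approach, not the only accepted answer. It builds a small made-up stand-in for the mpg data so it runs without mpg.csv, and the names `fig`, `ax`, and `counts` are illustrative choices rather than part of the assignment.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for mpg.csv with only the column needed here
mpg = pd.DataFrame({"class": ["suv", "compact", "suv", "midsize", "compact", "suv"]})

# Number of cars per class, ordered alphabetically by class name
counts = mpg["class"].value_counts().sort_index()

fig, ax = plt.subplots()
ax.bar(counts.index, counts.values, color="blue")  # blue bars
ax.set_title("Distribution of Cars by Class")      # plot title
ax.set_xlabel("Type of Car")                       # x-axis label
ax.tick_params(axis="x", labelrotation=45)         # rotate tick labels 45 degrees
fig.tight_layout()                                 # keep rotated labels inside the figure
```

With the real data, replace the stand-in DataFrame with the `mpg` object loaded from mpg.csv.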

Exercise 2 – Histogram Modification

  • Task:
    • Add a title to the plot: “Highway Mileage Distribution”.
    • Change the x-axis label to “Miles Per Gallon”.
    • Fill the histogram bars with green but have a black border.
    • Set the bin width to 2.
  • Expected Output: An updated plot with the above specifications.

Initial Plot: A histogram showcasing the distribution of highway miles per gallon (hwy) from the mpg dataset.

# R
ggplot(data = mpg, aes(x = hwy)) + 
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Python
plt.hist(mpg['hwy'], bins = 30)
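
Unlike `geom_histogram()`, `plt.hist` has no bin-width argument, so in Python a width of 2 has to be expressed as explicit bin edges. The following is a minimal sketch under that assumption, using a few made-up hwy values in place of mpg.csv; `bin_width` and `edges` are illustrative names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the hwy column of mpg.csv
mpg = pd.DataFrame({"hwy": [12, 17, 20, 24, 25, 26, 26, 27, 29, 31, 36, 44]})

bin_width = 2
# Bin edges spaced 2 apart, starting at the minimum and reaching the maximum
edges = np.arange(mpg["hwy"].min(), mpg["hwy"].max() + bin_width, bin_width)

fig, ax = plt.subplots()
counts, used_edges, bars = ax.hist(
    mpg["hwy"], bins=edges,
    color="green",       # fill color of the bars
    edgecolor="black",   # border color of the bars
)
ax.set_title("Highway Mileage Distribution")
ax.set_xlabel("Miles Per Gallon")
```

In R the equivalent setting is the `binwidth` argument named in the `stat_bin()` message above.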

Exercise 3 – Scatter Plot with Facets

  • Task:
    • Add a title: “Engine Displacement vs. Highway MPG”.
    • Change the x-axis label to “Engine Size (liters)” and y-axis label to “Highway MPG”.
    • Color the points by class and shape them by drive type (drv: 4 = four-wheel, f = front-wheel, r = rear-wheel drive).
    • Add a smooth trend line (with standard error or confidence interval) to the plot. Consider adjusting the alpha of the points for clarity.
    • Facet the plot by cyl (number of cylinders) in a 2x2 grid format.
  • Expected Output: An updated plot with the above specifications.

Initial Plot: A scatter plot illustrating the relationship between engine displacement (displ) and highway MPG (hwy).

# R
ggplot(data = mpg, aes(x = displ, y = hwy)) + 
  geom_point()
# Python
sns.scatterplot(data = mpg, x = 'displ', y = 'hwy')
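
For the Python version, one way to combine color, marker shape, a trend line, and facets is a seaborn `FacetGrid` with two layers mapped onto it. The sketch below is an illustration under several assumptions: the data are random stand-in values with the same column names as mpg.csv, and the trend line is a linear fit from `sns.regplot` with a 95% confidence band, which is a substitute for the LOESS smoother that `geom_smooth()` uses by default on this data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for mpg.csv: random values, same column names
rng = np.random.default_rng(0)
n = 80
mpg = pd.DataFrame({
    "displ": rng.uniform(1.6, 7.0, n),
    "cyl": rng.choice([4, 5, 6, 8], n),
    "class": rng.choice(["compact", "suv", "pickup"], n),
    "drv": rng.choice(["4", "f", "r"], n),
})
mpg["hwy"] = 40 - 3 * mpg["displ"] + rng.normal(0, 2, n)

# One panel per cylinder count, wrapped into a 2x2 grid
g = sns.FacetGrid(mpg, col="cyl", col_wrap=2)

# Layer 1: points colored by class, shaped by drive type, semi-transparent.
# Fixed orders keep colors and markers consistent across panels.
g.map_dataframe(
    sns.scatterplot, x="displ", y="hwy", hue="class", style="drv",
    hue_order=sorted(mpg["class"].unique()),
    style_order=sorted(mpg["drv"].unique()),
    alpha=0.6,
)
# Layer 2: linear trend line with a 95% confidence band, no extra points
g.map_dataframe(sns.regplot, x="displ", y="hwy", scatter=False, ci=95, color="black")

g.set_axis_labels("Engine Size (liters)", "Highway MPG")
g.figure.suptitle("Engine Displacement vs. Highway MPG", y=1.02)
g.add_legend()
```

In R, the same structure comes from adding `geom_smooth()` and `facet_wrap(~ cyl, nrow = 2)` to the initial plot.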

Exercise 4 – Enhanced Boxplots Using after_stat() and patchwork

  • Task:
  1. Modify plot1:
    • Color the boxes based on the median value of cty, using a gradient from light blue (low mpg) to dark blue (high mpg).
    • Add a title: “City MPG by Manufacturer”.
    • Rotate x-axis labels by 90 degrees and adjust their size for readability.
    • Apply a theme of your choice from the ggthemes library (R) or with sns.set_theme() from the seaborn library (Python).
  2. Modify plot2:
    • Color the boxes based on the median value of hwy, using a gradient from light green (low mpg) to dark green (high mpg).
    • Add a title: “Highway MPG by Manufacturer”.
    • Rotate x-axis labels by 90 degrees and adjust their size for readability.
    • Apply the same theme as plot1.
  3. Combine the two modified plots side by side using the patchwork library (R) or subplots (Python).
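  • Expected Output: A combined plot with the above specifications.

Initial Plots: Two boxplots showing city MPG (cty) and highway MPG (hwy) by manufacturer in the mpg dataset.
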
# R
plot1 <- ggplot(data = mpg, aes(x = manufacturer, y = cty)) +
  geom_boxplot()

plot2 <- ggplot(data = mpg, aes(x = manufacturer, y = hwy)) +
  geom_boxplot()
# Python
plt.figure()  # separate figure so plot2 is not drawn over plot1
plot1 = sns.boxplot(data = mpg, x = 'manufacturer', y = 'cty')

plt.figure()
plot2 = sns.boxplot(data = mpg, x = 'manufacturer', y = 'hwy')
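
Python has no direct counterpart to `after_stat()`, so the median-based fill has to be computed by hand. The sketch below shows one way to do that with plain Matplotlib and pandas: group by manufacturer, take each group's median, and map it onto a sequential colormap. It is an illustration under stated assumptions. The data are a made-up three-manufacturer stand-in for mpg.csv, `median_boxplot` is a hypothetical helper, the 0.3 to 1.0 shading range is an arbitrary choice to keep the lightest box visible, and no theme is applied.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for mpg.csv with three manufacturers
mpg = pd.DataFrame({
    "manufacturer": ["audi"] * 4 + ["ford"] * 4 + ["honda"] * 4,
    "cty": [18, 21, 20, 16, 11, 14, 13, 15, 28, 24, 25, 23],
    "hwy": [29, 30, 27, 26, 17, 19, 20, 16, 33, 32, 36, 34],
})

def median_boxplot(ax, column, cmap_name, title):
    """Draw one box per manufacturer, shaded by that group's median."""
    groups = mpg.groupby("manufacturer")[column]
    names = list(groups.groups)          # manufacturer names in sorted order
    medians = groups.median()
    # Rescale medians to 0.3..1.0 so the lowest one is light, not white
    span = medians.max() - medians.min()
    shade = 0.3 + 0.7 * (medians - medians.min()) / span
    cmap = plt.get_cmap(cmap_name)
    result = ax.boxplot(
        [groups.get_group(name).to_numpy() for name in names],
        patch_artist=True,               # boxes become fillable patches
    )
    for box, name in zip(result["boxes"], names):
        box.set_facecolor(cmap(shade[name]))
    ax.set_xticks(list(range(1, len(names) + 1)))
    ax.set_xticklabels(names, rotation=90, fontsize=8)
    ax.set_title(title)
    return shade

# Two panels side by side, the subplots counterpart of patchwork's plot1 + plot2
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
shade_cty = median_boxplot(ax1, "cty", "Blues", "City MPG by Manufacturer")
shade_hwy = median_boxplot(ax2, "hwy", "Greens", "Highway MPG by Manufacturer")
fig.tight_layout()
```

Calling `sns.set_theme()` before creating the figure would apply a seaborn theme to both panels.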