ggplot2

“ggplot2 is a data visualization package for the statistical programming language R. Created by Hadley Wickham in 2005, ggplot2 is an implementation of Leland Wilkinson’s Grammar of Graphics—a general scheme for data visualization which breaks up graphs into semantic components such as scales and layers.” - https://en.wikipedia.org/wiki/Ggplot2

Cheat Sheet

When starting to learn ggplot, remembering abbreviations can be tricky. Here’s a cheat sheet with various functions and graphing types:

https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

Setup

(Run install.packages("ggplot2") once to install the package)

library(ggplot2)

Basics

To make a graph, call ‘ggplot()’ and then add what you’d like to display with ‘+’, layering components together. The simplest graph of this form follows, plotting the values 0 to 10 on both the X axis and the Y axis.

ggplot() + geom_point(aes(x = 0:10, y = 0:10))
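
Because layers are joined with ‘+’, a plot can be stored in a variable and extended later. A minimal sketch of that idea follows; the variable names base and both are illustrative and not part of the original notes.

```r
library(ggplot2)

# Start with an empty plot and keep it in a variable
base <- ggplot()

# Each `+` adds one layer: first points, then a line through the same values
both <- base +
  geom_point(aes(x = 0:10, y = 0:10)) +
  geom_line(aes(x = 0:10, y = 0:10))

# Printing the object draws it
print(both)
```

Layers are drawn in the order they are added, so here the line is drawn on top of the points.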

We don’t have to put literal values in the aes. We can also give column names, so long as we pass the data frame containing those columns as the “data” argument to ggplot.

# Mock data in a dataframe (same "ranges" as before)
df <- data.frame(X = 0:10, Y = 0:10)

# ggplot will resolve the column names using the data frame
ggplot(data = df) +
  geom_point(aes(x = X, y = Y))
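
When several layers use the same columns, the mapping can also be given once in the ggplot() call, and each layer inherits it. A short sketch, reusing the mock data frame from above:

```r
library(ggplot2)

df <- data.frame(X = 0:10, Y = 0:10)

# Mappings given to ggplot() are inherited by every layer,
# so neither geom needs its own aes() here
p <- ggplot(data = df, mapping = aes(x = X, y = Y)) +
  geom_point() +
  geom_line()

print(p)
```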

A Non-Exhaustive List of Graph Types

Blank

ggplot() + geom_blank()

Points

ggplot() + geom_point(aes(x = 0:10, y = 0:10))

Jitter

(Adds a small random offset to each point, which helps when points overlap)

ggplot() + geom_jitter(aes(x = 0:10, y = 0:10))
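
The amount of jitter can be limited with the width and height arguments of geom_jitter. The values 0.3 and 0 below are arbitrary choices for illustration.

```r
library(ggplot2)

# width and height set the maximum shift in each direction, in data units.
# height = 0 keeps the y values exact and only spreads points sideways.
p <- ggplot() +
  geom_jitter(aes(x = 0:10, y = 0:10), width = 0.3, height = 0)

print(p)
```

Because the offsets are random, the picture changes slightly each time the plot is built.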

Line

ggplot() + geom_line(aes(x = 0:10, y = c(0, 3, 2, 1, 5, 2, 3, 5, 1, -2, 0)))

Bar

ggplot() + geom_bar(aes(x = c('a', 'b', 'b', 'b', 'c', 'c')))
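
geom_bar counts how many times each value appears. If the counts are already in a column, geom_col uses them directly as bar heights. A sketch with a hypothetical pre-counted data frame (the names counts, letter and n are made up for this example):

```r
library(ggplot2)

# Hypothetical pre-counted data: one row per category
counts <- data.frame(letter = c("a", "b", "c"), n = c(1, 3, 2))

# geom_col() takes the bar height from y instead of counting rows
p <- ggplot(data = counts) +
  geom_col(aes(x = letter, y = n))

print(p)
```

This draws the same bars as the geom_bar example above.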

Histogram

Like a bar graph, but for continuous variables: the values are grouped into bins, and each bar counts the values in its bin. The bin width can be specified.

# A die has faces 1 to 6; each replicate is the sum of ten rolls
thousandRollsOfTenDice <- replicate(1000, sum(sample(1:6, 10, replace = TRUE)))
ggplot() + geom_histogram(binwidth = 2, aes(x = thousandRollsOfTenDice))
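
Instead of a bin width, geom_histogram also accepts a number of bins through its bins argument. A small sketch; the variable name rolls, the seed and the choice of 15 bins are arbitrary.

```r
library(ggplot2)

# Fix the seed so the simulated rolls are the same on every run
set.seed(1)
rolls <- replicate(1000, sum(sample(1:6, 10, replace = TRUE)))

# bins fixes the number of bars; the bin width is then derived from the data range
p <- ggplot() +
  geom_histogram(aes(x = rolls), bins = 15)

print(p)
```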

Labels

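Graph labels can be changed by adding any of these functions to the plot with ‘+’:

  • ggtitle (plot title)
  • xlab (X axis label)
  • ylab (Y axis label)
  • labs (several labels in one call, e.g. title, x and y)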
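
As a quick illustration, the sketch below labels the points graph from earlier in two equivalent ways. The label text is made up for the example.

```r
library(ggplot2)

# One function per label
p <- ggplot() +
  geom_point(aes(x = 0:10, y = 0:10)) +
  ggtitle("A straight line") +  # plot title
  xlab("Input") +               # X axis label
  ylab("Output")                # Y axis label

# labs() sets the same labels in a single call
q <- ggplot() +
  geom_point(aes(x = 0:10, y = 0:10)) +
  labs(title = "A straight line", x = "Input", y = "Output")

print(q)
```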

Exercise

Make a blank graph with custom labels.

Faceting

Sometimes you want to make multiple graphs, each showing a different subset of your data. You could write a function that takes the data and the value to filter on, or you could “facet” and let ggplot do this for you.

Example

Let’s plot a histogram of the hours of sleep of various mammals.

# msleep is a dataset built into ggplot2. It becomes available once you run library(ggplot2).
ggplot(data = msleep) +
  geom_histogram(binwidth=2, aes(x = sleep_total))

Now let’s facet this graph by what the mammals eat (the “vore” column of the data).

# facet_wrap(~vore) draws one panel for each value of the vore column
ggplot(data = msleep) +
  geom_histogram(binwidth=2, aes(x = sleep_total)) +
  facet_wrap(~vore)
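
For comparison, here is a rough sketch of the manual approach mentioned at the start of this section: a function that filters the data and plots one group at a time. The helper name plot_sleep_for and its arguments are hypothetical, written only for this illustration.

```r
library(ggplot2)

# Hypothetical helper: histogram of sleep_total for a single diet group.
# Rows with a missing vore value are skipped.
plot_sleep_for <- function(data, diet) {
  subset_rows <- data[!is.na(data$vore) & data$vore == diet, ]
  ggplot(data = subset_rows) +
    geom_histogram(binwidth = 2, aes(x = sleep_total)) +
    ggtitle(diet)
}

# One call per diet group, giving a list of separate plots
diets <- sort(unique(msleep$vore[!is.na(msleep$vore)]))
plots <- lapply(diets, plot_sleep_for, data = msleep)

print(plots[[1]])
```

The single facet_wrap(~vore) line replaces both the helper and the loop, and it also keeps the panels on shared axes so they are easy to compare.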

Exercise

The ‘diamonds’ dataset is also built into ggplot2. It contains pricing and sizing data for about 54,000 diamonds (53,940 rows). Columns include:

  • carat
  • cut
  • depth
  • price
  • x/y/z (sizing info, in mm)

1. Plot the “x” size on the X axis and price on the Y axis.

2. Plot the same graph as before, faceted on cut.