“ggplot2 is a data visualization package for the statistical programming language R. Created by Hadley Wickham in 2005, ggplot2 is an implementation of Leland Wilkinson’s Grammar of Graphics—a general scheme for data visualization which breaks up graphs into semantic components such as scales and layers.” - https://en.wikipedia.org/wiki/Ggplot2

When starting to learn ggplot, remembering abbreviations can be tricky. Here’s a cheat sheet with various functions and graphing types:

https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

(Run `install.packages("ggplot2")` to install the package.)

`library(ggplot2)`

To make a graph, call `ggplot()` and then add the components you’d like to display with `+`, layering them together. The simplest graph of this form follows, plotting the integers 0 through 10 on both the x axis and the y axis.

`ggplot() + geom_point(aes(x = 0:10, y = 0:10))`
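
To make the layering idea concrete, here is a minimal sketch that stacks two components on one graph. The variable names `xs`, `ys`, and `p` are chosen here for illustration only.

```r
library(ggplot2)

# Shared values for both layers (illustrative names)
xs <- 0:10
ys <- 0:10

# Each `+` adds another layer on top of the previous one:
# first the points, then a line connecting them.
p <- ggplot() +
  geom_point(aes(x = xs, y = ys)) +
  geom_line(aes(x = xs, y = ys))

print(p)
```

Layers are drawn in the order they are added, so the line here is drawn over the points.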

We don’t have to pass literal values to `aes()`. We can also pass column names, as long as we give `ggplot()` a `data` argument holding the data frame that contains those columns.

```
# Mock data in a data frame (same "ranges" as before)
df <- data.frame(X = 0:10, Y = 0:10)
# ggplot will resolve the column names using the data frame
ggplot(data = df) +
  geom_point(aes(x = X, y = Y))
```

`ggplot() + geom_blank()`

`ggplot() + geom_point(aes(x = 0:10, y = 0:10))`

(`geom_jitter` adds a small amount of random noise to each point’s position.)

`ggplot() + geom_jitter(aes(x = 0:10, y = 0:10))`

`ggplot() + geom_line(aes(x = 0:10, y = c(0, 3, 2, 1, 5, 2, 3, 5, 1, -2, 0)))`

`ggplot() + geom_bar(aes(x = c('a', 'b', 'b', 'b', 'c', 'c')))`

A histogram is like a bar chart, but for continuous variables: values are grouped into bins and counted. The bin width can be specified with `binwidth`.

```
# Simulate 1000 throws of ten six-sided dice and sum each throw (die faces are 1 to 6)
thousandRollsOfTenDice <- replicate(1000, sum(sample(1:6, 10, replace = TRUE)))
ggplot() + geom_histogram(binwidth = 2, aes(x = thousandRollsOfTenDice))
```

Graph labels can be changed by appending the functions:

- ggtitle
- xlab
- ylab
- labs

Make a blank graph with custom labels.
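
A minimal sketch of such a graph follows. The label strings and the variable names `labelled` and `same_labels` are placeholders chosen here for illustration.

```r
library(ggplot2)

# A blank graph with a title and axis labels added one function at a time
labelled <- ggplot() +
  geom_blank() +
  ggtitle("My title") +
  xlab("My x axis") +
  ylab("My y axis")

# labs() can set the same labels in a single call
same_labels <- ggplot() +
  geom_blank() +
  labs(title = "My title", x = "My x axis", y = "My y axis")

print(labelled)
```

Both forms should produce the same labels; `labs()` is convenient when several labels change at once.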

Sometimes you want to make multiple graphs, each showing a different subset of your data. You could write a function that takes the data and a value to filter on, or you could “facet” and let ggplot do this for you.

Let’s histogram the hours of sleep of various mammals.

```
# msleep is a dataset bundled with ggplot2; it is available once you run library(ggplot2).
ggplot(data = msleep) +
  geom_histogram(binwidth = 2, aes(x = sleep_total))
```

Now let’s facet this graph by what the mammals eat (the “vore” column of the data).

```
# msleep is a dataset bundled with ggplot2; it is available once you run library(ggplot2).
ggplot(data = msleep) +
  geom_histogram(binwidth = 2, aes(x = sleep_total)) +
  facet_wrap(~vore)
```

The `diamonds` dataset is also bundled with ggplot2. It contains pricing and sizing data for almost 54,000 diamonds. Columns include:

- `carat`
- `cut`
- `depth`
- `price`
- `x` / `y` / `z` (sizing info)
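
As a sketch of how these columns can be used with the techniques above, the following plots price against carat and facets by cut. The choice of columns, the `alpha` value, and the variable name `price_by_carat` are illustrative assumptions, not the only sensible way to look at this data.

```r
library(ggplot2)

# Peek at the columns listed above (names are lower case in the data frame)
head(diamonds[, c("carat", "cut", "depth", "price", "x", "y", "z")])

# One scatter plot per cut; a low alpha makes the heavily overlapping
# points easier to read.
price_by_carat <- ggplot(data = diamonds) +
  geom_point(aes(x = carat, y = price), alpha = 0.1) +
  facet_wrap(~cut)

print(price_by_carat)
```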