Chapter 13 Data Visualization
Data visualization is a critical component of data analytics. Telling a story with data is an increasingly important skill in every area of the field. This chapter covers creating, modifying, and customizing data visualizations.
Visualization gives you answers to questions you didn’t know you had
Ben Shneiderman
13.1 Creating plots
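Visualizing data is an essential part of data analytics, whether during data exploration or when conveying the findings of an analysis. There are many different plot types; the following are common plots used to visualize numeric and character data. The recipes in this chapter call `%>%`, `aes()`, and the `geom_*()` functions without a package prefix, so they assume dplyr and ggplot2 are attached with `library()`.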
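The recipes in this section all follow the same pattern: a data frame, an `aes()` mapping, and one geom that decides the plot type. The sketch below illustrates the pattern with a small made-up data frame so that it runs without the sample CSV files. The `homes` data frame and its values are assumptions for illustration, not the book's data.

```r
library(ggplot2)

# Hypothetical stand-in for the sample dwelling data (values are made up)
homes <- data.frame(
  grade    = c("Low", "Average", "Average", "Good", "Good", "Good"),
  bedrooms = c(2, 3, 3, 4, 3, 5)
)

# Same data, different geoms: the geom decides the plot type
bar_plot  <- ggplot(homes, aes(x = grade)) + geom_bar()
hist_plot <- ggplot(homes, aes(x = bedrooms)) + geom_histogram(binwidth = 1)
box_plot  <- ggplot(homes, aes(x = grade, y = bedrooms)) + geom_boxplot()
```

Printing any of these objects, for example `print(bar_plot)`, draws the plot. `geom_bar()` counts the rows per category itself, which is why the bar plot needs only an `x` mapping.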
13.1.1 Create a bar plot
| Description | |
|---|---|
| Method to create a bar plot from a single character column |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("C:/data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name) +
  geom_bar()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade) +
  geom_bar()
13.1.2 Create a histogram plot
| Description | |
|---|---|
| Method to create a histogram plot from a single numeric column |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("C:/data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name) +
  geom_histogram()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms) +
  geom_histogram()
13.1.3 Create a line plot
| Description | |
|---|---|
| Method to create a line plot from two numeric columns |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_parcel_assessed_value.csv |
Preparation
df <- readr::read_csv("C:/data/sample_parcel_assessed_value.csv") %>%
  dplyr::group_by(tax_year) %>%
  dplyr::tally() %>%
  dplyr::ungroup()
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2) +
  geom_line()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = tax_year, y = n) +
  geom_line()
13.1.4 Create a scatter plot
| Description | |
|---|---|
| Method to create a scatter plot from two numeric columns |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("C:/data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2) +
  geom_point()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms) +
  geom_point()
13.1.5 Create a box plot
| Description | |
|---|---|
| Method to create a box plot from both a character and numeric column |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("C:/data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2) +
  geom_boxplot()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade, y = bedrooms) +
  geom_boxplot()
13.2 Visualizing dimensions
Mapping additional columns (dimensions) onto a plot can reveal patterns that the basic plot hides. Dimensions are typically used to change the colour or size of a plot’s geometry.
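The recipes in this section use colour in two different ways: as a constant set on the geom, and as a column mapped inside `aes()`. The sketch below contrasts the two on a small made-up data frame. The `homes` data frame and its values, including the `garage` values, are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the sample dwelling data (values are made up)
homes <- data.frame(
  bedrooms  = c(2, 3, 3, 4, 5),
  bathrooms = c(1, 1, 2, 2, 3),
  garage    = c("No", "No", "Yes", "Yes", "Yes")
)

# Constant: set outside aes(), every point gets the same colour, no legend
fixed_colour <- ggplot(homes, aes(x = bedrooms, y = bathrooms)) +
  geom_point(colour = "#484848")

# Mapped: set inside aes(), colour varies with the column and a legend is drawn
mapped_colour <- ggplot(homes, aes(x = bedrooms, y = bathrooms, colour = garage)) +
  geom_point()
```

A hex colour placed inside `aes()` would be treated as a data value rather than a colour, so constants belong on the geom and column names belong in `aes()`.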
13.2.1 Apply single fill colour to a plot
| Description | |
|---|---|
| Method to add a single colour to a bar plot |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name) +
  geom_bar(fill = "hex colour")
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade) +
  geom_bar(fill = "#484848")
13.2.2 Apply a single colour to a line/point plot
| Description | |
|---|---|
| Method to add a single colour to a scatter plot |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2) +
  geom_point(colour = "hex colour")
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms) +
  geom_point(colour = "#484848")
13.2.3 Change plot fill colour by column values
| Description | |
|---|---|
| Method to change the fill colour of a bar plot by the values in a column |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, fill = column_name2) +
  geom_bar()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade, fill = garage) +
  geom_bar()
13.2.4 Change plot line/point colour by column values
| Description | |
|---|---|
| Method to change the colour of a scatter plot by the values in a column |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point()
13.2.5 Change point plot size by numeric column
| Description | |
|---|---|
| Method to change the size of points of a scatter plot by the value in a column |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, size = column_name3) +
  geom_point()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, size = bathrooms) +
  geom_point()
13.3 Add text to a plot
Text provides context, helping readers understand the purpose of a plot and the specific details it shows.
13.3.1 Add title to a plot
| Description | |
|---|---|
| Method to add title to a plot |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("~/Github/r-recipe-book/data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  labs(title = "text")
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  labs(title = "Relationship of Bedrooms to Bathrooms")
13.3.2 Add subtitle to a plot
| Description | |
|---|---|
| Method to add subtitle to a plot |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  labs(subtitle = "text")
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  labs(subtitle = "Three HRM Neighbourhoods")
13.3.3 Add x-axis label to a plot
| Description | |
|---|---|
| Method to add x-axis label to a plot |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  labs(x = "text")
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  labs(x = "Bedrooms")
13.3.4 Add y-axis label to a plot
| Description | |
|---|---|
| Method to add y-axis label to a plot |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  labs(y = "text")
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  labs(y = "Bathrooms")
13.3.5 Add caption to a plot
| Description | |
|---|---|
| Method to add caption to a plot |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  labs(caption = "text")
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  labs(caption = "Created with ggplot2")
13.4 Customizing a plot
With its default settings, ggplot2 generates acceptable plots for many situations, such as data exploration and internal communication, but in many circumstances you will want to change the defaults to meet specific design needs. Customizing the look and style of a data visualization helps your plot stand out and grab the reader’s attention.
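When the same design needs apply to several plots, one option is to store the customizations in a variable and add that variable to each plot. The sketch below shows the idea on a small made-up data frame. The names `homes` and `house_theme` and the chosen settings are assumptions for illustration, not part of the recipes.

```r
library(ggplot2)

# Hypothetical stand-in for the sample dwelling data (values are made up)
homes <- data.frame(
  bedrooms  = c(2, 3, 3, 4, 5),
  bathrooms = c(1, 1, 2, 2, 3)
)

# A reusable set of customizations: a built-in theme plus two overrides
house_theme <- theme_minimal() +
  theme(
    legend.position  = "bottom",
    panel.grid.minor = element_blank()
  )

# The same theme object can be added to any number of plots
scatter <- ggplot(homes, aes(x = bedrooms, y = bathrooms, colour = bathrooms)) +
  geom_point() +
  house_theme

bars <- ggplot(homes, aes(x = bedrooms)) +
  geom_bar() +
  house_theme
```

Settings passed to `theme()` after a complete theme such as `theme_minimal()` override that theme's values, so the order of the two calls matters.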
13.4.1 Add theme to a plot
| Description | |
|---|---|
| Method to add theme to a plot |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme_minimal()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  theme_minimal()
13.4.2 Remove major x/y grid line
| Description | |
|---|---|
| Method to remove major x/y grid line |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_blank()
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_blank()
  )
13.4.3 Remove minor x/y grid line
| Description | |
|---|---|
| Method to remove minor x/y grid line |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    panel.grid.minor.x = element_blank(),
    panel.grid.minor.y = element_blank()
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  theme(
    panel.grid.minor.x = element_blank(),
    panel.grid.minor.y = element_blank()
  )
13.4.4 Change major x/y grid line (colour/type/size)
| Description | |
|---|---|
| Method to change major x/y grid line (colour/style) |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    panel.grid.major.x = ggplot2::element_line(colour = "hex colour", linetype = "type", size = number),
    panel.grid.major.y = ggplot2::element_line(colour = "hex colour", linetype = "type", size = number)
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  theme(
    panel.grid.major.x = ggplot2::element_line(colour = "#484848", linetype = "solid", size = 0.3),
    panel.grid.major.y = ggplot2::element_line(colour = "#484848", linetype = "solid", size = 0.3)
  )
13.4.5 Change minor x/y grid line (colour/type/size)
| Description | |
|---|---|
| Method to change minor x/y grid line (colour/style) |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    panel.grid.minor.x = ggplot2::element_line(colour = "hex colour", linetype = "type", size = number),
    panel.grid.minor.y = ggplot2::element_line(colour = "hex colour", linetype = "type", size = number)
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  theme(
    panel.grid.minor.x = ggplot2::element_line(colour = "#484848", linetype = "solid", size = 0.1),
    panel.grid.minor.y = ggplot2::element_line(colour = "#484848", linetype = "solid", size = 0.1)
  )
13.4.6 Change legend position
| Description | |
|---|---|
| Method to change legend position |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    legend.position = "location"
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  theme(
    legend.position = "bottom"
  )
13.4.7 Remove legend
| Description | |
|---|---|
| Method to remove legend |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    legend.position = "none"
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  theme(
    legend.position = "none"
  )
13.4.8 Change legend font (colour/style)
| Description | |
|---|---|
| Method to change legend font (colour/style) |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    legend.text = ggplot2::element_text(colour = "hex colour", face = "style")
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  theme(
    legend.text = ggplot2::element_text(colour = "#484848", face = "bold")
  )
13.4.9 Remove legend title
| Description | |
|---|---|
| Method to remove legend title |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    legend.title = element_blank()
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  theme(
    legend.title = element_blank()
  )
13.4.10 Remove x/y axis text
| Description | |
|---|---|
| Method to remove x/y axis text |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    axis.text.x = element_blank(),
    axis.text.y = element_blank()
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  theme(
    axis.text.x = element_blank(),
    axis.text.y = element_blank()
  )
13.4.11 Change x/y axis font (colour/size)
| Description | |
|---|---|
| Method to change x/y axis font (colour/size) |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    axis.text.x = ggplot2::element_text(colour = "hex colour", size = number),
    axis.text.y = ggplot2::element_text(colour = "hex colour", size = number)
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, colour = bathrooms) +
  geom_point() +
  theme(
    axis.text.x = ggplot2::element_text(colour = "#484848", size = 8),
    axis.text.y = ggplot2::element_text(colour = "#484848", size = 8)
  )
13.4.12 Wrap x axis text
| Description | |
|---|---|
| Method to wrap x axis text |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("~/Github/r-recipe-book/data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name) +
  geom_bar(fill = "hex colour") +
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = number))
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade) +
  geom_bar(fill = "#1E6E6E") +
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 10))
13.4.13 Stitch it all together
| Description | |
|---|---|
| Method to stitch all ggplot2 options together |
| Ingredients | |
|---|---|
| Package | Data |
| readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("~/Github/r-recipe-book/data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name) +
  geom_bar(fill = "hex colour") +
  labs(title = "text",
       subtitle = "text",
       x = "text",
       y = "text",
       caption = "text") +
  theme(
    # Text -- title, subtitle, labels, and caption
    plot.title = ggplot2::element_text(colour = "hex colour", size = number, face = "text"),
    plot.subtitle = ggplot2::element_text(colour = "hex colour", size = number),
    plot.caption = ggplot2::element_text(colour = "hex colour", size = number),
    axis.text.x = ggplot2::element_text(colour = "hex colour", size = number),
    axis.text.y = ggplot2::element_text(colour = "hex colour", size = number),
    # Chart elements -- grids
    panel.grid.major.x = ggplot2::element_blank(),
    panel.grid.major.y = ggplot2::element_line(colour = "hex colour", linetype = "type", size = number),
    panel.grid.minor.x = ggplot2::element_blank(),
    panel.grid.minor.y = ggplot2::element_blank(),
    panel.background = element_blank()
  ) +
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = number))
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade) +
  geom_bar(fill = "#1E6E6E") +
  labs(title = "Distribution of Housing Construction Grade",
       subtitle = "Three HRM Neighbourhoods",
       x = "Construction Grade",
       y = "Number of Houses",
       caption = "Created with ggplot2") +
  theme(
    # Text -- title, subtitle, labels, and caption
    plot.title = ggplot2::element_text(colour = "#484848", size = 14, face = "bold"),
    plot.subtitle = ggplot2::element_text(colour = "#484848", size = 12),
    plot.caption = ggplot2::element_text(colour = "#484848", size = 6),
    axis.text.x = ggplot2::element_text(colour = "#484848", size = 8),
    axis.text.y = ggplot2::element_text(colour = "#484848", size = 8),
    # Chart elements -- grids
    panel.grid.major.x = ggplot2::element_blank(),
    panel.grid.major.y = ggplot2::element_line(colour = "#484848", linetype = "dotted", size = 0.2),
    panel.grid.minor.x = ggplot2::element_blank(),
    panel.grid.minor.y = ggplot2::element_blank(),
    panel.background = element_blank()
  ) +
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 10))
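
The `str_wrap()` function used for the x-axis labels comes from the stringr package. It inserts line breaks so that each line of a label stays within the given width. The sketch below shows what the labelling function returns when it is called on its own; the label text and the `wrap_labels` name are made up for illustration and are not values from the sample data.

```r
library(stringr)

# Hypothetical axis labels (not taken from the sample data)
labels <- c("Low", "Above Average Quality")

# The same kind of function that is passed to scale_x_discrete(labels = ...)
wrap_labels <- function(x) str_wrap(x, width = 10)

wrapped <- wrap_labels(labels)

# cat() shows the inserted line breaks the way the axis would render them
cat(wrapped, sep = "\n---\n")
```

The short label is returned unchanged. In the long label no two neighbouring words fit within 10 characters, so each word ends up on its own line.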