Chapter 13 Data Visualization
Data visualization is a critical component of data analytics. Telling a story with data is an increasingly important part of the field and a desired skill in all of its areas. This chapter covers creating, modifying, and customizing data visualizations.
Visualization gives you answers to questions you didn’t know you had
Ben Shneiderman
13.1 Creating plots
Visualizing data is an essential part of data analytics, whether to explore the data or to convey the findings of an analysis. There are many plot types; the following are commonly used to visualize numeric and character data.
13.1.1 Create a bar plot
Description | |
---|---|
Method to create a bar plot from a single character column |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("C:/data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name) +
  geom_bar()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade) +
  geom_bar()
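To try the recipe without the CSV file, the sketch below builds a small made-up data frame and draws the same bar plot. The `toy` name and its values are assumptions for illustration, not taken from the sample data. `geom_bar()` counts the rows in each category, and `layer_data()` lets you inspect those counts.

```r
library(ggplot2)

# Hypothetical stand-in for the sample data; values are invented
toy <- data.frame(
  construction_grade = c("Average", "Average", "Good", "Good", "Good", "Excellent")
)

# Same recipe as above: one character column mapped to x
p_bar <- ggplot(toy, aes(x = construction_grade)) +
  geom_bar()

# One count per category; categories are ordered alphabetically on the x axis
bar_counts <- layer_data(p_bar)$count
```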
13.1.2 Create a histogram plot
Description | |
---|---|
Method to create a histogram plot from a single numeric column |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("C:/data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name) +
  geom_histogram()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms) +
  geom_histogram()
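By default `geom_histogram()` uses 30 bins and prints a message suggesting you pick a better value. For a whole-number column such as `bedrooms`, one bin per value is often easier to read. A minimal sketch, assuming an invented `toy` data frame in place of the sample file:

```r
library(ggplot2)

# Hypothetical stand-in data; values are invented
toy <- data.frame(bedrooms = c(1, 2, 2, 3, 3, 3, 4, 4, 5))

# binwidth = 1 gives one bar per bedroom count
p_hist <- ggplot(toy, aes(x = bedrooms)) +
  geom_histogram(binwidth = 1)

# Counts computed for each bin
hist_counts <- layer_data(p_hist)$count
```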
13.1.3 Create a line plot
Description | |
---|---|
Method to create a line plot from two numeric columns |
Ingredients | |
---|---|
Package | Data |
readr | sample_parcel_assessed_value.csv |
Preparation
df <- readr::read_csv("C:/data/sample_parcel_assessed_value.csv") %>%
  dplyr::group_by(tax_year) %>%
  dplyr::tally() %>%
  dplyr::ungroup()
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2) +
  geom_line()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = tax_year, y = n) +
  geom_line()
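The Preparation step relies on `dplyr::tally()` creating a count column named `n`, which the recipe then maps to `y`. The sketch below shows this with a made-up `toy` data frame (hypothetical values) standing in for the parcel file.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in data: one row per assessed parcel
toy <- data.frame(tax_year = c(2019, 2019, 2020, 2020, 2020, 2021))

# Count rows per year; tally() names the count column `n`
yearly <- toy %>%
  group_by(tax_year) %>%
  tally() %>%
  ungroup()

# Plot the yearly counts as a line
p_line <- yearly %>%
  ggplot() +
  aes(x = tax_year, y = n) +
  geom_line()
```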
13.1.4 Create a scatter plot
Description | |
---|---|
Method to create a scatter plot from two numeric columns |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("C:/data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2) +
  geom_point()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms) +
  geom_point()
13.1.5 Create a box plot
Description | |
---|---|
Method to create a box plot from both a character and numeric column |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("C:/data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2) +
  geom_boxplot()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade, y = bedrooms) +
  geom_boxplot()
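A box plot summarizes the numeric column within each category: the line inside the box is the median and the box edges are the quartiles. The sketch below uses an invented `toy` data frame (an assumption, not the sample data) and reads the computed statistics back with `layer_data()`.

```r
library(ggplot2)

# Hypothetical stand-in data; values are invented
toy <- data.frame(
  construction_grade = rep(c("Average", "Good"), each = 5),
  bedrooms = c(2, 3, 3, 4, 5, 3, 4, 4, 5, 6)
)

p_box <- ggplot(toy, aes(x = construction_grade, y = bedrooms)) +
  geom_boxplot()

# One row per category: whisker ends, quartiles, and median
box_stats <- layer_data(p_box)[, c("ymin", "lower", "middle", "upper", "ymax")]
```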
13.2 Visualizing dimensions
Mapping additional dimensions (columns) to a plot can reveal deeper insights. Dimensions are typically used to change the colour or size of a plot’s geometry.
13.2.1 Apply single fill colour to a plot
Description | |
---|---|
Method to add a single colour to a bar plot |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name) +
  geom_bar(fill = "hex colour")
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade) +
  geom_bar(fill = "#484848")
13.2.2 Apply a single colour to a line/point plot
Description | |
---|---|
Method to add a single colour to a scatter plot |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2) +
  geom_point(colour = "hex colour")
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms) +
  geom_point(colour = "#484848")
13.2.3 Change plot fill colour by column values
Description | |
---|---|
Method to change the fill colour of a bar plot by the values in a column |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, fill = column_name2) +
  geom_bar()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade, fill = garage) +
  geom_bar()
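The difference between a single fill colour and a fill driven by a column comes down to where `fill` is written: outside `aes()` it is a constant, inside `aes()` it is mapped to the data and a legend is added. A minimal sketch, assuming an invented `toy` data frame whose `garage` values are hypothetical:

```r
library(ggplot2)

# Hypothetical stand-in data; the garage values are invented
toy <- data.frame(
  construction_grade = c("Average", "Average", "Good", "Good"),
  garage = c("Y", "N", "Y", "N")
)

# Constant: set outside aes(), so every bar gets the same colour
p_const <- ggplot(toy, aes(x = construction_grade)) +
  geom_bar(fill = "#484848")

# Mapped: set inside aes(), so each garage value gets its own colour
p_mapped <- ggplot(toy, aes(x = construction_grade, fill = garage)) +
  geom_bar()

const_fills <- unique(layer_data(p_const)$fill)
mapped_fills <- unique(layer_data(p_mapped)$fill)
```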
13.2.4 Change plot line/point colour by column values
Description | |
---|---|
Method to change the colour of a scatter plot by the values in a column |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point()
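Because `bathrooms` is numeric, ggplot2 colours the points along a continuous gradient. If you would rather have one distinct colour per value, wrapping the column in `factor()` is one option. The sketch below assumes an invented `toy` data frame; it is an illustration, not part of the original recipe.

```r
library(ggplot2)

# Hypothetical stand-in data; values are invented
toy <- data.frame(
  bedrooms = c(2, 3, 3, 4, 5),
  bathrooms = c(1, 1, 2, 2, 3)
)

# Numeric column: continuous colour gradient
p_gradient <- ggplot(toy, aes(x = bedrooms, y = bathrooms, colour = bathrooms)) +
  geom_point()

# Same column as a factor: one discrete colour per distinct value
p_discrete <- ggplot(toy, aes(x = bedrooms, y = bathrooms, colour = factor(bathrooms))) +
  geom_point()

n_colours <- length(unique(layer_data(p_discrete)$colour))
```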
13.2.5 Change point plot size by numeric column
Description | |
---|---|
Method to change the size of points of a scatter plot by the value in a column |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, size = column_name3) +
  geom_point()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, size = bathrooms) +
  geom_point()
13.3 Add text to a plot
Text provides context, helping users understand the purpose and specific details of the plot.
13.3.1 Add title to a plot
Description | |
---|---|
Method to add title to a plot |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("~/Github/r-recipe-book/data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  labs(title = "text")
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  labs(title = "Relationship of Bedrooms to Bathrooms")
13.3.2 Add subtitle to a plot
Description | |
---|---|
Method to add subtitle to a plot |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  labs(subtitle = "text")
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  labs(subtitle = "Three HRM Neighbourhoods")
13.3.3 Add x-axis label to a plot
Description | |
---|---|
Method to add x-axis label to a plot |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  labs(x = "text")
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  labs(x = "Bedrooms")
13.3.4 Add y-axis label to a plot
Description | |
---|---|
Method to add y-axis label to a plot |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  labs(y = "text")
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  labs(y = "Bathrooms")
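The four text recipes above do not need four separate calls: `labs()` accepts the title, subtitle, and both axis labels together. A minimal sketch, assuming an invented `toy` data frame and a placeholder subtitle (neither comes from the sample data):

```r
library(ggplot2)

# Hypothetical stand-in data; values are invented
toy <- data.frame(
  bedrooms = c(2, 3, 3, 4, 5),
  bathrooms = c(1, 1, 2, 2, 3)
)

# Title, subtitle, and axis labels set in a single labs() call
p_labelled <- ggplot(toy, aes(x = bedrooms, y = bathrooms)) +
  geom_point() +
  labs(
    title = "Relationship of Bedrooms to Bathrooms",
    subtitle = "Invented example data",
    x = "Bedrooms",
    y = "Bathrooms"
  )
```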
13.4 Customizing a plot
The default ggplot2 settings generate acceptable plots for many situations, such as data exploration and internal communication, but in many circumstances you will want to change the defaults to meet specific design needs. Customizing the look and style of a data visualization helps your plot stand out and grab the user’s attention.
13.4.1 Add theme to a plot
Description | |
---|---|
Method to add theme to a plot |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme_minimal()
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme_minimal()
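When a complete theme such as `theme_minimal()` is combined with individual `theme()` tweaks, order matters: a complete theme replaces any theme settings added before it. The sketch below illustrates this with an invented `toy` data frame; the object names are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in data; values are invented
toy <- data.frame(
  bedrooms = c(2, 3, 3, 4, 5),
  bathrooms = c(1, 1, 2, 2, 3)
)

base_plot <- ggplot(toy, aes(x = bedrooms, y = bathrooms, colour = bathrooms)) +
  geom_point()

# Tweak added after the complete theme: the tweak is kept
p_kept <- base_plot +
  theme_minimal() +
  theme(legend.position = "bottom")

# Tweak added before the complete theme: theme_minimal() replaces it
p_lost <- base_plot +
  theme(legend.position = "bottom") +
  theme_minimal()
```

For this reason it is usually safest to add the complete theme first and the individual adjustments afterwards.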
13.4.2 Remove major x/y grid line
Description | |
---|---|
Method to remove major x/y grid line |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_blank()
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_blank()
  )
13.4.3 Remove minor x/y grid line
Description | |
---|---|
Method to remove minor x/y grid line |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    panel.grid.minor.x = element_blank(),
    panel.grid.minor.y = element_blank()
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    panel.grid.minor.x = element_blank(),
    panel.grid.minor.y = element_blank()
  )
13.4.4 Change major x/y grid line (colour/type/size)
Description | |
---|---|
Method to change major x/y grid line (colour/type/size) |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    panel.grid.major.x = ggplot2::element_line(colour = "hex colour", linetype = "type", size = number),
    panel.grid.major.y = ggplot2::element_line(colour = "hex colour", linetype = "type", size = number)
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    panel.grid.major.x = ggplot2::element_line(colour = "#484848", linetype = "solid", size = 0.3),
    panel.grid.major.y = ggplot2::element_line(colour = "#484848", linetype = "solid", size = 0.3)
  )
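On recent ggplot2 releases (3.4.0 and later), the `size` argument of `element_line()` is deprecated in favour of `linewidth`, so the recipe above may emit a deprecation warning. The sketch below shows the same grid line settings with `linewidth`, assuming such a version is installed and using an invented `toy` data frame.

```r
library(ggplot2)

# Hypothetical stand-in data; values are invented
toy <- data.frame(
  bedrooms = c(2, 3, 3, 4, 5),
  bathrooms = c(1, 1, 2, 2, 3)
)

# Assumes ggplot2 >= 3.4.0, where line thickness is set with `linewidth`
p_grid <- ggplot(toy, aes(x = bedrooms, y = bathrooms, colour = bathrooms)) +
  geom_point() +
  theme(
    panel.grid.major.x = element_line(colour = "#484848", linetype = "solid", linewidth = 0.3),
    panel.grid.major.y = element_line(colour = "#484848", linetype = "solid", linewidth = 0.3)
  )
```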
13.4.5 Change minor x/y grid line (colour/type/size)
Description | |
---|---|
Method to change minor x/y grid line (colour/type/size) |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    panel.grid.minor.x = ggplot2::element_line(colour = "hex colour", linetype = "type", size = number),
    panel.grid.minor.y = ggplot2::element_line(colour = "hex colour", linetype = "type", size = number)
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    panel.grid.minor.x = ggplot2::element_line(colour = "#484848", linetype = "solid", size = 0.1),
    panel.grid.minor.y = ggplot2::element_line(colour = "#484848", linetype = "solid", size = 0.1)
  )
13.4.6 Change legend position
Description | |
---|---|
Method to change legend position |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    legend.position = "location"
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    legend.position = "bottom"
  )
13.4.7 Remove legend
Description | |
---|---|
Method to remove legend |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    legend.position = "none"
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    legend.position = "none"
  )
13.4.8 Change legend font (colour/style)
Description | |
---|---|
Method to change legend font (colour/style) |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    legend.title = ggplot2::element_text(colour = "hex colour", face = "style"),
    legend.text = ggplot2::element_text(colour = "hex colour", face = "style")
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    legend.title = ggplot2::element_text(colour = "#484848", face = "bold"),
    legend.text = ggplot2::element_text(colour = "#484848", face = "italic")
  )
13.4.9 Remove legend title
Description | |
---|---|
Method to remove legend title |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    legend.title = element_blank()
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    legend.title = element_blank()
  )
13.4.10 Remove x/y axis text
Description | |
---|---|
Method to remove x/y axis text |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    axis.text.x = element_blank(),
    axis.text.y = element_blank()
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    axis.text.x = element_blank(),
    axis.text.y = element_blank()
  )
13.4.11 Change x/y axis font (colour/size)
Description | |
---|---|
Method to change x/y axis font (colour/size) |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("data/sample_dwelling_characteristics.csv")
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    axis.text.x = ggplot2::element_text(colour = "hex colour", size = number),
    axis.text.y = ggplot2::element_text(colour = "hex colour", size = number)
  )
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    axis.text.x = ggplot2::element_text(colour = "#484848", size = 8),
    axis.text.y = ggplot2::element_text(colour = "#484848", size = 8)
  )
13.4.12 Wrap x axis text
Description | |
---|---|
Method to wrap x axis text |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("~/Github/r-recipe-book/data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name) +
  geom_bar(fill = "hex colour") +
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = number))
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade) +
  geom_bar(fill = "#1E6E6E") +
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 10))
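The wrapping itself is done by `str_wrap()` from the stringr package, which inserts line breaks so that each line stays within the target width. A short sketch on an invented label (the label text is an assumption, not a value from the sample data):

```r
library(stringr)

# Hypothetical axis label; each word ends up on its own line at width 10
wrapped <- str_wrap("Above Average Quality", width = 10)

# cat() renders the embedded line breaks
cat(wrapped)
```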
13.4.13 Stitch it all together
Description | |
---|---|
Method to stitch all ggplot2 options together |
Ingredients | |
---|---|
Package | Data |
readr | sample_dwelling_characteristics.csv |
Preparation
df <- readr::read_csv("~/Github/r-recipe-book/data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))
Sample Instructions
data %>%
  package::function() +
  aes(x = column_name) +
  geom_bar(fill = "hex colour") +
  labs(title = "text",
       subtitle = "text",
       x = "text",
       y = "text",
       caption = "text") +
  theme(
    # Text -- title, subtitle, labels, and caption
    plot.title = ggplot2::element_text(colour = "hex colour", size = number, face = "text"),
    plot.subtitle = ggplot2::element_text(colour = "hex colour", size = number),
    plot.caption = ggplot2::element_text(colour = "hex colour", size = number),
    axis.text.x = ggplot2::element_text(colour = "hex colour", size = number),
    axis.text.y = ggplot2::element_text(colour = "hex colour", size = number),
    # Chart elements -- grids
    panel.grid.major.x = ggplot2::element_blank(),
    panel.grid.major.y = ggplot2::element_line(colour = "hex colour", linetype = "type", size = number),
    panel.grid.minor.x = ggplot2::element_blank(),
    panel.grid.minor.y = ggplot2::element_blank(),
    panel.background = element_blank()
  ) +
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = number))
Actual Instructions
df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade) +
  geom_bar(fill = "#1E6E6E") +
  labs(title = "Distribution of Housing Construction Grade",
       subtitle = "Three HRM Neighbourhoods",
       x = "Construction Grade",
       y = "Number of Houses",
       caption = "Created with ggplot2") +
  theme(
    # Text -- title, subtitle, labels, and caption
    plot.title = ggplot2::element_text(colour = "#484848", size = 14, face = "bold"),
    plot.subtitle = ggplot2::element_text(colour = "#484848", size = 12),
    plot.caption = ggplot2::element_text(colour = "#484848", size = 6),
    axis.text.x = ggplot2::element_text(colour = "#484848", size = 8),
    axis.text.y = ggplot2::element_text(colour = "#484848", size = 8),
    # Chart elements -- grids
    panel.grid.major.x = ggplot2::element_blank(),
    panel.grid.major.y = ggplot2::element_line(colour = "#484848", linetype = "dotted", size = 0.2),
    panel.grid.minor.x = ggplot2::element_blank(),
    panel.grid.minor.y = ggplot2::element_blank(),
    panel.background = element_blank()
  ) +
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 10))
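If several plots share the same styling, one option is to store the `theme()` call in an object and add it to each plot. The sketch below is an illustration only: the `recipe_theme` name and the `toy` data frame are hypothetical, and only a subset of the settings above is included.

```r
library(ggplot2)

# Hypothetical stand-in data; values are invented
toy <- data.frame(construction_grade = c("Average", "Good", "Good"))

# Hypothetical name: a theme object holding the shared settings
recipe_theme <- theme(
  plot.title = element_text(colour = "#484848", size = 14, face = "bold"),
  axis.text.x = element_text(colour = "#484848", size = 8),
  axis.text.y = element_text(colour = "#484848", size = 8),
  panel.grid.major.x = element_blank(),
  panel.grid.minor.x = element_blank(),
  panel.grid.minor.y = element_blank(),
  panel.background = element_blank()
)

# The stored theme is added like any other plot component
p_styled <- ggplot(toy, aes(x = construction_grade)) +
  geom_bar(fill = "#1E6E6E") +
  labs(title = "Invented example data") +
  recipe_theme
```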