Chapter 13 Data Visualization

Data visualization is a critical component of data analytics. Telling a story with data is an increasingly important part of the field and a sought-after skill in all of its areas. This chapter covers creating, modifying, and customizing data visualizations.

Visualization gives you answers to questions you didn’t know you had
Ben Shneiderman

13.1 Creating plots

Visualizing data is an essential part of data analytics, whether during data exploration or when conveying the findings of an analysis. There are many different plot types; the following are common plots used to visualize numeric and character data.

13.1.1 Create a bar plot

Description
Method to create a bar plot from a single character column
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("C:/data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name) +
  geom_bar()


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade) +
  geom_bar()
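geom_bar() counts the rows in each category itself. When the counts have already been computed, for example with dplyr::count(), geom_col() draws the same bars from a precomputed y column. The minimal sketch below uses R's built-in mtcars data (cylinder count as the category) as a stand-in for the sample CSV, and compares the bar heights ggplot2 computes for each approach.

```r
library(ggplot2)
library(dplyr)

# Stand-in data: mtcars, with cylinder count treated as a category
cyl_data <- mtcars %>% dplyr::mutate(cyl = factor(cyl))

# geom_bar() computes the number of rows per category itself
p_bar <- ggplot2::ggplot(cyl_data) +
  aes(x = cyl) +
  geom_bar()

# geom_col() expects the counts to be supplied as the y aesthetic
counts <- cyl_data %>% dplyr::count(cyl)
p_col <- ggplot2::ggplot(counts) +
  aes(x = cyl, y = n) +
  geom_col()

# Bar heights taken from each plot's computed layer data
bar_heights <- ggplot2::layer_data(p_bar)$count
col_heights <- ggplot2::layer_data(p_col)$y
```

Either form draws the same bars; geom_col() is convenient when the counts come from an earlier summary step.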

13.1.2 Create a histogram plot

Description
Method to create a histogram plot from a single numeric column
Ingredients
Package Data

readr
magrittr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("C:/data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name) +
  geom_histogram()


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms) +
  geom_histogram()
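With the default settings, geom_histogram() uses 30 bins and prints a message suggesting a better binwidth. For a column that only holds whole numbers, such as a bedroom count, binwidth = 1 gives one bar per value. This sketch assumes mtcars$gear (whole numbers from 3 to 5) as a stand-in for bedrooms.

```r
library(ggplot2)

# Stand-in data: mtcars$gear only takes the whole-number values 3, 4 and 5
p_binned <- ggplot2::ggplot(mtcars) +
  aes(x = gear) +
  geom_histogram(binwidth = 1)  # one bin per whole-number value

# Row counts that stat_bin() computed for each bin
bin_counts <- ggplot2::layer_data(p_binned)$count
```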

13.1.3 Create a line plot

Description
Method to create a line plot from two numeric columns
Ingredients
Package Data

readr
dplyr
ggplot2

sample_parcel_assessed_value.csv


Preparation

df <- readr::read_csv("C:/data/sample_parcel_assessed_value.csv") %>%
  dplyr::group_by(tax_year) %>%
  dplyr::tally() %>%
  dplyr::ungroup()


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2) +
  geom_line()


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = tax_year, y = n) +
  geom_line()
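The group_by(), tally(), ungroup() chain in the preparation step counts the rows for each tax year. dplyr::count() is a one-call shorthand for the same result. A quick comparison, using mtcars and its cyl column as a stand-in for tax_year:

```r
library(dplyr)

# The three-step version used in the preparation step
tallied <- mtcars %>%
  dplyr::group_by(cyl) %>%
  dplyr::tally() %>%
  dplyr::ungroup()

# The one-step shorthand: one row per cyl value, with the count in column n
counted <- mtcars %>%
  dplyr::count(cyl)
```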

13.1.4 Create a scatter plot

Description
Method to create a scatter plot from two numeric columns
Ingredients
Package Data

readr
magrittr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("C:/data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2) +
  geom_point()


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms) +
  geom_point()

13.1.5 Create a box plot

Description
Method to create a box plot from a character column and a numeric column
Ingredients
Package Data

readr
magrittr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("C:/data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2) +
  geom_boxplot()


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade, y = bedrooms) +
  geom_boxplot()

13.2 Visualizing dimensions

Adding dimensions to a visualization can provide deeper insights. Dimensions are typically represented by changing the colour or size of a plot's geometry.
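The recipes in this section use colour in two ways. A fixed value passed to the geom outside aes(), as in geom_point(colour = "#484848"), sets one colour for every point. A column named inside aes(), as in aes(colour = bathrooms), maps that column's values to colours and adds a legend. The sketch below, which uses mtcars as stand-in data, counts the distinct colours ggplot2 assigns in each case.

```r
library(ggplot2)

# Setting: a fixed colour passed to the geom, outside aes()
p_set <- ggplot2::ggplot(mtcars) +
  aes(x = wt, y = mpg) +
  geom_point(colour = "#484848")

# Mapping: a column named inside aes(), so colour varies with the data
p_mapped <- ggplot2::ggplot(mtcars) +
  aes(x = wt, y = mpg, colour = factor(cyl)) +
  geom_point()

# Distinct colours ggplot2 assigned to the points in each built plot
set_colours <- unique(ggplot2::layer_data(p_set)$colour)
mapped_colours <- unique(ggplot2::layer_data(p_mapped)$colour)
```

The set version yields a single colour; the mapped version yields one colour per cylinder group.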

13.2.1 Apply a single fill colour to a plot

Description
Method to add a single colour to a bar plot
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name) +
  geom_bar(fill = "hex colour")


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade) +
  geom_bar(fill = "#484848")

13.2.2 Apply a single colour to a line/point plot

Description
Method to add a single colour to a scatter plot
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2) +
  geom_point(color = "hex colour")


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms) +
  geom_point(colour = "#484848")

13.2.3 Change plot fill colour by column values

Description
Method to change the fill colour of a bar plot by the values in a column
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, fill = column_name2) +
  geom_bar()


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade, fill = garage) +
  geom_bar()
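When fill is mapped to a column, geom_bar() stacks the coloured segments by default. Passing position = "dodge" places the groups side by side instead, which can make individual groups easier to compare. In this sketch mtcars stands in for the sample data, with transmission type (am) playing the role of garage; the last two lines record the tallest bar in each version.

```r
library(ggplot2)

# Stacked (default): one bar per cylinder count, split by transmission type
p_stack <- ggplot2::ggplot(mtcars) +
  aes(x = factor(cyl), fill = factor(am)) +
  geom_bar()

# Dodged: the transmission groups sit next to each other
p_dodge <- ggplot2::ggplot(mtcars) +
  aes(x = factor(cyl), fill = factor(am)) +
  geom_bar(position = "dodge")

# Top of the tallest bar in each version
stack_max <- max(ggplot2::layer_data(p_stack)$ymax)
dodge_max <- max(ggplot2::layer_data(p_dodge)$ymax)
```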

13.2.4 Change plot line/point colour by column values

Description
Method to change the colour of a scatter plot by the values in a column
Ingredients
Package Data

readr
magrittr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point()


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point()

13.2.5 Change point plot size by numeric column

Description
Method to change the size of the points in a scatter plot by the values in a numeric column
Ingredients
Package Data

readr
magrittr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, size = column_name3) +
  geom_point()


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, size = bathrooms) +
  geom_point()

13.3 Add text to a plot

Text provides context, helping users understand the purpose of a plot and the specific details it shows.

13.3.1 Add title to a plot

Description
Method to add a title to a plot
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  labs(title = "text")


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  labs(title = "Relationship of Bedrooms to Bathrooms")

13.3.2 Add subtitle to a plot

Description
Method to add a subtitle to a plot
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  labs(subtitle = "text")


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  labs(subtitle = "Three HRM Neighbourhoods")

13.3.3 Add x-axis label to a plot

Description
Method to add an x-axis label to a plot
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  labs(x = "text")


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  labs(x = "Bedrooms")

13.3.4 Add y-axis label to a plot

Description
Method to add a y-axis label to a plot
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  labs(y = "text")


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  labs(y = "Bathrooms")

13.3.5 Add caption to a plot

Description
Method to add a caption to a plot
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  labs(caption = "text")


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  labs(caption = "Created with ggplot2")
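The legend title defaults to the name of the mapped column (bathrooms in the recipes above). labs() also accepts aesthetic names, so labs(colour = "text") renames the colour legend in the same call as the other labels. A sketch with mtcars as stand-in data; the label text is illustrative only:

```r
library(ggplot2)

p <- ggplot2::ggplot(mtcars) +
  aes(x = wt, y = mpg, colour = hp) +
  geom_point() +
  labs(
    title = "Fuel Economy by Weight",
    x = "Weight (1000 lbs)",
    y = "Miles per Gallon",
    colour = "Horsepower"  # replaces the default legend title "hp"
  )
```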

13.4 Customizing a plot

ggplot2's default settings produce acceptable plots for many situations, such as data exploration and internal communication, but you will often want to change the defaults to meet specific design needs. Customizing the look and style of a data visualization is essential to making your plot stand out and grab the user's attention.
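To apply the same theme to every plot without adding it each time, theme_set() changes the default theme for plots drawn later in the session and returns the previous theme so it can be restored. A minimal sketch, with mtcars as stand-in data:

```r
library(ggplot2)

# Make theme_minimal() the default; theme_set() returns the theme active before the call
old_theme <- ggplot2::theme_set(ggplot2::theme_minimal())

# No theme is added here; the session default is applied when the plot is drawn
p <- ggplot2::ggplot(mtcars) +
  aes(x = wt, y = mpg) +
  geom_point()

# Restore the previous default theme
ggplot2::theme_set(old_theme)
```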

13.4.1 Add theme to a plot

Description
Method to apply a complete theme to a plot
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme_minimal()


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme_minimal()

13.4.2 Remove major x/y grid lines

Description
Method to remove the major x/y grid lines
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_blank()
  )


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    panel.grid.major.x = element_blank(),
    panel.grid.major.y = element_blank()
  )

13.4.3 Remove minor x/y grid lines

Description
Method to remove the minor x/y grid lines
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    panel.grid.minor.x = element_blank(),
    panel.grid.minor.y = element_blank()
  )


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    panel.grid.minor.x = element_blank(),
    panel.grid.minor.y = element_blank()
  )

13.4.4 Change major x/y grid lines (colour/type/size)

Description
Method to change the colour, line type, and size of the major x/y grid lines
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    # linewidth replaced size for line elements in ggplot2 3.4.0; use size in older versions
    panel.grid.major.x = ggplot2::element_line(colour = "hex colour", linetype = "type", linewidth = number),
    panel.grid.major.y = ggplot2::element_line(colour = "hex colour", linetype = "type", linewidth = number)
  )


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    panel.grid.major.x = ggplot2::element_line(colour = "#484848", linetype = "solid", linewidth = 0.3),
    panel.grid.major.y = ggplot2::element_line(colour = "#484848", linetype = "solid", linewidth = 0.3)
  )

13.4.5 Change minor x/y grid lines (colour/type/size)

Description
Method to change the colour, line type, and size of the minor x/y grid lines
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    panel.grid.minor.x = ggplot2::element_line(colour = "hex colour", linetype = "type", linewidth = number),
    panel.grid.minor.y = ggplot2::element_line(colour = "hex colour", linetype = "type", linewidth = number)
  )


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    panel.grid.minor.x = ggplot2::element_line(colour = "#484848", linetype = "solid", linewidth = 0.1),
    panel.grid.minor.y = ggplot2::element_line(colour = "#484848", linetype = "solid", linewidth = 0.1)
  )

13.4.6 Change legend position

Description
Method to change the legend position
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    legend.position = "position"  # "top", "bottom", "left", "right", or "none"
  )


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    legend.position = "bottom"
  )

13.4.7 Remove legend

Description
Method to remove the legend
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    legend.position = "none"
  )


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    legend.position = "none"
  )

13.4.8 Change legend font (colour/style)

Description
Method to change the colour and style of the legend text
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    # face: "plain", "bold", "italic", or "bold.italic"
    legend.title = ggplot2::element_text(colour = "hex colour", size = number, face = "style"),
    legend.text = ggplot2::element_text(colour = "hex colour", size = number, face = "style")
  )


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    legend.title = ggplot2::element_text(colour = "#484848", size = 10, face = "bold"),
    legend.text = ggplot2::element_text(colour = "#484848", size = 8, face = "italic")
  )

13.4.9 Remove legend title

Description
Method to remove the legend title
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    legend.title = ggplot2::element_blank()
  )


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    legend.title = ggplot2::element_blank()
  )

13.4.10 Remove x/y axis text

Description
Method to remove the x/y axis text (tick labels)
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    axis.text.x = element_blank(),
    axis.text.y = element_blank()
  )


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    axis.text.x = element_blank(),
    axis.text.y = element_blank()
  )

13.4.11 Change x/y axis font (colour/size)

Description
Method to change the colour and size of the x/y axis text
Ingredients
Package Data

readr
dplyr
ggplot2

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv")


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name1, y = column_name2, colour = column_name3) +
  geom_point() +
  theme(
    axis.text.x = ggplot2::element_text(colour = "hex colour", size = number),
    axis.text.y = ggplot2::element_text(colour = "hex colour", size = number)
  )


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = bedrooms, y = bathrooms, color = bathrooms) +
  geom_point() +
  theme(
    axis.text.x = ggplot2::element_text(colour = "#484848", size = 8),
    axis.text.y = ggplot2::element_text(colour = "#484848", size = 8)
  )

13.4.12 Wrap x axis text

Description
Method to wrap long x-axis labels onto multiple lines
Ingredients
Package Data

readr
dplyr
ggplot2
stringr

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name) +
  geom_bar() +
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = number))


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade) +
  geom_bar(fill = "#1E6E6E") +
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 10))
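scale_x_discrete() accepts any function that takes the vector of labels and returns modified labels, so the anonymous function in this recipe can be turned into a reusable helper. The helper name wrap_labels and the example labels below are hypothetical:

```r
library(stringr)

# Hypothetical helper: returns a labelling function that wraps text at `width` characters
wrap_labels <- function(width) {
  function(x) stringr::str_wrap(x, width = width)
}

# Hypothetical labels standing in for long category names
labels <- c("Above Average Quality", "Fair")
wrapped <- wrap_labels(10)(labels)
```

The helper could then be passed as scale_x_discrete(labels = wrap_labels(10)).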

13.4.13 Stitch it all together

Description
Method to combine the options from this chapter into a single plot
Ingredients
Package Data

readr
dplyr
ggplot2
stringr

sample_dwelling_characteristics.csv


Preparation

df <- readr::read_csv("data/sample_dwelling_characteristics.csv") %>%
  dplyr::filter(!is.na(construction_grade))


Sample Instructions

data %>%
  package::function() +
  aes(x = column_name) +
  geom_bar(fill = "hex colour") +
  labs(title = "text",
       subtitle = "text",
       x = "text",
       y = "text",
       caption = "text") +
  theme(
    # Text -- title, subtitle, labels, and caption
    plot.title = ggplot2::element_text(colour = "hex colour", size = number, face = "text"),
    plot.subtitle = ggplot2::element_text(colour = "hex colour", size = number),
    plot.caption = ggplot2::element_text(colour = "hex colour", size = number),
    axis.text.x = ggplot2::element_text(colour = "hex colour", size = number),
    axis.text.y = ggplot2::element_text(colour = "hex colour", size = number),
    
    # Chart elements -- grids (linewidth requires ggplot2 3.4.0 or later)
    panel.grid.major.x = ggplot2::element_blank(),
    panel.grid.major.y = ggplot2::element_line(colour = "hex colour", linetype = "text", linewidth = number),
    panel.grid.minor.x = ggplot2::element_blank(),
    panel.grid.minor.y = ggplot2::element_blank(),
    panel.background = ggplot2::element_blank()
  ) +
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = number))


Actual Instructions

df %>%
  ggplot2::ggplot() +
  aes(x = construction_grade) +
  geom_bar(fill = "#1E6E6E") +
  labs(title = "Distribution of Housing Construction Grade",
       subtitle = "Three HRM Neighbourhoods",
       x = "Construction Grade",
       y = "Number of Houses",
       caption = "Created with ggplot2") +
  theme(
    # Text -- title, subtitle, labels, and caption
    plot.title = ggplot2::element_text(colour = "#484848", size = 14, face = "bold"),
    plot.subtitle = ggplot2::element_text(colour = "#484848", size = 12),
    plot.caption = ggplot2::element_text(colour = "#484848", size = 6),
    axis.text.x = ggplot2::element_text(colour = "#484848", size = 8),
    axis.text.y = ggplot2::element_text(colour = "#484848", size = 8),
    
    # Chart elements -- grids
    panel.grid.major.x = ggplot2::element_blank(),
    panel.grid.major.y = ggplot2::element_line(colour = "#484848", linetype = "dotted", linewidth = 0.2),
    panel.grid.minor.x = ggplot2::element_blank(),
    panel.grid.minor.y = ggplot2::element_blank(),
    panel.background = ggplot2::element_blank()
  ) +
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 10))
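If the same styling is reused across several plots, the theme() settings from this recipe can be stored once in a function and added to each plot like a built-in theme. The function name theme_recipe is hypothetical, mtcars stands in for the sample data, and the linewidth argument assumes ggplot2 3.4.0 or later:

```r
library(ggplot2)

# Hypothetical helper bundling the text and grid settings used in this recipe
theme_recipe <- function(base_colour = "#484848") {
  ggplot2::theme(
    plot.title = ggplot2::element_text(colour = base_colour, size = 14, face = "bold"),
    plot.subtitle = ggplot2::element_text(colour = base_colour, size = 12),
    plot.caption = ggplot2::element_text(colour = base_colour, size = 6),
    axis.text.x = ggplot2::element_text(colour = base_colour, size = 8),
    axis.text.y = ggplot2::element_text(colour = base_colour, size = 8),
    panel.grid.major.x = ggplot2::element_blank(),
    panel.grid.major.y = ggplot2::element_line(colour = base_colour, linetype = "dotted", linewidth = 0.2),
    panel.grid.minor = ggplot2::element_blank(),  # covers both minor.x and minor.y
    panel.background = ggplot2::element_blank()
  )
}

# The same styling applied to a different plot
p <- ggplot2::ggplot(mtcars) +
  aes(x = factor(cyl)) +
  geom_bar(fill = "#1E6E6E") +
  theme_recipe()
```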