Chapter 10 Modifying Data
When working with data it is common that values in columns need to be modified, or creating new columns of data. The following recipes outline how to change data with a column, and how to create data in a new column using different techniques.
10.1 Static Value
Changing or creating data values in a column so that all rows have the same value; with examples for character, numeric, and date data types.
10.1.1 Change or create a single character column with a static character value
| Description | |
|---|---|
| Method to change or create a character column with a single specific character value |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, column_name = "text")Actual Instructions
dplyr::mutate(df, country = "Canada")10.1.2 Change or create a single numeric column with a static numeric value
| Description | |
|---|---|
| Method to change or create a numeric column with a single specific numeric value |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, column_name = number)Actual Instructions
dplyr::mutate(df, population = 100000)10.1.3 Change a single date column with a static date value
| Description | |
|---|---|
| Method to change a date column with a single specific date value |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, column_name = "yyyy-mm-dd")Actual Instructions
dplyr::mutate(df, date_added = "2021-11-19")10.1.4 Create a new date column with a static date value
| Description | |
|---|---|
| Method to create a new date column with a single specific date value |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, new_column_name = as.Date("yyyy-mm-dd"))Actual Instructions
dplyr::mutate(df, date_added = as.Date("2021-11-19"))10.1.5 Change or create multiple character columns with a static character value
| Description | |
|---|---|
| Method to change or create multiple character columns with a single specific character value |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, column_name1 = "text",
column_name2 = "text")Actual Instructions
dplyr::mutate(df, country = "Canada",
country_code = "CA")10.1.6 Change or create multiple numeric columns with a static numeric value
| Description | |
|---|---|
| Method to change or create multiple numeric columns with a single specific numeric value |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, column_name1 = number,
column_name2 = number)Actual Instructions
dplyr::mutate(df, population = 100000,
growth_rate = 1.75)10.1.7 Change multiple date columns with a static date value
| Description | |
|---|---|
| Method to change multiple date columns with a single specific date value |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, column_name1 = "yyyy-mm-dd",
column_name2 = "yyyy-mm-dd")Actual Instructions
dplyr::mutate(df, date_added = "2021-11-19",
date_changed = "2021-11-20")10.1.8 Create multiple new date columns with a static date value
| Description | |
|---|---|
| Method to create multiple new date columns with a single specific date value |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, new_column_name1 = as.Date("yyyy-mm-dd"),
new_column_name2 = as.Date("yyyy-mm-dd"))Actual Instructions
dplyr::mutate(df, date_added = as.Date("2021-11-19"),
date_changed = as.Date("2021-11-20"))10.2 Using Existing Column(s)
Changing or creating data values in a column with values from an existing column of data; with examples for character, numeric, and date data types.
10.2.1 Create new character column from static text and a character column
| Description | |
|---|---|
| Method to create a new character column by combining values from a character column with static text |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, new_column_name, paste0("text", column_name))Actual Instructions
dplyr::mutate(df, label = paste0("My name is ", first_name))10.2.2 Create new character column from multiple character columns
| Description | |
|---|---|
| Method to create a new character column by combining values from multiple character columns |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, new_column_name, paste0(column_name1, column_name2))Actual Instructions
dplyr::mutate(df, username = paste0(first_name, last_name))10.2.3 Create new character column from multiple character columns and static text
| Description | |
|---|---|
| Method to create a new character column by combining values from multiple character columns separated by a static text value |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, new_column_name, paste0(column_name1, " ", column_name2))Actual Instructions
dplyr::mutate(df, legal_name = paste0(first_name, " ", last_name))10.2.4 Create new numeric column from a numeric column and a mathematical operation
| Description | |
|---|---|
| Method to create a new numeric column by combining values from a numeric column and using a mathematical operation |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, new_column_name = column_name operator number)Actual Instructions
dplyr::mutate(df, cost_per_month = cost / 12)10.2.5 Create new numeric column from multiple numeric columns and mathematical operations
| Description | |
|---|---|
| Method to create a new numeric column by combining values from multiple numeric column and using a mathematical operation |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, new_column_name = column_name1 operator column_name2)Actual Instructions
dplyr::mutate(df, population_density = population / area_sq_km)10.2.6 Create new date column from a date column and a mathematical operation
| Description | |
|---|---|
| Method to create a new date column by combining values from a date column and using a mathematical operation |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, new_column_name = column_name operator number)Actual Instructions
dplyr::mutate(df, closing_date = date_created + 7)10.2.7 Create new date column from multiple date columns and mathematical operations
| Description | |
|---|---|
| Method to create a new date column by combining values from multiple date column and using a mathematical operation |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, new_column_name = column_name1 operator column_name2)Actual Instructions
dplyr::mutate(df, duration = end_date - start_date)10.2.8 Create new date column from a date column and a function
| Description | |
|---|---|
| Method to create a new date column by combining values from a date column and using a function |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, new_column_name = package::function(column_name))Actual Instructions
dplyr::mutate(df, year_created = lubridate::year(date_created))10.3 Conditionally from Existing Column
Creating data values in a new column using conditional statements from a existing column of data.
10.3.1 Create new character column conditionally with default static value
| Description | |
|---|---|
| Method to create a new character column by using a conditional statement adding a static default value |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, new_column_name = package::function(
column_name operator number ~ "text",
TRUE ~ "text"))Actual Instructions
dplyr::mutate(df, result = dplyr::case_when(
score >= 50 ~ "pass",
TRUE ~ "fail"))10.3.2 Create new character column conditionally with character column as default value
| Description | |
|---|---|
| Method to create a new character column by using a conditional statement adding default values from a character column |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, new_column_name = package::function(
column_name operator number ~ "text",
TRUE ~ column_name))Actual Instructions
dplyr::mutate(df, result = dplyr::case_when(
grade == "F" ~ "Fail",
TRUE ~ result))10.3.3 Create new character column conditionally from existing numeric column with default static value
| Description | |
|---|---|
| Method to create a new character column by using a conditional statement from an existing numeric column adding a static default value |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, new_column_name = package::function(
column_name operator number ~ "text",
column_name operator number ~ "text",
TRUE ~ "text"))Actual Instructions
dplyr::mutate(data, result = dplyr::case_when(
score < 50 ~ "fail",
score >= 50 ~ "pass",
TRUE ~ "error"))10.3.4 Create new character column conditionally from range of values in existing numeric column with default static value
| Description | |
|---|---|
| Method to create a new character column by using a conditional statement from a range of values in an existing numeric column adding a static default value |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, new_column_name = package::function(
package::function(column_name, low_range, high_range) ~ "text",
package::function(column_name, low_range, high_range) ~ "text",
package::function(column_name, low_range, high_range) ~ "text",
package::function(column_name, low_range, high_range) ~ "text",
TRUE ~ "text"))Actual Instructions
dplyr::mutate(data, grade = dplyr::case_when(
dplyr::between(score, 80, 100) ~ "A",
dplyr::between(score, 70, 79) ~ "B",
dplyr::between(score, 60, 69) ~ "C",
dplyr::between(score, 50, 59) ~ "D",
TRUE ~ "F"))10.3.5 Create new character column conditionally from specific values in a character column
| Description | |
|---|---|
| Method to create a new character column by using a conditional statement from specific values from a character column |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")Sample Instructions
package::function(data, new_column_name = package::function(
column_name %in% c("text1", "text2", "text3", "text4") ~ "new value1",
column_name == "text5" ~ "new value 2",
column_name == "text6" ~ "new value 3",
column_name %in% c("text7", "text8") ~ "new value4",
column_name %in% c("text9", "text10") ~ "new value5",
column_name %in% c("text11", "text12", "text13") ~ "new value6"))Actual Instructions
dplyr::mutate(data, region = dplyr::case_when(
province_code %in% c("NL", "NS", "PE", "NB") ~ "Atlantic",
province_code == "QC" ~ "Quebec",
province_code == "ON" ~ "Ontario",
province_code %in% c("MB", "SK") ~ "Praries",
province_code %in% c("AB", "BC") ~ "Western",
province_code %in% c("NU", "NT", "YK") ~ "Northern"))