Chapter 10 Modifying Data
When working with data it is common that values in columns need to be modified, or creating new columns of data. The following recipes outline how to change data with a column, and how to create data in a new column using different techniques.
10.1 Static Value
Changing or creating data values in a column so that all rows have the same value; with examples for character, numeric, and date data types.
10.1.1 Change or create a single character column with a static character value
Description | |
---|---|
Method to change or create a character column with a single specific character value |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, column_name = "text") package
Actual Instructions
::mutate(df, country = "Canada") dplyr
10.1.2 Change or create a single numeric column with a static numeric value
Description | |
---|---|
Method to change or create a numeric column with a single specific numeric value |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, column_name = number) package
Actual Instructions
::mutate(df, population = 100000) dplyr
10.1.3 Change a single date column with a static date value
Description | |
---|---|
Method to change a date column with a single specific date value |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, column_name = "yyyy-mm-dd") package
Actual Instructions
::mutate(df, date_added = "2021-11-19") dplyr
10.1.4 Create a new date column with a static date value
Description | |
---|---|
Method to create a new date column with a single specific date value |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, new_column_name = as.Date("yyyy-mm-dd")) package
Actual Instructions
::mutate(df, date_added = as.Date("2021-11-19")) dplyr
10.1.5 Change or create multiple character columns with a static character value
Description | |
---|---|
Method to change or create multiple character columns with a single specific character value |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, column_name1 = "text",
packagecolumn_name2 = "text")
Actual Instructions
::mutate(df, country = "Canada",
dplyrcountry_code = "CA")
10.1.6 Change or create multiple numeric columns with a static numeric value
Description | |
---|---|
Method to change or create multiple numeric columns with a single specific numeric value |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, column_name1 = number,
packagecolumn_name2 = number)
Actual Instructions
::mutate(df, population = 100000,
dplyrgrowth_rate = 1.75)
10.1.7 Change multiple date columns with a static date value
Description | |
---|---|
Method to change multiple date columns with a single specific date value |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, column_name1 = "yyyy-mm-dd",
packagecolumn_name2 = "yyyy-mm-dd")
Actual Instructions
::mutate(df, date_added = "2021-11-19",
dplyrdate_changed = "2021-11-20")
10.1.8 Create multiple new date columns with a static date value
Description | |
---|---|
Method to create multiple new date columns with a single specific date value |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, new_column_name1 = as.Date("yyyy-mm-dd"),
packagenew_column_name2 = as.Date("yyyy-mm-dd"))
Actual Instructions
::mutate(df, date_added = as.Date("2021-11-19"),
dplyrdate_changed = as.Date("2021-11-20"))
10.2 Using Existing Column(s)
Changing or creating data values in a column with values from an existing column of data; with examples for character, numeric, and date data types.
10.2.1 Create new character column from static text and a character column
Description | |
---|---|
Method to create a new character column by combining values from a character column with static text |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, new_column_name, paste0("text", column_name)) package
Actual Instructions
::mutate(df, label = paste0("My name is ", first_name)) dplyr
10.2.2 Create new character column from multiple character columns
Description | |
---|---|
Method to create a new character column by combining values from multiple character columns |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, new_column_name, paste0(column_name1, column_name2)) package
Actual Instructions
::mutate(df, username = paste0(first_name, last_name)) dplyr
10.2.3 Create new character column from multiple character columns and static text
Description | |
---|---|
Method to create a new character column by combining values from multiple character columns separated by a static text value |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, new_column_name, paste0(column_name1, " ", column_name2)) package
Actual Instructions
::mutate(df, legal_name = paste0(first_name, " ", last_name)) dplyr
10.2.4 Create new numeric column from a numeric column and a mathematical operation
Description | |
---|---|
Method to create a new numeric column by combining values from a numeric column and using a mathematical operation |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, new_column_name = column_name operator number) package
Actual Instructions
::mutate(df, cost_per_month = cost / 12) dplyr
10.2.5 Create new numeric column from multiple numeric columns and mathematical operations
Description | |
---|---|
Method to create a new numeric column by combining values from multiple numeric column and using a mathematical operation |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, new_column_name = column_name1 operator column_name2) package
Actual Instructions
::mutate(df, population_density = population / area_sq_km) dplyr
10.2.6 Create new date column from a date column and a mathematical operation
Description | |
---|---|
Method to create a new date column by combining values from a date column and using a mathematical operation |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, new_column_name = column_name operator number) package
Actual Instructions
::mutate(df, closing_date = date_created + 7) dplyr
10.2.7 Create new date column from multiple date columns and mathematical operations
Description | |
---|---|
Method to create a new date column by combining values from multiple date column and using a mathematical operation |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, new_column_name = column_name1 operator column_name2) package
Actual Instructions
::mutate(df, duration = end_date - start_date) dplyr
10.2.8 Create new date column from a date column and a function
Description | |
---|---|
Method to create a new date column by combining values from a date column and using a function |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, new_column_name = package::function(column_name)) package
Actual Instructions
::mutate(df, year_created = lubridate::year(date_created)) dplyr
10.3 Conditionally from Existing Column
Creating data values in a new column using conditional statements from a existing column of data.
10.3.1 Create new character column conditionally with default static value
Description | |
---|---|
Method to create a new character column by using a conditional statement adding a static default value |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, new_column_name = package::function(
package~ "text",
column_name operator number TRUE ~ "text"))
Actual Instructions
::mutate(df, result = dplyr::case_when(
dplyr>= 50 ~ "pass",
score TRUE ~ "fail"))
10.3.2 Create new character column conditionally with character column as default value
Description | |
---|---|
Method to create a new character column by using a conditional statement adding default values from a character column |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, new_column_name = package::function(
package~ "text",
column_name operator number TRUE ~ column_name))
Actual Instructions
::mutate(df, result = dplyr::case_when(
dplyr== "F" ~ "Fail",
grade TRUE ~ result))
10.3.3 Create new character column conditionally from existing numeric column with default static value
Description | |
---|---|
Method to create a new character column by using a conditional statement from an existing numeric column adding a static default value |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, new_column_name = package::function(
package~ "text",
column_name operator number ~ "text",
column_name operator number TRUE ~ "text"))
Actual Instructions
::mutate(data, result = dplyr::case_when(
dplyr< 50 ~ "fail",
score >= 50 ~ "pass",
score TRUE ~ "error"))
10.3.4 Create new character column conditionally from range of values in existing numeric column with default static value
Description | |
---|---|
Method to create a new character column by using a conditional statement from a range of values in an existing numeric column adding a static default value |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, new_column_name = package::function(
package::function(column_name, low_range, high_range) ~ "text",
package::function(column_name, low_range, high_range) ~ "text",
package::function(column_name, low_range, high_range) ~ "text",
package::function(column_name, low_range, high_range) ~ "text",
packageTRUE ~ "text"))
Actual Instructions
::mutate(data, grade = dplyr::case_when(
dplyr::between(score, 80, 100) ~ "A",
dplyr::between(score, 70, 79) ~ "B",
dplyr::between(score, 60, 69) ~ "C",
dplyr::between(score, 50, 59) ~ "D",
dplyrTRUE ~ "F"))
10.3.5 Create new character column conditionally from specific values in a character column
Description | |
---|---|
Method to create a new character column by using a conditional statement from specific values from a character column |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Preparation
<- readr::read_csv("C:/data/sample.csv") df
Sample Instructions
::function(data, new_column_name = package::function(
package%in% c("text1", "text2", "text3", "text4") ~ "new value1",
column_name == "text5" ~ "new value 2",
column_name == "text6" ~ "new value 3",
column_name %in% c("text7", "text8") ~ "new value4",
column_name %in% c("text9", "text10") ~ "new value5",
column_name %in% c("text11", "text12", "text13") ~ "new value6")) column_name
Actual Instructions
::mutate(data, region = dplyr::case_when(
dplyr%in% c("NL", "NS", "PE", "NB") ~ "Atlantic",
province_code == "QC" ~ "Quebec",
province_code == "ON" ~ "Ontario",
province_code %in% c("MB", "SK") ~ "Praries",
province_code %in% c("AB", "BC") ~ "Western",
province_code %in% c("NU", "NT", "YK") ~ "Northern")) province_code