Chapter 8 Data Transformation

Data is collected in many different ways and for many different purposes. A data structure that works for one use case may not work for another. Two common tasks are transforming a dataset's structure and transforming its data types. This chapter focuses on transforming data to follow the tidy data principles.


It is often said that 80% of data analysis is spent on cleaning and preparing data. It is not just a first step, but must be repeated many times over the course of analysis as new problems come to light or new data is collected.
Tidy Data - excerpt from Hadley Wickham

8.1 Between Data Types

It is very common for the data types stored in a dataset to differ from the data types needed for analysis. Data may be collected in one format for various reasons; however, that format may not allow for specific data analysis techniques. The following recipes outline how to change columns of data to a new data type.

8.1.1 Change character data type to numeric data type in single column

Description
Method to change all character values to numeric values in a single column of a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, column_name = as.numeric(column_name))


Actual Instructions

dplyr::mutate(df, column1 = as.numeric(column1))
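
As a minimal sketch of this recipe that does not depend on sample.csv, the example below builds a small data frame in memory and converts its character column. The column names `id` and `column1` and all values are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Hypothetical stand-in for sample.csv: column1 holds numbers stored as text
df <- dplyr::tibble(
  id      = 1:3,
  column1 = c("10", "2.5", "300")
)

# mutate() returns a new data frame, so assign the result to keep it
df_numeric <- dplyr::mutate(df, column1 = as.numeric(column1))

# Confirm the new column type
class(df_numeric$column1)  # "numeric"
```

Text that cannot be read as a number (for example "n/a") becomes NA with a coercion warning, so it is worth checking the column for unexpected NA values after converting.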

8.1.2 Change numeric data type to character data type in single column

Description
Method to change all numeric values to character values in a single column of a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, column_name = as.character(column_name))


Actual Instructions

dplyr::mutate(df, column1 = as.character(column1))
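
The same recipe can be sketched on an in-memory data frame. The data below are hypothetical and stand in for sample.csv; `column1` is imagined as an identifier-like code that should not be treated as a quantity.

```r
library(dplyr)

# Hypothetical stand-in for sample.csv: column1 holds numeric codes
df <- dplyr::tibble(
  id      = 1:3,
  column1 = c(101, 202, 303)
)

# Convert the numeric codes to text and keep the result
df_character <- dplyr::mutate(df, column1 = as.character(column1))

# Confirm the new column type
class(df_character$column1)  # "character"
```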

8.1.3 Change character data type to date data type in single column

Description
Method to change all character values to date values in a single column of a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, column_name = as.Date(column_name))


Actual Instructions

dplyr::mutate(df, column1 = as.Date(column1))
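
When called without a format, as.Date() expects text laid out as year-month-day (for example "2021-03-15"). For other layouts, a format string can be supplied. The sketch below assumes a hypothetical column stored as month/day/year; the data and the chosen format are illustrative only.

```r
library(dplyr)

# Hypothetical stand-in for sample.csv: dates stored as month/day/year text
df <- dplyr::tibble(
  id      = 1:2,
  column1 = c("03/15/2021", "12/01/2021")
)

# format describes the text layout: month / day / four-digit year
df_date <- dplyr::mutate(df, column1 = as.Date(column1, format = "%m/%d/%Y"))

# Confirm the new column type
class(df_date$column1)  # "Date"
```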

8.1.4 Change all character data columns to numeric data

Description
Method to change all character values to numeric values in all character columns of a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, is.character, as.numeric)


Actual Instructions

dplyr::mutate_if(df, is.character, as.numeric)
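
The sketch below runs this recipe on a hypothetical in-memory data frame and, for comparison, shows the across() form that newer dplyr releases offer for the same task. It assumes dplyr 1.0.0 or later; the column names and values are made up.

```r
library(dplyr)

# Hypothetical stand-in for sample.csv: two character columns holding numbers
df <- dplyr::tibble(
  id      = 1:3,
  column1 = c("1", "2", "3"),
  column2 = c("4.5", "5.5", "6.5")
)

# Scoped-verb form used in this recipe
df_if <- dplyr::mutate_if(df, is.character, as.numeric)

# across() form; where() selects columns for which the predicate is TRUE
df_across <- dplyr::mutate(df, dplyr::across(where(is.character), as.numeric))

# Inspect the resulting column types
sapply(df_across, class)
```

Because every character column is converted, a column that holds ordinary text would turn into NA values, so this approach fits best when all character columns are known to contain numbers.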

8.1.5 Change all numeric data columns to character data

Description
Method to change all numeric values to character values in all numeric columns of a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, is.numeric, as.character)


Actual Instructions

dplyr::mutate_if(df, is.numeric, as.character)
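
One detail worth illustrating is that is.numeric() is TRUE for both integer and double columns, so both kinds are converted. The data frame below is hypothetical and only serves to show that behavior.

```r
library(dplyr)

# Hypothetical stand-in for sample.csv with integer, double, and text columns
df <- dplyr::tibble(
  id      = 1:3,               # integer
  column1 = c(1.5, 2.5, 3.5),  # double
  column2 = c("a", "b", "c")   # already character
)

# Every column that passes is.numeric() is converted, including id
df_character <- dplyr::mutate_if(df, is.numeric, as.character)

# Inspect the resulting column types
sapply(df_character, class)
```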

8.1.6 Change all character data columns to date data

Description
Method to change all character values to date values in all character columns of a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, is.character, as.Date)


Actual Instructions

dplyr::mutate_if(df, is.character, as.Date)
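
This recipe is only safe when every character column holds dates, because as.Date() without a format stops with an error on text it cannot interpret. A minimal sketch under that assumption, using hypothetical data in year-month-day layout:

```r
library(dplyr)

# Hypothetical stand-in for sample.csv: all character columns are ISO dates
df <- dplyr::tibble(
  id      = 1:2,
  column1 = c("2021-01-15", "2021-02-20"),
  column2 = c("2022-03-01", "2022-04-30")
)

# Convert every character column to Date; id is left untouched
df_date <- dplyr::mutate_if(df, is.character, as.Date)

# Inspect the resulting column types
sapply(df_date, class)
```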

8.1.7 Change character data type to numeric data type in selected columns

Description
Method to change all character values to numeric values in selected character columns of a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, c("column_name1", "column_name2"), as.numeric)


Actual Instructions

dplyr::mutate_at(df, c("column1", "column2"), as.numeric)
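
The following sketch applies this recipe to a hypothetical data frame and also shows an across() equivalent, assuming dplyr 1.0.0 or later. The helper variable `cols` and the extra column `column3` are introduced here only for illustration.

```r
library(dplyr)

# Hypothetical stand-in for sample.csv: column3 is text that must stay text
df <- dplyr::tibble(
  column1 = c("1", "2"),
  column2 = c("3.5", "4.5"),
  column3 = c("x", "y")
)

# Names of the columns to convert
cols <- c("column1", "column2")

# Scoped-verb form used in this recipe
df_at <- dplyr::mutate_at(df, cols, as.numeric)

# across() form; all_of() selects exactly the named columns
df_across <- dplyr::mutate(df, dplyr::across(dplyr::all_of(cols), as.numeric))

# Inspect the resulting column types
sapply(df_at, class)
```

Naming the columns explicitly avoids the risk of converting text columns such as `column3` by accident.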

8.1.8 Change numeric data type to character data type in selected columns

Description
Method to change all numeric values to character values in selected numeric columns of a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, c("column_name1", "column_name2"), as.character)


Actual Instructions

dplyr::mutate_at(df, c("column1", "column2"), as.character)
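
Columns for mutate_at() can be given either as a character vector of names, as in this recipe, or as unquoted names wrapped in dplyr's vars() helper. The sketch below shows both on hypothetical data; `column3` is an assumed extra column that is left numeric.

```r
library(dplyr)

# Hypothetical stand-in for sample.csv with three numeric columns
df <- dplyr::tibble(
  column1 = c(1, 2),
  column2 = c(10L, 20L),
  column3 = c(0.1, 0.2)
)

# Columns selected by name, as in the recipe
df_names <- dplyr::mutate_at(df, c("column1", "column2"), as.character)

# The same selection written with vars() and unquoted column names
df_vars <- dplyr::mutate_at(df, dplyr::vars(column1, column2), as.character)

# Inspect the resulting column types
sapply(df_names, class)
```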

8.1.9 Change character data type to date data type in selected columns

Description
Method to change all character values to date values in selected character columns of a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, c("column_name1", "column_name2"), as.Date)


Actual Instructions

dplyr::mutate_at(df, c("column1", "column2"), as.Date)
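
If the selected columns do not use the year-month-day layout, extra arguments given to mutate_at() are passed on to the conversion function, so a format can be supplied once for all selected columns. The sketch below assumes hypothetical day.month.year text and an extra `note` column that should remain text.

```r
library(dplyr)

# Hypothetical stand-in for sample.csv: dates stored as day.month.year text
df <- dplyr::tibble(
  column1 = c("15.01.2021", "20.02.2021"),
  column2 = c("01.03.2022", "30.04.2022"),
  note    = c("first", "second")
)

# format is forwarded to as.Date() for each selected column
df_date <- dplyr::mutate_at(
  df,
  c("column1", "column2"),
  as.Date,
  format = "%d.%m.%Y"
)

# Inspect the resulting column types
sapply(df_date, class)
```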

8.2 Reshaping Data