Chapter 8 Data Transformation
Data is collected in many different ways and for many different purposes. A data structure that works for one use case may not work for another. A common set of tasks is to transform the structure and the data types of a dataset. This chapter focuses on transforming data to follow the tidy data principles.
It is often said that 80% of data analysis is spent on cleaning and preparing data. This is not just a first step: it must be repeated many times over the course of an analysis as new problems come to light or new data is collected.
Tidy data - excerpts from Hadley Wickham
8.1 Between Data Types
It is very common for the data types stored in a dataset to differ from the data types needed for analysis. Data may be collected in one format for various reasons; however, that format may not allow for specific data analysis techniques. The following recipes outline how to change columns of data to a new data type.
8.1.1 Change character data type to numeric data type in single column
Description | |
---|---|
Method to change all character values to numeric values in a single column of a dataframe |
Ingredients | |
---|---|
Package | Data |
readr, dplyr | sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")
Sample Instructions
package::function(data, column_name = as.numeric(column_name))
Actual Instructions
dplyr::mutate(df, column1 = as.numeric(column1))
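As a minimal sketch of this recipe, the example below uses a small in-memory data frame in place of sample.csv. The name `df_example` and its values are hypothetical and are not taken from the sample data. It also shows that `as.numeric()` returns `NA` (with a warning) for any value it cannot read as a number, so counting missing values after the conversion is a sensible check.

```r
# Hypothetical stand-in for sample.csv; the values are made up for illustration
df_example <- data.frame(
  id = 1:3,
  column1 = c("10", "2.5", "n/a"),
  stringsAsFactors = FALSE
)

# Convert the character column; "n/a" cannot be parsed and becomes NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning here.
converted <- dplyr::mutate(
  df_example,
  column1 = suppressWarnings(as.numeric(column1))
)

class(converted$column1)       # "numeric"
sum(is.na(converted$column1))  # 1 value could not be converted
```

If the count of `NA` values is higher than expected, it may be worth inspecting the original text values before relying on the converted column.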
8.1.2 Change numeric data type to character data type in single column
Description | |
---|---|
Method to change all numeric values to character values in a single column of a dataframe |
Ingredients | |
---|---|
Package | Data |
readr, dplyr | sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")
Sample Instructions
package::function(data, column_name = as.character(column_name))
Actual Instructions
dplyr::mutate(df, column1 = as.character(column1))
8.1.3 Change character data type to date data type in single column
Description | |
---|---|
Method to change all character values to date values in a single column of a dataframe |
Ingredients | |
---|---|
Package | Data |
readr, dplyr | sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")
Sample Instructions
package::function(data, column_name = as.Date(column_name))
Actual Instructions
dplyr::mutate(df, column1 = as.Date(column1))
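Called without a `format` argument, `as.Date()` expects text in an ISO-style layout such as 2021-03-15. The sketch below assumes, purely for illustration, that the dates are stored as month/day/year text, and passes a matching `format` string. The data frame `df_example` and its values are hypothetical.

```r
# Hypothetical stand-in for sample.csv with month/day/year text dates
df_example <- data.frame(
  column1 = c("03/15/2021", "12/01/2020"),
  stringsAsFactors = FALSE
)

# Tell as.Date() how the text is laid out: %m = month, %d = day, %Y = 4-digit year
converted <- dplyr::mutate(
  df_example,
  column1 = as.Date(column1, format = "%m/%d/%Y")
)

class(converted$column1)  # "Date"
```

The `format` string has to match the layout actually used in the data, so it may need adjusting for other files.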
8.1.4 Change all character data columns to numeric data
Description | |
---|---|
Method to change all character values to numeric values in all character columns of a dataframe |
Ingredients | |
---|---|
Package | Data |
readr, dplyr | sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")
Sample Instructions
package::function(data, is.character, as.numeric)
Actual Instructions
dplyr::mutate_if(df, is.character, as.numeric)
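In dplyr 1.0.0 and later, scoped verbs such as `mutate_if()` are superseded by `across()`. They still work, but an equivalent call can be sketched as below. The data frame `df_example` is a hypothetical stand-in for sample.csv, and the sketch assumes every character column holds text that can be read as a number.

```r
library(dplyr)

# Hypothetical stand-in for sample.csv; the values are made up for illustration
df_example <- data.frame(
  id = 1:2,
  column1 = c("1", "2"),
  column2 = c("3.5", "4"),
  stringsAsFactors = FALSE
)

# where(is.character) selects every character column;
# as.numeric is then applied to each selected column
converted <- mutate(df_example, across(where(is.character), as.numeric))

sapply(converted, class)
```

The same pattern should cover the other "all columns" recipes by swapping the predicate and the conversion function, for example `across(where(is.numeric), as.character)`.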
8.1.5 Change all numeric data columns to character data
Description | |
---|---|
Method to change all numeric values to character values in all numeric columns of a dataframe |
Ingredients | |
---|---|
Package | Data |
readr, dplyr | sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")
Sample Instructions
package::function(data, is.numeric, as.character)
Actual Instructions
dplyr::mutate_if(df, is.numeric, as.character)
8.1.6 Change all character data columns to date data
Description | |
---|---|
Method to change all character values to date values in all character columns of a dataframe |
Ingredients | |
---|---|
Package | Data |
readr, dplyr | sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")
Sample Instructions
package::function(data, is.character, as.Date)
Actual Instructions
dplyr::mutate_if(df, is.character, as.Date)
8.1.7 Change character data type to numeric data type in selected columns
Description | |
---|---|
Method to change all character values to numeric values in selected character columns of a dataframe |
Ingredients | |
---|---|
Package | Data |
readr, dplyr | sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")
Sample Instructions
package::function(data, c("column_name1", "column_name2"), as.numeric)
Actual Instructions
dplyr::mutate_at(df, c("column1", "column2"), as.numeric)
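As with the other scoped verbs, `mutate_at()` is superseded by `across()` in dplyr 1.0.0 and later. A minimal sketch of the same conversion is shown below, using `all_of()` to select columns from a character vector of names. The data frame `df_example` and the vector `cols_to_convert` are hypothetical names introduced for illustration.

```r
library(dplyr)

# Hypothetical stand-in for sample.csv; the values are made up for illustration
df_example <- data.frame(
  column1 = c("1", "2"),
  column2 = c("3.5", "4"),
  column3 = c("a", "b"),
  stringsAsFactors = FALSE
)

# Names of the columns to convert; column3 is deliberately left out
cols_to_convert <- c("column1", "column2")

# all_of() selects exactly the named columns and errors if one is missing
converted <- mutate(df_example, across(all_of(cols_to_convert), as.numeric))

sapply(converted, class)
```

Columns not listed in `cols_to_convert` keep their original type, so `column3` stays character here.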
8.1.8 Change numeric data type to character data type in selected columns
Description | |
---|---|
Method to change all numeric values to character values in selected numeric columns of a dataframe |
Ingredients | |
---|---|
Package | Data |
readr, dplyr | sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")
Sample Instructions
package::function(data, c("column_name1", "column_name2"), as.character)
Actual Instructions
dplyr::mutate_at(df, c("column1", "column2"), as.character)
8.1.9 Change character data type to date data type in selected columns
Description | |
---|---|
Method to change all character values to date values in selected character columns of a dataframe |
Ingredients | |
---|---|
Package | Data |
readr, dplyr | sample.csv |
Preparation
df <- readr::read_csv("C:/data/sample.csv")
Sample Instructions
package::function(data, c("column_name1", "column_name2"), as.Date)
Actual Instructions
dplyr::mutate_at(df, c("column1", "column2"), as.Date)