Chapter 9 Combining Data
It is very common to have data stored in multiple different tables or dataframes. For many data analysis approaches it is preferred to have multiple data sources in a single dataframe. This chapter will focus on combining multiple dataframes into a single dataframe.
9.1 Appending Dataframes
Appending data is a common practice to combine multiple dataframes into a single dataframes without using common values between datasets to combine data.
9.1.1 Add a dataframe to bottom of a dataframe
Description | |
---|---|
Method to add rows from a dataframe that has the same columns to the bottom of another dataframe |
Ingredients | |
---|---|
Package | Data |
readr |
sample1.csv |
Preparation
<- readr::read_csv("C:/data/sample1.csv")
df1 <- readr::read_csv("C:/data/sample2.csv") df2
Sample Instructions
rbind(data1, data2)
Actual Instructions
rbind(df1, df2)
9.1.2 Add a dataframe to right of a dataframe
Description | |
---|---|
Method to add columns from a dataframe to the right of the existing columns in another dataframe |
Ingredients | |
---|---|
Package | Data |
readr |
sample1.csv |
<- readr::read_csv("C:/data/sample1.csv")
df1 <- readr::read_csv("C:/data/sample2.csv") df2
Sample Instructions
cbind(data1, data2)
Actual Instructions
cbind(df1, df2)
9.2 Joining Dataframes
Joining data is a common practice to combine multiple dataframes into a single dataframes based on common values within columns.
9.2.1 Join with same joining column names
Description | |
---|---|
Method to join two dataframes when the joining column name is the same in both dataframes |
Ingredients | |
---|---|
Package | Data |
readr |
sample1.csv |
Preparation
<- readr::read_csv("C:/data/sample.1csv")
df1 <- readr::read_csv("C:/data/sample2.csv") df2
Sample Instructions
::function(data1, data2, by = "column_name") package
Actual Instructions
::left_join(df1, df2, by = "user_id") dplyr
9.2.2 Join with different joining names
Description | |
---|---|
Method to join two dataframes when the joining column name is different between the dataframes |
Ingredients | |
---|---|
Package | Data |
readr |
sample1.csv |
Preparation
<- readr::read_csv("C:/data/sample.1csv")
df1 <- readr::read_csv("C:/data/sample2.csv") df2
Sample Instructions
::function(data1, data2, by = c("data1 column_name" = "data2 column_name")) package
Actual Instructions
::left_join(df1, df2, by = c("user_id" = "userid")) dplyr
9.2.3 Join with subset of columns from one dataframe
Description | |
---|---|
Method to join two dataframes whiles reducing the number of columns from one dataframe |
Ingredients | |
---|---|
Package | Data |
readr |
sample1.csv |
Preparation
<- readr::read_csv("C:/data/sample.1csv")
df1 <- readr::read_csv("C:/data/sample2.csv") df2
Sample Instructions
::function(data1, package::function(data2, column_name1, column_name2, column_name3), by = "column_name") package
Actual Instructions
::left_join(df1, dplyr::select(df2, user_id, product_purchased, purchase_amount), by = "user_id") dplyr
9.2.4 Join with subset of columns from two dataframes
Description | |
---|---|
Method to join two dataframes whiles reducing the number of columns from both dataframes |
Ingredients | |
---|---|
Package | Data |
readr |
sample1.csv |
Preparation
<- readr::read_csv("C:/data/sample.1csv")
df1 <- readr::read_csv("C:/data/sample2.csv") df2
Sample Instructions
::function(package::function(data1, column_name1, column_name2, column_name3), package::function(data2, column_name1, column_name2, column_name3), by = "column_name") package
Actual Instructions
::left_join(dplyr::select(df1, user_id, first_name, last_name), dplyr::select(df2, user_id, product_purchased, purchase_amount), by = "user_id") dplyr
9.2.5 Join matching rows in both dataframes
Description | |
---|---|
Method to join two dataframes when rows between both dataframes match |
Ingredients | |
---|---|
Package | Data |
readr |
sample1.csv |
Preparation
<- readr::read_csv("C:/data/sample.1csv")
df1 <- readr::read_csv("C:/data/sample2.csv") df2
Sample Instructions
::function(data1, data2, by = "column_name") package
Actual Instructions
::inner_join(df1, df2, by = "user_id") dplyr
9.2.6 Join matching rows from the second dataframe to the first dataframe
Description | |
---|---|
Method to join matching rows from the second dataframe to the first dataframe |
Ingredients | |
---|---|
Package | Data |
readr |
sample1.csv |
Preparation
<- readr::read_csv("C:/data/sample.1csv")
df1 <- readr::read_csv("C:/data/sample2.csv") df2
Sample Instructions
::function(data1, data2, by = "column_name") package
Actual Instructions
::left_join(df1, df2, by = "user_id") dplyr
9.2.7 Join matching rows from the first dataframe to the first dataframe
Description | |
---|---|
Method to join matching rows from the first dataframe to the first dataframe |
Ingredients | |
---|---|
Package | Data |
readr |
sample1.csv |
Preparation
<- readr::read_csv("C:/data/sample.1csv")
df1 <- readr::read_csv("C:/data/sample2.csv") df2
Sample Instructions
::function(data1, data2, by = "column_name") package
Actual Instructions
::right_join(df1, df2, by = "user_id") dplyr
9.2.8 Join all rows regardless of matching
Description | |
---|---|
Method to join regardless of matching between both dataframes |
Ingredients | |
---|---|
Package | Data |
readr |
sample1.csv |
Preparation
<- readr::read_csv("C:/data/sample.1csv")
df1 <- readr::read_csv("C:/data/sample2.csv") df2
Sample Instructions
::function(data1, data2, by = "column_name") package
Actual Instructions
::full_join(df1, df2, by = "user_id") dplyr
9.2.9 Filter dataframe where matching rows between dataframes
Description | |
---|---|
Method to filter rows in first dataframe that match the second dataframe |
Ingredients | |
---|---|
Package | Data |
readr |
sample1.csv |
Preparation
<- readr::read_csv("C:/data/sample.1csv")
df1 <- readr::read_csv("C:/data/sample2.csv") df2
Sample Instructions
::function(data1, data2, by = "column_name") package
Actual Instructions
::semi_join(df1, df2, by = "user_id") dplyr
9.2.10 Filter dataframe where non-matching rows between dataframes
Description | |
---|---|
Method to filter rows in first dataframe that do not match the second dataframe |
Ingredients | |
---|---|
Package | Data |
readr |
sample1.csv |
Preparation
<- readr::read_csv("C:/data/sample.1csv")
df1 <- readr::read_csv("C:/data/sample2.csv") df2
Sample Instructions
::function(data1, data2, by = "column_name") package
Actual Instructions
::anti_join(df1, df2, by = "user_id") dplyr