Chapter 9 Combining Data

It is very common to have data stored in multiple different tables or dataframes. For many data analysis approaches it is preferred to have multiple data sources in a single dataframe. This chapter will focus on combining multiple dataframes into a single dataframe.

9.1 Appending Dataframes

Appending data is a common practice to combine multiple dataframes into a single dataframes without using common values between datasets to combine data.

9.1.1 Add a dataframe to bottom of a dataframe

Description
Method to add rows from a dataframe that has the same columns to the bottom of another dataframe
Ingredients
Package Data

readr

sample1.csv
sample2.csv


Preparation

df1 <- readr::read_csv("C:/data/sample1.csv")
df2 <- readr::read_csv("C:/data/sample2.csv")


Sample Instructions

rbind(data1, data2)


Actual Instructions

rbind(df1, df2)

9.1.2 Add a dataframe to right of a dataframe

Description
Method to add columns from a dataframe to the right of the existing columns in another dataframe
Ingredients
Package Data

readr

sample1.csv
sample2.csv


df1 <- readr::read_csv("C:/data/sample1.csv")
df2 <- readr::read_csv("C:/data/sample2.csv")


Sample Instructions

cbind(data1, data2)


Actual Instructions

cbind(df1, df2)

9.2 Joining Dataframes

Joining data is a common practice to combine multiple dataframes into a single dataframes based on common values within columns.

9.2.1 Join with same joining column names

Description
Method to join two dataframes when the joining column name is the same in both dataframes
Ingredients
Package Data

readr
dplyr

sample1.csv
sample2.csv


Preparation

df1 <- readr::read_csv("C:/data/sample.1csv")
df2 <- readr::read_csv("C:/data/sample2.csv")


Sample Instructions

package::function(data1, data2, by = "column_name")


Actual Instructions

dplyr::left_join(df1, df2, by = "user_id")

9.2.2 Join with different joining names

Description
Method to join two dataframes when the joining column name is different between the dataframes
Ingredients
Package Data

readr
dplyr

sample1.csv
sample2.csv


Preparation

df1 <- readr::read_csv("C:/data/sample.1csv")
df2 <- readr::read_csv("C:/data/sample2.csv")


Sample Instructions

package::function(data1, data2, by = c("data1 column_name" = "data2 column_name"))


Actual Instructions

dplyr::left_join(df1, df2, by = c("user_id" = "userid"))

9.2.3 Join with subset of columns from one dataframe

Description
Method to join two dataframes whiles reducing the number of columns from one dataframe
Ingredients
Package Data

readr
dplyr

sample1.csv
sample2.csv


Preparation

df1 <- readr::read_csv("C:/data/sample.1csv")
df2 <- readr::read_csv("C:/data/sample2.csv")


Sample Instructions

package::function(data1, package::function(data2, column_name1, column_name2, column_name3), by = "column_name")


Actual Instructions

dplyr::left_join(df1, dplyr::select(df2, user_id, product_purchased, purchase_amount), by = "user_id")

9.2.4 Join with subset of columns from two dataframes

Description
Method to join two dataframes whiles reducing the number of columns from both dataframes
Ingredients
Package Data

readr
dplyr

sample1.csv
sample2.csv


Preparation

df1 <- readr::read_csv("C:/data/sample.1csv")
df2 <- readr::read_csv("C:/data/sample2.csv")


Sample Instructions

package::function(package::function(data1, column_name1, column_name2, column_name3), package::function(data2, column_name1, column_name2, column_name3), by = "column_name")


Actual Instructions

dplyr::left_join(dplyr::select(df1, user_id, first_name, last_name), dplyr::select(df2, user_id, product_purchased, purchase_amount), by = "user_id")

9.2.5 Join matching rows in both dataframes

Description
Method to join two dataframes when rows between both dataframes match
Ingredients
Package Data

readr
dplyr

sample1.csv
sample2.csv


Preparation

df1 <- readr::read_csv("C:/data/sample.1csv")
df2 <- readr::read_csv("C:/data/sample2.csv")


Sample Instructions

package::function(data1, data2, by = "column_name")


Actual Instructions

dplyr::inner_join(df1, df2, by = "user_id")

9.2.6 Join matching rows from the second dataframe to the first dataframe

Description
Method to join matching rows from the second dataframe to the first dataframe
Ingredients
Package Data

readr
dplyr

sample1.csv
sample2.csv


Preparation

df1 <- readr::read_csv("C:/data/sample.1csv")
df2 <- readr::read_csv("C:/data/sample2.csv")


Sample Instructions

package::function(data1, data2, by = "column_name")


Actual Instructions

dplyr::left_join(df1, df2, by = "user_id")

9.2.7 Join matching rows from the first dataframe to the first dataframe

Description
Method to join matching rows from the first dataframe to the first dataframe
Ingredients
Package Data

readr
dplyr

sample1.csv
sample2.csv


Preparation

df1 <- readr::read_csv("C:/data/sample.1csv")
df2 <- readr::read_csv("C:/data/sample2.csv")


Sample Instructions

package::function(data1, data2, by = "column_name")


Actual Instructions

dplyr::right_join(df1, df2, by = "user_id")

9.2.8 Join all rows regardless of matching

Description
Method to join regardless of matching between both dataframes
Ingredients
Package Data

readr
dplyr

sample1.csv
sample2.csv


Preparation

df1 <- readr::read_csv("C:/data/sample.1csv")
df2 <- readr::read_csv("C:/data/sample2.csv")


Sample Instructions

package::function(data1, data2, by = "column_name")


Actual Instructions

dplyr::full_join(df1, df2, by = "user_id")

9.2.9 Filter dataframe where matching rows between dataframes

Description
Method to filter rows in first dataframe that match the second dataframe
Ingredients
Package Data

readr
dplyr

sample1.csv
sample2.csv


Preparation

df1 <- readr::read_csv("C:/data/sample.1csv")
df2 <- readr::read_csv("C:/data/sample2.csv")


Sample Instructions

package::function(data1, data2, by = "column_name")


Actual Instructions

dplyr::semi_join(df1, df2, by = "user_id")

9.2.10 Filter dataframe where non-matching rows between dataframes

Description
Method to filter rows in first dataframe that do not match the second dataframe
Ingredients
Package Data

readr
dplyr

sample1.csv
sample2.csv


Preparation

df1 <- readr::read_csv("C:/data/sample.1csv")
df2 <- readr::read_csv("C:/data/sample2.csv")


Sample Instructions

package::function(data1, data2, by = "column_name")


Actual Instructions

dplyr::anti_join(df1, df2, by = "user_id")