Chapter 4 Importing Data

Data are the ingredients needed for all successful analytic undertaking. The volume and diversity of data created daily is increasing dramatically from various different sources. Importing data is a necessary and staple step for any analytics projects.

The amount of data we produce every day is truly mind-boggling. There are 2.5 quintillion bytes of data created each day at our current pace, but that pace is only accelerating with the growth of the Internet of Things (IoT). Over the last two years alone 90 percent of the data in the world was generated
Bernard Marr
Forbes

4.1 CSV Files

A comma separated value (CSV) file is a file format commonly used to store rectangular structured data, especially with data analytics projects. The next few recipes leverage the readr package for importing CSV files.

4.1.1 Import CSV File macOS

Description
Basic method for reading a csv file on a macOS computer
Ingredients
Package Data

readr

sample.csv


Sample Instructions

package::function("file_path/file_name.csv")


Actual Instructions

readr::read_csv("/data/sample_dwelling_characteristics.csv")

4.1.2 Import CSV File Windows

Description
Basic method for reading a csv file on a Windows computer
Ingredients
Package Data

readr

sample.csv


Sample Instructions

package::function("file_path/file_name.csv")


Actual Instructions

readr::read_csv("C:/data/sample_dwelling_characteristics.csv")

4.1.3 Import CSV File Linux

Description
Basic method for reading a csv file on a Linux computer
Ingredients
Package Data

readr

sample.csv


Sample Instructions

package::function("~/file_path/file_name.csv")


Actual Instructions

readr::read_csv("~/data/sample_dwelling_characteristics.csv")

4.1.4 Import CSV from GitHub

Description
Basic method for reading a csv file stored in a GitHub repository
Ingredients
Package Data

readr

sample.csv


Sample Instructions

package::function("url/file_name.csv")


Actual Instructions

readr::read_csv("https://raw.githubusercontent.com/ryangarnett/r-recipe-book/main/data/sample_dwelling_characteristics.csv")

4.1.5 Skip Rows on CSV Import

Description
Skip a number of rows when reading a csv file
Ingredients
Package Data

readr

sample.csv


Sample Instructions

package::function("file_path/file_name.csv", skip = number)


Actual Instructions

readr::read_csv("C:/data/sample_dwelling_characteristics.csv", skip = 2)

4.1.6 Skip Blank Rows on CSV Import

Description
Skip all rows that are blank when reading a csv file
Ingredients
Package Data

readr

sample.csv


Sample Instructions

package::function("file_path/file_name.csv", skip = TRUE)


Actual Instructions

readr::read_csv("C:/data/sample_dwelling_characteristics.csv", skip = TRUE)

4.1.7 Read Number of Rows on CSV Import

Description
Read a specific number of rows when reading a csv file
Ingredients
Package Data

readr

sample.csv


Sample Instructions

package::function("file_path/file_name.csv", n_max = number)


Actual Instructions

readr::read_csv("C:/data/sample_dwelling_characteristics.csv", n_max = 1000)

4.1.8 Read Specific Columns on CSV Import

Description
Read a specific columns when reading a csv file
Ingredients
Package Data

readr

sample.csv


Sample Instructions

object_name <- package::function(column1 = col_data_type(),
                         column2 = col_data_type())

package::function("file_path/file_name.csv", col_types = object_name)


Actual Instructions

selected_columns <- readr::cols_only(year_built = col_double(),
                              style = col_character())

readr::read_csv("C:/data/sample_dwelling_characteristics.csv", col_types = selected_columns)

4.2 Excel Files

Within many organizations Excel files are a file format commonly used to store rectangular structured data. The next few recipes leverage the readxl package for importing Excel files.

4.2.1 Import Excel File

Description
Basic method for reading an excel file
Ingredients
Package Data

readxl

sample.xlsx


Sample Instructions

package::function("file_path/file_name.xlsx")


Actual Instructions

readxl::read_excel("C:/data/sample.xlsx")

4.2.2 Skip Rows on Excel Import

Description
Skip rows when reading excel files
Ingredients
Package Data

readxl

sample.xlsx


Sample Instructions

package::function("file_path/file_name.xlsx", skip = number)


Actual Instructions

readxl::read_excel("C:/data/sample.xlsx", skip = 2)

4.2.3 Read Number of Rows on Excel Import

Description
Read a specific number of rows when reading an excel file
Ingredients
Package Data

readxl

sample.xlsx


Sample Instructions

package::function("file_path/file_name.xlsx", n_max = number)


Actual Instructions

readxl::read_excel("C:/data/sample.xlsx", n_max = 1000)

4.2.4 Read Specific Sheet on Excel Import

Description
Read a specific sheet of an excel file
Ingredients
Package Data

readxl

sample.xlsx


Sample Instructions

package::function("file_path/file_name.xlsx", sheet = "sheet_name")


Actual Instructions

readxl::read_excel("C:/data/sample.xlsx", sheet = "source_data")

4.2.5 Read Range of Cells on Excel Import

Description
Read a specific range of cells of an excel file
Ingredients
Package Data

readxl

sample.xlsx


Sample Instructions

package::function("file_path/file_name.xlsx", range = "upper left cell ID:lower right cell ID")


Actual Instructions

readxl::read_excel("C:/data/sample.xlsx", range = "B2:D25")