Chapter 4 Importing Data
Data are the ingredients needed for all successful analytic undertaking. The volume and diversity of data created daily is increasing dramatically from various different sources. Importing data is a necessary and staple step for any analytics projects.
The amount of data we produce every day is truly mind-boggling. There are 2.5 quintillion bytes of data created each day at our current pace, but that pace is only accelerating with the growth of the Internet of Things (IoT). Over the last two years alone 90 percent of the data in the world was generated
Bernard Marr
Forbes
4.1 CSV Files
A comma separated value (CSV) file is a file format commonly used to store rectangular structured data, especially with data analytics projects. The next few recipes leverage the readr
package for importing CSV files.
4.1.1 Import CSV File macOS
Description | |
---|---|
Basic method for reading a csv file on a macOS computer |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Sample Instructions
::function("file_path/file_name.csv") package
Actual Instructions
::read_csv("/data/sample_dwelling_characteristics.csv") readr
4.1.2 Import CSV File Windows
Description | |
---|---|
Basic method for reading a csv file on a Windows computer |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Sample Instructions
::function("file_path/file_name.csv") package
Actual Instructions
::read_csv("C:/data/sample_dwelling_characteristics.csv") readr
4.1.3 Import CSV File Linux
Description | |
---|---|
Basic method for reading a csv file on a Linux computer |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Sample Instructions
::function("~/file_path/file_name.csv") package
Actual Instructions
::read_csv("~/data/sample_dwelling_characteristics.csv") readr
4.1.4 Import CSV from GitHub
Description | |
---|---|
Basic method for reading a csv file stored in a GitHub repository |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Sample Instructions
::function("url/file_name.csv") package
Actual Instructions
::read_csv("https://raw.githubusercontent.com/ryangarnett/r-recipe-book/main/data/sample_dwelling_characteristics.csv") readr
4.1.5 Skip Rows on CSV Import
Description | |
---|---|
Skip a number of rows when reading a csv file |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Sample Instructions
::function("file_path/file_name.csv", skip = number) package
Actual Instructions
::read_csv("C:/data/sample_dwelling_characteristics.csv", skip = 2) readr
4.1.6 Skip Blank Rows on CSV Import
Description | |
---|---|
Skip all rows that are blank when reading a csv file |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Sample Instructions
::function("file_path/file_name.csv", skip = TRUE) package
Actual Instructions
::read_csv("C:/data/sample_dwelling_characteristics.csv", skip = TRUE) readr
4.1.7 Read Number of Rows on CSV Import
Description | |
---|---|
Read a specific number of rows when reading a csv file |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Sample Instructions
::function("file_path/file_name.csv", n_max = number) package
Actual Instructions
::read_csv("C:/data/sample_dwelling_characteristics.csv", n_max = 1000) readr
4.1.8 Read Specific Columns on CSV Import
Description | |
---|---|
Read a specific columns when reading a csv file |
Ingredients | |
---|---|
Package | Data |
readr |
sample.csv |
Sample Instructions
<- package::function(column1 = col_data_type(),
object_name column2 = col_data_type())
::function("file_path/file_name.csv", col_types = object_name) package
Actual Instructions
<- readr::cols_only(year_built = col_double(),
selected_columns style = col_character())
::read_csv("C:/data/sample_dwelling_characteristics.csv", col_types = selected_columns) readr
4.2 Excel Files
Within many organizations Excel files are a file format commonly used to store rectangular structured data. The next few recipes leverage the readxl
package for importing Excel files.
4.2.1 Import Excel File
Description | |
---|---|
Basic method for reading an excel file |
Ingredients | |
---|---|
Package | Data |
readxl |
sample.xlsx |
Sample Instructions
::function("file_path/file_name.xlsx") package
Actual Instructions
::read_excel("C:/data/sample.xlsx") readxl
4.2.2 Skip Rows on Excel Import
Description | |
---|---|
Skip rows when reading excel files |
Ingredients | |
---|---|
Package | Data |
readxl |
sample.xlsx |
Sample Instructions
::function("file_path/file_name.xlsx", skip = number) package
Actual Instructions
::read_excel("C:/data/sample.xlsx", skip = 2) readxl
4.2.3 Read Number of Rows on Excel Import
Description | |
---|---|
Read a specific number of rows when reading an excel file |
Ingredients | |
---|---|
Package | Data |
readxl |
sample.xlsx |
Sample Instructions
::function("file_path/file_name.xlsx", n_max = number) package
Actual Instructions
::read_excel("C:/data/sample.xlsx", n_max = 1000) readxl
4.2.4 Read Specific Sheet on Excel Import
Description | |
---|---|
Read a specific sheet of an excel file |
Ingredients | |
---|---|
Package | Data |
readxl |
sample.xlsx |
Sample Instructions
::function("file_path/file_name.xlsx", sheet = "sheet_name") package
Actual Instructions
::read_excel("C:/data/sample.xlsx", sheet = "source_data") readxl
4.2.5 Read Range of Cells on Excel Import
Description | |
---|---|
Read a specific range of cells of an excel file |
Ingredients | |
---|---|
Package | Data |
readxl |
sample.xlsx |
Sample Instructions
::function("file_path/file_name.xlsx", range = "upper left cell ID:lower right cell ID") package
Actual Instructions
::read_excel("C:/data/sample.xlsx", range = "B2:D25") readxl