Chapter 4 Importing Data
Data are the ingredients needed for all successful analytic undertaking. The volume and diversity of data created daily is increasing dramatically from various different sources. Importing data is a necessary and staple step for any analytics projects.
The amount of data we produce every day is truly mind-boggling. There are 2.5 quintillion bytes of data created each day at our current pace, but that pace is only accelerating with the growth of the Internet of Things (IoT). Over the last two years alone 90 percent of the data in the world was generated
Bernard Marr
Forbes
4.1 CSV Files
A comma separated value (CSV) file is a file format commonly used to store rectangular structured data, especially with data analytics projects. The next few recipes leverage the readr package for importing CSV files.
4.1.1 Import CSV File macOS
| Description | |
|---|---|
| Basic method for reading a csv file on a macOS computer |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Sample Instructions
package::function("file_path/file_name.csv")Actual Instructions
readr::read_csv("/data/sample_dwelling_characteristics.csv")4.1.2 Import CSV File Windows
| Description | |
|---|---|
| Basic method for reading a csv file on a Windows computer |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Sample Instructions
package::function("file_path/file_name.csv")Actual Instructions
readr::read_csv("C:/data/sample_dwelling_characteristics.csv")4.1.3 Import CSV File Linux
| Description | |
|---|---|
| Basic method for reading a csv file on a Linux computer |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Sample Instructions
package::function("~/file_path/file_name.csv")Actual Instructions
readr::read_csv("~/data/sample_dwelling_characteristics.csv")4.1.4 Import CSV from GitHub
| Description | |
|---|---|
| Basic method for reading a csv file stored in a GitHub repository |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Sample Instructions
package::function("url/file_name.csv")Actual Instructions
readr::read_csv("https://raw.githubusercontent.com/ryangarnett/r-recipe-book/main/data/sample_dwelling_characteristics.csv")4.1.5 Skip Rows on CSV Import
| Description | |
|---|---|
| Skip a number of rows when reading a csv file |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Sample Instructions
package::function("file_path/file_name.csv", skip = number)Actual Instructions
readr::read_csv("C:/data/sample_dwelling_characteristics.csv", skip = 2)4.1.6 Skip Blank Rows on CSV Import
| Description | |
|---|---|
| Skip all rows that are blank when reading a csv file |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Sample Instructions
package::function("file_path/file_name.csv", skip = TRUE)Actual Instructions
readr::read_csv("C:/data/sample_dwelling_characteristics.csv", skip = TRUE)4.1.7 Read Number of Rows on CSV Import
| Description | |
|---|---|
| Read a specific number of rows when reading a csv file |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Sample Instructions
package::function("file_path/file_name.csv", n_max = number)Actual Instructions
readr::read_csv("C:/data/sample_dwelling_characteristics.csv", n_max = 1000)4.1.8 Read Specific Columns on CSV Import
| Description | |
|---|---|
| Read a specific columns when reading a csv file |
| Ingredients | |
|---|---|
| Package | Data |
readr |
sample.csv |
Sample Instructions
object_name <- package::function(column1 = col_data_type(),
column2 = col_data_type())
package::function("file_path/file_name.csv", col_types = object_name)Actual Instructions
selected_columns <- readr::cols_only(year_built = col_double(),
style = col_character())
readr::read_csv("C:/data/sample_dwelling_characteristics.csv", col_types = selected_columns)4.2 Excel Files
Within many organizations Excel files are a file format commonly used to store rectangular structured data. The next few recipes leverage the readxl package for importing Excel files.
4.2.1 Import Excel File
| Description | |
|---|---|
| Basic method for reading an excel file |
| Ingredients | |
|---|---|
| Package | Data |
readxl |
sample.xlsx |
Sample Instructions
package::function("file_path/file_name.xlsx")Actual Instructions
readxl::read_excel("C:/data/sample.xlsx")4.2.2 Skip Rows on Excel Import
| Description | |
|---|---|
| Skip rows when reading excel files |
| Ingredients | |
|---|---|
| Package | Data |
readxl |
sample.xlsx |
Sample Instructions
package::function("file_path/file_name.xlsx", skip = number)Actual Instructions
readxl::read_excel("C:/data/sample.xlsx", skip = 2)4.2.3 Read Number of Rows on Excel Import
| Description | |
|---|---|
| Read a specific number of rows when reading an excel file |
| Ingredients | |
|---|---|
| Package | Data |
readxl |
sample.xlsx |
Sample Instructions
package::function("file_path/file_name.xlsx", n_max = number)Actual Instructions
readxl::read_excel("C:/data/sample.xlsx", n_max = 1000)4.2.4 Read Specific Sheet on Excel Import
| Description | |
|---|---|
| Read a specific sheet of an excel file |
| Ingredients | |
|---|---|
| Package | Data |
readxl |
sample.xlsx |
Sample Instructions
package::function("file_path/file_name.xlsx", sheet = "sheet_name")Actual Instructions
readxl::read_excel("C:/data/sample.xlsx", sheet = "source_data")4.2.5 Read Range of Cells on Excel Import
| Description | |
|---|---|
| Read a specific range of cells of an excel file |
| Ingredients | |
|---|---|
| Package | Data |
readxl |
sample.xlsx |
Sample Instructions
package::function("file_path/file_name.xlsx", range = "upper left cell ID:lower right cell ID")Actual Instructions
readxl::read_excel("C:/data/sample.xlsx", range = "B2:D25")