Chapter 12 Descriptive Analysis

12.1 Single Column

12.1.1 Find minimum value in a numeric column

Description
Method to find the minimum value of a numeric column in a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, new_column_name = function(column_name, na.rm = TRUE))


Actual Instructions

dplyr::summarize(df, minimum_year_built = min(year_built, na.rm = TRUE))
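
The role of na.rm = TRUE is easiest to see on a column that contains a missing value. The sketch below uses a small made-up data frame (toy) in place of sample.csv; its values are illustrative assumptions, not taken from the sample data.

```r
# Hypothetical toy data standing in for sample.csv (values are made up).
toy <- data.frame(year_built = c(1995, NA, 1978, 2003))

# Without na.rm = TRUE, a single NA makes the minimum NA.
min_keep_na <- dplyr::summarize(toy, minimum_year_built = min(year_built))

# With na.rm = TRUE, missing values are dropped before taking the minimum.
min_drop_na <- dplyr::summarize(toy, minimum_year_built = min(year_built, na.rm = TRUE))

print(min_keep_na)  # minimum_year_built is NA
print(min_drop_na)  # minimum_year_built is 1978
```

The same behavior applies to max, mean, median, sd and sum in the recipes that follow.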

12.1.2 Find maximum value in a numeric column

Description
Method to find the maximum value of a numeric column in a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, new_column_name = function(column_name, na.rm = TRUE))


Actual Instructions

dplyr::summarize(df, maximum_year_built = max(year_built, na.rm = TRUE))
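
dplyr::summarize returns a one-row data frame rather than a bare number. If the value itself is needed, one possible approach is dplyr::pull, sketched here on a made-up data frame (toy and latest_year are hypothetical names used only for illustration).

```r
# Hypothetical toy data standing in for sample.csv (values are made up).
toy <- data.frame(year_built = c(1995, NA, 1978, 2003))

# The summary is a data frame with one row and one column.
result <- dplyr::summarize(toy, maximum_year_built = max(year_built, na.rm = TRUE))

# Extract the single value as a plain numeric vector of length one.
latest_year <- dplyr::pull(result, maximum_year_built)

print(latest_year)  # 2003
```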

12.1.3 Calculate mean value in a numeric column

Description
Method to calculate the mean value of a numeric column in a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, new_column_name = function(column_name, na.rm = TRUE))


Actual Instructions

dplyr::summarize(df, mean_year_built = mean(year_built, na.rm = TRUE))
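
With na.rm = TRUE the mean is divided by the number of non-missing values, not by the number of rows. The following sketch makes that visible on a made-up data frame; the helper columns n_non_missing and n_rows are illustrative names, not part of the recipe.

```r
# Hypothetical toy data standing in for sample.csv (values are made up).
toy <- data.frame(year_built = c(1990, 2000, NA, 2010))

result <- dplyr::summarize(
  toy,
  mean_year_built = mean(year_built, na.rm = TRUE),  # (1990 + 2000 + 2010) / 3
  n_non_missing   = sum(!is.na(year_built)),         # values that enter the mean
  n_rows          = dplyr::n()                       # all rows, including the NA
)

print(result)  # mean_year_built 2000, n_non_missing 3, n_rows 4
```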

12.1.4 Calculate median value in a numeric column

Description
Method to calculate the median value of a numeric column in a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, new_column_name = function(column_name, na.rm = TRUE))


Actual Instructions

dplyr::summarize(df, median_year_built = median(year_built, na.rm = TRUE))
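
The median is less sensitive to extreme values than the mean. As a rough illustration, the made-up data frame below contains one building that is much older than the rest (the values are assumptions chosen for the example).

```r
# Hypothetical toy data: four recent buildings and one much older one.
toy <- data.frame(year_built = c(1998, 2001, 2003, 2005, 1850))

result <- dplyr::summarize(
  toy,
  mean_year_built   = mean(year_built, na.rm = TRUE),
  median_year_built = median(year_built, na.rm = TRUE)
)

print(result)  # mean_year_built 1971.4, median_year_built 2001
```

The single old building pulls the mean down by about 30 years, while the median stays at the middle value.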

12.1.5 Calculate standard deviation in a numeric column

Description
Method to calculate the standard deviation of a numeric column in a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, new_column_name = function(column_name, na.rm = TRUE))


Actual Instructions

dplyr::summarize(df, stdev_year_built = sd(year_built, na.rm = TRUE))
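
R's sd computes the sample standard deviation, which divides by n - 1 rather than n. The sketch below checks the summarize result against a by-hand calculation on a made-up data frame (toy, x and by_hand are illustrative names).

```r
# Hypothetical toy data standing in for sample.csv (values are made up).
toy <- data.frame(year_built = c(1990, 2000, 2010, NA))

result <- dplyr::summarize(toy, stdev_year_built = sd(year_built, na.rm = TRUE))

# Same calculation by hand: drop NAs, then divide the squared deviations by n - 1.
x <- toy$year_built[!is.na(toy$year_built)]
by_hand <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))

print(result$stdev_year_built)  # 10
print(by_hand)                  # 10
```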

12.1.6 Calculate sum of a numeric column

Description
Method to calculate the sum of a numeric column in a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, new_column_name = function(column_name, na.rm = TRUE))


Actual Instructions

dplyr::summarize(df, sum_living_units = sum(living_units, na.rm = TRUE))
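
One detail worth knowing about sum with na.rm = TRUE: a column that contains only missing values sums to 0, not NA. The sketch below shows both the ordinary case and the all-missing case on made-up data frames (toy and all_missing are hypothetical names).

```r
# Hypothetical toy data standing in for sample.csv (values are made up).
toy <- data.frame(living_units = c(2, NA, 4, 1))
total <- dplyr::summarize(toy, sum_living_units = sum(living_units, na.rm = TRUE))

# A column with nothing but missing values.
all_missing <- data.frame(living_units = c(NA_real_, NA_real_))
empty_total <- dplyr::summarize(all_missing, sum_living_units = sum(living_units, na.rm = TRUE))

print(total)        # sum_living_units is 7
print(empty_total)  # sum_living_units is 0
```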

12.1.7 Round calculated mean value in a numeric column

Description
Method to calculate the mean value of a numeric column in a dataframe and round the result to one decimal place
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, new_column_name = function(function(column_name, na.rm = TRUE), number))


Actual Instructions

dplyr::summarize(df, mean_year_built = round(mean(year_built, na.rm = TRUE), 1))
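
The second argument of round is the number of decimal places to keep. A minimal sketch on a made-up data frame, comparing the unrounded mean with the rounded one (the column name raw_mean_year_built is an illustrative assumption):

```r
# Hypothetical toy data standing in for sample.csv (values are made up).
toy <- data.frame(year_built = c(1990, 2001, 2004))

result <- dplyr::summarize(
  toy,
  raw_mean_year_built = mean(year_built, na.rm = TRUE),           # 5995 / 3
  mean_year_built     = round(mean(year_built, na.rm = TRUE), 1)  # one decimal place
)

print(result$mean_year_built)  # 1998.3
```

Changing the 1 to 0 would round to a whole year instead.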

12.1.8 Calculate multiple descriptive statistics of a numeric column

Description
Method to calculate multiple descriptive statistics from a numeric column in a dataframe
Ingredients
Packages: readr, dplyr
Data: sample.csv


Preparation

df <- readr::read_csv("C:/data/sample.csv")


Sample Instructions

package::function(data, new_column_name = function(column_name, na.rm = TRUE),
                  new_column_name = function(column_name, na.rm = TRUE),
                  new_column_name = function(column_name, na.rm = TRUE),
                  new_column_name = function(column_name, na.rm = TRUE),
                  new_column_name = function(column_name, na.rm = TRUE))


Actual Instructions

dplyr::summarize(df, min_year_built = min(year_built, na.rm = TRUE),
                 mean_year_built = mean(year_built, na.rm = TRUE),
                 median_year_built = median(year_built, na.rm = TRUE),
                 max_year_built = max(year_built, na.rm = TRUE),
                 stdev_year_built = sd(year_built, na.rm = TRUE))
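
When several statistics are requested in one call, the result is still a single row, with one column per statistic in the order given. The sketch below runs the same five statistics on a made-up data frame so the shape of the output can be checked (the values are assumptions).

```r
# Hypothetical toy data standing in for sample.csv (values are made up).
toy <- data.frame(year_built = c(1990, 2000, 2010, NA))

result <- dplyr::summarize(
  toy,
  min_year_built    = min(year_built, na.rm = TRUE),
  mean_year_built   = mean(year_built, na.rm = TRUE),
  median_year_built = median(year_built, na.rm = TRUE),
  max_year_built    = max(year_built, na.rm = TRUE),
  stdev_year_built  = sd(year_built, na.rm = TRUE)
)

print(dim(result))    # 1 row, 5 columns
print(names(result))  # column names follow the order of the arguments
```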

12.2 Multiple Columns

Topics for this section: column-based and row-wise summaries with summarize_all and summarize_if, using min, mean, max, median, and sd.
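
As a rough sketch of the column-based case, summarize_if applies one function to every column that passes a test, and summarize_all applies it to every column of the data frame it receives. The data frame below is made up, and the city column is a hypothetical addition used only to show a non-numeric column being skipped. Both functions are superseded by across() in dplyr 1.0.0 and later, but they still work.

```r
# Hypothetical toy data standing in for sample.csv (values and the city column are made up).
toy <- data.frame(
  year_built   = c(1990, 2000, NA, 2010),
  living_units = c(2, 4, 1, NA),
  city         = c("A", "B", "A", "B")
)

# Mean of every numeric column; the character column city is left out.
numeric_means <- dplyr::summarize_if(toy, is.numeric, mean, na.rm = TRUE)

# Maximum of every column in a numeric-only selection.
numeric_only <- dplyr::select(toy, year_built, living_units)
all_max <- dplyr::summarize_all(numeric_only, max, na.rm = TRUE)

print(numeric_means)  # year_built 2000, living_units about 2.33
print(all_max)        # year_built 2010, living_units 4
```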