Summary
Treats data cleaning as the stage where analysts spend sixty to eighty percent of their working time, and walks through range checks for biologically impossible values, logical and consistency checks across variables, and outlier detection using the interquartile range, z-scores, and visual inspection. Develops Rubin's three missingness mechanisms, missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), along with the analytic strategies each one permits, contrasting listwise deletion, single imputation, and multiple imputation pooled with Rubin's rules. Closes with descriptive statistics for individual variables, exploration of relationships between variables, and the principle that an audit trail for every decision is not optional.
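As a rough illustration of the two numeric outlier rules named in the summary, here is a minimal Python sketch using only the standard library. The function names, the fence multiplier `k=1.5`, and the z-score cutoff of 3 are illustrative assumptions (common conventions, not values taken from the episode), and flagged values are candidates for review rather than automatic deletions.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside the fences [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is a common convention, assumed here for illustration.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold.

    Uses the sample standard deviation; threshold=3.0 is an assumed default.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # all values identical, nothing to flag
    return [v for v in values if abs(v - mean) / sd > threshold]


# Hypothetical measurements with one implausible entry.
readings = [10, 11, 12, 12, 13, 13, 14, 15, 16, 95]
print(iqr_outliers(readings))
print(zscore_outliers(readings))
print(zscore_outliers(readings, threshold=2.5))
```

In a small sample, a single extreme value inflates the standard deviation, so the z-score rule can fail to flag a point that the interquartile-range fences catch. That is one reason to pair both rules with visual inspection.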