Lesson 9 Podcast — Introduction to Clustered Data

Summary

Identifies clustering as a feature of nearly all epidemiologic data that violates the independence assumption behind standard analyses. Walks through hierarchical structures (patients within hospitals, students within classrooms within schools, repeated measurements within people, children within families, households within neighbourhoods) as well as cross-classified structures. Quantifies within-cluster correlation with the intraclass correlation coefficient (ICC) and shows how naive analyses inflate Type I error rates: correlated observations make the effective sample size smaller than the nominal count, so standard errors that ignore clustering are too small. Previews five families of solutions (cluster fixed effects, robust standard errors, generalized estimating equations, mixed models, and survey-weighted approaches), setting up the deeper treatments that follow.
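
To make the link between the intraclass correlation and effective sample size concrete, here is a minimal sketch. It uses the standard design effect approximation, 1 + (m - 1) × ICC, for clusters of equal size m. That formula is not stated in the summary itself, so treat it as an assumption, along with the function names and the example numbers, all of which are hypothetical.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering, assuming equal-sized clusters.

    Assumption: the common approximation DEFF = 1 + (m - 1) * ICC.
    """
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_total: float, cluster_size: float, icc: float) -> float:
    """Nominal sample size divided by the design effect."""
    return n_total / design_effect(cluster_size, icc)


# Hypothetical example: 1000 patients in 20 hospitals of 50 each, ICC = 0.05.
n_eff = effective_sample_size(n_total=1000, cluster_size=50, icc=0.05)
print(f"Design effect: {design_effect(50, 0.05):.2f}")
print(f"Effective sample size: {n_eff:.0f} of 1000 nominal")
```

Under these assumed numbers, even a modest ICC of 0.05 leaves roughly 290 independent observations' worth of information, which is why a naive analysis that treats all 1000 as independent reports standard errors that are too small.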

Transcript