Mixed Models for Continuous Data

Exploratory Data Analysis For Epidemiology

Learning objectives for this lesson:

Understand the concepts of fixed and random effects in linear mixed models
Write and interpret the linear mixed model equation with random intercepts
Decompose variance into between-group and within-group components and calculate the ICC
Understand random slopes models and their interpretation as hierarchical models
Explain contextual effects and how within-group and between-group regressions can differ
Describe estimation methods (ML and REML) and their properties
Conduct inference for both fixed and random effects in mixed models
Understand the role of BLUPs, shrinkage, residuals, and model diagnostics

This course was developed by Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University, based on Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Glossary: Key Terms, People & Concepts

This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.

Key Concepts & Ideas

Linear mixed model (LMM): An extension of linear regression for clustered or hierarchical data that combines fixed effects (population-average parameters) and random effects (cluster-specific deviations). Written y = Xβ + Zu + ε, with u and ε assumed normally distributed.

Fixed effects: Parameters describing the average relationship between covariates and the outcome in the population. Estimated as single values (no distributional assumption on the parameter itself).

Random effects: Cluster-specific deviations modeled as draws from a normal distribution with mean zero and an estimated variance. They induce within-cluster correlation in the marginal distribution of the outcome.

Random intercept: A cluster-specific shift in the mean response. Each cluster has its own intercept deviation drawn from N(0, σ²_u); the slopes are common across clusters.

Random slope: A cluster-specific deviation in the effect of a covariate. Allows the relationship between a predictor and the outcome to vary across clusters.

Variance components: The separate variances at each level of a multilevel model: between-cluster (σ²_u, written σ²_g in this lesson's equations) and within-cluster (σ²_e, written σ² in this lesson's equations). They sum to the total variance and define the ICC.

Level-1 / Level-2: Conventional terminology in multilevel modeling. Level-1 = lowest-level observations (e.g., students, patients); level-2 = clusters (e.g., schools, clinics). Higher levels follow analogously.

Shrinkage: The tendency of mixed-model predictions for individual clusters to be pulled toward the overall mean, especially when within-cluster sample sizes are small or between-cluster variance is small. Improves predictive accuracy by trading bias for reduced variance.

Partial pooling: A property of mixed models that compromises between no pooling (separate per-cluster fits) and complete pooling (a single combined fit). Each cluster's estimate borrows strength from the others, and borrows more when the cluster itself carries little information.

Within- vs between-cluster centering: Decomposing a level-1 covariate into its cluster mean (between-cluster part) and the deviation from that mean (within-cluster part) to estimate distinct between- and within-cluster effects.

Random-effects assumption: The assumption that random effects are independent of the model's covariates. Violation produces biased fixed-effect estimates; tested by comparing fixed-effects and random-effects fits (Hausman test in econometrics).

Methods & Statistical Concepts

Restricted Maximum Likelihood (REML): An estimation method that produces approximately unbiased variance-component estimates by maximizing the likelihood of residuals after accounting for the fixed effects. Default in most mixed-model software for parameter estimates and standard errors.

Maximum Likelihood (ML): An estimation method that maximizes the joint likelihood of all parameters. Required when comparing models with different fixed-effect structures via likelihood-ratio tests; underestimates variance components for small samples.

Best Linear Unbiased Predictor (BLUP): The empirical Bayes prediction of a cluster's random effect given the data. Combines the cluster-specific estimate with the overall mean, producing shrinkage toward zero.

Empirical Bayes: An approach in which the “prior” for cluster-level effects (the random-effects distribution) is estimated from the data and then used to derive posterior predictions of cluster effects (BLUPs).

Likelihood-ratio test for variance components: A test of whether a random effect is needed (H₀: variance = 0). Because zero is on the boundary of the parameter space, the standard chi-square reference distribution is conservative; a 50:50 mixture of χ²₀ and χ²₁ is recommended.

AIC / BIC for mixed models: Information criteria used to compare non-nested mixed models. Comparisons of fixed-effect structures require ML (not REML); comparisons of random-effect structures require fitting on the same fixed-effects specification.

Kenward–Roger / Satterthwaite degrees of freedom: Small-sample adjustments to denominator degrees of freedom in the F-tests of fixed effects. Improve coverage of confidence intervals and accuracy of P-values when cluster numbers are small.

lme4 / nlme (R packages): Two widely used R packages for fitting linear mixed models. nlme (Pinheiro & Bates) supports flexible covariance structures; lme4 (Bates et al.) is faster and supports GLMMs but does not provide P-values by default.

Caterpillar plot: A graphical display of cluster-specific BLUPs sorted by point estimate, with uncertainty bars. Useful for spotting outlying clusters and visualizing shrinkage.

Random-effects diagnostics: Plots and statistics used to check the normality and homoscedasticity assumptions of random effects (e.g., QQ-plots of BLUPs, level-2 residuals).

Key People

Nan Laird & James Ware: Co-authored the foundational 1982 paper that formalized the linear mixed-effects model and EM algorithm for longitudinal data, paving the way for modern multilevel software.

Charles R. Henderson (1911–1989): American animal scientist who developed the mixed-model equations and the BLUP framework in the 1950s for genetic evaluation of livestock. His work underlies modern mixed-model estimation.

Douglas Bates (b. 1949): American statistician and lead author of the R packages nlme and lme4. His work has made mixed-model fitting accessible to applied researchers across many disciplines.

José Pinheiro: Statistician and co-author with Douglas Bates of the influential book Mixed-Effects Models in S and S-PLUS (2000), which helped popularize mixed-model methods in statistics and biostatistics.

David A. Harville: American statistician whose work in the 1970s developed the theory of restricted maximum likelihood (REML) estimation, a method introduced by Patterson and Thompson in 1971, and helped establish it as the variance-component estimation method now standard in mixed models.

Section 1

Introduction & The Linear Mixed Model

⏱ Estimated time: 20 minutes

Lesson 10 · HSCI 410

Mixed Models for Continuous Data

Letting each cluster have its own parameters, drawn from a common distribution.

Where this fits

Building on an earlier lesson

An earlier lesson mapped the full range of options for clustered data. This lesson takes the mixed-model branch and develops it from the ground up.

Running example: systolic blood pressure across 30 clinics, roughly 960 patients.
R activities use the phaa_clinics.csv dataset carried forward from an earlier lesson.
The framework built here extends directly to discrete outcomes in a later lesson.

Section 1 of 4

Fixed and random effects, variance components, the ICC, and the random-intercept equation.

Core distinction

Fixed vs. random effects

Fixed effects

Population-average relationships. The effect of age, treatment, or smoking on the outcome, assumed the same across all groups.

Random effects

Group-specific deviations modeled as draws from a normal distribution with mean zero. They capture how clusters differ from the overall mean.

The mixed model includes both: fixed effects for the predictors of interest, random effects for the clustering structure.

Variance decomposition

Splitting total variance into two components

Total variance

\[ \color{#0B7B6B}{\text{Var}(Y)} = \color{#C2410C}{\sigma^2_g} + \color{#6D28D9}{\sigma^2} \]

Legend: Var(Y) = total variance; σ²_g = between-group variance; σ² = within-group variance.

Between-group: \(\sigma^2_g\)

How much cluster means vary around the overall mean.

Within-group: \(\sigma^2\)

How much individuals vary around their own cluster mean.

The model equation

Random intercept model

Random-intercept model (Laird & Ware, 1982)

\[ \color{#0B7B6B}{Y_{ij}} = \color{#C2410C}{\beta_0 + \beta_1 X_{ij}} + \color{#6D28D9}{u_j} + \color{#BE185D}{\varepsilon_{ij}} \]

Legend: Y_ij = outcome; β₀ + β₁X_ij = fixed part; u_j = group random intercept; ε_ij = residual.

where \(u_j \sim N(0,\, \sigma^2_g)\) and \(\varepsilon_{ij} \sim N(0,\, \sigma^2)\).

Each group gets its own intercept: \(\beta_0 + u_j\). The fixed intercept \(\beta_0\) is the overall (population-average) intercept; \(u_j\) is that group's deviation from it.

Measuring clustering

The intraclass correlation coefficient (ICC)

ICC

\[ \color{#0B7B6B}{\rho} = \frac{\color{#C2410C}{\sigma^2_g}}{\color{#C2410C}{\sigma^2_g} + \color{#6D28D9}{\sigma^2}} \]

Legend: ρ = intraclass correlation; σ²_g = between-group variance; σ² = within-group variance.

ICC \(\approx 0\)

Clustering matters little. Within-cluster observations are no more similar than between-cluster observations.

ICC \(\approx 1\)

Nearly all variance is between groups. Individuals within a cluster are very similar to one another.

Example: \(\sigma^2_g = 200,\; \sigma^2 = 800 \Rightarrow \rho = 0.20\)

Matrix form

The general linear mixed model

General form

\[ \color{#0B7B6B}{\mathbf{Y}} = \color{#C2410C}{\mathbf{X}\boldsymbol{\beta}} + \color{#6D28D9}{\mathbf{Z}\mathbf{u}} + \color{#BE185D}{\boldsymbol{\varepsilon}} \]

Legend: Y = outcome vector; Xβ = fixed effects; Zu = random effects; ε = residuals.

\(\mathbf{X}\boldsymbol{\beta}\): fixed effects (predictors of interest)
\(\mathbf{Z}\mathbf{u}\): random effects (cluster-specific deviations)
\(\boldsymbol{\varepsilon}\): individual-level residual error

The random-intercept model is the special case where \(\mathbf{Z}\) is a matrix of group-membership indicators, with one column per group.

Carry forward

What to take into the next section

Fixed effects estimate population-average predictor relationships; random effects capture cluster-level deviations.
Total variance splits into between-group \(\sigma^2_g\) and within-group \(\sigma^2\) components.
The ICC \(\rho\) measures what fraction of variance is attributable to clustering.
The random intercept model adds one latent term per cluster to ordinary regression.

Introduction and Overview

An earlier lesson ended with a roadmap of methods for clustered data (fixed effects, robust variances, GEE, mixed models, and survey-design adjustments). This lesson takes the deepest single branch of that roadmap and develops it: linear mixed models for continuous outcomes. The mixed-model framework is also the backbone of the next two lessons, which extend it to discrete outcomes and to repeated-measures designs, so the concepts you build here pay forward repeatedly.

The four content sections move from foundations to applications. This section introduces the linear mixed model, the distinction between fixed and random effects, and the random-intercepts specification, the simplest form of the model. Section 2 extends the framework to random slopes and full hierarchical models, where covariate effects themselves can vary across clusters. Section 3 tackles the subtle but consequential difference between within-cluster and between-cluster (contextual) effects, and the inferential machinery (REML, likelihood ratio tests) used to fit and compare these models. Section 4 closes with prediction, BLUPs, residual diagnostics, and the practical questions you face when validating a fitted mixed model.

Learning Objectives

Distinguish fixed effects from random effects and explain when a mixed model is preferable to ordinary regression.
Partition total variance into between- and within-cluster components and compute the intraclass correlation coefficient (ICC).
Write the random-intercept linear mixed model in scalar and matrix form, identifying each parameter.
Interpret the ICC as both a measure of clustering and a signal of when mixed-model adjustments matter.

Fixed vs. Random Effects

In many epidemiological studies, observations are grouped or clustered: patients within hospitals, students within schools, or repeated measurements within individuals. Linear mixed models (also known as multilevel or hierarchical linear models) handle such data by incorporating both fixed effects and random effects (Laird & Ware, 1982; Curran & Bauer, 2011).

[Interactive story: Random intercepts & slopes. A 6-scene visualization of mixed-effects models: naive pooled regression ignoring clusters, the reveal of nested classrooms, random intercepts (different starting heights), random slopes (different angles), and partial pooling (shrinkage toward the group mean).]

Fixed effects are parameters of primary interest that are constant across groups, namely the population-average effects of predictors. Random effects represent variation across groups or clusters and are modeled as random draws from a probability distribution, typically normal with mean zero.

Variance Components

A fundamental concept in mixed models is the partitioning of total variance into two components: between-group variance (σ²_g) and within-group variance (σ²). The between-group variance captures how much group means differ from the overall mean, while the within-group variance captures how much individual observations vary around their group mean.
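The partition can be seen in simulated data. The sketch below is a base-R illustration only: the seed, the number of clusters, the cluster size, the overall mean of 120, and the two variance values are all assumptions chosen for the example, not estimates from the course dataset.

```r
# Simulate clustered outcomes and recover the two variance components.
set.seed(410)                 # arbitrary seed, for reproducibility only
n_groups <- 2000              # hypothetical number of clusters
n_per    <- 20                # hypothetical (balanced) cluster size
sigma2_g <- 65                # assumed between-group variance
sigma2   <- 300               # assumed within-group variance

group <- rep(seq_len(n_groups), each = n_per)
u     <- rnorm(n_groups, mean = 0, sd = sqrt(sigma2_g))        # one draw per group
eps   <- rnorm(n_groups * n_per, mean = 0, sd = sqrt(sigma2))  # one draw per person
y     <- 120 + u[group] + eps                                  # 120 = arbitrary overall mean

total_var   <- var(y)
group_means <- tapply(y, group, mean)
within_var  <- mean(tapply(y, group, var))     # average within-group variance

# The variance of observed group means overstates sigma2_g by sigma2 / n_per,
# because each mean still carries some individual-level noise.
between_var <- var(group_means) - within_var / n_per

round(c(total = total_var, between = between_var, within = within_var), 1)
```

With many clusters, the between and within estimates land close to the assumed 65 and 300, and their sum is close to the total variance of the outcome.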

When to Use Mixed Models

Mixed models are appropriate when your data has a hierarchical or clustered structure; for example, patients nested within clinics, animals nested within herds, or repeated measures nested within subjects. If observations within groups are correlated (i.e., the ICC is non-trivial), ignoring this structure can lead to incorrect inference. Mixed models handle unbalanced data gracefully and can incorporate predictors at both the individual and group levels.

The Linear Mixed Model Equation

The random intercept model extends ordinary linear regression by adding a group-specific random deviation to the intercept (Laird & Ware, 1982):

Linear mixed model with random intercept (Eq 21.2)

\[ \color{#0B7B6B}{Y_i} = \color{#C2410C}{\beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki}} + \color{#6D28D9}{u_{\text{group}(i)}} + \color{#BE185D}{\varepsilon_i} \]

The outcome is the fixed-effect part shared by everyone, plus a group-specific random intercept, plus individual residual error.

In this model, each group has its own intercept: β₀ + u_group, where u_group ~ N(0, σ²_g) and ε_i ~ N(0, σ²). The random intercept u_group captures how much a particular group’s mean deviates from the overall intercept β₀.

The Intraclass Correlation Coefficient (ICC)

The ICC measures the proportion of total variance that is attributable to between-group differences:

Intraclass correlation coefficient

\[ \color{#0B7B6B}{\rho} = \frac{\color{#C2410C}{\sigma^2_g}}{\color{#C2410C}{\sigma^2_g} + \color{#6D28D9}{\sigma^2}} \]

The ICC is the between-group variance as a share of the total: between-group plus within-group variance.

An ICC close to 0 means observations within groups are no more similar than observations from different groups. An ICC close to 1 means most of the variance is between groups, and observations within the same group are very similar.

Two readings of the ICC help it stick. First, it is exactly the correlation you would expect between two patients drawn from the same clinic, which is why a larger ICC signals stronger clustering. Second, it tells you when clustering is worth modelling at all. Using the running example, a between-clinic variance near 65 and a residual (within-clinic) variance near 300 give an ICC of 65 / (65 + 300), which is close to 0.18, so roughly a sixth of the variation in blood pressure tracks which clinic a patient attends. An ordinary regression that ignored the clinics would treat those correlated patients as independent and report standard errors that are too small.
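Both readings can be checked numerically. The following is a minimal base-R sketch that uses the illustrative variance components quoted above (65 and 300); the function name icc_from_variances, the seed, the number of simulated clinics, and the mean of 125 are assumptions made for the illustration and are not part of the course script.

```r
# ICC from two variance components (hypothetical helper, not a package function).
icc_from_variances <- function(sigma2_g, sigma2) {
  stopifnot(sigma2_g >= 0, sigma2 > 0)
  sigma2_g / (sigma2_g + sigma2)
}

icc_clinics <- icc_from_variances(sigma2_g = 65, sigma2 = 300)

# Reading 1: the ICC is the correlation between two patients from the same clinic.
set.seed(10)                                   # arbitrary seed
n_clinics <- 20000                             # hypothetical, large so the check is stable
u         <- rnorm(n_clinics, sd = sqrt(65))   # shared clinic effect
patient_1 <- 125 + u + rnorm(n_clinics, sd = sqrt(300))
patient_2 <- 125 + u + rnorm(n_clinics, sd = sqrt(300))
pair_correlation <- cor(patient_1, patient_2)

round(c(icc = icc_clinics, simulated_pair_correlation = pair_correlation), 3)
```

The two patients share only the clinic effect u, so their covariance is the between-clinic variance and their correlation should sit near the ICC.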

R Activity: Random-intercept linear mixed model with lme4::lmer()

The clustered dataset phaa_clinics.csv (30 clinics, ~960 patients) carries forward from an earlier lesson. The continuous outcome is sbp. The term (1 | clinic_id) says "give every clinic its own intercept, drawn from a normal distribution." The full annotated script is in r-activities/HSCI_410_Lesson_10_Mixed_Models_for_Continuous_Data.R.

library(lme4);  library(lmerTest);  library(performance)
clinics <- read.csv("phaa_clinics.csv", stringsAsFactors = FALSE)
clinics$clinic_id    <- factor(clinics$clinic_id)
clinics$clinic_urban <- factor(clinics$clinic_urban,
                                  levels = c("rural","urban"))
clinics$smoker       <- factor(clinics$smoker, levels = c("No","Yes"))

# 1. Null model: variance partitioning
m0 <- lmer(sbp ~ 1 + (1 | clinic_id), data = clinics)
icc(m0)

# 2. Add patient-level fixed effects
m1 <- lmer(sbp ~ age + smoker + bmi + female + (1 | clinic_id),
           data = clinics)

# 3. Add cluster-level fixed effects (clinic_urban, clinic_size)
m2 <- lmer(sbp ~ age + smoker + bmi + female
                  + clinic_urban + scale(clinic_size)
                  + (1 | clinic_id),
           data = clinics)
summary(m2)
icc(m2)

# 4. Is the random intercept needed?
ranova(m2)

# 5. Random slope: does the smoker effect vary by clinic?
m3 <- lmer(sbp ~ age + smoker + bmi + female + clinic_urban + scale(clinic_size)
                  + (1 + smoker | clinic_id), data = clinics)
anova(m2, m3)        # LRT: random-intercept-only vs random-slope

Why lmerTest. Plain lme4 does not report p-values for fixed effects, because the appropriate denominator degrees of freedom are uncertain. lmerTest adds p-values based on Satterthwaite-approximated degrees of freedom, which are adequate for most reports. The contextual-effects trick (splitting age into its clinic mean and within-clinic deviation) is in the full activity file; it lets you separate "older patients have higher SBP" from "clinics with older patients have higher SBP."

R Reflect on what you just ran

Use the questions below to interpret the output you produced. Look at your console / plot before answering.

1. From the Random effects table in summary(m0), report the between-clinic variance (σ²_u) and the residual variance (σ²). Compute ICC = σ²_u / (σ²_u + σ²) and confirm that it matches icc(m0). What fraction of total SBP variance lies between clinics?

Model answer: Between-clinic variance (σ²_u) is typically around 50–80 mmHg²; residual variance (σ²) is typically 250–350 mmHg². ICC = σ²_u / (σ²_u + σ²) ≈ 65 / (65 + 300) ≈ 0.18. About 18% of the total SBP variance lies between clinics, with the remaining 82% within clinics, so the clustering is moderate and worth modelling.

2. Compare icc(m0) to icc(m2). After adding the patient- and clinic-level fixed effects, did the between-clinic variance shrink? What does that tell you about how much of the original clustering was explained by clinic_urban and clinic_size?

Model answer: icc(m2) typically drops to about 0.10 after adding patient-level (age, smoking, BMI, sex) and clinic-level (urban, size) fixed effects, down from 0.18 in m0. The drop of about 0.08, a little under half of the original ICC, indicates how much of the original between-clinic variation the covariates explain. Because the patient-level covariates also change the residual variance, compare the between-clinic variance rows of the two summaries directly to see how much is attributable to clinic_urban and clinic_size. A residual ICC of 0.10 says clinics still differ in ways not captured by these covariates (perhaps quality of care, patient socioeconomic profile, or regional factors).

3. From anova(m2, m3), what is the chi-square and p-value for adding a random slope on smoker? Should you keep the random slope, or stick with the random-intercept-only model? Why is the simpler model preferred when the LRT is non-significant?

Model answer: anova(m2, m3) for adding a random slope on smoker typically returns a χ² around 1–3 with p > 0.10, which is non-significant. Because the null value of the slope variance (zero) lies on the boundary of its parameter space, the reported p-value is, if anything, conservative, so the boundary correction from this lesson would only flip a borderline result, not a clearly non-significant one like this. Keep the simpler random-intercept-only model. The non-significant LRT means the data don't provide strong evidence that the smoking effect varies meaningfully between clinics; the two additional random-effect parameters (the slope variance and the intercept-slope covariance) don't improve fit enough to justify the extra complexity. Simpler models are preferred because they (a) have fewer parameters to estimate and are more stable; (b) are easier to interpret and communicate; (c) avoid overfitting to clinic-specific noise; and (d) follow the principle of parsimony, explaining the data with the fewest assumptions.
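The boundary correction mentioned in the answer can be sketched in a few lines of base R. This is an illustration under stated assumptions: the helper name boundary_lrt_p and the test statistic of 2.0 are hypothetical, and the reference distribution used here, an equal mixture of chi-square distributions with q and q + 1 degrees of freedom when going from q to q + 1 correlated random effects, is a commonly used approximation rather than something the course script computes.

```r
# Boundary-corrected p-value for adding one random effect to a model that
# already has q random effects (hypothetical helper, not a package function).
boundary_lrt_p <- function(lrt, q) {
  stopifnot(lrt >= 0, q >= 0)
  upper <- pchisq(lrt, df = q + 1, lower.tail = FALSE)
  # A chi-square with 0 df is a point mass at zero.
  lower <- if (q == 0) as.numeric(lrt == 0) else
    pchisq(lrt, df = q, lower.tail = FALSE)
  0.5 * lower + 0.5 * upper
}

lrt_stat <- 2.0   # hypothetical LRT statistic, in the 1-3 range quoted above

# Random slope added to a random intercept: q = 1, two extra parameters.
p_naive   <- pchisq(lrt_stat, df = 2, lower.tail = FALSE)  # plain chi-square, 2 df
p_mixture <- boundary_lrt_p(lrt_stat, q = 1)               # 50:50 chi-square(1) and chi-square(2)

# Testing the random intercept itself: q = 0, mixture of chi-square(0) and chi-square(1).
p_intercept <- boundary_lrt_p(lrt_stat, q = 0)

round(c(naive = p_naive, mixture = p_mixture, intercept_test = p_intercept), 3)
```

The mixture p-value is smaller than the naive one, which is what "conservative" means here, but for a statistic this small both remain well above conventional thresholds.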

Matrix Notation

In matrix form, the linear mixed model is written as:

Matrix notation (Eq 21.8)

\[ \color{#0B7B6B}{\mathbf{Y}} = \color{#C2410C}{\mathbf{X}\boldsymbol{\beta}} + \color{#6D28D9}{\mathbf{Z}\mathbf{u}} + \color{#BE185D}{\boldsymbol{\varepsilon}} \]

The outcome vector equals the fixed-effects design times its coefficients, plus the random-effects design times the random effects, plus residual error.

Here, X is the design matrix for fixed effects, β is the vector of fixed effect coefficients, Z is the design matrix for random effects, u is the vector of random effects, and ε is the vector of residual errors.
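To make the matrices concrete, here is a small base-R sketch for a toy dataset of six patients in three clinics. The data frame, the coefficient values in beta, and the random effects in u are all made up for illustration; model.matrix() is the standard base-R function for building design matrices.

```r
# Toy data: six patients in three clinics (values are invented).
toy <- data.frame(
  clinic = factor(c("A", "A", "B", "B", "B", "C")),
  age    = c(40, 55, 62, 38, 47, 70)
)

# X: fixed-effects design matrix (intercept column plus age).
X <- model.matrix(~ age, data = toy)

# Z: random-intercept design matrix, one 0/1 indicator column per clinic.
Z <- model.matrix(~ 0 + clinic, data = toy)

beta <- c(100, 0.5)              # hypothetical fixed intercept and age slope
u    <- c(A = 4, B = -3, C = -1) # hypothetical clinic deviations

fixed_part  <- drop(X %*% beta)  # X beta: same coefficients for everyone
random_part <- drop(Z %*% u)     # Z u: picks out each patient's clinic deviation

cbind(toy, fixed_part, random_part, linear_predictor = fixed_part + random_part)
```

Each row of Z contains a single 1, so multiplying by u simply assigns every patient the deviation of the clinic they belong to.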

Connection to ANOVA-based variance component estimation

ANOVA-based methods provide simple estimators of variance components by equating observed mean squares to their expected values. While these methods are intuitive and historically important, they can produce negative variance estimates (which are set to zero in practice). Likelihood-based methods (ML and REML) are generally preferred as they constrain variance estimates to be non-negative and handle unbalanced data more naturally.
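For a balanced one-way layout, the ANOVA estimator can be written out directly. The sketch below is base R and assumes equal group sizes; the function name anova_variance_components and both tiny datasets are hypothetical. It relies on the standard expected mean squares for this design, E[MSB] = σ² + nσ²_g and E[MSW] = σ², where n is the common group size.

```r
# ANOVA (method-of-moments) variance components for a balanced one-way layout.
anova_variance_components <- function(y, group) {
  group <- factor(group)
  sizes <- as.vector(table(group))
  stopifnot(length(unique(sizes)) == 1, sizes[1] > 1)  # balanced groups only
  n <- sizes[1]                   # observations per group
  k <- nlevels(group)             # number of groups
  group_means <- tapply(y, group, mean)
  msb <- n * sum((group_means - mean(y))^2) / (k - 1)
  msw <- sum((y - group_means[as.integer(group)])^2) / (k * (n - 1))
  raw <- (msb - msw) / n          # solves E[MSB] = sigma2 + n * sigma2_g
  list(msb = msb, msw = msw,
       sigma2_g_raw = raw,        # can be negative
       sigma2_g = max(raw, 0),    # truncated at zero, as done in practice
       sigma2 = msw)
}

groups <- rep(c("A", "B", "C"), each = 2)

# Groups clearly differ: positive between-group estimate.
separated <- anova_variance_components(c(1, 3, 5, 7, 9, 11), groups)

# Identical group means but large within-group spread: negative raw estimate.
overlapping <- anova_variance_components(c(1, 9, 2, 8, 0, 10), groups)

c(separated = separated$sigma2_g_raw, overlapping = overlapping$sigma2_g_raw)
```

The second dataset shows how the negative estimate arises: when the group means vary less than chance alone would predict, MSB falls below MSW and the moment estimator goes below zero.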

Example: Herd-level variation in milk production

Consider a study of milk production across 50 dairy herds with varying numbers of cows per herd. A random intercept model with herd as the grouping factor would estimate σ²_g (between-herd variance) and σ² (within-herd variance). If σ²_g = 200 and σ² = 800, the ICC = 200/(200+800) = 0.20, meaning 20% of the total variation in milk production is attributable to differences between herds.

Assumptions of the random intercept model

The standard random intercept model assumes: (1) random effects u are normally distributed with mean 0 and variance σ²_g; (2) residuals ε are normally distributed with mean 0 and variance σ²; (3) random effects and residuals are independent of each other and of the predictors; (4) conditional on the random effects, observations within the same group are independent.

Reflection

Why might treating group effects as random rather than fixed be advantageous? Think about a study with many groups. What practical benefits does the random effects approach offer?

Model answer: Random effects vs. fixed effects: the random approach treats group-level variation as drawn from a distribution; the fixed approach estimates a separate parameter for each group. Advantages of random effects: (a) fewer parameters in models with many groups: 1 variance parameter vs. K−1 fixed-effect parameters; (b) they allow estimation of group-level covariate effects (urban/rural, size) that fixed effects absorb; (c) they provide shrinkage estimates that pool information across groups, producing more stable estimates for small groups; (d) they generalise to new groups outside the sample (you can predict for a new clinic that was not observed). Practical scenario: with 50 schools in a longitudinal study, fixed effects on school would consume 49 degrees of freedom and prevent estimation of school-level covariates; random effects on school cost one variance parameter and let you estimate effects of school-level factors like funding.
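Point (c) can be illustrated with the shrinkage weight for a random-intercept model. This is a sketch under simplifying assumptions: the variance components are treated as known (65 and 300, the illustrative values used earlier in this section), the cluster sizes and the raw deviation of 10 mmHg are invented, and shrinkage_weight is a hypothetical helper. The weight used is σ²_g / (σ²_g + σ²/m_j), where m_j is the cluster size.

```r
# Shrinkage weight for a cluster of size m_j in a random-intercept model,
# treating the variance components as known (a simplification).
shrinkage_weight <- function(m_j, sigma2_g, sigma2) {
  stopifnot(all(m_j > 0), sigma2_g >= 0, sigma2 > 0)
  sigma2_g / (sigma2_g + sigma2 / m_j)
}

cluster_sizes <- c(small = 5, typical = 32, large = 200)  # hypothetical sizes
weights <- shrinkage_weight(cluster_sizes, sigma2_g = 65, sigma2 = 300)

# Suppose each clinic's raw mean sits 10 mmHg above the overall mean.
raw_deviation       <- 10
predicted_deviation <- weights * raw_deviation   # pulled toward zero

round(rbind(weight = weights, predicted_deviation = predicted_deviation), 2)
```

The small clinic's deviation is pulled roughly halfway back toward the overall mean, while the large clinic's prediction stays close to its raw mean, which is why small groups gain the most stability from pooling.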
