HSCI 410 — Lesson 9

Mixed Models for Continuous Data

Exploratory Data Analysis For Epidemiology

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Understand the concepts of fixed and random effects in linear mixed models
  • Write and interpret the linear mixed model equation with random intercepts
  • Decompose variance into between-group and within-group components and calculate the ICC
  • Understand random slopes models and their interpretation as hierarchical models
  • Explain contextual effects and how within-group and between-group regressions can differ
  • Describe estimation methods (ML and REML) and their properties
  • Conduct inference for both fixed and random effects in mixed models
  • Understand the role of BLUPs, shrinkage, residuals, and model diagnostics

This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Section 1

Introduction & The Linear Mixed Model

⏱ Estimated time: 20 minutes

Fixed vs. Random Effects

In many epidemiological studies, observations are grouped or clustered—patients within hospitals, students within schools, or repeated measurements within individuals. Linear mixed models (also known as multilevel or hierarchical linear models) handle such data by incorporating both fixed effects and random effects.

Fixed effects are parameters of primary interest that are constant across groups—the population-average effects of predictors. Random effects represent variation across groups or clusters and are modeled as random draws from a probability distribution, typically normal with mean zero.


Variance Components

A fundamental concept in mixed models is the partitioning of total variance into two components: between-group variance (σ²g) and within-group variance (σ²). The between-group variance captures how much group means differ from the overall mean, while the within-group variance captures how much individual observations vary around their group mean.

When to Use Mixed Models

Mixed models are appropriate when your data has a hierarchical or clustered structure—for example, patients nested within clinics, animals nested within herds, or repeated measures nested within subjects. If observations within groups are correlated (i.e., the ICC is non-trivial), ignoring this structure can lead to incorrect inference. Mixed models handle unbalanced data gracefully and can incorporate predictors at both the individual and group levels.

The Linear Mixed Model Equation

The random intercept model extends ordinary linear regression by adding a group-specific random deviation to the intercept:

Linear Mixed Model with Random Intercept (Eq 21.2)
Yi = β0 + β1X1i + … + βkXki + ugroup(i) + εi

In this model, each group has its own intercept: β0 + ugroup, where ugroup ~ N(0, σ²g) and εi ~ N(0, σ²). The random intercept ugroup captures how much a particular group’s mean deviates from the overall intercept β0.

The Intraclass Correlation Coefficient (ICC)

The ICC measures the proportion of total variance that is attributable to between-group differences:

Intraclass Correlation Coefficient
ρ = σ²g / (σ²g + σ²)

An ICC close to 0 means observations within groups are no more similar than observations from different groups. An ICC close to 1 means most of the variance is between groups, and observations within the same group are very similar.

Matrix Notation

In matrix form, the linear mixed model is written as:

Matrix Notation (Eq 21.8)
Y = Xβ + Zu + ε

Here, X is the design matrix for fixed effects, β is the vector of fixed effect coefficients, Z is the design matrix for random effects, u is the vector of random effects, and ε is the vector of residual errors.

Connection to ANOVA-based variance component estimation

ANOVA-based methods provide simple estimators of variance components by equating observed mean squares to their expected values. While these methods are intuitive and historically important, they can produce negative variance estimates (which are set to zero in practice). Likelihood-based methods (ML and REML) are generally preferred as they constrain variance estimates to be non-negative and handle unbalanced data more naturally.

Example: Herd-level variation in milk production

Consider a study of milk production across 50 dairy herds with varying numbers of cows per herd. A random intercept model with herd as the grouping factor would estimate σ²g (between-herd variance) and σ² (within-herd variance). If σ²g = 200 and σ² = 800, the ICC = 200/(200+800) = 0.20, meaning 20% of the total variation in milk production is attributable to differences between herds.
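The herd calculation above can be checked with a few lines of Python; the numbers are those from the example (σ²g = 200, σ² = 800):

```python
def icc(var_between, var_within):
    """Intraclass correlation: share of total variance that is between groups."""
    return var_between / (var_between + var_within)

# Between-herd variance 200, within-herd variance 800
rho = icc(200, 800)
print(rho)  # 0.2 -> 20% of variation is between herds
```

The same function covers any random intercept model: only the two estimated variance components are needed.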

Assumptions of the random intercept model

The standard random intercept model assumes: (1) random effects u are normally distributed with mean 0 and variance σ²g; (2) residuals ε are normally distributed with mean 0 and variance σ²; (3) random effects and residuals are independent of each other and of the predictors; (4) conditional on the random effects, observations within the same group are independent.

Section 1 Knowledge Check

1. In a linear mixed model, random effects represent:

Random effects capture the variability across groups/clusters. They are not fixed parameters but rather realizations from a probability distribution, typically normal with mean 0.

2. The ICC in a random intercept model equals:

The ICC = σ²g / (σ²g + σ²) gives the proportion of total variance attributable to between-group differences, measuring how similar observations within the same group are.

3. In the model Y = Xβ + Zu + ε, the term Zu represents:

In matrix notation, Zu represents the random effects component, where Z is the design matrix for random effects and u is the vector of random effects (e.g., random intercepts for each group).

Reflection

Why might treating group effects as random rather than fixed be advantageous? Think about a study with many groups — what practical benefits does the random effects approach offer?

Section 2

Random Slopes & Hierarchical Models

⏱ Estimated time: 20 minutes

Random Slopes

A random intercept model assumes that the effect of each predictor is the same across all groups—only the baseline level (intercept) varies. A random slopes model relaxes this assumption by allowing the effect (slope) of one or more predictors to vary across groups as well.

Random Intercept and Slope Model
Yij = (β0 + u0j) + (β1 + u1j)X1ij + εij

Here, u0j is the random intercept for group j and u1j is the random slope for group j. Each group effectively has its own regression line with intercept (β0 + u0j) and slope (β1 + u1j).
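The group-specific regression lines implied by this model can be sketched numerically. The fixed effects and random-effect values below are made-up illustration numbers, not estimates from any dataset:

```python
# Hypothetical population-average (fixed) intercept and slope
beta0, beta1 = 10.0, 2.0

# Hypothetical random effects (u0j, u1j) for three groups
u = {"A": (1.5, 0.5), "B": (-0.5, 0.0), "C": (-1.0, -0.5)}

def group_line(group, x):
    """Group-specific prediction: (beta0 + u0j) + (beta1 + u1j) * x."""
    u0, u1 = u[group]
    return (beta0 + u0) + (beta1 + u1) * x

for g in sorted(u):
    print(g, group_line(g, 0.0), group_line(g, 1.0))
```

Because the slopes (2.5, 2.0, 1.5) differ across groups, the three lines are not parallel; setting every u1j to zero would recover the parallel lines of a random intercept model.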

Random Intercept Model

In the random intercept model, all groups share the same slope for each predictor but have different baseline levels. The regression lines for different groups are parallel, shifted up or down by their random intercept. This is appropriate when you believe the effect of a predictor is consistent across groups but groups differ in their overall level of the outcome.

Random Intercept + Random Slope Model

In this extended model, each group can have both a different intercept and a different slope. The regression lines are no longer parallel—they can fan out, converge, or cross. This model requires estimating additional parameters: the variance of the random slope (σ²u1) and the covariance between the random intercept and slope (σu01). It is more flexible but also more complex and requires sufficient data.

The Covariance Matrix of Random Effects

When a model includes both random intercepts and random slopes, the random effects for each group are typically assumed to follow a bivariate normal distribution. The covariance matrix of the random effects includes three parameters: the variance of the random intercept (σ²u0), the variance of the random slope (σ²u1), and the covariance between them (σu01).

A positive covariance means groups with higher intercepts tend to have steeper slopes; a negative covariance means the opposite. This covariance should generally be estimated rather than assumed to be zero.

Hierarchical Model Interpretation

Random slopes models are sometimes called hierarchical or multilevel models because predictors can have effects at each level of the hierarchy. At the individual level, predictors explain variation within groups; at the group level, predictors or random effects explain variation between groups. This formulation shows that each predictor can enter as a fixed effect, a random effect at the group level, or both.

Scenario: Treatment Effects Across Clinical Sites

Imagine a multi-center clinical trial measuring blood pressure reduction across 30 hospitals. A random intercept model would allow hospitals to differ in their patients’ baseline blood pressure. A random slopes model would additionally allow the treatment effect itself to vary across hospitals—some hospitals might show a larger treatment effect than others due to differences in patient populations, adherence, or clinical practice. The variance of the random slope tells you how much the treatment effect varies across sites.

Practical Considerations

Models with multiple random effects can have many parameters in the covariance matrix, and identifiability can become an issue. Practical parsimony is important: only include random effects for which there is theoretical justification and sufficient data. Adding unnecessary random effects can lead to convergence problems, unstable estimates, or singular covariance matrices.

Section 2 Knowledge Check

1. A random slopes model allows:

In a random slopes model, the regression coefficient (slope) for a predictor is allowed to be different for each group, modeled as random draws from a distribution.

2. In a random slopes model, the covariance between random intercept and slope:

The covariance between random intercepts and slopes can take any value. A positive covariance means groups with higher intercepts tend to have steeper slopes; a negative covariance means the opposite.

3. Random slope models are sometimes called hierarchical models because:

The hierarchical model formulation shows that each predictor can enter as a fixed effect, a random effect at the group level, or both — allowing effects at multiple levels of the data hierarchy.

Reflection

Consider a multi-center clinical trial where the treatment effect might vary across centers. What would a random slopes model for the treatment effect tell you that a random intercept model would not?

Section 3

Contextual Effects & Statistical Analysis

⏱ Estimated time: 20 minutes

Contextual Effects

A contextual effect occurs when a predictor measured at the individual level has a different effect depending on whether you examine variation within groups or between groups. For a contextual effect to exist, two conditions must hold: (1) the predictor must vary both between and within groups, and (2) the within-group and between-group slopes must differ.

Understanding Contextual Effects

Imagine studying the relationship between income and health across neighborhoods. At the individual level (within a neighborhood), higher income might improve health modestly. But the neighborhood-level (between-group) effect of average income could be much stronger because wealthier neighborhoods have better infrastructure, cleaner environments, and more health services. The difference between these two slopes is the contextual effect—the additional benefit of living in a high-income neighborhood beyond one’s own income level.

Group-Mean Centering

To separate within-group and between-group effects, we can use group-mean centering. This replaces the original predictor X1i with:

Group-Mean Centered Variable
Z1i = X1i − X̄1,group(i)

The centered variable Z1i captures purely within-group variation (how an individual differs from their group mean). The group mean X̄1,group(i) can then be included as a separate predictor to capture between-group variation.

Contextual Effects Model (Eq 21.12 & 21.13)
Yi = β0 + βWZ1i + βBX̄1,group(i) + ugroup(i) + εi

Here, βW is the within-group effect and βB is the between-group effect. If βW ≠ βB, a contextual effect is present. Ignoring this distinction can lead to the ecological fallacy (wrongly attributing group-level associations to individuals) or the atomistic fallacy (wrongly attributing individual-level associations to groups).
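The decomposition of the predictor into between-group and within-group parts can be computed directly. A minimal sketch with made-up values for individuals nested in two groups:

```python
from statistics import mean

# Made-up predictor values by group
x = {"g1": [4.0, 6.0, 8.0], "g2": [10.0, 12.0, 14.0]}

# Group means: the between-group predictor
xbar = {g: mean(vals) for g, vals in x.items()}

# Group-mean centered values: purely within-group variation
z = {g: [xi - xbar[g] for xi in vals] for g, vals in x.items()}

print(xbar)  # {'g1': 6.0, 'g2': 12.0}
print(z)     # deviations from the group mean; each list sums to zero
```

Both the centered variable and the group mean would then enter the mixed model as separate predictors, giving separate estimates of βW and βB.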

Estimation Methods: ML vs. REML

Maximum Likelihood (ML)

ML estimation simultaneously estimates all parameters (fixed effects and variance components) by maximizing the full likelihood. However, ML does not account for the degrees of freedom used in estimating fixed effects, which leads to downward-biased estimates of variance components—analogous to dividing by n instead of n−1 in sample variance estimation. ML is required when comparing models with different fixed effects structures (e.g., likelihood ratio tests for fixed effects).
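The n-versus-n−1 analogy is easy to verify: dividing the same sum of squares by n always gives a smaller value than dividing by n−1, so the ML-style estimator is biased downward.

```python
from statistics import mean

y = [3.0, 5.0, 7.0, 9.0]  # any sample
ss = sum((yi - mean(y)) ** 2 for yi in y)

var_ml = ss / len(y)               # ML-style: divide by n (biased low)
var_unbiased = ss / (len(y) - 1)   # unbiased analogue of REML: divide by n - 1

print(var_ml, var_unbiased)  # 5.0 vs about 6.67
```

REML's degrees-of-freedom adjustment for the fixed effects plays the same role in mixed models that the n−1 divisor plays here.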

Restricted Maximum Likelihood (REML)

REML estimation adjusts for the degrees of freedom lost in estimating fixed effects by restricting the likelihood to a subspace orthogonal to the fixed effects. This produces less biased variance component estimates, especially when the number of groups is small. REML is generally the preferred method for estimating variance components. However, REML log-likelihoods are not comparable across models with different fixed effects.

Inference in Mixed Models

Inference for fixed effects

Wald tests are commonly used to test individual fixed effects. These are approximate tests that rely on asymptotic theory. For finite samples, approximations such as the Satterthwaite or Kenward–Roger methods provide better reference distributions (t or F) by estimating effective degrees of freedom. These corrections are especially important when the number of groups is small.

Inference for random effects (variance components)

To test whether a variance component is significantly different from zero (e.g., H0: σ²g = 0), we use a likelihood ratio test comparing the model with and without that random effect. However, because the null value (0) is on the boundary of the parameter space, the standard χ² reference distribution yields conservative (too large) P-values. The recommended correction is to halve the P-value from the χ² test.
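For a single variance component, the corrected P-value is easy to compute, since the χ² upper-tail probability with 1 degree of freedom equals erfc(√(LRT/2)):

```python
from math import erfc, sqrt

def boundary_p_value(lrt_stat):
    """Halved chi-square(1) P-value for testing one variance component at zero."""
    p_naive = erfc(sqrt(lrt_stat / 2.0))  # standard chi2(1) upper-tail probability
    return p_naive / 2.0                  # boundary correction: halve the P-value

# LRT statistic of 3.84 sits at the usual chi2(1) 0.05 cutoff
print(boundary_p_value(3.84))  # about 0.025 instead of the naive 0.05
```

The halving reflects that under the null, the LRT statistic is a 50:50 mixture of a point mass at zero and a χ²₁ distribution.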

Model comparison strategies

When comparing models with different random effects but the same fixed effects, use REML-based likelihood ratio tests (with P-value halving for boundary tests). When comparing models with different fixed effects, use ML-based likelihood ratio tests or information criteria (AIC, BIC). Always ensure that models being compared are nested and fit to the same data.

Section 3 Knowledge Check

1. A contextual effect exists when:

A contextual effect is present when the effect of a predictor differs depending on whether you look at variation within groups or between groups, meaning the within-group and between-group slopes differ.

2. REML estimation is generally preferred over ML because:

REML (Restricted Maximum Likelihood) adjusts for the loss of degrees of freedom from estimating fixed effects, leading to less biased variance component estimates compared to ML, especially with fewer groups.

3. When testing whether a random effect variance is significantly different from zero:

Because the null hypothesis (σ² = 0) is on the boundary of the parameter space, the standard chi-square reference distribution is too conservative. The recommended correction is to halve the P-value obtained from the chi-square distribution.

Reflection

Explain in your own words why the ecological fallacy can occur when contextual effects are ignored. Give an example from epidemiology where the group-level and individual-level associations might differ.

Section 4

Prediction, Residuals & Diagnostics

⏱ Estimated time: 15 minutes

BLUPs: Best Linear Unbiased Predictors

In a mixed model, the random effects u are not directly observed—they are latent variables. We estimate them using BLUPs (Best Linear Unbiased Predictors), which are the predicted values of the random effects conditional on the observed data.


The Shrinkage Factor

BLUPs are weighted averages of the group-specific estimate and the overall mean. The amount of shrinkage depends on the group size and the ICC:

Shrinkage Factor
Shrinkage = σ²g / (σ²g + σ² / m)

When m (the group size) is large, the shrinkage factor approaches 1 and the BLUP closely approximates the raw group mean. When m is small, the factor is closer to 0 and the BLUP is pulled substantially toward the overall mean.

Shrinkage in Practice

Consider a random intercept model for patient outcomes across hospitals. Suppose σ²g = 5 and σ² = 20. For a hospital with 100 patients: shrinkage factor = 5/(5 + 20/100) = 5/5.2 = 0.96 — the BLUP is very close to the hospital’s raw mean. For a hospital with only 4 patients: shrinkage factor = 5/(5 + 20/4) = 5/10 = 0.50 — the BLUP is pulled halfway toward the overall mean. This borrowing of strength from other groups is a key advantage of mixed models.
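The two hospital calculations can be reproduced with the shrinkage formula from above:

```python
def shrinkage(var_between, var_within, m):
    """Weight on the group's own mean: sigma2_g / (sigma2_g + sigma2 / m)."""
    return var_between / (var_between + var_within / m)

# sigma2_g = 5, sigma2 = 20 (numbers from the example above)
print(round(shrinkage(5, 20, 100), 2))  # 0.96 -> BLUP near the raw hospital mean
print(round(shrinkage(5, 20, 4), 2))    # 0.50 -> BLUP pulled halfway to overall mean
```

Varying m shows how the weight moves smoothly from 0 (no data, BLUP equals the overall mean) toward 1 (large group, BLUP equals the raw group mean).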

Residuals in Mixed Models

Mixed models produce multiple sets of residuals, one for each level of the hierarchy:

  • Individual-level residuals (ε): the difference between observed values and the group-specific predictions (using BLUPs)
  • Group-level residuals (u): the BLUPs themselves, representing how each group deviates from the overall mean

Model Diagnostics

Checking model assumptions in mixed models involves examining residuals at each level of the hierarchy. A recommended strategy is to work from the highest hierarchical level downward:

Practical Diagnostic Advice

Start by examining group-level residuals (BLUPs): check for normality using Q-Q plots and look for influential groups. Then examine individual-level residuals: check normality, homoscedasticity, and look for outliers. Working top-down helps identify whether problems originate at the group level rather than being caused by a few individual outliers within groups. Also check that residuals at each level are uncorrelated with predictors and fitted values.

Box-Cox Transformation for Mixed Models

When model assumptions (normality, homoscedasticity) are violated, a Box-Cox transformation of the response variable may help. In mixed models, the procedure involves computing transformed Y values for a range of λ values, fitting the same model to each, and comparing log-likelihoods. Crucially, ML estimation (not REML) must be used for comparing log-likelihoods across different λ values, because REML likelihoods are not comparable when the response variable changes.
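The transformation itself is straightforward; the mixed-model-specific point is only that the fits across the λ grid must use ML. A minimal sketch of the transform, applied to a hypothetical response over a small λ grid:

```python
from math import log

def box_cox(y, lam):
    """Box-Cox transform of a positive value y for parameter lam."""
    if lam == 0:
        return log(y)
    return (y ** lam - 1.0) / lam

# Transform a response for a grid of lambda values; each transformed response
# would then be fit with the SAME mixed model using ML (not REML), and the
# lambda with the highest log-likelihood chosen.
y = [2.0, 4.0, 8.0]
for lam in (0, 0.5, 1.0):
    print(lam, [round(box_cox(yi, lam), 3) for yi in y])
```

One caveat: when comparing log-likelihoods across λ values, a Jacobian correction (or the geometric-mean-normalized form of the transform) is needed so that the likelihoods are on a common scale.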

Section 4 Knowledge Check

1. BLUPs (Best Linear Unbiased Predictors) exhibit shrinkage, meaning:

Shrinkage occurs because BLUPs are weighted averages of the group-specific estimate and the overall mean. Smaller groups contribute less information, so their BLUPs are pulled more toward the overall mean.

2. When checking residuals in a mixed model, it is recommended to:

Mixed models have residuals at multiple levels. Starting at the highest level helps identify whether problems (influential groups, non-normality) exist at the group level rather than being caused by individual outliers within groups.

3. The Box-Cox transformation in mixed models:

When comparing models with different transformations of Y, ML estimation must be used because REML log-likelihoods are not comparable across models with different response variables. The transformation otherwise follows the same principles as in standard regression.

Reflection

A random intercept model for hospital costs has a group of 5 hospitals with only 3 patients each and another group of 20 hospitals with 100 patients each. How would shrinkage affect the BLUP estimates differently for these two groups?

Final Assessment

Lesson 9 — Comprehensive Assessment

⏱ Estimated time: 25 minutes

This final assessment covers all material from this lesson. You must answer all 15 questions correctly (100%) and complete the final reflection to finish the lesson.

Final Reflection

Reflecting on this lesson, summarize the key decisions a researcher must make when fitting a linear mixed model (choosing fixed vs. random effects, estimation method, model diagnostics). Which aspect do you find most challenging?


Final Assessment (15 Questions)

1. In a linear mixed model, random effects are assumed to follow:

Standard linear mixed models assume random effects follow a normal distribution with mean zero and some variance to be estimated (e.g., u ~ N(0, σ²g)).

2. The variance component σ²g in a random intercept model represents:

σ²g captures the variance of the random intercepts across groups — that is, how much the group means vary around the overall mean, which is the between-group variance.

3. If σ²g = 3 and σ² = 12, the ICC is:

ICC = σ²g / (σ²g + σ²) = 3 / (3 + 12) = 3/15 = 0.20.

4. A random slopes model differs from a random intercept model by:

A random slopes model extends the random intercept model by allowing one or more regression coefficients (slopes) to vary randomly across groups.

5. The covariance between random intercepts and slopes in a random slopes model:

The covariance between random intercepts and slopes is a model parameter that should be estimated. It tells us whether groups with higher baseline levels tend to have stronger or weaker effects.

6. A contextual effect is detected when:

A significant contextual effect means the group-level (between-group) relationship between X and Y differs from the individual-level (within-group) relationship.

7. Group-mean centering replaces X1i with:

Group-mean centering subtracts the group mean from each individual value, creating a variable Z1i = X1i − X̄1,group(i) that captures purely within-group variation.

8. REML differs from ML estimation in that REML:

REML restricts the likelihood to a subspace orthogonal to the fixed effects, effectively adjusting for the degrees of freedom used in estimating β, which reduces bias in variance component estimates.

9. When comparing ML and REML for model selection:

REML likelihoods are not comparable when the fixed effects differ between models (because the restricted subspace changes). ML must be used for comparing models with different fixed effects structures.

10. Wald tests in mixed models:

Wald tests in mixed models are approximate because the reference distribution (Z or t) is only asymptotically correct. Finite-sample corrections (e.g., Satterthwaite approximation) provide better F or t reference distributions.

11. BLUPs are also known as:

BLUPs are called empirical Bayes estimates because they can be derived from a Bayesian framework where the random effects have a prior (normal) distribution, and the “empirical” part refers to estimating the prior parameters from data.

12. Shrinkage in BLUPs is stronger when:

The shrinkage factor σ²g/(σ²g + σ²/m) approaches 0 (more shrinkage) when m is small or when σ²g is small relative to σ². Small groups have less information, so their estimates are pulled more toward the overall mean.

13. Mixed models have residuals:

Mixed models produce residuals at each level: individual-level residuals (ε) and predicted values of random effects at each grouping level, all of which should be examined in diagnostics.

14. The Box-Cox transformation for mixed models requires using ML rather than REML because:

Since Box-Cox involves comparing models with different transformations of Y, and REML likelihoods depend on the fixed effects design matrix (which changes meaning when Y is transformed), ML must be used for valid comparisons.

15. In a hierarchical model formulation, each predictor can enter the model:

The hierarchical model perspective shows that every predictor can potentially have effects at multiple levels — as a fixed (population-average) effect and as a random (group-varying) effect at one or more higher levels.

Lesson 9 Complete!

Congratulations! You have successfully completed the Mixed Models for Continuous Data module.