Mixed Models for Continuous Data
Exploratory Data Analysis For Epidemiology
Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University
Learning objectives for this lesson:
- Understand the concepts of fixed and random effects in linear mixed models
- Write and interpret the linear mixed model equation with random intercepts
- Decompose variance into between-group and within-group components and calculate the ICC
- Understand random slopes models and their interpretation as hierarchical models
- Explain contextual effects and how within-group and between-group regressions can differ
- Describe estimation methods (ML and REML) and their properties
- Conduct inference for both fixed and random effects in mixed models
- Understand the role of BLUPs, shrinkage, residuals, and model diagnostics
This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.
Introduction & The Linear Mixed Model
Fixed vs. Random Effects
In many epidemiological studies, observations are grouped or clustered—patients within hospitals, students within schools, or repeated measurements within individuals. Linear mixed models (also known as multilevel or hierarchical linear models) handle such data by incorporating both fixed effects and random effects.
Fixed effects are parameters of primary interest that are constant across groups—the population-average effects of predictors. Random effects represent variation across groups or clusters and are modeled as random draws from a probability distribution, typically normal with mean zero.
Variance Components
A fundamental concept in mixed models is the partitioning of total variance into two components: between-group variance (σ²g) and within-group variance (σ²). The between-group variance captures how much group means differ from the overall mean, while the within-group variance captures how much individual observations vary around their group mean.
Mixed models are appropriate when your data has a hierarchical or clustered structure—for example, patients nested within clinics, animals nested within herds, or repeated measures nested within subjects. If observations within groups are correlated (i.e., the ICC is non-trivial), ignoring this structure can lead to incorrect inference. Mixed models handle unbalanced data gracefully and can incorporate predictors at both the individual and group levels.
The Linear Mixed Model Equation
The random intercept model extends ordinary linear regression by adding a group-specific random deviation to the intercept:

Yi = β0 + β1X1i + ugroup + εi
In this model, each group has its own intercept: β0 + ugroup, where ugroup ~ N(0, σ²g) and εi ~ N(0, σ²). The random intercept ugroup captures how much a particular group’s mean deviates from the overall intercept β0.
The Intraclass Correlation Coefficient (ICC)
The ICC measures the proportion of total variance that is attributable to between-group differences:

ICC = σ²g / (σ²g + σ²)
An ICC close to 0 means observations within groups are no more similar than observations from different groups. An ICC close to 1 means most of the variance is between groups, and observations within the same group are very similar.
Matrix Notation
In matrix form, the linear mixed model is written as:

Y = Xβ + Zu + ε
Here, X is the design matrix for fixed effects, β is the vector of fixed effect coefficients, Z is the design matrix for random effects, u is the vector of random effects, and ε is the vector of residual errors.
ANOVA-based methods provide simple estimators of variance components by equating observed mean squares to their expected values. While these methods are intuitive and historically important, they can produce negative variance estimates (which are set to zero in practice). Likelihood-based methods (ML and REML) are generally preferred as they constrain variance estimates to be non-negative and handle unbalanced data more naturally.
Consider a study of milk production across 50 dairy herds with varying numbers of cows per herd. A random intercept model with herd as the grouping factor would estimate σ²g (between-herd variance) and σ² (within-herd variance). If σ²g = 200 and σ² = 800, the ICC = 200/(200+800) = 0.20, meaning 20% of the total variation in milk production is attributable to differences between herds.
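The ICC arithmetic from the herd example can be checked in a few lines of Python (a minimal sketch; the helper function `icc` is ours, not from the text):

```python
def icc(sigma2_g, sigma2):
    """Intraclass correlation: share of total variance that is between groups."""
    return sigma2_g / (sigma2_g + sigma2)

# Herd example: between-herd variance 200, within-herd variance 800
print(icc(200, 800))  # 0.2 -> 20% of variation is between herds
```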
The standard random intercept model assumes: (1) random effects u are normally distributed with mean 0 and variance σ²g; (2) residuals ε are normally distributed with mean 0 and variance σ²; (3) random effects and residuals are independent of each other and of the predictors; (4) conditional on the random effects, observations within the same group are independent.
Section 1 Knowledge Check
1. In a linear mixed model, random effects represent:
2. The ICC in a random intercept model equals:
3. In the model Y = Xβ + Zu + ε, the term Zu represents:
Reflection
Why might treating group effects as random rather than fixed be advantageous? Think about a study with many groups — what practical benefits does the random effects approach offer?
Random Slopes & Hierarchical Models
Random Slopes
A random intercept model assumes that the effect of each predictor is the same across all groups—only the baseline level (intercept) varies. A random slopes model relaxes this assumption by allowing the effect (slope) of one or more predictors to vary across groups as well:

Yij = (β0 + u0j) + (β1 + u1j)X1ij + εij
Here, u0j is the random intercept for group j and u1j is the random slope for group j. Each group effectively has its own regression line with intercept (β0 + u0j) and slope (β1 + u1j).
Random Intercept Model
In the random intercept model, all groups share the same slope for each predictor but have different baseline levels. The regression lines for different groups are parallel, shifted up or down by their random intercept. This is appropriate when you believe the effect of a predictor is consistent across groups but groups differ in their overall level of the outcome.
Random Intercept + Random Slope Model
In this extended model, each group can have both a different intercept and a different slope. The regression lines are no longer parallel—they can fan out, converge, or cross. This model requires estimating additional parameters: the variance of the random slope (σ²u1) and the covariance between the random intercept and slope (σu01). It is more flexible but also more complex and requires sufficient data.
The Covariance Matrix of Random Effects
When a model includes both random intercepts and random slopes, the random effects for each group are typically assumed to follow a bivariate normal distribution. The covariance matrix of the random effects includes three parameters: the variance of the random intercept (σ²u0), the variance of the random slope (σ²u1), and the covariance between them (σu01).
A positive covariance means groups with higher intercepts tend to have steeper slopes; a negative covariance means the opposite. This covariance should generally be estimated rather than assumed to be zero.
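The covariance matrix of the random effects and its implied intercept–slope correlation can be sketched with illustrative numbers (these variance components are hypothetical, chosen only to show the arithmetic):

```python
import math

# Hypothetical variance components for a random intercept + slope model
var_u0 = 4.0    # variance of random intercepts, sigma^2_u0
var_u1 = 0.25   # variance of random slopes, sigma^2_u1
cov_u01 = 0.6   # covariance between intercepts and slopes, sigma_u01

# 2x2 covariance matrix G of the random effects (u0j, u1j)
G = [[var_u0, cov_u01],
     [cov_u01, var_u1]]

# Implied correlation between intercepts and slopes (positive here:
# groups with higher intercepts tend to have steeper slopes)
corr = cov_u01 / math.sqrt(var_u0 * var_u1)
print(corr)  # 0.6 / sqrt(4 * 0.25) = 0.6

# A valid covariance matrix must have a non-negative determinant
det_G = var_u0 * var_u1 - cov_u01 ** 2
print(det_G)  # 1.0 - 0.36 = 0.64
```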
Hierarchical Model Interpretation
Random slopes models are sometimes called hierarchical or multilevel models because predictors can have effects at each level of the hierarchy. At the individual level, predictors explain variation within groups; at the group level, predictors or random effects explain variation between groups. This formulation shows that each predictor can enter as a fixed effect, a random effect at the group level, or both.
Imagine a multi-center clinical trial measuring blood pressure reduction across 30 hospitals. A random intercept model would allow hospitals to differ in their patients’ baseline blood pressure. A random slopes model would additionally allow the treatment effect itself to vary across hospitals—some hospitals might show a larger treatment effect than others due to differences in patient populations, adherence, or clinical practice. The variance of the random slope tells you how much the treatment effect varies across sites.
Practical Considerations
Models with multiple random effects can have many parameters in the covariance matrix, and identifiability can become an issue. Practical parsimony is important: only include random effects for which there is theoretical justification and sufficient data. Adding unnecessary random effects can lead to convergence problems, unstable estimates, or singular covariance matrices.
Section 2 Knowledge Check
1. A random slopes model allows:
2. In a random slopes model, the covariance between random intercept and slope:
3. Random slope models are sometimes called hierarchical models because:
Reflection
Consider a multi-center clinical trial where the treatment effect might vary across centers. What would a random slopes model for the treatment effect tell you that a random intercept model would not?
Contextual Effects & Statistical Analysis
Contextual Effects
A contextual effect occurs when a predictor measured at the individual level has a different effect depending on whether you examine variation within groups or between groups. For a contextual effect to exist, two conditions must hold: (1) the predictor must vary both between and within groups, and (2) the within-group and between-group slopes must differ.
Imagine studying the relationship between income and health across neighborhoods. At the individual level (within a neighborhood), higher income might improve health modestly. But the neighborhood-level (between-group) effect of average income could be much stronger because wealthier neighborhoods have better infrastructure, cleaner environments, and more health services. The difference between these two slopes is the contextual effect—the additional benefit of living in a high-income neighborhood beyond one’s own income level.
Group-Mean Centering
To separate within-group and between-group effects, we can use group-mean centering. This replaces the original predictor X1i with:

Z1i = X1i − X̄1,group

The centered variable Z1i captures purely within-group variation (how an individual differs from their group mean). The group mean X̄1,group can then be included as a separate predictor to capture between-group variation.
Yi = β0 + βW·Z1i + βB·X̄1,group + ugroup + εi

Here, βW is the within-group effect and βB is the between-group effect. If βW ≠ βB, a contextual effect is present, and its size is the difference βB − βW. Ignoring this distinction can lead to the ecological fallacy (wrongly attributing group-level associations to individuals) or the atomistic fallacy (wrongly attributing individual-level associations to groups).
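Group-mean centering itself is simple bookkeeping, sketched here on toy data (the neighborhood incomes are invented for illustration):

```python
# Group-mean centering: split a predictor into within- and between-group parts
# (hypothetical toy data: incomes by neighborhood)
data = {
    "A": [30, 40, 50],   # neighborhood A incomes
    "B": [60, 70, 80],   # neighborhood B incomes
}

# Between-group predictor: the group means
group_means = {g: sum(x) / len(x) for g, x in data.items()}
print(group_means)  # {'A': 40.0, 'B': 70.0}

# Within-group predictor: deviations from the group mean
centered = {g: [xi - group_means[g] for xi in x] for g, x in data.items()}
print(centered)  # {'A': [-10.0, 0.0, 10.0], 'B': [-10.0, 0.0, 10.0]}
```

By construction the centered values sum to zero within each group, so they carry only within-group variation; the group means carry only between-group variation.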
Estimation Methods: ML vs. REML
Maximum Likelihood (ML)
ML estimation simultaneously estimates all parameters (fixed effects and variance components) by maximizing the full likelihood. However, ML does not account for the degrees of freedom used in estimating fixed effects, which leads to downward-biased estimates of variance components—analogous to dividing by n instead of n−1 in sample variance estimation. ML is required when comparing models with different fixed effects structures (e.g., likelihood ratio tests for fixed effects).
Restricted Maximum Likelihood (REML)
REML estimation adjusts for the degrees of freedom lost in estimating fixed effects by restricting the likelihood to a subspace orthogonal to the fixed effects. This produces less biased variance component estimates, especially when the number of groups is small. REML is generally the preferred method for estimating variance components. However, REML log-likelihoods are not comparable across models with different fixed effects.
Inference in Mixed Models
Wald tests are commonly used to test individual fixed effects. These are approximate tests that rely on asymptotic theory. For finite samples, approximations such as the Satterthwaite or Kenward–Roger methods provide better reference distributions (t or F) by estimating effective degrees of freedom. These corrections are especially important when the number of groups is small.
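The basic Wald calculation can be written with only the standard library (the estimate and standard error below are illustrative, and `wald_p` is our own helper):

```python
import math

def wald_p(estimate, se):
    """Two-sided p-value for a Wald z-test of H0: beta = 0."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# Illustrative fixed-effect estimate 1.2 with standard error 0.5 (z = 2.4)
print(round(wald_p(1.2, 0.5), 4))  # ~0.0164
```

In practice, with few groups, the normal reference distribution is replaced by a t distribution with Satterthwaite or Kenward–Roger degrees of freedom, as noted above.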
To test whether a variance component is significantly different from zero (e.g., H0: σ²g = 0), we use a likelihood ratio test comparing the model with and without that random effect. However, because the null value (0) is on the boundary of the parameter space, the standard χ² test is conservative—its P-values are too large. A simple and widely recommended correction is to halve the P-value from the χ² test.
When comparing models with different random effects but the same fixed effects, use REML-based likelihood ratio tests (with P-value halving for boundary tests). When comparing models with different fixed effects, use ML-based likelihood ratio tests or information criteria (AIC, BIC). Always ensure that models being compared are nested and fit to the same data.
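The halving rule for a boundary test of a single variance component can be sketched using the closed-form 1-df χ² survival function (pure standard library; the LR statistic below is illustrative):

```python
import math

def chi2_sf_1df(x):
    """Survival function of chi-square with 1 df: P(X > x)."""
    return math.erfc(math.sqrt(x / 2))

def boundary_p(lr_stat):
    """Halved p-value for testing one variance component (H0 on the boundary)."""
    return 0.5 * chi2_sf_1df(lr_stat)

# An LR statistic of 3.84 gives the familiar naive p of about 0.05;
# the boundary-corrected p is about 0.025
print(round(chi2_sf_1df(3.84), 3), round(boundary_p(3.84), 3))
```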
Section 3 Knowledge Check
1. A contextual effect exists when:
2. REML estimation is generally preferred over ML because:
3. When testing whether a random effect variance is significantly different from zero:
Reflection
Explain in your own words why the ecological fallacy can occur when contextual effects are ignored. Give an example from epidemiology where the group-level and individual-level associations might differ.
Prediction, Residuals & Diagnostics
BLUPs: Best Linear Unbiased Predictors
In a mixed model, the random effects u are not directly observed—they are latent variables. We estimate them using BLUPs (Best Linear Unbiased Predictors), which are the predicted values of the random effects conditional on the observed data.
The Shrinkage Factor
BLUPs are weighted averages of the group-specific estimate and the overall mean. The amount of shrinkage depends on the group size and the ICC:

shrinkage factor = σ²g / (σ²g + σ²/m)
When m (the group size) is large, the shrinkage factor approaches 1 and the BLUP closely approximates the raw group mean. When m is small, the factor is closer to 0 and the BLUP is pulled substantially toward the overall mean.
Consider a random intercept model for patient outcomes across hospitals. Suppose σ²g = 5 and σ² = 20. For a hospital with 100 patients: shrinkage factor = 5/(5 + 20/100) = 5/5.2 = 0.96 — the BLUP is very close to the hospital’s raw mean. For a hospital with only 4 patients: shrinkage factor = 5/(5 + 20/4) = 5/10 = 0.50 — the BLUP is pulled halfway toward the overall mean. This borrowing of strength from other groups is a key advantage of mixed models.
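The hospital example can be reproduced directly (a minimal sketch; `shrinkage` is our own helper implementing the factor above):

```python
def shrinkage(sigma2_g, sigma2, m):
    """Shrinkage factor for a group's BLUP, given m observations in the group."""
    return sigma2_g / (sigma2_g + sigma2 / m)

# Hospital example: sigma2_g = 5, sigma2 = 20
print(round(shrinkage(5, 20, 100), 2))  # 0.96 -> BLUP close to the raw mean
print(shrinkage(5, 20, 4))              # 0.5  -> pulled halfway to overall mean
```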
Residuals in Mixed Models
Mixed models produce multiple sets of residuals, one for each level of the hierarchy:
- Individual-level residuals (ε): the difference between observed values and the group-specific predictions (using BLUPs)
- Group-level residuals (u): the BLUPs themselves, representing how each group deviates from the overall mean
Model Diagnostics
Checking model assumptions in mixed models involves examining residuals at each level of the hierarchy. A recommended strategy is to work from the highest hierarchical level downward:
Start by examining group-level residuals (BLUPs): check for normality using Q-Q plots and look for influential groups. Then examine individual-level residuals: check normality, homoscedasticity, and look for outliers. Working top-down helps identify whether problems originate at the group level rather than being caused by a few individual outliers within groups. Also check that residuals at each level are uncorrelated with predictors and fitted values.
Box-Cox Transformation for Mixed Models
When model assumptions (normality, homoscedasticity) are violated, a Box-Cox transformation of the response variable may help. In mixed models, the procedure involves computing transformed Y values for a range of λ values, fitting the same model to each, and comparing log-likelihoods. Crucially, ML estimation (not REML) must be used for comparing log-likelihoods across different λ values, because REML likelihoods are not comparable when the response variable changes.
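The transform itself is straightforward; a minimal sketch is shown below (`box_cox` is our own helper—the lesson's procedure of refitting the mixed model by ML at each λ and comparing log-likelihoods is only described in the comments, not implemented):

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive response value y for parameter lambda."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

# In a mixed-model Box-Cox search, one would transform Y over a grid of
# lambda values, refit the same model by ML (not REML) at each lambda,
# and pick the lambda with the highest log-likelihood.
for lam in (-1, 0, 0.5, 1):
    print(lam, box_cox(2.0, lam))
```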
Section 4 Knowledge Check
1. BLUPs (Best Linear Unbiased Predictors) exhibit shrinkage, meaning:
2. When checking residuals in a mixed model, it is recommended to:
3. The Box-Cox transformation in mixed models:
Reflection
A random intercept model for hospital costs has a group of 5 hospitals with only 3 patients each and another group of 20 hospitals with 100 patients each. How would shrinkage affect the BLUP estimates differently for these two groups?
Lesson 9 — Comprehensive Assessment
This final assessment covers all material from this lesson. You must answer all 15 questions correctly (100%) and complete the final reflection to finish the lesson.
Final Reflection
Reflecting on this lesson, summarize the key decisions a researcher must make when fitting a linear mixed model (choosing fixed vs. random effects, estimation method, model diagnostics). Which aspect do you find most challenging?
Final Assessment (15 Questions)
1. In a linear mixed model, random effects are assumed to follow:
2. The variance component σ²g in a random intercept model represents:
3. If σ²g = 3 and σ² = 12, the ICC is:
4. A random slopes model differs from a random intercept model by:
5. The covariance between random intercepts and slopes in a random slopes model:
6. A contextual effect is detected when:
7. Group-mean centering replaces X1i with:
8. REML differs from ML estimation in that REML:
9. When comparing ML and REML for model selection:
10. Wald tests in mixed models:
11. BLUPs are also known as:
12. Shrinkage in BLUPs is stronger when:
13. Mixed models have residuals:
14. The Box-Cox transformation for mixed models requires using ML rather than REML because:
15. In a hierarchical model formulation, each predictor can enter the model: