Mixed Models for Continuous Data
Exploratory Data Analysis For Epidemiology
Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University
Learning objectives for this lesson:
- Understand the concepts of fixed and random effects in linear mixed models
- Write and interpret the linear mixed model equation with random intercepts
- Decompose variance into between-group and within-group components and calculate the ICC
- Understand random slopes models and their interpretation as hierarchical models
- Explain contextual effects and how within-group and between-group regressions can differ
- Describe estimation methods (ML and REML) and their properties
- Conduct inference for both fixed and random effects in mixed models
- Understand the role of BLUPs, shrinkage, residuals, and model diagnostics
This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.
Introduction & The Linear Mixed Model
Fixed vs. Random Effects
In many epidemiological studies, observations are grouped or clustered—patients within hospitals, students within schools, or repeated measurements within individuals. Linear mixed models (also known as multilevel or hierarchical linear models) handle such data by incorporating both fixed effects and random effects.
Fixed effects are parameters of primary interest that are constant across groups—the population-average effects of predictors. Random effects represent variation across groups or clusters and are modeled as random draws from a probability distribution, typically normal with mean zero.
Variance Components
A fundamental concept in mixed models is the partitioning of total variance into two components: between-group variance (σ²g) and within-group variance (σ²). The between-group variance captures how much group means differ from the overall mean, while the within-group variance captures how much individual observations vary around their group mean.
Mixed models are appropriate when your data has a hierarchical or clustered structure—for example, patients nested within clinics, animals nested within herds, or repeated measures nested within subjects. If observations within groups are correlated (i.e., the ICC is non-trivial), ignoring this structure can lead to incorrect inference. Mixed models handle unbalanced data gracefully and can incorporate predictors at both the individual and group levels.
The Linear Mixed Model Equation
The random intercept model extends ordinary linear regression by adding a group-specific random deviation to the intercept:

Yi = β0 + β1X1i + ugroup + εi
In this model, each group has its own intercept: β0 + ugroup, where ugroup ~ N(0, σ²g) and εi ~ N(0, σ²). The random intercept ugroup captures how much a particular group’s mean deviates from the overall intercept β0.
The Intraclass Correlation Coefficient (ICC)
The ICC measures the proportion of total variance that is attributable to between-group differences:

ICC = σ²g / (σ²g + σ²)
An ICC close to 0 means observations within groups are no more similar than observations from different groups. An ICC close to 1 means most of the variance is between groups, and observations within the same group are very similar.
Matrix Notation
In matrix form, the linear mixed model is written as:

Y = Xβ + Zu + ε
Here, X is the design matrix for fixed effects, β is the vector of fixed effect coefficients, Z is the design matrix for random effects, u is the vector of random effects, and ε is the vector of residual errors.
ANOVA-based methods provide simple estimators of variance components by equating observed mean squares to their expected values. While these methods are intuitive and historically important, they can produce negative variance estimates (which are set to zero in practice). Likelihood-based methods (ML and REML) are generally preferred as they constrain variance estimates to be non-negative and handle unbalanced data more naturally.
Consider a study of milk production across 50 dairy herds with varying numbers of cows per herd. A random intercept model with herd as the grouping factor would estimate σ²g (between-herd variance) and σ² (within-herd variance). If σ²g = 200 and σ² = 800, the ICC = 200/(200+800) = 0.20, meaning 20% of the total variation in milk production is attributable to differences between herds.
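The ICC arithmetic from the herd example can be checked in a few lines of Python (a minimal sketch; the helper function `icc` is ours, not from the text):

```python
def icc(sigma2_g, sigma2):
    """Intraclass correlation: share of total variance that is between groups."""
    return sigma2_g / (sigma2_g + sigma2)

# Herd example: between-herd variance 200, within-herd variance 800
print(icc(200, 800))  # 0.2 -> 20% of variation is between herds
```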
The standard random intercept model assumes: (1) random effects u are normally distributed with mean 0 and variance σ²g; (2) residuals ε are normally distributed with mean 0 and variance σ²; (3) random effects and residuals are independent of each other and of the predictors; (4) conditional on the random effects, observations within the same group are independent.
Section 1 Knowledge Check
1. In a linear mixed model, random effects represent:
2. The ICC in a random intercept model equals:
3. In the model Y = Xβ + Zu + ε, the term Zu represents:
Reflection
Why might treating group effects as random rather than fixed be advantageous? Think about a study with many groups — what practical benefits does the random effects approach offer?
Random Slopes & Hierarchical Models
Random Slopes
A random intercept model assumes that the effect of each predictor is the same across all groups—only the baseline level (intercept) varies. A random slopes model relaxes this assumption by allowing the effect (slope) of one or more predictors to vary across groups as well:

Yij = (β0 + u0j) + (β1 + u1j)X1ij + εij
Here, u0j is the random intercept for group j and u1j is the random slope for group j. Each group effectively has its own regression line with intercept (β0 + u0j) and slope (β1 + u1j).
Random Intercept Model
In the random intercept model, all groups share the same slope for each predictor but have different baseline levels. The regression lines for different groups are parallel, shifted up or down by their random intercept. This is appropriate when you believe the effect of a predictor is consistent across groups but groups differ in their overall level of the outcome.
Random Intercept + Random Slope Model
In this extended model, each group can have both a different intercept and a different slope. The regression lines are no longer parallel—they can fan out, converge, or cross. This model requires estimating additional parameters: the variance of the random slope (σ²u1) and the covariance between the random intercept and slope (σu01). It is more flexible but also more complex and requires sufficient data.
The Covariance Matrix of Random Effects
When a model includes both random intercepts and random slopes, the random effects for each group are typically assumed to follow a bivariate normal distribution. The covariance matrix of the random effects includes three parameters: the variance of the random intercept (σ²u0), the variance of the random slope (σ²u1), and the covariance between them (σu01).
A positive covariance means groups with higher intercepts tend to have steeper slopes; a negative covariance means the opposite. This covariance should generally be estimated rather than assumed to be zero.
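The covariance matrix of the random effects and its implied intercept–slope correlation can be sketched with illustrative numbers (these variance components are hypothetical, chosen only to show the arithmetic):

```python
import math

# Hypothetical variance components for a random intercept + slope model
var_u0 = 4.0    # variance of random intercepts, sigma^2_u0
var_u1 = 0.25   # variance of random slopes, sigma^2_u1
cov_u01 = 0.6   # covariance between intercepts and slopes, sigma_u01

# 2x2 covariance matrix G of the random effects (u0j, u1j)
G = [[var_u0, cov_u01],
     [cov_u01, var_u1]]

# Implied correlation between intercepts and slopes (positive here:
# groups with higher intercepts tend to have steeper slopes)
corr = cov_u01 / math.sqrt(var_u0 * var_u1)
print(corr)  # 0.6 / sqrt(4 * 0.25) = 0.6

# A valid covariance matrix must have a non-negative determinant
det_G = var_u0 * var_u1 - cov_u01 ** 2
print(det_G)  # 1.0 - 0.36 = 0.64
```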
Hierarchical Model Interpretation
Random slopes models are sometimes called hierarchical or multilevel models because predictors can have effects at each level of the hierarchy. At the individual level, predictors explain variation within groups; at the group level, predictors or random effects explain variation between groups. This formulation shows that each predictor can enter as a fixed effect, a random effect at the group level, or both.
Imagine a multi-center clinical trial measuring blood pressure reduction across 30 hospitals. A random intercept model would allow hospitals to differ in their patients’ baseline blood pressure. A random slopes model would additionally allow the treatment effect itself to vary across hospitals—some hospitals might show a larger treatment effect than others due to differences in patient populations, adherence, or clinical practice. The variance of the random slope tells you how much the treatment effect varies across sites.
Practical Considerations
Models with multiple random effects can have many parameters in the covariance matrix, and identifiability can become an issue. Practical parsimony is important: only include random effects for which there is theoretical justification and sufficient data. Adding unnecessary random effects can lead to convergence problems, unstable estimates, or singular covariance matrices.
Section 2 Knowledge Check
1. A random slopes model allows:
2. In a random slopes model, the covariance between random intercept and slope:
3. Random slope models are sometimes called hierarchical models because:
Reflection
Consider a multi-center clinical trial where the treatment effect might vary across centers. What would a random slopes model for the treatment effect tell you that a random intercept model would not?
Contextual Effects & Statistical Analysis
Contextual Effects
A contextual effect occurs when a predictor measured at the individual level has a different effect depending on whether you examine variation within groups or between groups. For a contextual effect to exist, two conditions must hold: (1) the predictor must vary both between and within groups, and (2) the within-group and between-group slopes must differ.
Imagine studying the relationship between income and health across neighborhoods. At the individual level (within a neighborhood), higher income might improve health modestly. But the neighborhood-level (between-group) effect of average income could be much stronger because wealthier neighborhoods have better infrastructure, cleaner environments, and more health services. The difference between these two slopes is the contextual effect—the additional benefit of living in a high-income neighborhood beyond one’s own income level.
Group-Mean Centering
To separate within-group and between-group effects, we can use group-mean centering. This replaces the original predictor X1i with:

Z1i = X1i − X̄1,group

The centered variable Z1i captures purely within-group variation (how an individual differs from their group mean). The group mean X̄1,group can then be included as a separate predictor to capture between-group variation.
Yi = β0 + βW·Z1i + βB·X̄1,group + ugroup + εi

Here, βW is the within-group effect and βB is the between-group effect. If βW ≠ βB, a contextual effect is present, and its size is the difference βB − βW. Ignoring this distinction can lead to the ecological fallacy (wrongly attributing group-level associations to individuals) or the atomistic fallacy (wrongly attributing individual-level associations to groups).
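Group-mean centering itself is simple bookkeeping, sketched here on toy data (the neighborhood incomes are invented for illustration):

```python
# Group-mean centering: split a predictor into within- and between-group parts
# (hypothetical toy data: incomes by neighborhood)
data = {
    "A": [30, 40, 50],   # neighborhood A incomes
    "B": [60, 70, 80],   # neighborhood B incomes
}

# Between-group predictor: the group means
group_means = {g: sum(x) / len(x) for g, x in data.items()}
print(group_means)  # {'A': 40.0, 'B': 70.0}

# Within-group predictor: deviations from the group mean
centered = {g: [xi - group_means[g] for xi in x] for g, x in data.items()}
print(centered)  # {'A': [-10.0, 0.0, 10.0], 'B': [-10.0, 0.0, 10.0]}
```

By construction the centered values sum to zero within each group, so they carry only within-group variation; the group means carry only between-group variation.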
Estimation Methods: ML vs. REML
Maximum Likelihood (ML)
ML estimation simultaneously estimates all parameters (fixed effects and variance components) by maximizing the full likelihood. However, ML does not account for the degrees of freedom used in estimating fixed effects, which leads to downward-biased estimates of variance components—analogous to dividing by n instead of n−1 in sample variance estimation. ML is required when comparing models with different fixed effects structures (e.g., likelihood ratio tests for fixed effects).
Restricted Maximum Likelihood (REML)
REML estimation adjusts for the degrees of freedom lost in estimating fixed effects by restricting the likelihood to a subspace orthogonal to the fixed effects. This produces less biased variance component estimates, especially when the number of groups is small. REML is generally the preferred method for estimating variance components. However, REML log-likelihoods are not comparable across models with different fixed effects.
Inference in Mixed Models
Wald tests are commonly used to test individual fixed effects. These are approximate tests that rely on asymptotic theory. For finite samples, approximations such as the Satterthwaite or Kenward–Roger methods provide better reference distributions (t or F) by estimating effective degrees of freedom. These corrections are especially important when the number of groups is small.
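The basic Wald calculation can be written with only the standard library (the estimate and standard error below are illustrative, and `wald_p` is our own helper):

```python
import math

def wald_p(estimate, se):
    """Two-sided p-value for a Wald z-test of H0: beta = 0."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# Illustrative fixed-effect estimate 1.2 with standard error 0.5 (z = 2.4)
print(round(wald_p(1.2, 0.5), 4))  # ~0.0164
```

In practice, with few groups, the normal reference distribution is replaced by a t distribution with Satterthwaite or Kenward–Roger degrees of freedom, as noted above.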
To test whether a variance component is significantly different from zero (e.g., H0: σ²g = 0), we use a likelihood ratio test comparing the model with and without that random effect. However, because the null value (0) is on the boundary of the parameter space, the standard χ² test is conservative—its P-values are too large. A simple and widely recommended correction is to halve the P-value from the χ² test.
When comparing models with different random effects but the same fixed effects, use REML-based likelihood ratio tests (with P-value halving for boundary tests). When comparing models with different fixed effects, use ML-based likelihood ratio tests or information criteria (AIC, BIC). Always ensure that models being compared are nested and fit to the same data.
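The halving rule for a boundary test of a single variance component can be sketched using the closed-form 1-df χ² survival function (pure standard library; the LR statistic below is illustrative):

```python
import math

def chi2_sf_1df(x):
    """Survival function of chi-square with 1 df: P(X > x)."""
    return math.erfc(math.sqrt(x / 2))

def boundary_p(lr_stat):
    """Halved p-value for testing one variance component (H0 on the boundary)."""
    return 0.5 * chi2_sf_1df(lr_stat)

# An LR statistic of 3.84 gives the familiar naive p of about 0.05;
# the boundary-corrected p is about 0.025
print(round(chi2_sf_1df(3.84), 3), round(boundary_p(3.84), 3))
```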
Section 3 Knowledge Check
1. A contextual effect exists when:
2. REML estimation is generally preferred over ML because:
3. When testing whether a random effect variance is significantly different from zero:
Reflection
Explain in your own words why the ecological fallacy can occur when contextual effects are ignored. Give an example from epidemiology where the group-level and individual-level associations might differ.
Prediction, Residuals & Diagnostics
BLUPs: Best Linear Unbiased Predictors
In a mixed model, the random effects u are not directly observed—they are latent variables. We estimate them using BLUPs (Best Linear Unbiased Predictors), which are the predicted values of the random effects conditional on the observed data.
The Shrinkage Factor
BLUPs are weighted averages of the group-specific estimate and the overall mean. The amount of shrinkage depends on the group size and the ICC:

shrinkage factor = σ²g / (σ²g + σ²/m)
When m (the group size) is large, the shrinkage factor approaches 1 and the BLUP closely approximates the raw group mean. When m is small, the factor is closer to 0 and the BLUP is pulled substantially toward the overall mean.
Consider a random intercept model for patient outcomes across hospitals. Suppose σ²g = 5 and σ² = 20. For a hospital with 100 patients: shrinkage factor = 5/(5 + 20/100) = 5/5.2 = 0.96 — the BLUP is very close to the hospital’s raw mean. For a hospital with only 4 patients: shrinkage factor = 5/(5 + 20/4) = 5/10 = 0.50 — the BLUP is pulled halfway toward the overall mean. This borrowing of strength from other groups is a key advantage of mixed models.
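The hospital example can be reproduced directly (a minimal sketch; `shrinkage` is our own helper implementing the factor above):

```python
def shrinkage(sigma2_g, sigma2, m):
    """Shrinkage factor for a group's BLUP, given m observations in the group."""
    return sigma2_g / (sigma2_g + sigma2 / m)

# Hospital example: sigma2_g = 5, sigma2 = 20
print(round(shrinkage(5, 20, 100), 2))  # 0.96 -> BLUP close to the raw mean
print(shrinkage(5, 20, 4))              # 0.5  -> pulled halfway to overall mean
```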
Residuals in Mixed Models
Mixed models produce multiple sets of residuals, one for each level of the hierarchy:
- Individual-level residuals (ε): the difference between observed values and the group-specific predictions (using BLUPs)
- Group-level residuals (u): the BLUPs themselves, representing how each group deviates from the overall mean
Model Diagnostics
Checking model assumptions in mixed models involves examining residuals at each level of the hierarchy. A recommended strategy is to work from the highest hierarchical level downward:
Start by examining group-level residuals (BLUPs): check for normality using Q-Q plots and look for influential groups. Then examine individual-level residuals: check normality, homoscedasticity, and look for outliers. Working top-down helps identify whether problems originate at the group level rather than being caused by a few individual outliers within groups. Also check that residuals at each level are uncorrelated with predictors and fitted values.
Box-Cox Transformation for Mixed Models
When model assumptions (normality, homoscedasticity) are violated, a Box-Cox transformation of the response variable may help. In mixed models, the procedure involves computing transformed Y values for a range of λ values, fitting the same model to each, and comparing log-likelihoods. Crucially, ML estimation (not REML) must be used for comparing log-likelihoods across different λ values, because REML likelihoods are not comparable when the response variable changes.
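The transform itself is straightforward; a minimal sketch is shown below (`box_cox` is our own helper—the lesson's procedure of refitting the mixed model by ML at each λ and comparing log-likelihoods is only described in the comments, not implemented):

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive response value y for parameter lambda."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

# In a mixed-model Box-Cox search, one would transform Y over a grid of
# lambda values, refit the same model by ML (not REML) at each lambda,
# and pick the lambda with the highest log-likelihood.
for lam in (-1, 0, 0.5, 1):
    print(lam, box_cox(2.0, lam))
```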
Section 4 Knowledge Check
1. BLUPs (Best Linear Unbiased Predictors) exhibit shrinkage, meaning:
2. When checking residuals in a mixed model, it is recommended to:
3. The Box-Cox transformation in mixed models:
Reflection
A random intercept model for hospital costs has a group of 5 hospitals with only 3 patients each and another group of 20 hospitals with 100 patients each. How would shrinkage affect the BLUP estimates differently for these two groups?
Lesson 9 — Comprehensive Assessment
This final assessment covers all material from this lesson. You must answer all 15 questions correctly (100%) and complete the final reflection to finish the lesson.
Final Reflection
Reflecting on this lesson, summarize the key decisions a researcher must make when fitting a linear mixed model (choosing fixed vs. random effects, estimation method, model diagnostics). Which aspect do you find most challenging?
Final Assessment (15 Questions)
1. In a linear mixed model, random effects are assumed to follow:
2. The variance component σ²g in a random intercept model represents:
3. If σ²g = 3 and σ² = 12, the ICC is:
4. A random slopes model differs from a random intercept model by:
5. The covariance between random intercepts and slopes in a random slopes model:
6. A contextual effect is detected when:
7. Group-mean centering replaces X1i with:
8. REML differs from ML estimation in that REML:
9. When comparing ML and REML for model selection:
10. Wald tests in mixed models:
11. BLUPs are also known as:
12. Shrinkage in BLUPs is stronger when:
13. Mixed models have residuals:
14. The Box-Cox transformation for mixed models requires using ML rather than REML because:
15. In a hierarchical model formulation, each predictor can enter the model: