Repeated Measures Data
Exploratory Data Analysis For Epidemiology
Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University
Learning objectives for this lesson:
- Recognize and describe the unique characteristics of repeated measures data structures
- Use descriptive and graphical tools to explore repeated measures datasets
- Apply simple univariate approaches (separate time point analyses, summary statistics) to analyze repeated measures
- Understand the limitations of random-intercept mixed models for repeated measures and why correlation structures matter
- Choose among correlation structures (compound symmetry, AR(1), ARMA(1,1), Toeplitz, unstructured) for repeated measures
- Apply linear mixed models with appropriate correlation structures to repeated measures data
- Understand trend models with random slopes for time
- Describe the challenges of extending GLMMs to discrete repeated measures data including transition models
- Use GEE procedures to analyze clustered and repeated measures data
This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.
Introduction & Descriptive Approaches
What Are Repeated Measures?
Repeated measures data arise when multiple measurements are taken over time on the same subjects. This is one of the most common data structures in health sciences research—think of clinical trials where patients are measured at baseline and multiple follow-up visits, or cohort studies that track health outcomes over years.
Longitudinal studies (which collect repeated measures) differ fundamentally from cross-sectional studies, which measure each subject only once. The key advantage of longitudinal designs is their ability to assess within-subject change over time, making them more powerful for detecting the effects of within-subject predictors.
In repeated measures data, observations within the same subject are not independent. Moreover, the time ordering of measurements introduces autocorrelation—measurements closer in time tend to be more strongly correlated than those further apart. This temporal structure means that a simple hierarchical (random intercept) model, which assumes all within-subject correlations are equal, may be inadequate. Special methods are needed to properly account for the pattern of correlations.
Key Terminology
- Balanced: every subject has the same number of measurements.
- Uniform: all subjects are measured at the same set of time points.
- Equidistant: the time points are equally spaced.
- Autocorrelation: correlation between measurements on the same subject that depends on their distance in time.
Missing Data and Drop-Outs
Missing data is very common in repeated measures studies. Subjects may miss individual visits (intermittent missingness) or drop out permanently (monotone missingness). The pattern and mechanism of missingness can substantially affect the validity of the analysis. Methods that can handle unbalanced data (such as mixed models and GEE) are therefore particularly valuable for repeated measures.
Descriptive Approaches
Profile plots (also called spaghetti plots) display each subject's trajectory over time. They reveal patterns of tracking (whether subjects maintain their relative positions), overall trends, and variability. These plots are essential for understanding the data before fitting any model.
Mean plots show the average outcome at each time point, often separated by treatment group. They summarize the overall trend but hide individual variability. Mean plots are useful for visualizing treatment effects over time and identifying non-linear trends.
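The two plot types can be sketched side by side with matplotlib. This is a minimal illustration on simulated blood-pressure-like data; all names and numbers (`n_subj`, the visit months, the simulated trend) are arbitrary choices for the demo, not from any real study.

```python
# Sketch: profile (spaghetti) plot and mean plot for simulated repeated
# measures data. All variable names and values are illustrative.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n_subj = 20
times = np.array([0, 1, 3, 6, 12])  # visit months

# Each subject gets a random intercept plus a shared downward trend and noise,
# so individual trajectories "track" (maintain relative position) over time.
intercepts = rng.normal(140, 10, n_subj)
y = intercepts[:, None] - 0.8 * times + rng.normal(0, 4, (n_subj, len(times)))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))
for i in range(n_subj):                      # profile plot: one line per subject
    ax1.plot(times, y[i], color="grey", alpha=0.5)
ax1.set(title="Profile plot", xlabel="Month", ylabel="SBP")

ax2.plot(times, y.mean(axis=0), marker="o")  # mean plot: average per time point
ax2.set(title="Mean plot", xlabel="Month", ylabel="Mean SBP")
fig.tight_layout()
fig.savefig("profiles.png")
```

The profile plot exposes tracking and between-subject spread; the mean plot summarizes the group trend that the individual lines obscure.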
Examining the correlation matrix of measurements across time points reveals the autocorrelation pattern. If correlations decrease with increasing time distance, an AR(1)-type structure may be appropriate. If correlations are roughly equal, compound symmetry may suffice. The covariance matrix additionally reveals whether variances change over time.
Repeated measures data can be stored in wide format (one row per subject, separate columns for each time point) or long format (one row per measurement, with a time variable). Most modern statistical software requires long format for mixed models and GEE. Wide format is needed for MANOVA approaches.
Consider a clinical trial where 100 patients are randomized to treatment or placebo, with blood pressure measured at baseline and months 1, 3, 6, and 12. This is a balanced (5 measurements per subject), uniform (same time points), but not equidistant (spacing varies: 1, 2, 3, and 6 months) design. Profile plots reveal that patients’ blood pressures track over time, and the correlation matrix shows correlations declining from 0.80 (adjacent visits) to 0.45 (baseline vs. month 12)—clear evidence of autocorrelation.
Section 1 Knowledge Check
1. What distinguishes repeated measures data from standard clustered data?
2. A balanced repeated measures design means:
3. Autocorrelation in repeated measures means:
Reflection
Think of a longitudinal study in health sciences. What types of missing data patterns might occur, and how could they affect the validity of your analysis?
Univariate & Multivariate Approaches
Simple Approaches to Repeated Measures
Before turning to complex mixed models, it is worth understanding the simpler methods that have traditionally been used for repeated measures data. These methods either reduce the data to avoid modelling correlations altogether, or make strong assumptions about the correlation structure.
Separate Time Point Analysis
The simplest approach is to analyze each time point independently—for example, running a separate t-test or regression at each visit. This is straightforward but wasteful: it ignores the within-subject correlations and creates a multiple testing problem. If there are m time points, a Bonferroni correction divides α by m, which can be very conservative.
Summary Statistics Approach
A more elegant simple approach is to compute a single summary value per subject—such as the slope of their trajectory, the drop from first to last measurement, or the area under the curve (AUC)—and then perform a standard between-subjects analysis on these summaries.
Advantages: Simple, robust to model assumptions about correlation structure, and easy to interpret.
Disadvantages: Loss of information about the temporal pattern, difficulty incorporating within-subject time-varying predictors, and potential loss of power.
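The slope-as-summary version of this approach can be sketched in a few lines: fit an ordinary least-squares slope per subject, then compare the two groups of slopes with a single t-test. The simulation settings (trend of −1 per month in the treated group, noise SD of 3) are arbitrary illustrative values.

```python
# Sketch: the summary-statistic approach -- reduce each subject's trajectory
# to one number (an OLS slope) and compare groups with a single t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
times = np.array([0.0, 1.0, 3.0, 6.0, 12.0])  # visit months
n = 25                                         # subjects per group

def subject_slopes(group_trend):
    # Simulate n subjects with the given true trend and return each
    # subject's fitted slope over time.
    slopes = []
    for _ in range(n):
        y = 100 + group_trend * times + rng.normal(0, 3, times.size)
        slope, intercept = np.polyfit(times, y, 1)
        slopes.append(slope)
    return np.array(slopes)

slopes_treat = subject_slopes(group_trend=-1.0)   # treated: declining
slopes_ctrl = subject_slopes(group_trend=0.0)     # control: flat

# One between-subjects test, so no multiple-testing problem and no
# within-subject correlation left to model.
t, p = stats.ttest_ind(slopes_treat, slopes_ctrl)
```

Because each subject contributes exactly one independent number, the within-subject correlation structure drops out of the analysis entirely, at the cost of discarding the shape of each trajectory.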
Repeated Measures ANOVA
Repeated measures ANOVA treats time as a within-subject factor and tests for differences across time points. However, it assumes compound symmetry—that all pairs of time points have the same correlation. This is the same assumption as a random intercept model.
When compound symmetry is violated (which is common with autocorrelated data), the F-test becomes liberal (anti-conservative). The Huynh-Feldt correction factor (ε) adjusts the degrees of freedom to account for this violation. When ε = 1, compound symmetry holds perfectly; as ε decreases, the violation is more severe.
MANOVA (Multivariate Analysis of Variance)
MANOVA treats the entire vector of repeated measurements as a multivariate outcome, making no assumptions about the correlation structure. This is its key advantage over repeated measures ANOVA.
Limitations: Requires completely balanced data with no missing values, cannot easily handle within-subject continuous predictors, and uses wide-format data. It also becomes impractical with many time points.
Summary of Limitations
- Separate time points: multiple testing, ignores correlations, wasteful of information.
- Summary statistics: loses temporal detail, cannot incorporate time-varying covariates.
- RM ANOVA: assumes compound symmetry, which is rarely true.
- MANOVA: requires complete, balanced data with no missing values.

All of these limitations motivate the use of mixed models with flexible correlation structures.
| Approach | Handles Missing Data? | Assumes Equal Correlations? | Time-Varying Covariates? |
|---|---|---|---|
| Separate Time Points | Yes (per time point) | N/A (ignores structure) | Yes |
| Summary Statistics | Partially | No | No |
| RM ANOVA | No | Yes (compound symmetry) | No |
| MANOVA | No | No | No |
| Mixed Models | Yes | Flexible | Yes |
Section 2 Knowledge Check
1. The summary statistic approach involves:
2. Repeated measures ANOVA assumes:
3. An advantage of MANOVA over repeated measures ANOVA for repeated measures is:
Reflection
When would you choose a simple summary statistic approach over a mixed model for repeated measures data? What information might you lose by simplifying the analysis in this way?
Linear Mixed Models with Correlation Structure
Beyond Random Intercepts
A random intercept model assumes compound symmetry—that all pairs of measurements on the same subject are equally correlated. For most repeated measures data, this assumption is violated because of autocorrelation. We need to extend the mixed model to include explicit correlation structures for the error term ε.
The choice of correlation structure is one of the most important decisions in repeated measures analysis. Start by examining the empirical correlation matrix. If correlations clearly decay with increasing time lag, consider AR(1) or ARMA(1,1). If the decay is minimal, compound symmetry may suffice. If the pattern is complex, consider Toeplitz or unstructured. Use AIC to compare non-nested structures and likelihood ratio tests for nested ones.
Key Correlation Structures
Compound Symmetry (Exchangeable)
All pairs of measurements have the same correlation ρ, regardless of how far apart in time they are. This is the simplest structure and is equivalent to a random intercept model. It has only 1 correlation parameter.
When appropriate: When there is no autocorrelation—i.e., the correlation between measurements does not depend on time distance. This is rare in practice for true repeated measures data.
First-Order Autoregressive — AR(1)
Correlations decay as powers of ρ with increasing time distance: Corr(Yj, Yk) = ρ^|j−k|. This produces an exponential decay pattern. It has only 1 parameter (ρ) and is a good default for equally spaced repeated measures.
When appropriate: When the correlation matrix shows a clear pattern of decreasing correlations with increasing time lag, and the decay appears approximately geometric.
ARMA(1,1)
An extension of AR(1) that allows a slower or more flexible decay in correlations. It has 2 parameters and can accommodate patterns where the initial drop in correlation is steep but then levels off.
Toeplitz (Stationary)
Each lag has its own unconstrained correlation. For m time points, there are m − 1 correlation parameters. The structure is “banded”—the correlation depends only on the time lag, not on which specific time points are involved.
When appropriate: When the pattern of decay is irregular and cannot be well approximated by AR(1) or ARMA, but you still believe the correlation depends only on lag distance.
Unstructured
Completely unconstrained correlations and variances for each pair of time points. For m time points, there are m(m+1)/2 parameters. This is the most flexible but requires the most parameters.
When appropriate: Only with few time points and large sample sizes. With many time points, the number of parameters becomes impractical.
| Structure | Parameters | Key Feature | Assumption |
|---|---|---|---|
| Compound Symmetry | 1 | Equal correlations | No autocorrelation |
| AR(1) | 1 | Geometric decay | Equidistant time points |
| ARMA(1,1) | 2 | Flexible decay | Equidistant time points |
| Toeplitz | m − 1 | Lag-specific correlations | Equidistant time points |
| Unstructured | m(m+1)/2 | Completely flexible | None |
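The structures in the table above can be written out as explicit matrices. A minimal NumPy sketch for m = 4 equidistant time points; ρ = 0.6 and the Toeplitz lag correlations are arbitrary illustrative values.

```python
# Sketch: correlation matrices implied by three common structures,
# for m = 4 equidistant time points. Numbers are illustrative only.
import numpy as np

m = 4
lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))  # |j - k| matrix

rho = 0.6
# Compound symmetry: the same rho at every lag (1 parameter)
compound_symmetry = np.where(lags == 0, 1.0, rho)

# AR(1): geometric decay rho^|j-k| (1 parameter)
ar1 = rho ** lags

# Toeplitz: one free correlation per lag, "banded" by lag distance
# (m - 1 = 3 parameters; lag 0 is fixed at 1)
lag_corrs = np.array([1.0, 0.7, 0.5, 0.45])
toeplitz = lag_corrs[lags]
```

Printing these makes the parameter counts in the table concrete: CS and AR(1) each need one number, Toeplitz needs one per lag, and an unstructured matrix would need every upper-triangle entry separately.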
Combining Random Effects with Correlation Structures
An important practical consideration is how random effects interact with error correlation structures. Some combinations are redundant and cannot be separately identified:
- Random intercepts + compound symmetry errors = redundant — both produce the same correlation structure
- Random intercepts + AR(1) errors = useful — produces a structure where correlations decay but do not reach zero
- Unstructured errors + random effects = pointless — the unstructured covariance already captures everything
Covariance pattern models use no random effects at all, relying entirely on the structured covariance of the errors to capture within-subject correlation.
Model Selection
For nested correlation structures (e.g., AR(1) is nested within Toeplitz), use likelihood ratio tests. For non-nested structures (e.g., AR(1) vs. compound symmetry), use AIC or similar information criteria. Models should be compared with the same fixed effects and random effects structure.
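The two comparison rules can be sketched with made-up log-likelihoods and parameter counts (these are placeholders, not output from any real fit). For six time points, AR(1) has 1 correlation parameter and Toeplitz has 5, giving 4 degrees of freedom for the nested test.

```python
# Sketch: likelihood ratio test for nested structures, AIC for non-nested.
# The log-likelihoods and parameter counts are hypothetical placeholders.
import math
from scipy.stats import chi2

ll_ar1, k_ar1 = -1169.5, 3     # hypothetical AR(1) fit: loglik, total params
ll_toep, k_toep = -1164.0, 7   # hypothetical Toeplitz fit (AR(1) nested in it)

# Nested models: likelihood ratio test against chi-squared
lrt = 2 * (ll_toep - ll_ar1)
df = k_toep - k_ar1
p = chi2.sf(lrt, df)           # small p favours the richer structure

# Non-nested models: compare AIC = -2*loglik + 2*k (smaller is better)
aic_ar1 = -2 * ll_ar1 + 2 * k_ar1
aic_toep = -2 * ll_toep + 2 * k_toep
```

With these placeholder numbers the LRT is significant, but a real decision would also weigh parsimony, as the worked example below illustrates.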
In a study with 6 equally-spaced measurements, the empirical correlations ranged from 0.72 (lag 1) to 0.31 (lag 5). An AR(1) model with ρ = 0.73 fit well (AIC = 2,341), while compound symmetry (AIC = 2,398) fit poorly because it predicted equal correlations of 0.52 at all lags. The Toeplitz model (AIC = 2,338) offered a slight improvement over AR(1) but used 4 more parameters. Based on parsimony, AR(1) was selected.
Section 3 Knowledge Check
1. The AR(1) correlation structure assumes:
2. Combining random intercepts with compound symmetry errors:
3. For choosing between non-nested correlation structures (e.g., AR(1) vs. Toeplitz), one should use:
Reflection
A study measures blood pressure at 6 monthly visits. The correlation between visits 1 and 2 is 0.60, between visits 1 and 6 is 0.15. Which correlation structure would you initially consider, and why?
Trend Models, Discrete Outcomes & GEE
Trend Models with Random Slopes
An alternative to modelling the error correlation directly is to include random slopes for time. This allows each subject to have their own rate of change (growth or decline) over time, with the population-average trend captured by the fixed effect of time.
The variation in individual trajectories naturally induces autocorrelation—subjects who start high and decline slowly will have correlated measurements. This can be sufficient to capture the temporal structure in many datasets, especially when the primary interest is in individual trajectories.
The time variable can be parameterized in different ways: linear (for constant rates of change), polynomial (for curved trajectories), or log-transformed (for rapid early change that levels off).
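A random-slope trend model of this kind can be fit with statsmods' `MixedLM` via its formula interface. The sketch below simulates subject-specific intercepts and slopes (all variable names and simulation settings are illustrative) and recovers the population-average trend.

```python
# Sketch: a linear random-slope ("trend") model fit with statsmodels.
# Data are simulated, so the fit is a demonstration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_subj, times = 50, np.arange(6)

rows = []
for i in range(n_subj):
    b0 = rng.normal(0, 2)      # subject-specific intercept deviation
    b1 = rng.normal(0, 0.5)    # subject-specific slope deviation
    for t in times:
        y = 10 + b0 + (1.0 + b1) * t + rng.normal(0, 1)  # true mean slope = 1
        rows.append({"id": i, "time": t, "y": y})
df = pd.DataFrame(rows)

# re_formula="~time" requests a random intercept AND a random slope for time
model = smf.mixedlm("y ~ time", df, groups="id", re_formula="~time")
fit = model.fit()
time_effect = fit.params["time"]   # population-average rate of change
```

The variance of the random slopes is what induces the autocorrelation described above: subjects on different trajectories drift apart, so measurements far apart in time are less similar than adjacent ones.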
Discrete Repeated Measures Data
Extending mixed models to discrete outcomes (binary, count) with correlation structures is much harder than for continuous outcomes. The fundamental challenge is that in GLMs, the error term and the linear predictor operate on different scales—the link function transforms the relationship, making it difficult to add correlation structures to the error term in a meaningful way.
Use GEE when your research question focuses on population-averaged (marginal) effects—for example, “What is the average treatment effect across the population?” Use mixed models when you want subject-specific (conditional) effects or when the random effects themselves are of scientific interest—for example, “How much do individual subjects vary in their response?”
Transition Models
One approach for discrete repeated measures is the transition model, which includes the previous outcome as a predictor. For a binary outcome this is, in plain notation, logit(P(Yij = 1)) = β0 + β1Xij + γYi,j−1. This captures autocorrelation informally through dependence on the prior outcome.
Here, γ is the log odds ratio comparing those with versus without the previous event. A positive γ means that having the event at the previous time point increases the odds of having it at the current time point.
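A transition model for a binary outcome amounts to a logistic regression with the lagged outcome as a predictor. The sketch below simulates data in which the true γ is 1.5 (all names and values are illustrative) and recovers it with statsmodels:

```python
# Sketch: a transition model -- logistic regression of the current outcome
# on the previous outcome. Simulated data; names/values are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_subj, n_visits = 300, 4

rows = []
for i in range(n_subj):
    prev = rng.binomial(1, 0.3)  # baseline status (no "previous" value)
    for j in range(1, n_visits):
        # True transition model: logit(p) = -1.0 + 1.5 * previous outcome
        p = 1 / (1 + np.exp(-(-1.0 + 1.5 * prev)))
        y = rng.binomial(1, p)
        rows.append({"id": i, "visit": j, "y": y, "y_prev": prev})
        prev = y
df = pd.DataFrame(rows)

# gamma = coefficient on y_prev = log OR given the previous outcome
fit = smf.logit("y ~ y_prev", df).fit(disp=False)
gamma = fit.params["y_prev"]
odds_ratio = np.exp(gamma)
```

Note how the first observation per subject is used only as a lag, never as an outcome, which is the "careful handling of the first observation" mentioned below.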
Generalised Estimating Equations (GEE)
GEE is a population-averaged (marginal) approach that does not require specifying random effects. Instead, it specifies a “working” correlation structure and uses robust (sandwich) standard errors that provide valid inference even if the working correlation is misspecified.
Trend Models
Trend models add random slopes for time, allowing each subject to have their own trajectory. The random slope induces autocorrelation through the variation in individual trajectories. This approach is particularly natural when the scientific question is about individual growth or decline rates.
Key considerations: Choice of time parameterization (linear, polynomial, log), whether to include both random intercepts and slopes, and whether the induced autocorrelation is sufficient or additional error correlation is needed.
Transition Models
Transition models include the previous outcome Yi,j−1 as a predictor in the model. The coefficient γ represents the log OR for the event given the previous event occurred. This approach is intuitive and can be combined with random effects.
Limitations: Difficult to interpret coefficients for other predictors (they are conditional on the previous outcome), requires careful handling of the first observation (which has no “previous” value), and may not fully capture complex autocorrelation patterns.
Generalised Estimating Equations (GEE)
GEE estimates population-averaged effects using a quasi-likelihood approach. Key features:
- Specifies a working correlation (e.g., exchangeable, AR(1), unstructured)
- With robust (sandwich) SEs, inference is valid even if the working correlation is wrong
- Requires enough clusters/subjects (≥20–30) for reliable sandwich SEs
- Cannot estimate cluster-specific (random) effects—gives only PA estimates
- Better working correlation = more efficient estimates (but always valid with robust SEs)
A study followed 200 patients over 4 visits, recording whether they experienced a symptom (yes/no) at each visit along with a treatment indicator. A GEE model with exchangeable working correlation and robust SEs estimated the treatment OR as 0.65 (95% CI: 0.48–0.88), suggesting treatment reduced the odds of symptoms by 35% on average across the population. The working correlation was estimated as 0.42.
| Feature | GEE | Mixed Models (GLMM) |
|---|---|---|
| Estimate type | Population-averaged (PA) | Subject-specific (SS) |
| Random effects | Not estimated | Estimated |
| Correlation | Working correlation + robust SEs | Explicit random effects / correlation |
| Missing data assumption | MCAR | MAR |
| Minimum clusters | ≥20–30 | Fewer acceptable |
| Best for | PA inference | SS inference, variance components |
Section 4 Knowledge Check
1. Trend models with random slopes for time:
2. In a transition model, the previous outcome Yi,j−1 is included to:
3. GEE (Generalised Estimating Equations) provide:
Reflection
Compare the GEE approach and the mixed model approach for analyzing repeated binary outcomes. In what research context would you prefer each approach, and why?
Lesson 11 — Comprehensive Assessment
This final assessment covers all material from this lesson. You must answer all 15 questions correctly (100%) and complete the final reflection to finish the lesson.
Final Reflection
Reflecting on this entire lesson, how would you approach the analysis of a longitudinal study with 6 time points, some missing data, and a binary outcome? Walk through your analytical strategy from descriptive analysis to final model choice.
Final Assessment (15 Questions)
1. Repeated measures data differs from standard clustered data primarily because:
2. A balanced, uniform, equidistant repeated measures design:
3. Profile plots in repeated measures analysis show:
4. The Bonferroni correction for separate time point analyses:
5. The summary statistic approach to repeated measures analysis:
6. Compound symmetry assumes:
7. The AR(1) correlation structure models correlations as:
8. The unstructured covariance matrix:
9. Random intercepts combined with AR(1) errors:
10. Trend models with random slopes for time:
11. The main challenge of extending mixed models to discrete repeated measures data is:
12. In a transition model, the parameter γ for the lagged outcome Yi,j−1 represents:
13. GEE uses a “working” correlation structure because:
14. GEE estimates are:
15. When choosing between GEE and mixed models for repeated measures: