Mixed Models for Discrete Data
Exploratory Data Analysis For Epidemiology
Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University
Learning objectives for this lesson:
- Understand the generalised linear mixed model (GLMM) framework for discrete outcomes
- Write and interpret logistic regression models with random effects
- Distinguish between subject-specific (SS) and population-averaged (PA) interpretations
- Calculate the median odds ratio (MOR) and latent variable ICC for binary outcomes
- Apply mixed models to count data using Poisson regression with random effects
- Understand estimation challenges in GLMMs including ML, quasi-likelihood, and Laplace approximation
- Extend mixed models to ordinal, multinomial, and other discrete outcome types
- Evaluate when different estimation methods are appropriate and their trade-offs
This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.
Introduction & Logistic Regression with Random Effects
The Generalised Linear Mixed Model (GLMM)
Generalised linear mixed models (GLMMs) extend generalised linear models (GLMs) by adding random effects to the linear predictor. Just as linear mixed models allowed us to handle clustered continuous data, GLMMs handle clustered discrete data—binary, count, ordinal, and multinomial outcomes—while accounting for the correlation within clusters.
The core idea of a GLMM is straightforward: take the linear predictor from a GLM and add random effects. For a logistic regression with a random intercept, the model becomes: logit(pij) = β0 + β1X1ij + ugroup(i), where ugroup(i) ~ N(0, σ²g). The random effect u captures between-cluster variation in the log-odds of the outcome.
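To see the model in action, here is a minimal simulation sketch in Python (the parameter values β0 = -1, β1 = 0.85, σ²g = 1.2 are purely illustrative, not from any dataset): each cluster draws a random intercept u that shifts its log-odds up or down.

```python
import numpy as np

rng = np.random.default_rng(1)
n_clusters, n_per = 100, 50
beta0, beta1 = -1.0, 0.85          # illustrative fixed effects
sigma_g = np.sqrt(1.2)             # illustrative random-intercept SD

u = rng.normal(0.0, sigma_g, size=n_clusters)       # one random intercept per cluster
x = rng.binomial(1, 0.5, size=(n_clusters, n_per))  # binary covariate per subject
logit_p = beta0 + beta1 * x + u[:, None]            # linear predictor: logit(p_ij)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_p))) # binary outcomes
```

Each row of `y` is one cluster; clusters with a large positive u will show a higher outcome prevalence at every covariate level, which is exactly the between-cluster variation the random effect encodes.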
Subject-Specific vs. Population-Averaged Interpretation
A critical distinction in GLMMs for discrete data is between subject-specific (SS) and population-averaged (PA) interpretations of the model parameters. This distinction does not arise in linear mixed models but is fundamental for logistic and other non-linear models.
Because of the non-collapsibility of the odds ratio, SS coefficients are always larger in magnitude than PA coefficients. The approximate conversion formula is: βPA ≈ βSS / √(1 + 0.346σ²g), where σ²g is the random-intercept variance and 0.346 is a constant that comes from approximating the standard logistic distribution by a scaled normal.
Consider a study of antibiotic resistance in cattle herds. A logistic GLMM with a random intercept for herd yields βSS = 0.85 for a treatment variable, with σ²g = 1.2. The SS odds ratio is exp(0.85) = 2.34—within any given herd, treatment more than doubles the odds of resistance.
Converting to PA: βPA ≈ 0.85 / √(1 + 0.346 × 1.2) = 0.85 / 1.19 ≈ 0.71. The PA odds ratio is exp(0.71) ≈ 2.03—across the population of herds, the average effect is smaller. The SS estimate is larger because it holds the herd-specific random effect fixed, whereas the PA estimate averages over the between-herd distribution.
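This arithmetic is easy to script. A minimal Python sketch using the numbers above (the helper name `ss_to_pa` is ours, not a library function):

```python
import math

def ss_to_pa(beta_ss, sigma2_g, c2=0.346):
    """Approximate a population-averaged logistic coefficient from a
    subject-specific one, using the standard attenuation constant."""
    return beta_ss / math.sqrt(1 + c2 * sigma2_g)

beta_ss, sigma2_g = 0.85, 1.2            # herd example from the text
beta_pa = ss_to_pa(beta_ss, sigma2_g)    # ≈ 0.71
or_ss = math.exp(beta_ss)                # ≈ 2.34
or_pa = math.exp(beta_pa)                # ≈ 2.04
```

Note that with σ²g = 0 the two scales coincide, as the formula requires.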
Measures of Cluster Heterogeneity
Median Odds Ratio (MOR)
The Median Odds Ratio quantifies between-cluster heterogeneity on the odds ratio scale. If you randomly select two clusters and compare a person from the higher-risk cluster to a person from the lower-risk cluster (with the same covariates), the MOR is the median of that odds ratio distribution.
An MOR of 1 means no between-cluster variation. Larger values indicate greater heterogeneity. The MOR allows direct comparison of the importance of cluster-level variation relative to fixed-effect odds ratios.
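The usual closed form, not written out above, is MOR = exp(√(2σ²g) × Φ⁻¹(0.75)), where Φ⁻¹(0.75) ≈ 0.6745 is the 75th percentile of the standard normal. A short sketch using the herd example's σ²g = 1.2:

```python
import math
from scipy.stats import norm

def median_odds_ratio(sigma2_g):
    """MOR = exp(sqrt(2 * sigma2_g) * Phi^{-1}(0.75))."""
    return math.exp(math.sqrt(2 * sigma2_g) * norm.ppf(0.75))

mor = median_odds_ratio(1.2)   # ≈ 2.84: comparing two random herds, the
                               # median odds ratio between them is about 2.8
```

Here an MOR of about 2.8 exceeds the fixed-effect odds ratio of 2.34, suggesting that herd membership matters at least as much as treatment.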
ICC for Binary Data
For binary outcomes, the latent variable ICC uses the fact that the individual-level variance on the logistic scale is π²/3 ≈ 3.29: ICC = σ²g / (σ²g + π²/3).
This provides an estimate of the proportion of total variance (on the latent scale) attributable to between-cluster differences.
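In code, using the herd example's σ²g = 1.2:

```python
import math

def latent_icc(sigma2_g):
    """Latent-variable ICC for a logistic GLMM:
    between-cluster variance / (between-cluster + pi^2/3)."""
    return sigma2_g / (sigma2_g + math.pi**2 / 3)

icc = latent_icc(1.2)   # ≈ 0.27: about 27% of latent-scale variance is between herds
```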
The Latent Variable Approach
The binary outcome can be thought of as the result of thresholding a continuous latent variable. If the latent variable exceeds a threshold, the outcome is 1; otherwise, it is 0. In the logistic model, the individual-level error follows a standard logistic distribution with variance π²/3. This framework provides the basis for both the ICC calculation and the SS-to-PA conversion.
Section 1 Knowledge Check
1. In a logistic GLMM, subject-specific (SS) coefficients are:
2. The Median Odds Ratio (MOR) measures:
3. The ICC for binary outcomes using the latent variable approach uses:
Reflection
Why does the distinction between subject-specific and population-averaged estimates matter in practice? Think of a scenario where you would prefer one interpretation over the other.
GLMMs for Count, Binary & Categorical Data
Poisson Regression with Random Effects
For count outcomes, Poisson regression with random effects models the log of the expected count as a function of predictors plus random effects: log(μij) = β0 + β1X1ij + ugroup(i), where ugroup(i) ~ N(0, σ²g).
Because the random effects enter on the log scale, they translate to multiplicative effects on the rate scale: μ = exp(Xβ) × exp(u). The group-level multipliers exp(u) follow a log-normal distribution when u is normally distributed.
Poisson with Normal Random Effects
The standard GLMM approach uses normally distributed random effects on the log scale. The group-level effects νgroup = exp(ugroup) are then log-normally distributed. This is the most common parameterisation in software.
This approach is flexible and can accommodate multiple levels of random effects, random slopes, and complex covariance structures—just like the linear mixed model framework.
Poisson with Log-Gamma Random Effects
An alternative parameterisation uses gamma-distributed random effects on the rate scale (equivalently, log-gamma on the log scale). When u follows a log-gamma distribution, the marginal distribution of counts has a convenient closed form.
This connection leads directly to the negative binomial distribution: a Poisson model with gamma-distributed rates yields negative binomial counts.
Negative Binomial as a Random Effects Model
The negative binomial distribution can be interpreted as a Poisson-gamma mixture: counts follow a Poisson distribution, but the rate varies across units according to a gamma distribution. This naturally accounts for overdispersion (variance > mean).
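A quick simulation makes the mixture concrete (the gamma shape k and scale θ below are illustrative values, not from the text): draw a gamma rate per unit, then a Poisson count given that rate, and compare the empirical moments with the negative binomial formulas mean = kθ and variance = kθ(1 + θ).

```python
import numpy as np

rng = np.random.default_rng(42)
k, theta = 2.0, 1.5                 # illustrative gamma shape and scale
n = 200_000

rates = rng.gamma(shape=k, scale=theta, size=n)  # unit-specific rates
counts = rng.poisson(rates)                      # Poisson count given the rate

mean_theory = k * theta                 # 3.0
var_theory = k * theta * (1 + theta)    # 7.5 > mean: overdispersion
```

The simulated counts should show a variance well above the mean, matching the negative binomial theory rather than the plain Poisson (for which variance = mean).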
The negative binomial can be extended with additional random effects at higher levels, combining the overdispersion correction with explicit hierarchical structure.
| Model | Random Effect Distribution | Marginal Distribution | Key Feature |
|---|---|---|---|
| Poisson + Normal RE | Normal on log scale | No closed form | Standard GLMM; flexible |
| Poisson + Gamma RE | Gamma on rate scale | Negative binomial | Closed-form marginal |
| Negative Binomial | Implicit gamma | Negative binomial | Handles overdispersion |
GLMMs for Other Discrete Outcomes
While the logit link is most common for binary data, probit and complementary log-log links can also be used in GLMMs. For probit models, the SS-to-PA conversion uses a constant of 1 instead of 0.346: βPA ≈ βSS / √(1 + σ²g). This is because the individual-level latent error in the probit model is standard normal with variance 1, rather than standard logistic with variance π²/3.
The proportional odds model for ordinal outcomes can be extended by adding random effects to the latent variable underlying the ordinal categories. The subject-specific interpretation and the latent variable ICC methods from logistic regression apply similarly. Random effects capture cluster-level variation in the propensity to be in higher or lower categories.
Random effects multinomial logistic models are less common and harder to estimate. The computational burden increases because each category (beyond the reference) has its own set of parameters, and the random effects may need to be correlated across categories. Specialised software and careful model specification are required.
Zero-inflated models combine a point mass at zero with a count distribution. Random effects can be added to the count part, the zero-inflation part, or both. This flexibility allows the model to capture clustering in both the probability of being a “structural zero” and in the count process among non-zeros.
When faced with overdispersed count data, consider: (1) Is the overdispersion due to unmeasured heterogeneity between known clusters? Use a Poisson GLMM. (2) Is the overdispersion due to general extra-Poisson variation without a clear clustering structure? A negative binomial may suffice. (3) Are there excess zeros beyond what either model predicts? Consider a zero-inflated model. Often, comparing model fit statistics (AIC, BIC) across competing models is the practical approach.
Section 2 Knowledge Check
1. In a Poisson GLMM with normal random effects on the log scale, the group-level effects on the rate scale are:
2. The negative binomial distribution can be viewed as:
3. Random effects can be added to proportional odds models for ordinal data by:
Reflection
A researcher finds that a Poisson model for disease counts across 50 communities has significant overdispersion. What are the relative advantages of addressing this with random effects versus using a negative binomial model?
Estimation Methods for GLMMs
The Estimation Challenge
Unlike linear mixed models, the likelihood in a GLMM cannot be computed in closed form. The likelihood involves an integral over the random effects distribution that generally has no analytic solution. This is the fundamental computational challenge of GLMMs, and different estimation methods represent different strategies for handling this integral.
Maximum Likelihood (ML) Estimation
ML estimation is the gold standard for GLMMs. It uses Gauss-Hermite quadrature to numerically approximate the integral over the random effects. The integrand is evaluated at carefully chosen points (quadrature points), and a weighted sum provides the approximation.
Adaptive quadrature improves accuracy by centring and scaling the quadrature points based on the mode and curvature of each cluster’s contribution to the likelihood. The number of quadrature points controls accuracy: more points yield better approximations but require more computation. Default values are typically around 7.
ML estimation can be computationally intensive or unstable, especially with multiple random effects (where the dimensionality of integration grows) or with sparse data.
Quasi-Likelihood (QL) Estimation
Quasi-likelihood methods avoid numerical integration by using Taylor expansions to linearise the model, then applying iterative weighted least squares. Key variants include:
- First-order vs. second-order Taylor expansion: second-order is more accurate
- MQL (Marginal Quasi-Likelihood): omits random effect predictions from the working variate; gives PA interpretation
- PQL (Penalised Quasi-Likelihood): includes random effect predictions; gives estimates closer to SS interpretation
PQL with second-order expansion is generally preferred among QL methods. However, simulation studies show that first-order MQL can be markedly biased, especially with large variance components or small cluster sizes.
Laplace Approximation
The Laplace approximation corresponds to adaptive quadrature with a single quadrature point—the lowest order of adaptive quadrature. It approximates the integral by a normal distribution centred at the mode of the integrand.
Laplace is intermediate in both accuracy and computational cost between full quadrature ML and quasi-likelihood. It is widely available in software (e.g., glmer in R uses Laplace by default) and provides a good starting point before increasing quadrature points.
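As a toy illustration of the mechanism (a classical textbook example, not a GLMM likelihood): applying the Laplace approximation to n! = ∫ xⁿ e⁻ˣ dx recovers Stirling's formula, and a single "quadrature point" at the mode already gets within about 1% for n = 10.

```python
import math

def laplace_factorial(n):
    """Laplace approximation to n! = integral of x^n * exp(-x).
    Writing the integrand as exp(-g(x)) with g(x) = x - n*log(x):
    the mode is x_hat = n, the curvature is g''(x_hat) = n / x_hat**2,
    and the integral ≈ exp(-g(x_hat)) * sqrt(2*pi / g''(x_hat))."""
    x_hat = n
    g_min = x_hat - n * math.log(x_hat)
    g2 = n / x_hat**2
    return math.exp(-g_min) * math.sqrt(2 * math.pi / g2)

approx = laplace_factorial(10)
exact = math.factorial(10)
ratio = approx / exact          # ≈ 0.992: one evaluation at the mode is already close
```

The same idea is what happens inside a GLMM fit: each cluster's likelihood contribution is replaced by a normal approximation centred at its mode, trading a little accuracy for a large computational saving.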
| Method | Accuracy | Computation | Interpretation |
|---|---|---|---|
| ML (Adaptive Quadrature) | High (gold standard) | High; increases with random effects | SS |
| Laplace Approximation | Moderate (1 quad point) | Moderate | SS |
| PQL (2nd order) | Moderate | Low | Close to SS |
| MQL (2nd order) | Lower | Low | PA |
| MQL (1st order) | Lowest; may be biased | Lowest | PA |
- Check stability: vary the number of quadrature points and confirm that estimates do not change substantially.
- Compare methods: run both ML and QL and check agreement.
- Use caution: with small clusters, binary outcomes, or large random effects variances, simpler QL methods may be markedly biased—prefer ML or at least second-order PQL.
- Start simple: begin with Laplace and increase quadrature points if feasible.
Gauss-Hermite quadrature approximates integrals of the form ∫ f(x) exp(-x²) dx by a weighted sum: ∑ wk f(xk), where xk are the quadrature points and wk are the corresponding weights. The points and weights are chosen to give exact results for polynomial integrands up to a certain degree. For GLMM likelihoods, more quadrature points provide better approximations to the non-polynomial integrand.
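A concrete sketch: the population-averaged success probability of a logistic model with a normal random intercept is ∫ expit(β0 + σz) φ(z) dz, which Gauss-Hermite quadrature handles after the change of variable z = √2·x. The function below uses NumPy's `hermgauss`; the parameter values are illustrative.

```python
import numpy as np

def marginal_prob(beta0, sigma, n_points=7):
    """Population-averaged P(Y=1) for a logistic model with a normal
    random intercept, via Gauss-Hermite quadrature:
    integral of expit(beta0 + sigma*z) * phi(z) dz
    ≈ (1/sqrt(pi)) * sum_k w_k * expit(beta0 + sigma*sqrt(2)*x_k)."""
    x, w = np.polynomial.hermite.hermgauss(n_points)
    z = np.sqrt(2.0) * x                            # change of variable
    p = 1.0 / (1.0 + np.exp(-(beta0 + sigma * z)))  # expit at each node
    return float(np.sum(w * p) / np.sqrt(np.pi))

p_no_re = marginal_prob(0.85, 0.0)           # collapses to expit(0.85) ≈ 0.70
p_with_re = marginal_prob(0.85, np.sqrt(1.2))  # attenuated toward 0.5
```

With σ = 0 the quadrature sum reduces exactly to expit(β0), a useful sanity check; with σ > 0 the marginal probability is pulled toward 0.5, which is the same attenuation seen in the SS-to-PA conversion.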
Non-adaptive quadrature uses fixed points centred at zero. Adaptive quadrature shifts and scales the points to match the mode and curvature of each cluster’s integrand. This means fewer points are needed for the same accuracy, and the method is more robust when random effects are large or when clusters differ substantially.
Quasi-likelihood methods can produce seriously biased estimates when: (1) random effects variance is large, (2) cluster sizes are small (especially with binary data), (3) prevalence is extreme (close to 0 or 1), or (4) the model has crossed random effects. In these situations, ML estimation with sufficient quadrature points is strongly preferred.
Section 3 Knowledge Check
1. Gauss-Hermite quadrature in GLMM estimation is used to:
2. Quasi-likelihood estimation methods:
3. Increasing the number of quadrature points in ML estimation:
Reflection
You are fitting a 3-level logistic GLMM and the ML estimates are unstable. What steps would you take to diagnose the problem and what alternative estimation approaches might you consider?
Inference, Diagnostics & Other Random Effects Models
Inference for Fixed and Random Effects
Testing and constructing confidence intervals in GLMMs involves the same general principles as in linear mixed models, but with additional complications related to the estimation method used.
For fixed effects, Wald-type tests (based on the estimate divided by its SE) are most common. However, Wald statistics can be unreliable when parameters are near boundary values. Likelihood-based inference (likelihood ratio tests and profile likelihood confidence intervals) is preferred when feasible, but requires ML estimation—not quasi-likelihood.
For random effect variances, the same boundary issues discussed for linear mixed models apply: variance parameters cannot be negative, so the usual chi-square distribution for LR tests may not be appropriate near zero.
Wald statistics assume the sampling distribution of the parameter estimate is approximately normal. This assumption breaks down when parameters are near boundary values (e.g., variance components near zero, or probabilities near 0 or 1). In such cases, confidence intervals based on Wald statistics can include impossible values, and p-values may be misleading. Likelihood-based methods are more reliable.
Alternative Random Effects Models
| Approach | Likelihood | Multi-Level | Individual Predictors | Computational Cost |
|---|---|---|---|---|
| GLMM (normal RE) | Requires integration | Yes | Yes | High |
| Beta-binomial | Closed form | No | Limited | Low |
| Negative binomial | Closed form | With extensions | Yes | Low to moderate |
Practical Guidance
The beta-binomial model is most appropriate when you have grouped binary data (e.g., proportion of animals testing positive in each herd) with only group-level predictors. Its closed-form likelihood makes it computationally simple, and it directly estimates the ICC. However, it cannot accommodate individual-level predictors or multiple hierarchical levels.
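SciPy's `betabinom` distribution (available since SciPy 1.4) can illustrate both points with assumed parameter values: the intra-cluster correlation is ρ = 1/(α + β + 1), and the beta-binomial variance exceeds the matching binomial variance by the factor 1 + (n − 1)ρ.

```python
from scipy.stats import betabinom, binom

n, a, b = 20, 2.0, 5.0          # illustrative group size and beta parameters
p = a / (a + b)                 # marginal prevalence, 2/7
rho = 1.0 / (a + b + 1.0)       # intra-cluster correlation (ICC), 1/8

var_bb = betabinom.var(n, a, b)  # beta-binomial variance
var_bin = binom.var(n, p)        # plain binomial variance, n*p*(1-p)
inflation = var_bb / var_bin     # equals 1 + (n - 1)*rho = 3.375
```

Here the clustering inflates the variance of group totals more than threefold relative to independent binary outcomes, which is exactly the overdispersion the beta-binomial absorbs in closed form.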
Simulation studies consistently show that first-order MQL can be markedly biased, underestimating both fixed effects and variance components. Second-order PQL performs better but can still be biased with small cluster sizes or large variances. ML with adaptive quadrature is generally the most accurate, though computational constraints may require starting with Laplace approximation.
Start with Laplace (the default in many packages). If feasible, increase to adaptive quadrature with 7+ points and check that estimates are stable. If ML is computationally prohibitive (e.g., many random effects), use second-order PQL and compare results with Laplace. Always report which estimation method was used and, ideally, show sensitivity to the choice.
Section 4 Knowledge Check
1. Wald-type statistics for GLMM parameters:
2. The beta-binomial model:
3. First-order quasi-likelihood estimates in GLMMs:
Reflection
Compare the GLMM approach to the beta-binomial approach for modeling clustered binary data. In what situations would each be preferred?
Lesson 10 — Comprehensive Assessment
This final assessment covers all material from this lesson. You must answer all 15 questions correctly (100%) and complete the final reflection to finish the lesson.
Final Reflection
Reflecting on this lesson, what do you see as the biggest challenges in fitting GLMMs for discrete data compared to linear mixed models for continuous data? How do the estimation challenges affect your confidence in the results?
Final Assessment (15 Questions)
1. A GLMM extends a GLM by:
2. In a logistic GLMM, subject-specific (SS) odds ratios are:
3. The conversion formula βPA ≈ βSS / √(1 + 0.346σ²g) applies to:
4. The Median Odds Ratio (MOR) equals 1 when:
5. The latent variable ICC for binary data uses π²/3 because:
6. In a Poisson GLMM, random effects on the log scale translate to:
7. The connection between Poisson random effects and the negative binomial is:
8. Gauss-Hermite quadrature:
9. Adaptive quadrature differs from non-adaptive by:
10. MQL quasi-likelihood gives estimates with:
11. First-order quasi-likelihood compared to second-order:
12. The Laplace approximation in GLMM estimation:
13. The beta-binomial model’s main limitation is:
14. When ML estimation for a GLMM is computationally unstable, one should:
15. For a probit GLMM, the SS-to-PA conversion formula uses the constant: