HSCI 410 — Lesson 10

Mixed Models for Discrete Data

Exploratory Data Analysis For Epidemiology

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Understand the generalised linear mixed model (GLMM) framework for discrete outcomes
  • Write and interpret logistic regression models with random effects
  • Distinguish between subject-specific (SS) and population-averaged (PA) interpretations
  • Calculate the median odds ratio (MOR) and latent variable ICC for binary outcomes
  • Apply mixed models to count data using Poisson regression with random effects
  • Understand estimation challenges in GLMMs including ML, quasi-likelihood, and Laplace approximation
  • Extend mixed models to ordinal, multinomial, and other discrete outcome types
  • Evaluate when different estimation methods are appropriate and their trade-offs

This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Section 1

Introduction & Logistic Regression with Random Effects

⏱ Estimated time: 20 minutes

The Generalised Linear Mixed Model (GLMM)

Generalised linear mixed models (GLMMs) extend generalised linear models (GLMs) by adding random effects to the linear predictor. Just as linear mixed models allowed us to handle clustered continuous data, GLMMs handle clustered discrete data—binary, count, ordinal, and multinomial outcomes—while accounting for the correlation within clusters.

Key Concept

The core idea of a GLMM is straightforward: take the linear predictor from a GLM and add random effects. For a logistic regression with a random intercept, the model becomes: logit(pij) = β0 + β1X1ij + ugroup(i), where ugroup(i) ~ N(0, σ²g). The random effect u captures between-cluster variation in the log-odds of the outcome.
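As a concrete illustration, the data-generating process implied by this model can be simulated in a few lines of Python. All parameter values below are illustrative choices of ours, not values from the text; the point is that each cluster shares one draw of u, shifting its log-odds up or down.

```python
import math
import random

random.seed(1)

beta0, beta1 = -1.0, 0.8   # fixed effects (illustrative values)
sigma_g = 1.0              # SD of the random intercept u

def simulate_cluster(n, x_vals):
    """Simulate one cluster: draw u ~ N(0, sigma_g^2) once,
    then Bernoulli outcomes from logit(p) = beta0 + beta1*x + u."""
    u = random.gauss(0, sigma_g)
    ys = []
    for x in x_vals:
        logit = beta0 + beta1 * x + u
        p = 1 / (1 + math.exp(-logit))
        ys.append(1 if random.random() < p else 0)
    return u, ys

# Five clusters of 100 observations each, with a binary covariate:
for j in range(5):
    u, ys = simulate_cluster(100, [i % 2 for i in range(100)])
    print(f"cluster {j}: u = {u:+.2f}, observed prevalence = {sum(ys)/len(ys):.2f}")
```

Clusters with larger u show higher prevalence at every covariate value, which is exactly the between-cluster variation the random intercept is meant to capture.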

Subject-Specific vs. Population-Averaged Interpretation

A critical distinction in GLMMs for discrete data is between subject-specific (SS) and population-averaged (PA) interpretations of the model parameters. This distinction does not arise in linear mixed models but is fundamental for logistic and other non-linear models.

Subject-Specific (SS): the effect of a predictor within a given cluster, holding that cluster’s random effect fixed (a conditional interpretation).

Population-Averaged (PA): the average effect of a predictor across the whole population of clusters (a marginal interpretation).

Because of the non-collapsibility of the odds ratio, SS coefficients are larger in magnitude than PA coefficients whenever there is between-cluster variation. The approximate conversion formula is:

SS-to-PA Conversion for Logistic Models (Eq 22.2)
βPA ≈ βSS / √(1 + 0.346 × σ²g)
Comparing SS and PA Interpretations

Consider a study of antibiotic resistance in cattle herds. A logistic GLMM with a random intercept for herd yields βSS = 0.85 for a treatment variable, with σ²g = 1.2. The SS odds ratio is exp(0.85) = 2.34—within any given herd, treatment more than doubles the odds of resistance.

Converting to PA: βPA ≈ 0.85 / √(1 + 0.346 × 1.2) = 0.85 / 1.19 = 0.71. The PA odds ratio is exp(0.71) = 2.03—across the population of herds, the average effect is smaller. The SS estimate is larger because it conditions on a specific cluster.
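The worked example can be reproduced directly from Eq 22.2. A short Python sketch (the function name ss_to_pa is ours):

```python
import math

def ss_to_pa(beta_ss, sigma2_g, c=0.346):
    """Approximate PA coefficient from an SS coefficient (Eq 22.2).
    c = 0.346 for the logit link; use c = 1 for the probit link."""
    return beta_ss / math.sqrt(1 + c * sigma2_g)

beta_ss, sigma2_g = 0.85, 1.2
beta_pa = ss_to_pa(beta_ss, sigma2_g)
print(f"beta_PA ≈ {beta_pa:.2f}, OR_SS ≈ {math.exp(beta_ss):.2f}")
# beta_PA ≈ 0.71, OR_SS ≈ 2.34, matching the worked example
```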

Measures of Cluster Heterogeneity

Median Odds Ratio (MOR)

The Median Odds Ratio quantifies between-cluster heterogeneity on the odds ratio scale. If you randomly select two clusters and compare a person from the higher-risk cluster to a person from the lower-risk cluster (with the same covariates), the MOR is the median of that odds ratio distribution.

Median Odds Ratio (Eq 22.5)
MOR = exp(√(2σ²g) × 0.6745)

An MOR of 1 means no between-cluster variation. Larger values indicate greater heterogeneity. The MOR allows direct comparison of the importance of cluster-level variation relative to fixed-effect odds ratios.
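Eq 22.5 is simple to compute. Using the cattle-herd example's σ²g = 1.2 (a Python sketch):

```python
import math

def median_odds_ratio(sigma2_g):
    """Median Odds Ratio (Eq 22.5): exp(sqrt(2 * sigma2_g) * 0.6745)."""
    return math.exp(math.sqrt(2 * sigma2_g) * 0.6745)

print(median_odds_ratio(0))                 # 1.0 — no between-cluster variation
print(round(median_odds_ratio(1.2), 2))     # ≈ 2.84
```

An MOR of 2.84 here would exceed the treatment odds ratio of 2.34, indicating that herd membership matters at least as much as treatment.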

ICC for Binary Data

For binary outcomes, the latent variable ICC uses the fact that the individual-level variance on the logistic scale is π²/3 ≈ 3.29:

Latent Variable ICC for Logistic Models
ρ = σ²g / (σ²g + π²/3)

This provides an estimate of the proportion of total variance (on the latent scale) attributable to between-cluster differences.
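A quick check of this formula in Python, again using σ²g = 1.2 from the earlier example:

```python
import math

def latent_icc(sigma2_g):
    """Latent-variable ICC for a logistic GLMM: sigma2_g / (sigma2_g + pi^2/3)."""
    return sigma2_g / (sigma2_g + math.pi ** 2 / 3)

print(round(latent_icc(1.2), 3))   # ≈ 0.267: about 27% of latent-scale variance is between clusters
```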

The Latent Variable Approach

The binary outcome can be thought of as the result of thresholding a continuous latent variable. If the latent variable exceeds a threshold, the outcome is 1; otherwise, it is 0. In the logistic model, the individual-level error follows a standard logistic distribution with variance π²/3. This framework provides the basis for both the ICC calculation and the SS-to-PA conversion.

Section 1 Knowledge Check

1. In a logistic GLMM, subject-specific (SS) coefficients are:

Due to the non-collapsibility of the odds ratio, SS (conditional) coefficients are always larger in magnitude than PA (marginal) coefficients. The SS interpretation is conditional on the random effect being at its mean value.

2. The Median Odds Ratio (MOR) measures:

The MOR quantifies how much the odds of the outcome vary between two randomly chosen clusters. An MOR of 1 indicates no between-cluster variation; larger values indicate greater heterogeneity.

3. The ICC for binary outcomes using the latent variable approach uses:

For logistic models, the latent variable approach assumes the individual-level variance on the logit scale is π²/3, so ICC = σ²g / (σ²g + π²/3).

Reflection

Why does the distinction between subject-specific and population-averaged estimates matter in practice? Think of a scenario where you would prefer one interpretation over the other.

Section 2

GLMMs for Count, Binary & Categorical Data

⏱ Estimated time: 20 minutes

Poisson Regression with Random Effects

For count outcomes, Poisson regression with random effects models the log of the expected count as a function of predictors plus random effects:

Poisson GLMM with Random Intercept
log(μij) = β0 + β1X1ij + ugroup(i)

Because the random effects enter on the log scale, they translate to multiplicative effects on the rate scale: μ = exp(Xβ) × exp(u). The group-level multipliers exp(u) follow a log-normal distribution when u is normally distributed.
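Both facts, the multiplicative effect on the rate scale and the log-normal distribution of the multipliers, can be verified numerically. A Python sketch with illustrative parameter values:

```python
import math
import random

random.seed(2)
sigma_g = 0.5   # SD of the random intercept on the log scale (illustrative)

# Additive on the log scale is multiplicative on the rate scale:
u = random.gauss(0, sigma_g)
x_beta = math.log(3.0)                 # baseline expected count of 3
mu = math.exp(x_beta + u)
assert math.isclose(mu, math.exp(x_beta) * math.exp(u))

# exp(u) is log-normal; its mean is exp(sigma_g^2 / 2), not 1:
multipliers = [math.exp(random.gauss(0, sigma_g)) for _ in range(200_000)]
print(round(sum(multipliers) / len(multipliers), 2))   # ≈ exp(0.125) ≈ 1.13
```

Note the last line: normal random effects on the log scale imply that the average group-level rate multiplier exceeds 1, a subtlety that matters when interpreting marginal expected counts.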

Poisson with Normal Random Effects

The standard GLMM approach uses normally distributed random effects on the log scale. The group-level effects νgroup = exp(ugroup) are then log-normally distributed. This is the most common parameterisation in software.

This approach is flexible and can accommodate multiple levels of random effects, random slopes, and complex covariance structures—just like the linear mixed model framework.

Poisson with Log-Gamma Random Effects

An alternative parameterisation uses gamma-distributed random effects on the rate scale (equivalently, log-gamma on the log scale). When u follows a log-gamma distribution, the marginal distribution of counts has a convenient closed form.

This connection leads directly to the negative binomial distribution: a Poisson model with gamma-distributed rates yields negative binomial counts.

Negative Binomial as a Random Effects Model

The negative binomial distribution can be interpreted as a Poisson-gamma mixture: counts follow a Poisson distribution, but the rate varies across units according to a gamma distribution. This naturally accounts for overdispersion (variance > mean).

The negative binomial can be extended with additional random effects at higher levels, combining the overdispersion correction with explicit hierarchical structure.
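The Poisson-gamma mixture can be checked by simulation: drawing gamma-distributed rates and then Poisson counts produces exactly the overdispersion the negative binomial predicts. A Python sketch (parameter values are illustrative):

```python
import math
import random

random.seed(3)

mu, k = 4.0, 2.0   # marginal mean and gamma shape (dispersion) parameter

def poisson_draw(lam):
    """Simple Poisson sampler via Knuth's inversion method (fine for small lam)."""
    L, p, n = math.exp(-lam), 1.0, 0
    p *= random.random()
    while p > L:
        n += 1
        p *= random.random()
    return n

# Rates vary across units: lambda ~ Gamma(shape=k, scale=mu/k), so E[lambda] = mu.
counts = [poisson_draw(random.gammavariate(k, mu / k)) for _ in range(100_000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(round(mean, 2), round(var, 2))
# Negative binomial theory: variance = mu + mu^2/k = 4 + 16/2 = 12 > mean of 4
```

A plain Poisson model would force variance = mean = 4; the gamma-varying rates inflate the variance to roughly 12, matching the negative binomial formula.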

Model | Random Effect Distribution | Marginal Distribution | Key Feature
Poisson + Normal RE | Normal on log scale | No closed form | Standard GLMM; flexible
Poisson + Gamma RE | Gamma on rate scale | Negative binomial | Closed-form marginal
Negative Binomial | Implicit gamma | Negative binomial | Handles overdispersion

GLMMs for Other Discrete Outcomes

Binary data: alternative link functions

While the logit link is most common for binary data, probit and complementary log-log links can also be used in GLMMs. For probit models, the SS-to-PA conversion uses a constant of 1 instead of 0.346: βPA ≈ βSS / √(1 + σ²g). This is because the probit model assumes a standard normal latent error, whose variance is 1 (compared with π²/3 for the logistic).

Ordinal data: proportional odds with random effects

The proportional odds model for ordinal outcomes can be extended by adding random effects to the latent variable underlying the ordinal categories. The subject-specific interpretation and the latent variable ICC methods from logistic regression apply similarly. Random effects capture cluster-level variation in the propensity to be in higher or lower categories.

Multinomial data: random effects models

Random effects multinomial logistic models are less common and harder to estimate. The computational burden increases because each category (beyond the reference) has its own set of parameters, and the random effects may need to be correlated across categories. Specialised software and careful model specification are required.

Zero-inflated models with random effects

Zero-inflated models combine a point mass at zero with a count distribution. Random effects can be added to the count part, the zero-inflation part, or both. This flexibility allows the model to capture clustering in both the probability of being a “structural zero” and in the count process among non-zeros.

Choosing Between Models for Count Data

When faced with overdispersed count data, consider: (1) Is the overdispersion due to unmeasured heterogeneity between known clusters? Use a Poisson GLMM. (2) Is the overdispersion due to general extra-Poisson variation without a clear clustering structure? A negative binomial may suffice. (3) Are there excess zeros beyond what either model predicts? Consider a zero-inflated model. Often, comparing model fit statistics (AIC, BIC) across competing models is the practical approach.

Section 2 Knowledge Check

1. In a Poisson GLMM with normal random effects on the log scale, the group-level effects on the rate scale are:

Since random effects enter on the log scale and the link function is log, exponentiating gives multiplicative random effects: μ = exp(Xβ + u) = exp(Xβ) × exp(u), where exp(u) follows a log-normal distribution.

2. The negative binomial distribution can be viewed as:

The negative binomial arises when the Poisson rate parameter itself has a gamma distribution across observations. This Poisson-gamma mixture produces overdispersion (variance > mean) commonly seen in count data.

3. Random effects can be added to proportional odds models for ordinal data by:

The proportional odds model is extended by adding random effects to the latent variables underlying the ordinal categories. The subject-specific interpretation and latent variable ICC methods from logistic regression apply similarly.

Reflection

A researcher finds that a Poisson model for disease counts across 50 communities has significant overdispersion. What are the relative advantages of addressing this with random effects versus using a negative binomial model?

Section 3

Estimation Methods for GLMMs

⏱ Estimated time: 20 minutes

The Estimation Challenge

Unlike linear mixed models, the likelihood in a GLMM cannot be computed in closed form. The likelihood involves an integral over the random effects distribution that generally has no analytic solution. This is the fundamental computational challenge of GLMMs, and different estimation methods represent different strategies for handling this integral.

Maximum Likelihood (ML) Estimation

ML estimation is the gold standard for GLMMs. It uses Gauss-Hermite quadrature to numerically approximate the integral over the random effects. The integrand is evaluated at carefully chosen points (quadrature points), and a weighted sum provides the approximation.

Adaptive quadrature improves accuracy by centring and scaling the quadrature points based on the mode and curvature of each cluster’s contribution to the likelihood. The number of quadrature points controls accuracy: more points yield better approximations but require more computation. Software defaults vary, but around 7 points is a common choice.

ML estimation can be computationally intensive or unstable, especially with multiple random effects (where the dimensionality of integration grows) or with sparse data.
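To make the quadrature idea concrete, here is a sketch of one cluster's marginal likelihood contribution in a random-intercept logistic GLMM, approximated by (non-adaptive) Gauss-Hermite quadrature. The toy data and parameter values are ours, chosen only for illustration:

```python
import math
import numpy as np

beta0, beta1, sigma_g = -0.5, 0.8, 1.0     # illustrative parameter values
x = np.array([0.0, 0.0, 1.0, 1.0, 1.0])    # covariate values in one cluster
y = np.array([0, 1, 1, 1, 0])              # binary outcomes in that cluster

def cluster_likelihood(n_points):
    """Integrate the conditional Bernoulli likelihood over u ~ N(0, sigma_g^2)
    using n_points Gauss-Hermite nodes."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    total = 0.0
    for t, w in zip(nodes, weights):
        u = math.sqrt(2.0) * sigma_g * t   # change of variables for the normal density
        p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x + u)))
        total += w * np.prod(np.where(y == 1, p, 1 - p))
    return total / math.sqrt(math.pi)

for n in (3, 7, 15):
    print(n, cluster_likelihood(n))
```

The full likelihood is the product of such terms over clusters; an optimiser then maximises it over the βs and σ²g. Running the loop shows the approximation stabilising as the number of points grows, which is the stability check recommended later in this section.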

Quasi-Likelihood (QL) Estimation

Quasi-likelihood methods avoid numerical integration by using Taylor expansions to linearise the model, then applying iterative weighted least squares. Key variants include:

  • First-order vs. second-order Taylor expansion: second-order is more accurate
  • MQL (Marginal Quasi-Likelihood): omits random effect predictions from the working variate; gives PA interpretation
  • PQL (Penalised Quasi-Likelihood): includes random effect predictions; gives estimates closer to SS interpretation

PQL with second-order expansion is generally preferred among QL methods. However, simulation studies show that first-order MQL can be markedly biased, especially with large variance components or small cluster sizes.

Laplace Approximation

The Laplace approximation corresponds to adaptive quadrature with a single quadrature point—the lowest order of adaptive quadrature. It approximates the integral by a normal distribution centred at the mode of the integrand.

Laplace is intermediate in both accuracy and computational cost between full quadrature ML and quasi-likelihood. It is widely available in software (e.g., glmer in R uses Laplace by default) and provides a good starting point before increasing quadrature points.

Method | Accuracy | Computation | Interpretation
ML (Adaptive Quadrature) | High (gold standard) | High; increases with number of random effects | SS
Laplace Approximation | Moderate (1 quadrature point) | Moderate | SS
PQL (2nd order) | Moderate | Low | Close to SS
MQL (2nd order) | Lower | Low | PA
MQL (1st order) | Lowest; may be biased | Lowest | PA
Practical Recommendations

  • Check stability: Vary the number of quadrature points and confirm that estimates do not change substantially.
  • Compare methods: Run both ML and QL and check agreement.
  • Use caution: With small clusters, binary outcomes, or large random effects variances, simpler QL methods may be markedly biased—prefer ML or at least second-order PQL.
  • Start simple: Begin with Laplace and increase quadrature points if feasible.

How Gauss-Hermite quadrature works

Gauss-Hermite quadrature approximates integrals of the form ∫ f(x) exp(-x²) dx by a weighted sum: ∑ wk f(xk), where xk are the quadrature points and wk are the corresponding weights. The points and weights are chosen to give exact results for polynomial integrands up to a certain degree. For GLMM likelihoods, more quadrature points provide better approximations to the non-polynomial integrand.
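A minimal illustration with a hard-coded three-point rule (nodes 0 and ±√(3/2), the roots of the third Hermite polynomial): the rule is exact for polynomial integrands up to degree 2×3 − 1 = 5, and only approximate beyond that.

```python
import math

# Three-point Gauss-Hermite rule for integrals of the form ∫ f(x) e^{-x^2} dx
nodes = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
weights = [math.sqrt(math.pi) / 6, 2 * math.sqrt(math.pi) / 3, math.sqrt(math.pi) / 6]

def gh3(f):
    """Weighted sum of f over the three quadrature nodes."""
    return sum(w * f(x) for x, w in zip(nodes, weights))

# Exact for polynomials up to degree 5:
print(gh3(lambda x: x ** 2), math.sqrt(math.pi) / 2)        # both equal √π/2
# Close, but no longer exact, for a non-polynomial integrand:
print(gh3(math.cos), math.sqrt(math.pi) * math.exp(-0.25))  # exact value is √π e^(-1/4)
```

GLMM likelihood integrands are non-polynomial, which is why increasing the number of points (and adapting their location) improves accuracy.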

Why adaptive quadrature is better

Non-adaptive quadrature uses fixed points centred at zero. Adaptive quadrature shifts and scales the points to match the mode and curvature of each cluster’s integrand. This means fewer points are needed for the same accuracy, and the method is more robust when random effects are large or when clusters differ substantially.

When QL methods fail

Quasi-likelihood methods can produce seriously biased estimates when: (1) random effects variance is large, (2) cluster sizes are small (especially with binary data), (3) prevalence is extreme (close to 0 or 1), or (4) the model has crossed random effects. In these situations, ML estimation with sufficient quadrature points is strongly preferred.

Section 3 Knowledge Check

1. Gauss-Hermite quadrature in GLMM estimation is used to:

The likelihood in a GLMM involves integrating over the random effects distribution, which generally cannot be done analytically. Quadrature approximates this integral by a weighted sum evaluated at selected points.

2. Quasi-likelihood estimation methods:

QL methods are faster than ML because they avoid numerical integration, but they rely on Taylor approximations that can introduce bias, particularly when random effects variances are large or cluster sizes are small.

3. Increasing the number of quadrature points in ML estimation:

More quadrature points provide a better approximation to the integral, improving accuracy. However, computation time increases, especially with multiple random effects where the dimensionality of integration grows.

Reflection

You are fitting a 3-level logistic GLMM and the ML estimates are unstable. What steps would you take to diagnose the problem and what alternative estimation approaches might you consider?

Section 4

Inference, Diagnostics & Other Random Effects Models

⏱ Estimated time: 15 minutes

Inference for Fixed and Random Effects

Testing and constructing confidence intervals in GLMMs involves the same general principles as in linear mixed models, but with additional complications related to the estimation method used.

For fixed effects, Wald-type tests (based on the estimate divided by its SE) are most common. However, Wald statistics can be unreliable when parameters are near boundary values. Likelihood-based inference (likelihood ratio tests and profile likelihood confidence intervals) is preferred when feasible, but requires ML estimation—not quasi-likelihood.

For random effect variances, the same boundary issues discussed for linear mixed models apply: variance parameters cannot be negative, so the usual chi-square distribution for LR tests may not be appropriate near zero.

Caution with Wald Tests Near Boundaries

Wald statistics assume the sampling distribution of the parameter estimate is approximately normal. This assumption breaks down when parameters are near boundary values (e.g., variance components near zero, or probabilities near 0 or 1). In such cases, confidence intervals based on Wald statistics can include impossible values, and p-values may be misleading. Likelihood-based methods are more reliable.

Alternative Random Effects Models

The standard GLMM is not the only way to model clustered discrete data. Two alternatives with closed-form likelihoods are the beta-binomial model (for grouped binary data) and the negative binomial model (for overdispersed counts). The table below compares the three approaches.
Approach | Likelihood | Multi-Level | Individual Predictors | Computational Cost
GLMM (normal RE) | Requires integration | Yes | Yes | High
Beta-binomial | Closed form | No | Limited | Low
Negative binomial | Closed form | With extensions | Yes | Low to moderate

Practical Guidance

When to use the beta-binomial

The beta-binomial model is most appropriate when you have grouped binary data (e.g., proportion of animals testing positive in each herd) with only group-level predictors. Its closed-form likelihood makes it computationally simple, and it directly estimates the ICC. However, it cannot accommodate individual-level predictors or multiple hierarchical levels.
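The closed-form likelihood is easy to implement directly. The sketch below evaluates the beta-binomial pmf with standard-library functions; the parameter values are illustrative, and the ICC expression 1/(α + β + 1) is the standard result for the beta mixing distribution, not a formula quoted from this lesson.

```python
import math

def log_beta(a, b):
    """log of the beta function B(a, b), via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k, n, alpha, beta):
    """Closed-form beta-binomial pmf: C(n,k) * B(k+alpha, n-k+beta) / B(alpha, beta)."""
    return math.comb(n, k) * math.exp(log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))

alpha, beta, n = 2.0, 5.0, 20   # illustrative values
probs = [beta_binomial_pmf(k, n, alpha, beta) for k in range(n + 1)]
print(round(sum(probs), 6))     # the pmf sums to 1 — no numerical integration needed
print(round(1 / (alpha + beta + 1), 3))   # implied ICC: 0.125
```

Because every likelihood term is a closed-form expression like this, fitting reduces to ordinary numerical maximisation, with none of the quadrature machinery GLMMs require.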

Simulation evidence on estimation methods

Simulation studies consistently show that first-order MQL can be markedly biased, underestimating both fixed effects and variance components. Second-order PQL performs better but can still be biased with small cluster sizes or large variances. ML with adaptive quadrature is generally the most accurate, though computational constraints may require starting with Laplace approximation.

Choosing the right estimation method

Start with Laplace (the default in many packages). If feasible, increase to adaptive quadrature with 7+ points and check that estimates are stable. If ML is computationally prohibitive (e.g., many random effects), use second-order PQL and compare results with Laplace. Always report which estimation method was used and, ideally, show sensitivity to the choice.

Section 4 Knowledge Check

1. Wald-type statistics for GLMM parameters:

Wald statistics assume the sampling distribution is approximately normal, which breaks down when parameters are near boundaries (e.g., variance parameters near zero), making likelihood-based inference preferable when available.

2. The beta-binomial model:

The beta-binomial model’s main advantage is its closed-form likelihood, which avoids the computational challenges of GLMMs. However, it is limited to grouped/replicated binary data at the lowest level.

3. First-order quasi-likelihood estimates in GLMMs:

Simulation studies have shown that first-order QL (especially MQL) can produce substantially biased estimates. Second-order PQL and ML estimation are preferred for more accurate results.

Reflection

Compare the GLMM approach to the beta-binomial approach for modeling clustered binary data. In what situations would each be preferred?

Final Assessment

Lesson 10 — Comprehensive Assessment

⏱ Estimated time: 25 minutes

This final assessment covers all material from this lesson. You must answer all 15 questions correctly (100%) and complete the final reflection to finish the lesson.

Final Reflection

Reflecting on this lesson, what do you see as the biggest challenges in fitting GLMMs for discrete data compared to linear mixed models for continuous data? How do the estimation challenges affect your confidence in the results?


Final Assessment (15 Questions)

1. A GLMM extends a GLM by:

GLMMs add random effects (e.g., random intercepts, random slopes) to the linear predictor of a GLM, allowing for correlation within clusters.

2. In a logistic GLMM, subject-specific (SS) odds ratios are:

Due to non-collapsibility of the odds ratio, conditioning on the random effects (SS interpretation) yields larger coefficient magnitudes compared to marginalizing over them (PA interpretation).

3. The conversion formula βPA ≈ βSS / √(1 + 0.346σ²g) applies to:

This specific formula (with constant 0.346) applies to logistic models. For probit models, the constant changes to 1. The formula converts between SS and PA coefficient estimates.

4. The Median Odds Ratio (MOR) equals 1 when:

MOR = exp(√(2σ²g) × 0.6745). When σ²g = 0, MOR = exp(0) = 1, indicating no heterogeneity between clusters in the odds of the outcome.

5. The latent variable ICC for binary data uses π²/3 because:

In the latent variable interpretation, the individual-level error follows a logistic distribution with variance π²/3 ≈ 3.29. This serves as the “within-cluster” variance on the latent scale.

6. In a Poisson GLMM, random effects on the log scale translate to:

Because exp(a + b) = exp(a) × exp(b), random effects that are additive on the log scale become multiplicative on the original rate scale.

7. The connection between Poisson random effects and the negative binomial is:

When the Poisson rate parameter follows a gamma distribution across units, the marginal distribution of counts is negative binomial. This is the Poisson-gamma mixture interpretation.

8. Gauss-Hermite quadrature:

Quadrature approximates the integral over random effects by evaluating the integrand at carefully chosen points and taking a weighted sum. More points give better approximations but at higher computational cost.

9. Adaptive quadrature differs from non-adaptive by:

Adaptive quadrature centers and scales the quadrature points based on the mode and curvature of each cluster’s contribution to the likelihood, improving accuracy, especially when random effects are large.

10. MQL quasi-likelihood gives estimates with:

MQL (Marginal Quasi-Likelihood) omits the random effects predictions from the adjusted variate, yielding parameter estimates that have a PA interpretation, unlike PQL which is closer to SS.

11. First-order quasi-likelihood compared to second-order:

First-order QL uses a cruder Taylor approximation, leading to more bias than second-order methods. Second-order PQL is generally recommended when QL methods are used.

12. The Laplace approximation in GLMM estimation:

The Laplace approximation is equivalent to adaptive quadrature with a single point. It is intermediate in accuracy and computation between full quadrature ML and quasi-likelihood.

13. The beta-binomial model’s main limitation is:

While computationally attractive, the beta-binomial model is essentially for grouped binary data and doesn’t naturally include individual-level predictors or extend to multi-level hierarchies.

14. When ML estimation for a GLMM is computationally unstable, one should:

Instability in ML estimation should be investigated by checking sensitivity to starting values and quadrature settings, and comparing results with alternative estimation methods to assess robustness.

15. For a probit GLMM, the SS-to-PA conversion formula uses the constant:

For probit regression, βPA ≈ βSS / √(1 + σ²g), where the constant is 1 (instead of 0.346 for logistic regression). This is because the probit model uses the normal distribution whose variance is 1.

Lesson 10 Complete!

Congratulations! You have successfully completed the Mixed Models for Discrete Data module.