HSCI 410 — Lesson 11

Mixed Models for Discrete Data

Exploratory Data Analysis For Epidemiology

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Understand the generalised linear mixed model (GLMM) framework for discrete outcomes
  • Write and interpret logistic regression models with random effects
  • Distinguish between subject-specific (SS) and population-averaged (PA) interpretations
  • Calculate the median odds ratio (MOR) and latent variable ICC for binary outcomes
  • Apply mixed models to count data using Poisson regression with random effects
  • Understand estimation challenges in GLMMs including ML, quasi-likelihood, and Laplace approximation
  • Extend mixed models to ordinal, multinomial, and other discrete outcome types
  • Evaluate when different estimation methods are appropriate and their trade-offs

This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Reference

Glossary — Key Terms, People & Concepts

📚 Reference page — available throughout the lesson

This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments. Type in the search box to filter entries.

Key Concepts & Ideas
Generalized linear mixed model (GLMM)An extension of the linear mixed model to non-normal outcomes (binary, count, ordinal, multinomial). Combines a generalized linear model (link function + exponential-family distribution) with random effects to handle clustered data.
Link functionA monotonic function that connects the linear predictor to the mean of the response (e.g., logit for binary, log for counts). Determines the scale on which fixed and random effects act additively.
Conditional (subject- or cluster-specific) interpretationThe effect of a covariate within a particular cluster, holding the random effect fixed. GLMM coefficients are conditional by construction.
Marginal (population-averaged) interpretationThe effect of a covariate averaged over the random-effects distribution. Differs from the conditional effect for non-linear link functions; estimated directly by GEE.
Attenuation (marginal vs conditional)The phenomenon that, with non-linear links (e.g., logit), marginal coefficients are smaller in magnitude than conditional ones. The discrepancy grows with the random-effects variance.
Random-intercept logistic regressionThe simplest GLMM for binary data: cluster-specific intercepts on the logit scale, draws from N(0, σ²u); fixed effects shared across clusters.
Latent-variable formulationAn alternative way to derive logistic and probit GLMMs by assuming an unobserved continuous outcome with a threshold. Useful for deriving ICCs on the latent scale.
OverdispersionMore variability in the data than the assumed distribution permits (e.g., greater than Poisson variance). In clustered settings, overdispersion is often a sign of unmodeled clustering or omitted covariates.
Random-effects variance σ²u (binary)The variance of cluster-specific random intercepts on the logit scale. Larger values indicate more heterogeneity between clusters and produce greater divergence between marginal and conditional effects.
Median Odds Ratio (MOR)A summary of cluster-level heterogeneity in a logistic GLMM, defined as the median odds ratio for the outcome between two randomly selected clusters. Useful translation of σ²u into the OR scale.
Ordinal / multinomial mixed modelsGLMM extensions for ordered or unordered categorical responses (e.g., proportional-odds mixed model, multinomial logit mixed model). Random effects are typically shared across categories.
Methods & Statistical Concepts
Generalized estimating equations (GEE)A marginal-model approach for clustered data (Liang & Zeger, 1986; Zeger, Liang, & Albert, 1988). Specifies a working correlation structure and uses sandwich variance estimators; estimates have population-average interpretation and are robust to misspecification of the correlation.
Laplace approximationA method for approximating the integral over random effects in GLMM likelihoods. Fast and reasonably accurate for many problems but can be biased when random-effects variance is large or cluster sizes are small.
Adaptive Gaussian quadratureA more accurate (but slower) alternative to Laplace for evaluating GLMM likelihoods. Increasing the number of quadrature points improves accuracy at the cost of computation; standard for high-stakes inference.
Penalized quasi-likelihood (PQL)An older GLMM estimation method that linearizes the model around current estimates. Computationally fast but biased for binary outcomes with small clusters; largely superseded by Laplace and quadrature methods.
Bayesian GLMMA GLMM fit using MCMC (e.g., via Stan, brms, or JAGS). Avoids approximation by sampling from the posterior; especially useful when random-effects structure is complex or when interest is in cluster-level effects.
Working correlation structure (GEE)A user-specified guess at the within-cluster correlation pattern (independence, exchangeable, AR1, unstructured). Need not be correct for valid GEE inference but affects efficiency.
Sandwich (robust) variance estimatorA variance estimator that gives valid standard errors even when the working correlation is misspecified. Central to GEE; can also be applied to mixed-model fits.
QIC (Quasi-likelihood under the Independence model Criterion)An information criterion for selecting GEE models. Plays a role analogous to AIC for likelihood-based methods, since GEE has no full likelihood.
ICC on the latent scaleThe intraclass correlation for a logistic random-intercept model derived using the latent-variable formulation: ICC = σ²u / (σ²u + π²/3).
Convergence diagnostics (GLMM)Checks that an iterative GLMM estimator has converged: gradient near zero, positive-definite Hessian, no boundary variance estimates. Failures often signal small clusters, sparse data, or model misspecification.
Key People
Kung-Yee Liang & Scott ZegerJohns Hopkins biostatisticians who introduced generalized estimating equations (GEE) in 1986. Their framework supplied the marginal alternative to GLMMs for clustered discrete data.
Norman Breslow & David ClaytonCo-authors of the influential 1993 paper on penalized quasi-likelihood (PQL) estimation for GLMMs, which spurred widespread use of mixed models for binary and count data.
John Nelder (1924–2010)British statistician who, with Robert Wedderburn, formulated the generalized linear model (GLM) framework that GLMMs extend. Also a key figure in the development of GENSTAT software.
Douglas Bates (b. 1949)Lead author of R's lme4 package, which provides Laplace and adaptive quadrature fitting of GLMMs and is the most widely used GLMM software in applied research.
Peter Diggle (b. 1950)British statistician known for his contributions to longitudinal and spatial data analysis. Co-author with Liang and Zeger of the standard text Analysis of Longitudinal Data.
No matching entries. Try a different search term.
Section 1

Introduction & Logistic Regression with Random Effects

⏱ Estimated time: 20 minutes

Introduction and Overview

Where this lesson fits

Lesson 10 developed mixed models for continuous outcomes — random intercepts, random slopes, contextual effects, BLUPs, and diagnostics. Lesson 11 extends every one of those ideas to discrete outcomes: binary (logistic), count (Poisson and negative binomial), ordinal, and multinomial. The framework you built last lesson carries over almost verbatim, but the link function, the likelihood, and the interpretation of effects all shift — and a new question becomes important: do you want a conditional (subject-specific) effect or a marginal (population-average) one?

The four content sections progress from the simplest extension to the practical realities of fitting and validating GLMMs. Section 1 introduces the GLMM by way of logistic regression with a random intercept — the discrete analogue of the random-intercept linear mixed model. Section 2 generalises to count, ordinal, and categorical outcomes, and confronts the conditional-vs-marginal interpretation head on. Section 3 turns to estimation: penalised quasi-likelihood, Laplace approximation, adaptive Gauss–Hermite quadrature, MCMC — the menu of techniques used because no closed-form likelihood exists. Section 4 closes with inference, diagnostics, and a tour of related random-effects extensions (random-effects survival models, latent-variable formulations) that connect this lesson to the rest of the course.

Learning Objectives

  • Specify a logistic GLMM with a random intercept and identify the role of the link function on the linear predictor.
  • Distinguish subject-specific (conditional) from population-averaged (marginal) effects in non-linear mixed models.
  • Apply the SS-to-PA conversion formula for logistic models and explain why the two coefficients differ.
  • Interpret measures of cluster heterogeneity (median odds ratio, ICC on the latent scale) in a binary GLMM.

The Generalised Linear Mixed Model (GLMM)

Generalised linear mixed models (GLMMs) extend generalised linear models (GLMs) by adding random effects to the linear predictor (Stiratelli, Laird, & Ware, 1984; Breslow & Clayton, 1993). Just as linear mixed models allowed us to handle clustered continuous data, GLMMs handle clustered discrete data—binary, count, ordinal, and multinomial outcomes—while accounting for the correlation within clusters (Bolker et al., 2009).

Key Concept

The core idea of a GLMM is straightforward: take the linear predictor from a GLM and add random effects. For a logistic regression with a random intercept, the model becomes: logit(pij) = β0 + β1X1ij + ugroup(i), where ugroup(i) ~ N(0, σ²g). The random effect u captures between-cluster variation in the log-odds of the outcome.

Subject-Specific vs. Population-Averaged Interpretation

A critical distinction in GLMMs for discrete data is between subject-specific (SS) and population-averaged (PA) interpretations of the model parameters (Zeger, Liang, & Albert, 1988; Heagerty & Zeger, 2000). This distinction does not arise in linear mixed models but is fundamental for logistic and other non-linear models.

👤
Subject-Specific (SS)
Click to learn more
👥
Population-Averaged (PA)
Click to learn more

Because of the non-collapsibility of the odds ratio, SS coefficients are always larger in magnitude than PA coefficients. The approximate conversion formula is:

SS-to-PA Conversion for Logistic Models (Eq 22.2)
βPA ≈ βSS / √(1 + 0.346 × σ²g)
Comparing SS and PA Interpretations

Consider a study of antibiotic resistance in cattle herds. A logistic GLMM with a random intercept for herd yields βSS = 0.85 for a treatment variable, with σ²g = 1.2. The SS odds ratio is exp(0.85) = 2.34—within any given herd, treatment doubles the odds of resistance. The median odds ratio (Larsen & Merlo, 2005) provides a complementary summary of between-herd heterogeneity.

Converting to PA: βPA ≈ 0.85 / √(1 + 0.346 × 1.2) = 0.85 / 1.19 = 0.71. The PA odds ratio is exp(0.71) = 2.03—across the population of herds, the average effect is smaller. The SS estimate is larger because it conditions on a specific cluster.

Measures of Cluster Heterogeneity

Median Odds Ratio (MOR)

The Median Odds Ratio quantifies between-cluster heterogeneity on the odds ratio scale. If you randomly select two clusters and compare a person from the higher-risk cluster to a person from the lower-risk cluster (with the same covariates), the MOR is the median of that odds ratio distribution.

Median Odds Ratio (Eq 22.5)
MOR = exp(√(2σ²g) × 0.6745)

An MOR of 1 means no between-cluster variation. Larger values indicate greater heterogeneity. The MOR allows direct comparison of the importance of cluster-level variation relative to fixed-effect odds ratios.

ICC for Binary Data

For binary outcomes, the latent variable ICC uses the fact that the individual-level variance on the logistic scale is π²/3 ≈ 3.29:

Latent Variable ICC for Logistic Models
ρ = σ²g / (σ²g + π²/3)

This provides an estimate of the proportion of total variance (on the latent scale) attributable to between-cluster differences.

The Latent Variable Approach

The binary outcome can be thought of as the result of thresholding a continuous latent variable. If the latent variable exceeds a threshold, the outcome is 1; otherwise, it is 0. In the logistic model, the individual-level error follows a standard logistic distribution with variance π²/3. This framework provides the basis for both the ICC calculation and the SS-to-PA conversion.

Knowledge Check — Section 1

1. In a logistic GLMM, subject-specific (SS) coefficients are:

Due to the non-collapsibility of the odds ratio, SS (conditional) coefficients are always larger in magnitude than PA (marginal) coefficients. The SS interpretation is conditional on the random effect being at its mean value.

2. The Median Odds Ratio (MOR) measures:

The MOR quantifies how much the odds of the outcome vary between two randomly chosen clusters. An MOR of 1 indicates no between-cluster variation; larger values indicate greater heterogeneity.

3. The ICC for binary outcomes using the latent variable approach uses:

For logistic models, the latent variable approach assumes the individual-level variance on the logit scale is π²/3, so ICC = σ²g / (σ²g + π²/3).

Reflection

Why does the distinction between subject-specific and population-averaged estimates matter in practice? Think of a scenario where you would prefer one interpretation over the other.

Model answerSubject-specific vs. population-averaged interpretations differ in non-linear models like logistic. Subject-specific (GLMM): "holding cluster membership and other covariates constant, this is the within-person/within-cluster effect." Population-averaged (GEE): "averaging over the distribution of cluster-specific intercepts, this is the marginal effect on the population." Preference depends on the question: a clinician treating individual patients wants subject-specific ("if I prescribe this to my patient, what's the OR change?"); a public-health planner allocating resources across populations wants population-averaged ("if we roll out this programme, what's the population-level effect?"). The two coincide for linear-link models but diverge for logit/log links, with population-averaged effects closer to the null because of the non-linearity. Reporting both with explicit interpretation labels is best practice.
Reflection saved!
* Complete the quiz and reflection to continue.
Section 2

GLMMs for Count, Binary & Categorical Data

⏱ Estimated time: 20 minutes

Introduction and Overview

From logistic to the full discrete-outcome family. Section 1 used logistic regression with a random intercept to introduce the GLMM. This section extends the framework outward to count outcomes (Poisson and negative binomial GLMMs — revisiting the count-data tools from Lesson 7 with clustering layered on top), ordinal outcomes (cumulative-link mixed models — Lesson 6 with random effects), and multinomial outcomes. Throughout, the conceptual difference between subject-specific and population-average effects becomes central: in non-linear models the two are not the same, and which one a stakeholder cares about should drive your modelling choice.

Learning Objectives

  • Fit Poisson and negative-binomial GLMMs and interpret random effects as multiplicative cluster-level rate modifiers.
  • Choose between normal and gamma random effects, and recognise when negative-binomial GLMMs handle residual overdispersion.
  • Extend the GLMM framework to ordinal (cumulative-link mixed) and multinomial outcomes.
  • Decide between conditional and marginal effects depending on the substantive question and stakeholder.

Poisson Regression with Random Effects

For count outcomes, Poisson regression with random effects models the log of the expected count as a function of predictors plus random effects:

Poisson GLMM with Random Intercept
log(μij) = β0 + β1X1ij + ugroup(i)

Because the random effects enter on the log scale, they translate to multiplicative effects on the rate scale: μ = exp(Xβ) × exp(u). The group-level multipliers exp(u) follow a log-normal distribution when u is normally distributed.

Poisson with Normal Random Effects

The standard GLMM approach uses normally distributed random effects on the log scale. The group-level effects νgroup = exp(ugroup) are then log-normally distributed. This is the most common parameterisation in software.

This approach is flexible and can accommodate multiple levels of random effects, random slopes, and complex covariance structures—just like the linear mixed model framework.

Poisson with Log-Gamma Random Effects

An alternative parameterisation uses gamma-distributed random effects on the rate scale (equivalently, log-gamma on the log scale). When u follows a log-gamma distribution, the marginal distribution of counts has a convenient closed form.

This connection leads directly to the negative binomial distribution: a Poisson model with gamma-distributed rates yields negative binomial counts.

Negative Binomial as a Random Effects Model

The negative binomial distribution can be interpreted as a Poisson-gamma mixture: counts follow a Poisson distribution, but the rate varies across units according to a gamma distribution. This naturally accounts for overdispersion (variance > mean).

The negative binomial can be extended with additional random effects at higher levels, combining the overdispersion correction with explicit hierarchical structure.

ModelRandom Effect DistributionMarginal DistributionKey Feature
Poisson + Normal RENormal on log scaleNo closed formStandard GLMM; flexible
Poisson + Gamma REGamma on rate scaleNegative binomialClosed-form marginal
Negative BinomialImplicit gammaNegative binomialHandles overdispersion
R Activity — Logistic GLMM and GEE on the same clustered data

The clustered dataset phaa_clinics.csv (carried forward from Lessons 9-10) has a binary outcome referred (1 = referred to a specialist). Same predictors as before but a binary outcome means a logistic GLMM. The full annotated script is in r-activities/HSCI_410_Lesson_11_Mixed_Models_for_Discrete_Data.R.

library(lme4);  library(performance);  library(geepack)
clinics <- read.csv("phaa_clinics.csv", stringsAsFactors = FALSE)
clinics$clinic_id    <- factor(clinics$clinic_id)
clinics$clinic_urban <- factor(clinics$clinic_urban,
                                  levels = c("rural","urban"))
clinics$smoker       <- factor(clinics$smoker, levels = c("No","Yes"))

# 1. Logistic GLMM (subject-specific)
m_glmm <- glmer(referred ~ age + female + smoker + bmi + clinic_urban
                            + (1 | clinic_id),
                data    = clinics,
                family  = binomial,
                control = glmerControl(optimizer = "bobyqa"))
summary(m_glmm)
exp(fixef(m_glmm))                     # subject-specific ORs
icc(m_glmm)                            # latent-scale ICC

# 2. GEE (population-averaged)
m_gee <- geeglm(referred ~ age + female + smoker + bmi + clinic_urban,
                id     = clinic_id, data = clinics,
                family = binomial, corstr = "exchangeable")
summary(m_gee)
exp(coef(m_gee))                       # population-averaged ORs

# 3. Compare side-by-side
cbind(GLMM_OR = exp(fixef(m_glmm)),
      GEE_OR  = exp(coef(m_gee)))

Subject-specific vs population-averaged. glmer() returns subject-specific ORs ("within the same clinic, two patients differ by ..."). GEE returns population-averaged ORs ("across the population, the average difference is ..."). For non-linear link functions like logit they are NOT the same; pick the one that matches your scientific question.

R Reflect on what you just ran

Use the questions below to interpret the output you produced. Look at your console / plot before answering.

1. From exp(fixef(m_glmm)) and exp(confint(m_glmm, method = "Wald")), report the subject-specific OR (and 95% CI) for smokerYes. Translate it into one sentence that explicitly conditions on the clinic (e.g., "within the same clinic, two patients...").

Model answerexp(fixef(m_glmm)) for smokerYes typically returns a subject-specific OR around 1.65, 95% CI roughly (1.30, 2.10). Interpretation, conditioning explicitly: within the same clinic, two patients who differ only in smoking status (one smoker, one non-smoker) differ in their odds of the outcome by a factor of about 1.65. This is a conditional, subject-specific effect — the comparison is between hypothetical patients in the same random-effect group.

2. From icc(m_glmm), report the latent-scale ICC. Why does the binary ICC require a special formula (rather than just sigma^2_u / total variance like in linear mixed models)?

Model answericc(m_glmm) typically returns a latent-scale ICC around 0.10–0.15. The binary ICC requires a special formula because the residual variance on the binary scale is the logit's variance π²/3 ≈ 3.29 by convention (not estimated like in linear mixed models); the latent-scale ICC is σ²u / (σ²u + 3.29). In linear mixed models you have an estimable residual variance σ², but in logistic GLMMs the residual variance on the underlying continuous (latent) scale is fixed by the link function, so the ICC depends only on between-cluster variance.

3. Compare GLMM ORs vs GEE ORs from cbind(GLMM_OR, GEE_OR). Which set of ORs is larger in magnitude, and why is that expected with a logit link? In one sentence, state when a public-health audience would prefer the GEE interpretation over the GLMM.

Model answerGLMM ORs are larger in magnitude than GEE ORs — for example, GLMM OR = 1.65 might correspond to GEE OR = 1.45 for the same data. This is expected with the logit link: GLMM gives subject-specific (conditional) effects — the effect within a cluster, holding the cluster's random effect constant; GEE gives population-averaged (marginal) effects — the effect averaged across the distribution of cluster random effects, which is closer to 1 due to the non-linearity of the logit. A public-health audience usually prefers GEE when the question is population-level (e.g., "if we ran this intervention across all clinics, what change in odds would we see?"); the GLMM is preferred for clinical decision-making within a specific clinic.
Saved.

GLMMs for Other Discrete Outcomes

Binary data: alternative link functions

While the logit link is most common for binary data, probit and complementary log-log links can also be used in GLMMs. For probit models, the SS-to-PA conversion uses a constant of 1 instead of 0.346: βPA ≈ βSS / √(1 + σ²g). This is because the probit model uses the normal distribution, whose variance is 1.

Ordinal data: proportional odds with random effects

The proportional odds model for ordinal outcomes can be extended by adding random effects to the latent variable underlying the ordinal categories. The subject-specific interpretation and the latent variable ICC methods from logistic regression apply similarly. Random effects capture cluster-level variation in the propensity to be in higher or lower categories.

Multinomial data: random effects models

Random effects multinomial logistic models are less common and harder to estimate. The computational burden increases because each category (beyond the reference) has its own set of parameters, and the random effects may need to be correlated across categories. Specialised software and careful model specification are required.

Zero-inflated models with random effects

Zero-inflated models combine a point mass at zero with a count distribution. Random effects can be added to the count part, the zero-inflation part, or both. This flexibility allows the model to capture clustering in both the probability of being a “structural zero” and in the count process among non-zeros.

Choosing Between Models for Count Data

When faced with overdispersed count data, consider: (1) Is the overdispersion due to unmeasured heterogeneity between known clusters? Use a Poisson GLMM. (2) Is the overdispersion due to general extra-Poisson variation without a clear clustering structure? A negative binomial may suffice. (3) Are there excess zeros beyond what either model predicts? Consider a zero-inflated model. Often, comparing model fit statistics (AIC, BIC) across competing models is the practical approach.

Knowledge Check — Section 2

1. In a Poisson GLMM with normal random effects on the log scale, the group-level effects on the rate scale are:

Since random effects enter on the log scale and the link function is log, exponentiating gives multiplicative random effects: μ = exp(Xβ + u) = exp(Xβ) × exp(u), where exp(u) follows a log-normal distribution.

2. The negative binomial distribution can be viewed as:

The negative binomial arises when the Poisson rate parameter itself has a gamma distribution across observations. This Poisson-gamma mixture produces overdispersion (variance > mean) commonly seen in count data.

3. Random effects can be added to proportional odds models for ordinal data by:

The proportional odds model is extended by adding random effects to the latent variables underlying the ordinal categories. The subject-specific interpretation and latent variable ICC methods from logistic regression apply similarly.

Reflection

A researcher finds that a Poisson model for disease counts across 50 communities has significant overdispersion. What are the relative advantages of addressing this with random effects versus using a negative binomial model?

Model answerBoth random effects and negative binomial address overdispersion, but through different mechanisms. Random effects: introduces community-level random intercepts to capture between-community variation; preferred when there is a clear hierarchical structure (communities), you want to estimate community-level variance, and you want to make predictions for new communities. Useful when overdispersion arises from community-level heterogeneity. Negative binomial: introduces a single overdispersion parameter that inflates the variance uniformly; preferred when overdispersion is “unstructured” (no clear hierarchical source), the sample is small, or computational simplicity matters. Often the best practice is to use both: a negative binomial GLMM, which combines hierarchical random effects with individual-level overdispersion. Compare AIC and residual diagnostics; report sensitivity to the choice.
Reflection saved!
* Complete the quiz and reflection to continue.
Section 3

Estimation Methods for GLMMs

⏱ Estimated time: 20 minutes

Introduction and Overview

Why estimation deserves its own section. Sections 1 and 2 specified what a GLMM is and what its coefficients mean. This section focuses on a problem that linear mixed models did not face: the likelihood for a GLMM has no closed form. The integral over the random-effects distribution must be approximated, and the choice of approximation matters — penalised quasi-likelihood is fast but biased (Breslow & Clayton, 1993), the Laplace approximation is the practical default (Bates, Mächler, Bolker, & Walker, 2015), and adaptive Gauss–Hermite quadrature (Pinheiro & Chao, 2006) or MCMC become necessary when high accuracy is required. Knowing which method your software is using (and when it will fail) is essential for trustworthy GLMM inference, especially with rare events or strongly clustered data.

Learning Objectives

  • Explain why the GLMM likelihood has no closed form and requires integration over the random-effects distribution.
  • Compare penalised quasi-likelihood (PQL), Laplace approximation, and adaptive Gauss–Hermite quadrature in terms of speed, bias, and accuracy.
  • Recognise the conditions (rare events, small clusters, strong clustering) under which PQL is unreliable.
  • Identify when Bayesian / MCMC estimation is the most defensible choice for a GLMM.
  • Read software output critically — tying estimator choice to the trustworthiness of standard errors and p-values.

The Estimation Challenge

Unlike linear mixed models, the likelihood in a GLMM cannot be computed in closed form. The likelihood involves an integral over the random effects distribution that generally has no analytic solution. This is the fundamental computational challenge of GLMMs, and different estimation methods represent different strategies for handling this integral.

Maximum Likelihood (ML) Estimation

ML estimation is the gold standard for GLMMs (Pinheiro & Chao, 2006). It uses Gauss-Hermite quadrature to numerically approximate the integral over the random effects. The integrand is evaluated at carefully chosen points (quadrature points), and a weighted sum provides the approximation.

Adaptive quadrature improves accuracy by centring and scaling the quadrature points based on the mode and curvature of each cluster’s contribution to the likelihood. The number of quadrature points controls accuracy: more points yield better approximations but require more computation. Default values are typically around 7.

ML estimation can be computationally intensive or unstable, especially with multiple random effects (where the dimensionality of integration grows) or with sparse data.

Quasi-Likelihood (QL) Estimation

Quasi-likelihood methods avoid numerical integration by using Taylor expansions to linearise the model, then applying iterative weighted least squares. Key variants include:

  • First-order vs. second-order Taylor expansion: second-order is more accurate
  • MQL (Marginal Quasi-Likelihood): omits random effect predictions from the working variate; gives PA interpretation
  • PQL (Penalised Quasi-Likelihood): includes random effect predictions; gives estimates closer to SS interpretation

PQL with second-order expansion is generally preferred among QL methods (Breslow & Clayton, 1993). However, simulation studies show that first-order MQL can be markedly biased, especially with large variance components or small cluster sizes (Bolker et al., 2009).

Laplace Approximation

The Laplace approximation corresponds to adaptive quadrature with a single quadrature point—the lowest order of adaptive quadrature. It approximates the integral by a normal distribution centred at the mode of the integrand.

Laplace is intermediate in both accuracy and computational cost between full quadrature ML and quasi-likelihood. It is widely available in software (e.g., glmer in R uses Laplace by default; Bates et al., 2015) and provides a good starting point before increasing quadrature points (Pinheiro & Chao, 2006).

MethodAccuracyComputationInterpretation
ML (Adaptive Quadrature)High (gold standard)High; increases with random effectsSS
Laplace ApproximationModerate (1 quad point)ModerateSS
PQL (2nd order)ModerateLowClose to SS
MQL (2nd order)LowerLowPA
MQL (1st order)Lowest; may be biasedLowestPA

Practical Recommendations

Check stability: Vary the number of quadrature points and confirm that estimates do not change substantially. Compare methods: Run both ML and QL and check agreement. Use caution: With small clusters, binary outcomes, or large random effects variances, simpler QL methods may be markedly biased—prefer ML or at least second-order PQL. Start simple: Begin with Laplace and increase quadrature points if feasible.

How Gauss-Hermite quadrature works

Gauss-Hermite quadrature approximates integrals of the form ∫ f(x) exp(-x²) dx by a weighted sum: ∑ wk f(xk), where xk are the quadrature points and wk are the corresponding weights. The points and weights are chosen to give exact results for polynomial integrands up to a certain degree. For GLMM likelihoods, more quadrature points provide better approximations to the non-polynomial integrand.

Why adaptive quadrature is better

Non-adaptive quadrature uses fixed points centred at zero. Adaptive quadrature shifts and scales the points to match the mode and curvature of each cluster’s integrand. This means fewer points are needed for the same accuracy, and the method is more robust when random effects are large or when clusters differ substantially.

When QL methods fail

Quasi-likelihood methods can produce seriously biased estimates when: (1) random effects variance is large, (2) cluster sizes are small (especially with binary data), (3) prevalence is extreme (close to 0 or 1), or (4) the model has crossed random effects. In these situations, ML estimation with sufficient quadrature points is strongly preferred.

Knowledge Check — Section 3

1. Gauss-Hermite quadrature in GLMM estimation is used to:

The likelihood in a GLMM involves integrating over the random effects distribution, which generally cannot be done analytically. Quadrature approximates this integral by a weighted sum evaluated at selected points.

2. Quasi-likelihood estimation methods:

QL methods are faster than ML because they avoid numerical integration, but they rely on Taylor approximations that can introduce bias, particularly when random effects variances are large or cluster sizes are small.

3. Increasing the number of quadrature points in ML estimation:

More quadrature points provide a better approximation to the integral, improving accuracy. However, computation time increases, especially with multiple random effects where the dimensionality of integration grows.

Reflection

You are fitting a 3-level logistic GLMM and the ML estimates are unstable. What steps would you take to diagnose the problem and what alternative estimation approaches might you consider?

Model answerSteps to diagnose unstable ML estimates in a 3-level logistic GLMM: (a) check convergence — many software packages report convergence warnings; non-convergence often indicates the model is over-parameterised for the data. (b) Inspect random-effects variances — variances near zero or huge suggest unidentifiable parameters. (c) Examine cluster sizes — if some clusters have very few observations, the random effects for those clusters are poorly estimated and can destabilise the model. (d) Run profile likelihoods for variance parameters. Alternative estimation: Bayesian estimation (Stan, brms) with weakly informative priors stabilises variance estimates; Laplace approximation with fewer adaptive quadrature points may converge when full Laplace doesn't; penalised likelihood approaches; simplification by removing one level of nesting if data don't support it; centering covariates can help with convergence in models with random slopes.
Reflection saved!
* Complete the quiz and reflection to continue.
Section 4

Inference, Diagnostics & Other Random Effects Models

⏱ Estimated time: 15 minutes

Introduction and Overview

Closing the loop on GLMMs. Sections 1–3 covered specification, interpretation, and estimation. This final section addresses the questions that finish the workflow: how do we test fixed effects and variance components when boundaries and approximations complicate standard tests? what residuals make sense in a non-linear mixed model, and how do we read them? and how does the random-effects framework extend to other discrete-outcome problems — random-effects survival models, latent-variable formulations, and zero-inflated count models — that you will see in applied research. The emphasis here is on becoming a critical user of GLMM software output rather than a passive consumer of p-values.

Learning Objectives

  • Choose between Wald, likelihood-ratio, and profile-likelihood inference for fixed effects in a GLMM.
  • Apply boundary corrections when testing variance components, and recognise when Wald tests near boundaries are misleading.
  • Interpret Pearson and deviance residuals in the GLMM setting and use them for cluster-level diagnostics.
  • Connect GLMMs to related random-effects extensions: survival, latent-variable, and zero-inflated count models.
  • Read GLMM software output critically and identify when reported quantities are unreliable.

Inference for Fixed and Random Effects

Testing and constructing confidence intervals in GLMMs involves the same general principles as in linear mixed models, but with additional complications related to the estimation method used.

For fixed effects, Wald-type tests (based on the estimate divided by its SE) are most common. However, Wald statistics can be unreliable when parameters are near boundary values. Likelihood-based inference (likelihood ratio tests and profile likelihood confidence intervals) is preferred when feasible, but requires ML estimation—not quasi-likelihood.

For random effect variances, the same boundary issues discussed for linear mixed models apply: variance parameters cannot be negative, so the usual chi-square distribution for LR tests may not be appropriate near zero.

Caution with Wald Tests Near Boundaries

Wald statistics assume the sampling distribution of the parameter estimate is approximately normal. This assumption breaks down when parameters are near boundary values (e.g., variance components near zero, or probabilities near 0 or 1). In such cases, confidence intervals based on Wald statistics can include impossible values, and p-values may be misleading. Likelihood-based methods are more reliable.

Alternative Random Effects Models

📊
GLMM (Standard)
Click to learn more
📈
Beta-Binomial
Click to learn more
📉
Negative Binomial
Click to learn more
ApproachLikelihoodMulti-LevelIndividual PredictorsComputational Cost
GLMM (normal RE)Requires integrationYesYesHigh
Beta-binomialClosed formNoLimitedLow
Negative binomialClosed formWith extensionsYesLow to moderate

Practical Guidance

When to use the beta-binomial

The beta-binomial model is most appropriate when you have grouped binary data (e.g., proportion of animals testing positive in each herd) with only group-level predictors. Its closed-form likelihood makes it computationally simple, and it directly estimates the ICC. However, it cannot accommodate individual-level predictors or multiple hierarchical levels.

Simulation evidence on estimation methods

Simulation studies consistently show that first-order MQL can be markedly biased, underestimating both fixed effects and variance components (Breslow & Clayton, 1993; Bolker et al., 2009). Second-order PQL performs better but can still be biased with small cluster sizes or large variances. ML with adaptive quadrature is generally the most accurate (Pinheiro & Chao, 2006), though computational constraints may require starting with Laplace approximation.

Choosing the right estimation method

Start with Laplace (the default in many packages). If feasible, increase to adaptive quadrature with 7+ points and check that estimates are stable. If ML is computationally prohibitive (e.g., many random effects), use second-order PQL and compare results with Laplace. Always report which estimation method was used and, ideally, show sensitivity to the choice.

Knowledge Check — Section 4

1. Wald-type statistics for GLMM parameters:

Wald statistics assume the sampling distribution is approximately normal, which breaks down when parameters are near boundaries (e.g., variance parameters near zero), making likelihood-based inference preferable when available.

2. The beta-binomial model:

The beta-binomial model’s main advantage is its closed-form likelihood, which avoids the computational challenges of GLMMs. However, it is limited to grouped/replicated binary data at the lowest level.

3. First-order quasi-likelihood estimates in GLMMs:

Simulation studies have shown that first-order QL (especially MQL) can produce substantially biased estimates. Second-order PQL and ML estimation are preferred for more accurate results.

Reflection

Compare the GLMM approach to the beta-binomial approach for modeling clustered binary data. In what situations would each be preferred?

Model answerGLMM with binomial: explicit hierarchical structure, between-cluster random effects, allows subject-specific interpretation, supports cluster-level covariates, and can predict for new clusters. Useful when (a) you want to quantify between-cluster variance, (b) the clustering has clear hierarchical meaning, (c) you have enough data per cluster to identify random effects. Beta-binomial: marginal model with an overdispersion parameter (beta-distributed cluster-level probabilities); no random effects estimated explicitly; preferred when (a) overdispersion is the only departure from binomial, (b) sample is small and GLMM doesn't converge, (c) you want a simpler model. GLMM provides richer information but requires more data and computational care; beta-binomial is a parsimonious alternative for clustered binary data with simple overdispersion. Compare AIC and residual diagnostics to choose.
Reflection saved!
* Complete the quiz and reflection to continue.
Final Assessment

Lesson 11 — Comprehensive Assessment

⏱ Estimated time: 25 minutes

Bringing It All Together

This lesson carried the mixed-model framework across the boundary from continuous to discrete outcomes. Section 1 introduced the generalised linear mixed model through its most familiar form — logistic regression with a random intercept — and immediately raised the issue that has no analogue in the continuous case: subject-specific and population-averaged effects are no longer the same. The non-collapsibility of the odds ratio means a SS coefficient is always larger in magnitude than its PA counterpart, and the conversion factor depends explicitly on the random-effect variance.

Section 2 extended the framework to the rest of the discrete-outcome family. Poisson and negative-binomial GLMMs gave us multiplicative cluster effects on the rate scale; cumulative-link mixed models extended the ordinal logistic model to clustered ordinal outcomes; and multinomial mixed models handled unordered categorical outcomes. Throughout, choosing between conditional and marginal interpretations became a substantive decision, not a software default. Section 3 then confronted the central technical problem of GLMMs: the likelihood involves an integral with no closed form. Penalised quasi-likelihood is fast but biased for rare events; the Laplace approximation is the practical default; adaptive Gauss–Hermite quadrature improves accuracy at computational cost; and MCMC becomes the most defensible option for complex models.

Section 4 closed the loop with inference, diagnostics, and the wider family of random-effects models — including frailty (random-effects survival) models that prepare the ground for repeated-measures and longitudinal extensions in Lesson 12.

Key Takeaways from Lesson 11

  • A GLMM adds normally distributed random effects to a generalised linear model's linear predictor — extending random intercepts and slopes to binary, count, ordinal, and multinomial outcomes.
  • For non-linear link functions, subject-specific (SS) and population-averaged (PA) coefficients are not the same; SS estimates are always larger in magnitude for logistic models.
  • Cluster heterogeneity in binary GLMMs is summarised on the latent scale (latent-variable ICC) or as a median odds ratio — the residual ICC of binary outcomes is not directly interpretable.
  • The GLMM likelihood requires numerical integration; choose between PQL, Laplace, adaptive Gauss–Hermite, and MCMC based on event rate, cluster sizes, and accuracy needs.
  • Wald tests can be unreliable near parameter boundaries; likelihood-ratio and profile-likelihood inference are preferred when ML estimation is feasible.
  • The random-effects machinery generalises beyond GLMs — to frailty/survival, latent-variable, and zero-inflated models — making it a unifying framework for clustered, non-Gaussian outcomes.

This final assessment covers all material from this lesson. You must answer all 15 questions correctly (100%) and complete the final reflection to finish the lesson.

Final Reflection

Reflecting on this lesson, what do you see as the biggest challenges in fitting GLMMs for discrete data compared to linear mixed models for continuous data? How do the estimation challenges affect your confidence in the results?

Model answerCompared to LMMs, GLMMs for discrete data have several added challenges: (a) likelihood is intractable — requires numerical integration (Laplace, adaptive Gauss-Hermite, Monte Carlo) over the random-effects distribution, which is computationally expensive and can be inaccurate; (b) convergence is fragile — non-linear link functions and constrained variance parameters lead to local maxima and identifiability issues; (c) interpretation is subject-specific vs. population-averaged, a distinction that doesn't arise in LMMs because the linear link gives the same interpretation either way; (d) residual diagnostics are harder — standard plots are less informative for binary outcomes. These challenges affect confidence in results: (a) software-specific estimates may differ; (b) random-effects variances can be unreliable with few clusters or small cluster sizes; (c) different estimation methods give slightly different effect sizes. Best practice: fit multiple models (GLMM, GEE), use Bayesian sensitivity, and report both subject-specific and population-averaged interpretations.
Reflection saved!
Final Assessment — Lesson 11 (15 Questions)

1. A GLMM extends a GLM by:

GLMMs add random effects (e.g., random intercepts, random slopes) to the linear predictor of a GLM, allowing for correlation within clusters.

2. In a logistic GLMM, subject-specific (SS) odds ratios are:

Due to non-collapsibility of the odds ratio, conditioning on the random effects (SS interpretation) yields larger coefficient magnitudes compared to marginalizing over them (PA interpretation).

3. The conversion formula βPA ≈ βSS / √(1 + 0.346σ²g) applies to:

This specific formula (with constant 0.346) applies to logistic models. For probit models, the constant changes to 1. The formula converts between SS and PA coefficient estimates.

4. The Median Odds Ratio (MOR) equals 1 when:

MOR = exp(√(2σ²g) × 0.6745). When σ²g = 0, MOR = exp(0) = 1, indicating no heterogeneity between clusters in the odds of the outcome.

5. The latent variable ICC for binary data uses π²/3 because:

In the latent variable interpretation, the individual-level error follows a logistic distribution with variance π²/3 ≈ 3.29. This serves as the “within-cluster” variance on the latent scale.

6. In a Poisson GLMM, random effects on the log scale translate to:

Because exp(a + b) = exp(a) × exp(b), random effects that are additive on the log scale become multiplicative on the original rate scale.

7. The connection between Poisson random effects and the negative binomial is:

When the Poisson rate parameter follows a gamma distribution across units, the marginal distribution of counts is negative binomial. This is the Poisson-gamma mixture interpretation.

8. Gauss-Hermite quadrature:

Quadrature approximates the integral over random effects by evaluating the integrand at carefully chosen points and taking a weighted sum. More points give better approximations but at higher computational cost.

9. Adaptive quadrature differs from non-adaptive by:

Adaptive quadrature centers and scales the quadrature points based on the mode and curvature of each cluster’s contribution to the likelihood, improving accuracy, especially when random effects are large.

10. MQL quasi-likelihood gives estimates with:

MQL (Marginal Quasi-Likelihood) omits the random effects predictions from the adjusted variate, yielding parameter estimates that have a PA interpretation, unlike PQL which is closer to SS.

11. First-order quasi-likelihood compared to second-order:

First-order QL uses a cruder Taylor approximation, leading to more bias than second-order methods. Second-order PQL is generally recommended when QL methods are used.

12. The Laplace approximation in GLMM estimation:

The Laplace approximation is equivalent to adaptive quadrature with a single point. It is intermediate in accuracy and computation between full quadrature ML and quasi-likelihood.

13. The beta-binomial model’s main limitation is:

While computationally attractive, the beta-binomial model is essentially for grouped binary data and doesn’t naturally include individual-level predictors or extend to multi-level hierarchies.

14. When ML estimation for a GLMM is computationally unstable, one should:

Instability in ML estimation should be investigated by checking sensitivity to starting values and quadrature settings, and comparing results with alternative estimation methods to assess robustness.

15. For a probit GLMM, the SS-to-PA conversion formula uses the constant:

For probit regression, βPA ≈ βSS / √(1 + σ²g), where the constant is 1 (instead of 0.346 for logistic regression). This is because the probit model uses the normal distribution whose variance is 1.

Lesson 11 Complete!

You have completed Lesson 11: Mixed Models for Discrete Data. You can now extend the random-effects framework to binary, count, ordinal, and multinomial outcomes, distinguish subject-specific from population-average effects, navigate the menu of estimation methods (PQL, Laplace, adaptive Gauss–Hermite, MCMC) that GLMMs require, and run the inference and diagnostic checks that accompany them.

What’s next: Lesson 12 — Repeated Measures Data specialises the mixed-model framework to a particularly important kind of cluster: the same subject measured repeatedly over time. You will see how within-subject correlation can be modelled either through random effects (the mixed-model approach) or through a structured residual covariance matrix (the marginal/GEE approach), how to handle missing data and unequal spacing, and how to test for change over time. Lesson 12 also closes the course, bringing the regression-modelling arc of HSCI 410 to a finish.