Mixed Models for Discrete Data

Exploratory Data Analysis For Epidemiology

Learning objectives for this lesson:

Understand the generalised linear mixed model (GLMM) framework for discrete outcomes
Write and interpret logistic regression models with random effects
Distinguish between subject-specific (SS) and population-averaged (PA) interpretations
Calculate the median odds ratio (MOR) and latent variable ICC for binary outcomes
Apply mixed models to count data using Poisson regression with random effects
Understand estimation challenges in GLMMs including ML, quasi-likelihood, and Laplace approximation
Extend mixed models to ordinal, multinomial, and other discrete outcome types
Evaluate when different estimation methods are appropriate and their trade-offs

This course was developed by Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University, based on Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Reference

Glossary: Key Terms, People & Concepts

📚 Reference page, available throughout the lesson

This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.

Key Concepts & Ideas

Generalized linear mixed model (GLMM): An extension of the linear mixed model to non-normal outcomes (binary, count, ordinal, multinomial). Combines a generalized linear model (link function + exponential-family distribution) with random effects to handle clustered data.

Link function: A monotonic function that connects the linear predictor to the mean of the response (e.g., logit for binary, log for counts). Determines the scale on which fixed and random effects act additively.

Conditional (subject- or cluster-specific) interpretation: The effect of a covariate within a particular cluster, holding the random effect fixed. GLMM coefficients are conditional by construction.

Marginal (population-averaged) interpretation: The effect of a covariate averaged over the random-effects distribution. Differs from the conditional effect for non-linear link functions; estimated directly by GEE.

Attenuation (marginal vs conditional): The phenomenon that, with non-linear links (e.g., logit), marginal coefficients are smaller in magnitude than conditional ones. The discrepancy grows with the random-effects variance.

Random-intercept logistic regression: The simplest GLMM for binary data: cluster-specific intercepts on the logit scale, drawn from N(0, σ²_u); fixed effects shared across clusters.

Latent-variable formulation: An alternative way to derive logistic and probit GLMMs by assuming an unobserved continuous outcome with a threshold. Useful for deriving ICCs on the latent scale.

Overdispersion: More variability in the data than the assumed distribution permits (e.g., greater than Poisson variance). In clustered settings, overdispersion is often a sign of unmodeled clustering or omitted covariates.

Random-effects variance σ²_u (binary): The variance of cluster-specific random intercepts on the logit scale. Larger values indicate more heterogeneity between clusters and produce greater divergence between marginal and conditional effects.

Median Odds Ratio (MOR): A summary of cluster-level heterogeneity in a logistic GLMM: the median of the odds ratios obtained by comparing the higher-risk with the lower-risk cluster over all pairs of randomly selected clusters, for individuals with identical covariates. Computed as MOR = exp(√(2σ²_u) × 0.6745), it translates σ²_u onto the OR scale.

Ordinal / multinomial mixed models: GLMM extensions for ordered or unordered categorical responses (e.g., proportional-odds mixed model, multinomial logit mixed model). Random effects are typically shared across categories.

Methods & Statistical Concepts

Generalized estimating equations (GEE): A marginal-model approach for clustered data (Liang & Zeger, 1986; Zeger, Liang, & Albert, 1988). Specifies a working correlation structure and uses sandwich variance estimators; estimates have a population-averaged interpretation and are robust to misspecification of the correlation.

Laplace approximation: A method for approximating the integral over random effects in GLMM likelihoods. Fast and reasonably accurate for many problems but can be biased when random-effects variance is large or cluster sizes are small.

Adaptive Gaussian quadrature: A more accurate (but slower) alternative to Laplace for evaluating GLMM likelihoods. Increasing the number of quadrature points improves accuracy at the cost of computation; standard for high-stakes inference.

Penalized quasi-likelihood (PQL): An older GLMM estimation method that linearizes the model around current estimates. Computationally fast but biased for binary outcomes with small clusters; largely superseded by Laplace and quadrature methods.

Bayesian GLMM: A GLMM fit using MCMC (e.g., via Stan, brms, or JAGS). Replaces analytic approximation of the likelihood integral with sampling from the posterior (subject to Monte Carlo error); especially useful when random-effects structure is complex or when interest is in cluster-level effects.

Working correlation structure (GEE): A user-specified guess at the within-cluster correlation pattern (independence, exchangeable, AR1, unstructured). Need not be correct for valid GEE inference but affects efficiency.

Sandwich (robust) variance estimator: A variance estimator that gives valid standard errors even when the working correlation is misspecified. Central to GEE; can also be applied to mixed-model fits.

QIC (Quasi-likelihood under the Independence model Criterion): An information criterion for selecting GEE models. Plays a role analogous to AIC for likelihood-based methods, since GEE has no full likelihood.

ICC on the latent scale: The intraclass correlation for a logistic random-intercept model derived using the latent-variable formulation: ICC = σ²_u / (σ²_u + π²/3).

Convergence diagnostics (GLMM): Checks that an iterative GLMM estimator has converged: gradient near zero, positive-definite Hessian, no boundary variance estimates. Failures often signal small clusters, sparse data, or model misspecification.

Key People

Kung-Yee Liang & Scott Zeger: Johns Hopkins biostatisticians who introduced generalized estimating equations (GEE) in 1986. Their framework supplied the marginal alternative to GLMMs for clustered discrete data.

Norman Breslow & David Clayton: Co-authors of the influential 1993 paper on penalized quasi-likelihood (PQL) estimation for GLMMs, which spurred widespread use of mixed models for binary and count data.

John Nelder (1924–2010): British statistician who, with Robert Wedderburn, formulated the generalized linear model (GLM) framework that GLMMs extend. Also a key figure in the development of GENSTAT software.

Douglas Bates (b. 1949): Lead author of R's lme4 package, which provides Laplace and adaptive quadrature fitting of GLMMs and is the most widely used GLMM software in applied research.

Peter Diggle (b. 1950): British statistician known for his contributions to longitudinal and spatial data analysis. Co-author with Liang and Zeger of the standard text Analysis of Longitudinal Data.


Section 2

GLMMs for Count, Binary & Categorical Data

⏱ Estimated time: 20 minutes

Section 2 of 4


Poisson and negative binomial random-effects models, ordinal mixed models, and the conditional versus marginal decision.

Poisson GLMM

Count outcomes on the log scale

Poisson GLMM with random intercept

\[ \color{#0B7B6B}{\log(\mu_{ij})} = \color{#C2410C}{\beta_0 + \beta_1 X_{1ij}} + \color{#6D28D9}{u_j}, \quad u_j \sim N(0,\sigma^2_u) \]

Legend: log(μ_ij) = log expected count; β₀ + β₁X = fixed effects; u_j = cluster random intercept.

Exponentiating gives \(\mu_{ij} = \exp(X\beta)\cdot\exp(u_j)\). The cluster effects are multiplicative on the rate scale, and \(\exp(u_j)\) follows a log-normal distribution.

Three approaches

Poisson, gamma random effects, and negative binomial

Poisson + Normal RE

Random effects on the log scale; no closed-form marginal; standard GLMM approach.

Poisson + Gamma RE

Gamma on the rate scale; marginal distribution is negative binomial with closed form.

Negative binomial

Implicit gamma random effects; handles overdispersion; can be extended to multi-level structures.

Beyond counts

Ordinal and multinomial mixed models

Ordinal (cumulative-link): add random effects to the latent variable underlying the ordered categories. The proportional odds assumption and latent ICC methods carry over from the logistic GLMM.

Multinomial: a separate log-odds relative to the reference category for each level. Random effects may be correlated across categories, substantially increasing computational complexity.

R activity

GLMM versus GEE on the same data

Fitting both models to the same clustered dataset makes the SS versus PA distinction concrete.

glmer() with family = binomial returns subject-specific odds ratios.
geeglm() with corstr = "exchangeable" returns population-averaged odds ratios.
The divergence between the two reflects the between-clinic variance the GLMM estimates.

Carry forward

Decision framework and bridge to a later section

Identifiable clustering: Poisson GLMM with random intercepts targets the source of overdispersion.
Unstructured overdispersion: negative binomial may suffice with a single parameter.
Excess zeros: zero-inflated models add a structural component for the zero mass.

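With one exception (the Poisson-gamma model, whose marginal distribution is negative binomial), these models share one problem: with normal random effects the likelihood has no closed form. A later section explains what to do about that.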

Introduction and Overview

From logistic to the full discrete-outcome family. An earlier section used logistic regression with a random intercept to introduce the GLMM. This section extends the framework outward to count outcomes (Poisson and negative binomial GLMMs, revisiting the count-data tools from an earlier lesson with clustering layered on top), ordinal outcomes (cumulative-link mixed models: the ordinal models of an earlier lesson with random effects added), and multinomial outcomes. Throughout, the conceptual difference between subject-specific and population-averaged effects becomes central: in non-linear models the two are not the same, and which one a stakeholder cares about should drive your modelling choice.

Learning Objectives

Fit Poisson and negative-binomial GLMMs and interpret random effects as multiplicative cluster-level rate modifiers.
Choose between normal and gamma random effects, and recognise when negative-binomial GLMMs handle residual overdispersion.
Extend the GLMM framework to ordinal (cumulative-link mixed) and multinomial outcomes.
Decide between conditional and marginal effects depending on the substantive question and stakeholder.

Poisson Regression with Random Effects

For count outcomes, Poisson regression with random effects models the log of the expected count as a function of predictors plus random effects:

Poisson GLMM with random intercept

\[ \color{#0B7B6B}{\log(\mu_{ij})} = \color{#C2410C}{\beta_0 + \beta_1 X_{1ij}} + \color{#6D28D9}{u_j}, \quad u_j \sim N(0,\sigma^2_u) \]

The log of the expected count for observation i in group j is the fixed-effect predictor plus a group random intercept u_j. These terms are additive on the log scale and therefore act multiplicatively on the rate.

Because the random effects enter on the log scale, they translate to multiplicative effects on the rate scale: μ = exp(Xβ) × exp(u). The group-level multipliers exp(u) follow a log-normal distribution when u is normally distributed.
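The multiplicative reading can be checked with a small base-R simulation. This is a sketch under assumed values: the design (2000 groups of 20), the coefficients, and σ_u = 0.5 are all hypothetical and chosen only for illustration. It compares the simulated mean of exp(u) with the log-normal mean exp(σ²_u / 2), and the crude marginal rate ratio with exp(β₁).

```r
set.seed(410)

# Hypothetical design and parameter values (assumptions for illustration only)
n_groups <- 2000
n_per    <- 20
beta0    <- log(2)   # baseline log rate
beta1    <- 0.3      # log rate ratio for a binary covariate x
sigma_u  <- 0.5      # SD of the random intercepts on the log scale

u     <- rnorm(n_groups, mean = 0, sd = sigma_u)      # one intercept per group
group <- rep(seq_len(n_groups), each = n_per)
x     <- rbinom(n_groups * n_per, size = 1, prob = 0.5)

# Conditional mean: exp(X beta) * exp(u), i.e. a group-specific rate multiplier
mu <- exp(beta0 + beta1 * x + u[group])
y  <- rpois(length(mu), lambda = mu)

# exp(u) is log-normal, so its mean should be close to exp(sigma_u^2 / 2)
multiplier             <- exp(u)
mean_multiplier_theory <- exp(sigma_u^2 / 2)
print(c(simulated = mean(multiplier), theory = mean_multiplier_theory))

# Crude (marginal) rate ratio for x compared with the conditional exp(beta1)
rr_marginal <- mean(y[x == 1]) / mean(y[x == 0])
print(c(conditional = exp(beta1), marginal = rr_marginal))
```

In this random-intercept, log-link setting the two rate ratios should agree up to simulation noise, because averaging over u only rescales the baseline rate.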

Poisson with Normal Random Effects

The standard GLMM approach uses normally distributed random effects on the log scale. The group-level multipliers ν_j = exp(u_j) for group j are then log-normally distributed. This is the most common parameterisation in software.

This approach is flexible and can accommodate multiple levels of random effects, random slopes, and complex covariance structures, just like the linear mixed model framework.

Poisson with Log-Gamma Random Effects

An alternative parameterisation uses gamma-distributed random effects on the rate scale (equivalently, log-gamma on the log scale). When u follows a log-gamma distribution, the marginal distribution of counts has a convenient closed form.

This connection leads directly to the negative binomial distribution: a Poisson model with gamma-distributed rates yields negative binomial counts.

Negative Binomial as a Random Effects Model

The negative binomial distribution can be interpreted as a Poisson-gamma mixture: counts follow a Poisson distribution, but the rate varies across units according to a gamma distribution. This naturally accounts for overdispersion (variance > mean).
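A short simulation makes the Poisson-gamma mixture concrete. The sketch below assumes a mean count of 3 and a gamma shape of 2 (both hypothetical), draws counts by mixing, and compares them with direct negative binomial draws and with the theoretical variance μ + μ²/k.

```r
set.seed(11)

n  <- 100000
mu <- 3     # assumed mean count
k  <- 2     # assumed gamma shape (the negative binomial "size" parameter)

# Gamma multiplier on the rate scale with mean 1 and variance 1/k
nu    <- rgamma(n, shape = k, rate = k)
y_mix <- rpois(n, lambda = mu * nu)          # Poisson counts with gamma-varying rates

# Direct negative binomial draws with the same mean and size
y_nb <- rnbinom(n, size = k, mu = mu)

# Negative binomial variance: mu + mu^2 / k (larger than the Poisson variance mu)
var_theory <- mu + mu^2 / k
print(round(c(mean_mix = mean(y_mix), var_mix = var(y_mix),
              var_nb = var(y_nb), var_theory = var_theory), 2))

# The proportion of zeros should also match the negative binomial pmf
p0_mix <- mean(y_mix == 0)
p0_nb  <- dnbinom(0, size = k, mu = mu)
print(c(p0_mix = p0_mix, p0_nb = p0_nb))
```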

The negative binomial can be extended with additional random effects at higher levels, combining the overdispersion correction with explicit hierarchical structure.

Model	Random Effect Distribution	Marginal Distribution	Key Feature
Poisson + Normal RE	Normal on log scale	No closed form	Standard GLMM; flexible
Poisson + Gamma RE	Gamma on rate scale	Negative binomial	Closed-form marginal
Negative Binomial	Implicit gamma	Negative binomial	Handles overdispersion

R Activity: Logistic GLMM and GEE on the same clustered data

The clustered dataset phaa_clinics.csv (carried forward from earlier lessons) has a binary outcome referred (1 = referred to a specialist). The predictors are the same as before, but because the outcome is binary the appropriate model is a logistic GLMM. The full annotated script is in r-activities/HSCI_410_Lesson_11_Mixed_Models_for_Discrete_Data.R.

library(lme4);  library(performance);  library(geepack)
clinics <- read.csv("phaa_clinics.csv", stringsAsFactors = FALSE)
clinics$clinic_id    <- factor(clinics$clinic_id)
clinics$clinic_urban <- factor(clinics$clinic_urban,
                               levels = c("rural","urban"))
clinics$smoker       <- factor(clinics$smoker, levels = c("No","Yes"))
# geeglm() assumes rows from the same cluster are contiguous, so sort by clinic
clinics <- clinics[order(clinics$clinic_id), ]

# 1. Logistic GLMM (subject-specific)
m_glmm <- glmer(referred ~ age + female + smoker + bmi + clinic_urban
                            + (1 | clinic_id),
                data    = clinics,
                family  = binomial,
                control = glmerControl(optimizer = "bobyqa"))
summary(m_glmm)
exp(fixef(m_glmm))                                      # subject-specific ORs
exp(confint(m_glmm, parm = "beta_", method = "Wald"))   # Wald 95% CIs for the ORs
icc(m_glmm)                                             # latent-scale ICC

# 2. GEE (population-averaged)
m_gee <- geeglm(referred ~ age + female + smoker + bmi + clinic_urban,
                id     = clinic_id, data = clinics,
                family = binomial, corstr = "exchangeable")
summary(m_gee)
exp(coef(m_gee))                                        # population-averaged ORs

# 3. Compare side-by-side (both vectors include the intercept, in the same order)
cbind(GLMM_OR = exp(fixef(m_glmm)),
      GEE_OR  = exp(coef(m_gee)))

Subject-specific vs population-averaged. glmer() returns subject-specific ORs ("within the same clinic, two patients differ by ..."). GEE returns population-averaged ORs ("across the population, the average difference is ..."). For non-linear link functions like logit they are NOT the same; pick the one that matches your scientific question.
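The gap between the two interpretations can be reproduced without any data by averaging conditional probabilities over the random-intercept distribution. The sketch below uses assumed values (σ_u = 1.5, a subject-specific log-odds ratio of 0.5, intercept −1), none of which come from the clinics data, and compares the numerically marginalised log-odds ratio with the attenuation approximation β^PA ≈ β^SS / √(1 + 0.346σ²_u).

```r
# Assumed subject-specific (conditional) parameters on the logit scale
sigma_u <- 1.5     # random-intercept SD
beta0   <- -1      # intercept
beta1   <- 0.5     # subject-specific log-odds ratio for a binary covariate

# Marginal probability: conditional probability averaged over u ~ N(0, sigma_u^2)
marginal_p <- function(eta) {
  integrate(function(u) plogis(eta + u) * dnorm(u, mean = 0, sd = sigma_u),
            lower = -Inf, upper = Inf, rel.tol = 1e-8)$value
}

p0 <- marginal_p(beta0)           # population-averaged risk, covariate = 0
p1 <- marginal_p(beta0 + beta1)   # population-averaged risk, covariate = 1

# Population-averaged log-odds ratio by numerical integration
beta1_pa_numeric <- qlogis(p1) - qlogis(p0)

# Attenuation approximation for the logit link
beta1_pa_approx <- beta1 / sqrt(1 + 0.346 * sigma_u^2)

print(c(SS = beta1, PA_numeric = beta1_pa_numeric, PA_approx = beta1_pa_approx))
print(exp(c(SS_OR = beta1, PA_OR = beta1_pa_numeric)))
```

Setting sigma_u close to zero in this sketch makes the two log-odds ratios coincide, which is one way to see that the divergence is driven by the between-cluster variance.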

R Reflect on what you just ran

Use the questions below to interpret the output you produced. Look at your console / plot before answering.

1. From exp(fixef(m_glmm)) and exp(confint(m_glmm, method = "Wald")), report the subject-specific OR (and 95% CI) for smokerYes. Translate it into one sentence that explicitly conditions on the clinic (e.g., "within the same clinic, two patients...").

Model answer: exp(fixef(m_glmm)) for smokerYes typically returns a subject-specific OR around 1.65, 95% CI roughly (1.30, 2.10). Interpretation, conditioning explicitly: within the same clinic, two patients who differ only in smoking status (one smoker, one non-smoker) differ in their odds of the outcome by a factor of about 1.65. This is a conditional, subject-specific effect: the comparison is between hypothetical patients in the same random-effect group.

2. From icc(m_glmm), report the latent-scale ICC. Why does the binary ICC require a special formula (rather than just sigma^2_u / total variance like in linear mixed models)?

Model answer: icc(m_glmm) typically returns a latent-scale ICC around 0.10–0.15. The binary ICC requires a special formula because a logistic model has no estimated residual variance. On the latent (underlying continuous) scale, the level-1 residual follows a standard logistic distribution, whose variance is fixed at π²/3 ≈ 3.29 by the link function. The latent-scale ICC is therefore σ²_u / (σ²_u + 3.29). In linear mixed models the residual variance σ² is estimated from the data; in a logistic GLMM it is a constant, so the ICC is determined entirely by the between-cluster variance.

3. Compare GLMM ORs vs GEE ORs from cbind(GLMM_OR, GEE_OR). Which set of ORs is larger in magnitude, and why is that expected with a logit link? In one sentence, state when a public-health audience would prefer the GEE interpretation over the GLMM.

Model answer: GLMM ORs are larger in magnitude than GEE ORs; for example, GLMM OR = 1.65 might correspond to GEE OR = 1.45 for the same data. This is expected with the logit link. The GLMM gives subject-specific (conditional) effects: the effect within a cluster, holding the cluster's random effect constant. GEE gives population-averaged (marginal) effects: the effect averaged across the distribution of cluster random effects, which is closer to 1 due to the non-linearity of the logit. A public-health audience usually prefers GEE when the question is population-level (e.g., "if we ran this intervention across all clinics, what change in odds would we see?"); the GLMM is preferred for clinical decision-making within a specific clinic.
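The latent-scale ICC discussed in question 2, and the median odds ratio (MOR) that translates the same variance onto the OR scale, are both simple functions of σ²_u. A minimal sketch follows; the helper names latent_icc() and median_or() and the value σ²_u = 0.45 are hypothetical. With the fitted model, the variance could be taken from as.numeric(VarCorr(m_glmm)$clinic_id), assuming the random-intercept model above.

```r
# Hypothetical helpers for a random-intercept logistic GLMM
latent_icc <- function(sigma2_u) {
  # Level-1 variance on the latent scale is fixed at pi^2 / 3 (standard logistic)
  sigma2_u / (sigma2_u + pi^2 / 3)
}

median_or <- function(sigma2_u) {
  # MOR = exp( sqrt(2 * sigma2_u) * qnorm(0.75) )
  exp(sqrt(2 * sigma2_u) * qnorm(0.75))
}

sigma2_u <- 0.45   # assumed between-clinic variance on the logit scale

icc_latent <- latent_icc(sigma2_u)
mor        <- median_or(sigma2_u)

print(round(c(ICC = icc_latent, MOR = mor), 3))
```

An MOR of 1 corresponds to no between-cluster variation; larger values mean that moving between two randomly chosen clinics changes the odds by more, in the median, for otherwise identical patients.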


GLMMs for Other Discrete Outcomes

Binary data: alternative link functions

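While the logit link is most common for binary data, probit and complementary log-log links can also be used in GLMMs. For probit models, the SS-to-PA conversion replaces the logit constant 0.346 with 1: β^PA = β^SS / √(1 + σ²_u), compared with β^PA ≈ β^SS / √(1 + 0.346σ²_u) for the logit link. The constant is 1 because the probit model's latent error is standard normal, with variance 1; with normally distributed random intercepts the probit relation is exact rather than approximate.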
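The probit conversion can be verified numerically. The sketch below uses assumed values (σ_u = 1.2, β₀ = −0.4, β^SS = 0.8) and compares the probit-scale difference in marginal probabilities, obtained by numerical integration, with β^SS / √(1 + σ²_u).

```r
# Assumed subject-specific parameters on the probit scale
sigma_u <- 1.2
beta0   <- -0.4
beta1   <- 0.8

# Marginal probability: average the conditional probit probability over u
marginal_p <- function(eta) {
  integrate(function(u) pnorm(eta + u) * dnorm(u, mean = 0, sd = sigma_u),
            lower = -Inf, upper = Inf, rel.tol = 1e-8)$value
}

# Population-averaged probit coefficient by numerical integration
beta1_pa_numeric <- qnorm(marginal_p(beta0 + beta1)) - qnorm(marginal_p(beta0))

# Closed-form conversion for the probit link
beta1_pa_formula <- beta1 / sqrt(1 + sigma_u^2)

print(c(numeric = beta1_pa_numeric, formula = beta1_pa_formula))
```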

Ordinal data: proportional odds with random effects

The proportional odds model for ordinal outcomes can be extended by adding random effects to the latent variable underlying the ordinal categories. The subject-specific interpretation and the latent variable ICC methods from logistic regression apply similarly. Random effects capture cluster-level variation in the propensity to be in higher or lower categories.
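Written out, one common form of the random-intercept proportional odds model is shown below. The threshold symbols \(\tau_k\) and the sign convention (subtracting the linear predictor so that larger values push toward higher categories) are assumptions made here for illustration; conventions differ between software packages.

\[ \text{logit}\,P(Y_{ij} \le k \mid u_j) = \tau_k - \left(\beta_1 X_{1ij} + u_j\right), \quad k = 1,\dots,K-1, \quad u_j \sim N(0,\sigma^2_u) \]

The same \(\beta_1\) and the same \(u_j\) apply at every threshold, which is the proportional odds assumption. Because the latent residual is again standard logistic, the latent-scale ICC keeps the form \(\sigma^2_u / (\sigma^2_u + \pi^2/3)\).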

Multinomial data: random effects models

Random effects multinomial logistic models are less common and harder to estimate. The computational burden increases because each category (beyond the reference) has its own set of parameters, and the random effects may need to be correlated across categories. Specialised software and careful model specification are required.

Zero-inflated models with random effects

Zero-inflated models combine a point mass at zero with a count distribution. Random effects can be added to the count part, the zero-inflation part, or both. This flexibility allows the model to capture clustering in both the probability of being a “structural zero” and in the count process among non-zeros.

Choosing Between Models for Count Data

When faced with overdispersed count data, consider: (1) Is the overdispersion due to unmeasured heterogeneity between known clusters? Use a Poisson GLMM. (2) Is the overdispersion due to general extra-Poisson variation without a clear clustering structure? A negative binomial may suffice. (3) Are there excess zeros beyond what either model predicts? Consider a zero-inflated model. Often, comparing model fit statistics (AIC, BIC) across competing models is the practical approach.
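A quick base-R diagnostic can help with the first of these questions. The sketch below simulates clustered counts under assumed values (50 clusters of 20, σ_u = 0.6), estimates the Pearson dispersion from an ordinary Poisson fit, and then refits with cluster indicators. The fixed-effects refit is only a rough stand-in used here to keep the example self-contained; it is not the Poisson GLMM itself.

```r
set.seed(272)

# Simulated clustered counts (all values are assumptions for illustration)
n_clusters <- 50
n_per      <- 20
cluster    <- rep(seq_len(n_clusters), each = n_per)
u          <- rnorm(n_clusters, mean = 0, sd = 0.6)   # cluster effects, log scale
x          <- rnorm(n_clusters * n_per)
y          <- rpois(length(x), lambda = exp(0.5 + 0.3 * x + u[cluster]))

# Ordinary Poisson regression that ignores clustering
fit_pois   <- glm(y ~ x, family = poisson)
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

# Same model with cluster indicators: is the extra variation between clusters?
fit_cluster       <- glm(y ~ x + factor(cluster), family = poisson)
dispersion_within <- sum(residuals(fit_cluster, type = "pearson")^2) /
  df.residual(fit_cluster)

print(round(c(ignoring_clusters = dispersion,
              within_clusters   = dispersion_within), 2))
print(AIC(fit_pois, fit_cluster))
```

A dispersion well above 1 that drops to about 1 once clusters are accounted for points toward the Poisson GLMM; dispersion that stays high within clusters suggests unstructured extra-Poisson variation or excess zeros.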

Reflection

A researcher finds that a Poisson model for disease counts across 50 communities has significant overdispersion. What are the relative advantages of addressing this with random effects versus using a negative binomial model?

Model answer: Both random effects and negative binomial address overdispersion, but through different mechanisms. Random effects: introduces community-level random intercepts to capture between-community variation; preferred when there is a clear hierarchical structure (communities), you want to estimate community-level variance, and you want to make predictions for new communities. Useful when overdispersion arises from community-level heterogeneity. Negative binomial: introduces a single overdispersion parameter that inflates the variance uniformly; preferred when overdispersion is “unstructured” (no clear hierarchical source), the sample is small, or computational simplicity matters. Often the best practice is to use both: a negative binomial GLMM, which combines hierarchical random effects with individual-level overdispersion. Compare AIC and residual diagnostics; report sensitivity to the choice.


Section 3

Estimation Methods for GLMMs

⏱ Estimated time: 20 minutes

Section 3 of 4


Why the likelihood has no closed form, and the practical hierarchy from quasi-likelihood to full adaptive quadrature.

The core challenge

An intractable integral

Marginal likelihood (one cluster)

\[ \color{#0B7B6B}{L_j(\boldsymbol{\beta},\sigma^2_u)} = \int_{-\infty}^{\infty} \color{#C2410C}{\prod_i f(y_{ij}\mid u_j)}\; \color{#6D28D9}{\phi(u_j;0,\sigma^2_u)}\; du_j \]

Legend: L_j = cluster likelihood; ∏ f(y∣u) = outcome likelihood given the random effect; φ(u) = normal density of the random effect.

The product inside the integral is the conditional likelihood for all observations in cluster \(j\); \(\phi\) is the normal density for the random effect. No closed form exists for non-Gaussian outcomes.
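For a single small cluster the integral can be evaluated by brute-force numerical integration, which makes its structure concrete. The sketch below assumes a logistic random-intercept model with hypothetical outcomes, covariate values, and parameters; cond_lik() and integrand() are illustrative names.

```r
# Hypothetical cluster j: five binary outcomes and one covariate
y_j     <- c(1, 0, 1, 1, 0)
x_j     <- c(0.2, -1.0, 0.5, 1.3, -0.4)
beta    <- c(-0.3, 0.8)   # assumed intercept and slope (logit scale)
sigma_u <- 1.0            # assumed random-intercept SD

# Conditional likelihood of the whole cluster given u (vectorised over u)
cond_lik <- function(u) {
  sapply(u, function(ui) {
    p <- plogis(beta[1] + beta[2] * x_j + ui)
    prod(dbinom(y_j, size = 1, prob = p))
  })
}

# Integrand: conditional likelihood times the normal density of u
integrand <- function(u) cond_lik(u) * dnorm(u, mean = 0, sd = sigma_u)

# Marginal likelihood contribution of cluster j
L_j <- integrate(integrand, lower = -Inf, upper = Inf, rel.tol = 1e-8)$value
print(L_j)
```

The full likelihood multiplies such contributions over clusters, and this integral has to be recomputed for every cluster at every trial value of the parameters, which is why faster approximations matter.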

Quasi-likelihood

Penalised quasi-likelihood (PQL)

How it works

Linearises the model with a Taylor expansion, then applies iterative weighted least squares. Avoids numerical integration entirely.

Known limits

First-order MQL can be substantially biased with large variance, small clusters, or rare events. Second-order PQL is better but still approximates.

The default method

Laplace approximation

Approximates the integrand by a Gaussian at its mode: fast, reasonably accurate, and the default in glmer().

Schematic of the Laplace approximation

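\[ \int \color{#0B7B6B}{f(u)}\,du \approx \color{#C2410C}{f(\hat{u})}\,(2\pi)^{1/2}\,\color{#6D28D9}{[-h''(\hat{u})]^{-1/2}}, \qquad h(u) = \log f(u) \]

Legend: ∫f(u)du = the intractable integral; f(û) = height of the integrand at its mode û; [−h″(û)]^−1/2 = width at the peak, taken from the curvature of the log-integrand h = log f.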

Equivalent to adaptive Gauss-Hermite quadrature with one quadrature point. Increase points to verify stability.
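The approximation can be compared with direct numerical integration for one cluster. The sketch below assumes a logistic random-intercept model with hypothetical data and parameter values. It works with h(u), the log of the integrand, finds the mode with optimize(), and uses the analytic curvature −h″(u) = Σ p(1 − p) + 1/σ²_u that holds for this model.

```r
# Hypothetical cluster and assumed parameter values
y_j     <- c(1, 0, 1, 1, 0)
x_j     <- c(0.2, -1.0, 0.5, 1.3, -0.4)
beta    <- c(-0.3, 0.8)
sigma_u <- 1.0

# h(u): log of the integrand (log conditional likelihood + log normal density)
h <- function(u) {
  eta <- beta[1] + beta[2] * x_j + u
  sum(dbinom(y_j, size = 1, prob = plogis(eta), log = TRUE)) +
    dnorm(u, mean = 0, sd = sigma_u, log = TRUE)
}

# Step 1: locate the mode of the integrand
opt   <- optimize(h, interval = c(-10, 10), maximum = TRUE)
u_hat <- opt$maximum

# Step 2: curvature of h at the mode (analytic for the logistic model)
p_hat  <- plogis(beta[1] + beta[2] * x_j + u_hat)
neg_h2 <- sum(p_hat * (1 - p_hat)) + 1 / sigma_u^2

# Step 3: Laplace approximation = peak height * sqrt(2 * pi) * peak width
laplace <- exp(opt$objective) * sqrt(2 * pi / neg_h2)

# Reference value by numerical integration
reference <- integrate(function(u) sapply(u, function(ui) exp(h(ui))),
                       lower = -Inf, upper = Inf, rel.tol = 1e-8)$value

print(c(laplace = laplace, reference = reference, ratio = laplace / reference))
```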

Higher accuracy

Adaptive quadrature and Bayesian estimation

Adaptive quadrature

Places quadrature points at each cluster's integrand mode. More points improve accuracy; cost scales with number of random effects.

Bayesian MCMC

Samples from the posterior via Stan or brms, replacing the analytic approximation of the integral with simulation (subject to Monte Carlo error); especially valuable for complex random-effects structures or weakly identified variance components.

Carry forward

A practical hierarchy

Start with Laplace (the default), then increase quadrature points to check stability.
Avoid first-order MQL for binary data with small clusters or large variance.
Use MCMC when the model is complex or variance components are weakly identified.
Always report which estimation method was used, and ideally demonstrate sensitivity to that choice.

Introduction and Overview

Why estimation deserves its own section. Earlier sections specified what a GLMM is and what its coefficients mean. This section focuses on a problem that linear mixed models did not face: the likelihood for a GLMM has no closed form. The integral over the random-effects distribution must be approximated, and the choice of approximation matters: penalised quasi-likelihood is fast but biased (Breslow & Clayton, 1993), the Laplace approximation is the practical default (Bates, Mächler, Bolker, & Walker, 2015), and adaptive Gauss–Hermite quadrature (Pinheiro & Chao, 2006) or MCMC become necessary when high accuracy is required. Knowing which method your software is using (and when it will fail) is essential for trustworthy GLMM inference, especially with rare events or strongly clustered data.

Learning Objectives

Explain why the GLMM likelihood has no closed form and requires integration over the random-effects distribution.
Compare penalised quasi-likelihood (PQL), Laplace approximation, and adaptive Gauss–Hermite quadrature in terms of speed, bias, and accuracy.
Recognise the conditions (rare events, small clusters, strong clustering) under which PQL is unreliable.
Identify when Bayesian / MCMC estimation is the most defensible choice for a GLMM.
Read software output critically, tying estimator choice to the trustworthiness of standard errors and p-values.

The Estimation Challenge

Unlike linear mixed models, the likelihood in a GLMM cannot be computed in closed form. The likelihood involves an integral over the random effects distribution that generally has no analytic solution. This is the fundamental computational challenge of GLMMs, and different estimation methods represent different strategies for handling this integral.

Maximum Likelihood (ML) Estimation

ML estimation is the gold standard for GLMMs (Pinheiro & Chao, 2006). It uses Gauss-Hermite quadrature to numerically approximate the integral over the random effects. The integrand is evaluated at carefully chosen points (quadrature points), and a weighted sum provides the approximation.

Adaptive quadrature improves accuracy by centring and scaling the quadrature points based on the mode and curvature of each cluster’s contribution to the likelihood. The number of quadrature points controls accuracy: more points yield better approximations but require more computation. Defaults vary by software: some packages use around 7 points, whereas glmer() defaults to a single point (nAGQ = 1, the Laplace approximation).

ML estimation can be computationally intensive or unstable, especially with multiple random effects (where the dimensionality of integration grows) or with sparse data.

Quasi-Likelihood (QL) Estimation

Quasi-likelihood methods avoid numerical integration by using Taylor expansions to linearise the model, then applying iterative weighted least squares. Key variants include:

First-order vs. second-order Taylor expansion: second-order is more accurate
MQL (Marginal Quasi-Likelihood): omits random effect predictions from the working variate; gives PA interpretation
PQL (Penalised Quasi-Likelihood): includes random effect predictions; gives estimates closer to SS interpretation

PQL with second-order expansion is generally preferred among QL methods (Breslow & Clayton, 1993). By contrast, simulation studies show that first-order MQL can be markedly biased, especially with large variance components or small cluster sizes (Bolker et al., 2009).

Laplace Approximation

The Laplace approximation corresponds to adaptive quadrature with a single quadrature point, the lowest order of adaptive quadrature. It approximates the integrand by a scaled normal density centred at the mode of the integrand, with spread set by the curvature at that mode.

Laplace is intermediate in both accuracy and computational cost between full quadrature ML and quasi-likelihood. It is widely available in software (e.g., glmer in R uses Laplace by default; Bates et al., 2015) and provides a good starting point before increasing quadrature points (Pinheiro & Chao, 2006).

Method	Accuracy	Computation	Interpretation
ML (Adaptive Quadrature)	High (gold standard)	High; increases with random effects	SS
Laplace Approximation	Moderate (1 quad point)	Moderate	SS
PQL (2nd order)	Moderate	Low	Close to SS
MQL (2nd order)	Lower	Low	PA
MQL (1st order)	Lowest; may be biased	Lowest	PA

Practical Recommendations

Check stability: Vary the number of quadrature points and confirm that estimates do not change substantially.
Compare methods: Run both ML and QL and check agreement.
Use caution: With small clusters, binary outcomes, or large random effects variances, simpler QL methods may be markedly biased; prefer ML or at least second-order PQL.
Start simple: Begin with Laplace and increase quadrature points if feasible.

How Gauss-Hermite quadrature works

Gauss-Hermite quadrature approximates integrals of the form ∫ f(x) exp(-x²) dx by a weighted sum: ∑ w_k f(x_k), where x_k are the quadrature points and w_k are the corresponding weights. With n points, the points and weights are chosen to give exact results when f is a polynomial of degree up to 2n − 1. For GLMM likelihoods, more quadrature points provide better approximations to the non-polynomial integrand.
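Base R has no built-in Gauss-Hermite rule, so the sketch below builds one with the Golub-Welsch eigenvalue method; gauss_hermite() is a hypothetical helper written for this illustration, not a function from any package. It then checks the rule on a polynomial and uses it, after the change of variables u = √2·σ_u·x, to average a logistic probability over u ~ N(0, σ²_u) with assumed values.

```r
# Hypothetical helper: n-point Gauss-Hermite rule for weight exp(-x^2)
gauss_hermite <- function(n) {
  # Golub-Welsch: nodes are the eigenvalues of a symmetric tridiagonal matrix
  # whose off-diagonal entries are sqrt(i / 2), i = 1, ..., n - 1
  i <- seq_len(n - 1)
  J <- matrix(0, n, n)
  J[cbind(i, i + 1)] <- sqrt(i / 2)
  J[cbind(i + 1, i)] <- sqrt(i / 2)
  e <- eigen(J, symmetric = TRUE)
  # Weights: sqrt(pi) times the squared first component of each eigenvector
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}

gh <- gauss_hermite(5)

# Polynomial check: integral of x^2 * exp(-x^2) over the real line is sqrt(pi) / 2
poly_check <- sum(gh$weights * gh$nodes^2)
print(c(quadrature = poly_check, exact = sqrt(pi) / 2))

# Normal expectation: E[g(u)] = (1 / sqrt(pi)) * sum_k w_k * g(sqrt(2) * sigma_u * x_k)
sigma_u <- 1.5    # assumed random-intercept SD
eta     <- -1     # assumed linear predictor on the logit scale
gh_mean <- sum(gh$weights * plogis(eta + sqrt(2) * sigma_u * gh$nodes)) / sqrt(pi)

reference <- integrate(function(u) plogis(eta + u) * dnorm(u, 0, sigma_u),
                       lower = -Inf, upper = Inf, rel.tol = 1e-8)$value
print(c(gauss_hermite = gh_mean, reference = reference))
```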

Why adaptive quadrature is better

Non-adaptive quadrature uses fixed points centred at zero. Adaptive quadrature shifts and scales the points to match the mode and curvature of each cluster’s integrand. This means fewer points are needed for the same accuracy, and the method is more robust when random effects are large or when clusters differ substantially.
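The difference shows up clearly for a large cluster whose integrand peaks far from zero. The sketch below is built on assumptions: a hypothetical cluster with 34 events in 40 observations, β₀ = −1, σ_u = 2, an intercept-only logistic model, and the same hypothetical gauss_hermite() helper as a Golub-Welsch rule. Both rules use 5 points, and everything is scaled by the peak height so the numbers are of order 1.

```r
# Hypothetical helper: n-point Gauss-Hermite rule (Golub-Welsch)
gauss_hermite <- function(n) {
  i <- seq_len(n - 1)
  J <- matrix(0, n, n)
  J[cbind(i, i + 1)] <- sqrt(i / 2)
  J[cbind(i + 1, i)] <- sqrt(i / 2)
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = sqrt(pi) * e$vectors[1, ]^2)
}

# Assumed cluster: 34 events out of 40, intercept-only logistic model
n_j <- 40; s_j <- 34
beta0 <- -1; sigma_u <- 2

# Log conditional likelihood and log integrand, both vectorised over u
log_cond <- function(u) {
  s_j * plogis(beta0 + u, log.p = TRUE) +
    (n_j - s_j) * plogis(beta0 + u, lower.tail = FALSE, log.p = TRUE)
}
h <- function(u) log_cond(u) + dnorm(u, mean = 0, sd = sigma_u, log = TRUE)

# Mode and curvature-based scale of the integrand
u_hat <- optimize(h, interval = c(-10, 10), maximum = TRUE)$maximum
h_max <- h(u_hat)
p_hat <- plogis(beta0 + u_hat)
tau   <- 1 / sqrt(n_j * p_hat * (1 - p_hat) + 1 / sigma_u^2)

gh <- gauss_hermite(5)

# Non-adaptive: points centred at 0 and scaled by the prior SD sigma_u
nonadaptive <- sum(gh$weights *
                     exp(log_cond(sqrt(2) * sigma_u * gh$nodes) - h_max)) / sqrt(pi)

# Adaptive: points centred at the mode u_hat and scaled by tau
z        <- u_hat + sqrt(2) * tau * gh$nodes
adaptive <- sqrt(2) * tau * sum(gh$weights * exp(gh$nodes^2 + h(z) - h_max))

# Reference: numerical integration of the peak-scaled integrand
reference <- integrate(function(u) exp(h(u) - h_max),
                       lower = -10, upper = 10, rel.tol = 1e-10)$value

ratio_nonadaptive <- nonadaptive / reference
ratio_adaptive    <- adaptive / reference
print(round(c(nonadaptive = ratio_nonadaptive, adaptive = ratio_adaptive), 3))
```

A ratio of 1 means the rule reproduces the reference integral. Changing gauss_hermite(5) to a larger number of points is a simple way to see how quickly each rule settles.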

When QL methods fail

Quasi-likelihood methods can produce seriously biased estimates when: (1) random effects variance is large, (2) cluster sizes are small (especially with binary data), (3) prevalence is extreme (close to 0 or 1), or (4) the model has crossed random effects. In these situations, ML estimation with sufficient quadrature points is strongly preferred.

Reflection

You are fitting a 3-level logistic GLMM and the ML estimates are unstable. What steps would you take to diagnose the problem and what alternative estimation approaches might you consider?

Model answer: Steps to diagnose unstable ML estimates in a 3-level logistic GLMM: (a) check convergence: many software packages report convergence warnings; non-convergence often indicates the model is over-parameterised for the data. (b) Inspect random-effects variances: variances near zero or huge suggest unidentifiable parameters. (c) Examine cluster sizes: if some clusters have very few observations, the random effects for those clusters are poorly estimated and can destabilise the model. (d) Run profile likelihoods for variance parameters. Alternative estimation: Bayesian estimation (Stan, brms) with weakly informative priors stabilises variance estimates; the Laplace approximation (a single quadrature point) may converge when adaptive quadrature with many points does not; penalised likelihood approaches; simplification by removing one level of nesting if the data don't support it; centring covariates can help with convergence in models with random slopes.


Section 4

Inference, Diagnostics & Other Random Effects Models

⏱ Estimated time: 15 minutes

Section 4 of 4


Wald and likelihood-ratio tests, boundary issues, model alternatives, and reading software output critically.

Fixed-effect inference

Wald versus likelihood-ratio tests

Wald tests

Divide estimate by standard error. Fast and convenient, but unreliable near parameter boundaries.

Likelihood-ratio and profile

Compare model log-likelihoods or profile the likelihood surface. More accurate near boundaries; require ML estimation.

Variance component tests

Boundary problems

Variance parameters are constrained to be non-negative. Standard chi-square reference distributions are incorrect at the boundary.

Use profile-likelihood intervals for variance components when feasible.
A Wald interval that includes negative values signals unreliable inference.
Treat likelihood-ratio test p-values near the boundary as upper bounds, not exact.
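For the simplest case, testing whether a single random-intercept variance is zero, a commonly used boundary correction treats the reference distribution as a 50:50 mixture of a point mass at zero and a chi-square with 1 degree of freedom, which halves the naive p-value. The sketch below uses a hypothetical test statistic; in practice it would be twice the difference in log-likelihoods between the models with and without the random intercept.

```r
# Hypothetical likelihood-ratio statistic: 2 * (logLik with RE - logLik without RE)
lrt_stat <- 2.9

# Naive p-value from a chi-square(1) reference: an upper bound at the boundary
p_naive <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

# Boundary-corrected p-value under the 50:50 mixture of chi-square(0) and chi-square(1)
p_mixture <- 0.5 * p_naive

print(round(c(naive = p_naive, mixture = p_mixture), 4))
```

The correction applies to a single variance component; tests involving several variance or covariance parameters need different mixtures or a simulation-based reference distribution.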

Simpler alternatives

Beta-binomial and negative binomial

GLMM

Flexible; multiple levels; random slopes; individual predictors. Requires numerical integration.

Beta-binomial

Closed-form likelihood; grouped binary data; group-level predictors only. Estimates ICC directly.

Negative binomial

Closed-form; implicit gamma random effects; handles overdispersion. Extendable with additional levels.
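The way the beta-binomial estimates the ICC directly can be written out as follows. The notation is an assumption made for this sketch: \(Y_j\) is the number of events among \(n_j\) subjects in group \(j\), and the group-level probability \(p_j\) follows a Beta(\(\alpha\), \(\beta\)) distribution.

\[ \pi = E(p_j) = \frac{\alpha}{\alpha+\beta}, \qquad \rho = \frac{1}{\alpha+\beta+1}, \qquad \operatorname{Var}(Y_j) = n_j\,\pi(1-\pi)\,\bigl[1 + (n_j - 1)\rho\bigr] \]

Here \(\rho\) is the intraclass correlation between two binary responses in the same group, and \(1 + (n_j - 1)\rho\) is the factor by which the variance exceeds the binomial variance. Setting \(\rho = 0\) recovers the ordinary binomial.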

The wider family

Beyond the standard GLMM

Frailty models: random effects on the hazard in survival analysis. Share the same hierarchical logic and estimation challenges.

Zero-inflated models: random effects in the count component, the zero-inflation component, or both.

Latent-variable models: derive logistic GLMMs by thresholding unobserved continuous outcomes; bridge to measurement models.

Lesson 11 · Synthesis

Into the final assessment

GLMMs add normally distributed random effects to a generalised linear model's linear predictor.
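For logistic models, subject-specific and population-averaged effects are distinct; the SS effect is larger in magnitude whenever the random-effects variance is greater than zero. For log-link models with only a random intercept, the two interpretations differ in the intercept but share the same slopes.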
The likelihood requires approximation; the choice of method affects accuracy and can affect conclusions.
Inference near boundaries requires profile-likelihood or likelihood-ratio methods, not Wald tests.

Introduction and Overview

Closing the loop on GLMMs. Earlier sections covered specification, interpretation, and estimation. This final section addresses the questions that finish the workflow. How do we test fixed effects and variance components when boundaries and approximations complicate standard tests? What residuals make sense in a non-linear mixed model, and how do we read them? And how does the random-effects framework extend to other discrete-outcome problems that you will see in applied research, such as random-effects survival models, latent-variable formulations, and zero-inflated count models? The emphasis here is on becoming a critical user of GLMM software output rather than a passive consumer of p-values.

Learning Objectives

Choose between Wald, likelihood-ratio, and profile-likelihood inference for fixed effects in a GLMM.
Apply boundary corrections when testing variance components, and recognise when Wald tests near boundaries are misleading.
Interpret Pearson and deviance residuals in the GLMM setting and use them for cluster-level diagnostics.
Connect GLMMs to related random-effects extensions: survival, latent-variable, and zero-inflated count models.
Read GLMM software output critically and identify when reported quantities are unreliable.

Inference for Fixed and Random Effects

Testing hypotheses and constructing confidence intervals in GLMMs involve the same general principles as in linear mixed models, but with additional complications related to the estimation method used.

For fixed effects, Wald-type tests (based on the estimate divided by its SE) are most common. However, Wald statistics can be unreliable when parameters are near boundary values. Likelihood-based inference (likelihood ratio tests and profile likelihood confidence intervals) is preferred when feasible, but requires ML estimation, not quasi-likelihood.
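As a minimal sketch of the Wald calculation described above, the following computes the z statistic, a two-sided p-value, and a 95% confidence interval for a logistic GLMM coefficient, then exponentiates to the odds-ratio scale. The estimate and standard error are hypothetical numbers, not output from any model in this lesson, and the function name `wald_summary` is an illustrative choice.

```python
import math

def wald_summary(estimate, se, z_crit=1.959964):
    """Wald z, two-sided p-value, and 95% CI for a coefficient on the link scale."""
    z = estimate / se
    # Two-sided standard normal tail area: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    lo, hi = estimate - z_crit * se, estimate + z_crit * se
    return {
        "z": z,
        "p": p,
        "ci": (lo, hi),
        # For a logit link, exponentiating gives a subject-specific odds ratio
        "or": math.exp(estimate),
        "or_ci": (math.exp(lo), math.exp(hi)),
    }

# Hypothetical log-odds coefficient and standard error
res = wald_summary(estimate=0.80, se=0.30)
print(res)
```

The interval is symmetric on the log-odds scale by construction, which is exactly the normality assumption that becomes questionable near a boundary.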

For random effect variances, the same boundary issues discussed for linear mixed models apply: variance parameters cannot be negative, so the usual chi-square distribution for LR tests may not be appropriate near zero.

Caution with Wald Tests Near Boundaries

Wald statistics assume the sampling distribution of the parameter estimate is approximately normal. This assumption breaks down when parameters are near boundary values (e.g., variance components near zero, or probabilities near 0 or 1). In such cases, confidence intervals based on Wald statistics can include impossible values, and p-values may be misleading. Likelihood-based methods are more reliable.
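One widely used boundary correction, sketched here under stated assumptions, applies when a single variance component (for example, a random intercept) is tested against zero and no other parameter lies on a boundary. In that case the likelihood-ratio statistic is asymptotically a 50:50 mixture of a point mass at zero and a chi-square with 1 degree of freedom, so the naive chi-square p-value is halved. The log-likelihood values below are hypothetical, and the function names are illustrative.

```python
import math

def chi2_sf_1df(x):
    """Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0

def lrt_variance_component(loglik_null, loglik_alt):
    """LR test of one variance component against zero.

    Returns the statistic, the naive chi-square(1) p-value, and the
    p-value under the 50:50 mixture of chi-square(0) and chi-square(1).
    """
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    naive_p = chi2_sf_1df(stat)
    mixture_p = 0.5 * naive_p if stat > 0 else 1.0
    return stat, naive_p, mixture_p

# Hypothetical ML log-likelihoods: model without and with a random intercept
stat, naive_p, mixture_p = lrt_variance_component(-512.40, -510.55)
print(f"LR = {stat:.2f}, naive p = {naive_p:.4f}, mixture p = {mixture_p:.4f}")
```

Both models must be fitted by ML (not quasi-likelihood) for the log-likelihoods to be comparable. Testing several variance components at once leads to different mixtures, so the simple halving rule should not be extended to those cases without checking.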

Alternative Random Effects Models


Approach	Likelihood	Multi-Level	Individual Predictors	Computational Cost
GLMM (normal RE)	Requires integration	Yes	Yes	High
Beta-binomial	Closed form	No	No (group-level only)	Low
Negative binomial	Closed form	With extensions	Yes	Low to moderate

Practical Guidance

When to use the beta-binomial

The beta-binomial model is most appropriate when you have grouped binary data (e.g., proportion of animals testing positive in each herd) with only group-level predictors. Its closed-form likelihood makes it computationally simple, and it directly estimates the ICC. However, it cannot accommodate individual-level predictors or multiple hierarchical levels.
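To make the closed-form likelihood and the direct ICC estimate concrete, here is a small sketch. It assumes the standard (alpha, beta) parameterisation, in which the mean proportion is alpha / (alpha + beta) and the ICC is 1 / (alpha + beta + 1). The herd data and parameter values are hypothetical; in practice alpha and beta would be estimated by maximising this log-likelihood.

```python
import math

def log_beta_fn(a, b):
    """Log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def betabinom_logpmf(y, n, alpha, beta):
    """Log probability of y positives out of n under a beta-binomial."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + log_beta_fn(y + alpha, n - y + beta) - log_beta_fn(alpha, beta)

def betabinom_icc(alpha, beta):
    """Intra-class correlation implied by the beta mixing distribution."""
    return 1.0 / (alpha + beta + 1.0)

# Hypothetical grouped data: (positives, animals tested) per herd
herds = [(2, 20), (0, 15), (7, 25), (1, 18)]
alpha, beta = 2.0, 8.0  # hypothetical parameter values, not estimates

loglik = sum(betabinom_logpmf(y, n, alpha, beta) for y, n in herds)
print(f"mean proportion = {alpha / (alpha + beta):.3f}")
print(f"ICC = {betabinom_icc(alpha, beta):.4f}")
print(f"log-likelihood = {loglik:.4f}")
```

No numerical integration appears anywhere: the beta distribution is integrated out analytically, which is why the model is cheap to fit.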

Simulation evidence on estimation methods

Simulation studies consistently show that first-order MQL can be markedly biased, underestimating both fixed effects and variance components (Breslow & Clayton, 1993; Bolker et al., 2009). Second-order PQL performs better but can still be biased with small cluster sizes or large variances. ML with adaptive quadrature is generally the most accurate (Pinheiro & Chao, 2006), though computational constraints may require starting with Laplace approximation.

Choosing the right estimation method

Start with Laplace (the default in many packages). If feasible, increase to adaptive quadrature with 7+ points and check that estimates are stable. If ML is computationally prohibitive (e.g., many random effects), use second-order PQL and compare results with Laplace. Always report which estimation method was used and, ideally, show sensitivity to the choice.
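The sketch below illustrates what "check that estimates are stable" means at the level of a single cluster's likelihood contribution. It compares the Laplace approximation with brute-force trapezoid integration over the random intercept, which stands in here for a many-point quadrature rule; it is not an implementation of adaptive Gauss-Hermite quadrature. The cluster (4 positives out of 5), the fixed linear predictor `eta`, and `sigma` are hypothetical, and the binomial coefficient is omitted because it is a constant in both calculations.

```python
import math

def log_integrand(u, y, n, eta, sigma):
    """Log of the binomial kernel times the N(0, sigma^2) density at u."""
    x = eta + u
    log_p = -math.log1p(math.exp(-x))   # log of expit(x)
    log_q = -math.log1p(math.exp(x))    # log of 1 - expit(x)
    log_norm = -0.5 * math.log(2 * math.pi * sigma ** 2) - u ** 2 / (2 * sigma ** 2)
    return y * log_p + (n - y) * log_q + log_norm

def curvature(u, n, eta, sigma):
    """Second derivative of the log integrand with respect to u."""
    p = 1.0 / (1.0 + math.exp(-(eta + u)))
    return -n * p * (1 - p) - 1.0 / sigma ** 2

def laplace_cluster_lik(y, n, eta, sigma, iters=50):
    """Laplace approximation: Gaussian fitted at the mode of the integrand."""
    u = 0.0
    for _ in range(iters):  # Newton steps towards the mode
        p = 1.0 / (1.0 + math.exp(-(eta + u)))
        grad = y - n * p - u / sigma ** 2
        u -= grad / curvature(u, n, eta, sigma)
    hess = curvature(u, n, eta, sigma)
    return math.exp(log_integrand(u, y, n, eta, sigma)) * math.sqrt(2 * math.pi / -hess)

def numeric_cluster_lik(y, n, eta, sigma, half_width=10.0, steps=4000):
    """Reference value by trapezoid integration over +/- half_width * sigma."""
    a = -half_width * sigma
    h = 2 * half_width * sigma / steps
    vals = [math.exp(log_integrand(a + i * h, y, n, eta, sigma)) for i in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# Hypothetical small cluster: 4 positives out of 5, eta = 0, sigma = 1
lap = laplace_cluster_lik(4, 5, 0.0, 1.0)
ref = numeric_cluster_lik(4, 5, 0.0, 1.0)
print(f"Laplace = {lap:.6f}, reference = {ref:.6f}, relative diff = {abs(lap - ref) / ref:.4%}")
```

Rerunning with a larger `sigma` or a smaller cluster gives a feel for when the two values drift apart, which is the situation in which moving beyond Laplace is most worthwhile.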

Reflection

Compare the GLMM approach to the beta-binomial approach for modeling clustered binary data. In what situations would each be preferred?

Model answer

GLMM with a binomial outcome: explicit hierarchical structure with between-cluster random effects; allows subject-specific interpretation; supports both individual-level and cluster-level covariates; and can predict for new clusters. It is useful when (a) you want to quantify between-cluster variance, (b) the clustering has a clear hierarchical meaning, and (c) there are enough clusters and enough observations per cluster to estimate the random-effects variance. Beta-binomial: cluster-level probabilities follow a beta distribution that is integrated out in closed form, so no random effects are estimated explicitly and overdispersion is captured by a single parameter. It is preferred when (a) the data are grouped counts with only group-level predictors and overdispersion is the only departure from the binomial, (b) the sample is small and the GLMM does not converge, or (c) you want a simpler model. The GLMM provides richer information but requires more data and computational care; the beta-binomial is a parsimonious alternative for clustered binary data with simple overdispersion. Compare AIC and residual diagnostics to choose between them.


HSCI 410, Lesson 11
