Modelling Survival Data

Exploratory Data Analysis For Epidemiology

Learning objectives for this lesson:

Distinguish between non-parametric, semi-parametric, and parametric analyses of survival data
Carry out non-parametric analyses using actuarial and Kaplan-Meier life tables
Generate and interpret survivor and cumulative hazard function graphs
Understand relationships among survivor S(t), failure F(t), hazard h(t), and cumulative hazard H(t) functions
Fit and interpret a Cox proportional hazards model including hazard ratios
Evaluate the proportional hazards assumption using graphical and statistical methods
Incorporate time-varying covariates and stratified analyses in Cox models
Describe parametric survival models (exponential, Weibull, Gompertz) and accelerated failure time models
Understand frailty models for accounting for unmeasured covariates

This course was developed by Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University based on Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Reference

Glossary: Key Terms, People & Concepts


This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.

Key Concepts & Ideas

Survival analysis: A family of statistical methods for analyzing the time until an event of interest occurs (e.g., death, recurrence, recovery). Handles incomplete follow-up via censoring and uses the survival function S(t) and hazard function h(t) as central quantities.

Time-to-event outcome: An outcome defined by both whether an event occurred and how long it took. Distinct from binary or count outcomes because it carries information about timing as well as occurrence.

Censoring: Incomplete observation of an individual's event time. The most common form is right censoring: we know the event has not occurred up to the last time observed, but we do not know if or when it occurs afterward.

Right censoring: The event has not yet occurred by the end of follow-up; the true event time is known only to be greater than the observed time. Standard survival methods assume censoring is non-informative (independent of the underlying hazard).

Left censoring: The event has already occurred before the subject entered observation; we know it happened before some time t but not exactly when.

Interval censoring: The event is known to have occurred between two observation times but the exact moment is unknown (common with periodic screening or follow-up visits).

Left truncation (delayed entry): Subjects only enter the risk set after some time origin has already passed; those who experienced the event earlier are never observed. Different from left censoring and requires explicit handling in the risk-set construction.

Survival function S(t): The probability that an individual survives (is event-free) beyond time t. It is monotonically non-increasing, equal to 1 at t = 0, and approaches 0 as t increases.

Hazard function h(t): The instantaneous rate of the event at time t given survival up to t. Unlike a probability, it can take any non-negative value and characterizes the risk profile over time.

Cumulative hazard H(t): The integral of the hazard from 0 to t. Related to the survival function by S(t) = exp(−H(t)).

Hazard ratio (HR): The ratio of hazard functions for two groups (or for a one-unit change in a covariate). Constant over time under the proportional hazards assumption. HR = 1 means equal hazards; HR > 1 means higher risk.

Proportional hazards assumption: Assumption underlying Cox regression: the ratio of hazards between any two individuals is constant over time. Tested using Schoenfeld residuals or by including time-by-covariate interactions.

Time-varying covariate: A predictor whose value can change during follow-up (e.g., treatment status, current dose). Requires special data structuring (counting process / start-stop format) for Cox regression.

Frailty: A random effect added to a survival model to account for unobserved heterogeneity or clustering (e.g., shared genes within families, shared exposures within communities).

Competing risks: A setting where multiple mutually exclusive event types can occur, and the occurrence of one prevents the others (e.g., death from different causes). Standard Kaplan–Meier overestimates the cumulative incidence in this setting; subdistribution methods are preferred (Fine & Gray, 1999).

Methods & Statistical Concepts

Kaplan–Meier (KM) estimator: A non-parametric estimator of the survival function. Builds a step function with drops at each observed event time, properly accounting for censoring through the risk set at each time.

Nelson–Aalen estimator: A non-parametric estimator of the cumulative hazard function. A survival estimate can be derived from it via S(t) = exp(−Ĥ(t)) (the Fleming–Harrington estimator); it is close to, but not identical to, the Kaplan–Meier estimate.

Log-rank test: A non-parametric test comparing survival curves across two or more groups. Most powerful when proportional hazards holds; weighted variants (e.g., Wilcoxon, Peto) emphasize early or late differences.

Cox proportional hazards regression: A semi-parametric model: h(t|x) = h₀(t) · exp(β'x). The baseline hazard h₀(t) is left unspecified; coefficients are estimated by maximizing the partial likelihood, and exp(β) is the hazard ratio.

Partial likelihood: The likelihood function used in Cox regression. Conditions on the observed event times so the baseline hazard cancels out, allowing inference on covariate effects without specifying h₀(t).

Schoenfeld residuals: Residuals from a Cox model used to check the proportional hazards assumption. Plotting them against time should show no trend if the assumption holds.

Stratified Cox model: A Cox model that allows the baseline hazard to differ across strata while constraining covariate effects to be the same. Used when proportional hazards fails for a stratification variable.

Parametric survival models: Models that specify a distribution for the event time (exponential, Weibull, Gompertz, log-logistic, log-normal). Provide more efficient estimates if the distribution is correct and allow extrapolation beyond observed times.

Weibull model: A flexible parametric survival distribution with a hazard that increases or decreases monotonically with time. Can be parameterized as either a proportional hazards or accelerated failure time model.

Accelerated failure time (AFT) model: A class of parametric survival models in which covariates act multiplicatively on the time scale rather than the hazard. exp(β) is interpreted as a time-acceleration factor.

Risk set: At each event time t, the set of subjects who are still under observation and have not yet experienced the event. Central to KM, log-rank, and Cox partial likelihood calculations.

Key People

Edward L. Kaplan (1920–2006): American mathematician who, with Paul Meier, introduced the product-limit estimator of the survival function in 1958 (Kaplan & Meier, 1958). The Kaplan–Meier paper is one of the most cited statistics papers ever.

Paul Meier (1924–2011): American biostatistician who co-developed the Kaplan–Meier estimator (Kaplan & Meier, 1958) and was a leading advocate for randomized clinical trials and rigorous statistical methods in medicine.

Sir David R. Cox (1924–2022): British statistician who introduced the proportional hazards model and the partial likelihood in his landmark 1972 paper (Cox, 1972). His work transformed survival analysis and made covariate-adjusted analysis tractable for time-to-event data.

Wayne Nelson: American statistician who proposed the cumulative hazard estimator (1969, 1972) used in reliability and survival contexts; later extended by Odd Aalen.

Odd O. Aalen (b. 1947): Norwegian statistician who built the modern counting-process foundation for survival analysis and developed the additive hazards model and Nelson–Aalen estimator.


Section 1

Introduction & Non-Parametric Analyses

⏱ Estimated time: 20 minutes

Lesson 8

Modelling Survival Data

Time-to-event outcomes, from censoring and Kaplan–Meier curves to the Cox model and beyond.

Where we are

A new outcome type

Previous lessons handled continuous, binary, and count outcomes. This lesson adds a structural feature those methods cannot handle: time-to-event data with incomplete follow-up.

The outcome now carries two pieces of information: whether the event occurred, and how long it took. For participants still event-free at the study's end, we have partial information rather than no information.

Section 1 of 4

Introduction & Non-Parametric Analyses

Censoring defined; actuarial life tables; the Kaplan–Meier and Nelson–Aalen estimators.

The central problem

Why censoring breaks ordinary regression

Dropping censored rows

Wastes the partial information we have and typically biases estimates, especially if censoring is related to risk.

Using censoring time as event time

Treats the person as if they had the event when last seen. Systematically biases survival toward shorter times.

Survival methods use both kinds of observation correctly: event times give full information; censored times contribute partial information through the risk set.

Types of censoring

Right, left, and interval censoring

Right censoring

Event not yet observed. True event time exceeds last observation. By far the most common form.

Left censoring

Event occurred before study entry. Exact time unknown; we know only it preceded observation.

Interval censoring

Event occurred between two visits. Common with periodic screening or scheduled follow-up.

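Truncation differs: under truncation, subjects whose event falls outside the observable window never enter the study at all (e.g., with delayed entry, those who have the event before enrolment are never seen). Standard methods assume right censoring that is non-informative.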

Non-parametric estimators

Actuarial tables and the Kaplan–Meier curve

Each drop in the step function corresponds to an observed event. The curve stays flat between events. Censored observations contribute to the risk set up to their censoring time but do not produce a drop.

The product-limit formula

Kaplan–Meier and Nelson–Aalen

Kaplan–Meier survival estimate

\[ \color{#0B7B6B}{\hat{S}(t)} = \prod_{j:\, t_j \le t} \frac{\color{#1D4ED8}{r_j} - \color{#C2410C}{d_j}}{\color{#1D4ED8}{r_j}} \]

Ŝ(t): estimated survival; d_j: number of events at time t_j; r_j: number at risk at time t_j

Nelson–Aalen cumulative hazard estimate

\[ \color{#6D28D9}{\hat{H}(t)} = \sum_{j:\, t_j \le t} \frac{\color{#C2410C}{d_j}}{\color{#1D4ED8}{r_j}} \qquad \Rightarrow \qquad \color{#0B7B6B}{\hat{S}(t)} = e^{-\color{#6D28D9}{\hat{H}(t)}} \]

Ĥ(t): cumulative hazard; d_j: events; r_j: number at risk; Ŝ(t): survival

At each event time \(t_j\): \(r_j\) is the number still at risk; \(d_j\) is the number of events. The two estimators give slightly different survival curves but converge as sample size grows.

Carry forward

What to take into the next section

Censoring is partial information, handled correctly by survival methods through the risk set at each event time.
Kaplan–Meier describes survival non-parametrically; the step function drops at each observed event and is flat between them.
Nelson–Aalen estimates the cumulative hazard, the scale on which regression models for survival will operate.

Introduction and Overview

Earlier lessons stepped through regression for continuous, binary, ordered/multi-category, and count outcomes. Each new outcome type required a new likelihood and a new way of relating predictors to the response. This lesson takes on one of the most distinctive outcome types in epidemiology: time to an event. The wrinkle is that, in real studies, we rarely observe every event, since people are lost to follow-up, the study ends, or other events intervene. That incomplete information, called censoring, is what makes survival analysis its own toolbox rather than a special case of linear or count regression.

The four content sections work from the ground up. This section introduces censoring, life tables, and the non-parametric Kaplan–Meier estimator, the workhorses for describing survival without making distributional assumptions. The second section formalises the survivor, failure, hazard, and cumulative hazard functions and shows how they are mathematically linked. The third section introduces the Cox proportional hazards model, the most widely used regression for time-to-event data, and the proportional hazards assumption it relies on. The fourth section turns to fully parametric and accelerated failure time (AFT) models, plus frailty extensions for unmeasured heterogeneity. By the end you will know which estimator to reach for given the structure of your data and the question you are asking.

Learning Objectives

Define time-to-event data and explain why standard regression methods cannot handle censoring.
Distinguish right, left, and interval censoring from left truncation, and identify each in study descriptions.
Construct and interpret an actuarial life table for grouped survival data.
Estimate and plot a Kaplan–Meier survival curve with a 95% confidence band.
Explain how the Nelson–Aalen estimator complements Kaplan–Meier on the cumulative-hazard scale.

What Is Survival Analysis?

Survival analysis (also called time-to-event analysis) is concerned with a specific type of outcome: the time that passes before a particular event of interest occurs. The event might be death, disease onset, recovery, relapse, or any other well-defined transition. What distinguishes survival data from other continuous outcomes is the presence of censoring, the incomplete information about when the event occurred for some individuals.

Survival data have two important structural features. First, survival time is bounded below by zero, since the time to an event can never be negative. Second, the distribution of survival times is typically right-skewed, meaning that a few individuals have very long survival times that pull the mean to the right. These features make standard linear regression inappropriate for modelling survival outcomes.

Understanding Censoring

Censoring is a unique and defining feature of survival analysis. It occurs when we have incomplete information about a subject's survival time. A right-censored observation, the most common kind, tells us that the true event time exceeds the observed follow-up time, but not what the exact event time is. Standard regression methods cannot properly handle censored observations. Survival analysis methods are specifically designed to extract the maximum information from both censored and uncensored observations.

Types of Censoring

There are several types of censoring, each arising from different circumstances. Understanding the type of censoring present in your data is critical for choosing the appropriate analytical method.

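Right censoring: the event has not been observed by the end of follow-up, so the true event time exceeds the last observation.

Interval censoring: the event occurred between two observation times, but the exact moment is unknown.

Left censoring: the event occurred before the subject entered observation, so the exact time is unknown.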

Truncation vs Censoring

Truncation differs from censoring in an important way. With censoring, we know the subject exists but have incomplete event time information. With truncation, the subject is entirely excluded from the study because their event time falls outside an observable window. Left truncation (delayed entry) occurs when subjects are only observed if they have survived long enough to enter the study. This distinction affects how the risk set is calculated.

Quantifying Survival Time

Several summary measures can describe survival data:

Mean survival time: often difficult to estimate accurately because the longest survival times may be censored
Median survival time: the time at which 50% of subjects have experienced the event; more robust to censoring than the mean
n-year survival: the proportion of subjects surviving beyond a specified time point (e.g., 5-year survival rate)
Incidence rate: the number of events divided by the total person-time at risk
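
As a minimal sketch of how three of these summaries can be computed, the Python below uses hypothetical helper names (`incidence_rate`, `median_survival`, `n_year_survival`) and made-up numbers; the survival curve is assumed to be supplied as (time, estimated survival) pairs, such as those from a Kaplan-Meier analysis.

```python
def incidence_rate(times, events):
    """Events per unit of person-time; censored subjects still add person-time."""
    return sum(events) / sum(times)


def median_survival(curve):
    """First time at which estimated survival is at or below 0.5.

    curve : list of (time, survival) pairs in increasing time order.
    Returns None if the curve never reaches 0.5 (e.g. heavy censoring).
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def n_year_survival(curve, t_star):
    """Survival estimate in force at t_star for a step-function curve."""
    s_at = 1.0
    for t, s in curve:
        if t <= t_star:
            s_at = s
    return s_at


# Hypothetical data: three subjects followed for 2, 4 and 4 years,
# with events for the first and third (1 = event, 0 = censored).
rate = incidence_rate([2, 4, 4], [1, 0, 1])   # 2 events / 10 person-years

# Hypothetical step-function survival estimates.
curve = [(1, 0.8), (2, 0.6), (3, 0.4)]
print(rate, median_survival(curve), n_year_survival(curve, 2.5))
```

Returning `None` when the curve never drops to 0.5 reflects a practical point: the median is robust to censoring, but it is undefined if fewer than half of the subjects are estimated to have had the event by the end of follow-up.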

Three Approaches to Survival Analysis

There are three broad approaches to analysing survival data, each with different assumptions and capabilities:

Non-parametric methods: make no assumptions about the shape of the survival or hazard function (e.g., Kaplan-Meier, actuarial life tables)
Semi-parametric methods: leave the baseline hazard unspecified while modelling the effect of predictors parametrically (e.g., Cox proportional hazards model)
Parametric methods: specify a particular distributional form for the baseline hazard (e.g., exponential, Weibull, Gompertz models)

Actuarial Life Tables

The actuarial (or life-table) method is one of the oldest approaches to estimating survival. Time is divided into pre-specified intervals, and survival is estimated within each interval. The key quantities in an actuarial life table are:

Symbol	Quantity	Description
l_j	Starting number at risk	Number of subjects alive at the start of interval j
w_j	Withdrawals	Number censored (withdrawn) during interval j
r_j	Adjusted risk set	r_j = l_j − w_j/2 (assumes withdrawals at midpoint)
d_j	Failures (events)	Number experiencing the event during interval j
q_j	Risk of failure	q_j = d_j / r_j
p_j	Survival probability	p_j = 1 − q_j
S_j	Cumulative survival	Product of all p_j values up to interval j
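
The table's columns can be computed in a few lines. The sketch below is illustrative only: the function name `actuarial_life_table` and the cohort numbers are hypothetical, and it assumes each interval is described by its withdrawals w_j and failures d_j.

```python
def actuarial_life_table(n_start, intervals):
    """Build an actuarial life table.

    n_start   : number at risk at the start of the first interval
    intervals : list of (w_j, d_j) pairs, i.e. withdrawals and failures
                in each successive interval
    """
    rows = []
    l_j = n_start
    cum_surv = 1.0
    for w_j, d_j in intervals:
        r_j = l_j - w_j / 2      # adjusted risk set (withdrawals at midpoint)
        q_j = d_j / r_j          # risk of failure in the interval
        p_j = 1 - q_j            # conditional survival probability
        cum_surv *= p_j          # cumulative survival S_j
        rows.append({"l": l_j, "w": w_j, "r": r_j, "d": d_j,
                     "q": q_j, "p": p_j, "S": cum_surv})
        l_j = l_j - w_j - d_j    # those still followed enter the next interval
    return rows


# Hypothetical cohort of 100 subjects followed over three intervals.
table = actuarial_life_table(100, [(10, 5), (8, 10), (6, 7)])
for j, row in enumerate(table, start=1):
    print(j, row["l"], row["r"], round(row["q"], 4), round(row["S"], 4))
```

In the first interval, for example, the adjusted risk set is 100 − 10/2 = 95, so q_1 = 5/95 and S_1 = 90/95.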

The Kaplan-Meier Estimator

The Kaplan-Meier (KM) estimator, also called the product-limit estimator, is the most widely used non-parametric method for estimating the survivor function (Kaplan & Meier, 1958). Unlike the actuarial method, the KM estimator recalculates the survival probability at each actual event time rather than at pre-specified intervals.


Kaplan–Meier estimator (Eq 19.1)

\[ \color{#0B7B6B}{\hat{S}(t)} = \prod_{j:\, t_j \le t} \frac{\color{#1D4ED8}{r_j} - \color{#C2410C}{d_j}}{\color{#1D4ED8}{r_j}} \]

The estimated survival at time t is the running product, over every event time up to and including t, of the number still at risk minus the number of events, divided by the number at risk.

Key properties of the Kaplan-Meier estimator:

It is a step function that only changes at observed failure times
It is piecewise constant between events
It is non-increasing: the estimated survival can only stay the same or decrease over time
It is right-continuous: at each event time, the function takes the value after the drop

Worked example: building the step by hand

Suppose 10 people are event-free at the start. At year 1, 2 of them have the event, so the chance of getting through year 1 is (10 − 2) / 10 = 0.80. Of the 8 still at risk, 1 has the event at year 2, so the chance of getting through year 2, given survival to its start, is (8 − 1) / 8 = 0.875. Kaplan–Meier multiplies these conditional pieces: S(1) = 0.80 and S(2) = 0.80 × 0.875 = 0.70. The estimate holds at 1.0 until year 1, steps down to 0.80, stays flat, then steps down to 0.70 at year 2. Each step uses only the people still at risk just before it, which is exactly how censored subjects keep contributing to the denominator up to the moment they leave.
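
The same arithmetic can be sketched in plain Python. The function name `kaplan_meier` is hypothetical, and because the example does not say what happens to the remaining seven people, the code assumes (purely for illustration) that they are all censored at year 3.

```python
def kaplan_meier(times, events):
    """Product-limit estimate at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event was observed, 0 if censored
    Returns a list of (t_j, r_j, d_j, S(t_j)) tuples.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0
        # Count events and censorings tied at time t.
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            c += 1 - data[i][1]
            i += 1
        if d > 0:
            surv *= (n_at_risk - d) / n_at_risk   # conditional survival at t
            steps.append((t, n_at_risk, d, surv))
        n_at_risk -= d + c   # events and censored subjects leave the risk set
    return steps


# Worked example: 2 events at year 1, 1 event at year 2.
# Assumption: the other 7 subjects are censored at year 3.
times = [1, 1, 2, 3, 3, 3, 3, 3, 3, 3]
events = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
for t, r, d, s in kaplan_meier(times, events):
    print(f"t={t}  at risk={r}  events={d}  S(t)={s:.3f}")
```

The censored subjects never trigger a step, but they stay in `n_at_risk` until their censoring time, which is how they contribute to the denominators at years 1 and 2.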

Figure: Two Kaplan–Meier survival curves, one per group. Each is a downward step function that drops only at observed event times and stays flat between them; the lower-hazard group stays higher for longer. Short vertical tick marks indicate censored subjects, who contribute to the risk set up to their censoring time.

⌛ Interactive: Kaplan-Meier & Censoring

A study with n patients followed up to time T. Each patient has a true (latent) event time and a censoring time. Adjust event hazard, censoring rate, and follow-up; watch the K-M curve build step-by-step. Crank up informative censoring to see what happens when censoring is correlated with risk.


Try this: keep censoring ≈ 0.05 and crank "informative censoring" to 1.0. Sicker patients drop out, so the K-M curve increasingly overestimates survival, because censoring is no longer non-informative.

The Nelson-Aalen Estimator

While the Kaplan-Meier estimator directly estimates the survivor function S(t), the Nelson-Aalen estimator provides a non-parametric estimate of the cumulative hazard function H(t).

Nelson–Aalen cumulative hazard estimator (Eq 19.2)

\[ \color{#0B7B6B}{\hat{H}(t)} = \sum_{j:\, t_j \le t} \frac{\color{#C2410C}{d_j}}{\color{#1D4ED8}{r_j}} \]

The cumulative hazard up to time t sums, across earlier event times, the number of events divided by the number at risk.

The Nelson-Aalen estimator sums the ratio of events to the risk set at each failure time up to time t. It estimates the expected number of events that would have occurred up to time t if the process could be repeated. Note that the survivor function can be estimated from the cumulative hazard as \(S(t) = e^{-H(t)}\).
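
A short sketch of the calculation, assuming the data have already been summarised as (event time, number at risk, number of events) triples. The function name `nelson_aalen` is hypothetical, and the numbers reuse the small cohort from the Kaplan-Meier worked example (10 at risk with 2 events at year 1, then 8 at risk with 1 event at year 2).

```python
import math


def nelson_aalen(risk_table):
    """Cumulative hazard from (t_j, r_j, d_j) triples in time order.

    Returns a list of (t_j, H(t_j), exp(-H(t_j))) tuples.
    """
    cum_hazard = 0.0
    out = []
    for t_j, r_j, d_j in risk_table:
        cum_hazard += d_j / r_j          # add events / number at risk
        out.append((t_j, cum_hazard, math.exp(-cum_hazard)))
    return out


# Same small cohort as the Kaplan-Meier worked example.
risk_table = [(1, 10, 2), (2, 8, 1)]
for t, H, S in nelson_aalen(risk_table):
    print(f"t={t}  H(t)={H:.3f}  exp(-H(t))={S:.3f}")
```

Here H(1) = 0.2 and H(2) = 0.325, so exp(−H) is roughly 0.819 and 0.723, slightly above the Kaplan-Meier values of 0.80 and 0.70. The gap is most visible in small risk sets like this one and shrinks as the number at risk grows.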

Reflection

Why is censoring a unique challenge in survival analysis compared to other regression approaches? How might ignoring censored observations bias your results?

Model answer: Censoring is unique because the outcome is only partially observed: for a censored individual you know they survived at least until their censoring time, but not their actual event time. Standard regression methods treat the outcome as fully observed, so the analyst would have to either (a) drop censored observations (losing 30–50% of the data in many cohorts) or (b) use the censoring time as if it were the event time, which biases survival systematically toward shorter times. Either shortcut produces biased effect estimates and underestimated survival probabilities, and it can inflate apparent treatment effects whenever censoring is differential across groups. Survival methods (Kaplan-Meier, log-rank, Cox) handle censoring correctly by using the partial information each observation provides up to its censoring time.


Section 2

Survivor, Failure & Hazard Functions

⏱ Estimated time: 20 minutes

Section 2 of 4

Survivor, Failure & Hazard Functions

The four interlocking views of a time-to-event distribution, and the hazard shapes they imply.

Two complementary views

Survivor and failure functions

Survivor function

\[ \color{#0B7B6B}{S(t)} = P(\color{#C2410C}{T} \ge \color{#6D28D9}{t}) \qquad S(0) = 1, \quad S(t) \downarrow \text{ as } t \uparrow \]

S(t): survival probability; T: event time; t: elapsed time

Failure function (cumulative distribution)

\[ \color{#0B7B6B}{F(t)} = 1 - \color{#C2410C}{S(t)} = P(T < t) \]

F(t): probability the event has occurred; S(t): survivor function

A steep decline in \(S(t)\) means events cluster early; a gradual decline means they spread across follow-up. The Kaplan–Meier step function is a direct non-parametric estimate of \(S(t)\).

The key quantity

The hazard function

Hazard function (instantaneous rate)

\[ \color{#0B7B6B}{h(t)} = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{\color{#C2410C}{f(t)}}{\color{#6D28D9}{S(t)}} \]

h(t): instantaneous event rate; f(t): event density; S(t): survivor function

The hazard \(h(t)\) is a rate, not a probability. It can exceed 1. A high hazard at time \(t\) means events are occurring rapidly among those still at risk at that moment. The cumulative hazard integrates this over time:

Cumulative hazard

\[ \color{#6D28D9}{H(t)} = -\ln \color{#0B7B6B}{S(t)} \qquad \Leftrightarrow \qquad \color{#0B7B6B}{S(t)} = e^{-\color{#6D28D9}{H(t)}} \]

H(t): cumulative hazard; S(t): survivor function

Hazard shapes

What the shape tells you

Constant \(h(t) = \lambda\)

Exponential distribution. Memoryless: risk same regardless of time survived.

Increasing \((p>1)\)

Weibull with shape above 1. Typical for chronic diseases and ageing processes.

Decreasing \((p<1)\)

Weibull with shape below 1. Post-surgical mortality, early infant mortality.

Comparing groups

Tests for survival curves

Log-rank

Equal weights across all times. Most powerful when hazard ratio is constant throughout follow-up.

Wilcoxon (Breslow)

Weights by risk-set size. Emphasises early differences when most participants are still at risk.

Peto–Peto–Prentice

Uses Kaplan–Meier estimate as weights. More robust when censoring differs between groups.

The Tarone–Ware test is a compromise between log-rank and Wilcoxon. If the two curves cross mid-follow-up, proportional hazards does not hold and a single log-rank statistic can mislead; time-stratified analyses are then appropriate.

Carry forward

What to take into the next section

The four functions \(S(t)\), \(F(t)\), \(h(t)\), \(H(t)\) are all equivalent views of the same distribution; knowing one determines all others.
The hazard is an instantaneous rate conditional on survival, not a probability, and it can take any non-negative value, including values above 1.
Hazard shapes carry substantive meaning: constant suggests a memoryless process; increasing suggests accumulating risk with time.

Introduction and Overview

From description to mathematics. Section 1 used Kaplan–Meier curves and life tables to describe survival in a sample. Those descriptions are useful, but to compare groups, build regression models, or simulate survival processes we need a more formal language. This section introduces the four interlocking functions that make up that language: the survivor S(t), failure F(t), hazard h(t), and cumulative hazard H(t) functions. Each describes the same underlying time-to-event distribution from a different angle, and each becomes the foundation for a different modelling approach in later sections.

Learning Objectives

Define the survivor, failure, hazard, and cumulative hazard functions and state their mathematical relationships.
Interpret the hazard function as an instantaneous rate conditional on survival to time t.
Recognise common hazard shapes (constant, increasing, decreasing, bathtub) and the substantive processes they imply.
Use log-rank, Wilcoxon, and stratified tests to compare survival curves between groups.

The Survivor Function

The survivor function (also called the survival function) is the central quantity in survival analysis, with the hazard and density functions both derivable from it. It gives the probability that an individual survives beyond a specified time t.

Survivor function (Eq 19.3)

\[ \color{#0B7B6B}{S(t)} = P(\color{#C2410C}{T} \ge \color{#6D28D9}{t}) \]

The survivor function is the probability that the event time is at least t: the chance of surviving past t.

At time t = 0, S(0) = 1 (everyone is alive). As time increases, S(t) can only stay the same or decrease. If we follow subjects long enough, S(t) will typically approach zero. A steep decline in the survivor function indicates rapid event occurrence, while a gradual decline indicates slow event occurrence.

The Failure Function

The failure function (also called the cumulative distribution function, CDF) is simply the complement of the survivor function. It gives the probability that the event has already occurred by time t.

Failure function (Eq 19.4)

\[ \color{#0B7B6B}{F(t)} = 1 - \color{#C2410C}{S(t)} = P(T < t) \]

The failure function is one minus the survivor function: the probability the event has occurred by time t.

The Probability Density Function and Hazard Function

The probability density function f(t) = dF(t)/dt describes the instantaneous rate of failure events at time t. However, in survival analysis, the most important function is the hazard function, which conditions on survival up to time t.

Hazard function (Eq 19.5–19.7)

\[ \color{#0B7B6B}{h(t)} = \frac{\color{#C2410C}{f(t)}}{\color{#6D28D9}{S(t)}} = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} \]

The hazard is the density of events divided by the survivor function: the instantaneous event rate among those still at risk.

The hazard function gives the instantaneous rate of the event at time t, conditional on having survived to that point. It is a rate per unit time, rather than a probability, and can exceed 1. The hazard is the quantity most directly related to the biological or clinical mechanism driving events.

A concrete way to picture the hazard: stand at time t and look only at the people who are still event-free. The hazard measures how quickly the event is arriving in that group right at that instant, expressed per unit of time. Because it is a speed rather than a proportion, it has no ceiling of 1; a hazard of 2 per year simply means events are arriving quickly among those still at risk. Where the survivor function tracks how many remain event-free, the hazard tracks how fast the event is striking the ones who are left.

The Cumulative Hazard Function

The cumulative hazard integrates the hazard over time and has a direct relationship with the survivor function:

Cumulative hazard (Eq 19.8)

\[ \color{#0B7B6B}{H(t)} = -\ln \color{#C2410C}{S(t)} \]

The cumulative hazard is the negative natural log of the survivor function: total accumulated risk up to time t.

Key Relationships Among Functions

A useful feature of these functions is that knowing any one determines all the others (Eq 19.9–19.10). For example, from the survivor function alone we can derive: F(t) = 1 − S(t), H(t) = −ln S(t), f(t) = −dS(t)/dt, and h(t) = f(t)/S(t). This means that if we can estimate any one of these functions, we automatically have estimates of all the others.
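
The link between the hazard and the other three functions can be written out step by step. This is a short derivation, assuming T is continuous so that \(f(t) = -dS(t)/dt\) exists, as the formulas above already require:

\[ h(t) = \frac{f(t)}{S(t)} = \frac{-\,dS(t)/dt}{S(t)} = -\frac{d}{dt}\ln S(t) \]

\[ H(t) = \int_0^t h(u)\,du = -\ln S(t) + \ln S(0) = -\ln S(t) \qquad \text{since } S(0) = 1 \]

\[ S(t) = e^{-H(t)}, \qquad F(t) = 1 - e^{-H(t)}, \qquad f(t) = h(t)\,e^{-H(t)} \]

Starting from the hazard instead of the survivor function therefore recovers the same set of quantities, which is why models specified on the hazard scale still yield survival curves.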

Hazard Function Shapes

The shape of the hazard function tells us about the underlying process generating the events. Different parametric distributions correspond to different hazard shapes.

Constant Hazard (Exponential Distribution)

When the hazard is constant over time, with h(t) = λ, the underlying survival times follow an exponential distribution (Eq 19.11). The survivor function is \(S(t) = e^{-\lambda t}\). A constant hazard means the risk of the event is the same regardless of how long the subject has already survived. This implies a “memoryless” property: the risk at time t = 100 is the same as at time t = 1.

Example: Certain infectious diseases where the risk of contracting the infection is roughly constant over time, regardless of how long the individual has been at risk.

Increasing Hazard (Weibull with p > 1)

When the Weibull shape parameter p > 1, the hazard \(h(t) = \lambda p t^{p-1}\) is monotonically increasing over time (Eq 19.12). The survivor function is \(S(t) = \exp(-\lambda t^{p})\). This means the risk of the event grows as time passes, a phenomenon often seen in aging-related diseases or mechanical wear-out.

Example: Age-related diseases such as cancer or cardiovascular disease, where the risk increases progressively with duration of exposure or age.

Decreasing Hazard (Weibull with p < 1)

When p < 1, the Weibull hazard is monotonically decreasing over time. The risk is highest at the beginning of the observation period and diminishes as time progresses. This pattern is common in situations where the most vulnerable individuals fail early, leaving a “healthier” surviving cohort.

Example: Post-surgical mortality, where the risk is highest immediately after surgery and decreases as patients recover; or infant mortality, where risk is highest in the neonatal period.
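
The three shapes can be checked numerically from the Weibull formulas given above. This is an illustrative sketch: the function names and the parameter values (λ = 0.1 with p = 1, 2 and 0.5) are arbitrary choices, not values from the lesson.

```python
import math


def weibull_hazard(t, lam, p):
    """h(t) = lam * p * t**(p - 1)."""
    return lam * p * t ** (p - 1)


def weibull_survivor(t, lam, p):
    """S(t) = exp(-lam * t**p)."""
    return math.exp(-lam * t ** p)


lam = 0.1  # arbitrary illustrative scale parameter
for label, p in [("constant", 1.0), ("increasing", 2.0), ("decreasing", 0.5)]:
    hazards = [round(weibull_hazard(t, lam, p), 4) for t in (1, 4, 9)]
    print(f"p={p} ({label}): h(1), h(4), h(9) = {hazards}")

# The cumulative hazard recovered from S(t) matches lam * t**p.
t = 2.0
print(-math.log(weibull_survivor(t, lam, 2.0)), lam * t ** 2.0)
```

With p = 1 the Weibull hazard reduces to the constant λ of the exponential model, so the exponential distribution is the special case that separates the increasing and decreasing regimes.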

Comparing Survival Curves

When comparing the survival experience of two or more groups, several statistical tests are available. Each test weights the contributions of different time points differently.

Log-Rank Test

The log-rank test (Peto & Peto, 1972) assigns equal weight to all time points. It is the most commonly used test for comparing survival curves and is most powerful when the hazards are truly proportional (constant hazard ratio over time). It is equivalent to a Mantel-Haenszel test stratified by event time.

Wilcoxon (Breslow) Test

The Wilcoxon test (also called the Breslow or generalised Wilcoxon test) weights each time point by the number of subjects still at risk. This gives more weight to early differences in survival (when the risk set is large) and is more sensitive to detecting differences that occur in the early part of follow-up.

Tarone-Ware Test

The Tarone-Ware test uses the square root of the number at risk as the weight. It represents a compromise between the log-rank and Wilcoxon tests, giving moderate emphasis to early time points.

Peto-Peto-Prentice Test

The Peto-Peto-Prentice test (Peto & Peto, 1972) uses the Kaplan-Meier estimate of the survival function as weights. Like the Wilcoxon test, it is more sensitive to early differences but is less influenced by outlying late events. It is also more robust when censoring patterns differ between groups.
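The weighting schemes described above differ only in the weight attached to each event time. The sketch below implements a two-group weighted log-rank statistic by hand so that the weights are visible; the function name weighted_logrank and the toy data are hypothetical, and the statistic is referred to a chi-square distribution with 1 degree of freedom.

```r
library(survival)

# Toy two-group data (hypothetical values, for illustration only)
time   <- c(2, 3, 4, 5, 6, 8, 9, 11, 12, 15, 16, 18)
status <- c(1, 1, 0, 1, 1, 1, 0, 1,  1,  0,  1,  1)
group  <- c(1, 1, 1, 0, 1, 0, 1, 0,  1,  0,  0,  0)

weighted_logrank <- function(time, status, group,
                             weight = function(n) rep(1, length(n))) {
  event_times <- sort(unique(time[status == 1]))
  n  <- sapply(event_times, function(tj) sum(time >= tj))               # at risk, both groups
  n1 <- sapply(event_times, function(tj) sum(time >= tj & group == 1))  # at risk, group 1
  d  <- sapply(event_times, function(tj) sum(time == tj & status == 1)) # events, both groups
  d1 <- sapply(event_times, function(tj) sum(time == tj & status == 1 & group == 1))
  e1 <- d * n1 / n                                                      # expected events, group 1
  # Hypergeometric variance; set to 0 when only one subject remains at risk
  v  <- ifelse(n > 1, d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1), 0)
  w  <- weight(n)
  sum(w * (d1 - e1))^2 / sum(w^2 * v)
}

c(logrank     = weighted_logrank(time, status, group),
  wilcoxon    = weighted_logrank(time, status, group, weight = function(n) n),
  tarone_ware = weighted_logrank(time, status, group, weight = sqrt))

# Cross-check of the unweighted version against the survival package
survdiff(Surv(time, status) ~ group)$chisq
```

In the survival package, survdiff() with rho = 0 gives the log-rank test and rho = 1 gives the Peto and Peto modification of the Gehan-Wilcoxon test; the hand-rolled function here is only meant to expose the weights.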

Scenario: Comparing Post-Heart Attack Survival

Imagine you are comparing two-year survival after a heart attack between patients who received a new drug versus standard care. You plot Kaplan-Meier curves for both groups. The curves separate early (within the first 3 months) and then run roughly parallel. In this case, the log-rank test (which is most powerful under proportional hazards) would be appropriate if the hazard ratio appears roughly constant over follow-up; because the difference arises early, a Wilcoxon-type test would also be sensitive to it. However, if the curves crossed midway through follow-up, the proportional hazards assumption would clearly be violated, and you might report results from different tests or time-specific analyses.

Reflection

Consider how different hazard function shapes (constant, increasing, decreasing) might apply to different diseases or health conditions. What does the shape of the hazard tell us about the underlying biological process?

Model answer: Constant hazard (exponential distribution) is rare in real diseases: it implies the instantaneous risk of the event is the same throughout follow-up. It is sometimes appropriate for engineering failure or drug-side-effect studies where the underlying mechanism is memoryless. Increasing hazard (Weibull with shape > 1) is typical for chronic diseases like cancer or cardiovascular events: risk rises with age and exposure duration; this is the most common shape in epidemiology. Decreasing hazard is rare but occurs in some acute conditions where the riskiest period is early (e.g., post-surgical complications, transplant rejection in the first month). A bathtub shape (high early, low middle, high late) is seen in mortality across the human lifespan: high infant mortality, low adult mortality, and rising late-life mortality. The hazard shape tells you the timing of biological risk and informs intervention timing, since high-hazard periods are where surveillance and prevention should concentrate.


Section 3

Cox Proportional Hazards Model

⏱ Estimated time: 25 minutes

Section 3 of 4

The workhorse semi-parametric regression for time-to-event data: hazard ratios, partial likelihood, and model diagnostics.

The model

Cox proportional hazards regression

Cox model

\[ \color{#0B7B6B}{h(t \mid \mathbf{x})} = \color{#C2410C}{h_0(t)} \cdot \color{#6D28D9}{e^{\boldsymbol{\beta}^\top \mathbf{x}}} \]

h(t∣x): subject hazard; h₀(t): baseline hazard; e^(β′x): predictor effect

\(h_0(t)\) is the baseline hazard: the hazard when all predictors equal zero. Its shape is left entirely unspecified. \(e^{\boldsymbol{\beta}^\top \mathbf{x}}\) is the multiplicative effect of the covariate pattern.

Hazard ratio for a one-unit increase in predictor \(X\)

\[ \color{#0B7B6B}{\text{HR}} = \frac{\color{#C2410C}{h(t \mid X+1)}}{\color{#6D28D9}{h(t \mid X)}} = \color{#1D4ED8}{e^{\beta}} \]

HR: hazard ratio; e^β: effect per unit of X

Interpreting the output

Reading a Cox model table

Predictor	HR \(= e^{\hat{\beta}}\)	95% CI	P-value
Treatment (new drug)	0.63	0.48–0.82	0.001
Age (per 10 years)	1.42	1.21–1.66	<0.001
Stage III vs Stage I	3.00	2.04–4.44	<0.001

HR \(<1\) means lower hazard (protective); HR \(>1\) means higher hazard. The confidence interval must exclude 1 for a statistically significant result.

Estimation

Partial likelihood and ties

Partial likelihood contribution at event time \(t_j\)

\[ \color{#0B7B6B}{L_j(\boldsymbol{\beta})} = \frac{\color{#C2410C}{e^{\boldsymbol{\beta}^\top \mathbf{x}_j}}}{\color{#1D4ED8}{\sum_{i \in \mathcal{R}(t_j)} e^{\boldsymbol{\beta}^\top \mathbf{x}_i}}} \]

L_j(β): likelihood contribution at event time j; numerator: risk score of the subject who failed; denominator: summed risk scores of all subjects still at risk

\(\mathcal{R}(t_j)\) is the risk set at time \(t_j\): all subjects still under observation and event-free just before \(t_j\). The baseline hazard cancels in this ratio, so only covariate values matter.
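To see why the baseline hazard drops out, write each subject's hazard at \(t_j\) in full before simplifying. This is a sketch of the standard argument in the same symbols as the formula above, assuming no tied event times; the symbol \(\ell\) for the log partial likelihood is introduced here for convenience.

```latex
\[
L_j(\boldsymbol{\beta})
  = \frac{h_0(t_j)\, e^{\boldsymbol{\beta}^\top \mathbf{x}_j}}
         {\sum_{i \in \mathcal{R}(t_j)} h_0(t_j)\, e^{\boldsymbol{\beta}^\top \mathbf{x}_i}}
  = \frac{e^{\boldsymbol{\beta}^\top \mathbf{x}_j}}
         {\sum_{i \in \mathcal{R}(t_j)} e^{\boldsymbol{\beta}^\top \mathbf{x}_i}}
\]
\[
\ell(\boldsymbol{\beta})
  = \sum_{j} \ln L_j(\boldsymbol{\beta})
  = \sum_{j} \left[ \boldsymbol{\beta}^\top \mathbf{x}_j
      - \ln \sum_{i \in \mathcal{R}(t_j)} e^{\boldsymbol{\beta}^\top \mathbf{x}_i} \right]
\]
```

Because \(h_0(t_j)\) is common to every member of the risk set at \(t_j\), it factors out of the sum and cancels. The sum over \(j\) runs over event times only; censored subjects contribute through their membership of the risk sets.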

Breslow method

Simplest tie approximation; adequate when ties are rare.

Efron method

Better approximation; the default in R's coxph() (some other packages default to Breslow). Preferred for moderate ties.

Diagnostics

Testing proportional hazards

Schoenfeld residuals

One residual per predictor per event time. A non-zero slope when plotted against time flags a time-varying effect.

Log-cumulative hazard plot

Plot \(\ln H(t)\) against \(\ln t\) for each group. Parallel curves support proportional hazards; converging or crossing curves do not.

When proportional hazards fails: stratify on the offending variable (each stratum gets its own \(h_0(t)\) but shares \(\boldsymbol{\beta}\)), or add a predictor-by-time interaction term to model the changing effect explicitly.

Carry forward

What to take into the next section

The Cox model is semi-parametric: no baseline hazard assumption required; \(e^{\hat{\beta}}\) is the adjusted hazard ratio.
The proportional hazards assumption must be tested with Schoenfeld residuals; stratification or time interactions fix violations.
Cox models give relative hazards efficiently but cannot extrapolate absolute survival without additional estimation of \(h_0(t)\).

Introduction and Overview

From description to regression. Earlier sections gave us the language and the descriptive tools for time-to-event data. We can now describe a survival distribution, but the central epidemiologic question is comparative: does the hazard differ between exposure groups, and by how much, after adjusting for covariates? The Cox proportional hazards model is the regression workhorse for that question. It plugs predictors into the hazard function we just defined while sidestepping the need to specify the baseline hazard’s shape, a feature that makes it the default first model in most applied survival analyses.

Learning Objectives

Specify the Cox proportional hazards model and explain why it is called semi-parametric.
Interpret an exponentiated coefficient as an adjusted hazard ratio.
Estimate Cox model parameters using partial likelihood, and read standard software output for them.
Test the proportional hazards assumption with Schoenfeld residuals and respond to violations using stratification or time-varying coefficients.
Extend the basic Cox model to handle stratified baseline hazards and time-varying predictors.

The Cox Model

The Cox proportional hazards model is the most widely used regression model for survival data. Introduced by Sir David Cox (1972), it is a semi-parametric model: it models the effect of predictors on the hazard without making any assumptions about the shape of the baseline hazard function.

Cox proportional hazards model (Eq 19.13)

\[ \color{#0B7B6B}{h(t)} = \color{#C2410C}{h_0(t)} \cdot \color{#6D28D9}{e^{\beta X}} \]

The hazard for a subject equals the baseline hazard times a multiplicative effect of the predictors. The baseline shape is left unspecified.

In this model, h₀(t) is the baseline hazard (the hazard when all predictors are zero), and e^βX is the multiplicative effect of the predictor(s). The baseline hazard h₀(t) is left completely unspecified, so it can take any shape. This is the key advantage of the Cox model: no distributional assumption about the underlying survival times is needed.

The Hazard Ratio

Hazard ratio (Eq 19.14)

\[ \color{#0B7B6B}{\text{HR}} = \frac{\color{#C2410C}{h(t)}}{\color{#6D28D9}{h_0(t)}} = \color{#1D4ED8}{e^{\beta X}} \]

The hazard ratio is the subject's hazard relative to the baseline hazard, equal to the exponentiated linear predictor.

The hazard ratio (HR) is the primary measure of effect in the Cox model. For a one-unit increase in a predictor X, the hazard is multiplied by e^β. For example, if β = 0.693 for a treatment variable, then HR = e^0.693 = 2.0, meaning the treated group has twice the hazard (rate of events) compared to the reference group. An HR > 1 indicates increased hazard (shorter survival), while HR < 1 indicates decreased hazard (longer survival). Caution: under non-proportional hazards or selection effects induced by conditioning on survival, the HR can be a misleading effect measure (Hernán, 2010).

Log-hazard form (Eq 19.15)

\[ \color{#0B7B6B}{\ln h(t)} = \color{#C2410C}{\ln h_0(t)} + \color{#6D28D9}{\beta X} \]

On the log scale, the log hazard is the log baseline hazard plus a linear predictor, so predictor effects are additive in log-hazard.

The Proportional Hazards Assumption

The Cox model assumes that the hazard ratio between any two individuals is constant over time (Cox, 1972). This means the effect of a predictor does not change as time passes. For example, if treatment halves the hazard at 1 month, it must also halve the hazard at 12 months. When this assumption is violated, the estimated hazard ratio represents a weighted average of the time-varying effects, and its interpretation becomes ambiguous (Hernán, 2010). Testing this assumption (Schoenfeld, 1982) is a critical step in Cox model validation.

Estimation: Partial Likelihood

Because the baseline hazard is left unspecified, the Cox model cannot use standard maximum likelihood estimation. Instead, it uses partial likelihood (also called conditional likelihood), which estimates the β coefficients without needing to estimate h₀(t). The partial likelihood considers only the ordering of events. At each failure time, it asks: given the current risk set, what is the probability that this particular individual was the one to fail?
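The "which member of the risk set failed?" logic can be written out in a few lines. The sketch below maximises a hand-coded log partial likelihood for a single binary predictor and compares the result with coxph(); the toy data and the function name log_partial_lik are hypothetical, and the data contain no tied event times.

```r
library(survival)

# Toy data with no tied event times (hypothetical values)
time   <- c(2, 3, 5, 7, 8, 11, 12, 15)
status <- c(1, 1, 0, 1, 1, 1,  0,  1)
x      <- c(1, 0, 1, 1, 0, 0,  1,  0)

# For each event: log of (own risk score / summed risk scores in the risk set)
log_partial_lik <- function(beta) {
  events <- which(status == 1)
  sum(sapply(events, function(j) {
    risk_set <- time >= time[j]
    beta * x[j] - log(sum(exp(beta * x[risk_set])))
  }))
}

# Maximise over beta; no baseline hazard appears anywhere in the calculation
beta_hat <- optimize(log_partial_lik, interval = c(-5, 5),
                     maximum = TRUE, tol = 1e-8)$maximum
fit <- coxph(Surv(time, status) ~ x)

c(by_hand = beta_hat, coxph = unname(coef(fit)))
```

Only the ordering of the event times enters the calculation: replacing time with any strictly increasing transformation of it would leave the estimate unchanged.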

Handling Ties

When two or more events occur at exactly the same time (ties), the exact partial likelihood becomes computationally expensive. Several approximations are available:

Breslow method: the simplest approximation, adequate when ties are few
Efron method: a better approximation that is the default in many software packages
Exact methods: computationally intensive but most accurate when ties are common

Example: Cox Model Results

Predictor	β	SE(β)	HR (e^β)	95% CI	P-value
Treatment (1 = new drug)	−0.47	0.14	0.63	0.48–0.82	0.001
Age (per 10 years)	0.35	0.08	1.42	1.21–1.66	<0.001
Stage III vs I	1.10	0.20	3.00	2.04–4.44	<0.001
Stage II vs I	0.52	0.18	1.68	1.18–2.40	0.004

In this example, patients receiving the new drug have 37% lower hazard of death (HR = 0.63) compared to the control group, after adjusting for age and disease stage. Each 10-year increase in age is associated with a 42% increase in the hazard. Stage III patients have 3 times the hazard compared to Stage I patients.
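The hazard ratios, confidence intervals, and P-values in the table follow directly from β and SE(β). The sketch below reproduces them from the rounded values printed in the table, assuming Wald intervals and a two-sided Wald test.

```r
# Coefficients and standard errors copied from the example table
beta <- c(treatment = -0.47, age_per_10y = 0.35, stage_III = 1.10, stage_II = 0.52)
se   <- c(0.14, 0.08, 0.20, 0.18)

z  <- qnorm(0.975)                       # about 1.96 for a 95% interval
hr <- exp(beta)                          # hazard ratio
ci_lower <- exp(beta - z * se)           # interval built on the log scale, then exponentiated
ci_upper <- exp(beta + z * se)
p_value  <- 2 * pnorm(-abs(beta / se))   # two-sided Wald test

round(cbind(HR = hr, ci_lower, ci_upper, p_value), 3)
```

Small differences from the printed table (for example in the Stage III interval) reflect rounding of β and SE(β) to two decimals.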

Read a hazard ratio as a statement about the event rate at any given instant, not about the eventual chance of the event. HR = 0.63 does not mean that 37% of the drug group are spared overall; it means that at any moment during follow-up, a still-event-free patient on the drug faces about 63% of the event rate of a comparable control. This is the same rate-scale reading you used earlier for the incidence rate ratio, now applied to the instantaneous hazard.

Stratified Cox Models

Stratified Cox model (Eq 19.16)

\[ \color{#0B7B6B}{h_j(t)} = \color{#C2410C}{h_{0j}(t)} \cdot \color{#6D28D9}{e^{\beta X}} \]

Each stratum j gets its own baseline hazard but shares the same predictor effects, which relaxes proportional hazards for the stratifying variable.

When the proportional hazards assumption is violated for a particular variable, one solution is to stratify on that variable. In the stratified Cox model, each stratum j has its own baseline hazard h_0j(t), but the regression coefficients (β) are assumed to be the same across all strata. This allows the stratifying variable to have a completely flexible effect on survival without needing to estimate or specify its functional form.
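In R's survival package, stratification is requested with strata() in the model formula. The sketch below uses the lung dataset that ships with the package, and treats sex as the stratifying variable purely for illustration (there is no claim that sex violates proportional hazards in these data).

```r
library(survival)

# Sex as an ordinary covariate: one shared baseline hazard, one coefficient for sex
fit_adjusted   <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Sex as a stratifying variable: separate baseline hazards, shared age coefficient
fit_stratified <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)

coef(fit_adjusted)     # coefficients for age and sex
coef(fit_stratified)   # age only: sex is absorbed into the stratum-specific baselines
```

The price of stratifying is visible in the output: the stratified model returns no hazard ratio for sex, so stratify only on variables whose effect you do not need to report.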

Time-Varying Predictors and Effects

The standard Cox model assumes that predictor values are measured at baseline and remain constant. However, some predictors change over the course of follow-up:

Time-varying predictors: The actual value of the predictor changes during follow-up (e.g., a patient is discharged from hospital and their treatment status changes from “inpatient” to “outpatient”). This requires splitting the follow-up time into intervals and updating predictor values.
Time-varying effects: The predictor itself may be fixed at baseline, but its effect on the hazard changes over time. This is modelled by including a predictor × time interaction term and indicates a violation of the proportional hazards assumption.
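Both ideas rest on the same data layout: one row per subject per time interval. The sketch below uses survSplit() from the survival package on the veteran dataset that ships with it; the cut points (90 and 180 days) and the column names period and subject are arbitrary choices for illustration.

```r
library(survival)

# Split each subject's follow-up at 90 and 180 days: one row per subject per interval
vet_split <- survSplit(Surv(time, status) ~ ., data = veteran,
                       cut = c(90, 180), episode = "period", id = "subject")

head(vet_split[, c("subject", "tstart", "time", "status", "period", "karno")])

# Time-varying effect as a step function: a separate Karnofsky coefficient per period
fit_tv <- coxph(Surv(tstart, time, status) ~ trt + karno:strata(period),
                data = vet_split)
coef(fit_tv)
```

In the split data each interval runs from tstart to time, and the event indicator is 1 only on the interval in which the event occurred. A time-varying predictor would be handled in the same layout by updating its value from row to row.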

R Activity: Kaplan-Meier curves and a Cox model

The companion dataset phaa_followup.csv extends the survey cohort with cardiovascular event follow-up: fu_years (time observed) and cv_event (1 = event, 0 = censored). The full annotated script is in r-activities/HSCI_410_Lesson_8_Survival_Data.R.

library(survival);  library(survminer)
phaa <- read.csv("phaa_followup.csv", stringsAsFactors = FALSE)
phaa$smoker <- factor(phaa$smoker, levels = c("No","Yes"))

# 1. Build the Surv object: time + event indicator
y <- Surv(time = phaa$fu_years, event = phaa$cv_event)

# 2. Kaplan-Meier: event-free probability over time, by smoking status
km <- survfit(y ~ smoker, data = phaa)
km_all <- survfit(y ~ 1, data = phaa)   # overall event-free curve (ignores smoking)
ggsurvplot(km, data = phaa, conf.int = TRUE,
           pval = TRUE, risk.table = TRUE,
           xlab = "Years of follow-up")

# 3. Log-rank test: do the curves differ overall?
survdiff(y ~ smoker, data = phaa)

# 4. Cox model with multiple covariates -> hazard ratios
cox <- coxph(y ~ smoker + age + gender + bmi + hypertension,
             data = phaa)
summary(cox)

# 5. Test the proportional-hazards assumption
cox.zph(cox)

Reading the Cox output. exp(coef) is the hazard ratio (HR). An HR of 1.7 for smokerYes means smokers have 70% higher instantaneous risk at any time, holding age, gender, BMI, and hypertension constant. cox.zph() tests proportional hazards globally and per predictor; a small p-value flags a predictor whose effect changes over time, which is your cue to add a time-varying or stratified term.

R Reflect on what you just ran

Use the questions below to interpret the output you produced. Look at your console / plot before answering.

1. From your ggsurvplot() output and summary(km_all, times = c(1,3,5,7,10)), what is the event-free probability at 5 years overall? Compare smokers vs non-smokers at the same time point. Which group's curve drops faster?

Model answer: From summary(km_all, times = c(1,3,5,7,10)), overall 5-year event-free probability is typically around 0.72–0.78. Stratifying by smoking: non-smokers at 5 years around 0.82–0.86 and smokers around 0.62–0.68. The smokers' curve drops faster, with consistently lower survival across all time points and the gap widening over follow-up. Visually on the KM plot, the smoker curve sits below the non-smoker curve throughout the follow-up period.

2. From survdiff(y ~ smoker) and summary(cox), report the log-rank chi-square (with p-value) and the adjusted hazard ratio for smokerYes with its 95% CI. Translate the HR into one sentence about the instantaneous hazard.

Model answer: survdiff(y ~ smoker) typically returns a log-rank χ² around 25–40 with p < 0.001, strong evidence against the null hypothesis of equal survival. The adjusted hazard ratio from summary(cox) is roughly HR = 1.85 (95% CI 1.45, 2.35). Interpretation: at any given time during follow-up, smokers face an instantaneous hazard of the event that is about 85% higher than non-smokers' hazard, holding other covariates constant. The CI excludes 1, confirming statistical significance.

3. From cox.zph(cox), what is the global p-value and which predictors (if any) flag a violation of proportional hazards? What does a small p-value tell you about how that predictor's effect behaves over time?

Model answer: cox.zph(cox) typically returns a global p-value around 0.10–0.30 (no overall violation) but may flag age (p ≈ 0.04) or smoker (p ≈ 0.05) for borderline violations. A small p-value for a predictor means its hazard ratio varies over time; for example, the effect of smoking on the hazard might be strongest in early follow-up and diminish later (or vice versa). The practical implication: a single summary HR averages over the time-varying effect, hiding meaningful patterns. Remedies: (a) include a predictor × time interaction; (b) stratify on the offending variable; (c) use an Aalen additive hazards or time-varying coefficient model.


Validating the Cox Model

Thorough model validation involves checking several aspects of model performance. Different types of residuals serve different diagnostic purposes.

Cox-Snell Residuals

Schoenfeld Residuals

Martingale Residuals

Deviance Residuals

Score Residuals

Graphical assessment of proportional hazards

Two graphical methods are commonly used. First, log-cumulative hazard plots: plot ln H(t) (or equivalently, ln(−ln S(t))) against ln(t) for each group. If the curves are roughly parallel, the PH assumption is reasonable. Second, observed vs predicted plots: compare the Kaplan-Meier survival curves to the Cox model-predicted curves for each group. Good agreement supports the model.

Statistical tests for proportional hazards

The most common statistical test uses Schoenfeld residuals (Schoenfeld, 1982). The test regresses the scaled Schoenfeld residuals against time (or a function of time). A significant P-value indicates that the effect of the predictor changes with time, violating the PH assumption. A global test that combines results across all predictors is also available.
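A minimal sketch of this test with cox.zph() from the survival package, using the veteran dataset that ships with it; the choice of predictors is illustrative only.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ trt + karno + age, data = veteran)

# Scaled Schoenfeld residuals against a transform of time
# (the default transform is based on the Kaplan-Meier estimate)
zph <- cox.zph(fit)

zph$table   # one row per predictor plus a GLOBAL row, with a p-value column "p"
```

Calling plot(zph) draws the smoothed residuals against time for each predictor, which helps show how an effect changes and not only whether it does.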

Overall model fit and discrimination

Several measures assess how well the model fits and discriminates. Cox-Snell residuals assess overall goodness-of-fit. Harrell’s C concordance statistic measures discriminative ability: the proportion of all usable pairs of subjects (pairs in which it is known who failed first) that the model correctly orders by predicted risk. Values of C typically range from 0.5 (chance) to 1.0 (perfect discrimination). An R² analogue has also been proposed for survival models.
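The concordance statistic is reported as part of the standard coxph() summary in the survival package. The sketch below uses the lung dataset shipped with the package; the predictors are chosen only for illustration.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

# Concordance (Harrell's C) and its standard error
summary(fit)$concordance
```

Concordance describes ranking only: a model can order subjects well and still predict absolute survival probabilities poorly, so it complements the residual-based checks instead of replacing them.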

Independent censoring assumption

All standard survival analysis methods assume that censoring is independent (non-informative), meaning that censored subjects have the same future risk of the event as uncensored subjects who are still being followed. If sicker patients are more likely to drop out (informative censoring), the survival estimates will be biased. This assumption cannot be fully tested from the data, but sensitivity analyses can explore the impact of different censoring mechanisms.

Reflection

When the proportional hazards assumption is violated for a predictor, what are the practical implications for interpreting the hazard ratio? How would you communicate this to a clinical audience?

Model answer: Practical implications when proportional hazards is violated: the HR is no longer a single constant; the effect varies over time. Reporting a single HR averages over time and can be misleading, since the average may not represent the effect at any specific time point. Communication: instead of "smoking doubles the hazard," say "the hazard ratio for smoking is 2.3 in the first 5 years and 1.4 thereafter" (or whatever the data show). Visualise with separate Kaplan-Meier curves or time-stratified HRs. For clinical decision-making this matters: a treatment that has strong early benefit but waning effect needs different communication than one with constant benefit. Statistical approaches: (a) fit a model with a predictor × time interaction; (b) stratify on the violating variable; (c) use a parametric model that explicitly handles non-proportional hazards; (d) report restricted mean survival time (RMST), which is interpretable even under non-proportional hazards.


Section 4

Parametric Models, AFT & Frailty

⏱ Estimated time: 20 minutes

Section 4 of 4

Specifying the hazard shape for efficiency; accelerated failure time models; frailty for clustering.

Three parametric families

Exponential, Weibull, Gompertz

Exponential: constant hazard

\[ \color{#0B7B6B}{h(t)} = \color{#C2410C}{\lambda} = e^{\color{#6D28D9}{\beta_0 + \boldsymbol{\beta}^\top \mathbf{x}}} \]

h(t): hazard; λ: constant rate; β₀ + β′x: intercept and predictors

Weibull: monotone hazard (shape \(p\))

\[ \color{#0B7B6B}{h(t)} = \color{#C2410C}{\lambda} \color{#6D28D9}{p} \, \color{#1D4ED8}{t^{p-1}} \cdot e^{\boldsymbol{\beta}^\top \mathbf{x}} \qquad p=1 \Rightarrow \text{exponential} \]

h(t): hazard; λ: scale; p: shape; t^(p−1): power of time

Gompertz: exponentially changing hazard

\[ \color{#0B7B6B}{h_0(t)} = \color{#C2410C}{\lambda} \color{#6D28D9}{e^{pt}} \qquad \ln h_0(t) = \ln\lambda + pt \]

h₀(t): baseline hazard; λ: scale; e^(pt): exponential in time

A different framing

Accelerated failure time models

AFT model (log-linear in time)

\[ \color{#0B7B6B}{\ln t} = \color{#C2410C}{\boldsymbol{\beta}^\top \mathbf{x}} + \color{#6D28D9}{\ln \tau} \]

ln t: log survival time; β′x: linear predictor; ln τ: error term

The measure of effect is the time ratio \(\text{TR} = e^{\hat{\beta}}\):

TR \(>1\)

Expected survival time is longer. A TR of 2 means the exposure doubles expected time to the event.

TR \(<1\)

Expected survival time is shorter. A TR of 0.5 means the exposure halves expected time.

The Weibull (with the exponential as its special case) is unique: it can be written as either a PH or an AFT model. Log-logistic and log-normal are AFT only.

Model selection

Choosing the right parametric form

Plot the hazard

Does the empirical hazard look flat, monotone, or peaked? This guides the candidate set.

Generalised gamma

Nests exponential, Weibull, and log-normal. Test simpler forms with likelihood ratio tests.

AIC comparison

A lower AIC, after penalising for the number of parameters, signals a better-fitting model within the candidate set.

Always check biological plausibility. A Weibull that fits numerically but implies a decreasing hazard for a cancer with known rising risk should prompt caution.

Unmeasured heterogeneity

Frailty models

Individual frailty model

\[ \color{#0B7B6B}{h(t \mid \alpha)} = \color{#C2410C}{\alpha} \cdot \color{#6D28D9}{h(t)} \quad \alpha \sim \text{Gamma}(1, \theta) \]

h(t∣α): conditional hazard; α: frailty (gamma-distributed with mean 1 and variance θ); h(t): population hazard

Shared frailty for cluster \(k\)

\[ \color{#0B7B6B}{h_{ik}(t)} = \color{#C2410C}{\alpha_k} \cdot \color{#6D28D9}{h_0(t)} \cdot \color{#1D4ED8}{e^{\boldsymbol{\beta}^\top \mathbf{x}_{ik}}} \]

h_ik(t): hazard for subject i in cluster k; α_k: shared cluster frailty; h₀(t): baseline hazard; e^(β′x): predictor effect

Individuals in cluster \(k\) share frailty \(\alpha_k\), inducing within-cluster correlation in survival times. This is the survival-data analogue of a random intercept in mixed-effects regression.

Lesson recap

Three branches, one toolkit

Non-parametric

Kaplan–Meier and Nelson–Aalen describe survival without distributional assumptions.

Semi-parametric

Cox model adjusts for covariates, leaves hazard shape free, produces hazard ratios.

Parametric & frailty

More efficient when the hazard shape is known; frailty handles clustering and unmeasured variation.

Next: a short reflection matching hazard shapes to diseases, then the knowledge check.

Introduction and Overview

From semi-parametric to fully parametric, and beyond. The Cox model bought flexibility by leaving the baseline hazard unspecified, but that flexibility comes at a cost: less efficiency, no direct prediction of absolute survival times, and limited ability to extrapolate. When we have a defensible reason to specify the shape of the hazard, or when our research question requires absolute predictions, we shift to fully parametric models (exponential, Weibull, Gompertz). This section also introduces accelerated failure time (AFT) models, which reframe covariate effects as stretching or shrinking survival time rather than scaling the hazard, and frailty models, which extend survival regression to handle unmeasured heterogeneity, a natural bridge to the clustered-data lessons that follow.

Learning Objectives

Compare the exponential, Weibull, and Gompertz hazards and recognise the substantive shapes they imply.
Distinguish proportional-hazards from accelerated failure time (AFT) parameterisations of the same data.
Choose between Cox and parametric models based on the question (relative effect vs absolute prediction) and the available data.
Use frailty terms to absorb unmeasured heterogeneity within clusters of subjects.

Why Parametric Models?

While the Cox model’s flexibility is a strength, parametric models offer important advantages when the distributional assumption is correct. By specifying the form of the baseline hazard h₀(t), parametric models are more statistically efficient, producing narrower confidence intervals and more powerful tests. They also allow direct estimation of the baseline hazard, prediction of survival times, and extrapolation beyond the observed data.

Semi-Parametric vs Parametric: A Comparison

The Cox model (semi-parametric) leaves the baseline hazard unspecified, giving maximum flexibility but less efficiency. Parametric models specify the baseline hazard, which is more efficient if the assumption is correct but biased if it is wrong. In practice, the Cox model is preferred when the shape of the baseline hazard is unknown or when the primary interest is in hazard ratios rather than absolute survival predictions. Parametric models are preferred when the distributional form is well justified or when extrapolation is needed.

Common Parametric Models

Exponential Model

The simplest parametric model assumes a constant hazard over time (Eq 19.17):

Exponential hazard

\[ \color{#0B7B6B}{h(t)} = \color{#C2410C}{\lambda} = \exp(\color{#6D28D9}{\beta_0 + \beta_1 X_1 + \cdots}) \]

The hazard is a constant rate set by an intercept and predictors: it does not change with time.

The exponential model includes an intercept β₀ (unlike the Cox model). The constant hazard assumption is very restrictive and is appropriate only when the risk truly does not change over time. This model has the “memoryless” property: the predicted future survival does not depend on how long the subject has already survived.

Weibull Model

The Weibull model extends the exponential by adding a shape parameter p (Eq 19.18):

Weibull hazard

\[ \color{#0B7B6B}{h(t)} = \color{#C2410C}{\lambda}\,\color{#6D28D9}{p}\,\color{#1D4ED8}{t^{\,p-1}} \cdot \color{#BE185D}{e^{\beta X}} \]

The hazard combines a scale, a shape p, and a power of time times the predictor effect. When p = 1 it reduces to the exponential.

When p = 1, the Weibull reduces to the exponential. When p < 1, the hazard decreases over time; when p > 1, the hazard increases over time. The Weibull model can be assessed by plotting ln H(t) vs ln(t): if the data follow a Weibull distribution, this plot should be approximately linear with slope p and intercept ln λ. The Weibull is the most commonly used parametric survival model because of its flexibility.

Gompertz Model

The Gompertz model has a baseline hazard that changes exponentially with time (Eq 19.19):

Gompertz baseline hazard

\[ \color{#0B7B6B}{h_0(t)} = \color{#C2410C}{\lambda}\,\color{#6D28D9}{e^{pt}} \]

The baseline hazard is a scale times an exponential in time, so the log hazard rises (or falls) linearly with time.

The log of the baseline hazard is linear in time: ln h₀(t) = ln λ + pt. When p > 0, the hazard increases exponentially; when p < 0, it decreases. The Gompertz model is widely used in demography and actuarial science because human mortality rates often increase approximately exponentially with age over much of the adult lifespan.

Accelerated Failure Time (AFT) Models

Parametric survival models can also be formulated as accelerated failure time models, which model the effect of predictors on the log of survival time directly, rather than on the hazard.

Accelerated failure time model (Eq 19.20)

\[ \color{#0B7B6B}{\ln t} = \color{#C2410C}{\beta X} + \color{#6D28D9}{\ln \tau} \]

The log survival time is modelled directly as a linear predictor plus an error term. Effects act on the time scale rather than the hazard.

In the AFT framework, the measure of effect is the time ratio (TR = e^β) rather than the hazard ratio. A time ratio of 2 means the expected survival time is doubled; a TR of 0.5 means it is halved. The AFT model essentially “accelerates” or “decelerates” time for different covariate patterns.
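For the Weibull, the time ratio and the hazard ratio are two readings of the same fitted model. The sketch below simulates data from a Weibull proportional hazards model, fits it with survreg() (which uses the AFT form), and converts between the two scales. All parameter values are assumptions chosen for the demonstration; the conversion used is shape p = 1/scale and log HR = −(AFT coefficient)/scale.

```r
library(survival)
set.seed(42)

# Simulate from a Weibull PH model: h(t | x) = lambda * p * t^(p - 1) * exp(beta_ph * x)
n <- 2000; lambda <- 0.02; p <- 1.5; beta_ph <- 0.7
x <- rbinom(n, 1, 0.5)
t_event <- (-log(runif(n)) / (lambda * exp(beta_ph * x)))^(1 / p)
status  <- as.integer(t_event <= 20)     # administrative censoring at t = 20
time    <- pmin(t_event, 20)

fit <- survreg(Surv(time, status) ~ x, dist = "weibull")

p_hat        <- 1 / fit$scale                     # Weibull shape
time_ratio   <- exp(coef(fit)["x"])               # AFT effect: TR < 1 means shorter survival
hazard_ratio <- exp(-coef(fit)["x"] / fit$scale)  # the same fit on the PH scale

c(shape = p_hat, TR = unname(time_ratio),
  HR = unname(hazard_ratio), true_HR = exp(beta_ph))
```

An exposure that raises the hazard (HR > 1) shortens survival time (TR < 1), so the two measures sit on opposite sides of 1.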

Log-Logistic

Log-Normal

Generalised Gamma

Choosing a Parametric Model

Several strategies help guide the choice of parametric distribution:

Fit the generalised gamma model and test whether simpler models are adequate (κ = 1 for Weibull, κ = 0 for log-normal, etc.)
Compare models using AIC (Akaike Information Criterion); lower values indicate better fit after penalising for complexity
Examine diagnostic plots (e.g., ln H(t) vs ln(t) for Weibull, ln H(t) vs t for Gompertz)
Consider the biological plausibility of the hazard shape implied by each distribution
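The AIC strategy in the list above can be sketched with survreg() from the survival package. The lung dataset and the predictors are used only for illustration. The generalised gamma is not among survreg()'s built-in distributions, so it is omitted here.

```r
library(survival)

# Candidate distributions available in survreg()
dists <- c("exponential", "weibull", "lognormal", "loglogistic")

# Fit the same linear predictor under each distribution
fits <- lapply(dists, function(d) {
  survreg(Surv(time, status) ~ age + sex, data = lung, dist = d)
})

aic <- setNames(sapply(fits, AIC), dists)
sort(aic)   # smaller is better, after the penalty for extra parameters
```

AIC ranks the candidates but does not say whether any of them fits well, so the diagnostic plots and the plausibility check in the list still apply.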

Scenario: Choosing Between Models

You are analysing time to recurrence of a tumour after surgical removal. You suspect the hazard may not be constant: risk is likely highest in the first year post-surgery and then declines. You fit several models: the exponential model gives AIC = 842; the Weibull gives AIC = 810 with p = 0.72 (decreasing hazard); the log-logistic gives AIC = 806 with a hazard that rises briefly and then falls. The generalised gamma model confirms that the Weibull is significantly better than the exponential (the hypothesis κ = 1 is not rejected, so the Weibull form is adequate, but the Weibull shape p is significantly different from 1, so the exponential is not). Based on AIC and biological plausibility, you select the log-logistic AFT model, which captures the initial rise in hazard followed by a decline as patients who survive the early period enter a lower-risk phase.

Frailty Models

Frailty models address the problem of unmeasured heterogeneity in survival data. Even after including all known predictors, individuals may differ in their underlying susceptibility to the event due to unmeasured factors. Frailty models introduce a random effect α that represents this unobserved heterogeneity.

Individual frailty model (Eq 19.27)

\[ \color{#0B7B6B}{h(t \mid \alpha)} = \color{#C2410C}{\alpha} \cdot \color{#6D28D9}{h(t)} \]

A subject's conditional hazard is its individual frailty multiplied by the population hazard: frailty above one means higher-than-average risk.

Individual Frailty

Individual frailty accounts for unmeasured individual-level covariates that create extra variation in survival times beyond what measured predictors explain. It is analogous to overdispersion in Poisson models: just as the negative binomial adds extra-Poisson variation, the frailty model adds extra variation to the survival model. Subjects with α > 1 are more “frail” (higher hazard), while those with α < 1 are more robust. The frailty is typically assumed to follow a gamma or inverse Gaussian distribution with mean 1.
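One consequence of individual frailty is that the population-level hazard can fall over time even when every individual's hazard is constant, because the frailest subjects fail first. The sketch below shows this for a constant baseline hazard λ with gamma frailty of mean 1 and variance θ, for which the marginal survivor function is (1 + θλt)^(−1/θ). The values of lambda and theta are assumptions for illustration.

```r
# Assumed values: constant individual baseline hazard, gamma frailty (mean 1, variance theta)
lambda <- 0.10
theta  <- 1.5
t <- c(0, 1, 5, 10, 20)

marginal_S <- (1 + theta * lambda * t)^(-1 / theta)   # survivor function averaged over frailty
marginal_h <- lambda / (1 + theta * lambda * t)       # population-level hazard, -d/dt ln S(t)

round(cbind(t, marginal_S, marginal_h), 4)
```

The marginal hazard starts at λ and declines from there, so a decreasing observed hazard does not by itself show that any individual's risk is falling.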

Shared Frailty (Eq 19.28–19.29)

Shared frailty models account for clustering of observations within groups. All members of a group (e.g., patients treated at the same hospital, animals in the same herd) share the same frailty value α_k. This is analogous to random effects in mixed models. The shared frailty induces positive within-group correlation in survival times, since subjects in the same cluster tend to have more similar survival experiences than subjects in different clusters.
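The within-group correlation induced by a shared frailty can be illustrated with a small simulation. This is a sketch under stated assumptions: clusters of two subjects, a constant baseline rate, and a gamma frailty with mean 1 and variance `theta`; the function names and parameter values are hypothetical.

```python
import math
import random

def simulate_clusters(n_clusters, theta, base_rate, seed=1):
    """Simulate pairs of survival times that share one frailty per cluster."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_clusters):
        # Gamma frailty with shape 1/theta and scale theta: mean 1, variance theta.
        alpha = rng.gammavariate(1.0 / theta, theta)
        # Both members of the cluster share the same multiplied hazard.
        t1 = rng.expovariate(alpha * base_rate)
        t2 = rng.expovariate(alpha * base_rate)
        pairs.append((t1, t2))
    return pairs

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

pairs = simulate_clusters(n_clusters=2000, theta=1.0, base_rate=0.1)
log_t1 = [math.log(a) for a, _ in pairs]
log_t2 = [math.log(b) for _, b in pairs]
r_shared = pearson(log_t1, log_t2)
print(f"correlation of log survival times within clusters: {r_shared:.2f}")
```

The correlation is computed on log times because survival times are strongly right-skewed; a clearly positive value reflects the clustering that a shared frailty model is designed to absorb.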

Multiple Outcome Events

Several extensions handle multiple or recurring events. Competing risks models handle situations where different types of events can occur (e.g., death from cancer vs death from heart disease); the Fine & Gray (1999) subdistribution hazards approach directly models the cumulative incidence function in this setting. Recurrence data models handle repeated events (e.g., hospital readmissions). The Andersen-Gill model treats recurrent events within a subject as independent, while the Prentice-Williams-Peterson models account for event ordering. Discrete-time survival analysis is used when event times are recorded in intervals rather than exactly.
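Recurrent-event models are usually fitted to data in a start-stop ("counting process") layout, with one row per at-risk interval. The sketch below shows one way such rows could be built for a single subject; the function name `to_counting_process` and the example readmission times are hypothetical. An Andersen-Gill style analysis would use the start, stop, and event columns, while a Prentice-Williams-Peterson style analysis would additionally stratify on the event order column.

```python
def to_counting_process(subject_id, event_times, end_of_followup):
    """Split one subject's follow-up into (id, start, stop, event, order) rows.

    event = 1 if the interval ends in an event, 0 if it ends in censoring.
    order = position of the interval in the subject's event sequence.
    """
    rows = []
    start = 0.0
    order = 1
    for t in sorted(event_times):
        rows.append((subject_id, start, float(t), 1, order))
        start = float(t)
        order += 1
    # Remaining follow-up after the last event is a censored interval.
    if end_of_followup > start:
        rows.append((subject_id, start, float(end_of_followup), 0, order))
    return rows

# Hypothetical patient readmitted on days 30 and 75, followed to day 120.
rows = to_counting_process("A", [30, 75], 120)
for row in rows:
    print(row)
```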

Reflection

When would you prefer a parametric survival model over a Cox model? What are the trade-offs between flexibility and efficiency?

Model answer: Parametric survival models (exponential, Weibull, log-normal, generalised gamma) are preferable when (a) the hazard shape is known or biologically motivated, for example Weibull for chronic-disease incidence; (b) you need to extrapolate beyond observed follow-up (Cox cannot, parametric models can); (c) the sample is small and you need efficiency gains from parametric assumptions; (d) the question requires absolute survival predictions (Cox gives relative hazards, not absolute survival without baseline-hazard estimation). Trade-offs: parametric models are more efficient (smaller SEs) but at the cost of distributional assumptions that, if wrong, produce biased estimates. Cox makes no assumption about the baseline hazard but requires proportional hazards. Always plot the empirical hazard and assess fit before committing to a parametric form; consider semi-parametric flexible models (splines on the log-hazard) as a compromise.


Final Assessment

Lesson 8: Comprehensive Assessment

⏱ Estimated time: 25 minutes

Bringing It All Together

This lesson built up the survival-analysis toolkit from the ground up. We started with the defining feature of time-to-event data, censoring, and the non-parametric estimators (life tables, Kaplan–Meier, Nelson–Aalen) that describe survival without making distributional assumptions. We then recast those descriptions in the formal language of survivor, failure, hazard, and cumulative hazard functions, the four interlocking views that every later model rests on.

Next came the Cox proportional hazards model, the workhorse semi-parametric regression that scales the baseline hazard by exp(βX) and produces hazard ratios as its primary effect measure. The last part of the lesson added fully parametric and accelerated failure time alternatives, which trade flexibility for efficiency and the ability to predict absolute survival times, plus frailty extensions that bridge into the clustered-data lessons coming next. Together these methods cover descriptive, comparative, and predictive questions about time-to-event outcomes.

The final assessment asks you to recognise censoring patterns on sight, choose between non-parametric, semi-parametric, and parametric tools, and interpret hazard ratios on the rate scale rather than the probability scale.
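The distinction between the rate scale and the probability scale can be made concrete with a small calculation. Under proportional hazards, the survivor function in the exposed group equals the baseline survivor function raised to the power of the hazard ratio. The sketch below uses assumed values (baseline survival 0.80, hazard ratio 2) chosen only for illustration.

```python
def survival_under_ph(s0, hazard_ratio):
    """Exposed-group survival implied by proportional hazards: S1(t) = S0(t) ** HR."""
    return s0 ** hazard_ratio

s0 = 0.80   # assumed baseline survival at some time t
hr = 2.0    # assumed hazard ratio

s1 = survival_under_ph(s0, hr)
risk_ratio = (1.0 - s1) / (1.0 - s0)  # ratio of cumulative failure probabilities

print(f"baseline survival = {s0:.2f}, exposed survival = {s1:.2f}")
print(f"hazard ratio = {hr:.1f}, risk ratio at time t = {risk_ratio:.2f}")
```

With these numbers the exposed group's risk of failure by time t is about 1.8 times the baseline risk, not 2 times, and the gap between the two ratios widens as baseline survival falls.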

Key Takeaways from this lesson

Censoring is the defining feature of survival data; methods must use partial information from subjects whose event time is unknown.
The Kaplan–Meier estimator is the standard non-parametric description of S(t); the log-rank test compares curves between groups.
The hazard function gives an instantaneous rate conditional on survival, the quantity most directly tied to underlying biological mechanisms.
The Cox proportional hazards model regresses on the hazard without specifying its baseline shape; exp(β) is an adjusted hazard ratio.
The proportional hazards assumption must be tested (Schoenfeld residuals); when it fails, use stratification or time-varying coefficients.
Parametric and AFT models trade flexibility for efficiency and direct prediction; frailty models extend survival regression to clustered or unobserved heterogeneity.

Reflection

This chapter covered a wide range of survival analysis methods. If you were planning a study with time-to-event outcomes, what factors would guide your choice between non-parametric, semi-parametric (Cox), and parametric approaches?

Model answer: Decision factors for survival analysis approach: (1) Question: relative effect (Cox), absolute survival probabilities (parametric or NPMLE), differences between groups (log-rank, restricted mean survival time). (2) Sample size and event count: with < 10 events per predictor, Cox is unstable; consider penalised regression or simpler models. (3) Proportional hazards: if violated, Cox without modification is misleading. (4) Hazard shape: if known and biologically motivated, parametric is more efficient; if unknown, Cox or flexible parametric (splines) is safer. (5) Censoring pattern: heavy informative censoring requires inverse-probability-of-censoring weights. (6) Competing risks: if more than one event type and they exclude each other, use Fine-Gray subdistribution hazards or cause-specific Cox. Standard default for most epidemiological time-to-event questions: Cox model with PH diagnostics; report restricted mean survival time as a more robust summary; visualise with Kaplan-Meier curves stratified by exposure.


HSCI 410, Lesson 8

