Cohort Studies
Evaluating Epidemiological Research
Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University
Learning objectives for this lesson:
- Distinguish between open and closed source populations as they relate to cohort study design
- Describe the major design features of risk-based and rate-based cohort studies
- Identify hypotheses and population types consistent with risk-based and rate-based cohort studies
- Elaborate the principles used to select and measure the exposure in cohort studies
- Design and implement a valid cohort study to investigate a specific hypothesis
This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.
Glossary — Key Terms, People & Concepts
This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.
Introduction & Cohort Study Design
⏱ Estimated reading time: 15 minutes
Introduction and Overview
Lesson 4 ended with a promise: cohort studies invert the case-control logic by sampling on the exposure rather than the disease, and that inversion lets us measure incidence directly, without the rare-disease assumption that complicated odds-ratio interpretation. This lesson delivers on that promise. Across four content sections we move from the basic logic of cohort design (Section 1), to the choice between risk-based and rate-based designs that should now feel familiar from Lesson 4 (Section 2), to the surprisingly difficult problem of measuring exposure in a longitudinal setting (Section 3), and finally to the practical questions of comparability, follow-up, outcome ascertainment, and analysis (Section 4). The unified-design discipline from Lesson 3 still applies, and Lesson 1's emphasis on pre-specified analysis plans applies with extra force, because cohort studies often run for decades and offer many opportunities for selective reporting.
Learning Objectives
- Describe the fundamental logic of the cohort study design.
- Distinguish between open and closed source populations.
- Differentiate between prospective and retrospective cohort designs.
- Recognise how cohort studies relate to controlled trials.
What Is a Cohort Study?
The word cohort denotes a group of study subjects that has a defined characteristic in common. In epidemiological study design, that characteristic is usually exposure status. In a cohort study, we follow subjects from exposure to outcome (Grimes and Schulz, 2002).
Walk through the Framingham Heart Study from its 1948 enrollment to three generations of follow-up.
A 7-scene retelling of the most famous cohort study ever launched: town enrollment (Dawber, Meadors, & Moore, 1951), baseline measurements, decades of follow-up, incidence comparisons, the birth of the term "risk factor" (Kannel et al, 1961), and the three generations still under study today (Mahmood et al, 2014).
Key Idea
A cohort study closely resembles a controlled trial — without the randomisation of exposure. We start with subjects who do not yet have the disease, classify them by exposure, follow them forward in time, and compare the frequency of the outcome between exposure groups.
Most frequently, the outcome is the occurrence of a specific disease, but cohort studies can also examine outcomes such as birth weight, body mass index, blood pressure, or quality of life. Subjects are usually individuals, but can also be groups (e.g., families).
The cohort design's modern reputation rests on a small number of landmark studies that the rest of this lesson will return to repeatedly: the Framingham Heart Study (Dawber, Meadors, & Moore, 1951), which gave epidemiology the term “risk factor” (Kannel et al, 1961); the British Doctors Study (Doll & Hill, 1954; Doll et al, 2004), which followed UK physicians for 50 years and pinned down the smoking–lung cancer link; the Whitehall II civil-servant cohort (Marmot et al, 1991), which documented a socioeconomic gradient in chronic disease; the Nurses' Health Study (Colditz, Manson, & Hankinson, 1997); the multinational EPIC cohort (Riboli et al, 2002); and the recent generation of population biobanks, namely UK Biobank (Sudlow et al, 2015) and the Canadian Longitudinal Study on Aging (Raina et al, 2019).
Figure — The logic of cohort design: classify disease-free subjects by exposure, follow them forward, compare the disease frequency between groups.
Selecting the Study Group
How we select the cohort depends on what we know in advance. There are three standard choices, and the choice among them flows from the data already in hand, not from any abstract preference for one design over another.
In both two-cohort and single-cohort designs, after selecting subjects we (1) verify they meet inclusion criteria, (2) confirm exposure status, (3) ensure they do not yet have the outcome, then (4) follow them for a defined period and compare incidence between exposure groups.
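The four steps above can be sketched as a sequence of filters on a candidate list. This is a minimal illustration only: the data frame `candidates`, its columns (`age`, `exposed`, `has_outcome`), and the inclusion rule of age 18 or over are all hypothetical; a real study applies its own criteria and verification procedures.

```r
# Hypothetical candidate list; columns and values are made up for illustration.
candidates <- data.frame(
  id          = 1:8,
  age         = c(34, 52, 17, 45, 61, 29, 48, 55),
  exposed     = c(TRUE, FALSE, TRUE, NA, TRUE, FALSE, FALSE, TRUE),
  has_outcome = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# (1) Inclusion criteria: an assumed rule of age 18 or over.
step1 <- subset(candidates, age >= 18)
# (2) Confirm exposure status: drop subjects whose exposure is unknown.
step2 <- subset(step1, !is.na(exposed))
# (3) Exclude prevalent cases: keep only subjects without the outcome.
cohort <- subset(step2, !has_outcome)

# (4) Follow-up starts from this disease-free cohort, split by exposure.
table(cohort$exposed)
```

Here subject 3 fails the inclusion rule, subject 4 has unknown exposure, and subject 5 already has the outcome, so five disease-free subjects (two exposed, three unexposed) enter follow-up.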
Whichever cohort structure you pick, the next decision is whether the follow-up has already happened or whether you will be doing it as the study runs.
Prospective vs. Retrospective Designs
Cohort studies can be conducted either way, depending on whether suitable records already exist (Euser et al, 2009). The trade-offs are set out side by side below.
In a prospective cohort study, the disease has not yet occurred when the study begins. Subjects are recruited, exposure is assessed at baseline, and they are followed forward in time as outcomes develop.
Advantages: Allows more detailed information-gathering and careful recording of exposure, confounders, and outcome timing (see Examples 8.6, 8.7, and 8.9).
Disadvantages: Time-consuming and expensive; vulnerable to losses to follow-up over long study periods.
In a retrospective cohort study, the follow-up period has already ended and the disease event has already occurred when subjects are selected (Hudson et al, 2005). Investigators reconstruct exposure and outcome from existing records.
Advantages: Faster and cheaper; useful when good historical records exist (Examples 8.1, 8.4, 8.5).
Disadvantages: Requires suitable existing databases; depth of information is limited to what was recorded.
Beyond the timing question is a structural one about the population itself: does its membership stay fixed for the duration of follow-up, or do people enter and leave? You met this distinction in Lesson 4; it returns here as a more central concern, because cohort follow-up is what makes it operational.
Open vs. Closed Source Populations
The nature of the source population determines the appropriate design. This is a critical decision that affects everything from sample-size calculations to the choice of analytic methods. Read the table below as a checklist for matching disease type to design type — chronic outcomes almost always require open-population, rate-based handling, and Section 2 builds out exactly what that requires.
| Feature | Closed Population | Open Population |
|---|---|---|
| Membership | Fixed at start of study | Subjects can enter and leave |
| Follow-up | All subjects observed for full risk period | Variable time-at-risk per subject |
| Best disease type | Short risk period (e.g., outbreaks) | Long or chronic risk period (e.g., cancers) |
| Disease frequency | Risk (cumulative incidence) | Rate (incidence density) |
| Acceptable losses | Few or none preferred (<10%) | Time-at-risk accounted for explicitly |
Key Examples
Three published examples bring the design choices we have just enumerated into one place. We will refer back to these throughout the lesson by number, so it is worth pausing on each one to identify which boxes the investigators ticked: prospective vs. retrospective, two-cohort vs. single, open vs. closed, risk-based vs. rate-based.
Choi et al (2011) conducted a hospital-based cohort study in which the exposure was discharge against medical advice (DAMA) versus discharge with medical advice (DWMA). The outcome was readmission within 14 days. Each DAMA patient was matched with one DWMA patient by 10-year age group, gender, and clinical characteristics. Because all patients were observable for the full 14-day risk period, this is a classic risk-based design. Conditional logistic regression accounted for the matching. Result: 26% of DAMA patients were readmitted within 14 days versus only 3% of DWMA patients.
Crane et al (2011) conducted a retrospective cohort study based on interviews with 11,000+ women who gave birth in two Canadian provinces (2001–2009). Eleven per cent self-declared exposure to environmental tobacco smoke. Outcomes included infant body dimensions, Apgar scores, respiratory distress syndrome, and stillbirth. Multiple regression was used to control for confounders. Tobacco smoke was associated with lower birth weight, smaller body size, and increased stillbirths. Note: when outcomes are on a continuous scale (e.g., birth weight), the cohort design still applies — we just use linear rather than logistic regression.
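To make the note about continuous outcomes concrete, here is a sketch on simulated data (not Crane et al's data): the exposure is binary, the outcome is birth weight in grams, and `lm()` takes the place of logistic regression. The 150 g effect, the confounder `mat_age`, and every other number are assumptions chosen for illustration.

```r
# Simulated cohort; all numbers are hypothetical, not Crane et al's data.
set.seed(2011)
n       <- 5000
mat_age <- round(rnorm(n, mean = 29, sd = 5))   # maternal age, a confounder
# Exposure to environmental tobacco smoke (about 11% exposed), assumed to be
# slightly less common among older mothers.
ets     <- rbinom(n, size = 1, prob = plogis(-2.1 - 0.05 * (mat_age - 29)))
# Assumed data-generating model: exposure lowers birth weight by 150 g.
bw      <- 3400 - 150 * ets + 10 * (mat_age - 29) + rnorm(n, sd = 450)
births  <- data.frame(bw, ets, mat_age)

# Same cohort logic as before; the continuous outcome calls for lm().
fit <- lm(bw ~ ets + mat_age, data = births)
coef(summary(fit))["ets", ]   # estimate, SE, t value, p-value for exposure
```

The exposure coefficient is read as a mean difference in grams between exposed and unexposed births, adjusted for maternal age, rather than as an odds ratio.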
Mehta et al (2010) used a population-based retrospective cohort to investigate falls and fractures in adults ≥50 years. The exposure was atypical versus typical antipsychotic agents. More than 60 covariates were combined into a propensity score, and the “Greedy 5-1 matching technique” was used to match subjects with similar scores. Each exposure group contained 5,580 people. While the hazard ratio did not differ significantly between drug classes, taking any antipsychotic for >90 days was associated with HR = 1.8 for falls or fractures.
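The propensity-score step can be sketched in two parts: a logistic model that collapses the covariates into one score, then greedy matching on that score. The helper `greedy_digit_match()` is a hypothetical, simplified take on digit-based greedy matching (pairs must agree to 5 decimal places first, then 4, down to 1); it takes the first available control rather than a randomised order and is not the authors' code. Two simulated covariates stand in for their 60+.

```r
# Simulated data; covariates and coefficients are hypothetical.
set.seed(42)
n   <- 2000
age <- rnorm(n, mean = 70, sd = 8)
sex <- rbinom(n, size = 1, prob = 0.55)
# Assumed: older subjects are somewhat more likely to receive an atypical agent.
atypical <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.04 * (age - 70) + 0.2 * sex))
d <- data.frame(id = seq_len(n), age, sex, atypical)

# Step 1: the propensity score is the fitted probability of exposure.
d$ps <- fitted(glm(atypical ~ age + sex, family = binomial, data = d))

# Step 2: greedy digit matching without replacement. Each unmatched exposed
# subject takes the first unused control whose score agrees when rounded to
# 5 decimal places; leftovers are retried at 4, 3, 2, then 1 place.
greedy_digit_match <- function(ps, exposed) {
  exp_idx  <- which(exposed == 1)
  ctl_idx  <- which(exposed == 0)
  match_of <- rep(NA_integer_, length(ps))   # control matched to each subject
  for (digits in 5:1) {
    for (i in exp_idx[is.na(match_of[exp_idx])]) {
      free <- ctl_idx[!(ctl_idx %in% match_of)]
      hit  <- free[round(ps[free], digits) == round(ps[i], digits)]
      if (length(hit) > 0) match_of[i] <- hit[1]
    }
  }
  pairs <- data.frame(exposed = exp_idx, control = match_of[exp_idx])
  pairs[!is.na(pairs$control), ]
}

pairs <- greedy_digit_match(d$ps, d$atypical)
nrow(pairs)   # matched pairs; unmatched exposed subjects are dropped
```

Matching on the score rather than on each covariate separately is what makes 60+ covariates tractable; balance on the individual covariates should still be checked after matching.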
Stating the Study Objective
Each study should clearly specify:
- The target population (to which inferences will be made)
- The source population (from which the study group will be drawn)
- The unit of observation (individuals or groups)
- The exposure, the disease, and the follow-up period
- The setting (context or venue) of interest
- If biology is known: the amount or duration of exposure thought to cause disease, and the relevant time window for exposure (current vs. lifetime vs. historical)
Key Takeaways
- Cohort studies follow disease-free subjects from exposure forward to outcome.
- The design parallels a controlled trial, minus randomisation.
- Two-cohort designs select by exposure status; longitudinal designs select a single group with a range of exposures.
- Studies can be prospective or retrospective; the difference is timing relative to outcome occurrence.
- Closed populations call for risk-based designs; open populations call for rate-based designs.
The takeaways above name what changed conceptually compared with case-control designs. The R box that follows makes the change concrete: because we sampled on exposure, the same kind of 2×2 table you met in Lessons 3 and 4 now yields a risk ratio and an incidence rate ratio directly — no rare-disease assumption required.
What you'll do: compute risk, incidence rate, the risk ratio, and the incidence rate ratio from a small simulated cohort. What to take away: sampling on exposure unlocks measures of disease frequency that case-control designs simply cannot deliver — and Section 2 will show why the choice between risk-based and rate-based handling determines which of these two ratios is the right summary in any given study.
A cohort lets you compute a risk (cumulative incidence) or a rate (incidence density) directly — you sampled by exposure, not by outcome. Below is a hand calculation of both from a small simulated cohort.
```r
# 1000 exposed and 1000 unexposed individuals followed for up to 5 years.
# exposed:   80 events in 4500 person-years
# unexposed: 30 events in 4900 person-years
events <- c(exposed = 80, unexposed = 30)
n      <- c(exposed = 1000, unexposed = 1000)
py     <- c(exposed = 4500, unexposed = 4900)

risk <- events / n                             # cumulative incidence (risk)
rate <- events / py * 1000                     # per 1000 person-years
RR   <- risk["exposed"] / risk["unexposed"]    # risk ratio
IRR  <- rate["exposed"] / rate["unexposed"]    # incidence rate ratio

round(data.frame(risk, rate, RR = RR, IRR = IRR), 3)
```
Why both? The risk ratio answers "how many times more likely is an exposed person to develop disease over the follow-up window?" The rate ratio answers "per unit of person-time, how much more frequent is the event in exposed people?" In an open cohort with variable follow-up, the rate-based answer is usually the right one.
Reflect on what you just ran
Use the questions below to interpret the output you produced. Look at your console table before answering.
1. The risk in the exposed group was 0.080 (80/1000) and in the unexposed group 0.030 (30/1000), giving RR = 2.667. Translate that risk ratio into a plain-English sentence about how the cohort's 5-year cumulative incidence differs by exposure.
2. The incidence rate ratio (IRR = 2.904) is slightly larger than the risk ratio (RR = 2.667). Looking at the person-time denominators (4500 vs 4900), why does dividing by person-years instead of headcount nudge the ratio upward? Which group lost more person-time, and what does that suggest about follow-up in this cohort?
3. If the cohort were instead a closed population with everyone followed exactly 5 years (no losses), would the RR and IRR converge? Explain which measure you would report and why.
The reflection below asks you to use the timing distinction in a concrete research scenario. After working through it and the knowledge check, Section 2 returns to the risk-vs.-rate split that the R box just previewed and shows what each design buys, costs, and assumes.
Reflection
Think of a health question you find compelling. Would you address it with a prospective or retrospective cohort design? What records or recruitment infrastructure would you need? What might be lost or gained by each choice in your specific case?
1. The fundamental logic of a cohort study is to:
2. A cohort study most closely resembles which of the following designs?
3. In a retrospective cohort study, when has the outcome event occurred relative to the start of the study?
4. Which of the following best describes a closed source population?
Risk-Based & Rate-Based Designs
⏱ Estimated reading time: 15 minutes
Introduction and Overview
Section 1 established what a cohort study is and named the design choices its investigators have to make. This section drills into the most consequential of those choices: whether to count events per person (risk) or events per unit of person-time (rate). The two designs share a 2×2 layout but differ in what they assume about the population and what they let you say about disease frequency. Sample-size planning is a useful place to start, because the same initial calculation usually serves both designs, even when the analysis ends up being different.
Learning Objectives
- Describe the design and assumptions of risk-based (cumulative incidence) cohort studies.
- Describe the design and analysis of rate-based (incidence density) cohort studies.
- Identify hypotheses and population types appropriate for each design.
- Calculate and interpret the basic measures of disease frequency for each design.
Sample Size
Initial sample-size estimates are usually performed assuming an equal number of exposed and non-exposed subjects, and assuming the disease is measured by risk (Section 8.2.2). This approach is often sufficient for initial planning even if the population is open and a rate-based design must ultimately be used.
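A minimal planning sketch under those default assumptions (equal group sizes, disease measured as risk). The anticipated risks of 3% and 8%, the two-sided alpha of 0.05, and 80% power are assumed values chosen for illustration. Base R's `power.prop.test()` solves for the number per group, and the hand formula alongside it shows the normal approximation it uses.

```r
# Assumed planning values (illustrative, not recommendations).
risk0        <- 0.03   # anticipated risk in the non-exposed group
risk1        <- 0.08   # anticipated risk in the exposed group
alpha        <- 0.05   # two-sided significance level
target_power <- 0.80

# Base R (stats) solves for the number of subjects per group.
plan <- power.prop.test(p1 = risk0, p2 = risk1,
                        sig.level = alpha, power = target_power)

# The normal-approximation formula behind it, written out by hand.
p_bar <- (risk0 + risk1) / 2
z_a   <- qnorm(1 - alpha / 2)
z_b   <- qnorm(target_power)
n_formula <- (z_a * sqrt(2 * p_bar * (1 - p_bar)) +
              z_b * sqrt(risk0 * (1 - risk0) + risk1 * (1 - risk1)))^2 /
             (risk1 - risk0)^2

c(power.prop.test = plan$n, formula = n_formula)   # both about 325
ceiling(plan$n)                                    # round up: 326 per group
```

The answer is a number per group, so the total is roughly twice this, before any inflation for anticipated losses.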
Modern Sample-Size Software
Recent software allows for unequal sample sizes, repeated measures, multivariable regression models, and proportional hazards models. Specialised methods exist for competing risks (Latouche and Porcher, 2007), survival-time outcomes (Matsui, 2005), strata-matched designs (Mazumdar et al, 2006), and time-varying exposures (Basagana et al, 2011).
Risk-Based (Cumulative Incidence) Designs
This is the simplest form of cohort study, but several assumptions must hold:
- Exposure groups are defined at the start of the study and remain unchanged (fixed cohorts).
- The study groups are closed — all subjects must be observed for the full risk period.
- There should be few or no losses (some authors use >10% losses as a cut-point that casts doubt on validity).
When Risk-Based Designs Work Best
Risk-based designs work best for diseases with a relatively short risk period (e.g., acute infections, post-surgical complications). For chronic diseases such as many cancers, where the risk period is lifelong and often longer than feasible follow-up, a rate-based design is preferred.
2×2 Table: Risk-Based Cohort Design
|  | Exposed | Non-exposed | Total |
|---|---|---|---|
| Diseased | a1 | a0 | m1 |
| Non-diseased | b1 | b0 | m0 |
| Total | n1 | n0 | n |
We select n1 exposed and n0 non-exposed individuals (free of disease) from the source population, follow them for the full follow-up period, and observe a1 exposed cases and a0 non-exposed cases. The two risks of interest are:

$$R_1 = \frac{a_1}{n_1} \qquad \text{and} \qquad R_0 = \frac{a_0}{n_0}$$
The Denominator
In risk-based designs, the denominator is the number of subjects in each exposure category. This is only valid because every subject is observed for the full risk period — otherwise, who you count and who you don’t would depend on follow-up time.
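The mapping from individual records to the cells of the 2×2 table, and from those cells to the risk in the exposed (a1/n1) and in the non-exposed (a0/n0), can be sketched as follows. The cohort is hypothetical and closed, with no losses, so head counts are valid denominators.

```r
# Hypothetical closed cohort, one row per subject, everyone followed for the
# full risk period (no losses).
cohort <- data.frame(
  exposure = factor(rep(c("Exposed", "Non-exposed"), each = 50),
                    levels = c("Exposed", "Non-exposed")),
  disease  = factor(c(rep("Diseased", 12), rep("Non-diseased", 38),
                      rep("Diseased", 4),  rep("Non-diseased", 46)),
                    levels = c("Diseased", "Non-diseased"))
)

# Cross-tabulate in the layout of the table above: rows = disease status.
tab <- table(cohort$disease, cohort$exposure)

a1 <- tab["Diseased", "Exposed"]       # exposed cases
a0 <- tab["Diseased", "Non-exposed"]   # non-exposed cases
n1 <- sum(tab[, "Exposed"])            # exposed subjects
n0 <- sum(tab[, "Non-exposed"])        # non-exposed subjects

R1 <- a1 / n1   # risk in the exposed
R0 <- a0 / n0   # risk in the non-exposed
c(R1 = R1, R0 = R0, RR = R1 / R0)
```

Here the risks are 12/50 = 0.24 and 4/50 = 0.08, a risk ratio of 3. If some subjects had left early, these head-count denominators would no longer describe the same period of observation for everyone.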
Risk-based designs are conceptually clean but operationally fragile — their assumptions break the moment people leave the cohort or the risk period extends beyond a few months. The rate-based alternative was developed precisely to handle the populations where those assumptions do not hold.
Rate-Based (Incidence Density) Designs
In many cohort studies, not every subject is under observation for the full risk period — especially when:
- The source population is dynamic (subjects enter and leave).
- The follow-up period is long.
- Subjects are added part-way through the biological risk period.
- A significant proportion of subjects withdraw from the study.
- Exposure status itself changes during the study.
In these situations, we cannot just count exposed and non-exposed subjects. Instead, we accumulate the amount of ‘at-risk time’ contributed by each subject in each exposure category. The denominator becomes person-time, not persons.
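The person-time bookkeeping can be sketched in a few lines. This assumes a hypothetical open cohort recorded as one row per subject, with entry and exit times in years; each subject's time-at-risk is exit minus entry, and the sums by exposure group give the event counts (a1, a0) and the person-time denominators (t1, t0).

```r
# Hypothetical open cohort. Times are years since the study opened; exit is
# the earliest of disease onset, loss to follow-up, or study end (year 10).
subjects <- data.frame(
  id       = 1:6,
  exposure = rep(c("Exposed", "Non-exposed"), each = 3),
  entry    = c(0, 2, 5, 0, 1, 4),
  exit     = c(10, 6, 7.5, 10, 9, 10),
  event    = c(0, 1, 1, 0, 1, 0)   # 1 = follow-up ended by disease onset
)

# Each subject's time-at-risk.
subjects$pt <- subjects$exit - subjects$entry

# Events (a1, a0) and person-time (t1, t0) accumulated by exposure group.
cases  <- sapply(split(subjects$event, subjects$exposure), sum)
pt_sum <- sapply(split(subjects$pt,    subjects$exposure), sum)
rbind(cases, person_years = pt_sum, rate_per_py = cases / pt_sum)
```

The exposed group contributes 2 events in 16.5 person-years and the non-exposed group 1 event in 24 person-years, even though both groups contain three people.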
2×2 Table: Rate-Based Cohort Design
|  | Exposed | Non-exposed | Total |
|---|---|---|---|
| Diseased | a1 | a0 | m1 |
| Person-time at risk | t1 | t0 | T |
Each subject contributes ‘at-risk’ time until they develop the disease, are lost to follow-up, or the study ends. The two rates of interest are:

$$I_1 = \frac{a_1}{t_1} \qquad \text{and} \qquad I_0 = \frac{a_0}{t_0}$$
Choice of Analysis
If follow-up is relatively short and rates are reasonably constant, Poisson models are appropriate. If follow-up is long and the assumption of a constant rate is not tenable, survival analysis (e.g., Cox proportional hazards) is preferred (Cox, 1972; see Chapter 19).
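A Poisson model for a rate-based cohort can be sketched with base R's `glm()`, entering the log of person-time as an offset so that the model describes events per unit of time. The counts reuse the simulated cohort from Section 1 (80 events in 4,500 person-years exposed, 30 in 4,900 unexposed); with a single binary exposure, the exponentiated coefficient reproduces the crude incidence rate ratio of about 2.9. For the Cox alternative, `coxph()` in the `survival` package is the usual tool; it is not shown here.

```r
# Aggregated counts from the simulated cohort in Section 1.
agg <- data.frame(
  exposed = c(1, 0),
  events  = c(80, 30),
  py      = c(4500, 4900)   # person-years at risk
)

# log(person-time) as an offset turns a model for counts into one for rates.
fit <- glm(events ~ exposed + offset(log(py)), family = poisson, data = agg)

exp(coef(fit)[["exposed"]])               # incidence rate ratio, about 2.9
exp(confint.default(fit)["exposed", ])    # Wald 95% CI for the IRR
```

The regression form earns its keep once confounders are added to the right-hand side, which a hand calculation cannot do.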
You have now seen both designs from the inside. The next subsection puts them side by side; read it as a decision aid for matching design to research situation, not as a statement that one design is generally better than the other.
Comparing the Two Designs
Risk-based designs are best when:
- The population is closed (fixed cohort).
- The risk period is short (so all subjects can be observed for the full period).
- Losses to follow-up are minimal (under ~10%).
- Examples: acute outbreaks, surgical complications within 30 days, hospital readmissions within 14 days (Example 8.1).
Rate-based designs are best when:
- The population is open (dynamic).
- Follow-up is long or the risk period is chronic.
- Subjects enter or leave the study at different times.
- Exposure status may change during follow-up.
- Examples: rugby injury rates over a season (Example 8.6), invasive breast cancer over 10+ years (Example 8.7), fracture incidence over decades (Example 8.8).
Risk denominator: the number of subjects in each exposure category. Counts people.
Rate denominator: the cumulative person-time at risk in each exposure category. Counts time.
This means risk is dimensionless (a proportion between 0 and 1), while a rate has units of cases per person-time (e.g., 4.0 per 1,000 person-years).
Key Examples
The four examples below illustrate the design choice with real published studies. The first two are risk-based; the last two are rate-based. As you read each one, ask yourself why the investigators chose what they chose: in every case, the source population's behaviour and the length of the follow-up window do most of the work.
Kelz et al (2009) compared morbidity and mortality following 56,000+ general and vascular surgical procedures (2001–2004). Time of operation was grouped into seven 2-hour periods. Risk of mortality within 30 days had a moderately strong association with start times after 9:30 pm (OR = 1.22), and morbidity had OR = 1.32 for late-night surgeries. However, when emergency cases were excluded, no odds ratios were significant. The excess crude risk was largely explained by the nature of the clinical cases — an important reminder about confounding by indication.
Leece et al (2010) followed approximately 250 HIV-positive women receiving care at the Ottawa Hospital General Campus Immunodeficiency Clinic (2002–2005). The outcome was undergoing cervical screening; predictors included demographics, HIV status, and primary care provider status. Analysis combined χ² tests with logistic regression. The 12 women without a primary-care provider were less likely to undergo screening (RR = 1.6) than the 84 women with providers. The authors noted that abnormal screening results were common and that recent low CD4 cell count was the only significant predictor.
Chalmers et al (2011) followed 704 male amateur rugby players (aged 13+) over a season. The ‘time’ component was a game, with a total of 6,263 player-games of follow-up. Exposures included age, ethnicity, experience, BMI, smoking, previous injury, training, weather, ground conditions, foul play, and protective equipment. Because rates were reasonably constant over the period, Poisson regression was appropriate. Notable findings: Pacific Island vs. Maori ethnicity (IR = 1.5), ≥40 hours of strenuous activity weekly (IR = 1.5), playing while injured (IR = 1.5), foul play (IR = 1.9), and headgear use (IR = 1.2).
Luo et al (2011) drew on the Women’s Health Initiative Observational Study: 90,000+ women aged 50–79 followed across 40 US clinical centres. Smoking exposure was characterised in detail (status, age started, age quit, cigarettes/day, pack-years). Over an average of 10.3 years of follow-up, 3,520 incident invasive breast cancers were identified. Because of the long follow-up, Cox proportional hazards models (rather than Poisson) were used. Findings: HR = 1.09 for former smokers and HR = 1.16 for current smokers. Among lifetime non-smokers, only those with the highest passive-smoke exposure had increased risk; no significant dose-response trend was seen.
Key Takeaways
- Risk-based designs use number of subjects as the denominator and require a closed cohort followed for the full risk period.
- Rate-based designs use person-time as the denominator and accommodate dynamic populations and variable follow-up.
- The choice between Poisson and Cox proportional hazards depends on whether the rate is reasonably constant or changes substantially over follow-up.
- Initial sample-size calculations can be done assuming a risk-based design even when the analysis will ultimately be rate-based.
The reflection below asks you to apply the choice from this section to a specific long-running occupational cohort. After working through it, Section 3 turns to a problem that stays mostly hidden in textbook treatments: how do you actually measure exposure when it can change over years or decades of follow-up?
Reflection
Suppose you are studying the effect of a workplace exposure (e.g., shift work) on cardiovascular disease over 20 years. Workers can join or leave the company at any time. Which design (risk-based or rate-based) is more appropriate, and why? What practical issues would arise that wouldn’t arise in a 14-day hospital readmission study?
1. Which of the following is a key requirement of a risk-based cohort study?
2. In a rate-based cohort study, the denominator of the rate is:
3. Which study design is most appropriate when the source population is dynamic and follow-up is long?
4. When follow-up is long and the assumption of a constant rate is not tenable, which analysis is preferred?
The Exposure
⏱ Estimated reading time: 15 minutes
Introduction and Overview
Sections 1 and 2 set up the architecture of the cohort study and the choice of risk-vs.-rate design. Both treated “exposed” as if it were a fixed property of each subject. In real cohorts that is rarely true. People start smoking, quit, switch jobs, change diets; doses accumulate. This section is about how exposure is actually measured and handled when it can vary across years or decades of follow-up — the technical problem that distinguishes a working cohort study from a textbook one.
Learning Objectives
- Elaborate the principles used to select and measure exposure in cohort studies.
- Distinguish between permanent and non-permanent exposures.
- Describe the concept of an induction period and its role in time-at-risk calculations.
- Identify how dichotomous, ordinal, and continuous exposure scales differ in measurement and analysis.
Why Exposure Measurement Matters
In cohort studies, the objective is to identify the consequences of a specific exposure factor. Exposures can range from study-subject characteristics (sex, age) to infectious or noxious agents, environmental exposures, or food-related factors. Exposures that can be manipulated are of special interest because they lead more directly to disease control.
Measurement Is Not Trivial
Although measuring exposure might seem simple at first glance, careful thought should be given to how it is measured and expressed. Each study should specify what constitutes exposure and, when possible, the ‘induction period’ — how long after exposure is reached before disease might reasonably arise. The more complex the exposure, the more important it is to validate the assessment.
Scales of Exposure Measurement
Exposure status can be measured on different scales, each with implications for design and analysis. Four scales are in common use, running from the simplest binary contrast (dichotomous) through ordinal and continuous measures up to compound measures that combine intensity and duration. The pattern to take away: more granular measurement makes the design more powerful for detecting dose-response, but only if the underlying biology really has a graded effect.
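As an illustration, here is one way to express hypothetical smoking histories on all four scales. The ordinal cut-points are illustrative assumptions, and the compound measure uses the conventional pack-year definition (20 cigarettes per pack, so pack-years = packs per day × years smoked).

```r
# Hypothetical smoking histories; the values are made up for illustration.
smokers <- data.frame(
  cigs_per_day = c(0, 5, 15, 30),
  years_smoked = c(0, 10, 20, 35)
)

# Dichotomous: any smoking versus none.
smokers$ever <- smokers$cigs_per_day > 0
# Ordinal: none / light / moderate / heavy (assumed cut-points).
smokers$level <- cut(smokers$cigs_per_day,
                     breaks = c(-Inf, 0, 10, 20, Inf),
                     labels = c("none", "light", "moderate", "heavy"),
                     ordered_result = TRUE)
# Continuous: cigarettes per day, used as recorded.
# Compound: pack-years = (cigarettes per day / 20) x years smoked.
smokers$pack_years <- smokers$cigs_per_day / 20 * smokers$years_smoked
smokers
```

Note how the dichotomous column treats the 5-a-day and 30-a-day smokers identically, while pack-years separates them by a factor of 21.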
The choice of measurement scale is independent of a second question: does the exposure stay fixed for the rest of the subject's life, or can it change?
Permanent vs. Non-Permanent Exposures
Permanent exposures are time-invariant — factors such as sex, race, or one-time exposures such as a vaccination. These are relatively easy to measure, but a moment’s thought reveals subtleties:
- Age at exposure may matter (e.g., age at vaccination, age at smoking initiation).
- Even ‘one-time’ exposures may have a threshold or dosage requirement to count as ‘exposed’.
- If the disease event occurs before exposure is completed, it should not be counted as an outcome event — the exposure could not have caused it.
In early studies, the goal may be to determine the threshold at which exposure becomes biologically meaningful (Rohan et al, 2007).
Non-permanent exposures change over time: food intake, lifestyle factors, environmental exposures, or any exposure where the timing matters. These add complexity:
- When did exposure start? (e.g., age started smoking)
- When did exposure stop? (e.g., age quit smoking)
- How intense was exposure? (cigarettes per day)
- How long did it last? (years smoked)
Sometimes a simple summary will suffice (e.g., years smoked); sometimes a compound measure (e.g., pack-years) is needed. The more information collected, the more credibly causal relationships can be inferred and the more useful the findings are for prevention.
Both permanent and non-permanent exposures share a complication that becomes visible only when you start counting person-time: the gap between when exposure happens and when disease can plausibly result.
The Induction Period & Time-at-Risk
An important concept in cohort design is the induction period — the time after exposure is completed before disease can reasonably arise.
Figure 8.1 — Life experience: exposure, induction period, and time-at-risk.
Handling the Induction Period
If there is a known induction period following completion of exposure, then until that period is over, the time-at-risk of ‘exposed’ individuals should be added to the non-exposed group. Some researchers prefer to discard disease experience during the induction period because of uncertainty about its duration; this is often the safest choice provided sufficient time-at-risk remains in the exposed group to maintain precision.
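The two options can be sketched as small functions that split one subject's follow-up and place a disease event. The names `allocate_time()` and `classify_event()` are hypothetical helpers; all times are in years from entry, and the sketch assumes exposure is completed before the subject exits. Treating events before exposure completion as non-exposed events is also an assumption of the sketch, consistent with crediting pre-exposure time to that group.

```r
# Hypothetical helpers; times are years from entry, and exposure is assumed
# to be completed before the subject exits follow-up.
allocate_time <- function(exposure_done, exit, induction, discard = FALSE) {
  induction_end <- min(exposure_done + induction, exit)
  before <- exposure_done                   # before exposure is completed
  during <- induction_end - exposure_done   # inside the induction period
  after  <- exit - induction_end            # counts as exposed time-at-risk
  # Option 1: induction time is credited to the non-exposed group.
  # Option 2 (discard = TRUE): induction time is dropped altogether.
  non_exposed <- if (discard) before else before + during
  c(non_exposed = non_exposed, exposed = after)
}

# Where a disease event is counted depends on when it happens.
classify_event <- function(event_time, exposure_done, induction, discard = FALSE) {
  if (event_time < exposure_done) return("non-exposed")
  if (event_time < exposure_done + induction) {
    return(if (discard) "discarded" else "non-exposed")
  }
  "exposed"
}

allocate_time(exposure_done = 2, exit = 10, induction = 3)
allocate_time(exposure_done = 2, exit = 10, induction = 3, discard = TRUE)
```

With exposure completed at year 2, an assumed 3-year induction period, and exit at year 10, the first call credits 5 years to each category; with `discard = TRUE`, 2 non-exposed years remain and the 3 induction years are dropped, while the exposed time is unchanged.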
Changing Exposure Status
If exposure status changes during follow-up, an individual subject can contribute time-at-risk to both exposed and non-exposed categories:
- Previously non-exposed subjects contribute to the exposed category after the exposure threshold is reached.
- Previously exposed subjects contribute to the non-exposed category after any lag effects have ended.
- If a subject develops the disease, the exposure category assigned is the one they were in at the time the outcome occurred.
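The rules above amount to splitting each subject's follow-up into episodes of constant exposure and then summing events and person-time by category. The episode table below is hypothetical: subject 1 moves from non-exposed to exposed and then develops disease, subject 2 moves from exposed to non-exposed (after any lag) and is censored, and subject 3 is exposed throughout.

```r
# Hypothetical episodes: one row per period of constant exposure status.
episodes <- data.frame(
  id       = c(1, 1, 2, 2, 3),
  start    = c(0, 3, 0, 4, 0),
  stop     = c(3, 7, 4, 9, 8),
  exposure = c("non-exposed", "exposed", "exposed", "non-exposed", "exposed"),
  event    = c(0, 1, 0, 0, 0)   # flagged only in the episode where it occurred
)
episodes$pt <- episodes$stop - episodes$start

# Sum events and person-time by exposure category, then compute the rates.
by_exposure <- aggregate(cbind(event, pt) ~ exposure, data = episodes, FUN = sum)
by_exposure$rate <- by_exposure$event / by_exposure$pt
by_exposure
```

In this toy table the exposed category accumulates 16 person-years and one event, and the non-exposed category 8 person-years and no events; subjects 1 and 2 each contribute to both.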
Losses to Follow-Up
For subjects lost to follow-up, time-at-risk accumulates until the last date their exposure status is known. If the precise time of loss is unknown, the midpoint of the last known exposure period is conventionally used.
One last twist on the meaning of “exposure” is worth flagging before we move on, because it shows up surprisingly often in chronic-disease cohorts.
Disease as Exposure
Disease itself can serve as an exposure for other outcomes such as additional diseases, mortality, or quality-of-life measures. Lazo et al (2011) followed 11,000+ adults for 18 years using non-alcoholic fatty-liver disease (NAFLD) as the exposure. Those with NAFLD, whether or not they had elevated liver enzymes, had a mortality hazard similar to that of people without NAFLD: an interesting null result.
Key Examples
The three published cohorts below illustrate the full range of exposure handling we just covered: a continuous diet exposure validated against multiple recalls, a multi-exposure design with biospecimens, and a multinational study that moves from estimated relative risk to population-attributable fraction. For each one, notice which exposure-measurement decisions shape the rest of the design.
Warensjo et al (2011) studied women in the Swedish Mammography Cohort. Calcium intake from diet, supplements (1 dose = 500 mg), and multivitamins (1 dose = 120 mg) was the major exposure. Total calcium intake correlated well (r = 0.77) between the food-frequency questionnaire and 14 repeated 24-hour recalls, an example of careful exposure validation. Cumulative dietary calcium intake (in quintiles) was related to fracture and osteoporosis incidence using Cox proportional hazards and logistic models. Findings: a chronically low dietary calcium intake was associated with increased fracture and osteoporosis. Above the base level, only minor differences were observed; in the highest-intake group, the hip fracture rate was somewhat increased.
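The validation step (comparing a food-frequency questionnaire against the average of repeated 24-hour recalls) can be sketched on simulated data. Everything below is an assumption for illustration: the true-intake distribution, the error size of each instrument, and the sample of 500 women. The simulated correlation is not an estimate of the one Warensjo et al reported.

```r
# Simulated validation study; every distribution here is an assumption.
set.seed(2011)
n_women <- 500
true_ca <- rnorm(n_women, mean = 900, sd = 250)   # usual intake, mg/day
ffq     <- true_ca + rnorm(n_women, sd = 150)     # one FFQ per woman
# 14 repeated 24-hour recalls per woman, each with large day-to-day noise.
recalls     <- sapply(1:14, function(k) true_ca + rnorm(n_women, sd = 400))
recall_mean <- rowMeans(recalls)                  # averaging damps the noise

# Agreement between the two instruments (Pearson correlation).
cor(ffq, recall_mean)
```

Averaging many recalls is what makes them a usable reference: a single 24-hour recall in this simulation would be dominated by day-to-day noise.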
Rohan et al (2007) recruited alumni from three Ontario universities (1995–1998). The major outcome was new cancer incidence. Participants completed lifestyle and food-frequency questionnaires, measured waist and hip circumferences, and provided hair and toenail specimens (for trace element and DNA analysis). Of the 73,000+ recruits, 97% provided biological specimens. Exposures included exercise, lifestyle factors, molecular markers, and dietary characteristics. The paper includes a particularly good discussion of creating compound nutritional variables from food-frequency data and verifying them.
Schutze et al (2011) reported on the European Prospective Investigation into Cancer and Nutrition (EPIC; Riboli et al, 2002), a multicentre prospective cohort that recruited 520,000 men and women aged 35–70 from 10 European countries (1992–2000). Alcohol consumption was measured at recruitment in grams/day and classified as never, former, or lifetime consumer. Cancer incidence was identified through cancer registries (follow-up ended 2002–2005). The analysis combined regression coefficients with population prevalence of consumption above recommended levels. If causality is assumed, alcohol consumption beyond recommended levels was responsible for an estimated 10% of all cancers in men and 3% of all cancers in women.
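The attributable-fraction step in the EPIC analysis combines a measure of association with how common the exposure is. A common single-category version of that idea is Levin's formula, PAF = p(RR - 1) / [1 + p(RR - 1)], where p is the proportion of the population exposed. The sketch below assumes that formula; the helper name `levin_paf` and its inputs are hypothetical, and the published analysis may have used a more detailed, category-specific approach.

```r
# Levin's population-attributable fraction for a single exposure category.
# Hypothetical helper for illustration.
#   p  = proportion of the population exposed (e.g., drinking above recommended levels)
#   rr = relative risk comparing exposed with unexposed
levin_paf <- function(p, rr) {
  stopifnot(p >= 0, p <= 1, rr > 0)
  excess <- p * (rr - 1)     # excess relative risk, weighted by prevalence
  excess / (1 + excess)
}

# Hypothetical inputs chosen for illustration only (not taken from EPIC)
levin_paf(p = 0.30, rr = 1.5)
```

With 30% of the population exposed and RR = 1.5, about 13% of cases would be attributable to the exposure, under the same causal assumption the authors flag.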
Key Takeaways
- Exposure can be dichotomous, ordinal, continuous, or compound — choose the scale that best captures the biology.
- Permanent exposures (sex, race, vaccination) are easier but still require careful operational definitions.
- Non-permanent exposures require timing, intensity, and duration data; the more detail, the more credible the inferences.
- Time before the induction period ends should not be counted as ‘exposed’ time-at-risk.
- Subjects can contribute time-at-risk to multiple exposure categories if their status changes.
- Disease itself can serve as an exposure for downstream outcomes.
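The two time-at-risk rules above can be sketched for a single subject. This assumes one change in status (unexposed to exposed at a known time), follow-up ending at an event or censoring, and a fixed, known induction period. The helper `split_person_time` is hypothetical; it reports induction-period time as its own category rather than assigning it, because the rule above only says that this time must not be counted as exposed.

```r
# Split one subject's follow-up (years since study entry) into person-time categories.
# Hypothetical helper for illustration.
#   exp_start : time the subject became exposed (Inf if never exposed)
#   end       : time of event or censoring
#   induction : assumed induction period after exposure onset
split_person_time <- function(exp_start, end, induction) {
  unexposed      <- min(exp_start, end)                         # time before exposure onset
  lag_end        <- min(exp_start + induction, end)             # induction ends (or follow-up ends first)
  induction_time <- max(0, lag_end - min(exp_start, end))       # time inside the induction period
  exposed        <- max(0, end - (exp_start + induction))       # exposed time-at-risk after induction
  c(unexposed = unexposed, induction = induction_time, exposed = exposed)
}

# Subject exposed at year 2, followed to year 10, with a 3-year induction period
split_person_time(exp_start = 2, end = 10, induction = 3)
# unexposed = 2, induction = 3, exposed = 5
```

This subject contributes time to all three categories, and an event at `end` would be attributed to whichever category contains that time (here, exposed).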
The reflection below puts the measurement-scale decision into a real research scenario. After working through it and the knowledge check, Section 4 closes the lesson by addressing the four practical questions that any cohort investigator faces once exposure is settled: keeping the groups comparable, managing the follow-up period, ascertaining outcomes, and analysing the data.
Reflection
Imagine designing a cohort study of physical activity and cardiovascular disease. How would you measure activity — dichotomous (active vs. inactive), ordinal (low/moderate/high), or continuous (MET-hours/week)? What might be lost or gained at each level of measurement? Consider how you would handle people whose activity changes substantially during follow-up.
1. The induction period refers to:
2. Which exposure measurement is the example of a compound variable?
3. If a subject’s exposure status changes during the study, how is their time-at-risk handled?
4. Why is categorising a continuous exposure variable often discouraged in analysis?
Comparability, Follow-up, Outcomes & Analysis
⏱ Estimated reading time: 18 minutes
Introduction and Overview
Sections 1–3 walked through the design choices: cohort architecture, risk-vs.-rate handling, and exposure measurement. With those locked in, the remaining work is operational. Four practical questions structure this section in turn: how do you make sure the exposed and non-exposed groups are comparable on the variables that aren't your exposure of interest, how long should you follow them, how do you confirm the outcome happened, and how do you analyse what you collect?
Learning Objectives
- Identify the three approaches to ensuring comparability of exposed and non-exposed groups.
- Describe principles for unbiased follow-up and outcome ascertainment.
- Recognise appropriate analytic approaches for risk-based and rate-based cohort designs.
- Apply STROBE reporting guidelines to a cohort study.
Ensuring Exposed & Non-Exposed Groups Are Comparable
If exposed and non-exposed groups are not comparable (i.e., not exchangeable) with respect to factors related to both exposure and outcome, the estimated association will be biased (confounded) (Klein-Geltink et al, 2007).
The Achilles Heel of Observational Studies
As Hernan (2012) notes, this is a key reason to prefer randomised experiments — exchangeability is expected when exposure is randomised. In observational studies, investigators must use expert knowledge to identify and measure all potential confounders in hopes of achieving exchangeability conditional on the measured covariates. Unfortunately, exchangeability cannot be empirically tested, so we never know with certainty whether we have succeeded.
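Three approaches help ensure comparability:
- Restriction: applied before subject selection, by limiting the study to subjects within a narrow range of a potential confounder.
- Matching: applied at selection, by choosing non-exposed subjects who resemble the exposed subjects on potential confounders.
- Analytic control: applied during analysis, through stratification or multivariable modelling.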
Comparability addresses the cross-sectional baseline. The next question is what happens over time — specifically, how long the follow-up should last and what counts as time-at-risk for each subject.
Follow-Up Period
To enhance validity, the follow-up process must be as complete as possible and unbiased with respect to exposure. Achieving unbiased follow-up may require some form of blinding to exposure status:
- In prospective studies: assign follow-up tasks to researchers unaware of exposure status.
- In retrospective studies: keep record reviewers unaware of exposure status when possible.
Active vs. Passive Surveillance
With passive surveillance, cases are identified when they present (e.g., date of first symptoms or physician examination). With active surveillance and regular evaluation of subjects, more accurate timing of outcome occurrence is feasible. Tooth et al (2005) recommend enumerating the at-risk population at specified times during the study period.
Losses to follow-up are a perennial concern. Chang et al (2009) describe shared parameter models to reduce bias when losses are not randomly distributed.
Even careful follow-up only matters if the outcome itself is measured well. Cohort studies tend to be ambitious about exposure characterization but surprisingly casual about outcome ascertainment, and the next subsection is the corrective.
Measuring the Outcome
Although the most frequent outcome in cohort studies is the occurrence of a specific disease (measured as risk or rate), outcomes can also be:
- Ordinal: e.g., none, mild, moderate, severe.
- Continuous: e.g., birth weight, blood pressure, BMI, quality of life index (Example 8.2).
Harley et al (2011), for instance, examined polybrominated diphenyl ethers (PBDEs, a class of flame retardants) in relation to infant birth weight, length, and head circumference, so both exposure and outcome were measured on continuous scales.
Diagnostic Criteria
Each study needs explicit protocols for determining outcome occurrence and timing. Clear diagnostic criteria minimise diagnostic errors. In retrospective studies, this can be challenging when only summary diagnostic information is available.
Incidence Requires Two Examinations
Strictly, measuring incidence requires:
- An examination at the start of follow-up to ensure subjects do not yet have the disease.
- A second examination to determine whether (and when) the disease developed.
Why Incidence, Not Prevalence?
Counting only new (incident) disease events avoids the reverse-causation problem that arises when prevalence is measured, and keeps associations free of bias from disease duration and from differential survival (see Chapter 12). In retrospective studies, freedom from disease at the start of follow-up often must be assumed; in prospective studies, it should be formally verified.
If clinical diagnostic data are used, the incident date is usually based on the time of diagnosis, not the time of underlying disease occurrence — an important caveat for diseases with long subclinical phases. If subjects are screened at regular intervals, the time of disease occurrence is conventionally placed at the midpoint between examinations.
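When subjects are screened at intervals, the midpoint convention fixes both each case's event time and the time-at-risk it contributes. A minimal sketch, assuming hypothetical exam schedules measured in years since enrolment and disease-free status at the first exam:

```r
# Midpoint rule for subjects screened at regular intervals:
# disease onset is placed halfway between the last negative
# and the first positive examination.
last_negative  <- c(2, 4, 0)   # hypothetical exam times (years since enrolment)
first_positive <- c(3, 6, 1)

event_time <- (last_negative + first_positive) / 2
event_time

# Each case contributes time-at-risk only up to its estimated event time
sum(event_time)   # total person-years from these three cases
```

The three cases are assigned onset at 2.5, 5, and 0.5 years, contributing 8 person-years in total.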
Multiple Outcomes
The Multiple-Comparisons Problem
One advantage of cohort studies is the ability to assess multiple outcomes from a given exposure. However, with multiple outcomes, some may be statistically significant by chance alone. Unless the outcomes were specified a priori, the remedy is either to treat the study as hypothesis-generating rather than hypothesis-testing, or to apply a penalty to the P-value threshold (for example, a Bonferroni correction).
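Penalising the P-value threshold can be done with base R's `p.adjust()`. The sketch below uses made-up P-values for five outcomes of one exposure; Bonferroni multiplies each P-value by the number of tests, while Holm's step-down method also controls the family-wise error rate but never gives a larger adjusted value than Bonferroni.

```r
# Hypothetical unadjusted P-values for five outcomes of a single exposure
p_raw <- c(mortality = 0.004, stroke = 0.020, mi = 0.030,
           diabetes = 0.250, cancer = 0.600)

p_bonf <- p.adjust(p_raw, method = "bonferroni")  # p * number of tests, capped at 1
p_holm <- p.adjust(p_raw, method = "holm")        # step-down; never larger than Bonferroni

round(cbind(raw = p_raw, bonferroni = p_bonf, holm = p_holm), 3)
```

Only the mortality result stays below 0.05 after either correction.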
Comparability, follow-up, and outcome ascertainment are all design-stage decisions. The analysis stage is where those decisions translate into a quantitative result — and where pre-specified analysis plans (the EGAP-style discipline from Lesson 1) earn their keep, because cohort data tempt almost limitless slicing.
Analysis
For closed populations, average risk and survival times can be measured during follow-up.
- Bivariable: methods in Chapter 6.
- Stratified analysis to control confounding: Chapter 13.
- Multivariable: traditionally logistic regression, which uses odds ratios as the base measure.
- For estimating risk ratios directly in multivariable settings: see Section 18.4.1.
- For risk differences as the association measure: linear regression as described by Cheung (2007).
- For population attributable fractions: log-linear models (Cox, 2006) or various models including logistic, log-linear, and Poisson (Greenland, 2004; Example 8.10).
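To see why the choice of base measure matters, the sketch below fits three models to the same hypothetical closed cohort (80 cases among 1000 exposed, 30 among 1000 unexposed), expanded to one row per subject: a log-binomial model for the risk ratio, logistic regression for the odds ratio, and a linear probability model for the risk difference. This is one standard way to get each measure in base R, not necessarily the method described in Section 18.4.1; Cheung's (2007) approach also pairs least squares with a robust variance estimator, which this point-estimate sketch omits.

```r
# Hypothetical closed cohort, one row per subject
exposed <- rep(c(1, 0), each = 1000)
case    <- c(rep(1, 80), rep(0, 920),   # exposed: 80 cases
             rep(1, 30), rep(0, 970))   # unexposed: 30 cases

# Risk ratio: binomial model with a log link (log-binomial);
# start values keep fitted risks inside (0, 1) from the first iteration
fit_rr <- glm(case ~ exposed, family = binomial(link = "log"), start = c(-3, 0))

# Odds ratio: conventional logistic regression
fit_or <- glm(case ~ exposed, family = binomial(link = "logit"))

# Risk difference: linear probability model (point estimate only)
fit_rd <- lm(case ~ exposed)

c(RR = exp(coef(fit_rr)[["exposed"]]),
  OR = exp(coef(fit_or)[["exposed"]]),
  RD = coef(fit_rd)[["exposed"]])
```

With risks of 8% and 3%, the odds ratio (about 2.81) already exceeds the risk ratio (about 2.67); the gap widens as the outcome becomes more common.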
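For open populations, rates measure disease frequency. The choice of model depends mainly on whether the rate can be treated as constant over follow-up, which is more plausible when follow-up is short:
- If the rate is reasonably constant over follow-up: Poisson regression is appropriate, with the log of each subject's time-at-risk included as the offset.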
- If the rate varies substantially over follow-up: Cox proportional hazards models are preferred (most rate-based cohort analyses in the medical literature use these).
- For grouped data with multivariable analysis: Poisson with offsets gives direct incidence rate ratio estimates.
- If time of occurrence matters more than just whether the outcome occurred: survival models (Case et al, 2002).
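For grouped rate data, a Poisson model with the log of person-time as an offset returns the incidence rate ratio directly. A minimal sketch, assuming the same hypothetical counts used in the lesson's companion script (80 events in 4500 person-years among the exposed, 30 in 4900 among the unexposed):

```r
# Grouped cohort data: one row per exposure group
cohort <- data.frame(
  exposed = c(1, 0),
  events  = c(80, 30),
  py      = c(4500, 4900)   # person-years at risk
)

# log(person-years) enters as an offset, so the model describes rates, not counts
fit <- glm(events ~ exposed + offset(log(py)), family = poisson, data = cohort)

irr <- exp(coef(fit)[["exposed"]])             # incidence rate ratio
ci  <- exp(confint.default(fit)["exposed", ])  # Wald 95% CI, built on the log scale
round(c(IRR = irr, lower = ci[[1]], upper = ci[[2]]), 3)
```

The point estimate equals the hand calculation (80/4500 divided by 30/4900, about 2.90). Because this two-group model is saturated, its standard error for log(IRR) is sqrt(1/80 + 1/30), so the interval is essentially the one obtained by hand.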
Callas et al (1998) compared proportional hazards, Poisson, and logistic models, concluding that the first two are preferable to logistic for cohort data — a finding confirmed by Greenland (2004).
Hernan (2010) describes two drawbacks to hazard ratios (HRs):
- Average HRs can change over time. The reported average HR depends on the duration of follow-up.
- Period-specific HRs have a built-in bias. The HR at time t is conditional on not having developed the outcome before time t. As follow-up lengthens, susceptible people in the exposed group are progressively depleted, so the apparent risk in the exposed group decreases relative to the non-exposed group.
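The built-in selection bias can be demonstrated without any sampling noise. The sketch below assumes a hypothetical population split evenly into a high-risk and a low-risk subgroup, each with a constant hazard, and an exposure that exactly doubles the hazard within both subgroups. Within every subgroup the HR is therefore 2 at all times; the code computes the population HR among those still event-free at each year t, using survival-weighted average hazards. All values are illustrative assumptions.

```r
# Two latent subgroups with constant baseline hazards (per year), equal shares
share   <- c(high = 0.5, low = 0.5)
h0      <- c(high = 0.50, low = 0.05)
hr_true <- 2   # exposure doubles the hazard within every subgroup

# Population hazard among those still event-free at time t:
# a survival-weighted average of the subgroup hazards
pop_hazard <- function(t, h) {
  surv <- share * exp(-h * t)   # proportion of the population event-free in each subgroup
  sum(surv * h) / sum(surv)
}

times <- 0:10
hr_t  <- sapply(times, function(t) pop_hazard(t, hr_true * h0) / pop_hazard(t, h0))
round(setNames(hr_t, times), 2)
```

The printed HR is exactly 2 at t = 0, falls to roughly 1.1 within a few years as high-risk exposed subjects are depleted faster, then drifts back toward 2 once nearly all high-risk subjects have had events. The exposure's effect never changes; the movement comes entirely from conditioning on survival, and it also shows why an average HR depends on the length of follow-up.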
Hernan describes how to circumvent these problems using covariate-adjusted survival curves. Hernan et al (2008) propose subdividing follow-up into shorter intervals and treating each as a ‘trial’ — an analytic strategy that more closely approximates a randomised trial. Danaei et al (2011) elaborate this approach.
Time-Dependent Confounders
Gran et al (2010) describe a sequential Cox technique for data with time-dependent confounders — covariates that are affected by past exposure and predict future exposure and outcome (e.g., CD4 count when assessing HIV treatment effects).
Repeated Measurements
Xue et al (2010) explain how marginal and mixed-effect models can handle cohort data with repeated measurements of exposure and covariates, contrasting these with logistic and proportional hazards approaches.
The last operational concern, as in Lessons 3 and 4, is reporting. STROBE returns one more time, now with the items that matter specifically for cohort designs.
Reporting of Cohort Studies (STROBE)
The STROBE statement (von Elm et al, 2007) provides reporting guidelines for observational studies. Tooth et al (2005) elaborate criteria specific to cohort studies. These should be used both to plan and report studies, and to assess the validity of published work.
Study definition:
- Are objectives or hypotheses stated?
- Are the target population, sampling frame, and study population defined?
- Are the setting, geographic location, and dates stated?
- Are eligibility criteria stated?
- Is the number of participants justified?
Recruitment & participation:
- Are numbers meeting and not meeting eligibility criteria stated?
- Are reasons for ineligibility and refusal given?
- Were responders compared with non-responders?
Measurement:
- Are methods of data collection stated?
- Were the reliability and validity of measurement methods mentioned?
- Were any confounders mentioned?
Follow-up & analysis:
- Was the number of participants at each stage specified?
- Were reasons for loss to follow-up quantified?
- Was the type of analysis stated, including longitudinal methods?
- Were absolute and relative effect sizes reported?
- Were confounders and missing data accounted for in analyses?
Discussion:
- Was the impact of biases assessed (qualitatively or quantitatively)?
- Did authors relate results to a target population?
- Was generalisability discussed?
Reporting Study Designs Before Results
Tooth et al (2005) and others note an increased frequency of reporting on the design and implementation of proposed or early-stage studies (e.g., Gern et al, 2009; Hermsen et al, 2011; Origasa et al, 2011; Poulos et al, 2011; Schuz et al, 2011). This is a positive trend — it allows assessment of strengths and weaknesses without the bias that comes from already knowing the results.
Key Takeaways
- Three tools for comparability: restriction (before selection), matching (at selection), analytic control (during analysis).
- Exchangeability cannot be empirically tested in observational studies — we always work in some uncertainty.
- Blinded follow-up reduces bias; active surveillance gives more accurate event timing than passive.
- Always measure incidence (not prevalence) to avoid reverse-causation, duration, and survival biases.
- Risk-based studies are typically analysed with logistic or log-linear regression; rate-based studies use Poisson or Cox proportional hazards.
- Hernan (2010) cautions about hazard ratios changing over time and being conditional on prior survival.
- STROBE provides a checklist for both planning and assessing cohort studies.
The reflection below is the section's exit ticket — a realistic review-the-paper prompt that requires you to use everything from this section. After working through it and the knowledge check, the final assessment integrates the four content sections into one 15-question exam.
Reflection
You are reviewing a published cohort study reporting that a dietary supplement reduced cardiovascular events with HR = 0.75. Drawing on this section, what specific methodological aspects would you scrutinise to decide whether you trust this result? Consider comparability of groups, follow-up completeness, outcome ascertainment, choice of analysis, and STROBE reporting.
1. Which approach to ensuring comparability is applied before subject selection?
2. Why are incident cases preferred over prevalent cases for measuring outcomes?
3. According to Hernan (2010), which is a documented ‘hazard’ of the hazard ratio?
4. According to STROBE-style criteria, which of the following should be reported in a cohort study?
Final Review & Assessment
⏱ Estimated time: 20 minutes
Bringing It All Together
Where Lesson 4 looked backward from outcome to exposure, this lesson followed the arrow forward. Section 1 framed the cohort study as something close to a controlled trial without randomisation: pick a source population, classify exposure, follow people, and watch incidence accumulate. Section 2 then forced the central design choice — risk-based (cumulative incidence) for closed populations followed for a fixed time, or rate-based (incidence density) for open populations where person-time is the natural denominator — and connected each to its analytic toolkit (binomial / log-binomial vs. Poisson / survival).
Sections 3 and 4 zoomed in on the operational details that decide whether a cohort study survives appraisal. How exposure is scaled (dichotomous, ordinal, continuous, compound), whether it changes over time, how the induction period is handled, and how comparability is engineered through restriction, matching, and analytic control all determine whether the rate ratio you eventually report is estimating what you think it is. Blinded outcome ascertainment, careful handling of loss to follow-up, and STROBE-aligned reporting then turn a defensible design into a study other researchers can actually use.
The final reflection asks you to put the entire arc to work as a brief cohort proposal of your own; the 15-question assessment then checks the conceptual content directly. Lesson 6 will pull the camera back further to look at ecological and group-level designs — where the unit of analysis stops being the person — and the comparability and inference logic you just built will keep paying off there.
Key Takeaways from Lesson 5
- Cohort studies follow exposed and unexposed people forward in time, giving you direct access to incidence — something case-control studies cannot deliver.
- The choice between risk-based (closed cohort, cumulative incidence) and rate-based (open cohort, incidence density) designs is dictated by the source population and the follow-up structure, not by preference.
- Person-time is the right denominator whenever follow-up is uneven or membership is dynamic; it makes Poisson and survival analyses possible.
- Exposure must be measured on a meaningful scale, with explicit handling of induction periods and time-varying status — misclassification here flows through the whole analysis.
- Comparability is engineered through restriction, matching, and analytic control; blinded outcome ascertainment and rigorous tracking of loss-to-follow-up protect internal validity.
- STROBE-aligned reporting closes the loop: a cohort study is only as useful as its design, conduct, and analysis are visible to the reader.
The companion R script r-activities/HSCI_230_Lesson_5_Cohort_Studies.R walks through a small simulated cohort end-to-end: compute cumulative incidence (risk) and incidence rate (per 1000 person-years) for exposed vs. unexposed groups, derive the risk ratio (RR) and incidence rate ratio (IRR), then build a Wald 95% confidence interval for the IRR on the log scale — the same workflow you will reach for when appraising a published cohort.
```r
# 1000 exposed and 1000 unexposed individuals followed for up to 5 years.
#   exposed:   80 events in 4500 person-years
#   unexposed: 30 events in 4900 person-years
events <- c(exposed = 80, unexposed = 30)
n      <- c(exposed = 1000, unexposed = 1000)
py     <- c(exposed = 4500, unexposed = 4900)

risk <- events / n           # cumulative incidence (risk)
rate <- events / py * 1000   # incidence rate per 1000 person-years

RR  <- unname(risk["exposed"] / risk["unexposed"])  # risk ratio
IRR <- unname(rate["exposed"] / rate["unexposed"])  # incidence rate ratio

round(data.frame(risk, rate, RR = RR, IRR = IRR), 3)

## -----------------------------------------------------------------------------
## Stretch: 95% CI for the rate ratio (Wald approximation on the log scale)
## -----------------------------------------------------------------------------
log_irr   <- log(IRR)
se_logirr <- sqrt(1 / events[["exposed"]] + 1 / events[["unexposed"]])  # SE of log(IRR)
ci_irr    <- exp(log_irr + c(-1, 1) * 1.96 * se_logirr)
round(c(IRR = IRR, lower = ci_irr[1], upper = ci_irr[2]), 3)
```
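The same log-scale Wald approach gives an interval for the risk ratio, but with a different standard error, because the denominators are people rather than person-time. A minimal sketch, assuming the usual large-sample formula SE[log RR] = sqrt(1/a - 1/n1 + 1/c - 1/n0) and the same hypothetical counts as the script above:

```r
# 95% Wald CI for the risk ratio (log scale), same hypothetical cohort
a  <- 80; n1 <- 1000   # exposed: cases, total
c0 <- 30; n0 <- 1000   # unexposed: cases, total

rr     <- (a / n1) / (c0 / n0)
se_log <- sqrt(1 / a - 1 / n1 + 1 / c0 - 1 / n0)   # large-sample SE of log(RR)
ci_rr  <- exp(log(rr) + c(-1, 1) * 1.96 * se_log)
round(c(RR = rr, lower = ci_rr[1], upper = ci_rr[2]), 3)
```

Comparing this interval with the IRR interval shows how the two denominators (people versus person-time) change both the estimate and its precision.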
Final Reflection
Design a brief cohort study proposal for a health question of your choice. Specify: (1) the research question and hypothesis, (2) whether the source population is open or closed, (3) whether you would use a risk-based or rate-based design and why, (4) how you would define and measure exposure (and on what scale), (5) how you would ensure comparability of groups, and (6) what analytic approach you would use.
Final Assessment
This assessment covers all sections of this lesson. You must score 100% to complete the lesson. Review the feedback after each attempt.
1. The fundamental logic of a cohort study is to:
2. A cohort study most resembles which other design?
3. In a closed source population, all subjects:
4. In a risk-based cohort study, what is the denominator of the risk?
5. In a rate-based cohort study, the denominator is:
6. Which design is best suited for studying a chronic disease with a long, lifelong risk period?
7. Which is an example of a compound exposure variable?
8. The induction period is:
9. If a subject’s exposure status changes during follow-up:
10. Which approach to comparability is applied during analysis rather than design?
11. Which statement about exchangeability in observational studies is correct?
12. Why are incident cases preferred over prevalent cases for cohort outcomes?
13. For a rate-based cohort study where rates can be assumed reasonably constant, which model is appropriate?
14. Hernan (2010) describes which problem with the average hazard ratio?
15. According to STROBE-style criteria for cohort studies, which item should be reported?