HSCI 230 — Lesson 8

Sampling, Selection Processes & External Validity

Evaluating Epidemiological Research — HSCI 230

Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Explain Berkson’s bias and identify when hospital-based sampling distorts exposure-outcome associations
  • Describe the healthy worker effect and interpret standardized mortality ratios in occupational epidemiology
  • Distinguish attrition bias from nonresponse bias and evaluate strategies for mitigating each
  • Recognize prevalence-incidence (Neyman) bias and explain how cross-sectional designs miss rapidly fatal cases
  • Identify survivorship bias in cohort studies and its implications for causal inference
  • Evaluate transportability of study findings to target populations with differing characteristics
  • Critically assess whether epidemiological studies have adequately addressed selection-related threats to validity
Reference

Glossary — Key Terms, People & Concepts

This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.

Sampling & Population Concepts
Target Population: The full population to which a study’s conclusions are intended to apply (e.g., “Canadian adults aged 18–64”). Inferences should be argued explicitly back to this group.
Study Population: The subset of the target population from which the sample is actually drawn, limited by who is reachable through the sampling frame.
Sampling Frame: The list or operational definition used to enumerate the study population (e.g., a voter registry, hospital records, a phone-number bank). Coverage gaps in the frame are a primary source of selection bias.
Probability Sampling: Sampling in which every member of the frame has a known, non-zero probability of selection. Enables unbiased inference and valid standard errors.
Non-Probability Sampling: Sampling without known selection probabilities (convenience, snowball, opt-in panels). Cheap and fast, but inferences require strong assumptions about who is and is not represented.
Simple Random Sampling: Each unit in the frame has the same probability of selection, drawn independently. The benchmark against which other sampling designs are evaluated.
Stratified Sampling: The frame is split into subgroups (strata) and a random sample is drawn within each. Improves precision and can guarantee representation of small groups.
Cluster Sampling: Naturally occurring groups (schools, neighbourhoods, clinics) are sampled, then individuals within selected clusters are studied. Cheaper to field but reduces precision because units within clusters are correlated.
Generalizability: The extent to which findings from a study sample apply to a broader population. Requires both internal validity and an argument that the sample resembles the target on relevant effect modifiers.
Internal Validity: Whether the estimated association reflects the true effect within the studied sample (free of confounding, selection, and information bias). A precondition for any external claim.
External Validity: Whether findings from the study sample transfer to other populations, settings, or time periods. Distinct from internal validity and harder to establish.
Transportability: Formal generalization of study results to a new target population, accounting for differences in effect modifiers between samples and targets.
Selection Biases
Selection Bias: A distortion of the exposure–outcome association caused by who ends up in (or stays in) the study. Arises whenever inclusion or retention depends on both exposure and outcome.
Berkson’s Bias: A selection bias that arises in hospital-based studies because admission probabilities differ across exposure–outcome combinations. Can fabricate or reverse associations seen in the source population.
Healthy Worker Effect: Workers tend to be healthier than the general population because illness keeps people out of work or selects them out of it. Comparing occupational cohorts to the general population therefore biases occupational mortality estimates downward, often pushing all-cause SMRs below 1.0 and masking real hazards.
Attrition Bias: Bias in longitudinal studies caused by differential loss to follow-up that depends on both exposure and outcome (e.g., the sickest exposed participants drop out fastest).
Nonresponse Bias: Bias arising in cross-sectional surveys when those who respond differ systematically from those who do not, on variables related to the outcome.
Response Bias: A broad term for distortions introduced by how respondents engage with surveys: agreeing by default, providing socially desirable answers, or skipping items in patterned ways.
Prevalence–Incidence (Neyman) Bias: In cross-sectional or prevalent-case studies, rapidly fatal or rapidly resolving cases are missed, so prevalent cases are unrepresentative of all incident cases. Distorts exposure–outcome estimates.
Survivorship Bias: A bias arising when only survivors are observed (e.g., a cohort study of older adults that misses those already dead from the exposure’s effects).
Self-Selection Bias: A form of selection bias where participants opt themselves in or out of a study in ways correlated with both exposure and outcome (e.g., motivated participants in a smoking-cessation trial).
Standardized Mortality Ratio (SMR): The ratio of observed deaths in a study population to the deaths expected based on age- and sex-specific rates from a reference population. A staple of occupational epidemiology, but vulnerable to the healthy worker effect.
Key People
Joseph Berkson (1899–1982): Statistician at the Mayo Clinic whose 1946 note showed that hospital-based case-control studies could fabricate associations, the bias that now bears his name.
Jerzy Neyman (1894–1981): Statistician who articulated the prevalence–incidence bias and made foundational contributions to sampling theory and confidence intervals.
Miguel Hernán: Epidemiologist whose work has reframed selection bias and external validity using causal diagrams and target-trial logic.
Elias Bareinboim: Computer scientist who, with Judea Pearl, developed the formal theory of transportability: when and how causal effects estimated in one population can be carried to another.
Section 1 of 4

Selection Bias Mechanisms

⏱ Estimated reading time: 20 minutes

Introduction and Overview

Lesson 7 worked on what we measure and how we model causation. Lesson 8 turns to the third great source of bias in observational research: who ends up in the study at all. Even with perfect measurement and a correctly specified DAG, a study can produce a wrong answer if its sample systematically differs from the target population — either because of how participants were chosen at the outset, because of who dropped out during follow-up, or because the people available to study are themselves a survival-filtered subgroup. Across three content sections we work through the canonical mechanisms in order: Section 1 covers selection biases that arise at enrollment (Berkson's bias and the healthy worker effect); Section 2 covers attrition during follow-up and nonresponse at data collection; Section 3 covers prevalence–incidence (Neyman) bias, survivorship bias, and the related question of transportability — whether a study's findings hold in a different population. By the end of the lesson, you should have a working vocabulary for “the sample is wrong” that complements the “the variables are wrong” vocabulary from Lesson 7, ready to be combined with the information-bias inventory of Lesson 9. The unifying structural account — selection bias as conditioning on a common effect of exposure and outcome — comes from Hernán, Hernández-Díaz, & Robins (2004).

Learning Objectives

  • Define selection bias as a structural problem in how data are generated, distinct from confounding.
  • Explain the mechanism of Berkson’s bias and recognize when hospital-based case-control designs produce spurious associations.
  • Describe the healthy worker effect and interpret SMRs in occupational cohort studies.
  • Identify design strategies (internal comparisons, lagged analyses, cause-specific outcomes) that mitigate enrollment-stage selection bias.

What Is Selection Bias?

Selection bias occurs when the relationship between exposure and outcome differs between study participants and the target population because of the process by which individuals were selected into (or remained in) the study. Unlike confounding, which arises from a third variable that is a common cause of exposure and outcome, selection bias distorts the exposure-outcome relationship through the very mechanics of who ends up being studied.

Key Concept: Selection Bias

Selection bias arises when the association observed in the study sample systematically differs from the association in the target population, due to the procedures used to select participants or factors that influence study participation. It is a structural problem in how data are generated—not a statistical problem that can be fixed by larger sample sizes (Greenland, 2003).

In this section, we examine two classic mechanisms of selection bias: Berkson’s bias and the healthy worker effect. Both illustrate how the process of selecting participants into a study can create spurious associations or mask real ones.

Berkson’s Bias

Case Study: Diabetes and Cholecystitis

In the 1940s, hospital-based case-control studies observed a strong association between diabetes mellitus and cholecystitis (gallbladder inflammation). Researchers hypothesized a biological mechanism linking the two conditions. However, when population-based studies were later conducted, the association largely disappeared. What went wrong?

The answer lies in Berkson’s bias, first described by Berkson (1946). This form of selection bias occurs specifically in hospital-based case-control studies. The core insight is that people with two conditions are more likely to be hospitalized than people with only one condition. When you draw both cases and controls from a hospital population, you artificially inflate the co-occurrence of conditions.

The Mechanism of Berkson’s Bias

Consider two conditions, A and B, that are independent in the general population. A person with condition A has some probability of hospitalization (pA), and a person with condition B has probability pB. If each condition leads to admission independently of the other, a person with both conditions has a hospitalization probability of approximately pA + pB − pA × pB, which is greater than either alone whenever both probabilities lie strictly between 0 and 1. This differential hospitalization creates a spurious positive association between the two conditions within the hospital sample, even when none exists in the population.
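As a quick numerical check of the formula above, here is a minimal sketch in R. The two admission probabilities are hypothetical values chosen only for illustration, and the calculation assumes that each condition triggers admission independently of the other.

```r
# Hypothetical admission probabilities (illustrative, not from any study)
p_A <- 0.05   # P(hospitalized | condition A only)
p_B <- 0.15   # P(hospitalized | condition B only)

# Assumption: each condition triggers admission independently, so a person
# with both stays out of hospital only by avoiding admission for A AND for B.
p_both <- 1 - (1 - p_A) * (1 - p_B)

# The same quantity written as in the text: pA + pB - pA * pB
p_both_alt <- p_A + p_B - p_A * p_B

c(A_only = p_A, B_only = p_B, both = p_both)
```

With these inputs the joint admission probability is 0.1925, higher than either 0.05 or 0.15, so people with both conditions are over-represented among inpatients.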


Berkson's bias is the textbook hospital-sampling problem. The next mechanism is its occupational cousin — selection that happens not at the moment of recruitment but at the moment people enter (or stay in) the labour force.

The Healthy Worker Effect

Case Study: Asbestos Exposure and Mortality

Occupational epidemiologists studying asbestos-exposed workers in the mid-20th century found a puzzling result: the overall mortality rate among asbestos workers was lower than that of the general population, despite their known hazardous exposure. In many studies, the standardized mortality ratio (SMR) for all-cause mortality was consistently below 1.0. How could workers exposed to a known carcinogen have lower mortality?

This paradox is explained by the healthy worker effect (McMichael, 1976)—a form of selection bias inherent to occupational cohort studies. Workers are a selected subgroup of the population: they must be healthy enough to obtain and maintain employment. People who are chronically ill, disabled, or otherwise frail are less likely to enter the workforce.

How it works: The general population includes people who are too ill to work, institutionalized, or otherwise selected out of the labor force. When we compare workers to this general population, we are comparing a “healthier” group to a more heterogeneous one. The result: overall mortality appears lower among workers, masking the true hazard of occupational exposures.

Importantly, the healthy worker effect is strongest for causes of death unrelated to the occupational exposure (such as cardiovascular disease). For exposure-specific outcomes (such as mesothelioma in asbestos workers), the true hazard can outweigh the selection effect, so the SMR rises above 1.0 despite it.

The standardized mortality ratio (SMR) compares observed deaths in a worker cohort to expected deaths based on general population rates. An SMR < 1.0 does not mean the workplace is safe—it reflects the selection of healthier individuals into the workforce.

Cause of Death | SMR Among Asbestos Workers | Interpretation
All causes | 0.85 | Healthy worker effect masks overall risk
Cardiovascular disease | 0.78 | Strong healthy worker effect for non-occupational causes
Lung cancer | 1.45 | Elevated despite healthy worker effect; true risk is likely higher
Mesothelioma | 8.20 | Very strong exposure-specific signal overcomes selection bias

Note: These values are illustrative based on patterns observed in occupational asbestos studies.
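To make the SMR arithmetic concrete, here is a minimal sketch of indirect standardization in R. The person-years, reference rates, and observed death count are all hypothetical, and the age bands are an assumption made for illustration. The confidence limits use the standard exact Poisson interval for the observed count.

```r
# Hypothetical worker cohort, stratified by age band (illustrative numbers)
person_years <- c("20-39" = 12000, "40-59" = 9000, "60-79" = 3000)

# Hypothetical age-specific death rates in the reference (general) population,
# expressed as deaths per person-year
ref_rate <- c("20-39" = 0.001, "40-59" = 0.005, "60-79" = 0.020)

observed <- 99                             # deaths observed among the workers
expected <- sum(person_years * ref_rate)   # deaths expected at reference rates

smr <- observed / expected

# Exact Poisson 95% limits for the observed count, scaled by expected deaths
ci <- c(lower = qchisq(0.025, 2 * observed) / 2,
        upper = qchisq(0.975, 2 * (observed + 1)) / 2) / expected

round(c(expected = expected, SMR = smr, ci), 2)
```

Here 117 deaths are expected and 99 observed, giving an SMR of about 0.85, the same value as the illustrative all-cause row in the table. The calculation itself says nothing about why the ratio is below 1.0; that interpretation depends on how the workers were selected.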

Researchers have developed several strategies to mitigate the healthy worker effect:

  • Internal comparisons: Compare exposed workers to unexposed workers within the same workforce, rather than to the general population
  • Healthy worker survivor effect adjustment: Account for the fact that workers who remain employed over time are progressively healthier (those who become ill leave the workforce)
  • Cause-specific analyses: Focus on outcomes with known biological links to the exposure, where the occupational hazard is large enough to outweigh healthy worker selection
  • Lagged analyses: Introduce exposure lag periods to account for latency between exposure and disease onset
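The first strategy in the list above, internal comparison, can be illustrated with a few lines of arithmetic. This is a sketch with hypothetical crude counts; a real analysis would standardize for age, sex, and calendar period, and all variable names are invented for the example.

```r
# Hypothetical crude counts (illustrative only; no age standardization)
deaths_exposed   <- 90; py_exposed   <- 10000   # exposed workers
deaths_unexposed <- 70; py_unexposed <- 10000   # unexposed workers, same plant
general_pop_rate <- 0.010                       # deaths per person-year

rate_exposed   <- deaths_exposed / py_exposed
rate_unexposed <- deaths_unexposed / py_unexposed

# External comparison: exposed workers vs. the general population
external_ratio <- rate_exposed / general_pop_rate

# Internal comparison: exposed vs. unexposed workers from the same workforce
internal_ratio <- rate_exposed / rate_unexposed

round(c(external = external_ratio, internal = internal_ratio), 2)
```

With these numbers the external ratio is 0.90, which looks protective, while the internal ratio is about 1.29. Both worker groups passed through the same employment filter, so comparing them to each other removes much of the healthy worker selection that distorts the external comparison.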

Hands-on: Selection & Recruitment Bias Simulator

What you'll do: the simulator below holds a true population fixed and lets you set the participation probability for each combination of exposure and outcome. The presets reproduce the classic mechanisms — Berkson's bias, healthy worker effect, volunteer self-selection, and loss to follow-up. What to take away: the same true population can produce a 2×2 table whose odds ratio is double, half, or even reversed compared with the truth, depending on who chooses to enroll. You are not adjusting any analysis — you are watching the bias appear in the data themselves. Try the “Berkson's bias” preset first to reproduce the diabetes–cholecystitis story you just read.

🎯 Interactive: Selection & Recruitment Bias Simulator

A true population of 2,000 people with a known exposure (E) and outcome (Y). You set the participation probability for each subgroup. When participation depends on both E and Y, the observed exposure–outcome association drifts away from the truth. That is selection bias.

[Interactive simulator display: a population tile grid in which each tile is a person (bright = enrolled in the study, faded = excluded; colour encodes the E/Y combination), shown alongside true (population) and observed (sample) 2×2 tables of E+/E− by Y+/Y−, the true OR, the observed OR, the bias factor, and the preset buttons.]
Try the Berkson’s bias preset: cases (Y+) and exposed (E+) are each more likely to be hospitalized; both happening together inflates participation in the E+/Y+ cell. The observed OR rises above the true OR, even when the true association is null.
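If you want to see the same arithmetic outside the simulator, the sketch below works through it in R using expected counts. It is not the simulator's own code; the population table and the four participation probabilities are hypothetical values chosen so that the true association is null.

```r
# True population 2x2 counts with NO association (true OR = 1).
# Rows: exposure (E+, E-); columns: outcome (Y+, Y-). Total = 2,000.
true_tab <- matrix(c(100,  400,
                     300, 1200),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("E+", "E-"), c("Y+", "Y-")))

# Hypothetical participation probabilities for each E/Y cell
p_part <- matrix(c(0.60, 0.20,
                   0.30, 0.15),
                 nrow = 2, byrow = TRUE,
                 dimnames = dimnames(true_tab))

# Cross-product odds ratio of any 2x2 table
odds_ratio <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

observed_tab <- true_tab * p_part    # expected counts among participants

true_or     <- odds_ratio(true_tab)
observed_or <- odds_ratio(observed_tab)
bias_factor <- odds_ratio(p_part)    # (s11 * s00) / (s10 * s01)

c(true_OR = true_or, observed_OR = observed_or, bias_factor = bias_factor)
```

The observed OR equals the true OR multiplied by the cross-product ratio of the participation probabilities. Here that factor is 1.5, so a null association appears as an OR of 1.5. If participation depended on exposure alone or on outcome alone, the factor would be exactly 1 and the odds ratio would be unbiased.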
R Simulate the healthy-worker effect with a convenience sample

What you'll do: build a 10,000-person "population" with a known true mean BMI of 27. Draw a simple random sample of 200 people and compare it to a convenience sample of 200 gym-goers, where the probability of being a gym-goer is higher for people with lower BMI. Then replicate each sampling strategy 1,000 times and visualise the sampling distributions.

What to take away: the convenience sample produces a biased mean that does not improve with more replicates — selection bias is not a sample-size problem.

set.seed(230)

# 10,000-person "population" with true mean BMI = 27
N        <- 10000
bmi      <- rnorm(N, mean = 27, sd = 5)
gym_goer <- rbinom(N, 1, prob = plogis(2 - 0.15*bmi))   # lower BMI -> more likely

mean(bmi)                              # truth: ~27

# (1) Simple random sample of 200
srs <- sample(bmi, size = 200)
mean(srs)

# (2) Convenience sample: 200 gym-goers
conv <- sample(bmi[gym_goer == 1], size = 200)
mean(conv)

# Stretch: 1000 replicates of each strategy
srs_means  <- replicate(1000, mean(sample(bmi, 200)))
conv_means <- replicate(1000, mean(sample(bmi[gym_goer == 1], 200)))

par(mfrow = c(1, 2))
hist(srs_means,  main = "Simple Random Sample", xlim = c(20, 30), xlab = "Mean BMI")
abline(v = 27, col = "red", lwd = 2)
hist(conv_means, main = "Convenience (gym)",    xlim = c(20, 30), xlab = "Mean BMI")
abline(v = 27, col = "red", lwd = 2)
Console output (approx.)
[1] 26.99   # true population mean
[1] 26.93   # SRS mean -- centred near truth
[1] 24.41   # convenience (gym) mean -- biased low
# srs_means histogram: centred near 27 (truth)
# conv_means histogram: centred near 24.4, well below the red line at 27

Reading the two histograms. The SRS sampling distribution is centred at the red line (truth). The convenience-sample distribution sits about 2.5 BMI units to the left of the red line, no matter how many replicates you run. This is the healthy-worker effect in miniature: the workers (gym-goers) are systematically healthier than the general population.

R Reflect on what you just ran

Use the questions below to interpret the output you produced. Look at your console and histograms before answering.

1. The true population mean BMI was ~27. What was mean(srs) and what was mean(conv)? By how many BMI units did the convenience sample miss the true mean, and in which direction?

Model answer: With the seeded simulation, mean(srs) sits very close to 27 (the population truth), within a few tenths of a BMI unit depending on sampling jitter, while mean(conv) drops several BMI units, typically to about 24–25. So the convenience sample misses the truth by 2–3 BMI units in the downward direction. The point of the simulation is that the SRS is centred on the right answer and varies only by sampling error, while the convenience sample is centred on the wrong answer by construction.

2. The line gym_goer <- rbinom(N, 1, prob = plogis(2 - 0.15*bmi)) sets the probability of being a gym-goer. Trace through what happens for a person with BMI = 20 vs BMI = 35. How does this code mechanically generate the bias you saw in mean(conv)?

Model answer: The logistic function plogis(2 - 0.15*bmi) gives P(gym-goer | BMI = 20) = plogis(2 - 3) = plogis(-1) ≈ 0.27 and P(gym-goer | BMI = 35) = plogis(2 - 5.25) = plogis(-3.25) ≈ 0.037. So a lean person (BMI 20) has about a 27% chance of being a gym-goer (and therefore of being available to the convenience design), while a person with obesity (BMI 35) has only about a 4% chance, roughly a sevenfold difference. When you sample only from gym-goers, you systematically over-represent low-BMI individuals, so the mean of the sampled subset is pulled below the true population mean. The bias is mechanical: it is built into the selection probability function, not introduced by chance.

3. The two replicated histograms have similar widths but very different centres. In one sentence, explain why doubling the convenience sample size from 200 to 400 would NOT fix this problem — and connect it to the asbestos-worker SMR example from earlier in this section.

Model answer: Doubling n from 200 to 400 narrows the histogram (smaller sampling variance) but does not shift its centre, because the sample is still drawn under the same selection rule that under-represents high-BMI individuals. This is exactly the lesson of the asbestos-worker SMR example: an all-cause SMR below 1.0 is not noise that a bigger study fixes but a structural consequence of healthy worker selection, and the remedy is a different comparison (such as an internal comparison group) or analytic adjustment for the selection mechanism, not more of the same.

Key Takeaways

  • Berkson’s bias creates spurious associations in hospital-based case-control studies because hospitalization probability depends on having multiple conditions
  • The healthy worker effect masks true occupational risks because workers are systematically healthier than the general population
  • Both biases are structural—they arise from how participants are selected, not from measurement or confounding
  • Population-based designs and internal comparisons are primary strategies for avoiding these biases
Knowledge Check — Section 1

1. In Berkson’s bias, the spurious association between two conditions arises because:

Berkson’s bias occurs because having two conditions jointly increases the probability of hospitalization beyond either condition alone. This makes the conditions appear associated within the hospital sample even when they are independent in the population (Berkson, 1946).

2. An occupational cohort study of chemical plant workers reports an SMR of 0.82 for all-cause mortality. The most likely explanation is:

An SMR below 1.0 for all-cause mortality in an occupational cohort almost always reflects the healthy worker effect. Workers are selected for their ability to work, excluding the chronically ill and frail. This does not mean the exposure is safe—cause-specific analyses may reveal elevated risks for exposure-related outcomes.

3. Which study design best avoids Berkson’s bias?

Population-based case-control studies avoid Berkson’s bias by drawing controls from the general population, eliminating the differential hospitalization probabilities that create spurious associations in hospital-based samples.
Section 2 of 4

Attrition & Nonresponse Bias

⏱ Estimated reading time: 20 minutes

Introduction and Overview

Section 1 covered selection bias at the moment of enrollment. The mechanisms in this section operate later: attrition reshapes the cohort over time as participants drop out, and nonresponse hollows out a survey before any follow-up has even started. Both are forms of selection bias, but they are easy to miss because the original recruitment was sound. The key diagnostic question for both is the same: does the dropout (or non-participation) depend on both the exposure and the outcome?

Learning Objectives

  • Distinguish attrition bias (post-enrollment loss to follow-up) from nonresponse bias (failure to participate at the outset).
  • Recognize when loss to follow-up is differential with respect to exposure and outcome, and explain why that — not the absolute attrition rate — is what matters.
  • Evaluate strategies for detecting and adjusting for attrition (baseline comparisons, sensitivity analyses, IPCW, multiple imputation).
  • Identify common drivers of nonresponse and assess how they distort prevalence and exposure-outcome estimates in survey-based research.

Attrition Bias in Longitudinal Studies

Longitudinal studies follow participants over time, but not everyone stays. When loss to follow-up is related to both the exposure and the outcome, the resulting estimates become biased. This is attrition bias—a form of selection bias that occurs after study enrollment, gradually reshaping the study sample in ways that distort associations.

Case Study: The Framingham Heart Study

The Framingham Heart Study, initiated in 1948, is one of the most influential longitudinal studies in cardiovascular epidemiology. Over decades of follow-up, researchers noticed that participants lost to follow-up had systematically higher risk profiles—they were more likely to smoke, have higher blood pressure, and have lower socioeconomic status. Because these same factors predict cardiovascular events, the loss of high-risk participants led to underestimation of risk factor–outcome associations in certain analyses.

When Attrition Bias Matters Most

Attrition bias is most problematic when the probability of dropping out depends on both the exposure and the outcome (or factors strongly related to both). If loss to follow-up is random with respect to the exposure-outcome relationship, estimates remain unbiased, though less precise, even with substantial attrition. The critical question is not “how many participants were lost?” but “is loss to follow-up differential with respect to the exposure and outcome?”

Mechanisms of Differential Attrition

Several mechanisms can produce differential attrition in epidemiological studies:

  • Illness-related dropout: Participants who become sicker may be unable or unwilling to continue (e.g., advanced cancer patients missing follow-up visits)
  • Exposure-related migration: People who experience adverse effects of an exposure may relocate (e.g., workers who develop respiratory symptoms leaving a polluted area)
  • Competing mortality: Participants who die from causes related to the study exposure are lost from the sample, removing the most affected individuals
  • Socioeconomic barriers: Disadvantaged participants often face transportation, childcare, or work-schedule barriers to continued study participation
Quantifying and Addressing Attrition Bias

Epidemiologists use several strategies to detect and address attrition bias:

  • Compare baseline characteristics: Compare those retained versus those lost to follow-up on key exposure, outcome, and confounder variables
  • Sensitivity analyses: Conduct worst-case and best-case scenarios for missing outcomes among those lost to follow-up
  • Inverse probability of censoring weighting (IPCW): Weight remaining participants to represent those who were lost, based on predictors of attrition
  • Multiple imputation: Use statistical models to fill in plausible values for missing outcome data

None of these methods perfectly eliminates attrition bias, but they provide important evidence about its potential magnitude and direction.
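The sketch below illustrates the IPCW idea from the list above on simulated data. Everything in it is an assumption made for the example: the variable names, the true risk difference of 0.10, and above all the premise that dropout depends only on measured baseline variables (exposure and a frailty indicator). Under that premise weighting should recover the truth; if dropout also depended on unmeasured factors, it would not.

```r
set.seed(230)
n <- 50000

frail   <- rbinom(n, 1, 0.30)    # measured baseline frailty (hypothetical)
exposed <- rbinom(n, 1, 0.50)    # exposure, independent of frailty

# True outcome risk: exposure adds 10 percentage points (true RD = 0.10)
y <- rbinom(n, 1, 0.05 + 0.10 * exposed + 0.20 * frail)

# Differential attrition: frail exposed participants are most likely to leave
p_stay   <- ifelse(exposed == 1 & frail == 1, 0.40, 0.90)
retained <- rbinom(n, 1, p_stay)
keep     <- retained == 1

# Risk difference (exposed minus unexposed), optionally weighted
risk_diff <- function(y, exposed, w = rep(1, length(y))) {
  weighted.mean(y[exposed == 1], w[exposed == 1]) -
    weighted.mean(y[exposed == 0], w[exposed == 0])
}

# Complete-case analysis: ignore everyone lost to follow-up
rd_complete <- risk_diff(y[keep], exposed[keep])

# IPCW: model retention from baseline predictors, weight by 1 / P(retained)
fit <- glm(retained ~ exposed * frail, family = binomial)
w   <- 1 / fitted(fit)
rd_ipcw <- risk_diff(y[keep], exposed[keep], w[keep])

round(c(truth = 0.10, complete_case = rd_complete, ipcw = rd_ipcw), 3)
```

In expectation the complete-case risk difference is about 0.07, because the retained exposed group has lost many of its frail (high-risk) members, while the weighted estimate should sit close to 0.10. Each retained frail exposed participant is up-weighted to stand in for the similar participants who dropped out.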

Framingham Heart Study: Lessons Learned

Analysis of Framingham participants who were lost to follow-up revealed:

Characteristic | Retained Participants | Lost to Follow-Up
Current smoking (%) | 28% | 42%
Mean systolic BP (mmHg) | 132 | 141
Mean BMI (kg/m²) | 26.4 | 28.1
High school education (%) | 72% | 54%

Values are illustrative of patterns reported in Framingham attrition analyses.

Because those lost to follow-up were more likely to have the exposures and more likely to develop cardiovascular outcomes, the estimated associations between risk factors and heart disease were attenuated in the retained sample.

Nonresponse Bias in Cross-Sectional Surveys

Attrition is selection that unfolds during follow-up. Cross-sectional studies and surveys face the analogous problem at the point of data collection: nonresponse bias (Galea & Tracy, 2007). When people who choose not to respond differ systematically from those who do respond, the resulting data do not represent the target population.

Case Study: NHANES and Health Survey Nonresponse

The National Health and Nutrition Examination Survey (NHANES) is designed to be nationally representative of the U.S. population. However, response rates have declined over time. Research comparing early responders to late responders (a proxy for nonrespondents) and using linked administrative data has shown that nonrespondents tend to be less healthy—they have higher rates of smoking, obesity, and chronic disease. Standard survey weights partially correct for this, but cannot fully account for unmeasured differences between respondents and nonrespondents.

The direction of nonresponse bias depends on how nonresponse relates to the variables being studied:

  • Health behavior surveys: If people with unhealthier behaviors (smoking, sedentary lifestyles) are less likely to participate, prevalence estimates will underestimate the true burden of these behaviors
  • Stigmatized conditions: People with HIV, substance use disorders, or mental illness may avoid surveys, leading to underestimation of prevalence
  • Health-conscious responders: Conversely, people interested in health may be more likely to participate, inflating apparent health literacy or preventive behavior prevalence

Surveys like NHANES use post-stratification weights to adjust for nonresponse. These weights are calibrated to known population totals (from census data) on variables like age, sex, race/ethnicity, and geography.

The logic: if young men are underrepresented in the survey relative to the census, each young man who did respond receives a higher weight. However, this only corrects for nonresponse that is explained by the weighting variables. If nonresponse is driven by unmeasured factors (like health status itself), weighting alone is insufficient.

Key limitation: Weighting can only correct for nonresponse that is “missing at random” (MAR)—where nonresponse depends on observed variables used in the weighting model. If nonresponse is “missing not at random” (MNAR)—related to the outcome itself—no amount of weighting will fully remove the bias.

For example, if people who are severely depressed are less likely to complete a mental health survey because of their depression, no adjustment for age, sex, or socioeconomic status can recover the true depression prevalence. This is a fundamental limitation of survey-based research.
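The contrast between MAR and MNAR nonresponse can be worked through with expected counts. The sketch below is a deliberately simplified illustration, not how NHANES weights are actually built: it uses a single weighting variable (two age groups), and the population sizes, prevalences, and response probabilities are all hypothetical.

```r
# Hypothetical population of 10,000 split into two age strata
pop_n     <- c(young = 6000, old = 4000)
true_prev <- c(young = 0.20, old = 0.10)    # true depression prevalence
pop_share <- pop_n / sum(pop_n)
truth     <- sum(pop_share * true_prev)     # overall true prevalence

# Expected survey estimates, given response probabilities for depressed
# (r_dep) and non-depressed (r_non) people in each stratum
survey_estimates <- function(r_dep, r_non) {
  dep_resp <- pop_n * true_prev * r_dep
  non_resp <- pop_n * (1 - true_prev) * r_non
  stratum_prev <- dep_resp / (dep_resp + non_resp)
  c(unweighted     = sum(dep_resp) / sum(dep_resp + non_resp),
    poststratified = sum(pop_share * stratum_prev))
}

# MAR: response depends on age only (young 20%, old 50%)
mar  <- survey_estimates(r_dep = c(0.20, 0.50), r_non = c(0.20, 0.50))

# MNAR: within each age group, depressed people respond at half the rate
mnar <- survey_estimates(r_dep = c(0.10, 0.25), r_non = c(0.20, 0.50))

round(rbind(truth = c(unweighted = truth, poststratified = truth),
            MAR = mar, MNAR = mnar), 3)
```

Under MAR the unweighted estimate (0.1375) is too low, and post-stratifying to the age distribution recovers the true 0.16 exactly. Under MNAR the post-stratified estimate is still only about 0.09, because within each age group the respondents already under-represent people with depression and age weights cannot see that.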

Reflection

Consider a longitudinal cohort study of cannabis use and psychotic symptoms among young adults. Over 5 years, 30% of participants are lost to follow-up, and those lost are more likely to be heavy cannabis users. How might this attrition bias the study’s findings? What strategies would you recommend to assess or mitigate this bias?

Model answer: A 30% loss to follow-up that is differentially higher among heavy cannabis users threatens the study at three levels. First, the remaining cohort under-represents heavy users, so prevalence and incidence of both exposure and outcome are biased downwards. Second, if heavy users who drop out are precisely the ones who would have developed psychotic symptoms, the apparent exposure-outcome association is attenuated (classical informative censoring). Third, depending on whether dropout is driven by symptom severity or by exposure intensity, the bias direction can reverse. Mitigations: (i) build a tracing protocol with alternate contacts and provincial data linkage to recover outcomes for dropouts; (ii) compare baseline characteristics of completers vs. dropouts and report explicitly; (iii) use inverse-probability-of-censoring weights or multiple imputation under MAR; (iv) run sensitivity analyses for plausible MNAR scenarios (e.g., assume all dropouts who were heavy users had the outcome and show how the effect estimate moves).
Knowledge Check — Section 2

1. In the Framingham Heart Study, participants lost to follow-up had higher rates of smoking and hypertension. This most likely resulted in:

When participants with the highest risk profiles (exposure and outcome risk) are preferentially lost, the remaining sample underrepresents the extreme end of the exposure-outcome relationship. This attenuates (underestimates) the observed association.

2. NHANES uses post-stratification weighting to address nonresponse. This approach is limited because:

Post-stratification weighting corrects for the missing-at-random (MAR) component of nonresponse—the part explained by observed variables like age, sex, and race. It cannot correct for missing-not-at-random (MNAR) nonresponse, where the decision not to participate is driven by the outcome variable itself.

3. Which of the following best distinguishes attrition bias from nonresponse bias?

The key distinction is timing: attrition bias occurs during study follow-up (participants leave after enrollment), while nonresponse bias occurs at initial data collection (target population members never participate). Both are forms of selection bias but operate at different stages of the research process.
Section 3 of 4

Survivorship Bias & Transportability

⏱ Estimated reading time: 25 minutes

Introduction and Overview

Sections 1 and 2 covered selection biases that arise during the study — how participants were enrolled, who dropped out. This section addresses biases that arise from a more upstream filter: by the time we study a population, some people are already gone (dead, recovered, lost to history). When the people we end up with are the survival-filtered residue of the people we wish we could study, conventional analysis answers a different question than we think it answers. The section closes by zooming out one step further to ask the related external-validity question: even when our internal estimates are unbiased, do they apply to anyone outside our sample?

Learning Objectives

  • Explain prevalence–incidence (Neyman) bias and use the P ≈ I × D relationship to predict its direction.
  • Identify survivorship bias across mortality, treatment persistence, and successful-aging contexts and recognize when conclusions overgeneralize.
  • Distinguish internal validity from external validity and articulate the conditions under which study findings transport to a new target population.
  • Choose study designs (incident-case, nested case-control, restricted target population) that reduce survival-filter and transportability problems.

Prevalence-Incidence (Neyman) Bias

One of the most subtle forms of selection bias occurs when we use cross-sectional data to study risk factors for diseases. The problem: cross-sectional studies capture prevalent cases—people who currently have the disease—rather than incident cases—people who are newly developing it. If some cases die quickly while others survive for years, the cross-sectional sample will overrepresent long-surviving cases and underrepresent rapidly fatal ones.

Case Study: Myocardial Infarction Risk Factors

Early cross-sectional studies of myocardial infarction (MI) survivors examined which risk factors were associated with having had an MI. However, patients with the most severe risk profiles—particularly those with extremely high cholesterol or severe hypertension—were more likely to die from their initial MI before they could be included in a prevalence sample. The result: cross-sectional studies underestimated the strength of these risk factors because the most affected individuals were already dead and absent from the sample.

The Neyman Bias Mechanism

Named after Jerzy Neyman (1955), prevalence-incidence bias occurs because prevalent cases are a subset of all incident cases—specifically, those who survived long enough to be sampled. If the risk factor under study is associated with case fatality (more severe disease or faster death), it will appear less strongly associated with the disease in cross-sectional data than it truly is. The bias is toward the null for risk factors that increase case fatality, and away from the null for factors that improve survival.

Key concepts: the prevalence relationship P ≈ I × D (prevalence ≈ incidence × average duration) and incident-case designs.
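
The P ≈ I × D relationship can be made concrete with a short calculation. The incidence rates and durations below are assumed values chosen for illustration, not estimates from the MI literature: the exposure doubles incidence but also shortens survival with disease, so a prevalence-based comparison points in the wrong direction.

```r
# Illustrative (assumed) values, not estimates from any study
incidence <- c(unexposed = 0.002, exposed = 0.004)   # new cases per person-year
duration  <- c(unexposed = 10,    exposed = 4)       # mean years lived with disease

prevalence <- incidence * duration                   # P ≈ I × D (reasonable when prevalence is low)

incidence_ratio  <- unname(incidence["exposed"]  / incidence["unexposed"])
prevalence_ratio <- unname(prevalence["exposed"] / prevalence["unexposed"])

incidence_ratio    # 2: exposure doubles the rate of new disease
prevalence_ratio   # 0.8: among prevalent cases the exposure looks protective
```

With a milder effect on duration (say 8 years instead of 4), the prevalence ratio would sit between 1 and 2: attenuated toward the null rather than reversed.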

Neyman bias is the cross-sectional version of the survival-filter problem. The same logic shows up in cohort studies whenever the sample is built around “people still here,” whether that means people still alive, still on treatment, or still showing up for follow-up.

Survivorship Bias in Cohort Studies

Interactive story: Wald's Planes

A seven-scene retelling of Abraham Wald's most famous insight, the WWII bomber-armor problem that named survivorship bias: returning bombers mapped with bullet holes, the obvious-but-wrong conclusion, the ghost planes that didn't return, and the rule that armor belongs where the holes aren't.

Case Study: HIV Long-Term Survivors

In the early years of the HIV epidemic, before effective antiretroviral therapy, researchers studied “long-term non-progressors”—individuals who remained healthy for years despite HIV infection. Studies of this group identified genetic factors (such as CCR5-delta32 mutations) and immune characteristics associated with slow progression. However, drawing general conclusions about HIV pathogenesis from these survivors was problematic: they represented a highly selected subset of all HIV-infected individuals. The majority of infected individuals had already died, and the survivors differed systematically in ways beyond the factors being studied.

Survivorship bias occurs when we analyze only those who “survived” a process—whether survival means remaining alive, staying in a study, or maintaining a condition—and draw conclusions that we mistakenly generalize to the full original population.

Survivorship Bias Beyond Mortality

While the term “survivorship” evokes mortality, this bias extends to any selective retention process:

  • Treatment persistence studies: Patients who remain on a medication for 2 years are systematically different from those who discontinued—they had fewer side effects, better response, and likely better adherence behaviors. Analyzing only those who persisted overestimates treatment efficacy.
  • Cancer survivor cohorts: Studies of quality of life among “cancer survivors” may overestimate well-being because those who died (often with the worst quality of life) are excluded.
  • Successful aging studies: Research on cognitive function in elderly cohorts necessarily excludes those who died before reaching old age, potentially missing the very exposures that most strongly impair cognition.
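
The treatment-persistence case can be sketched as a simulation. The distribution of improvement and the persistence coefficients are hypothetical choices made only for illustration; the point is that when staying on a drug depends on how well it is working, the persisters' average response overstates the average response of everyone who started.

```r
set.seed(230)

# Hypothetical cohort of treatment starters; all values are illustrative assumptions
n           <- 5000
improvement <- rnorm(n, mean = 2, sd = 4)                      # true symptom improvement (arbitrary units)
persisted   <- rbinom(n, 1, plogis(-0.5 + 0.4 * improvement))  # better response -> more likely to stay on drug

mean_all        <- mean(improvement)                   # everyone who started treatment
mean_persisters <- mean(improvement[persisted == 1])   # only those still on treatment at 2 years

round(c(all_starters = mean_all, persisters = mean_persisters), 2)
```

An analysis restricted to `persisted == 1` answers "how did the drug work for people it worked for?", which is a different question from the intended one.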

HIV Cohort Example: What Survivors Tell Us (and Don’t)

Characteristic | Long-Term Survivors | All HIV-Infected (Estimated)
CCR5-delta32 heterozygote (%) | 15–20% | ~10%
Strong CD8+ T-cell response (%) | 80% | ~40%
Younger age at infection (%) | 65% | ~45%
Access to care (%) | 90% | ~50%

Values are illustrative of patterns described in early HIV natural history studies.

Because long-term survivors differed on multiple dimensions—genetic, immunological, demographic, and socioeconomic—findings from survivor cohorts could not be straightforwardly generalized to all people living with HIV.

Survivorship and Neyman-style selection can also be read structurally as collider bias — conditioning on “made it into the sample” opens a non-causal path between exposure and outcome (Cole et al., 2010; Munafò et al., 2018). Neyman bias and survivorship bias are about whether the sample we have represents the population we are trying to study. The last topic of the section steps further out: even when the sample does represent its source population, the study's findings may not apply elsewhere. This is the external-validity question that the rest of evidence-based practice depends on.
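
The collider reading mentioned above can be illustrated with a generic simulation. It assumes, purely for illustration, that exposure and outcome are truly independent in the population and that entry into the sample depends on both; the selection coefficients are hypothetical.

```r
set.seed(230)

# Illustrative assumption: exposure and outcome are truly independent
n        <- 20000
exposure <- rnorm(n)
outcome  <- rnorm(n)

# Selection into the sample depends on BOTH variables (coefficients are hypothetical)
selected <- rbinom(n, 1, plogis(-2 + 1.5 * exposure + 1.5 * outcome)) == 1

cor_population <- cor(exposure, outcome)                       # close to 0
cor_sample     <- cor(exposure[selected], outcome[selected])   # negative: induced by selection alone

round(c(population = cor_population, selected_sample = cor_sample), 3)
```

No causal link was simulated, yet the selected sample shows an association. That is the structural signature shared by Berkson's bias, Neyman bias, and survivorship bias.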

Transportability of Study Results

Even a perfectly internally valid study may yield misleading conclusions when its findings are applied to a different population. This is the problem of transportability—also called external validity or generalizability—and it is increasingly recognized as a critical challenge in evidence-based practice (Pearl & Bareinboim, 2014; Westreich, Edwards, Lesko, Stuart, & Cole, 2017).

Case Study: Antihypertensive Therapy Across Populations

The ALLHAT trial (Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial) found that thiazide-type diuretics were as effective as newer, more expensive antihypertensives in reducing cardiovascular events. However, ALLHAT participants were predominantly older adults with established hypertension and multiple comorbidities. When clinicians applied these results to younger patients with fewer comorbidities, the effectiveness of different drug classes sometimes differed—illustrating a target population mismatch.

Transportability vs. Internal Validity

Internal validity asks: “Is the observed association causal within this study?” Transportability asks: “Would the same causal effect hold in a different target population?” A study can have perfect internal validity yet poor transportability if the study population differs from the target population in ways that modify the effect of interest. Effect modification is the key concept linking internal validity to external generalizability.

Study results are transportable to a target population when:

  • The causal mechanism operates the same way in both populations
  • There are no effect modifiers whose distribution differs between the study and target populations
  • The versions of treatment (or exposure) are comparable across settings
  • The outcome measurement is equivalent in both populations

If any of these conditions are violated, the effect observed in the study may not replicate in the target population—not because the study was wrong, but because the populations differ in ways that matter.

Effect modification is the core mechanism that threatens transportability. If the treatment effect differs across subgroups (e.g., by age, comorbidity, or genetics), and the study and target populations have different compositions of these subgroups, the average treatment effect will differ between populations.

Population | % With Comorbidities | Mean Treatment Effect (BP Reduction)
ALLHAT trial participants | 78% | −12 mmHg
Young adults (<40 years) | 15% | −8 mmHg
Elderly with renal disease | 92% | −14 mmHg

Values are illustrative of how treatment effects can vary across populations due to effect modification.

Researchers have developed formal methods for assessing and improving transportability:

  • Inverse probability of selection weighting: Reweight the study sample so that its covariate distribution matches the target population (Westreich et al., 2017)
  • Standardization: Estimate the treatment effect within subgroups defined by effect modifiers, then average across the target population’s subgroup distribution
  • Sensitivity analysis for transportability: Assess how much unmeasured effect modification would be needed to qualitatively change conclusions
  • Target trial emulation: Use observational data from the target population to emulate the trial design and compare results
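
The standardization approach in the list above can be sketched in a few lines. The stratum-specific effects (−14 and −6 mmHg) are hypothetical values invented for this example and are not taken from ALLHAT; the comorbidity shares reuse the illustrative 78% and 15% figures from the table above.

```r
# Hypothetical stratum-specific effects on systolic BP (mmHg); not taken from ALLHAT
effect_by_stratum <- c(comorbid = -14, no_comorbid = -6)

# Share of each population with comorbidities (illustrative)
p_comorbid <- c(trial = 0.78, target = 0.15)

# Standardize: weight the stratum effects by a population's stratum distribution
standardize <- function(p) {
  unname(p * effect_by_stratum["comorbid"] + (1 - p) * effect_by_stratum["no_comorbid"])
}

ate_trial  <- standardize(p_comorbid["trial"])    # -12.24 mmHg
ate_target <- standardize(p_comorbid["target"])   # -7.2 mmHg
c(trial = ate_trial, target = ate_target)
```

The stratum effects are identical in both populations; only the mix of strata changes. This works only if the effect modifier has been measured and the stratum-specific effects themselves transport.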

Reflection

A randomized controlled trial of a new diabetes medication was conducted exclusively among patients aged 50–75 at academic medical centers. A community health clinic serving a predominantly young, uninsured population wants to adopt this medication. What factors would you consider when evaluating whether the trial results are transportable to this new setting?

Model answer: Transportability is a separate question from internal validity. Factors to consider: (a) Baseline risk: the community population has different absolute diabetes risk than the trial population, so even a true relative effect translates to different absolute benefit (and absolute risk reduction is what policy needs). (b) Effect modification: age and uninsured status are likely modifiers (different comorbidities, medication adherence, baseline glycemic control); if the trial enrolled only 50–75-year-olds, the effect in 25–45-year-olds is extrapolation. (c) Healthcare context: academic medical centres deliver the intervention with specialist follow-up; a community clinic with under-resourced staffing may not reproduce the trial's adherence support. (d) Comorbidities and concomitant medications: uninsured young adults often have different patterns of co-occurring conditions and access to other therapies. Methodological remedy: run a formal transportability analysis (Pearl & Bareinboim) or a target-trial emulation in the community population, and ask the trialists to publish subgroup effects that can be propagated forward.
Knowledge Check — Section 3

1. In a cross-sectional study of MI survivors, the association between extremely high cholesterol and MI is weaker than in prospective cohort data. This is best explained by:

Neyman bias occurs because cross-sectional studies capture prevalent (surviving) cases. If extremely high cholesterol increases MI case fatality, the most affected individuals die before they can be included in a cross-sectional sample. This selectively removes the strongest exposure-outcome signal, biasing the association toward the null.

2. A study examines quality of life among people who have been on antidepressant therapy for at least 2 years. This study is most susceptible to:

By restricting to patients who persisted on therapy for 2 years, the study excludes those who discontinued due to adverse effects, lack of efficacy, or other problems. The remaining sample is selectively enriched for treatment responders, producing an overly optimistic picture of treatment benefit. This is survivorship bias in a treatment persistence context.

3. The ALLHAT trial results for antihypertensive therapy may not be directly transportable to a younger population because:

Transportability fails when effect modifiers are distributed differently in the study vs. target population. ALLHAT enrolled primarily older adults with comorbidities. If age or comorbidity status modifies the treatment effect, the average effect in a younger, healthier population may differ from the trial estimate—even if the trial was perfectly conducted.

4. Which of the following is NOT a valid strategy for improving the transportability of clinical trial results?

Increasing sample size improves precision (narrower confidence intervals) but does not improve transportability. If the trial population differs from the target population on effect modifiers, a larger trial of the same population will simply produce a more precise estimate of the wrong effect. Transportability requires methods that explicitly account for population differences, such as reweighting, standardization, or sensitivity analysis.
Section 4 of 4

Final Assessment

⏱ Estimated time: 25 minutes

Bringing It All Together

This lesson built a working inventory of selection-related biases in the order they typically arise. Section 1 handled biases that operate at the moment of enrollment — Berkson’s bias in hospital-based case-control designs and the healthy worker effect in occupational cohorts. Section 2 turned to losses that happen after enrollment: differential attrition in longitudinal studies and nonresponse in surveys, both invisible unless you ask whether the loss depends on both exposure and outcome. Section 3 tackled the upstream survival filter — prevalence–incidence (Neyman) bias and survivorship bias — and then stepped outside the sample altogether to ask the external-validity question: when does an internally valid finding transport?

Read together, these mechanisms form a structured checklist for "the sample is wrong." Combined with Lesson 7’s vocabulary for "the variables are wrong" and the information-bias inventory ahead in Lesson 9, you are building the appraisal toolkit you will use for the rest of the course. The diagnostic question that runs through all three sections is the same: who is in the data, who is missing, and does that missingness depend on the very relationship we are trying to estimate?

The final reflection asks you to apply the full inventory to a single hypothetical study; the 15-question assessment then checks the conceptual material directly. From here, Lesson 9 turns to information bias — what happens when the people in the study are right but the measurements on them are wrong.

Key Takeaways from Lesson 8

  • Berkson’s bias: hospital-based sampling inflates co-occurrence of conditions because being hospitalized is itself a selection filter.
  • Healthy worker effect: workforce participation pre-selects healthier people, so SMRs below 1.0 against the general population can mask real occupational hazard.
  • Attrition and nonresponse: what matters is not the absolute rate of loss but whether loss is differential with respect to exposure and outcome — only then does it bias the estimate.
  • Neyman bias and survivorship bias: cross-sectional and "survivor"-based samples are filtered by prior survival, biasing risk-factor estimates in directions predictable from P ≈ I × D.
  • Transportability: internal validity does not guarantee external validity — findings travel only when effect-modifier distributions are comparable between study and target populations.
  • Diagnostic discipline: for every study, ask who got into the sample, who left, and whether that depends on both exposure and outcome — structural problems cannot be fixed by larger sample sizes.
R Activity — Convenience sampling and the healthy-worker effect

The companion R script r-activities/HSCI_230_Lesson_8_Sampling_Selection_and_External_Validity.R simulates a 10,000-person population with a known true mean BMI of 27, then draws a simple random sample and a convenience sample of gym-goers (whose enrollment probability depends on BMI). You compare each sample mean to the truth, then replicate 1,000 times to see why selection bias shifts the entire sampling distribution — the same structural problem behind the healthy-worker effect.

set.seed(230)

# 10,000-person "population" with true mean BMI = 27
N        <- 10000
bmi      <- rnorm(N, mean = 27, sd = 5)
gym_goer <- rbinom(N, 1, prob = plogis(2 - 0.15*bmi))   # lower BMI -> more likely

mean(bmi)                              # truth: ~27

# (1) Simple random sample of 200
srs <- sample(bmi, size = 200)
mean(srs)

# (2) Convenience sample: 200 gym-goers
conv <- sample(bmi[gym_goer == 1], size = 200)
mean(conv)

## -----------------------------------------------------------------------------
## Stretch: replicate 1000 times and look at the sampling distribution
## -----------------------------------------------------------------------------
srs_means  <- replicate(1000, mean(sample(bmi, 200)))
conv_means <- replicate(1000, mean(sample(bmi[gym_goer == 1], 200)))

par(mfrow = c(1, 2))
hist(srs_means,  main = "Simple Random Sample", xlim = c(20, 30), xlab = "Mean BMI")
abline(v = 27, col = "red", lwd = 2)
hist(conv_means, main = "Convenience (gym)",     xlim = c(20, 30), xlab = "Mean BMI")
abline(v = 27, col = "red", lwd = 2)
par(mfrow = c(1, 1))

Final Reflection

You are designing a study to estimate the effect of air pollution exposure on childhood asthma in a large metropolitan area. Describe at least three different selection biases that could threaten your study and explain one specific design decision you would make to address each. Be sure to distinguish between biases that affect internal validity versus external validity (transportability).

Model answer: Three selection biases for a child-asthma / air-pollution study, with one design decision each. (1) Recruitment / participation bias (internal validity): families in high-pollution neighbourhoods who participate may be more health-aware than non-participants. Fix: use a probability sample of the metropolitan census frame with active follow-up and reporting of refusal rates by neighbourhood. (2) Differential loss-to-follow-up (internal validity): high-pollution-neighbourhood families are more likely to move, biasing 5-year incidence estimates downward in the exposed group. Fix: build administrative-data linkage (provincial health records) so attrition does not erase outcomes; report exposure-stratified attrition. (3) Generalizability / transportability (external validity): the cohort, even if internally clean, may not transport to a different metro with different pollution mix and age structure. Fix: pre-specify the target population, characterise covariate distributions, and produce both within-study and transported-effect estimates. Distinguish clearly: (1)–(2) threaten the effect here; (3) threatens the effect elsewhere.
Final Assessment — Lesson 8 (15 Questions)

1. A hospital-based case-control study finds that gallstones are associated with appendicitis. When replicated in a population-based sample, no association is found. This discrepancy is most likely due to:

This is the classic pattern of Berkson’s bias: an association observed in hospital-based samples that disappears in population-based samples. Both conditions independently increase hospitalization probability, creating a spurious positive association in the hospital sample.

2. A researcher compares mortality rates of nuclear power plant workers to the general population and finds an SMR of 0.75 for all-cause mortality. The researcher concludes that radiation exposure is not harmful. What is the primary flaw in this reasoning?

An SMR below 1.0 for all-cause mortality in any occupational cohort should raise suspicion for the healthy worker effect. Workers are a selected healthy subgroup. The appropriate response is to conduct cause-specific analyses (especially for cancers linked to radiation) and use internal comparisons between more- and less-exposed workers.

3. In a 10-year cohort study of alcohol consumption and liver disease, participants who develop early-stage liver disease are more likely to drop out. This attrition will most likely:

When people who develop the outcome (liver disease) drop out and these same people are more likely to have the exposure (heavy alcohol use), the remaining sample underrepresents the exposure-outcome co-occurrence. This attenuates (underestimates) the observed association.

4. A survey on depression prevalence has a 45% response rate. Compared to census benchmarks, respondents are more educated and more likely to be employed. The survey’s depression prevalence estimate is most likely:

When nonrespondents have higher rates of the outcome (depression is more common among less educated, unemployed individuals) and they are underrepresented in the sample, prevalence estimates will undercount the outcome. The more educated, employed respondents have lower depression rates, dragging the estimate below the true population value.

5. A cross-sectional study finds no association between a genetic variant and pancreatic cancer. A subsequent cohort study finds a strong positive association. The most likely explanation is:

Pancreatic cancer has very high case fatality. If the genetic variant increases the aggressiveness of the cancer (faster progression, shorter survival), individuals with the variant are more likely to die before being captured in a cross-sectional sample. The cohort study, which captures incident cases at diagnosis, detects the true association.

6. A study of “successful agers” (people over 85 in good health) finds that moderate alcohol consumption is associated with better cognitive function. This finding should be interpreted cautiously because:

Studying “successful agers” inherently involves survivorship bias. Those who reach age 85 in good health are a highly selected group. Both moderate alcohol consumption and preserved cognitive function may be consequences of underlying factors (genetics, socioeconomic advantage, overall health) that enabled survival. The association found among survivors may not represent a causal effect of alcohol on cognition.

7. A clinical trial conducted exclusively in academic medical centers finds that a new chemotherapy regimen reduces mortality by 30%. When the regimen is used in community oncology practices, the benefit is only 15%. This discrepancy most likely reflects:

This is a classic transportability problem. Clinical trial participants at academic centers are typically younger, have fewer comorbidities and better performance status, and have access to more intensive supportive care. When the treatment is applied in a community setting with different patient characteristics, the average effect size changes because these characteristics modify the treatment effect.

8. Which of the following correctly describes the relationship between Neyman bias and the formula P ≈ I × D?

The P ≈ I × D relationship shows that prevalence depends on both how many new cases arise (incidence) and how long cases persist (duration). Neyman bias occurs because cross-sectional studies measure prevalence. If a risk factor increases incidence but also increases case fatality (decreasing duration), the prevalence effect is attenuated or even reversed, masking the true risk factor association.

9. Inverse probability of censoring weighting (IPCW) is used to address:

IPCW addresses attrition bias by calculating each participant’s probability of remaining in the study, then weighting those who remain by the inverse of that probability. This effectively “up-weights” participants who are similar to those who dropped out, attempting to reconstruct a representative sample.

10. A researcher wants to study risk factors for sudden cardiac death. Why would a case-control study using prevalent coronary heart disease cases be inappropriate?

Sudden cardiac death, by definition, kills rapidly. Using prevalent cases of coronary heart disease means studying survivors who, by definition, did not die suddenly. The risk factors for sudden death may be systematically different from those associated with chronic coronary disease, and Neyman bias means the most relevant risk factors are systematically underrepresented.

11. In a study of occupational asbestos exposure, the SMR for mesothelioma is 8.2 while the SMR for all-cause mortality is 0.85. This pattern indicates that:

This divergence between all-cause and cause-specific SMRs is the hallmark of the healthy worker effect. The workforce is selected for general health (low all-cause mortality), but asbestos creates such a large excess risk for mesothelioma that the exposure-specific signal overwhelms the healthy worker selection. This pattern means the true all-cause mortality risk is underestimated.

12. A survey of health behaviors among university students achieves an 80% response rate. The researcher states this eliminates nonresponse bias. This claim is:

The magnitude of nonresponse bias depends not on the response rate alone, but on the product of the nonresponse rate and the difference between respondents and nonrespondents on the variable of interest. Even with 80% response, if the 20% who do not respond have very different health behaviors (e.g., heavy substance users who avoid surveys), the bias can be substantial.
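
The product relationship described above can be checked with simple arithmetic. The prevalences below are assumed values for illustration; in a real survey the nonrespondent prevalence is unobserved.

```r
# Illustrative (assumed) values for a prevalence survey
response_rate <- 0.80
prev_resp     <- 0.10   # heavy substance use among respondents
prev_nonresp  <- 0.40   # heavy substance use among nonrespondents (unobserved in practice)

true_prev <- response_rate * prev_resp + (1 - response_rate) * prev_nonresp
bias      <- prev_resp - true_prev   # respondent estimate minus truth

# Same quantity via the product form: nonresponse rate x (respondent - nonrespondent difference)
bias_product <- (1 - response_rate) * (prev_resp - prev_nonresp)

c(true = true_prev, observed = prev_resp, bias = bias, bias_product = bias_product)
```

Under these assumptions an 80% response rate still understates a true prevalence of 16% by 6 percentage points.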

13. HIV long-term non-progressors were found to have distinctive genetic and immune profiles. Generalizing these findings to all HIV-infected individuals is problematic primarily because of:

Long-term non-progressors survived while most HIV-infected individuals in the pre-treatment era died. This creates extreme survivorship bias: any characteristics associated with survival will be enriched among survivors, making it impossible to determine which factors truly caused slow progression versus merely co-occurred with other survival advantages.

14. A trial of a new vaccine was conducted in a well-nourished population in a high-income country. When the vaccine is deployed in a malnourished population in a low-income setting, efficacy is substantially lower. The best explanation for this difference is:

This is a transportability problem driven by effect modification. Immune response to vaccination depends partly on nutritional status. If the vaccine’s efficacy differs between well-nourished and malnourished individuals (effect modification by nutritional status), then results from a well-nourished trial population will not transport directly to a malnourished target population.

15. A researcher argues that increasing the sample size of a clinical trial will automatically make the results more generalizable to other populations. This reasoning is:

Larger samples increase statistical precision but not transportability. The law of large numbers ensures convergence to the parameter in the study population, not the target population. If the study population is enriched for certain characteristics relative to the target population, a larger sample simply estimates the study-population-specific effect more precisely. Generalizability requires explicit attention to the distribution of effect modifiers in both populations.

Lesson 8 Complete!

Congratulations! You have successfully completed the lesson on Sampling, Selection Processes, and External Validity.

You demonstrated understanding of Berkson’s bias, the healthy worker effect, attrition and nonresponse bias, prevalence-incidence bias, survivorship bias, and transportability.

Lesson 9 stays inside the same broad inventory but pivots to its third leg. Where Lesson 7 covered measurement and causal-specification problems and this lesson covered selection problems, Lesson 9: Information Bias and Data Quality turns to information bias — the systematic errors that arise from how exposure and outcome are measured once participants are in the study. The three-part bias triad (information, selection, confounding) will be complete by the end of Lesson 9.