Sampling, Selection Processes & External Validity
Evaluating Epidemiological Research — HSCI 230
Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University
Learning objectives for this lesson:
- Explain Berkson’s bias and identify when hospital-based sampling distorts exposure-outcome associations
- Describe the healthy worker effect and interpret standardized mortality ratios in occupational epidemiology
- Distinguish attrition bias from nonresponse bias and evaluate strategies for mitigating each
- Recognize prevalence-incidence (Neyman) bias and explain how cross-sectional designs miss rapidly fatal cases
- Identify survivorship bias in cohort studies and its implications for causal inference
- Evaluate transportability of study findings to target populations with differing characteristics
- Critically assess whether epidemiological studies have adequately addressed selection-related threats to validity
Glossary — Key Terms, People & Concepts
This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.
Selection Bias Mechanisms
Introduction and Overview
Lesson 7 worked on what we measure and how we model causation. Lesson 8 turns to the third great source of bias in observational research: who ends up in the study at all. Even with perfect measurement and a correctly specified DAG, a study can produce a wrong answer if its sample systematically differs from the target population — either because of how participants were chosen at the outset, because of who dropped out during follow-up, or because the people available to study are themselves a survival-filtered subgroup. Across three content sections we work through the canonical mechanisms in order: Section 1 covers selection biases that arise at enrollment (Berkson's bias and the healthy worker effect); Section 2 covers attrition during follow-up and nonresponse at data collection; Section 3 covers prevalence–incidence (Neyman) bias, survivorship bias, and the related question of transportability — whether a study's findings hold in a different population. By the end of the lesson, you should have a working vocabulary for “the sample is wrong” that complements the “the variables are wrong” vocabulary from Lesson 7, ready to be combined with the information-bias inventory of Lesson 9. The unifying structural account — selection bias as conditioning on a common effect of exposure and outcome — comes from Hernán, Hernández-Díaz, & Robins (2004).
Learning Objectives
- Define selection bias as a structural problem in how data are generated, distinct from confounding.
- Explain the mechanism of Berkson’s bias and recognize when hospital-based case-control designs produce spurious associations.
- Describe the healthy worker effect and interpret SMRs in occupational cohort studies.
- Identify design strategies (internal comparisons, lagged analyses, cause-specific outcomes) that mitigate enrollment-stage selection bias.
What Is Selection Bias?
Selection bias occurs when the relationship between exposure and outcome differs between study participants and the target population because of the process by which individuals were selected into (or remained in) the study. Unlike confounding, which arises from a common cause of exposure and outcome, selection bias distorts the exposure-outcome relationship through the very mechanics of who ends up being studied.
Key Concept: Selection Bias
Selection bias arises when the association observed in the study sample systematically differs from the association in the target population, due to the procedures used to select participants or factors that influence study participation. It is a structural problem in how data are generated—not a statistical problem that can be fixed by larger sample sizes (Greenland, 2003).
In this section, we examine two classic mechanisms of selection bias: Berkson’s bias and the healthy worker effect. Both illustrate how the process of selecting participants into a study can create spurious associations or mask real ones.
Berkson’s Bias
In the 1940s, hospital-based case-control studies observed a strong association between diabetes mellitus and cholecystitis (gallbladder inflammation). Researchers hypothesized a biological mechanism linking the two conditions. However, when population-based studies were later conducted, the association largely disappeared. What went wrong?
The answer lies in Berkson’s bias, first described by Berkson (1946). This form of selection bias is classically found in hospital-based case-control studies. The core insight is that people with two conditions are more likely to be hospitalized than people with only one condition. When both cases and controls are drawn from a hospital population, admission itself depends on the conditions being studied, so the hospital sample can show an association between conditions that does not exist in the population.
The Mechanism of Berkson’s Bias
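Consider two conditions, A and B, that are independent in the general population. A person with condition A has some probability of hospitalization (pA), and a person with condition B has probability pB. If each condition independently leads to admission, a person with both conditions is hospitalized with probability pA + pB − pA × pB = 1 − (1 − pA)(1 − pB), which is greater than either pA or pB alone. Hospitalization is therefore a common effect of A and B, and restricting the sample to hospitalized people makes A and B appear associated even when no association exists in the population. The direction of the distortion depends on the admission probabilities of the groups being compared: in a case-control comparison, a case condition that is admitted less often than the control condition will show a spurious positive association with an exposure that itself raises the chance of admission.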
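The arithmetic can be checked with a short R sketch. It uses hypothetical numbers (not Berkson's data): diabetes is independent of both the case condition (cholecystitis) and a control condition in the population, the two conditions are assumed not to overlap, and diabetes and each condition are assumed to cause admission independently. Because the sketch uses expected counts rather than random draws, any odds ratio other than 1 in the hospital table is produced purely by selection.

```r
# Hypothetical inputs chosen for illustration (not Berkson's data).
# Diabetes (exposure) is independent of the case and control conditions,
# so the population odds ratio is exactly 1.
prev_A     <- 0.05    # diabetes prevalence
n_cases    <- 10000   # people with the case condition (cholecystitis)
n_controls <- 10000   # people with the control condition (assumed disjoint from cases)
p_A        <- 0.30    # admission probability attributable to diabetes

# Admission probability for someone whose other condition has admission
# probability p: 1 - (1 - p_A)(1 - p) if diabetic, otherwise just p.
admit <- function(p, diabetic) if (diabetic) p_A + p - p_A * p else p

# Expected hospitalized counts: rows = diabetes status, columns = case/control
hospital_table <- function(p_case, p_control) {
  matrix(c(n_cases    * prev_A       * admit(p_case, TRUE),
           n_cases    * (1 - prev_A) * admit(p_case, FALSE),
           n_controls * prev_A       * admit(p_control, TRUE),
           n_controls * (1 - prev_A) * admit(p_control, FALSE)),
         nrow = 2,
         dimnames = list(diabetes = c("yes", "no"),
                         group = c("case", "control")))
}

odds_ratio <- function(t) (t["yes", "case"] * t["no", "control"]) /
                          (t["no", "case"] * t["yes", "control"])

# Case condition admitted less often than the control condition
or_low_case  <- odds_ratio(hospital_table(p_case = 0.15, p_control = 0.40))
# Case condition admitted more often than the control condition
or_high_case <- odds_ratio(hospital_table(p_case = 0.15, p_control = 0.05))
round(c(or_low_case, or_high_case), 2)
```

With these assumed inputs, the first comparison gives an odds ratio of about 1.86 and the second about 0.40: the same exposure looks harmful or protective depending only on how often the comparison conditions lead to admission. Setting `p_case` equal to `p_control` returns an odds ratio of exactly 1.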
Berkson's bias is the textbook hospital-sampling problem. The next mechanism is its occupational cousin — selection that happens not at the moment of recruitment but at the moment people enter (or stay in) the labour force.
The Healthy Worker Effect
Occupational epidemiologists studying asbestos-exposed workers in the mid-20th century found a puzzling result: the overall mortality rate among asbestos workers was lower than the general population, despite known hazardous exposure. The standardized mortality ratio (SMR) for all-cause mortality was consistently below 1.0 in many studies. How could workers exposed to a known carcinogen have lower mortality?
This paradox is explained by the healthy worker effect (McMichael, 1976)—a form of selection bias inherent to occupational cohort studies. Workers are a selected subgroup of the population: they must be healthy enough to obtain and maintain employment. People who are chronically ill, disabled, or otherwise frail are less likely to enter the workforce.
How it works: The general population includes people who are too ill to work, institutionalized, or otherwise selected out of the labor force. When we compare workers to this general population, we are comparing a “healthier” group to a more heterogeneous one. The result: overall mortality appears lower among workers, masking the true hazard of occupational exposures.
Importantly, the healthy worker effect is strongest for causes of death unrelated to the occupational exposure (such as cardiovascular disease) and weakens or reverses for exposure-specific outcomes (such as mesothelioma in asbestos workers).
The standardized mortality ratio (SMR) compares observed deaths in a worker cohort to expected deaths based on general population rates. An SMR < 1.0 does not mean the workplace is safe—it reflects the selection of healthier individuals into the workforce.
| Cause of Death | SMR Among Asbestos Workers | Interpretation |
|---|---|---|
| All causes | 0.85 | Healthy worker effect masks overall risk |
| Cardiovascular disease | 0.78 | Strong healthy worker effect for non-occupational causes |
| Lung cancer | 1.45 | Elevated despite healthy worker effect—true risk is likely higher |
| Mesothelioma | 8.20 | Very strong exposure-specific signal overcomes selection bias |
Note: These values are illustrative based on patterns observed in occupational asbestos studies.
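The SMR itself is observed deaths divided by expected deaths. The expected count comes from applying general-population rates to the cohort's person-years (indirect standardization). The R sketch below uses invented person-years, reference rates, and a death count chosen to give an all-cause SMR of 0.85, matching the illustrative table above. Base R's `poisson.test()` supplies an exact Poisson confidence interval, treating the expected count as fixed.

```r
# Hypothetical worker cohort in three age bands (illustrative numbers only)
person_years <- c("40-49" = 20000, "50-59" = 15000, "60-69" = 8000)
ref_rate     <- c("40-49" = 0.003, "50-59" = 0.008, "60-69" = 0.020)  # general-population deaths per person-year
observed     <- 289                                                  # deaths observed in the cohort

# Deaths we would expect if the workers died at general-population rates
expected <- sum(person_years * ref_rate)
smr <- observed / expected

# Exact Poisson 95% CI for observed / expected
ci <- poisson.test(observed, T = expected)$conf.int
round(c(expected = expected, smr = smr, lower = ci[1], upper = ci[2]), 3)
```

Even with a confidence interval that lies entirely below 1, this SMR says nothing about whether the exposure is harmful. It mostly reflects who was healthy enough to be hired.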
Researchers have developed several strategies to mitigate the healthy worker effect:
- Internal comparisons: Compare exposed workers to unexposed workers within the same workforce, rather than to the general population
- Healthy worker survivor effect adjustment: Account for the fact that workers who remain employed over time are progressively healthier (those who become ill leave the workforce)
- Cause-specific analyses: Focus on outcomes with known biological links to the exposure, where the effect of occupational hazard exceeds the healthy worker selection
- Lagged analyses: Introduce exposure lag periods to account for latency between exposure and disease onset
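An internal comparison can be sketched in a few lines of R. The cohort below is hypothetical: both exposed and unexposed workers have SMRs below 1 against the general population, yet comparing the two groups of workers directly shows the exposed group dying at a higher rate. The rate ratio here is crude; a real analysis would also adjust for age and calendar period, for example with Poisson regression.

```r
# Hypothetical cohort split by exposure (illustrative numbers only)
cohort <- data.frame(
  group        = c("exposed", "unexposed"),
  deaths       = c(150, 80),
  person_years = c(30000, 25000),
  expected     = c(170, 125)  # deaths expected at general-population rates
)
cohort$smr  <- cohort$deaths / cohort$expected      # external comparison
cohort$rate <- cohort$deaths / cohort$person_years  # deaths per person-year

# Internal comparison: exposed vs. unexposed workers from the same workforce
internal_rr <- cohort$rate[cohort$group == "exposed"] /
               cohort$rate[cohort$group == "unexposed"]
cohort
internal_rr
```

Both SMRs fall below 1, but the crude internal rate ratio is about 1.56, because the unexposed workers share the same hiring filter as the exposed ones.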
Hands-on: Selection & Recruitment Bias Simulator
What you'll do: the simulator below holds a true population fixed and lets you set the participation probability for each combination of exposure and outcome. The presets reproduce the classic mechanisms — Berkson's bias, healthy worker effect, volunteer self-selection, and loss to follow-up. What to take away: the same true population can produce a 2×2 table whose odds ratio is double, half, or even reversed compared with the truth, depending on who chooses to enroll. You are not adjusting any analysis — you are watching the bias appear in the data themselves. Try the “Berkson's bias” preset first to reproduce the diabetes–cholecystitis story you just read.
A true population of 2,000 people with a known exposure (E) and outcome (Y). You set the participation probability for each subgroup. When participation depends on both E and Y, the observed exposure–outcome association drifts away from the truth. That is selection bias.
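The simulator's core calculation can be sketched in R. The 2,000-person population and the participation probabilities below are hypothetical. Unlike the simulator, the sketch uses expected counts (true count × participation probability) instead of random draws, so the results are exact. For expected counts, the observed odds ratio equals the true odds ratio multiplied by (s_ey × s_00) / (s_e0 × s_0y), where each s is the participation probability in one exposure-outcome cell.

```r
# Hypothetical true population of 2,000 people (cell counts are assumptions)
true_counts <- matrix(c(200, 300, 400, 1100), nrow = 2,
                      dimnames = list(E = c("E+", "E-"), Y = c("Y+", "Y-")))

odds_ratio <- function(t) (t["E+", "Y+"] * t["E-", "Y-"]) /
                          (t["E-", "Y+"] * t["E+", "Y-"])

# Participation probability for each cell:
# s_ey = E+Y+, s_e0 = E+Y-, s_0y = E-Y+, s_00 = E-Y-
participation <- function(s_ey, s_e0, s_0y, s_00) {
  matrix(c(s_ey, s_0y, s_e0, s_00), nrow = 2, dimnames = dimnames(true_counts))
}

scenarios <- list(
  "no selection"              = participation(1.0, 1.0, 1.0, 1.0),
  "depends on E only"         = participation(0.8, 0.8, 0.4, 0.4),
  "exposed cases over-enroll" = participation(0.9, 0.5, 0.5, 0.5),
  "depends on E and Y"        = participation(0.2, 0.8, 0.8, 0.4)
)

# Expected observed table = true counts x participation, cell by cell
sapply(scenarios, function(s) round(odds_ratio(true_counts * s), 2))
```

With these inputs the four scenarios give odds ratios of about 1.83, 1.83, 3.30, and 0.23. Participation that depends only on exposure leaves the odds ratio untouched; participation that depends on both exposure and outcome nearly doubles it or reverses its direction.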
What you'll do: build a 10,000-person "population" with a known true mean BMI of 27. Draw a simple random sample of 200 people and compare it to a convenience sample of 200 gym-goers, where the probability of being a gym-goer is higher for people with lower BMI. Then replicate each sampling strategy 1,000 times and visualise the sampling distributions.
What to take away: the convenience sample produces a biased mean that does not improve with more replicates — selection bias is not a sample-size problem.
```r
set.seed(230)

# 10,000-person "population" with true mean BMI = 27
N <- 10000
bmi <- rnorm(N, mean = 27, sd = 5)
# Probability of being a gym-goer falls as BMI rises
gym_goer <- rbinom(N, 1, prob = plogis(2 - 0.15*bmi)) # lower BMI -> more likely
mean(bmi) # truth: ~27

# (1) Simple random sample of 200
srs <- sample(bmi, size = 200)
mean(srs)

# (2) Convenience sample: 200 gym-goers
conv <- sample(bmi[gym_goer == 1], size = 200)
mean(conv)

# Stretch: 1,000 replicates of each strategy
srs_means  <- replicate(1000, mean(sample(bmi, 200)))
conv_means <- replicate(1000, mean(sample(bmi[gym_goer == 1], 200)))

# Side-by-side sampling distributions; the red line marks the true mean
par(mfrow = c(1, 2))
hist(srs_means, main = "Simple Random Sample", xlim = c(20, 30), xlab = "Mean BMI")
abline(v = 27, col = "red", lwd = 2)
hist(conv_means, main = "Convenience (gym)", xlim = c(20, 30), xlab = "Mean BMI")
abline(v = 27, col = "red", lwd = 2)
```
Reading the two histograms. The SRS sampling distribution is centred at the red line (truth). The convenience-sample distribution sits roughly 3 BMI units to the left of the red line (around 24), no matter how many replicates you run. This is the healthy worker effect in miniature: the gym-goers play the role of the workers, a selected group that is systematically healthier than the population it is compared with.
Reflect on what you just ran
Use the questions below to interpret the output you produced. Look at your console and histograms before answering.
1. The true population mean BMI was ~27. What was mean(srs) and what was mean(conv)? By how many BMI units did the convenience sample miss the true mean, and in which direction?
mean(srs) sits very close to 27 (the population truth). Its standard error is about 5/√200 ≈ 0.35, so it usually lands within roughly ±0.7 of 27. mean(conv) is several BMI units lower, typically around 24. So the convenience sample misses the truth by roughly 3 BMI units, in the downward direction. The point of the simulation is that the SRS is centred on the right answer and varies only by sampling error, while the convenience sample is centred on the wrong answer by construction.
2. The line gym_goer <- rbinom(N, 1, prob = plogis(2 - 0.15*bmi)) sets the probability of being a gym-goer. Trace through what happens for a person with BMI = 20 vs BMI = 35. How does this code mechanically generate the bias you saw in mean(conv)?
For BMI = 20: plogis(2 − 0.15 × 20) = plogis(2 − 3) = plogis(−1) ≈ 0.27. For BMI = 35: plogis(2 − 5.25) = plogis(−3.25) ≈ 0.037. So a lean person (BMI 20) has about a 27% chance of being a gym-goer, and therefore of being eligible for the convenience sample, while a person with BMI 35 has under 4%, roughly a sevenfold difference. Sampling from the gym therefore over-represents low-BMI individuals, and the mean of the sampled subset is pulled below the true mean. The bias is mechanical: it is built into the selection probability function, not introduced by chance.
3. The two replicated histograms have similar widths but very different centres. In one sentence, explain why doubling the convenience sample size from 200 to 400 would NOT fix this problem, and connect it to the asbestos-worker SMR example from earlier in this section.
Key Takeaways
- Berkson’s bias creates spurious associations in hospital-based case-control studies because hospitalization probability depends on having multiple conditions
- The healthy worker effect masks true occupational risks because workers are systematically healthier than the general population
- Both biases are structural—they arise from how participants are selected, not from measurement or confounding
- Population-based designs and internal comparisons are primary strategies for avoiding these biases
1. In Berkson’s bias, the spurious association between two conditions arises because:
2. An occupational cohort study of chemical plant workers reports an SMR of 0.82 for all-cause mortality. The most likely explanation is:
3. Which study design best avoids Berkson’s bias?
Attrition & Nonresponse Bias
Introduction and Overview
Section 1 covered selection bias at the moment of enrollment. The mechanisms in this section operate later: attrition reshapes the cohort over time as participants drop out, and nonresponse hollows out a survey before any follow-up has even started. Both are forms of selection bias, but they are easy to miss because the original recruitment was sound. The key diagnostic question for both is the same: does the dropout (or non-participation) depend on both the exposure and the outcome?
Learning Objectives
- Distinguish attrition bias (post-enrollment loss to follow-up) from nonresponse bias (failure to participate at the outset).
- Recognize when loss to follow-up is differential with respect to exposure and outcome, and explain why that — not the absolute attrition rate — is what matters.
- Evaluate strategies for detecting and adjusting for attrition (baseline comparisons, sensitivity analyses, IPCW, multiple imputation).
- Identify common drivers of nonresponse and assess how they distort prevalence and exposure-outcome estimates in survey-based research.
Attrition Bias in Longitudinal Studies
Longitudinal studies follow participants over time, but not everyone stays. When loss to follow-up is related to both the exposure and the outcome, the resulting estimates become biased. This is attrition bias—a form of selection bias that occurs after study enrollment, gradually reshaping the study sample in ways that distort associations.
The Framingham Heart Study, initiated in 1948, is one of the most influential longitudinal studies in cardiovascular epidemiology. Over decades of follow-up, researchers noticed that participants lost to follow-up had systematically higher risk profiles—they were more likely to smoke, have higher blood pressure, and have lower socioeconomic status. Because these same factors predict cardiovascular events, the loss of high-risk participants led to underestimation of risk factor–outcome associations in certain analyses.
When Attrition Bias Matters Most
Attrition bias is most problematic when the probability of dropping out depends on both the exposure and the outcome (or factors strongly related to both). If loss to follow-up is random with respect to the exposure-outcome relationship, estimates remain unbiased even with substantial attrition. The critical question is not “how many participants were lost?” but “is loss to follow-up differential with respect to the exposure and outcome?”
Several mechanisms can produce differential attrition in epidemiological studies:
- Illness-related dropout: Participants who become sicker may be unable or unwilling to continue (e.g., advanced cancer patients missing follow-up visits)
- Exposure-related migration: People who experience adverse effects of an exposure may relocate (e.g., workers who develop respiratory symptoms leaving a polluted area)
- Competing mortality: Participants who die from causes related to the study exposure are lost from the sample, removing the most affected individuals
- Socioeconomic barriers: Disadvantaged participants often face transportation, childcare, or work-schedule barriers to continued study participation
Epidemiologists use several strategies to detect and address attrition bias:
- Compare baseline characteristics: Compare those retained versus those lost to follow-up on key exposure, outcome, and confounder variables
- Sensitivity analyses: Conduct worst-case and best-case scenarios for missing outcomes among those lost to follow-up
- Inverse probability of censoring weighting (IPCW): Weight remaining participants to represent those who were lost, based on predictors of attrition
- Multiple imputation: Use statistical models to fill in plausible values for missing outcome data
None of these methods perfectly eliminates attrition bias, but they provide important evidence about its potential magnitude and direction.
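A minimal simulation shows both the problem and one of the fixes. The R sketch below assumes a hypothetical cohort in which exposed participants with a high-risk baseline marker drop out most often, echoing the Framingham pattern, and in which retention depends only on the measured variables E and L (the missing-at-random assumption that IPCW needs). The complete-case risk ratio is attenuated; weighting each retained participant by 1 / P(retained | E, L), estimated with a logistic regression, recovers the full-cohort value.

```r
set.seed(230)
n <- 2e5
# Hypothetical data-generating process (all numbers are assumptions)
E <- rbinom(n, 1, 0.5)                         # exposure (e.g., smoking)
L <- rbinom(n, 1, 0.4)                         # baseline high-risk marker (e.g., high BP)
Y <- rbinom(n, 1, 0.05 + 0.10 * E + 0.15 * L)  # outcome during follow-up
R <- rbinom(n, 1, 0.90 - 0.40 * E * L)         # 1 = retained; exposed high-risk people drop out most

risk_ratio <- function(y, e, w = rep(1, length(y))) {
  weighted.mean(y[e == 1], w[e == 1]) / weighted.mean(y[e == 0], w[e == 0])
}

true_rr <- risk_ratio(Y, E)                  # knowable only in a simulation
cc_rr   <- risk_ratio(Y[R == 1], E[R == 1])  # complete-case analysis

# IPCW: model retention from observed predictors, weight by 1 / P(retained)
fit <- glm(R ~ E * L, family = binomial)
w   <- 1 / predict(fit, type = "response")
ipcw_rr <- risk_ratio(Y[R == 1], E[R == 1], w[R == 1])

round(c(true = true_rr, complete_case = cc_rr, ipcw = ipcw_rr), 2)
```

If dropout also depended on something unmeasured and related to the outcome, the retention model could not capture it and the weights would not remove the bias.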
Analysis of Framingham participants who were lost to follow-up revealed:
| Characteristic | Retained Participants | Lost to Follow-Up |
|---|---|---|
| Current smoking (%) | 28% | 42% |
| Mean systolic BP (mmHg) | 132 | 141 |
| Mean BMI (kg/m²) | 26.4 | 28.1 |
| High school education (%) | 72% | 54% |
Values are illustrative of patterns reported in Framingham attrition analyses.
Because those lost to follow-up were more likely to have the exposures and more likely to develop cardiovascular outcomes, the estimated associations between risk factors and heart disease were attenuated in the retained sample.
Nonresponse Bias in Cross-Sectional Surveys
Attrition is selection during follow-up in a longitudinal cohort. Cross-sectional surveys face the analogous problem at recruitment: nonresponse bias occurs at the point of data collection in cross-sectional studies and surveys (Galea & Tracy, 2007). When people who choose not to respond differ systematically from those who do respond, the resulting data do not represent the target population.
The National Health and Nutrition Examination Survey (NHANES) is designed to be nationally representative of the U.S. population. However, response rates have declined over time. Research comparing early responders to late responders (a proxy for nonrespondents) and using linked administrative data has shown that nonrespondents tend to be less healthy—they have higher rates of smoking, obesity, and chronic disease. Standard survey weights partially correct for this, but cannot fully account for unmeasured differences between respondents and nonrespondents.
The direction of nonresponse bias depends on how nonresponse relates to the variables being studied:
- Health behavior surveys: If people with unhealthier behaviors (smoking, sedentary lifestyles) are less likely to participate, prevalence estimates will underestimate the true burden of these behaviors
- Stigmatized conditions: People with HIV, substance use disorders, or mental illness may avoid surveys, leading to underestimation of prevalence
- Health-conscious responders: Conversely, people interested in health may be more likely to participate, inflating apparent health literacy or preventive behavior prevalence
Surveys like NHANES use post-stratification weights to adjust for nonresponse. These weights are calibrated to known population totals (from census data) on variables like age, sex, race/ethnicity, and geography.
The logic: if young men are underrepresented in the survey relative to the census, each young man who did respond receives a higher weight. However, this only corrects for nonresponse that is explained by the weighting variables. If nonresponse is driven by unmeasured factors (like health status itself), weighting alone is insufficient.
Key limitation: Weighting can only correct for nonresponse that is “missing at random” (MAR)—where nonresponse depends on observed variables used in the weighting model. If nonresponse is “missing not at random” (MNAR)—related to the outcome itself—no amount of weighting will fully remove the bias.
For example, if people who are severely depressed are less likely to complete a mental health survey because of their depression, no adjustment for age, sex, or socioeconomic status can recover the true depression prevalence. This is a fundamental limitation of survey-based research.
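The difference between MAR and MNAR nonresponse can be made concrete with expected counts. In this hypothetical R sketch, the population is split evenly between two age groups, younger people respond less often, and a parameter controls whether depressed people respond less often than non-depressed people of the same age. Post-stratification weights are computed as the census share divided by the sample share in each age group.

```r
# Hypothetical population (all numbers are assumptions)
pop_share <- c(young = 0.5,  old = 0.5)   # census shares
prev_dep  <- c(young = 0.20, old = 0.10)  # true depression prevalence by age
resp_rate <- c(young = 0.30, old = 0.60)  # response rate among non-depressed people

survey_estimates <- function(dep_response_ratio) {
  # dep_response_ratio = response rate of depressed / non-depressed (same age)
  resp_dep <- resp_rate * dep_response_ratio
  # Expected respondents per unit of population, by age and depression status
  r_dep       <- pop_share * prev_dep * resp_dep
  r_nondep    <- pop_share * (1 - prev_dep) * resp_rate
  respondents <- r_dep + r_nondep
  # Post-stratification weight: census share / sample share, per age group
  weight <- pop_share / (respondents / sum(respondents))
  c(truth      = sum(pop_share * prev_dep),
    unweighted = sum(r_dep) / sum(respondents),
    weighted   = sum(weight * r_dep) / sum(weight * respondents))
}

round(rbind(MAR  = survey_estimates(1.0),   # nonresponse depends on age only
            MNAR = survey_estimates(0.5)),  # depressed people respond half as often
      3)
```

When response depends only on age, the weighted estimate recovers the true prevalence of 0.15 exactly. When depressed people respond half as often, weighting by age moves the estimate only from about 0.07 to about 0.08.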
Reflection
Consider a longitudinal cohort study of cannabis use and psychotic symptoms among young adults. Over 5 years, 30% of participants are lost to follow-up, and those lost are more likely to be heavy cannabis users. How might this attrition bias the study’s findings? What strategies would you recommend to assess or mitigate this bias?
1. In the Framingham Heart Study, participants lost to follow-up had higher rates of smoking and hypertension. This most likely resulted in:
2. NHANES uses post-stratification weighting to address nonresponse. This approach is limited because:
3. Which of the following best distinguishes attrition bias from nonresponse bias?
Survivorship Bias & Transportability
Introduction and Overview
Sections 1 and 2 covered selection biases that arise during the study — how participants were enrolled, who dropped out. This section addresses biases that arise from a more upstream filter: by the time we study a population, some people are already gone (dead, recovered, lost to history). When the people we end up with are the survival-filtered residue of the people we wish we could study, conventional analysis answers a different question than we think it answers. The section closes by zooming out one step further to ask the related external-validity question: even when our internal estimates are unbiased, do they apply to anyone outside our sample?
Learning Objectives
- Explain prevalence–incidence (Neyman) bias and use the P ≈ I × D relationship to predict its direction.
- Identify survivorship bias across mortality, treatment persistence, and successful-aging contexts and recognize when conclusions overgeneralize.
- Distinguish internal validity from external validity and articulate the conditions under which study findings transport to a new target population.
- Choose study designs (incident-case, nested case-control, restricted target population) that reduce survival-filter and transportability problems.
Prevalence-Incidence (Neyman) Bias
One of the most subtle forms of selection bias occurs when we use cross-sectional data to study risk factors for diseases. The problem: cross-sectional studies capture prevalent cases—people who currently have the disease—rather than incident cases—people who are newly developing it. If some cases die quickly while others survive for years, the cross-sectional sample will overrepresent long-surviving cases and underrepresent rapidly fatal ones.
Early cross-sectional studies of myocardial infarction (MI) survivors examined which risk factors were associated with having had an MI. However, patients with the most severe risk profiles—particularly those with extremely high cholesterol or severe hypertension—were more likely to die from their initial MI before they could be included in a prevalence sample. The result: cross-sectional studies underestimated the strength of these risk factors because the most affected individuals were already dead and absent from the sample.
The Neyman Bias Mechanism
Named after Jerzy Neyman (1955), prevalence-incidence bias occurs because prevalent cases are a subset of all incident cases, specifically those who survived long enough to be sampled. If the risk factor under study is associated with case fatality (more severe disease or faster death), it will appear less strongly associated with the disease in cross-sectional data than it truly is. The bias is toward the null for risk factors that increase case fatality (and, when the effect on survival is strong enough, past the null), and away from the null for factors that prolong survival with the disease.
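The P ≈ I × D relationship makes the direction of Neyman bias easy to predict. In the hypothetical numbers below, exposure doubles the incidence of disease but also shortens the average time lived with it (higher case fatality). The approximation assumes a rare disease in a population at steady state.

```r
# Hypothetical steady-state numbers (assumptions for illustration)
incidence <- c(exposed = 0.02, unexposed = 0.01)  # new cases per person-year
duration  <- c(exposed = 2,    unexposed = 5)     # mean years lived with the disease
prevalence <- incidence * duration                # P ≈ I × D (rare disease, steady state)

incidence_rate_ratio <- incidence[["exposed"]] / incidence[["unexposed"]]
prevalence_ratio     <- prevalence[["exposed"]] / prevalence[["unexposed"]]
prevalence_odds_ratio <-
  (prevalence[["exposed"]] / (1 - prevalence[["exposed"]])) /
  (prevalence[["unexposed"]] / (1 - prevalence[["unexposed"]]))

round(c(IRR = incidence_rate_ratio, PR = prevalence_ratio,
        POR = prevalence_odds_ratio), 2)
```

The incidence rate ratio is 2, but the prevalence ratio is 0.8 and the prevalence odds ratio is about 0.79: a cross-sectional study would make this harmful exposure look protective.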
Neyman bias is the cross-sectional version of the survival-filter problem. The same logic shows up in cohort studies whenever the sample is built around “people still here,” whether that means people still alive, still on treatment, or still showing up for follow-up.
Survivorship Bias in Cohort Studies
Abraham Wald's WWII bomber-armor problem is the classic illustration of survivorship bias. Returning bombers were mapped with bullet holes, and the obvious-but-wrong conclusion was to add armor where the holes clustered. Wald's insight was that the planes hit elsewhere never returned, so armor belongs where the holes AREN'T.
In the early years of the HIV epidemic, before effective antiretroviral therapy, researchers studied “long-term non-progressors”—individuals who remained healthy for years despite HIV infection. Studies of this group identified genetic factors (such as CCR5-delta32 mutations) and immune characteristics associated with slow progression. However, drawing general conclusions about HIV pathogenesis from these survivors was problematic: they represented a highly selected subset of all HIV-infected individuals. The majority of infected individuals had already died, and the survivors differed systematically in ways beyond the factors being studied.
Survivorship bias occurs when we analyze only those who “survived” a process—whether survival means remaining alive, staying in a study, or maintaining a condition—and draw conclusions that we mistakenly generalize to the full original population.
While the term “survivorship” evokes mortality, this bias extends to any selective retention process:
- Treatment persistence studies: Patients who remain on a medication for 2 years are systematically different from those who discontinued—they had fewer side effects, better response, and likely better adherence behaviors. Analyzing only those who persisted overestimates treatment efficacy.
- Cancer survivor cohorts: Studies of quality of life among “cancer survivors” may overestimate well-being because those who died (often with the worst quality of life) are excluded.
- Successful aging studies: Research on cognitive function in elderly cohorts necessarily excludes those who died before reaching old age, potentially missing the very exposures that most strongly impair cognition.
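The treatment-persistence example reduces to simple arithmetic. The R sketch below assumes a hypothetical cohort in which 40% of patients truly respond to a drug and responders are much more likely than non-responders to still be taking it at 2 years.

```r
# Hypothetical cohort starting a medication (illustrative assumptions)
n_start          <- 1000
p_response       <- 0.40  # true proportion who respond to the drug
stay_if_resp     <- 0.80  # responders still on treatment at 2 years
stay_if_non_resp <- 0.30  # non-responders still on treatment (side effects, no benefit)

# Expected numbers still on treatment at 2 years
responders_kept     <- n_start * p_response * stay_if_resp
non_responders_kept <- n_start * (1 - p_response) * stay_if_non_resp

# Response rate an analysis of persisters only would report
apparent_response <- responders_kept / (responders_kept + non_responders_kept)
c(true = p_response, among_persisters = apparent_response)
```

Among those still on treatment, 64% are responders, so an analysis restricted to persisters would overstate the drug's response rate by more than half.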
| Characteristic | Long-Term Survivors | All HIV-Infected (Estimated) |
|---|---|---|
| CCR5-delta32 heterozygote (%) | 15–20% | ~10% |
| Strong CD8+ T-cell response (%) | 80% | ~40% |
| Younger age at infection (%) | 65% | ~45% |
| Access to care (%) | 90% | ~50% |
Values are illustrative of patterns described in early HIV natural history studies.
Because long-term survivors differed on multiple dimensions—genetic, immunological, demographic, and socioeconomic—findings from survivor cohorts could not be straightforwardly generalized to all people living with HIV.
Survivorship and Neyman-style selection can also be read structurally as collider bias — conditioning on “made it into the sample” opens a non-causal path between exposure and outcome (Cole et al., 2010; Munafò et al., 2018). Neyman bias and survivorship bias are about whether the sample we have represents the population we are trying to study. The last topic of the section steps further out: even when the sample does represent its source population, the study's findings may not apply elsewhere. This is the external-validity question that the rest of evidence-based practice depends on.
Transportability of Study Results
Even a perfectly internally valid study may yield misleading conclusions when its findings are applied to a different population. This is the problem of transportability—also called external validity or generalizability—and it is increasingly recognized as a critical challenge in evidence-based practice (Pearl & Bareinboim, 2014; Westreich, Edwards, Lesko, Stuart, & Cole, 2017).
The ALLHAT trial (Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial) found that thiazide-type diuretics were as effective as newer, more expensive antihypertensives in reducing cardiovascular events. However, ALLHAT enrolled adults aged 55 and older with hypertension and at least one additional risk factor for coronary heart disease. When clinicians applied these results to younger patients with fewer comorbidities, the effectiveness of different drug classes sometimes differed, illustrating a target population mismatch.
Transportability vs. Internal Validity
Internal validity asks: “Is the observed association causal within this study?” Transportability asks: “Would the same causal effect hold in a different target population?” A study can have perfect internal validity yet poor transportability if the study population differs from the target population in ways that modify the effect of interest. Effect modification is the key concept linking internal validity to external generalizability.
Study results are transportable to a target population when:
- The causal mechanism operates the same way in both populations
- There are no effect modifiers whose distribution differs between the study and target populations
- The versions of treatment (or exposure) are comparable across settings
- The outcome measurement is equivalent in both populations
If any of these conditions are violated, the effect observed in the study may not replicate in the target population—not because the study was wrong, but because the populations differ in ways that matter.
Effect modification is the core mechanism that threatens transportability. If the treatment effect differs across subgroups (e.g., by age, comorbidity, or genetics), and the study and target populations have different compositions of these subgroups, the average treatment effect will differ between populations.
| Population | % With Comorbidities | Mean Treatment Effect (BP Reduction) |
|---|---|---|
| ALLHAT trial participants | 78% | −12 mmHg |
| Young adults (<40 years) | 15% | −8 mmHg |
| Elderly with renal disease | 92% | −14 mmHg |
Values are illustrative of how treatment effects can vary across populations due to effect modification.
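Standardization can be sketched with two strata. The stratum-specific effects here are invented for illustration (not estimated from ALLHAT). They are chosen so that averaging over a 78% comorbidity share gives roughly −12 mmHg and averaging over a 15% share gives roughly −8 mmHg, in line with the illustrative table above.

```r
# Hypothetical stratum-specific effects (mmHg change in BP); not from ALLHAT
effect <- c(comorbid = -13.5, no_comorbid = -7.0)

# Average the stratum effects over a population's comorbidity distribution
standardize <- function(share_comorbid) {
  sum(c(share_comorbid, 1 - share_comorbid) * effect)
}

trial_effect  <- standardize(0.78)  # study population composition
target_effect <- standardize(0.15)  # e.g., a younger target population
c(trial = trial_effect, target = target_effect)
```

Nothing about the treatment changes between the two lines of output; only the mix of people it is averaged over does.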
Researchers have developed formal methods for assessing and improving transportability:
- Inverse probability of selection weighting: Reweight the study sample so that its covariate distribution matches the target population (Westreich et al., 2017)
- Standardization: Estimate the treatment effect within subgroups defined by effect modifiers, then average across the target population’s subgroup distribution
- Sensitivity analysis for transportability: Assess how much unmeasured effect modification would be needed to qualitatively change conclusions
- Target trial emulation: Use observational data from the target population to emulate the trial design and compare results
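Inverse probability of selection weighting can be sketched on a simulated trial. Everything below is hypothetical: a randomized trial whose participants are 78% comorbid, a treatment effect that is 6.5 mmHg larger with comorbidity, and a target population assumed to be 15% comorbid. With one binary effect modifier, the weights reduce to the target share divided by the trial share in each stratum. With several covariates they are usually estimated from a model of trial versus target membership.

```r
set.seed(230)
n <- 20000
comorbid <- rbinom(n, 1, 0.78)  # trial composition (hypothetical)
treat    <- rbinom(n, 1, 0.5)   # randomized assignment
# Hypothetical outcome model: treatment lowers BP by 7 mmHg,
# plus 6.5 mmHg more in people with comorbidity
bp_change <- -2 + treat * (-7 - 6.5 * comorbid) + rnorm(n, sd = 10)

target_share <- 0.15            # assumed comorbidity share in the target population
trial_share  <- mean(comorbid)
# Selection weights: target share / trial share within each stratum
w <- ifelse(comorbid == 1, target_share / trial_share,
                           (1 - target_share) / (1 - trial_share))

# Difference in (weighted) mean BP change between arms
arm_diff <- function(weights) {
  weighted.mean(bp_change[treat == 1], weights[treat == 1]) -
    weighted.mean(bp_change[treat == 0], weights[treat == 0])
}

trial_ate  <- arm_diff(rep(1, n))
target_ate <- arm_diff(w)
round(c(trial = trial_ate, transported = target_ate), 1)
```

The unweighted trial estimate sits near −12 mmHg, while the weighted estimate sits near −8 mmHg, the effect expected in the target population under these assumptions.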
Reflection
A randomized controlled trial of a new diabetes medication was conducted exclusively among patients aged 50–75 at academic medical centers. A community health clinic serving a predominantly young, uninsured population wants to adopt this medication. What factors would you consider when evaluating whether the trial results are transportable to this new setting?
1. In a cross-sectional study of MI survivors, the association between extremely high cholesterol and MI is weaker than in prospective cohort data. This is best explained by:
2. A study examines quality of life among people who have been on antidepressant therapy for at least 2 years. This study is most susceptible to:
3. The ALLHAT trial results for antihypertensive therapy may not be directly transportable to a younger population because:
4. Which of the following is NOT a valid strategy for improving the transportability of clinical trial results?
Final Assessment
Bringing It All Together
This lesson built a working inventory of selection-related biases in the order they typically arise. Section 1 handled biases that operate at the moment of enrollment: Berkson’s bias in hospital-based case-control designs and the healthy worker effect in occupational cohorts. Section 2 turned to people lost from the data: differential attrition during follow-up in longitudinal studies and nonresponse at data collection in surveys, both invisible unless you ask whether the loss depends on both exposure and outcome. Section 3 tackled the upstream survival filter, prevalence–incidence (Neyman) bias and survivorship bias, and then stepped outside the sample altogether to ask the external-validity question: when does an internally valid finding transport?
Read together, these mechanisms form a structured checklist for "the sample is wrong." Combined with Lesson 7’s vocabulary for "the variables are wrong" and the information-bias inventory ahead in Lesson 9, you are building the appraisal toolkit you will use for the rest of the course. The diagnostic question that runs through all three sections is the same: who is in the data, who is missing, and does that missingness depend on the very relationship we are trying to estimate?
The final reflection asks you to apply the full inventory to a single hypothetical study; the 15-question assessment then checks the conceptual material directly. From here, Lesson 9 turns to information bias — what happens when the people in the study are right but the measurements on them are wrong.
Key Takeaways from Lesson 8
- Berkson’s bias: hospital-based sampling inflates co-occurrence of conditions because being hospitalized is itself a selection filter.
- Healthy worker effect: workforce participation pre-selects healthier people, so SMRs below 1.0 against the general population can mask real occupational hazard.
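- Attrition and nonresponse: what matters is not the absolute rate of loss but whether loss is differential. An exposure–outcome association can be biased when loss depends on both exposure and outcome; a descriptive estimate such as prevalence can be biased by loss related to the outcome alone.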
- Neyman bias and survivorship bias: cross-sectional and "survivor"-based samples are filtered by prior survival, biasing risk-factor estimates in directions predictable from P ≈ I × D.
- Transportability: internal validity does not guarantee external validity — findings travel only when effect-modifier distributions are comparable between study and target populations.
- Diagnostic discipline: for every study, ask who got into the sample, who left, and whether that depends on both exposure and outcome — structural problems cannot be fixed by larger sample sizes.
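The direction of Neyman bias follows from the steady-state approximation P ≈ I × D, which assumes a stable population and a rare condition. Writing the prevalence ratio for exposed versus unexposed people gives PR ≈ (I_exp × D_exp) / (I_unexp × D_unexp) = IR × (D_exp / D_unexp), so any effect of the exposure on disease duration multiplies the incidence ratio. The R sketch below uses hypothetical rates and durations chosen for round numbers; the variable names are illustrative.

```r
# Steady-state approximation (stable population, rare disease): P ~= I x D
prevalence <- function(incidence, duration) incidence * duration

# Hypothetical values: exposure doubles incidence (per person-year) but
# halves mean disease duration (years) because exposed cases die sooner
I_exp   <- 0.002; D_exp   <- 1
I_unexp <- 0.001; D_unexp <- 2

incidence_ratio  <- I_exp / I_unexp
prevalence_ratio <- prevalence(I_exp, D_exp) / prevalence(I_unexp, D_unexp)

c(incidence_ratio = incidence_ratio, prevalence_ratio = prevalence_ratio)
# incidence_ratio = 2, prevalence_ratio = 1: a study of prevalent cases
# would see no association despite a doubling of risk
```

If the exposure instead lengthened survival (D_exp > D_unexp), the prevalence ratio would exaggerate the incidence ratio, so the bias can run in either direction depending on how exposure affects duration.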
The companion R script r-activities/HSCI_230_Lesson_8_Sampling_Selection_and_External_Validity.R simulates a 10,000-person population with a known true mean BMI of 27, then draws a simple random sample and a convenience sample of gym-goers (whose enrollment probability depends on BMI). You compare each sample mean to the truth, then replicate 1,000 times to see why selection bias shifts the entire sampling distribution — the same structural problem behind the healthy-worker effect.
set.seed(230)
# 10,000-person "population" with true mean BMI = 27
N <- 10000
bmi <- rnorm(N, mean = 27, sd = 5)
# Gym attendance is more likely at lower BMI: P(gym) = plogis(2 - 0.15 * BMI)
gym_goer <- rbinom(N, 1, prob = plogis(2 - 0.15 * bmi))
true_mean <- mean(bmi)  # realized population mean, ~27
true_mean
# (1) Simple random sample of 200 from the whole population
srs <- sample(bmi, size = 200)
mean(srs)   # should land close to true_mean
# (2) Convenience sample: 200 people drawn only from gym-goers
conv <- sample(bmi[gym_goer == 1], size = 200)
mean(conv)  # systematically below true_mean, because selection depends on BMI
## -----------------------------------------------------------------------------
## Stretch: replicate 1000 times and look at the sampling distribution
## -----------------------------------------------------------------------------
srs_means <- replicate(1000, mean(sample(bmi, 200)))
conv_means <- replicate(1000, mean(sample(bmi[gym_goer == 1], 200)))
# Side-by-side histograms; the red line marks the true population mean
par(mfrow = c(1, 2))
hist(srs_means, main = "Simple Random Sample", xlim = c(20, 30), xlab = "Mean BMI")
abline(v = true_mean, col = "red", lwd = 2)
hist(conv_means, main = "Convenience (gym)", xlim = c(20, 30), xlab = "Mean BMI")
abline(v = true_mean, col = "red", lwd = 2)
par(mfrow = c(1, 1))  # restore the default single-panel layout
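The companion script fixes the sample size at 200. The sketch below is an addition, not part of that script: it rebuilds the same hypothetical population and repeats the convenience sample at several sample sizes to illustrate the takeaway that structural problems cannot be fixed by larger samples. Larger samples narrow the sampling distribution but leave it centred in the wrong place. The helper `summarise_conv` and the chosen sizes (50, 200, 800) are illustrative assumptions.

```r
set.seed(230)
# Same hypothetical population: BMI ~ N(27, 5), with gym attendance more
# likely at lower BMI (assumed logistic selection model)
N <- 10000
bmi <- rnorm(N, mean = 27, sd = 5)
gym_goer <- rbinom(N, 1, prob = plogis(2 - 0.15 * bmi))
gym_bmi <- bmi[gym_goer == 1]
true_mean <- mean(bmi)

# For one sample size n, draw 1,000 convenience samples and record the
# average error (bias) and the spread (SD) of the sample means
summarise_conv <- function(n, reps = 1000) {
  means <- replicate(reps, mean(sample(gym_bmi, n)))
  c(n = n, bias = mean(means) - true_mean, sd = sd(means))
}

results <- t(sapply(c(50, 200, 800), summarise_conv))
round(results, 2)
```

Expect the `sd` column to shrink as n grows while the `bias` column stays roughly constant: every sample is drawn from the same BMI-filtered pool, so the error is built into who can be sampled, not into how many.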
Final Reflection
You are designing a study to estimate the effect of air pollution exposure on childhood asthma in a large metropolitan area. Describe at least three different selection biases that could threaten your study and explain one specific design decision you would make to address each. Be sure to distinguish between biases that affect internal validity versus external validity (transportability).
1. A hospital-based case-control study finds that gallstones are associated with appendicitis. When replicated in a population-based sample, no association is found. This discrepancy is most likely due to:
2. A researcher compares mortality rates of nuclear power plant workers to the general population and finds an SMR of 0.75 for all-cause mortality. The researcher concludes that radiation exposure is not harmful. What is the primary flaw in this reasoning?
3. In a 10-year cohort study of alcohol consumption and liver disease, participants who develop early-stage liver disease are more likely to drop out. This attrition will most likely:
4. A survey on depression prevalence has a 45% response rate. Compared to census benchmarks, respondents are more educated and more likely to be employed. The survey’s depression prevalence estimate is most likely:
5. A cross-sectional study finds no association between a genetic variant and pancreatic cancer. A subsequent cohort study finds a strong positive association. The most likely explanation is:
6. A study of “successful agers” (people over 85 in good health) finds that moderate alcohol consumption is associated with better cognitive function. This finding should be interpreted cautiously because:
7. A clinical trial conducted exclusively in academic medical centers finds that a new chemotherapy regimen reduces mortality by 30%. When the regimen is used in community oncology practices, the benefit is only 15%. This discrepancy most likely reflects:
8. Which of the following correctly describes the relationship between Neyman bias and the formula P ≈ I × D?
9. Inverse probability of censoring weighting (IPCW) is used to address:
10. A researcher wants to study risk factors for sudden cardiac death. Why would a case-control study using prevalent coronary heart disease cases be inappropriate?
11. In a study of occupational asbestos exposure, the SMR for mesothelioma is 8.2 while the SMR for all-cause mortality is 0.85. This pattern indicates that:
12. A survey of health behaviors among university students achieves an 80% response rate. The researcher states this eliminates nonresponse bias. This claim is:
13. HIV long-term non-progressors were found to have distinctive genetic and immune profiles. Generalizing these findings to all HIV-infected individuals is problematic primarily because of:
14. A trial of a new vaccine was conducted in a well-nourished population in a high-income country. When the vaccine is deployed in a malnourished population in a low-income setting, efficacy is substantially lower. The best explanation for this difference is:
15. A researcher argues that increasing the sample size of a clinical trial will automatically make the results more generalizable to other populations. This reasoning is: