# Lesson 8 — Sampling, Selection & External Validity (v3 expanded)

*Companion-podcast transcript • Sarah & Kiffer* 
*~5359 words • ~29.0 min audio*

---

**Sarah:** Welcome back to Office Hours. I'm Sarah.

**Kiffer:** And I'm Kiffer. Today we're working through Lesson 8, Sampling, Selection, and External Validity. This is the third pillar in what this material calls the great trio of bias in observational research. Lesson 7 was about what we measure and how we model causation. Lesson 9 will be about how the data themselves can mislead us. Lesson 8 sits between them, and it asks a question that's almost embarrassingly simple once you hear it.

**Sarah:** What's the question?

**Kiffer:** Who ends up in the study at all? Even with perfect measurement and a correctly specified causal diagram, a study can still produce a wrong answer if the people in the sample systematically differ from the population we actually care about.

**Sarah:** Okay, before we go further, slow down on that. Beginning students often miss this distinction. What is selection bias, and how is it different from confounding, which is the bias most students hear about first?

**Kiffer:** Confounding is when some third variable is correlated with both your exposure and your outcome, producing some of the apparent association. The classic example. If coffee drinkers have higher rates of lung cancer, the confounder is smoking, because smokers drink more coffee and smoking causes lung cancer.

**Sarah:** So confounding is in some sense a problem about variables. It's about a variable you didn't measure or didn't adjust for properly.

**Kiffer:** Right. Selection bias is different. Selection bias is not about a missing variable. It's about who's in your sample. The relationship inside your study can be perfectly clean, with no confounding at all, and still be wrong, because the people who got into the study aren't representative of the population you're trying to make claims about.

**Sarah:** So selection bias is structural in a way confounding isn't.

**Kiffer:** That's the right word. The lesson is explicit. Selection bias is a structural problem in how data are generated. It's not a statistical problem you can fix by collecting more data. If your sampling process is broken, getting ten times as much broken data does not give you the right answer. It just gives you a more precise wrong answer.

**Sarah:** Walk us through the structure of the lesson.

**Kiffer:** Section 1 is about selection bias at enrollment. Berkson's bias and the healthy worker effect. Section 2 covers what happens later. Attrition during follow-up, when participants drop out, and nonresponse, when people in a survey decline to participate from the start. Section 3 is about survival filters, where the people available to study are themselves the residue of a process that's already filtered out a lot of people. That's prevalence-incidence bias, survivorship bias, and the related question of transportability.

**Sarah:** Let's start with Berkson's bias. The lesson opens with a really classic case study. Walk through it slowly because the mechanism is genuinely counterintuitive.

**Kiffer:** The case is from the 1940s. Hospital-based case-control studies kept finding a strong association between diabetes mellitus and cholecystitis. Cholecystitis is inflammation of the gallbladder. So researchers kept looking at hospital populations, finding that people with diabetes were also more likely to have gallbladder inflammation, and concluding there must be some biological link.

**Sarah:** And was there?

**Kiffer:** When population-based studies were eventually done, drawing samples from the general community rather than from hospitals, the association largely disappeared. Something about the hospital sampling itself was producing the apparent association.

**Sarah:** And that something has a name.

**Kiffer:** It's called Berkson's bias, named after Joseph Berkson, who described it in a 1946 paper. Berkson was a mathematician and physician at the Mayo Clinic. He spent most of his career on the statistics of medical research, and he had a knack for spotting patterns other people had missed. The 1946 paper formalized something that had been quietly distorting hospital-based research for decades.

**Sarah:** Walk through the actual mechanism.

**Kiffer:** Imagine two conditions, A and B, that are completely independent in the general population. Knowing someone has condition A tells you nothing about whether they have condition B. Now suppose people with condition A have some probability of being hospitalized for it. Call that p sub A. People with condition B have probability p sub B.

**Sarah:** And what about people who have both?

**Kiffer:** That's the key. A person who has both conditions can be hospitalized for either one. So their hospitalization probability is p sub A plus p sub B minus the overlap, which is p sub A times p sub B if the two routes to admission work independently. That total is larger than either probability alone. People with two conditions are more likely to end up in a hospital than people with only one, even when the conditions are independent in the population.

**Sarah:** So if you draw your sample from inside the hospital, you've already filtered toward people who have multiple things going on.

**Kiffer:** And that's the whole bias right there. When you draw both your cases and your controls from the hospital, you're sampling from a population enriched for people with multiple conditions. The two conditions look correlated inside the hospital, even though they're independent in the population at large. Population-based studies don't have that filter, so the spurious association evaporates.
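To make the mechanism concrete, here is a minimal sketch of a hospital-based case-control comparison. Everything in it is an assumption made for illustration: the three conditions (exposure A, case condition B, and a control condition C), their prevalences, and the per-condition hospitalization probabilities are made-up numbers, not figures from the lesson or from Berkson's paper. The conditions are independent in the population, and a person is admitted if any one of their conditions sends them to hospital.

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration (not from the lesson).
PREVALENCE = {"A": 0.10, "B": 0.10, "C": 0.10}      # independent in the population
P_HOSPITALIZED = {"A": 0.50, "B": 0.10, "C": 0.60}  # chance each condition leads to admission


def hospitalization_probability(has):
    """1 minus the chance that none of the person's conditions leads to admission."""
    p_stay_home = 1.0
    for condition, present in has.items():
        if present:
            p_stay_home *= 1.0 - P_HOSPITALIZED[condition]
    return 1.0 - p_stay_home


def odds_ratio(cells):
    """cells[(has_A, has_B)] -> probability mass; A is the exposure, B defines cases."""
    return (cells[1, 1] * cells[0, 0]) / (cells[1, 0] * cells[0, 1])


population = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
hospital = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}

for a, b, c in product((0, 1), repeat=3):
    has = {"A": a, "B": b, "C": c}
    mass = 1.0
    for condition, present in has.items():
        mass *= PREVALENCE[condition] if present else 1.0 - PREVALENCE[condition]
    population[a, b] += mass
    # Hospital cases: admitted patients with B.
    # Hospital controls: admitted patients with C but without B.
    if b == 1 or c == 1:
        hospital[a, b] += mass * hospitalization_probability(has)

population_or = odds_ratio(population)
hospital_or = odds_ratio(hospital)
print(f"population odds ratio (A vs B): {population_or:.2f}")
print(f"hospital odds ratio (A vs B):   {hospital_or:.2f}")
```

With these particular made-up inputs the population odds ratio is 1 while the hospital-based one comes out well above 1, even though nothing links A and B biologically. The size of the distortion depends on how the hospitalization probabilities of the case and control conditions compare, so other inputs would give a different number.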

**Sarah:** And the fix is structural, not statistical.

**Kiffer:** Right. You don't fix Berkson's bias with a clever adjustment. You change where you draw your sample from. Use community-based controls. Or if you must use hospital controls, choose controls whose hospitalization is unrelated to your exposure of interest.

**Sarah:** Okay, the second classic mechanism. The healthy worker effect.

**Kiffer:** The textbook case is asbestos workers in the mid-twentieth century.

**Sarah:** Quick context. What's the asbestos story for someone who's never heard of it?

**Kiffer:** Asbestos is a naturally occurring fibrous mineral that was used heavily in industrial insulation, shipbuilding, construction, and brake linings through most of the twentieth century. It's also extremely carcinogenic. Inhaled asbestos fibers cause lung cancer and a rare cancer of the lining of the lung called mesothelioma. By the 1960s and 70s, occupational epidemiologists were studying these workers extensively to quantify the risk.

**Sarah:** And what they found was confusing.

**Kiffer:** Despite asbestos being a known carcinogen, the overall mortality of asbestos workers was lower than that of the general population. The standardized mortality ratio for all-cause mortality was consistently below 1.0 in many studies.

**Sarah:** Hold on. Define standardized mortality ratio for me.

**Kiffer:** Sure. The standardized mortality ratio, abbreviated SMR, is the ratio of observed deaths in a worker cohort to expected deaths based on general population rates of the same age and sex. An SMR of 1.0 means worker mortality matches the general population. Above 1.0 means workers are dying faster. Below 1.0 means slower.
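As a rough illustration of that definition, the sketch below computes an SMR by summing expected deaths across age bands. The person-years, reference death rates, and observed deaths are hypothetical values picked so the result lands near the lesson's illustrative 0.85. Stratifying by sex as well would follow the same pattern and is left out only to keep the example short.

```python
# Hypothetical age bands: (person-years in the worker cohort,
#                          general-population death rate per person-year,
#                          deaths observed in the cohort)
AGE_BANDS = {
    "40-49": (12_000, 0.003, 30),
    "50-59": (9_000, 0.008, 60),
    "60-69": (4_000, 0.020, 70),
}


def standardized_mortality_ratio(bands):
    """Observed deaths divided by the deaths expected at general-population rates."""
    observed = sum(deaths for _, _, deaths in bands.values())
    expected = sum(person_years * rate for person_years, rate, _ in bands.values())
    return observed / expected, observed, expected


smr, observed, expected = standardized_mortality_ratio(AGE_BANDS)
print(f"observed {observed}, expected {expected:.0f}, SMR {smr:.2f}")
```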

**Sarah:** And asbestos workers had SMRs below 1.0. Lower mortality than the general public, despite breathing in a known carcinogen.

**Kiffer:** Right. And the obvious wrong interpretation is that asbestos is somehow protective. Which is absurd.

**Sarah:** So what's actually going on?

**Kiffer:** Workers are not a random slice of the general population. To get a job and keep it, you have to be reasonably healthy. People who are chronically ill, severely disabled, who can't lift things, can't show up reliably, those people are less likely to be in the workforce. So the working population is healthier than the general population on average, before you consider any occupational exposure.

**Sarah:** And the general population you're comparing them to includes those sicker people.

**Kiffer:** Right. The general population is a heterogeneous mix. Working-age people who can work, plus working-age people who can't work because they're too sick. When you compare an asbestos worker cohort to that mixed population, you're comparing the healthier subset to the whole. Of course they have lower all-cause mortality. The comparison is rigged in their favor before you even count the asbestos.

**Sarah:** So the apparent protection is a selection artifact, not a treatment effect.

**Kiffer:** Right. Now here's the really important nuance. The healthy worker effect is strongest for causes of death unrelated to the exposure. So if I'm an asbestos worker, I'm less likely than the general population to die of cardiovascular disease, because I had to be healthy enough to work. But for diseases asbestos directly causes, the effect of the exposure can eventually overwhelm the selection bias.

**Sarah:** Walk us through what that looks like in numbers.

**Kiffer:** The lesson gives an illustrative table. All-cause mortality SMR around 0.85, so asbestos workers as a whole have 15 percent lower mortality than the general population. Cardiovascular disease SMR around 0.78, even more depressed. Lung cancer SMR around 1.45, elevated despite the healthy worker effect. And mesothelioma SMR around 8.20. Eight times the expected rate. Mesothelioma is so rare in the general population, and asbestos is such a specific cause, that the signal completely overwhelms the selection effect.

**Sarah:** So an SMR below 1.0 in an occupational cohort is almost never evidence the workplace is safe. It's evidence that workers start from a healthier-than-average baseline.

**Kiffer:** Right. And the lesson lays out four mitigation strategies. First, internal comparisons. Compare highly exposed workers to less exposed workers within the same workforce. Both groups had to pass the workforce filter, so the selection bias is roughly balanced.

**Sarah:** Second?

**Kiffer:** Healthy worker survivor effect adjustment. The group of workers who stay employed becomes progressively healthier on average over time, because the ones who get sick leave. There are statistical methods to adjust for this dynamic selection.

**Sarah:** Third?

**Kiffer:** Cause-specific analyses. Don't just look at all-cause mortality. Look at outcomes with known biological links to the exposure. For asbestos, that's mesothelioma, lung cancer, and asbestosis. The exposure-specific signal is much stronger than any selection bias for those outcomes.

**Sarah:** Fourth?

**Kiffer:** Lagged analyses. Introduce a time lag between exposure and outcome to account for biological latency. Mesothelioma takes thirty to forty years to develop after asbestos exposure. If you look at workers' health right after they start work, you'll miss the signal entirely.
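The selection artifact, and the first mitigation strategy above (internal comparison), can be sketched with a few lines of arithmetic. All rates here are hypothetical, and the model is deliberately crude: the general population is a mix of people fit for work and people too ill to work, and the exposure is assumed to raise all-cause mortality by 10 percent.

```python
# All numbers are hypothetical and chosen only to make the arithmetic visible.
SHARE_FIT_FOR_WORK = 0.80   # share of working-age adults healthy enough to hold a job
RATE_FIT = 0.008            # annual death rate, fit for work and unexposed
RATE_UNFIT = 0.020          # annual death rate, too ill to work
EXPOSURE_RATE_RATIO = 1.10  # assumed true harm of the exposure on all-cause mortality

general_population_rate = (
    SHARE_FIT_FOR_WORK * RATE_FIT + (1 - SHARE_FIT_FOR_WORK) * RATE_UNFIT
)
exposed_worker_rate = RATE_FIT * EXPOSURE_RATE_RATIO
unexposed_worker_rate = RATE_FIT

# External comparison: exposed workers against the mixed general population.
external_comparison = exposed_worker_rate / general_population_rate
# Internal comparison: exposed workers against unexposed workers,
# both of whom passed the same "fit for work" filter.
internal_comparison = exposed_worker_rate / unexposed_worker_rate

print(f"exposed workers vs general population: {external_comparison:.2f}")
print(f"exposed workers vs unexposed workers:  {internal_comparison:.2f}")
```

Under these assumptions the external comparison falls below 1 even though the exposure is harmful, while the internal comparison recovers the assumed rate ratio.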

**Sarah:** Okay, that's Section 1. Section 2 turns to selection biases that operate later in a study.

**Kiffer:** Right. Attrition reshapes a longitudinal cohort over time as participants drop out. Nonresponse hollows out a survey at the moment of recruitment. Different timing, but the same logic. People are leaving your sample, and if their leaving is related to the exposure-outcome relationship, your estimates get distorted.

**Sarah:** Let's start with attrition. The textbook case is the Framingham Heart Study. I want full context here because it's referenced everywhere earlier in this series.

**Kiffer:** The Framingham Heart Study started in 1948 in Framingham, Massachusetts, a small town about twenty miles west of Boston. It was initiated by the U.S. Public Health Service and is now run by the National Heart, Lung, and Blood Institute. The original cohort was about 5,200 adults from the town. They were enrolled and have been followed every two years with physical exams, lab work, and questionnaires for the rest of their lives.

**Sarah:** And it didn't stop with the original cohort.

**Kiffer:** No. They added the children of the original participants in 1971, then their grandchildren in 2002, and additional cohorts to capture the diversity of the modern town. The study has now run for over seventy-five years across three generations of the same families. Almost everything we know about cardiovascular risk factors, smoking, blood pressure, cholesterol, body mass index, came out of Framingham.

**Sarah:** And the attrition story is what?

**Kiffer:** Over decades, researchers noticed that dropout was not random. Participants lost to follow-up had systematically higher risk profiles. They were more likely to be smokers. They had higher blood pressure. They had lower socioeconomic status. They had higher body mass index. The lesson gives illustrative numbers. Among retained participants, about 28 percent were current smokers. Among those lost to follow-up, 42 percent were smokers. Mean systolic blood pressure was 132 millimeters of mercury in retained participants versus 141 in those lost.

**Sarah:** So the people who dropped out were exactly the people you most needed to keep tracking.

**Kiffer:** Exactly. And because those high-risk individuals were also more likely to develop cardiovascular events, when they dropped out, the strength of the risk-factor associations got attenuated. The people who would have had the events were the ones who left.

**Sarah:** I want to slow down on the diagnostic question. The lesson says the critical question is not how many participants were lost. It's whether dropout depends on both the exposure and the outcome.

**Kiffer:** Yeah, this is subtle. If your dropout is random with respect to the exposure-outcome relationship, you can lose a lot of people without bias. The information just gets noisier. Your confidence intervals get wider. But your estimates remain unbiased on average.

**Sarah:** But if dropout is selective in a particular way, even modest attrition can introduce serious bias.

**Kiffer:** Right. The dangerous case is when dropout depends on both the exposure and the outcome, or factors strongly related to both. Then the remaining sample is no longer representative. The worst case is when the smokers with high blood pressure who are heading toward a cardiac event are exactly the ones who drop out.
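The diagnostic question can be checked with expected counts. In this sketch the cohort sizes, the true risks, and the retention probabilities are all hypothetical, and dropout is simplified to a single retention probability per exposure-outcome cell.

```python
# Hypothetical cohort: (exposed, event) -> people. True risk ratio = 0.20 / 0.10 = 2.0.
COHORT = {
    (1, 1): 200, (1, 0): 800,
    (0, 1): 100, (0, 0): 900,
}

# Hypothetical retention probabilities for each (exposed, event) cell.
SCENARIOS = {
    "random dropout": {cell: 0.6 for cell in COHORT},
    "dropout depends on exposure only": {
        (1, 1): 0.5, (1, 0): 0.5, (0, 1): 0.9, (0, 0): 0.9,
    },
    "exposed people headed for an event drop out more": {
        (1, 1): 0.5, (1, 0): 0.9, (0, 1): 0.9, (0, 0): 0.9,
    },
}


def risk_ratio_after_dropout(retention):
    """Risk ratio computed only from the participants expected to remain."""
    kept = {cell: n * retention[cell] for cell, n in COHORT.items()}
    risk_exposed = kept[1, 1] / (kept[1, 1] + kept[1, 0])
    risk_unexposed = kept[0, 1] / (kept[0, 1] + kept[0, 0])
    return risk_exposed / risk_unexposed


results = {name: risk_ratio_after_dropout(r) for name, r in SCENARIOS.items()}
for name, rr in results.items():
    print(f"{name}: risk ratio {rr:.2f}")
```

In the first two scenarios a large share of the cohort is lost and the risk ratio stays at 2. In the third, fewer people are lost overall, yet the estimate is pulled toward the null because the losses are concentrated among exposed participants who go on to have the event.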

**Sarah:** The lesson lists four mechanisms that produce differential attrition. Walk us through them.

**Kiffer:** First, illness-related dropout. Participants who become sicker may simply be unable or unwilling to continue. Advanced cancer patients miss follow-up visits. People with severe depression don't return calls. The very outcomes you're trying to study cause the people who experience them to leave.

**Sarah:** Second?

**Kiffer:** Exposure-related migration. People who experience adverse effects of an exposure may relocate. Workers who develop respiratory symptoms move away from a polluted area. Residents who experience flooding leave the floodplain. So the people most affected by the exposure remove themselves from the sample.

**Sarah:** Third?

**Kiffer:** Competing mortality. Participants who die from other causes, particularly causes related to the study exposure, are lost from the sample before the outcome of interest can occur. This matters most in studies of older populations where there are many causes of death.

**Sarah:** Fourth?

**Kiffer:** Socioeconomic barriers. Disadvantaged participants face transportation challenges, childcare gaps, and work-schedule conflicts that make continued participation difficult. So your sample slowly drifts toward the more privileged segment of the cohort.

**Sarah:** Then the lesson turns to nonresponse bias. The cross-sectional cousin of attrition.

**Kiffer:** Same logic, different timing. Attrition reshapes a cohort over time. Nonresponse hollows out a survey at the moment of recruitment. The textbook case is NHANES.

**Sarah:** Spell that out for someone hearing it for the first time.

**Kiffer:** NHANES stands for the National Health and Nutrition Examination Survey. It's run by the U.S. Centers for Disease Control and Prevention through the National Center for Health Statistics. It started in the early 1960s and continues today. NHANES is unusual because it doesn't just ask people questions. It also brings them in for physical exams and laboratory measurements. So it captures things like blood pressure, body mass index, cholesterol, and biomarkers, not just self-reported behaviors. It's one of the foundational data sources for U.S. public health surveillance.

**Sarah:** And the nonresponse problem is that response rates have been declining.

**Kiffer:** Yeah, this has been a slow erosion across most population surveys. NHANES is no exception. Research that compares early responders to late responders, or uses linked administrative records, shows a consistent pattern. Nonrespondents tend to be less healthy than respondents. Higher rates of smoking, obesity, chronic disease, and mental health conditions.

**Sarah:** Which means a survey of people who responded systematically underestimates the burden of those conditions.

**Kiffer:** Exactly. And the direction depends on what you're studying. For health behavior surveys, if people with unhealthier behaviors are less likely to participate, prevalence estimates underestimate the true burden. For stigmatized conditions like HIV, substance use disorders, or mental illness, people may avoid surveys, leading to underestimation. Sometimes the bias goes the other way. People interested in health are more likely to participate, which can inflate apparent rates of preventive behavior.

**Sarah:** How do surveys try to fix this?

**Kiffer:** The standard approach is post-stratification weights. Quick definition. Post-stratification weights adjust the sample so it matches known population totals on key variables, usually demographics from the census, like age, sex, race or ethnicity, and geographic region. The logic is, if young men are underrepresented in your survey relative to the census, each young man who did respond gets a higher weight.

**Sarah:** And the limitation?

**Kiffer:** Weighting can only correct for nonresponse explained by the variables you weight on. If young men in the survey are systematically healthier than young men in the population, weighting up their numbers won't fix that. The unhealthier young men just aren't there to be reweighted. So if nonresponse is driven by unmeasured factors like health status itself, no amount of weighting will fully remove the bias.
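Both the logic and the limitation can be seen in a small worked example. The strata, population shares, and respondent counts below are hypothetical. The first sample has nonresponse that depends only on age group. The second has young smokers responding less often than young non-smokers, which the age weights cannot see.

```python
# Hypothetical population shares (as if taken from a census).
POPULATION_SHARE = {"young": 0.40, "older": 0.60}
# True smoking prevalence in this made-up population: 0.40 * 0.30 + 0.60 * 0.20
TRUE_PREVALENCE = 0.24

# sample[stratum] = (respondents, smokers among respondents)
NONRESPONSE_BY_AGE_ONLY = {"young": (200, 60), "older": (800, 160)}
NONRESPONSE_BY_SMOKING_TOO = {"young": (200, 40), "older": (800, 160)}


def unweighted_prevalence(sample):
    return sum(s for _, s in sample.values()) / sum(n for n, _ in sample.values())


def poststratified_prevalence(sample):
    """Weight each respondent by population share / sample share of their stratum."""
    total = sum(n for n, _ in sample.values())
    weighted_smokers = 0.0
    weighted_total = 0.0
    for stratum, (n, smokers) in sample.items():
        weight = POPULATION_SHARE[stratum] / (n / total)
        weighted_smokers += weight * smokers
        weighted_total += weight * n
    return weighted_smokers / weighted_total


weighted_age_only = poststratified_prevalence(NONRESPONSE_BY_AGE_ONLY)
weighted_smoking_too = poststratified_prevalence(NONRESPONSE_BY_SMOKING_TOO)
print(f"truth: {TRUE_PREVALENCE:.2f}")
print(f"age-only nonresponse, unweighted: {unweighted_prevalence(NONRESPONSE_BY_AGE_ONLY):.2f}")
print(f"age-only nonresponse, weighted:   {weighted_age_only:.2f}")
print(f"smoking-related nonresponse, weighted: {weighted_smoking_too:.2f}")
```

Weighting repairs the first sample because age fully explains who is missing. In the second sample the weighted estimate stays below the truth, since the young respondents being weighted up are themselves unrepresentative.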

**Sarah:** The lesson formalizes this with three categories of missingness. Walk through them.

**Kiffer:** Three categories. Missing completely at random, abbreviated MCAR. Missing at random, abbreviated MAR. And missing not at random, abbreviated MNAR. They sound similar but describe quite different mechanisms.

**Sarah:** Start with missing completely at random.

**Kiffer:** MCAR means the probability of being missing is unrelated to any characteristic of the participant, observed or unobserved. The lab dropped the sample. The survey form got lost in the mail. Random data loss with no pattern. Under MCAR, complete-case analysis, where you just drop the missing observations, is unbiased. You lose precision but your estimates still point at the right answer.

**Sarah:** Now missing at random.

**Kiffer:** MAR is more subtle. Missingness depends on observed variables but not on the missing values themselves, after conditioning on what you've observed. Younger participants skip a depression questionnaire more often than older ones. So missingness depends on age, which you observed. But among people of the same age, missingness is unrelated to depression severity itself. If you adjust for age, you can recover unbiased estimates. Multiple imputation works under MAR.

**Sarah:** And missing not at random?

**Kiffer:** MNAR means missingness depends on the unobserved values themselves. People with severe depression are less likely to complete a depression survey because of their depression. The very thing you're trying to measure is causing the missingness. No standard analytic method can fully correct MNAR. You can do sensitivity analyses with different assumptions, but you can't recover the truth from the data alone.
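The three mechanisms can be compared in a small simulation. This is a sketch under stated assumptions: the cohort, the score distribution, and the missingness probabilities are invented, age group is treated as fully observed, and the adjustment shown is a simple age-stratified mean rather than multiple imputation, which relies on the same MAR assumption.

```python
import random
from statistics import fmean

random.seed(8)
N = 20_000

# Hypothetical cohort: (is_young, depression score); younger people score higher on average.
people = []
for _ in range(N):
    young = random.random() < 0.5
    people.append((young, random.gauss(12.0 if young else 8.0, 4.0)))


def p_missing(mechanism, young, score):
    """Probability that the score is missing under each (made-up) mechanism."""
    if mechanism == "MCAR":
        return 0.3                           # unrelated to anything
    if mechanism == "MAR":
        return 0.6 if young else 0.1         # depends only on observed age group
    if mechanism == "MNAR":
        return 0.7 if score > 12.0 else 0.1  # depends on the unobserved score itself
    raise ValueError(mechanism)


def estimates(mechanism):
    """Return (complete-case mean, age-stratified mean) of the observed scores."""
    observed = [(y, s) for y, s in people if random.random() >= p_missing(mechanism, y, s)]
    complete_case = fmean(s for _, s in observed)
    share_young = fmean(1.0 if y else 0.0 for y, _ in people)  # age is known for everyone
    mean_young = fmean(s for y, s in observed if y)
    mean_older = fmean(s for y, s in observed if not y)
    age_adjusted = share_young * mean_young + (1.0 - share_young) * mean_older
    return complete_case, age_adjusted


truth = fmean(s for _, s in people)
results = {m: estimates(m) for m in ("MCAR", "MAR", "MNAR")}
print(f"full-data mean: {truth:.2f}")
for mechanism, (cc, adjusted) in results.items():
    print(f"{mechanism}: complete-case {cc:.2f}, age-adjusted {adjusted:.2f}")
```

Under these assumptions the complete-case mean is close to the truth only for MCAR, conditioning on age repairs MAR, and neither estimate recovers the truth under MNAR.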

**Sarah:** Okay, Section 3. Survival filters and external validity. This is where the lesson zooms out.

**Kiffer:** Sections 1 and 2 covered selection biases that arise at enrollment and during the study. Section 3 addresses biases from a more upstream filter. By the time we study a population, some people are already gone. Dead, recovered, lost to history. The people we end up with are the survival-filtered residue of the people we wish we could study. Then the section closes by stepping out one more level to ask the external validity question. Even if our internal estimates are unbiased, do they apply to anyone outside our sample?

**Sarah:** Start with prevalence-incidence bias. Also called Neyman bias.

**Kiffer:** Named after Jerzy Neyman, who described it in 1955. Neyman was a Polish-American mathematician and statistician. He was one of the foundational figures in twentieth-century statistics, especially the theory of confidence intervals and survey sampling. He moved to the U.S. in 1938, after several years at University College London, and spent the rest of his career at Berkeley.

**Sarah:** What's the problem he described?

**Kiffer:** Cross-sectional studies capture prevalent cases, not incident cases. Quick reminder. A prevalent case is someone who currently has the disease. An incident case is someone who newly develops the disease during a defined period. Cross-sectional sampling, by definition, gives you the prevalent cases.

**Sarah:** And the survival filter is what?

**Kiffer:** If some cases die quickly and others survive for years, the cross-sectional sample overrepresents the long-surviving cases and underrepresents the rapidly fatal ones. The textbook example is myocardial infarction risk factors. Quick definition. Myocardial infarction, abbreviated MI, is a heart attack. The death of part of the heart muscle due to blocked blood flow.

**Sarah:** Walk through the case.

**Kiffer:** Early cross-sectional studies of MI survivors examined which risk factors were associated with having had a heart attack. Researchers compared MI survivors with people who had never had one on cholesterol, blood pressure, smoking, and other risk factors. But here's the catch. Patients with the most severe risk profiles, extremely high cholesterol, severe hypertension, the most damaged hearts, were also more likely to die from their first heart attack before they could be included in any prevalence sample.

**Sarah:** So the people whose risk-factor profile would tell you the most are the ones missing from the data.

**Kiffer:** Right. The result is that cross-sectional studies underestimated the strength of those risk factors. The most affected individuals were already dead and absent from the sample. The bias is toward the null for risk factors that increase case fatality, and away from the null for factors that improve survival.
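A short calculation shows the attenuation. The group sizes, incidence, and case-fatality values are hypothetical, and the model is simplified to a single survey date with deaths occurring only among people who had an MI.

```python
N_PER_GROUP = 10_000
# Hypothetical incidence of MI by risk-factor status; true risk ratio is 2.0.
INCIDENCE = {"exposed": 0.10, "unexposed": 0.05}


def prevalence_ratio(case_fatality):
    """Ratio of MI-survivor prevalence (exposed vs unexposed) among people alive at the survey."""
    prevalence = {}
    for group, risk in INCIDENCE.items():
        cases = N_PER_GROUP * risk
        early_deaths = cases * case_fatality[group]  # died before the survey
        survivors_with_mi = cases - early_deaths
        alive_at_survey = N_PER_GROUP - early_deaths
        prevalence[group] = survivors_with_mi / alive_at_survey
    return prevalence["exposed"] / prevalence["unexposed"]


incidence_ratio = INCIDENCE["exposed"] / INCIDENCE["unexposed"]
equal_fatality = prevalence_ratio({"exposed": 0.20, "unexposed": 0.20})
higher_fatality_if_exposed = prevalence_ratio({"exposed": 0.50, "unexposed": 0.20})

print(f"incidence risk ratio: {incidence_ratio:.2f}")
print(f"prevalence ratio, equal case fatality: {equal_fatality:.2f}")
print(f"prevalence ratio, risk factor also raises case fatality: {higher_fatality_if_exposed:.2f}")
```

When the risk factor does not affect case fatality, the cross-sectional ratio stays close to the incidence ratio. When it also makes the first MI more likely to be fatal, the ratio among survivors shrinks toward the null.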

**Sarah:** And this generalizes way beyond heart attacks.

**Kiffer:** Survivorship bias is the broader name for the same logic. Anywhere you analyze only the survivors of a process and try to draw conclusions that apply to the full original population, you risk this bias.

**Sarah:** The lesson uses the HIV long-term non-progressors example. I want full context, because the listener might not know the early HIV story.

**Kiffer:** Quick context. The acronym HIV stands for human immunodeficiency virus. It's the virus that causes AIDS, acquired immunodeficiency syndrome. The HIV epidemic emerged in the early 1980s, and at the time infection was almost always fatal. Most people infected with HIV in the 1980s and early 1990s progressed to AIDS within about ten years and died shortly after. Effective combination antiretroviral therapy, what's called highly active antiretroviral therapy, only emerged in 1996.

**Sarah:** So before 1996, HIV was essentially a death sentence.

**Kiffer:** For most people, yes. But not for everyone. A small subset of people infected with HIV remained healthy for years, even decades, without progressing to AIDS. They were called long-term non-progressors. Researchers were intensely interested in studying them, because if you could figure out why some people resisted disease progression, you might find clues to treatment or vaccine development.

**Sarah:** What did they find?

**Kiffer:** Genetic factors mattered. The most famous one was the CCR5 delta-32 mutation. Quick context. CCR5 is a protein on the surface of human immune cells that HIV uses as a doorway to get in. The CCR5 delta-32 mutation is a deletion of 32 base pairs in the CCR5 gene that produces a non-functional version of the protein. People who inherit two copies of the mutation are highly resistant to HIV infection. People with one copy progress more slowly when infected.

**Sarah:** And the lesson mentions researchers also identified strong CD8 T-cell responses, younger age at infection, and better access to care.

**Kiffer:** Right. The lesson gives illustrative numbers. Long-term survivors had CCR5 delta-32 heterozygote rates around 15 to 20 percent, compared to about 10 percent in all infected people. Strong CD8 T-cell response in 80 percent of survivors versus around 40 percent overall. So the survivors were different on many dimensions. Genetic, immunologic, demographic, and socioeconomic.

**Sarah:** Why is generalizing from them problematic?

**Kiffer:** Because they were a highly selected subset. The majority of HIV-infected people had already died. The survivors were the residue of an enormous selection process. Drawing conclusions from them about HIV pathogenesis in general would be like drawing conclusions about everyone who starts a marathon by studying only the people who finished.

**Sarah:** And the lesson points out that survivorship bias goes way beyond mortality. Walk through the other examples.

**Kiffer:** Treatment persistence studies are one. People still on a medication two years after starting are systematically different from those who discontinued. They had fewer side effects, better response, better adherence. If you analyze only the persisters, you overestimate treatment efficacy, because the people for whom the drug didn't work or caused side effects are gone from the sample.

**Sarah:** Cancer survivor cohorts are another.

**Kiffer:** Studies of quality of life among cancer survivors may overestimate well-being because the people who died, often the ones who had the worst quality of life in their final months, are excluded by definition. Survivor is in the name.

**Sarah:** And successful aging studies.

**Kiffer:** Same logic. Research on cognitive function in elderly cohorts necessarily excludes those who died before reaching old age. If the exposures that most strongly impair cognition also kill people, those exposures look weaker in the surviving elderly than they really are. Lead exposure, severe cardiovascular disease, untreated hypertension. The people most damaged by them aren't in the elderly cohort because they didn't make it.

**Sarah:** Okay, the last topic. Transportability and external validity.

**Kiffer:** Even when the sample does represent its source population, the findings may not apply elsewhere. This is the problem of transportability, also called external validity or generalizability. And it's increasingly recognized as a critical issue in evidence-based practice, because so much of clinical and public health policy is built on extrapolating from one study population to another.

**Sarah:** The lesson is sharp on the distinction between internal validity and transportability.

**Kiffer:** Internal validity asks, is the observed association causal within this study? Did we get the right answer for the people we studied? Transportability asks a different question. Would the same causal effect hold in a different target population? A study can have perfect internal validity and still fail to transport. The estimate is right for the people studied, but it might be wrong for the people you want to apply it to.

**Sarah:** The textbook example is the ALLHAT trial.

**Kiffer:** Spell out the acronym. ALLHAT stands for the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial. It was a major U.S. randomized trial that ran from 1994 to 2002. It enrolled over 33,000 high-risk hypertensive adults aged 55 and older from across North America. The trial compared different classes of blood pressure medication. Newer agents like calcium channel blockers and ACE inhibitors against older, cheaper thiazide diuretics.

**Sarah:** Quick definition. What's a thiazide diuretic?

**Kiffer:** Thiazide diuretics are an older class of blood pressure medication. They make the kidneys excrete more salt and water, which reduces blood volume and lowers blood pressure. They've been around since the 1950s. They're cheap and well-studied. Drugs like hydrochlorothiazide. The newer drug classes were marketed as more effective and as having fewer side effects, often at much higher cost.

**Sarah:** And what did ALLHAT find?

**Kiffer:** The headline finding was that thiazide diuretics were as effective as the newer, more expensive drugs in reducing major cardiovascular events. The cheap old drug worked just as well as the fancy new ones. Which was a big deal for cost-conscious health systems.

**Sarah:** But the transportability problem was what?

**Kiffer:** ALLHAT participants were predominantly older adults with established hypertension and multiple comorbidities. They had to be at least 55, and many had diabetes, kidney disease, or prior cardiovascular events. The trial spoke clearly to that population. But when clinicians applied the results to younger patients with fewer comorbidities, the relative effectiveness of different drug classes sometimes differed. The cheap drug wasn't necessarily as good in the younger, healthier patients, because the underlying biology of their hypertension was different.

**Sarah:** And the lesson uses this to introduce effect modification as the central concept threatening transportability.

**Kiffer:** Quick definition. Effect modification, also called interaction, is when the size or direction of the exposure-outcome relationship differs across subgroups. So the treatment effect of thiazides might be larger in people with kidney disease and smaller in young healthy adults. The effect is modified by the comorbidity status.

**Sarah:** And how does that connect to transportability?

**Kiffer:** It's the central mechanism. If a treatment effect varies across subgroups defined by some characteristic, like age, comorbidity, or genetics, and if your study population and target population have different distributions of that characteristic, then the average effect in the two populations will differ. Even if your trial was perfectly conducted. Even if your causal estimate inside the trial is unbiased. The averages don't transport.

**Sarah:** The lesson lays out four conditions for transportability.

**Kiffer:** First, the causal mechanism has to operate the same way in both populations. The biology has to be similar enough that the intervention works through the same pathways. Second, there can be no effect modifiers whose distribution differs between study and target. Or if there are, you have to be able to adjust for them. Third, the versions of the treatment have to be comparable across settings. The same drug, the same dose, the same delivery mechanism. Fourth, the outcome measurement has to be equivalent.

**Sarah:** And distinguishing effect modifiers from confounders is worth pausing on, because students confuse them.

**Kiffer:** Right. A confounder is a variable that distorts the exposure-outcome relationship and needs to be adjusted for. An effect modifier is a variable across which the effect itself genuinely differs. You don't adjust for an effect modifier. You report effects within levels of it, and you ask whether the levels in your sample match the levels in your target population.

**Sarah:** The lesson lists four formal methods for assessing or improving transportability.

**Kiffer:** First, inverse probability of selection weighting. Reweight your study sample so its covariate distribution matches the target population. Second, standardization. Estimate treatment effects within subgroups defined by effect modifiers, then average across the target population's subgroup distribution. Third, sensitivity analysis for transportability. Assess how much unmeasured effect modification would be needed to qualitatively change conclusions. Fourth, target trial emulation. Use observational data from the target population to emulate the trial design and compare results.
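
The first two methods can be sketched on a toy dataset. This is a simplified illustration, not the lesson's implementation. It assumes a single binary effect modifier, a known target share for that modifier, and hypothetical trial counts. With one discrete covariate, the selection weight reduces to the ratio of the target share to the trial share in each stratum.

```python
# Hypothetical trial records, built from counts so the result is deterministic.
# Each record is (has_kidney_disease, treated, had_event).
def make_records(modifier, treated, n, events):
    return [(modifier, treated, i < events) for i in range(n)]


trial = (
    make_records(True, True, 50, 10)       # kidney disease, treated: 20% risk
    + make_records(True, False, 50, 15)    # kidney disease, control: 30% risk
    + make_records(False, True, 450, 36)   # no kidney disease, treated: 8% risk
    + make_records(False, False, 450, 45)  # no kidney disease, control: 10% risk
)

# Assumed share of the target population with and without kidney disease.
TARGET_SHARE = {True: 0.40, False: 0.60}
UNIT_WEIGHTS = {True: 1.0, False: 1.0}


def modifier_share(records):
    share = sum(1 for m, _, _ in records if m) / len(records)
    return {True: share, False: 1 - share}


def weighted_risk(records, treated, weights):
    num = sum(weights[m] for m, t, y in records if t == treated and y)
    den = sum(weights[m] for m, t, y in records if t == treated)
    return num / den


def risk_difference(records, weights):
    return weighted_risk(records, True, weights) - weighted_risk(records, False, weights)


def standardized_rd(records, target_share):
    """Stratum-specific risk differences averaged over the target's stratum shares."""
    total = 0.0
    for m, share in target_share.items():
        stratum = [r for r in records if r[0] == m]
        total += share * risk_difference(stratum, UNIT_WEIGHTS)
    return total


# Selection weights: target share divided by trial share, per stratum.
trial_share = modifier_share(trial)
selection_weights = {m: TARGET_SHARE[m] / trial_share[m] for m in (True, False)}

naive_rd = risk_difference(trial, UNIT_WEIGHTS)
ipsw_rd = risk_difference(trial, selection_weights)
std_rd = standardized_rd(trial, TARGET_SHARE)
print(f"unweighted trial estimate: {naive_rd:+.3f}")
print(f"reweighted to target:      {ipsw_rd:+.3f}")
print(f"standardized to target:    {std_rd:+.3f}")
```

In this toy setup reweighting and standardization agree with each other and both differ from the unweighted trial estimate. Real applications usually estimate selection probabilities from a model over several covariates, which this sketch does not attempt.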

**Sarah:** And the lesson makes one important closing point. Increasing trial sample size does not improve transportability.

**Kiffer:** Right. Bigger trials give you more precision, narrower confidence intervals. They don't fix transportability. If your trial population differs from your target on effect modifiers, a bigger trial of the same population just gives you a more precise estimate of the wrong effect. Transportability is about composition, not precision.
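
A rough numerical sketch of that point, assuming hypothetical risks in the trial population and a hypothetical average effect in the target population. It computes the expected Wald 95% interval for the trial's risk difference at three sample sizes. The interval is centred on the trial population's own effect, so making it narrower never moves it toward the target's effect.

```python
import math

# Assumed trial-population risks and target-population effect (all hypothetical).
RISK_TREATED = 0.092
RISK_CONTROL = 0.120
TARGET_EFFECT = -0.052  # average risk difference in the target population


def expected_ci(n_per_arm, z=1.96):
    """Wald 95% interval for the risk difference, centred on the trial's true effect."""
    rd = RISK_TREATED - RISK_CONTROL
    se = math.sqrt(
        RISK_TREATED * (1 - RISK_TREATED) / n_per_arm
        + RISK_CONTROL * (1 - RISK_CONTROL) / n_per_arm
    )
    return rd - z * se, rd + z * se


results = {}
for n in (500, 5_000, 50_000):
    low, high = expected_ci(n)
    covers = low <= TARGET_EFFECT <= high
    results[n] = (low, high, covers)
    print(f"n per arm={n:>6}: CI=({low:+.4f}, {high:+.4f}) covers target effect: {covers}")
```

Under these assumptions the smallest trial happens to cover the target effect only because its interval is wide. The larger trials exclude it, which is the "more precise estimate of the wrong effect" in numerical form.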

**Sarah:** Okay, time to pull this all together. There's a lot in this lesson, but the takeaways are clean.

**Kiffer:** Six main takeaways.

**Sarah:** Let's hear them.

**Kiffer:** First. Selection bias is structural. It's about how the sample was generated. It's not about a missing variable, the way confounding is. It's not about measurement, the way information bias is. It's about who's in your data. And critically, it cannot be fixed by collecting more data through the same selection process. A larger biased sample is just a more precise wrong answer.

**Sarah:** Second. Berkson's bias. Hospital-based sampling inflates the apparent co-occurrence of conditions because people with multiple conditions are more likely to be hospitalized. Always ask, does my control source have the same hospitalization probability as my case source? If the answer is no, you might be looking at Berkson. The fix is structural. Use community-based controls.

**Kiffer:** Third. The healthy worker effect. A standardized mortality ratio below 1.0 for all-cause mortality in an occupational cohort almost always reflects the selection of healthier individuals into the workforce, not the safety of the exposure. Use internal comparisons, focus on cause-specific outcomes with known biological links to exposure, and use lagged analyses for exposures with long latency.

**Sarah:** Fourth. Attrition and nonresponse are most dangerous when dropout depends on both the exposure and the outcome. Random attrition costs you precision. Differential attrition introduces bias. The three categories of missingness, missing completely at random, missing at random, and missing not at random, are worth memorizing. Multiple imputation can handle missing at random, provided the imputation model includes the variables that drive the missingness. Missing not at random is fundamentally unrecoverable from the data alone. Post-stratification weights help with what you measured. They cannot fix nonresponse driven by what you didn't measure.

**Kiffer:** Fifth. Prevalence-incidence bias and survivorship bias both arise when the sample is the survival-filtered residue of a larger population. Cross-sectional designs are particularly vulnerable. Survivor cohorts in HIV research, treatment persistence studies, cancer survivor cohorts, and successful aging studies all share the same logic. The people who didn't make it are precisely the ones whose data would tell you the most. Incident-case designs are the cleaner alternative when feasible.

**Sarah:** And sixth. Transportability is not the same as internal validity. Internal validity asks whether the estimate is right within this sample. Transportability asks whether it holds in a different population. Effect modification is the mechanism that determines the answer. If your trial population differs from your target on variables that modify the effect, the trial estimate may not apply. Increasing trial sample size improves precision. It does not improve transportability.
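
The three missingness categories from the fourth takeaway can be compared with a small expected-value calculation. This is a sketch under invented numbers: a binary covariate X, a binary outcome Y, and made-up response probabilities. It uses reweighting to the known distribution of X as a simple stand-in for methods that rely on observed covariates. It does not implement multiple imputation.

```python
# Hypothetical population: binary covariate X (50/50) and binary outcome Y.
P_X = {0: 0.5, 1: 0.5}
P_Y1_GIVEN_X = {0: 0.2, 1: 0.6}
TRUE_PREVALENCE = sum(P_X[x] * P_Y1_GIVEN_X[x] for x in P_X)

# Assumed response probability as a function of (x, y) under each mechanism.
MECHANISMS = {
    "MCAR": lambda x, y: 0.7,                     # depends on nothing
    "MAR": lambda x, y: 0.9 if x == 0 else 0.5,   # depends on observed X only
    "MNAR": lambda x, y: 0.9 if y == 0 else 0.5,  # depends on the outcome itself
}


def responder_cells(response):
    """Expected share of the population in each (x, y) cell that responds."""
    cells = {}
    for x in P_X:
        for y in (0, 1):
            p_y = P_Y1_GIVEN_X[x] if y == 1 else 1 - P_Y1_GIVEN_X[x]
            cells[(x, y)] = P_X[x] * p_y * response(x, y)
    return cells


def complete_case(cells):
    """Outcome prevalence among responders, with no correction."""
    return sum(v for (x, y), v in cells.items() if y == 1) / sum(cells.values())


def weighted_by_x(cells):
    """Prevalence within each X stratum of responders, reweighted to the true X mix."""
    estimate = 0.0
    for x in P_X:
        responders = cells[(x, 0)] + cells[(x, 1)]
        estimate += P_X[x] * cells[(x, 1)] / responders
    return estimate


estimates = {}
for name, response in MECHANISMS.items():
    cells = responder_cells(response)
    estimates[name] = (complete_case(cells), weighted_by_x(cells))
    print(f"{name}: complete-case={estimates[name][0]:.3f} "
          f"weighted-by-X={estimates[name][1]:.3f} truth={TRUE_PREVALENCE:.3f}")
```

In this setup the uncorrected estimate is fine under MCAR, biased but repairable with X under MAR, and still biased after weighting under MNAR, because the driver of nonresponse is the unobserved outcome itself.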

**Kiffer:** And one practical recommendation. The lesson includes a Selection and Recruitment Bias Simulator that's worth playing with. It holds a true population fixed and lets you set the participation probability for each combination of exposure and outcome. The presets reproduce Berkson's bias, the healthy worker effect, volunteer self-selection, and loss to follow-up. Watching the same true population produce wildly different observed odds ratios under different participation patterns is the fastest way to internalize that selection bias is a structural problem in how the data are generated.
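
For readers without the simulator at hand, its core logic can be approximated in a few lines. This is a sketch of the idea only. The population counts and participation patterns below are invented and are not the simulator's presets. It works with expected counts rather than random draws, so the observed odds ratio equals the true odds ratio multiplied by a ratio of the four participation probabilities.

```python
# Hypothetical fixed population in which the true odds ratio is exactly 1.0.
POPULATION = {
    ("exposed", "case"): 100,
    ("exposed", "noncase"): 900,
    ("unexposed", "case"): 100,
    ("unexposed", "noncase"): 900,
}


def odds_ratio(table):
    return (table[("exposed", "case")] * table[("unexposed", "noncase")]) / (
        table[("exposed", "noncase")] * table[("unexposed", "case")]
    )


def observed_table(population, participation):
    """Expected counts after each cell participates with its own probability."""
    return {cell: n * participation[cell] for cell, n in population.items()}


# Illustrative participation patterns (invented; not the simulator's presets).
PATTERNS = {
    "uniform": {
        ("exposed", "case"): 0.5, ("exposed", "noncase"): 0.5,
        ("unexposed", "case"): 0.5, ("unexposed", "noncase"): 0.5,
    },
    "outcome_only": {
        ("exposed", "case"): 0.9, ("exposed", "noncase"): 0.2,
        ("unexposed", "case"): 0.9, ("unexposed", "noncase"): 0.2,
    },
    "exposure_and_outcome": {
        ("exposed", "case"): 0.8, ("exposed", "noncase"): 0.3,
        ("unexposed", "case"): 0.4, ("unexposed", "noncase"): 0.3,
    },
}

true_or = odds_ratio(POPULATION)
observed = {
    name: odds_ratio(observed_table(POPULATION, p)) for name, p in PATTERNS.items()
}
print(f"true OR: {true_or:.2f}")
for name, value in observed.items():
    print(f"{name}: observed OR = {value:.2f}")
```

With these assumed patterns, uniform participation and participation that depends on the outcome alone both leave the odds ratio at 1.0. Participation that depends jointly on exposure and outcome doubles it, with no change to the underlying population.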

**Sarah:** Next up is Lesson 9. Information Bias and Data Quality. We move from who's in the study to how the data themselves can mislead us.

**Kiffer:** Take care, everyone.

**Sarah:** See you there.