# Lesson 6 — Cohort Studies (v3 expanded)

*Companion-podcast transcript • Sarah & Kiffer* 
*~5,500 words • ~30 min audio*

---

**Sarah:** Welcome back to Office Hours. I'm Sarah.

**Kiffer:** And I'm Kiffer. Today we're working through Lesson 6, Cohort Studies. The third of the three big sampling designs we previewed in Lesson 4. Cross-sectional, then case-control in Lesson 5, and now arguably the design that comes closest to a randomized experiment without actually being one.

**Sarah:** And before we get into the structure, I want to set up why cohort studies matter. Because the case-control design we covered last lesson was clever. It said, find the cases that already exist, find a sensible comparison group, look back at exposure. The cohort design takes the opposite approach.

**Kiffer:** Right. The cohort study says, find people based on their exposure status, then follow them forward in time and watch what happens. You start with people who don't have the disease yet. You classify them by exposure. You wait. And you compare disease frequency between groups.

**Sarah:** Which has a kind of philosophical appeal, doesn't it? You're not asking people to remember. You're not relying on records that may have been kept inconsistently. You're observing the disease as it occurs, in real time, in people whose exposure you measured before they got sick.

**Kiffer:** Which is why cohort studies have a kind of pride of place in observational epidemiology. They map onto the logic of an experiment as closely as observation can. The investigator can't randomly assign exposure, but the temporal sequence, exposure measurement followed by outcome observation, gives the study a clean structure that the other designs can only approximate.

**Sarah:** And methodologically, cohorts can compute things case-control studies can't. Risk and incidence rate, directly. With no rare-disease assumption.

**Kiffer:** That's important. Recall from Lesson 5 that the odds ratio in a case-control study estimates the risk ratio only when the disease is rare. The cohort design doesn't have that constraint. You can directly compute the risk in each exposure group and just take the ratio. The interpretation is cleaner, and the result is a quantity a clinician or policymaker can act on.

**Sarah:** Okay. Map for today. Section 1, the basics of cohort design. Section 2, risk-based versus rate-based. Section 3, exposure measurement, which is harder than it looks. Section 4, comparability, follow-up, outcome ascertainment, analysis, and reporting.

**Kiffer:** Let's start with Section 1. The basic logic. The word cohort means a group of subjects sharing a defined characteristic. In epidemiologic study design, that characteristic is usually exposure status. You start with people who don't have the disease yet, classify them by exposure, follow them forward in time, and compare disease frequency between groups.

**Sarah:** And the phrase don't have the disease yet matters a lot. You're starting with disease-free people. The whole point is to watch the disease arise. If you started with people who were already sick, you'd be running a different study entirely.

**Kiffer:** Yeah, and that disease-free starting condition is what lets you cleanly attribute the timing. Exposure was measured first. Disease arose later. That sequence is the whole point.

**Sarah:** Quick definition for anyone new. Confounding is when some other variable is associated with both your exposure and your outcome, and produces some of the apparent association you observe. If smokers and non-smokers also differ on income, and income affects health, then part of the smoking-health association is really an income-health association in disguise.

**Kiffer:** Right. And in a randomized trial, randomization handles that for you on average. In a cohort study, you have to fight confounding by hand. That's the single biggest methodological theme of the entire course.

**Sarah:** Okay. The lesson walks through three structural choices for selecting the cohort. Walk us through them.

**Kiffer:** Choice one. Two-cohort design. You select a group of exposed individuals separately from a group of unexposed individuals. Maybe you sample workers in a particular industry separately from workers not in that industry. Or smokers separately from non-smokers. Or veterans of a particular war separately from a comparable civilian population. You then follow both groups forward.

**Sarah:** Choice two. Single longitudinal cohort. You enroll one group with a range of exposure levels, then internally compare across the spectrum. The Framingham Heart Study, which started in 1948, is the textbook example.

**Kiffer:** Yeah, the Framingham Heart Study deserves a longer introduction because it gets referenced through the rest of this material. Framingham, Massachusetts, is a town about 20 miles west of Boston. In 1948, the United States Public Health Service launched a longitudinal cohort there, recruiting roughly 5,200 adults aged 30 to 62. The ambition was to figure out what causes heart disease.

**Sarah:** And the strategy was to enroll an entire town's worth of residents, measure everything they could think of, and follow everyone for the rest of their lives.

**Kiffer:** Exactly. They measured blood pressure, cholesterol, smoking habits, diet, exercise, weight, and family history. Then they sat back and watched. That single cohort gave us most of what we now consider basic cardiovascular risk-factor knowledge. The phrase risk factor itself was popularized by the Framingham investigators. Hypertension, hypercholesterolemia, smoking, and diabetes as cardiovascular risk factors all came out of that one study.

**Sarah:** And it's still running. They enrolled the children of the original cohort in the 1970s, and the grandchildren in the early 2000s. Three generations of cardiovascular data from one town.

**Kiffer:** Now the Nurses' Health Study is a similar single longitudinal cohort, but with a clever recruitment strategy. It started in 1976, run out of Harvard's Channing Laboratory in Boston. The investigators wanted to study long-term effects of oral contraceptives, which were relatively new at the time.

**Sarah:** And they recruited registered nurses. About 121,000 of them. The reasoning was that nurses would understand health questionnaires, would answer accurately, and would be relatively easy to track over time because they kept up with their professional licensure. So you got high-quality exposure measurement and low loss to follow-up almost for free.

**Kiffer:** Choice three. Virtual cohort. Where you reconstruct a cohort retrospectively from existing records. An occupational study where you go back to employment records from 1970 and reconstruct who was exposed to what, then look up what's happened to them since. Or a study using insurance claims databases to reconstruct who took a particular medication and what happened next.

**Sarah:** Then prospective versus retrospective. We met this distinction in Lesson 4. Worth re-stating in the cohort context.

**Kiffer:** In a prospective cohort study, the disease hasn't happened yet when the study begins. You enroll people, measure baseline exposure, and follow them forward. Detailed information collection. Careful timing. But expensive and slow. Framingham has run for over 75 years. The Nurses' Health Study for 50.

**Sarah:** In a retrospective cohort study, the follow-up has already ended by the time the study begins. You go back to existing records. Reconstruct exposure status from what was measured at the time. Track outcomes that have already occurred. Faster, cheaper, but you're stuck with whatever data the original record-keepers happened to collect.

**Kiffer:** Then a really important conceptual move that runs through the whole lesson. Open versus closed source populations.

**Sarah:** Closed populations have fixed membership at the start of the study. Everyone is observed for the full risk period. Few or no losses to follow-up. Best for short risk periods, like outbreaks or hospital readmissions within a defined window.

**Kiffer:** Open populations are dynamic. People enter and leave throughout the study. Some are observed for the whole follow-up. Others enter halfway through. Others drop out before the study ends. You have variable time at risk for different individuals.

**Sarah:** And the choice between closed and open populations determines which version of the analysis you need to use. Closed populations call for a risk-based analysis, where you count people. Open populations call for a rate-based analysis, where you count person-time. Section 2 will go deep on the difference.

**Kiffer:** The lesson uses three opening examples that come back through the discussion. Each is worth introducing because they ground the abstractions.

**Sarah:** Example 8.1 in the textbook. Choi and colleagues, in 2011, did a hospital-based retrospective cohort study. The exposure was Discharge Against Medical Advice, abbreviated DAMA, versus discharge with medical advice, abbreviated DWMA.

**Kiffer:** And let me say what those phrases mean. Discharge Against Medical Advice is when a patient checks themselves out of the hospital before their treatment is complete, against the recommendation of the medical team. Discharge with medical advice is just the normal case. The doctor agrees you're ready to go home, you go home.

**Sarah:** The outcome was readmission within 14 days of discharge. Because all patients could be observed for the full 14-day window, this is a classic closed-population, risk-based design. They used conditional logistic regression.

**Kiffer:** And the result was striking. Twenty-six percent of DAMA patients were readmitted within 14 days versus only 3 percent of DWMA patients. So leaving against medical advice was strongly associated with bouncing back.

**Sarah:** Example 8.2. Crane and colleagues in 2011. A retrospective cohort of over 11,000 women who gave birth in two Canadian provinces between 2001 and 2009. Eleven percent reported exposure to environmental tobacco smoke. That's secondhand smoke. Smoke from other people's cigarettes inhaled by the pregnant woman.

**Kiffer:** Outcomes included infant body dimensions, Apgar scores, respiratory distress at birth, and stillbirth. So a mix of continuous and binary measures. And this illustrates a useful point. Cohort outcomes don't have to be binary. When the outcome is continuous, like birth weight, you use linear regression instead of logistic regression. Tobacco smoke was associated with lower birth weight, smaller body size, and increased stillbirths.

**Sarah:** Example 8.3. Mehta and colleagues in 2010. A population-based retrospective cohort using administrative data, looking at falls and fractures in adults aged 50 and older. Exposure was atypical versus typical antipsychotic medications.

**Kiffer:** Quick context on those drug classes. Typical antipsychotics are the older generation of medications used for schizophrenia and psychosis, drugs like haloperidol and chlorpromazine. Atypical antipsychotics are the newer generation, drugs like olanzapine and risperidone. The atypicals were marketed as safer, but it was always unclear by how much.

**Sarah:** Right. And they used propensity-score matching to deal with the comparability problem.

**Kiffer:** Quick definition. The propensity score is the probability that a person was exposed, given their measured characteristics. We met it in Lesson 4. By matching on propensity score, you're approximating the balance that randomization would have created. Two patients with the same propensity score can still differ on individual covariates, but across the matched groups the 60 covariates that went into the score should be balanced on average. And only the measured covariates. Unlike randomization, the score does nothing for what you didn't measure.
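
The matching step can be sketched as follows. This is a minimal illustration that assumes the propensity scores have already been estimated; the greedy nearest-neighbor rule, the `caliper` of 0.05, and all subject ids and scores are hypothetical and are not taken from the Mehta study, whose matching algorithm is not described here.

```python
def greedy_match(exposed, unexposed, caliper=0.05):
    """1:1 nearest-neighbor matching on a precomputed propensity score.

    exposed / unexposed: dicts mapping subject id -> propensity score.
    A pair is kept only if the two scores differ by at most `caliper`.
    """
    available = dict(unexposed)  # unexposed subjects not yet used in a pair
    pairs = []
    for exp_id, exp_score in sorted(exposed.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining unexposed subject on the score.
        best = min(available, key=lambda u: abs(available[u] - exp_score))
        if abs(available[best] - exp_score) <= caliper:
            pairs.append((exp_id, best))
            del available[best]  # match without replacement
    return pairs


# Hypothetical scores for illustration only.
exposed_scores = {"e1": 0.30, "e2": 0.62, "e3": 0.90}
unexposed_scores = {"u1": 0.28, "u2": 0.60, "u3": 0.33, "u4": 0.10}
matched = greedy_match(exposed_scores, unexposed_scores)
print(matched)
```

Subject `e3` stays unmatched because no unexposed subject has a score within the caliper, which is the usual price of matching: some exposed subjects are dropped to keep the groups comparable.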

**Sarah:** And they had over 5,500 in each exposure group after matching. They found that the hazard ratio difference between the drug classes was small. But taking any antipsychotic for over 90 days was associated with a hazard ratio of 1.8 for falls or fractures. Real signal.

**Kiffer:** Okay. Let's move to Section 2. Risk-based versus rate-based designs. This is the analytical fork in the road for cohort studies. And the punchline I want students to leave with is that the choice flows from the population structure, not from any abstract preference for one design over the other.

**Sarah:** Risk-based design first. The denominator is the number of subjects in each exposure category. You're counting people. The risk is dimensionless, a proportion between zero and one. The risk ratio is just the ratio of the two risks.
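
A minimal sketch of the risk-based calculation. The function names are illustrative, and the counts are hypothetical, chosen only so the two risks equal the 26 percent and 3 percent readmission figures quoted for the DAMA example; the study's actual group sizes are not given here.

```python
def risk(cases, subjects):
    """Cumulative incidence: new cases divided by people at risk at baseline."""
    if subjects <= 0:
        raise ValueError("subjects must be positive")
    return cases / subjects


def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk(cases_exposed, n_exposed) / risk(cases_unexposed, n_unexposed)


# Hypothetical counts: 26 of 100 exposed and 3 of 100 unexposed readmitted.
rr = risk_ratio(26, 100, 3, 100)
print(f"risk ratio = {rr:.2f}")
```

Because both denominators are counts of people, the result is a unitless ratio of two proportions.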

**Kiffer:** Rate-based design. The denominator is person-time. You count time, not people. Each subject contributes their at-risk time until they get the disease, are lost to follow-up, or the study ends.

**Sarah:** And the design question is, were people followed for the same length of time? If yes, risk works. If not, you need rate-based.

**Kiffer:** Person-time is a key concept. Quick definition. If one person is followed for one year, that's one person-year. If five people are followed for two years each, that's ten person-years. Person-time accumulates across all the at-risk follow-up in the study.

**Sarah:** Right. And rate has units. Cases per person-time. So you might say the rate of new lung cancer cases is 4.0 per 1,000 person-years. That number is informative across different populations and different follow-up lengths because the time component is built into the denominator.
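
The person-time arithmetic can be written out in a few lines. The follow-up times reproduce the five-people-for-two-years example; the 8 cases and 2,000 person-years are hypothetical numbers picked to give a rate of 4.0 per 1,000 person-years.

```python
def total_person_years(followup_years):
    """Sum each subject's at-risk follow-up time."""
    return sum(followup_years)


def incidence_rate(cases, person_time, per=1000):
    """New cases per `per` units of person-time."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return cases * per / person_time


five_people = [2.0, 2.0, 2.0, 2.0, 2.0]   # five people, two years each
py_total = total_person_years(five_people)

# Hypothetical: 8 new cases observed over 2,000 person-years.
rate = incidence_rate(8, 2000.0)
print(f"{py_total} person-years; rate = {rate} per 1,000 person-years")
```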

**Kiffer:** Exactly. And the analytic methods for rate-based designs depend on the time scale. If follow-up is relatively short and the rate is reasonably constant, Poisson regression is appropriate. Poisson is named after Simeon Denis Poisson, a 19th-century French mathematician. The model assumes the rate is constant within each exposure group across the follow-up. Person-time enters as the offset, which lets the model give you an incidence rate ratio directly.
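
For a single binary exposure, a Poisson model with log person-time as the offset gives the same incidence rate ratio as dividing the two crude rates, so the idea can be sketched without a regression library. The function name and the counts are hypothetical; the interval is the usual large-sample Wald interval on the log scale.

```python
import math


def rate_ratio_with_ci(cases_exp, pt_exp, cases_unexp, pt_unexp, z=1.96):
    """Crude incidence rate ratio with an approximate 95% Wald interval.

    pt_exp / pt_unexp are person-time denominators in the same units.
    Requires at least one case in each group.
    """
    irr = (cases_exp / pt_exp) / (cases_unexp / pt_unexp)
    # Standard error of log(IRR) when case counts are treated as Poisson.
    se_log = math.sqrt(1 / cases_exp + 1 / cases_unexp)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Hypothetical: 30 injuries in 1,000 player-games vs 40 in 2,000 player-games.
irr, lo, hi = rate_ratio_with_ci(30, 1000.0, 40, 2000.0)
print(f"IRR = {irr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With several covariates the crude ratio no longer suffices, and a fitted Poisson regression is what produces the adjusted rate ratios.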

**Sarah:** And if follow-up is long or the rate changes substantially, you use survival analysis. Cox proportional hazards regression.

**Kiffer:** Cox proportional hazards regression is named after Sir David Cox, a British statistician who introduced the method in a 1972 paper. Cox is more sophisticated. It allows the underlying rate to vary over time, and it directly handles censoring, where some people are lost to follow-up before the end of the study, or simply hadn't yet developed the disease when the study ended.

**Sarah:** And the key idea of Cox proportional hazards is that you don't have to specify exactly how the underlying rate of disease varies over time. You only have to assume that the ratio of rates between two groups stays roughly constant. That assumption is what proportional hazards means.
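
In symbols, the proportional hazards idea can be written as below. The notation is an assumption for illustration: $h_0(t)$ is the unspecified baseline hazard, $x$ is a binary exposure indicator, and $\beta$ is the log hazard ratio.

```latex
h(t \mid x) = h_0(t)\,\exp(\beta x),
\qquad
\mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)}
            = \frac{h_0(t)\,e^{\beta}}{h_0(t)}
            = e^{\beta}
```

The baseline hazard cancels in the ratio, which is why its shape never has to be specified and why the ratio is assumed not to depend on $t$.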

**Kiffer:** The lesson uses four examples to make the design choice concrete. Two risk-based, two rate-based.

**Sarah:** Example 8.4. Risk-based. Kelz and colleagues in 2009 compared morbidity and mortality following over 56,000 general and vascular surgical procedures between 2001 and 2004 at a United States academic medical center. Time of operation was grouped into seven 2-hour periods through the day. Risk of mortality within 30 days had a modest association with start times after 9:30 in the evening, with an odds ratio of 1.22.

**Kiffer:** But they did something important. When they excluded emergency cases, no associations were significant. The excess risk was largely explained by the nature of the clinical cases. Emergency surgery happens late at night because emergencies happen at night, not because nighttime surgery itself is dangerous. A great example of confounding by indication, which we'll get to in Lesson 12.

**Sarah:** Example 8.5. Risk-based. Leece and colleagues in 2010 followed approximately 250 women living with HIV who received care at the Ottawa Hospital General Campus Immunodeficiency Clinic between 2002 and 2005. The outcome was undergoing cervical cancer screening.

**Kiffer:** And quick context on why they cared. Women living with HIV are recommended to have cervical cancer screening at higher frequency than the general population, because HIV-related immune suppression is associated with higher cervical cancer risk. So the question wasn't whether HIV causes cervical cancer. It was whether women living with HIV were getting recommended screening.

**Sarah:** And the predictors were demographics, HIV indicators like CD4 count, and primary care provider status. The 12 women without a primary care provider were less likely to be screened than the 84 women with providers. The reported risk ratio was 1.6, which, given that direction, presumably describes the risk of going unscreened. Having a regular doctor matters for screening uptake. Short defined window, closed cohort, risk-based makes sense.

**Kiffer:** Example 8.6. Rate-based. Chalmers and colleagues in 2011 followed 704 male amateur rugby players aged 13 and over across a single season in New Zealand. And here the time unit isn't a year. It's a game. Total of 6,263 player-games of follow-up.

**Sarah:** And because injury rates were reasonably constant across the season, Poisson regression was appropriate. Notable findings. Pacific Island compared to Maori ethnicity, incidence rate ratio 1.5. Forty or more hours of strenuous activity weekly, 1.5. Playing while injured, 1.5. Foul play, 1.9. And headgear use, 1.2.

**Kiffer:** That last one is interesting. Headgear use was associated with higher injury rates. Which sounds backwards. One explanation is risk compensation, where players who wear headgear feel protected and play more aggressively. Another is selection, where the players who choose headgear are already the ones in riskier positions or with an injury history. Either way, it's a signal about behavior and about who wears headgear, not evidence that the headgear itself causes injuries.

**Sarah:** Example 8.7. Rate-based. Luo and colleagues in 2011 used the Women's Health Initiative Observational Study.

**Kiffer:** Worth introducing the Women's Health Initiative briefly. It was launched in 1991 by the United States National Institutes of Health to study causes of death, disability, and impaired quality of life in postmenopausal women. The Observational Study arm enrolled over 90,000 women aged 50 to 79 across 40 United States clinical centers between 1993 and 1998.

**Sarah:** And in the Luo paper, smoking exposure was characterized in detail. Status, age started, age quit, cigarettes per day, pack-years.

**Kiffer:** Pack-years is a compound measure that combines duration and intensity. One pack-year is smoking one pack a day for one year. Or two packs a day for half a year. Or half a pack a day for two years. It's a way of quantifying total exposure that captures both how long and how heavily someone smoked.
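
The pack-year arithmetic is a single multiplication, sketched here. The function names are illustrative, and the 20-cigarettes-per-pack conversion is an assumption, the conventional pack size rather than a figure from the Luo paper.

```python
CIGARETTES_PER_PACK = 20  # assumed conventional pack size


def pack_years(packs_per_day, years_smoked):
    """Cumulative smoking exposure: intensity times duration."""
    return packs_per_day * years_smoked


def pack_years_from_cigarettes(cigarettes_per_day, years_smoked):
    """Same measure when intensity is recorded in cigarettes per day."""
    return pack_years(cigarettes_per_day / CIGARETTES_PER_PACK, years_smoked)


# The three equivalent histories from the definition above.
histories = [(1.0, 1.0), (2.0, 0.5), (0.5, 2.0)]
totals = [pack_years(packs, years) for packs, years in histories]
print(totals)
```

All three histories collapse to the same number, which is both the strength and the weakness of a compound measure: it cannot distinguish short heavy smoking from long light smoking.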

**Sarah:** Over an average of 10.3 years of follow-up, 3,520 incident invasive breast cancers were identified. Because the follow-up was long, they used Cox proportional hazards regression rather than Poisson. The hazard ratio for former smokers was 1.09. For current smokers, 1.16. Among lifetime non-smokers, only those with the highest passive-smoke exposure had increased risk.

**Kiffer:** Okay. Section 3 is the part of the lesson that distinguishes a working cohort study from a textbook cohort study. The exposure.

**Sarah:** Sections 1 and 2 treated exposed as if it were a fixed property. In real cohorts, that's almost never true. People start smoking. People quit. People change jobs. Diets evolve. Even genetic predispositions get expressed differently across the lifecourse. The technical challenge is how to handle all of that across years or decades of follow-up.

**Kiffer:** First. Four scales of exposure measurement. Dichotomous, ordinal, continuous, and compound.

**Sarah:** Dichotomous is binary. Smoker versus non-smoker. Yes or no.

**Kiffer:** Ordinal has categories with a natural order but no guarantee of equal spacing between them. Low, moderate, high physical activity. Or never, former, current smoker.

**Sarah:** Continuous is on a numerical scale. Cigarettes per day. Hours of sleep. Body mass index. Grams of alcohol per day.

**Kiffer:** Compound combines duration and intensity into a single number. Pack-years for smoking. Or metabolic equivalent of task hours per week, abbreviated MET-hours per week, where one metabolic equivalent of task is the energy you spend just sitting at rest. So three metabolic equivalents of task for one hour gives you three MET-hours. Add them up over the week and you get a total physical activity measure.
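
A short sketch of the MET-hours calculation. The activity list and its MET values are hypothetical examples, not values from any study discussed here.

```python
def met_hours_per_week(activities):
    """Sum MET value times weekly hours over all activities.

    activities: iterable of (met_value, hours_per_week) pairs.
    """
    return sum(met * hours for met, hours in activities)


# Hypothetical week: one hour at 3 METs plus 2.5 hours at 6 METs.
week = [(3.0, 1.0), (6.0, 2.5)]
weekly_total = met_hours_per_week(week)
print(f"{weekly_total} MET-hours per week")
```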

**Sarah:** And the rule of thumb is, more granular measurement makes the design more powerful for detecting dose-response relationships. But only if the underlying biology actually has a graded effect. Don't categorize a continuous exposure unless you have a good reason. You usually lose information.

**Kiffer:** Then permanent versus non-permanent exposures. A permanent exposure doesn't change. Genetic variants. Prenatal exposures. Race. Sex assigned at birth. Once measured, fixed.

**Sarah:** Right. Non-permanent exposures change over time. Smoking. Diet. Occupation. Medication use. And the analytic challenge is that someone's exposure category at the end of follow-up may not match their category at the start.

**Kiffer:** Then the induction period. The time between exposure and the earliest possible disease onset. If you don't account for it, you'll attribute disease to exposure that biologically couldn't have caused it.

**Sarah:** Quick example. If we're studying asbestos and mesothelioma, mesothelioma typically takes 20 to 40 years from exposure to develop. So if someone's first asbestos exposure was 5 years ago and they're diagnosed with mesothelioma now, that exposure almost certainly didn't cause that mesothelioma. The induction period is way longer than 5 years.

**Kiffer:** And the practical handling is, until the induction period is over, the at-risk time of exposed individuals gets added to the unexposed group. Or some researchers prefer to discard the induction period entirely. Cleaner, if you have enough remaining person-time.

**Sarah:** And what about people whose exposure status changes during follow-up? Smokers who quit. People who start a new medication. How do you handle that?

**Kiffer:** Their person-time gets split. While they're unexposed, the person-time accumulates in the unexposed group. Once they become exposed, it accumulates in the exposed group. And if they develop the disease, they're assigned to whichever exposure category they were in at the moment of disease occurrence.
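
The splitting rule can be sketched as a small tally. This is an illustration under simplifying assumptions: each subject switches at most once, from unexposed to exposed, and the optional `induction` argument applies the convention of counting the induction period as unexposed time. The `Subject` fields and the three-person cohort are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subject:
    followup_end: float              # years from baseline to event, loss, or study end
    exposure_start: Optional[float]  # years from baseline when exposure began; None if never
    had_event: bool                  # True if follow-up ended with the disease


def tally_person_time(subjects, induction=0.0):
    """Split each subject's follow-up into unexposed and exposed person-years."""
    totals = {
        "unexposed": {"person_years": 0.0, "events": 0},
        "exposed": {"person_years": 0.0, "events": 0},
    }
    for s in subjects:
        if s.exposure_start is None:
            switch = s.followup_end  # never exposed: all time is unexposed
        else:
            # Time counts as exposed only after exposure start plus induction.
            switch = min(s.followup_end, s.exposure_start + induction)
        totals["unexposed"]["person_years"] += switch
        totals["exposed"]["person_years"] += s.followup_end - switch
        if s.had_event:
            # The event goes to the category occupied when it occurred.
            group = "exposed" if s.followup_end > switch else "unexposed"
            totals[group]["events"] += 1
    return totals


cohort = [
    Subject(followup_end=10.0, exposure_start=None, had_event=False),
    Subject(followup_end=8.0, exposure_start=3.0, had_event=True),
    Subject(followup_end=4.0, exposure_start=0.0, had_event=False),
]
no_lag = tally_person_time(cohort)
two_year_lag = tally_person_time(cohort, induction=2.0)
print(no_lag)
print(two_year_lag)
```

Handling people who stop being exposed, such as smokers who quit, would need a list of exposure intervals per subject rather than a single start time.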

**Sarah:** Subjects lost to follow-up accumulate person-time until the last date their status is known. If you don't know the precise time of loss, the convention is to use the midpoint of the last follow-up interval, the one between the last contact where their status was known and the first contact they missed.

**Kiffer:** And one nice twist. Disease itself can serve as exposure for downstream outcomes. Lazo and colleagues followed over 11,000 adults for 18 years using non-alcoholic fatty liver disease as the exposure for mortality. They found a null result. Fatty liver disease wasn't associated with mortality after adjustment.

**Sarah:** The lesson uses three more examples for exposure handling. Worth walking through each because each one shows a different aspect of how careful exposure measurement plays out.

**Kiffer:** Example 8.8. Warensjo and colleagues studied calcium intake and fracture in the Swedish Mammography Cohort. The cohort started in 1987 in two counties in central Sweden, recruiting about 60,000 women coming in for routine mammography screening. Sweden's national personal identification system means investigators can link cohort members to national cancer, hospitalization, and death registries. Follow-up is essentially complete.

**Sarah:** And in the Warensjo paper, total calcium intake was assessed from diet, calcium supplements, and multivitamins. The exposure validation was excellent. Total calcium intake correlated 0.77 between food-frequency questionnaire and 14 repeated 24-hour recalls. That's a strong correlation, which means the questionnaire was capturing real intake.

**Kiffer:** Cumulative dietary calcium intake, in quintiles, was related to fracture and osteoporosis using Cox proportional hazards. The finding was that chronic low intake was associated with higher fracture risk, but above the base level the associations were modest. In the highest-intake group, the hip fracture rate was actually somewhat increased. Not simply more is better. There seemed to be an optimal range.

**Sarah:** Example 8.9. Rohan and colleagues' Canadian Diet, Lifestyle, and Health Study. They recruited alumni from three Ontario universities between 1995 and 1998. Over 73,000 participants. They collected lifestyle and food-frequency questionnaires, recorded waist and hip circumferences, and obtained hair and toenail specimens for trace element and DNA analysis.

**Kiffer:** And the discussion of compound nutritional variables in that paper is a model for how to construct dietary exposure variables. They had to combine, say, calcium from dairy versus supplements versus leafy greens into a single intake variable that captured the biologically relevant dose. Those design decisions are usually invisible in the published abstract but they're where the real intellectual work happens.

**Sarah:** And Example 8.10. Schutze and colleagues' multicountry European cohort study of alcohol and cancer.

**Kiffer:** This drew on the European Prospective Investigation into Cancer and Nutrition, abbreviated EPIC. It recruited 520,000 participants from 10 European countries between 1992 and 2000. Alcohol consumption was measured at recruitment in grams per day, classified as never, former, or lifetime consumer. Cancer incidence came from cancer registries, with follow-up ending between 2002 and 2005.

**Sarah:** And beyond computing relative risks, they computed something called a population attributable fraction. Let me define that because it shows up in policy contexts.

**Kiffer:** The population attributable fraction is the proportion of disease in a population that would be eliminated if the exposure were eliminated, assuming the association is causal. So if causality is assumed, the Schutze paper estimated alcohol consumption above recommended levels accounts for 10 percent of all cancers in men and 3 percent in women. The kind of number that lands in a policy report.
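
One common way to compute a population attributable fraction is Levin's formula, sketched below. This is an assumption about method, not necessarily how the Schutze paper did it, and the formula is only appropriate when the relative risk is not confounded. The exposure prevalence and relative risk in the example are hypothetical.

```python
def population_attributable_fraction(exposure_prevalence, relative_risk):
    """Levin's formula: p(RR - 1) / (1 + p(RR - 1)).

    exposure_prevalence: proportion of the population exposed (0 to 1).
    relative_risk: risk or rate ratio for exposed versus unexposed.
    """
    if not 0.0 <= exposure_prevalence <= 1.0:
        raise ValueError("exposure_prevalence must be between 0 and 1")
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical: 20% of the population exposed, relative risk of 2.
paf = population_attributable_fraction(0.20, 2.0)
print(f"PAF = {paf:.1%}")
```

The fraction depends on how common the exposure is as well as on how strong the association is, so a weak risk factor that is widespread can account for more disease than a strong one that is rare.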

**Sarah:** Okay. Section 4. Practical questions after the design is set. Comparability, follow-up, outcome ascertainment, analysis, and reporting.

**Kiffer:** Comparability between exposed and unexposed groups uses the same toolkit as the other observational designs. Restriction, where you exclude subjects with the confounder before sampling. Matching, where you pair exposed and unexposed on the confounder. And statistical control via multivariable models. Plus propensity scores when there are many covariates, as in the Mehta antipsychotics study.

**Sarah:** And the deep theoretical point is that you're trying to achieve what's called exchangeability between the groups. The same word we used in Lesson 4. Conditional on the variables you've measured and adjusted for, exposed and unexposed subjects are interchangeable.

**Kiffer:** Miguel Hernan, the Spanish-American epidemiologist at Harvard we mentioned in Lesson 4, has been clear about this. He argues that exchangeability is the foundation of causal inference in observational studies. And critically, exchangeability cannot be empirically tested. You can never confirm you've achieved it. You can only argue for it on the basis of subject-matter knowledge.

**Sarah:** Which is why the unified design discipline from Lesson 4 matters so much. Every potential confounder you don't measure is one you can't adjust for, and there's no statistical fix after the fact.

**Kiffer:** Follow-up is the cohort-specific risk. The longer the follow-up, the more people you lose. And losses are rarely random. People who drop out tend to differ from people who stay. Often sicker. Sometimes due to exactly the exposure you're studying. Differential loss to follow-up biases the results.

**Sarah:** Strategies. Maintain contact with participants. Use multiple contact methods. Send reminders. Build relationships with the cohort. Long-running cohorts like Framingham have entire teams dedicated to keeping participants engaged across decades. The Nurses' Health Study uses biennial questionnaires with response rates above 90 percent, sustained for 50 years.

**Kiffer:** And there's a rough rule of thumb. If loss to follow-up exceeds 10 percent in a risk-based design, you start to worry that the cohort you ended with is no longer the cohort you started with.

**Sarah:** Outcome ascertainment has to be the same across exposure groups. If you measure the outcome more carefully or more frequently in the exposed group, you'll find more disease in the exposed group regardless of whether the exposure causes it.

**Kiffer:** This is surveillance bias, which we covered briefly in Lesson 3 and which comes back in Lesson 10. The standard mitigation is blinded outcome assessment, where the people determining outcomes don't know which exposure group each subject is in. Plus standardized outcome definitions, applied identically across groups.

**Sarah:** And the lesson is firm on a particular point. Always measure incidence, not prevalence.

**Kiffer:** Right. Including only new disease events avoids reverse-causation, duration-of-disease, and survival biases. Strictly, measuring incidence requires two examinations. One at the start of follow-up to confirm subjects don't yet have the disease. One later to determine whether and when the disease developed.

**Sarah:** Then on the analysis side, the lesson talks about a particular concern with hazard ratios. Hernan in 2010 wrote about what he called the hazards of hazard ratios.

**Kiffer:** Yeah, the title is a pun. Two drawbacks of hazard ratios. First, the average hazard ratio you report depends on the duration of follow-up. A study with three years of follow-up and a study with twelve years of follow-up of the same population can produce different average hazard ratios for purely mechanical reasons.

**Sarah:** Second, period-specific hazard ratios have a built-in bias. The hazard ratio at time t is conditional on subjects not having developed the outcome before time t. As follow-up lengthens, the most susceptible exposed people develop the disease and drop out of the at-risk pool. So the apparent risk in the exposed group decreases relative to the unexposed group, even when the underlying causal effect hasn't changed.
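
The depletion effect can be reproduced with a small deterministic calculation. Everything here is a hypothetical setup: half the population is highly susceptible, half is not, each subgroup has a constant hazard, and exposure doubles the hazard for every individual at every time point.

```python
import math


def marginal_hazard(t, strata):
    """Population hazard at time t among those still disease-free.

    strata: list of (baseline_share, constant_hazard) pairs.
    """
    at_risk = [share * math.exp(-hazard * t) for share, hazard in strata]
    event_rate = [r * hazard for r, (_, hazard) in zip(at_risk, strata)]
    return sum(event_rate) / sum(at_risk)


def period_hazard_ratio(t, unexposed_strata, effect=2.0):
    """Exposed-to-unexposed hazard ratio at time t, with a constant individual effect."""
    exposed_strata = [(share, hazard * effect) for share, hazard in unexposed_strata]
    return marginal_hazard(t, exposed_strata) / marginal_hazard(t, unexposed_strata)


# Hypothetical mix: 50% susceptible (hazard 0.5 per year), 50% not (0.05 per year).
mix = [(0.5, 0.5), (0.5, 0.05)]
hr_at_start = period_hazard_ratio(0.0, mix)
hr_at_year_5 = period_hazard_ratio(5.0, mix)
print(f"HR at baseline: {hr_at_start:.2f}, HR at year 5: {hr_at_year_5:.2f}")
```

The hazard ratio starts at 2 and is well below 2 by year five, even though the individual-level effect never changed. The only thing that changed is who remains in each at-risk pool.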

**Kiffer:** And the practical implication is that hazard ratios are useful, but not as straightforward as they look. Hernan and colleagues developed alternatives, including dividing follow-up into shorter intervals and treating each as its own mini-trial. We'll return to it later in this series.

**Sarah:** And reporting. STROBE again, with the cohort-specific guidance.

**Kiffer:** STROBE stands for Strengthening the Reporting of Observational Studies in Epidemiology. The von Elm 2007 statement laid out 22 items considered essential. Tooth and colleagues, back in 2005, had already proposed a checklist specific to cohort studies. The cohort-specific guidance addresses the design-specific elements. How exposure was assessed. How participants were followed. How follow-up rates varied across groups. How losses were handled.

**Sarah:** Following the checklist isn't a formality. It's how the next reader can judge whether your design did what it claimed to do.

**Kiffer:** And the lesson notes a positive trend. More researchers are publishing the design and protocols of cohort studies before results come out. Pre-publication of design lets methodological strengths and weaknesses be assessed without the bias of already knowing the data. Same logic as the pre-analysis plans from Lesson 3.

**Sarah:** Okay. Let me try to pull the takeaways together. There are several that I think a student should leave with.

**Kiffer:** Yeah. Let's run through them.

**Sarah:** First. Cohort studies sample on exposure. Follow forward. Compare disease frequency across exposure groups. They resemble controlled trials minus randomization. Three architectural choices. Two-cohort, where you sample exposed and unexposed separately. Single longitudinal cohort, like Framingham or the Nurses' Health Study, where you enroll a population with a range of exposures and compare across the spectrum. Or virtual cohort, reconstructed from existing records.

**Kiffer:** Second. Open populations need rate-based, person-time designs. Closed populations work with risk-based, person-count designs. The choice flows from population structure. The Choi DAMA, Kelz time-of-surgery, and Leece HIV cervical-screening studies are all closed populations with short fixed risk periods, so risk-based makes sense. The Chalmers rugby and Luo Women's Health Initiative breast cancer studies are open populations with variable follow-up, so rate-based makes sense.

**Sarah:** Third. Within rate-based designs, choose Poisson regression when rates are constant over follow-up. Choose Cox proportional hazards when they're not, because Cox leaves the baseline hazard free to vary over time. The Chalmers rugby study used Poisson because rates were stable across the season. The Luo breast cancer study used Cox because follow-up exceeded 10 years, too long to assume a constant rate.
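
One informal way to judge whether rates look constant over follow-up is to split person-time into intervals and compare the interval-specific rates. The Python sketch below does that for a handful of made-up records. It is only an illustration: the data, the cut points, and the helper name `rates_by_interval` are hypothetical, and nothing here comes from the Chalmers or Luo studies.

```python
def rates_by_interval(records, cuts):
    """Events and person-time per follow-up interval.

    records: iterable of (years_followed, had_event) pairs.
    cuts: ascending interval boundaries, starting at 0.
    Returns a list of (start, end, events, person_time) tuples.
    """
    results = []
    for start, end in zip(cuts[:-1], cuts[1:]):
        person_time = 0.0
        events = 0
        for years, had_event in records:
            if years <= start:
                continue  # no longer at risk when this interval opens
            person_time += min(years, end) - start
            if had_event and years <= end:
                events += 1  # the event falls inside this interval
        results.append((start, end, events, person_time))
    return results


# Hypothetical records: (years followed, had the event).
RECORDS = [
    (0.5, True), (1.0, True), (1.5, True),
    (4.0, False), (4.0, False), (4.0, False), (4.0, False),
    (3.0, True),
]

if __name__ == "__main__":
    for start, end, events, person_time in rates_by_interval(RECORDS, [0, 2, 4]):
        rate = events / person_time
        print(f"years {start}-{end}: {events} events / {person_time} person-years = {rate:.3f}")
```

In this toy data the rate in the first two years is roughly double the rate in years two to four, which is the kind of pattern that would argue against a single constant rate.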

**Kiffer:** Fourth. Exposure measurement is harder than it looks. Permanent versus non-permanent. Scale of measurement, dichotomous through compound. Induction period. Changing status during follow-up. Each is a place where a careful study can quietly go wrong.

**Sarah:** Fifth. Loss to follow-up under 10 percent is the rule of thumb for risk-based designs. Beyond that, the cohort you ended with is no longer the cohort you started with. Differential loss, where who drops out depends on exposure and outcome, biases results in a direction that's hard to predict. And most cohorts will lose a meaningful fraction of participants over time.
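
A small arithmetic sketch in Python shows why differential loss is the worrying kind. All numbers are assumptions chosen for illustration: two arms of 1,000 people, true risks of 20 percent and 10 percent (a true risk ratio of 2), and made-up retention fractions. The function name `observed_risk_ratio` is hypothetical.

```python
def observed_risk_ratio(retention, n_per_arm=1000,
                        risk_exposed=0.20, risk_unexposed=0.10):
    """Risk ratio computed only among people who complete follow-up.

    retention maps (is_exposed, is_case) to the fraction retained.
    All default values are hypothetical.
    """
    def completer_risk(is_exposed, true_risk):
        cases = n_per_arm * true_risk * retention[(is_exposed, True)]
        noncases = n_per_arm * (1 - true_risk) * retention[(is_exposed, False)]
        return cases / (cases + noncases)

    return completer_risk(True, risk_exposed) / completer_risk(False, risk_unexposed)


# Everyone has the same 90 percent chance of completing follow-up.
UNIFORM = {(True, True): 0.9, (True, False): 0.9,
           (False, True): 0.9, (False, False): 0.9}
# Exposed people who develop the disease are more likely to drop out.
EXPOSED_CASES_LOST = {**UNIFORM, (True, True): 0.6}
# Unexposed people who develop the disease are more likely to drop out.
UNEXPOSED_CASES_LOST = {**UNIFORM, (False, True): 0.6}

if __name__ == "__main__":
    for label, scenario in [("uniform loss", UNIFORM),
                            ("exposed cases lost", EXPOSED_CASES_LOST),
                            ("unexposed cases lost", UNEXPOSED_CASES_LOST)]:
        print(f"{label}: observed risk ratio = {observed_risk_ratio(scenario):.2f}")
```

With uniform loss the observed risk ratio stays at 2. When exposed cases drop out more often it falls to about 1.43, and when unexposed cases drop out more often it rises to 2.9, so the same amount of extra loss can push the estimate in either direction.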

**Kiffer:** Sixth. Comparability tools are restriction, matching, and statistical control. The Mehta antipsychotic study used propensity-score matching with over 60 covariates. The Crane environmental tobacco smoke study used multivariable regression. The Choi DAMA study matched on demographic and clinical characteristics. Different strategies, same goal. Approximate the exchangeability that randomization would have produced for free.

**Sarah:** Seventh. Always measure incidence, not prevalence. That requires examination at the start of follow-up and a later examination to detect new cases. Otherwise, reverse-causation, duration-of-disease, and survival biases creep in.

**Kiffer:** Eighth. Hazard ratios are useful but not transparent. Hernan's hazards-of-hazard-ratios paper makes clear that the average hazard ratio depends on follow-up length, and period-specific hazard ratios are conditional on prior survival. Read with care.

**Sarah:** And ninth. STROBE for reporting, with the cohort extension. Use it when you write. Use it when you read.

**Kiffer:** And the practical recommendation. Don't skip the R box on computing risk, incidence rate, the risk ratio, and the incidence rate ratio from a small simulated cohort. It makes concrete what the design actually buys you that case-control studies can't deliver.
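
The lesson's box is in R. As a rough Python analogue, and not a reproduction of it, the sketch below computes the same four quantities from a twelve-person cohort. The cohort is entirely made up: a closed population, five years of follow-up, and no losses, so both risk and rate are well defined. The names `Person`, `COHORT`, `risk`, and `incidence_rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Person:
    exposed: bool
    years_followed: float  # time at risk until the event or end of follow-up
    had_event: bool


# Hypothetical closed cohort: 5-year follow-up, nobody lost.
COHORT = [
    Person(True, 1.0, True), Person(True, 2.0, True), Person(True, 3.0, True),
    Person(True, 5.0, False), Person(True, 5.0, False), Person(True, 5.0, False),
    Person(False, 4.0, True),
    Person(False, 5.0, False), Person(False, 5.0, False), Person(False, 5.0, False),
    Person(False, 5.0, False), Person(False, 5.0, False),
]


def risk(people):
    """Cumulative incidence: new cases divided by people at risk at baseline."""
    return sum(p.had_event for p in people) / len(people)


def incidence_rate(people):
    """New cases divided by total person-time at risk."""
    events = sum(p.had_event for p in people)
    person_time = sum(p.years_followed for p in people)
    return events / person_time


exposed = [p for p in COHORT if p.exposed]
unexposed = [p for p in COHORT if not p.exposed]

risk_ratio = risk(exposed) / risk(unexposed)
rate_ratio = incidence_rate(exposed) / incidence_rate(unexposed)

if __name__ == "__main__":
    print(f"risk: exposed {risk(exposed):.3f}, unexposed {risk(unexposed):.3f}")
    print(f"rate: exposed {incidence_rate(exposed):.3f}, "
          f"unexposed {incidence_rate(unexposed):.3f} per person-year")
    print(f"risk ratio {risk_ratio:.2f}, incidence rate ratio {rate_ratio:.2f}")
```

In this toy cohort the risk ratio is 3 and the incidence rate ratio is a little over 4. They differ because exposed cases stop contributing person-time early, which shrinks the exposed denominator for the rate but not for the risk.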

**Sarah:** And one more thing worth flagging. The connection back to the framing lessons.

**Kiffer:** Yeah. Lesson 1's reminder that progress in public health depends on networks of actors and institutions. The Framingham Heart Study, the Nurses' Health Study, the Women's Health Initiative, the European Prospective Investigation into Cancer and Nutrition, the Swedish Mammography Cohort, those aren't single research projects. They're decades-long institutional commitments. They're public-health infrastructure.

**Sarah:** Lesson 2's argument that paradigms shape what we can see. The cohort design is the most positivist of the observational designs. It assumes there's an underlying biological process to discover, that exposure precedes disease, that we can measure both well enough to compute meaningful ratios. Those assumptions are usually defensible. But they're not free.

**Kiffer:** And Lesson 3's catalogue of how the literature can mislead. Cohort studies offer particular temptations. Long follow-up, many measured outcomes, many measured exposures. The garden of forking paths is enormous. Pre-specified analysis plans matter even more in cohort settings than in trials, because the opportunities for selective reporting are vastly larger.

**Sarah:** Each lesson is building on the one before. Cohort studies are where the design discipline really earns its keep.

**Kiffer:** Next up is Lesson 7. Ecological and Group-Level Studies. Where the unit of analysis isn't the individual at all. Different design, different inferences, different traps.

**Sarah:** Take care, everyone.

**Kiffer:** See you there.