# Lesson 4 — Introduction to Observational Studies (v3 expanded)

*Companion-podcast transcript • Sarah & Kiffer* 
*~5523 words • ~29.9 min audio*

---

**Sarah:** Welcome back to Office Hours, the companion podcast. I'm Sarah.

**Kiffer:** And I'm Kiffer. Today we're working through Lesson 4, Introduction to Observational Studies. And this lesson is genuinely a pivot point in this series. Probably the most important hinge in the whole course.

**Sarah:** Okay, that's a strong claim. What do you mean by pivot point?

**Kiffer:** The first three lessons of this course were essentially about framing. Lesson 1 was the history of how the discipline came to exist. Lesson 2 was about ways of knowing, the philosophical foundations of what counts as evidence. Lesson 3 was about how the published record can mislead us, even when nobody is lying. All of that is conceptual scaffolding.

**Sarah:** And Lesson 4 turns that scaffolding into actual research practice.

**Kiffer:** Right. From now on we're talking about how to actually do epidemiology. How to design a study. How to collect data. How to analyze. The framing concerns from Lessons 1 through 3 don't go away. They become the design checklist you use here. Every bias the manufactured doubt strategy from Lesson 3 weaponized, every analytical error catalogued in the replication crisis, every pre-analytic discipline that registered reports were designed to enforce, all of those play out concretely once you have a study design in front of you.

**Sarah:** So this lesson is about the design itself. The architecture beneath everything else.

**Kiffer:** Exactly. And it's structured carefully. Section one sets up a classification scheme that organizes every study you'll encounter for the rest of this material. Section two covers the simplest of those study types, the descriptive ones, where there's no comparison group at all. Section three introduces the first true analytic observational design, the cross-sectional study. And section four pushes on its limitations and shows what reporting it well looks like.

**Sarah:** And by the end of the lesson, the language you need for case-control studies in Lesson 5 and cohort studies in Lesson 6 is already in place. So this lesson does a lot of foundational work.

**Kiffer:** It really does. So let's start at the top, with section one and the classification map. The lesson opens with a really fundamental distinction. Descriptive versus explanatory studies. Sometimes called descriptive versus analytic studies. Same thing.

**Sarah:** Let me walk through that carefully, because it sounds simple but it's doing a lot of work.

**Kiffer:** Please.

**Sarah:** Descriptive studies just describe. They tell you the who, what, when, and where of disease occurrence in a population. They include case reports, case-series reports, and surveys. They don't make formal comparisons between groups. They don't try to test causal hypotheses. They tell you what's happening, not why.

**Kiffer:** And the lesson uses a really clean framing. A descriptive survey is not designed to assess hypotheses about manipulatable causes of the outcome. The frequency of the outcome is described in categories of age, race, sex, season, and space. That's what it does.

**Sarah:** And explanatory studies, the analytic ones, do try to make those comparisons. They contrast subgroups based on exposure or outcome status. They identify statistical associations between exposures and outcomes.

**Kiffer:** And that's the core conceptual move of analytic epidemiology. You take a population, you split it into groups based on some characteristic of interest, an exposure. And then you compare disease frequency across those groups. The differences between groups are what you're trying to explain.

**Sarah:** Okay, second split. Within explanatory studies, experimental versus observational. And this distinction is about how the exposure gets assigned, right?

**Kiffer:** Right. In an experimental study, the investigator controls who is exposed and who isn't. Usually through randomization. So in a randomized controlled trial of a new drug, the researcher randomly assigns some patients to receive the drug and others to receive a placebo or standard treatment.

**Sarah:** And in an observational study, the investigator does not control assignment. People become exposed through their own behavior, their occupation, their genetics, their environment, or some other process. The researcher just observes and measures.

**Kiffer:** And here's the trade-off the lesson really wants you to sit with. Experimental studies reduce variation from all sources through selection and control of the experimental setting. They strip away the noise. Observational studies embrace the presence of natural variation in order to identify important interactions among key variables.

**Sarah:** Let me try to articulate why this matters. In an experiment, when you randomize, what randomization is doing is making sure that on average everything else, every other characteristic of the people, balances across groups. Income. Age. Education. Co-existing conditions. Genetics. All of it. So any difference you observe in outcomes, beyond what chance alone would produce, can be attributed to the exposure, because everything else is balanced by design.

**Kiffer:** Exactly. And that's the inferential power of randomization. It handles confounding for you.
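
To see that balance in numbers, here is a minimal Python sketch. It is not taken from the lesson: the population size, the age range, and the seed are all invented for illustration, and age stands in for any characteristic the investigator does not control.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: each person has an age the investigator does not control.
ages = [random.uniform(20, 80) for _ in range(10_000)]

# Randomize: shuffle the people, then put the first half on treatment, the rest on control.
order = list(range(len(ages)))
random.shuffle(order)
treated = [ages[i] for i in order[:5_000]]
control = [ages[i] for i in order[5_000:]]

mean_treated = sum(treated) / len(treated)
mean_control = sum(control) / len(control)
age_gap = abs(mean_treated - mean_control)

print(f"mean age, treated: {mean_treated:.1f}")
print(f"mean age, control: {mean_control:.1f}")
print(f"gap: {age_gap:.2f} years")
```

With groups this large the gap in mean age should be a small fraction of a year. The same logic applies to characteristics nobody measured, which is what adjustment in an observational study cannot promise.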

**Sarah:** Just to define that term carefully because we'll come back to it. Confounding is when some other variable is correlated with both your exposure and your outcome, and is producing some of the apparent association you observe. So if smokers and non-smokers differ on income, and income also affects health, the smoking-health association you see is partly really an income-health association masquerading as smoking-health.
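
A small worked example of that masquerade, as a Python sketch. Every count below is invented for illustration. Within each income stratum smokers have 1.5 times the risk of ill health, but smokers are concentrated in the low-income stratum, where risk is higher for everyone.

```python
# Invented counts: (ill, total) for each smoking group within each income stratum.
strata = {
    "low income":  {"smokers": (240, 800), "non_smokers": (40, 200)},
    "high income": {"smokers": (15, 200),  "non_smokers": (40, 800)},
}

def risk(ill, total):
    """Proportion of a group that is ill."""
    return ill / total

def risk_ratio(groups):
    """Risk in smokers divided by risk in non-smokers."""
    return risk(*groups["smokers"]) / risk(*groups["non_smokers"])

# Crude comparison: pool the strata, ignoring income altogether.
crude = {
    g: (sum(s[g][0] for s in strata.values()),
        sum(s[g][1] for s in strata.values()))
    for g in ("smokers", "non_smokers")
}

crude_rr = risk_ratio(crude)
stratum_rr = {name: risk_ratio(groups) for name, groups in strata.items()}

print(f"crude risk ratio: {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"{name}: risk ratio {rr:.2f}")
```

The crude risk ratio comes out a little above 3, while the ratio inside each income stratum is 1.5. The difference between the two is the part of the apparent smoking effect that is really an income effect.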

**Kiffer:** And that's the central methodological challenge in observational epidemiology. Smokers and non-smokers differ in many ways beyond smoking. Income, occupation, stress, education, social network, geography. To say that smoking causes lung cancer specifically, you have to figure out how to disentangle smoking from all those other differences.

**Sarah:** Which is why most of the rest of this material is about confounding and bias. Lessons 8 through 12 are essentially a toolkit for fighting these threats to validity in observational studies.

**Kiffer:** And here's a question that comes up a lot. If experiments are so much cleaner, why don't we just always do experiments?

**Sarah:** Yeah, I want to address that head-on. Most public health questions can't be studied experimentally. You can't ethically randomize people to be smokers or non-smokers. You can't randomize them to live in poverty or wealth. You can't randomize them to be exposed to environmental pollution, or to live in a particular neighborhood, or to experience racism. For all of those exposures, observational studies are the only option.

**Kiffer:** Plus, even when experiments are possible, they often answer narrower questions than we want. Trials usually have selected participants, controlled settings, short follow-up. Observational studies tell us what happens in the real world over the long term. Both kinds of evidence are necessary. The trick is knowing which is which when you're reading a paper.

**Sarah:** Now given that the fight against confounding has to be won by hand in observational studies, the lesson asks the natural follow-up. How do you design an observational study so the fight is actually winnable?

**Kiffer:** And the lesson introduces what's called the unified approach to observational study design. The argument is that observational studies should be designed with the same discipline as experiments, just without the randomization piece.

**Sarah:** Two methodologists are central here. Miguel Hernan and Donald Rubin.

**Kiffer:** Miguel Hernan is a Spanish-American epidemiologist at the Harvard T. H. Chan School of Public Health. He's been one of the most influential voices in modern causal inference, particularly through his book with James Robins called Causal Inference: What If.

**Sarah:** And Donald Rubin is an American statistician at Harvard who's spent his career thinking about how observational studies can approximate the inferential strength of experiments. The Rubin causal model, which we'll meet again later in this series, is named after him.

**Kiffer:** And the slogan that captures the unified approach is, design trumps analysis.

**Sarah:** Walk us through what that phrase actually means in practice.

**Kiffer:** It means the most important decisions in an observational study are the design decisions you make before you collect any data. Once the data are in hand, your statistical analysis can correct some problems, but not all of them. Bad data will produce bad conclusions no matter how sophisticated the analysis.

**Sarah:** And the unified approach has three parts. The thought experiment. Then completing all design features before seeing the outcome data. Then forward projection. Let's walk through each.

**Kiffer:** Step one. The thought experiment. Before you do anything, imagine the perfect randomized controlled trial that would best answer your question. Specify it in detail. The study group, who would be eligible, how subjects would be selected, how they'd be assigned to exposure, how follow-up would work, how outcomes would be detected.

**Sarah:** And the important part is, formal randomization in that imagined trial would ensure what's called exchangeability. Let me define that carefully because it's central to the rest of the course.

**Kiffer:** Yeah, exchangeability is the property where two groups are interchangeable. If you swapped which group received the exposure, you'd see the same disease frequencies in the exposed and in the unexposed. The groups are similar enough on every relevant dimension that the only thing distinguishing them is whether they happened to receive the exposure.

**Sarah:** And the goal of the observational design is to approximate that exchangeability as closely as possible. You can't actually randomize, but you want your design to construct a comparison where, conditional on what you've measured, exposed and unexposed groups are as similar as you can make them.
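
For readers who want the idea in symbols, one common way to write it uses potential-outcomes notation. The notation is an assumption here, since the transcript does not give any: $A$ is the exposure, $Y^{a}$ is the outcome a person would have under exposure level $a$, and $L$ is the set of measured covariates.

$$
\begin{aligned}
&Y^{a} \perp\!\!\!\perp A \quad \text{for } a \in \{0, 1\} && \text{(exchangeability, which randomization is expected to deliver)} \\
&Y^{a} \perp\!\!\!\perp A \mid L \quad \text{for } a \in \{0, 1\} && \text{(conditional exchangeability, the observational target)}
\end{aligned}
$$

The second line is the formal version of "conditional on what you've measured": within levels of $L$, who ended up exposed carries no information about what their outcome would have been under either exposure level.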

**Kiffer:** Step two. Complete all design features before seeing any outcome data. Subject inclusion and exclusion criteria. The strategy for controlling confounding. The set of variables you'll measure. All decided in advance.

**Sarah:** And the lesson notes that Rubin formalizes part of this through propensity scores. Quick definition. The propensity score is the probability that a person is exposed, given their measured characteristics. If the distribution of propensity scores is very similar in the exposed and unexposed groups, the groups are reasonably comparable on those measured characteristics. If the distributions differ substantially, you've got a problem that statistical adjustment may not solve.
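
A minimal Python sketch of the definition, under a strong simplifying assumption: there is only one measured characteristic, an age band, so the propensity score is just the proportion exposed in each band. The records are invented, and real analyses usually estimate the score with a regression model over many covariates.

```python
from collections import defaultdict

# Invented records: (age_band, exposed) pairs.
records = (
    [("under 40", 1)] * 20 + [("under 40", 0)] * 80
    + [("40 to 59", 1)] * 50 + [("40 to 59", 0)] * 50
    + [("60 plus", 1)] * 90 + [("60 plus", 0)] * 10
)

counts = defaultdict(lambda: [0, 0])  # age_band -> [number exposed, number total]
for band, exposed in records:
    counts[band][0] += exposed
    counts[band][1] += 1

# Propensity score: estimated probability of exposure given the measured covariate.
propensity = {band: n_exp / n_total for band, (n_exp, n_total) in counts.items()}

def mean_score(exposure_value):
    """Average propensity score among people with the given exposure status."""
    scores = [propensity[band] for band, exposed in records if exposed == exposure_value]
    return sum(scores) / len(scores)

mean_exposed = mean_score(1)
mean_unexposed = mean_score(0)

print(propensity)
print(f"mean score, exposed: {mean_exposed:.2f}; unexposed: {mean_unexposed:.2f}")
```

In this invented population the average score is about 0.69 among the exposed and about 0.36 among the unexposed, so the two groups differ a good deal on age before any outcome is examined.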

**Kiffer:** Step three. Forward projection. Once you've completed the design but before any data are in, you imagine three different result scenarios. Scenario one, the exposure appears to increase disease risk. Scenario two, the exposure appears to decrease risk. Scenario three, no apparent association.

**Sarah:** And for each scenario, you defend the design. You ask, would I trust this finding? What other explanations are plausible? What might I be missing?

**Kiffer:** And the exercise tends to surface hidden weaknesses you'd otherwise miss. You might realize that if exposure looks protective, the most likely explanation is selection bias rather than a real effect. So you redesign or restrict to rule that out before the study runs.

**Sarah:** Forward projection is a really powerful exercise because it forces you to anticipate all the ways your study could be misinterpreted. You're committing to your interpretation framework before you can see what the data say.

**Kiffer:** Which connects directly to the pre-analysis plan idea from Lesson 3. Same logic. You're protecting yourself from p-hacking your own study. From unconsciously gravitating toward the analysis that produces the most interesting result.

**Sarah:** Then Section 1 closes with the hierarchy of evidence. From strongest causal weight to weakest.

**Kiffer:** At the top, the strongest, are laboratory trials. High investigator control. High causal weight. But low real-world relevance, because lab settings are highly artificial.

**Sarah:** Then controlled field trials, which are randomized controlled trials in real-world settings. High investigator control. High causal evidence. High real-world relevance. The methodological gold standard for most public health questions.

**Kiffer:** Then cohort studies. High investigator control over what's measured but no control over exposure assignment. High causal evidence under good design.

**Sarah:** Then case-control studies. Moderate investigator control. Moderate causal evidence.

**Kiffer:** Then cross-sectional studies. Lower investigator control because you measure exposure and outcome simultaneously. Lower causal evidence because you can't establish temporal sequence.

**Sarah:** Then surveys, case series, and case reports at the bottom. Useful but limited for causal inference.

**Kiffer:** And the lesson is honest about a really important caveat. The hierarchy of evidence describes causal weight, not absolute quality. A well-conducted cross-sectional study can be far more useful for many real-world questions than a poorly conducted randomized trial.

**Sarah:** Yeah, that's worth emphasizing. The hierarchy is about what kind of conclusions a well-designed study of that type can support, all else being equal. It's not a quality ranking. A bad cohort study can be worse than a good cross-sectional study, even though cohorts sit higher on the hierarchy.

**Kiffer:** And it connects back to Lesson 2. The hierarchy of evidence embeds a particular paradigmatic commitment. It privileges experimental designs. It devalues qualitative or experiential evidence. Some kinds of questions, especially questions about meaning, context, and lived experience, can't be addressed by a randomized trial, no matter how well designed.

**Sarah:** So the hierarchy is a useful tool for some questions. It's not the universal arbiter of what counts as evidence. We talked about this with Two-Eyed Seeing and lived experience in Lesson 2.

**Kiffer:** Right. And that takes us out of section one and into section two, which is about descriptive studies. Case reports, case series, and surveys.

**Sarah:** And the lesson takes an interesting position. The hierarchy of evidence puts descriptive designs at the bottom for causal inference. But the lesson argues that descriptive studies are where many real epidemiological investigations actually start. They generate the hypotheses that the analytic designs in later sections then test.

**Kiffer:** Yeah. Descriptive studies have actually been historically transformative in public health. Let me give two examples that I think illustrate this beautifully.

**Sarah:** Please.

**Kiffer:** First example. HIV and AIDS. In June 1981, the Morbidity and Mortality Weekly Report, which is the public health surveillance bulletin published by the US Centers for Disease Control, ran a brief case-series report. Five young gay men in Los Angeles had been hospitalized with an unusual and severe pneumonia caused by a fungus called Pneumocystis carinii. Pneumocystis was almost unheard of in healthy young people. It only struck severely immunocompromised patients.

**Sarah:** And that small report, just five cases described, was the first formal documentation of what we now call AIDS. Within months, more cases were identified. Other unusual infections. A specific cancer called Kaposi sarcoma. The pattern was clear. A new immunodeficiency syndrome had emerged. The case-series report was the signal that something new was happening.

**Kiffer:** Second example. Thalidomide. In the late 1950s and early 1960s, doctors in West Germany and Australia began publishing case reports of severe birth defects in newborns. Specifically, a condition called phocomelia, where the limbs are shortened or absent. Phocomelia had previously been extremely rare. Suddenly it was being seen in dozens of cases.

**Sarah:** And what those case reports led to was the identification of thalidomide, a drug given to pregnant women for morning sickness, as the cause. It had been marketed widely. Tens of thousands of children were born with birth defects before the cause was recognized.

**Kiffer:** And both stories show what case reports and case series can do. They can be the first signal of a new disease, a new exposure, or a new pattern that nobody was looking for. They generate the questions that analytic designs then answer.

**Sarah:** Let's walk through the three descriptive forms one at a time. A case report is a description of a single unusual case. A patient presents with an unexpected combination of symptoms, an unusual exposure history, an unusually severe response to a medication. The clinician writes it up so other clinicians can recognize the pattern.

**Kiffer:** A case-series report extends the case report. Instead of one patient, you describe a small group of patients with the same condition. Their characteristics, their treatment, their outcomes. The five Los Angeles AIDS cases were a case series.

**Sarah:** And the limitation of both is the same. There's no comparison group. You're describing what happened to people who got the disease. You can't say whether it would have happened anyway. You can't say whether some characteristic of these patients caused the disease. Without a comparison group, you can't draw causal conclusions.

**Kiffer:** Which is why descriptive studies are described as hypothesis-generating, not hypothesis-testing. A case series might suggest that a new drug is associated with a side effect. But to actually establish that link, you need an analytic design with a comparison group.

**Sarah:** Then surveys are the third descriptive design. A survey samples a population and measures characteristics of interest. Surveys answer questions like, what proportion of people had diarrhea over a one-month period? What's the average body mass index of grade twelve students? What's the prevalence of high blood pressure in Ontario adults?

**Kiffer:** And here's a really important transition the lesson makes. Once a survey starts collecting risk-factor information alongside disease information, beyond just person, place, and time, it stops being purely descriptive. It becomes a cross-sectional analytic study.

**Sarah:** The lesson uses a real example. The Ontario Hypertension Survey by Leenen and colleagues in 2008. They surveyed approximately 4,500 participants from over 6,000 eligible dwellings. They estimated that hypertension prevalence in Ontario was 21.3 percent. So roughly one in five adults.

**Kiffer:** And because they collected information not just on hypertension but also on potential risk factors, things like body mass index, age, sex, smoking, alcohol, exercise, dietary patterns, they could analyze the associations between those risk factors and hypertension prevalence. So it functions as cross-sectional analytic, not just descriptive.

**Sarah:** Which is the bridge into section three. The cross-sectional design itself.

**Kiffer:** And section three starts with two classification distinctions that apply to every observational study you'll meet, not just cross-sectional ones.

**Sarah:** First. Prospective versus retrospective.

**Kiffer:** In prospective studies, the disease has not yet occurred when the study begins. You enroll people, follow them forward in time, and observe who develops the outcome. The Framingham Heart Study, which began in 1948 and is still running, is the textbook prospective study.

**Sarah:** In retrospective studies, both the exposure and the outcome have already occurred when the study begins. You're looking back at what already happened, often through medical records, employment records, or interviews.

**Kiffer:** And cross-sectional studies are inherently retrospective in this sense. You take a cross-section of the population at a moment in time. You measure both exposure and outcome simultaneously. So both have, by definition, already occurred. You can't look at someone today and say their exposure tomorrow caused their disease today.

**Sarah:** Second classification. The three sampling approaches that organize all of analytic observational epidemiology. The lesson is brilliant on this because it shows that the entire field of observational study design can be organized around how you sample.

**Kiffer:** Approach one. Cross-sectional. You sample from the source population without regard to either exposure or disease. Then you measure both at the same time. The result is a snapshot of prevalence.

**Sarah:** Approach two. Cohort. You sample on exposure. You select a group of exposed individuals and a group of unexposed individuals, and you follow them forward in time to see who develops the outcome. The result is incidence.

**Kiffer:** And approach three. Case-control. You sample on the outcome. You select a group of people who have the disease, the cases, and a group who don't, the controls. Then you look back at their exposure histories. The result is an odds ratio.

**Sarah:** And the punchline is, sampling drives the design. The design is what determines what you can compute and what you can claim. Lessons 5 and 6 will go deep on case-control and cohort. Today is the introduction.
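
The three sampling approaches can be sketched in Python against one hypothetical source population. All frequencies are invented. One simplification to flag: the population is a static list of final exposure and disease status, so the "cohort" sample here only shows the sampling rule, not the forward follow-up a real cohort study involves.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical source population of (exposed, diseased) pairs.
population = (
    [(1, 1)] * 300 + [(1, 0)] * 1_700    # exposed: 15% diseased
    + [(0, 1)] * 400 + [(0, 0)] * 7_600  # unexposed: 5% diseased
)

def cross_sectional(pop, n):
    """Sample without regard to exposure or disease."""
    return random.sample(pop, n)

def cohort(pop, n_per_group):
    """Sample on exposure: fixed numbers of exposed and unexposed people."""
    exposed = [p for p in pop if p[0] == 1]
    unexposed = [p for p in pop if p[0] == 0]
    return random.sample(exposed, n_per_group) + random.sample(unexposed, n_per_group)

def case_control(pop, n_per_group):
    """Sample on outcome: fixed numbers of cases and controls."""
    cases = [p for p in pop if p[1] == 1]
    controls = [p for p in pop if p[1] == 0]
    return random.sample(cases, n_per_group) + random.sample(controls, n_per_group)

def summarize(sample):
    """Count how many sampled people are exposed and how many are diseased."""
    return {
        "n": len(sample),
        "exposed": sum(e for e, _ in sample),
        "diseased": sum(d for _, d in sample),
    }

xs_sample = cross_sectional(population, 400)
cohort_sample = cohort(population, 200)
cc_sample = case_control(population, 200)

for name, s in [("cross-sectional", xs_sample), ("cohort", cohort_sample), ("case-control", cc_sample)]:
    print(name, summarize(s))
```

In the cohort sample the number exposed is fixed by design, and in the case-control sample the number diseased is fixed by design. Only the cross-sectional sample leaves both margins free, which is why it can estimate prevalence in the source population.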

**Kiffer:** Let me define a couple of terms we just used. Prevalence and incidence.

**Sarah:** Yes, please. These are critical.

**Kiffer:** Prevalence is the proportion of a population that has a disease at a given moment. Not new cases. Everyone with the disease right now. If a hundred people are walking around with diabetes in a population of a thousand, prevalence is 10 percent. Doesn't matter when they were diagnosed.

**Sarah:** Incidence is different. Incidence is the rate of new cases occurring in a population over a defined time period. So if ten new cases of diabetes are diagnosed in that population over a year, the incidence rate is ten per thousand per year.

**Kiffer:** And the relationship between them is subtle but important. Prevalence depends on both incidence and duration. A disease can have high prevalence because it's common and short-lived, like seasonal flu. Or because it's rare but lifelong, like Type 1 diabetes. Same prevalence, very different epidemiologic implications.
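
A quick numeric illustration in Python. It uses the common approximation that prevalence is roughly the incidence rate times the average duration, which holds when the disease is in a steady state and prevalence is low. The two conditions and their rates are invented, not real estimates for flu or diabetes.

```python
def approx_prevalence(incidence_per_person_year, mean_duration_years):
    """Steady-state, low-prevalence approximation: prevalence ~ incidence x duration."""
    return incidence_per_person_year * mean_duration_years

# Frequent but brief: 120 new cases per 1,000 person-years, each lasting about two weeks.
short_lived = approx_prevalence(0.12, 1 / 25)

# Rare but long-lasting: 0.12 new cases per 1,000 person-years, each lasting about 40 years.
lifelong = approx_prevalence(0.00012, 40)

print(f"short-lived condition: {short_lived:.4f}")
print(f"lifelong condition:    {lifelong:.4f}")
```

Both come out near 0.005, roughly five people per thousand at any moment, even though the incidence rates differ a thousandfold.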

**Sarah:** We'll come back to this later. For now, the key fact is that cross-sectional studies measure prevalence. They don't measure incidence directly. Because incidence requires following people forward through time and counting who develops the disease.

**Kiffer:** Before we leave section three, we should also flag the four design steps the lesson walks through for building a cross-sectional study. Defining the study group, assessing the exposure, ascertaining the outcome, and ensuring comparability between groups.

**Sarah:** And the lesson uses a real Canadian example to anchor those steps. Lanes and colleagues in 2011 studied postpartum depression in Canadian women. They surveyed about 6,400 women using the Edinburgh Postnatal Depression Scale, found prevalence around 8 to 9 percent, and identified prior depression and pregnancy stress as the strongest correlates.

**Kiffer:** And it's a useful example precisely because it shows the limit too. The stress exposure was measured at the same time as the depression outcome. So you can't tell whether stress preceded the depression or whether the depression itself was coloring how women reported their stress. The design generates the question. It can't fully answer it.

**Sarah:** One more practical point on confounding control before we move on. In cross-sectional studies, two of the standard tools are available. You can exclude people through restriction, and you can adjust statistically with a multivariable model. But matching, the third common tool, is not available here. Matching requires you to sample on disease or exposure status, and a cross-sectional study samples without regard to either. We'll see matching come back in case-control designs next week.

**Kiffer:** Okay. With that, section four is the honest counterweight to section three. What can the cross-sectional design not tell us? What can we squeeze from it? And how should we report it?

**Sarah:** The biggest limitation comes from the same fact that defines the design: it measures prevalence. And prevalence is roughly incidence times duration. So prevalence is influenced both by how often new cases arise and how long people stay sick or stay alive with the disease.

**Kiffer:** Which makes it really hard to disentangle factors that affect risk of disease from factors that affect duration of disease. If a factor seems to be associated with prevalence, it could be because the factor causes more new cases, or because the factor causes longer survival with the disease, or both. You can't tell from a cross-sectional snapshot.
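
The ambiguity can be written out in one line. The subscripts are assumed notation, not from the lesson: $1$ for people with the factor and $0$ for people without it, with $P$ for prevalence, $I$ for incidence rate, and $D$ for average duration. Under the steady-state, low-prevalence approximation $P \approx I \times D$:

$$
\frac{P_1}{P_0} \;\approx\; \frac{I_1 D_1}{I_0 D_0} \;=\; \frac{I_1}{I_0} \times \frac{D_1}{D_0}
$$

A prevalence ratio of 2 is therefore consistent with $I_1/I_0 = 2$ and equal durations, with equal incidence and $D_1/D_0 = 2$, or with any mix of the two whose product is 2. The snapshot gives you only the product.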

**Sarah:** And the lesson uses a great example to drive this home. The reverse causation problem. Imagine you do a cross-sectional study and you find that dog ownership is associated with lower blood pressure.

**Kiffer:** And there's a temptation to interpret that finding. The walking, the social interaction at the dog park, the calming effect of caring for an animal. Maybe owning a dog lowers blood pressure.

**Sarah:** Plausible story. But cross-sectional data can't distinguish between two stories that go in opposite causal directions.

**Kiffer:** Walk through both.

**Sarah:** Story one. People who don't own dogs decide to get one. Maybe they want company, or their kids ask for one. Then the routine of walking the dog gets them moving more. Their stress levels drop because of the companionship. They sleep better. Over months or years, their blood pressure declines. Dog causes lower blood pressure.

**Kiffer:** Story two. People who already have low blood pressure tend to be healthier in other ways. More energy. More likely to exercise. More likely to seek out activities that involve responsibility and effort. Like getting a dog. So healthy people get dogs. The lower blood pressure was already there. The dog ownership came after, as a consequence of being healthy enough to take it on.

**Sarah:** Both stories are consistent with the data. Both have low blood pressure and dog ownership occurring together. Cross-sectional data measured at one moment in time can't tell you which way the arrow goes. The exposure and the outcome were measured simultaneously. There's no temporal sequence to anchor the causal claim.

**Kiffer:** And the lesson points out something important. The more changeable the exposure is, the worse this problem gets. Dog ownership can change. Diet can change. Exercise can change. Income can change. Stress can change. All of those are vulnerable to reverse causation in cross-sectional data.

**Sarah:** Which is why the lesson recommends that cross-sectional studies are best suited for time-invariant exposures. Things like sex, race, or genetic factors. Things that the investigator can be confident preceded the outcome, because they were established at birth and haven't changed.

**Kiffer:** Now let's spend some time on the unit of analysis the rest of the course is going to be built on. The 2 by 2 contingency table.

**Sarah:** Quick definition. A 2 by 2 contingency table is just a small grid. Two rows, two columns. Four cells total. The rows are exposed versus unexposed. The columns are diseased versus not diseased. Every person in your study falls into exactly one of the four cells.

**Kiffer:** And once you have those four numbers, you can compute everything else. The odds of disease given exposure. The risk of disease given exposure. The odds ratio. The risk ratio. Every measure of association later in this series starts from a structure that looks exactly like this. Once you can read the table, the rest of the course is mostly about what to do with it.

**Sarah:** There's an R box in the lesson that walks you through building one with simulated data. 200 people. A hundred exposed, a hundred unexposed. The exposed group has 30 percent disease, the unexposed group has 10 percent. You build the table with the table function in R. Then you can read prevalence within each row by dividing each cell by the row total.

**Kiffer:** And the output shows that 29 percent of the exposed have disease versus 8 percent of the unexposed. The small gap from the 30 and 10 percent we set is just sampling variation in the simulated data. That contrast is the launchpad for every measure of association you'll learn in this course.
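
The lesson's box is in R. As a companion, here is a minimal Python sketch that starts from the four cell counts implied by the reported output (29 of 100 exposed and 8 of 100 unexposed with disease) and computes the measures named above. The variable names are illustrative, not from the lesson.

```python
# Cell counts implied by the reported output, 100 people per row.
a, b = 29, 71  # exposed: diseased, not diseased
c, d = 8, 92   # unexposed: diseased, not diseased

# Proportion diseased within each row (row cell divided by row total).
risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)

# Odds of disease within each row (diseased divided by not diseased).
odds_exposed = a / b
odds_unexposed = c / d

risk_ratio = risk_exposed / risk_unexposed
odds_ratio = (a * d) / (b * c)  # cross-product form of odds_exposed / odds_unexposed

print(f"proportion diseased: exposed {risk_exposed:.2f}, unexposed {risk_unexposed:.2f}")
print(f"risk ratio: {risk_ratio:.2f}")
print(f"odds ratio: {odds_ratio:.2f}")
```

The ratio of proportions is about 3.6 and the odds ratio is about 4.7. In a cross-sectional study these are more precisely a prevalence ratio and a prevalence odds ratio, because the table counts existing cases rather than new ones.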

**Sarah:** Then the lesson covers approaches for estimating incidence from cross-sectional studies. Three approaches in increasing sophistication.

**Kiffer:** First. Repeated cross-sections. Run two cross-sectional studies at different time points and use the change in prevalence to estimate incidence. Miller and colleagues did this around the 2009 H1N1 influenza pandemic in England.

**Sarah:** Quick context on H1N1. H1N1 was the strain of influenza that caused the 2009 swine flu pandemic. It spread globally over a few months. Researchers used pre-pandemic and post-pandemic blood samples to estimate how many people in the population had been infected based on antibody levels. The change in prevalence between the two surveys gave them an estimate of incidence over that period.
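
A minimal Python sketch of the repeated cross-sections idea. The calculation shown is a standard one, offered here as an assumption rather than as the method Miller and colleagues used, and the two prevalence figures are invented. It assumes antibodies persist between surveys, the two samples come from comparable populations, and the test performs the same both times.

```python
def cumulative_incidence(prev_before, prev_after):
    """Share of previously antibody-negative people who became positive between surveys."""
    if not 0 <= prev_before < 1 or not prev_before <= prev_after <= 1:
        raise ValueError("expected 0 <= prev_before < 1 and prev_before <= prev_after <= 1")
    # Only people who were negative at the first survey were at risk of a new infection.
    return (prev_after - prev_before) / (1 - prev_before)

# Invented example: 10% antibody-positive before the wave, 28% after it.
estimate = cumulative_incidence(0.10, 0.28)
print(f"estimated cumulative incidence among the susceptible: {estimate:.2f}")
```

Dividing by the share who were still susceptible matters: the rise from 10 to 28 percent is 18 points of the whole population, but it represents 20 percent of the people who could still be infected.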

**Kiffer:** Second approach. The two-test approach. Use two different diagnostic tests on the same samples. A sensitive one that picks up the early immune response, and a less sensitive one that only turns positive once the infection is established. People who test positive on the sensitive test but negative on the less sensitive test are counted as recent infections, and that count is what gives you an incidence estimate.

**Sarah:** This has been refined particularly for HIV studies. Being able to estimate recent infection from a single blood sample is operationally valuable for surveillance, because it lets you understand whether transmission is ongoing without having to do expensive longitudinal follow-up of every participant.

**Kiffer:** Third approach. Mathematical estimation. Rajan and Sokal in 2011 give a formula that uses two prevalence estimates at different time points to back out the incidence rate. The formula is in the textbook as Equation 7.1.

**Sarah:** And the choice of approach depends on what data you can collect and what assumptions you're willing to make. Each of these costs you something different in operational complexity in exchange for getting incidence information out of cross-sectional data.

**Kiffer:** Then the lesson contrasts repeated cross-sectional studies with cohort studies. Both follow a population through time, but in different ways.

**Sarah:** In repeated cross-sections, you sample fresh participants each time. The first survey samples 1,000 people, the second survey samples a different 1,000 people. So you're tracking population-level trends but not individual change.

**Kiffer:** In a cohort study, you sample once and follow the same individuals over time. So you can track how exposure and outcome change for specific people. The Framingham Heart Study is a cohort. So is the Nurses' Health Study. Lesson 6 will cover cohort designs in detail.

**Sarah:** And the choice is real. Repeated cross-sections give you population trends. Cohorts give you individual trajectories. Which one you want depends on the question.

**Kiffer:** Section 4 closes with reporting standards. The lesson introduces STROBE.

**Sarah:** STROBE stands for Strengthening the Reporting of Observational Studies in Epidemiology. We met it briefly in Lesson 3 as one of the EQUATOR Network reporting guidelines.

**Kiffer:** It was established in 2004 by an international group of methodologists, researchers, and journal editors. They published the consensus statement in 2007 in PLOS Medicine and a few other journals simultaneously. It's a checklist of 22 items considered essential for good reporting of observational studies.

**Sarah:** And the items cover everything you'd want to know to evaluate an observational study.

**Kiffer:** Title and abstract should clearly indicate the study design. Introduction should give the background, the objectives, and the hypotheses.

**Sarah:** Methods should describe the design, the setting, the participants, the variables, the data sources, how potential sources of bias were addressed, how the study size was arrived at, and the statistical methods.

**Kiffer:** Results should give numbers at each stage of the study, descriptive data, outcome data, and main results. With confidence intervals, not just point estimates.

**Sarah:** Discussion should cover key findings, limitations, interpretation, and generalizability.

**Kiffer:** And the lesson is sharp on what STROBE does and doesn't do. STROBE doesn't tell you whether a study is correct. It tells you what information should be present so a reader can decide whether the study is correct. Reporting completeness is necessary for evaluation, but it doesn't guarantee validity.

**Sarah:** Yeah, that's the same point we made in Lesson 3 about reporting guidelines generally. STROBE adherence makes a study evaluable. It doesn't make it good. We'll come back to this in Lesson 13 when we talk about how to actually appraise a published study end to end.

**Kiffer:** Okay. Let's pull the takeaways together.

**Sarah:** Yeah, let me list them. There are six main ones I'd want a student to leave with.

**Kiffer:** Go ahead.

**Sarah:** First takeaway. Two big classification splits. Descriptive versus explanatory, then experimental versus observational. The descriptive-versus-explanatory split is about purpose. Are you describing or are you comparing? The experimental-versus-observational split is about how exposure gets assigned. Did the investigator control it through randomization, or did it occur naturally?

**Kiffer:** And knowing which kind of study you're looking at is the first step in evaluating it. A descriptive study is good for what it does, but it's not testing causation. An observational study is doing real comparisons but has to fight confounding by hand.

**Sarah:** Second. Observational studies need the discipline experiments get for free. The unified approach from Hernán and Rubin. Thought experiment, design before data, forward projection. Design trumps analysis. The decisions you make before collecting data largely determine what conclusions you can credibly support.

**Kiffer:** Third. The hierarchy of evidence describes causal weight, not absolute quality. A well-designed cross-sectional study can be more informative than a poorly designed randomized trial. The hierarchy is a starting point for thinking about how much causal weight a study can carry, given good execution. It's not a quality ranking.

**Sarah:** And it embeds particular paradigmatic commitments. Some questions about meaning, context, and lived experience aren't well-suited to experimental designs no matter how well done. Lesson 2 made that point philosophically.

**Kiffer:** Fourth. Descriptive designs lack a comparison group. They're hypothesis-generating, not hypothesis-testing. But they're often where real epidemiological investigations start. The early case reports of HIV/AIDS and thalidomide birth defects are textbook examples of how case reports change history. Don't dismiss them. Just understand what they can and cannot do.

**Sarah:** Fifth. Cross-sectional studies measure prevalence, not incidence. They're inherently retrospective. They're best for time-invariant exposures. They're vulnerable to reverse causation, where you can't tell which direction the causal arrow goes. Sampling drives the design. The three sampling approaches, cross-sectional, cohort, and case-control, organize all of analytic observational epidemiology.

**Kiffer:** And sixth. STROBE is the reporting standard for observational studies. Use it when you write. Use it when you read.

**Sarah:** And the practical recommendation. Don't skip the R box on the 2 by 2 table. Building one with simulated data and computing prevalence within each exposure group is a tiny exercise. But the structure you build there is the structure every measure of association in the rest of this course starts from. If the 2 by 2 table is comfortable, the rest of the course will feel familiar.
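
The exercise Sarah describes can be sketched roughly as follows. The lesson's own box is in R; this version swaps in Python with only the standard library. It is not the lesson's code, and the function names, sample size, and prevalence values are all illustrative assumptions. It simulates one cross-sectional sample, counts the four cells of the 2 by 2 table, and computes prevalence within each exposure group.

```python
import random

def simulate_cross_section(n, p_exposed, prev_exposed, prev_unexposed, seed=0):
    """Simulate one cross-sectional sample as (exposed, has_outcome) pairs."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        exposed = rng.random() < p_exposed
        prev = prev_exposed if exposed else prev_unexposed
        sample.append((exposed, rng.random() < prev))
    return sample

def two_by_two(sample):
    """Count the cells: a, b = exposed with/without outcome; c, d = unexposed with/without."""
    a = sum(1 for e, o in sample if e and o)
    b = sum(1 for e, o in sample if e and not o)
    c = sum(1 for e, o in sample if not e and o)
    d = sum(1 for e, o in sample if not e and not o)
    return a, b, c, d

def prevalence_by_exposure(a, b, c, d):
    """Prevalence of the outcome among the exposed and among the unexposed."""
    return a / (a + b), c / (c + d)

# Assumed values for illustration only: 40% exposed, prevalence 30% vs 15%.
sample = simulate_cross_section(n=2000, p_exposed=0.4,
                                prev_exposed=0.30, prev_unexposed=0.15)
a, b, c, d = two_by_two(sample)
p_exposed_group, p_unexposed_group = prevalence_by_exposure(a, b, c, d)

print("            outcome+  outcome-")
print(f"exposed     {a:8d}  {b:8d}")
print(f"unexposed   {c:8d}  {d:8d}")
print(f"prevalence among exposed:   {p_exposed_group:.3f}")
print(f"prevalence among unexposed: {p_unexposed_group:.3f}")
```

The two row-wise prevalences are the raw material for comparing the groups. Because the data come from a single time point, they describe prevalence, not incidence.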

**Kiffer:** And one more thing worth mentioning. The connection back to the framing lessons. Every concept in this lesson is shaped by what came before.

**Sarah:** What do you mean?

**Kiffer:** Lesson 1's reminder that progress in public health depends on networks of actors and institutions, not just individual studies. That's why STROBE matters. Why the EQUATOR network exists. Why coordinated reporting infrastructure is what enables the discipline to advance.

**Sarah:** Lesson 2's argument that paradigms shape what we can see. That's why the hierarchy of evidence has to be read carefully, not absorbed uncritically. It privileges some questions over others.

**Kiffer:** And Lesson 3's catalogue of how the literature can mislead us. That's why the unified approach insists on design before data, on forward projection, on pre-specified analysis. Because the alternative is the garden of forking paths from Section 1 of Lesson 3. The slow erosion of credibility through small unprincipled choices.

**Sarah:** Each lesson is building on the one before. Lesson 4 is where it all becomes operational.

**Kiffer:** Next up is Lesson 5. Case-control studies. We'll go deep on the design and on the situations where it's the right tool for the job.

**Sarah:** Take care, everyone.

**Kiffer:** See you there.