# Lesson 12 — Confounding and Causal Inference (v3 expanded)

*Companion-podcast transcript • Sarah & Kiffer* 
*~5,650 words • ~31 min audio*

---

**Sarah:** Welcome back to Office Hours. I'm Sarah, and this is the final lesson of the course.

**Kiffer:** And I'm Kiffer. Today we're working through Lesson 12, Confounding and Causal Inference. This is the deep dive on what is probably the single most important threat to causal inference in observational research, and it's also the capstone of the methods arc we've been building all term.

**Sarah:** And I want to flag something at the top. Confounding already showed up earlier in the course, so students who worked through that material with us have heard the basics. So what's different about today?

**Kiffer:** Today goes much further into the machinery. The earlier version gave you the concept and the intuition. This version gives you the formal definition, the detection methods, five different control strategies in increasing complexity, and then advanced techniques like marginal structural models for time-varying confounding. We close with what to do about confounders you cannot measure, and a theoretical critique of how the field treats race, gender, and socioeconomic status.

**Sarah:** Big lesson, and it really does pull everything together. Let's start at the start. What is confounding, formally?

**Kiffer:** Confounding is the mixing together of effects. You think you're measuring the association between an exposure and an outcome, but the observed measure also reflects the effects of one or more extraneous factors. Those extraneous factors are called confounders.

**Sarah:** And the lesson gives three formal conditions. A variable is a confounder of the exposure-outcome relationship if all three are true. First, it is associated with the exposure in the source population. Second, it is an independent risk factor for the outcome, meaning it predicts the outcome even among people who don't have the exposure. And third, it is not on the causal pathway from exposure to outcome.

**Kiffer:** Let's unpack each. The first condition just says the confounder and the exposure travel together. People with the exposure tend to have the confounder, or vice versa. Without that association, the confounder couldn't possibly distort the exposure-outcome estimate, because the exposed and unexposed groups wouldn't differ on it.

**Sarah:** The second condition says the confounder genuinely matters for the outcome on its own. If the confounder had no effect on the outcome, then balancing it across groups wouldn't change anything.

**Kiffer:** And the third condition is the one that trips students up the most. Not on the causal pathway. This is where we distinguish a confounder from a mediator.

**Sarah:** Let's define a mediator carefully. A mediator is a variable that sits between the exposure and the outcome on the causal chain. The exposure causes the mediator, and the mediator in turn causes the outcome. So part of how the exposure produces the outcome is by going through the mediator.

**Kiffer:** The classic example is smoking and lung cancer. One pathway is that smoking causes inflammation in the lungs, and inflammation causes cellular damage, and cellular damage causes cancer. Inflammation is a mediator on the smoking-cancer pathway.

**Sarah:** And here is the critical move. If you adjust for inflammation in your analysis, you have just blocked one of the pathways through which smoking actually causes cancer. The smoking-cancer estimate that comes out is no longer the total effect of smoking on cancer. It's the effect of smoking on cancer through pathways other than inflammation.

**Kiffer:** So confounders need to be adjusted for, but mediators should generally be left alone if you want the total effect of the exposure on the outcome. Conflating the two is one of the most common mistakes you'll see in published epidemiology.
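To make the total-versus-partial distinction concrete, here is a minimal sketch with made-up numbers. It assumes a binary exposure, a binary mediator, and an outcome risk that is additive in both. None of these probabilities come from the lesson, and the toy world contains no confounding at all.

```python
# Hypothetical toy model: exposure X, mediator M, outcome Y (all binary).
# Every probability below is an illustrative assumption, not real data.
P_M_GIVEN_X = {0: 0.2, 1: 0.7}  # the exposure raises the chance of the mediator


def risk(x, m):
    """P(Y=1 | X=x, M=m): a small direct effect of X plus a larger effect of M."""
    return 0.05 + 0.05 * x + 0.20 * m


def marginal_risk(x):
    """P(Y=1 | X=x), averaging over the mediator distribution that X produces."""
    p_m = P_M_GIVEN_X[x]
    return p_m * risk(x, 1) + (1 - p_m) * risk(x, 0)


total_effect = marginal_risk(1) - marginal_risk(0)
within_m0 = risk(1, 0) - risk(0, 0)  # comparison with the mediator held at 0
within_m1 = risk(1, 1) - risk(0, 1)  # comparison with the mediator held at 1

print(f"total risk difference:       {total_effect:.2f}")
print(f"risk difference holding M=0: {within_m0:.2f}")
print(f"risk difference holding M=1: {within_m1:.2f}")
```

In this sketch the total risk difference is 0.15, but once the mediator is held fixed only the 0.05 that bypasses it remains. That gap is what gets lost when a mediator is treated like a confounder.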

**Sarah:** Okay, so once you've defined what a confounder is, how do you actually detect one in your data?

**Kiffer:** There are two broad approaches. The classical change-in-estimate approach, and the directed acyclic graph approach. Let's spell out both.

**Sarah:** The classical approach first. You compare your crude estimate, with no adjustment for the suspected confounder, to your adjusted estimate, with the suspected confounder added to the model. If the two estimates differ by more than a certain threshold, you call that confounding.

**Kiffer:** The conventional threshold most commonly cited is a ten percent change. Some textbooks use twenty percent, some use thirty percent, depending on the context. The point is the same. A meaningful shift in the estimate when you add the variable is taken as evidence the variable was confounding the relationship.

**Sarah:** And there's a real intuition here. If a variable doesn't budge your estimate at all, then balancing it across exposure groups didn't change what you saw, which suggests it wasn't actually distorting the relationship in your sample.
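The change-in-estimate check is simple arithmetic, sketched below. The function names and the odds ratios are hypothetical, and the sketch assumes the adjusted estimate is used as the reference value. Some authors divide by the crude estimate or work on the log scale instead, so treat this as one convention among several.

```python
def percent_change(crude, adjusted):
    """Relative change in the estimate, in percent, with the adjusted value as reference."""
    return abs(crude - adjusted) / adjusted * 100


def flags_confounding(crude, adjusted, threshold_pct=10.0):
    """True when the shift exceeds the chosen threshold (10 percent by default)."""
    return percent_change(crude, adjusted) > threshold_pct


# Hypothetical odds ratios, for illustration only.
crude_or, adjusted_or = 1.80, 1.50
print(round(percent_change(crude_or, adjusted_or), 1))  # 20.0
print(flags_confounding(crude_or, adjusted_or))         # True
```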

**Kiffer:** But the lesson is sharp on the limits of this approach. The change-in-estimate rule has problems. First, it depends entirely on your data. A variable might be a confounder in the population but happen, by chance, not to shift your estimate in this particular sample. The ten percent rule would tell you to drop it, and that's a mistake.

**Sarah:** Second, the odds ratio has a property called non-collapsibility. The crude odds ratio can differ from the stratum-specific odds ratios even when there is no confounding at all. So a more than ten percent shift in the odds ratio when you stratify might just reflect the math of odds ratios, not actual confounding.
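Non-collapsibility is easier to believe with numbers in front of you. The sketch below uses hypothetical risks and assumes a covariate Z that is split evenly in both exposure groups, so by construction Z is not associated with exposure and cannot confound anything.

```python
def odds(p):
    return p / (1 - p)


def odds_ratio(p_exposed, p_unexposed):
    return odds(p_exposed) / odds(p_unexposed)


# Hypothetical outcome risks by stratum of a covariate Z: (exposed, unexposed).
strata = {"Z=0": (0.5, 0.2), "Z=1": (0.8, 0.5)}

# Z is split 50/50 in BOTH exposure groups, so the crude risks are plain
# averages of the stratum risks and Z is unrelated to exposure.
crude_exposed = sum(p_e for p_e, _ in strata.values()) / len(strata)
crude_unexposed = sum(p_u for _, p_u in strata.values()) / len(strata)

stratum_ors = {name: odds_ratio(p_e, p_u) for name, (p_e, p_u) in strata.items()}
stratum_rds = {name: p_e - p_u for name, (p_e, p_u) in strata.items()}
crude_or = odds_ratio(crude_exposed, crude_unexposed)
crude_rd = crude_exposed - crude_unexposed

for name in strata:
    print(name, "OR:", round(stratum_ors[name], 2), "RD:", round(stratum_rds[name], 2))
print("crude OR:", round(crude_or, 2), "crude RD:", round(crude_rd, 2))
```

Both stratum odds ratios are 4.0, yet the crude odds ratio is about 3.45, a shift of more than ten percent with no confounding present. The risk difference is 0.30 in each stratum and 0.30 overall, which is what collapsibility looks like.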

**Kiffer:** And third, the change-in-estimate approach is purely statistical. It doesn't engage with the causal structure of the problem. You can adjust for a variable that shifts your estimate by thirty percent, declare it a confounder, and have actually made things worse if the variable was a collider or a mediator.

**Sarah:** So statistical detection of confounding is helpful but not sufficient. You need a more principled approach grounded in subject-matter knowledge.

**Kiffer:** Which brings us to directed acyclic graphs. We'll spell that out. A directed acyclic graph, or DAG for short, is a diagram of the causal structure of your problem. Variables are nodes. Causal relationships are arrows. Acyclic means no variable causes itself, directly or indirectly. Time only flows forward.

**Sarah:** We covered directed acyclic graphs in an earlier lesson. The key idea today is using them for confounding control. The procedure has three steps.

**Kiffer:** First, draw the diagram based on what you know from prior research and substantive expertise. This is the hard part. The diagram encodes your assumptions about the causal world. Different assumptions, different diagram, different conclusions.

**Sarah:** Second, identify backdoor paths. A backdoor path is any non-causal path connecting the exposure to the outcome that starts with an arrow into the exposure. Those are the paths through which confounding flows. The classical confounder pattern is exposure and outcome both caused by some third variable. That third variable opens a backdoor path.

**Kiffer:** Third, block the backdoor paths by adjusting for variables on them. The technical statement of this is the backdoor criterion from Judea Pearl. If you adjust for a set of variables that blocks every backdoor path without opening any new ones, the remaining association between exposure and outcome is the causal effect.
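Step two of that procedure can be sketched in a few lines of plain Python. The diagram below is a hypothetical one: it reuses the smoking, inflammation, and cancer chain from earlier and adds an assumed common cause, age, purely for illustration. The function only lists backdoor paths. It does not check whether a path is already blocked by a collider, which a full implementation of the backdoor criterion would need to do.

```python
# Hypothetical DAG as (cause, effect) pairs. "age" is an assumed common cause.
edges = [
    ("age", "smoking"),
    ("age", "cancer"),
    ("smoking", "inflammation"),
    ("inflammation", "cancer"),
]


def backdoor_paths(edges, exposure, outcome):
    """List simple paths from exposure to outcome whose first edge points INTO the exposure."""
    neighbors = {}
    for cause, effect in edges:
        neighbors.setdefault(cause, set()).add(effect)
        neighbors.setdefault(effect, set()).add(cause)
    directed = set(edges)
    found = []

    def walk(path):
        node = path[-1]
        if node == outcome:
            # A backdoor path starts with an arrow from path[1] into the exposure.
            if (path[1], exposure) in directed:
                found.append(path)
            return
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                walk(path + [nxt])

    walk([exposure])
    return found


print(backdoor_paths(edges, "smoking", "cancer"))
```

For this diagram the only backdoor path runs from smoking back through age to cancer, so age is the variable to adjust for. The path through inflammation starts with an arrow out of smoking, which makes it a causal path rather than a backdoor path.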

**Sarah:** And the directed acyclic graph approach gives you two crucial don'ts. Don't adjust for mediators, because you'll erase part of the causal effect you're trying to estimate. And don't adjust for colliders.

**Kiffer:** Quick definition of collider. A collider is a variable that has two arrows pointing into it. Conditioning on a collider opens a new non-causal path between its parents. So if you adjust for a collider, you can introduce bias that wasn't there in the unadjusted estimate. This is called collider stratification bias, and it is one form of selection bias.

**Sarah:** Let me give a quick example of a collider so this lands. Suppose you're studying whether smoking causes depression. Both smoking and depression independently increase the risk of being hospitalized for a respiratory issue. Hospitalization is a collider. If you restrict your study to hospitalized patients, you've conditioned on a collider, and you can induce a spurious negative association between smoking and depression even if no causal relationship exists.

**Kiffer:** That's a real failure mode. Hospital-based studies routinely produce findings that don't replicate in general population samples for exactly this reason. The directed acyclic graph framework forces you to notice the collider before you collect the data, not after.
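The hospital example can be checked with exact arithmetic instead of a simulation. In the sketch below every probability is a made-up assumption: smoking and depression are independent in the population, and each one adds to the chance of hospitalization.

```python
from itertools import product

# Hypothetical population: smoking and depression are independent by construction.
p_smoke, p_dep = 0.30, 0.20


def p_hosp(s, d):
    """Assumed probability of hospitalization; both factors raise it additively."""
    return 0.05 + 0.20 * s + 0.20 * d


def odds_ratio(w):
    """Smoking-depression odds ratio from a table of weights keyed by (s, d)."""
    return (w[(1, 1)] * w[(0, 0)]) / (w[(1, 0)] * w[(0, 1)])


population = {}
hospitalized = {}
for s, d in product((0, 1), repeat=2):
    joint = (p_smoke if s else 1 - p_smoke) * (p_dep if d else 1 - p_dep)
    population[(s, d)] = joint
    hospitalized[(s, d)] = joint * p_hosp(s, d)  # restrict to hospitalized patients

or_population = odds_ratio(population)
or_hospitalized = odds_ratio(hospitalized)
print("population OR:  ", round(or_population, 2))
print("hospitalized OR:", round(or_hospitalized, 2))
```

Under these assumptions the odds ratio is 1.0 in the full population and 0.36 among hospitalized patients. The negative association is created entirely by restricting to the collider.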

**Sarah:** The directed acyclic graph approach has been transformative in modern epidemiology because it gives you a principled basis for choosing your adjustment set. Subject-matter knowledge first, statistics second.

**Kiffer:** And there's one more useful distinction the lesson draws, between a population confounder and a sample confounder. A population confounder is a variable that is known or regularly reported to be a confounder in the target population. You should control for it regardless of what your sample data show. A sample confounder is a variable that appears to be a confounder in your study data but may not really be one in the population. You should not control for it unless there is substantive reason to believe it should be controlled.

**Sarah:** Which is exactly the warning against the change-in-estimate rule used as a sole criterion. A variable might shift your estimate just by chance, and adjusting for it could make things worse if it's actually a collider or a mediator. The diagram plus subject-matter knowledge tells you what to do, not the in-sample shift alone.

**Kiffer:** Okay. Suppose you've identified a confounder, by whichever method. How do you actually control for it? The lesson gives five methods, in roughly increasing order of complexity.

**Sarah:** Method one is restriction, the simplest. Limit your study to a single value of the confounder. If you're worried about confounding by sex, study only women. If you're worried about confounding by age, restrict to a narrow age band.

**Kiffer:** And restriction works because, within the restricted group, the confounder no longer varies. Everyone is the same on it. So it can't possibly be associated with the exposure, and it can't distort the estimate.

**Sarah:** The cost is twofold. You shrink your sample size, sometimes drastically. And you limit generalizability. Your study now applies only to people who match the restriction criteria. You can't say anything about men, or about older adults, or whatever you restricted out.

**Kiffer:** Method two is matching. You pair each exposed person with an unexposed person who shares the confounder values. Or more generally, you force the distribution of the confounder to be the same in your exposed and unexposed groups.

**Sarah:** There are two flavors. Frequency matching adjusts the overall distribution. You make sure the proportion of women in your exposed group equals the proportion of women in your unexposed group, and similarly for age bands and any other confounder. Pair matching links specific individuals. Each exposed person gets matched to a specific unexposed person, or set of unexposed people, with the same confounder values.

**Kiffer:** Matching has a subtlety the lesson is careful about. In a cohort study, where the disease has not yet happened when you match, matching genuinely eliminates confounding by the matched variables. But in a case-control study, where the disease has already happened, matching introduces a selection bias that has to be corrected analytically. So matching in case-control studies always requires a matched analysis afterward.

**Sarah:** And there is a hazard called overmatching. If you match on a variable that is associated with the exposure but not actually a confounder, you reduce the variability in exposure between groups without reducing confounding. The result is loss of statistical efficiency. So match only when you have a clear reason to believe the variable is a confounder.

**Kiffer:** Method three is stratification. You compute the association within each stratum of the confounder, then combine the stratum-specific estimates into a single summary.

**Sarah:** The workhorse here is the Mantel-Haenszel procedure, named after Nathan Mantel and William Haenszel, who developed the method in the nineteen fifties. The Mantel-Haenszel estimator computes a weighted average of the stratum-specific odds ratios, where the weights are designed to give more weight to strata with more information.

**Kiffer:** The intuition is straightforward. You're basically running a separate two-by-two table within each level of the confounder, then combining them. Within each stratum, the confounder is held constant, so it can't distort the within-stratum association. You then pool across strata to get a single summary measure.

**Sarah:** Let me work through a concrete example so the Mantel-Haenszel idea has texture. Suppose you're studying whether a bacterium called Streptococcus pneumoniae is associated with childhood respiratory disease. The crude odds ratio comes out to three point three. You're worried that respiratory syncytial virus, often abbreviated as RSV, another respiratory pathogen often present in the same kids, might be confounding the relationship.

**Kiffer:** You stratify your data by respiratory syncytial virus status. In children with the virus, the odds ratio for the bacterium-respiratory disease association is two point zero. In children without the virus, the odds ratio is also two point zero. The two stratum-specific odds ratios agree, so you can pool them with the Mantel-Haenszel estimator and get an adjusted odds ratio of two point zero.

**Sarah:** And the comparison tells the story. Crude odds ratio of three point three, adjusted of two point zero. The crude estimate was inflated by the confounding effect of the virus. After adjustment, the true effect of the bacterium on respiratory disease is closer to a doubling than a tripling. That's the Mantel-Haenszel procedure in action.
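The arithmetic behind that example can be sketched as follows. The cell counts are hypothetical, chosen only so that the crude and stratum-specific odds ratios match the numbers quoted above. The lesson's actual data are not given in the transcript.

```python
# Each stratum: (a, b, c, d) = exposed cases, exposed non-cases,
#                              unexposed cases, unexposed non-cases.
# Hypothetical counts chosen to reproduce the odds ratios in the example.
strata = {
    "RSV present": (60, 30, 30, 30),
    "RSV absent": (20, 50, 40, 200),
}


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    tables = list(tables)
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Collapsing over the strata gives the crude two-by-two table.
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)
stratum_ors = {name: odds_ratio(*table) for name, table in strata.items()}
mh_or = mantel_haenszel_or(strata.values())

print("crude OR:", round(crude_or, 1))
for name, value in stratum_ors.items():
    print(name, "OR:", round(value, 1))
print("Mantel-Haenszel OR:", round(mh_or, 1))
```

With these counts the crude odds ratio rounds to 3.3 while both strata and the Mantel-Haenszel summary come out at 2.0. In the made-up table the bacterium is far more common among children with the virus, and the virus raises disease risk on its own, which is exactly the confounder pattern.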

**Kiffer:** And there is a critical preliminary step before you trust the Mantel-Haenszel summary. You have to test whether the stratum-specific odds ratios are approximately equal. That's called testing for homogeneity. If the strata disagree substantially, you have effect modification, also called interaction. The exposure-outcome relationship genuinely differs across levels of the third variable. In that case, a single summary is misleading. You should report stratum-specific estimates separately.

**Sarah:** The lesson is sharp on the difference between confounding and effect modification. Confounding is a bias to be removed. Effect modification is a real biological or social phenomenon to be reported. Same-looking statistical pattern in some respects, very different interpretation.

**Kiffer:** Stratification works beautifully for one or two confounders with a few levels each. It breaks down quickly when you have many confounders, or continuous confounders, because the number of strata explodes and many cells end up empty.

**Sarah:** Method four is multivariable regression. You include all the confounders as covariates in a regression model. The most common form is logistic regression for binary outcomes. The coefficient on the exposure, holding the other variables constant, is the adjusted estimate.

**Kiffer:** This is the workhorse of modern epidemiology. It scales to many confounders. It handles continuous variables. It can incorporate interactions. The price is that you have to specify the model correctly. If you assume linearity when the relationship is curved, or additivity when there's interaction, the regression will give you a confidently wrong answer.

**Sarah:** And the lesson notes a rule of thumb. If the coefficient for your exposure changes by more than thirty percent when you add a putative confounder to the model, that's substantial confounding. But you should be using the change-in-estimate rule alongside diagram-based reasoning, not as a replacement for it.

**Kiffer:** Method five is propensity score methods, the most sophisticated of the five. The idea is to collapse all the measured confounders into a single number called the propensity score.

**Sarah:** Quick definition. The propensity score is the conditional probability of being exposed, given a person's measured covariates. So for each person in your study, you build a model, usually logistic regression, that predicts their probability of being in the exposed group based on age, sex, comorbidities, lab values, whatever you've measured. The output is a single number between zero and one.

**Kiffer:** And here is the magic. If two people have the same propensity score, they were equally likely to end up exposed given everything you measured. So comparing their outcomes is like a small randomized comparison, conditional on the measured covariates.

**Sarah:** There are four ways to use propensity scores. First, propensity score matching. You match each exposed person to one or more unexposed people with similar propensity scores, then compare outcomes within the matched sample.

**Kiffer:** Second, propensity score weighting, often called inverse probability of treatment weighting, or I-P-T-W. We'll get back to this in a minute. You weight each person by the inverse of their probability of receiving the exposure they actually received. The weighted sample behaves as if exposure had been randomly assigned.

**Sarah:** Third, propensity score stratification. You divide the sample into propensity score strata, often quintiles, and compute the effect within each stratum, then pool. Same idea as Mantel-Haenszel stratification but on a one-dimensional summary instead of the original confounders.

**Kiffer:** Fourth, propensity score as a covariate. You just put the propensity score into your outcome regression model. This is less common because it doesn't use the propensity score's full balancing property.
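Of the four uses, weighting is the easiest to show by hand. The sketch below assumes a single binary confounder, so the propensity score can be read straight off the counts instead of being fitted with logistic regression. The cohort and all of its numbers are hypothetical.

```python
# Hypothetical cohort summarized by confounder level z and exposure x:
# counts[(z, x)] = (number of people, number with the outcome).
counts = {
    (1, 1): (80, 40), (1, 0): (20, 12),  # z=1: sicker, mostly exposed
    (0, 1): (20, 2),  (0, 0): (80, 16),  # z=0: healthier, mostly unexposed
}


def propensity(z):
    """P(exposed | z), estimated directly from the counts."""
    n_exposed = counts[(z, 1)][0]
    n_unexposed = counts[(z, 0)][0]
    return n_exposed / (n_exposed + n_unexposed)


def crude_risk(x):
    n = sum(counts[(z, x)][0] for z in (0, 1))
    events = sum(counts[(z, x)][1] for z in (0, 1))
    return events / n


def weighted_risk(x):
    """Outcome risk in exposure group x after inverse probability weighting."""
    weighted_n = weighted_events = 0.0
    for z in (0, 1):
        p = propensity(z)
        # Weight = 1 / probability of the exposure the person actually received.
        weight = 1 / p if x == 1 else 1 / (1 - p)
        n, events = counts[(z, x)]
        weighted_n += weight * n
        weighted_events += weight * events
    return weighted_events / weighted_n


crude_rd = crude_risk(1) - crude_risk(0)
iptw_rd = weighted_risk(1) - weighted_risk(0)
print("crude risk difference:", round(crude_rd, 2))
print("IPTW risk difference: ", round(iptw_rd, 2))
```

In this toy cohort the crude comparison makes the exposure look harmful, with a risk difference of +0.14, because sicker people are exposed more often. After weighting, the risk difference is -0.10, which matches the difference within each level of the confounder.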

**Sarah:** And one important constraint. Propensity score analysis is limited to the region of common support. That's the range of propensity scores where both exposed and unexposed people exist. Outside that range, you have no comparator, and you have to drop those observations or accept that you can't estimate an effect for them.

**Kiffer:** Now let's spend some time on a particular flavor of confounding that is endemic in pharmacoepidemiology. Confounding by indication.

**Sarah:** Pharmacoepidemiology, briefly, is the study of the use and effects of medications in populations. And the central methodological challenge is this. Doctors don't prescribe drugs at random. They prescribe drugs to people who have the indication for the drug, and the indication is itself usually a marker of disease severity, comorbidity, or some other risk factor for outcomes.

**Kiffer:** Walk through the standard example. You want to study whether a particular antidepressant causes increased risk of suicide. You compare people who were prescribed the drug to people who were not, and you find that the prescribed group has a higher rate of suicide.

**Sarah:** And the immediate problem is that the people who got prescribed antidepressants were depressed. Depression is itself a major risk factor for suicide. So the higher suicide rate in the prescribed group might reflect their underlying depression, not the drug. The reason for prescription, the indication, is itself a confounder.

**Kiffer:** Confounding by indication is particularly hard to handle because the indication is often imperfectly measured, and because it's almost always associated with severity, which is also imperfectly measured.

**Sarah:** One partial mitigation is the active comparator design. Instead of comparing treated to untreated, you compare two active treatments that share the same indication. So instead of comparing people on this antidepressant to people not on any antidepressant, you compare them to people on a different antidepressant. Both groups have the indication. Both groups have approximately the same severity profile. The confounding by indication is greatly reduced.

**Kiffer:** Active comparator designs are now considered best practice in many areas of pharmacoepidemiology. They don't fully eliminate confounding, because there can be subtle differences in why one drug versus another gets prescribed, but they handle the bulk of the indication problem.

**Sarah:** Okay. Now let's get into one of the more advanced topics in the lesson. Time-varying confounding.

**Kiffer:** Time-varying confounding arises when a confounder changes over time, and when its values both depend on past treatment and predict future treatment and future outcome. The standard example is treatment of human immunodeficiency virus infection. We'll use H-I-V as the abbreviation from now on.

**Sarah:** Let's set up the example carefully. We're interested in the effect of antiretroviral therapy on mortality in people living with H-I-V. The treatment can be started, stopped, or modified over time. The CD4 cell count, where CD4 is a type of immune cell called a T-helper cell, is a measure of immune function. Lower CD4 count means worse immune function. CD4 is the standard biomarker for monitoring H-I-V disease progression.

**Kiffer:** Here's the structure of the problem. CD4 cell count predicts mortality. Lower CD4 means higher risk of dying. CD4 also predicts treatment decisions. Doctors are more likely to start or intensify therapy when CD4 drops. And critically, past treatment affects current CD4. Effective therapy raises CD4 over time.

**Sarah:** So CD4 is simultaneously a confounder of the treatment-mortality relationship, because it predicts treatment and predicts mortality, and a mediator of the treatment-mortality relationship, because past treatment changes current CD4 and current CD4 affects future mortality.

**Kiffer:** And this is where standard regression breaks down. If you don't adjust for CD4, you have classic confounding. Sicker patients get more treatment, so it looks like treatment is associated with worse outcomes. If you do adjust for CD4 in a standard regression, you block part of the causal pathway through which treatment helps people, because raising CD4 is one of the mechanisms of benefit.

**Sarah:** You're stuck. Adjusting too little leaves confounding. Adjusting in the standard way blocks the mediating pathway. Standard regression cannot resolve the conflict because it can't simultaneously condition on CD4 and not condition on CD4.

**Kiffer:** The solution is the marginal structural model. We'll spell that out. Marginal structural model, or M-S-M. The idea was developed by James Robins, Miguel Hernan, and Babette Brumback in a foundational paper from the year two thousand.

**Sarah:** The marginal structural model works by reweighting your sample so that, in the reweighted pseudo-population, treatment is no longer confounded by past CD4. Specifically, each person's contribution at each time point is weighted by the inverse probability of receiving the treatment they actually received, given their history.

**Kiffer:** Let me spell out that abbreviation. Inverse probability of treatment weighting, or I-P-T-W. Each person at each time point gets a weight equal to one divided by the probability they received their observed treatment, given everything that happened up to that point.

**Sarah:** Conceptually, inverse probability of treatment weighting takes a person who was unlikely to receive their treatment, given their history, and gives them more weight. So they stand in for the people like them who didn't get the same treatment. The weighted pseudo-population looks as if treatment had been randomly assigned at each time point.
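For a single person followed over several visits, the weight described here is a running product. The sketch below assumes the per-visit treatment probabilities are already available. In a real analysis they would come from a fitted treatment model, and many analysts use stabilized weights to keep the values from becoming extreme. The function name and the three-visit history are hypothetical.

```python
from math import prod


def iptw_weight(history):
    """Unstabilized weight: product over visits of 1 / P(observed treatment | past).

    history is a list of (treated, p_treated) pairs, where p_treated is the
    modeled probability of being treated at that visit given everything so far.
    """
    return prod(1 / (p if treated else 1 - p) for treated, p in history)


# One hypothetical person over three visits: treated, treated, then untreated.
person = [(True, 0.8), (True, 0.5), (False, 0.25)]
weight = iptw_weight(person)
print(round(weight, 2))  # 3.33
```

The first visit contributes little because treatment was expected, with a probability of 0.8. The second visit, where treatment was a coin flip, doubles the weight.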

**Kiffer:** And once you have the weighted pseudo-population, you fit a simple marginal model, regressing the outcome on treatment, without conditioning on CD4. Because the weighting has handled the confounding, you don't need to condition on CD4 in the outcome model. So you don't block the mediating pathway.

**Sarah:** The marginal structural model with inverse probability of treatment weighting gives you a valid estimate of the total effect of treatment on mortality, including the pathway through CD4. It's the right tool when a variable is simultaneously a confounder and a mediator.

**Kiffer:** Marginal structural models are now standard in H-I-V research, in many areas of comparative effectiveness research, and increasingly in occupational and environmental epidemiology where exposures are time-varying.

**Sarah:** Before we move on, let's say a few words about a useful taxonomy the lesson lays out. The eight types of extraneous variable relationships.

**Kiffer:** Right, this is the part of the lesson that asks you to slow down and really think about how a third variable can sit in relation to your exposure and your outcome. Not every extraneous variable is a confounder, and the framework gives you a vocabulary for the cases.

**Sarah:** The eight types include things like an exposure-independent variable, which causes the outcome but is not associated with the exposure and so is not a confounder. A simple antecedent, which causes the exposure but only affects the outcome through the exposure, so again not a confounder. An explanatory antecedent, which is the classic confounder pattern. A mediator, which sits on the causal pathway. A distorter, which creates a spurious association where none truly exists. A suppressor, which hides a true association. And a moderator, which is effect modification.

**Kiffer:** And the practical payoff is a decision guide. For each type, the lesson tells you whether adjusting for the variable changes the exposure-outcome estimate, whether there's an exposure-third-variable association, and whether there's a third-variable-outcome association. That triad of indicators lets you walk into messy data and start classifying what role each variable is plausibly playing.

**Sarah:** The big takeaway is that controlling for a variable is not a neutral statistical move. Depending on which of these eight roles the variable is playing, controlling for it could remove bias, introduce bias, mask a true effect, or reveal one that was hidden. The classification has to come first.

**Kiffer:** Okay. Now we have to face the hardest version of the problem. Unmeasured confounding.

**Sarah:** All the methods we've discussed, restriction, matching, stratification, regression, propensity scores, marginal structural models, all of them require that the confounder be measured. None of them help with confounders you didn't collect data on.

**Kiffer:** And in observational research, there is essentially always something you didn't measure. The question becomes, how do you assess the threat that unmeasured confounding poses to your conclusions?

**Sarah:** The first tool is sensitivity analysis using the E-value. Let me describe that in plain words. The E-value is a quantity that asks the following question. How strong would an unmeasured confounder have to be, both in its association with the exposure and in its association with the outcome, to fully explain away the observed effect?

**Kiffer:** If the E-value comes out to be modest, say, around one point five or two, that means a relatively weak unmeasured confounder could explain your finding. Your result is fragile. If the E-value comes out to be large, say, five or ten, that means only a very strong unmeasured confounder could explain your finding. Your result is robust.
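The E-value for a point estimate is a one-line formula, sketched below for an estimate on the risk ratio scale. For a risk ratio above one it is the risk ratio plus the square root of the risk ratio times the risk ratio minus one, and a protective estimate is inverted first. The input values here are hypothetical.

```python
from math import sqrt


def e_value(rr):
    """E-value for a risk ratio point estimate: RR + sqrt(RR * (RR - 1))."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective estimates are inverted before applying the formula
    return rr + sqrt(rr * (rr - 1))


# Hypothetical risk ratios, for illustration only.
for rr in (1.2, 2.0, 0.5):
    print(rr, round(e_value(rr), 2))
```

A risk ratio of 1.2 gives an E-value of about 1.69, which sits in the fragile range just described. A risk ratio of 2.0, or equivalently 0.5, gives about 3.41.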

**Sarah:** The E-value was developed by Tyler VanderWeele and Peng Ding in a two thousand seventeen paper in Annals of Internal Medicine. It has become a standard sensitivity analysis tool in modern epidemiology, partly because the formula is simple and partly because the interpretation is intuitive.

**Kiffer:** One nuance worth flagging. The E-value tells you how strong a single unmeasured confounder would have to be. It assumes the worst-case configuration of that confounder. So if you have a real but modest E-value, you're not guaranteed your finding is wrong. You're being told to think hard about whether a confounder of that strength is plausible given what you know about the field.

**Sarah:** And in practice, when you compare the E-value to the strengths of measured confounders in your study, you can often make a defensible argument. If your strongest measured confounder has associations with the exposure and the outcome of, say, one point five on the risk ratio scale, and the E-value for the residual association is four, it would take an unmeasured confounder more than twice as strong as anything you measured to fully explain the result. That's often enough to call the finding robust, though not certain.

**Kiffer:** The second tool is negative controls. The idea is to find an outcome that your exposure could not plausibly cause, or an exposure that could not plausibly cause your outcome, but that shares the same confounding structure as the relationship you care about. If you observe an association with the negative control, it tells you confounding is operating.

**Sarah:** Here's an example of a negative control outcome. Suppose you find that a particular drug is associated with reduced cardiovascular mortality. You're worried about healthy user bias, the tendency for people who take medications consistently to also be healthier in general. To test this, you look at whether the drug is associated with deaths from car accidents, which the drug couldn't plausibly affect biologically. If you find an association with car accident mortality too, that's a sign of healthy user confounding.

**Kiffer:** Negative control exposures work the other way. You find an exposure that should have no effect on your outcome but is subject to similar confounding. If you observe an association, that's evidence of residual confounding.

**Sarah:** Third tool is triangulation. You compare results across study designs that have different confounding structures. If the same effect estimate comes out of an observational cohort, an instrumental variable analysis, a sibling comparison, and a Mendelian randomization study, you can be more confident the effect is real, because each design has different sources of bias.

**Kiffer:** Triangulation is one of the most powerful tools in modern epidemiology, and it's basically Bradford Hill's old idea of consistency, dressed up in modern causal language. No single study is decisive. The convergence of multiple imperfect designs is what produces strong evidence.

**Sarah:** Let's now turn to the most politically charged part of the lesson. The theoretical critique of how confounding control treats race, gender, and socioeconomic status.

**Kiffer:** The standard practice in epidemiology is to throw race, gender, and socioeconomic status into the regression model as confounders. The intent is to isolate the effect of the primary exposure, holding these social variables constant.

**Sarah:** And the lesson argues that this practice embeds a theoretical claim that is often not defensible. Race, gender, and class are not isolated individual attributes the way height or blood pressure are. They are markers of structural processes. Patterns of who gets what, who lives where, who is treated how.

**Kiffer:** Take race in the context of the United States. Race in the United States is a constructed category that indexes a long history of slavery, segregation, redlining, mass incarceration, and ongoing racial discrimination in employment, housing, and healthcare. The downstream consequences include differences in income, education, neighborhood resources, exposure to pollution, exposure to police violence, and access to healthcare.

**Sarah:** Now imagine you want to study how race relates to a health outcome. The standard regression approach adjusts for income, education, and neighborhood. The intent is to isolate the effect of race, net of these factors.

**Kiffer:** But income, education, and neighborhood are not separate from race. They are mechanisms through which structural racism operates. Adjusting for them is over-adjustment. You are adjusting away the very pathways that connect the structural process to the outcome.
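
*Transcript note:* A minimal arithmetic sketch of the over-adjustment point, in Python. Every probability below is hypothetical and chosen only for illustration, not taken from the lesson. `X` stands for membership in a structurally disadvantaged group, `M` for a downstream mechanism such as low income, and `Y` for a poor health outcome. The sketch assumes there is no other confounding.

```python
# Hypothetical inputs, for illustration only (not from the lesson).
P_M_GIVEN_X = {0: 0.20, 1: 0.60}   # P(M=1 | X=x): the mechanism is more common when X=1
P_Y_GIVEN_XM = {                   # P(Y=1 | X=x, M=m)
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.12, (1, 1): 0.32,
}


def risk(x: int) -> float:
    """P(Y=1 | X=x), averaging over the mechanism as it is distributed in group x."""
    p_m = P_M_GIVEN_X[x]
    return P_Y_GIVEN_XM[(x, 1)] * p_m + P_Y_GIVEN_XM[(x, 0)] * (1 - p_m)


def total_risk_difference() -> float:
    """Unadjusted contrast: includes everything that flows through M."""
    return risk(1) - risk(0)


def adjusted_risk_difference(m: int) -> float:
    """Contrast within one stratum of M, which is what adjusting for M reports."""
    return P_Y_GIVEN_XM[(1, m)] - P_Y_GIVEN_XM[(0, m)]


print(f"total: {total_risk_difference():.2f}")            # total: 0.10
print(f"adjusted, M=0: {adjusted_risk_difference(0):.2f}")  # adjusted, M=0: 0.02
print(f"adjusted, M=1: {adjusted_risk_difference(1):.2f}")  # adjusted, M=1: 0.02
```

With these made-up numbers the total risk difference is 0.10, but within each stratum of the mechanism it is only 0.02. Most of the gap travels through the pathway that adjustment holds fixed.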

**Sarah:** The result is a residual race effect that looks small or statistically non-significant. You then publish a finding that says race doesn't predict the outcome after adjustment. And you've used statistical methods to erase the structural process you should have been describing in the first place.

**Kiffer:** The same logic applies to gender. Gender is not just an individual attribute. It indexes patterns of caregiving expectations, occupational segregation, gendered violence, and gendered medical care. Adjusting for occupation when you're studying gender effects on cardiovascular disease may erase the gendered occupational pathway.

**Sarah:** And the same logic applies to socioeconomic status. Income is not just an individual attribute. It reflects intergenerational wealth transfer, labor market discrimination, neighborhood effects, schooling. Adjusting for current income when studying income effects on health may erase the intergenerational mechanism.

**Kiffer:** The lesson's point is not that you should never adjust for these variables. The point is that the choice to adjust is a theoretical commitment, not a neutral statistical move. You have to ask, what causal structure am I assuming, and is that structure defensible?

**Sarah:** And in many cases, the right move is to estimate the total effect of the structural variable, without adjusting for its mechanisms, and then to use mediation analysis to describe the pathways. That's a different question from the one the standard adjusted analysis answers, but often it's the question you actually care about.
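
*Transcript note:* One common way to "describe the pathways" is to split a total effect into a natural direct and a natural indirect effect. The Python sketch below does that arithmetic for a binary exposure `X`, binary mediator `M`, and binary outcome `Y`. All probabilities are hypothetical, and the decomposition is only valid under strong assumptions that are taken for granted here: no unmeasured confounding of the exposure-outcome, exposure-mediator, or mediator-outcome relationships, and no mediator-outcome confounder that is itself affected by the exposure.

```python
# Hypothetical inputs, for illustration only (not from the lesson).
P_M_GIVEN_X = {0: 0.20, 1: 0.60}   # P(M=1 | X=x)
P_Y_GIVEN_XM = {                   # P(Y=1 | X=x, M=m)
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.12, (1, 1): 0.32,
}


def mean_outcome(x: int, m_from: int) -> float:
    """E[Y] with exposure set to x and M distributed as it would be under X=m_from."""
    p_m = P_M_GIVEN_X[m_from]
    return P_Y_GIVEN_XM[(x, 1)] * p_m + P_Y_GIVEN_XM[(x, 0)] * (1 - p_m)


# Total effect: change the exposure and let the mediator follow.
total_effect = mean_outcome(1, 1) - mean_outcome(0, 0)
# Natural direct effect: change the exposure, hold M at its X=0 distribution.
natural_direct = mean_outcome(1, 0) - mean_outcome(0, 0)
# Natural indirect effect: hold exposure at 1, shift M from its X=0 to its X=1 distribution.
natural_indirect = mean_outcome(1, 1) - mean_outcome(1, 0)
proportion_mediated = natural_indirect / total_effect

print(f"total={total_effect:.2f} direct={natural_direct:.2f} "
      f"indirect={natural_indirect:.2f} mediated={proportion_mediated:.0%}")
```

In this made-up example the direct and indirect pieces add up to the total effect, and about four fifths of it runs through the mediator. That is the kind of statement the total-effect-plus-mediation approach is meant to support.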

**Kiffer:** This connects back to the paradigms discussion earlier. The choice of paradigm shapes what you can see. Treating structural variables as ordinary confounders is a positivist move that abstracts away from social context. A critical theory perspective insists that the social context is the phenomenon, not a nuisance to be statistically removed.

**Sarah:** Okay. Big lesson. Let's pull the takeaways together.

**Kiffer:** First takeaway. Three formal conditions for confounding. Associated with the exposure. Independent risk factor for the outcome. Not on the causal pathway. The third condition is what distinguishes confounders from mediators. Confounders should be adjusted for. Mediators should not, if you want the total effect of the exposure.

**Sarah:** Second takeaway. Two approaches to detecting confounding. The classical change-in-estimate approach with a ten percent threshold is useful but has real limits, including non-collapsibility of the odds ratio and dependence on how the covariates happen to be distributed in your particular sample. The directed acyclic graph approach grounded in subject-matter knowledge is more principled. Draw the structure, identify backdoor paths, block them, and don't condition on mediators or colliders.
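
*Transcript note:* A small Python sketch of why non-collapsibility limits the change-in-estimate rule. The risks below are hypothetical. The covariate `Z` is a risk factor for the outcome but is distributed identically in exposed and unexposed people, so it is not a confounder, yet the crude and stratum-specific odds ratios still differ by more than ten percent. The change is computed relative to the stratum-specific estimate, which is one convention among several.

```python
def odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Odds ratio comparing two risks."""
    return (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))


# Hypothetical P(Y=1 | Z=z, X=x), for illustration only.
RISK = {("z0", 1): 0.8, ("z0", 0): 0.5,
        ("z1", 1): 0.5, ("z1", 0): 0.2}
# Z has the same distribution in exposed and unexposed, so it is not a confounder.
P_Z = {"z0": 0.5, "z1": 0.5}


def marginal_risk(x: int) -> float:
    """P(Y=1 | X=x), averaging over Z."""
    return sum(RISK[(z, x)] * P_Z[z] for z in P_Z)


stratum_or = {z: odds_ratio(RISK[(z, 1)], RISK[(z, 0)]) for z in P_Z}
crude_or = odds_ratio(marginal_risk(1), marginal_risk(0))
relative_change = abs(crude_or - stratum_or["z0"]) / stratum_or["z0"]

# The risk difference, by contrast, is collapsible: same value in strata and overall.
stratum_rd = {z: RISK[(z, 1)] - RISK[(z, 0)] for z in P_Z}
crude_rd = marginal_risk(1) - marginal_risk(0)

print(f"stratum ORs: {stratum_or['z0']:.2f}, {stratum_or['z1']:.2f}")  # 4.00, 4.00
print(f"crude OR: {crude_or:.2f}")                                     # 3.45
print(f"relative change: {relative_change:.1%}")
```

Here the odds ratio moves past the ten percent threshold with no confounding at all, while the risk difference stays at 0.30 in both strata and overall. A change in the odds ratio alone does not establish confounding.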

**Kiffer:** Third takeaway. Five methods of control, in roughly increasing complexity. Restriction. Matching. Stratification with Mantel-Haenszel weighted averages. Multivariable regression. Propensity score methods, including matching, weighting, and stratification on the propensity score. Each has trade-offs. The right choice depends on the number and type of confounders and on your willingness to make modeling assumptions.
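
*Transcript note:* A minimal Python sketch of the Mantel-Haenszel pooled odds ratio for stratified two-by-two tables. The counts are hypothetical and built so that the higher-risk stratum also contains most of the exposed people, which is what makes the crude estimate misleading.

```python
from typing import List, Tuple

# One stratum = (a, b, c, d):
#   a = exposed cases,   b = exposed non-cases,
#   c = unexposed cases, d = unexposed non-cases.
Stratum = Tuple[int, int, int, int]


def mantel_haenszel_or(strata: List[Stratum]) -> float:
    """Pooled OR: sum(a*d/n) / sum(b*c/n), where n is the stratum total."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def crude_or(strata: List[Stratum]) -> float:
    """OR from the single table obtained by ignoring the strata."""
    a, b, c, d = (sum(s[i] for s in strata) for i in range(4))
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only.
strata = [
    (10, 40, 20, 160),  # lower-risk stratum, few exposed; stratum OR = 2
    (75, 75, 20, 40),   # higher-risk stratum, many exposed; stratum OR = 2
]

print(f"crude OR: {crude_or(strata):.2f}")                      # crude OR: 3.70
print(f"Mantel-Haenszel OR: {mantel_haenszel_or(strata):.2f}")  # Mantel-Haenszel OR: 2.00
```

The crude odds ratio of about 3.7 overstates the association. The pooled estimate recovers the stratum-specific value of 2 because each stratum is compared within itself before the results are combined.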

**Sarah:** Fourth takeaway. Confounding by indication is endemic in pharmacoepidemiology. The reason a treatment was prescribed is usually a strong predictor of the outcome. Active comparator designs, comparing two active treatments rather than treated to untreated, partially mitigate this.

**Kiffer:** Fifth takeaway. Time-varying confounding arises when a variable confounds later treatment and is also affected by earlier treatment, so it is both a confounder and a mediator at different time points. CD4 cell count in HIV treatment is the classic example. Standard regression cannot resolve this. Marginal structural models with inverse probability of treatment weighting, from Hernán, Brumback, and Robins in the year two thousand, are the standard solution. The weighting creates a pseudo-population where treatment is unconfounded, and a simple outcome model fitted to that pseudo-population gives you the total effect.
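
*Transcript note:* A deliberately simplified Python sketch of the pseudo-population idea. It uses a single time point, one binary confounder `L`, and unstabilized weights, so it is not the time-varying marginal structural model itself. In the time-varying case the weight for each person is a product of such terms across visits. All counts are hypothetical.

```python
# (L, A) -> (number of people, number with the outcome); hypothetical counts.
COUNTS = {
    (0, 1): (20, 4),   (0, 0): (80, 8),   # L=0: mostly untreated, low risk
    (1, 1): (80, 40),  (1, 0): (20, 8),   # L=1: mostly treated, high risk
}


def propensity(l: int) -> float:
    """P(A=1 | L=l), estimated from the counts."""
    treated = COUNTS[(l, 1)][0]
    untreated = COUNTS[(l, 0)][0]
    return treated / (treated + untreated)


def crude_risk(a: int) -> float:
    """Observed risk in treatment group a, ignoring L."""
    n = sum(cnt for (l, a_obs), (cnt, _) in COUNTS.items() if a_obs == a)
    events = sum(ev for (l, a_obs), (_, ev) in COUNTS.items() if a_obs == a)
    return events / n


def iptw_risk(a: int) -> float:
    """Risk in treatment group a after weighting each person by 1 / P(A=a | L)."""
    weighted_n = 0.0
    weighted_events = 0.0
    for (l, a_obs), (cnt, events) in COUNTS.items():
        if a_obs != a:
            continue
        p = propensity(l)
        weight = 1 / p if a == 1 else 1 / (1 - p)
        weighted_n += weight * cnt
        weighted_events += weight * events
    return weighted_events / weighted_n


print(f"crude risk difference: {crude_risk(1) - crude_risk(0):.2f}")  # 0.28
print(f"IPTW risk difference:  {iptw_risk(1) - iptw_risk(0):.2f}")    # 0.10
```

In the weighted pseudo-population each treatment group has the same mix of `L`, so the contrast of 0.10 matches the within-stratum differences rather than the confounded crude value of 0.28.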

**Sarah:** Sixth takeaway. Unmeasured confounding never goes fully away. Sensitivity analysis with the E-value tells you how strong an unmeasured confounder would have to be to explain your result. Negative control exposures and outcomes test for residual confounding through associations that should be null. Triangulation across study designs with different confounding structures is the strongest evidence pattern observational research can produce.
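
*Transcript note:* A short Python sketch of the E-value calculation for a point estimate on the risk ratio scale, using the VanderWeele and Ding formula: risk ratio plus the square root of the risk ratio times the risk ratio minus one. The function name is illustrative. Estimates below one are inverted first, and applying the formula to odds ratios or hazard ratios needs extra approximations that are not shown here.

```python
import math


def e_value(risk_ratio: float) -> float:
    """E-value for an observed risk ratio (point estimate only)."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective estimates are inverted so the formula is applied to RR >= 1.
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1))


print(f"{e_value(2.0):.2f}")  # 3.41
print(f"{e_value(0.5):.2f}")  # 3.41
print(f"{e_value(1.0):.2f}")  # 1.00
```

An observed risk ratio of 2 gives an E-value of about 3.41. An unmeasured confounder would need risk-ratio associations of at least that size with both the exposure and the outcome to fully explain the result away.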

**Kiffer:** And seventh takeaway. The theoretical critique. Treating race, gender, and socioeconomic status as ordinary confounders embeds a theoretical claim that may not be defensible. These variables index structural processes, not isolated individual attributes. Adjusting for downstream consequences like income, education, or neighborhood when studying race effects can be over-adjustment that erases the very mechanism you should be studying. The choice to adjust is a theoretical commitment, not a neutral statistical move.

**Sarah:** And one practical recommendation to leave students with. When you are reading a paper and the authors report an adjusted estimate, pause and ask three questions. Did they justify the adjustment set with a directed acyclic graph or with subject-matter reasoning, or did they just throw variables in? Did they distinguish confounders from mediators? And if structural variables are involved, did they think about whether the adjustment is erasing the mechanism?

**Kiffer:** Those three questions, asked of every adjusted analysis, will dramatically improve your ability to read epidemiology critically.

**Sarah:** And that brings the course to a close. We've built a full toolkit this term: causal concepts and sampling, measurement and study design, measures of disease frequency and association, validity, and now confounding and causal inference. You have the conceptual machinery to read, design, and critique epidemiologic research at a professional level.

**Kiffer:** And the natural next step is the follow-on material, Exploratory Data Analysis for Epidemiology. That's where you take this conceptual toolkit and put your hands on real data. Twelve lessons that move you from a clean dataset to a defensible analysis, with R, descriptive methods, visualization, regression modelling, and your own term-long applied project.

**Sarah:** If you're continuing on with us into the analytic-methods material, we'll see you there. If this is where your epidemiology journey pauses for now, you're leaving with the most important skill in the field. The ability to ask what could be confounding this, and to demand a serious answer.

**Kiffer:** Take care, everyone.

**Sarah:** Thanks for spending the term with us.
