HSCI 341 — Lesson 16

Confounding: Detection and Control

Fundamental Epidemiological Concepts and Approaches

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Apply criteria to identify potential confounders in observational studies
  • Use restricted sampling and matching to prevent confounding
  • Implement matching in both cohort and case-control study designs
  • Use causal diagrams (DAGs) to identify confounders needing control
  • Apply stratified analysis (Mantel-Haenszel) to control confounding and assess interaction
  • Understand propensity scores, instrumental variables, and marginal structural models
  • Evaluate the potential of unmeasured confounders using sensitivity analysis
  • Interpret the effects of controlling different types of extraneous variables

This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Section 1 of 5

Introduction & Pre-Analysis Control of Confounding

⏱ Estimated reading time: 25 minutes

13.1 Introduction

A central focus of epidemiological research is to identify factors that contribute to the occurrence of disease. Randomised controlled trials (RCTs) provide a probabilistic basis for balancing factors between groups. However, in observational studies we cannot randomly assign exposures, so confounding is always a concern.

What Is Confounding?

Confounding can be described as the mixing together of the effects of 2 or more factors. When confounding is present, we might think we are measuring the association between an exposure and an outcome, but the observed measure also includes the effects of one or more extraneous factors. These extraneous factors that produce the bias are called confounders or confounding factors.

13.1.1 Which Extraneous Factors Are Confounders?

A factor is a confounder if:

  1. It is a cause of the disease, or a surrogate for a cause, and
  2. It precedes and is associated with the exposure in the source population, and
  3. Its distribution across exposure levels is not determined by the exposure (i.e., it is not an intervening factor) or by the disease (i.e., it is not a result of the disease)

Important Distinction

Population confounder: known or regularly reported to be a confounder in the target population — should be controlled regardless of sample data.

Sample confounder: appears to be a confounder in the study data but may not truly be one in the population. We should not control for it unless there is substantive evidence.

Example 13.1: A Demonstration of Confounding

Investigating the relationship between Streptococcus pneumoniae (STREP) and childhood respiratory disease (CRD), with RSV (respiratory syncytial virus) as a potential confounder:

Outcome | STREP+ | STREP− | OR
CRD+ | 240 | 40 | 3.3 (crude)
CRD− | 6260 | 3460 |

When stratified by RSV status, the stratum-specific ORs are both 2.0, while the crude OR is 3.3. The >30% difference indicates confounding by RSV is present. The stratum-specific OR of 2.0 is the best estimate of the causal association.
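
The crude OR in Example 13.1 can be checked directly from the table. The sketch below is only an arithmetic illustration: the function name `odds_ratio` is made up for this example, and the stratum-specific OR of 2.0 is taken from the text rather than recomputed, because the stratified counts are not shown here.

```python
def odds_ratio(exp_cases, unexp_cases, exp_noncases, unexp_noncases):
    """Odds ratio from the four cells of a 2x2 table."""
    return (exp_cases * unexp_noncases) / (unexp_cases * exp_noncases)

# Cell counts from the Example 13.1 table (CRD by STREP status).
crude_or = odds_ratio(exp_cases=240, unexp_cases=40,
                      exp_noncases=6260, unexp_noncases=3460)

# Stratum-specific OR as reported in the example (not recomputed here).
stratum_or = 2.0

# Relative difference, using the crude (unadjusted) value as the baseline.
relative_change = (crude_or - stratum_or) / crude_or

print(f"crude OR = {crude_or:.2f}")
print(f"relative change from crude = {relative_change:.0%}")
```

The crude OR comes out at about 3.3, and the drop to 2.0 is roughly 40% of the crude value, which is why the example treats RSV as a confounder.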

13.2 Control of Confounding Prior to Data Analysis

We can prevent and control confounding using three general procedures:

  • Exclusion (restricted sampling)
  • Matching
  • Analytic control

13.3 Matching on Confounders

In a cohort study, matching makes the exposure independent of the matched extraneous variable so there can be no confounding. The matched variable(s) can still exert an effect on the outcome, but it has the same effect in both exposure groups.

Because the outcome (e.g., disease) has not happened at the time of matching, the matching process is independent of the outcome. No analytical control of the matched confounder is necessary, and there is no bias in the summary table.

In case-control studies, the disease has already occurred when matching takes place. Matching will actually introduce a selection bias. The stronger the exposure-confounder association, the greater the bias (generally toward the null).

This bias must be controlled by stratified or matched analysis — the matched variable(s) must be included in the analytical approach.

Overmatching

Do not match unless you are certain the variable is a confounder. Matching on a variable strongly associated with exposure but not a confounder leads to overmatching — giving the distribution of exposure in controls greater similarity to cases than in the source population, which can reduce precision.

Frequency vs. Pair Matching

Feature | Frequency Matching | Pair Matching
Method | Overall distribution made equal | Individual-level matching (1:m)
Analysis | Stratified (MH procedure) | Matched-pair analysis (McNemar’s test)
Interaction | Can assess interaction | Difficult to assess interaction
Best when | Confounder has few levels | Many variables or refined categories
Control-to-case ratio | Variable | Fixed (1:1, 1:4, etc.); minimal gain beyond 4:1

Analysing Matched Data

For pair-matched data in a case-control study with 1:1 matching, we analyse the four possible exposure patterns. Only the discordant pairs (case exposed/control unexposed, or case unexposed/control exposed) contribute information:

Eq 13.2 & 13.3: Matched OR and McNemar’s Test
$OR_{match} = u / v$

McNemar’s $\chi^2 = (u - v)^2 / (u + v)$

where u = pairs where the case is exposed and the control is not, and v = pairs where the case is not exposed and the control is.
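
As a minimal sketch of Eqs 13.2 and 13.3, the snippet below computes the matched OR and McNemar’s χ² from the two discordant-pair counts. The counts `u = 30` and `v = 10` are hypothetical, and the p-value line is an addition not given in the lesson: it uses the standard identity that the upper tail of a χ² distribution with 1 degree of freedom equals `erfc(sqrt(x / 2))`.

```python
import math

def matched_or(u, v):
    """Matched odds ratio: ratio of the two discordant-pair counts."""
    return u / v

def mcnemar_chi2(u, v):
    """McNemar's chi-square statistic (1 degree of freedom)."""
    return (u - v) ** 2 / (u + v)

# Hypothetical discordant pairs:
# u = case exposed / control unexposed, v = case unexposed / control exposed.
u, v = 30, 10

or_match = matched_or(u, v)
chi2 = mcnemar_chi2(u, v)

# Upper-tail probability for chi-square with 1 df (assumption: large-sample test).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"matched OR = {or_match:.2f}, chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Concordant pairs never enter either formula, which is the point made above: only the discordant pairs carry information about the exposure-disease association.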

Reflection

Why does matching in case-control studies introduce selection bias while matching in cohort studies does not? Think about the timing of when disease occurs relative to the matching process.

Knowledge Check: Section 1

1. Which of the following is not a criterion for a factor to be a confounder?

Statistical significance is not a criterion for confounding. A factor is a confounder based on its causal relationships — it must be a cause of the disease (or surrogate), associated with the exposure, and not an intervening variable. Statistical criteria alone are insufficient for identifying confounders.

2. In a case-control study, matching on a confounder:

In case-control studies, disease has already occurred when matching takes place. Matching alters the exposure distribution in controls to resemble that in cases, introducing selection bias (generally toward the null). This must be corrected through stratified or matched analysis.

3. McNemar’s test is used for:

McNemar’s test is equivalent to the Mantel-Haenszel χ² test for 1:1 pair-matched case-control data. It uses only the discordant pairs to test whether the odds of exposure differ significantly between cases and controls.

Section 2 of 5

Detection of Confounding & Stratified Analysis

⏱ Estimated reading time: 25 minutes

13.4 Detection of Confounding

13.4.1 Using Causal Diagrams (DAGs)

Identifying which potential confounders need to be controlled can be accomplished using directed acyclic graphs (DAGs). The process:

  1. Draw the diagram using the principles from Chapter 1
  2. Eliminate all arrows emanating from the exposure factor of interest (CIG)
  3. If any paths still connect the exposure to the outcome, the causally prior factors and non-intervening variables on these paths must be controlled
  4. Connect marginally independent factors that become conditionally associated when a common effect is controlled (shown as a dashed line)

Example 13.4: DAG for Smoking & Birth Weight

In studying the effect of cigarette smoking (CIG) on birth weight (BWT), with RACE, COLLEGE, TBO (total birth order), and WTGAIN as additional factors:

  • After removing direct causal arrows from CIG, the path from CIG to BWT through WTGAIN remains — but WTGAIN is an intervening variable and should not be controlled
  • TBO needs to be controlled (causal path from TBO to CIG)
  • Controlling TBO makes COLLEGE and RACE conditionally associated — either (or both) must be controlled to break the remaining pathway

13.4.2 Change in Measure of Association

A practical approach: compare the crude OR (ORc) with the adjusted OR (ORa) obtained after stratification. If the change exceeds 20–30%, confounding is considered important.

Three Important Notes
  • Always use the unadjusted values as the baseline when computing % change
  • For ratio measures (OR), compute % change on the log scale (% change in lnOR)
  • Apply the % change criterion only to statistically significant variables; non-significant variables with lnOR ≈ 0 can have very large % changes
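
The first two notes can be sketched as a pair of small functions. The names `pct_change` and `pct_change_log` are made up for this illustration, and the inputs (crude OR 3.3, adjusted OR 2.0) are the rounded values from Example 13.1.

```python
import math

def pct_change(crude, adjusted):
    """Percent change relative to the crude (unadjusted) value."""
    return 100 * (adjusted - crude) / crude

def pct_change_log(crude_or, adjusted_or):
    """Percent change in lnOR, again with the crude value as baseline.

    Undefined when the crude OR is exactly 1 (lnOR = 0), and unstable
    when it is close to 1, which is the situation the third note warns about.
    """
    ln_crude = math.log(crude_or)
    return 100 * (math.log(adjusted_or) - ln_crude) / ln_crude

crude_or, adjusted_or = 3.3, 2.0  # rounded values from Example 13.1

print(f"change in OR:   {pct_change(crude_or, adjusted_or):.1f}%")
print(f"change in lnOR: {pct_change_log(crude_or, adjusted_or):.1f}%")
```

For these inputs the two scales give similar answers (about a 39% and a 42% reduction), but they can diverge noticeably for ORs closer to 1.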

Non-Collapsibility of Odds Ratios

The odds ratio is not always collapsible — even in the absence of confounding, the crude OR can differ from the stratum-specific ORs. This typically occurs when outcome frequency is high. A >20–30% change in OR might look like confounding but could simply be non-collapsibility.

13.5 Analytic Control: The Mantel-Haenszel Estimator

The Mantel-Haenszel (MH) procedure is the most widely used stratified analytic approach. It involves physically stratifying data by levels of the confounder(s), examining stratum-specific ORs, and computing a pooled ‘adjusted’ estimate.

Key Formulae

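Eq 13.4: Stratum-Specific OR
$OR_j = \dfrac{a_{1j} \, b_{0j}}{a_{0j} \, b_{1j}}$

Eq 13.7: Mantel-Haenszel Adjusted OR
$OR_{MH} = \dfrac{\sum_j a_{1j} b_{0j} / n_j}{\sum_j a_{0j} b_{1j} / n_j}$

Eq 13.8: Wald Test for Homogeneity
$\chi^2_{homo} = \sum_j \dfrac{(\ln OR_j - \ln OR_{MH})^2}{\mathrm{var}(\ln OR_j)}$

Eq 13.9: Overall Test (is $OR_{MH} = 1$?)
$\chi^2_{MH} = \dfrac{\left(\sum_j a_{1j} - \sum_j E_j\right)^2}{\sum_j V_j}$

Here, in stratum j, $a_{1j}$ and $a_{0j}$ are the exposed and unexposed cases, $b_{1j}$ and $b_{0j}$ are the exposed and unexposed non-cases, and $n_j$ is the stratum total; $E_j$ and $V_j$ are the expected value and variance of $a_{1j}$ under the null hypothesis.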
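
The four formulae above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the two strata are hypothetical counts, each tuple is read as (exposed cases, unexposed cases, exposed non-cases, unexposed non-cases), var(lnOR) uses the usual Woolf approximation (sum of reciprocal cell counts), and the expected value and variance in the overall test use the standard hypergeometric forms, none of which are spelled out in this lesson.

```python
import math

# Hypothetical strata: (a1, a0, b1, b0) =
# (exposed cases, unexposed cases, exposed non-cases, unexposed non-cases)
strata = [
    (40, 10, 20, 10),
    (10, 20, 30, 120),
]

def stratum_or(a1, a0, b1, b0):
    """Eq 13.4: stratum-specific odds ratio."""
    return (a1 * b0) / (a0 * b1)

def crude_or(strata):
    """OR from the table collapsed over all strata."""
    a1, a0, b1, b0 = (sum(col) for col in zip(*strata))
    return stratum_or(a1, a0, b1, b0)

def mantel_haenszel_or(strata):
    """Eq 13.7: pooled (adjusted) odds ratio."""
    num = sum(a1 * b0 / (a1 + a0 + b1 + b0) for a1, a0, b1, b0 in strata)
    den = sum(a0 * b1 / (a1 + a0 + b1 + b0) for a1, a0, b1, b0 in strata)
    return num / den

def wald_homogeneity(strata):
    """Eq 13.8: Wald chi-square for homogeneity (df = number of strata - 1)."""
    ln_or_mh = math.log(mantel_haenszel_or(strata))
    chi2 = 0.0
    for a1, a0, b1, b0 in strata:
        var_j = 1 / a1 + 1 / a0 + 1 / b1 + 1 / b0  # Woolf approximation (assumption)
        chi2 += (math.log(stratum_or(a1, a0, b1, b0)) - ln_or_mh) ** 2 / var_j
    return chi2

def mantel_haenszel_chi2(strata):
    """Eq 13.9: overall test of OR_MH = 1 (1 df), hypergeometric E_j and V_j."""
    sum_a1 = sum_e = sum_v = 0.0
    for a1, a0, b1, b0 in strata:
        n = a1 + a0 + b1 + b0
        cases, noncases = a1 + a0, b1 + b0
        exposed, unexposed = a1 + b1, a0 + b0
        sum_a1 += a1
        sum_e += cases * exposed / n
        sum_v += cases * noncases * exposed * unexposed / (n ** 2 * (n - 1))
    return (sum_a1 - sum_e) ** 2 / sum_v

print("stratum ORs:", [round(stratum_or(*s), 2) for s in strata])
print("crude OR:", round(crude_or(strata), 2))
print("MH OR:", round(mantel_haenszel_or(strata), 2))
print("homogeneity chi2:", round(wald_homogeneity(strata), 4))
print("MH chi2:", round(mantel_haenszel_chi2(strata), 2))
```

In this made-up data both stratum-specific ORs are 2.0, so the homogeneity statistic is essentially zero and the pooled OR of 2.0 is a fair summary, while the crude OR (about 4.3) is inflated by the stratifying variable.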

13.5.2 Interaction

Interaction occurs when the combined effect of 2 variables differs from the sum (or product) of their individual effects. There are 3 types of joint effects:

  • Additive scale
  • Multiplicative scale
  • Synergism and antagonism

Key Rule: When Interaction Is Present

When stratum-specific measures differ significantly (interaction is present), we should not compute a single summary ORMH. Instead, we must report stratum-specific estimates because the effect of the exposure depends on the level of the other variable. This phenomenon is also called effect modification.
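
The difference between the additive and multiplicative scales can be illustrated with a small sketch. The four risks below are hypothetical, and the subscript convention (RR10 for the first factor alone, RR01 for the second alone, RR11 for both) is assumed for this example. Interaction is flagged on the multiplicative scale when RR10 × RR01 ≠ RR11, and on the additive scale when RD10 + RD01 ≠ RD11.

```python
import math

# Hypothetical risks indexed by (exposure E, second factor F).
risk = {(0, 0): 0.01, (1, 0): 0.02, (0, 1): 0.03, (1, 1): 0.06}

baseline = risk[(0, 0)]
groups = [(1, 0), (0, 1), (1, 1)]

# Risk ratios and risk differences relative to the doubly unexposed group.
rr10, rr01, rr11 = (risk[g] / baseline for g in groups)
rd10, rd01, rd11 = (risk[g] - baseline for g in groups)

# math.isclose guards against floating-point noise in the comparisons.
multiplicative_interaction = not math.isclose(rr10 * rr01, rr11)
additive_interaction = not math.isclose(rd10 + rd01, rd11)

print("multiplicative interaction:", multiplicative_interaction)
print("additive interaction:", additive_interaction)
```

With these numbers the joint RR equals the product of the individual RRs, so there is no interaction on the multiplicative scale, yet the joint RD exceeds the sum of the individual RDs. The same data can therefore show interaction on one scale and not the other.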

Reflection

Consider a study where the crude OR is 1.69 and the Mantel-Haenszel adjusted OR is 1.97 (a 17% change). Would you consider this sufficient evidence of confounding? What factors would influence your decision?

Knowledge Check: Section 2

1. In the Mantel-Haenszel procedure, before interpreting ORMH, you should first:

Before interpreting ORMH as a valid summary measure, you must confirm that the stratum-specific ORs are approximately equal (homogeneous). If they differ significantly, interaction is present and a single summary measure is misleading.

2. Non-collapsibility of the odds ratio means that:

Non-collapsibility is a mathematical property of the OR: when collapsed (marginalised) over strata, the crude OR can differ from the stratum-specific ORs even when no confounding is present. This is especially problematic when outcome frequency is high.

3. In the context of interaction, if RR10 × RR01 ≠ RR11, this indicates interaction on the:

When the product of individual relative risks does not equal the joint relative risk (RR10 × RR01 ≠ RR11), this is the definition of interaction on the multiplicative scale. Additive interaction uses the risk difference: RD10 + RD01 ≠ RD11.

Section 3 of 5

Alternative Methods & Propensity Scores

⏱ Estimated reading time: 25 minutes

13.6 Multivariable Modelling

The most commonly used analytical method for controlling confounding is to include confounders in a multivariable model (e.g., logistic regression). The effect of the exposure is estimated while holding other factors constant.

Rule of Thumb

If the coefficient for a predictor changes by >30% when a putative confounder is added to the model, then substantial confounding exists. Note that the ‘adjusted’ measures from multivariable models are direct causal effects only, not total causal effects.

13.7 Other Approaches to Control Confounding

13.7.1 Standardised Risk Ratios (SRR)

Standardisation uses stratum-specific risks applied to a standard population. The SRR compares observed vs. expected number of cases:

Standardised Risk Ratio
SRR = (observed cases) / (expected cases using standard rates)

Unlike the MH estimator, the SRR provides a valid summary even in the presence of interaction, because the population of interest is specified. The SRR is a non-parametric method based on physical stratification.
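
A minimal sketch of the observed-versus-expected calculation follows. The cohort counts are hypothetical, and one specific choice is assumed: the exposed group serves as the standard population, so the expected count is the number of cases the exposed would have had at the unexposed stratum-specific risks. Other choices of standard population are possible and give a different SRR.

```python
# Hypothetical cohort stratified by a confounder.
# Each stratum: (exposed cases, exposed total, unexposed cases, unexposed total)
strata = [
    (40, 60, 10, 20),
    (10, 40, 20, 140),
]

# Observed cases among the exposed.
observed = sum(exp_cases for exp_cases, _, _, _ in strata)

# Expected cases among the exposed if they had experienced the
# unexposed stratum-specific risks (exposed group as the standard).
expected = sum(exp_total * (unexp_cases / unexp_total)
               for _, exp_total, unexp_cases, unexp_total in strata)

srr = observed / expected

print(f"observed = {observed}, expected = {expected:.1f}, SRR = {srr:.2f}")
```

Because the standard population is fixed in advance, the SRR stays interpretable even when the stratum-specific risk ratios differ, as they do here (about 1.33 and 1.75).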

13.7.2 Marginal Structural Models

The marginal structural model uses weights to create an unconfounded pseudo-population from which the causal effect can be estimated using a crude (marginal) measure.

The weight assigned to each subject is the inverse probability of treatment weight (IPTW): WT = 1/pE, where pE = p(E=e|C) is the conditional probability of the observed exposure given confounders.

The total pseudo-population is twice the size of the observed population and contains information on the counterfactual outcome. The IPTW estimate is equivalent to the SRRtot estimate.
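
The weighting idea can be sketched with grouped data. The cohort below is hypothetical, with one binary confounder, and p(E=e|C) is estimated simply as the observed proportion in each confounder stratum, which is an assumption suited to this tiny example rather than a general recipe.

```python
# Hypothetical cohort: (confounder level, exposed 0/1, cases, total)
groups = [
    ("C=1", 1, 40, 60),
    ("C=1", 0, 10, 20),
    ("C=0", 1, 10, 40),
    ("C=0", 0, 20, 140),
]

# Stratum sizes, needed to estimate p(E=e | C).
stratum_n = {}
for level, _, _, total in groups:
    stratum_n[level] = stratum_n.get(level, 0) + total

pseudo_cases = {0: 0.0, 1: 0.0}
pseudo_total = {0: 0.0, 1: 0.0}
for level, exposed, cases, total in groups:
    p_e = total / stratum_n[level]  # probability of the observed exposure given C
    weight = 1 / p_e                # inverse probability of treatment weight
    pseudo_cases[exposed] += weight * cases
    pseudo_total[exposed] += weight * total

observed_n = sum(stratum_n.values())
pseudo_n = pseudo_total[0] + pseudo_total[1]

# Crude (marginal) risk ratio in the weighted pseudo-population.
risk_exposed = pseudo_cases[1] / pseudo_total[1]
risk_unexposed = pseudo_cases[0] / pseudo_total[0]
marginal_rr = risk_exposed / risk_unexposed

print(f"observed N = {observed_n}, pseudo-population N = {pseudo_n:.0f}")
print(f"marginal RR in pseudo-population = {marginal_rr:.2f}")
```

Each exposure group is weighted up to the size of the whole cohort (260 here), which is why the pseudo-population has 520 members, twice the observed total.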

13.7.3 Instrumental Variables

An instrumental variable (IV) Z must meet 3 requirements:

  1. It has a direct causal effect on the exposure E
  2. It is unrelated to the outcome D except through E
  3. It shares no common causes with the outcome

The true causal effect (TCE) is estimated as:

True Causal Effect via IV
TCE = [p(D+|Z=1) − p(D+|Z=0)] / [p(E+|Z=1) − p(E+|Z=0)]
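
The ratio above, often called the Wald estimator for a binary instrument, can be written as a short function. The function name and the four input probabilities are hypothetical and chosen only to show the arithmetic.

```python
def iv_wald_estimate(p_d_z1, p_d_z0, p_e_z1, p_e_z0):
    """Ratio of the Z-D risk difference to the Z-E risk difference."""
    denominator = p_e_z1 - p_e_z0
    if denominator == 0:
        # The instrument must change the probability of exposure.
        raise ValueError("Z has no effect on E; the estimate is undefined")
    return (p_d_z1 - p_d_z0) / denominator

# Hypothetical inputs: disease risk and exposure probability by level of Z.
tce = iv_wald_estimate(p_d_z1=0.30, p_d_z0=0.20, p_e_z1=0.70, p_e_z0=0.20)

print(f"estimated causal risk difference = {tce:.2f}")
```

A weak instrument (a small difference in exposure probability between levels of Z) makes the denominator small, so minor errors in the numerator are magnified.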

The key advantage: we do not need to condition on confounders C, so a valid IV can handle confounders that were never measured. However, the result is only as good as the instrument, and finding a valid IV in observational studies is very difficult.

13.8 Propensity Scores

A propensity score (PS) is the conditional probability of being treated/exposed given measured covariates: p(E+|C). Propensity scores condense multiple confounders into a single scalar summary.

13.8.1 Computing Propensity Scores

With 1–2 categorical confounders, PSs can be calculated manually. With more confounders, use a logit or probit model predicting treatment (exposure) allocation as the outcome. Include all potential confounders (known or suspected) and their interactions.
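
For the manual case, the propensity score is just the proportion exposed within each level of the confounder. The sketch below uses a hypothetical list of subjects with one categorical confounder; the level names are invented.

```python
from collections import defaultdict

# Hypothetical subjects: (confounder level, exposed 0/1)
subjects = [
    ("young", 1), ("young", 0), ("young", 0), ("young", 0),
    ("old", 1), ("old", 1), ("old", 1), ("old", 0),
]

n_exposed = defaultdict(int)
n_total = defaultdict(int)
for level, exposed in subjects:
    n_exposed[level] += exposed
    n_total[level] += 1

# Propensity score p(E+ | C): proportion exposed at each confounder level.
propensity = {level: n_exposed[level] / n_total[level] for level in n_total}

print(propensity)
```

With many confounders the cells become too sparse for this tabulation, which is where a logit or probit model for exposure takes over.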

13.8.2 Balancing Exposure Groups

A study is balanced if: (1) the average PS value is the same in exposed and non-exposed within each PS stratum, and (2) the mean of all covariates making up the PS is equal across groups within each stratum.

Analysis is limited to the region of common support — observations falling in the range of PSs that includes both exposed and non-exposed individuals.
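
One simple way to find the region of common support is to take the overlap of the PS ranges in the two exposure groups. This min/max rule is an assumption for the sketch (other trimming rules exist), and the scores are hypothetical.

```python
# Hypothetical (propensity score, exposed 0/1) pairs.
scores = [
    (0.05, 0), (0.20, 0), (0.35, 1), (0.40, 0),
    (0.60, 1), (0.70, 0), (0.90, 1),
]

ps_exposed = [ps for ps, exposed in scores if exposed == 1]
ps_unexposed = [ps for ps, exposed in scores if exposed == 0]

# Overlap of the two PS ranges.
lower = max(min(ps_exposed), min(ps_unexposed))
upper = min(max(ps_exposed), max(ps_unexposed))

# Keep only observations that fall inside the overlap.
in_support = [(ps, exposed) for ps, exposed in scores if lower <= ps <= upper]

print(f"common support: [{lower}, {upper}]")
print(f"kept {len(in_support)} of {len(scores)} observations")
```

The two lowest-scoring unexposed subjects and the highest-scoring exposed subject are dropped because no one in the other group has a comparable score.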

13.8.3–6 Using Propensity Scores

PSs can be used in four ways:

Method | Description
Matching | Match exposed to non-exposed with similar PSs. Methods: nearest-neighbour, radius, kernel matching
Stratification | Divide into PS strata (blocks); compute att within each stratum and pool
Covariate in model | Include PS as a continuous or categorical variable in the regression model
Weighting (IPTW) | Weight observations by inverse of PS to create pseudo-population

The most common effect measure with PS methods is the average treatment effect in the treated (att): the difference in outcome between treated (exposed) and non-treated (non-exposed) groups.

Reflection

Compare the propensity score approach to traditional multivariable regression for controlling confounding. In what situations might propensity scores be preferable, and what are their limitations?

Knowledge Check: Section 3

1. A propensity score is best described as:

A propensity score is defined as the conditional probability of being treated/exposed given measured covariates: p(E+|C). It condenses multiple confounders into a single summary scalar that can be used for matching, stratification, weighting, or as a model covariate.

2. An instrumental variable must satisfy all of the following except:

A valid instrumental variable must NOT be directly associated with the outcome — it should only affect the outcome through its effect on the exposure. If the IV is directly associated with the outcome, it violates the exclusion restriction and the causal estimate will be biased.

3. The “region of common support” in propensity score analysis refers to:

The region of common support includes PS values where both exposed and non-exposed individuals exist. Observations outside this range are excluded because they have no valid comparators, making causal inference impossible for those individuals.

Section 4 of 5

Unmeasured Confounders & Causal Relationships

⏱ Estimated reading time: 25 minutes

13.9 Unmeasured / Unknown Confounders

The Hidden Threat: All methods discussed so far — restriction, matching, stratification, multivariable modelling, propensity scores — require that confounders be measured. But what if a critical confounder was never collected, or is entirely unknown? Residual confounding from unmeasured variables is one of the most important limitations of observational research.

External Adjustment

When a confounder was not measured in the study but information about its distribution exists from external sources, we can estimate what the adjusted measure would have been using external adjustment. The method works by estimating the cell values that would be expected if the confounder had been measured.

Equation 13.12: External Adjustment Cell Estimation
For a 2×2 table stratified by an unmeasured binary confounder Z:

$a_1 = a \times p_1 \qquad b_1 = b \times p_2 \qquad c_1 = a - a_1 \qquad d_1 = b - b_1$

where $p_1$ = prevalence of Z among exposed cases and $p_2$ = prevalence of Z among unexposed cases.

The Mantel-Haenszel OR is then calculated from these estimated strata.

Example 13.13 — External Adjustment for STREP-CRD

In our STREP-CRD study, suppose RSV status was not measured. From external data we know:

  • Among exposed (STREP+) cases: 40% have RSV → p1 = 0.40
  • Among unexposed (STREP−) cases: 10% have RSV → p2 = 0.10

Using the crude data (a = 70, b = 30, c = 90, d = 210):

Stratum | a | b | c | d
RSV+ (estimated) | 28 | 3 | 42 | 27
RSV− (estimated) | 42 | 27 | 48 | 183

The MH OR from these estimated strata approximates the adjusted OR, illustrating how external information can help address unmeasured confounding — though with important assumptions about the accuracy of the external prevalence data.

Sensitivity Analysis

When no external data are available, sensitivity analysis explores how strong an unmeasured confounder would need to be to explain away an observed association. This does not eliminate confounding but quantifies the threat it poses to the study’s conclusions.

Key Question: “Could an unmeasured confounder plausibly be strong enough to account for the observed association?” If the required confounder-disease association or confounder-exposure prevalence difference is implausibly large, the finding is more robust.

Example 13.14 — Sensitivity Analysis for Unmeasured Confounding

Suppose we observe a crude OR = 5.44 for the STREP-CRD association. We suspect an unmeasured confounder Z might exist.

We systematically vary two parameters:

  1. Prevalence difference of Z between exposed and unexposed groups
  2. Strength of Z-disease association (ORZD)

ORZD | p1=0.4, p2=0.1 | p1=0.6, p2=0.1 | p1=0.8, p2=0.1
2.0 | 4.68 | 4.07 | 3.44
5.0 | 3.51 | 2.50 | 1.64
10.0 | 2.67 | 1.63 | 0.91

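Even with a moderately strong unmeasured confounder (ORZD = 5, prevalence difference of 30 percentage points), the adjusted OR is 3.51, which suggests the STREP-CRD association is reasonably robust to unmeasured confounding. Only the most extreme scenario in the table (ORZD = 10, p1 = 0.8, p2 = 0.1) pulls the adjusted OR below 1 (0.91).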

Reflect: Unmeasured Confounding

Think about a published observational study you have encountered (or one from class). What unmeasured confounders might threaten its conclusions? How could sensitivity analysis help evaluate the robustness of its findings?

13.10 Understanding Causal Relationships with Extraneous Variables

The relationship between exposure (E), disease (D), and an extraneous variable (F) can take many forms. Understanding these patterns is critical for correctly interpreting what happens when you “control for” a variable.

Three Statistical Indicators: For each type of E-F-D relationship, we can predict:
1. Whether E-D association changes after controlling for F
2. Whether there is an F-D association
3. Whether there is an E-F association

Eight Types of Extraneous Variable Relationships

1. Exposure-Independent Variable

F → D   (no E-F link)

F causes D independently of E. Controlling for F does not change the E-D measure. There is an F-D association but no E-F association. F is not a confounder.

2. Simple Antecedent

F → E → D

F causes E, which causes D. Controlling for F does not change the E-D measure. There is an F-D association and an E-F association. F is not a confounder — it acts through E.

3. Explanatory Antecedent (Complete)

F → E and F → D  (no E→D)

F causes both E and D, but E does not cause D. Controlling for F eliminates the E-D association. This is complete confounding — the entire observed E-D link is spurious.

4. Explanatory Antecedent (Incomplete)

F → E and F → D and E → D

F causes both E and D, but E also independently causes D. Controlling for F changes but does not eliminate the E-D association. This is partial confounding — the classic confounder scenario.

5. Intervening (Mediating) Variable

E → F → D

E causes F, which causes D (F is on the causal pathway). Controlling for F reduces or eliminates the E-D association. F should generally not be controlled for, as it would mask E’s true effect.

6. Distorter

F distorts a null E-D relationship

There is no true E-D association, but F creates a spurious one. Crude analysis shows E-D association; controlling for F reveals the null. Both F-D and E-F associations exist. A distorter is a confounder that creates a false positive.

7. Suppressor

F suppresses a true E-D relationship

A true E-D association exists but is hidden in crude analysis because F masks it. Controlling for F reveals or strengthens the E-D association. A suppressor is a confounder that creates a false negative.

8. Moderator (Effect Modifier)

F modifies the E → D effect

F changes the magnitude of the E-D association across its strata. Controlling for F reveals different stratum-specific measures. Effect modification is a biological phenomenon, not a bias — stratum-specific results should be reported separately.

Decision Guide: What Happens When You Control for F?

Type | E-D changes? | F-D assoc? | E-F assoc? | Confounder?
Exposure-independent | No | Yes | No | No
Simple antecedent | No | Yes | Yes | No
Explanatory (complete) | Yes → null | Yes | Yes | Yes
Explanatory (incomplete) | Yes → attenuated | Yes | Yes | Yes
Intervening variable | Yes → reduced | Yes | Yes | No*
Distorter | Yes → null | Yes | Yes | Yes
Suppressor | Yes → stronger | Yes | Yes | Yes
Moderator | Varies by stratum | May vary | May vary | No**

*Controlling for an intervening variable is usually inappropriate.
**Effect modification is a biological phenomenon, not bias.

13.11 Chapter Summary

Confounding is a fundamental threat to causal inference in observational studies. Its control requires a combination of study design strategies (restriction, matching) and analytical approaches (stratification, multivariable modelling, propensity scores). The choice among methods depends on the research question, data structure, and assumptions the investigator is willing to make.

Table 13.7 — Summary: Effect of Controlling RSV on STREP-CRD

Table 13.7: Comparison of Confounding Control Methods

Method | OR Estimate | Key Feature
Crude (unadjusted) | 5.44 | No control for RSV
Restriction (RSV− only) | 3.21 | Limits generalizability
MH Stratification | 3.38 | Transparent, stratum-specific
Mantel-Haenszel (pooled) | 3.38 | Weighted average across strata
Logistic Regression | 3.40 | Handles multiple confounders
Propensity Score | ~3.4 | Balances many covariates
External Adjustment | ~3.4 | Uses external prevalence data

The adjusted estimates cluster between about 3.2 (restriction) and 3.4 (the other methods), down from the crude OR of 5.44. This consistency strengthens confidence that RSV confounds the STREP-CRD association and that the true effect of STREP on CRD is approximately 3-fold.

Key Takeaways — Section 4

  • Unmeasured confounders cannot be controlled directly; external adjustment and sensitivity analysis help evaluate their impact.
  • Sensitivity analysis asks how strong an unmeasured confounder must be to explain away findings — strengthening or weakening confidence in results.
  • Eight types of extraneous variable relationships exist, and only some represent true confounding.
  • Intervening variables should generally not be controlled for; doing so obscures the causal pathway.
  • Effect modification is a biological phenomenon to report, not a bias to remove.
  • Multiple methods for controlling confounding typically yield similar results when applied correctly.

Section 4 Quiz

Answer all questions correctly to continue.

Question 1: In sensitivity analysis for unmeasured confounding, what is the primary goal?

Sensitivity analysis systematically varies assumptions about the strength and prevalence of a hypothetical unmeasured confounder to determine whether a plausible confounder could account for the observed association. It does not identify or adjust for specific confounders.

Question 2: A researcher finds that controlling for variable F completely eliminates the association between exposure E and disease D. F is associated with both E and D. Which type of extraneous variable is F most likely?

When F is associated with both E and D, and controlling for F completely eliminates the E-D association, this indicates complete confounding — the entire observed E-D link was spurious, created by F causing both E and D independently (explanatory antecedent with complete confounding).

Question 3: Why should an intervening (mediating) variable generally not be controlled for in analysis?

An intervening variable lies on the causal pathway between E and D (E → F → D). Controlling for it removes the indirect effect of E on D through F, which can mask or eliminate a real causal effect. The goal is usually to estimate the total effect of E on D, which includes the pathway through the mediator.

Reflection

Consider a real-world observational study (e.g., the association between coffee consumption and heart disease). Identify at least one potential unmeasured confounder and describe how you would design a sensitivity analysis to evaluate its impact on the study conclusions.

Section 5 of 5

Final Assessment

⏱ Estimated time: 20 minutes

Lesson Summary

In this lesson, you have explored the concept of confounding in observational research, methods for its detection and control at both the design and analysis stages, and how to evaluate the robustness of findings to unmeasured confounding.

Core Concepts Reviewed

Section 1: Definition of confounding and the three criteria for a confounder, population vs. sample-level confounding, pre-analysis control through restriction and matching (frequency and pair matching), McNemar’s test for matched pairs (Eqs 13.2–13.3), and the risk of overmatching.

Section 2: Detection of confounding via DAGs and the change-in-estimate approach (20–30% threshold), non-collapsibility of the odds ratio, Mantel-Haenszel stratified analysis (Eqs 13.4–13.9), interaction and effect modification (additive vs. multiplicative), and when to report stratum-specific results.

Section 3: Multivariable modelling and the 30% change rule for variable inclusion, standardisation and marginal structural models, instrumental variable analysis and the exclusion restriction, propensity score methods (matching, stratification, weighting, covariate adjustment), and the region of common support.

Section 4: External adjustment for unmeasured confounders using external prevalence data (Eq 13.12), sensitivity analysis to quantify the threat of unmeasured confounding, eight types of extraneous variable relationships (exposure-independent, simple antecedent, explanatory antecedent with complete/incomplete confounding, intervening variable, distorter, suppressor, moderator), and comparison of confounding control methods.

Lesson 16 Comprehensive Assessment

This comprehensive assessment covers all material from Chapter 13. You must answer all 15 questions correctly to complete the lesson.

Question 1: Which of the following is not one of the three criteria for a variable to be a confounder?

A confounder must NOT be on the causal pathway between exposure and disease. Being an intermediate step (mediator) is the opposite of what defines a confounder. The three criteria are: (1) associated with exposure, (2) independent risk factor for disease, and (3) not on the causal pathway.

Question 2: In a case-control study using frequency matching, what is the primary purpose?

Frequency matching ensures the distribution of matching variables is similar between cases and controls overall. Unlike pair matching, it does not link individual cases to specific controls. Note that the matched variable must still be adjusted for in analysis.

Question 3: What is “overmatching” in the context of confounding control?

Overmatching occurs when you match on a variable associated with the exposure but not independently with the disease (or on an intermediary). This reduces the variability in exposure between groups without reducing confounding, leading to loss of statistical efficiency.

Question 4: A DAG shows arrows from F to E and from F to D, with no arrow from E to D. What does controlling for F reveal?

This DAG depicts complete confounding (explanatory antecedent). F causes both E and D, but E does not cause D. The observed E-D association is entirely spurious, so controlling for F eliminates it.

Question 5: The change-in-estimate approach typically considers confounding present when the crude and adjusted measures differ by more than:

The change-in-estimate approach uses a threshold of 20–30% difference between crude and adjusted measures to identify meaningful confounding. This is preferred over statistical testing because confounding is not a random phenomenon.

Question 6: In the Mantel-Haenszel method, the ORMH is calculated as:

The Mantel-Haenszel OR is a weighted average across strata, calculated as $\sum_i (a_i d_i / T_i) \, / \, \sum_i (b_i c_i / T_i)$, where $T_i$ is the total for stratum i. This is Eq 13.7 written in a, b, c, d cell notation, and it gives more weight to larger strata.

Question 7: What distinguishes effect modification from confounding?

Effect modification (interaction) is a real biological phenomenon where the magnitude of the exposure-disease association genuinely differs across strata of a third variable. It should be reported, not removed. Confounding is a bias due to a common cause that should be controlled.

Question 8: Which confounding control method balances many covariates simultaneously using a single composite score?

Propensity score analysis condenses multiple confounders into a single score — the probability of being exposed given observed covariates. This allows balancing many variables simultaneously, which is especially useful when confounders are numerous relative to outcomes.

Question 9: In a cohort study, restriction as a confounding control method involves:

Restriction limits enrollment to individuals who share the same level of the potential confounder (e.g., studying only non-smokers). This eliminates confounding by that variable but limits generalizability to the restricted group.

Question 10: What is the “region of common support” in propensity score analysis?

The region of common support includes propensity score values where both exposed and unexposed individuals exist. Observations outside this range lack valid comparators and must be excluded from analysis to ensure valid causal inference.

Question 11: A “suppressor” variable is one that:

A suppressor is a confounder that hides a true association. In crude analysis, the E-D association appears null or weak, but controlling for the suppressor reveals the true (stronger) association. This is confounding that creates a false negative.

Question 12: Why is non-collapsibility a concern when using the odds ratio?

Non-collapsibility means the crude (marginal) OR can differ from the weighted average of stratum-specific ORs even when there is no confounding. This is a mathematical property of the OR (unlike the risk ratio or risk difference), and the change-in-estimate approach should account for this limitation.

Question 13: An instrumental variable (IV) must satisfy which key condition?

An instrumental variable must satisfy the exclusion restriction: it affects the outcome only through its effect on the exposure, with no direct path to the outcome and no shared causes with the outcome. This allows estimation of causal effects even with unmeasured confounders.

Question 14: Sensitivity analysis for unmeasured confounding helps researchers by:

Sensitivity analysis does not identify specific confounders or prove their absence. Instead, it systematically explores how strong a hypothetical unmeasured confounder would need to be (in terms of prevalence and association with disease) to explain away the observed result — helping assess the robustness of findings.

Question 15: Which statement about confounders at the population level versus the study sample level is correct?

A variable that is a confounder in the source population may not confound in a specific study sample if, by chance or design, it is equally distributed across exposure groups. Confounding depends on the actual association between the variable and exposure in the study data.

Final Reflection

Reflecting on this entire chapter, consider a research question you find interesting. Design an observational study to address it: What confounders would you anticipate? Which control strategies (design-based and analytical) would you employ? How would you assess the robustness of your findings to unmeasured confounding?

Congratulations!

You have successfully completed Lesson 16: Confounding — Detection and Control.

You now understand the principles of confounding, methods for its detection and control, and how to evaluate the robustness of findings in observational research.