Confounding: Detection and Control

Fundamental Epidemiological Concepts and Approaches

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

Apply criteria to identify potential confounders in observational studies
Use restricted sampling and matching to prevent confounding
Implement matching in both cohort and case-control study designs
Use causal diagrams (DAGs) to identify confounders needing control
Apply stratified analysis (Mantel-Haenszel) to control confounding and assess interaction
Understand propensity scores, instrumental variables, and marginal structural models
Evaluate the potential of unmeasured confounders using sensitivity analysis
Interpret the effects of controlling different types of extraneous variables

This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Section 1 of 5

Introduction & Pre-Analysis Control of Confounding

⏱ Estimated reading time: 25 minutes

13.1 Introduction

A central focus of epidemiological research is to identify factors that contribute to the occurrence of disease. Randomised controlled trials (RCTs) provide a probabilistic basis for balancing factors between groups. However, in observational studies we cannot randomly assign exposures, so confounding is always a concern.

What Is Confounding?

Confounding can be described as the mixing together of the effects of two or more factors. When confounding is present, we might think we are measuring the association between an exposure and an outcome, but the observed measure also includes the effects of one or more extraneous factors. The extraneous factors that produce this bias are called confounders or confounding factors.

13.1.1 Which Extraneous Factors Are Confounders?

A factor is a confounder if:

It is a cause of the disease, or a surrogate for a cause, and
It precedes and is associated with the exposure in the source population, and
Its distribution across exposure levels cannot be determined by the exposure (i.e., it is not an intervening factor) or by the disease (i.e., it is not a result of the disease)

Important Distinction

Population confounder: known or regularly reported to be a confounder in the target population — should be controlled regardless of sample data.

Sample confounder: appears to be a confounder in the study data but may not truly be one in the population. We should not control for it unless there is substantive evidence.

Example 13.1: A Demonstration of Confounding

Investigating the relationship between Streptococcus pneumoniae (STREP) and childhood respiratory disease (CRD), with RSV (respiratory syncytial virus) as a potential confounder:

	STREP+	STREP−
CRD+	240	40
CRD−	6260	3460

Crude OR = 3.3

When stratified by RSV status, both stratum-specific ORs are 2.0, while the crude OR is 3.3. This difference of more than 30% indicates that RSV confounds the STREP-CRD association. Because the two stratum-specific ORs are equal, their common value of 2.0 is the best estimate of the causal association.
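The crude OR and the change-in-estimate check in Example 13.1 can be reproduced in a few lines. This is a minimal Python sketch: the cell layout follows the table above, the stratum-specific OR of 2.0 is taken from the text, and expressing the change relative to the stratum-specific OR is an assumed convention (dividing by the crude OR instead gives about 40%, which also exceeds 30%).

```python
# Crude 2x2 table from Example 13.1 (rows: CRD+/CRD-, columns: STREP+/STREP-)
a, b = 240, 40      # CRD+ : STREP+, STREP-
c, d = 6260, 3460   # CRD- : STREP+, STREP-

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio ad/bc for a 2x2 table laid out as above."""
    return (a * d) / (b * c)

crude_or = odds_ratio(a, b, c, d)
stratum_or = 2.0  # common stratum-specific OR reported in Example 13.1

# Change in estimate, expressed relative to the adjusted (stratum-specific) OR
pct_change = abs(crude_or - stratum_or) / stratum_or * 100

print(f"crude OR = {crude_or:.2f}")                  # crude OR = 3.32
print(f"change vs stratum OR = {pct_change:.0f}%")   # change vs stratum OR = 66%
```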

13.2 Control of Confounding Prior to Data Analysis

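We can prevent and control confounding using three general procedures:

Exclusion (restricting the study to a single level of the confounder)
Matching (making the distribution of the confounder the same in the groups being compared)
Analytic control (adjusting for the confounder during analysis, e.g. by stratification or modelling)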

13.3 Matching on Confounders

In a cohort study, matching makes the exposure independent of the matched extraneous variable, so that variable cannot confound the association. The matched variable(s) can still affect the outcome, but because their distribution is the same in both exposure groups, their effect does not distort the exposure-outcome comparison.

Because the outcome (e.g., disease) has not happened at the time of matching, the matching process is independent of the outcome. No analytical control of the matched confounder is necessary, and there is no bias in the summary table.

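In case-control studies, the disease has already occurred when matching takes place. Because controls are selected to resemble cases on the matched variable, and that variable is associated with exposure, the exposure distribution among controls is pulled toward that among cases: matching therefore introduces a selection bias. The stronger the exposure-confounder association, the greater the bias (generally toward the null).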

This bias must be controlled by stratified or matched analysis — the matched variable(s) must be included in the analytical approach.

Overmatching

Do not match unless you are certain the variable is a confounder. Matching on a variable that is strongly associated with exposure but is not a confounder leads to overmatching: the distribution of exposure among controls becomes more similar to that among cases than it is in the source population, which reduces precision.

Frequency vs. Pair Matching

Feature	Frequency Matching	Pair Matching
Method	Overall distribution made equal	Individual-level matching (1:m)
Analysis	Stratified (MH procedure)	Matched analysis (e.g., McNemar’s test for 1:1 pairs)
Interaction	Can assess interaction	Difficult to assess interaction
Best when	Confounder has few levels	Many variables or refined categories
Control-to-case ratio	Variable	Fixed (1:1, 1:4, etc.); minimal gain beyond 4:1

Analysing Matched Data

For pair-matched data in a case-control study with 1:1 matching, we analyse the four possible exposure patterns. Only the discordant pairs (case exposed/control unexposed, or case unexposed/control exposed) contribute information:

Eq 13.2 & 13.3 — Matched OR and McNemar’s Test

OR_match = u / v

McNemar’s χ² = (u − v)² / (u + v)

where u = pairs where case is exposed and control is not, and v = pairs where case is not exposed and control is.
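A minimal Python sketch of the matched-pair calculations above, assuming 1:1 matching and exposure recorded as True/False for each case and its control. The pair counts are hypothetical. The p-value uses the fact that a χ² variable with 1 degree of freedom is the square of a standard normal, so its upper tail equals erfc(√(χ²/2)); no continuity correction is applied, in line with the formula as given.

```python
import math

def matched_pair_analysis(pairs):
    """pairs: iterable of (case_exposed, control_exposed) booleans, one per 1:1 pair.
    Returns (u, v, matched OR, McNemar chi-square, p-value)."""
    u = sum(1 for case_e, ctrl_e in pairs if case_e and not ctrl_e)
    v = sum(1 for case_e, ctrl_e in pairs if ctrl_e and not case_e)
    if u + v == 0:
        raise ValueError("no discordant pairs: OR and McNemar test are undefined")
    or_match = u / v if v > 0 else math.inf
    chi2 = (u - v) ** 2 / (u + v)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return u, v, or_match, chi2, p_value

# Hypothetical data: 30 pairs case+/control-, 10 pairs case-/control+,
# plus concordant pairs, which do not affect the result.
pairs = ([(True, False)] * 30 + [(False, True)] * 10
         + [(True, True)] * 25 + [(False, False)] * 35)
u, v, or_match, chi2, p = matched_pair_analysis(pairs)
print(u, v, or_match, chi2)  # 30 10 3.0 10.0
```

Only the 40 discordant pairs enter the calculation; adding or removing concordant pairs leaves u, v, the matched OR and χ² unchanged.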

Reflection

Why does matching in case-control studies introduce selection bias while matching in cohort studies does not? Think about the timing of when disease occurs relative to the matching process.

Section 3 of 5

Alternative Methods & Propensity Scores

⏱ Estimated reading time: 25 minutes

13.6 Multivariable Modelling

The most commonly used analytical method for controlling confounding is to include confounders in a multivariable model (e.g., logistic regression). The effect of the exposure is estimated while holding other factors constant.

Rule of Thumb

If the coefficient for a predictor changes by more than 30% when a putative confounder is added to the model, substantial confounding is likely present. Note that if the model also includes variables that lie on the causal pathway between the exposure and the outcome, the ‘adjusted’ exposure coefficient estimates only the direct causal effect, not the total causal effect.
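To see the change-in-estimate rule at work, the sketch below simulates a hypothetical cohort in which a binary confounder C raises both the probability of exposure and the risk of disease, while the true conditional OR for E is 2. It fits logistic models with and without C using a hand-written Newton-Raphson (iteratively reweighted least squares) routine so that only NumPy is needed; the seed, sample size and effect sizes are arbitrary assumptions, and in practice a dedicated statistics library would normally be used.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS).
    X must already contain an intercept column; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities
        grad = X.T @ (y - p)                          # score vector
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        beta = beta + np.linalg.solve(info, grad)
    return beta

rng = np.random.default_rng(13)
n = 20_000
c = rng.binomial(1, 0.4, n)                         # hypothetical confounder C
e = rng.binomial(1, np.where(c == 1, 0.7, 0.2))     # C raises P(exposure)
logit_d = -2.0 + np.log(2.0) * e + np.log(4.0) * c  # true E OR = 2, C OR = 4
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_d)))

ones = np.ones(n)
b_crude = fit_logistic(np.column_stack([ones, e]), d)[1]
b_adj = fit_logistic(np.column_stack([ones, e, c]), d)[1]

# Change in the exposure coefficient (log OR), relative to the adjusted value
change = abs(b_crude - b_adj) / abs(b_adj)
print(f"crude OR {np.exp(b_crude):.2f}, adjusted OR {np.exp(b_adj):.2f}, change {change:.0%}")
```

With these settings the crude OR should be inflated well above 2, the adjusted OR should land close to 2, and the coefficient change should comfortably exceed the 30% threshold.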

13.7 Other Approaches to Control Confounding

13.7.1 Standardised Risk Ratios (SRR)

Standardisation uses stratum-specific risks applied to a standard population. The SRR compares observed vs. expected number of cases:

Standardised Risk Ratio

SRR = (observed cases) / (expected cases using standard rates)

Unlike the MH estimator, the SRR provides a valid summary even in the presence of interaction, because the population of interest is specified. The SRR is a non-parametric method based on physical stratification.
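A minimal sketch of the SRR as observed over expected cases, assuming the exposed group is the population of interest and the unexposed group's stratum-specific risks serve as the standard rates. All counts and risks are hypothetical.

```python
def standardised_risk_ratio(exposed_n, observed_cases, standard_risks):
    """SRR = observed cases / expected cases.

    exposed_n[k]      -- number of exposed subjects in confounder stratum k
    observed_cases    -- total cases observed among the exposed
    standard_risks[k] -- stratum-specific risk in the standard (here: unexposed) group
    """
    expected = sum(n * r for n, r in zip(exposed_n, standard_risks))
    return observed_cases / expected

# Hypothetical strata (RSV+, RSV-): 100 and 300 exposed subjects, 70 cases in total;
# unexposed risks of 0.20 and 0.05 act as the standard rates.
srr = standardised_risk_ratio([100, 300], 70, [0.20, 0.05])
print(round(srr, 2))  # 2.0
```

Here 35 cases would be expected among the exposed if they had the unexposed group's risks in each stratum, so observing 70 gives an SRR of 2.0.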

13.7.2 Marginal Structural Models

The marginal structural model uses weights to create an unconfounded pseudo-population from which the causal effect can be estimated using a crude (marginal) measure.

The weight assigned to each subject is the inverse probability of treatment weight (IPTW): W_T = 1/p_E, where p_E = p(E=e|C) is the conditional probability of the observed exposure given confounders.

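The pseudo-population is twice the size of the observed population, because each confounder stratum is represented once as if all its members were exposed and once as if none were; it therefore contains information on the counterfactual outcomes. The IPTW estimate is equivalent to the SRR computed with the total study population as the standard (SRR_tot).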
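The IPTW calculation can be sketched for aggregated data with one binary confounder. The counts are hypothetical and chosen so that the risk ratio is 2.0 in both confounder strata. Here p(E=e|C) is estimated as the proportion of each stratum with the observed exposure level, so each weight is the stratum size divided by the number of subjects at that exposure level.

```python
# Hypothetical aggregated cohort: strata[c][e] = (number of subjects, number of cases)
strata = {
    1: {1: (80, 40), 0: (20, 5)},    # confounder present
    0: {1: (40, 8), 0: (160, 16)},   # confounder absent
}

def iptw_risk_ratio(strata):
    """Risk ratio in the IPTW pseudo-population, with weights 1 / p(E=e | C)."""
    pseudo = {1: [0.0, 0.0], 0: [0.0, 0.0]}  # e -> [weighted subjects, weighted cases]
    for groups in strata.values():
        n_c = sum(n for n, _ in groups.values())
        for e, (n, cases) in groups.items():
            weight = n_c / n  # equals 1 / p(E=e | C), with p estimated as n / n_c
            pseudo[e][0] += weight * n
            pseudo[e][1] += weight * cases
    risk_exposed = pseudo[1][1] / pseudo[1][0]
    risk_unexposed = pseudo[0][1] / pseudo[0][0]
    pseudo_size = pseudo[1][0] + pseudo[0][0]
    return risk_exposed / risk_unexposed, pseudo_size

def crude_risk_ratio(strata):
    """Unweighted risk ratio, ignoring the confounder."""
    tot = {1: [0, 0], 0: [0, 0]}
    for groups in strata.values():
        for e, (n, cases) in groups.items():
            tot[e][0] += n
            tot[e][1] += cases
    return (tot[1][1] / tot[1][0]) / (tot[0][1] / tot[0][0])

rr_iptw, pseudo_size = iptw_risk_ratio(strata)
print(f"crude RR {crude_risk_ratio(strata):.2f}, IPTW RR {rr_iptw:.2f}, "
      f"pseudo-population {pseudo_size:.0f}")
# crude RR 3.43, IPTW RR 2.00, pseudo-population 600
```

The weighted total is 600, twice the 300 observed subjects, and the IPTW risk ratio recovers the common stratum-specific value of 2.0, whereas the crude ratio is about 3.4.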

13.7.3 Instrumental Variables

An instrumental variable (IV) Z must meet 3 requirements:

It has a direct causal effect on the exposure E
It is unrelated to the outcome D except through E
It shares no common causes with the outcome

The true causal effect (TCE) is estimated as:

True Causal Effect via IV

TCE = [p(D+|Z=1) − p(D+|Z=0)] / [p(E+|Z=1) − p(E+|Z=0)]

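The key advantage is that we do not need to measure or condition on the confounders C: if the three requirements hold, the IV estimate is not distorted by confounding of the E-D relationship, even by unmeasured confounders. However, finding a valid IV in observational studies is very difficult.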
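A direct translation of the TCE formula into Python, using hypothetical proportions. The result is on the risk-difference scale. The guard reflects that an instrument which barely shifts exposure makes the denominator close to zero and the estimate unstable.

```python
def iv_wald_estimate(p_d_z1, p_d_z0, p_e_z1, p_e_z0):
    """Ratio of the Z-D risk difference to the Z-E risk difference."""
    denom = p_e_z1 - p_e_z0
    if abs(denom) < 1e-12:
        raise ValueError("instrument does not shift exposure; estimate undefined")
    return (p_d_z1 - p_d_z0) / denom

# Hypothetical proportions of disease and exposure at each level of Z
tce = iv_wald_estimate(p_d_z1=0.30, p_d_z0=0.20, p_e_z1=0.60, p_e_z0=0.20)
print(round(tce, 3))  # 0.25
```

Here the instrument increases the proportion exposed by 0.40 and the proportion diseased by 0.10, giving an estimated 0.25 increase in risk attributable to exposure.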

13.8 Propensity Scores

A propensity score (PS) is the conditional probability of being treated/exposed given measured covariates: p(E+|C). Propensity scores condense multiple confounders into a single scalar summary.

13.8.1 Computing Propensity Scores

With 1–2 categorical confounders, PSs can be calculated manually. With more confounders, use a logit or probit model predicting treatment (exposure) allocation as the outcome. Include all potential confounders (known or suspected) and their interactions.
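With one or two categorical confounders, the PS for each covariate pattern is simply the proportion exposed within that pattern. The sketch below assumes two hypothetical binary confounders (an age group and RSV status) and made-up counts; with more confounders, the same quantities would come from the fitted probabilities of a logit or probit model.

```python
from collections import defaultdict

def manual_propensity_scores(records):
    """records: list of (confounder_pattern, exposed) tuples.
    The PS for a covariate pattern is the proportion exposed within that pattern."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [n exposed, n total]
    for pattern, exposed in records:
        counts[pattern][0] += int(exposed)
        counts[pattern][1] += 1
    return {pattern: n_e / n for pattern, (n_e, n) in counts.items()}

# Hypothetical records with two binary confounders (age group, RSV status)
records = (
    [(("young", "RSV+"), 1)] * 30 + [(("young", "RSV+"), 0)] * 10
    + [(("young", "RSV-"), 1)] * 20 + [(("young", "RSV-"), 0)] * 80
    + [(("old", "RSV+"), 1)] * 15 + [(("old", "RSV+"), 0)] * 15
    + [(("old", "RSV-"), 1)] * 5 + [(("old", "RSV-"), 0)] * 45
)
ps = manual_propensity_scores(records)
for pattern, score in sorted(ps.items()):
    print(pattern, round(score, 2))
```

Every subject with the same covariate pattern receives the same PS, so the four patterns here yield four distinct scores (0.75, 0.2, 0.5 and 0.1).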

13.8.2 Balancing Exposure Groups

A study is balanced if: (1) the average PS value is the same in exposed and non-exposed within each PS stratum, and (2) the mean of all covariates making up the PS is equal across groups within each stratum.

Analysis is limited to the region of common support — observations falling in the range of PSs that includes both exposed and non-exposed individuals.
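One simple way to define the region of common support is the min-max rule: keep observations whose PS lies between the larger of the two groups' minimum scores and the smaller of their maximum scores. The sketch below applies that rule to hypothetical scores; other definitions are also used in practice.

```python
def common_support(ps_exposed, ps_unexposed):
    """Return the (low, high) PS range covered by both groups (min-max rule)."""
    low = max(min(ps_exposed), min(ps_unexposed))
    high = min(max(ps_exposed), max(ps_unexposed))
    return low, high

def trim_to_support(ps_values, low, high):
    """Keep only scores inside the common-support range."""
    return [p for p in ps_values if low <= p <= high]

# Hypothetical propensity scores
ps_exposed = [0.35, 0.50, 0.62, 0.80, 0.95]
ps_unexposed = [0.05, 0.12, 0.30, 0.41, 0.66, 0.78]

low, high = common_support(ps_exposed, ps_unexposed)
print(low, high)                                 # 0.35 0.78
print(trim_to_support(ps_exposed, low, high))    # [0.35, 0.5, 0.62]
print(trim_to_support(ps_unexposed, low, high))  # [0.41, 0.66, 0.78]
```

Exposed subjects with very high scores and unexposed subjects with very low scores are dropped because they have no comparable counterparts in the other group.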

13.8.3–6 Using Propensity Scores

PSs can be used in four ways:

Method	Description
Matching	Match exposed to non-exposed with similar PSs. Methods: nearest-neighbour, radius, kernel matching
Stratification	Divide into PS strata (blocks); compute att within each stratum and pool
Covariate in model	Include PS as a continuous or categorical variable in the regression model
Weighting (IPTW)	Weight observations by inverse of PS to create pseudo-population

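The effect measure most commonly estimated with PS methods is the average treatment effect in the treated (att): the average difference, among treated (exposed) subjects, between their observed outcome and the outcome they would have had without treatment. In practice it is estimated by comparing the treated with similar non-treated (non-exposed) subjects.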
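A minimal sketch of PS matching for the att, assuming one-to-one nearest-neighbour matching with replacement, no caliper, and a binary outcome. The data are hypothetical, and the sketch returns only the point estimate (no standard error).

```python
def att_nearest_neighbour(treated, controls):
    """treated, controls: lists of (propensity_score, outcome).
    Each treated subject is matched (with replacement) to the control with the
    closest PS; att is the mean treated-minus-matched-control outcome difference."""
    diffs = []
    for ps_t, y_t in treated:
        ps_c, y_c = min(controls, key=lambda ctrl: abs(ctrl[0] - ps_t))
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)

# Hypothetical data: outcome coded 1 = disease, 0 = no disease
treated = [(0.35, 1), (0.50, 1), (0.62, 0), (0.70, 1)]
controls = [(0.33, 0), (0.48, 1), (0.60, 0), (0.72, 0), (0.10, 0)]
att = att_nearest_neighbour(treated, controls)
print(att)  # 0.5
```

Because the outcome is coded 0/1, the att of 0.5 is a risk difference among the treated.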

Reflection

Compare the propensity score approach to traditional multivariable regression for controlling confounding. In what situations might propensity scores be preferable, and what are their limitations?

Section 4 of 5

Unmeasured Confounders & Causal Relationships

⏱ Estimated reading time: 25 minutes

13.9 Unmeasured / Unknown Confounders

The Hidden Threat: All methods discussed so far — restriction, matching, stratification, multivariable modelling, propensity scores — require that confounders be measured. But what if a critical confounder was never collected, or is entirely unknown? Residual confounding from unmeasured variables is one of the most important limitations of observational research.

External Adjustment

When a confounder was not measured in the study but information about its distribution exists from external sources, we can estimate what the adjusted measure would have been using external adjustment. The method works by estimating the cell values that would be expected if the confounder had been measured.

Equation 13.12 — External Adjustment Cell Estimation
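For a 2×2 table stratified by an unmeasured binary confounder Z, the cases are first divided between the Z+ and Z− strata:

a₁ = a × p₁ (exposed cases with Z)
b₁ = b × p₂ (unexposed cases with Z)
a₀ = a − a₁ (exposed cases without Z)
b₀ = b − b₁ (unexposed cases without Z)

where p₁ = prevalence of Z among exposed cases and p₂ = prevalence of Z among unexposed cases. The non-case cells (c and d) are divided between the strata in the same way, using the prevalence of Z among exposed and unexposed non-cases.

The Mantel-Haenszel OR is then calculated from these estimated strata.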
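The Mantel-Haenszel OR pools the strata as Σ(aᵢdᵢ/nᵢ) / Σ(bᵢcᵢ/nᵢ), where nᵢ is the stratum total. A minimal Python sketch, using the cell layout of Example 13.13 (a = exposed cases, b = unexposed cases, c = exposed non-cases, d = unexposed non-cases) and two hypothetical strata that each have an OR of 2.0:

```python
def mantel_haenszel_or(strata):
    """strata: iterable of (a, b, c, d) per stratum, with a = exposed cases,
    b = unexposed cases, c = exposed non-cases, d = unexposed non-cases."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Hypothetical strata, each with a stratum-specific OR of 2.0
strata = [(20, 10, 100, 100), (40, 10, 200, 100)]
print(round(mantel_haenszel_or(strata), 2))  # 2.0
```

When the stratum-specific ORs are equal, the MH estimate equals that common value; when they differ, it is a weighted average of them.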

Example 13.13 — External Adjustment for STREP-CRD

In our STREP-CRD study, suppose RSV status was not measured. From external data we know:

Among exposed (STREP+) cases: 40% have RSV → p₁ = 0.40
Among unexposed (STREP−) cases: 10% have RSV → p₂ = 0.10

Using the crude data (a = 70, b = 30, c = 90, d = 210):

Stratum	a	b	c	d
RSV+ (estimated)	28	3	42	27
RSV− (estimated)	42	27	48	183

The MH OR calculated from these estimated strata approximates the OR that would have been obtained had RSV been measured, illustrating how external information can help address unmeasured confounding. The approach assumes that the external prevalence data are accurate and apply to the study population.

Sensitivity Analysis

When no external data are available, sensitivity analysis explores how strong an unmeasured confounder would need to be to explain away an observed association. This does not eliminate confounding but quantifies the threat it poses to the study’s conclusions.

Key Question: “Could an unmeasured confounder plausibly be strong enough to account for the observed association?” If the required confounder-disease association or confounder-exposure prevalence difference is implausibly large, the finding is more robust.

Example 13.14 — Sensitivity Analysis for Unmeasured Confounding

Suppose we observe a crude OR = 5.44 for the STREP-CRD association. We suspect an unmeasured confounder Z might exist.

We systematically vary two parameters:

Prevalence difference of Z between exposed and unexposed groups
Strength of Z-disease association (OR_ZD)

OR_ZD	p₁=0.4, p₂=0.1	p₁=0.6, p₂=0.1	p₁=0.8, p₂=0.1
2.0	4.68	4.07	3.44
5.0	3.51	2.50	1.64
10.0	2.67	1.63	0.91

Even with a moderately strong unmeasured confounder (OR_ZD = 5, prevalence difference of 30%), the adjusted OR remains above 2.5 — suggesting the STREP-CRD association is reasonably robust to unmeasured confounding.
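One widely used way to generate such a grid is the simple bias-factor formula for a binary unmeasured confounder Z: the observed measure is divided by [p₁(OR_ZD − 1) + 1] / [p₀(OR_ZD − 1) + 1], where p₁ and p₀ are the prevalences of Z among the exposed and the unexposed (p₀ plays the role of p₂ in the table), and the Z-disease association is assumed to be the same in both exposure groups. The formula is exact for risk ratios (with a Z-disease risk ratio in place of OR_ZD) and approximate for ORs when the disease is rare. It is not necessarily the method behind the table above, so its values need not match those shown.

```python
def bias_factor(or_zd, p1, p0):
    """Confounding bias factor for a binary unmeasured confounder Z.
    p1, p0 -- prevalence of Z among the exposed and the unexposed
    or_zd  -- Z-disease association (assumed equal in both exposure groups)"""
    return (p1 * (or_zd - 1) + 1) / (p0 * (or_zd - 1) + 1)

def externally_adjusted(observed, or_zd, p1, p0):
    """Observed measure divided by the bias factor."""
    return observed / bias_factor(or_zd, p1, p0)

observed_or = 5.44
for or_zd in (2.0, 5.0, 10.0):
    row = [externally_adjusted(observed_or, or_zd, p1, 0.1) for p1 in (0.4, 0.6, 0.8)]
    print(or_zd, [round(x, 2) for x in row])
```

Under this formula, the scenario highlighted in the text (OR_ZD = 5, p₁ = 0.4, p₀ = 0.1) gives an adjusted OR of about 2.9.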

Reflect: Unmeasured Confounding

Think about a published observational study you have encountered (or one from class). What unmeasured confounders might threaten its conclusions? How could sensitivity analysis help evaluate the robustness of its findings?

13.10 Understanding Causal Relationships with Extraneous Variables

The relationship between exposure (E), disease (D), and an extraneous variable (F) can take many forms. Understanding these patterns is critical for correctly interpreting what happens when you “control for” a variable.

Three Statistical Indicators: For each type of E-F-D relationship, we can predict:
1. Whether E-D association changes after controlling for F
2. Whether there is an F-D association
3. Whether there is an E-F association

Eight Types of Extraneous Variable Relationships

1. Exposure-Independent Variable

F → D (no E-F link)

F causes D independently of E. Controlling for F does not change the E-D measure. There is an F-D association but no E-F association. F is not a confounder.

2. Simple Antecedent

F → E → D

F causes E, which causes D. Controlling for F does not change the E-D measure. There is an F-D association and an E-F association. F is not a confounder — it acts through E.

3. Explanatory Antecedent (Complete)

F → E and F → D (no E→D)

F causes both E and D, but E does not cause D. Controlling for F eliminates the E-D association. This is complete confounding — the entire observed E-D link is spurious.

4. Explanatory Antecedent (Incomplete)

F → E and F → D and E → D

F causes both E and D, but E also independently causes D. Controlling for F changes but does not eliminate the E-D association. This is partial confounding — the classic confounder scenario.

5. Intervening (Mediating) Variable

E → F → D

E causes F, which causes D (F is on the causal pathway). Controlling for F reduces or eliminates the E-D association. F should generally not be controlled for, because doing so removes the part of E’s effect that operates through F and so masks E’s total effect.

6. Distorter

F distorts a null E-D relationship

There is no true E-D association, but F creates a spurious one. Crude analysis shows E-D association; controlling for F reveals the null. Both F-D and E-F associations exist. A distorter is a confounder that creates a false positive.

7. Suppressor

F suppresses a true E-D relationship

A true E-D association exists but is hidden in crude analysis because F masks it. Controlling for F reveals or strengthens the E-D association. A suppressor is a confounder that creates a false negative.

8. Moderator (Effect Modifier)

F modifies the E → D effect

F changes the magnitude of the E-D association across its strata. Controlling for F reveals different stratum-specific measures. Effect modification is a biological phenomenon, not a bias — stratum-specific results should be reported separately.

Decision Guide: What Happens When You Control for F?

Type	E-D changes?	F-D assoc?	E-F assoc?	Confounder?
Exposure-independent	No	Yes	No	No
Simple antecedent	No	Yes	Yes	No
Explanatory (complete)	Yes → null	Yes	Yes	Yes
Explanatory (incomplete)	Yes → attenuated	Yes	Yes	Yes
Intervening variable	Yes → reduced	Yes	Yes	No*
Distorter	Yes → null	Yes	Yes	Yes
Suppressor	Yes → stronger	Yes	Yes	Yes
Moderator	Varies by stratum	May vary	May vary	No**

*Controlling for an intervening variable is usually inappropriate.
**Effect modification is a biological phenomenon, not bias.
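The decision guide implies that stratification alone cannot tell a confounder from an intervening variable. The sketch below uses expected (not sampled) counts for a hypothetical causal chain E → F → D in which E has no effect on D except through F; all probabilities are assumptions chosen for illustration.

```python
def odds_ratio(a, b, c, d):
    """Cross-product OR ad/bc."""
    return (a * d) / (b * c)

def mh_or(strata):
    """Mantel-Haenszel OR over (a, b, c, d) strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Expected counts for 500 exposed and 500 unexposed subjects, assuming
# P(F+|E+) = 0.8, P(F+|E-) = 0.2, P(D+|F+) = 0.3, P(D+|F-) = 0.1.
# Cells per stratum: (a, b, c, d) = (E+D+, E-D+, E+D-, E-D-)
f_pos = (120, 30, 280, 70)
f_neg = (10, 40, 90, 360)

crude = tuple(x + y for x, y in zip(f_pos, f_neg))
print(f"crude OR {odds_ratio(*crude):.2f}")       # crude OR 2.16
print(f"OR within F+ {odds_ratio(*f_pos):.2f}")   # OR within F+ 1.00
print(f"OR within F- {odds_ratio(*f_neg):.2f}")   # OR within F- 1.00
print(f"MH OR {mh_or([f_pos, f_neg]):.2f}")       # MH OR 1.00
```

Stratifying on F removes the association completely. A common cause F of E and D with no E→D effect (a complete explanatory antecedent) can produce the same pattern of crude and stratified ORs, so whether to adjust depends on the assumed causal ordering, not on the numbers.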

13.11 Chapter Summary

Confounding is a fundamental threat to causal inference in observational studies. Its control requires a combination of study design strategies (restriction, matching) and analytical approaches (stratification, multivariable modelling, propensity scores). The choice among methods depends on the research question, data structure, and assumptions the investigator is willing to make.

Table 13.7 — Summary: Effect of Controlling RSV on STREP-CRD

Method	OR Estimate	Key Feature
Crude (unadjusted)	5.44	No control for RSV
Restriction (RSV− only)	3.21	Limits generalizability
MH Stratification	3.38	Transparent, stratum-specific
Mantel-Haenszel (pooled)	3.38	Weighted average across strata
Logistic Regression	3.40	Handles multiple confounders
Propensity Score	~3.4	Balances many covariates
External Adjustment	~3.4	Uses external prevalence data

All methods converge on a similar adjusted OR of approximately 3.4, down from the crude OR of 5.44. This consistency strengthens confidence that RSV confounds the STREP-CRD association and that the true effect of STREP on CRD is approximately 3-fold.

Key Takeaways — Section 4

Unmeasured confounders cannot be controlled directly; external adjustment and sensitivity analysis help evaluate their impact.
Sensitivity analysis asks how strong an unmeasured confounder must be to explain away findings — strengthening or weakening confidence in results.
Eight types of extraneous variable relationships exist, and only some represent true confounding.
Intervening variables should generally not be controlled for; doing so obscures the causal pathway.
Effect modification is a biological phenomenon to report, not a bias to remove.
Multiple methods for controlling confounding typically yield similar results when applied correctly.

Reflection

Consider a real-world observational study (e.g., the association between coffee consumption and heart disease). Identify at least one potential unmeasured confounder and describe how you would design a sensitivity analysis to evaluate its impact on the study conclusions.

HSCI 341 — Lesson 16
