Introduction & Causal Concepts
Fundamental Epidemiological Concepts and Approaches
Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University
Learning objectives for this lesson:
- Trace the history of causal thinking in epidemiology
- Understand component-cause and causal-web models
- Describe the counterfactual concept for estimating causal effects
- Explain how observational studies and experiments seek causal evidence
- Distinguish inductive and deductive reasoning in science
- Identify the key components of epidemiologic research
- Apply causal criteria to evaluate associations
This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.
What Is Epidemiology?
⏱ Estimated reading time: 10 minutes
Learning Objectives
- Define epidemiology and explain its core purpose.
- Describe the historical evolution of causal thinking about disease.
- Recognize that epidemiology seeks to identify causal associations between exposures and outcomes.
Defining Epidemiology
Epidemiology is fundamentally about understanding the patterns, causes, and effects of health and disease in populations. Historically, epidemiologists have been concerned with identifying the "succession of events which result in the exposure of specific types of individuals to specific types of environment" — that is, the exposures and causal factors that drive disease.
Modern epidemiology aims to improve population health by integrating data from many disciplines and proposing interventions based on scientific evidence. The discipline focuses on identifying exposures — whether demographic factors, infectious agents, nutritional factors, toxins, or lifestyle elements — and evaluating their associations with health outcomes such as disease, quality of life, and mortality.
Core Insight
Epidemiology is a field-based discipline. It is only by studying exposure-disease associations under real-world conditions that we can begin to understand the web of causal relationships that affect health. The associations we find are part of a complex web of relationships involving organisms and all aspects of their environment.
A Brief History of Causal Thinking
The way we think about what causes disease has shifted dramatically over the centuries. Understanding this history helps us appreciate the complexity of modern causal models.
Key Historical Milestones
- ~400 BC
- 1750–1885
- mid-1800s
- late 1800s
- early 1900s
- mid-1900s
- 1970s
- 21st century
Why the History Matters
Throughout the history of epidemiology, there has been an ongoing tension between two perspectives: one oriented toward biology and mechanisms of causation, the other toward populations and their interactions with the environment. Both are essential. Epidemiologists accept that there are multiple causes for almost every outcome and that a single cause can have multiple effects.
Key Takeaways
- Epidemiology identifies causal associations between exposures and outcomes to improve population health.
- Causal thinking has evolved from single-cause models (miasma, germ theory) to multifactorial models embracing complexity.
- Modern epidemiology integrates social, biological, and environmental factors in understanding disease.
1. What is the primary goal of epidemiology?
2. What important principle did John Snow's cholera investigation demonstrate?
3. Modern epidemiology accepts that:
Scientific Inference & Key Research Components
⏱ Estimated reading time: 12 minutes
Learning Objectives
- Distinguish between inductive and deductive reasoning.
- Explain the role of Bayesian thinking and scientific consensus in epidemiology.
- Identify the key components of epidemiologic research design.
Why Scientific Inference Matters
Epidemiology relies primarily on observational studies because many health-related problems cannot be studied under controlled laboratory conditions. Ethical concerns, practical limitations, and the complexity of real-world relationships all demand that we study humans in their natural environments. Drawing valid inferences from these studies requires both inductive and deductive reasoning.
Two Forms of Reasoning
Inductive Reasoning
Inductive reasoning involves making generalized inferences about causation based on repeated observations. You observe specific instances and draw broader conclusions.
Francis Bacon first presented inductive reasoning in 1620 as a method of making generalizations from careful observations. Classic examples include Edward Jenner's observation that milkmaids who developed cowpox didn't get smallpox — which led to the development of the smallpox vaccine. John Stuart Mill's canons (1843) formalized rules for inductive inference and helped shape our concepts of necessary and sufficient causes.
However, as David Hume noted, "there is no logical force to inductive reasoning" — we cannot perceive a causal connection, only a series of events. Repeated observations may be consistent with causation but do not prove it.
Deductive Reasoning
Deductive reasoning involves inferring that a general "law of nature" exists and testing specific hypotheses against observations to prove or refute them. This approach is closely linked to refutationism, attributed to Karl Popper.
Popper argued that scientists should not collect data to prove a hypothesis but should instead attempt to disprove it. Only by disproving hypotheses can we make scientific progress. This is why statistical analyses typically state hypotheses in the null form (no association) and then attempt to refute them.
The key benefit: it helps narrow the scope of studies. We carefully review what is known and formulate a few specific, testable hypotheses rather than casting a wide net with hundreds of variables.
Bayesian Thinking & Scientific Consensus
Thomas Bayes (1764) noted that all inference is based on the validity of our premises and that no inference can be known with certainty. The information we have before making observations influences our interpretation of those observations. This gave rise to Bayesian analysis, which formally incorporates prior knowledge and updates it with new data.
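As a rough illustration of how prior knowledge is updated with new data, the sketch below uses a Beta prior for a disease risk and updates it with binomial data. The conjugate Beta-binomial model, the function names, and all the counts are assumptions made for this example; they do not come from the course text.

```python
from fractions import Fraction

def update_beta(prior_a, prior_b, cases, non_cases):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return prior_a + cases, prior_b + non_cases

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, i.e. the expected risk."""
    return Fraction(a, a + b)

# Hypothetical prior: earlier knowledge worth 2 cases among 20 subjects.
prior = (2, 18)
# Hypothetical new study: 12 cases among 60 subjects.
posterior = update_beta(*prior, cases=12, non_cases=48)

print("prior mean risk:    ", float(beta_mean(*prior)))
print("data-only risk:     ", 12 / 60)
print("posterior mean risk:", float(beta_mean(*posterior)))
```

The posterior mean falls between the prior mean and the risk seen in the new data, which is the sense in which what we knew beforehand shapes how we read the new observations.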
Thomas Kuhn reminded us that although a single observation can disprove a hypothesis, the observation might be anomalous. Scientific communities therefore rely on consensus, which changes through paradigm shifts, when weighing the usefulness of theories, even if absolute truth cannot be proven.
Key Components of Epidemiologic Research
The overall structure of an epidemiologic study involves several interrelated components, each of which must be carefully managed to produce valid results:
Figure 1.1 — Key components of epidemiologic research. Research starts from a source population, samples a study group, measures exposures and outcomes, accounts for extraneous variables (confounders and biases), and ultimately draws causal inferences.
The Central Goal
The rationale for epidemiologic research is to identify potential causal associations between exposures and outcomes. In many instances the exposures are potential risk factors and the outcome is a disease of interest. Ultimately, we aim to make causal inferences about these relationships in the source population as a basis for developing policy and prevention programs.
Key Takeaways
- Inductive reasoning generalizes from observations; deductive reasoning tests specific hypotheses.
- Bayesian thinking incorporates prior knowledge into the interpretation of new evidence.
- Epidemiologic research involves defining a source population, sampling a study group, measuring exposures and outcomes, controlling for bias and confounding, and making causal inferences.
1. Which philosopher argued that scientists should attempt to disprove rather than prove their hypotheses?
2. Bayesian analysis is best described as:
3. Which of the following is a potential threat to validity when sampling from a source population?
Seeking Causes & Models of Causation
⏱ Estimated reading time: 15 minutes
Learning Objectives
- Define what constitutes a "cause" in epidemiology.
- Explain the component-cause model including necessary, sufficient, and component causes.
- Describe how causal complements affect the strength of association.
- Understand the causal-web model and distinguish direct from indirect causes.
What Is a "Cause"?
For practical purposes in epidemiology, a cause is any factor that produces a change in the severity or frequency of an outcome. Some causes operate at the biological level within individuals (such as a specific microorganism), while others operate at the group or population level (such as lifestyle, nutrition, or weather).
Epidemiology deals with groups of individuals because the methods for determining causality require it. Researchers take a holistic approach, striving to study and measure every suspected causal factor for the outcome of interest — while recognizing that not every factor can be captured in a single study.
Pragmatic Focus
Epidemiologists prefer to identify causal factors that can be manipulated to prevent disease. But some non-manipulable factors (like genetic predisposition) may also be crucial for understanding disease patterns in populations.
The Component-Cause Model
This foundational model is based on the concepts of necessary and sufficient causes:
A necessary cause is one without which the disease cannot occur. The factor will always be present if the disease occurs. For example, Mycobacterium tuberculosis is a necessary cause of tuberculosis — you cannot develop TB without the bacterium being present.
A sufficient cause is a set of conditions that, when present, will invariably produce the disease. In practice, very few single exposures are sufficient on their own. Instead, different groupings of factors combine to form sufficient causes.
A component cause is one of a number of factors that, in combination, constitutes a sufficient cause. The factors might be present at the same time or follow one another in a temporal chain. When there are a number of causal chains with one or more factors in common, we can conceptualize the web of causal chains as a causal web.
Example: Childhood Respiratory Disease (CRD)
Consider four risk factors for CRD: the bacterium Streptococcus pneumoniae (STREP), respiratory syncytial virus (RSV), environmental stressors such as cold weather, and other bacteria such as Mycoplasma pneumoniae (MP). Different two-factor combinations of these can form sufficient causes:
| Component Causes | Sufficient Cause I | Sufficient Cause II | Sufficient Cause III | Sufficient Cause IV |
|---|---|---|---|---|
| STREP | + | + | | |
| RSV | + | | + | |
| Stressors | | + | + | + |
| Other organism (MP) | | | | + |
Key Points from this Model
- No single factor is a necessary cause of CRD (none appears in every sufficient cause).
- STREP is a component of 2 of the 4 sufficient causes.
- A child exposed to any complete combination will develop CRD.
- Because the causal complements (the other factors in a sufficient cause) can vary in prevalence, the observed strength of association between an exposure like STREP and CRD can change even though the underlying causal mechanism has not changed.
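The key points above can be checked mechanically with a small sketch that treats each sufficient cause as a set of component causes. It assumes the pairings STREP + RSV, STREP + Stressors, RSV + Stressors, and Stressors + MP, which is the only set of four distinct two-factor causes consistent with the counts in the table. The function names are hypothetical.

```python
# Sufficient causes for CRD, each written as a set of component causes.
# The pairings are an assumption inferred from the table's counts.
SUFFICIENT_CAUSES = {
    "I": {"STREP", "RSV"},
    "II": {"STREP", "Stressors"},
    "III": {"RSV", "Stressors"},
    "IV": {"Stressors", "MP"},
}

def develops_crd(exposures):
    """True if the exposures complete at least one sufficient cause."""
    exposures = set(exposures)
    return any(cause <= exposures for cause in SUFFICIENT_CAUSES.values())

def necessary_causes():
    """Factors present in every sufficient cause (empty set if there are none)."""
    return set.intersection(*SUFFICIENT_CAUSES.values())

def causes_containing(factor):
    """Names of the sufficient causes that include the given factor."""
    return sorted(name for name, cause in SUFFICIENT_CAUSES.items() if factor in cause)

print(necessary_causes())                 # no factor is necessary
print(causes_containing("STREP"))         # STREP appears in two sufficient causes
print(develops_crd({"STREP"}))            # STREP alone completes no cause
print(develops_crd({"Stressors", "MP"}))  # a complete combination
```

Representing causes as sets makes "complete combination" a simple subset test, so other hypothetical combinations can be tried by editing the dictionary.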
Causal Complements and Strength of Association
A critical insight from the component-cause model is that the prevalence of causal complements — the other factors needed to complete a sufficient cause — directly affects the strength of association we observe between an exposure and outcome. Even when the causal mechanism stays the same, changes in the distribution of co-factors in the population can make the association appear stronger or weaker.
Worked Example: How Co-Factor Prevalence Matters
Imagine STREP requires RSV or Stressors as a co-factor to cause CRD. In Population A, where RSV prevalence is 30%, the risk ratio for STREP is 4.83. In Population B, where RSV prevalence rises to 70%, the risk ratio drops to 2.93 — even though the causal relationship between STREP and CRD has not changed at all.
The difference is due entirely to the change in the frequency of the co-factor RSV. This is why strength of association is not a fixed measure and is considered "population specific."
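The two risk ratios can be reproduced with a short calculation. The sketch below rests on assumptions that are not stated in the passage: the co-factors occur independently of STREP and of each other, only the three sufficient causes involving STREP, RSV, and Stressors are acting, and the prevalence of Stressors is 40% in both populations (a value chosen because it yields the quoted figures).

```python
def risk_ratio_strep(p_rsv, p_stress):
    """Risk ratio for STREP when CRD needs any two of STREP, RSV, Stressors."""
    # Exposed to STREP: disease occurs if RSV or Stressors is also present.
    risk_exposed = 1 - (1 - p_rsv) * (1 - p_stress)
    # Not exposed to STREP: disease needs both RSV and Stressors.
    risk_unexposed = p_rsv * p_stress
    return risk_exposed / risk_unexposed

P_STRESS = 0.4  # assumed prevalence of Stressors, not given in the text

for population, p_rsv in (("A", 0.3), ("B", 0.7)):
    rr = risk_ratio_strep(p_rsv, P_STRESS)
    print(f"Population {population}: RSV prevalence {p_rsv:.0%}, RR = {rr:.2f}")
# Population A: RSV prevalence 30%, RR = 4.83
# Population B: RSV prevalence 70%, RR = 2.93
```

Nothing about how STREP acts differs between the two calls; only the RSV prevalence changes, and the risk ratio moves with it.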
The Causal-Web Model
An alternative way to visualize how multiple factors combine to cause disease is the causal web, consisting of interconnected direct and indirect causal chains:
Direct (Proximal) Causes
A direct cause has no known intervening variable between it and the disease. Diagrammatically, the exposure is adjacent to the outcome. Examples often include specific microorganisms or toxins. However, in disease control, direct causes are not necessarily more valuable than indirect ones — many large-scale control efforts work by manipulating indirect rather than direct causes.
Indirect Causes
An indirect cause is one whose effects on the outcome are mediated through one or more intervening variables. For example, Stressors (cold weather) may make a child susceptible to STREP, RSV, and MP — so Stressors act as an indirect cause of CRD. Removing stress could reduce CRD even though stress itself is not a direct cause.
Implications of the Causal Web
The causal-web model complements the component-cause model but is not equivalent. It shows that we can control disease by preventing the action of direct causes (e.g., vaccination against RSV) or by removing indirect causes (e.g., reducing environmental stressors). The diagram also reveals gaps in our knowledge — apparent direct connections might actually reflect unmeasured intervening factors.
Proportion of Disease Explained
Using the concepts of necessary and sufficient causes, we can estimate the population attributable fraction (AFp): the proportion of disease in the population that is attributable to a given exposure. Every case produced by a sufficient cause is attributable to each component of that cause, and a component can appear in several sufficient causes, so the AFp values summed over all factors can exceed 100%. This is not an error; it reflects the reality of multicausal disease.
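To see how the fractions can add up to more than 100%, the sketch below computes an AFp for each factor in a simplified three-cause version of the CRD example. The prevalences, the independence of the factors, and the definition of AFp as the proportional drop in disease risk when a factor is removed are all assumptions made for illustration.

```python
from itertools import product

# Simplified model: any two of the three factors form a sufficient cause.
CAUSES = [{"STREP", "RSV"}, {"STREP", "Stressors"}, {"RSV", "Stressors"}]
# Hypothetical prevalences, assumed independent of one another.
PREVALENCE = {"STREP": 0.6, "RSV": 0.3, "Stressors": 0.4}

def disease_risk(prevalence):
    """Probability that at least one sufficient cause is complete."""
    factors = list(prevalence)
    risk = 0.0
    for pattern in product([True, False], repeat=len(factors)):
        present = {f for f, on in zip(factors, pattern) if on}
        prob = 1.0
        for f, on in zip(factors, pattern):
            prob *= prevalence[f] if on else 1 - prevalence[f]
        if any(cause <= present for cause in CAUSES):
            risk += prob
    return risk

def attributable_fraction(factor):
    """Proportion of disease that disappears if the factor is removed entirely."""
    baseline = disease_risk(PREVALENCE)
    without = disease_risk({**PREVALENCE, factor: 0.0})
    return (baseline - without) / baseline

afp = {factor: attributable_fraction(factor) for factor in PREVALENCE}
for factor, value in afp.items():
    print(f"{factor}: AFp = {value:.1%}")
print(f"Sum of AFp values = {sum(afp.values()):.1%}")
```

Each individual AFp stays below 100%, but their sum does not, because a case caused by STREP + RSV is counted once under STREP and again under RSV.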
The Prevention Paradox
Even when a factor has a high AFp (say 50% of disease is attributable to not being vaccinated), the benefit at the individual level may appear modest. If disease prevalence was 6%, universal vaccination would reduce it to 3%. Because 94% of the vaccinated population would not have developed the disease anyway, only 3 in every 100 vaccinated people actually benefit, yet this 3-percentage-point reduction (a halving of disease) is still a major population-level achievement. Moreover, half of those who would have become sick will still get the disease despite being vaccinated. This creates a paradox: the average person may not perceive the benefit that the population-level data show.
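The arithmetic behind the paradox is easier to follow as counts per 1,000 people. The sketch below only restates the 6% prevalence and 50% AFp from the passage; the population size of 1,000 and the variable names are assumptions for illustration.

```python
POPULATION = 1000
BASELINE_PREVALENCE_PCT = 6  # disease prevalence without vaccination
AFP_PCT = 50                 # share of disease attributable to not being vaccinated

would_be_cases = POPULATION * BASELINE_PREVALENCE_PCT // 100  # sick without vaccine
prevented = would_be_cases * AFP_PCT // 100                   # cases the vaccine averts
still_cases = would_be_cases - prevented                      # sick despite vaccination
never_cases = POPULATION - would_be_cases                     # healthy either way

print(f"Healthy with or without vaccination: {never_cases}")
print(f"Cases prevented by vaccination:      {prevented}")
print(f"Cases despite vaccination:           {still_cases}")
print(f"Prevalence after vaccination:        {100 * still_cases / POPULATION:.0f}%")
```

Out of 1,000 vaccinated people, 30 benefit directly, 30 become ill anyway, and 940 would have stayed healthy regardless, which is why the individual experience can feel out of step with the population result.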
Key Takeaways
- A cause in epidemiology is any factor that changes disease severity or frequency.
- The component-cause model shows how different groupings of factors form sufficient causes, and why no single factor need be necessary for a disease.
- The strength of association can vary between populations even when the underlying causal mechanism is unchanged, due to differences in the prevalence of causal complements.
- The causal-web model distinguishes direct and indirect causes and guides study design and disease control strategies.
- The population attributable fractions of all factors can sum to more than 100% because each sufficient cause has several components and components are shared across sufficient causes.
1. In the component-cause model, a "sufficient cause" is best described as:
2. Why can the strength of association between an exposure and disease change between populations?
3. An indirect cause of disease is one that:
4. Why can the population attributable fractions for all risk factors of a disease sum to more than 100%?
The Counterfactual Concept
⏱ Estimated reading time: 12 minutes
Learning Objectives
- Define the counterfactual (potential outcomes) model for causal inference.
- Explain why counterfactual outcomes cannot be directly observed.
- Describe how randomized experiments approximate the counterfactual ideal.
- Understand the concept of confounding and exchangeability.
What Is the Counterfactual?
The counterfactual (or potential outcomes) model is currently the most widely accepted conceptual basis for determining causation in epidemiology. At its core, it asks a deceptively simple question: What would have happened to this same person if they had not been exposed?
The Thought Experiment
Imagine you want to know if a vaccine protects against a disease. You observe a vaccinated person who develops the disease. If you could rewind time and observe the same person in the same period without vaccination, and they did NOT develop the disease, you would conclude the vaccine actually caused the disease in that individual. Conversely, if they still got the disease without the vaccine, the vaccine was not the cause.
This counterfactual individual does not exist — you can never observe the same person under two different exposure levels simultaneously. But this is the ideal that our research methods try to approximate.
From Individuals to Populations
Since we cannot observe counterfactual outcomes at the individual level (each person is either exposed or not, never both simultaneously), we shift our thinking to the population level. We compare:
- p(D_E+): the potential frequency of disease if all population members were exposed
- p(D_E-): the potential frequency of disease if none were exposed
If these two quantities differ, we infer that there is a causal effect in the population.
The Role of Randomization
In a perfect experiment, we would randomly assign subjects to exposed and unexposed groups. Randomization creates exchangeability: the condition in which the same disease frequencies would be observed, apart from sampling error, if the groups' exposure status were switched. When this holds, any difference in outcomes beyond chance can be attributed to the exposure itself.
Why Randomization Works
When groups are exchangeable, comparing p(D|E+) and p(D|E-) gives us the closest possible estimate of the true counterfactual effect. However, in real trials, data come from two different subsets of subjects, so the estimate is approximate. The assumption is that random assignment balances all known and unknown confounders between groups on average, with chance imbalances shrinking as the sample size grows.
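A small simulation can make the idea concrete. In the sketch below, each simulated subject carries an unmeasured risk factor ("frail") that raises disease risk, and exposure is assigned by a coin flip. Every number here (the 30% frailty prevalence, the baseline risks, the true risk difference of 0.10, the sample size, and the seed) is a hypothetical choice for illustration.

```python
import random

def simulate_randomized_trial(n=20000, true_effect=0.10, seed=1):
    """Randomly assign n subjects to E+ or E- and tally outcomes per arm."""
    rng = random.Random(seed)
    arms = {"E+": {"n": 0, "cases": 0, "frail": 0},
            "E-": {"n": 0, "cases": 0, "frail": 0}}
    for _ in range(n):
        frail = rng.random() < 0.30                  # unmeasured risk factor
        risk_unexposed = 0.30 if frail else 0.10     # potential risk if unexposed
        risk_exposed = risk_unexposed + true_effect  # potential risk if exposed
        label = "E+" if rng.random() < 0.5 else "E-"  # the randomization step
        risk = risk_exposed if label == "E+" else risk_unexposed
        arm = arms[label]
        arm["n"] += 1
        arm["frail"] += frail
        arm["cases"] += rng.random() < risk
    return arms

arms = simulate_randomized_trial()
risk = {label: arm["cases"] / arm["n"] for label, arm in arms.items()}
frail_share = {label: arm["frail"] / arm["n"] for label, arm in arms.items()}

print("Share of frail subjects by arm:", frail_share)
print("Estimated risk difference:", risk["E+"] - risk["E-"])
```

Because assignment ignores frailty, the two arms end up with similar shares of frail subjects, and the observed risk difference lands close to the true effect built into the simulation, without frailty ever being measured or adjusted for.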
Confounding: A Threat to Causal Inference
A confounder is a variable that is associated with both the exposure and the outcome and can distort the observed association between them. Consider a study of vaccination (E) and disease (D) where a third variable — say a pre-existing health condition (C) — independently predicts both who gets vaccinated and who gets the disease.
Confounding in Action
In Table 1.3 from the text, 20 subjects are studied. Looking at the raw data, p(D|E+) = 7/13 = 0.54 and p(D|E-) = 3/7 = 0.43, suggesting the exposure might increase disease risk. But when we stratify by the confounder C:
Among C+ subjects: p(D|E+) = 6/9 = 0.67 and p(D|E-) = 2/3 = 0.67
Among C- subjects: p(D|E+) = 1/4 = 0.25 and p(D|E-) = 1/4 = 0.25
Within each stratum, the exposure has NO effect on disease! The apparent association was entirely due to confounding by C. This is why controlling for confounders is essential in epidemiologic analysis.
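The crude and stratum-specific comparisons can be recomputed directly from the counts quoted above. The sketch below uses only those counts; the data layout and function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# (cases, subjects) by confounder stratum and exposure group, as quoted above.
COUNTS = {
    "C+": {"E+": (6, 9), "E-": (2, 3)},
    "C-": {"E+": (1, 4), "E-": (1, 4)},
}

def stratum_risk(stratum, exposure):
    """Risk of disease within one stratum of C for one exposure group."""
    cases, total = COUNTS[stratum][exposure]
    return Fraction(cases, total)

def crude_risk(exposure):
    """Risk of disease for one exposure group, ignoring C."""
    cases = sum(COUNTS[s][exposure][0] for s in COUNTS)
    total = sum(COUNTS[s][exposure][1] for s in COUNTS)
    return Fraction(cases, total)

def share_c_positive(exposure):
    """Proportion of an exposure group that is C+."""
    total = sum(COUNTS[s][exposure][1] for s in COUNTS)
    return Fraction(COUNTS["C+"][exposure][1], total)

crude_rr = crude_risk("E+") / crude_risk("E-")
stratum_rr = {s: stratum_risk(s, "E+") / stratum_risk(s, "E-") for s in COUNTS}

print("Crude risks:", float(crude_risk("E+")), float(crude_risk("E-")))
print("Crude risk ratio:", float(crude_rr))
print("Stratum-specific risk ratios:", {s: float(rr) for s, rr in stratum_rr.items()})
print("Share C+ among E+ and E-:", float(share_c_positive("E+")), float(share_c_positive("E-")))
```

The last line shows where the distortion comes from: C+ subjects, who have the higher disease risk, make up a larger share of the exposed group (9 of 13) than of the unexposed group (3 of 7).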
Observational Studies and the Counterfactual
In observational studies, we cannot randomize. This means groups may not be exchangeable, and confounding is a major concern. Epidemiologists use several strategies to address this: restriction (limiting the study to one level of the confounder), matching, stratification, and multivariable statistical models. All of these aim to simulate the exchangeability that randomization would provide.
Reflection
Think of a research question in your area of interest. What would the ideal counterfactual comparison look like? What confounders might distort the observed association, and how might you control for them?
Key Takeaways
- The counterfactual model asks: what would have happened to the same individual under a different exposure level?
- Counterfactual outcomes cannot be directly observed; we approximate them at the population level.
- Randomized experiments create exchangeability, allowing us to estimate causal effects by comparing groups.
- Confounding occurs when a third variable distorts the exposure-outcome association; controlling for it is essential for valid causal inference.
1. The counterfactual concept asks:
2. Exchangeability in a randomized experiment means:
3. A confounder is a variable that:
Lesson Review & Final Assessment
⏱ Estimated time: 15 minutes
Lesson Summary
In this lesson, you have explored the foundational concepts of epidemiologic research and causal thinking. Here is a recap of what you covered:
- Epidemiology defined: The study of exposure-outcome associations in populations, aimed at identifying causal factors to improve health through evidence-based interventions.
- History of causal thinking: From Hippocrates' environmental framework, through miasma and germ theory, to modern multifactorial models that integrate social, biological, and environmental perspectives.
- Scientific inference: Inductive reasoning (generalizing from observations), deductive reasoning (testing and disproving hypotheses), and Bayesian thinking (updating beliefs with new evidence).
- Key research components: Source populations, sampling, study design, measuring exposures and outcomes, addressing confounding and bias, and drawing causal inferences.
- Defining causes: Any factor that changes disease severity or frequency, with a preference for identifying manipulable factors for prevention.
- Component-cause model: How different groupings of component causes form sufficient causes, and why necessary causes are rare in complex diseases.
- Causal complements: Why the strength of association is population-specific and depends on the prevalence of co-factors.
- Causal-web model: Direct and indirect causes, and the strategic implications for disease prevention.
- The counterfactual concept: The gold standard for causal thinking — comparing what happened with what would have happened under different conditions.
- Confounding and exchangeability: Why controlling for confounders is essential, and how randomization approximates the counterfactual ideal.
Final Reflection
Think about a health issue you are interested in studying. Identify a potential exposure-outcome relationship and sketch out what a component-cause model might look like. What would be a direct cause versus an indirect cause? What confounders would you need to consider?
Final Knowledge Assessment
Complete the following 15-question assessment. A score of 100% is required to complete the lesson. You may retake the assessment as many times as needed.
1. The primary goal of epidemiology is to:
2. John Snow's investigation of cholera demonstrated that:
3. Karl Popper's philosophy of refutationism holds that:
4. Bayesian analysis in epidemiology:
5. In epidemiology, a "cause" is defined as:
6. A sufficient cause in the component-cause model is:
7. The strength of association between an exposure and disease can vary between populations because:
8. An indirect cause of disease is one that:
9. The counterfactual model is based on comparing:
10. Exchangeability in a randomized trial means:
11. A confounder is a variable that:
12. Selection bias occurs when:
13. The population attributable fractions (AFp) for all risk factors can sum to more than 100% because:
14. The prevention paradox refers to the fact that:
15. Which statement best reflects the overall message of this lesson?