Measures of Association

Fundamental Epidemiological Concepts and Approaches

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

Calculate and interpret the risk ratio, incidence rate ratio, and odds ratio
Compute risk difference, attributable fraction (exposed), and population attributable measures
Understand when to use each measure of association
Correctly distinguish between strength of association and statistical significance
Understand the basis for hypothesis tests and confidence intervals

This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Section 1

Introduction & Ratio Measures of Association

⏱ Estimated reading time: 15 minutes

Learning Objectives

Explain why measures of association are used in epidemiology.
Set up incidence risk and incidence rate data in 2×2 tables.
Calculate and interpret the risk ratio (RR), incidence rate ratio (IR), and odds ratio (OR).
Describe the relationships among RR, IR, and OR.

Why Measure Association?

Measures of association assess the magnitude of the relationship between an exposure (a potential cause) and a disease. Unlike measures of statistical significance, which are heavily dependent on sample size, measures of association indicate the strength of the effect — how much more (or less) likely disease is in exposed compared to non-exposed groups.

Strength vs. Significance

A measure of association tells you how strongly an exposure is linked to disease. A P-value tells you how probable data at least as extreme as those observed would be if the null hypothesis of no association were true. A strong association can be non-significant (small sample), and a weak association can be highly significant (large sample). Always report both.

Data Layout

Depending on study design, disease frequency can be expressed as incidence risk, incidence rate, prevalence, or odds. For risk data, the standard 2×2 table is:

	Exposed	Non-exposed	Total
Diseased	a₁	a₀	m₁
Non-diseased	b₁	b₀	m₀
Total	n₁	n₀	n

For rate data, the denominator is person-time at risk rather than the number of individuals:

	Exposed	Non-exposed	Total
Number of cases	a₁	a₀	m₁
Person-time at risk	t₁	t₀	t

Three Ratio Measures of Association

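In the notation of the tables above, the three ratio measures are:

Risk Ratio (RR): the risk of disease in the exposed group divided by the risk in the non-exposed group, RR = (a₁/n₁) / (a₀/n₀).
Incidence Rate Ratio (IR): the incidence rate in the exposed group divided by the rate in the non-exposed group, IR = (a₁/t₁) / (a₀/t₀).
Odds Ratio (OR): the odds of disease in the exposed group divided by the odds in the non-exposed group, OR = (a₁ × b₀) / (a₀ × b₁).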

Worked Example: Brazil Water Cistern Study

Diarrhea & Water Cistern Presence

	Water Cistern	No Cistern	Total
Diarrhea Present	194	303	497
Diarrhea Absent	1,588	1,314	2,902
Total	1,782	1,617	3,399

RR = (194/1782) / (303/1617) = 0.109 / 0.187 = 0.58
OR = (194 × 1314) / (303 × 1588) = 0.53

Both measures indicate that having a water cistern is protective against diarrhea (values < 1). The RR of 0.58 means the risk is 42% lower among those with cisterns.
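The arithmetic above can be reproduced with a minimal Python sketch. The variable names are illustrative choices that follow the a/b/n notation of the 2×2 table; they are not part of the course material.

```python
# Counts from the cistern 2x2 table.
# a = diseased, b = non-diseased; suffix 1 = exposed (cistern), 0 = non-exposed.
a1, b1 = 194, 1588
a0, b0 = 303, 1314
n1, n0 = a1 + b1, a0 + b0  # column totals: 1782 and 1617

risk_exposed = a1 / n1
risk_unexposed = a0 / n0

rr = risk_exposed / risk_unexposed      # risk ratio
odds_ratio = (a1 * b0) / (a0 * b1)      # cross-product odds ratio

print(f"RR = {rr:.2f}")                 # RR = 0.58
print(f"OR = {odds_ratio:.2f}")         # OR = 0.53
print(f"Risk reduction = {1 - rr:.0%}") # Risk reduction = 42%
```

Working from the raw counts rather than the rounded risks avoids carrying rounding error into the ratio.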

Worked Example: Migraine Incidence Rates

Gender and Migraine (Ages 30–40)

	Female	Male	Total
Cases of migraine	131	44	175
Person-months	250	236	486

IR = (131/250) / (44/236) = 0.524 / 0.186 = 2.81

Among people aged 30–40, the rate of migraine in females is 2.81 times the rate in males.
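The same calculation as a short Python sketch, using the counts and person-time from the table. The variable names are illustrative only.

```python
# Cases and person-time at risk from the migraine table.
cases_female, time_female = 131, 250
cases_male, time_male = 44, 236

rate_female = cases_female / time_female  # cases per unit of person-time
rate_male = cases_male / time_male

ir = rate_female / rate_male              # incidence rate ratio

print(f"Rate (female) = {rate_female:.3f}")  # 0.524
print(f"Rate (male)   = {rate_male:.3f}")    # 0.186
print(f"IR            = {ir:.2f}")           # 2.81
```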

Relationships Among RR, IR, and OR

In general, IR values are further from the null (1) than RR values, and OR values are even further away. This can be visualised on a number line:

Figure 6.1 — General relationships among RR, IR, and OR. OR is always furthest from the null value of 1.

When is OR ≈ RR?

When the disease is rare (prevalence or incidence risk < 5%), OR approximates RR. This is because when a₁ is small relative to n₁, the denominator of the odds (b₁) is approximately equal to n₁, and similarly for the non-exposed group. In the cistern example, the overall risk was 14.6%, so OR (0.53) was more extreme than RR (0.58).
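The rare-disease approximation can be illustrated numerically. The sketch below holds RR fixed at 0.58 and varies the risk in the non-exposed group. The baseline risks are hypothetical values chosen for illustration, and `or_from_rr` is a helper name introduced here, not something from the course.

```python
def or_from_rr(rr: float, baseline_risk: float) -> float:
    """Odds ratio implied by a risk ratio and the risk in the non-exposed group."""
    risk_exposed = rr * baseline_risk
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = baseline_risk / (1 - baseline_risk)
    return odds_exposed / odds_unexposed

# Hypothetical baseline risks, from rare to common disease.
for baseline in (0.01, 0.05, 0.20, 0.50):
    implied_or = or_from_rr(0.58, baseline)
    print(f"baseline risk {baseline:.2f}: RR = 0.58, OR = {implied_or:.3f}")
```

With a baseline risk of 1% the implied OR is almost equal to the RR. As the disease becomes more common, the OR moves further from the null than the RR.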

When is RR ≈ IR?

RR and IR will be close to each other if the exposure has a negligible impact on the total time at risk in the study population. This occurs when the disease is rare or when IR is close to the null value (IR ≈ 1).

OR as an Estimator of IR

OR is a good estimator of IR under certain conditions in case-control studies. If controls are selected using cumulative or risk-based sampling (all non-cases after cases have occurred), then OR estimates IR only if the disease is rare. If controls are selected using density sampling (a control selected from non-cases each time a case occurs), then OR is a direct estimate of IR regardless of disease rarity.

Key Takeaways

Measures of association quantify the strength of the exposure-disease relationship, unlike P-values, which also depend heavily on sample size.
RR compares risks, IR compares incidence rates, and OR compares odds between exposed and non-exposed groups.
OR is the only ratio measure that can be computed directly from case-control studies, owing to its symmetry property.
When disease is rare (<5%), OR ≈ RR. IR values are further from the null than RR, and OR values further still.


Section 2

Measures of Effect in the Exposed Group

⏱ Estimated reading time: 15 minutes

Learning Objectives

Distinguish between “ratio” (relative) and “difference” (absolute) measures of association.
Calculate and interpret the risk difference (RD) and incidence rate difference (ID).
Calculate and interpret the attributable fraction in the exposed (AF_e).
Explain the concept of vaccine efficacy as a special case of AF_e.

Ratio vs. Difference Measures

The ratio measures from Section 1 (RR, IR, OR) tell us the relative strength of association, but they do not indicate the absolute number of cases attributable to the exposure. Difference (absolute effect) measures address this gap by computing how many additional cases occur because of the exposure.

Why Both Matter

Even when an exposure is very strongly associated with disease (high RR), if the exposure is rare in a population, it may contribute very few cases. Conversely, a relatively weak risk factor (modest RR) that is common can be responsible for many cases. Difference measures capture this “public health impact.”

Risk Difference (RD) — Attributable Risk

The risk difference (RD), also called attributable risk, is simply the risk in the exposed group minus the risk in the non-exposed group:

RD = p(D+|E+) − p(D+|E−) = (a₁/n₁) − (a₀/n₀)    (Eq 6.5)

Similarly, the incidence rate difference (ID) is the difference between two incidence rates:

ID = (a₁/t₁) − (a₀/t₀)    (Eq 6.6)

Interpretation of Difference Measures

RD or ID < 0 → Exposure is protective
RD or ID = 0 → No effect of exposure
RD or ID > 0 → Exposure is positively associated with disease

RD indicates the increase (or decrease) in the probability of disease in the exposed group, beyond the baseline risk. It tells you: “For every X exposed individuals, how many additional cases occur because of the exposure?”

Example: Smoking & Low Birth Weight

From a cohort of 5,000 women followed through pregnancy:

	Smoker	Non-smoker	Total
Low birth weight	40	331	371
Normal birth weight	311	4,318	4,629
Total	351	4,649	5,000

Risk in exposed: R_E+ = 40/351 = 0.114
Risk in non-exposed: R_E− = 331/4649 = 0.071
RD = 0.114 − 0.071 = 0.043

For every 100 women who smoked, approximately 4.3 had a low-birth-weight baby because they smoked (assuming the relationship is causal).
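A minimal sketch of the RD calculation (Eq 6.5) from the raw counts. The variable names are illustrative.

```python
# Smoking and low birth weight: a = low birth weight, n = group total.
a1, n1 = 40, 351     # smokers
a0, n0 = 331, 4649   # non-smokers

risk_exposed = a1 / n1
risk_unexposed = a0 / n0
rd = risk_exposed - risk_unexposed  # risk difference (attributable risk)

print(f"Risk in exposed     = {risk_exposed:.3f}")    # 0.114
print(f"Risk in non-exposed = {risk_unexposed:.3f}")  # 0.071
print(f"RD                  = {rd:.3f}")              # 0.043
print(f"Excess cases per 100 exposed = {100 * rd:.1f}")  # 4.3
```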

Attributable Fraction in the Exposed (AF_e)

The AF_e expresses the proportion of disease in exposed individuals that is due to the exposure, assuming the relationship is causal. It can be viewed as the proportion of disease in the exposed group that would be avoided if the exposure were removed.

AF_e = RD / p(D+|E+) = (RR − 1) / RR ≈ (OR − 1) / OR    (Eq 6.7)

AF_e ranges from 0 (where risk is equal, RR = 1) to 1 (where all disease in the exposed group is due to the exposure, RR = ∞). In case-control studies, AF_e can be approximated by substituting OR for RR.

Worked Example: AF_e for Smoking

From the smoking example above:

RR = 0.114 / 0.071 = 1.60
AF_e = (1.60 − 1) / 1.60 = 0.60 / 1.60 = 0.375 (37.5%)

Among women who smoked, 37.5% of the low-birth-weight cases were attributable to smoking. Alternatively: 0.043 / 0.114 = 0.377 ≈ 37.7% (slight rounding difference).
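The two routes to AF_e in Eq 6.7 are algebraically identical, so the gap between 37.5% and 37.7% comes only from rounding the intermediate values. A short sketch using unrounded counts (variable names are illustrative):

```python
a1, n1 = 40, 351     # smokers: low-birth-weight cases, group total
a0, n0 = 331, 4649   # non-smokers

risk_exposed = a1 / n1
risk_unexposed = a0 / n0

rr = risk_exposed / risk_unexposed
rd = risk_exposed - risk_unexposed

af_e_from_rr = (rr - 1) / rr       # (RR - 1) / RR
af_e_from_rd = rd / risk_exposed   # RD / p(D+|E+)

print(f"AF_e via RR = {af_e_from_rr:.3f}")  # 0.375
print(f"AF_e via RD = {af_e_from_rd:.3f}")  # 0.375
```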

Vaccine Efficacy

Vaccine efficacy is a special form of AF_e, where “not vaccinated” is the exposure (factor positive) and “vaccinated” is the comparison group. If 20% of unvaccinated individuals develop disease versus 5% of vaccinated individuals:

Vaccine Efficacy Calculation

RD = 0.20 − 0.05 = 0.15
AF_e = 0.15 / 0.20 = 0.75 (75%)

The vaccine has prevented 75% of the cases of disease that would have occurred in the vaccinated group if the vaccine had not been used.
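As a sketch, vaccine efficacy can be wrapped in a small helper. The function name `vaccine_efficacy` is introduced here for illustration. It treats "not vaccinated" as the exposure, as described above.

```python
def vaccine_efficacy(risk_unvaccinated: float, risk_vaccinated: float) -> float:
    """AF_e with 'not vaccinated' as the exposure: RD / risk in the unvaccinated."""
    rd = risk_unvaccinated - risk_vaccinated
    return rd / risk_unvaccinated

ve = vaccine_efficacy(risk_unvaccinated=0.20, risk_vaccinated=0.05)
print(f"Vaccine efficacy = {ve:.0%}")  # 75%
```

Equivalently, efficacy equals 1 minus the ratio of the risk in the vaccinated to the risk in the unvaccinated (here 1 − 0.05/0.20).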

AF_e vs. Etiologic Fraction

The etiologic fraction is the proportion of cases in the exposed group for which exposure was a component of the sufficient cause. While AF_e measures the excess fraction, the etiologic fraction can be higher because exposure may contribute to cases even when the baseline risk would have produced them eventually. In general, AF_e provides a lower bound for the etiologic fraction.

Reflection

In a cohort study, a new environmental pollutant is found to have an RR of 3.0 for respiratory disease. The risk of respiratory disease in the non-exposed population is 2%. Calculate the RD and AF_e. If 1,000 people are exposed, how many additional cases would you expect due to the exposure? Discuss why RD and AF_e give different but complementary perspectives.


Key Takeaways

RD (attributable risk) measures the absolute increase in risk due to exposure; the null value is 0.
AF_e = (RR − 1)/RR gives the proportion of disease in the exposed that is due to the exposure.
Vaccine efficacy is a special case of AF_e where the “exposure” is being unvaccinated.
AF_e provides a lower bound for the etiologic fraction.


Section 3

Population-Level Measures & Study Design

⏱ Estimated reading time: 12 minutes

Learning Objectives

Calculate and interpret the population attributable risk (PAR) and population attributable fraction (AF_p).
Explain how the prevalence of exposure affects population-level measures.
Identify which measures of association can be computed from each study design.

From the Exposed Group to the Entire Population

While RD and AF_e describe the effect of exposure among exposed individuals, public health decisions often require understanding the impact of an exposure on the entire population. Two key population-level measures address this:

Population Attributable Risk (PAR)

The PAR is the increase in overall population risk attributable to the exposure. It reflects both the strength of the association and the frequency of the exposure in the population.

PAR = p(D+) − p(D+|E−) = (m₁/n) − (a₀/n₀) = RD × p(E+)    (Eq 6.8)

Population Attributable Fraction (AF_p)

The AF_p indicates the proportion of disease in the entire population that is attributable to the exposure, and which would be avoided if the exposure were removed (assuming causation and no confounding).

AF_p = PAR / p(D+) = p(E+)(RR − 1) / [p(E+)(RR − 1) + 1]    (Eq 6.9)

Why Exposure Prevalence Matters

A strong risk factor (high RR) that is rare in the population will have a small AF_p. A weaker risk factor (modest RR) that is common may have a large AF_p. For example, intravenous drug use has a very high RR for HIV, but if it is rare in the population, eliminating it would prevent few total cases. A modestly elevated risk factor like poor diet, affecting millions, may account for more total cases.
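The effect of exposure prevalence can be seen by applying Eq 6.9 to two contrasting factors. The prevalences and risk ratios below are hypothetical numbers chosen only to illustrate the point, and `af_p` is a helper name introduced here.

```python
def af_p(p_exposed: float, rr: float) -> float:
    """Population attributable fraction from exposure prevalence and RR (Eq 6.9)."""
    excess = p_exposed * (rr - 1)
    return excess / (excess + 1)

# Hypothetical factors: one rare but strong, one common but weak.
rare_strong = af_p(p_exposed=0.01, rr=10.0)
common_weak = af_p(p_exposed=0.50, rr=2.0)

print(f"Rare, strong factor (1% exposed, RR = 10): AF_p = {rare_strong:.1%}")
print(f"Common, weak factor (50% exposed, RR = 2): AF_p = {common_weak:.1%}")
```

Under these assumed values the common, weak factor accounts for roughly a third of all cases, while the rare, strong factor accounts for less than a tenth.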

Worked Example: Smoking & Low Birth Weight (Population Level)

From the cohort of 5,000 women (351 smokers, 4,649 non-smokers):

Overall risk: p(D+) = 371/5000 = 0.074
Risk in non-exposed: 331/4649 = 0.071
PAR = 0.074 − 0.071 = 0.003
AF_p = 0.003 / 0.074 = 0.041 (4.1%)

Only 4.1% of all low-birth-weight babies in the population were attributable to smoking. The AF_p is low because few women (351/5000 = 7%) smoked during the 2nd trimester, even though the association is moderately strong (RR = 1.60).
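A sketch of the same population-level calculation from the raw counts, which also checks that the two forms of PAR in Eq 6.8 agree. Variable names are illustrative. Carried at full precision, AF_p comes out at about 0.040, so the small gap from the 0.041 above reflects rounding of the intermediate values.

```python
a1, n1 = 40, 351      # smokers: low-birth-weight cases, group total
a0, n0 = 331, 4649    # non-smokers
m1, n = a1 + a0, n1 + n0  # all cases (371), whole cohort (5000)

overall_risk = m1 / n
risk_unexposed = a0 / n0
p_exposed = n1 / n

par = overall_risk - risk_unexposed                 # p(D+) - p(D+|E-)
par_via_rd = (a1 / n1 - risk_unexposed) * p_exposed  # RD x p(E+)
af_p = par / overall_risk

print(f"PAR  = {par:.4f}")
print(f"AF_p = {af_p:.3f}")
```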

Confounding and AF_p

If confounding is present, adjusted estimates of RR should be used. The AF_p can then be estimated using:

AF_p = pd × (aRR − 1) / aRR    (Eq 6.10)

where pd is the proportion of cases exposed to the risk factor, and aRR is the adjusted risk ratio. For multiple exposure categories, a summation formula is used.
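Eq 6.10 can be sanity-checked against Eq 6.9. The source gives no adjusted estimate for the smoking example, so the sketch below plugs the crude RR into the aRR slot as a stand-in. That is purely an assumption for illustration. With the crude RR, the result must match PAR / p(D+).

```python
a1, n1 = 40, 351      # smokers: low-birth-weight cases, group total
a0, n0 = 331, 4649    # non-smokers
m1, n = a1 + a0, n1 + n0

pd_cases_exposed = a1 / m1          # proportion of cases that were exposed
crude_rr = (a1 / n1) / (a0 / n0)    # stand-in for aRR (assumption: no adjustment)

af_p_eq_6_10 = pd_cases_exposed * (crude_rr - 1) / crude_rr

# Eq 6.8 and 6.9 route for comparison: PAR / p(D+).
af_p_eq_6_9 = (m1 / n - a0 / n0) / (m1 / n)

print(f"AF_p via Eq 6.10 = {af_p_eq_6_10:.4f}")
print(f"AF_p via PAR     = {af_p_eq_6_9:.4f}")
```

In a real analysis the adjusted RR would come from a stratified or regression analysis, and the two routes would then generally differ.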

Study Design and Measures of Association

Not all measures can be computed from all study designs. The following table summarises which measures are available:

Measure	Cross-sectional	Cohort	Case-control
RR	✓	✓	–
IR	–	✓	–
OR	✓	✓	✓
RD	✓	✓	–
AF_e	✓	✓	✓^b
PAR	✓	✓^a	–
AF_p	✓	✓^a	✓^c

– Not available from this design. ^a Requires independent estimate of p(D+) or p(E+). ^b Estimated using OR. ^c Requires OR and independent estimate of p(E+|D+).

Reflection

Consider two risk factors for a disease: Factor A has RR = 5.0 and affects 2% of the population. Factor B has RR = 1.5 and affects 40% of the population. Calculate AF_p for each factor using the formula AF_p = p(E+)(RR − 1) / [p(E+)(RR − 1) + 1]. Which factor would you prioritise in a public health intervention, and why?


Key Takeaways

PAR = overall population risk − risk in unexposed; it reflects both strength and prevalence of exposure.
AF_p = PAR / p(D+); it gives the proportion of all disease in the population attributable to the exposure.
A common risk factor with modest RR can have a larger AF_p than a rare factor with high RR.
Different study designs support different measures: from case-control studies only OR is directly available, although AF_e and AF_p can be estimated through it.


Section 4

Hypothesis Testing & Confidence Intervals

⏱ Estimated reading time: 15 minutes

Learning Objectives

Explain the concepts of standard error, null hypothesis, and P-value.
Describe the four common test statistics for evaluating associations.
Interpret confidence intervals for measures of association.
Distinguish between statistical significance and the strength of association.

Standard Error

The standard error (SE) provides a measure of the precision of a point estimate — how much uncertainty exists in the estimate. For difference measures (RD, ID), the variance can be computed directly:

var(RD) = [(a₁/n₁)(1 − a₁/n₁)] / n₁ + [(a₀/n₀)(1 − a₀/n₀)] / n₀    (Eq 6.13)

For ratio measures (RR, IR, OR), the variance is computed on the log scale using Taylor series approximations:

var(ln RR) = 1/a₁ − 1/n₁ + 1/a₀ − 1/n₀    (Eq 6.14)

var(ln OR) = 1/a₁ + 1/a₀ + 1/b₁ + 1/b₀    (Eq 6.15)
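As an illustration, Eqs 6.13 to 6.15 can be applied to the smoking and low-birth-weight counts from Section 2. The variable names are illustrative, and the printed standard errors are simply the square roots of the variances.

```python
import math

# Smoking example: a = low birth weight, b = normal birth weight.
a1, b1 = 40, 311      # smokers
a0, b0 = 331, 4318    # non-smokers
n1, n0 = a1 + b1, a0 + b0

risk1, risk0 = a1 / n1, a0 / n0

var_rd = risk1 * (1 - risk1) / n1 + risk0 * (1 - risk0) / n0  # Eq 6.13
var_ln_rr = 1 / a1 - 1 / n1 + 1 / a0 - 1 / n0                 # Eq 6.14
var_ln_or = 1 / a1 + 1 / a0 + 1 / b1 + 1 / b0                 # Eq 6.15

print(f"SE(RD)    = {math.sqrt(var_rd):.4f}")
print(f"SE(ln RR) = {math.sqrt(var_ln_rr):.4f}")
print(f"SE(ln OR) = {math.sqrt(var_ln_or):.4f}")
```

The variance of ln OR is always larger than that of ln RR for the same table, because the subtracted 1/n terms in Eq 6.14 are replaced by added 1/b terms in Eq 6.15.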

Hypothesis Testing

Significance testing is based on specifying a null hypothesis about the population parameter. The null hypothesis typically states there is no association:

For difference measures (RD, ID): H₀: θ = 0
For ratio measures (RR, IR, OR): H₀: θ = 1

An alternative hypothesis can be 1-tailed or 2-tailed. In general, 2-tailed hypotheses are preferred because 1-tailed hypotheses are harder to justify.

Limitations of P-values

P-values are often dichotomised into “significant” or “non-significant” at α = 0.05, but this discards a great deal of information. P-values of 0.049 and 0.051 lead to different conclusions despite being virtually identical. Always report the actual P-value together with a confidence interval, which conveys both significance and precision.

Test Statistics

The four common test statistics are:

Pearson χ²
Exact tests
Wald statistic
Likelihood ratio test
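As a rough illustration of the first of these, the sketch below computes a Pearson χ² for the smoking 2×2 table from Section 2. It uses the standard shortcut formula for a 2×2 table without continuity correction. Because the details of each test are not reproduced in this text, treat the formula choice as an assumption rather than the lesson's stated method. The P-value uses the identity that a χ² variable with 1 degree of freedom is a squared standard normal.

```python
import math

# Smoking example: a = low birth weight, b = normal birth weight.
a1, b1 = 40, 311      # smokers
a0, b0 = 331, 4318    # non-smokers
n1, n0 = a1 + b1, a0 + b0   # column totals
m1, m0 = a1 + a0, b1 + b0   # row totals
n = n1 + n0

# Shortcut Pearson chi-square for a 2x2 table (no continuity correction).
chi2 = n * (a1 * b0 - a0 * b1) ** 2 / (m1 * m0 * n1 * n0)

# Upper-tail probability for 1 df: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"Pearson chi-square = {chi2:.2f}")
print(f"P-value (1 df)     = {p_value:.4f}")
```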

Confidence Intervals

A confidence interval (CI) reflects the level of uncertainty in a point estimate. A 95% CI means that if the study were repeated many times under identical conditions, 95% of the computed CIs would contain the true parameter value.

Computing CIs

For difference measures, the CI is computed directly:

θ ± Z_α × √var(θ)    (Eq 6.19)

For ratio measures, the CI is computed on the log scale and then exponentiated:

θ × exp(± Z_α × √var(ln θ))    (Eq 6.21)

The CI is symmetrical about ln θ but not about θ itself, which is why confidence intervals for ratio measures appear asymmetric.
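A minimal sketch of Eqs 6.19 and 6.21 applied to the smoking counts from Section 2, assuming a multiplier of 1.96 for a two-sided 95% interval. The variable names are illustrative. The results should land close to the RD and RR intervals tabulated later in this section.

```python
import math

a1, b1 = 40, 311      # smokers: low / normal birth weight
a0, b0 = 331, 4318    # non-smokers
n1, n0 = a1 + b1, a0 + b0
z = 1.96              # assumed two-sided 95% multiplier

risk1, risk0 = a1 / n1, a0 / n0

# Difference measure: interval computed directly (Eq 6.19).
rd = risk1 - risk0
se_rd = math.sqrt(risk1 * (1 - risk1) / n1 + risk0 * (1 - risk0) / n0)
rd_ci = (rd - z * se_rd, rd + z * se_rd)

# Ratio measure: interval computed on the log scale, then exponentiated (Eq 6.21).
rr = risk1 / risk0
se_ln_rr = math.sqrt(1 / a1 - 1 / n1 + 1 / a0 - 1 / n0)
factor = math.exp(z * se_ln_rr)
rr_ci = (rr / factor, rr * factor)

print(f"RD = {rd:.3f}, 95% CI ({rd_ci[0]:.3f}, {rd_ci[1]:.3f})")
print(f"RR = {rr:.3f}, 95% CI ({rr_ci[0]:.3f}, {rr_ci[1]:.3f})")
```

The RR interval is obtained by dividing and multiplying by the same factor, so the upper limit sits further from the point estimate than the lower limit does.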

Interpreting CIs for Measures of Association

For RR, IR, OR: if the 95% CI includes 1, the association is not statistically significant at α = 0.05.
For RD, ID: if the 95% CI includes 0, the association is not statistically significant.

However, this “surrogate significance test” is an under-use of the CI. The CI also shows the range of plausible effect sizes, which is far more informative than a binary significant/non-significant classification.

Example CIs from the Textbook

Measure	Point Estimate	95% CI
RD (smoking)	0.043	(0.009, 0.077)
RR (smoking)	1.601	(1.174, 2.182)
OR (smoking)	1.678	(1.154, 2.387)
ID (migraine)	0.338	(0.232, 0.443)
IR (migraine)	2.811	(1.983, 4.050)

None of the CIs for ratio measures include 1, and none for difference measures include 0, confirming statistical significance for all associations.

Reflection

A study reports an OR of 1.45 with a 95% CI of (0.92, 2.28). A second study reports an OR of 1.15 with a 95% CI of (1.02, 1.30). Compare these two findings in terms of: (a) strength of association, (b) statistical significance, and (c) precision. Which finding might be more concerning from a public health perspective, and why?


Key Takeaways

Standard errors quantify precision; they are computed differently for difference vs. ratio measures.
Hypothesis testing uses a null hypothesis (no effect) and a test statistic to generate a P-value.
Four common test statistics: Pearson χ², exact tests, Wald, and likelihood ratio tests.
Confidence intervals are more informative than P-values: they show the range of plausible effect sizes.
A CI that contains the null value (1 for ratio measures, 0 for difference measures) indicates non-significance at the corresponding α level.


HSCI 341 — Lesson 6
