HSCI 341 — Lesson 7

Measures of Association

Fundamental Epidemiological Concepts and Approaches

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Calculate and interpret the risk ratio, incidence rate ratio, and odds ratio
  • Compute risk difference, attributable fraction (exposed), and population attributable measures
  • Understand when to use each measure of association
  • Correctly distinguish between strength of association and statistical significance
  • Understand the basis for hypothesis tests and confidence intervals

This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Reference

Glossary — Key Terms, People & Concepts

This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.

Ratio Measures of Association
Risk Ratio (Relative Risk, RR) The ratio of the cumulative incidence (risk) in the exposed group to that in the unexposed group: RR = R₁ / R₀. Used in cohort studies and RCTs over a defined follow-up period.
Incidence Rate Ratio (IRR) The ratio of incidence rates (events per person-time) in the exposed and unexposed groups. Appropriate when follow-up time varies across individuals.
Odds Ratio (OR) The ratio of the odds of disease in the exposed group to the odds in the unexposed group (Cornfield, 1951; Wikipedia). The standard measure in case-control studies and the natural output of logistic regression.
Prevalence Ratio (PR) The ratio of prevalence in the exposed group to prevalence in the unexposed group. Preferred over the prevalence odds ratio when the outcome is not rare in cross-sectional analyses.
Hazard Ratio (HR) The ratio of instantaneous event rates (hazards) between groups, typically from Cox proportional hazards regression (Wikipedia). Approximates the rate ratio under proportional hazards.
Rare-Disease Assumption When disease prevalence or risk is low (a common rule of thumb is < 5%; some authors use < 10%), the odds ratio approximates the risk ratio. Important for interpreting case-control results.
Difference Measures of Association
Risk Difference (RD, Attributable Risk) The absolute difference in risk between exposed and unexposed groups: RD = R₁ − R₀. Captures the public-health impact in absolute terms.
Rate Difference The absolute difference in incidence rates between exposed and unexposed groups, in events per person-time.
Attributable Fraction in the Exposed (AFe) Among the exposed, the proportion of disease attributable to the exposure: AFe = (R₁ − R₀) / R₁ = (RR − 1) / RR.
Population Attributable Risk (PAR) The excess risk in the total population attributable to the exposure: PAR = Rpop − R₀. Reflects both effect size and exposure prevalence.
Population Attributable Fraction (PAF) The proportion of disease in the total population attributable to the exposure (Northridge, 1995; Wikipedia). Useful for prioritising public-health interventions.
Number Needed to Treat (NNT) 1 / |risk difference| — the average number of patients who must receive a beneficial intervention for one to avoid the bad outcome (Laupacis, Sackett, & Roberts, 1988; Wikipedia).
Number Needed to Harm (NNH) 1 / |risk difference| for harmful exposures — the average number exposed for one additional case of harm.
Inference & Interpretation
Null Hypothesis (H₀) A statement of no effect or no difference (e.g., RR = 1, RD = 0). The hypothesis tested by significance tests.
p-Value The probability of observing data as extreme or more extreme than the observed, assuming the null hypothesis is true. Not the probability that the null is true.
Confidence Interval A range of values consistent with the data at a chosen confidence level. For a ratio measure, an interval that excludes 1 implies statistical significance at the corresponding level.
Strength of Association vs. Statistical Significance The size of an effect (e.g., RR) is conceptually distinct from how confident we are that it differs from chance (p-value, CI width). Large samples can yield significant findings for trivial effects, and vice versa.
Effect Modification (Interaction) When the magnitude of an exposure-outcome association differs across levels of a third variable. Reported as stratum-specific estimates — not adjusted away. Additive and multiplicative scales can give different interaction conclusions; reporting both is recommended (Knol & VanderWeele, 2012; VanderWeele & Knol, 2014).
Section 1

Introduction & Ratio Measures of Association

⏱ Estimated reading time: 15 minutes

Introduction and Overview

Lesson 5 gave us measures of disease frequency — prevalence, incidence, risk, rate. Lesson 6 took the same probabilistic vocabulary and applied it at the level of a single test. Lesson 7 brings these strands together: it combines the disease-frequency vocabulary with the 2×2 contingency logic to produce measures of association — the quantitative comparison between exposed and unexposed groups that is the central output of analytic epidemiology. The four content sections build up from the three ratio measures (Section 1: risk ratio, rate ratio, odds ratio), through difference measures and exposed-group attributable fractions (Section 2), to population-level attributable measures and how each measure relates to study design (Section 3), and finally to the hypothesis-testing and confidence-interval machinery that turns each of these point estimates into a defensible inference (Section 4).

Learning Objectives

  • Explain why measures of association are used in epidemiology.
  • Set up incidence risk and incidence rate data in 2×2 tables.
  • Calculate and interpret the risk ratio (RR), incidence rate ratio (IR), and odds ratio (OR).
  • Describe the relationships among RR, IR, and OR.

Why Measure Association?

Measures of association assess the magnitude of the relationship between an exposure (a potential cause) and a disease. Unlike measures of statistical significance, which are heavily dependent on sample size, measures of association indicate the strength of the effect — how much more (or less) likely disease is in exposed compared to non-exposed groups (Tripepi et al., 2007).

Strength vs. Significance

A measure of association tells you how strongly an exposure is linked to disease. A P-value tells you how likely the observed data would be under the null hypothesis of no association. A strong association can be non-significant (small sample), and a weak association can be highly significant (large sample). Always report both.

Data Layout

Depending on study design, disease frequency can be expressed as incidence risk, incidence rate, prevalence, or odds. For risk data, the standard 2×2 table is:

               | Exposed | Non-exposed | Total
Diseased       | a1      | a0          | m1
Non-diseased   | b1      | b0          | m0
Total          | n1      | n0          | n

For rate data, the denominator is person-time at risk rather than the number of individuals:

                     | Exposed | Non-exposed | Total
Number of cases      | a1      | a0          | m1
Person-time at risk  | t1      | t0          | t

Three Ratio Measures of Association

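The three ratio measures, in the notation of the tables above:

  • Risk Ratio (RR) = (a1/n1) / (a0/n0), the ratio of incidence risks
  • Incidence Rate Ratio (IR) = (a1/t1) / (a0/t0), the ratio of incidence rates
  • Odds Ratio (OR) = (a1/b1) / (a0/b0) = (a1 × b0) / (a0 × b1), the ratio of odds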

Worked Example: Brazil Water Cistern Study

Diarrhea & Water Cistern Presence

                  | Water Cistern | No Cistern | Total
Diarrhea Present  | 194           | 303        | 497
Diarrhea Absent   | 1,588         | 1,314      | 2,902
Total             | 1,782         | 1,617      | 3,399
  • RR = (194/1782) / (303/1617) = 0.109 / 0.187 = 0.58
  • OR = (194 × 1314) / (303 × 1588) = 0.53

Both measures indicate that having a water cistern is protective against diarrhea (values < 1). The RR of 0.58 means the risk is 42% lower among those with cisterns.
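The two calculations above can be checked with a short Python sketch. The helper names `risk_ratio` and `odds_ratio` are illustrative (not from any library), and the argument names follow the a1/b1/a0/b0 cell labels of the Data Layout table.

```python
def risk_ratio(a1: int, b1: int, a0: int, b0: int) -> float:
    """RR = (a1/n1) / (a0/n0), with n1 = a1 + b1 and n0 = a0 + b0."""
    return (a1 / (a1 + b1)) / (a0 / (a0 + b0))


def odds_ratio(a1: int, b1: int, a0: int, b0: int) -> float:
    """OR = (a1/b1) / (a0/b0) = (a1 * b0) / (a0 * b1)."""
    return (a1 * b0) / (a0 * b1)


# Brazil water cistern example: exposure = cistern present, disease = diarrhea
rr = risk_ratio(a1=194, b1=1588, a0=303, b0=1314)
or_ = odds_ratio(a1=194, b1=1588, a0=303, b0=1314)
print(f"RR = {rr:.2f}, OR = {or_:.2f}")  # RR = 0.58, OR = 0.53
```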

Worked Example: Migraine Incidence Rates

Gender and Migraine (Ages 30–40)

                   | Female | Male | Total
Cases of migraine  | 131    | 44   | 175
Person-months      | 250    | 236  | 486

IR = (131/250) / (44/236) = 0.524 / 0.186 = 2.81

The rate of migraine is 2.81 times higher in females than males aged 30–40.
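A minimal sketch of the same rate-ratio arithmetic, assuming person-time is recorded in person-months for both groups (the helper name is hypothetical):

```python
def incidence_rate_ratio(a1: int, t1: float, a0: int, t0: float) -> float:
    """IR = (a1/t1) / (a0/t0): ratio of cases per unit of person-time."""
    return (a1 / t1) / (a0 / t0)


rate_f = 131 / 250   # females: cases per person-month
rate_m = 44 / 236    # males: cases per person-month
ir = incidence_rate_ratio(131, 250, 44, 236)
print(f"female rate = {rate_f:.3f}, male rate = {rate_m:.3f}, IR = {ir:.2f}")  # IR = 2.81
```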

Relationships Among RR, IR, and OR

In general, IR values are further from the null (1) than RR values, and OR values are even further away (Cornfield, 1951; Knol et al., 2008). This can be visualised on a number line:

Number line (protective values to the left of 1, harmful values to the right): 0 … OR < IR < RR < 1 (null) < RR < IR < OR

Figure 6.1 — General relationships among RR, IR, and OR. OR is always furthest from the null value of 1.

When is OR ≈ RR?

When the disease is rare (prevalence or incidence risk < 5%), OR approximates RR. This is the classic approximation formalised by Cornfield (1951), which allowed case-control studies such as Doll and Hill's (1950) landmark study of smoking and lung cancer to be read in terms of relative risk. It holds because when a1 is small relative to n1, the denominator of the odds (b1 = n1 − a1) is approximately equal to n1, and similarly for the non-exposed group. In the cistern example, the overall risk was 14.6%, so OR (0.53) was more extreme than RR (0.58); when outcomes are not rare, treating OR as RR can substantially overstate the effect (Knol et al., 2008).
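To see why rarity matters, the sketch below computes RR and OR for the cistern table and for a hypothetical rare-outcome table with the same group sizes. The rare-outcome counts (18 of 1,782 exposed, 28 of 1,617 non-exposed) are invented purely for illustration, and the helper name is not from any library.

```python
def risk_and_odds_ratio(a1: int, b1: int, a0: int, b0: int) -> tuple[float, float]:
    """Return (RR, OR) for a 2x2 table laid out as a1/b1 (exposed), a0/b0 (non-exposed)."""
    rr = (a1 / (a1 + b1)) / (a0 / (a0 + b0))
    or_ = (a1 * b0) / (a0 * b1)
    return rr, or_


# Common outcome: the cistern table (overall risk about 14.6%)
common = risk_and_odds_ratio(194, 1588, 303, 1314)
# Hypothetical rare outcome: risks of about 1.0% and 1.7%, same group sizes
rare = risk_and_odds_ratio(18, 1764, 28, 1589)

for label, (rr, or_) in [("common", common), ("rare", rare)]:
    print(f"{label}: RR = {rr:.3f}, OR = {or_:.3f}, OR/RR = {or_ / rr:.3f}")
```

With the rare outcome, OR/RR is within about 1% of 1; with the common outcome, the OR is noticeably further from the null than the RR.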

When is RR ≈ IR?

RR and IR will be close to each other if the exposure has a negligible impact on the total time at risk in the study population. This occurs when the disease is rare or when IR is close to the null value (IR ≈ 1).

OR as an Estimator of IR

OR is a good estimator of IR under certain conditions in case-control studies. If controls are selected using cumulative (risk-based) sampling, that is, from individuals who are still non-cases at the end of the risk period, then OR estimates IR only if the disease is rare. If controls are selected using density sampling (a control selected from non-cases each time a case occurs), then OR is a direct estimate of IR regardless of disease rarity.

⚖ Risk Ratio vs. Odds Ratio

The odds ratio tracks the risk ratio closely when the outcome is rare, but the two diverge as the outcome becomes common. OR is always further from 1 than RR: it lies above RR when RR > 1 and below RR when RR < 1.

For example, with a common outcome, a true RR of 2.0 can correspond to an OR of 3.0 or more (risks of 60% vs. 30% give OR = 3.5). Reporting such an OR as if it were a "risk ratio" overstates the relative effect by 50% or more. The "rare disease assumption" is what justifies treating OR as RR; ignore it at your peril.
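The divergence can be sketched by holding RR fixed at 2.0 and raising the overall outcome prevalence. This sketch assumes, hypothetically, equal-sized exposed and unexposed groups, so that prevalence = (R1 + R0)/2; at 30% prevalence this gives risks of 40% and 20% and an OR of about 2.67. The function name is illustrative.

```python
def odds_ratio_for_fixed_rr(rr: float, prevalence: float) -> float:
    """OR implied by a given RR and overall prevalence.

    Assumes equal-sized exposed and unexposed groups, so that
    prevalence = (R1 + R0) / 2 and R1 = rr * R0.
    """
    r0 = 2 * prevalence / (rr + 1)  # risk in the unexposed group
    r1 = rr * r0                    # risk in the exposed group
    if not 0 < r1 < 1:
        raise ValueError("prevalence is out of range for this RR")
    return (r1 / (1 - r1)) / (r0 / (1 - r0))


for p in (0.01, 0.05, 0.10, 0.30, 0.45):
    or_ = odds_ratio_for_fixed_rr(2.0, p)
    print(f"prevalence {p:.2f}: RR = 2.00, OR = {or_:.2f}, OR/RR = {or_ / 2.0:.2f}")
```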

Key Takeaways

  • Measures of association quantify the strength of the exposure-disease relationship, whereas P-values depend heavily on sample size.
  • RR compares risks, IR compares incidence rates, and OR compares odds between exposed and non-exposed groups.
  • OR is the only ratio measure that can be computed directly from case-control studies, because of its symmetry property.
  • When disease is rare (<5%), OR ≈ RR. IR values are further from the null than RR, and OR values further still.
Knowledge Check — Section 1

1. The odds ratio (OR) is the only ratio measure of association applicable to case-control studies because:

The OR exhibits symmetry: (a1×b0)/(a0×b1) is the same regardless of whether you view it as odds of disease or odds of exposure. In case-control studies, the investigator sets the number of cases and controls, making RR incalculable, but OR remains valid.

2. A risk ratio of 0.58 for diarrhea in a cistern study indicates:

An RR of 0.58 means the risk in the exposed group is 58% of the risk in the non-exposed group, which is a 42% reduction (1 − 0.58 = 0.42). Since RR < 1, the exposure (cistern) is protective.

3. Under what condition does OR best approximate RR?

When disease is rare, the number of cases (a) is small relative to the total (n), so odds and risk become approximately equal. Under density sampling, OR estimates IR regardless of rarity.


Section 2

Measures of Effect in the Exposed Group

⏱ Estimated reading time: 15 minutes

Introduction and Overview

Section 1 covered the three ratio measures (RR, IR, OR) — how many times more likely disease is in the exposed group compared to the unexposed. Section 2 turns to the parallel set of difference measures, which answer a different question: not how many times more, but how many extra cases occur because of the exposure. Difference measures lead naturally to attributable fractions in the exposed, which quantify how much disease in the exposed group can be attributed to the exposure itself.

Learning Objectives

  • Distinguish between “ratio” (relative) and “difference” (absolute) measures of association.
  • Calculate and interpret the risk difference (RD) and incidence rate difference (ID).
  • Calculate and interpret the attributable fraction in the exposed (AFe).
  • Explain the concept of vaccine efficacy as a special case of AFe.

Ratio vs. Difference Measures

The ratio measures from Section 1 (RR, IR, OR) tell us the relative strength of association, but they do not indicate the absolute number of cases attributable to the exposure. Difference (absolute effect) measures address this gap by computing how many additional cases occur because of the exposure; the choice between ratio and difference measures is itself a substantive scientific decision rather than a statistical convenience (Greenland & Pearce, 2015; Tripepi et al., 2007).

Why Both Matter

Even when an exposure is very strongly associated with disease (high RR), if the exposure is rare in a population, it may contribute very few cases. Conversely, a relatively weak risk factor (modest RR) that is common can be responsible for many cases. Difference measures capture this “public health impact.”

Risk Difference (RD) — Attributable Risk

The risk difference (RD), also called attributable risk (Wikipedia; Walter, 1976), is simply the risk in the exposed group minus the risk in the non-exposed group:

$$\text{RD} = p(D^+ \mid E^+) - p(D^+ \mid E^-) = \frac{a_1}{n_1} - \frac{a_0}{n_0} \tag{6.5}$$

Similarly, the incidence rate difference (ID) is the difference between two incidence rates:

$$\text{ID} = \frac{a_1}{t_1} - \frac{a_0}{t_0} \tag{6.6}$$

Interpretation of Difference Measures

  • RD or ID < 0 → Exposure is protective
  • RD or ID = 0 → No effect of exposure
  • RD or ID > 0 → Exposure is positively associated with disease

RD indicates the increase (or decrease) in the probability of disease in the exposed group, beyond the baseline risk. It tells you: “For every X exposed individuals, how many additional cases occur because of the exposure?”

Example: Smoking & Low Birth Weight

From a cohort of 5,000 women followed through pregnancy:

                     | Smoker | Non-smoker | Total
Low birth weight     | 40     | 331        | 371
Normal birth weight  | 311    | 4,318      | 4,629
Total                | 351    | 4,649      | 5,000
  • Risk in exposed: R(E+) = 40/351 = 0.114
  • Risk in non-exposed: R(E−) = 331/4649 = 0.071
  • RD = 0.114 − 0.071 = 0.043

For every 100 women who smoked, approximately 4.3 had a low-birth-weight baby because they smoked (assuming a causal relationship).
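A short sketch of Eq 6.5 and Eq 6.6, applied to the smoking cohort and, for the rate difference, to the migraine person-time data from Section 1. The helper names are illustrative.

```python
def risk_difference(a1: int, n1: int, a0: int, n0: int) -> float:
    """RD = a1/n1 - a0/n0 (Eq 6.5); null value 0."""
    return a1 / n1 - a0 / n0


def rate_difference(a1: int, t1: float, a0: int, t0: float) -> float:
    """ID = a1/t1 - a0/t0 (Eq 6.6), in cases per unit of person-time."""
    return a1 / t1 - a0 / t0


rd = risk_difference(40, 351, 331, 4649)   # smoking and low birth weight
id_ = rate_difference(131, 250, 44, 236)   # migraine, females vs. males
print(f"RD = {rd:.3f}, i.e. about {100 * rd:.1f} extra cases per 100 exposed")
print(f"ID = {id_:.3f} cases per person-month")
```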

Attributable Fraction in the Exposed (AFe)

The AFe (also called the attributable fraction among the exposed) expresses the proportion of disease in exposed individuals that is due to the exposure, assuming the relationship is causal (Walter, 1976). It can be viewed as the proportion of disease in the exposed group that would be avoided if the exposure were removed.

$$\text{AF}_e = \frac{\text{RD}}{p(D^+ \mid E^+)} = \frac{\text{RR} - 1}{\text{RR}} \approx \frac{\text{OR} - 1}{\text{OR}} \tag{6.7}$$

AFe ranges from 0 (where risk is equal, RR = 1) to 1 (where all disease in the exposed group is due to the exposure, RR = ∞). In case-control studies, AFe can be approximated by substituting OR for RR.

Worked Example: AFe for Smoking

From the smoking example above:

  • RR = 0.114 / 0.071 = 1.60
  • AFe = (1.60 − 1) / 1.60 = 0.60 / 1.60 = 0.375 (37.5%)

Among women who smoked, 37.5% of the low-birth-weight cases were attributable to smoking. Alternatively: 0.043 / 0.114 = 0.377 ≈ 37.7% (slight rounding difference).
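Computing AFe both ways from unrounded risks shows that the gap between 37.5% and 37.7% comes only from rounding. A minimal sketch using the smoking counts:

```python
a1, n1 = 40, 351     # low-birth-weight cases and total, smokers
a0, n0 = 331, 4649   # low-birth-weight cases and total, non-smokers

r1, r0 = a1 / n1, a0 / n0
rr = r1 / r0
afe_from_rr = (rr - 1) / rr    # (RR - 1) / RR
afe_from_rd = (r1 - r0) / r1   # RD / p(D+|E+)
print(f"RR = {rr:.3f}; AFe = {afe_from_rr:.3f} (via RR), {afe_from_rd:.3f} (via RD)")
```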

Vaccine Efficacy

Vaccine efficacy is a special form of AFe, where “not vaccinated” is the exposure (factor positive) and “vaccinated” is the comparison group. If 20% of unvaccinated individuals develop disease versus 5% of vaccinated individuals:

Vaccine Efficacy Calculation

  • RD = 0.20 − 0.05 = 0.15
  • AFe = 0.15 / 0.20 = 0.75 (75%)

The vaccine has prevented 75% of the cases of disease that would have occurred in the vaccinated group if the vaccine had not been used.
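Vaccine efficacy is ordinary AFe arithmetic with the unvaccinated treated as the exposed group. A sketch with a hypothetical helper name, using the risks from the example above:

```python
def vaccine_efficacy(risk_unvaccinated: float, risk_vaccinated: float) -> float:
    """AFe with 'unvaccinated' as the exposure: (R_unvacc - R_vacc) / R_unvacc."""
    return (risk_unvaccinated - risk_vaccinated) / risk_unvaccinated


ve = vaccine_efficacy(0.20, 0.05)
print(f"VE = {ve:.0%}")  # VE = 75%
```

Equivalently, VE = 1 − R_vacc / R_unvacc, i.e. one minus the risk ratio comparing vaccinated with unvaccinated individuals.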

AFe vs. Etiologic Fraction

The etiologic fraction is the proportion of cases in the exposed group for which exposure was a component of the sufficient cause (Rothman, 1976). While AFe measures the excess fraction, the etiologic fraction can be higher because exposure may contribute to cases even when the baseline risk would have produced them eventually. In general, AFe provides a lower bound for the etiologic fraction (Greenland & Robins, 1988).

Reflection

In a cohort study, a new environmental pollutant is found to have an RR of 3.0 for respiratory disease. The risk of respiratory disease in the non-exposed population is 2%. Calculate the RD and AFe. If 1,000 people are exposed, how many additional cases would you expect due to the exposure? Discuss why RD and AFe give different but complementary perspectives.

Model answer
RD = baseline × (RR−1) = 0.02 × 2 = 0.04 (4 per 100 exposed). AFe = (RR−1)/RR = 2/3 ≈ 0.667 — 67% of disease in exposed people is attributable to the exposure. With 1,000 exposed individuals, expected cases at baseline = 1,000 × 0.02 = 20; with exposure = 1,000 × 0.06 = 60; additional cases due to exposure = 40. RD gives the public-health-relevant absolute number (40 extra cases per 1000 exposed); AFe gives the within-exposed fraction (67% of exposed cases would not have occurred without exposure). They are complementary: RD scales to population impact; AFe addresses individual-level questions like the legal standard "but for the exposure, would this person have gotten sick?"


Key Takeaways

  • RD (attributable risk) measures the absolute increase in risk due to exposure; the null value is 0.
  • AFe = (RR − 1)/RR gives the proportion of disease in the exposed that is due to the exposure.
  • Vaccine efficacy is a special case of AFe where the “exposure” is being unvaccinated.
  • AFe provides a lower bound for the etiologic fraction.
Knowledge Check — Section 2

1. If the risk of disease is 12% in the exposed group and 4% in the non-exposed group, the risk difference (RD) is:

RD = 0.12 − 0.04 = 0.08 or 8%. This is the absolute increase in risk attributable to the exposure. (The RR would be 3.0, which is a ratio measure.)

2. A vaccine efficacy of 75% means:

Vaccine efficacy = AFe = (risk in unvaccinated − risk in vaccinated) / risk in unvaccinated. A value of 75% means that vaccination prevented 75% of the cases that would otherwise have occurred among the vaccinated.

3. AFe = (RR − 1)/RR. If RR = 2.5, what is AFe?

AFe = (2.5 − 1) / 2.5 = 1.5 / 2.5 = 0.60. This means 60% of disease among the exposed is attributable to the exposure.


Section 3

Population-Level Measures & Study Design

⏱ Estimated reading time: 12 minutes

Introduction and Overview

Sections 1 and 2 worked at the level of the exposed group. Section 3 zooms out: even if an exposure powerfully causes disease in the exposed, its public-health importance also depends on how common it is in the population. The population attributable fraction (AFp) captures this combination, and the section closes by mapping each measure of association onto the study design that produces it — tying back to HSCI 230 Lessons 4–6.

Learning Objectives

  • Calculate and interpret the population attributable risk (PAR) and population attributable fraction (AFp).
  • Explain how the prevalence of exposure affects population-level measures.
  • Identify which measures of association can be computed from each study design.

From the Exposed Group to the Entire Population

While RD and AFe describe the effect of exposure among exposed individuals, public health decisions often require understanding the impact of an exposure on the entire population. Two key population-level measures address this:

Population Attributable Risk (PAR)

The PAR is the increase in overall population risk attributable to the exposure. It reflects both the strength of the association and the frequency of the exposure in the population — an idea originally developed by Levin (1953) for lung cancer and reviewed by Northridge (1995) as the key link between causal inference and public-health action (see also Wikipedia).

$$\text{PAR} = p(D^+) - p(D^+ \mid E^-) = \frac{m_1}{n} - \frac{a_0}{n_0} = \text{RD} \times p(E^+) \tag{6.8}$$

Population Attributable Fraction (AFp)

The AFp indicates the proportion of disease in the entire population that is attributable to the exposure, and which would be avoided if the exposure were removed (assuming causation and no confounding).

$$\text{AF}_p = \frac{\text{PAR}}{p(D^+)} = \frac{p(E^+)(\text{RR} - 1)}{p(E^+)(\text{RR} - 1) + 1} \tag{6.9}$$

Why Exposure Prevalence Matters

A strong risk factor (high RR) that is rare in the population will have a small AFp. A weaker risk factor (modest RR) that is common may have a large AFp. For example, intravenous drug use has a very high RR for HIV, but if it is rare in the population, eliminating it would prevent few total cases. A modestly elevated risk factor like poor diet, affecting millions, may account for more total cases.
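Levin's formula (Eq 6.9) can be sketched as a one-line function. The two inputs below, a rare strong factor (RR = 5.0, 2% exposed) and a common weak one (RR = 1.5, 40% exposed), are the same illustrative factors used in this section's reflection exercise. The function name is hypothetical, and it assumes causation and no confounding.

```python
def afp_levin(p_exposed: float, rr: float) -> float:
    """AFp = p(E+)(RR - 1) / [p(E+)(RR - 1) + 1] (Eq 6.9)."""
    excess = p_exposed * (rr - 1)
    return excess / (excess + 1)


rare_strong = afp_levin(0.02, 5.0)   # rare exposure, large RR
common_weak = afp_levin(0.40, 1.5)   # common exposure, modest RR
print(f"rare, strong factor:  AFp = {rare_strong:.3f}")
print(f"common, weak factor:  AFp = {common_weak:.3f}")
```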

Worked Example: Smoking & Low Birth Weight (Population Level)

From the cohort of 5,000 women (351 smokers, 4,649 non-smokers):

  • Overall risk: p(D+) = 371/5000 = 0.074
  • Risk in non-exposed: 331/4649 = 0.071
  • PAR = 0.074 − 0.071 = 0.003
  • AFp = 0.003 / 0.074 = 0.041 (4.1%; without rounding the intermediate risks, AFp ≈ 0.040)

Only about 4% of all low-birth-weight babies in the population were attributable to smoking. The AFp is low because very few women (351/5000 = 7%) smoked during the 2nd trimester, despite the relatively strong association (RR = 1.60).
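A sketch recomputing the population-level figures from the raw counts without intermediate rounding. It also checks the identity PAR = RD × p(E+) from Eq 6.8. Unrounded arithmetic gives AFp ≈ 0.040 (about 4.0%); a figure of 4.1% arises when the intermediate risks are rounded first.

```python
a1, n1 = 40, 351     # low-birth-weight cases and total, smokers
a0, n0 = 331, 4649   # low-birth-weight cases and total, non-smokers
n = n1 + n0

risk_pop = (a1 + a0) / n           # p(D+)
risk_unexposed = a0 / n0           # p(D+|E-)
par = risk_pop - risk_unexposed    # Eq 6.8, first form
rd = a1 / n1 - risk_unexposed      # risk difference
p_exposed = n1 / n                 # p(E+)
afp = par / risk_pop               # Eq 6.9, first form

print(f"PAR = {par:.4f}; RD x p(E+) = {rd * p_exposed:.4f}")
print(f"AFp = {afp:.3f}")
```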

Confounding and AFp

If confounding is present, adjusted estimates of RR should be used. The AFp can then be estimated using:

$$\text{AF}_p = p_d \times \frac{\text{aRR} - 1}{\text{aRR}} \tag{6.10}$$

where pd is the proportion of cases exposed to the risk factor, and aRR is the adjusted risk ratio. For multiple exposure categories, a summation formula is used.
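When there is no confounding, Eq 6.10 evaluated with the crude RR in place of aRR gives exactly the same AFp as Levin's formula (Eq 6.9). A minimal check on the smoking data; using the crude RR here is an illustrative simplification, not an adjusted analysis.

```python
a1, n1 = 40, 351     # exposed cases, exposed total
a0, n0 = 331, 4649   # non-exposed cases, non-exposed total

rr = (a1 / n1) / (a0 / n0)       # crude RR, standing in for aRR
p_d = a1 / (a1 + a0)             # proportion of cases that were exposed
afp_from_cases = p_d * (rr - 1) / rr                              # Eq 6.10
p_exposed = n1 / (n1 + n0)
afp_levin = p_exposed * (rr - 1) / (p_exposed * (rr - 1) + 1)     # Eq 6.9

print(f"Eq 6.10: {afp_from_cases:.4f}   Eq 6.9: {afp_levin:.4f}")
```

Eq 6.10 is the form to use once confounding is present, because it accepts an adjusted RR while p_d can still be read directly from the cases.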

Study Design and Measures of Association

Not all measures can be computed from all study designs. The following table summarises which measures are available:

Measure | Cross-sectional | Cohort | Case-control
RR
IR
OR
RD
AFe (b)
PAR (a)
AFp (a, c)

(a) Requires independent estimate of p(D+) or p(E+). (b) Estimated using OR. (c) Requires OR and independent estimate of p(E+|D+).

Reflection

Consider two risk factors for a disease: Factor A has RR = 5.0 and affects 2% of the population. Factor B has RR = 1.5 and affects 40% of the population. Calculate AFp for each factor using the formula AFp = p(E+)(RR − 1) / [p(E+)(RR − 1) + 1]. Which factor would you prioritise in a public health intervention, and why?

Model answer
Factor A: AFp = 0.02(4)/(0.02(4)+1) = 0.08/1.08 = 0.074 (7.4%). Factor B: AFp = 0.40(0.5)/(0.40(0.5)+1) = 0.20/1.20 = 0.167 (16.7%). Despite a much smaller per-person RR, Factor B prevents over twice as many population cases because it is so much more common. Prioritise B for a population-level public-health intervention — the population attributable fraction is what determines burden averted. A nuance: B's lower per-person effect may mean the intervention is harder to deliver, less attractive to individuals (low perceived risk), and politically tougher; A's larger per-person effect may justify a targeted (high-risk) intervention even though its population impact is smaller. The best portfolio often combines both: high-risk strategies for A and population-wide strategies for B.


Key Takeaways

  • PAR = overall population risk − risk in unexposed; it reflects both strength and prevalence of exposure.
  • AFp = PAR / p(D+); it gives the proportion of all disease in the population attributable to the exposure.
  • A common risk factor with modest RR can have a larger AFp than a rare factor with high RR.
  • Different study designs support different measures: among the ratio measures, only OR can be computed directly from case-control studies.
Knowledge Check — Section 3

1. A risk factor has RR = 4.0 but affects only 1% of the population. The AFp is:

AFp = p(E+)(RR−1) / [p(E+)(RR−1)+1] = 0.01×3 / (0.01×3+1) = 0.03/1.03 = 0.029 or about 2.9%. Despite the strong association, the low prevalence of exposure limits the population impact.

2. Which measure cannot be computed directly from a case-control study?

RD requires actual disease risks in the exposed and non-exposed groups. In case-control studies, these risks cannot be computed because the investigator determines the ratio of cases to controls.

3. PAR differs from RD in that:

PAR = p(D+) − p(D+|E−). It is the overall population-level risk increase attributable to the exposure, incorporating both the strength of association and the prevalence of exposure. RD only measures the difference between exposed and non-exposed groups.


Section 4

Hypothesis Testing & Confidence Intervals

⏱ Estimated reading time: 15 minutes

Introduction and Overview

Sections 1–3 produced point estimates of association — single numbers like RR = 2.5 or AFp = 30%. Those numbers are useless without a quantification of how uncertain they are. Section 4 closes the lesson by introducing the standard error, hypothesis tests, and confidence intervals — the same statistical-inference machinery you previewed in HSCI 230, now applied directly to the measures you just learned to compute.

Learning Objectives

  • Explain the concepts of standard error, null hypothesis, and P-value.
  • Describe the four common test statistics for evaluating associations.
  • Interpret confidence intervals for measures of association.
  • Distinguish between statistical significance and the strength of association.

Standard Error

The standard error (SE) provides a measure of the precision of a point estimate — how much uncertainty exists in the estimate. For difference measures (RD, ID), the variance can be computed directly:

$$\operatorname{var}(\text{RD}) = \frac{(a_1/n_1)(1 - a_1/n_1)}{n_1} + \frac{(a_0/n_0)(1 - a_0/n_0)}{n_0} \tag{6.13}$$

For ratio measures (RR, IR, OR), the variance is computed on the log scale using Taylor series approximations:

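$$\operatorname{var}(\ln \text{RR}) = \frac{1}{a_1} - \frac{1}{n_1} + \frac{1}{a_0} - \frac{1}{n_0} \tag{6.14}$$
$$\operatorname{var}(\ln \text{OR}) = \frac{1}{a_1} + \frac{1}{a_0} + \frac{1}{b_1} + \frac{1}{b_0} \tag{6.15}$$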
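The three variance formulas translate directly into code. A minimal sketch using the smoking / low-birth-weight counts from Section 2 (variable names are illustrative):

```python
import math

a1, b1 = 40, 311     # smokers: low / normal birth weight
a0, b0 = 331, 4318   # non-smokers: low / normal birth weight
n1, n0 = a1 + b1, a0 + b0
p1, p0 = a1 / n1, a0 / n0

var_rd = p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0   # Eq 6.13
var_ln_rr = 1 / a1 - 1 / n1 + 1 / a0 - 1 / n0      # Eq 6.14
var_ln_or = 1 / a1 + 1 / a0 + 1 / b1 + 1 / b0      # Eq 6.15

for name, v in [("RD", var_rd), ("ln RR", var_ln_rr), ("ln OR", var_ln_or)]:
    print(f"SE({name}) = {math.sqrt(v):.4f}")
```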

Hypothesis Testing

Significance testing is based on specifying a null hypothesis about the population parameter. The null hypothesis typically states there is no association:

  • For difference measures (RD, ID): H0: θ = 0
  • For ratio measures (RR, IR, OR): H0: θ = 1

An alternative hypothesis can be 1-tailed or 2-tailed. In general, 2-tailed hypotheses are preferred because 1-tailed hypotheses are harder to justify.

Limitations of P-values

P-values are often dichotomised into "significant" or "non-significant" at α = 0.05, but this entails a huge loss of information (Wasserstein & Lazar, 2016; Greenland et al., 2016). P-values of 0.049 and 0.051 lead to different conclusions despite being virtually identical. Always report the actual P-value and a confidence interval, which conveys both significance and precision.

Test Statistics

The four common test statistics for evaluating associations are:

  • Pearson χ²
  • Exact tests
  • Wald statistic
  • Likelihood ratio test
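Two of these statistics are easy to compute by hand for a 2×2 table. The sketch below (standard-library Python, illustrative only) computes the Pearson χ² from observed and expected counts, and a Wald test for ln(OR) using Eq 6.15 for the standard error. The P-values rely on the facts that a 1-df χ² is a squared standard normal and that `math.erfc` gives normal tail areas.

```python
import math

# Smoking / low-birth-weight counts; rows = disease status, columns = exposure
a1, a0 = 40, 331     # diseased: exposed, non-exposed
b1, b0 = 311, 4318   # non-diseased: exposed, non-exposed
table = [[a1, a0], [b1, b0]]
n = a1 + a0 + b1 + b0
row_totals = [sum(row) for row in table]   # m1, m0
col_totals = [a1 + b1, a0 + b0]            # n1, n0

# Pearson chi-square: sum over cells of (O - E)^2 / E, E = row total * column total / n
chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (table[i][j] - expected) ** 2 / expected
p_chi2 = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 df

# Wald test for ln(OR): z = ln(OR) / SE(ln OR)
ln_or = math.log((a1 * b0) / (a0 * b1))
z_wald = ln_or / math.sqrt(1 / a1 + 1 / a0 + 1 / b1 + 1 / b0)
p_wald = math.erfc(abs(z_wald) / math.sqrt(2))   # two-sided normal P-value

print(f"Pearson chi2 = {chi2:.2f}, P = {p_chi2:.4f}")
print(f"Wald z = {z_wald:.2f}, P = {p_wald:.4f}")
```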

Confidence Intervals

A confidence interval (CI) (Wikipedia) reflects the level of uncertainty in a point estimate. A 95% CI means that if the study were repeated many times under identical conditions, 95% of the computed CIs would contain the true parameter value — it is a property of the procedure, not a probability statement about the parameter (Greenland et al., 2016).

Computing CIs

For difference measures, the CI is computed directly:

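$$\theta \pm Z_\alpha \sqrt{\operatorname{var}(\theta)} \tag{6.19}$$

where Zα is the standard normal critical value (1.96 for a 95% CI).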

For ratio measures, the CI is computed on the log scale and then exponentiated:

$$\theta \times \exp\left(\pm Z_\alpha \sqrt{\operatorname{var}(\ln \theta)}\right) \tag{6.21}$$

The CI is symmetrical about lnθ but not about θ itself — this is why confidence intervals for ratio measures appear asymmetric.
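A minimal sketch of Eq 6.19 and Eq 6.21 for the smoking data, with Zα = 1.96. It should reproduce the RD and RR intervals in the example table of this section to about three decimals. The OR and IR intervals are not recomputed here because the method used for them is not stated; a simple log-scale Wald interval for the OR gives slightly different bounds.

```python
import math

Z = 1.96  # standard normal critical value for a 95% CI

a1, b1 = 40, 311     # smokers: low / normal birth weight
a0, b0 = 331, 4318   # non-smokers: low / normal birth weight
n1, n0 = a1 + b1, a0 + b0
p1, p0 = a1 / n1, a0 / n0

# Difference measure: estimate +/- Z * SE (Eq 6.19)
rd = p1 - p0
se_rd = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
rd_ci = (rd - Z * se_rd, rd + Z * se_rd)

# Ratio measure: symmetric on the log scale, then exponentiated (Eq 6.21)
rr = p1 / p0
se_ln_rr = math.sqrt(1 / a1 - 1 / n1 + 1 / a0 - 1 / n0)
rr_ci = (rr * math.exp(-Z * se_ln_rr), rr * math.exp(Z * se_ln_rr))

print(f"RD = {rd:.3f}, 95% CI ({rd_ci[0]:.3f}, {rd_ci[1]:.3f})")
print(f"RR = {rr:.3f}, 95% CI ({rr_ci[0]:.3f}, {rr_ci[1]:.3f})")
```

Note how the RR interval is wider above the point estimate than below it, as expected from exponentiating a symmetric log-scale interval.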

Interpreting CIs for Measures of Association

  • For RR, IR, OR: if the 95% CI includes 1, the association is not statistically significant at α = 0.05.
  • For RD, ID: if the 95% CI includes 0, the association is not statistically significant.

However, this “surrogate significance test” is an under-use of the CI. The CI also shows the range of plausible effect sizes, which is far more informative than a binary significant/non-significant classification.

Example CIs from the Textbook

Measure        | Point estimate | 95% CI
RD (smoking)   | 0.043          | (0.009, 0.077)
RR (smoking)   | 1.601          | (1.174, 2.182)
OR (smoking)   | 1.678          | (1.154, 2.387)
ID (migraine)  | 0.338          | (0.232, 0.443)
IR (migraine)  | 2.811          | (1.983, 4.050)

None of the CIs for ratio measures include 1, and none for difference measures include 0, confirming statistical significance for all associations.

Choosing the Right Statistical Test

The Pearson χ², Fisher’s exact, Wald, and likelihood ratio tests above are the workhorses for 2×2 tables and the regression-based measures of association you will meet in HSCI 410. But epidemiological analyses often require comparing means, proportions, or whole distributions across groups — sometimes paired, sometimes not, sometimes badly skewed. The right test depends on three structural questions about your data:

  • Outcome type — continuous (means / medians) or categorical (counts / proportions)?
  • Group structure — one group, two groups, or three or more? Independent or paired/matched?
  • Distributional assumptions — can you defend approximate normality (or appeal to a large-sample CLT argument), or do you need a non-parametric (rank-based) alternative?

Quick Decision Guide

  • Continuous, 1 group vs. a known value → one-sample t-test (or Wilcoxon signed-rank if non-normal).
  • Continuous, 2 independent groups → two-sample t-test (or Mann–Whitney U).
  • Continuous, 2 paired observations → paired t-test (or Wilcoxon signed-rank on differences).
  • Continuous, 3+ groups → one-way ANOVA (or Kruskal–Wallis).
  • Categorical, independent groups → Pearson χ² (or Fisher’s exact for small expected counts).
  • Categorical, paired/matched → McNemar’s test (on discordant pairs).
  • Two continuous variables, association → Pearson r (linear) or Spearman ρ (monotonic, rank-based).

Comparison Table: When Used, How Calculated, How Interpreted

The table below summarises the tests most commonly reported alongside measures of association. The R code that follows runs each on simple built-in datasets so you can copy, paste, and read the output without external data files.

| Test | When to use | How calculated | How to interpret |
|---|---|---|---|
| One-sample t-test | Compare a single sample mean to a known or hypothesised value μ0; continuous, approximately normal (or n large). | t = (x̄ − μ0) / (s/√n); df = n − 1. | If p < α (or the 95% CI for the mean excludes μ0), the population mean differs from μ0. |
| Two-sample (independent) t-test | Compare means of 2 independent groups; continuous, approximately normal. Welch’s version (R’s default) does not assume equal variances. | t = (x̄1 − x̄2) / SEdiff; df = n1 + n2 − 2 (Student) or Welch–Satterthwaite df (Welch). | p < α → the two group means differ. Always report the mean difference and its 95% CI. |
| Paired t-test | Two related observations on the same unit (before/after, twin pairs, matched case-control pairs); within-pair differences approximately normal. | Compute within-pair differences di; t = d̄ / (sd/√n), where sd is the SD of the differences; df = n − 1. | p < α → the mean within-pair change ≠ 0. Reduces between-subject variability, so it is usually more powerful than treating the data as unpaired. |
| One-way ANOVA | Compare means across 3+ independent groups; continuous, approximately normal, roughly equal variances. | F = MSbetween / MSwithin; df1 = k − 1, df2 = N − k. | A significant F → at least one group mean differs. Follow with post-hoc pairwise comparisons (Tukey HSD, Bonferroni) to see which. |
| Pearson χ² | Test independence of two categorical variables (any r×c table). Assumption: all expected counts > 1 and ≥ 80% of them > 5. | χ² = Σ(O − E)²/E; df = (r − 1)(c − 1). Expected = (row total × column total) / N. | p < α → row and column variables are associated. The test signals whether there is an association; report a measure of association (OR, RR) for its strength. |
| Fisher’s exact test | 2×2 (or larger) tables with small expected counts where the χ² approximation is suspect. | Hypergeometric: enumerates every table with the same margins and sums the probabilities of tables as extreme as or more extreme than the observed one. | Exact p-value; no large-sample assumption. With modern computing, fine to use even when χ² would also be valid. |
| McNemar’s test | Paired/matched binary outcomes (before/after on the same person; matched case-control on exposure; agreement of two diagnostic tests). | Uses only the discordant pairs b and c: χ² = (b − c)² / (b + c); df = 1. Concordant pairs are ignored. | p < α → the discordant pairs are unbalanced, i.e., evidence of a real change or effect. The matched OR is b/c. |
| Wilcoxon signed-rank | Non-parametric alternative to the one-sample / paired t-test. Use when differences are skewed, ordinal, or have outliers. | Rank the absolute differences, attach their signs, and sum the positive (or negative) ranks; compare with the null distribution (or a large-sample z). | p < α → the median difference is non-zero (assuming symmetric differences). Robust to outliers. |
| Mann–Whitney U (Wilcoxon rank-sum) | Non-parametric alternative to the two-sample t-test. Two independent groups, continuous or ordinal. | Pool all observations, rank them, and sum the ranks in group 1 (R1); U = R1 − n1(n1 + 1)/2. | p < α → the two distributions differ. If shapes are similar, this is a test of medians; otherwise it tests stochastic dominance. |
| Kruskal–Wallis | Non-parametric alternative to one-way ANOVA. 3+ independent groups; continuous or ordinal. | H from rank sums; approximately χ² with k − 1 df under H0. | p < α → at least one group’s distribution differs. Follow with pairwise rank-sum tests (Dunn’s test, or pairwise Wilcoxon with adjustment). |
| Pearson correlation (r) | Linear association between two continuous variables; assumes approximate bivariate normality, sensitive to outliers. | r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² Σ(y − ȳ)²]; tested via t = r√[(n − 2)/(1 − r²)] with n − 2 df. | Range −1 to +1: sign = direction, magnitude = strength of linear association. Always inspect a scatterplot first. |
| Spearman ρ | Monotonic (not necessarily linear) association between two ordinal or non-normally distributed continuous variables. | Pearson r applied to the ranks of x and y. | Same −1 to +1 interpretation, but for monotonic association. Robust to outliers and non-linearity. |
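To connect the "How calculated" column with R's output, the short sketch below recomputes two of the statistics by hand on the built-in ToothGrowth data and prints them beside the built-in results. It is a sanity check of the table's formulas using base R only; the object names (t_manual, U_manual, and so on) are illustrative.

```r
# Hand-computed test statistics compared with base R's built-in tests
data(ToothGrowth)

# One-sample t: t = (xbar - mu0) / (s / sqrt(n)), df = n - 1
x   <- ToothGrowth$len
mu0 <- 18
t_manual  <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
t_builtin <- unname(t.test(x, mu = mu0)$statistic)
c(manual = t_manual, builtin = t_builtin)

# Mann-Whitney U: U = R1 - n1(n1 + 1)/2, where R1 is the rank sum of group 1
r     <- rank(ToothGrowth$len)          # pooled ranks; ties get average ranks
in_oj <- ToothGrowth$supp == "OJ"       # group 1 = first factor level, "OJ"
n1    <- sum(in_oj)
U_manual <- sum(r[in_oj]) - n1 * (n1 + 1) / 2

# R reports the same quantity as "W"; exact = FALSE avoids the ties warning
U_builtin <- unname(wilcox.test(len ~ supp, data = ToothGrowth,
                                exact = FALSE)$statistic)
c(manual = U_manual, builtin = U_builtin)
```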

Test choice is not just about p-values

None of the tests above are themselves measures of effect size. A statistically significant chi-square confirms that some association exists, but the magnitude must come from the OR, RR, mean difference, or correlation coefficient. Likewise, a non-significant t-test in a small study is not evidence of no effect. Always pair every test with the corresponding effect estimate and its 95% CI.
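As an illustration of pairing a test with its effect estimate, the sketch below pulls the mean difference and its 95% CI out of a Welch t-test object so they can be reported together with the p-value. It uses only base R; res, mean_diff and ci are illustrative names.

```r
# Report the effect estimate and CI alongside the test, not just the p-value
res <- t.test(len ~ supp, data = ToothGrowth)     # Welch two-sample t-test

group_means <- res$estimate                       # mean in group OJ, mean in group VC
mean_diff   <- unname(group_means[1] - group_means[2])   # OJ minus VC
ci          <- res$conf.int                       # 95% CI for the OJ - VC difference

cat(sprintf("Mean difference (OJ - VC) = %.2f, 95%% CI %.2f to %.2f, p = %.3f\n",
            mean_diff, ci[1], ci[2], res$p.value))
```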

R Tutorial — Running These Tests

R has all of these tests in base (no packages required). The blocks below use the built-in datasets ToothGrowth, sleep, and iris — copy any block straight into your console.

R Comparing means: t-tests and ANOVA

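R’s t.test() defaults to Welch’s two-sample t-test (no equal-variance assumption). Use var.equal = TRUE for the classic Student’s version. Setting paired = TRUE gives a paired t-test; note that recent versions of R no longer accept paired = TRUE together with a formula, so pass the two groups as separate, aligned vectors instead.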

# --- ONE-SAMPLE t-test: is mean tooth length different from 18 mm? ---
t.test(ToothGrowth$len, mu = 18)

# --- TWO-SAMPLE (independent) t-test: OJ vs. VC supplements ---
t.test(len ~ supp, data = ToothGrowth)                 # Welch (default)
t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)  # Student

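# --- PAIRED t-test: sleep gain under two drugs, same 10 subjects ---
# Subjects appear in the same ID order within each group, so the vectors pair up.
# Difference = group 2 minus group 1 (formula + paired = TRUE errors in recent R)
with(sleep, t.test(extra[group == "2"], extra[group == "1"], paired = TRUE))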

# --- ONE-WAY ANOVA: tooth length across 3 doses ---
ToothGrowth$dose <- factor(ToothGrowth$dose)
fit <- aov(len ~ dose, data = ToothGrowth)
summary(fit)

# Post-hoc pairwise comparisons (control familywise error)
TukeyHSD(fit)

Reading the output

For t.test(): report t, df, p-value, the mean difference, and the 95% CI. For aov(): summary() gives the F-statistic and overall p; TukeyHSD() tells you which specific group pairs differ.

Run diagnostics before you trust it. Check normality of residuals (plot(fit, which = 2)) and equal variances (plot(fit, which = 1), Bartlett’s test via bartlett.test(), or Levene’s test, which needs an add-on package such as car). If either fails badly, switch to the non-parametric box below.
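A minimal sketch of these checks in base R, assuming the same tooth-length-by-dose ANOVA as above (refit on a local copy, tg, so the built-in data are untouched). Formal tests are best treated as a supplement to the plots: with small samples they have little power, and with large samples they flag trivial departures.

```r
# Assumption checks for the one-way ANOVA of tooth length by dose
tg <- ToothGrowth
tg$dose <- factor(tg$dose)               # treat dose as a 3-level factor
fit <- aov(len ~ dose, data = tg)

# Normality of residuals: Q-Q plot plus a Shapiro-Wilk test
plot(fit, which = 2)
sw <- shapiro.test(residuals(fit))

# Equal variances: Bartlett's test ships with base R (stats package);
# Levene's test would need an add-on package such as car
bt <- bartlett.test(len ~ dose, data = tg)

c(shapiro_p = sw$p.value, bartlett_p = bt$p.value)
```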

R Reflect on what you just ran

Use the questions below to interpret the actual numbers you produced for the t-test and ANOVA block. Look at your console output before answering.

1. From the two-sample Welch t.test(len ~ supp, data = ToothGrowth), report the mean difference between OJ and VC and the 95% CI. Does the CI exclude 0?

Model answer: The Welch t-test on ToothGrowth gives a mean difference of about 3.7 (OJ minus VC), 95% CI roughly (−0.17, 7.57), and p ≈ 0.06. The CI just barely includes 0, so at α = 0.05 the difference is not statistically significant. The reading: there is a suggestive trend favouring OJ but the data do not exclude no difference; this is the textbook case for ‘suggestive, not significant’ that calls for either a larger study or analytic refinement (e.g., adjusting for dose).

2. From the paired t.test(extra ~ group, data = sleep, paired = TRUE), what mean difference in sleep gain did you observe and what was the p-value? Why does pairing the same 10 subjects give more power than treating the two columns as independent?

Model answer: The paired t-test on the sleep dataset gives a mean difference ≈ 1.58 (group 2 − group 1) with p ≈ 0.0028, clearly significant. Pairing the same 10 subjects gives more power because each subject's between-condition difference removes between-subject variability (the noisy heterogeneity in baseline sleep), so the SE of the difference is much smaller than the SE for two independent groups. Algebraically, Var(d) = Var(X1) + Var(X2) − 2Cov(X1, X2); when Cov is positive (as it is for repeated measures on the same person), the variance shrinks.

3. From summary(fit) and TukeyHSD(fit) on tooth length by dose, what does the overall ANOVA F-test say, and which specific dose-pairs in the Tukey output differ significantly?

Model answer: The overall ANOVA F-test is highly significant (F ≈ 67, p < 0.001): tooth length varies with dose. Tukey HSD pairwise comparisons show all three dose-pair contrasts significant: 1.0−0.5, 2.0−0.5, and 2.0−1.0 all have adjusted p < 0.001. The pattern is a monotone dose-response: each step up in vitamin C increases tooth length. The Tukey adjustment controls the family-wise error rate across the multiple pairwise comparisons, so the joint conclusion is statistically defensible.
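The variance identity in the model answer to question 2 can be checked numerically on the sleep data. This sketch assumes, as in the built-in dataset, that rows are ordered by subject ID within each group (the stopifnot() line verifies it); x1, x2 and d are illustrative names.

```r
# Why pairing helps: Var(d) = Var(X1) + Var(X2) - 2 Cov(X1, X2)
x1 <- sleep$extra[sleep$group == "1"]
x2 <- sleep$extra[sleep$group == "2"]
# Confirm both vectors are in the same subject order before pairing them
stopifnot(identical(sleep$ID[sleep$group == "1"], sleep$ID[sleep$group == "2"]))

d <- x2 - x1                                      # within-subject differences
var_identity <- var(x1) + var(x2) - 2 * cov(x1, x2)
c(var_d = var(d), identity = var_identity, cov_x1_x2 = cov(x1, x2))

# Standard error of the mean difference: paired vs. treating groups as independent
se_paired      <- sqrt(var(d) / length(d))
se_independent <- sqrt(var(x1) / length(x1) + var(x2) / length(x2))
c(paired = se_paired, independent = se_independent)
```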
R Comparing categorical outcomes: χ², Fisher’s exact, McNemar’s

For 2×2 tables build a matrix or table. R applies continuity correction by default for chisq.test() and mcnemar.test() on 2×2 tables — turn it off with correct = FALSE if you prefer the uncorrected statistic.

# --- 2x2 table: exposure x disease ---
exposure <- matrix(c(30, 70, 10, 90), nrow = 2, byrow = TRUE,
                   dimnames = list(exposure = c("E+", "E-"),
                                   disease  = c("D+", "D-")))
exposure

chisq.test(exposure)                       # Pearson chi-square (with Yates' correction)
chisq.test(exposure, correct = FALSE)       # without correction
fisher.test(exposure)                      # exact test (preferred for small cells)

# --- McNemar's test: paired/matched binary data ---
# e.g. agreement between two diagnostic tests on the same 100 patients
paired <- matrix(c(40, 10,
                   25, 25), nrow = 2, byrow = TRUE,
                 dimnames = list(test1 = c("+", "-"),
                                 test2 = c("+", "-")))
mcnemar.test(paired)                       # with continuity correction
mcnemar.test(paired, correct = FALSE)       # without

Reading the output

chisq.test() returns χ², df, and p; check $expected for sparse cells. fisher.test() additionally returns the OR and its 95% CI — a free measure of association on top of the test. mcnemar.test() tests only the discordant pairs (10 vs 25 here); the matched OR is 10/25 = 0.4.

When to switch to Fisher. If any expected cell count falls below 1, or more than about 20% of expected counts fall below 5, the χ² large-sample approximation is unreliable. chisq.test() warns whenever any expected count is below 5; fisher.test() sidesteps the issue entirely.
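To see where the Pearson statistic comes from, the sketch below rebuilds the expected counts and the uncorrected χ² from the same 30/70/10/90 table (stored as tab so the tutorial's exposure object is not overwritten) and compares the result with chisq.test(correct = FALSE).

```r
# Pearson chi-square by hand for the 2x2 exposure-by-disease table
tab <- matrix(c(30, 70, 10, 90), nrow = 2, byrow = TRUE,
              dimnames = list(exposure = c("E+", "E-"),
                              disease  = c("D+", "D-")))

N        <- sum(tab)
expected <- outer(rowSums(tab), colSums(tab)) / N  # row total x column total / N
chi2     <- sum((tab - expected)^2 / expected)     # uncorrected statistic (12.5 here)
df       <- (nrow(tab) - 1) * (ncol(tab) - 1)
p        <- pchisq(chi2, df = df, lower.tail = FALSE)

expected
c(chi2 = chi2, df = df, p = p)

# Should agree with the built-in test without Yates' correction
chisq.test(tab, correct = FALSE)
```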

R Reflect on what you just ran

Use the questions below to interpret your 2x2 table analyses. Look at the exposure matrix, the test output, and the McNemar output before answering.

1. From chisq.test(exposure) on the 2x2 table (30/70/10/90), report the chi-square statistic and p-value. By hand or by inspection, what is the odds ratio for this table, and what does the test conclude about independence?

Model answer: With the default Yates continuity correction, chisq.test() on the 2×2 table (30/70/10/90) gives χ² ≈ 11.3 (df = 1), p ≈ 0.0008; without the correction, χ² = 12.5, p ≈ 0.0004. The cross-product OR = (30×90)/(70×10) = 2700/700 ≈ 3.86. The test clearly rejects independence: exposure and outcome are associated, with exposed individuals having about 3.9-fold higher odds of disease.

2. From fisher.test(exposure), what OR and 95% CI did it return? How does this OR compare with the simple cross-product OR you can compute from the cells of the table?

Model answer: Fisher's exact test returns OR ≈ 3.83 with 95% CI roughly (1.78, 8.78). The point estimate is essentially the same as the cross-product OR (3.86): fisher.test() reports a conditional maximum-likelihood estimate, which differs slightly from the simple formula in small samples but matches it closely when expected cell counts are not too small. Fisher's CI is the right one to report whenever any expected cell is < 5 (assumption-violation territory for chi-square).

3. McNemar's test only uses the off-diagonal (discordant) pairs - 10 and 25 in this example. What does the matched OR of 10/25 = 0.4 tell you about the two diagnostic tests, and why would a chi-square on the same table give a misleading answer?

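Model answer: Read the rows as test 1 and the columns as test 2. The discordant cells show that test 1 was positive while test 2 was negative for 10 patients, and test 2 was positive while test 1 was negative for 25. The matched OR of 10/25 = 0.40 (equivalently 25/10 = 2.5 the other way round) means that when the two tests disagree, test 2 is the one calling a positive 2.5 times as often, so the two tests do not share the same positivity rate. McNemar uses only discordant pairs because concordant pairs (both tests agree) carry no information about which test calls positives more often. A regular chi-square on the same table would treat the four cells as if they came from independent samples and ignore the pairing; it answers a different question (whether the two tests' results are associated, which they almost certainly are because they come from the same patients) rather than whether one test is positive more often than the other. That is a serious error in any paired-design analysis.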
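McNemar's statistic is simple enough to compute by hand from the discordant cells. This sketch reuses the 40/10/25/25 paired table (as paired_tab) and reproduces both the uncorrected statistic and R's continuity-corrected version, which subtracts 1 from |b − c| before squaring.

```r
# McNemar's test by hand: only the discordant cells b and c matter
paired_tab <- matrix(c(40, 10,
                       25, 25), nrow = 2, byrow = TRUE,
                     dimnames = list(test1 = c("+", "-"),
                                     test2 = c("+", "-")))

b_disc <- paired_tab["+", "-"]   # test 1 positive, test 2 negative (10)
c_disc <- paired_tab["-", "+"]   # test 1 negative, test 2 positive (25)

mcn_uncorrected <- (b_disc - c_disc)^2 / (b_disc + c_disc)
mcn_corrected   <- (abs(b_disc - c_disc) - 1)^2 / (b_disc + c_disc)
matched_or      <- b_disc / c_disc

c(uncorrected = mcn_uncorrected, corrected = mcn_corrected,
  matched_OR = matched_or)

# Compare with the built-in statistics
mcnemar.test(paired_tab, correct = FALSE)$statistic
mcnemar.test(paired_tab)$statistic
```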
R Non-parametric alternatives + correlation

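The same wilcox.test() function covers both the Wilcoxon signed-rank (one-sample / paired) and Mann–Whitney U (two-sample) tests. Give it one sample (with mu), or two aligned samples with paired = TRUE, for the signed-rank test; give it two independent samples (for example via the formula len ~ supp) for the rank-sum test.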

# --- WILCOXON SIGNED-RANK (one-sample / paired) ---
wilcox.test(ToothGrowth$len, mu = 18)             # one-sample
with(sleep, wilcox.test(extra[group == "2"], extra[group == "1"], paired = TRUE)) # paired

# --- MANN-WHITNEY U / WILCOXON RANK-SUM (two-sample) ---
wilcox.test(len ~ supp, data = ToothGrowth)

# --- KRUSKAL-WALLIS (3+ groups, non-parametric ANOVA) ---
kruskal.test(len ~ dose, data = ToothGrowth)

# Post-hoc pairwise (Bonferroni-adjusted)
pairwise.wilcox.test(ToothGrowth$len, ToothGrowth$dose,
                     p.adjust.method = "bonferroni")

# --- CORRELATION: Pearson (linear) and Spearman (monotonic) ---
cor.test(iris$Sepal.Length, iris$Petal.Length, method = "pearson")
cor.test(iris$Sepal.Length, iris$Petal.Length, method = "spearman")

Reading the output

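Rank-based tests return a test statistic and a p-value: R labels the signed-rank statistic V, the rank-sum statistic W (equivalent to Mann–Whitney U), and reports H as "Kruskal-Wallis chi-squared". They do not give a CI for a mean difference; if you need an interval, set conf.int = TRUE in wilcox.test() to get a Hodges–Lehmann estimate of the location shift with its CI. cor.test() returns the correlation and the p-value, plus a 95% CI for Pearson's r only.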

Parametric or non-parametric? With n > ~30 per group the t-test and ANOVA are robust to mild non-normality (CLT), so reach for non-parametric tests mainly for skewed small samples, ordinal outcomes, or when outliers dominate. Spearman is also a sensible default whenever a scatterplot looks monotonic but not linear.
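Two points from the boxes above can be verified directly: Spearman's ρ is Pearson's r computed on ranks, and wilcox.test(conf.int = TRUE) returns a Hodges–Lehmann shift estimate with a CI. A sketch using the same built-in data; exact = FALSE is set because ToothGrowth contains ties.

```r
# Spearman's rho is Pearson's r computed on ranks
x <- iris$Sepal.Length
y <- iris$Petal.Length
rho_builtin <- cor(x, y, method = "spearman")
rho_ranks   <- cor(rank(x), rank(y))   # rank() gives tied values their average rank
c(spearman = rho_builtin, pearson_on_ranks = rho_ranks, pearson_raw = cor(x, y))

# Hodges-Lehmann location shift (OJ relative to VC) with its 95% CI
hl <- wilcox.test(len ~ supp, data = ToothGrowth, conf.int = TRUE, exact = FALSE)
hl$estimate
hl$conf.int
```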

R Reflect on what you just ran

Use the questions below to interpret your non-parametric and correlation output. Look at your console results before answering.

1. Compare the p-values from t.test(len ~ supp, data = ToothGrowth) (earlier box) and wilcox.test(len ~ supp, data = ToothGrowth). Are they similar or different? What does that tell you about whether the t-test was robust here?

Model answer: The t-test and Wilcoxon give nearly identical p-values (both around 0.06) on ToothGrowth supp. That agreement is reassuring: it means the t-test was robust here despite using means; either the distributions are reasonably symmetric or the sample is large enough for the central limit theorem to apply. When parametric and non-parametric tests disagree (Wilcoxon p much smaller than t-test p), it typically signals heavy skew or outliers; the rank-based test is then preferred.

2. From cor.test(iris$Sepal.Length, iris$Petal.Length, method = "pearson"), report the correlation coefficient r and its 95% CI. How does the Spearman correlation compare, and what would a large discrepancy suggest about the relationship's shape?

Model answer: Pearson r ≈ 0.872 with 95% CI (0.827, 0.906) for sepal vs. petal length in iris. Spearman ρ will be similar (≈ 0.88). When Pearson and Spearman are nearly equal, the relationship is approximately linear and well described by the Pearson correlation; a large discrepancy (e.g., Pearson 0.5 vs. Spearman 0.85) would signal a monotone non-linear relationship (Pearson misses the curve) or that outliers are driving the linear estimate.

3. From kruskal.test(len ~ dose, data = ToothGrowth), what was the H statistic and p-value? Which pairs in the pairwise.wilcox.test() with Bonferroni adjustment remained significant? Why does Bonferroni make this conclusion more conservative?

Model answer: Kruskal–Wallis H ≈ 40.7 with p < 0.001 on ToothGrowth dose, which soundly rejects equal distributions across dose levels. Pairwise Wilcoxon tests with Bonferroni correction show all three pairwise comparisons significant after adjustment (p < 0.05). Bonferroni divides α by the number of comparisons (here 3), so a raw p must fall below 0.05/3 ≈ 0.0167 to count as significant; equivalently, pairwise.wilcox.test() multiplies each raw p by 3 (capped at 1) and the adjusted p is compared with 0.05. This controls the family-wise error rate at the cost of statistical power. Conservative is the right word: Bonferroni keeps the family-wise Type I error at or below 5%, but it tends to under-reject, especially when the tests are correlated.
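A small sketch of how R applies the Bonferroni adjustment in pairwise.wilcox.test(): rather than lowering α, it multiplies each raw p-value by the number of comparisons and caps the result at 1, which leads to the same decisions. exact = FALSE is passed through to wilcox.test() to avoid the ties warning; tg is a local copy of ToothGrowth.

```r
# Bonferroni adjustment in pairwise.wilcox.test(), reproduced by hand
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

raw_p  <- pairwise.wilcox.test(tg$len, tg$dose, p.adjust.method = "none",
                               exact = FALSE)$p.value
bonf_p <- pairwise.wilcox.test(tg$len, tg$dose, p.adjust.method = "bonferroni",
                               exact = FALSE)$p.value
raw_p
bonf_p

# Same adjustment by hand: 3 comparisons, capped at 1 (NA = unused upper triangle)
pmin(1, 3 * raw_p)
```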

Reflection

A study reports an OR of 1.45 with a 95% CI of (0.92, 2.28). A second study reports an OR of 1.15 with a 95% CI of (1.02, 1.30). Compare these two findings in terms of: (a) strength of association, (b) statistical significance, and (c) precision. Which finding might be more concerning from a public health perspective, and why?

Model answer: Strength: Study 1 reports a larger point estimate (OR 1.45 vs. 1.15), suggesting a stronger per-person association if both are correct. Statistical significance: Study 1's CI (0.92, 2.28) includes 1, so it is non-significant; Study 2's CI (1.02, 1.30) excludes 1, so it is significant. Precision: Study 2 is much more precise (CI width 0.28 vs. Study 1's 1.36), reflecting a larger sample size and/or better measurement. Public-health concern: Study 2 is more concerning for action because the effect, though smaller, is reliably non-zero and applies (likely) to a much larger population; small effects in big populations produce many cases. Study 1's effect, if real, is larger per person but the data don't yet exclude no effect. The decision depends on the absolute baseline risk and population size, but precision and population scope usually win over magnitude alone.
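The precision comparison can also be made on the log scale, where ratio CIs are symmetric. The sketch below backs out the standard error of ln(OR) from each reported interval and computes an approximate Wald p-value. It assumes both CIs were Wald-type intervals built on the log scale, and because the reported limits are rounded the results are approximate; se_log_from_ci() is a hypothetical helper, not a base R function.

```r
# Hypothetical helper: SE of ln(OR) from a reported CI, assuming a log-scale Wald CI
se_log_from_ci <- function(lower, upper, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  (log(upper) - log(lower)) / (2 * z)
}

studies <- data.frame(study = c("Study 1", "Study 2"),
                      or    = c(1.45, 1.15),
                      lower = c(0.92, 1.02),
                      upper = c(2.28, 1.30))

studies$se_log   <- se_log_from_ci(studies$lower, studies$upper)
studies$ci_ratio <- studies$upper / studies$lower      # CI width on the ratio scale
studies$z        <- log(studies$or) / studies$se_log   # Wald z for H0: OR = 1
studies$p        <- 2 * pnorm(-abs(studies$z))
studies
```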


Key Takeaways

  • Standard errors quantify precision; they are computed differently for difference vs. ratio measures.
  • Hypothesis testing uses a null hypothesis (no effect) and a test statistic to generate a P-value.
  • Four common test statistics for measures of association: Pearson χ², exact tests, Wald, and likelihood ratio tests.
  • Beyond 2×2 tables, choose tests by outcome type, group structure, and distribution: t-tests / ANOVA for normal continuous data, χ² / Fisher / McNemar for categorical, and Wilcoxon / Mann–Whitney / Kruskal–Wallis / Spearman as non-parametric alternatives.
  • Confidence intervals are more informative than P-values: they show the range of plausible effect sizes.
  • A CI that contains the null value (1 for ratio measures, 0 for difference measures) indicates non-significance at the corresponding α level.
Knowledge Check — Section 4

1. A 95% confidence interval for an odds ratio is (1.2, 3.8). This means:

Since the CI does not include 1 (the null value for ratio measures), the association is statistically significant. The interval 1.2 to 3.8 represents the range of plausible values for the true OR. Note: the CI is a property of the procedure, not a probability statement about the parameter.

2. Why are confidence intervals for ratio measures (like OR) asymmetric around the point estimate?

The variance of ratio measures is computed on the log scale, where the CI is symmetric around ln(θ). When exponentiated back to the original scale, the CI becomes asymmetric around θ.

3. Which test statistic is generally considered superior in regression settings?

Likelihood ratio tests are generally superior to Wald tests, especially in regression settings. They compare the likelihood of the data under the estimated parameters versus the null hypothesis parameters.
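The asymmetry described in question 2 is easy to see with the Woolf (log-scale Wald) interval for the 30/70/10/90 table used in the χ² tutorial, using var(ln OR) = 1/a1 + 1/a0 + 1/b1 + 1/b0. A minimal sketch; a1, a0, b1 and b0 follow the lesson's notation (exposed cases, unexposed cases, exposed non-cases, unexposed non-cases).

```r
# Woolf 95% CI for the OR: symmetric on ln(OR), asymmetric on the OR scale
a1 <- 30; b1 <- 70   # exposed: cases, non-cases
a0 <- 10; b0 <- 90   # unexposed: cases, non-cases

or_hat <- (a1 * b0) / (a0 * b1)                    # cross-product OR
se_log <- sqrt(1 / a1 + 1 / a0 + 1 / b1 + 1 / b0)  # SE of ln(OR)
z      <- qnorm(0.975)

ci_log <- log(or_hat) + c(-1, 1) * z * se_log      # symmetric around ln(OR)
ci     <- exp(ci_log)                              # back-transformed limits

c(OR = or_hat, lower = ci[1], upper = ci[2])
c(below_estimate = or_hat - ci[1], above_estimate = ci[2] - or_hat)
```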


Section 5

Final Review & Assessment

⏱ Estimated time: 20 minutes

Bringing It All Together

This lesson translated the disease-frequency measures of Lesson 5 into measures of association between an exposure and an outcome. You worked through the three ratio measures (RR, IR, OR), the difference measures of effect in the exposed (RD, AFe), the population-level extensions (PAR, AFp), and the inferential machinery — standard errors, hypothesis tests, confidence intervals — that surrounds every estimate.

Lesson 8 returns to study-design topics first introduced in HSCI 230 and consolidates them; Lessons 9–10 cover hybrid designs (nested case-control, case-cohort, case-crossover) and controlled trials. The measures you computed here will reappear in each of those lessons as the outputs that designs deliver and that systematic reviews eventually pool. For deeper treatment of these measures and their statistical foundations, see Greenland and Pearce (2015) and the historical landmark papers by Cornfield (1951) and Doll and Hill (1950).

Key Takeaways from Lesson 7

  • Ratio measures (RR, IR, OR) compare disease frequency between exposed and unexposed; RR and IR come from cohort studies, OR is the natural measure for case-control. When disease is rare, OR ≈ RR.
  • Risk difference (RD) and the attributable fraction in the exposed (AFe) express absolute and proportional impact in the exposed group; vaccine efficacy is a special case of AFe.
  • Population attributable risk (PAR) and AFp depend on both the strength of association and the prevalence of exposure — a common weak risk factor can dwarf a rare strong one in population terms.
  • The study design dictates the measure: cohort designs deliver risks/rates and ratios; case-control designs deliver odds ratios; cross-sectional designs deliver prevalence ratios.
  • Every measure of association needs a standard error and confidence interval; for ratio measures, CIs are constructed on the log scale and are therefore asymmetric.
  • The right hypothesis test follows the data: t-tests/ANOVA for normal continuous outcomes, χ² / Fisher / McNemar for categorical, and Wilcoxon / Mann–Whitney / Kruskal–Wallis / Spearman as non-parametric analogues.

Reflection

A colleague presents findings from a case-control study showing OR = 2.3 (95% CI: 1.1, 4.8) for the association between a workplace chemical exposure and bladder cancer. She concludes the chemical “causes 2.3 times the risk of bladder cancer.” Evaluate this statement. What can and cannot be concluded from this study? Discuss the roles of strength of association, statistical significance, the rare disease assumption, and the distinction between OR and RR.

Model answer: The colleague has conflated three things: strength of association (OR 2.3 is moderate-to-strong, consistent with causation if confirmed); statistical significance (the CI excludes 1, so chance is unlikely to be the sole explanation); and risk. The OR is not a risk ratio: case-control sampling fixes the numbers of cases and controls by design, so absolute risks cannot be estimated. "2.3 times the risk" is correct only under the rare-disease approximation (OR ≈ RR when outcome prevalence < 10%); for bladder cancer in occupational cohorts that is plausible but not given. Further, association is not causation: the OR could be inflated by recall bias (cases recall workplace exposures more thoroughly), selection bias (hospital cases vs. community controls), or residual confounding (smoking, age, occupational co-exposures). What can be concluded: there is a statistically significant association of moderate strength in this case-control study; if the rare-disease assumption holds, the OR is a reasonable proxy for the RR; and the association is consistent with causation but not sufficient on its own to establish it. What is needed: a prospective cohort with biomarker exposure assessment, biological-mechanism evidence, and dose-response data.


Final Assessment

Complete all 15 questions below with 100% accuracy to finish this lesson. You must also complete the reflection above before submitting.

Final Assessment — Measures of Association

1. Measures of association differ from measures of statistical significance in that they:

Measures of association (RR, OR, etc.) assess the strength of the relationship between exposure and disease. Statistical significance (P-values) depends heavily on sample size, not only on the magnitude of the effect.

2. In a cohort study, 150 of 2,000 exposed individuals and 75 of 2,000 non-exposed individuals develop the disease. What is the risk ratio?

RR = (150/2000) / (75/2000) = 0.075 / 0.0375 = 2.00. The risk of disease is twice as high in the exposed group.

3. The odds ratio can be calculated from a case-control study because:

OR = (a1×b0)/(a0×b1) is the same whether viewed as the ratio of disease odds or the ratio of exposure odds. Because case-control studies sample on disease status, the exposure odds in cases and controls can be estimated, so the OR can be computed directly, whereas the risks needed for the RR cannot.

4. RD = 0.043 in the smoking and low-birth-weight example means:

RD is the absolute difference in risk: 0.114 − 0.071 = 0.043. This means 4.3 additional cases per 100 exposed women, above what would be expected based on the baseline risk.

5. If RR = 4.0, the attributable fraction in the exposed (AFe) is:

AFe = (RR−1)/RR = (4−1)/4 = 3/4 = 0.75. So 75% of disease in the exposed group is attributable to the exposure.

6. Vaccine efficacy of 80% indicates:

Vaccine efficacy = AFe = (risk unvaccinated − risk vaccinated)/risk unvaccinated. A value of 80% means the vaccine prevented 80% of the cases that would have been expected among the vaccinated had they not been vaccinated.

7. A common risk factor with a modest RR may have a larger AFp than a rare risk factor with a high RR because:

AFp = p(E+)(RR−1)/[p(E+)(RR−1)+1]. Both the RR and the prevalence of exposure (p(E+)) contribute. A high p(E+) with a modest RR can produce a larger AFp than a low p(E+) with a high RR.

8. PAR is best described as:

PAR = p(D+) − p(D+|E−). It reflects the increase in disease risk in the entire population that is attributable to the exposure, incorporating both the strength of association and how common the exposure is.

9. The null value for the risk ratio (RR) is:

For ratio measures (RR, IR, OR), the null value is 1, meaning the risk (or rate or odds) is the same in both groups. For difference measures (RD, ID), the null value is 0.

10. A 95% CI for RR of (0.85, 1.32) suggests:

Since the 95% CI includes the null value of 1, we cannot reject H0 at the 0.05 level. The range of plausible values spans from a modest protective effect (0.85) to a modest risk increase (1.32).

11. Confidence intervals for OR are asymmetric around the point estimate because:

The CI is symmetric on the ln(OR) scale. When exponentiated back to the OR scale, the interval becomes asymmetric because the exponential function is non-linear.

12. In the formula var(ln OR) = 1/a1 + 1/a0 + 1/b1 + 1/b0, increasing all cell counts will:

Since the variance is a sum of reciprocals of cell counts, larger cell counts produce smaller reciprocals, reducing the overall variance. This leads to a narrower (more precise) confidence interval.

13. Which of the following measures CANNOT be estimated from a case-control study, even with external data?

The incidence rate ratio requires person-time data from a cohort study. Case-control studies do not follow participants over time, so IR cannot be computed. AFe and AFp can be approximated using OR with appropriate external data.

14. The Pearson χ² test is most appropriate when:

The Pearson χ² has an approximate χ² distribution provided all expected cell values are >1 and at least 75% (3 of 4 cells) have expected values >5. For small samples, exact tests (like Fisher’s) are preferred.

15. A study finds RR = 1.8 (P = 0.40). Which interpretation is most appropriate?

An RR of 1.8 suggests a potentially meaningful increase in risk. However, the P-value of 0.40 indicates the result is not statistically significant, which may reflect insufficient sample size (low power) rather than the absence of an effect. A non-significant P-value does not prove the null hypothesis; it means we lack sufficient evidence to reject it. The CI would be more informative here.
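Several of the calculations above can be reproduced in a few lines of R. The sketch below uses the numbers from questions 2, 5 and 12; the exposure prevalences and RRs used for question 7 are hypothetical values chosen only to illustrate the point, not data from the lesson. af_exposed(), af_population() and var_log_or() are illustrative helper names.

```r
# Q2: risk ratio and risk difference from cohort counts
risk_exposed   <- 150 / 2000
risk_unexposed <- 75 / 2000
rr <- risk_exposed / risk_unexposed      # 2
rd <- risk_exposed - risk_unexposed      # 0.0375

# Q5: attributable fraction in the exposed from the RR
af_exposed <- function(rr) (rr - 1) / rr
af_exposed(4)                            # 0.75

# Q7: population attributable fraction, AFp = p(E+)(RR-1) / [p(E+)(RR-1) + 1]
af_population <- function(p_exposed, rr) {
  p_exposed * (rr - 1) / (p_exposed * (rr - 1) + 1)
}
# Hypothetical exposures, for illustration only
common_modest <- af_population(p_exposed = 0.40, rr = 1.5)   # about 0.167
rare_strong   <- af_population(p_exposed = 0.02, rr = 5.0)   # about 0.074

# Q12: var(ln OR) shrinks as every cell count grows
var_log_or <- function(a1, a0, b1, b0) 1 / a1 + 1 / a0 + 1 / b1 + 1 / b0
var_log_or(30, 10, 70, 90)
var_log_or(300, 100, 700, 900)   # ten times the counts, one tenth the variance
```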
