Measures of Association

Fundamental Epidemiological Concepts and Approaches

Learning objectives for this lesson:

Calculate and interpret the risk ratio, incidence rate ratio, and odds ratio
Compute risk difference, attributable fraction (exposed), and population attributable measures
Understand when to use each measure of association
Correctly distinguish between strength of association and statistical significance
Understand the basis for hypothesis tests and confidence intervals

This course was developed by Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University based on Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Reference

Glossary: Key Terms, People & Concepts

This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.

Ratio Measures of Association

Risk Ratio (Relative Risk, RR) The ratio of the cumulative incidence (risk) in the exposed group to that in the unexposed group: RR = R₁ / R₀. Used in cohort studies and RCTs over a defined follow-up period.

Incidence Rate Ratio (IRR) The ratio of incidence rates (events per person-time) in the exposed and unexposed groups. Appropriate when follow-up time varies across individuals.

Odds Ratio (OR) The ratio of the odds of disease in the exposed group to the odds in the unexposed group (Cornfield, 1951). The standard measure in case-control studies and the natural output of logistic regression.

Prevalence Ratio (PR) The ratio of prevalence in the exposed group to prevalence in the unexposed group. Preferred over the prevalence odds ratio when the outcome is not rare in cross-sectional analyses.

Hazard Ratio (HR) The ratio of instantaneous event rates (hazards) between groups, typically from Cox proportional hazards regression. Approximates the rate ratio under proportional hazards.

Rare-Disease Assumption When the disease is rare in the study population (this lesson uses a risk or prevalence below 5%; a looser 10% threshold is also common), the odds ratio approximates the risk ratio. Important for interpreting case-control results.

Difference Measures of Association

Risk Difference (RD, Attributable Risk) The absolute difference in risk between exposed and unexposed groups: RD = R₁ − R₀. Captures the public-health impact in absolute terms.

Rate Difference The absolute difference in incidence rates between exposed and unexposed groups, in events per person-time.

Attributable Fraction in the Exposed (AFe) Among the exposed, the proportion of disease attributable to the exposure: AFe = (R₁ − R₀) / R₁ = (RR − 1) / RR.

Population Attributable Risk (PAR) The excess risk in the total population attributable to the exposure: PAR = R_pop − R₀. Reflects both effect size and exposure prevalence.

Population Attributable Fraction (PAF) The proportion of disease in the total population attributable to the exposure (Northridge, 1995). Useful for prioritising public-health interventions.

Number Needed to Treat (NNT) 1 / |risk difference|: the average number of patients who must receive a beneficial intervention for one to avoid the bad outcome (Laupacis, Sackett, & Roberts, 1988).

Number Needed to Harm (NNH) 1 / |risk difference| for harmful exposures: the average number exposed for one additional case of harm.

Inference & Interpretation

Null Hypothesis (H₀) A statement of no effect or no difference (e.g., RR = 1, RD = 0). The hypothesis tested by significance tests.

p-Value The probability of observing data as extreme or more extreme than the observed, assuming the null hypothesis is true. Not the probability that the null is true.

Confidence Interval A range of values consistent with the data at a chosen confidence level. For a ratio measure, an interval that excludes 1 implies statistical significance at the corresponding level.

Strength of Association vs. Statistical Significance The size of an effect (e.g., RR) is conceptually distinct from how confident we are that it is not a chance finding (p-value, CI width). Large samples can yield significant findings for trivial effects, and small samples can leave large effects non-significant.

Effect Modification (Interaction) When the magnitude of an exposure-outcome association differs across levels of a third variable. Reported as stratum-specific estimates rather than adjusted away. Additive and multiplicative scales can give different interaction conclusions; reporting both is recommended (Knol & VanderWeele, 2012; VanderWeele & Knol, 2014).

Section 1

Introduction & Ratio Measures of Association

⏱ Estimated reading time: 15 minutes

Lesson 7 · HSCI 341

Comparison Is the Core Operation

A single frequency tells you how common a disease is. A measure of association tells you how much the exposure changes it.

Section 1 of 4

Ratio Measures of Association

Risk ratio, incidence rate ratio, and odds ratio: formulas, relationships, and worked examples.

The core idea

Why ratios, and why not just P-values

Measure of association

How strongly is the exposure linked to disease? Independent of sample size. The number that answers the biological or causal question.

P-value

How likely are these data if there were truly no association? Sensitive to sample size. Does not describe effect magnitude.

Null value for ratio measures: 1. Values > 1 indicate increased risk; values < 1 indicate protection.

Three formulas

RR, IR, and OR from the 2×2 table

Risk Ratio (RR)

\[ \color{#0B7B6B}{RR} = \frac{\color{#C2410C}{a_1/(a_1+b_1)}}{\color{#1D4ED8}{a_0/(a_0+b_0)}} = \frac{\color{#C2410C}{R_{E+}}}{\color{#1D4ED8}{R_{E-}}} \]

RR: risk ratio; R_E+: risk in the exposed; R_E−: risk in the unexposed

Incidence Rate Ratio (IR)

\[ \color{#0B7B6B}{IR} = \frac{\color{#C2410C}{a_1 / T_1}}{\color{#1D4ED8}{a_0 / T_0}} \]

IR: incidence rate ratio; a₁/T₁: rate in the exposed; a₀/T₀: rate in the unexposed

Odds Ratio (OR)

\[ \color{#0B7B6B}{OR} = \frac{\color{#C2410C}{a_1 / b_1}}{\color{#1D4ED8}{a_0 / b_0}} = \frac{\color{#C2410C}{a_1} \color{#6D28D9}{b_0}}{\color{#1D4ED8}{a_0} \color{#BE185D}{b_1}} \]

OR: odds ratio; a₁: exposed cases; b₁: exposed non-cases; a₀: unexposed cases; b₀: unexposed non-cases

Worked example 1

Brazil water cistern study

Exposure: household water cistern
Outcome: diarrhea

\[ RR = \frac{194/1782}{303/1617} = \frac{0.109}{0.187} = 0.58 \]

\[ OR = \frac{194 \times 1314}{303 \times 1588} = 0.53 \]

Interpretation

Both measures are below 1: cistern access is protective. The RR of 0.58 means the risk of diarrhea is 42% lower in the cistern group. OR is further from the null than RR, as the general relationship predicts.

Worked example 2

Migraine incidence rates

Exposure: female sex (vs. male)
Outcome: migraine onset, ages 30-40

Incidence Rate Ratio

\[ IR = \frac{131/250}{44/236} = \frac{0.524}{0.186} = 2.81 \]

Interpretation

The migraine rate is 2.81 times higher in females than males aged 30-40. Person-months in the denominator account for variable follow-up time across participants.

The OR's special role

Why OR is the case-control measure

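OR is symmetric: the odds ratio for disease comparing exposed with unexposed equals the odds ratio for exposure comparing cases with non-cases. Case-control studies sample on outcome, so risks and rates cannot be estimated directly, but the exposure odds ratio can, and it equals the disease OR.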

Cornfield approximation (rare disease)

\[ \text{When } \color{#C2410C}{p(D+)} < 0.05:\quad \color{#0B7B6B}{OR} \approx \color{#1D4ED8}{RR} \]

p(D+): disease frequency; OR: odds ratio; RR: risk ratio

As disease prevalence rises, OR diverges from RR and moves further from the null. IR sits between RR and OR. This ordering holds on both sides of one.

Carry forward

Three anchors from this section

RR and IR come from cohort studies where the population at risk is observed directly.
OR is the case-control measure; it approximates RR when disease is rare (under 5%).
Ordering from the null: IR further than RR, OR furthest of all, on both sides of 1.

Introduction and Overview

One earlier lesson gave us measures of disease frequency: prevalence, incidence, risk, and rate. Another took the same probabilistic vocabulary and applied it at the level of a single test. This lesson brings these strands together: it combines the disease-frequency vocabulary with the 2×2 contingency logic to produce measures of association, the quantitative comparison between exposed and unexposed groups that is the central output of analytic epidemiology. The four content sections build up in order: the three ratio measures (this section: risk ratio, rate ratio, odds ratio); difference measures and exposed-group attributable fractions (Section 2); population-level attributable measures and how each measure relates to study design (Section 3); and finally the hypothesis-testing and confidence-interval machinery that turns each of these point estimates into a defensible inference (Section 4).

Learning Objectives

Explain why measures of association are used in epidemiology.
Set up incidence risk and incidence rate data in 2×2 tables.
Calculate and interpret the risk ratio (RR), incidence rate ratio (IR), and odds ratio (OR).
Describe the relationships among RR, IR, and OR.

Why Measure Association?

Measures of association assess the magnitude of the relationship between an exposure (a potential cause) and a disease. Unlike measures of statistical significance, which are heavily dependent on sample size, measures of association indicate the strength of the effect, that is, how much more (or less) likely disease is in exposed compared to non-exposed groups (Tripepi et al., 2007).

Strength vs. Significance

A measure of association tells you how strongly an exposure is linked to disease. A P-value tells you how likely the observed data would be under the null hypothesis of no association. A strong association can be non-significant (small sample), and a weak association can be highly significant (large sample). Always report both.

Data Layout

Depending on study design, disease frequency can be expressed as incidence risk, incidence rate, prevalence, or odds. For risk data, the standard 2×2 table is:

	Exposed	Non-exposed	Total
Diseased	a₁	a₀	m₁
Non-diseased	b₁	b₀	m₀
Total	n₁	n₀	n

For rate data, the denominator is person-time at risk rather than the number of individuals:

	Exposed	Non-exposed	Total
Number of cases	a₁	a₀	m₁
Person-time at risk	t₁	t₀	t

A quick word on odds, since the third measure below is built from them. The risk of an outcome is the number of cases divided by everyone at risk, meaning cases plus non-cases. The odds of the same outcome is the number of cases divided by the non-cases alone. If 20 of 100 people get sick, the risk is 20/100 = 0.20, but the odds are 20/80 = 0.25. Risk and odds stay close while an outcome is uncommon and pull apart as it becomes common, which is the same fact that lets the odds ratio stand in for the risk ratio only when disease is rare.
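The arithmetic in this paragraph can be checked with a minimal Python sketch. The function names `risk` and `odds` are illustrative choices, not part of the lesson.

```python
def risk(cases: int, non_cases: int) -> float:
    """Risk: cases divided by everyone at risk (cases plus non-cases)."""
    return cases / (cases + non_cases)


def odds(cases: int, non_cases: int) -> float:
    """Odds: cases divided by the non-cases alone."""
    return cases / non_cases


# The example from the text: 20 of 100 people get sick.
print(risk(20, 80))  # 0.2
print(odds(20, 80))  # 0.25

# Risk and odds stay close while the outcome is uncommon and pull apart
# as it becomes common (each row: cases per 100 people, risk, odds).
for cases in (1, 5, 20, 50):
    non_cases = 100 - cases
    print(cases, round(risk(cases, non_cases), 3), round(odds(cases, non_cases), 3))
```

At 1 case per 100 the two quantities are nearly identical; at 50 per 100 the risk is 0.5 while the odds are 1.0.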

Three Ratio Measures of Association

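Risk Ratio (RR): the risk in the exposed divided by the risk in the unexposed.
Incidence Rate Ratio (IR): the incidence rate (cases per person-time) in the exposed divided by the rate in the unexposed.
Odds Ratio (OR): the odds of disease in the exposed divided by the odds of disease in the unexposed.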

Worked Example: Brazil Water Cistern Study

Diarrhea & Water Cistern Presence

	Water Cistern	No Cistern	Total
Diarrhea Present	194	303	497
Diarrhea Absent	1,588	1,314	2,902
Total	1,782	1,617	3,399

RR = (194/1782) / (303/1617) = 0.109 / 0.187 = 0.58
OR = (194 × 1314) / (303 × 1588) = 0.53

Both measures indicate that having a water cistern is protective against diarrhea (values < 1). The RR of 0.58 means the risk is 42% lower among those with cisterns.
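As a check on the calculation above, here is a small Python sketch that computes RR and OR from the four cells of the cistern table. The argument names follow the lesson's 2×2 notation (a₁, b₁, a₀, b₀); the function names are illustrative.

```python
def risk_ratio(a1: int, b1: int, a0: int, b0: int) -> float:
    """RR = [a1 / (a1 + b1)] / [a0 / (a0 + b0)]."""
    return (a1 / (a1 + b1)) / (a0 / (a0 + b0))


def odds_ratio(a1: int, b1: int, a0: int, b0: int) -> float:
    """OR = (a1 * b0) / (a0 * b1)."""
    return (a1 * b0) / (a0 * b1)


# Cistern table: exposed = water cistern, unexposed = no cistern.
a1, b1 = 194, 1588   # diarrhea present / absent, cistern
a0, b0 = 303, 1314   # diarrhea present / absent, no cistern

rr = risk_ratio(a1, b1, a0, b0)
or_ = odds_ratio(a1, b1, a0, b0)

print(f"RR = {rr:.2f}")                        # RR = 0.58
print(f"OR = {or_:.2f}")                       # OR = 0.53
print(f"Risk reduction = {1 - rr:.0%}")        # Risk reduction = 42%
```

The OR lands further from 1 than the RR, which is the pattern expected when the outcome is not rare.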

Worked Example: Migraine Incidence Rates

Gender and Migraine (Ages 30–40)

	Female	Male	Total
Cases of migraine	131	44	175
Person-months	250	236	486

IR = (131/250) / (44/236) = 0.524 / 0.186 = 2.81

The rate of migraine is 2.81 times higher in females than males aged 30–40.
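The same calculation as a short Python sketch, with person-months as the denominator. The function name is an illustrative choice.

```python
def incidence_rate_ratio(a1: int, t1: float, a0: int, t0: float) -> float:
    """IR = (a1 / t1) / (a0 / t0), where t1 and t0 are person-time at risk."""
    return (a1 / t1) / (a0 / t0)


# Migraine table: exposed = female, unexposed = male.
rate_female = 131 / 250   # cases per person-month
rate_male = 44 / 236

ir = incidence_rate_ratio(131, 250, 44, 236)

print(f"Rate in females = {rate_female:.3f}")  # Rate in females = 0.524
print(f"Rate in males   = {rate_male:.3f}")    # Rate in males   = 0.186
print(f"IR = {ir:.2f}")                        # IR = 2.81
```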

Relationships Among RR, IR, and OR

In general, IR values are further from the null (1) than RR values, and OR values are even further away (Cornfield, 1951; Knol et al., 2008). This can be visualised on a number line:

Figure 6.1. General relationships among RR, IR, and OR. OR is always furthest from the null value of 1.

When is OR ≈ RR?

When the disease is rare (prevalence or incidence risk < 5%), OR approximates RR, the classic Cornfield (1951) approximation used by Doll and Hill (1950) in their landmark case-control study of smoking and lung cancer. This is because when a₁ is small relative to n₁, the denominator of the odds (b₁) is approximately equal to n₁, and similarly for the non-exposed group. In the cistern example, the overall risk was 14.6%, so OR (0.53) was more extreme than RR (0.58); when outcomes are not rare, treating OR as RR can substantially overstate the effect (Knol et al., 2008).

When is RR ≈ IR?

RR and IR will be close to each other if the exposure has a negligible impact on the total time at risk in the study population. This occurs when the disease is rare or when IR is close to the null value (IR ≈ 1).

OR as an Estimator of IR

OR is a good estimator of IR under certain conditions in case-control studies. If controls are selected using cumulative or risk-based sampling (all non-cases after cases have occurred), then OR estimates IR only if the disease is rare. If controls are selected using density sampling (a control selected from non-cases each time a case occurs), then OR is a direct estimate of IR regardless of disease rarity.

Risk Ratio vs. Odds Ratio

OR ≈ RR when the outcome is rare, but the two diverge as the outcome becomes common: OR is always further from the null than RR, larger when RR > 1 and smaller when RR < 1. The 2×2 table below illustrates this at an overall outcome prevalence of 0.30: the risks are 0.40 and 0.20, so RR = 2.0, while OR = (40 × 80) / (60 × 20) = 2.67.

	Y+	Y−	Total
E+	40	60	100
E−	20	80	100

When the outcome is common the gap is large: for example, at risks of 0.50 in the exposed and 0.25 in the unexposed, a true RR of 2.0 corresponds to an OR of 3.0. Reporting that OR as if it were a "risk ratio" overstates the relative effect by 50%. The "rare disease assumption" is what justifies treating OR as RR, and it should not be assumed without checking.
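To see how quickly OR drifts away from RR as the outcome becomes common, the sketch below holds the risk ratio at 2.0 and raises the baseline risk. The helper name and the list of baseline risks are illustrative choices, not values from the lesson.

```python
def odds_ratio_from_risks(r1: float, r0: float) -> float:
    """OR implied by the risk in the exposed (r1) and in the unexposed (r0)."""
    return (r1 / (1 - r1)) / (r0 / (1 - r0))


FIXED_RR = 2.0

for r0 in (0.01, 0.05, 0.10, 0.20, 0.25):
    r1 = FIXED_RR * r0                       # exposed risk implied by the fixed RR
    or_ = odds_ratio_from_risks(r1, r0)
    print(f"baseline risk {r0:.2f}: RR = {FIXED_RR:.1f}, "
          f"OR = {or_:.2f}, OR/RR = {or_ / FIXED_RR:.2f}")
```

At a 1% baseline risk the OR is within about 1% of the RR; by a 25% baseline risk it is 1.5 times the RR.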

Key Takeaways

Measures of association quantify the strength of the exposure-disease relationship; P-values do not, because they also depend on sample size.
RR compares risks, IR compares incidence rates, and OR compares odds between exposed and non-exposed groups.
OR is the only ratio measure that can be computed directly from a case-control study, because the exposure odds ratio among cases and controls equals the disease odds ratio.
When disease is rare (<5%), OR ≈ RR. IR values are further from the null than RR, and OR values further still.

Section 2

Measures of Effect in the Exposed Group

⏱ Estimated reading time: 15 minutes

Section 2 of 4

Risk difference, attributable fraction in the exposed, and vaccine efficacy as a special case.

Risk difference

RD: the absolute increase in risk

Risk Difference (RD) / Attributable Risk

\[ \color{#0B7B6B}{RD} = \color{#C2410C}{R_{E+}} - \color{#1D4ED8}{R_{E-}} = \frac{\color{#C2410C}{a_1}}{a_1+b_1} - \frac{\color{#1D4ED8}{a_0}}{a_0+b_0} \]

RD: risk difference; R_E+: risk in the exposed; R_E−: risk in the unexposed

Incidence Rate Difference (ID)

\[ \color{#0B7B6B}{ID} = \frac{\color{#C2410C}{a_1}}{\color{#C2410C}{T_1}} - \frac{\color{#1D4ED8}{a_0}}{\color{#1D4ED8}{T_0}} \]

ID: rate difference; a₁/T₁: rate in the exposed; a₀/T₀: rate in the unexposed

Null value: 0. RD > 0 means excess risk in the exposed; RD < 0 means a protective effect.

Worked example

Smoking and low birth weight

Smokers: 40/351 = 0.114
Non-smokers: 331/4,649 = 0.071

\[ RD = 0.114 - 0.071 = 0.043 \]

Per 100 women who smoked, approximately 4.3 additional low-birth-weight babies were attributable to smoking.

Why RD matters

A risk ratio of 1.60 sounds substantial. An RD of 4.3 per 100 tells you the actual scale of the problem when planning an intervention.

Attributable fraction

AF_e: proportion of exposed cases due to exposure

Attributable Fraction in the Exposed

\[ \color{#0B7B6B}{AF_e} = \frac{\color{#1D4ED8}{RR} - 1}{\color{#1D4ED8}{RR}} = \frac{\color{#C2410C}{RD}}{\color{#6D28D9}{R_{E+}}} \]

AF_e: attributable fraction in the exposed; RR: risk ratio; RD: risk difference; R_E+: risk in the exposed

Ranges from 0 (when RR = 1, no excess) to 1 (when all disease in the exposed is due to the exposure).

From the smoking example:

\[ AF_e = \frac{1.60 - 1}{1.60} = \frac{0.60}{1.60} \approx 0.375 \]

37.5% of low-birth-weight cases among smoking women were attributable to their smoking.

Special case

Vaccine efficacy as AF_e

Unvaccinated (exposure+): 20% develop disease
Vaccinated (exposure-): 5% develop disease

Vaccine Efficacy

\[ \color{#0B7B6B}{VE} = AF_e = \frac{\color{#C2410C}{RD}}{\color{#6D28D9}{R_{E+}}} = \frac{0.15}{0.20} = 0.75 \]

VE: vaccine efficacy; RD: risk difference (unvaccinated vs vaccinated); R_E+: risk in the unvaccinated

Plain language

The vaccine prevented 75% of the cases that would have occurred in vaccinated individuals had they remained unvaccinated.

Carry forward

Relative vs. absolute measures

Ratio measures (section 1)

Relative strength of association. Stable across different baseline risks. Used in clinical risk communication.

Difference measures (section 2)

Absolute excess risk and proportional causal share. Needed for resource allocation and policy prioritisation.

Section 3 takes these ideas and asks: what is the impact on the whole population, not just on those who are exposed?

Introduction and Overview

Section 1 covered the three ratio measures (RR, IR, OR), that is, how many times more likely disease is in the exposed group compared to the unexposed. This section turns to the parallel set of difference measures, which answer a different question: not how many times more, but how many extra cases occur because of the exposure. Difference measures lead naturally to attributable fractions in the exposed, which quantify how much disease in the exposed group can be attributed to the exposure itself.

Learning Objectives

Distinguish between “ratio” (relative) and “difference” (absolute) measures of association.
Calculate and interpret the risk difference (RD) and incidence rate difference (ID).
Calculate and interpret the attributable fraction in the exposed (AF_e).
Explain the concept of vaccine efficacy as a special case of AF_e.

Ratio vs. Difference Measures

The ratio measures from an earlier section (RR, IR, OR) tell us the relative strength of association, but they do not indicate the absolute number of cases attributable to the exposure. Difference (absolute effect) measures address this gap by computing how many additional cases occur because of the exposure; the choice between ratio and difference measures is itself a substantive scientific decision rather than a statistical convenience (Greenland & Pearce, 2015; Tripepi et al., 2007).

It helps to name the two scales this contrast lives on. Ratio measures work on a multiplicative scale: a risk ratio of 2 says the exposed risk is the baseline risk multiplied by two. Difference measures work on an additive scale: a risk difference of 0.04 says the exposed risk is the baseline plus four percentage points. The same association can look striking on one scale and slight on the other, so the choice of scale is a real part of the scientific question rather than a matter of presentation.

Why Both Matter

Even when an exposure is very strongly associated with disease (high RR), if the exposure is rare in a population, it may contribute very few cases. Conversely, a relatively weak risk factor (modest RR) that is common can be responsible for many cases. Difference measures capture this “public health impact.”

Risk Difference (RD): Attributable Risk

The risk difference (RD), also called attributable risk (Walter, 1976), is simply the risk in the exposed group minus the risk in the non-exposed group:

Risk difference

RD = p(D+|E+) − p(D+|E−) = (a₁/n₁) − (a₀/n₀)    (Eq 6.5)

The risk difference is the risk in the exposed group minus the risk in the unexposed group.

Similarly, the incidence rate difference (ID) is the difference between two incidence rates:

Incidence rate difference

ID = (a₁/t₁) − (a₀/t₀)    (Eq 6.6)

The rate difference is the rate in the exposed minus the rate in the unexposed, each a count of cases over person-time.

Interpretation of Difference Measures

RD or ID < 0 → Exposure is protective
RD or ID = 0 → No effect of exposure
RD or ID > 0 → Exposure is positively associated with disease

RD indicates the increase (or decrease) in the probability of disease in the exposed group, beyond the baseline risk. It tells you: “For every X exposed individuals, how many additional cases occur because of the exposure?”

Example: Smoking & Low Birth Weight

From a cohort of 5,000 women followed through pregnancy:

	Smoker	Non-smoker	Total
Low birth weight	40	331	371
Normal birth weight	311	4,318	4,629
Total	351	4,649	5,000

Risk in exposed: R_E+ = 40/351 = 0.114
Risk in non-exposed: R_E− = 331/4649 = 0.071
RD = 0.114 − 0.071 = 0.043

For every 100 women who smoked, approximately 4.3 had a low-birth-weight baby because they smoked (assuming the relationship is causal).
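The risk difference for this cohort can be reproduced with a few lines of Python. The function name is illustrative, and the last line applies the glossary's definition of the number needed to harm (1 / |RD|) as an extra, under the same causal assumption.

```python
def risk_difference(a1: int, n1: int, a0: int, n0: int) -> float:
    """RD = a1/n1 - a0/n0 (risk in exposed minus risk in unexposed)."""
    return a1 / n1 - a0 / n0


# Smoking and low birth weight: exposed = smokers, unexposed = non-smokers.
rd = risk_difference(40, 351, 331, 4649)

print(f"RD = {rd:.3f}")                               # RD = 0.043
print(f"Extra cases per 100 exposed = {100 * rd:.1f}")  # Extra cases per 100 exposed = 4.3
print(f"Number needed to harm = {1 / abs(rd):.0f}")   # Number needed to harm = 23
```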

Attributable Fraction in the Exposed (AF_e)

The AF_e (also called the attributable fraction among the exposed) expresses the proportion of disease in exposed individuals that is due to the exposure, assuming the relationship is causal (Walter, 1976). It can be viewed as the proportion of disease in the exposed group that would be avoided if the exposure were removed.

Attributable fraction in the exposed

AF_e = RD / p(D+|E+) = (RR − 1) / RR ≈ (OR − 1) / OR    (Eq 6.7)

The attributable fraction in the exposed is the risk difference over the risk in the exposed; equivalently it can be written from the risk ratio, or approximated from the odds ratio.

AF_e ranges from 0 (where risk is equal, RR = 1) to 1 (where all disease in the exposed group is due to the exposure, RR = ∞). In case-control studies, AF_e can be approximated by substituting OR for RR.

Worked Example: AF_e for Smoking

From the smoking example above:

RR = 0.114 / 0.071 = 1.60
AF_e = (1.60 − 1) / 1.60 = 0.60 / 1.60 = 0.375 (37.5%)

Among women who smoked, 37.5% of the low-birth-weight cases were attributable to smoking. Alternatively: 0.043 / 0.114 = 0.377 ≈ 37.7% (slight rounding difference).
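A short Python sketch makes the rounding point concrete: when the risks are carried at full precision, the two forms of Eq 6.7 give the same answer, and the 37.5% versus 37.7% gap comes only from rounding the inputs to three decimals. Function names are illustrative.

```python
def af_exposed_from_rr(rr: float) -> float:
    """AF_e = (RR - 1) / RR."""
    return (rr - 1) / rr


def af_exposed_from_risks(r1: float, r0: float) -> float:
    """AF_e = RD / risk in the exposed = (r1 - r0) / r1."""
    return (r1 - r0) / r1


r1 = 40 / 351      # risk in smokers
r0 = 331 / 4649    # risk in non-smokers
rr = r1 / r0

print(f"RR = {rr:.2f}")                                        # RR = 1.60
print(f"AF_e from RR    = {af_exposed_from_rr(rr):.3f}")        # AF_e from RR    = 0.375
print(f"AF_e from risks = {af_exposed_from_risks(r1, r0):.3f}")  # AF_e from risks = 0.375

# Using the rounded risks from the text reproduces the 0.377 figure.
print(f"Rounded inputs  = {0.043 / 0.114:.3f}")                 # Rounded inputs  = 0.377
```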

Vaccine Efficacy

Vaccine efficacy is a special form of AF_e, where “not vaccinated” is the exposure (factor positive) and “vaccinated” is the comparison group. If 20% of unvaccinated individuals develop disease versus 5% of vaccinated individuals:

Vaccine Efficacy Calculation

RD = 0.20 − 0.05 = 0.15
AF_e = 0.15 / 0.20 = 0.75 (75%)

The vaccine has prevented 75% of the cases of disease that would have occurred in the vaccinated group if the vaccine had not been used.
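The calculation can be sketched in Python as follows. The function name is illustrative; the final line shows the algebraically equivalent form, one minus the ratio of vaccinated to unvaccinated risk.

```python
def vaccine_efficacy(risk_unvaccinated: float, risk_vaccinated: float) -> float:
    """VE = AF_e with 'unvaccinated' as the exposed group: RD / risk in unvaccinated."""
    return (risk_unvaccinated - risk_vaccinated) / risk_unvaccinated


ve = vaccine_efficacy(0.20, 0.05)
print(f"VE = {ve:.0%}")   # VE = 75%

# Equivalent form: 1 - (risk in vaccinated / risk in unvaccinated).
ve_alt = 1 - 0.05 / 0.20
print(f"VE = {ve_alt:.0%}")   # VE = 75%
```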

AF_e vs. Etiologic Fraction

The etiologic fraction is the proportion of cases in the exposed group for which exposure was a component of the sufficient cause (Rothman, 1976). While AF_e measures the excess fraction, the etiologic fraction can be higher because exposure may contribute to cases even when the baseline risk would have produced them eventually. In general, AF_e provides a lower bound for the etiologic fraction (Greenland & Robins, 1988).

Reflection

In a cohort study, a new environmental pollutant is found to have an RR of 3.0 for respiratory disease. The risk of respiratory disease in the non-exposed population is 2%. Calculate the RD and AF_e. If 1,000 people are exposed, how many additional cases would you expect due to the exposure? Discuss why RD and AF_e give different but complementary perspectives.

Model answer: RD = baseline × (RR−1) = 0.02 × 2 = 0.04 (4 per 100 exposed). AFe = (RR−1)/RR = 2/3 ≈ 0.667, so 67% of disease in exposed people is attributable to the exposure. With 1,000 exposed individuals, expected cases at baseline = 1,000 × 0.02 = 20; with exposure = 1,000 × 0.06 = 60; additional cases due to exposure = 40. RD gives the public-health-relevant absolute number (40 extra cases per 1,000 exposed); AFe gives the within-exposed fraction (67% of exposed cases would not have occurred without exposure). They are complementary: RD scales to population impact; AFe addresses individual-level questions like the legal standard "but for the exposure, would this person have gotten sick?"

Key Takeaways

RD (attributable risk) measures the absolute increase in risk due to exposure; the null value is 0.
AF_e = (RR − 1)/RR gives the proportion of disease in the exposed that is due to the exposure.
Vaccine efficacy is a special case of AF_e where the “exposure” is being unvaccinated.
AF_e provides a lower bound for the etiologic fraction.

Section 3

Population-Level Measures & Study Design

⏱ Estimated reading time: 12 minutes

Section 3 of 4

PAR and AF_p: scaling attributable risk to the whole population, and matching measures to study designs.

The two population measures

PAR and AF_p

Population Attributable Risk (PAR)

\[ \color{#0B7B6B}{PAR} = \color{#C2410C}{p(D+)_{\text{pop}}} - \color{#1D4ED8}{p(D+)_{E-}} \]

PAR: population attributable risk; p(D+)_pop: risk in the whole population; p(D+)_E−: risk in the unexposed

Population Attributable Fraction (Levin, 1953)

\[ \color{#0B7B6B}{AF_p} = \frac{\color{#C2410C}{PAR}}{\color{#1D4ED8}{p(D+)_{\text{pop}}}} = \frac{\color{#6D28D9}{p_e}(\color{#BE185D}{RR}-1)}{\color{#6D28D9}{p_e}(\color{#BE185D}{RR}-1)+1} \]

AF_p: population attributable fraction; PAR: population attributable risk; p(D+)_pop: total population risk; p_e: exposure prevalence; RR: risk ratio

where $p_e$ is the prevalence of exposure. Both association strength and exposure prevalence determine the result.

Worked example

Same cohort, population perspective

Overall risk: 371/5,000 = 0.074
Risk in non-smokers: 331/4,649 = 0.071
Exposure prevalence: 351/5,000 = 7%

\[ PAR = 0.074 - 0.071 = 0.003 \]

\[ AF_p = 0.003/0.074 = 4.1\% \]

The key insight

AF_e = 37.5% but AF_p = 4.1%. A strong association with a rare exposure produces a small population fraction.

A critical contrast

Strong and rare vs. weak and common

High RR, rare exposure

Example: intravenous drug use and HIV. Very strong association, but small AF_p because the exposure is uncommon in the full population.

Modest RR, common exposure

Example: poor diet and chronic disease. Weaker association, but large AF_p because most of the population is exposed.

Both the strength of association and the prevalence of exposure drive public-health impact.

Study design constraints

Which measures each design delivers

Carry forward

Two things into the next section

Population impact depends on exposure prevalence, not only on association strength. A common weak factor can outweigh a rare strong one.
Study design constrains measures: cohort gives everything; case-control gives OR directly; cross-sectional gives prevalence ratios.

Section 4 adds the last piece: standard errors and confidence intervals to quantify how precise these estimates really are.

Introduction and Overview

Earlier sections worked at the level of the exposed group. This section zooms out: even if an exposure powerfully causes disease in the exposed, its public-health importance also depends on how common it is in the population. The population attributable fraction (AF_p) captures this combination, and the section closes by mapping each measure of association onto the study design that produces it, tying back to earlier lessons.

Learning Objectives

Calculate and interpret the population attributable risk (PAR) and population attributable fraction (AF_p).
Explain how the prevalence of exposure affects population-level measures.
Identify which measures of association can be computed from each study design.

From the Exposed Group to the Entire Population

While RD and AF_e describe the effect of exposure among exposed individuals, public health decisions often require understanding the impact of an exposure on the entire population. Two key population-level measures address this:

Population Attributable Risk (PAR)

The PAR is the increase in overall population risk attributable to the exposure. It reflects both the strength of the association and the frequency of the exposure in the population, an idea originally developed by Levin (1953) for lung cancer and reviewed by Northridge (1995) as a link between causal inference and public-health action.

Population attributable risk

PAR = p(D+) − p(D+|E−) = (m₁/n) − (a₀/n₀) = RD × p(E+)    (Eq 6.8)

The population attributable risk is the disease risk in the whole population minus the risk in the unexposed.

Population Attributable Fraction (AF_p)

The AF_p indicates the proportion of disease in the entire population that is attributable to the exposure, and which would be avoided if the exposure were removed (assuming causation and no confounding).

Population attributable fraction

AF_p = PAR / p(D+) = p(E+)(RR − 1) / [p(E+)(RR − 1) + 1]    (Eq 6.9)

The population attributable fraction is the population attributable risk as a share of the total population risk; it rises with both the exposure prevalence and the risk ratio.

Why Exposure Prevalence Matters

A strong risk factor (high RR) that is rare in the population will have a small AF_p. A weaker risk factor (modest RR) that is common may have a large AF_p. For example, intravenous drug use has a very high RR for HIV, but if it is rare in the population, eliminating it would prevent few total cases. A modestly elevated risk factor like poor diet, affecting millions, may account for more total cases.

Worked Example: Smoking & Low Birth Weight (Population Level)

From the cohort of 5,000 women (351 smokers, 4,649 non-smokers):

Overall risk: p(D+) = 371/5000 = 0.074
Risk in non-exposed: 331/4649 = 0.071
PAR = 0.074 − 0.071 = 0.003
AF_p = 0.003 / 0.074 = 0.041 (4.1%)

Only 4.1% of all low-birth-weight babies in the population were attributable to smoking. The low AF_p is because very few women (351/5000 = 7%) smoked during the 2nd trimester, despite the relatively strong association (RR = 1.60).
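The population-level numbers can be checked with the Python sketch below, which computes AF_p two ways (PAR over total risk, and the exposure-prevalence form of Eq 6.9) and confirms that PAR equals RD × p(E+). Variable names are illustrative. One caveat: with unrounded inputs AF_p comes out at about 4.0%; the 4.1% in the text results from dividing the rounded values 0.003 / 0.074.

```python
# Smoking and low birth weight cohort.
a1, n1 = 40, 351       # cases and total, smokers
a0, n0 = 331, 4649     # cases and total, non-smokers
n = n1 + n0

risk_pop = (a1 + a0) / n          # p(D+), overall risk
risk_unexposed = a0 / n0          # p(D+|E-)
risk_exposed = a1 / n1            # p(D+|E+)
p_exposed = n1 / n                # p(E+), exposure prevalence

par = risk_pop - risk_unexposed   # Eq 6.8
af_p = par / risk_pop             # Eq 6.9, first form

# Eq 6.9, second form: uses exposure prevalence and RR.
rr = risk_exposed / risk_unexposed
af_p_prevalence_form = p_exposed * (rr - 1) / (p_exposed * (rr - 1) + 1)

print(f"Overall risk        = {risk_pop:.3f}")    # Overall risk        = 0.074
print(f"Exposure prevalence = {p_exposed:.1%}")   # Exposure prevalence = 7.0%
print(f"PAR                 = {par:.3f}")         # PAR                 = 0.003
print(f"AF_p                = {af_p:.3f}")        # AF_p                = 0.040
```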

Confounding and AF_p

If confounding is present, adjusted estimates of RR should be used. The AF_p can then be estimated using:

Population attributable fraction (adjusted)

AF_p = pd × (aRR − 1) / aRR    (Eq 6.10)

The population attributable fraction can be estimated from the proportion of cases exposed and the confounding-adjusted risk ratio.

where pd is the proportion of cases exposed to the risk factor, and aRR is the adjusted risk ratio. For multiple exposure categories, a summation formula is used.
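Eq 6.10 can be sketched in Python as below. The lesson gives no adjusted risk ratio for the smoking cohort, so this example plugs in the crude RR purely as a stand-in (an assumption for illustration); in that special case the formula should reproduce the crude AF_p from Eq 6.9, which serves as a consistency check. Function and variable names are illustrative.

```python
def af_population_from_cases(pd: float, a_rr: float) -> float:
    """Eq 6.10: AF_p = pd * (aRR - 1) / aRR.

    pd   : proportion of cases that are exposed
    a_rr : risk ratio (confounder-adjusted when confounding is present)
    """
    return pd * (a_rr - 1) / a_rr


# Smoking cohort; the crude RR stands in for an adjusted RR (assumption).
pd = 40 / 371                             # exposed cases / all cases
crude_rr = (40 / 351) / (331 / 4649)

af_p = af_population_from_cases(pd, crude_rr)
print(f"pd   = {pd:.3f}")     # pd   = 0.108
print(f"AF_p = {af_p:.3f}")   # AF_p = 0.040
```

With a genuinely adjusted RR in place of `crude_rr`, the same function returns the adjusted population attributable fraction.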

Study Design and Measures of Association

Not all measures can be computed from all study designs. The following table summarises which measures are available:

Measure	Cross-sectional	Cohort	Case-control
RR	✓	✓
IR		✓
OR	✓	✓	✓
RD	✓	✓
AF_e	✓	✓	✓^b
PAR	✓	✓^a
AF_p	✓	✓^a	✓^c

^a Requires independent estimate of p(D+) or p(E+). ^b Estimated using OR. ^c Requires OR and independent estimate of p(E+|D+).

Reflection

Consider two risk factors for a disease: Factor A has RR = 5.0 and affects 2% of the population. Factor B has RR = 1.5 and affects 40% of the population. Calculate AF_p for each factor using the formula AF_p = p(E+)(RR − 1) / [p(E+)(RR − 1) + 1]. Which factor would you prioritise in a public health intervention, and why?

Model answer: Factor A: AFp = 0.02(4)/(0.02(4)+1) = 0.08/1.08 = 0.074 (7.4%). Factor B: AFp = 0.40(0.5)/(0.40(0.5)+1) = 0.20/1.20 = 0.167 (16.7%). Despite a much smaller per-person RR, removing Factor B would prevent over twice as many population cases because it is so much more common. Prioritise B for a population-level public-health intervention, because the population attributable fraction is what determines burden averted. A nuance: B's lower per-person effect may mean the intervention is harder to deliver, less attractive to individuals (low perceived risk), and politically tougher; A's larger per-person effect may justify a targeted (high-risk) intervention even though its population impact is smaller. The best portfolio often combines both: high-risk strategies for A and population-wide strategies for B.
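The two calculations in the model answer can be reproduced with a small Python function implementing the formula given in the question. The function name is illustrative.

```python
def af_population(p_exposed: float, rr: float) -> float:
    """AF_p = p(E+)(RR - 1) / [p(E+)(RR - 1) + 1]."""
    excess = p_exposed * (rr - 1)
    return excess / (excess + 1)


factors = {
    "A": (0.02, 5.0),   # rare exposure, strong association
    "B": (0.40, 1.5),   # common exposure, modest association
}

for name, (p_exposed, rr) in factors.items():
    print(f"Factor {name}: AF_p = {af_population(p_exposed, rr):.1%}")
# Factor A: AF_p = 7.4%
# Factor B: AF_p = 16.7%
```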

Key Takeaways

PAR = overall population risk − risk in unexposed; it reflects both strength and prevalence of exposure.
AF_p = PAR / p(D+); it gives the proportion of all disease in the population attributable to the exposure.
A common risk factor with modest RR can have a larger AF_p than a rare factor with high RR.
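Different study designs support different measures: from a case-control study the OR is the only measure estimated directly; AF_e and AF_p can be derived from it (AF_p also needs additional information on exposure).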

Section 4

Hypothesis Testing & Confidence Intervals

⏱ Estimated reading time: 15 minutes

Standard errors, P-values, confidence intervals, and a guide to choosing the right statistical test.

Standard error

Precision of a point estimate

Difference measures (RD, ID)

Variance computed directly from cell counts.

Var(RD)

\[ \frac{R_{E+}(1-R_{E+})}{n_1} + \frac{R_{E-}(1-R_{E-})}{n_0} \]

R_E+, n₁: risk and size, exposed group; R_E−, n₀: risk and size, unexposed group

Ratio measures (RR, IR, OR)

Variance on the log scale using Taylor series approximations.

Var(ln OR)

\[ \frac{1}{a_1} + \frac{1}{b_1} + \frac{1}{a_0} + \frac{1}{b_0} \]

a₁: exposed cases; b₁: exposed non-cases; a₀: unexposed cases; b₀: unexposed non-cases

Confidence intervals

Uncertainty around the point estimate

CI for difference measures (symmetric)

\[ \hat{\theta} \pm z_{\alpha/2} \cdot SE(\hat{\theta}) \]

θ̂: point estimate; z_α/2: critical value; SE: standard error

CI for ratio measures (log scale, then exponentiate)

\[ \exp\!\left[\ln\hat{\theta} \pm z_{\alpha/2} \cdot SE(\ln\hat{\theta})\right] \]

ln θ̂: estimate on the log scale; z_α/2: critical value; SE: standard error of the log estimate

Significance criterion: the CI for OR, RR, or IR must exclude 1; the CI for RD or ID must exclude 0.

From the lesson's examples

Computed confidence intervals

Figure: Forest plot of the three ratio-measure confidence intervals from the lesson examples, plotted on a log axis. Each interval lies entirely to the right of the null (1), so each is significant at the 5% level; the differing interval widths show how much more the data constrain some estimates than others.

Four test statistics

Tests for 2x2 tables and regression

Pearson chi-squared

For categorical outcomes with expected cell counts all above 5. Compares observed to expected counts under independence.

Fisher's exact

Small expected counts (below 5). Exact P-value, no large-sample assumption. Returns OR and its CI directly in R.

Wald test

Estimate divided by its standard error. Fast, but less accurate in small samples.

Likelihood ratio test

Generally preferred in regression settings; more accurate than Wald in small samples.

Choosing the right test

A decision framework

Carry forward

Point estimates need uncertainty bands

Standard error quantifies precision; computed differently for difference vs. ratio measures.
Confidence intervals show the range of plausible effects, not just a binary signal.
P-values are not the only verdict: a P-value of 0.049 and one of 0.051 carry almost identical information.

The final assessment is just below. Take the synthesis materials slowly before attempting the fifteen questions.

Introduction and Overview

Earlier sections produced point estimates of association: single numbers like RR = 2.5 or AF_p = 30%. Those numbers are hard to interpret without a measure of how uncertain they are. This section closes the lesson by introducing the standard error, hypothesis tests, and confidence intervals, the same statistical-inference machinery you previewed in an earlier course, now applied directly to the measures you just learned to compute.

Learning Objectives

Explain the concepts of standard error, null hypothesis, and P-value.
Describe the four common test statistics for evaluating associations.
Interpret confidence intervals for measures of association.
Distinguish between statistical significance and the strength of association.

Standard Error

The standard error (SE) provides a measure of the precision of a point estimate, that is, how much uncertainty exists in the estimate. For difference measures (RD, ID), the variance can be computed directly:

Variance of the risk difference

var(RD) = [(a₁/n₁)(1 − a₁/n₁)] / n₁ + [(a₀/n₀)(1 − a₀/n₀)] / n₀ Eq 6.13

The variance of the risk difference adds a contribution from the exposed group and one from the unexposed group, each computed straight from the cell counts.
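A minimal sketch of Eq 6.13 in R, assuming a hypothetical cohort in which 30 of 100 exposed and 10 of 100 unexposed subjects develop disease (these counts are illustrative and are not the lesson's smoking data):

```r
# Hypothetical cohort counts, for illustration only
a1 <- 30; n1 <- 100   # cases and group size, exposed
a0 <- 10; n0 <- 100   # cases and group size, unexposed

r1 <- a1 / n1         # risk in the exposed
r0 <- a0 / n0         # risk in the unexposed
rd <- r1 - r0         # risk difference

# Eq 6.13: one binomial variance term per group
var_rd <- r1 * (1 - r1) / n1 + r0 * (1 - r0) / n0
se_rd  <- sqrt(var_rd)
c(RD = rd, variance = var_rd, SE = se_rd)
```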

For ratio measures (RR, IR, OR), the variance is computed on the log scale using Taylor series approximations:

Variance of ln(RR)

var(ln RR) = 1/a₁ − 1/n₁ + 1/a₀ − 1/n₀ Eq 6.14

The variance of the log risk ratio is built from the diseased counts and group totals; working on the log scale keeps the sampling distribution of the ratio roughly symmetric.

Variance of ln(OR)

var(ln OR) = 1/a₁ + 1/a₀ + 1/b₁ + 1/b₀ Eq 6.15

The variance of the log odds ratio is the sum of the reciprocals of all four cells of the table.
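Eq 6.14 and Eq 6.15 can be evaluated side by side on one table. The sketch below assumes a hypothetical 2×2 table (30 exposed cases, 70 exposed non-cases, 10 unexposed cases, 90 unexposed non-cases); the counts are illustrative only.

```r
# Hypothetical 2x2 cell counts, for illustration only
a1 <- 30; b1 <- 70    # exposed: cases, non-cases
a0 <- 10; b0 <- 90    # unexposed: cases, non-cases
n1 <- a1 + b1
n0 <- a0 + b0

rr <- (a1 / n1) / (a0 / n0)     # risk ratio
or <- (a1 * b0) / (b1 * a0)     # cross-product odds ratio

var_ln_rr <- 1 / a1 - 1 / n1 + 1 / a0 - 1 / n0   # Eq 6.14
var_ln_or <- 1 / a1 + 1 / a0 + 1 / b1 + 1 / b0   # Eq 6.15

rbind(RR = c(estimate = rr, se_log = sqrt(var_ln_rr)),
      OR = c(estimate = or, se_log = sqrt(var_ln_or)))
```

Because the group totals are subtracted in Eq 6.14 but the non-case cells are added in Eq 6.15, the log OR has the larger variance on the same table.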

Hypothesis Testing

Significance testing is based on specifying a null hypothesis about the population parameter. The null hypothesis typically states there is no association:

For difference measures (RD, ID): H₀: θ = 0
For ratio measures (RR, IR, OR): H₀: θ = 1

An alternative hypothesis can be 1-tailed or 2-tailed. In general, 2-tailed hypotheses are preferred because 1-tailed hypotheses are harder to justify.

Limitations of P-values

P-values are often dichotomised into “significant” or “non-significant” at α = 0.05, but this entails a huge loss of information (Wasserstein & Lazar, 2016; Greenland et al., 2016). P-values of 0.049 and 0.051 lead to different conclusions under this rule despite being virtually identical. Always report the actual P-value and a confidence interval, which conveys both significance and precision.

Test Statistics

The four test statistics are:

Pearson χ²
Exact tests
Wald statistic
Likelihood ratio test
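To see the four statistics on one dataset, the sketch below fits a logistic regression to a hypothetical cohort (30 of 100 exposed and 10 of 100 unexposed develop disease) and compares the Wald, likelihood ratio, Pearson, and Fisher results. The data and object names are assumptions made for illustration; only base R functions are used.

```r
# Hypothetical cohort: 30/100 exposed and 10/100 unexposed develop disease
dat <- data.frame(exposed = c(1, 0), cases = c(30, 10), noncases = c(70, 90))

fit0 <- glm(cbind(cases, noncases) ~ 1,       family = binomial, data = dat)
fit1 <- glm(cbind(cases, noncases) ~ exposed, family = binomial, data = dat)

# Wald: ln(OR) divided by its standard error
wald_z <- coef(summary(fit1))["exposed", "z value"]

# Likelihood ratio: drop in deviance when exposure enters the model
lrt      <- anova(fit0, fit1, test = "Chisq")
lrt_chi2 <- lrt$Deviance[2]

# Pearson chi-squared (no continuity correction) and Fisher's exact test
tab          <- matrix(c(30, 70, 10, 90), nrow = 2, byrow = TRUE)
pearson_chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
fisher_p     <- fisher.test(tab)$p.value

# Wald z is squared so all three statistics are on the 1-df chi-squared scale
round(c(wald = wald_z^2, pearson = pearson_chi2, lrt = lrt_chi2), 2)
```

With counts this large the three statistics land close together; they tend to diverge as cells become sparse, which is when the choice of test matters most.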

Confidence Intervals

A confidence interval (CI) reflects the level of uncertainty in a point estimate. A 95% CI means that if the study were repeated many times under identical conditions, 95% of the computed CIs would contain the true parameter value. This is a property of the procedure, not a probability statement about the parameter (Greenland et al., 2016).
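The "repeated many times" interpretation can be made concrete with a small simulation. This sketch assumes hypothetical true risks (0.30 exposed, 0.10 unexposed) and group sizes of 200; it draws many studies, builds a 95% CI for the RD in each, and counts how often the interval contains the true RD.

```r
set.seed(341)                       # arbitrary seed so the run is reproducible
true_r1 <- 0.30; true_r0 <- 0.10    # hypothetical true risks
n1 <- 200; n0 <- 200                # hypothetical group sizes
true_rd <- true_r1 - true_r0
z <- qnorm(0.975)                   # two-sided 95% critical value

covers <- replicate(5000, {
  r1 <- rbinom(1, n1, true_r1) / n1
  r0 <- rbinom(1, n0, true_r0) / n0
  rd <- r1 - r0
  se <- sqrt(r1 * (1 - r1) / n1 + r0 * (1 - r0) / n0)
  (rd - z * se <= true_rd) && (true_rd <= rd + z * se)
})

coverage <- mean(covers)   # proportion of intervals containing the true RD
coverage                   # expected to be close to 0.95
```

Any single interval either contains the true value or does not; the 95% describes the long-run behaviour of the procedure.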

Computing CIs

For difference measures, the CI is computed directly:

Confidence interval, difference measures

θ ± Z_α × √var(θ) Eq 6.19

A symmetric interval places the point estimate at the centre, then steps out by a critical value (Z_α, 1.96 for a two-sided 95% CI) times the standard error.

For ratio measures, the CI is computed on the log scale and then exponentiated:

Confidence interval, ratio measures

θ × exp(± Z_α × √var(ln θ)) Eq 6.21

For a ratio measure the interval is built on the log scale, then converted back by exponentiating, so it ends up asymmetric around the point estimate.

The CI is symmetrical about lnθ but not about θ itself, which is why confidence intervals for ratio measures appear asymmetric.
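The contrast between Eq 6.19 and Eq 6.21 shows up clearly in a short calculation. The sketch below assumes the same kind of hypothetical cohort as before (30 of 100 exposed and 10 of 100 unexposed develop disease); the counts are illustrative and are not the textbook examples.

```r
# Hypothetical cohort counts, for illustration only
a1 <- 30; n1 <- 100
a0 <- 10; n0 <- 100
z  <- qnorm(0.975)                  # two-sided 95% critical value

r1 <- a1 / n1
r0 <- a0 / n0

# Difference measure (Eq 6.19): symmetric about the estimate
rd    <- r1 - r0
se_rd <- sqrt(r1 * (1 - r1) / n1 + r0 * (1 - r0) / n0)
ci_rd <- rd + c(-1, 1) * z * se_rd

# Ratio measure (Eq 6.21): built on the log scale, then exponentiated
rr       <- r1 / r0
se_ln_rr <- sqrt(1 / a1 - 1 / n1 + 1 / a0 - 1 / n0)
ci_rr    <- rr * exp(c(-1, 1) * z * se_ln_rr)

round(c(RD = rd, lower = ci_rd[1], upper = ci_rd[2]), 3)
round(c(RR = rr, lower = ci_rr[1], upper = ci_rr[2]), 3)

# Distance from the estimate to each limit: unequal for the RR
c(below = rr - ci_rr[1], above = ci_rr[2] - rr)
```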

Interpreting CIs for Measures of Association

For RR, IR, OR: if the 95% CI includes 1, the association is not statistically significant at α = 0.05.
For RD, ID: if the 95% CI includes 0, the association is not statistically significant.

However, this “surrogate significance test” is an under-use of the CI. The CI also shows the range of plausible effect sizes, which is far more informative than a binary significant/non-significant classification.

Example CIs from the Textbook

Measure	Point Estimate	95% CI
RD (smoking)	0.043	(0.009, 0.077)
RR (smoking)	1.601	(1.174, 2.182)
OR (smoking)	1.678	(1.154, 2.387)
ID (migraine)	0.338	(0.232, 0.443)
IR (migraine)	2.811	(1.983, 4.050)

None of the CIs for ratio measures include 1, and none for difference measures include 0, confirming statistical significance for all associations.

Choosing the Right Statistical Test

The Pearson χ², Fisher’s exact, Wald, and likelihood ratio tests above are the workhorses for 2×2 tables and the regression-based measures of association you will meet in a later course. But epidemiological analyses often require comparing means, proportions, or whole distributions across groups, sometimes paired, sometimes not, sometimes badly skewed. The right test depends on three structural questions about your data:

Outcome type: continuous (means / medians) or categorical (counts / proportions)?
Group structure: one group, two groups, or three or more? Independent or paired/matched?
Distributional assumptions: can you defend approximate normality (or appeal to a large-sample CLT argument), or do you need a non-parametric (rank-based) alternative?

Quick Decision Guide

Continuous, 1 group vs. a known value → one-sample t-test (or Wilcoxon signed-rank if non-normal).
Continuous, 2 independent groups → two-sample t-test (or Mann–Whitney U).
Continuous, 2 paired observations → paired t-test (or Wilcoxon signed-rank on differences).
Continuous, 3+ groups → one-way ANOVA (or Kruskal–Wallis).
Categorical, independent groups → Pearson χ² (or Fisher’s exact for small expected counts).
Categorical, paired/matched → McNemar’s test (on discordant pairs).
Two continuous variables, association → Pearson r (linear) or Spearman ρ (monotonic, rank-based).

Comparison Table: When Used, How Calculated, How Interpreted

The table below summarises the tests most commonly reported alongside measures of association. The R code that follows runs each on simple built-in datasets so you can copy, paste, and read the output without external data files.

Test	When to use	How calculated	How to interpret
One-sample t-test	Compare a single sample mean to a known/hypothesised value μ₀; continuous, approximately normal (or n large).	t = (x̄ − μ₀) / (s/√n); df = n−1.	If p < α (or 95% CI for the mean excludes μ₀), the population mean differs from μ₀.
Two-sample (independent) t-test	Compare means of 2 independent groups; continuous, approximately normal. Welch’s version (R’s default) does not assume equal variances.	t = (x̄₁ − x̄₂) / SE_diff; df = n₁+n₂−2 (Student) or Welch–Satterthwaite df (Welch).	p < α → the two group means differ. Always report the mean difference and its 95% CI.
Paired t-test	Two related observations on the same unit (before/after, twin pairs, matched case-control pairs); continuous differences approximately normal.	Compute within-pair differences d_i; t = d̄ / (s_d/√n); df = n−1.	p < α → mean within-pair change ≠ 0. Reduces between-subject variability, and is usually more powerful than treating data as unpaired.
One-way ANOVA	Compare means across 3+ independent groups; continuous, approximately normal, roughly equal variances.	F = MS_between / MS_within; df₁ = k−1, df₂ = N−k.	Significant F → at least one group mean differs. Follow with post-hoc pairwise comparisons (Tukey HSD, Bonferroni) to see which.
Pearson χ²	Test independence of two categorical variables (any r×c table). Assumption: all expected counts > 1 and ≥ 80% > 5.	χ² = Σ(O−E)²/E; df = (r−1)(c−1). Expected = (row total × column total) / N.	p < α → row and column variables are associated. The test signals whether there is association; report a measure of association (OR, RR) for strength.
Fisher’s exact test	2×2 (or larger) tables with small expected counts where the χ² approximation is suspect.	Hypergeometric: enumerates every table with the same margins, sums probabilities of tables as extreme or more extreme than observed.	Exact p-value, with no large-sample assumption. With modern computing, fine to use even when χ² would also be valid.
McNemar’s test	Paired/matched binary outcomes (before/after on the same person; matched case-control on exposure; agreement of two diagnostic tests).	Look only at discordant pairs b and c: χ² = (b−c)² / (b+c); df = 1. Concordant pairs ignored.	p < α → the discordant pairs are unbalanced, suggesting the two paired proportions differ. The matched OR is b/c.
Wilcoxon signed-rank	Non-parametric alternative to one-sample / paired t-test. Use when differences are skewed, ordinal, or have outliers.	Rank |d_i|, attach signs, sum positive (or negative) ranks; compare to its null distribution (or large-sample z).	p < α → the median difference is non-zero (under symmetry). Robust to outliers.
Mann–Whitney U (Wilcoxon rank-sum)	Non-parametric alternative to two-sample t-test. Two independent groups, continuous or ordinal.	Pool all observations, rank them, sum ranks in one group; U = R₁ − n₁(n₁+1)/2.	p < α → the two distributions differ. If shapes are similar, this is a test of medians; otherwise it tests stochastic dominance.
Kruskal–Wallis	Non-parametric alternative to one-way ANOVA. 3+ independent groups; continuous or ordinal.	H computed from rank sums; approximately χ² with k−1 df under H₀.	p < α → at least one group’s distribution differs. Follow with pairwise rank-sum tests (Dunn’s test or pairwise Wilcoxon with adjustment).
Pearson correlation (r)	Linear association between two continuous variables; assumes approximate bivariate normality, sensitive to outliers.	r = Σ(x−x̄)(y−ȳ) / √[Σ(x−x̄)² Σ(y−ȳ)²]; tested via t = r√[(n−2)/(1−r²)].	Range −1 to +1: sign = direction, magnitude = strength of linear association. Always inspect a scatterplot first.
Spearman ρ	Monotonic (not necessarily linear) association between two ordinal or non-normally distributed continuous variables.	Pearson r applied to the ranks of x and y.	Same −1 to +1 interpretation, but for monotonic association. Robust to outliers and non-linearity.
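Two of the "How calculated" entries are simple enough to reproduce by hand. The sketch below computes the Pearson χ² and McNemar statistics from their table formulas and compares them with R's built-in functions (continuity correction switched off so the numbers match). The two tables are hypothetical counts chosen for illustration.

```r
# Pearson chi-squared by hand: sum of (O - E)^2 / E
tab      <- matrix(c(30, 70, 10, 90), nrow = 2, byrow = TRUE)
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
chi2_hand  <- sum((tab - expected)^2 / expected)
chi2_builtin <- unname(chisq.test(tab, correct = FALSE)$statistic)

# McNemar by hand: only the discordant (off-diagonal) pairs enter
paired <- matrix(c(40, 10, 25, 25), nrow = 2, byrow = TRUE)
n_b <- paired[1, 2]
n_c <- paired[2, 1]
mcnemar_hand    <- (n_b - n_c)^2 / (n_b + n_c)
mcnemar_builtin <- unname(mcnemar.test(paired, correct = FALSE)$statistic)

c(chi2_hand = chi2_hand, chi2_builtin = chi2_builtin)
c(mcnemar_hand = mcnemar_hand, mcnemar_builtin = mcnemar_builtin)
```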

Pair every test with an effect estimate

None of the tests above are themselves measures of effect size. A statistically significant chi-square indicates that some association exists, but the magnitude must come from the OR, RR, mean difference, or correlation coefficient. Likewise, a non-significant t-test in a small study is not evidence of no effect. Always pair every test with the corresponding effect estimate and its 95% CI.

R Tutorial: Running These Tests

R provides all of these tests in its standard installation (the stats package, loaded by default; no extra packages required). The blocks below use the built-in datasets ToothGrowth, sleep, and iris. Copy any block straight into your console.

R Comparing means: t-tests and ANOVA

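R’s t.test() defaults to Welch’s two-sample t-test (no equal-variance assumption). Use var.equal = TRUE for the classic Student’s version. Setting paired = TRUE switches to a paired t-test; pass the two measurement vectors directly for this, because recent R releases (4.4.0 and later) reject paired = TRUE in the formula interface.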

# --- ONE-SAMPLE t-test: is mean tooth length different from 18 mm? ---
t.test(ToothGrowth$len, mu = 18)

# --- TWO-SAMPLE (independent) t-test: OJ vs. VC supplements ---
t.test(len ~ supp, data = ToothGrowth)                 # Welch (default)
t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)  # Student

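# --- PAIRED t-test: sleep gain under two drugs, same 10 subjects ---
# Vector interface: the formula interface rejects paired = TRUE in R >= 4.4.0.
# Rows of sleep are ordered by subject ID within each group, so the vectors align.
with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))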

# --- ONE-WAY ANOVA: tooth length across 3 doses ---
ToothGrowth$dose <- factor(ToothGrowth$dose)
fit <- aov(len ~ dose, data = ToothGrowth)
summary(fit)

# Post-hoc pairwise comparisons (control familywise error)
TukeyHSD(fit)

Reading the output

For t.test(): report t, df, p-value, the mean difference, and the 95% CI. For aov(): summary() gives the F-statistic and overall p; TukeyHSD() tells you which specific group pairs differ.

Diagnostic before you trust it. Check normality of residuals (plot(fit, which = 2)) and equal variances (plot(fit, which = 1), bartlett.test() in base R, or Levene’s test from the add-on car package). If either fails badly, switch to the non-parametric box below.

R Reflect on what you just ran

Use the questions below to interpret the actual numbers you produced for the t-test and ANOVA block. Look at your console output before answering.

1. From the two-sample Welch t.test(len ~ supp, data = ToothGrowth), report the mean difference between OJ and VC and the 95% CI. Does the CI exclude 0?

Model answer: The Welch t-test on ToothGrowth gives a mean difference of about 3.7 (OJ minus VC), 95% CI roughly (−0.17, 7.57), and p ≈ 0.06. The CI just barely includes 0, so at α = 0.05 the difference is not statistically significant. The reading: there is a suggestive trend favouring OJ but the data do not exclude no difference; this is the textbook case for ‘suggestive, not significant’ that calls for either a larger study or analytic refinement (e.g., adjusting for dose).

2. From the paired t-test on the sleep data (paired = TRUE), what mean difference in sleep gain did you observe and what was the p-value? Why does pairing the same 10 subjects give more power than treating the two columns as independent?

Model answer: Paired t-test on the sleep dataset gives mean difference ≈ 1.58 (group 2 − group 1) with p ≈ 0.0028, clearly significant. Pairing the same 10 subjects gives more power because each subject's between-condition difference removes between-subject variability (the noisy heterogeneity in baseline sleep), so the SE of the difference is much smaller than the SE for two independent groups. Algebraically, Var(d) = Var(X₁) + Var(X₂) − 2Cov(X₁, X₂); when Cov is positive (as it is for repeated measures on the same person), the variance shrinks.

3. From summary(fit) and TukeyHSD(fit) on tooth length by dose, what does the overall ANOVA F-test say, and which specific dose-pairs in the Tukey output differ significantly?

Model answer: The overall ANOVA F-test is highly significant (F ≈ 67, p < 0.001), so tooth length varies with dose. Tukey HSD pairwise comparisons show all three dose-pair contrasts significant: 1.0−0.5, 2.0−0.5, and 2.0−1.0 all have adjusted p < 0.001. The pattern is monotone dose-response: each step up in vitamin C increases tooth length. The Tukey adjustment controls family-wise error rate across the multiple pairwise comparisons, so the joint conclusion is statistically defensible.
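The variance identity quoted in the answer to question 2 can be verified directly on the sleep data. This is a small sketch using only base R; the object names are illustrative.

```r
# Sleep gain for the same 10 subjects under each drug
x1 <- sleep$extra[sleep$group == 1]
x2 <- sleep$extra[sleep$group == 2]
d  <- x2 - x1                         # within-subject differences

# Var(d) = Var(X1) + Var(X2) - 2 Cov(X1, X2)
var_direct   <- var(d)
var_identity <- var(x1) + var(x2) - 2 * cov(x1, x2)

# Standard error of the mean difference: paired vs. (wrongly) unpaired
se_paired   <- sd(d) / sqrt(length(d))
se_unpaired <- sqrt(var(x1) / length(x1) + var(x2) / length(x2))

c(var_direct = var_direct, var_identity = var_identity)
c(mean_diff = mean(d), se_paired = se_paired, se_unpaired = se_unpaired)
```

The positive covariance between the two measurements is what shrinks the paired standard error relative to the unpaired one.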

R Comparing categorical outcomes: χ², Fisher’s exact, McNemar’s

For 2×2 tables build a matrix or table. R applies continuity correction by default for chisq.test() and mcnemar.test() on 2×2 tables; turn it off with correct = FALSE if you prefer the uncorrected statistic.

# --- 2x2 table: exposure x disease ---
exposure <- matrix(c(30, 70, 10, 90), nrow = 2, byrow = TRUE,
                   dimnames = list(exposure = c("E+", "E-"),
                                   disease  = c("D+", "D-")))
exposure

chisq.test(exposure)                       # Pearson chi-square (with Yates' correction)
chisq.test(exposure, correct = FALSE)       # without correction
fisher.test(exposure)                      # exact test (preferred for small cells)

# --- McNemar's test: paired/matched binary data ---
# e.g. agreement between two diagnostic tests on the same 100 patients
paired <- matrix(c(40, 10,
                   25, 25), nrow = 2, byrow = TRUE,
                 dimnames = list(test1 = c("+", "-"),
                                 test2 = c("+", "-")))
mcnemar.test(paired)                       # with continuity correction
mcnemar.test(paired, correct = FALSE)       # without

Reading the output

chisq.test() returns χ², df, and p; check $expected for sparse cells. fisher.test() additionally returns the OR and its 95% CI, a free measure of association on top of the test. mcnemar.test() tests only the discordant pairs (10 vs 25 here); the matched OR is 10/25 = 0.4.

When to switch to Fisher. If any expected cell count drops below 5 (or any below 1), the χ² large-sample approximation is unreliable. chisq.test() will warn you; fisher.test() sidesteps the issue entirely.
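A quick way to apply this rule is to inspect the expected counts before reading the p-value. The sketch below does so for the tutorial's table and for a deliberately sparse hypothetical table (3/7/1/9), which is an assumption added for contrast.

```r
# Tutorial table: expected counts are comfortably large
exposure <- matrix(c(30, 70, 10, 90), nrow = 2, byrow = TRUE,
                   dimnames = list(exposure = c("E+", "E-"),
                                   disease  = c("D+", "D-")))
expected <- chisq.test(exposure)$expected
expected
any(expected < 5)            # FALSE: the chi-squared approximation is reasonable

# Hypothetical sparse table: same layout, one-tenth the sample size
sparse <- matrix(c(3, 7, 1, 9), nrow = 2, byrow = TRUE)
sparse_expected <- suppressWarnings(chisq.test(sparse))$expected
any(sparse_expected < 5)     # TRUE: prefer the exact test

sparse_fisher <- fisher.test(sparse)
sparse_fisher$p.value
```

suppressWarnings() is used only to keep the console quiet here; in practice the warning from chisq.test() is the cue to switch.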

R Reflect on what you just ran

Use the questions below to interpret your 2x2 table analyses. Look at the exposure matrix, the test output, and the McNemar output before answering.

1. From chisq.test(exposure) on the 2x2 table (30/70/10/90), report the chi-square statistic and p-value. By hand or by inspection, what is the odds ratio for this table, and what does the test conclude about independence?

Model answer: chisq.test on the 2×2 table (30/70/10/90) gives χ² ≈ 11.3 with R's default Yates continuity correction (p ≈ 0.0008); without it, chisq.test(exposure, correct = FALSE) gives χ² = 12.5 (p ≈ 0.0004). The cross-product OR = (30×90)/(70×10) = 2700/700 = 3.86. The test conclusively rejects independence: exposure and outcome are associated, with exposed individuals having about 3.9-fold higher odds of disease.

2. From fisher.test(exposure), what OR and 95% CI did it return? How does this OR compare with the simple cross-product OR you can compute from the cells of the table?

Model answer: Fisher's exact returns OR ≈ 3.83 with 95% CI roughly (1.78, 8.78). The point estimate is essentially the same as the cross-product OR (3.86); Fisher uses a conditional MLE that differs slightly in small samples but matches the simple formula closely when expected cell counts are not too small. Fisher's CI is the right one to report whenever any expected cell is < 5 (assumption-violation territory for chi-square).

3. McNemar's test only uses the off-diagonal (discordant) pairs - 10 and 25 in this example. What does the matched OR of 10/25 = 0.4 tell you about the two diagnostic tests, and why would a chi-square on the same table give a misleading answer?

Model answer: Matched OR = 10/25 = 0.40. In the table, test 1 was positive and test 2 negative in 10 patients, while test 2 was positive and test 1 negative in 25. A ratio below 1 therefore says test 2 is more often the lone positive. McNemar uses only discordant pairs because concordant pairs (both tests agree) carry no information about which test is positive more often. A regular chi-square on the same table would treat the four cells as if they came from independent samples and would test whether the two tests' results are associated, which is a different question from whether their positive rates differ; ignoring the pairing in this way is a serious error in any paired-design analysis.

R Non-parametric alternatives + correlation

The same wilcox.test() function covers both the Wilcoxon signed-rank (one-sample / paired) and Mann–Whitney U (two-sample) tests; the paired flag and the arguments you supply decide which.

# --- WILCOXON SIGNED-RANK (one-sample / paired) ---
wilcox.test(ToothGrowth$len, mu = 18)             # one-sample
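# paired: vector interface (the formula interface rejects paired = TRUE in R >= 4.4.0)
with(sleep, wilcox.test(extra[group == 1], extra[group == 2], paired = TRUE))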

# --- MANN-WHITNEY U / WILCOXON RANK-SUM (two-sample) ---
wilcox.test(len ~ supp, data = ToothGrowth)

# --- KRUSKAL-WALLIS (3+ groups, non-parametric ANOVA) ---
kruskal.test(len ~ dose, data = ToothGrowth)

# Post-hoc pairwise (Bonferroni-adjusted)
pairwise.wilcox.test(ToothGrowth$len, ToothGrowth$dose,
                     p.adjust.method = "bonferroni")

# --- CORRELATION: Pearson (linear) and Spearman (monotonic) ---
cor.test(iris$Sepal.Length, iris$Petal.Length, method = "pearson")
cor.test(iris$Sepal.Length, iris$Petal.Length, method = "spearman")

Reading the output

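Rank-based tests return a test statistic and a p-value; R labels the statistic V for the signed-rank test, W for the rank-sum (Mann–Whitney) test, and Kruskal-Wallis chi-squared for H. They do not give a CI for a mean difference; if you need an interval, set conf.int = TRUE on wilcox.test() to get the Hodges–Lehmann estimate of the location shift and its CI. cor.test() returns the correlation, its 95% CI (Pearson only), and the p-value.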
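As a brief sketch of the conf.int = TRUE option mentioned above, the block below requests the Hodges–Lehmann location-shift estimate for the ToothGrowth supplement comparison. Ties in the data make R fall back to a normal approximation and emit warnings, which are suppressed here only to keep the output readable.

```r
# Rank-sum test with a Hodges-Lehmann estimate of the shift (OJ minus VC)
hl <- suppressWarnings(
  wilcox.test(len ~ supp, data = ToothGrowth, conf.int = TRUE)
)

hl$estimate    # estimated difference in location
hl$conf.int    # 95% CI for that shift
hl$p.value
```

The estimate is a median of pairwise differences between groups, so it answers a slightly different question from the mean difference reported by t.test().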

Parametric or non-parametric? With n > ~30 per group the t-test and ANOVA are robust to mild non-normality (CLT), so reach for non-parametric tests mainly for skewed small samples, ordinal outcomes, or when outliers dominate. Spearman is also a sensible default whenever a scatterplot looks monotonic but not linear.

R Reflect on what you just ran

Use the questions below to interpret your non-parametric and correlation output. Look at your console results before answering.

1. Compare the p-values from t.test(len ~ supp, data = ToothGrowth) (earlier box) and wilcox.test(len ~ supp, data = ToothGrowth). Are they similar or different? What does that tell you about whether the t-test was robust here?

Model answer: The t-test and Wilcoxon give nearly identical p-values (both around 0.06) on ToothGrowth supp. That agreement is reassuring: it means the t-test was robust here despite using means, because either the distributions are reasonably symmetric or the sample is large enough for the central limit theorem to apply. When parametric and non-parametric tests disagree (Wilcoxon p much smaller than t-test p), it typically signals heavy skew or outliers; the rank-based test is then preferred.

2. From cor.test(iris$Sepal.Length, iris$Petal.Length, method = "pearson"), report the correlation coefficient r and its 95% CI. How does the Spearman correlation compare, and what would a large discrepancy suggest about the relationship's shape?

Model answer: Pearson r ≈ 0.872 with 95% CI (0.827, 0.906) for sepal vs. petal length in iris. Spearman ρ is similar (about 0.88). When Pearson and Spearman are nearly equal, the relationship is approximately linear and well-described by the Pearson correlation; a large discrepancy (e.g., Pearson 0.5 vs. Spearman 0.85) would signal a monotone non-linear relationship (Pearson misses the curve) or that outliers are driving the linear estimate.

3. From kruskal.test(len ~ dose, data = ToothGrowth), what was the H statistic and p-value? Which pairs in the pairwise.wilcox.test() with Bonferroni adjustment remained significant? Why does Bonferroni make this conclusion more conservative?

Model answer: Kruskal-Wallis H ≈ 40.7 with p < 0.001 on ToothGrowth dose, soundly rejecting equal distributions across dose levels. Pairwise Wilcoxon with Bonferroni correction shows all three pairwise comparisons significant after adjustment (p < 0.05). Bonferroni divides α by the number of comparisons (here 3), so a raw p must fall below 0.05/3 = 0.0167 to count as significant; this controls family-wise error rate at the cost of statistical power. Conservative is the right word: Bonferroni almost certainly under-rejects when many tests are correlated, but it guarantees the joint Type-I error stays at or below 5%.
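The two ways of stating the Bonferroni rule (shrink α, or inflate each p-value) give the same decisions. The sketch below shows this with p.adjust() on three hypothetical raw p-values; the numbers are invented for illustration and are not the ToothGrowth results.

```r
# Hypothetical raw p-values from 3 pairwise comparisons
raw_p <- c(0.004, 0.020, 0.300)
m     <- length(raw_p)

# R reports Bonferroni-adjusted p-values: raw p times m, capped at 1
adj_p <- p.adjust(raw_p, method = "bonferroni")
adj_p

# Equivalent decision rules at a family-wise alpha of 0.05
by_threshold <- raw_p < 0.05 / m
by_adjusted  <- adj_p < 0.05
rbind(by_threshold, by_adjusted)
```

The second comparison (raw p = 0.020) is significant on its own but not after adjustment, which is the loss of power the model answer describes.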

Reflection

A study reports an OR of 1.45 with a 95% CI of (0.92, 2.28). A second study reports an OR of 1.15 with a 95% CI of (1.02, 1.30). Compare these two findings in terms of: (a) strength of association, (b) statistical significance, and (c) precision. Which finding might be more concerning from a public health perspective, and why?

Model answer: Strength: Study 1 reports a larger point estimate (OR 1.45 vs. 1.15), suggesting a stronger per-person association if both are correct. Statistical significance: Study 1's CI (0.92, 2.28) includes 1, so it is non-significant; Study 2's CI (1.02, 1.30) excludes 1, so it is significant. Precision: Study 2 is much more precise, with CI width 0.28 vs. Study 1's 1.36, reflecting larger sample size and/or better measurement. Public-health concern: Study 2 is more concerning for action because the effect, though smaller, is reliably different from the null (OR = 1), and a small effect can produce many cases if the exposure is common in a large population. Study 1's effect, if real, is larger per person but the data don't yet exclude no effect. The decision depends on the absolute baseline risk and population size, but precision and population scope usually win over magnitude alone.
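When a paper reports only an OR and its CI, the standard error and an approximate P-value can be backed out. The sketch below assumes each interval was built as a Wald-type 95% CI on the log scale (as in Eq 6.21); that is an assumption about the two studies, and ci_to_p is a hypothetical helper name.

```r
# Recover SE(ln OR), z, and a two-sided p-value from a reported 95% CI,
# assuming the CI was computed as exp(ln OR +/- 1.96 * SE)
ci_to_p <- function(or, lower, upper) {
  se <- (log(upper) - log(lower)) / (2 * qnorm(0.975))
  z  <- log(or) / se
  p  <- 2 * pnorm(-abs(z))
  c(se_log_or = se, z = z, p = p)
}

study1 <- ci_to_p(1.45, 0.92, 2.28)
study2 <- ci_to_p(1.15, 1.02, 1.30)
round(rbind(study1, study2), 3)
```

The larger OR comes with the larger standard error, which is why Study 1 fails to reach significance while the smaller, more precise estimate in Study 2 does.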

Key Takeaways

Standard errors quantify precision; they are computed differently for difference vs. ratio measures.
Hypothesis testing uses a null hypothesis (no effect) and a test statistic to generate a P-value.
Four common test statistics for measures of association: Pearson χ², exact tests, Wald, and likelihood ratio tests.
Beyond 2×2 tables, choose tests by outcome type, group structure, and distribution: t-tests / ANOVA for normal continuous data, χ² / Fisher / McNemar for categorical, and Wilcoxon / Mann–Whitney / Kruskal–Wallis / Spearman as non-parametric alternatives.
Confidence intervals are more informative than P-values: they show the range of plausible effect sizes.
A CI that contains the null value (1 for ratio measures, 0 for difference measures) indicates non-significance at the corresponding α level.

HSCI 341, Lesson 7
