HSCI 341 — Lesson 5

Measures of Disease Frequency

Fundamental Epidemiological Concepts and Approaches

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Differentiate among counts, proportions, odds, and rates
  • Describe the difference between incidence and prevalence
  • Distinguish between incidence risk and incidence rate
  • Explain cause-specific measures, proportional morbidity/mortality, and case fatality rates
  • Calculate and interpret burden-of-disease metrics (YLLs, YLDs, and DALYs)
  • Select appropriate measures of disease frequency for specific circumstances
  • Compute measures and calculate confidence intervals when provided with data

This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Reference

Glossary — Key Terms, People & Concepts

📚 Reference page — available throughout the lesson

This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.

Foundational Concepts

Count: A simple frequency (the number of cases or events). Counts alone are uninterpretable without a denominator.
Proportion: A fraction whose numerator is included in the denominator (e.g., the proportion of a population with disease). Bounded between 0 and 1.
Ratio: A division of two quantities where the numerator is not necessarily contained in the denominator (e.g., sex ratio).
Odds: The ratio of the probability of an event to the probability of its complement: p / (1 − p). Used in case-control studies and logistic regression.
Rate: A measure of how quickly events occur, with time in the denominator (e.g., cases per 1,000 person-years). Has units of inverse time.
Person-Time: The sum of the time each person was at risk for the outcome (e.g., person-years). The denominator for incidence rates.

Measures of Disease Frequency

Prevalence: The proportion of a population with the disease at a specified time. Reflects both incidence and duration; a snapshot rather than a flow measure.
Point Prevalence: Prevalence measured at a single point in time (a snapshot). The standard prevalence measure in cross-sectional surveys.
Period Prevalence: The proportion of a population with the disease at any time during a specified period (e.g., 12-month prevalence).
Incidence: The occurrence of new cases of disease in a population over a defined time. The flow measure that, together with duration, drives prevalence.
Incidence Proportion (Cumulative Incidence, Risk): The proportion of an initially disease-free population that develops the outcome over a defined follow-up period. Bounded between 0 and 1; requires the time period to be specified.
Incidence Rate (Incidence Density): New cases divided by total person-time at risk. Has units of inverse time and is appropriate when follow-up time varies across individuals.
Attack Rate: Cumulative incidence in an outbreak setting over a short, defined period; commonly used for foodborne or infectious-disease outbreaks.
Secondary Attack Rate: The proportion of contacts of a primary case who develop disease; a measure of transmissibility within close-contact settings (e.g., households).
Case-Fatality Rate (CFR): The proportion of diagnosed cases that die from the disease within a specified time. Despite the name, it is a proportion, not a rate.
Mortality Rate: Deaths per unit person-time in a specified population (e.g., per 100,000 person-years). May be all-cause or cause-specific.
Proportional Mortality: The proportion of deaths in a population that are attributable to a specific cause. Useful for ranking causes but does not measure risk.

Burden & Adjustment

Years of Life Lost (YLL): The years of life lost due to premature mortality, calculated against a standard life expectancy.
Years Lived with Disability (YLD): Prevalence-weighted years lived in less-than-full health, where disability weights reflect severity.
Disability-Adjusted Life Year (DALY): A summary burden measure equal to YLL + YLD; one DALY represents one year of healthy life lost.
Standardisation (Direct & Indirect): Methods to remove the effect of confounding by age (or other variables) when comparing rates across populations. Direct standardisation applies a standard population’s structure to observed rates; indirect applies a standard set of rates to the observed structure (yielding the SMR).
Confidence Interval: A range of values constructed so that, on repeated sampling, a specified proportion (often 95%) of such intervals would contain the true population parameter.
Section 1

Introduction & Foundational Concepts

⏱ Estimated reading time: 12 minutes

Introduction and Overview

Lessons 1–4 covered the conceptual scaffolding (causal inference), surveillance, the population side of design (sampling), and the measurement side (questionnaires). Lesson 5 turns the data those instruments produce into the standard quantitative outputs of epidemiology — the measure definitions of Rothman (2012) and Porta (2014) are the canonical references used throughout. The four content sections proceed from the basic vocabulary — counts, proportions, odds, and rates (Section 1) — through the two faces of incidence: risk and rate (Section 2), to prevalence, mortality, and burden-of-disease measures like DALYs (Section 3), and finally to the standardisation techniques and confidence intervals that let you compare populations fairly (Section 4). Every measure of association you'll meet in Lesson 7 starts from one of these frequency measures.

Learning Objectives

  • Explain why measuring disease frequency is fundamental to epidemiology.
  • Distinguish between counts, proportions, odds, and rates.
  • Identify key factors (study period, risk period) that influence the choice of frequency measure.

Why Measure Disease Frequency?

Measurement of disease (or event) frequency is the foundation of many epidemiological activities, including routine surveillance, observational research, and outbreak investigations (Porta, 2014). In observational studies, measuring the frequency of a disease and linking it to an exposure are the first steps toward inferring causation (Rothman, 2012).

Morbidity and mortality are the two main categories of events for which frequency measures are calculated. However, the same approaches apply to other events of interest such as vaccination, hospital admission, or giving birth.

Stratification Matters

Because both morbidity and mortality are strongly associated with individual attributes, and because different diseases have different impacts, we often calculate frequency measures for specific host attributes (e.g., age, sex, race) and for specific diseases. This stratification allows us to detect patterns that would be hidden in overall population-level data.

Study Period and Risk Period

When selecting a measure of disease frequency, two time-related concepts are critical:

Study Period

The study period is the time interval during which study subjects are observed for the outcome of interest. It is usually measured in calendar time, but sometimes represents a point in time. The study period could also be defined by a specific event — for example, “at birth” or “congenital defects during 2008–2010.”

Risk Period

The risk period is the time during which an individual could develop the disease. For some conditions, the risk period is very short (e.g., post-partum eclampsia — usually less than 2 days), while for others it is essentially lifelong (e.g., migraine headaches).

Diseases with a short risk period relative to the study period are good candidates for risk measures. Diseases with long risk periods are better suited to rate-based measures.

Counts, Proportions, Odds, and Rates

Before examining specific measures of disease frequency, it is essential to understand the four mathematical forms these measures can take:

  • Count
  • Proportion
  • Odds
  • Rate

Comparison Table

Measure    | Numerator             | Denominator         | Range  | Units
Count      | Cases                 | None                | 0 to ∞ | None
Proportion | Subset of denominator | Total population    | 0 to 1 | Dimensionless
Odds       | Cases                 | Non-cases           | 0 to ∞ | Dimensionless
Rate       | Cases                 | Person-time at risk | 0 to ∞ | Per person-time
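
The four forms can be compared side by side on a single set of numbers. The sketch below is illustrative only: the cohort size, case count, and person-years are made up for this example and do not come from the lesson.

```r
# Hypothetical data (assumed for illustration only)
n_total      <- 1000    # people in the population
n_cases      <- 50      # people who developed the disease
person_years <- 4800    # total time at risk contributed by everyone

count      <- n_cases                         # no denominator
proportion <- n_cases / n_total               # cases are part of the denominator
odds       <- n_cases / (n_total - n_cases)   # cases versus non-cases
rate       <- n_cases / person_years          # cases per person-year

round(c(count = count, proportion = proportion, odds = odds, rate = rate), 4)

# Odds and proportion are two views of the same quantity: odds = p / (1 - p)
all.equal(odds, proportion / (1 - proportion))
```

When the disease is rare, the proportion and the odds are nearly equal; they diverge as the proportion grows.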

Common Terminology Confusion

The term “rate” is often used loosely to refer to all types of measures of disease frequency. Similarly, people commonly say someone has a high “chance” or “risk” of disease when the underlying measure might actually be a rate. Being precise about these terms helps avoid misinterpretation of research findings.

Key Takeaways

  • Disease frequency measurement underpins surveillance, research, and outbreak investigation.
  • The study period and risk period influence which type of measure is most appropriate.
  • Counts, proportions, odds, and rates are the four mathematical forms; each has distinct properties.
  • A rate strictly requires person-time in the denominator, though the term is often used loosely.
Knowledge Check — Section 1

1. What distinguishes a proportion from odds?

In a proportion, the numerator (e.g., cases) is included within the denominator (e.g., total population). In odds, the denominator excludes the numerator (e.g., cases divided by non-cases).

2. A disease with a long risk period relative to the study period is best measured using:

When the risk period is longer than the study period, rate-based measures (using person-time denominators) are more appropriate because they account for varying amounts of time at risk.

3. Why are simple counts of limited use in epidemiologic research?

Without knowing the population size, a count of cases tells us nothing about the relative burden of disease. Fifty cases in a population of 100 is very different from 50 cases in a population of 1,000,000.


Section 2

Incidence: Risk & Rate

⏱ Estimated reading time: 15 minutes

Introduction and Overview

Section 1 set up the basic vocabulary — what counts, proportions, odds, and rates are and how they differ. Section 2 takes the most consequential of these — incidence — and unpacks its two faces: risk (a probability) and rate (events per unit of person-time). The distinction is central, because risk-based and rate-based designs (which you met in HSCI 230 Lesson 5) lead to different measures of association in Lesson 7 of this course.

Learning Objectives

  • Define incidence and explain the four ways to express it.
  • Distinguish between incidence risk and incidence rate.
  • Compare closed and open populations and how they affect measurement.
  • Calculate incidence risk and incidence rate from provided data.

What Is Incidence?

Incidence relates to the number of new events (e.g., new cases of a disease) in a defined population within a specific period. Because it deals with new cases, incidence is used to identify factors associated with becoming ill. A clear case definition is essential, along with a surveillance programme capable of identifying all cases.

First Cases vs. All Cases

Although incidence deals with “new cases,” this does not necessarily mean only the first-ever case within an individual. For some conditions (e.g., migraine headaches, recurrent infections), multiple episodes can occur. Researchers must decide whether to count only first cases or all new episodes, and clearly state this in their methodology.

Four Ways to Express Incidence

Incident Times

Incident times are the specific times at which cases occur, measured as elapsed time since a reference event (e.g., days after exposure to a toxin, or days after parturition). Incident times form the basis of survival analysis and are discussed at length in more advanced topics.

Incidence Count

The incidence count is the simple count of new cases observed. It is often used when a disease did not previously exist or was very rare in a population. Incidence counts are sometimes expressed as absolute rates, relating the number of cases to the time period of observation (e.g., “3 cases per year”). They have limited value unless combined with population-at-risk data.

Incidence Risk (R)

Incidence risk (R) is the probability that an individual will contract or develop a disease in a defined time period. It is a proportion (dimensionless, ranges from 0 to 1) and the time period must be specified. Only the first occurrence in the period of interest counts. Sometimes called cumulative incidence.

R is used in studies focused on individual-level predictions (e.g., the probability of breast cancer recurrence within the next year is 14%).

Incidence Rate (I)

Incidence rate (I) is the number of new cases per unit of person-time during a given time period. It has units of 1/time and is positive without an upper bound. Also called incidence density.

I is used to determine what factors are related to diseases and the effects of those diseases. It is the preferred measure for open populations where individuals enter and leave over time.

Calculating Risk

R = number of newly affected individuals in a defined time period / population at risk Eq 5.1

Population at Risk: Closed vs. Open

Estimating the population at risk can be challenging. The key distinction is whether the population is closed or open.

Closed Population

A closed population has no additions and few to no losses during the study period. Examples include residents of a nursing home followed for a year, or women followed for one week post-partum.

Only disease-free individuals at the start are considered at risk. People lost to follow-up are called withdrawals, and the simplest correction is to subtract half the number of withdrawals from the population at risk (assuming they leave, on average, halfway through the study).
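
The half-withdrawal correction can be sketched in a few lines of R. The cohort below is hypothetical (the counts are assumptions chosen for easy arithmetic), and the variable names are illustrative.

```r
# Hypothetical closed cohort (numbers assumed for illustration)
n_start     <- 200   # disease-free individuals at the start
new_cases   <- 18    # first occurrences during follow-up
withdrawals <- 10    # lost to follow-up

# Eq 5.1 with no correction
risk_crude <- new_cases / n_start

# Half-withdrawal correction: assume withdrawals leave, on average, at the midpoint
risk_adjusted <- new_cases / (n_start - withdrawals / 2)

round(c(crude = risk_crude, adjusted = risk_adjusted), 4)
```

The correction shrinks the denominator, so the adjusted risk is always at least as large as the crude one.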

Open Population

An open population has individuals entering and leaving throughout the study period. For example, women served by a cancer centre who had mastectomies form an open population as new patients continually arrive.

An open population is considered stable (stationary) if the rate of additions and withdrawals, and the distribution of host attributes, remain relatively constant over time.

Risk cannot be computed directly from an open population. Instead, it can be estimated from the incidence rate (I) using the relationship between risk and rate.

Calculating Incidence Rates

I = number of new cases in a defined time period / number of person-time units at risk during the time period Eq 5.2

A person-time unit is one person observed for a defined period (e.g., one person-month, one person-year). After an individual contracts the disease of interest, they are no longer at risk and no longer contribute person-time to the denominator (for first-case analyses).

Worked Example: Exact Incidence Rate

Assume 4 previously healthy people were observed for exactly 1 month (30 days):

  • Person 1: not sick at all → 1.00 person-months at risk
  • Person 2: sick on day 10 → 0.33 person-months at risk
  • Person 3: sick on day 20 → 0.67 person-months at risk
  • Person 4: moved away on day 15 (lost to follow-up) → 0.50 person-months at risk

Total person-months at risk = 2.50

Total new cases = 2

I = 2 / 2.50 = 0.80 cases per person-month

Approximate Calculation

When exact person-time data is unavailable, an approximate formula can be used:

I = cases / [(start − ½sick − ½withdrawn + ½added) × time] Eq 5.3

This assumes that, on average, events (illness, withdrawal, addition) occur at the midpoint of the study period.
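
A minimal sketch of the approximate calculation as an R function follows. The function name `approx_rate`, its argument names, and the clinic numbers are hypothetical; it also assumes a first-case analysis, so the number "sick" equals the number of cases.

```r
# Approximate incidence rate (Eq 5.3); names are hypothetical
approx_rate <- function(cases, start, withdrawn = 0, added = 0, time = 1) {
  # Average number at risk, assuming every event happens at the midpoint
  avg_at_risk <- start - cases / 2 - withdrawn / 2 + added / 2
  cases / (avg_at_risk * time)
}

# Hypothetical cohort: 500 people followed for 2 years
approx_rate(cases = 40, start = 500, withdrawn = 30, added = 20, time = 2)
```

Here the average population at risk is 500 − 20 − 15 + 10 = 475, so the denominator is 950 person-years.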

Relationship Between Risk and Rate

Risk and rate are mathematically related. For a closed population:

R ≈ I × Δt   (when I × Δt is small)

More precisely, the relationship is: R = 1 − e^(−I × Δt)

This exponential formula accounts for the fact that as people become diseased, they leave the at-risk pool. When the product I × Δt is small (less than 0.1), the simpler linear approximation works well. The exponential relationship follows from assuming a constant hazard (incidence rate) over the interval, the assumption behind the exponential survival model; the Kaplan–Meier estimator (Kaplan & Meier, 1958) estimates the same survival function without that assumption.
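
To see how quickly the linear approximation drifts from the exponential form, the two can be tabulated over a range of values. This is a sketch; the grid of values is an arbitrary choice for illustration.

```r
i_dt     <- c(0.01, 0.05, 0.10, 0.20, 0.50, 1.00)   # values of I * delta_t
r_linear <- i_dt                                     # linear approximation
r_exact  <- 1 - exp(-i_dt)                           # exponential formula

# Relative overstatement of the linear approximation, in percent
pct_error <- 100 * (r_linear - r_exact) / r_exact

data.frame(i_dt,
           r_linear,
           r_exact   = round(r_exact, 4),
           pct_error = round(pct_error, 1))
```

The linear form always overstates the risk: the error is roughly 5% at a product of 0.1 and passes 10% at about 0.2.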

R Activity — Person-time, incidence rate, rate-to-risk, and prevalence

The companion R script r-activities/HSCI_341_Lesson_5_Measures_of_Disease_Frequency.R walks through three calculations on a 4-person mini-cohort: total person-time, the incidence rate I, and the rate-to-risk conversion (linear vs. exact). Then it uses the steady-state identity P = ID/(ID+1) to compute prevalence from incidence and duration, and sweeps the duration grid to plot the curve.

# PART A -- person-time and incidence rate
person  <- c("P1", "P2", "P3", "P4")
days    <- c(30,    10,    20,    15)
event   <- c(0,     1,     1,     0)

person_months <- days / 30
total_pm    <- sum(person_months)
total_cases <- sum(event)

I <- total_cases / total_pm        # incidence rate per person-month
I

# PART B -- rate-to-risk over a 1-month window
delta_t  <- 1
R_linear <- I * delta_t                  # approximation
R_exact  <- 1 - exp(-I * delta_t)        # proper formula
round(c(I = I, R_linear = R_linear, R_exact = R_exact), 3)

# PART C -- prevalence from I and D, then plot
prev_from_id <- function(I, D) (I * D) / (I * D + 1)

prev_from_id(I = 0.3, D = 0.058)          # influenza: ~0.017

D_grid <- seq(1, 365, by = 1) / 365
P_grid <- prev_from_id(I = 0.3, D = D_grid)
plot(D_grid * 365, P_grid, type = "l", lwd = 2,
     xlab = "Mean disease duration (days)",
     ylab = "Steady-state prevalence")

What you should be able to do after this activity: aggregate days at risk into person-months, compute I, convert I×Δt to a risk using both the linear and the exact formula, and use P = ID/(ID+1) to predict prevalence from incidence and duration.

R Reflect on what you just ran

Use the questions below to interpret the actual numbers and plot. Look at your console output and the plotting window before answering.

1. What total person-months did the 4-person cohort contribute, and what was the resulting incidence rate I (per person-month)?

Model answer: Summing the four person-time contributions (days / 30 for each person) gives 1.00 + 0.33 + 0.67 + 0.50 = 2.50 person-months. With two new cases observed in that window (P2 and P3), the incidence rate is I = 2/2.50 = 0.80 per person-month, or 9.6 per person-year. Reading: 80 events expected per 100 person-months of observation in this small cohort.

2. Compare R_linear and R_exact from the rounded vector. Are they close at Δt = 1 month? At what value of IΔt would the linear approximation start to mislead by more than ~10%?

Model answer: At Δt = 1 month they are not close: R_linear = IΔt = 0.800, while R_exact = 1 − exp(−0.8) = 0.551, so the linear approximation overstates the risk by about 45%. The linear approximation exceeds ~10% error once IΔt > 0.20 (a rate of 20 per 100 person-units over the window); beyond that point the constant-hazard exponential bends measurably below the straight line. Practical rule: linear is fine for small rates over short intervals, but for cumulative risks over years use the exponential form.

3. Look at the plot of P_grid vs. duration with I = 0.3/year. Does prevalence rise linearly or saturate, and at roughly what duration does the curve cross P = 0.50? What does prev_from_id(0.05, 20) give you, and why is it larger than the influenza number despite the smaller I?

Model answer: Prevalence rises but saturates as duration grows; it cannot exceed 100% no matter how chronic the condition. With I = 0.3/year, P = 0.50 requires ID = 1 (since P = ID / (1 + ID)), i.e., D ≈ 3.3 years, which lies beyond the plotted grid of 1 to 365 days; at D = 1 year the curve has only reached P = 0.3/1.3 ≈ 0.23. prev_from_id(0.05, 20) gives P = 0.05 × 20 / (1 + 0.05 × 20) = 1.0 / 2.0 = 0.50: a chronic condition with low incidence (0.05 per person-year) but long average duration (20 years) reaches 50% prevalence, far above the influenza figure despite the lower I. This is why HIV, type-2 diabetes, and depression have high prevalences while measles does not.

Reflection

A nursing home with 80 residents experiences 12 cases of influenza over a 3-month winter period. Five residents were transferred out during the study (assume at the midpoint), and 3 new residents were admitted (also at the midpoint). Estimate the incidence rate using the approximate formula. Would incidence risk or incidence rate be more appropriate here, and why?

Model answer: Applying Eq 5.3 with cases, transfers, and admissions all assumed to occur at the midpoint: average population at risk = 80 − ½(12) − ½(5) + ½(3) = 80 − 6 − 2.5 + 1.5 = 73 residents. Approximate person-time = 73 × 3 months = 219 person-months. Incidence rate ≈ 12/219 = 0.055 cases per person-month, or about 55 per 1,000 person-months. Incidence rate is more appropriate than incidence risk here because the population is open (residents move in and out) and time at risk differs across individuals; a rate handles the unequal denominators that risk cannot.


Key Takeaways

  • Incidence measures new cases; it can be expressed as incident times, counts, risk (R), or rate (I).
  • Risk is a proportion (0 to 1) measured in closed populations; rate uses person-time denominators and works for open populations.
  • Withdrawals are handled by subtracting half their number from the population at risk.
  • Risk and rate are related: R = 1 − e^(−IΔt); for small values, R ≈ I × Δt.
Knowledge Check — Section 2

1. Incidence risk (R) is best described as:

Incidence risk is a proportion (probability) — dimensionless, ranging from 0 to 1 — that requires a defined time period. It is sometimes called cumulative incidence.

2. In a closed population study, how are withdrawals typically handled when calculating risk?

Assuming withdrawals leave on average halfway through the study period, we subtract half their number from the denominator (population at risk).

3. What units does incidence rate (I) have?

Incidence rate has units of 1/person-time, which simplifies to 1/time. It is positive without an upper bound.


Section 3

Prevalence, Mortality & Other Frequency Measures

⏱ Estimated reading time: 18 minutes

Introduction and Overview

Section 2 covered incidence — the rate at which new cases arise. Section 3 turns to the complementary set of measures that describe disease as it currently exists in a population: prevalence (current cases), mortality (deaths), and burden-of-disease summaries like DALYs that combine premature mortality with disability. Each tells you something different about a population's health.

Learning Objectives

  • Define prevalence and explain how it differs from incidence.
  • Describe the relationship between prevalence, incidence, and disease duration.
  • Explain mortality statistics and cause-specific measures.
  • Calculate Years of Life Lost (YLLs), Years Lived with Disability (YLDs), and Disability-Adjusted Life Years (DALYs), and explain why they add value beyond mortality and morbidity measures alone.
  • Distinguish attack rates, secondary attack rates, case fatality rates, and proportional morbidity/mortality rates.

Prevalence

Unlike incidence, which counts new cases, prevalence measures the number of existing cases of disease at a specific point in time (or over a period). The prevalence count is the number of individuals in a population who have an attribute or disease at a particular time.

P = cases of disease at a point in time / individuals in the population at the same point in time Eq 5.8

For example, if 75 athletes are tested for performance-enhancing drug use and 3 test positive, then P = 3/75 = 0.04 (4%). Operational decisions about case definitions and time windows matter more than this simple formula suggests (Spronk et al., 2019).
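
Because 3 positives out of 75 is a small sample, the point estimate is usually reported with a confidence interval. The sketch below uses base R's `binom.test()`, which returns an exact (Clopper-Pearson) interval; the choice of that interval method is an assumption made here, not something the lesson prescribes.

```r
positives <- 3
tested    <- 75

p_hat <- positives / tested                    # point prevalence

# Exact (Clopper-Pearson) 95% interval from base R's binom.test()
ci <- binom.test(positives, tested, conf.level = 0.95)$conf.int

round(c(prevalence = p_hat, lower = ci[1], upper = ci[2]), 3)
```

The interval is asymmetric around 0.04 because a proportion this close to zero cannot go below zero.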

Relationship Between Prevalence and Incidence

In a stable population where the incidence rate (I) remains constant, prevalence (P), incidence, and mean disease duration (D) are related:

P = (I × D) / (I × D + 1) Eq 5.9

Worked Example

If the incidence rate of influenza in an urban population is 0.3/person-year (30 cases per 100 people per year) and the mean duration of infection is 3 weeks (0.058 years), then:

P = (0.3 × 0.058) / (0.3 × 0.058 + 1) = 0.0174 / 1.0174 = 0.017 (1.7%)

So on any given day, we would expect about 1.7% of the population to have the flu.

🚰 Interactive: The Sink Analogy — Incidence, Duration & Prevalence

Disease in a population is like water in a sink. The tap is incidence (new cases flowing in). The two drains are recovery and mortality (ways people exit). The water level is prevalence. Try the presets, then adjust the sliders and watch the level settle to its new equilibrium.

[Interactive diagram: a sink in which the tap is incidence, the two drains are recovery and mortality, and the water level is prevalence.]
Incidence: how many new cases occur per 1,000 susceptible people each year.
Recovery (r): higher = faster recovery (left drain opens wider).
Mortality (m): higher = faster death from disease (right drain opens wider).
Steady state: P = I × D / (I × D + 1), where D = 1 / (r + m)

What the Sink Teaches

Two diseases can have very different prevalences for entirely different reasons. Diabetes has a low incidence but very long duration — the sink fills slowly but barely drains, so prevalence is high. Pancreatic cancer has a low incidence and very short duration (high mortality) — the drain is wide open, so prevalence stays near zero even though the disease is deadly. This is why prevalence alone is a poor measure of disease risk — it confounds occurrence and survival.
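
The sink's steady state can be reproduced numerically. In this sketch the function name `sink_prevalence` and all rates are hypothetical; the two scenarios share the same incidence and differ only in how fast the drains run.

```r
# Steady-state prevalence from the sink model; all rates below are hypothetical
sink_prevalence <- function(I, r, m) {
  D <- 1 / (r + m)          # mean duration: faster drains mean shorter duration
  (I * D) / (I * D + 1)     # Eq 5.9
}

# Same low incidence (per person-year), very different drains
chronic <- sink_prevalence(I = 0.005, r = 0, m = 0.04)   # D = 25 years
lethal  <- sink_prevalence(I = 0.005, r = 0, m = 2.00)   # D = 0.5 years

round(c(chronic = chronic, lethal = lethal), 4)
```

With identical inflow, the slowly draining condition settles at a prevalence dozens of times higher than the rapidly fatal one.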

Prevalence vs. Incidence: When to Use Which?

Prevalence (P) is less useful than incidence rate (I) for research into risk factors, because a factor can change prevalence by affecting either the occurrence of disease or its duration, and prevalence alone cannot tell the two apart. A disease may have high prevalence simply because people survive with it for a long time, not because new cases are frequent.

However, serial prevalence studies can be used to estimate incidence rates for diseases that are not easily detected at onset — for example, by testing blood samples at regular intervals to determine when individuals become infected.

Mortality Statistics

Mortality statistics are calculated in exactly the same way as P, R, and I, except that the outcome of interest is death. The term mortality rate, strictly speaking, refers to the incidence rate of mortality, but it is often used to describe the risk of mortality (Porta, 2014; Rothman, 2012).

Overall Mortality Rate

The overall mortality rate describes the number of individuals that die from all causes in a defined time period. It is analogous to incidence rate (I) except that the outcome is death rather than disease onset.

Cause-Specific Mortality Rate

The cause-specific mortality rate describes the number of individuals that die from (or with) a specific disease during a defined period. It is calculated the same way as I but focuses on deaths attributed to one disease.
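
As an illustration, the overall and cause-specific rates can be computed from the same person-time denominator and set beside proportional mortality. All counts below are made up for this sketch.

```r
# Hypothetical population data (numbers assumed for illustration)
person_years <- 250000
deaths_all   <- 2000    # deaths from any cause
deaths_cause <- 300     # deaths attributed to the disease of interest

overall_rate <- deaths_all   / person_years * 1e5   # per 100,000 person-years
cause_rate   <- deaths_cause / person_years * 1e5   # per 100,000 person-years

# Proportional mortality: share of all deaths due to this cause (a proportion, not a rate)
prop_mortality <- deaths_cause / deaths_all

c(overall_rate = overall_rate, cause_rate = cause_rate, prop_mortality = prop_mortality)
```

The two rates share a denominator of person-time, whereas proportional mortality uses deaths as its denominator and so says nothing about risk.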

Challenges of Cause Attribution

It is often difficult to determine the specific cause of death. For example, if a recumbent patient regurgitates, contracts aspiration pneumonia, and then dies, did they die: from the initial condition causing recumbency? Due to pneumonia? With pneumonia? The “cause” is usually deemed to be the proximate cause — the factor considered the final trigger — but this determination can be challenging.

Burden of Disease: YLLs, YLDs & DALYs

Mortality, incidence, and prevalence each tell us something different, but taken individually they give an incomplete picture of a disease's impact. A condition can be deadly but rare (pancreatic cancer), or non-fatal yet extremely common and disabling (low-back pain, depression, untreated vision loss). To compare these on a single scale, the World Health Organization (WHO) and the Global Burden of Disease (GBD) Study — originated by Murray & Lopez (1997) and now updated annually by IHME (GBD 2019 Diseases and Injuries Collaborators, 2020) — express disease impact in healthy years of life lost, a common “currency” that combines premature death and time spent in less-than-perfect health (Robine, Jagger, Mathers, Crimmins, & Suzman, 2003).

Years of Life Lost (YLL)

YLL quantifies the burden from premature mortality. Each death is weighted by the number of additional years that person would have been expected to live had they survived. A death at age 30 contributes far more YLLs than a death at age 90, even though both count equally toward a crude mortality count.

YLL = Σ N_x × L_x Eq 5.4

Where N_x is the number of deaths at age x, and L_x is the standard life expectancy remaining at age x, taken from a reference life table. The GBD standard sets life expectancy at birth at ~86 years (GBD 2019 Diseases and Injuries Collaborators, 2020).
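
The sum in Eq 5.4 is a one-line calculation once deaths and remaining life expectancy are lined up by age. The ages, death counts, and remaining-years values below are invented for illustration; a real analysis would read the remaining years from a reference life table.

```r
# Hypothetical deaths by age and assumed remaining standard life expectancy
age_at_death <- c(30, 60, 90)
N_x          <- c(2, 10, 25)     # deaths at each age
L_x          <- c(56, 27, 5)     # assumed years of life remaining at each age

yll_by_age <- N_x * L_x          # contribution of each age group
yll_total  <- sum(yll_by_age)    # Eq 5.4

data.frame(age_at_death, N_x, L_x, yll_by_age)
yll_total
```

In this made-up example, the 2 deaths at age 30 contribute almost as many YLLs as the 25 deaths at age 90.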

Years Lived with Disability (YLD)

YLD quantifies the burden from non-fatal health loss. Each year spent with a condition is multiplied by a disability weight (DW) between 0 (perfect health) and 1 (a health state as bad as death). Disability weights are derived from large population surveys that ask respondents to compare paired health states (Salomon et al., 2012).

YLD = I × D × DW   (incidence approach) Eq 5.5

Where I is the number of incident cases, D is the average duration of the condition until remission or death, and DW is the disability weight. A prevalence-based version (used by GBD since 2010) instead multiplies the number of prevalent cases by DW directly: YLD = P × DW.
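
The two approaches can be compared on a toy condition. The numbers are hypothetical, and the sketch assumes a steady state with no discounting, in which prevalent cases are approximately incident cases multiplied by mean duration.

```r
# Hypothetical condition (all numbers assumed for illustration)
incident_cases <- 1000   # new cases this year
mean_duration  <- 0.5    # years until remission or death
dw             <- 0.20   # disability weight, between 0 and 1

# Incidence approach (Eq 5.5): future disability is credited to the year of onset
yld_incidence <- incident_cases * mean_duration * dw

# Prevalence approach: disability experienced by cases present this year
prevalent_cases <- incident_cases * mean_duration   # steady-state assumption
yld_prevalence  <- prevalent_cases * dw

c(yld_incidence = yld_incidence, yld_prevalence = yld_prevalence)
```

Under these assumptions the two approaches agree; they diverge when incidence or duration is changing over time.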

Disability-Adjusted Life Years (DALYs)

The DALY is the simple sum of years lost to early death and years lost to disability:

DALY = YLL + YLD Eq 5.6

One DALY equals one healthy year of life lost. The total DALYs from a condition therefore measure the gap between current population health and an ideal in which everyone lives the standard life expectancy in full health (Murray et al., 2012).

Worked Example: Calculating DALYs for Breast Cancer

In a population of 100,000 women, suppose breast cancer causes 20 deaths in one year, with an average age at death of 60 (standard life expectancy at age 60 = 25 years). Suppose also that 500 women are newly diagnosed that year and live with breast cancer for an average of 4 years before remission or death, with a disability weight of 0.30 (the incidence approach of Eq 5.5).

YLL = 20 × 25 = 500 years

YLD = 500 × 4 × 0.30 = 600 years

DALYs = 500 + 600 = 1,100 healthy years lost in this population in one year.

Mortality alone would have shown only the 20 deaths. The DALY makes it visible that more than half of breast-cancer burden in this population is non-fatal — a fact that should shape investment in screening, treatment, and survivorship support, not just end-of-life care (Murray & Lopez, 1997).
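
The breast cancer example can be checked in R using the figures given in the example; only the variable names are introduced here.

```r
deaths     <- 20     # breast cancer deaths in one year
years_left <- 25     # standard life expectancy remaining at age 60
cases      <- 500    # women with breast cancer, as given in the example
duration   <- 4      # average years lived with the condition
dw         <- 0.30   # disability weight

yll  <- deaths * years_left        # Eq 5.4 with a single age group
yld  <- cases * duration * dw      # Eq 5.5
daly <- yll + yld                  # Eq 5.6

c(YLL = yll, YLD = yld, DALY = daly, share_nonfatal = yld / daly)
```

The last element gives the non-fatal share of the burden, which is a little over half in this example.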

Why Use DALYs Beyond Mortality, Incidence, and Prevalence?

  • A common currency. DALYs let you compare diseases that primarily kill against those that primarily disable on the same scale — e.g., ischaemic heart disease against major depression. No single mortality or morbidity measure can do this.
  • Premature death is weighted. A young death contributes more YLLs than an old death, capturing the intuition that lost years matter — something crude mortality rates ignore.
  • Non-fatal burden becomes visible. Conditions like low-back pain, anxiety, hearing loss, and migraine are major drivers of population health loss but are nearly invisible in mortality statistics.
  • Priority-setting and equity. Ministries of health, the WHO, and global funders use DALYs to compare cost-effectiveness across very different interventions (e.g., dollars per DALY averted).
  • Tracking change over time. Because DALYs combine fatal and non-fatal loss, they capture epidemiological transitions — the shift from infectious, fatal diseases toward chronic, disabling ones — better than any single measure.

Caveats

DALYs are powerful but not value-neutral. Disability weights reflect aggregate survey responses and may not match an individual patient's lived experience; weights have also shifted across GBD revisions (Salomon et al., 2012; Vos et al., 2016). Choices about the standard life expectancy, age-weighting, and time-discounting can substantially change disease rankings (Murray & Lopez, 1997). As with any composite indicator, always check which version of the methodology produced the number you are reading, and whether figures compared across years were calculated under the same conventions.

Other Measures of Disease Frequency

Several additional measures appear often in the epidemiological literature. Most of these are technically measures of risk (proportions), even though they are often called “rates.”

  • Attack rate
  • Secondary attack rate
  • Case fatality rate
  • Proportional morbidity/mortality

Where These Numbers Come From in Canada: Surveillance Products

The incidence, prevalence, mortality, and case-fatality numbers in your textbooks and in the news are produced by named surveillance systems. The most important to know:

  • Canadian Chronic Disease Surveillance System (CCDSS) — PHAC. Linked provincial/territorial physician-billing and hospital data harmonised across jurisdictions. Produces national prevalence and incidence estimates for diabetes, hypertension, mental illness, COPD, asthma, IHD, cancer, and more.
  • Canadian Notifiable Disease Surveillance System (CNDSS) — PHAC. Aggregated reports from provinces/territories on legally reportable communicable diseases (e.g., measles, pertussis, syphilis, HIV).
  • FluWatch and RVDSS (Respiratory Virus Detection Surveillance System) — PHAC, sentinel and laboratory surveillance for influenza and respiratory viruses.
  • Canadian Cancer Registry — Statistics Canada / provincial cancer registries. Population-based incidence and survival data; the source for "Canadian Cancer Statistics" annual reports.
  • Canadian Vital Statistics — Births and Deaths Database — Statistics Canada. Source for crude and age-standardised mortality, life expectancy, infant mortality, and cause-specific death rates.
  • BC Centre for Disease Control (BCCDC) surveillance dashboards — provincial reportable-disease, STBBI, and overdose surveillance, including the BC Overdose Cohort and weekly respiratory-virus reports.
  • BC Coroners Service — primary source for unregulated drug deaths and suicide statistics in British Columbia.
  • Canadian Institute for Health Information (CIHI) — Discharge Abstract Database (DAD) and National Ambulatory Care Reporting System (NACRS); source for hospital-based incidence, length-of-stay, and case-fatality estimates.

Each system has its own case definition, denominator, and reporting lag. When you cite a Canadian rate, always state which surveillance system produced it — and check whether the case definition matches your research question.

Worked Example: Reading a CCDSS Diabetes Rate

The CCDSS reports that the age-standardised prevalence of diagnosed diabetes among Canadians aged 1+ was about 9.4% (2017–18 fiscal year). To interpret this number you need to know:

  • Numerator (case definition): a CCDSS case is a person with one hospital discharge OR two physician claims for diabetes within a two-year window — a validated administrative algorithm with sensitivity ~86% and specificity ~99%.
  • Denominator: all individuals registered with provincial health insurance during the fiscal year (a near-complete population frame — not a sample).
  • Standardisation: rates are direct-standardised to the 2011 Canadian population, so comparisons across years and provinces aren't distorted by ageing.

Notice how all three concepts from this lesson — case definition, denominator at risk, and standardisation — are required just to read a single published number correctly.

Reflection

A community of 5,000 people experiences a foodborne illness outbreak. Over two weeks, 200 people fall ill, and 8 of them die. Calculate the attack rate and the case fatality rate. In your own words, explain what each measure tells us about the outbreak.

Model answer: Attack rate = 200 / 5,000 = 4% (0.04). This is the cumulative incidence of illness over the outbreak window, telling us 1 in 25 community members became ill, a serious outbreak by foodborne-illness norms. Case-fatality rate = 8 / 200 = 4% (0.04). This measures the lethality of the illness among those who became ill, an indicator of pathogen virulence and population vulnerability. The two together describe outbreak severity: attack rate quantifies spread, CFR quantifies lethality. A high attack rate with low CFR (e.g., norovirus) implies high transmission and morbidity but limited mortality; high CFR with low attack rate implies a deadly but contained pathogen.
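
The two calculations in this reflection can be checked with a short Python sketch. The variable names are illustrative assumptions; the numbers come from the outbreak scenario above.

```python
population_at_risk = 5_000
cases = 200
deaths_among_cases = 8

# Attack rate: a risk (proportion) of becoming ill during the outbreak window.
attack_rate = cases / population_at_risk

# Case fatality rate: a risk (proportion) of dying among those who became ill.
case_fatality_rate = deaths_among_cases / cases

print(f"Attack rate        = {attack_rate:.1%}")
print(f"Case fatality rate = {case_fatality_rate:.1%}")
```

Note that the two measures use different denominators: the whole community for the attack rate, and only the cases for the case fatality rate.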

Key Takeaways

  • Prevalence measures existing cases at a point in time; it is influenced by both incidence and disease duration.
  • The relationship P = I×D / (I×D+1) connects prevalence to incidence and duration.
  • Mortality statistics use the same formulas as morbidity measures, with death as the outcome.
  • YLLs (premature mortality) and YLDs (non-fatal health loss) sum to DALYs, expressing fatal and non-fatal burden in a single “healthy years lost” currency that no single mortality or morbidity measure can provide.
  • Attack rates, secondary attack rates, case fatality rates, and proportional morbidity/mortality rates are specialised measures for specific contexts.
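
The prevalence relationship P = I×D / (I×D+1) can be explored numerically. The sketch below uses a hypothetical incidence rate and hypothetical durations, and assumes the incidence rate and duration are expressed in the same time unit and that the population is roughly stable; the function name is illustrative.

```python
def prevalence_from_incidence(incidence_rate: float, mean_duration: float) -> float:
    """P = I*D / (I*D + 1); rate and duration must share a time unit."""
    product = incidence_rate * mean_duration
    return product / (product + 1)


incidence = 0.02  # hypothetical: 0.02 new cases per person-year
for duration_years in (1, 5, 10):
    p = prevalence_from_incidence(incidence, duration_years)
    print(f"D = {duration_years:>2} years -> P = {p:.3f}")
```

With incidence held constant, the printed prevalence rises as duration lengthens, which is the pattern the first two takeaways describe.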
Knowledge Check — Section 3

1. Why is prevalence less useful than incidence for identifying risk factors?

A factor that changes disease duration (but not occurrence) will change prevalence, potentially misleading risk factor analysis. Incidence focuses solely on new cases.

2. The case fatality rate is best described as:

The case fatality rate is technically a risk measure (proportion) — it gives the probability that a person who has the disease will die from it within a specified period.

3. In the prevalence-incidence relationship P = I×D / (I×D+1), what does D represent?

D is the mean duration of the disease. Longer-lasting diseases will have higher prevalence even at the same incidence rate.

4. A condition causes 10 deaths at an average age where standard life expectancy is 30 remaining years, and 200 prevalent cases live with the condition for 5 years on average with a disability weight of 0.20. What is the total DALY burden?

YLL = 10 × 30 = 300. YLD = 200 × 5 × 0.20 = 200. DALYs = YLL + YLD = 500. DALYs combine premature death and time lived with disability into a single “healthy years lost” total.

5. Why are DALYs useful beyond mortality, incidence, and prevalence measures?

DALYs combine YLL (premature mortality) and YLD (non-fatal burden) into one currency, allowing direct comparison of conditions like ischaemic heart disease and depression. They are not assumption-free — disability weights and standard life expectancy are explicit value choices.

Section 4

Standardisation & Confidence Intervals

⏱ Estimated reading time: 15 minutes

Introduction and Overview

Sections 1–3 walked through how to compute disease frequency measures from raw data. Section 4 addresses the immediate next problem: a crude rate from one population almost never compares fairly to a crude rate from another, because populations differ in age structure and other characteristics that affect disease occurrence. Standardisation is the technique for making the comparison fair, and confidence intervals are the technique for quantifying how precise the resulting estimate is.

Learning Objectives

  • Explain why standardisation of risks and rates is necessary.
  • Distinguish between indirect and direct standardisation.
  • Calculate standard errors and confidence intervals for proportions and rates.

Why Standardise?

When comparing disease frequency between populations, differences in host characteristics (e.g., age, sex, geographic location) can confound the comparison. For example, a population with more elderly individuals will naturally have higher disease rates. Standardisation adjusts for these confounders, allowing for fairer comparisons (Ahmad et al., 2001).

Key Concept: Confounding by Host Factors

A population can be divided into strata based on one or more host characteristics. The overall frequency of disease depends on both the size of each stratum (H_j) and the stratum-specific rates or risks (I_j or R_j). If two populations have different age distributions, their crude rates may differ even if their age-specific rates are identical. Standardisation removes this confounding effect.

Indirect Standardisation

Indirect standardisation uses standard (reference) population rates applied to the study population’s structure to calculate the expected number of cases.

Step 1: Obtain Standard Rates

Obtain a set of stratum-specific rates (I_sj) from a reference or standard population, for example national age-specific disease rates.

Step 2: Calculate Expected Cases

Apply those standard rates to the study population’s time-at-risk in each stratum. The expected adjusted rate (I_e) equals the sum of H_j × I_sj, where H_j is the proportion of the study population in stratum j. The expected number of cases is E = T × I_e, where T is the total time at risk.

Step 3: Compute the SMR

The Standardised Morbidity (or Mortality) Ratio (SMR) is the ratio of observed cases (A) to expected cases (E): SMR = A / E.

An SMR > 1 means more cases were observed than expected; SMR < 1 means fewer. The indirect standardised rate is: I_ind = I_s × SMR, where I_s is the overall rate in the standard population.
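
The three steps above can be sketched in Python. All numbers here are hypothetical (three age strata, invented standard rates and person-time), and the variable names are illustrative assumptions chosen to mirror the symbols in the text.

```python
# Hypothetical inputs for three age strata (young, middle, old).
standard_rates = [0.01, 0.02, 0.05]       # I_sj: cases per person-year in the standard population
study_person_years = [1000, 2000, 1000]   # study population time at risk, by stratum
observed_cases = 120                      # A: cases actually observed in the study population
overall_standard_rate = 0.022             # I_s: overall rate in the standard population

# Step 2: expected rate and expected cases under the standard rates.
total_time = sum(study_person_years)                           # T
stratum_share = [t / total_time for t in study_person_years]   # H_j
expected_rate = sum(h * r for h, r in zip(stratum_share, standard_rates))  # I_e
expected_cases = total_time * expected_rate                    # E = T x I_e

# Step 3: SMR and the indirectly standardised rate.
smr = observed_cases / expected_cases
indirect_rate = overall_standard_rate * smr

print(f"Expected cases = {expected_cases:.1f}")
print(f"SMR = {smr:.2f}")
print(f"Indirectly standardised rate = {indirect_rate:.4f} per person-year")
```

In this invented example the study population has 20% more cases than its age structure alone would predict.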

Direct Standardisation

Direct standardisation uses the study population’s stratum-specific rates applied to a standard population distribution:

I_dir = Σ_j (T_sj × I_j)    (Eq 5.20)

where T_sj is the proportion of person-time in the standard population assigned to stratum j, and I_j is the observed rate in the study population for that stratum.
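
As a minimal sketch of this weighted sum, the Python below applies hypothetical stratum-specific rates to a hypothetical standard distribution and compares the result with the crude rate. All values and names are illustrative assumptions.

```python
# Hypothetical inputs for three age strata (young, middle, old).
study_rates = [0.012, 0.020, 0.060]    # I_j: observed rates in the study population
standard_weights = [0.5, 0.3, 0.2]     # T_sj: share of person-time in the standard population
study_weights = [0.25, 0.50, 0.25]     # the study population's own person-time distribution

# Weights are proportions, so each set should sum to 1.
assert abs(sum(standard_weights) - 1.0) < 1e-9
assert abs(sum(study_weights) - 1.0) < 1e-9

# Directly standardised rate: study rates weighted by the standard distribution.
direct_rate = sum(w * r for w, r in zip(standard_weights, study_rates))

# Crude rate: the same rates weighted by the study population's own distribution.
crude_rate = sum(w * r for w, r in zip(study_weights, study_rates))

print(f"Crude rate               = {crude_rate:.3f} per person-year")
print(f"Directly standardised    = {direct_rate:.3f} per person-year")
```

The stratum-specific rates are identical in both sums; only the weights change, so the gap between the two results reflects population structure alone.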

Indirect vs. Direct: Key Differences

Indirect standardisation is useful when stratum-specific rates for the study population are unavailable or based on small samples. It produces the SMR.

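Direct standardisation applies the study’s own stratum-specific rates to a standard distribution. A drawback is that each stratum-specific rate is weighted by the standard population's distribution regardless of how precisely it was estimated, so an unstable rate from a sparse stratum can distort the result.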

Both methods are commonly used in practice. Age standardisation of cancer statistics is one well-known application (e.g., rates published by cancer registries); the WHO World Standard population (Ahmad et al., 2001) is the most widely used reference structure for direct standardisation in global health comparisons. See also Standardized mortality ratio (Wikipedia).

Scenario: Comparing GI Disease Across Regions

Three geographic regions report the following GI disease data. Age distributions differ across regions. Region 1 has a crude risk of 0.136, Region 2 has 0.125, and Region 3 has 0.088. After indirect standardisation, the SMRs are 1.12, 1.03, and 0.74 respectively.

Region 1’s observed risk was higher than expected (SMR > 1), while Region 3 was lower (SMR < 1). What does this suggest about the role of age distribution versus true disease risk in these regions?

Standard Errors and Confidence Intervals

When estimating any proportion or rate, we need a measure of precision. The standard error (SE) quantifies the uncertainty around our estimate.

SE for a Proportion

SE(p) = √[p(1 − p) / N]    (Eq 5.10)

where p is the estimated proportion and N is the sample size.

SE for an Incidence Rate

SE(I) = √A / t    (Eq 5.11)

where A is the number of cases and t is the total person-time at risk.

Confidence Intervals

Approximate confidence intervals are calculated as:

θ − Z_α × SE   to   θ + Z_α × SE    (Eq 5.12)

where θ is the estimate and Z_α is the appropriate percentile of the standard normal distribution (e.g., 1.96 for a 95% CI) (Rothman, 2012).
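
Equations 5.10 to 5.12 can be combined into two small helper functions. This is a sketch using only the Python standard library; the function names and the example counts are hypothetical.

```python
import math

Z_95 = 1.96  # standard normal percentile for a 95% CI


def proportion_ci(cases: int, n: int, z: float = Z_95):
    """Approximate CI for a proportion: p +/- z * sqrt(p(1-p)/N)."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)


def rate_ci(cases: int, person_time: float, z: float = Z_95):
    """Approximate CI for an incidence rate: I +/- z * sqrt(A)/t."""
    rate = cases / person_time
    se = math.sqrt(cases) / person_time
    return rate, se, (rate - z * se, rate + z * se)


# Hypothetical data: 45 cases among 300 people; 36 cases over 1,200 person-years.
p, se_p, (p_lo, p_hi) = proportion_ci(45, 300)
i, se_i, (i_lo, i_hi) = rate_ci(36, 1200)

print(f"Risk = {p:.3f}, SE = {se_p:.4f}, 95% CI = ({p_lo:.3f}, {p_hi:.3f})")
print(f"Rate = {i:.3f}, SE = {se_i:.4f}, 95% CI = ({i_lo:.4f}, {i_hi:.4f}) per person-year")
```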

When Approximate CIs May Mislead

In small samples, or when the frequency of disease is very low or very high, approximate CIs can produce misleading results (e.g., negative lower limits). In such cases, exact CIs based on the binomial distribution (for proportions) or the Poisson distribution (for rates) are more appropriate.
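
To illustrate the problem, the sketch below compares the approximate interval with an exact binomial interval for a hypothetical sample of 1 case among 50 people. The exact interval shown is the Clopper-Pearson interval, which is one common choice and is an assumption here rather than something this lesson prescribes; it is found by a simple bisection search using only the standard library.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def bisect(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(x: int, n: int, alpha: float = 0.05):
    """Clopper-Pearson interval: invert the two binomial tail probabilities."""
    # Lower limit: the p at which P(X >= x) equals alpha/2.
    lower = 0.0 if x == 0 else bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper limit: the p at which P(X <= x) equals alpha/2.
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper


cases, n = 1, 50  # hypothetical small-sample, low-frequency data
p_hat = cases / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
approx_ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)
exact_ci = exact_binomial_ci(cases, n)

print(f"Approximate 95% CI: ({approx_ci[0]:.4f}, {approx_ci[1]:.4f})")
print(f"Exact 95% CI:       ({exact_ci[0]:.4f}, {exact_ci[1]:.4f})")
```

The approximate lower limit is negative, which is impossible for a proportion, while the exact interval stays within 0 and 1. In practice a statistical package would normally be used instead of a hand-written search.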

Reflection

In a study of 200 individuals, 30 develop the disease of interest. Calculate the incidence risk and its 95% confidence interval. Then discuss: how would the width of the confidence interval change if the sample size were 2,000 instead of 200?

Model answer: Incidence risk = 30/200 = 0.15 (15%). Approximate 95% CI using the Wald formula: 0.15 ± 1.96 × √(0.15 × 0.85 / 200) = 0.15 ± 0.049 = (0.101, 0.199). With n = 2,000 and the same proportion, the SE shrinks by a factor of √10 ≈ 3.16, so the half-width drops to ~0.016 and the CI becomes (0.134, 0.166), roughly three times narrower. The general rule: the standard error of a proportion is √(p(1−p)/n), so to halve the CI width you must quadruple n. This is the precision/sample-size trade-off baked into every observational study design.

Key Takeaways

  • Standardisation adjusts for confounding by host characteristics when comparing rates across populations.
  • Indirect standardisation produces the SMR (observed/expected); direct standardisation applies study-specific rates to a standard population.
  • Standard errors measure precision; 95% CIs are typically calculated as estimate ± 1.96 × SE.
  • Use exact CIs (binomial or Poisson) for small samples or extreme frequencies.
Knowledge Check — Section 4

1. What does a Standardised Morbidity Ratio (SMR) greater than 1 indicate?

An SMR > 1 means the observed number of cases exceeds the expected number (calculated by applying standard rates to the study population’s structure).

2. When is indirect standardisation preferred over direct standardisation?

Indirect standardisation is useful when the study population lacks reliable stratum-specific rates. It borrows rates from a reference population and applies them to the study population’s structure.

3. The 95% confidence interval for a proportion is calculated as:

The 95% CI is the estimate plus/minus 1.96 times the standard error, where SE = √[p(1−p)/N] for a proportion.

Section 5

Final Review & Assessment

⏱ Estimated time: 20 minutes

Bringing It All Together

This lesson assembled the basic numerical vocabulary of epidemiology (Porta, 2014; Rothman, 2012). You moved from the four mathematical forms — counts, proportions, odds, and rates — through incidence (risk and rate), into prevalence and mortality, then into burden-of-disease metrics (GBD 2019 Diseases and Injuries Collaborators, 2020), and finally into standardisation (Ahmad et al., 2001) and the precision around the resulting estimates. Every later lesson assumes you can pick the right denominator for the question being asked.

The takeaways below pull these threads together. The decisions in this lesson are usually the difference between a defensible comparison and a misleading one: prevalence and incidence answer different questions; risks and rates require different denominators; and crude rates can hide more than they reveal whenever populations differ in age structure or other strong confounders.

Key Takeaways from Lesson 5

  • Counts, proportions, odds, and rates are mathematically distinct — choose the form that matches the population structure and the time at risk.
  • Incidence risk (a proportion in a closed population) and incidence rate (per person-time in an open population) both measure new cases but are not interchangeable; they are related by R = 1 − e^(−IΔt).
  • Prevalence reflects both incidence and duration, so it is a poor measure of risk for diseases with long survival or short courses.
  • Mortality, YLLs, YLDs, and DALYs place fatal and non-fatal health loss on a single scale, enabling comparison between conditions that primarily kill and those that primarily disable.
  • Standardisation (direct or indirect) is required whenever populations differ in structure; the SMR and standardised rates are not just summaries but adjustments.
  • Every reported measure should carry a standard error and confidence interval — a point estimate without precision is not actually a measure.
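
The risk-rate relationship R = 1 − e^(−IΔt) is easy to check numerically. The sketch below assumes the incidence rate is constant over the interval; the function names and the example rate are hypothetical.

```python
import math


def risk_from_rate(rate: float, dt: float) -> float:
    """R = 1 - exp(-I * dt), assuming a constant rate over the interval."""
    return 1 - math.exp(-rate * dt)


def rate_from_risk(risk: float, dt: float) -> float:
    """Inverse of the same relationship: I = -ln(1 - R) / dt."""
    return -math.log(1 - risk) / dt


rate = 0.10  # hypothetical: 0.10 cases per person-year
for years in (1, 5):
    print(f"{years}-year risk at I = {rate}: {risk_from_rate(rate, years):.4f}")

# Converting the 1-year risk back recovers the original rate.
print(f"Rate recovered from 1-year risk: {rate_from_risk(risk_from_rate(rate, 1), 1):.4f}")
```

Over one year the risk (about 0.095) is close to the rate, but over five years it is about 0.39 rather than 0.50, which is why the two measures cannot be swapped freely.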

Reflection

A new disease has been identified in two regions. Region A (population 50,000, younger demographic) reports 200 cases over one year. Region B (population 30,000, older demographic) reports 150 cases over the same period. Discuss which measures of disease frequency you would calculate, why you would need to standardise before comparing the regions, and which method of standardisation you would choose.

Model answer: Calculate crude incidence rates (e.g., 200/50,000 = 400 per 100,000 per year in Region A; 150/30,000 = 500 per 100,000 per year in Region B), cumulative incidence over one year, and at minimum age-stratified rates. Standardisation is essential because the regions differ in age structure (A younger, B older) and the disease likely varies with age, so crude rates conflate the age-structure effect with any real regional difference. Method choice: direct standardisation applies the age-specific rates from each region to a common standard population (the 2011 Canadian standard population, the WHO World Standard, or the combined regional population) and reports rates as if the populations had the same age structure; it is ideal when age-specific rates are reliable in both regions. Indirect standardisation uses a standard population's age-specific rates to compute expected cases for each region, which is better when one region's age-specific counts are sparse. Default to direct standardisation here; report SMRs from indirect standardisation as a sensitivity check.

Final Assessment

This assessment covers all material from Lesson 5. You must score 100% to complete the lesson. Review the feedback for any incorrect answers and try again.

Final Assessment — Measures of Disease Frequency (15 Questions)

1. What distinguishes a rate from a proportion?

Rates use person-time at risk in the denominator (units of 1/time), while proportions have the numerator as a subset of the total population (dimensionless, 0 to 1).

2. In a proportion, the numerator is:

By definition, in a proportion the numerator is contained within the denominator. This is what distinguishes a proportion from odds.

3. Which statement about incidence risk (R) is correct?

Incidence risk is a probability (proportion) — dimensionless, between 0 and 1. It requires a specified time period and is ideally measured in closed populations.

4. A person-time unit of “1 person-month” means:

A person-time unit combines the number of individuals with the duration of their observation. One person-month equals one person observed for one month.

5. In a closed population study, withdrawals are handled by:

The convention assumes withdrawals leave, on average, at the midpoint of the study. Subtracting half their number from the denominator corrects for this.

6. The relationship R = 1 − e^(−IΔt) connects:

This exponential formula allows conversion between incidence risk (R) and incidence rate (I) over a time period Δt.

7. Prevalence measures:

Prevalence counts all existing (current) cases at a given point in time, regardless of when they first occurred. Incidence, by contrast, counts only new cases.

8. Why is prevalence affected by disease duration?

The relationship P = I×D/(I×D+1) shows that even at constant incidence, longer disease duration leads to higher prevalence.

9. An “attack rate” is best described as:

Despite the name, attack rates are actually a measure of risk (a proportion). They are the number of cases divided by the exposed population in an outbreak setting.

10. Proportional morbidity/mortality rates are used when:

Proportional morbidity/mortality rates divide disease-specific cases by total cases from all diseases, bypassing the need for a population-at-risk denominator. Because the result also depends on how common other diseases are, they do not estimate risk and are less informative than true risk measures.

11. The purpose of rate standardisation is to:

Standardisation removes the confounding effect of different population structures (e.g., age distributions) so that the comparison of rates reflects true differences in disease frequency.

12. In indirect standardisation, the SMR equals:

SMR = A/E, where A is the observed number of cases and E is the expected number based on applying standard rates to the study population’s structure.

13. The standard error of a proportion increases when:

SE = √[p(1−p)/N]. As N decreases, the SE increases, meaning less precision in the estimate.

14. When should exact (rather than approximate) confidence intervals be used?

Approximate CIs can produce misleading results (e.g., negative lower limits) in small samples or at extreme frequencies. Exact CIs from binomial or Poisson distributions are preferred in these situations.

15. Which statement best summarises this lesson?

This lesson emphasises that each measure has specific applications. The choice depends on whether the population is open or closed, whether we need individual-level or population-level estimates, and what the research question requires. Standardisation and confidence intervals are essential tools for valid comparisons.
