HSCI 341 — Lesson 4

Measures of Disease Frequency

Fundamental Epidemiological Concepts and Approaches

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Differentiate among counts, proportions, odds, and rates
  • Describe the difference between incidence and prevalence
  • Distinguish between incidence risk and incidence rate
  • Explain cause-specific measures, proportional morbidity/mortality, and case fatality rates
  • Select appropriate measures of disease frequency for specific circumstances
  • Compute measures and calculate confidence intervals when provided with data

This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Section 1

Introduction & Foundational Concepts

⏱ Estimated reading time: 12 minutes

Learning Objectives

  • Explain why measuring disease frequency is fundamental to epidemiology.
  • Distinguish between counts, proportions, odds, and rates.
  • Identify key factors (study period, risk period) that influence the choice of frequency measure.

Why Measure Disease Frequency?

Measurement of disease (or event) frequency is the foundation of many epidemiological activities, including routine surveillance, observational research, and outbreak investigations. In observational studies, measuring the frequency of a disease and linking it to an exposure are the first steps toward inferring causation.

Morbidity and mortality are the two main categories of events for which frequency measures are calculated. However, the same approaches apply to other events of interest such as vaccination, hospital admission, or giving birth.

Stratification Matters

Because both morbidity and mortality are strongly associated with individual attributes, and because different diseases have different impacts, we often calculate frequency measures for specific host attributes (e.g., age, sex, race) and for specific diseases. This stratification allows us to detect patterns that would be hidden in overall population-level data.

Study Period and Risk Period

When selecting a measure of disease frequency, two time-related concepts are critical:

Study Period

The study period is the time interval during which study subjects are observed for the outcome of interest. It is usually measured in calendar time but sometimes represents a single point in time. It can also be defined by a specific event: for example, outcomes assessed “at birth,” as in a study of congenital defects among births during 2008–2010.

Risk Period

The risk period is the time during which an individual could develop the disease. For some conditions, the risk period is very short (e.g., post-partum eclampsia — usually less than 2 days), while for others it is essentially lifelong (e.g., migraine headaches).

Diseases with a short risk period relative to the study period are good candidates for risk measures. Diseases with long risk periods are better suited to rate-based measures.

Counts, Proportions, Odds, and Rates

Before examining specific measures of disease frequency, it is essential to understand the four mathematical forms these measures can take:

  • Count
  • Proportion
  • Odds
  • Rate

Comparison Table

Measure    | Numerator             | Denominator         | Range  | Units
Count      | Cases                 | None                | 0 to ∞ | None
Proportion | Subset of denominator | Total population    | 0 to 1 | Dimensionless
Odds       | Cases                 | Non-cases           | 0 to ∞ | Dimensionless
Rate       | Cases                 | Person-time at risk | 0 to ∞ | Per person-time
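
As a rough illustration of how the four forms differ, the sketch below computes each one from the same data. The numbers (20 cases among 200 people, 380 person-years at risk) and the function name are hypothetical and are not taken from the lesson.

```python
def frequency_measures(cases, population, person_time):
    """Return the four mathematical forms for one set of (hypothetical) data."""
    non_cases = population - cases
    return {
        "count": cases,                    # no denominator
        "proportion": cases / population,  # numerator is part of the denominator
        "odds": cases / non_cases,         # denominator excludes the numerator
        "rate": cases / person_time,       # per unit of person-time at risk
    }


# Hypothetical data: 20 cases among 200 people followed for 380 person-years.
measures = frequency_measures(cases=20, population=200, person_time=380.0)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Note that the proportion and the odds use the same numerator but differ in whether the cases are also counted in the denominator.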

Common Terminology Confusion

The term “rate” is often used loosely to refer to all types of measures of disease frequency. Similarly, people commonly say someone has a high “chance” or “risk” of disease when the underlying measure might actually be a rate. Being precise about these terms helps avoid misinterpretation of research findings.

Key Takeaways

  • Disease frequency measurement underpins surveillance, research, and outbreak investigation.
  • The study period and risk period influence which type of measure is most appropriate.
  • Counts, proportions, odds, and rates are the four mathematical forms; each has distinct properties.
  • A rate strictly requires person-time in the denominator, though the term is often used loosely.
Knowledge Check — Section 1

1. What distinguishes a proportion from odds?

In a proportion, the numerator (e.g., cases) is included within the denominator (e.g., total population). In odds, the denominator excludes the numerator (e.g., cases divided by non-cases).

2. A disease with a long risk period relative to the study period is best measured using:

When the risk period is longer than the study period, rate-based measures (using person-time denominators) are more appropriate because they account for varying amounts of time at risk.

3. Why are simple counts of limited use in epidemiologic research?

Without knowing the population size, a count of cases tells us nothing about the relative burden of disease. Fifty cases in a population of 100 is very different from 50 cases in a population of 1,000,000.

Section 2

Incidence: Risk & Rate

⏱ Estimated reading time: 15 minutes

Learning Objectives

  • Define incidence and explain the four ways to express it.
  • Distinguish between incidence risk and incidence rate.
  • Compare closed and open populations and how they affect measurement.
  • Calculate incidence risk and incidence rate from provided data.

What Is Incidence?

Incidence relates to the number of new events (e.g., new cases of a disease) in a defined population within a specific period. Because it deals with new cases, incidence is used to identify factors associated with becoming ill. A clear case definition is essential, along with a surveillance programme capable of identifying all cases.

First Cases vs. All Cases

Although incidence deals with “new cases,” this does not necessarily mean only the first-ever case within an individual. For some conditions (e.g., migraine headaches, recurrent infections), multiple episodes can occur. Researchers must decide whether to count only first cases or all new episodes, and clearly state this in their methodology.

Four Ways to Express Incidence

Incident Times

Incident times are the specific times at which cases occur, measured as elapsed time since a reference event (e.g., days after exposure to a toxin, or days after parturition). Incident times form the basis of survival analysis and are discussed at length in more advanced topics.

Incidence Count

The incidence count is the simple count of new cases observed. It is often used when a disease did not previously exist or was very rare in a population. Incidence counts are sometimes expressed as absolute rates, relating the number of cases to the time period of observation (e.g., “3 cases per year”). They have limited value unless combined with population-at-risk data.

Incidence Risk (R)

Incidence risk (R) is the probability that an individual will contract or develop a disease in a defined time period. It is a proportion (dimensionless, ranging from 0 to 1), and the time period must always be specified. Only the first occurrence in the period of interest counts. It is sometimes called cumulative incidence.

R is used in studies focused on individual-level predictions (e.g., the probability of breast cancer recurrence within the next year is 14%).

Incidence Rate (I)

Incidence rate (I) is the number of new cases per unit of person-time during a given time period. It has units of 1/time and is positive with no upper bound. It is also called incidence density.

I is used in studies that aim to identify which factors are related to the occurrence of disease, and to the effects of disease, in a population. It is the preferred measure for open populations where individuals enter and leave over time.

Calculating Risk

R = (number of newly affected individuals in a defined time period) / (population at risk)   (Eq 4.1)

Population at Risk: Closed vs. Open

Estimating the population at risk can be challenging. The key distinction is whether the population is closed or open.

Closed Population

A closed population has no additions and few to no losses during the study period. Examples include residents of a nursing home followed for a year, or women followed for one week post-partum.

Only disease-free individuals at the start are considered at risk. People lost to follow-up are called withdrawals, and the simplest correction is to subtract half the number of withdrawals from the population at risk (assuming they leave, on average, halfway through the study).
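
A minimal sketch of Eq 4.1 with the half-withdrawal correction described above. The function name and the cohort numbers are hypothetical, chosen only to show the arithmetic.

```python
def incidence_risk(new_cases, at_risk_start, withdrawals=0):
    """Incidence risk in a closed population (Eq 4.1), with the
    half-withdrawal correction applied to the denominator."""
    adjusted_at_risk = at_risk_start - withdrawals / 2
    if adjusted_at_risk <= 0:
        raise ValueError("adjusted population at risk must be positive")
    return new_cases / adjusted_at_risk


# Hypothetical cohort: 100 disease-free people, 10 new cases, 4 withdrawals.
risk = incidence_risk(new_cases=10, at_risk_start=100, withdrawals=4)
print(f"Incidence risk: {risk:.3f}")
```

With no withdrawals the function reduces to the plain ratio of new cases to the population at risk.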

Open Population

An open population has individuals entering and leaving throughout the study period. For example, women served by a cancer centre who had mastectomies form an open population as new patients continually arrive.

An open population is considered stable (stationary) if the rate of additions and withdrawals, and the distribution of host attributes, remain relatively constant over time.

Risk cannot be computed directly from an open population. Instead, it can be estimated from the incidence rate (I) using the relationship between risk and rate.

Calculating Incidence Rates

I = (number of new cases in a defined time period) / (number of person-time units at risk during the time period)   (Eq 4.2)

A person-time unit is one person observed for a defined period (e.g., one person-month, one person-year). After an individual contracts the disease of interest, they are no longer at risk and no longer contribute person-time to the denominator (for first-case analyses).

Worked Example: Exact Incidence Rate

Assume 4 previously healthy people were observed for exactly 1 month (30 days):

  • Person 1: not sick at all → 1.00 person-months at risk
  • Person 2: sick on day 10 → 0.33 person-months at risk
  • Person 3: sick on day 20 → 0.67 person-months at risk
  • Person 4: moved away on day 15 (lost to follow-up) → 0.50 person-months at risk

Total person-months at risk = 2.50

Total new cases = 2

I = 2 / 2.50 = 0.80 cases per person-month
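
The same calculation can be sketched in code. The days at risk come from the worked example above; the variable names and the choice to represent each person as a (days, case) pair are illustrative assumptions.

```python
STUDY_DAYS = 30  # one month of follow-up, as in the worked example

# (days at risk, became a case?) for each of the four people
follow_up = [
    (30, False),  # Person 1: never sick
    (10, True),   # Person 2: sick on day 10
    (20, True),   # Person 3: sick on day 20
    (15, False),  # Person 4: lost to follow-up on day 15
]

person_months = sum(days for days, _ in follow_up) / STUDY_DAYS
new_cases = sum(1 for _, is_case in follow_up if is_case)
incidence_rate = new_cases / person_months

print(f"Person-months at risk: {person_months:.2f}")
print(f"Incidence rate: {incidence_rate:.2f} cases per person-month")
```

Summing days before converting to months avoids the small rounding introduced by 0.33 and 0.67.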

Approximate Calculation

When exact person-time data is unavailable, an approximate formula can be used:

I = cases / [(start − ½ sick − ½ withdrawn + ½ added) × time]   (Eq 4.3)

This assumes that, on average, events (illness, withdrawal, addition) occur at the midpoint of the study period.
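
A minimal sketch of the approximate formula (Eq 4.3). It assumes a first-case analysis, so the number of people who became sick equals the number of cases; the function name and the data are hypothetical.

```python
def approximate_incidence_rate(cases, start, withdrawn=0, added=0, time=1.0):
    """Approximate incidence rate (Eq 4.3), assuming that cases,
    withdrawals, and additions all occur at the midpoint of the period."""
    # Assumption: first-case analysis, so "sick" equals the number of cases.
    avg_at_risk = start - cases / 2 - withdrawn / 2 + added / 2
    return cases / (avg_at_risk * time)


# Hypothetical data: 100 people at the start, 8 cases, 6 withdrawals,
# 2 additions, followed for 2 years.
rate = approximate_incidence_rate(cases=8, start=100, withdrawn=6, added=2, time=2.0)
print(f"Approximate incidence rate: {rate:.4f} cases per person-year")
```

The average population at risk is multiplied by the length of follow-up, so the whole product forms the person-time denominator.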

Relationship Between Risk and Rate

Risk and rate are mathematically related. For a closed population:

R ≈ I × Δt   (when I × Δt is small)

More precisely, the relationship is: R = 1 − e^(−I × Δt)

This exponential formula accounts for the fact that as people become diseased, they leave the at-risk pool. When the product I×Δt is small (less than 0.1), the simpler linear approximation works well.
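
To see how quickly the linear approximation drifts from the exponential relationship, the sketch below evaluates both for a few hypothetical values of I × Δt. The function names and rates are illustrative only.

```python
import math


def risk_from_rate(rate, delta_t):
    """Exponential relationship: R = 1 - exp(-I * delta_t)."""
    return 1 - math.exp(-rate * delta_t)


def risk_linear(rate, delta_t):
    """Linear approximation: R is roughly I * delta_t when that product is small."""
    return rate * delta_t


# Hypothetical rates (per person-year) over a 1-year period.
for rate in (0.05, 0.1, 0.8):
    exact = risk_from_rate(rate, 1.0)
    approx = risk_linear(rate, 1.0)
    print(f"I*dt = {rate:.2f}: exponential R = {exact:.4f}, linear R = {approx:.4f}")
```

The two agree closely at the smaller values, while at 0.8 the linear form noticeably overstates the risk.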

Reflection

A nursing home with 80 residents experiences 12 cases of influenza over a 3-month winter period. Five residents were transferred out during the study (assume at the midpoint), and 3 new residents were admitted (also at the midpoint). Estimate the incidence rate using the approximate formula. Would incidence risk or incidence rate be more appropriate here, and why?

Key Takeaways

  • Incidence measures new cases; it can be expressed as incident times, counts, risk (R), or rate (I).
  • Risk is a proportion (0 to 1) measured in closed populations; rate uses person-time denominators and works for open populations.
  • Withdrawals are handled by subtracting half their number from the population at risk.
  • Risk and rate are related: R = 1 − e^(−I × Δt); for small values of I × Δt, R ≈ I × Δt.
Knowledge Check — Section 2

1. Incidence risk (R) is best described as:

Incidence risk is a proportion (probability) — dimensionless, ranging from 0 to 1 — that requires a defined time period. It is sometimes called cumulative incidence.

2. In a closed population study, how are withdrawals typically handled when calculating risk?

Assuming withdrawals leave on average halfway through the study period, we subtract half their number from the denominator (population at risk).

3. What units does incidence rate (I) have?

Incidence rate has units of 1/person-time, which simplifies to 1/time. It is positive without an upper bound.

Section 3

Prevalence, Mortality & Other Frequency Measures

⏱ Estimated reading time: 14 minutes

Learning Objectives

  • Define prevalence and explain how it differs from incidence.
  • Describe the relationship between prevalence, incidence, and disease duration.
  • Explain mortality statistics and cause-specific measures.
  • Distinguish attack rates, secondary attack rates, case fatality rates, and proportional morbidity/mortality rates.

Prevalence

Unlike incidence, which counts new cases, prevalence measures the number of existing cases of disease at a specific point in time (or over a period). The prevalence count is the number of individuals in a population who have an attribute or disease at a particular time.

P = (cases of disease at a point in time) / (individuals in the population at the same point in time)   (Eq 4.8)

For example, if 75 athletes are tested for performance-enhancing drug use and 3 test positive, then P = 3/75 = 0.04 (4%).

Relationship Between Prevalence and Incidence

In a stable population where the incidence rate (I) remains constant, prevalence (P), incidence, and mean disease duration (D) are related:

P = (I × D) / (I × D + 1)   (Eq 4.9)

Worked Example

If the incidence rate of influenza in an urban population is 0.3 cases per person-year (30 cases per 100 person-years) and the mean duration of infection is 3 weeks (0.058 years), then:

P = (0.3 × 0.058) / (0.3 × 0.058 + 1) = 0.0174 / 1.0174 = 0.017 (1.7%)

So on any given day, we would expect about 1.7% of the population to have the flu.
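
The worked example can be checked with a short sketch of Eq 4.9. The function name is hypothetical; the inputs are the rate and duration given above, with the duration converted to years so that both quantities share the same time unit.

```python
def prevalence_from_incidence(rate, mean_duration):
    """Steady-state prevalence (Eq 4.9): P = I*D / (I*D + 1).
    `rate` and `mean_duration` must use the same time unit."""
    product = rate * mean_duration
    return product / (product + 1)


# Worked example: I = 0.3 per person-year, D = 3 weeks expressed in years.
duration_years = 3 / 52
p = prevalence_from_incidence(0.3, duration_years)
print(f"Expected prevalence: {p:.3f}")
```

Mixing units (for example a yearly rate with a duration in weeks) is the most common way to get this calculation wrong.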

Prevalence vs. Incidence: When to Use Which?

Prevalence (P) is less useful than incidence rate (I) for research into risk factors, because a factor that affects either the occurrence of disease or its duration will influence prevalence. A disease may have high prevalence simply because people survive with it for a long time, not because new cases are frequent.

However, serial prevalence studies can be used to estimate incidence rates for diseases that are not easily detected at onset — for example, by testing blood samples at regular intervals to determine when individuals become infected.

Mortality Statistics

Mortality statistics are calculated in exactly the same way as P, R, and I, except that the outcome of interest is death. The term mortality rate, strictly speaking, refers to the incidence rate of mortality, but it is often used to describe the risk of mortality.

Overall Mortality Rate

The overall mortality rate describes the number of individuals that die from all causes in a defined time period. It is analogous to incidence rate (I) except that the outcome is death rather than disease onset.

Cause-Specific Mortality Rate

The cause-specific mortality rate describes the number of individuals that die from (or with) a specific disease during a defined period. It is calculated the same way as I but focuses on deaths attributed to one disease.

Challenges of Cause Attribution

It is often difficult to determine the specific cause of death. For example, if a recumbent patient regurgitates, contracts aspiration pneumonia, and then dies, did they die from the initial condition causing recumbency, from the pneumonia, or merely with pneumonia? The “cause” is usually deemed to be the proximate cause (the factor considered the final trigger), but this determination can be challenging.

Other Measures of Disease Frequency

Several additional frequency measures appear frequently in epidemiological literature. Most of these are technically measures of risk (proportions), even though they are often called “rates.”

  • Attack rate
  • Secondary attack rate
  • Case fatality rate
  • Proportional morbidity/mortality
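
As a hedged illustration, three of these measures can be sketched using the descriptions given in this lesson's knowledge-check feedback: attack rate as cases divided by the exposed population, case fatality rate as the proportion of cases who die from the disease, and proportional mortality as disease-specific deaths divided by deaths from all causes. The function names and outbreak numbers are hypothetical, and the secondary attack rate is omitted because its definition is not spelled out in this text.

```python
def attack_rate(cases, exposed_population):
    """Cases divided by the population exposed during an outbreak (a risk)."""
    return cases / exposed_population


def case_fatality_rate(deaths_from_disease, cases):
    """Proportion of people with the disease who die from it."""
    return deaths_from_disease / cases


def proportional_mortality(deaths_from_disease, deaths_all_causes):
    """Deaths from one disease as a share of deaths from all causes."""
    return deaths_from_disease / deaths_all_causes


# Hypothetical outbreak: 400 people exposed, 60 fall ill, 3 of them die;
# 12 deaths from all causes are recorded in the same period.
print(f"Attack rate: {attack_rate(60, 400):.3f}")
print(f"Case fatality rate: {case_fatality_rate(3, 60):.3f}")
print(f"Proportional mortality: {proportional_mortality(3, 12):.3f}")
```

All three are proportions, which is why the lesson describes them as measures of risk despite the word "rate" in their names.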

Reflection

A community of 5,000 people experiences a foodborne illness outbreak. Over two weeks, 200 people fall ill, and 8 of them die. Calculate the attack rate and the case fatality rate. In your own words, explain what each measure tells us about the outbreak.

Key Takeaways

  • Prevalence measures existing cases at a point in time; it is influenced by both incidence and disease duration.
  • The relationship P = I×D / (I×D+1) connects prevalence to incidence and duration.
  • Mortality statistics use the same formulas as morbidity measures, with death as the outcome.
  • Attack rates, secondary attack rates, case fatality rates, and proportional morbidity/mortality rates are specialised measures for specific contexts.
Knowledge Check — Section 3

1. Why is prevalence less useful than incidence for identifying risk factors?

A factor that changes disease duration (but not occurrence) will change prevalence, potentially misleading risk factor analysis. Incidence focuses solely on new cases.

2. The case fatality rate is best described as:

The case fatality rate is technically a risk measure (proportion) — it gives the probability that a person who has the disease will die from it within a specified period.

3. In the prevalence-incidence relationship P = I×D / (I×D+1), what does D represent?

D is the mean duration of the disease. Longer-lasting diseases will have higher prevalence even at the same incidence rate.

Section 4

Standardisation & Confidence Intervals

⏱ Estimated reading time: 15 minutes

Learning Objectives

  • Explain why standardisation of risks and rates is necessary.
  • Distinguish between indirect and direct standardisation.
  • Calculate standard errors and confidence intervals for proportions and rates.

Why Standardise?

When comparing disease frequency between populations, differences in host characteristics (e.g., age, sex, geographic location) can confound the comparison. For example, a population with more elderly individuals will usually have higher crude rates of age-related diseases. Standardisation adjusts for these confounders, allowing for fairer comparisons.

Key Concept: Confounding by Host Factors

A population can be divided into strata based on one or more host characteristics. The overall frequency of disease depends on both the size of each stratum (Hj) and the stratum-specific rates or risks (Ij or Rj). If two populations have different age distributions, their crude rates may differ even if their age-specific rates are identical. Standardisation removes this confounding effect.

Indirect Standardisation

Indirect standardisation uses standard (reference) population rates applied to the study population’s structure to calculate the expected number of cases.

Step 1: Obtain Standard Rates

Obtain a set of stratum-specific rates (Isj) from a reference or standard population. For example, national age-specific disease rates.

Step 2: Calculate Expected Cases

Apply those standard rates to the study population’s time-at-risk in each stratum. The expected adjusted rate (Ie) equals the sum of Hj × Isj, where Hj is the proportion of the study population in stratum j. The expected number of cases is E = T × Ie, where T is the total time at risk.

Step 3: Compute the SMR

The Standardised Morbidity (or Mortality) Ratio (SMR) is the ratio of observed cases (A) to expected cases (E): SMR = A / E.

An SMR > 1 means more cases were observed than expected; SMR < 1 means fewer. The indirect standardised rate is: Iind = Is × SMR, where Is is the overall rate in the standard population.
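
The three steps can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the function name, the two age strata, and all numbers are hypothetical, and Hj is taken to be each stratum's share of the study population's time at risk.

```python
def indirect_standardisation(observed_cases, study_person_time, standard_rates,
                             standard_overall_rate):
    """Indirect standardisation following Steps 1-3.

    study_person_time: time at risk in the study population, per stratum
    standard_rates:    stratum-specific rates (Isj) from the standard population
    """
    total_time = sum(study_person_time)                          # T
    proportions = [t / total_time for t in study_person_time]    # Hj
    expected_rate = sum(h * r for h, r in zip(proportions, standard_rates))  # Ie
    expected_cases = total_time * expected_rate                  # E = T * Ie
    smr = observed_cases / expected_cases                        # SMR = A / E
    return expected_cases, smr, standard_overall_rate * smr      # Iind = Is * SMR


# Hypothetical data: two age strata (younger, older).
expected, smr, indirect_rate = indirect_standardisation(
    observed_cases=30,
    study_person_time=[600.0, 400.0],
    standard_rates=[0.01, 0.04],
    standard_overall_rate=0.02,
)
print(f"Expected cases: {expected:.1f}, SMR: {smr:.2f}, indirect rate: {indirect_rate:.4f}")
```

Only the study population's structure and its total observed cases are needed, which is why this method suits populations without reliable stratum-specific rates.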

Direct Standardisation

Direct standardisation uses the study population’s stratum-specific rates applied to a standard population distribution:

Idir = Σ (Tsj × Ij)   (Eq 4.20)

where Tsj is the proportion of person-time in the standard population assigned to stratum j, and Ij is the observed rate in the study population for that stratum.
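
A minimal sketch of Eq 4.20, comparing two study populations against the same standard distribution. The function name, strata, and rates are hypothetical.

```python
def direct_standardised_rate(study_rates, standard_person_time):
    """Directly standardised rate (Eq 4.20): sum over strata of Tsj * Ij,
    where Tsj is the standard population's share of person-time in stratum j."""
    total = sum(standard_person_time)
    weights = [t / total for t in standard_person_time]  # Tsj
    return sum(w * rate for w, rate in zip(weights, study_rates))


# Hypothetical data: the same two strata observed in two study populations.
standard_time = [700.0, 300.0]  # standard population person-time by stratum
rate_a = direct_standardised_rate([0.010, 0.050], standard_time)
rate_b = direct_standardised_rate([0.012, 0.045], standard_time)
print(f"Population A: {rate_a:.4f}, Population B: {rate_b:.4f}")
```

Because both populations are weighted by the same standard distribution, any remaining difference between the two results reflects their stratum-specific rates rather than their age structures.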

Indirect vs. Direct: Key Differences

Indirect standardisation is useful when stratum-specific rates for the study population are unavailable or based on small samples. It produces the SMR.

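Direct standardisation applies the study’s own stratum-specific rates to a standard distribution. A drawback is that each stratum-specific rate is weighted by the standard distribution regardless of how precisely it was estimated, so rates based on small samples can make the result unstable.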

Both methods are commonly used in practice. Age standardisation of cancer statistics is one well-known application (e.g., rates published by cancer registries).

Scenario: Comparing GI Disease Across Regions

Three geographic regions report the following GI disease data. Age distributions differ across regions. Region 1 has a crude risk of 0.136, Region 2 has 0.125, and Region 3 has 0.088. After indirect standardisation, the SMRs are 1.12, 1.03, and 0.74 respectively.

Region 1’s observed risk was higher than expected (SMR > 1), while Region 3 was lower (SMR < 1). What does this suggest about the role of age distribution versus true disease risk in these regions?

Standard Errors and Confidence Intervals

When estimating any proportion or rate, we need a measure of precision. The standard error (SE) quantifies the uncertainty around our estimate.

SE for a Proportion

SE(p) = √[p(1 − p) / N]   (Eq 4.10)

where p is the estimated proportion and N is the sample size.

SE for an Incidence Rate

SE(I) = √(A) / t   (Eq 4.11)

where A is the number of cases and t is the total person-time at risk.

Confidence Intervals

Approximate confidence intervals are calculated as:

θ − Zα × SE   to   θ + Zα × SE   (Eq 4.12)

where θ is the estimate and Zα is the appropriate percentile of the standard normal distribution (e.g., 1.96 for a 95% CI).
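
Equations 4.10 to 4.12 can be combined in a short sketch. The function names and the data (40 cases among 400 people; 25 cases in 500 person-years) are hypothetical.

```python
import math

Z_95 = 1.96  # standard normal percentile for a 95% CI


def proportion_ci(cases, n, z=Z_95):
    """Approximate CI for a proportion using SE = sqrt(p(1-p)/N) (Eq 4.10)."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se


def rate_ci(cases, person_time, z=Z_95):
    """Approximate CI for an incidence rate using SE = sqrt(A)/t (Eq 4.11)."""
    rate = cases / person_time
    se = math.sqrt(cases) / person_time
    return rate, rate - z * se, rate + z * se


# Hypothetical data: 40 cases among 400 people; 25 cases in 500 person-years.
print("Proportion:", [round(x, 4) for x in proportion_ci(40, 400)])
print("Rate:", [round(x, 4) for x in rate_ci(25, 500.0)])
```

Each function returns the estimate followed by the lower and upper limits, so the interval can be reported alongside the point estimate.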

When Approximate CIs May Mislead

In small samples, or when the frequency of disease is very low or very high, approximate CIs can produce misleading results (e.g., negative lower limits). In such cases, exact CIs based on the binomial distribution (for proportions) or the Poisson distribution (for rates) are more appropriate.
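
The athlete example from Section 3 (3 positives among 75 tested) shows the problem. The sketch below compares the approximate interval with one common exact method, the Clopper-Pearson interval, which this lesson does not name; it is used here as an assumption about what "exact CI based on the binomial distribution" would look like in practice. The limits are found by simple bisection using only the standard library, and all function names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect(func, lo, hi, tol=1e-10):
    """Root of a function that changes sign between lo and hi."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (func(mid) > 0) == (f_lo > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(cases, n, alpha=0.05):
    """Exact (Clopper-Pearson) CI for a proportion."""
    lower = 0.0 if cases == 0 else bisect(
        lambda p: (1 - binom_cdf(cases - 1, n, p)) - alpha / 2, 0.0, cases / n)
    upper = 1.0 if cases == n else bisect(
        lambda p: binom_cdf(cases, n, p) - alpha / 2, cases / n, 1.0)
    return lower, upper


# Athlete example from the lesson: 3 positives among 75 tested.
p = 3 / 75
se = math.sqrt(p * (1 - p) / 75)
approx_lower, approx_upper = p - 1.96 * se, p + 1.96 * se
exact_lower, exact_upper = exact_binomial_ci(3, 75)
print(f"Approximate: ({approx_lower:.4f}, {approx_upper:.4f})")
print(f"Exact:       ({exact_lower:.4f}, {exact_upper:.4f})")
```

The approximate lower limit falls below zero, which is impossible for a proportion, while the exact limits stay within 0 and 1.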

Reflection

In a study of 200 individuals, 30 develop the disease of interest. Calculate the incidence risk and its 95% confidence interval. Then discuss: how would the width of the confidence interval change if the sample size were 2,000 instead of 200?

Key Takeaways

  • Standardisation adjusts for confounding by host characteristics when comparing rates across populations.
  • Indirect standardisation produces the SMR (observed/expected); direct standardisation applies study-specific rates to a standard population.
  • Standard errors measure precision; 95% CIs are typically calculated as estimate ± 1.96 × SE.
  • Use exact CIs (binomial or Poisson) for small samples or extreme frequencies.
Knowledge Check — Section 4

1. What does a Standardised Morbidity Ratio (SMR) greater than 1 indicate?

An SMR > 1 means the observed number of cases exceeds the expected number (calculated by applying standard rates to the study population’s structure).

2. When is indirect standardisation preferred over direct standardisation?

Indirect standardisation is useful when the study population lacks reliable stratum-specific rates. It borrows rates from a reference population and applies them to the study population’s structure.

3. The 95% confidence interval for a proportion is calculated as:

The 95% CI is the estimate plus/minus 1.96 times the standard error, where SE = √[p(1−p)/N] for a proportion.

Section 5

Final Review & Assessment

⏱ Estimated time: 20 minutes

Lesson Summary

This lesson covered the fundamental measures used to quantify disease frequency in epidemiology. Here is a review of the key concepts:

Section 1: Foundational Concepts

Disease frequency measurement underpins all epidemiological activities. The four mathematical forms — counts, proportions, odds, and rates — each have distinct properties. The study period and risk period determine which measure is most appropriate for a given situation.

Section 2: Incidence — Risk & Rate

Incidence measures new cases. Incidence risk (R) is a proportion ranging from 0 to 1 for closed populations. Incidence rate (I) uses person-time denominators and works for open populations. Risk and rate are related by the formula R = 1 − e^(−I × Δt).

Section 3: Prevalence & Mortality

Prevalence measures existing cases at a point in time and is influenced by both incidence and disease duration. Mortality statistics use the same formulas with death as the outcome. Specialised measures include attack rates, secondary attack rates, case fatality rates, and proportional morbidity/mortality rates.

Section 4: Standardisation & Confidence Intervals

Standardisation adjusts for confounding by host characteristics when comparing rates. Indirect standardisation produces the SMR; direct standardisation applies study rates to a standard distribution. Standard errors and confidence intervals quantify precision around estimates.

Final Reflection

Reflection

A new disease has been identified in two regions. Region A (population 50,000, younger demographic) reports 200 cases over one year. Region B (population 30,000, older demographic) reports 150 cases over the same period. Discuss which measures of disease frequency you would calculate, why you would need to standardise before comparing the regions, and which method of standardisation you would choose.

Final Assessment

This assessment covers all material from Lesson 4. You must score 100% to complete the lesson. Review the feedback for any incorrect answers and try again.

Final Assessment — Measures of Disease Frequency (15 Questions)

1. What distinguishes a rate from a proportion?

Rates use person-time at risk in the denominator (units of 1/time), while proportions have the numerator as a subset of the total population (dimensionless, 0 to 1).

2. In a proportion, the numerator is:

By definition, in a proportion the numerator is contained within the denominator. This is what distinguishes a proportion from odds.

3. Which statement about incidence risk (R) is correct?

Incidence risk is a probability (proportion) — dimensionless, between 0 and 1. It requires a specified time period and is ideally measured in closed populations.

4. A person-time unit of “1 person-month” means:

A person-time unit combines the number of individuals with the duration of their observation. One person-month equals one person observed for one month.

5. In a closed population study, withdrawals are handled by:

The convention assumes withdrawals leave, on average, at the midpoint of the study. Subtracting half their number from the denominator corrects for this.

6. The relationship R = 1 − e^(−I × Δt) connects:

This exponential formula allows conversion between incidence risk (R) and incidence rate (I) over a time period Δt.

7. Prevalence measures:

Prevalence counts all existing (current) cases at a given point in time, regardless of when they first occurred. Incidence, by contrast, counts only new cases.

8. Why is prevalence affected by disease duration?

The relationship P = I×D/(I×D+1) shows that even at constant incidence, longer disease duration leads to higher prevalence.

9. An “attack rate” is best described as:

Despite the name, attack rates are actually a measure of risk (a proportion). They are the number of cases divided by the exposed population in an outbreak setting.

10. Proportional morbidity/mortality rates are used when:

Proportional morbidity/mortality rates divide disease-specific cases by total cases from all diseases, bypassing the need for a population-at-risk denominator. They are less precise than true risk measures.

11. The purpose of rate standardisation is to:

Standardisation removes the confounding effect of different population structures (e.g., age distributions) so that the comparison of rates reflects true differences in disease frequency.

12. In indirect standardisation, the SMR equals:

SMR = A/E, where A is the observed number of cases and E is the expected number based on applying standard rates to the study population’s structure.

13. The standard error of a proportion increases when:

SE = √[p(1−p)/N]. As N decreases, the SE increases, meaning less precision in the estimate.

14. When should exact (rather than approximate) confidence intervals be used?

Approximate CIs can produce misleading results (e.g., negative lower limits) in small samples or at extreme frequencies. Exact CIs from binomial or Poisson distributions are preferred in these situations.

15. Which statement best summarises this lesson?

This lesson emphasises that each measure has specific applications. The choice depends on whether the population is open or closed, whether we need individual-level or population-level estimates, and what the research question requires. Standardisation and confidence intervals are essential tools for valid comparisons.
