Measures of Disease Frequency
Fundamental Epidemiological Concepts and Approaches
Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University
Learning objectives for this lesson:
- Differentiate among counts, proportions, odds, and rates
- Describe the difference between incidence and prevalence
- Distinguish between incidence risk and incidence rate
- Explain cause-specific measures, proportional morbidity/mortality, and case fatality rates
- Select appropriate measures of disease frequency for specific circumstances
- Compute measures and calculate confidence intervals when provided with data
This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.
Introduction & Foundational Concepts
⏱ Estimated reading time: 12 minutes
Learning Objectives
- Explain why measuring disease frequency is fundamental to epidemiology.
- Distinguish between counts, proportions, odds, and rates.
- Identify key factors (study period, risk period) that influence the choice of frequency measure.
Why Measure Disease Frequency?
Measurement of disease (or event) frequency is the foundation of many epidemiological activities, including routine surveillance, observational research, and outbreak investigations. In observational studies, measuring the frequency of a disease and linking it to an exposure are the first steps toward inferring causation.
Morbidity and mortality are the two main categories of events for which frequency measures are calculated. However, the same approaches apply to other events of interest such as vaccination, hospital admission, or giving birth.
Stratification Matters
Because both morbidity and mortality are strongly associated with individual attributes, and because different diseases have different impacts, we often calculate frequency measures for specific host attributes (e.g., age, sex, race) and for specific diseases. This stratification allows us to detect patterns that would be hidden in overall population-level data.
Study Period and Risk Period
When selecting a measure of disease frequency, two time-related concepts are critical:
Study Period
The study period is the time interval during which study subjects are observed for the outcome of interest. It is usually measured in calendar time (for example, “congenital defects during 2008–2010”), but it can also be a single point in time defined by a specific event (for example, “at birth”).
Risk Period
The risk period is the time during which an individual could develop the disease. For some conditions, the risk period is very short (e.g., post-partum eclampsia — usually less than 2 days), while for others it is essentially lifelong (e.g., migraine headaches).
Diseases with a short risk period relative to the study period are good candidates for risk measures. Diseases with long risk periods are better suited to rate-based measures.
Counts, Proportions, Odds, and Rates
Before examining specific measures of disease frequency, it is essential to understand the four mathematical forms these measures can take:
Comparison Table
| Measure | Numerator | Denominator | Range | Units |
|---|---|---|---|---|
| Count | Cases | None | 0 to ∞ | None |
| Proportion | Subset of denominator | Total population | 0 to 1 | Dimensionless |
| Odds | Cases | Non-cases | 0 to ∞ | Dimensionless |
| Rate | Cases | Person-time at risk | 0 to ∞ | Per person-time |
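The four forms in the table can be sketched from a single set of numbers. This is a minimal illustration only: the cohort size, case count, and person-time below are hypothetical values chosen for easy arithmetic, not data from the lesson.

```python
# Hypothetical cohort (illustrative numbers, not from the lesson):
# 100 people followed, 20 become cases, 90 person-years at risk in total.
cases = 20
population = 100
person_years = 90.0

count = cases                          # no denominator
proportion = cases / population        # numerator is a subset of the denominator
odds = cases / (population - cases)    # cases versus non-cases
rate = cases / person_years            # cases per person-year at risk

print(f"count      = {count}")
print(f"proportion = {proportion:.3f}")
print(f"odds       = {odds:.3f}")
print(f"rate       = {rate:.3f} per person-year")
```

Note that the proportion and the odds are computed from the same people, while the rate needs the additional person-time information.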
Common Terminology Confusion
The term “rate” is often used loosely to refer to all types of measures of disease frequency. Similarly, people commonly say someone has a high “chance” or “risk” of disease when the underlying measure might actually be a rate. Being precise about these terms helps avoid misinterpretation of research findings.
Key Takeaways
- Disease frequency measurement underpins surveillance, research, and outbreak investigation.
- The study period and risk period influence which type of measure is most appropriate.
- Counts, proportions, odds, and rates are the four mathematical forms; each has distinct properties.
- A rate strictly requires person-time in the denominator, though the term is often used loosely.
1. What distinguishes a proportion from odds?
2. A disease with a long risk period relative to the study period is best measured using:
3. Why are simple counts of limited use in epidemiologic research?
Incidence: Risk & Rate
⏱ Estimated reading time: 15 minutes
Learning Objectives
- Define incidence and explain the four ways to express it.
- Distinguish between incidence risk and incidence rate.
- Compare closed and open populations and how they affect measurement.
- Calculate incidence risk and incidence rate from provided data.
What Is Incidence?
Incidence relates to the number of new events (e.g., new cases of a disease) in a defined population within a specific period. Because it deals with new cases, incidence is used to identify factors associated with becoming ill. A clear case definition is essential, along with a surveillance programme capable of identifying all cases.
First Cases vs. All Cases
Although incidence deals with “new cases,” this does not necessarily mean only the first-ever case within an individual. For some conditions (e.g., migraine headaches, recurrent infections), multiple episodes can occur. Researchers must decide whether to count only first cases or all new episodes, and clearly state this in their methodology.
Four Ways to Express Incidence
Incident times are the specific times at which cases occur, measured as elapsed time since a reference event (e.g., days after exposure to a toxin, or days after parturition). Incident times form the basis of survival analysis and are discussed at length in more advanced topics.
The incidence count is the simple count of new cases observed. It is often used when a disease did not previously exist or was very rare in a population. Incidence counts are sometimes expressed as absolute rates, relating the number of cases to the time period of observation (e.g., “3 cases per year”). They have limited value unless combined with population-at-risk data.
Incidence risk (R) is the probability that an individual will contract or develop a disease in a defined time period. It is a proportion (dimensionless, ranges from 0 to 1) and the time period must be specified. Only the first occurrence in the period of interest counts. Incidence risk is sometimes called cumulative incidence.
R is used in studies focused on individual-level predictions (e.g., the probability of breast cancer recurrence within the next year is 14%).
Incidence rate (I) is the number of new cases per unit of person-time during a given time period. It has units of 1/time and is positive without an upper bound. It is also called incidence density.
I is used to determine what factors are related to diseases and the effects of those diseases. It is the preferred measure for open populations where individuals enter and leave over time.
Calculating Risk
Population at Risk: Closed vs. Open
Estimating the population at risk can be challenging. The key distinction is whether the population is closed or open.
Closed Population
A closed population has no additions and few to no losses during the study period. Examples include residents of a nursing home followed for a year, or women followed for one week post-partum.
Only disease-free individuals at the start are considered at risk. People lost to follow-up are called withdrawals, and the simplest correction is to subtract half the number of withdrawals from the population at risk (assuming they leave, on average, halfway through the study).
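The half-withdrawal correction described above can be sketched as follows. The function name and the cohort numbers are hypothetical, and the sketch assumes risk is computed as new cases divided by the starting disease-free population minus half the withdrawals.

```python
def incidence_risk(new_cases, at_risk_start, withdrawals=0):
    """Risk in a closed population with the half-withdrawal correction.

    Assumes withdrawals leave, on average, halfway through the study,
    so each one contributes half a person to the population at risk.
    """
    effective_at_risk = at_risk_start - withdrawals / 2
    if effective_at_risk <= 0:
        raise ValueError("effective population at risk must be positive")
    return new_cases / effective_at_risk


# Hypothetical cohort: 100 disease-free people, 10 new cases, 8 withdrawals.
risk = incidence_risk(new_cases=10, at_risk_start=100, withdrawals=8)
print(f"R = {risk:.3f}")
```

Ignoring the withdrawals here would give 10/100; the correction raises the estimate slightly because fewer people were effectively at risk.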
Open Population
An open population has individuals entering and leaving throughout the study period. For example, women served by a cancer centre who had mastectomies form an open population as new patients continually arrive.
An open population is considered stable (stationary) if the rate of additions and withdrawals, and the distribution of host attributes, remain relatively constant over time.
Risk cannot be computed directly from an open population. Instead, it can be estimated from the incidence rate (I) using the relationship between risk and rate.
Calculating Incidence Rates
A person-time unit is one person observed for a defined period (e.g., one person-month, one person-year). After an individual contracts the disease of interest, they are no longer at risk and no longer contribute person-time to the denominator (for first-case analyses).
Worked Example: Exact Incidence Rate
Assume 4 previously healthy people were observed for exactly 1 month (30 days):
- Person 1: not sick at all → 1.00 person-months at risk
- Person 2: sick on day 10 → 0.33 person-months at risk
- Person 3: sick on day 20 → 0.67 person-months at risk
- Person 4: moved away on day 15 (lost to follow-up) → 0.50 person-months at risk
Total person-months at risk = 2.50
Total new cases = 2
I = 2 / 2.50 = 0.80 cases per person-month
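The same calculation can be written as a short script. This is a sketch of the worked example above; the data structure and names are illustrative choices, not part of the lesson.

```python
STUDY_DAYS = 30  # length of the observation window, in days

# exit_day is the day a person stopped being at risk (illness or loss to
# follow-up); None means the person stayed at risk for the whole window.
subjects = [
    {"exit_day": None, "became_case": False},  # Person 1: never sick
    {"exit_day": 10, "became_case": True},     # Person 2: sick on day 10
    {"exit_day": 20, "became_case": True},     # Person 3: sick on day 20
    {"exit_day": 15, "became_case": False},    # Person 4: moved away on day 15
]


def person_months(exit_day):
    """Convert days at risk into person-months for a 30-day month."""
    days = STUDY_DAYS if exit_day is None else exit_day
    return days / STUDY_DAYS


total_time = sum(person_months(s["exit_day"]) for s in subjects)
new_cases = sum(s["became_case"] for s in subjects)
incidence_rate = new_cases / total_time

print(f"{total_time:.2f} person-months, {new_cases} cases, I = {incidence_rate:.2f}")
```

Person 4 contributes time to the denominator but no case to the numerator, which is how loss to follow-up is handled in an exact calculation.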
Approximate Calculation
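When exact person-time data are unavailable, an approximate formula can be used:
$$I \approx \frac{A}{\left(N - \tfrac{1}{2}A - \tfrac{1}{2}W + \tfrac{1}{2}N_{add}\right) \times \Delta t}$$
where A is the number of new cases, N is the number at risk at the start, W is the number of withdrawals, $N_{add}$ is the number of additions, and Δt is the length of the study period.
This assumes that, on average, events (illness, withdrawal, addition) occur at the midpoint of the study period.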
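A minimal sketch of the midpoint approximation follows. It assumes the denominator is the starting population at risk, minus half the cases and half the withdrawals, plus half the additions, multiplied by the length of the study period. The function name and the numbers are hypothetical.

```python
def approximate_incidence_rate(cases, at_risk_start, withdrawals, additions, duration):
    """Approximate incidence rate when exact person-time is unknown.

    Assumes every illness, withdrawal, and addition occurs at the midpoint
    of the study period, so each counts for half of the period.
    """
    average_at_risk = at_risk_start - cases / 2 - withdrawals / 2 + additions / 2
    return cases / (average_at_risk * duration)


# Hypothetical study: 100 at risk at the start, 10 cases, 6 withdrawals,
# 4 additions, followed for 2 months.
rate = approximate_incidence_rate(
    cases=10, at_risk_start=100, withdrawals=6, additions=4, duration=2
)
print(f"I = {rate:.4f} cases per person-month")
```

The units of the result follow the units of `duration`: months in gives cases per person-month out.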
Relationship Between Risk and Rate
Risk and rate are mathematically related. For a closed population:
$$R \approx I \times \Delta t$$
More precisely, the relationship is:
$$R = 1 - e^{-I \times \Delta t}$$
This exponential formula accounts for the fact that as people become diseased, they leave the at-risk pool. When the product I×Δt is small (less than 0.1), the simpler linear approximation works well.
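To see how quickly the linear approximation drifts from the exponential formula, the two can be compared over a range of rates. This is an illustrative sketch; the rates chosen and the one-unit time period are arbitrary.

```python
import math


def risk_from_rate(rate, duration):
    """Exponential formula: R = 1 - exp(-I * delta_t)."""
    return 1 - math.exp(-rate * duration)


duration = 1.0  # one time unit, matching the unit of the rate
for rate in (0.01, 0.05, 0.1, 0.5, 1.0):
    exact = risk_from_rate(rate, duration)
    linear = rate * duration  # R is approximately I * delta_t
    print(f"I*dt = {rate * duration:4.2f}  exact R = {exact:.4f}  "
          f"linear R = {linear:.4f}  difference = {linear - exact:.4f}")
```

The linear value always overstates the risk, and at larger values of I×Δt it can reach or exceed 1, which is impossible for a proportion.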
Reflection
A nursing home with 80 residents experiences 12 cases of influenza over a 3-month winter period. Five residents were transferred out during the study (assume at the midpoint), and 3 new residents were admitted (also at the midpoint). Estimate the incidence rate using the approximate formula. Would incidence risk or incidence rate be more appropriate here, and why?
Key Takeaways
- Incidence measures new cases; it can be expressed as incident times, counts, risk (R), or rate (I).
- Risk is a proportion (0 to 1) measured in closed populations; rate uses person-time denominators and works for open populations.
- Withdrawals are handled by subtracting half their number from the population at risk.
- Risk and rate are related: $R = 1 - e^{-I \Delta t}$; for small values, $R \approx I \times \Delta t$.
1. Incidence risk (R) is best described as:
2. In a closed population study, how are withdrawals typically handled when calculating risk?
3. What units does incidence rate (I) have?
Prevalence, Mortality & Other Frequency Measures
⏱ Estimated reading time: 14 minutes
Learning Objectives
- Define prevalence and explain how it differs from incidence.
- Describe the relationship between prevalence, incidence, and disease duration.
- Explain mortality statistics and cause-specific measures.
- Distinguish attack rates, secondary attack rates, case fatality rates, and proportional morbidity/mortality rates.
Prevalence
Unlike incidence, which counts new cases, prevalence measures the number of existing cases of disease at a specific point in time (or over a period). The prevalence count is the number of individuals in a population who have an attribute or disease at a particular time.
Prevalence (P) expresses this count as a proportion of the individuals examined. For example, if 75 athletes are tested for performance-enhancing drug use and 3 test positive, then P = 3/75 = 0.04 (4%).
Relationship Between Prevalence and Incidence
In a stable population where the incidence rate (I) remains constant, prevalence (P), incidence, and mean disease duration (D) are related:
$$P = \frac{I \times D}{I \times D + 1}$$
Worked Example
If the incidence rate of influenza in an urban population is 0.3/person-year (30 cases per 100 people per year) and the mean duration of infection is 3 weeks (0.058 years), then:
P = (0.3 × 0.058) / (0.3 × 0.058 + 1) = 0.0174 / 1.0174 = 0.017 (1.7%)
So on any given day, we would expect about 1.7% of the population to have the flu.
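The worked example can be reproduced with a few lines of code. This sketch uses the relationship P = I×D / (I×D + 1) from the calculation above; the function name is an illustrative choice, and the duration is taken as 3/52 years rather than the rounded 0.058.

```python
def steady_state_prevalence(incidence_rate, mean_duration):
    """P = I*D / (I*D + 1), with I and D in matching time units."""
    product = incidence_rate * mean_duration
    return product / (product + 1)


incidence_rate = 0.3      # cases per person-year
duration_years = 3 / 52   # 3 weeks expressed in years

prevalence = steady_state_prevalence(incidence_rate, duration_years)
print(f"P = {prevalence:.3f}")
```

The incidence rate and the duration must be in the same time unit (here, years); mixing weeks and years is the most likely source of error in this calculation.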
Prevalence vs. Incidence: When to Use Which?
Prevalence (P) is less useful than incidence rate (I) for research into risk factors, because factors that affect either the occurrence of disease or its duration will both influence prevalence. A disease may have high prevalence simply because people survive with it for a long time, not because new cases are frequent.
However, serial prevalence studies can be used to estimate incidence rates for diseases that are not easily detected at onset — for example, by testing blood samples at regular intervals to determine when individuals become infected.
Mortality Statistics
Mortality statistics are calculated in exactly the same way as P, R, and I, except that the outcome of interest is death. The term mortality rate, strictly speaking, refers to the incidence rate of mortality, but it is often used to describe the risk of mortality.
Overall Mortality Rate
The overall mortality rate describes the number of individuals that die from all causes in a defined time period. It is analogous to incidence rate (I) except that the outcome is death rather than disease onset.
Cause-Specific Mortality Rate
The cause-specific mortality rate describes the number of individuals that die from (or with) a specific disease during a defined period. It is calculated the same way as I but focuses on deaths attributed to one disease.
Challenges of Cause Attribution
It is often difficult to determine the specific cause of death. For example, if a recumbent patient regurgitates, contracts aspiration pneumonia, and then dies, did they die from the initial condition that caused recumbency, from pneumonia, or merely with pneumonia? The “cause” is usually deemed to be the proximate cause (the factor considered the final trigger), but this determination can be challenging.
Other Measures of Disease Frequency
Several additional frequency measures appear frequently in epidemiological literature. Most of these are technically measures of risk (proportions), even though they are often called “rates.”
- Attack rate
- Secondary attack rate
- Case fatality rate
- Proportional morbidity/mortality rate
Reflection
A community of 5,000 people experiences a foodborne illness outbreak. Over two weeks, 200 people fall ill, and 8 of them die. Calculate the attack rate and the case fatality rate. In your own words, explain what each measure tells us about the outbreak.
Key Takeaways
- Prevalence measures existing cases at a point in time; it is influenced by both incidence and disease duration.
- The relationship P = I×D / (I×D+1) connects prevalence to incidence and duration.
- Mortality statistics use the same formulas as morbidity measures, with death as the outcome.
- Attack rates, secondary attack rates, case fatality rates, and proportional morbidity/mortality rates are specialised measures for specific contexts.
1. Why is prevalence less useful than incidence for identifying risk factors?
2. The case fatality rate is best described as:
3. In the prevalence-incidence relationship P = I×D / (I×D+1), what does D represent?
Standardisation & Confidence Intervals
⏱ Estimated reading time: 15 minutes
Learning Objectives
- Explain why standardisation of risks and rates is necessary.
- Distinguish between indirect and direct standardisation.
- Calculate standard errors and confidence intervals for proportions and rates.
Why Standardise?
When comparing disease frequency between populations, differences in host characteristics (e.g., age, sex, geographic location) can confound the comparison. For example, a population with a larger share of elderly individuals will tend to have higher crude rates of many diseases, even if its age-specific rates are no different. Standardisation adjusts for these confounders, allowing for fairer comparisons.
Key Concept: Confounding by Host Factors
A population can be divided into strata based on one or more host characteristics. The overall frequency of disease depends on both the size of each stratum ($H_j$) and the stratum-specific rates or risks ($I_j$ or $R_j$). If two populations have different age distributions, their crude rates may differ even if their age-specific rates are identical. Standardisation removes this confounding effect.
Indirect Standardisation
Indirect standardisation uses standard (reference) population rates applied to the study population’s structure to calculate the expected number of cases.
Obtain a set of stratum-specific rates ($I_{sj}$) from a reference or standard population. For example, national age-specific disease rates.
Apply those standard rates to the study population’s time-at-risk in each stratum. The expected adjusted rate ($I_e$) equals the sum of $H_j \times I_{sj}$, where $H_j$ is the proportion of the study population in stratum j. The expected number of cases is $E = T \times I_e$, where T is the total time at risk.
The Standardised Morbidity (or Mortality) Ratio (SMR) is the ratio of observed cases (A) to expected cases (E): SMR = A / E.
An SMR > 1 means more cases were observed than expected; SMR < 1 means fewer. The indirect standardised rate is: $I_{ind} = I_s \times SMR$, where $I_s$ is the overall rate in the standard population.
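The indirect standardisation steps above can be sketched with two age strata. All of the numbers below (standard rates, person-years, observed cases) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical inputs (not from the lesson).
standard_rates = {"young": 0.02, "old": 0.10}        # I_sj, cases per person-year
overall_standard_rate = 0.05                         # I_s in the standard population
study_person_years = {"young": 600.0, "old": 400.0}  # time at risk per stratum
observed_cases = 60                                  # A

total_time = sum(study_person_years.values())        # T

# Expected rate I_e: standard rates weighted by the study population's
# share of time at risk in each stratum (H_j).
expected_rate = sum(
    (study_person_years[stratum] / total_time) * standard_rates[stratum]
    for stratum in standard_rates
)
expected_cases = total_time * expected_rate          # E = T * I_e
smr = observed_cases / expected_cases                # SMR = A / E
indirect_rate = overall_standard_rate * smr          # I_ind = I_s * SMR

print(f"I_e = {expected_rate:.3f}, E = {expected_cases:.1f}")
print(f"SMR = {smr:.3f}, I_ind = {indirect_rate:.4f}")
```

Only the study population's structure and total case count are needed here; its own stratum-specific rates never enter the calculation.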
Direct Standardisation
Direct standardisation uses the study population’s stratum-specific rates applied to a standard population distribution:
$$I_{dir} = \sum_j T_{sj} \times I_j$$
where $T_{sj}$ is the proportion of person-time in the standard population assigned to stratum j, and $I_j$ is the observed rate in the study population for that stratum.
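A minimal sketch of direct standardisation follows, assuming the directly standardised rate is the sum of the study population's stratum-specific rates weighted by the standard population's share of person-time in each stratum. The rates and weights are hypothetical.

```python
import math

# Hypothetical inputs (not from the lesson).
study_rates = {"young": 0.03, "old": 0.12}      # I_j, observed in the study population
standard_weights = {"young": 0.7, "old": 0.3}   # T_sj, share of standard person-time

# The weights describe a distribution, so they should sum to 1.
if not math.isclose(sum(standard_weights.values()), 1.0):
    raise ValueError("standard population weights must sum to 1")

direct_rate = sum(
    standard_weights[stratum] * study_rates[stratum] for stratum in study_rates
)
print(f"Directly standardised rate = {direct_rate:.3f} cases per person-year")
```

Two study populations standardised to the same weights can then be compared directly, because any difference can no longer come from their age structures.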
Indirect vs. Direct: Key Differences
Indirect standardisation is useful when stratum-specific rates for the study population are unavailable or based on small samples. It produces the SMR.
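Direct standardisation applies the study’s own stratum-specific rates to a standard distribution. A drawback is that each stratum-specific rate is weighted by the standard distribution regardless of its precision, so rates based on only a few cases can make the standardised rate unstable.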
Both methods are commonly used in practice. Age standardisation of cancer statistics is one well-known application (e.g., rates published by cancer registries).
Scenario: Comparing GI Disease Across Regions
Three geographic regions report the following GI disease data. Age distributions differ across regions. Region 1 has a crude risk of 0.136, Region 2 has 0.125, and Region 3 has 0.088. After indirect standardisation, the SMRs are 1.12, 1.03, and 0.74 respectively.
Region 1’s observed risk was higher than expected (SMR > 1), while Region 3 was lower (SMR < 1). What does this suggest about the role of age distribution versus true disease risk in these regions?
Standard Errors and Confidence Intervals
When estimating any proportion or rate, we need a measure of precision. The standard error (SE) quantifies the uncertainty around our estimate.
SE for a Proportion
$$SE(p) = \sqrt{\frac{p(1 - p)}{N}}$$
where p is the estimated proportion and N is the sample size.
SE for an Incidence Rate
$$SE(I) = \frac{\sqrt{A}}{t}$$
where A is the number of cases and t is the total person-time at risk.
Confidence Intervals
Approximate confidence intervals are calculated as:
$$\theta \pm Z_{\alpha} \times SE(\theta)$$
where θ is the estimate and $Z_{\alpha}$ is the appropriate percentile of the standard normal distribution (e.g., 1.96 for a 95% CI).
When Approximate CIs May Mislead
In small samples, or when the frequency of disease is very low or very high, approximate CIs can produce misleading results (e.g., negative lower limits). In such cases, exact CIs based on the binomial distribution (for proportions) or the Poisson distribution (for rates) are more appropriate.
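The sketch below computes approximate 95% intervals for a proportion and for a rate. It assumes the usual normal-approximation standard errors, SE(p) = √(p(1 − p)/N) and SE(I) = √A / t. The function names are illustrative; the inputs reuse the athlete prevalence example (3 of 75) and the four-person incidence rate example (2 cases in 2.5 person-months) from earlier in the lesson.

```python
import math

Z_95 = 1.96  # standard normal percentile for a 95% interval


def proportion_ci(cases, n, z=Z_95):
    """Approximate CI for a proportion: p +/- z * sqrt(p(1-p)/N)."""
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, (p - z * se, p + z * se)


def rate_ci(cases, person_time, z=Z_95):
    """Approximate CI for an incidence rate: I +/- z * sqrt(A)/t."""
    rate = cases / person_time
    se = math.sqrt(cases) / person_time
    return rate, se, (rate - z * se, rate + z * se)


p, se_p, ci_p = proportion_ci(cases=3, n=75)
i, se_i, ci_i = rate_ci(cases=2, person_time=2.5)

print(f"P = {p:.3f}, SE = {se_p:.4f}, 95% CI = ({ci_p[0]:.4f}, {ci_p[1]:.4f})")
print(f"I = {i:.2f}, SE = {se_i:.4f}, 95% CI = ({ci_i[0]:.4f}, {ci_i[1]:.4f})")
```

Both lower limits come out negative, which is impossible for a proportion or a rate. With so few cases, these are exactly the situations in which an exact binomial or Poisson interval would be the better choice.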
Reflection
In a study of 200 individuals, 30 develop the disease of interest. Calculate the incidence risk and its 95% confidence interval. Then discuss: how would the width of the confidence interval change if the sample size were 2,000 instead of 200?
Key Takeaways
- Standardisation adjusts for confounding by host characteristics when comparing rates across populations.
- Indirect standardisation produces the SMR (observed/expected); direct standardisation applies study-specific rates to a standard population.
- Standard errors measure precision; 95% CIs are typically calculated as estimate ± 1.96 × SE.
- Use exact CIs (binomial or Poisson) for small samples or extreme frequencies.
1. What does a Standardised Morbidity Ratio (SMR) greater than 1 indicate?
2. When is indirect standardisation preferred over direct standardisation?
3. The 95% confidence interval for a proportion is calculated as:
Final Review & Assessment
⏱ Estimated time: 20 minutes
Lesson Summary
This lesson covered the fundamental measures used to quantify disease frequency in epidemiology. Here is a review of the key concepts:
Section 1: Foundational Concepts
Disease frequency measurement underpins all epidemiological activities. The four mathematical forms — counts, proportions, odds, and rates — each have distinct properties. The study period and risk period determine which measure is most appropriate for a given situation.
Section 2: Incidence — Risk & Rate
Incidence measures new cases. Incidence risk (R) is a proportion ranging from 0 to 1 for closed populations. Incidence rate (I) uses person-time denominators and works for open populations. Risk and rate are related by the formula $R = 1 - e^{-I \Delta t}$.
Section 3: Prevalence & Mortality
Prevalence measures existing cases at a point in time and is influenced by both incidence and disease duration. Mortality statistics use the same formulas with death as the outcome. Specialised measures include attack rates, secondary attack rates, case fatality rates, and proportional morbidity/mortality rates.
Section 4: Standardisation & Confidence Intervals
Standardisation adjusts for confounding by host characteristics when comparing rates. Indirect standardisation produces the SMR; direct standardisation applies study rates to a standard distribution. Standard errors and confidence intervals quantify precision around estimates.
Final Reflection
Reflection
A new disease has been identified in two regions. Region A (population 50,000, younger demographic) reports 200 cases over one year. Region B (population 30,000, older demographic) reports 150 cases over the same period. Discuss which measures of disease frequency you would calculate, why you would need to standardise before comparing the regions, and which method of standardisation you would choose.
Final Assessment
This assessment covers all material from Lesson 4. You must score 100% to complete the lesson. Review the feedback for any incorrect answers and try again.
1. What distinguishes a rate from a proportion?
2. In a proportion, the numerator is:
3. Which statement about incidence risk (R) is correct?
4. A person-time unit of “1 person-month” means:
5. In a closed population study, withdrawals are handled by:
6. The relationship $R = 1 - e^{-I \Delta t}$ connects:
7. Prevalence measures:
8. Why is prevalence affected by disease duration?
9. An “attack rate” is best described as:
10. Proportional morbidity/mortality rates are used when:
11. The purpose of rate standardisation is to:
12. In indirect standardisation, the SMR equals:
13. The standard error of a proportion increases when:
14. When should exact (rather than approximate) confidence intervals be used?
15. Which statement best summarises this lesson?