Case-Control Studies

Evaluating Epidemiological Research

Learning objectives for this lesson:

Describe the major design features of risk-based and rate-based case-control studies
Identify hypotheses and population types consistent with each design
Differentiate between primary-base and secondary-base case-control studies
Elaborate the principles used to select and define the case series
Explain the principal features for selecting controls in open and closed populations
Design and implement a valid case-control study to meet specific objectives

This course was developed by Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University.

Reference

Glossary: Key Terms, People & Concepts

📚 Reference page, available throughout the lesson

This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.

Key Concepts & Ideas

Study Base The population and time-period from which both cases and controls arise. Defining the study base sharply is the conceptual foundation of valid case-control design.

Primary-Base Design A case-control study in which the source population (the “study base”) is defined first, then cases and controls are sampled from it. Generally yields cleaner inferences because the denominator is explicit.

Secondary-Base Design A case-control study in which cases are identified first (e.g., from a clinic) and controls are then selected to represent the (implicit) population that produced those cases. More common in practice but more vulnerable to selection bias.

Open (Dynamic) Population A population whose membership changes over time as people enter and leave (e.g., the residents of a city). Person-time is the appropriate denominator.

Closed (Fixed) Population A population with a fixed membership followed over a defined period (e.g., a graduating class). Persons (rather than person-time) are the typical denominator.

Case Definition The explicit criteria a person must meet to count as a case, e.g., diagnostic codes, lab values, time window. A vague case definition undermines every later step.

Incident Case A newly diagnosed case during the study window. Usually preferred because it avoids overrepresenting long-surviving (prevalent) cases.

Prevalent Case A case identified at a point in time regardless of when diagnosed. Use is risky because survival differences can mimic exposure effects.

Control A non-case sampled from the same study base as the cases. Selected to represent the exposure distribution of the source population, not a healthy comparator.

Density (Risk-Set) Sampling A control selection scheme in which controls are sampled at the moment each case occurs, from those still at risk at that time. The resulting odds ratio estimates the incidence-rate ratio.

Cumulative (Closed-Cohort) Sampling A control selection scheme used when the source is a closed cohort: controls are sampled from non-cases at the end of follow-up. The resulting OR estimates the risk odds ratio, which approximates the risk ratio only under the rare-disease assumption.

Case-Cohort Sampling Controls are a random sample of the original cohort (the “subcohort”) chosen at baseline, regardless of who later becomes a case. Useful when the same controls support multiple outcomes (Prentice, 1986).

Odds Ratio (OR) The natural effect measure produced by a case-control study: the odds of exposure among cases divided by the odds of exposure among controls. Approximates the risk ratio when the outcome is rare (Cornfield, 1951).

Rare-Disease Assumption The assumption (typically a risk of less than about 10% over the follow-up period) that allows the OR from a cumulative case-control study to approximate the risk ratio. Many landmark case-control discoveries (for example, the link between prenatal DES and clear-cell vaginal adenocarcinoma) involve rare outcomes (Herbst, Ulfelder, & Poskanzer, 1971).

Recall Bias Differential misreporting of past exposures by cases versus controls; cases often think harder about possible causes. A characteristic threat to retrospective case-control designs (Coughlin, 1990).

Berkson's Bias A selection bias arising when both the exposure and the disease independently increase the probability of being hospitalized; using hospital controls then distorts the apparent association (Berkson, 1946).

Matching Selecting controls so that the case and control series have the same distribution of one or more variables (e.g., age, sex). Helps control confounding by those variables, but only when the analysis accounts for the matching (e.g., conditional logistic regression).

Overmatching Matching on a variable that is on the causal pathway, is an effect of the exposure, or is associated with the exposure without being a confounder. Reduces statistical efficiency or biases the estimate toward the null.

Selection Bias Systematic error introduced when cases or controls are not representative of the underlying study base on exposure. The chief vulnerability of case-control designs.

Information (Misclassification) Bias Error in the measurement of exposure or outcome. Differential misclassification (e.g., recall bias) is especially damaging because it can bias either toward or away from the null.

Confounding A distortion of the exposure–outcome association by a third variable associated with both. Must be addressed by matching, restriction, stratification, or multivariable adjustment.

Methods & Study Designs

Case-Control Study A study that samples on outcome: a series of people with the disease (cases) and a series without (controls), then compares their past exposures. Efficient for rare or long-latency diseases.

Risk-Based (Cumulative) Case-Control A case-control study set in a closed population, with cumulative sampling of controls. Estimates a risk-based OR.

Rate-Based (Density) Case-Control A case-control study set in a dynamic population, with density (risk-set) sampling of controls. Estimates an incidence-rate ratio without the rare-disease assumption.

Nested Case-Control Study A case-control study sampled from within an established cohort. Combines the efficiency of case-control sampling with the rigor of cohort-based exposure measurement (Prentice, 1986).

Key People

Janet Lane-Claypon (1877–1967) British physician–epidemiologist whose 1926 study of breast cancer was one of the earliest formal case-control studies. A pioneer of comparative epidemiologic design (Press & Pharoah, 2010).

Richard Doll (1912–2005) & Austin Bradford Hill (1897–1991) British epidemiologists whose 1950 case-control study of hospital patients in London linked smoking to lung cancer; one of the most consequential observational studies in public-health history (Doll & Hill, 1950).

Kenneth Rothman (1945–) American epidemiologist whose work clarified the case-control study as sampling from an underlying study base, and who helped systematize density and risk-set sampling (Rothman & Greenland, 2005).

Olli Miettinen (1936–) Finnish epidemiologist whose theoretical work on the “study base principle” underpins the modern teaching of case-control design (Miettinen, 1976).


Section 1 of 4

Introduction & The Study Base

⏱ Estimated reading time: 15 minutes

The backward-looking logic, the study base concept, and published examples that anchor the rest of the lesson.

The core logic

Start with disease, look back at exposure

The case-control study begins after the disease has occurred. Cases and controls are assembled, and exposure histories are compared.

Cases

People who have newly developed the disease or outcome of interest.

Controls

People who have not developed the outcome, drawn from the same source population.

The design

What is a case-control study?

Cases and controls are selected based on disease status. Exposure history is then compared between the two groups.

Most case-control studies are retrospective, but the design can also be run prospectively, enrolling cases as they occur after the study begins.

Key distinction

Controls represent the source population

A case-control study is not a comparison between cases and healthy subjects. It is a comparison between cases and non-cases who would have been included as cases had they developed the outcome (Dohoo, Martin & Stryhn, 2012).

Controls are not a healthy comparison group. They stand in for the exposure distribution of the population that produced the cases.

The anchor

Primary and secondary study bases

Primary base

An enumerable, defined population, such as a province whose cases are captured by a provincial disease registry. The source population is explicit.

Secondary base

Reconstructed from a clinical setting, such as a hospital. The source population must be inferred or conceptualised.

Nested case-control studies, a third option, sit inside a defined cohort; unlike other case-control designs, they allow disease frequency to be estimated by exposure level, because the cohort's membership and follow-up are known.

Published examples

Four studies that anchor the lesson

Dorgan et al., 2010

Secondary-base, stored serum samples, breast cancer.

Dore et al., 2004

Rate-based, S. Typhimurium, three Canadian provinces.

Magura et al., 2008

Risk-based, secondary-base, prostate cancer, single hospital.

Rodrigo et al., 2011

Nested, rate-based, community study inside a randomised controlled trial.

The measure

The odds ratio from a 2×2 table

Odds ratio (cross-product)

\[ \text{OR} = \frac{a \cdot d}{b \cdot c} \]

Where a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls. Disease frequency cannot be estimated from case-control data unless the study is nested.
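To see why nesting matters, here is a minimal R sketch with invented counts. It assumes a fully enumerated closed cohort and controls drawn as a simple random sample of the cohort's non-cases at the end of follow-up, so the control sampling fraction `f` is known; all variable names are hypothetical. Without `f`, only the odds ratio can be computed; with `f`, the control counts can be scaled back up to the cohort, and risks and a risk ratio follow.

```r
# Hypothetical closed cohort; every count below is invented for illustration.
cohort_size <- 5000
n_cases     <- 100                     # all cases in the cohort are included
n_noncases  <- cohort_size - n_cases   # 4,900 people who never became cases
n_controls  <- 400                     # simple random sample of the non-cases
f <- n_controls / n_noncases           # control sampling fraction (known here)

a  <- 60;  b <- 40                     # exposed / unexposed cases
c_ <- 80;  d <- 320                    # exposed / unexposed controls (c_ avoids masking c())

# From the case-control table alone, only the odds ratio is available.
or_hat <- (a * d) / (b * c_)

# Because the cohort is enumerated, dividing control counts by f estimates
# how many exposed and unexposed non-cases the whole cohort contained.
risk_exp   <- a / (a + c_ / f)
risk_unexp <- b / (b + d / f)

round(c(OR = or_hat, risk_exposed = risk_exp,
        risk_unexposed = risk_unexp, RR = risk_exp / risk_unexp), 3)
```

With these invented numbers the OR is 6 while the estimated risk ratio is about 5.7; the gap reflects a risk among the exposed (about 6%) that is not especially rare.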

Carry forward: the study base anchors the design; controls must represent exposure in that base; the odds ratio is the only valid association measure in the standard design.

Introduction and Overview

Modern teaching of case-control design follows the “study base” framework laid out by Vandenbroucke and Pearce (2012). An earlier lesson introduced the three sampling approaches that organize observational analytic studies: cross-sectional (sample without regard to disease), case-control (sample on the disease), and cohort (sample on the exposure), and it walked through the cross-sectional design in detail. This lesson does the same work for case-control studies. The four content sections proceed from the most general design choices to the most specific: Section 1 sets up the basic logic and the concept of the study base; Section 2 covers how cases are identified and how controls are selected; Section 3 distinguishes the two main flavors (risk-based and rate-based) and shows what the odds ratio actually estimates under each; and Section 4 closes the loop on comparability, analysis, and reporting.

Two ideas from an earlier lesson carry over directly. First, the cross-sectional limitation of measuring prevalence rather than incidence is one of the things case-control studies are designed to overcome. Second, the unified-approach discipline (think experiment first, fix the design before seeing data, project forward to alternative results) applies here at least as much as it did to cross-sectional designs, and arguably more, because the choice of cases and controls creates more opportunities for things to go wrong.

Learning Objectives

Describe the fundamental logic of the case-control study design.
Distinguish between primary-base and secondary-base case-control studies.
Explain the concept of nested case-control studies.
Identify when case-control designs are performed prospectively vs. retrospectively.

What Is a Case-Control Study?

The basis of the case-control study design is to select individuals who have newly developed the disease or outcome of interest (the cases) and, as a comparison, individuals who have not developed the disease at the time of selection (the controls). We then contrast the frequency of exposure factors in the cases with the frequency of exposure factors in the controls.

▸ INTERACTIVE STORY, DETECTIVE DOLL
Walk through the Doll & Hill (1950) case-control study scene by scene.

A 7-scene reenactment of the first major case-control study: the rising lung-cancer ward, the backward-looking design, case and control interviews, the 2×2 table populating live, the tilting scale, and the moment OR ≈ 14 lands in print.

Key Distinction

A case-control study is not a comparison between a set of cases and a set of ‘healthy’ subjects. It is a comparison between a set of cases and a set of non-case subjects (people who have not developed the specific disease but may have other diseases) whose exposure to the factors of interest reflects the exposure in the source population.

The controls would have been included as cases if they had developed the outcome (disease) of interest. Most frequently, individual people are the units of interest, but the design also applies to aggregates of individuals.

Figure: The logic of case-control design: select cases and controls from the same source population, then compare their exposure histories.

Usually, case-control studies are performed retrospectively, since the outcome (usually disease) has already occurred when the study begins. However, case-control studies can also be conducted prospectively; in these, cases develop the outcome only after the study begins and are enrolled as they occur over time.

The diagram makes the logic look simple, but it conceals a hard question: which source population? In case-control studies that question has a name and three standard answers, and the rest of the lesson essentially turns on getting it right.

The Study Base

The study base is the population from which the cases and (possibly) the controls are obtained. The nature of the study base determines how controls should be selected. The three cards below introduce the standard typology: primary base, secondary base, and the special case of nested designs. Notice that they differ in how directly the source population can be enumerated, which in turn determines how easy it is to draw a valid control sample.

Primary Base

Secondary Base

Nested Design

The three study-base types make more sense when you can see them in published studies. The four examples below recur throughout the rest of the lesson; expand each one to see how the design choices were made and which combination of features (primary vs. secondary base; risk-based vs. rate-based; nested or not) the investigators chose. We will refer back to these examples by number in later sections.

Key Examples

Example 9.1: Prospective Risk-Based (Serum Estradiol & Breast Cancer)

Dorgan et al. (2010) used serum samples from a secondary-base case-control study. A total of 6,915 women who were free of cancer donated blood between 1977 and 1989. Of the 6,720 women in extended follow-up, 1,751 were identified as deceased. For each of the 117 potential cases, 2 potential controls were matched on age (±2 years), date (±1 year), and menstrual cycle day (±2 days). This is a risk-based sampling strategy. Conditional logistic regression was used to evaluate the association.

Example 9.2: Primary-Base (Salmonella Typhimurium Risk Factors)

Dore et al. (2004) conducted a rate-based study in Alberta, British Columbia, and Saskatchewan, Canada (December 1999 to November 2000). Eligible cases had diarrheal illness with S. Typhimurium isolated from stool samples. Controls were matched 1:1 on age and province of residence and were randomly selected from provincial health registries. Cases and controls were interviewed by telephone using a pre-tested, standardised questionnaire covering demographics, health history, medication use, travel history, and animal contact.

Example 9.3: Secondary-Base (Hypercholesterolemia & Prostate Cancer)

Magura et al. (2008) used a risk-based, secondary-base case-control design. Cases were men newly diagnosed with prostate cancer at Meritcare hospital between 2004 and 2006. Controls were identified from the primary-care database of the same hospital: men without cancer, aged 50–74, who had annual physicals and lipid profiles within a year. Exclusion criteria included other cancers and non-Caucasian race. The authors used a widely accepted definition of hypercholesterolemia (total cholesterol >5.17 mmol/l) and estimated odds ratios using multiple logistic regression.

Example 9.4: Nested Rate-Based (Gastroenteritis Risk Factors)

Rodrigo et al. (2011) conducted a community-based (primary-base), nested, rate-based case-control study within a larger randomised controlled trial in South Australia. A total of 300 households kept weekly health diaries. The outcome, highly credible gastroenteritis (HCG), was defined as 2+ loose stools, 2+ vomiting episodes, or combinations with abdominal pain/nausea in 24 hours. Controls were matched to cases by study week. Logistic regression was used, allowing for familial clustering and repeated observations.
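Examples 9.1 and 9.2 both matched controls to cases, and Example 9.1 analysed the data with conditional logistic regression. For the simplest version, 1:1 pair matching with a binary exposure, the conditional estimate of the odds ratio reduces to the ratio of the two kinds of discordant pairs. The R sketch below uses invented pair counts (not data from any of the examples) to show that calculation next to the crude cross-product OR that ignores the pairing.

```r
# Hypothetical 1:1 matched pairs; each row is one case-control pair.
# 1 = exposed, 0 = unexposed. All counts are invented for illustration.
pairs <- data.frame(
  case_exp    = c(rep(1, 30), rep(1, 25), rep(0, 10), rep(0, 35)),
  control_exp = c(rep(1, 30), rep(0, 25), rep(1, 10), rep(0, 35))
)

# Concordant pairs carry no information about the association;
# the matched OR uses only the discordant pairs.
n10 <- sum(pairs$case_exp == 1 & pairs$control_exp == 0)  # only the case exposed
n01 <- sum(pairs$case_exp == 0 & pairs$control_exp == 1)  # only the control exposed
or_matched <- n10 / n01
se_log     <- sqrt(1 / n10 + 1 / n01)
ci <- exp(log(or_matched) + c(-1, 1) * qnorm(0.975) * se_log)

# For comparison: the crude cross-product OR that ignores the pairing
a  <- sum(pairs$case_exp);    b <- nrow(pairs) - a
c_ <- sum(pairs$control_exp); d <- nrow(pairs) - c_
or_crude <- (a * d) / (b * c_)

round(c(matched_OR = or_matched, lower = ci[1], upper = ci[2],
        crude_OR = or_crude), 2)
```

Here the matched OR is 2.5 and the crude OR about 1.83. Because concordant pairs drop out of the matched estimate, ignoring the matching in the analysis can bias the OR, typically toward the null.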

Key Takeaways

Case-control studies select subjects based on disease status and look backward at exposure.
The study base can be a primary base (an enumerable population, e.g., one covered by a registry) or a secondary base (clinic/hospital).
Nested designs allow estimation of disease frequency by exposure, a unique advantage.
Controls should represent the exposure experience of the source population that gave rise to the cases.

Now that you can see the design's full logic and a handful of working examples, the next box brings the analysis back to the 2×2 contingency table you met at the end of an earlier lesson. The structure is the same, but because this time we sampled on case status, the only valid summary measure is the odds ratio.

First, what are “odds”?

The odds of an event are the number of times it happens divided by the number of times it does not. Take the smoking table in the R box just below. Among the 50 cases, 45 were smokers and 5 were not, so the odds of being a smoker among cases are 45 to 5, which is 9. Among the 150 controls, 60 were smokers and 90 were not, so the odds there are 60 to 90, or about 0.67. The odds ratio divides one by the other: 9 ÷ (60/90) = 9 × 1.5 = 13.5. The cross-product shortcut, (a × d) / (b × c), reaches the same value in one step, which is why it is the formula you will see.

R Compute the odds ratio from a 2×2 case-control table

What you'll do: read a fictional smoking/lung-cancer 2×2 table into R, compute the cross-product odds ratio, and add a 95% confidence interval using the Woolf method. What to take away: the OR you compute here is the standard summary measure for every case-control study you will read in this course; a later lesson will show how its meaning changes again when sampling is by exposure rather than by case status.

Case-control studies are summarized by an odds ratio (OR), the only measure of association you can compute when you've sampled by case status. The arithmetic is just the cross-product of a 2×2 table.

# Hypothetical study: 50 lung cancer cases + 150 controls; smoking status known.
#                  Smoker   Nonsmoker
# Cases (lung Ca)     45         5
# Controls            60        90

tab <- matrix(c(45, 60,    # first column (Smoke = Yes): cases, controls
                 5, 90),   # second column (Smoke = No): cases, controls
              nrow = 2, byrow = FALSE,
              dimnames = list(Status = c("Case", "Control"),
                              Smoke  = c("Yes", "No")))
tab

# Odds ratio = (a*d) / (b*c)
# c_ (not c) holds cell c so that base R's c() function is not masked
a  <- tab["Case",    "Yes"]; b <- tab["Case",    "No"]
c_ <- tab["Control", "Yes"]; d <- tab["Control", "No"]
OR <- (a * d) / (b * c_)
OR

# 95% CI from the log-OR (Woolf method)
log_se <- sqrt(1/a + 1/b + 1/c_ + 1/d)
ci <- exp(log(OR) + c(-1, 1) * 1.96 * log_se)
round(c(OR = OR, lower = ci[1], upper = ci[2]), 2)

Console output

   OR lower upper
13.50  5.07 35.97

Reading the OR. Cases had about 13.5 times the odds of being smokers compared with controls. Because we sampled on case status, this OR (not a risk ratio) is the appropriate measure. In a later course you'll meet epitools::oddsratio(), which computes an OR and confidence interval in one call; its default estimation method (median-unbiased) differs from the Woolf calculation here, so its numbers will not match exactly.
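Examples 9.3 and 9.4 estimated their odds ratios with logistic regression. As a cross-check (a sketch, not part of the original exercise), the fictional table can be expanded to one row per person and fitted with base R's `glm()`. With a single binary exposure the model is saturated, so `exp()` of the smoking coefficient should equal the cross-product OR, and the Wald interval from `confint.default()` should match the Woolf interval up to rounding (it uses `qnorm(0.975)` rather than 1.96).

```r
# Rebuild the fictional 2x2 table as individual records (one row per person).
dat <- data.frame(
  case   = c(rep(1, 45), rep(1, 5), rep(0, 60), rep(0, 90)),
  smoker = c(rep(1, 45), rep(0, 5), rep(1, 60), rep(0, 90))
)

fit <- glm(case ~ smoker, family = binomial, data = dat)

# exp(slope) is the odds ratio; Wald limits on the log scale give Woolf's CI
or_glm <- exp(coef(fit)[["smoker"]])
ci_glm <- exp(confint.default(fit)["smoker", ])
round(c(OR = or_glm, lower = ci_glm[[1]], upper = ci_glm[[2]]), 2)
```

Plain `confint()` on a glm fit returns profile-likelihood intervals instead, so its limits will differ slightly from Woolf's.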

R Reflect on what you just ran

Use the questions below to interpret the output you produced. Look at your console before answering.

1. The computed OR was 13.5 with a 95% CI of (5.07, 35.97). State in plain language what that OR means about the odds of being a smoker among lung-cancer cases versus controls. Does the CI exclude the null value of 1, and what does that tell you about statistical significance?

Model answer: The odds of being a smoker are 13.5 times higher among lung-cancer cases than among controls. The CI (5.07, 35.97) does not contain 1, so at α = 0.05 the association is statistically significant; chance alone is an unlikely explanation. The headline number is of the same order of magnitude that Doll & Hill (1950) reported, and an effect this large in a properly conducted case-control study is the textbook signal of a strong exposure-disease relationship.

2. The CI is very wide (about a sevenfold range). Looking at the four cell counts (a=45, b=5, c=60, d=90), which cell is driving the imprecision, and how does the Woolf formula sqrt(1/a + 1/b + 1/c + 1/d) make that intuitive?

Model answer: Cell b = 5 (unexposed cases, i.e., the nonsmokers among the lung-cancer cases) is the bottleneck. Woolf's formula for the SE of ln(OR) is √(1/a + 1/b + 1/c + 1/d); because 1/5 = 0.20 dominates the four reciprocals (the others are 0.022, 0.017, and 0.011), the smallest cell controls the variance. The lesson generalises: the precision of any OR is set by its sparsest cell. Here that cell is among the cases, so extra controls would help little; in studies of rare exposures, by contrast, the exposed-control cell is often the sparse one, which is why such studies need very large control samples even when the case count is adequate.

3. Why is the OR (and not a risk ratio or risk difference) the only valid measure of association here? Reference how the data were sampled.

Model answer: Case-control sampling fixes the row totals (cases and controls are selected by outcome status, not by exposure), so the proportions a/(a+c) and b/(b+d) reflect the sampling fractions, not the underlying risks. You cannot estimate the absolute risk in either exposure group, so the risk ratio and risk difference are not identifiable without external information on the source population's disease frequency. The OR is the association measure that is invariant to outcome-based sampling. That is why it became the case-control workhorse, and why the OR ≈ RR approximation (rare-disease assumption) is what licenses translation back to risk.
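The model answers to questions 2 and 3 make claims that are easy to check numerically. This R sketch reuses the fictional smoking table and adds an invented closed source population (expected counts rather than random draws, so results are deterministic; `se_log_or()` and `cc_summary()` are hypothetical helpers). Part 1 shows that cell b supplies 80% of the variance of log(OR), and that tripling the controls shrinks the standard error far less than doubling the cases. Part 2 shows that the cross-product OR stays the same whatever fraction of non-cases is sampled as controls, whereas the naive "risk" a/(a + c) does not.

```r
## Part 1 (question 2): which cell drives the imprecision of the OR?
se_log_or <- function(a, b, c, d) sqrt(1/a + 1/b + 1/c + 1/d)  # Woolf SE

cells <- c(a = 45, b = 5, c = 60, d = 90)
share <- (1 / cells) / sum(1 / cells)   # share of var(log OR) from each cell
round(share, 3)

se_base         <- se_log_or(45, 5, 60, 90)    # the fictional table
se_triple_ctrls <- se_log_or(45, 5, 180, 270)  # 3x controls, same exposure split
se_double_cases <- se_log_or(90, 10, 60, 90)   # 2x cases, same exposure split
round(c(base = se_base, triple_controls = se_triple_ctrls,
        double_cases = se_double_cases), 3)

## Part 2 (question 3): the OR ignores the control sampling fraction; "risks" do not.
# Invented closed source population
cases_exp   <- 200; noncases_exp   <- 9800    # exposed: 10,000 people
cases_unexp <- 200; noncases_unexp <- 39800   # unexposed: 40,000 people
true_RR <- (cases_exp / (cases_exp + noncases_exp)) /
           (cases_unexp / (cases_unexp + noncases_unexp))   # 0.02 / 0.005

cc_summary <- function(f) {              # f = fraction of non-cases sampled
  a  <- cases_exp;        b <- cases_unexp
  c_ <- f * noncases_exp; d <- f * noncases_unexp
  c(f = f, OR = (a * d) / (b * c_), naive_risk = a / (a + c_))
}

res <- t(sapply(c(0.01, 0.05, 0.20), cc_summary))
round(res, 3)
true_RR
```

In this invented population the disease is rare (2% and 0.5% risk), so the constant OR (about 4.06) sits close to the true risk ratio of 4, as the rare-disease assumption predicts.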


The reflection below is a personal application of the primary/secondary distinction. After working through it and the knowledge check, a later section takes the same design logic one level deeper: how do you actually identify cases, and how do you choose controls so that the comparison is fair?

Reflection

Consider a disease that is of interest to you. Would a primary-base or secondary-base case-control study be more feasible? What would be the advantages and trade-offs of each approach for your specific research question?

Model answer: For a relatively common chronic outcome (e.g., type 2 diabetes) with a well-defined catchment population (provincial health-system coverage), a primary-base design is feasible: define the source population in advance, identify cases as they arise, and sample controls from the same population at risk, usually via a population registry or random digit dialling. Advantages: a clean denominator, control selection that mirrors the case base, and defensible inference. Trade-offs: it is expensive and slow, and tracking who is in the base is hard. A secondary-base design (hospital cases plus hospital controls, or community controls without a defined base) is faster and cheaper, and is often the only feasible route for a rare cancer; the price is that case ascertainment and control selection may not represent the same underlying population, opening the door to selection bias (e.g., Berkson's bias (Berkson, 1946), referral bias, healthy-control bias). Choose on feasibility, and name the specific selection biases your design cannot avoid.


Section 2 of 4

The Case Series & Principles of Control Selection

⏱ Estimated reading time: 15 minutes

Case ascertainment, diagnostic criteria, and the four principles that make a control group valid.

Case ascertainment

Who counts as a case?

Diagnostic criteria

Specific, documented, and consistently applied. Clinical signs plus laboratory confirmation where available.

Completeness

Every effort should be made to obtain all cases from the source. In secondary-base designs, tertiary-centre cases may not represent the broader disease spectrum.

Incident vs. prevalent

Always prefer incident cases

Incident cases are newly diagnosed. Prevalent cases have been living with the disease for an unknown period.

Prevalent cases introduce survival bias: exposure patterns among survivors may reflect prognosis, not disease onset. The odds ratio then estimates something other than the exposure-disease relationship at the time of incidence.
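A back-of-the-envelope R sketch of how survival can distort a prevalent-case OR. It relies on the standard steady-state approximation (stable population, constant rates), under which the prevalence odds of a disease equal its incidence rate times its mean duration; the approximation is assumed here rather than derived in this lesson, and all numbers are invented.

```r
irr         <- 2   # exposure doubles the incidence rate (hypothetical)
dur_exposed <- 2   # mean years lived with disease, exposed cases (hypothetical)
dur_unexp   <- 8   # mean years lived with disease, unexposed cases (hypothetical)

# Steady state: prevalence odds = incidence rate x mean duration, so the
# prevalent-case OR is the incidence-rate ratio multiplied by the duration ratio.
or_prevalent <- irr * (dur_exposed / dur_unexp)
or_prevalent   # 0.5
```

An exposure that truly doubles the rate of disease onset appears protective here, simply because exposed cases die or recover four times faster and so are under-represented among people living with the disease.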

The rule

Four principles of control selection

Wacholder et al. (1992a, b, c) describe the four principles that govern valid control selection. They act together.

Controls must come from the same study base as the cases.
Controls must be sampled independently of exposure.
Controls must meet the same eligibility criteria as cases, so that each control would have been included as a case had they developed the outcome.
Controls must be at risk at the time of case occurrence (temporal eligibility).
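The fourth principle (temporal eligibility) is what density, or risk-set, sampling puts into practice. As a minimal R sketch, using an invented ten-person dynamic cohort and a hypothetical helper `risk_set()`, each case is matched at its event time to `k` controls drawn at random from everyone still under observation at that moment. A person who later becomes a case (here, persons 2 and 4) remains eligible as a control for an earlier case, and the same person can be sampled for more than one case; both are features of risk-set sampling, not errors.

```r
# Hypothetical dynamic cohort: entry and exit times in years, and whether the
# person left observation because they became a case. All values are invented.
cohort <- data.frame(
  id    = 1:10,
  entry = c(0, 0, 1, 2, 0, 3, 1, 0, 2, 4),
  exit  = c(5, 3, 6, 4, 8, 7, 2, 6, 9, 8),
  case  = c(0, 1, 0, 1, 0, 0, 1, 0, 0, 0)
)

# Hypothetical helper: ids of everyone under observation at time t_event,
# excluding the index case itself
risk_set <- function(data, t_event, case_id) {
  data$id[data$entry <= t_event & data$exit >= t_event & data$id != case_id]
}

set.seed(2024)
k <- 2  # controls sampled per case

matched_sets <- do.call(rbind, lapply(which(cohort$case == 1), function(i) {
  t_event <- cohort$exit[i]
  pool    <- risk_set(cohort, t_event, cohort$id[i])
  # sample.int() avoids sample()'s surprising behaviour when pool has length 1;
  # this sketch assumes every risk set is non-empty
  pick <- pool[sample.int(length(pool), min(k, length(pool)))]
  data.frame(case_id = cohort$id[i], time = t_event, control_id = pick)
}))
matched_sets
```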

Control sources

Common sources and their trade-offs

Population controls

Most representative. Low response rates, recall differences from cases.

Hospital controls

Cooperative; similar recall. Hospitalisation may itself be linked to exposure.

Friend / neighbour controls

Demographic similarity. Risk of over-matching on the exposures of interest.

Random digit dialling

Population representative. Declining response; mobile-only households.

Carry forward

Cases and controls are one linked decision

Incident cases are strongly preferred; prevalent cases change what the odds ratio means.
Controls must come from the same study base and be sampled independently of exposure.
Every control source involves a trade-off; there is no universally superior option.
A later section adds the time dimension: how sampling timing changes what the odds ratio estimates.

Introduction and Overview

Section 1 set up the design at the level of the source population. This section steps inside it. The two halves of a case-control study (the case series and the control group) each carry their own design decisions, and control selection is generally where case-control studies are most vulnerable. We start with the case series, where the choices are mostly about definition and ascertainment, then turn to the harder problem of choosing controls.

Learning Objectives

Describe the key elements in selecting and defining the case series.
Discuss the importance of diagnostic criteria and case ascertainment.
Articulate the four major principles of control selection.
Compare different sources of controls and their strengths and limitations.

The Case Series (Section 9.3)

Key elements in selecting the case series include: specifying the disease (including diagnostic criteria), identifying the source(s) of the cases, deciding whether only incident or both incident and prevalent cases are to be included, and estimating the required number of cases and total sample size.

Incident vs. Prevalent Cases

There is virtually unanimous agreement that, when possible, only incident cases should be used. There are specific circumstances where prevalent cases may be justified, but this would be the exception, not the rule. Usually, only the first occurrence of the outcome in each study subject is included (Examples 9.1 and 9.3); however, multiple occurrences of the same disease can be included (Example 9.4).

Where Do Cases Come From?

The primary/secondary distinction we met in Section 1 also shapes how cases themselves are identified. The two options below revisit it with the case series specifically in view; notice how the source choice creates a very different downstream problem for control selection.

Primary-base cases come from a specific registry that contains virtually all cases for a defined population (e.g., provincial or state disease registries). Sampling or taking a census of cases directly from the primary source population avoids a number of potential selection biases, but may be more difficult to implement and more costly.

Primary-base designs are moderately common because provincial or state records allow complete enumeration of people and their health events.

Secondary-base cases are obtained from a physician’s clinic, one or more hospitals, or registries. A major challenge is to conceptualise the actual source population from which the cases arose. A common solution is to select controls from records at the same source (e.g., the same hospital; see Example 9.3).

Every effort should be made to obtain complete case ascertainment. In secondary-base studies, the set of cases from a tertiary care facility could become increasingly different from cases in the broader source population.

Diagnostic Criteria

The diagnostic criteria for a subject to become a case should include specific, well-defined manifestational (i.e., clinical) signs where appropriate and, when possible, clearly documented diagnostic criteria (e.g., laboratory test results) that can be applied to all study subjects in a uniform manner. In some instances, it might be desirable to subdivide the case series into subgroups based on differences in disease characteristics.

Diagnostic criteria settle who counts as a case. The harder question (the one that makes or breaks a case-control study) is who counts as an appropriate comparison. That question occupies the rest of this section.

Principles of Control Selection (Section 9.4)

The selection of appropriate controls is often one of the most difficult aspects of a case-control design. The key guideline is that controls should be representative of the exposure experience in the population which gave rise to the cases.

The Four Major Principles

Wacholder, McLaughlin, Silverman, & Mandel (1992a; 1992b; 1992c) provide the classic discussions of control selection. The four principles below act together (not independently) to ensure that the controls' exposure experience really does mirror the population that gave rise to the cases.

Same Study Base

Closed Population Rule

Open Population Rule

Eligibility Period

Sources of Controls

The four principles tell you what valid controls look like in the abstract. In practice, every choice of control source trades a different strength against a different bias. The table below catalogues the six most common sources; the column to read most carefully is the third one, because the limitation of each source is exactly the kind of bias that source most often produces.

Source	Strengths	Limitations
Population controls	Representative of source population	Low response rates; recall bias; less motivated
Hospital controls	Accessible; cooperative; similar recall ability	Exposure may be related to hospitalisation
Friend controls	Similar recall; willing to participate	Over-matching; biased estimates (Bunin et al, 2011)
Neighbourhood controls	Similar socioeconomic background	If neighbourhood related to exposure, causes bias
Random digit dialling (RDD)	Population-representative sampling	Business vs. home phone issues; declining response rates
Partner controls	Shared environment; cooperative	Age-sex distribution differs; over-matching on exposures

Key Takeaways

Incident cases are strongly preferred over prevalent cases.
Cases can come from primary bases (registries) or secondary bases (clinics/hospitals).
Controls must represent the exposure experience of the source population.
The four key principles: same study base, the closed-population rule, the open-population rule, and the eligibility period.

The reflection below asks you to make a real control-selection decision and defend it against the biases the table above just named. Once you have done that and the knowledge check, a later section turns to a parallel choice: should the design be risk-based or rate-based, and how does that decision change what the odds ratio actually estimates?

Reflection

Imagine you are studying whether a specific dietary factor is associated with colorectal cancer. You plan to recruit cases from a hospital. What type of control group would you select (hospital, population, friend, etc.) and why? What biases might arise from your choice?

Model answer

For hospital colorectal-cancer cases, population controls are usually the most defensible choice if you have a registry to draw them from, because they reflect the diet of the source population the cases came from. Hospital controls (admitted for other conditions) risk distortion because patients admitted for other reasons can have systematically different diets (alcohol-related admissions, GI conditions, frailty); depending on the comparator, the effect estimate can be pulled toward or away from the null. Friend or spouse controls share the cases' environment but do not necessarily come from the same source population, and they tend to over-match on diet, diluting the exposure contrast. Specific biases to name: Berkson bias (hospitalisation related to both exposure and outcome), control-disease bias (controls drawn from disease groups that are themselves affected by diet), and selection bias from non-population sampling. State explicitly which biases you accept and which you can address with sensitivity analysis.


Section 3

Controls in Risk-Based & Rate-Based Designs

⏱ Estimated reading time: 15 minutes


Closed and open populations, incidence density sampling, and what the odds ratio estimates under each design.

Closed populations

Risk-based design and Equation 9.1

Risk-based odds ratio (Eq 9.1)

\[ \color{#0B7B6B}{\text{OR}} = \frac{\color{#C2410C}{a}/\color{#6D28D9}{b}}{\color{#1D4ED8}{c}/\color{#BE185D}{d}} = \frac{\color{#C2410C}{a} \cdot \color{#BE185D}{d}}{\color{#6D28D9}{b} \cdot \color{#1D4ED8}{c}} \]

OR: odds ratio; a: exposed cases; b: unexposed cases; c: exposed controls; d: unexposed controls

Controls are sampled at the end of follow-up from people who remained disease-free. The odds ratio estimates the risk ratio when disease frequency is low (below roughly 10 percent) in the source population.

Best suited to outbreak investigations and short, closed cohorts.

Open populations

Rate-based design: incidence density sampling

At each case occurrence, controls are sampled from the current risk set. A person who serves as a control can later become a case.

What the OR estimates

Rate-based OR estimates the incidence rate ratio

Rate-based design (Eqs 9.2–9.5)

\[ \color{#0B7B6B}{\text{OR}_{\text{rate-based}}} = \frac{\color{#C2410C}{I_1}}{\color{#6D28D9}{I_0}} = \text{incidence rate ratio} \]

OR_rate-based: rate-based odds ratio; I₁: incidence rate in the exposed; I₀: incidence rate in the unexposed

Risk-based

OR estimates RR only when disease frequency is low (below approximately 10 percent). Rarity assumption required.

Rate-based

OR estimates the incidence rate ratio directly. No rarity assumption needed.

Practical properties

Advantages of incidence density sampling

No need to measure person-time at risk for potential controls.
No assumption that the population is stable over time.
The number of controls per case can vary across the study timeline.
An initial control can later become a case if they develop the outcome.

These properties make the rate-based design flexible and well-suited to most real epidemiological populations, which are open.

Carry forward

Design complete; execution next

Risk-based: closed populations, controls at end of follow-up; OR estimates RR when outcome is rare.
Rate-based: open populations, incidence density sampling; OR estimates incidence rate ratio without rarity assumption.
The same two-by-two table arithmetic produces both; interpretation depends on how controls were sampled.

Introduction and Overview

Earlier sections settled how the case series and the control group are assembled. The remaining design choice is how time enters the picture: specifically, whether the controls are people who survived the whole study period without becoming cases (a closed-population, risk-based design) or people sampled from the population at the moment each case occurs (an open-population, rate-based design). That choice changes what the odds ratio you eventually compute actually means. The two halves of this section unpack each design in turn, each ending in its 2×2 table and the equations that connect it to risk or rate ratios.

Learning Objectives

Describe the data layout and sampling approach for risk-based case-control studies.
Derive and interpret the odds ratio (OR) in a risk-based design (Eq 9.1).
Describe the data layout and incidence density sampling for rate-based case-control studies.
Explain why the OR estimates the risk ratio in risk-based designs and the rate ratio in rate-based designs.

Risk-Based Case-Control Designs (Section 9.5)

The traditional approach to case-control studies has been the risk-based (cumulative incidence) design. Controls are selected from among the people who did not become cases by the end of the study period. A subject can be selected as a control only once.

Design Requirements

This design is appropriate if the population is closed and is most informative if the risk period for the outcome has ended before subject selection begins. It fits situations such as outbreaks from infectious or toxic agents where the risk period is short and essentially all cases have occurred within the defined study period.

2×2 Table: Risk-Based Case-Control Design

The closed source population can be categorised with respect to exposure and outcome (upper case = population, lower case = sample):

	Exposed	Non-exposed	Total
Cases	a₁	a₀	m₁
Controls (Non-cases)	b₁	b₀	m₀

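The cases (M₁) are those that arose during the study period, while the controls (M₀) are those that remained free of the outcome. Usually all or most cases are included (the sampling fraction among cases approaches 1). We select controls independently of exposure status, so the control sampling fractions (sf) in the two exposure groups should be equal:

\[ \color{#0B7B6B}{sf} = \frac{\color{#6D28D9}{b_1}}{\color{#1D4ED8}{B_1}} \approx \frac{\color{#BE185D}{b_0}}{\color{#1D4ED8}{B_0}} \]

where B₁ and B₀ are the exposed and non-exposed non-cases in the source population.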

The measure of association in risk-based designs is the odds ratio (OR):

Eq 9.1

\[ \color{#0B7B6B}{OR} = \frac{\color{#C2410C}{a_1}/\color{#1D4ED8}{a_0}}{\color{#6D28D9}{b_1}/\color{#BE185D}{b_0}} = \frac{\color{#C2410C}{a_1}\,\color{#BE185D}{b_0}}{\color{#6D28D9}{b_1}\,\color{#1D4ED8}{a_0}} \]

The odds ratio compares the exposure odds among cases (a₁/a₀) with the exposure odds among controls (b₁/b₀), which reduces to the cross-product of the four cell counts.
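As a check on Eq 9.1, here is a minimal R sketch using a hypothetical closed source population. The counts A1, A0, B1 and B0 are invented for illustration, and expected rather than randomly sampled counts are used so the arithmetic is exact. It keeps every case, samples a fraction of the non-cases as controls, and compares the sample OR with the population OR, first with the same sampling fraction in both exposure groups and then with exposed non-cases over-sampled.

```r
# Hypothetical closed source population (counts are illustrative assumptions).
# Upper case = population, lower case = sample, as in the 2x2 table above.
A1 <- 30;  A0 <- 10     # cases arising among exposed / non-exposed
B1 <- 970; B0 <- 990    # non-cases at end of follow-up among exposed / non-exposed

# Population odds ratio and risk ratio, for reference
OR_pop <- (A1 / A0) / (B1 / B0)
RR_pop <- (A1 / (A1 + B1)) / (A0 / (A0 + B0))

# Eq 9.1 applied to a sample: keep all cases, take fraction sf1 of exposed
# non-cases and sf0 of non-exposed non-cases as controls (expected counts).
or_from_sample <- function(sf1, sf0) {
  a1 <- A1; a0 <- A0                 # all cases included
  b1 <- sf1 * B1; b0 <- sf0 * B0     # sampled controls
  (a1 * b0) / (b1 * a0)
}

or_equal   <- or_from_sample(0.10, 0.10)  # same fraction in both exposure groups
or_unequal <- or_from_sample(0.15, 0.10)  # exposed non-cases over-sampled

round(c(OR_pop = OR_pop, RR_pop = RR_pop,
        OR_equal_sf = or_equal, OR_unequal_sf = or_unequal), 3)
```

With equal fractions the sample OR reproduces the population OR (about 3.06), which sits close to the risk ratio of 3 because only 3% and 1% of the two groups became cases. Over-sampling exposed non-cases pulls the OR down to about 2.04, which is why controls have to be selected independently of exposure.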

What Does the OR Estimate?

The OR is a valid measure of association in its own right. It also estimates the ratio of risks (RR) if the outcome is relatively infrequent (e.g., <5%) in the source population. Whether the OR approximates the RR or rate ratio depends on the study design and assumptions about the source population (Knol, Vandenbroucke, Scott, & Egger, 2008).
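The rare-outcome condition can be checked with a few lines of R. The helper below (a hypothetical function, not part of the course script) takes the risk of the outcome in the exposed and unexposed parts of a closed source population and returns the risk ratio alongside the odds ratio that a risk-based case-control study would estimate. Both pairs of risks are illustrative assumptions.

```r
# Hypothetical helper: risk ratio vs odds ratio for given risks in a closed population
or_vs_rr <- function(risk_exposed, risk_unexposed) {
  odds <- function(p) p / (1 - p)     # convert a risk to odds
  c(RR = risk_exposed / risk_unexposed,
    OR = odds(risk_exposed) / odds(risk_unexposed))
}

rare   <- or_vs_rr(0.02, 0.01)   # rare outcome (illustrative)
common <- or_vs_rr(0.50, 0.20)   # common outcome (illustrative)
round(rbind(rare = rare, common = common), 3)
```

With risks of 2% and 1% the OR (about 2.02) is close to the RR of 2. With risks of 50% and 20% the RR is 2.5 but the OR is 4.0, so the OR no longer approximates the risk ratio.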

The risk-based design works beautifully when the population stays put for the whole study window, as in an outbreak investigation or a closed cohort with short follow-up. Most of the populations epidemiology actually studies do not behave that way: people enter, leave, age, and accumulate exposure over time. For those populations the case-control design has to be rebuilt around person-time rather than head counts.

Rate-Based Case-Control Designs (Section 9.6)

Because the populations we study are often open, the case-control designs for these populations should use a rate-based approach (incidence density sampling), which ensures that the time-at-risk is taken into account when control subjects are selected.

2×2 Table: Rate-Based Case-Control Design

	Exposed	Non-exposed	Total
Cases	A₁	A₀	M₁
Person-time at risk	T₁	T₀	T

Recall that in a cohort study, the two rates of interest would be:

Eq 9.2

\[ \color{#0B7B6B}{I_1} = \frac{\color{#C2410C}{A_1}}{\color{#1D4ED8}{T_1}} \qquad \color{#6D28D9}{I_0} = \frac{\color{#C2410C}{A_0}}{\color{#1D4ED8}{T_0}} \]

The incidence rate in the exposed and the incidence rate in the unexposed are each the new cases in that group divided by its person-time at risk.

In a rate-based case-control study, we select controls using a sampling rate (sr) that is equal in exposed and non-exposed populations:

Eq 9.3

\[ \color{#0B7B6B}{sr} = \frac{\color{#6D28D9}{b_1}}{\color{#1D4ED8}{T_1}} \approx \frac{\color{#BE185D}{b_0}}{\color{#1D4ED8}{T_0}} \]

A common control sampling rate means that exposed and unexposed controls are each drawn in the same proportion to their group's person-time. This is what lets control counts stand in for person-time.

Therefore, the ratio of exposed to unexposed controls approximately equals the ratio of the cumulative exposed and unexposed person-time:

Eq 9.4

\[ \frac{\color{#6D28D9}{b_1}}{\color{#BE185D}{b_0}} \approx \frac{\color{#1D4ED8}{T_1}}{\color{#047857}{T_0}} \]

It follows that the ratio of exposed controls to unexposed controls approximates the ratio of exposed person-time to unexposed person-time.

This means the OR from the case-control data estimates the incidence rate ratio (IR) in the source population:

Eq 9.5

\[ \frac{\color{#C2410C}{a_1}/\color{#6D28D9}{b_1}}{\color{#1D4ED8}{a_0}/\color{#BE185D}{b_0}} \approx \frac{\color{#0B7B6B}{A_1}/T_1}{\color{#0B7B6B}{A_0}/T_0} \]

Putting the pieces together, the case-control odds ratio (built from exposed cases, exposed controls, unexposed cases, and unexposed controls) approximates the underlying incidence-rate ratio in the source population.
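Substituting b₁ = sr·T₁ and b₀ = sr·T₀ (Eq 9.3) into the left-hand side of Eq 9.5 cancels sr and leaves (A₁/T₁)/(A₀/T₀) when all cases are included. The R sketch below checks that arithmetic for a hypothetical open population; the case counts, person-time, and sampling rate are assumptions chosen for round numbers, and expected control counts are used instead of a random sample.

```r
# Hypothetical open source population (illustrative assumptions)
A1 <- 60;    A0 <- 40       # new cases among exposed / non-exposed
T1 <- 20000; T0 <- 40000    # person-years at risk among exposed / non-exposed

I1  <- A1 / T1              # Eq 9.2: incidence rate in the exposed
I0  <- A0 / T0              # Eq 9.2: incidence rate in the non-exposed
IRR <- I1 / I0

sr <- 0.005                  # controls per person-year, same in both groups (Eq 9.3)
b1 <- sr * T1; b0 <- sr * T0 # expected control counts
a1 <- A1;      a0 <- A0      # all cases included

OR_cc <- (a1 / b1) / (a0 / b0)   # left-hand side of Eq 9.5

round(c(IRR = IRR, OR = OR_cc,
        control_ratio = b1 / b0, person_time_ratio = T1 / T0), 3)
```

Changing sr changes how many controls you recruit but not the OR, which is why person-time never has to be measured for the potential controls.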

Key Advantage of Rate-Based Design

In this design, the OR estimates the IR (from a cohort study) and no assumption about rarity of outcome is necessary for a valid estimate. This is a major advantage over risk-based designs where the rare disease assumption is needed for the OR to approximate the RR.

Equations 9.2–9.5 describe the relationship between the case-control sample and the underlying source-population rates. The practical question is how to draw the control sample so those equations actually hold. The answer is a specific sampling rule.

Incidence Density Sampling

The most common way of obtaining controls is to select a specified number of non-cases from the risk set, matched in time to the occurrence of each case. This is called incidence density sampling. Each time a subject develops the outcome, we choose the specified number of controls from the subjects who are still free of the outcome in the source population at that point. Key features:

We do not need to know the time-at-risk for potential controls.
We do not need to assume the population is stable.
The number of controls per case can vary.
Subjects initially selected as controls can subsequently become cases, and the same subject can serve as a control for more than one case.
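These features can be seen in a small simulation. The R sketch below uses base R only, and every parameter is an assumption for illustration: a hypothetical closed cohort with constant outcome rates of 0.1 and 0.2 per year (so the true rate ratio is 2), followed long enough that the outcome becomes common. At each case's event time it draws one control from the current risk set. Because each control is tied to its case's time, the sketch analyses the 1:1 sets with the discordant-pair odds ratio n10/n01, the conditional (matched) estimator for pairs; conditional logistic regression would give the same estimate for a single binary exposure. For contrast, it also computes a risk-based OR from all non-cases at the end of follow-up.

```r
set.seed(42)

# Hypothetical closed cohort: every parameter here is an illustrative assumption
n       <- 5000
exposed <- rbinom(n, size = 1, prob = 0.5)      # about half exposed
hazard  <- ifelse(exposed == 1, 0.2, 0.1)       # constant rates, true rate ratio = 2
t_event <- rexp(n, rate = hazard)               # time to outcome, in years
t_end   <- 10                                   # end of follow-up
is_case <- t_event <= t_end
t_exit  <- pmin(t_event, t_end)                 # time each person leaves the risk set

mean(is_case)   # cumulative incidence: most of the cohort becomes a case

# Incidence density sampling: at each case's event time, draw one control at
# random from the risk set (everyone still outcome-free at that moment).
# A sampled control can become a case later and can be drawn more than once.
case_ids <- which(is_case)
ctrl_ids <- vapply(case_ids, function(i) {
  risk_set <- which(t_exit > t_event[i])        # strictly later exit, so excludes the case
  risk_set[sample.int(length(risk_set), 1)]
}, integer(1))

# Matched-pair (discordant-pair) odds ratio for the 1:1 sets
case_exp <- exposed[case_ids]
ctrl_exp <- exposed[ctrl_ids]
n10 <- sum(case_exp == 1 & ctrl_exp == 0)       # case exposed, control unexposed
n01 <- sum(case_exp == 0 & ctrl_exp == 1)       # case unexposed, control exposed
or_ids <- n10 / n01

# For contrast: risk-based OR using all non-cases at the end of follow-up
tab_end <- table(is_case, exposed)
or_end  <- (tab_end["TRUE", "1"] * tab_end["FALSE", "0"]) /
           (tab_end["TRUE", "0"] * tab_end["FALSE", "1"])

round(c(OR_density = or_ids, OR_end_of_followup = or_end), 2)
```

Under these assumptions the density-sampled OR should land near 2, the rate ratio, even though most of the cohort becomes a case. The end-of-follow-up OR is much larger (about 3.7 in expectation) and matches neither the rate ratio nor the risk ratio (about 1.37).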

Key Takeaways

Risk-based designs use closed populations; the OR estimates the RR when the outcome is rare (Eq 9.1).
Rate-based designs use open populations and incidence density sampling (Eqs 9.2–9.5).
In rate-based designs, the OR directly estimates the IR with no rarity assumption needed.
Incidence density sampling matches controls to cases by time of occurrence.

The reflection below pulls earlier sections together: it asks you to make the risk-based versus rate-based choice explicitly and trace its consequences for interpretation. A later section then closes out the lesson with the practical questions that any case-control investigator has to answer once the design is chosen: how many controls to use, whether to use multiple control groups, how to assess exposure, how to keep cases and controls comparable, and how to report the result honestly.

Reflection

Why is the distinction between risk-based and rate-based case-control designs important for interpreting the odds ratio? In what situations would you recommend a rate-based design over a risk-based design, and how would this affect control selection?

Model answer

The risk-based (cumulative-incidence) design selects controls from among those still disease-free at the end of follow-up, and its OR approximates the risk ratio only when the outcome is rare; for a common outcome the OR moves away from the risk ratio. The rate-based (incidence-density) design samples controls from the risk set at the time each case occurs (risk-set sampling), and the resulting OR estimates the incidence rate ratio (IRR) directly, with no rare-disease assumption. Recommend the rate-based design whenever (a) the outcome is common, (b) follow-up is long, with substantial loss or competing risks, or (c) you want a hazard-style interpretation. Under risk-set sampling a person can be a control at one time and a case later, and the same person can be selected as a control more than once.


Section 4

Comparability, Analysis & Reporting

⏱ Estimated reading time: 15 minutes


Number of controls, comparability tools, analysis, and STROBE reporting guidelines.

Controls per case

More controls improve precision, up to a point

Multiple control groups

Useful check, limited payoff

Abubakar et al., 2007

Crohn's disease. Hospital and community controls. Nine hospitals in England.

Brenner et al., 2010

Lung cancer in non-smokers, Toronto. Population-based and hospital-based controls.

General experience: when both groups agree, no new information. When they disagree, it is hard to know which to trust.

Exposure measurement

Blinding and reference dates

Blinding data collectors

When feasible, collectors should not know whether an individual is a case or control, preventing differential probing.

Reference dates

Cases: exposure at time of outcome. Controls: exposure at a comparable pre-defined reference date. Use the same measurement process for both groups.

Comparability tools

Exclusion, matching, and analytic control

Exclusion

Remove anyone whose characteristics would compromise the comparison, such as prior disease.

Matching

Pair each case to controls on specific variables. Requires matched analysis using conditional logistic regression.

Analytic control

Multivariable adjustment in the analysis. The most commonly used approach in practice.

STROBE

Reporting guidelines specific to case-control studies

Item 6a: Eligibility criteria, case ascertainment sources, control selection methods, and rationale.
Item 6b (matched): Matching criteria and number of controls per case.
Item 12d: How matching was handled in the analysis.
Item 15: Numbers in each exposure category.

A study that cannot satisfy these reporting items has a design problem, not merely a reporting problem.

Carry forward

The design cycle, complete

Three to four controls per case is the practical ceiling for precision gain.
Exposure must be measured with the same process for cases and controls; blind data collectors when possible.
Comparability is engineered through exclusion, matching, and analytic control, not assumed.
STROBE items 6a, 6b, 12d, and 15 are the case-control-specific reporting requirements.

Introduction and Overview

Earlier sections walked through the major design choices: study base, case ascertainment, control selection principles, and the risk-based/rate-based split. By the time you get here, the design is essentially set. This section is about the practical implementation choices that follow: how many controls to use, whether to use more than one control group, how to assess exposure, how to maintain comparability, how to analyse the resulting data, and how to report it. Each of these is a place where a sound design can still be undermined.

Learning Objectives

Discuss the number of controls per case and the use of multiple control groups.
Describe exposure and covariate assessment in case-control studies.
Explain the three approaches to keeping cases and controls comparable.
Describe the analysis of case-control data and STROBE reporting guidelines.

Number of Controls per Case (Section 9.8)

Most studies use a 1:1 case-control ratio; however, other than being statistically efficient, there is nothing magical about this ratio. If the information on covariates and exposure is already recorded (i.e., exposure data is ‘free’), one might use all qualifying non-cases as controls to avoid sampling issues.

Practical Guidelines

When the number of cases is small, the precision of association measures can be improved by selecting more than one control per case. There are formal approaches for deciding the optimal number (Schlesselman, 1974), but usually the benefit of increasing the number of controls per case is small; often 3–4 controls per case is the practical maximum.
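The diminishing return from extra controls can be seen with the Woolf standard error of log(OR), the same formula used in this lesson's R activity. The sketch below assumes a hypothetical study of 100 cases in which 40% of cases and 20% of controls are exposed, uses expected cell counts, and varies the number of controls per case from 1 to 6.

```r
# Hypothetical study (illustrative assumptions): 100 cases, 40% of cases exposed,
# 20% of controls exposed. Expected cell counts are used, not random draws.
n_cases   <- 100
p_case    <- 0.40
p_control <- 0.20

# Woolf standard error of log(OR) with k controls per case
se_log_or <- function(k) {
  a1 <- n_cases * p_case;        a0 <- n_cases * (1 - p_case)
  b1 <- k * n_cases * p_control; b0 <- k * n_cases * (1 - p_control)
  sqrt(1/a1 + 1/a0 + 1/b1 + 1/b0)
}

k     <- 1:6
se    <- sapply(k, se_log_or)
# With unlimited controls only the case cells contribute to the variance
limit <- sqrt(1 / (n_cases * p_case) + 1 / (n_cases * (1 - p_case)))

data.frame(controls_per_case = k,
           se_log_or         = round(se, 3),
           rel_efficiency    = round(limit^2 / se^2, 2))
```

Under these assumptions the standard error falls from about 0.32 with one control per case to 0.25 with three, and then barely moves; relative efficiency (the variance with unlimited controls divided by the variance achieved) reaches about two-thirds at three controls. The ceiling exists because the cases' own cells (1/a₁ + 1/a₀) do not shrink as controls are added.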

Number of Control Groups (Section 9.9)

Beyond the question of how many controls per case is the related question of how many control groups. Some researchers use multiple control groups to offset a bias suspected in any single control group (Examples 9.5 and 9.6). However, the rationale should be clearly defined in advance, because multiple groups add complexity and can be difficult to interpret if they produce different results. The two examples below show the strategy in practice; in both, the second control group functioned mainly as a robustness check on the first.

Example 9.5, Secondary-Base Study with Population Controls

Abubakar et al (2007) studied Crohn’s disease risk factors in patients from 9 hospitals in England using both hospital-derived and community controls. The design was matched a priori and included 104 cases. For community controls, 2 general practitioners per Crohn’s patient were randomly selected, matched by age (±1 year) and gender. The authors noted that the choice of control group had little impact on their results.

Example 9.6, Primary-Care and Population-Based Controls

Brenner et al (2010) evaluated lung cancer risk factors in never-smokers in Toronto. They used both population-based controls (randomly sampled from property tax files, n=425) and hospital-based controls (from a family medicine clinic, n=523). Unconditional logistic regression models were used. A separate analysis based on 156 non-smoking cases with 466 non-smoking controls confirmed the main findings.

Once you have settled how many controls and how many control groups, the next implementation question is how exposure and covariates are actually measured, and, especially in retrospective designs, how to keep that measurement from being shaped by case status itself.

Exposure & Covariate Assessment (Section 9.10)

Most case-control studies are retrospective, so a concise, workable definition of ‘exposure’ (and also of confounders) is needed when implementing the study design. When ascertaining exposure status and information on confounders, it is preferable to obtain the greatest accuracy possible using the same process for both cases and controls.

General Rules for Exposure Assessment

When possible, have data collectors blinded to case status. As a general rule, the exposure status of cases should be the exposure category that existed at the time of outcome occurrence. For controls, their exposure status reflects their exposure situation at the time of their selection.
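One way to put the reference-date rule into practice, assuming controls are matched to cases and selected at the case's diagnosis date (as in incidence density sampling), is to give each control its case's index date and classify everyone's exposure as of that date. The R sketch below does this for a hypothetical four-person dataset; the column names and dates are invented for illustration.

```r
# Hypothetical matched sets (all data invented): one case and one control per set
subjects <- data.frame(
  set_id         = c(1, 1, 2, 2),
  role           = c("case", "control", "case", "control"),
  diagnosis_date = as.Date(c("2020-03-01", NA, "2021-07-15", NA)),
  exposure_start = as.Date(c("2019-01-10", "2020-06-01", NA, "2018-02-20"))
)

# Reference date = the case's diagnosis date within each matched set
cases <- subjects[subjects$role == "case", ]
subjects$reference_date <- cases$diagnosis_date[match(subjects$set_id, cases$set_id)]

# Exposed = exposure began on or before the reference date (NA start = never exposed)
subjects$exposed <- !is.na(subjects$exposure_start) &
                    subjects$exposure_start <= subjects$reference_date
subjects
```

The control in set 1 began the exposure after its case's diagnosis date, so it is classified as unexposed at the reference date, even though it would look exposed if exposure were assessed at interview time.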

Keeping Cases and Controls Comparable (Section 9.11)

Accurate exposure measurement is necessary but not sufficient. Even with perfect measurement, a confounded comparison gives biased answers. The three flip cards below describe the three standard tools for preventing that: restriction (exclusion/inclusion), matching, and analytic control. They are not interchangeable, and a well-designed case-control study often uses two or three of them in combination.

Exclusion / Inclusion


Matching


Analytic Control


With design choices and comparability tools in place, what remains is the analysis itself, and, just as importantly, what the resulting odds ratio means under each combination of design and sampling decision we have made so far.

Analysis of Case-Control Data (Section 9.12)

The data format and analysis for both risk-based and rate-based designs proceed in a similar manner. In a 2×2 table:

	Exposed	Non-exposed	Total
Cases	a₁	a₀	m₁
Controls	b₁	b₀	m₀

Remember that we cannot directly estimate disease frequency (unless the study is nested) because the m₁:m₀ ratio was fixed by the sampling design. Chapter 6 outlines the analysis including hypothesis testing, estimating the odds ratio, and developing confidence intervals.
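A quick way to see why disease frequency cannot be read off a case-control 2×2 table is to keep the same cases and change only how many controls are drawn per case. The R sketch below uses expected counts, and its inputs (40 of 100 cases exposed, 20% of controls exposed) are illustrative assumptions; it shows the apparent proportion of exposed subjects who are cases moving with the design while the OR stays fixed.

```r
# Hypothetical data (illustrative assumptions): the same 100 cases, 40 of them
# exposed, with 20% of controls exposed; only the number of controls per case
# changes. Expected counts are used so the arithmetic is exact.
summarise_design <- function(k) {
  a1 <- 40;     a0 <- 60              # cases: exposed, non-exposed
  b1 <- 20 * k; b0 <- 80 * k          # controls: exposed, non-exposed
  c(controls_per_case     = k,
    apparent_risk_exposed = a1 / (a1 + b1),      # set by the design, not a risk
    odds_ratio            = (a1 * b0) / (b1 * a0))
}

designs <- t(sapply(c(1, 2, 4), summarise_design))
round(designs, 3)
```

The "apparent risk" falls from 0.67 to 0.33 purely because more controls were recruited, whereas the OR stays at 2.67 throughout.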

The three tabs below summarize what the OR estimates under each of the design combinations built up across this lesson. The pattern to take away: the same number, computed from the same 2×2 table, is interpreted differently depending on the sampling decisions you made before any data were collected.

With risk-based designs and sampling of controls at the end of the follow-up period, the odds ratio estimates the risk ratio if the frequency of disease in the source population is low (e.g., below 10%), and censoring is unrelated to exposure.

If concurrent sampling (incidence density sampling) is used, the odds ratio estimates the rate ratio in both closed and open populations. For validity, stability of exposure is needed in the closed population but not in the open population.

When controls are selected from an open population without concurrent sampling of controls, the odds ratio estimates the rate ratio only if the population is stable; otherwise it is just the odds ratio. If matching is used to select controls but is ignored in the analysis, the impact depends on the extent of exposure changes during the study period (Knol, Vandenbroucke, Scott, & Egger, 2008).

The final piece of the implementation puzzle is making the design transparent to the next reader. The STROBE statement, introduced in an earlier lesson, includes case-control-specific items that name the details most likely to go missing in a write-up.

Reporting Guidelines (Section 9.13)

Vandenbroucke et al (2007) described the key elements of case-control studies that should be reported (STROBE). The complete listing is in Table 7.3; items specific to case-control studies are included in Table 9.1, expanded in the accordion below.

Table 9.1, STROBE Items Specific to Case-Control Studies

Methods:

Item 6a: Give the eligibility criteria, and the sources and methods of case ascertainment and control selection. Give the rationale for the choice of cases and controls.
Item 6b: For matched studies, give matching criteria and the number of controls per case.
Item 12d: If applicable, explain how matching of cases and controls was addressed.

Results:

Item 15: Report numbers in each exposure category, or summary measures of exposure.

Key Takeaways

3–4 controls per case is usually the practical maximum for improving precision.
Multiple control groups add complexity; the general experience is that more than one control group has limited value.
Exposure assessment should use the same process for cases and controls, with blinding when possible.
Comparability is achieved through exclusion, matching, or analytic control (multivariable techniques).
What the OR estimates (RR or IR) depends on the study design and sampling approach.

The reflection below is the section's payoff: a short colleague-asks-you-a-question prompt that requires you to use everything in this lesson to give a careful answer. Once you have worked through it and the knowledge check, the lesson moves to its final assessment, which integrates earlier sections.

Reflection

A colleague presents a case-control study with an odds ratio of 2.5 and asks: “Does this mean exposed people have 2.5 times the risk?” How would you respond? Consider the study design (risk-based vs. rate-based), the rarity of the outcome, and what the OR actually estimates under different conditions.

Model answer

"Not necessarily" is the honest answer. The OR can be read as a ratio of risks or rates only in two situations: (a) the design is rate-based with risk-set sampling, in which case the OR estimates the IRR (a rate ratio, not a risk ratio) by construction; or (b) the outcome is genuinely rare in the source population (say, <10% cumulative incidence), so that OR ≈ RR. For a common outcome in a risk-based design, the OR exaggerates the risk ratio (moves it further from 1): for example, risks of 0.50 in the exposed and 0.20 in the unexposed give RR = 2.5 but OR = 4.0. The right reply is to ask: How were controls sampled? What is the outcome frequency in the source population? And in what scenario should the OR be interpreted as anything other than a contrast of odds?


Section 5

Final Review & Assessment

⏱ Estimated time: 20 minutes

Bringing It All Together

This lesson worked from the inside out. An earlier section fixed the conceptual core: cases and controls must arise from a single, well-defined study base, and getting that right is what separates a clean primary-base design from a shaky secondary-base one. Subsequent sections then translated that core into mechanics: how to define and recruit a case series, the four principles of control selection, and the choice between risk-based sampling (which gives you the OR as an approximation of the risk ratio) and rate-based / incidence density sampling (which gives you the OR directly as a rate ratio).

The preceding section closed the loop by moving from design to execution and reporting: choosing the number of controls, deciding when to use multiple control groups, ensuring comparability through exclusion, matching, and analytic control, and then communicating the whole package transparently using STROBE. Read end-to-end, the lesson is a single argument: case-control studies are valuable precisely because they are efficient, but every efficiency has a price, and the design choices you make have to be explicit, defensible, and reported.

The final reflection asks you to put that argument to work by sketching a brief case-control proposal of your own. The 15-question assessment then checks the conceptual content directly. From here, a later lesson turns the design around to follow exposed and unexposed people forward in time (cohort studies), and much of the vocabulary you just built (study base, comparability, sampling logic) will travel with you.

Key Takeaways from this lesson

A valid case-control study begins with a clearly specified study base; primary-base designs make it explicit, secondary-base designs reconstruct it.
Cases must satisfy a stable diagnostic definition; whether you use incident or prevalent cases changes what your odds ratio means.
Controls must be sampled from the same source population as the cases, independently of exposure; the four principles of control selection are non-negotiable.
Risk-based sampling and rate-based (incidence density) sampling answer different questions; the OR estimates a risk ratio in one and a rate ratio in the other.
Comparability is engineered, not assumed, through exclusion, matching, and analytic control; matched designs require matched analyses.
Transparent reporting using STROBE is the bridge between a defensible design and a study other people can appraise, replicate, or extend.

R Activity, Odds ratios and 95% CIs from a 2x2 table

The companion R script r-activities/HSCI_230_Lesson_4_Case_Control_Studies.R walks you through the canonical case-control analysis: build a 2x2 table of cases and controls by exposure, compute the odds ratio as (a*d)/(b*c), and bracket it with a 95% confidence interval using the Woolf (log-OR) method. A short stretch block at the end repeats the same calculation with epitools::oddsratio() so you can compare the by-hand answer to a packaged one.

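# byrow = FALSE fills column by column: the "Yes" column holds
# 45 cases and 60 controls; the "No" column holds 5 cases and 90 controls.
tab <- matrix(c(45, 60,
                 5, 90),
              nrow = 2, byrow = FALSE,
              dimnames = list(Status = c("Case", "Control"),
                              Smoke  = c("Yes", "No")))
tab

# Odds ratio = (a*d) / (b*c), where a = exposed cases, b = unexposed cases,
# c = exposed controls, d = unexposed controls.
# The c cell is stored as c_ so it does not mask base R's c() function.
a  <- tab["Case",    "Yes"]; b <- tab["Case",    "No"]
c_ <- tab["Control", "Yes"]; d <- tab["Control", "No"]
OR <- (a * d) / (b * c_)
OR

# 95% CI from the log-OR (Woolf method)
log_se <- sqrt(1/a + 1/b + 1/c_ + 1/d)
ci <- exp(log(OR) + c(-1, 1) * 1.96 * log_se)
round(c(OR = OR, lower = ci[1], upper = ci[2]), 2)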

## -----------------------------------------------------------------------------
## Stretch: same analysis with epitools::oddsratio()
## -----------------------------------------------------------------------------
# install.packages("epitools")    # uncomment if not already installed
# library(epitools)
# oddsratio(tab, method = "wald")

Reflection

Design a brief case-control study proposal for a health question of your choice. Specify: (1) the research question, (2) whether you would use a primary or secondary study base and why, (3) how you would define and identify cases, (4) how you would select controls and from what source, (5) whether a risk-based or rate-based design is more appropriate, and (6) how you would ensure comparability.

Model answer

(1) Research question: Does occupational exposure to organic solvents increase the odds of early-onset Parkinson's disease (onset < 55)? (2) Study base: primary base, all adults aged 30–55 covered by BC MSP for at least 5 years prior to index date, ensuring case and control denominators come from the same population. (3) Case definition: incident PD diagnosed by a movement-disorder neurologist (UK Brain Bank or MDS criteria), verified by chart review; first PD-coded visit within the study window. (4) Controls: population sample drawn from MSP rolls, matched on age (5-y), sex, and FSA, with random sampling at the time of each case (risk-set sampling for rate-based design). (5) Design: rate-based, because PD develops over decades and we want IRR interpretation without invoking rare-disease approximation. (6) Comparability: structured occupational history with job-exposure-matrix coding (blinded to case status), DAG-guided adjustment for smoking and pesticide exposure, sensitivity analysis to recall bias, and two interviewers per region to avoid interviewer effects.


Final Knowledge Assessment

This assessment covers all sections of this lesson. You must score 100% to complete the lesson. Review the feedback after each attempt.

🎉 Congratulations!

You have completed this lesson: Case-Control Studies.

You now understand the design, implementation, analysis, and reporting of case-control studies, including risk-based and rate-based designs, control selection principles, and STROBE reporting guidelines.

A later lesson closes the trio of observational analytic designs by turning the case-control logic on its head: instead of sampling on the disease and looking back at exposure, cohort studies sample on the exposure and follow forward to the disease. That inversion removes the need for the rare-disease assumption you wrestled with in an earlier section and lets you measure incidence directly, but it introduces its own problems, especially loss to follow-up and the long timelines required to see chronic outcomes.

HSCI 230, Lesson 4
