Review of Study Design Concepts

Fundamental Epidemiological Concepts and Approaches

Learning objectives for this lesson:

Distinguish between observational and experimental study designs
Compare and contrast descriptive, analytic, and cross-sectional studies
Describe the key features, strengths, and limitations of cohort and case-control studies
Identify the direction of inquiry and appropriate measures of association for each design
Explain the logic and limitations of ecological studies
Define the ecologic fallacy and recognize when it may occur
Describe how systematic reviews synthesize evidence across study types
Apply study design concepts to select appropriate designs for research questions

This course was developed by Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University, based on Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Glossary: Key Terms, People & Concepts

This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.

Foundational Concepts

Observational Study A study in which the investigator does not assign exposures; participants are observed under their existing exposures. Includes cross-sectional, case-control, cohort, and ecological designs.

Experimental (Intervention) Study A study in which the investigator actively assigns exposure, ideally at random, and follows participants for outcomes. Includes RCTs, field trials, and community trials.

Descriptive Study A study that describes the distribution of disease or exposure by person, place, and time without testing specific causal hypotheses.

Analytic Study A study designed to test specific hypotheses about exposure-outcome relationships, typically with comparison groups and a measure of association.

Observational Study Designs

Cross-Sectional Study Exposure and outcome are measured simultaneously in a sample at one point in time. Yields prevalence and prevalence ratios; cannot establish temporal sequence.

Case-Control Study Cases (with disease) and controls (without) are sampled separately and compared on past exposure. Direction of inquiry: outcome → exposure. Yields odds ratios; efficient for rare diseases.

Cohort Study Disease-free participants classified by exposure are followed forward in time to compare incidence between groups. Direction: exposure → outcome. Yields risk ratios, rate ratios, and risk differences.

Prospective Cohort A cohort assembled and followed forward from the time exposure is measured. Strong on data quality; demanding in time and cost.

Retrospective (Historical) Cohort A cohort defined and exposure ascertained using existing records, with outcomes also already in the past. Faster and cheaper but reliant on records of variable quality.

Nested Case-Control Study A case-control study conducted within an established cohort, where controls are sampled from cohort members at risk at the time each case occurs. Efficient for expensive measurements.

Case-Cohort Study A variant in which controls are a random subcohort sampled at baseline (rather than matched to cases). One subcohort can serve multiple outcomes.

Ecological Study Unit of analysis is a group (country, region, school) rather than an individual. Useful for hypothesis generation and policy comparisons; vulnerable to the ecological fallacy.

Ecological Fallacy The error of inferring individual-level relationships from group-level associations. The opposite (atomistic fallacy) is inferring group-level relationships from individual-level data.

Case Series / Case Report Descriptive accounts of one or several patients with a particular condition. No comparison group; useful for hypothesis generation.

Experimental & Hybrid Designs

Randomized Controlled Trial (RCT) An experiment in which participants are randomly assigned to intervention and control groups. Random assignment balances measured and unmeasured confounders in expectation.

Cluster Randomized Trial An RCT in which intact groups (e.g., clinics, schools) rather than individuals are randomized. Used when contamination across individuals is a concern; analysis must account for clustering.

Quasi-Experimental Study An intervention study in which assignment is not randomized, using methods like interrupted time series, regression discontinuity, or natural experiments to approximate causal inference.

Hybrid Effectiveness-Implementation Design A design that simultaneously evaluates the clinical effectiveness of an intervention and an implementation strategy, commonly classified as Type 1, 2, or 3 depending on the relative emphasis.

Systematic Review / Meta-Analysis A structured synthesis of all available studies on a question, sometimes including a quantitative pooling (meta-analysis) of effect estimates. Sits at the top of evidence hierarchies.

Selection & Inference Concepts

Direction of Inquiry Whether a design moves from exposure to outcome (cohort, RCT) or from outcome back to exposure (case-control). Determines which measures of association are estimable.

Incidence-Density Sampling A control-selection strategy in nested case-control studies where controls are chosen at the moment each case occurs, allowing the OR to estimate the rate ratio.

Loss to Follow-Up Participants in a cohort or trial who drop out before the outcome is ascertained. Differential loss between groups can bias the effect estimate.

Hierarchy of Evidence A ranking of study designs by how strongly they support causal claims, typically systematic reviews / meta-analyses > RCTs > cohorts > case-control > cross-sectional > ecological > expert opinion.

Section 2

Review of Cohort & Case-Control Studies

⏱ Estimated reading time: 18 minutes

Direction of inquiry, measures of association, and when each design is the right tool.

The forward design

Cohort studies

Direction: Exposure → Outcome. Incidence can be measured directly.

Key measures: Risk ratio (relative risk), rate ratio, risk difference.

Two temporal forms

Prospective vs. retrospective cohort

Prospective

Enrol now; follow forward. Exposure assessed before outcome. Minimizes recall bias; establishes temporal sequence clearly. Expensive and slow for rare or late-onset outcomes.

Retrospective (historical)

Reconstruct past cohort from existing records; outcomes may already have occurred. Faster and cheaper, but limited to variables recorded at the time. Variable quality is the main vulnerability.

Measures of association

What cohort studies can estimate

Risk Ratio (RR)

\[ \color{#0B7B6B}{RR} = \frac{\color{#C2410C}{a/(a+b)}}{\color{#1D4ED8}{c/(c+d)}} \]

Here RR is the risk ratio, a/(a+b) is the incidence in the exposed, and c/(c+d) is the incidence in the unexposed.

Risk Difference (RD)

\[ \color{#0B7B6B}{RD} = \frac{\color{#C2410C}{a}}{a+b} - \frac{\color{#1D4ED8}{c}}{c+d} \]

Here RD is the risk difference, a/(a+b) is the incidence in the exposed, and c/(c+d) is the incidence in the unexposed.

Rate ratios use person-time denominators in place of cumulative incidence. All three measures require that you can observe incidence directly, which is why they belong to cohort designs.

The backward design

Case-control studies

Odds Ratio (OR)

\[ \color{#0B7B6B}{OR} = \frac{\color{#C2410C}{a}/\color{#1D4ED8}{c}}{\color{#BE185D}{b}/\color{#6D28D9}{d}} = \frac{\color{#C2410C}{a}\color{#6D28D9}{d}}{\color{#BE185D}{b}\color{#1D4ED8}{c}} \]

Here OR is the odds ratio, a = exposed cases, b = exposed controls, c = unexposed cases, and d = unexposed controls.

The cross-product ad / bc pairs the two diagonals. When disease is rare, OR ≈ RR.

When to use which

Matching design to question

Feature	Cohort	Case-Control
Direction	Exposure → Outcome	Outcome → Exposure
Measures incidence?	Yes	No
Primary measure	Risk ratio, rate ratio	Odds ratio
Best for	Rare exposures	Rare diseases
Key bias	Loss to follow-up	Recall bias, selection bias

Carry forward

What to take into the next section

Direction of inquiry determines which measures are valid: risk and rate ratios belong to cohort designs; the odds ratio belongs to case-control designs.
The rare disease assumption: the OR approximates the RR only when the disease is rare; when risk is substantial (roughly 10% or more), the OR diverges meaningfully from the RR.
Both designs are vulnerable to specific biases; knowing which bias matters for which design is part of critical appraisal.

Introduction and Overview

An earlier section reviewed cross-sectional designs, useful for prevalence estimation but limited for causal inference. This section turns to the two analytic designs that can support causal claims: cohort studies (sampling on exposure, following forward to disease) and case-control studies (sampling on disease, looking backward at exposure). Both produce specific measures of association you computed by hand in an earlier lesson.

Learning Objectives

Describe the design, direction of inquiry, and key features of cohort studies.
Describe the design, direction of inquiry, and key features of case-control studies.
Identify the appropriate measures of association for each design.
Compare the strengths and limitations of both designs.

Cohort Studies

A cohort study begins by identifying a group of individuals (a cohort) who are free of the outcome of interest, classifying them by exposure status, and following them over time to observe whether the outcome develops. The direction of inquiry moves from exposure to outcome, the same temporal direction as causation itself.

Types of Cohort Studies

Prospective (Concurrent) Cohort Study

Participants are enrolled in the present, exposure status is assessed, and they are followed forward in time to observe the development of outcomes. This is the classic cohort design. The Framingham Heart Study, which began enrolling participants in 1948 and continues to follow their descendants, is one of the most famous examples (Mahmood, Levy, Vasan, & Wang, 2014).

Advantage: Exposure is measured before the outcome occurs, minimizing recall bias and clearly establishing temporal sequence.

Disadvantage: Can be very expensive and time-consuming, especially for diseases with long latency periods or low incidence.

Retrospective (Historical) Cohort Study

The investigator uses historical records (e.g., employment records, medical charts, or registry data) to reconstruct a cohort whose exposure status was determined in the past. The outcomes may have already occurred or can be ascertained in the present.

Advantage: Faster and less expensive than a prospective study because the waiting period for outcome development has already elapsed.

Disadvantage: Relies on the quality and completeness of existing records. Important variables may not have been measured or may be recorded inconsistently.

Direction of Inquiry

Figure 8.1. Direction of inquiry in cohort vs. case-control studies. Cohort studies move from exposure to outcome; case-control studies start with outcome and look back at exposure.

Measures of Association in Cohort Studies

Before the formulas, it helps to fix the notation. Nearly every measure in this lesson is read off a 2×2 table that cross-classifies each person by exposure (the rows) and by disease (the columns). The four counts are labelled a, b, c, and d:

	Disease +	Disease −	Row total
Exposed	a	b	a + b
Unexposed	c	d	c + d

So a is the number of exposed people who develop the disease, and a + b is everyone who was exposed. That makes a/(a+b) the proportion of the exposed who get the disease and c/(c+d) the same proportion among the unexposed. With that picture in mind: because cohort studies follow participants over time and observe new cases, they can directly estimate incidence, which allows the calculation of:

Risk Ratio (Relative Risk): The ratio of cumulative incidence in the exposed group to cumulative incidence in the unexposed group. RR = (a/(a+b)) / (c/(c+d)).
Rate Ratio: The ratio of incidence rates (person-time denominators) in exposed vs. unexposed groups.
Risk Difference (Attributable Risk): The absolute difference in incidence between exposed and unexposed groups.
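As a minimal sketch, the three measures can be computed in R directly from the 2×2 notation above. The cell counts are the ones used later in this lesson's R activity; the person-time totals are invented purely for illustration, since the lesson does not supply any.

```r
# 2x2 cells (same values as the lesson's R activity)
a <- 194; b <- 1588   # exposed: diseased, not diseased
c <- 303; d <- 1314   # unexposed: diseased, not diseased

risk_exposed   <- a / (a + b)   # cumulative incidence in the exposed
risk_unexposed <- c / (c + d)   # cumulative incidence in the unexposed

risk_ratio      <- risk_exposed / risk_unexposed
risk_difference <- risk_exposed - risk_unexposed

# The rate ratio needs person-time denominators.
# These totals are hypothetical, not taken from the lesson.
pt_exposed   <- 8500   # person-years at risk, exposed (assumed)
pt_unexposed <- 7400   # person-years at risk, unexposed (assumed)
rate_ratio <- (a / pt_exposed) / (c / pt_unexposed)

# c() still works as a function here even though c is also a number:
# R skips non-function objects when it looks up a function name.
round(c(RR = risk_ratio, RD = risk_difference, IRR = rate_ratio), 3)
```

The risk ratio and risk difference come from the same two proportions; only the rate ratio needs the extra person-time information.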

Why Cohort Studies Are Well Suited to Causal Questions

The ability to measure incidence directly is the central strength of the cohort design. It establishes temporal sequence (exposure precedes outcome), allows calculation of multiple measures of association, and can study multiple outcomes associated with a single exposure. For rare exposures, cohort studies are particularly efficient because you can intentionally over-sample exposed individuals.

Case-Control Studies

A case-control study begins by identifying individuals who have the outcome of interest (cases) and a comparable group who do not (controls). The investigator then looks backward in time to compare the exposure histories of the two groups. The direction of inquiry moves from outcome to exposure.

Selecting Cases and Controls

The validity of a case-control study depends critically on appropriate selection of cases and controls (Wacholder, McLaughlin, Silverman, & Mandel, 1992):

Case Selection

Cases should be clearly defined using consistent diagnostic criteria. They may be incident cases (newly diagnosed) or prevalent cases (existing), though incident cases are preferred to reduce survival bias. Cases are typically identified from hospitals, disease registries, surveillance systems, or population-based sampling frames.

Control Selection

Controls should come from the same source population that gave rise to the cases. They should be people who, had they developed the disease, would have been identified as cases in the study. Common sources include hospital-based controls (patients admitted for other conditions), population-based controls (random samples from the community), and neighbourhood or friend controls. The choice of control group is often the most scrutinized aspect of a case-control study.

Matching

Matching involves selecting controls that are similar to cases on specific characteristics (e.g., age, sex, geography). Matching can improve statistical efficiency and, provided the analysis accounts for it, helps control confounding by the matched variables. However, the effect of a matched variable can no longer be estimated, and individually matched data are typically analysed with conditional logistic regression.
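For the simplest matched design (one control per case, one binary exposure), the conditional odds ratio reduces to the ratio of the two kinds of discordant pairs. The sketch below illustrates that special case in base R; the ten pairs are hypothetical and exist only to make the arithmetic visible.

```r
# Hypothetical 1:1 matched case-control data: one row per matched pair.
pairs <- data.frame(
  case_exposed    = c(1, 1, 1, 0, 0, 1, 1, 0, 1, 0),
  control_exposed = c(0, 1, 0, 0, 1, 0, 0, 0, 0, 1)
)

# Pairs in which case and control share the same exposure status
# carry no information about the exposure effect; only discordant pairs count.
case_only    <- sum(pairs$case_exposed == 1 & pairs$control_exposed == 0)
control_only <- sum(pairs$case_exposed == 0 & pairs$control_exposed == 1)

# Matched-pair (conditional) odds ratio.
matched_or <- case_only / control_only
matched_or
```

With more than one control per case, or with additional covariates, this shortcut no longer applies and a conditional logistic model (for example `clogit()` in the survival package) would be the usual tool.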

Measures of Association in Case-Control Studies

Because case-control studies do not follow a defined cohort over time, they cannot directly calculate incidence. Therefore, the risk ratio cannot be computed directly. The deeper reason is that the investigator decides in advance how many cases and how many controls to enrol, so the fraction of the sample that is diseased is fixed by that design choice rather than by how common the disease actually is; a proportion like a/(a+b) therefore does not estimate a genuine risk. Instead, the primary measure of association is the odds ratio (OR):

The Odds Ratio

The odds ratio compares the odds of exposure among cases to the odds of exposure among controls: OR = (a/c) / (b/d) = ad/bc. Under certain conditions, particularly when the disease is rare in the source population (the "rare disease assumption"), the odds ratio approximates the risk ratio. This is why case-control studies are especially useful for studying rare diseases: the rare disease assumption is most likely to hold, and the design efficiently identifies a sufficient number of cases.

Here is the intuition for why rarity matters. The risk of disease in the exposed is a/(a+b), while the odds of disease is a/b. When the disease is rare, very few of the exposed are cases, so a is tiny next to b; that makes a+b almost equal to b, and so the risk a/(a+b) almost equals the odds a/b. The same holds among the unexposed, so the odds ratio and the risk ratio nearly coincide. When the disease is common, a is no longer negligible and the two measures pull apart, with the odds ratio landing farther from 1.
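The same intuition can be checked numerically. The sketch below holds the risk ratio fixed at an assumed value of 2 and lets the baseline risk grow from very rare to common; the specific risks are illustrative choices, not data from the lesson.

```r
# Hold the true risk ratio fixed and vary how common the disease is.
true_rr       <- 2                                   # assumed for illustration
baseline_risk <- c(0.001, 0.01, 0.05, 0.10, 0.30)    # risk in the unexposed
exposed_risk  <- true_rr * baseline_risk             # risk in the exposed

odds <- function(p) p / (1 - p)
odds_ratio <- odds(exposed_risk) / odds(baseline_risk)

data.frame(baseline_risk, exposed_risk,
           RR = true_rr, OR = round(odds_ratio, 2))
```

Reading down the OR column shows the odds ratio starting almost equal to the risk ratio and drifting steadily away from it as the disease becomes more common.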

Comparing Cohort and Case-Control Designs

Feature	Cohort Study	Case-Control Study
Direction of inquiry	Exposure → Outcome	Outcome → Exposure
Starting point	Defined by exposure status	Defined by disease status
Measures incidence directly?	Yes	No
Primary measure of association	Risk ratio, rate ratio	Odds ratio
Best suited for	Rare exposures, multiple outcomes	Rare diseases, multiple exposures
Temporal sequence	Clearly established	Relies on retrospective data
Cost and time	Often expensive and lengthy	Generally less expensive and faster
Key biases	Loss to follow-up, selective attrition	Recall bias, selection bias in controls

Choosing the Right Design: A Practical Example

Suppose you want to study whether exposure to a specific industrial solvent increases the risk of a rare liver cancer. A prospective cohort study would require following thousands of exposed and unexposed workers for decades, which is extremely expensive and slow. A case-control study, by contrast, could identify 200 liver cancer cases from a cancer registry, select 400 matched controls, and assess past occupational exposure through interviews and employment records, producing results in months rather than years.

For rare diseases, case-control studies are usually the design of choice because they can efficiently identify enough cases to detect meaningful associations.
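To see why this design reports an odds ratio rather than a risk ratio, it can help to imagine the full source population and then sample from it. The sketch below does that with a hypothetical population containing 200 cases; the counts are invented, and expected counts are used in place of random draws so the output is stable. Sampling 0.4% of non-cases yields roughly 400 controls, as in the example above.

```r
# Hypothetical source population (the full cohort we never actually observe).
pop <- c(a = 120, b = 19880,   # exposed: cases, non-cases
         c = 80,  d = 79920)   # unexposed: cases, non-cases

true_rr <- (pop[["a"]] / (pop[["a"]] + pop[["b"]])) /
           (pop[["c"]] / (pop[["c"]] + pop[["d"]]))

# Keep every case; sample non-cases at fraction f, regardless of exposure.
sample_design <- function(f) {
  a  <- pop[["a"]];     c_ <- pop[["c"]]
  b  <- pop[["b"]] * f; d  <- pop[["d"]] * f
  c(control_fraction = f,
    naive_RR = (a / (a + b)) / (c_ / (c_ + d)),  # not a real risk ratio
    OR       = (a * d) / (b * c_))
}

results <- t(sapply(c(1, 0.01, 0.004), sample_design))
round(results, 3)
```

The OR column stays the same whatever fraction of non-cases is sampled, while the "naive RR" changes with the investigator's choice of how many controls to enrol, which is exactly why it cannot be interpreted as a risk ratio.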

Key Takeaways

Cohort studies follow exposed and unexposed groups forward in time to observe outcome development; they can directly calculate incidence and risk ratios.
Case-control studies compare exposure histories of cases and controls; they estimate the odds ratio, which approximates the risk ratio when the disease is rare.
Cohort studies are preferred for rare exposures and when multiple outcomes are of interest; case-control studies are preferred for rare diseases and when multiple exposures are of interest.
Both designs are susceptible to specific biases: cohort studies to loss to follow-up, case-control studies to recall bias and control selection bias.

Section 3

Ecological Studies & Evidence Synthesis

⏱ Estimated reading time: 15 minutes

Group-level analysis, the ecologic fallacy, Canadian data infrastructure, and systematic reviews.

The group-level design

Ecological studies: three types

Multi-group

Compare aggregate exposure and outcome across many groups at one point in time. Example: fat consumption vs. breast cancer rates across 30 countries.

Time-trend

Track exposure and outcome changes within one population over time. Example: cigarette sales and lung cancer rates across decades.

Mixed

Combines multi-group and time-trend features: changes across multiple groups over time.

Variable types

Three kinds of group-level variables

Aggregate

Summaries of individual-level data: mean blood pressure, percentage who smoke. Derived from individual measurements.

Environmental

Physical characteristics shared by all group members: air pollution, water fluoride, temperature. Usually measured only at the group level.

Global

Group attributes with no individual analogue: population density, income inequality, healthcare system type.

The key hazard

The ecologic fallacy

An association observed at the group level cannot be assumed to hold at the individual level. (Robinson, 1950)

Aggregation conceals within-group variation. The exposed individuals and the diseased individuals may be entirely different people in the same group.

Classic example: Durkheim found higher suicide rates in more Protestant regions, but this group-level association does not establish that Protestant individuals specifically were at higher individual risk.

Canadian data

Infrastructure for observational research

PopData BC

Links administrative health records across BC: MSP billings, DAD, cancer registry, PharmaNet, vital statistics. Supports retrospective cohorts, nested case-control, and ecological analyses.

HDRN Canada

Federated network of provincial data centres (PopData BC, ICES, MCHP, and others). Enables multi-jurisdictional analyses without moving individual-level data.

CANUE

Postal-code-level environmental exposures: air pollutants, greenness, walkability, noise. Links to any cohort with postal codes, turning individual records into multilevel studies.

Synthesis

Systematic reviews and meta-analysis

Systematic review

Pre-specified protocol; comprehensive search; explicit inclusion criteria; risk-of-bias appraisal; qualitative or quantitative synthesis. Reproducible and transparent.

Meta-analysis

Statistical pooling of compatible study results. Increases precision, boosts power, and can explore heterogeneity by design, population, or exposure definition.

The hierarchy: systematic reviews > RCTs > cohort > case-control > cross-sectional > ecological > case reports. But hierarchy is a guide, not an absolute rule.

Carry forward

The hierarchy as a bias-control framework

Ecological studies: fast and policy-relevant; vulnerable to ecologic fallacy and group-level confounding.
Individual observational designs: address within-group variation; still face confounding and selection bias.
Experimental designs: control confounding via randomization; often infeasible for the most important public health questions.
Systematic reviews: add structure, not new data; synthesize evidence across designs and populations.

Introduction and Overview

Earlier sections covered designs that operate at the individual level. This section makes one final move, up to the group level, with ecological studies, and then steps back to ask how evidence from many such studies can be synthesized. The ecological fallacy you met in an earlier course reappears here as a specific limitation; systematic reviews are the formal answer to “what does the evidence as a whole say?”

Learning Objectives

Describe the design and purpose of ecological studies.
Explain the concept of group-level variables and the ecologic fallacy.
Discuss how systematic reviews synthesize evidence across study types.
Understand where ecological studies and systematic reviews fit in the hierarchy of evidence.

Ecological Studies

An ecological study (also called a group-level or aggregate study) uses groups, rather than individuals, as the unit of analysis. Instead of measuring exposure and outcome in each person, the investigator compares exposure levels and disease rates across defined populations, such as countries, regions, or time periods.

Types of Ecological Studies

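Multi-group comparison: compares aggregate exposure and outcome across many groups at one point in time.
Time-trend (temporal): tracks changes in exposure and outcome within one population over time.
Mixed design: combines both, following changes across multiple groups over time.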

Group-Level Variables

A distinctive feature of ecological studies is their reliance on group-level variables. These can be classified into three types:

Aggregate Measures

Aggregate measures are summaries of individual-level data for a group, for example the mean blood pressure of a country's population, or the percentage of a county's residents who smoke. These are the most common type of group-level variable and are derived from individual measurements.

Environmental Measures

Environmental measures are physical or chemical characteristics of the environment shared by members of a group, for example ambient air pollution levels in a city, fluoride concentration in a water supply, or average annual temperature. They are usually measured at the group level; each has an individual-level analogue (a person's actual exposure or dose), but that individual exposure typically goes unmeasured.

Global Measures

Global measures are attributes of the group that have no individual-level analogue, for example population density, the existence of a specific public health law, type of healthcare system, or the Gini coefficient (income inequality). These contextual factors can only be measured at the group level and are particularly interesting because they may represent structural or policy-level determinants of health.

The Ecologic Fallacy

The ecologic fallacy is the most important limitation of ecological studies. It occurs when an association observed at the group level is incorrectly assumed to hold at the individual level (Robinson, 1950/2009; Morgenstern, 1995). Just because countries with higher fat consumption have higher breast cancer rates does not mean that the individuals within those countries who eat more fat are the same individuals who develop breast cancer.

Classic Example: Durkheim and Suicide

Emile Durkheim observed that regions with higher proportions of Protestant residents had higher suicide rates than predominantly Catholic regions. However, this group-level association does not necessarily mean that Protestant individuals were more likely to die by suicide. It is possible that the social environment of predominantly Protestant regions, perhaps characterized by greater individualism or weaker social networks, affected everyone living there, including Catholics. Attributing the group-level finding to individual Protestants would be an ecologic fallacy.

The ecologic fallacy arises because within-group variation is hidden when data are aggregated. Individuals who are exposed and individuals who develop disease may be entirely different people within the same group.
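A small constructed data set makes the point concrete. In the sketch below, every number is invented by hand: five regions with five people each, built so that regions with higher average exposure also have higher average outcome, while within any single region the more-exposed people have the lower outcome.

```r
# Toy data, constructed by hand (not real): 5 regions, 5 people each.
region <- rep(1:5, each = 5)
dev    <- rep(-2:2, times = 5)   # each person's deviation from the regional mean

toy <- data.frame(
  region   = region,
  exposure = 10 * region + dev,       # regional mean exposure rises by region
  outcome  =  2 * region - 0.5 * dev  # regional mean outcome rises too, but
)                                     # within a region the slope is negative

# Group-level (ecological) analysis: one data point per region.
grp <- aggregate(cbind(exposure, outcome) ~ region, data = toy, FUN = mean)
ecological_slope <- coef(lm(outcome ~ exposure, data = grp))[["exposure"]]

# Individual-level analysis, comparing people within the same region.
individual_slope <-
  coef(lm(outcome ~ exposure + factor(region), data = toy))[["exposure"]]

c(ecological = ecological_slope, within_region = individual_slope)
```

The two slopes have opposite signs, so reading the regional association as a statement about individuals would get the direction wrong.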

Why Ecological Studies Are Still Valuable

Despite the ecologic fallacy, ecological studies serve several important purposes. They are often the only feasible design when individual-level data are unavailable (e.g., comparing disease rates across countries using national statistics). They are useful for studying the effects of policies, environmental exposures, and other group-level variables that cannot be measured individually. They are also inexpensive and quick, making them excellent for hypothesis generation. The key is to interpret ecological associations cautiously and seek corroboration from individual-level studies.

Canadian Data Infrastructure for Cohort, Case-Control, and Ecological Designs

Most epidemiological work in Canada does not involve enrolling a fresh cohort. Instead, researchers reuse data that have already been collected through health-system encounters, surveys, environmental monitoring, and registries. Three pieces of data infrastructure, one provincial and two national, show up repeatedly:

Population Data BC (PopData BC)

A platform that links de-identified individual-level administrative data across the BC Ministry of Health, Vital Statistics, PharmaNet (every dispensed prescription), the BC Cancer Registry, MSP physician billings, hospital discharges (DAD), and education and social-services data. Researchers obtain a study-specific extract under a data access agreement.

Designs supported: retrospective cohorts, nested case-control, case-crossover, ecological/spatial analyses, intervention evaluations using natural experiments. Other provinces have analogous systems: ICES (Ontario), MCHP (Manitoba), HDNS (Nova Scotia), IRSPUM (Quebec).

Health Data Research Network Canada (HDRN Canada / SPOR DSP)

A federation of provincial data centres (PopData BC, ICES, MCHP, etc.) that supports multi-jurisdictional studies under a single application. Each centre runs the analysis behind its own firewall and only summary results are shared, so individual-level data never crosses provincial lines. Useful when you need national power or generalisability across health systems.

CANUE: Canadian Urban Environmental Health Research Consortium

A national repository of standardised, postal-code- and DA-level environmental exposures: air pollution (NO₂, PM₂.₅, O₃), greenness (NDVI), walkability, noise, climate, neighbourhood SES indices. CANUE indicators can be linked to any cohort with postal codes (including PopData BC extracts and CCHS shared files), turning subject-level health data into ecological or multilevel exposure–outcome studies.

Worked Example: A PopData BC + CANUE Cohort Study

To estimate the effect of long-term PM₂.₅ exposure on incident cardiovascular disease in BC adults, a researcher could:

Define the cohort using the PopData BC Consolidation File, that is, everyone with active MSP coverage on 1 Jan 2010 (a near-complete census of BC residents).
Pull baseline covariates from MSP and DAD (chronic conditions, comorbidity scores).
Link each person's six-character postal code to CANUE annual PM₂.₅ estimates to assign exposure.
Follow forward in DAD/Vital Stats to ascertain incident MI, stroke, and CV death (the outcomes).
Apply Cox regression with a time-varying exposure, adjusting for area-level deprivation (also from CANUE).
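The linkage step in the list above can be sketched in base R. Everything here is a stand-in: the data frames, column names, postal codes, and PM₂.₅ values are hypothetical, and real PopData BC and CANUE extracts will be laid out differently.

```r
# Hypothetical stand-ins for the real extracts; column names are assumptions.
cohort <- data.frame(
  id          = 1:3,
  postal_code = c("A1A1A1", "B2B2B2", "A1A1A1"),
  stringsAsFactors = FALSE
)
pm25 <- data.frame(
  postal_code = rep(c("A1A1A1", "B2B2B2"), each = 2),
  year        = rep(2010:2011, times = 2),
  pm25        = c(6.1, 5.8, 7.4, 7.0),   # made-up annual means
  stringsAsFactors = FALSE
)

# One row per person-year (by = NULL gives the Cartesian product).
person_years <- merge(cohort, data.frame(year = 2010:2011), by = NULL)

# Attach the annual exposure for each person's postal code and year.
linked <- merge(person_years, pm25,
                by = c("postal_code", "year"), all.x = TRUE)
linked <- linked[order(linked$id, linked$year), ]

# Counting-process layout (start, stop] for a time-varying covariate.
linked$tstart <- linked$year - 2010
linked$tstop  <- linked$tstart + 1
linked
```

Once an event indicator is added to each interval, a table in this shape is what a time-varying Cox model expects, for instance `coxph(Surv(tstart, tstop, event) ~ pm25 + ...)` from the survival package.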

This is a retrospective cohort with an environmental exposure, sitting at the boundary of cohort and ecological designs. The data come from three different stewards but no new participant was ever recruited.

Evidence Synthesis: Systematic Reviews

No single study, regardless of design, can definitively establish a causal relationship. Scientific knowledge accumulates through the synthesis of evidence across multiple studies, populations, and designs. Systematic reviews are the formal method for achieving this synthesis.

What Is a Systematic Review?

A systematic review uses a pre-specified, transparent, and reproducible protocol to identify, appraise, and synthesize all available evidence on a specific research question (Higgins et al., 2019). Unlike a narrative review (where the author selects studies subjectively), a systematic review:

Defines a precise research question (often using the PICO framework: Population, Intervention/Exposure, Comparison, Outcome).
Conducts comprehensive, systematic searches of multiple databases.
Applies explicit inclusion and exclusion criteria.
Critically appraises the quality and risk of bias of each included study.
Synthesizes findings quantitatively (meta-analysis) or qualitatively.

Meta-Analysis

When studies are sufficiently similar, their results can be combined statistically in a meta-analysis to produce a pooled estimate of effect (DerSimonian & Laird, 1986). Meta-analysis increases statistical power, provides more precise estimates, and can explore heterogeneity across studies (e.g., do results differ by study design, population, or exposure definition?).
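The arithmetic behind a pooled estimate is short enough to write out. The sketch below pools five hypothetical risk ratios (the estimates and standard errors are invented) first with fixed-effect inverse-variance weights and then with the DerSimonian and Laird random-effects adjustment; it is an illustration of the calculation, not a substitute for a dedicated meta-analysis package.

```r
# Hypothetical log risk ratios and standard errors from five studies.
log_rr <- log(c(1.40, 1.15, 1.62, 0.95, 1.30))
se     <- c(0.20, 0.12, 0.30, 0.18, 0.10)

# Fixed-effect (inverse-variance) pooling on the log scale.
w        <- 1 / se^2
fixed    <- sum(w * log_rr) / sum(w)
se_fixed <- sqrt(1 / sum(w))

# DerSimonian-Laird estimate of the between-study variance (tau^2).
k    <- length(log_rr)
Q    <- sum(w * (log_rr - fixed)^2)      # Cochran's Q
C    <- sum(w) - sum(w^2) / sum(w)
tau2 <- max(0, (Q - (k - 1)) / C)

# Random-effects pooling: each study's variance is inflated by tau^2.
w_re      <- 1 / (se^2 + tau2)
random    <- sum(w_re * log_rr) / sum(w_re)
se_random <- sqrt(1 / sum(w_re))

# Back-transform to the risk-ratio scale with 95% confidence limits.
pooled <- function(est, se_est) {
  exp(est + c(estimate = 0, lower = -1.96, upper = 1.96) * se_est)
}
rbind(fixed = pooled(fixed, se_fixed), random = pooled(random, se_random))
```

When tau² is zero the two rows coincide; when the studies disagree more than chance would predict, the random-effects interval widens to reflect that heterogeneity.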

The Hierarchy of Evidence

Systematic reviews and meta-analyses of well-conducted studies sit at the top of the traditional evidence hierarchy (Sackett, Rosenberg, Gray, Haynes, & Richardson, 1996). Below them are randomized controlled trials, then cohort studies, then case-control studies, then cross-sectional studies, then ecological studies, and finally case reports and expert opinion. However, the hierarchy is a guideline, not an absolute rule: a well-designed cohort study may provide stronger evidence than a poorly conducted RCT, and Concato, Shah, & Horwitz (2000) showed empirically that observational results often agree closely with RCTs on the same question.

R Activity: Picking the design-appropriate measure from one 2x2 table

The companion R script r-activities/HSCI_341_Lesson_8_Review_of_Study_Design_Concepts.R defines a small choose_measure() helper that returns the design-appropriate measure (PR for cross-sectional, RR for cohort, OR for case-control) from a single 2x2 table with cells a = 194, b = 1588, c = 303, d = 1314. You then call it three times to see how the same cells produce different measures depending on what design you claim to have run.

# 2x2 cells
a <- 194; b <- 1588
c <- 303; d <- 1314

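# Return the measure that matches the claimed design.
# Note: the argument named c does not break the calls to c() below,
# because R skips non-function objects when looking up a function name.
choose_measure <- function(a, b, c, d, design) {
  switch(design,
    "cross-sectional" = c(PR = (a/(a+b)) / (c/(c+d))),  # prevalence ratio
    "cohort"          = c(RR = (a/(a+b)) / (c/(c+d))),  # risk ratio
    "case-control"    = c(OR = (a*d) / (b*c)),          # odds ratio
    stop("design must be one of: cross-sectional, cohort, case-control"))
}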

choose_measure(a, b, c, d, "cross-sectional")
choose_measure(a, b, c, d, "cohort")
choose_measure(a, b, c, d, "case-control")

# side-by-side
designs <- c("cross-sectional", "cohort", "case-control")
sapply(designs, function(d_) choose_measure(a, b, c, d, d_))

What you should be able to do after this activity: pick the design-appropriate measure of association from a 2x2 table, explain why PR and RR are numerically identical, and articulate when (and why) an OR diverges from an RR.

R Reflect on what you just ran

Use the questions below to interpret the three numbers choose_measure() produced. Look at your console output before answering.

1. Report the three returned measures (PR, RR, OR) from the same cells. Which two are numerically identical, and why does the formula make that the case?

Model answer: PR = RR ≈ 0.58 and OR ≈ 0.53. PR and RR are numerically identical because both come from the same formula, (a/(a+b)) / (c/(c+d)): the proportion with the outcome among the exposed divided by the proportion among the unexposed. They differ in interpretation, not in arithmetic. In a cross-sectional study the proportions are prevalences of existing disease, so the ratio is a PR; in a cohort study, where (a+b) and (c+d) are the populations at risk, the proportions are risks, so the ratio is an RR. The OR = (a×d)/(b×c) sits apart because it is a ratio of odds, not of proportions. The lesson: same cells, different scales of interpretation.

2. The OR differs from the RR. By how much? Is the rare-disease assumption reasonable given a = 194 vs. b = 1588 and c = 303 vs. d = 1314, and how does the cell pattern explain the gap?

Model answer: RR ≈ 0.58 and OR ≈ 0.53, so the OR is about 0.05 lower (roughly 9% smaller) and sits farther from 1. With a = 194 and b = 1588, the risk in the exposed is 194/(194+1588) = 0.109; with c = 303 and d = 1314, the risk in the unexposed is 303/(303+1314) = 0.187. RR = 0.109/0.187 ≈ 0.58; OR = (194×1314)/(1588×303) ≈ 0.53. Because the association is protective (both measures fall below 1), being farther from 1 makes the OR the smaller number, not the larger one. The rare-disease assumption is not reasonable here: both risks exceed 10%, so the odds (194/1588 ≈ 0.122 and 303/1314 ≈ 0.231) are noticeably larger than the risks, more so in the higher-risk unexposed group. The OR therefore diverges from the RR and overstates the strength of the association on the relative-risk scale. Report the RR whenever the design supports it.
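
The size of the gap between the OR and the RR follows from an algebraic identity: OR = RR × (1 − p0)/(1 − p1), where p1 and p0 are the risks in the exposed and unexposed. The sketch below checks that identity on the activity's cells. The names p1, p0, rr_val, or_val, and or_from_rr are introduced here for illustration and are not part of the companion script.

```r
# Cells from the activity's 2x2 table
a <- 194; b <- 1588   # exposed:   outcome present / outcome absent
c <- 303; d <- 1314   # unexposed: outcome present / outcome absent

p1 <- a / (a + b)     # risk in the exposed
p0 <- c / (c + d)     # risk in the unexposed

rr_val <- p1 / p0
or_val <- (a * d) / (b * c)

# Identity: OR = RR * (1 - p0) / (1 - p1)
or_from_rr <- rr_val * (1 - p0) / (1 - p1)

round(c(p1 = p1, p0 = p0, RR = rr_val, OR = or_val, OR_from_RR = or_from_rr), 3)
```

When both risks are small, the factor (1 − p0)/(1 − p1) is close to 1 and the OR approximates the RR. Here neither risk is small, so the factor is visibly below 1 and pulls the OR farther from the null than the RR.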

3. Why is it INCORRECT to report a prevalence ratio (or risk ratio) from a case-control study, even though the formula technically runs in R? What is being held fixed by the case-control sampling design that breaks that interpretation?

Model answer: Case-control sampling fixes the outcome margins: the numbers of cases and controls are chosen by the investigator, not generated by the source population. The totals (a+b) and (c+d) therefore do not represent populations at risk; they depend on how many cases and controls were sampled. So a/(a+b) is not the risk of disease among the exposed; it is the proportion of sampled exposed people who are cases, and it changes whenever the case-to-control ratio changes. Of the three measures here, only the OR is unaffected by outcome-based sampling, because the sampling fractions cancel in (a×d)/(b×c). That is why case-control studies report ORs, not RRs or PRs, regardless of what the R formula will technically compute.
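
The claim that the OR survives outcome-based sampling can be checked directly. The sketch below treats the activity's table as a hypothetical source population, keeps every case, and samples a fraction of non-cases as controls. The sampling fraction f = 0.1 and the helper names odds_ratio() and risk_ratio() are assumptions for illustration. Expected (non-integer) cell counts are used so that the result is deterministic, not a random draw.

```r
# Treat the activity's table as a hypothetical source population
a <- 194; b <- 1588   # exposed:   cases / non-cases
c <- 303; d <- 1314   # unexposed: cases / non-cases

odds_ratio <- function(a, b, c, d) (a * d) / (b * c)
risk_ratio <- function(a, b, c, d) (a / (a + b)) / (c / (c + d))

# Case-control sampling: keep all cases, sample a fraction f of non-cases
# as controls, independent of exposure. f = 0.1 is an arbitrary choice.
f    <- 0.1
b_cc <- f * b         # expected number of exposed controls
d_cc <- f * d         # expected number of unexposed controls

results <- rbind(
  source_population = c(OR = odds_ratio(a, b, c, d),
                        RR_formula = risk_ratio(a, b, c, d)),
  case_control      = c(OR = odds_ratio(a, b_cc, c, d_cc),
                        RR_formula = risk_ratio(a, b_cc, c, d_cc))
)
round(results, 3)
```

The OR is the same in both rows because f cancels from the numerator and denominator. The RR formula returns a different value in the case-control row, and that value would shift again with a different f, so it reflects the investigator's sampling choice and not a risk contrast.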


Reflection: Thinking Across Study Designs

Consider a public health question you care about (e.g., the effect of air pollution on childhood asthma, or the impact of a sugar tax on obesity rates). Which study designs would be most appropriate for different aspects of this question? Why might you need evidence from multiple designs to draw a convincing causal conclusion?

Model answer: Take the sugar-tax and obesity question. An ecological or interrupted time series design speaks to population-level effects: did obesity prevalence at the city or province level change after the tax was implemented, compared with similar jurisdictions that did not adopt one? A cohort study of individuals can track per-person sugar intake before and after the tax and link it to weight gain and clinical outcomes, which addresses the individual-level mechanism. A case-control study can be useful for studying rare diabetic complications among heavy consumers of sugar-sweetened beverages (SSBs), but it is poorly suited to the tax question itself. Quasi-experimental designs (e.g., difference-in-differences across tax and no-tax jurisdictions) bridge ecological and individual-level data. Multiple designs are needed because each answers a different question (population-level effect, individual-level dose-response, mechanism) and each fails differently (ecological fallacy vs. confounding vs. selection bias); converging evidence across designs provides the strongest basis for a causal conclusion.


Key Takeaways

Ecological studies use groups as the unit of analysis and can examine aggregate, environmental, or global measures.
The ecologic fallacy occurs when group-level associations are incorrectly attributed to individuals.
Ecological studies are valuable for hypothesis generation, policy evaluation, and studying group-level exposures, but results must be interpreted cautiously.
Systematic reviews use transparent, reproducible methods to synthesize evidence across studies.
Meta-analysis can pool results statistically for greater precision and power.
No single study design is sufficient for establishing causation; converging evidence from multiple designs strengthens causal inference.


HSCI 341, Lesson 8
