HSCI 341 — Lesson 8

Review of Study Design Concepts

Fundamental Epidemiological Concepts and Approaches

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Distinguish between observational and experimental study designs
  • Compare and contrast descriptive, analytic, and cross-sectional studies
  • Describe the key features, strengths, and limitations of cohort and case-control studies
  • Identify the direction of inquiry and appropriate measures of association for each design
  • Explain the logic and limitations of ecological studies
  • Define the ecologic fallacy and recognize when it may occur
  • Describe how systematic reviews synthesize evidence across study types
  • Apply study design concepts to select appropriate designs for research questions

This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Reference

Glossary — Key Terms, People & Concepts


This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.

Foundational Concepts

Observational Study: A study in which the investigator does not assign exposures; participants are observed under their existing exposures. Includes cross-sectional, case-control, cohort, and ecological designs.
Experimental (Intervention) Study: A study in which the investigator actively assigns exposure, ideally at random, and follows participants for outcomes. Includes RCTs, field trials, and community trials.
Descriptive Study: A study that describes the distribution of disease or exposure by person, place, and time without testing specific causal hypotheses.
Analytic Study: A study designed to test specific hypotheses about exposure-outcome relationships, typically with comparison groups and a measure of association.

Observational Study Designs

Cross-Sectional Study: Exposure and outcome are measured simultaneously in a sample at one point in time. Yields prevalence and prevalence ratios; cannot establish temporal sequence.
Case-Control Study: Cases (with disease) and controls (without) are sampled separately and compared on past exposure. Direction of inquiry: outcome → exposure. Yields odds ratios; efficient for rare diseases.
Cohort Study: Disease-free participants classified by exposure are followed forward in time to compare incidence between groups. Direction: exposure → outcome. Yields risk ratios, rate ratios, and risk differences.
Prospective Cohort: A cohort assembled and followed forward from the time exposure is measured. Strong on data quality; demanding in time and cost.
Retrospective (Historical) Cohort: A cohort defined and exposure ascertained using existing records, with outcomes also already in the past. Faster and cheaper but reliant on records of variable quality.
Nested Case-Control Study: A case-control study conducted within an established cohort; controls are sampled from cohort members at risk at the time each case occurs. Efficient for expensive measurements.
Case-Cohort Study: A variant in which controls are a random subcohort sampled at baseline (rather than matched to cases). One subcohort can serve multiple outcomes.
Ecological Study: Unit of analysis is a group (country, region, school) rather than an individual. Useful for hypothesis generation and policy comparisons; vulnerable to the ecological fallacy.
Ecological Fallacy: The error of inferring individual-level relationships from group-level associations. The opposite (atomistic fallacy) is inferring group-level relationships from individual-level data.
Case Series / Case Report: Descriptive accounts of one or several patients with a particular condition. No comparison group; useful for hypothesis generation.

Experimental & Hybrid Designs

Randomized Controlled Trial (RCT): An experiment in which participants are randomly assigned to intervention and control groups. Random assignment balances measured and unmeasured confounders in expectation.
Cluster Randomized Trial: An RCT in which intact groups (e.g., clinics, schools) rather than individuals are randomized. Used when contamination across individuals is a concern; analysis must account for clustering.
Quasi-Experimental Study: An intervention study in which assignment is not randomized, using methods like interrupted time series, regression discontinuity, or natural experiments to approximate causal inference.
Hybrid Effectiveness-Implementation Design: A design that simultaneously evaluates the clinical effectiveness of an intervention and an implementation strategy; commonly classified as Type 1, 2, or 3 depending on the relative emphasis.
Systematic Review / Meta-Analysis: A structured synthesis of all available studies on a question, sometimes including a quantitative pooling (meta-analysis) of effect estimates. Sits at the top of evidence hierarchies.

Selection & Inference Concepts

Direction of Inquiry: Whether a design moves from exposure to outcome (cohort, RCT) or from outcome back to exposure (case-control). Determines which measures of association are estimable.
Incidence-Density Sampling: A control-selection strategy in nested case-control studies where controls are chosen at the moment each case occurs, allowing the OR to estimate the rate ratio.
Loss to Follow-Up: Participants in a cohort or trial who drop out before the outcome is ascertained. Differential loss between groups can bias the effect estimate.
Hierarchy of Evidence: A ranking of study designs by how strongly they support causal claims, typically systematic reviews / meta-analyses > RCTs > cohorts > case-control > cross-sectional > ecological > expert opinion.
Section 1

Review of Observational Study Designs

⏱ Estimated reading time: 15 minutes

Introduction and Overview

Lessons 1–7 of HSCI 341 laid the groundwork: causal logic, surveillance, sampling, questionnaire design, frequency, screening, and association. Lesson 8 is a checkpoint. It consolidates the observational study designs you first met in HSCI 230 (Lessons 3–6) and explicitly maps each design to the measures of association from Lesson 7. The three content sections walk through this in order: cross-sectional studies (Section 1), cohort and case-control designs (Section 2), and ecological studies plus an introduction to evidence synthesis via systematic reviews (Section 3). Once the standard observational designs are consolidated, Lessons 9 and 10 will introduce hybrid and controlled designs that combine or extend these basics.

Learning Objectives

  • Distinguish between observational and experimental studies.
  • Differentiate descriptive from analytic study designs.
  • Describe the features and uses of cross-sectional studies.
  • Understand the role of study design in the hierarchy of evidence.

Observational vs. Experimental Studies

All epidemiologic studies can be broadly classified into two categories based on whether the investigator assigns the exposure. In experimental studies (also called intervention studies or controlled trials), the researcher deliberately allocates exposure — for example, assigning participants to receive a new treatment or a placebo. In observational studies, the researcher simply observes and measures exposures and outcomes as they occur naturally, without intervening.

This distinction matters enormously for causal inference. Experimental designs, particularly randomized controlled trials (RCTs), can establish exchangeability through random assignment, which strengthens our ability to attribute differences in outcomes to the exposure (Concato, Shah, & Horwitz, 2000). Observational studies, by contrast, must contend with the possibility that exposed and unexposed groups differ in ways that also affect the outcome — that is, they must account for confounding. Vandenbroucke (2008) argues that observational and experimental research serve distinct epistemic purposes — discovery and evaluation — rather than sitting on a single ladder of rigour.

Why Observational Studies Remain Essential

Despite the inferential strengths of experiments, most epidemiologic research is observational. Many important exposures — smoking, occupational hazards, environmental pollutants, dietary patterns — cannot ethically be assigned to participants. Others would be impractical to study experimentally because the outcomes are rare or take decades to develop. Observational designs therefore remain the backbone of epidemiologic evidence for most health questions, and Ioannidis (2005) reminds us that even at the top of the hierarchy, study findings depend on power, prior probability, and analytic flexibility — not on design alone.

Descriptive vs. Analytic Studies

Within observational epidemiology, a further distinction exists between descriptive and analytic study designs.

Descriptive Studies

Descriptive studies characterize the distribution of disease or health states in a population. They address questions of who, what, where, and when. Common descriptive designs include case reports, case series, and surveys that estimate disease prevalence or describe the demographic characteristics of affected populations.

Descriptive studies are often the first step in epidemiologic investigation. They generate hypotheses about potential risk factors but do not formally test causal associations. For example, a 1981 case series describing an unusual cluster of Pneumocystis pneumonia among young men in Los Angeles was among the first reports that led to the identification of HIV/AIDS.

Analytic Studies

Analytic studies go beyond description to evaluate associations between exposures and outcomes. They incorporate a comparison group — exposed vs. unexposed, cases vs. controls — and use statistical methods to estimate the strength and direction of associations. The major analytic observational designs are cohort studies, case-control studies, and cross-sectional analytic studies.

Analytic studies are designed to test hypotheses. They ask why and how questions: Does smoking increase the risk of lung cancer? Is a high-sodium diet associated with hypertension? By including a comparison group and measuring the magnitude of association, they move us closer to causal inference.

Key Differences at a Glance

Descriptive: No formal comparison group. Generates hypotheses. Focuses on patterns of disease distribution (person, place, time). Examples: case reports, prevalence surveys, ecological studies (when purely descriptive).

Analytic: Includes a comparison group. Tests hypotheses. Estimates measures of association (risk ratios, odds ratios, rate ratios). Examples: cohort studies, case-control studies, cross-sectional analytic studies.

The boundary between these categories is not always rigid. A cross-sectional survey that only reports prevalence is descriptive, but the same survey becomes analytic when it compares prevalence across exposure groups and estimates prevalence ratios or prevalence odds ratios.

Cross-Sectional Studies

A cross-sectional study measures exposure and outcome status simultaneously in a defined population at a single point in time (or over a short period). It provides a "snapshot" of the population, allowing researchers to estimate the prevalence of disease and the prevalence of various exposures.

Key Characteristics

  • Timing: Exposure and outcome are assessed at the same time — there is no follow-up period.
  • Measure of disease frequency: Prevalence (not incidence), because you are measuring how many people currently have the condition.
  • Measure of association: The prevalence ratio or prevalence odds ratio. Because exposure and outcome are measured simultaneously, temporal sequence is often unclear.
  • Sampling: Participants are typically sampled from a defined population regardless of exposure or disease status.

Example: Cross-Sectional Study of Diabetes and Physical Activity

Researchers survey 5,000 adults in a city, measuring their current level of physical activity and whether they have been diagnosed with type 2 diabetes. They find that 12% of sedentary adults have diabetes compared to 4% of active adults. The prevalence ratio is 12/4 = 3.0, suggesting sedentary adults are three times as likely to currently have diabetes.

However, can we conclude that being sedentary caused diabetes? Not necessarily — some people may have become sedentary because of their diabetes. This is the core limitation of cross-sectional designs: the inability to establish temporal sequence.
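
As a minimal sketch, the arithmetic in this example can be reproduced from the two reported prevalences. The function names are illustrative only, and the prevalence odds ratio is computed alongside the prevalence ratio purely for comparison.

```python
import math


def prevalence_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Prevalence in the exposed group divided by prevalence in the unexposed group."""
    return p_exposed / p_unexposed


def prevalence_odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Odds of prevalent disease in the exposed group divided by the odds in the unexposed group."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


# Prevalences from the example: 12% of sedentary adults, 4% of active adults.
pr = prevalence_ratio(0.12, 0.04)
por = prevalence_odds_ratio(0.12, 0.04)

print(f"Prevalence ratio:      {pr:.2f}")   # about 3.00
print(f"Prevalence odds ratio: {por:.2f}")  # about 3.27
```

The two measures diverge slightly here because diabetes is not rare in the sedentary group; neither one resolves the question of which came first.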

Strengths and Limitations

Strengths
  • Relatively quick and inexpensive to conduct.
  • Can study multiple exposures and outcomes simultaneously.
  • Useful for estimating disease prevalence and planning health services.
  • Good for generating hypotheses that can be tested with analytic designs.
  • Can be based on existing data sources (e.g., national health surveys).
Limitations
  • Cannot establish temporal sequence — you do not know whether the exposure preceded the outcome.
  • Measures prevalence, not incidence, so results are influenced by disease duration (conditions that last longer are over-represented).
  • Subject to prevalence-incidence bias (also called Neyman bias or survival bias): people with rapidly fatal or quickly resolved conditions may not be captured.
  • Susceptible to confounding, and the temporal ambiguity makes it harder to identify and control for confounders.

Prevalence vs. Incidence: Why It Matters

Cross-sectional studies capture prevalent cases — people who currently have the disease. This pool of cases over-represents conditions that are long-lasting and under-represents conditions that are rapidly fatal or quickly cured. As a result, associations observed in cross-sectional data may not reflect the actual causes of disease onset. For etiologic research, designs that measure incidence (new cases over time) — such as cohort studies — are generally preferred.
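
The dependence of prevalence on duration can be made concrete with the standard steady-state approximation, prevalence odds = incidence rate × mean duration. The sketch below assumes a stable population with constant incidence and duration, and compares two hypothetical conditions that share the same incidence rate; all numbers are invented for illustration.

```python
import math


def steady_state_prevalence(incidence_rate: float, mean_duration_years: float) -> float:
    """Prevalence implied by P / (1 - P) = incidence rate x mean duration.

    Assumes a steady state: constant incidence and duration in a stable population.
    """
    prevalence_odds = incidence_rate * mean_duration_years
    return prevalence_odds / (1 + prevalence_odds)


# Hypothetical: both conditions have 5 new cases per 1,000 person-years.
incidence = 0.005
short_lived = steady_state_prevalence(incidence, mean_duration_years=0.5)
long_lasting = steady_state_prevalence(incidence, mean_duration_years=20)

print(f"Mean duration 0.5 years: prevalence = {short_lived:.4f}")
print(f"Mean duration 20 years:  prevalence = {long_lasting:.4f}")
```

With identical incidence, the long-lasting condition contributes far more prevalent cases to a cross-sectional sample, which is why factors that prolong survival can look like risk factors for onset.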

Key Takeaways

  • Observational studies observe naturally occurring exposures; experimental studies assign them.
  • Descriptive studies characterize disease distribution; analytic studies test hypotheses about associations.
  • Cross-sectional studies provide a snapshot of prevalence at a single point in time.
  • The inability to establish temporal sequence is the primary limitation of cross-sectional designs.
  • Prevalence-based measures can be distorted by disease duration, making cross-sectional studies less suitable for etiologic inference.

Knowledge Check — Section 1

1. What is the key difference between observational and experimental studies?

The defining feature of an experimental study is that the investigator controls and assigns the exposure. In observational studies, exposure status is determined by factors outside the researcher's control.

2. A cross-sectional study measures:

Cross-sectional studies assess both exposure and outcome at the same time, providing a "snapshot" of the population.

3. The primary limitation of cross-sectional studies for causal inference is:

Because exposure and outcome are measured simultaneously, temporal sequence cannot be determined — making it impossible to confirm that the exposure came before the outcome.


Section 2

Review of Cohort & Case-Control Studies

⏱ Estimated reading time: 18 minutes

Introduction and Overview

Section 1 reviewed cross-sectional designs — useful for prevalence estimation but limited for causal inference. Section 2 turns to the two analytic designs that can support causal claims: cohort studies (sampling on exposure, following forward to disease) and case-control studies (sampling on disease, looking backward at exposure). Both produce specific measures of association you computed by hand in Lesson 7.

Learning Objectives

  • Describe the design, direction of inquiry, and key features of cohort studies.
  • Describe the design, direction of inquiry, and key features of case-control studies.
  • Identify the appropriate measures of association for each design.
  • Compare the strengths and limitations of both designs.

Cohort Studies

A cohort study begins by identifying a group of individuals (a cohort) who are free of the outcome of interest, classifying them by exposure status, and following them over time to observe whether the outcome develops. The direction of inquiry moves from exposure to outcome — the same temporal direction as causation itself.

Types of Cohort Studies

Prospective (Concurrent) Cohort Study

Participants are enrolled in the present, exposure status is assessed, and they are followed forward in time to observe the development of outcomes. This is the classic cohort design. The Framingham Heart Study, which began enrolling participants in 1948 and continues to follow their descendants, is one of the most famous examples (Mahmood, Levy, Vasan, & Wang, 2014).

Advantage: Exposure is measured before the outcome occurs, minimizing recall bias and clearly establishing temporal sequence.

Disadvantage: Can be very expensive and time-consuming, especially for diseases with long latency periods or low incidence.

Retrospective (Historical) Cohort Study

The investigator uses historical records (e.g., employment records, medical charts, or registry data) to reconstruct a cohort whose exposure status was determined in the past. The outcomes may have already occurred or can be ascertained in the present.

Advantage: Faster and less expensive than a prospective study because the waiting period for outcome development has already elapsed.

Disadvantage: Relies on the quality and completeness of existing records. Important variables may not have been measured or may be recorded inconsistently.

Direction of Inquiry

[Diagram. COHORT: exposure status → follow-up over time → outcome (Y/N). CASE-CONTROL: outcome status → look back in time → exposure (Y/N).]

Figure 8.1 — Direction of inquiry in cohort vs. case-control studies. Cohort studies move from exposure to outcome; case-control studies start with outcome and look back at exposure.

Measures of Association in Cohort Studies

Because cohort studies follow participants over time and observe the development of new cases, they can directly estimate incidence. This allows the calculation of:

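  • Risk Ratio (Relative Risk): The ratio of cumulative incidence in the exposed group to cumulative incidence in the unexposed group. RR = (a/(a+b)) / (c/(c+d)), where a and b are the exposed participants with and without the outcome, and c and d are the unexposed participants with and without the outcome.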
  • Rate Ratio: The ratio of incidence rates (person-time denominators) in exposed vs. unexposed groups.
  • Risk Difference (Attributable Risk): The absolute difference in incidence between exposed and unexposed groups.
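
The three measures above can be sketched in a few lines of Python. The counts and person-years below are hypothetical, and the 2×2 cell labels are an assumption about notation: a and b are exposed participants with and without the outcome, c and d are unexposed participants with and without the outcome.

```python
import math


def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cumulative incidence in the exposed divided by cumulative incidence in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def risk_difference(a: int, b: int, c: int, d: int) -> float:
    """Absolute difference in cumulative incidence (exposed minus unexposed)."""
    return a / (a + b) - c / (c + d)


def rate_ratio(cases_exposed: int, py_exposed: float,
               cases_unexposed: int, py_unexposed: float) -> float:
    """Incidence rate in the exposed divided by incidence rate in the unexposed."""
    return (cases_exposed / py_exposed) / (cases_unexposed / py_unexposed)


# Hypothetical cohort: 1,000 exposed (60 cases) and 2,000 unexposed (40 cases).
a, b = 60, 940
c, d = 40, 1960
rr = risk_ratio(a, b, c, d)
rd = risk_difference(a, b, c, d)

# Hypothetical person-time accrued by each group during follow-up.
irr = rate_ratio(60, 4800.0, 40, 9900.0)

print(f"Risk ratio:      {rr:.2f}")   # about 3.00
print(f"Risk difference: {rd:.3f}")   # about 0.040
print(f"Rate ratio:      {irr:.2f}")  # about 3.09
```

The rate ratio differs from the risk ratio here only because the assumed person-time is not proportional to group size.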

Why Cohort Studies Are Powerful

The ability to measure incidence directly is the central strength of the cohort design. It establishes temporal sequence (exposure precedes outcome), allows calculation of multiple measures of association, and can study multiple outcomes associated with a single exposure. For rare exposures, cohort studies are particularly efficient because you can intentionally over-sample exposed individuals.

Case-Control Studies

A case-control study begins by identifying individuals who have the outcome of interest (cases) and a comparable group who do not (controls). The investigator then looks backward in time to compare the exposure histories of the two groups. The direction of inquiry moves from outcome to exposure.

Selecting Cases and Controls

The validity of a case-control study depends critically on appropriate selection of cases and controls (Wacholder, McLaughlin, Silverman, & Mandel, 1992):

Case Selection

Cases should be clearly defined using consistent diagnostic criteria. They may be incident cases (newly diagnosed) or prevalent cases (existing), though incident cases are preferred to reduce survival bias. Cases are typically identified from hospitals, disease registries, surveillance systems, or population-based sampling frames.

Control Selection

Controls should come from the same source population that gave rise to the cases. They should be people who, had they developed the disease, would have been identified as cases in the study. Common sources include hospital-based controls (patients admitted for other conditions), population-based controls (random samples from the community), and neighbourhood or friend controls. The choice of control group is often the most scrutinized aspect of a case-control study.

Matching

Matching involves selecting controls that are similar to cases on specific characteristics (e.g., age, sex, geography). Matching helps control for confounding by these variables and can improve statistical efficiency. However, matched variables can no longer be evaluated as risk factors, and the analysis must account for the matching, typically with conditional logistic regression when controls are individually matched to cases.
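
To illustrate why the analysis has to respect the matching, here is a small sketch for the simplest case: 1:1 matched pairs and one binary exposure. In that setting the conditional estimate of the odds ratio reduces to the ratio of discordant pairs. The pair counts are hypothetical, and the function names are illustrative.

```python
def matched_pair_odds_ratio(case_only_exposed: int, control_only_exposed: int) -> float:
    """Conditional odds ratio for 1:1 matched pairs: ratio of the two kinds of discordant pairs."""
    return case_only_exposed / control_only_exposed


def crude_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Unmatched odds ratio ad/bc (a, c = exposed, unexposed cases; b, d = exposed, unexposed controls)."""
    return (a * d) / (b * c)


# Hypothetical study of 100 matched case-control pairs.
both_exposed = 30          # case exposed, control exposed
case_only_exposed = 40     # case exposed, control unexposed
control_only_exposed = 10  # case unexposed, control exposed
neither_exposed = 20       # case unexposed, control unexposed

matched_or = matched_pair_odds_ratio(case_only_exposed, control_only_exposed)

# Breaking the pairs and pooling everyone into a single 2x2 table:
a = both_exposed + case_only_exposed        # exposed cases
b = both_exposed + control_only_exposed     # exposed controls
c = control_only_exposed + neither_exposed  # unexposed cases
d = case_only_exposed + neither_exposed     # unexposed controls
unmatched_or = crude_odds_ratio(a, b, c, d)

print(f"Matched-pair OR: {matched_or:.1f}")   # 4.0
print(f"Unmatched OR:    {unmatched_or:.1f}") # 3.5
```

In this invented example, ignoring the pairing pulls the estimate toward the null, which is the usual direction of the error when matched data are analysed as if unmatched.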

Measures of Association in Case-Control Studies

Because case-control studies do not follow a defined cohort over time, they cannot directly calculate incidence. Therefore, the risk ratio cannot be computed directly. Instead, the primary measure of association is the odds ratio (OR):

The Odds Ratio

The odds ratio compares the odds of exposure among cases to the odds of exposure among controls: OR = (a/c) / (b/d) = ad/bc, where a and c are the exposed and unexposed cases and b and d are the exposed and unexposed controls. Under certain conditions, particularly when the disease is rare in the source population (the "rare disease assumption"), the odds ratio approximates the risk ratio. This is why case-control studies are especially useful for studying rare diseases: the rare disease assumption is most likely to hold, and the design efficiently identifies a sufficient number of cases.
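
The rare disease assumption can be checked numerically. The sketch below uses two hypothetical source populations with the same true risk ratio of 2.0, one where the disease is rare and one where it is common, and computes the odds ratio in each. Cell labels follow the ad/bc formula: a and c are exposed and unexposed people with the disease, b and d are exposed and unexposed people without it.

```python
import math


def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio ad/bc."""
    return (a * d) / (b * c)


# Hypothetical populations of 10,000 exposed and 10,000 unexposed people.
rare = dict(a=20, b=9_980, c=10, d=9_990)          # risks 0.2% vs 0.1%
common = dict(a=4_000, b=6_000, c=2_000, d=8_000)  # risks 40% vs 20%

rr_rare, or_rare = risk_ratio(**rare), odds_ratio(**rare)
rr_common, or_common = risk_ratio(**common), odds_ratio(**common)

print(f"Rare disease:   RR = {rr_rare:.3f}, OR = {or_rare:.3f}")
print(f"Common disease: RR = {rr_common:.3f}, OR = {or_common:.3f}")
```

When the disease is rare the two measures nearly coincide; when it is common the odds ratio lies noticeably further from 1 than the risk ratio.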

Comparing Cohort and Case-Control Designs

Feature | Cohort Study | Case-Control Study
Direction of inquiry | Exposure → Outcome | Outcome → Exposure
Starting point | Defined by exposure status | Defined by disease status
Measures incidence directly? | Yes | No
Primary measure of association | Risk ratio, rate ratio | Odds ratio
Best suited for | Rare exposures, multiple outcomes | Rare diseases, multiple exposures
Temporal sequence | Clearly established | Relies on retrospective data
Cost and time | Often expensive and lengthy | Generally less expensive and faster
Key biases | Loss to follow-up, selective attrition | Recall bias, selection bias in controls

Choosing the Right Design: A Practical Example

Suppose you want to study whether exposure to a specific industrial solvent increases the risk of a rare liver cancer. A prospective cohort study would require following thousands of exposed and unexposed workers for decades — extremely expensive and slow. A case-control study, by contrast, could identify 200 liver cancer cases from a cancer registry, select 400 matched controls, and assess past occupational exposure through interviews and employment records — producing results in months rather than years.

For rare diseases, case-control studies are usually the design of choice because they can efficiently identify enough cases to detect meaningful associations.
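
A rough back-of-the-envelope calculation shows the scale of the problem. The sketch below asks how much follow-up a cohort would need to accrue the same 200 cases; the incidence rate and follow-up length are hypothetical values chosen only to illustrate the arithmetic.

```python
def person_years_needed(target_cases: int, rate_per_100k_py: float) -> float:
    """Person-years of follow-up expected to yield the target number of cases."""
    return target_cases * 100_000 / rate_per_100k_py


target_cases = 200        # the number of cases drawn from the registry in the example
rate_per_100k_py = 5.0    # hypothetical incidence: 5 cases per 100,000 person-years
years_of_follow_up = 20   # hypothetical length of the cohort study

person_years = person_years_needed(target_cases, rate_per_100k_py)
workers_needed = person_years / years_of_follow_up

print(f"Person-years needed: {person_years:,.0f}")              # 4,000,000
print(f"Workers followed for 20 years: {workers_needed:,.0f}")  # 200,000
```

Under these assumed numbers, a cohort would need about 200,000 workers followed for two decades to observe what the registry supplies directly.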

Key Takeaways

  • Cohort studies follow exposed and unexposed groups forward in time to observe outcome development; they can directly calculate incidence and risk ratios.
  • Case-control studies compare exposure histories of cases and controls; they estimate the odds ratio, which approximates the risk ratio when the disease is rare.
  • Cohort studies are preferred for rare exposures and when multiple outcomes are of interest; case-control studies are preferred for rare diseases and when multiple exposures are of interest.
  • Both designs are susceptible to specific biases: cohort studies to loss to follow-up, case-control studies to recall bias and control selection bias.
Knowledge Check — Section 2

1. In a cohort study, the direction of inquiry is:

Cohort studies classify participants by exposure and follow them forward in time to observe who develops the outcome.

2. The odds ratio is the primary measure of association in case-control studies because:

Because case-control studies select participants based on disease status rather than following a cohort, incidence cannot be directly calculated. The odds ratio serves as an alternative that approximates the risk ratio when the disease is rare.

3. Case-control studies are especially efficient for studying:

Case-control studies start by identifying cases, making them efficient for rare diseases. Cohort studies, by contrast, are more efficient for rare exposures.


Section 3

Ecological Studies & Evidence Synthesis

⏱ Estimated reading time: 15 minutes

Introduction and Overview

Sections 1 and 2 covered designs that operate at the individual level. Section 3 makes one final move — up to the group level — with ecological studies, and then steps back to ask how evidence from many such studies can be synthesized. The ecological fallacy you met in HSCI 230 reappears here as a specific limitation; systematic reviews are the formal answer to “what does the evidence as a whole say?”

Learning Objectives

  • Describe the design and purpose of ecological studies.
  • Explain the concept of group-level variables and the ecologic fallacy.
  • Discuss how systematic reviews synthesize evidence across study types.
  • Understand where ecological studies and systematic reviews fit in the hierarchy of evidence.

Ecological Studies

An ecological study (also called a group-level or aggregate study) uses groups — rather than individuals — as the unit of analysis. Instead of measuring exposure and outcome in each person, the investigator compares exposure levels and disease rates across defined populations, such as countries, regions, or time periods.

Types of Ecological Studies

  • Multi-group comparison
  • Time-trend (temporal)
  • Mixed design

Group-Level Variables

A distinctive feature of ecological studies is their reliance on group-level variables. These can be classified into three types:

Aggregate Measures

Aggregate measures are summaries of individual-level data for a group — for example, the mean blood pressure of a country's population, or the percentage of a county's residents who smoke. These are the most common type of group-level variable and are derived from individual measurements.

Environmental Measures

Environmental measures are physical or chemical characteristics of the environment shared by members of a group, such as ambient air pollution levels in a city, fluoride concentration in a water supply, or average annual temperature. Unlike global measures, each has an individual-level analogue (a person's actual dose), but individual exposures usually go unmeasured, so the value is assigned at the group level.

Global Measures

Global measures are attributes of the group that have no individual-level analogue — for example, population density, the existence of a specific public health law, type of healthcare system, or the Gini coefficient (income inequality). These contextual factors can only be measured at the group level and are particularly interesting because they may represent structural or policy-level determinants of health.

The Ecologic Fallacy

The ecologic fallacy is the most important limitation of ecological studies. It occurs when an association observed at the group level is incorrectly assumed to hold at the individual level (Robinson, 1950/2009; Morgenstern, 1995). Just because countries with higher fat consumption have higher breast cancer rates does not mean that the individuals within those countries who eat more fat are the same individuals who develop breast cancer.

Classic Example: Durkheim and Suicide

Émile Durkheim observed that regions with higher proportions of Protestant residents had higher suicide rates than predominantly Catholic regions. However, this group-level association does not necessarily mean that Protestant individuals were more likely to die by suicide. It is possible that the social environment of predominantly Protestant regions, perhaps characterized by greater individualism or weaker social networks, affected everyone living there, including Catholics. Attributing the group-level finding to individual Protestants would be an ecologic fallacy.

The ecologic fallacy arises because within-group variation is hidden when data are aggregated. Individuals who are exposed and individuals who develop disease may be entirely different people within the same group.
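
A small constructed dataset makes the point. In the sketch below, three hypothetical regions show a perfect positive group-level correlation between exposure prevalence and disease risk, yet within every region exposed and unexposed people have exactly the same risk. All counts are invented for illustration.

```python
import math

# Hypothetical regions of 1,000 people each. Within a region, risk is the same
# for exposed and unexposed people; it is the region itself that differs.
regions = [
    {"name": "A", "exposed_n": 100, "exposed_cases": 10, "unexposed_n": 900, "unexposed_cases": 90},
    {"name": "B", "exposed_n": 500, "exposed_cases": 100, "unexposed_n": 500, "unexposed_cases": 100},
    {"name": "C", "exposed_n": 900, "exposed_cases": 270, "unexposed_n": 100, "unexposed_cases": 30},
]


def pearson_correlation(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Group-level (ecological) view: one data point per region.
exposure_prevalence = [r["exposed_n"] / (r["exposed_n"] + r["unexposed_n"]) for r in regions]
disease_risk = [
    (r["exposed_cases"] + r["unexposed_cases"]) / (r["exposed_n"] + r["unexposed_n"])
    for r in regions
]
ecological_r = pearson_correlation(exposure_prevalence, disease_risk)

# Individual-level view: risk ratio for exposed vs. unexposed within each region.
within_region_rr = [
    (r["exposed_cases"] / r["exposed_n"]) / (r["unexposed_cases"] / r["unexposed_n"])
    for r in regions
]

print(f"Ecological correlation: {ecological_r:.2f}")
for r, rr in zip(regions, within_region_rr):
    print(f"Region {r['name']}: within-region risk ratio = {rr:.2f}")
```

The aggregated data would suggest that exposure raises risk, while the individual-level data show no association at all. This mirrors the contextual explanation offered for the Durkheim example.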

Why Ecological Studies Are Still Valuable

Despite the ecologic fallacy, ecological studies serve several important purposes. They are often the only feasible design when individual-level data are unavailable (e.g., comparing disease rates across countries using national statistics). They are useful for studying the effects of policies, environmental exposures, and other group-level variables that cannot be measured individually. They are also inexpensive and quick, making them excellent for hypothesis generation. The key is to interpret ecological associations cautiously and seek corroboration from individual-level studies.

Canadian Data Infrastructure for Cohort, Case-Control, and Ecological Designs

Most epidemiological work in Canada does not involve enrolling a fresh cohort. Instead, researchers reuse data that have already been collected through health-system encounters, surveys, environmental monitoring, and registries. Three pieces of national infrastructure show up repeatedly:

Population Data BC (PopData BC)

A platform that links de-identified individual-level administrative data across the BC Ministry of Health, Vital Statistics, PharmaNet (every dispensed prescription), the BC Cancer Registry, MSP physician billings, hospital discharges (DAD), and education and social-services data. Researchers obtain a study-specific extract under a data access agreement.

Designs supported: retrospective cohorts, nested case-control, case-crossover, ecological/spatial analyses, intervention evaluations using natural experiments. Other provinces have analogous systems — ICES (Ontario), MCHP (Manitoba), HDNS (Nova Scotia), IRSPUM (Quebec).

Health Data Research Network Canada (HDRN Canada / SPOR DSP)

A federation of provincial data centres (PopData BC, ICES, MCHP, etc.) that supports multi-jurisdictional studies under a single application. Each centre runs the analysis behind its own firewall and only summary results are shared, so individual-level data never cross provincial lines. Useful when you need national power or generalisability across health systems.

CANUE — Canadian Urban Environmental Health Research Consortium

A national repository of standardised environmental exposures at the postal-code and dissemination-area (DA) level: air pollution (NO2, PM2.5, O3), greenness (NDVI), walkability, noise, climate, and neighbourhood SES indices. CANUE indicators can be linked to any cohort with postal codes, including PopData BC extracts and CCHS shared files, turning subject-level health data into ecological or multilevel exposure–outcome studies.

Worked Example: A PopData BC + CANUE Cohort Study

To estimate the effect of long-term PM2.5 exposure on incident cardiovascular disease in BC adults, a researcher could:

  1. Define the cohort using the PopData BC Consolidation File — everyone with active MSP coverage on 1 Jan 2010 (a near-complete census of BC residents).
  2. Pull baseline covariates from MSP and DAD (chronic conditions, comorbidity scores).
  3. Link each person's six-character postal code to CANUE annual PM2.5 estimates to assign exposure.
  4. Follow forward in DAD/Vital Stats to ascertain incident MI, stroke, and CV death (the outcomes).
  5. Apply Cox regression with a time-varying exposure, adjusting for area-level deprivation (also from CANUE).

This is a retrospective cohort with an environmental exposure — sitting at the boundary of cohort and ecological designs. The data come from three different stewards but no new participant was ever recruited.
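
The linkage logic in steps 1, 3, and 4 can be sketched with toy in-memory records. Everything below is hypothetical: the field names, postal codes, PM2.5 values, cut-point, and censoring date are invented and do not reflect the actual PopData BC or CANUE file layouts. The sketch stops at a crude rate ratio and deliberately omits the covariate adjustment and Cox regression of steps 2 and 5.

```python
import math
from datetime import date

BASELINE = date(2010, 1, 1)
END_OF_FOLLOW_UP = date(2019, 12, 31)  # hypothetical censoring date
HIGH_PM25_CUTOFF = 8.0                 # hypothetical cut-point, micrograms per cubic metre

# Step 1: registry rows (invented). Only people covered at baseline enter the cohort.
registry = [
    {"id": 1, "postal_code": "V1A1A1", "covered_at_baseline": True},
    {"id": 2, "postal_code": "V1A1A1", "covered_at_baseline": True},
    {"id": 3, "postal_code": "V2B2B2", "covered_at_baseline": True},
    {"id": 4, "postal_code": "V3C3C3", "covered_at_baseline": True},
    {"id": 5, "postal_code": "V3C3C3", "covered_at_baseline": True},
    {"id": 6, "postal_code": "V4D4D4", "covered_at_baseline": True},
    {"id": 7, "postal_code": "V4D4D4", "covered_at_baseline": False},
]

# Step 3: annual PM2.5 estimate by postal code (invented values).
pm25_by_postal_code = {"V1A1A1": 9.4, "V2B2B2": 8.7, "V3C3C3": 5.2, "V4D4D4": 6.1}

# Step 4: date of first cardiovascular event, if any (invented).
first_cvd_event = {1: date(2014, 1, 1), 2: date(2016, 1, 1), 4: date(2018, 1, 1)}


def follow_up_years(person_id: int) -> float:
    """Years from baseline to first event, or to end of follow-up if no event."""
    end = min(first_cvd_event.get(person_id, END_OF_FOLLOW_UP), END_OF_FOLLOW_UP)
    return (end - BASELINE).days / 365.25


cases = {"high": 0, "low": 0}
person_years = {"high": 0.0, "low": 0.0}

for person in registry:
    if not person["covered_at_baseline"]:
        continue  # not in the source population at baseline
    pm25 = pm25_by_postal_code.get(person["postal_code"])
    if pm25 is None:
        continue  # postal code failed to link to an exposure estimate
    group = "high" if pm25 >= HIGH_PM25_CUTOFF else "low"
    person_years[group] += follow_up_years(person["id"])
    if person["id"] in first_cvd_event:
        cases[group] += 1

crude_rate_ratio = (cases["high"] / person_years["high"]) / (cases["low"] / person_years["low"])
print(f"Cases: {cases}")
print(f"Crude rate ratio (high vs. low PM2.5): {crude_rate_ratio:.2f}")
```

Note that exposure is assigned from the postal code rather than measured on each person, which is exactly what places this design at the boundary between cohort and ecological studies.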

Evidence Synthesis: Systematic Reviews

No single study, regardless of design, can definitively establish a causal relationship. Scientific knowledge accumulates through the synthesis of evidence across multiple studies, populations, and designs. Systematic reviews are the formal method for achieving this synthesis.

What Is a Systematic Review?

A systematic review uses a pre-specified, transparent, and reproducible protocol to identify, appraise, and synthesize all available evidence on a specific research question (Higgins et al., 2019). Unlike a narrative review (where the author selects studies subjectively), a systematic review:

  • Defines a precise research question (often using the PICO framework: Population, Intervention/Exposure, Comparison, Outcome).
  • Conducts comprehensive, systematic searches of multiple databases.
  • Applies explicit inclusion and exclusion criteria.
  • Critically appraises the quality and risk of bias of each included study.
  • Synthesizes findings quantitatively (meta-analysis) or qualitatively.

Meta-Analysis

When studies are sufficiently similar, their results can be combined statistically in a meta-analysis to produce a pooled estimate of effect (DerSimonian & Laird, 1986). Meta-analysis increases statistical power, provides more precise estimates, and can explore heterogeneity across studies (e.g., do results differ by study design, population, or exposure definition?).
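
As a rough illustration of how pooling works, the base-R sketch below combines five study results by inverse-variance weighting and then applies the DerSimonian and Laird moment estimator of between-study variance. The log risk ratios and standard errors are made up for illustration and do not come from any real studies; the variable names are likewise assumptions.

```r
# Made-up log risk ratios and standard errors for five hypothetical studies
log_rr <- c(-0.35, -0.10, -0.52, -0.22, 0.05)
se     <- c(0.15, 0.20, 0.25, 0.10, 0.30)

# Fixed-effect (inverse-variance) pooling: precise studies get more weight
w_fixed      <- 1 / se^2
pooled_fixed <- sum(w_fixed * log_rr) / sum(w_fixed)

# Heterogeneity: Cochran's Q and the DerSimonian-Laird estimate of tau^2
Q       <- sum(w_fixed * (log_rr - pooled_fixed)^2)
df      <- length(log_rr) - 1
scaling <- sum(w_fixed) - sum(w_fixed^2) / sum(w_fixed)
tau2    <- max(0, (Q - df) / scaling)
I2      <- max(0, (Q - df) / Q)   # share of variation attributed to heterogeneity

# Random-effects pooling: add tau^2 to each study's variance
w_random      <- 1 / (se^2 + tau2)
pooled_random <- sum(w_random * log_rr) / sum(w_random)
se_random     <- sqrt(1 / sum(w_random))

# Back-transform to the risk-ratio scale with a 95% confidence interval
exp(c(RR    = pooled_random,
      lower = pooled_random - 1.96 * se_random,
      upper = pooled_random + 1.96 * se_random))
```

Pooling is done on the log scale because log ratios are approximately normally distributed; the estimate is exponentiated only at the end.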

The Hierarchy of Evidence

Systematic reviews and meta-analyses of well-conducted studies sit at the top of the traditional evidence hierarchy (Sackett, Rosenberg, Gray, Haynes, & Richardson, 1996). Below them are randomized controlled trials, then cohort studies, then case-control studies, then cross-sectional studies, then ecological studies, and finally case reports and expert opinion. However, the hierarchy is a guideline, not an absolute rule: a well-designed cohort study may provide stronger evidence than a poorly conducted RCT, and Concato, Shah, & Horwitz (2000) showed empirically that observational results often agree closely with RCTs on the same question.

R Activity — Picking the design-appropriate measure from one 2x2 table

The companion R script r-activities/HSCI_341_Lesson_8_Review_of_Study_Design_Concepts.R defines a small choose_measure() helper that returns the design-appropriate measure (PR for cross-sectional, RR for cohort, OR for case-control) from a single 2x2 table with cells a = 194, b = 1588, c = 303, d = 1314. You then call it three times to see how the same cells produce different measures depending on what design you claim to have run.

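# 2x2 cells (layout implied by the formulas below):
#   a = exposed with outcome,   b = exposed without outcome
#   c = unexposed with outcome, d = unexposed without outcome
a <- 194; b <- 1588
c <- 303; d <- 1314

# Return the measure of association that matches the claimed study design.
# Note: the call c(...) still finds the base function c() even though a
# variable named c exists, because R looks up only functions in call position.
choose_measure <- function(a, b, c, d, design) {
  switch(design,
    "cross-sectional" = c(PR = (a/(a+b)) / (c/(c+d))),  # prevalence ratio
    "cohort"          = c(RR = (a/(a+b)) / (c/(c+d))),  # risk ratio
    "case-control"    = c(OR = (a*d) / (b*c)),          # odds ratio
    stop("design must be one of: cross-sectional, cohort, case-control"))
}

choose_measure(a, b, c, d, "cross-sectional")
choose_measure(a, b, c, d, "cohort")
choose_measure(a, b, c, d, "case-control")

# side-by-side comparison of all three designs
designs <- c("cross-sectional", "cohort", "case-control")
sapply(designs, function(d_) choose_measure(a, b, c, d, d_))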

What you should be able to do after this activity: pick the design-appropriate measure of association from a 2x2 table, explain why PR and RR are numerically identical, and articulate when (and why) an OR diverges from an RR.

R Reflect on what you just ran

Use the questions below to interpret the three numbers choose_measure() produced. Look at your console output before answering.

1. Report the three returned measures (PR, RR, OR) from the same cells. Which two are numerically identical, and why does the formula make that the case?

Model answer: PR and RR are numerically identical here because both are computed as (a/(a+b)) / (c/(c+d)): the proportion with the outcome among the exposed divided by the same proportion among the unexposed. They differ in interpretation, not in arithmetic: the PR contrasts prevalences (existing cases at one point in time), whereas the RR contrasts risks (new cases arising in a population at risk during follow-up). The OR = (a×d)/(b×c) sits apart because it is a ratio of odds rather than of probabilities. The lesson: same cells, different scales of interpretation.

2. The OR differs from the RR. By how much? Is the rare-disease assumption reasonable given a = 194 vs. b = 1588 and c = 303 vs. d = 1314, and how does the cell pattern explain the gap?

Model answer: The OR is farther from 1 than the RR/PR. With cells a = 194, b = 1588 (prevalence = 194/(194+1588) ≈ 0.109) and c = 303, d = 1314 (prevalence = 303/(303+1314) ≈ 0.187), the outcome is not rare in either group. RR = (194/1782)/(303/1617) ≈ 0.581; OR = (194×1314)/(1588×303) ≈ 0.530. The rare-disease assumption fails because both prevalences exceed 10%, so the OR diverges from the RR, always in the direction away from 1 (here downward, because the association is protective). Read as if it were an RR, the OR over-states the strength of the association; report the RR if the design supports it.

3. Why is it INCORRECT to report a prevalence ratio (or risk ratio) from a case-control study, even though the formula technically runs in R? What is being held fixed by the case-control sampling design that breaks that interpretation?

Model answer: Case-control sampling fixes the outcome margins: the numbers of cases (a+c) and controls (b+d) are chosen by the investigator, not generated by the source population. The row totals (a+b) and (c+d) therefore do not represent populations at risk; they mix cases and controls who were sampled at very different fractions. So a/(a+b) is not the risk of disease among the exposed; it is the proportion of sampled exposed people who happen to be cases, which depends on how many controls were recruited per case and is meaningless as a risk. The OR is unaffected by outcome-based sampling (provided sampling does not depend on exposure) because the sampling fractions cancel in (a×d)/(b×c), which is why case-control studies report ORs, not RRs/PRs, regardless of what the R formula will technically compute.
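
The invariance of the OR under outcome-based sampling can be checked numerically. The sketch below treats the activity's 2x2 cells as a full source population, then mimics a case-control design that keeps every case but only a fraction of the non-cases. The helper `measures()` and the sampling fraction `f` are illustrative assumptions, and expected counts are used instead of random draws so the comparison is exact.

```r
# Cells from the activity, treated here as the full source population
a <- 194; b <- 1588   # exposed:   with outcome, without outcome
c <- 303; d <- 1314   # unexposed: with outcome, without outcome

# Hypothetical helper: the "risk ratio" formula and the odds ratio
measures <- function(a, b, c, d) {
  c(RR = (a / (a + b)) / (c / (c + d)),
    OR = (a * d) / (b * c))
}

full <- measures(a, b, c, d)

# Case-control sampling: keep all cases, keep a fraction f of non-cases,
# with the same f in exposed and unexposed (sampling independent of exposure)
f <- 0.25
sampled <- measures(a, b * f, c, d * f)

rbind(full = full, sampled = sampled)
```

The OR row entries match because `f` cancels in (a × f·d) / (f·b × c), whereas the "RR" changes with every choice of `f`, which is why it cannot be interpreted as a risk ratio under this design.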

Reflection: Thinking Across Study Designs

Consider a public health question you care about (e.g., the effect of air pollution on childhood asthma, or the impact of a sugar tax on obesity rates). Which study designs would be most appropriate for different aspects of this question? Why might you need evidence from multiple designs to draw a convincing causal conclusion?

Model answer: Take the sugar-tax / obesity question. An ecological or interrupted time-series design speaks to population-level effects: did the tax change obesity prevalence at the city or province level after implementation, compared with similar jurisdictions that did not introduce one? A cohort study of individuals can track per-person sugar intake before and after the tax and link it to weight gain and clinical outcomes, which addresses the individual-level mechanism. A case-control study can be useful for studying rare diabetic complications among heavy consumers of sugar-sweetened beverages (SSBs) but is poorly suited to the tax question itself. Quasi-experimental designs (difference-in-differences across tax / no-tax jurisdictions) bridge ecological and individual-level data. Multiple designs are needed because each answers a different question (population-level effect, individual-level dose-response, mechanism) and each fails differently (ecologic fallacy vs. confounding vs. selection bias); converging evidence across designs is the strongest basis for a causal conclusion.
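
The difference-in-differences idea mentioned in the model answer can be sketched with a small simulated panel. All numbers are invented: two hypothetical jurisdictions, a tax assumed to start in 2017, and an assumed true effect of -1.5 percentage points on obesity prevalence. The column names are illustrative only.

```r
set.seed(12)

# Simulated yearly obesity prevalence (%) in two hypothetical jurisdictions
panel <- expand.grid(year = 2012:2021,
                     jurisdiction = c("tax", "no_tax"),
                     stringsAsFactors = FALSE)
panel$treated <- as.integer(panel$jurisdiction == "tax")
panel$post    <- as.integer(panel$year >= 2017)   # assumed start of the tax

# Shared upward trend, a fixed gap between jurisdictions,
# and an assumed true effect of -1.5 points once the tax is in place
panel$prev <- 25 + 0.3 * (panel$year - 2012) + 2 * panel$treated -
  1.5 * panel$treated * panel$post + rnorm(nrow(panel), sd = 0.2)

# The treated:post interaction is the difference-in-differences estimate
fit <- lm(prev ~ treated * post + year, data = panel)
did <- coef(fit)[["treated:post"]]
did
```

The estimate is only interpretable as the tax effect if the two jurisdictions would have followed parallel trends without the tax, an assumption the simulation builds in but real data must justify.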


Key Takeaways

  • Ecological studies use groups as the unit of analysis and can examine aggregate, environmental, or global measures.
  • The ecologic fallacy occurs when group-level associations are incorrectly attributed to individuals.
  • Ecological studies are valuable for hypothesis generation, policy evaluation, and studying group-level exposures, but results must be interpreted cautiously.
  • Systematic reviews use transparent, reproducible methods to synthesize evidence across studies.
  • Meta-analysis can pool results statistically for greater precision and power.
  • No single study design is sufficient for establishing causation; converging evidence from multiple designs strengthens causal inference.
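
A small simulation can make the ecologic fallacy concrete. In the sketch below, which uses entirely simulated data with assumed effect sizes, the outcome rises with a group's average exposure but falls with an individual's exposure relative to their group, so the group-level and individual-level associations point in opposite directions.

```r
set.seed(8)

n_groups <- 5
n_per    <- 200
g <- rep(seq_len(n_groups), each = n_per)

# Group-average exposure differs across groups; individuals vary around it
group_mean_x <- c(2, 4, 6, 8, 10)[g]
x <- group_mean_x + rnorm(n_groups * n_per, sd = 1)

# Assumed data-generating model: positive between groups, negative within
y <- 1.0 * group_mean_x - 0.8 * (x - group_mean_x) +
  rnorm(n_groups * n_per, sd = 0.5)

# Ecological analysis: regress group means on group means
agg <- aggregate(cbind(x, y) ~ g, FUN = mean)
ecological_slope <- coef(lm(y ~ x, data = agg))[["x"]]

# Individual-level analysis within groups (group fixed effects)
within_slope <- coef(lm(y ~ x + factor(g)))[["x"]]

c(ecological = ecological_slope, within_group = within_slope)
```

An analyst who saw only the five group means would infer a positive exposure-outcome relationship, and attributing that to individuals would get the sign wrong.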
Knowledge Check — Section 3

1. The ecologic fallacy occurs when:

The ecologic fallacy is the erroneous inference that a relationship observed between group-level variables also holds at the individual level.

2. Population density is an example of which type of group-level variable?

Population density is an attribute of the group itself with no individual-level analogue — it is a global measure.

3. A systematic review differs from a narrative review primarily because it:

The hallmark of a systematic review is its pre-specified, transparent, and reproducible methodology, which minimizes selection bias in the choice of included studies.


Section 4

Knowledge Check & Final Assessment

⏱ Estimated time: 15 minutes

Bringing It All Together

This lesson revisited the three core observational designs — cross-sectional, cohort, and case-control — alongside ecological studies and the place of systematic reviews at the top of the evidence hierarchy. The point of returning to these designs after the analytic lessons (5, 6, 7) is that the choice of design is rarely a separate decision from the measure of association you want to estimate or the practical constraints you face. Reporting guidance for each design family is now codified: STROBE for observational studies (von Elm et al., 2007) and CONSORT for randomised trials (Schulz, Altman, & Moher, 2010).

The takeaways below pull the threads together as a portfolio: each design buys you something and costs you something else. Lesson 9 will introduce the hybrid designs that mix-and-match the strengths of these classics, and Lesson 10 will move into controlled experimental designs where the investigator finally controls exposure assignment.

Key Takeaways from Lesson 8

  • The observational vs. experimental split turns on a single question: who assigns the exposure? Descriptive vs. analytic is a different axis — whether comparison groups are used.
  • Cross-sectional studies measure exposure and outcome at one moment, producing prevalence data but no temporal sequence; useful for surveillance and hypothesis generation.
  • Cohort studies follow exposed and unexposed groups forward; they yield incidence and risk/rate ratios and are the strongest observational design for temporality.
  • Case-control studies sample on outcome and look back at exposure; efficient for rare diseases but vulnerable to recall and selection bias, and limited to odds ratios.
  • Ecological studies analyse group-level data and are useful for policy evaluation, but inferences to individuals risk the ecologic fallacy.
  • Systematic reviews and meta-analyses synthesise across studies under transparent, reproducible methods, placing them at the apex of the traditional evidence hierarchy.

Final Reflection

Imagine you are an epidemiologist investigating whether long-term exposure to a newly identified environmental contaminant increases the risk of a rare autoimmune disorder. Describe which study design(s) you would consider, why, and what specific strengths and limitations you would need to address. Consider how you might use multiple study designs to build a stronger evidence base.

Model answer: For a rare autoimmune disorder and a new contaminant, start with a case-control study: it is cost-effective for rare outcomes, with cases identified through specialty clinic registries and controls drawn from a defined source population. Limitations: recall bias for long-latency exposures, selection bias in control sourcing, and no direct risk estimates. Complement it with a cohort study that links environmental monitoring data (exposure assigned by residence during the biologically relevant exposure window) to incident cases in an administrative cohort; this gives incidence and dose-response but is expensive and requires accurate exposure measurement. An ecological time-series study comparing regions with different contaminant levels gives a population-level signal but is vulnerable to the ecologic fallacy. Mendelian randomisation using genetic variants that affect metabolism of the contaminant can help triangulate causation if the instruments are valid. The strongest evidence comes from a converging signal across these designs: biological coherence, dose-response in the cohort, temporal sequence in the time series, and individual-level association in the case-control study.


Final Knowledge Assessment

Complete the following 12-question assessment. A score of 100% is required to complete the lesson. You may retake the assessment as many times as needed.

Final Assessment — 12 Questions

1. In an observational study, the investigator:

In observational studies, the researcher does not control or assign exposure — participants' exposure status is determined by factors outside the study.

2. An analytic study differs from a descriptive study primarily because it:

Analytic studies include a comparison group (exposed vs. unexposed, or cases vs. controls) and are designed to test hypotheses, whereas descriptive studies characterize disease patterns without formal comparisons.

3. The main measure of disease frequency in a cross-sectional study is:

Cross-sectional studies capture a snapshot at one point in time, measuring how many people currently have the condition (prevalence), not new cases over time (incidence).

4. A retrospective cohort study differs from a prospective cohort study in that:

A retrospective cohort uses existing records to reconstruct a cohort whose exposure was determined in the past. It still follows the cohort logic (exposure to outcome) and can estimate incidence.

5. Which measure of association can be directly calculated from a cohort study but NOT from a case-control study?

Cohort studies follow defined populations and can measure incidence directly, allowing calculation of the risk ratio. Case-control studies cannot directly calculate incidence, so the risk ratio cannot be computed directly.

6. The "rare disease assumption" in case-control studies refers to the idea that:

When the disease is rare in the source population, the odds of disease approximate the probability of disease, so the odds ratio from a case-control study closely estimates the risk ratio that would be obtained from a cohort study.

7. A key strength of cohort studies is that they:

By measuring exposure before the outcome occurs and following participants over time, cohort studies establish clear temporal sequence — a necessary condition for causal inference.

8. Recall bias is a particular concern in:

In case-control studies, participants report past exposures retrospectively. Cases who are seeking explanations for their disease may recall exposures more thoroughly (or differently) than healthy controls.

9. An ecological study uses which unit of analysis?

Ecological studies analyze data at the group level, comparing aggregate measures of exposure and outcome across populations rather than individuals.

10. A country with high average alcohol consumption also has high rates of liver cirrhosis. Concluding that the individuals who drink the most in that country are the ones with cirrhosis would be an example of:

The ecologic fallacy occurs when a group-level association (high national alcohol consumption correlating with high national cirrhosis rates) is incorrectly attributed to individuals within those groups.

11. Systematic reviews sit at the top of the evidence hierarchy because they:

Systematic reviews use pre-specified, transparent methods to identify, appraise, and synthesize all available evidence, providing the most comprehensive and least biased assessment of a research question.

12. You want to study whether a newly introduced workplace safety regulation reduces injury rates. Individual-level exposure data are unavailable, but national injury statistics exist for the years before and after the regulation. The most appropriate design is:

When individual-level data are unavailable and you want to compare population-level outcomes before and after a policy change, a time-trend ecological study is the most feasible and appropriate design.
