Review of Study Design Concepts
Fundamental Epidemiological Concepts and Approaches
Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University
Learning objectives for this lesson:
- Distinguish between observational and experimental study designs
- Compare and contrast descriptive, analytic, and cross-sectional studies
- Describe the key features, strengths, and limitations of cohort and case-control studies
- Identify the direction of inquiry and appropriate measures of association for each design
- Explain the logic and limitations of ecological studies
- Define the ecologic fallacy and recognize when it may occur
- Describe how systematic reviews synthesize evidence across study types
- Apply study design concepts to select appropriate designs for research questions
This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.
Review of Observational Study Designs
⏱ Estimated reading time: 15 minutes
Learning Objectives
- Distinguish between observational and experimental studies.
- Differentiate descriptive from analytic study designs.
- Describe the features and uses of cross-sectional studies.
- Understand the role of study design in the hierarchy of evidence.
Observational vs. Experimental Studies
All epidemiologic studies can be broadly classified into two categories based on whether the investigator assigns the exposure. In experimental studies (also called intervention studies or controlled trials), the researcher deliberately allocates exposure — for example, assigning participants to receive a new treatment or a placebo. In observational studies, the researcher simply observes and measures exposures and outcomes as they occur naturally, without intervening.
This distinction matters enormously for causal inference. Experimental designs, particularly randomized controlled trials (RCTs), can establish exchangeability through random assignment, which strengthens our ability to attribute differences in outcomes to the exposure. Observational studies, by contrast, must contend with the possibility that exposed and unexposed groups differ in ways that also affect the outcome — that is, they must account for confounding.
Why Observational Studies Remain Essential
Despite the inferential strengths of experiments, most epidemiologic research is observational. Many important exposures — smoking, occupational hazards, environmental pollutants, dietary patterns — cannot ethically be assigned to participants. Others would be impractical to study experimentally because the outcomes are rare or take decades to develop. Observational designs therefore remain the backbone of epidemiologic evidence for most health questions.
Descriptive vs. Analytic Studies
Within observational epidemiology, a further distinction exists between descriptive and analytic study designs.
Descriptive Studies
Descriptive studies characterize the distribution of disease or health states in a population. They address questions of who, what, where, and when. Common descriptive designs include case reports, case series, and surveys that estimate disease prevalence or describe the demographic characteristics of affected populations.
Descriptive studies are often the first step in epidemiologic investigation. They generate hypotheses about potential risk factors but do not formally test causal associations. For example, a 1981 case series describing an unusual cluster of Pneumocystis pneumonia among previously healthy young men in Los Angeles was among the first reports that led to the identification of HIV/AIDS.
Analytic Studies
Analytic studies go beyond description to evaluate associations between exposures and outcomes. They incorporate a comparison group — exposed vs. unexposed, cases vs. controls — and use statistical methods to estimate the strength and direction of associations. The major analytic observational designs are cohort studies, case-control studies, and cross-sectional analytic studies.
Analytic studies are designed to test hypotheses. They ask why and how questions: Does smoking increase the risk of lung cancer? Is a high-sodium diet associated with hypertension? By including a comparison group and measuring the magnitude of association, they move us closer to causal inference.
Key Differences at a Glance
Descriptive: No formal comparison group. Generates hypotheses. Focuses on patterns of disease distribution (person, place, time). Examples: case reports, prevalence surveys, ecological studies (when purely descriptive).
Analytic: Includes a comparison group. Tests hypotheses. Estimates measures of association (risk ratios, odds ratios, rate ratios). Examples: cohort studies, case-control studies, cross-sectional analytic studies.
The boundary between these categories is not always rigid. A cross-sectional survey that only reports prevalence is descriptive, but the same survey becomes analytic when it compares prevalence across exposure groups and estimates odds ratios.
Cross-Sectional Studies
A cross-sectional study measures exposure and outcome status simultaneously in a defined population at a single point in time (or over a short period). It provides a "snapshot" of the population, allowing researchers to estimate the prevalence of disease and the prevalence of various exposures.
Key Characteristics
- Timing: Exposure and outcome are assessed at the same time — there is no follow-up period.
- Measure of disease frequency: Prevalence (not incidence), because you are measuring how many people currently have the condition.
- Measure of association: The prevalence ratio or prevalence odds ratio. Because exposure and outcome are measured simultaneously, temporal sequence is often unclear.
- Sampling: Participants are typically sampled from a defined population regardless of exposure or disease status.
Example: Cross-Sectional Study of Diabetes and Physical Activity
Researchers survey 5,000 adults in a city, measuring their current level of physical activity and whether they have been diagnosed with type 2 diabetes. They find that 12% of sedentary adults have diabetes compared to 4% of active adults. The prevalence ratio is 12/4 = 3.0, suggesting sedentary adults are three times as likely to currently have diabetes.
However, can we conclude that being sedentary caused diabetes? Not necessarily — some people may have become sedentary because of their diabetes. This is the core limitation of cross-sectional designs: the inability to establish temporal sequence.
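The arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lesson: the function names are hypothetical, and the only inputs are the two prevalences quoted above (12% and 4%). It also computes the prevalence odds ratio so the two measures can be compared.

```python
def prevalence_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Prevalence in the exposed group divided by prevalence in the unexposed group."""
    return p_exposed / p_unexposed


def prevalence_odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    """Ratio of prevalence odds, p / (1 - p), between the two groups."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


# Prevalences from the example: 12% of sedentary adults, 4% of active adults.
pr = prevalence_ratio(0.12, 0.04)
por = prevalence_odds_ratio(0.12, 0.04)
print(f"Prevalence ratio: {pr:.2f}")        # 3.00
print(f"Prevalence odds ratio: {por:.2f}")  # 3.27
```

The prevalence odds ratio (about 3.27) is slightly further from 1 than the prevalence ratio (3.0). The gap widens as the outcome becomes more common, which is one reason to state which of the two measures is being reported.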
Strengths and Limitations
Strengths:
- Relatively quick and inexpensive to conduct.
- Can study multiple exposures and outcomes simultaneously.
- Useful for estimating disease prevalence and planning health services.
- Good for generating hypotheses that can be tested with analytic designs.
- Can be based on existing data sources (e.g., national health surveys).
Limitations:
- Cannot establish temporal sequence: you do not know whether the exposure preceded the outcome.
- Measures prevalence, not incidence, so results are influenced by disease duration (conditions that last longer are over-represented).
- Subject to prevalence-incidence bias (also called Neyman bias or survival bias): people with rapidly fatal or quickly resolved conditions may not be captured.
- Susceptible to confounding, and the temporal ambiguity makes it harder to identify and control for confounders.
Prevalence vs. Incidence: Why It Matters
Cross-sectional studies capture prevalent cases — people who currently have the disease. This pool of cases over-represents conditions that are long-lasting and under-represents conditions that are rapidly fatal or quickly cured. As a result, associations observed in cross-sectional data may not reflect the actual causes of disease onset. For etiologic research, designs that measure incidence (new cases over time) — such as cohort studies — are generally preferred.
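One way to see why duration matters is the standard steady-state relation between prevalence and incidence: prevalence odds = incidence rate × mean duration. The sketch below assumes a stable population in which that relation holds. The function name and the two example conditions are hypothetical and chosen only for illustration.

```python
def steady_state_prevalence(incidence_rate: float, mean_duration: float) -> float:
    """Prevalence proportion implied by P / (1 - P) = incidence rate x mean duration.

    Assumes a steady state (stable incidence and duration); both arguments
    must use the same time unit (here, years).
    """
    prevalence_odds = incidence_rate * mean_duration
    return prevalence_odds / (1 + prevalence_odds)


# Two hypothetical conditions with the same incidence rate: 5 per 1,000 person-years.
short_lived = steady_state_prevalence(0.005, 0.5)   # mean duration of 6 months
long_lasting = steady_state_prevalence(0.005, 20)   # mean duration of 20 years

print(f"Short-lived condition:  {short_lived:.2%}")
print(f"Long-lasting condition: {long_lasting:.2%}")
```

Both conditions arise at the same rate, yet a cross-sectional survey would find the long-lasting one far more often. Any factor that prolongs survival with a disease can therefore look like a risk factor for the disease in prevalence data.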
Key Takeaways
- Observational studies observe naturally occurring exposures; experimental studies assign them.
- Descriptive studies characterize disease distribution; analytic studies test hypotheses about associations.
- Cross-sectional studies provide a snapshot of prevalence at a single point in time.
- The inability to establish temporal sequence is the primary limitation of cross-sectional designs.
- Prevalence-based measures can be distorted by disease duration, making cross-sectional studies less suitable for etiologic inference.
1. What is the key difference between observational and experimental studies?
2. A cross-sectional study measures:
3. The primary limitation of cross-sectional studies for causal inference is:
Review of Cohort & Case-Control Studies
⏱ Estimated reading time: 18 minutes
Learning Objectives
- Describe the design, direction of inquiry, and key features of cohort studies.
- Describe the design, direction of inquiry, and key features of case-control studies.
- Identify the appropriate measures of association for each design.
- Compare the strengths and limitations of both designs.
Cohort Studies
A cohort study begins by identifying a group of individuals (a cohort) who are free of the outcome of interest, classifying them by exposure status, and following them over time to observe whether the outcome develops. The direction of inquiry moves from exposure to outcome — the same temporal direction as causation itself.
Types of Cohort Studies
Prospective (Concurrent) Cohort Study
Participants are enrolled in the present, exposure status is assessed, and they are followed forward in time to observe the development of outcomes. This is the classic cohort design. The Framingham Heart Study, which began enrolling participants in 1948 and continues to follow their descendants, is one of the most famous examples.
Advantage: Exposure is measured before the outcome occurs, minimizing recall bias and clearly establishing temporal sequence.
Disadvantage: Can be very expensive and time-consuming, especially for diseases with long latency periods or low incidence.
Retrospective (Historical) Cohort Study
The investigator uses historical records (e.g., employment records, medical charts, or registry data) to reconstruct a cohort whose exposure status was determined in the past. The outcomes may have already occurred or can be ascertained in the present.
Advantage: Faster and less expensive than a prospective study because the waiting period for outcome development has already elapsed.
Disadvantage: Relies on the quality and completeness of existing records. Important variables may not have been measured or may be recorded inconsistently.
Direction of Inquiry
Figure 8.1 — Direction of inquiry in cohort vs. case-control studies. Cohort studies move from exposure to outcome; case-control studies start with outcome and look back at exposure.
Measures of Association in Cohort Studies
Because cohort studies follow participants over time and observe the development of new cases, they can directly estimate incidence. This allows the calculation of:
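- Risk Ratio (Relative Risk): The ratio of cumulative incidence in the exposed group to cumulative incidence in the unexposed group. In a 2×2 table where a and b are the exposed individuals with and without the outcome, and c and d are the unexposed individuals with and without the outcome, RR = (a/(a+b)) / (c/(c+d)).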
- Rate Ratio: The ratio of incidence rates (person-time denominators) in exposed vs. unexposed groups.
- Risk Difference (Attributable Risk): The absolute difference in incidence between exposed and unexposed groups.
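The three measures above can be computed directly from cohort data. The sketch below is illustrative only: the `TwoByTwo` class, the function names, and all counts and person-years are hypothetical. It uses the usual 2×2 layout, where a and b are exposed individuals with and without the outcome, and c and d are unexposed individuals with and without the outcome.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    a: int  # exposed, developed the outcome
    b: int  # exposed, did not develop the outcome
    c: int  # unexposed, developed the outcome
    d: int  # unexposed, did not develop the outcome


def risk_ratio(t: TwoByTwo) -> float:
    """Cumulative incidence in the exposed divided by that in the unexposed."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def risk_difference(t: TwoByTwo) -> float:
    """Absolute difference in cumulative incidence (exposed minus unexposed)."""
    return t.a / (t.a + t.b) - t.c / (t.c + t.d)


def rate_ratio(cases_exposed: int, person_time_exposed: float,
               cases_unexposed: int, person_time_unexposed: float) -> float:
    """Incidence rate in the exposed divided by the rate in the unexposed."""
    return (cases_exposed / person_time_exposed) / (cases_unexposed / person_time_unexposed)


# Hypothetical cohort: 1,000 exposed and 1,000 unexposed participants.
cohort = TwoByTwo(a=30, b=970, c=10, d=990)
print(f"Risk ratio:      {risk_ratio(cohort):.2f}")       # 3.00
print(f"Risk difference: {risk_difference(cohort):.3f}")  # 0.020

# Hypothetical person-years of follow-up for the same cohort.
print(f"Rate ratio:      {rate_ratio(30, 4800, 10, 4950):.2f}")  # 3.09
```

The risk ratio and rate ratio differ slightly here because the exposed group contributes less person-time (people who develop the outcome stop contributing follow-up), which is why the choice of denominator should be reported alongside the estimate.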
Why Cohort Studies Are Powerful
The ability to measure incidence directly is the central strength of the cohort design. It establishes temporal sequence (exposure precedes outcome), allows calculation of multiple measures of association, and can study multiple outcomes associated with a single exposure. For rare exposures, cohort studies are particularly efficient because you can intentionally over-sample exposed individuals.
Case-Control Studies
A case-control study begins by identifying individuals who have the outcome of interest (cases) and a comparable group who do not (controls). The investigator then looks backward in time to compare the exposure histories of the two groups. The direction of inquiry moves from outcome to exposure.
Selecting Cases and Controls
The validity of a case-control study depends critically on appropriate selection of cases and controls:
Cases should be clearly defined using consistent diagnostic criteria. They may be incident cases (newly diagnosed) or prevalent cases (existing), though incident cases are preferred to reduce survival bias. Cases are typically identified from hospitals, disease registries, surveillance systems, or population-based sampling frames.
Controls should come from the same source population that gave rise to the cases. They should be people who, had they developed the disease, would have been identified as cases in the study. Common sources include hospital-based controls (patients admitted for other conditions), population-based controls (random samples from the community), and neighbourhood or friend controls. The choice of control group is often the most scrutinized aspect of a case-control study.
Matching involves selecting controls that are similar to cases on specific characteristics (e.g., age, sex, geography). Matching helps control for confounding by these variables and can improve statistical efficiency. However, a matched variable can no longer be evaluated as a risk factor in that study, and individually matched data generally call for a matched analysis such as conditional logistic regression.
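To illustrate why matched data call for a matched analysis, consider the simplest case: 1:1 matched pairs with a binary exposure and no other covariates. In that setting the conditional maximum-likelihood estimate of the odds ratio reduces to the ratio of discordant pairs. The sketch below uses hypothetical pair counts and hypothetical function names; it is not a substitute for conditional logistic regression when additional covariates are involved.

```python
from collections import Counter


def matched_pair_odds_ratio(pairs):
    """Odds ratio for 1:1 matched pairs given as (case_exposed, control_exposed).

    Only discordant pairs contribute: pairs where the case alone is exposed,
    divided by pairs where the control alone is exposed.
    """
    counts = Counter(pairs)
    case_only_exposed = counts[(True, False)]
    control_only_exposed = counts[(False, True)]
    if control_only_exposed == 0:
        raise ValueError("No pairs with only the control exposed; the ratio is undefined.")
    return case_only_exposed / control_only_exposed


def unmatched_odds_ratio(pairs):
    """Crude odds ratio that ignores the pairing (shown for comparison only)."""
    a = sum(case for case, _ in pairs)            # exposed cases
    b = sum(control for _, control in pairs)      # exposed controls
    c = len(pairs) - a                            # unexposed cases
    d = len(pairs) - b                            # unexposed controls
    return (a * d) / (b * c)


# Hypothetical study with 100 matched pairs.
pairs = ([(True, True)] * 20 + [(True, False)] * 30
         + [(False, True)] * 10 + [(False, False)] * 40)

print(f"Matched-pair OR: {matched_pair_odds_ratio(pairs):.2f}")  # 3.00
print(f"Crude OR:        {unmatched_odds_ratio(pairs):.2f}")     # 2.33
```

In this example, ignoring the pairing pulls the estimate toward 1, which is the typical direction of the bias when matching is broken in the analysis.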
Measures of Association in Case-Control Studies
Because case-control studies do not follow a defined cohort over time, they cannot directly calculate incidence. Therefore, the risk ratio cannot be computed directly. Instead, the primary measure of association is the odds ratio (OR):
The Odds Ratio
The odds ratio compares the odds of exposure among cases to the odds of exposure among controls. With a and c denoting exposed and unexposed cases, and b and d denoting exposed and unexposed controls, OR = (a/c) / (b/d) = ad/bc. Under certain conditions, particularly when the disease is rare in the source population (the "rare disease assumption"), the odds ratio approximates the risk ratio. This is why case-control studies are especially useful for studying rare diseases: the rare disease assumption is most likely to hold, and the design efficiently identifies a sufficient number of cases.
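The rare disease assumption can be checked numerically by computing both measures on full-population data, where the risk ratio is available for comparison. The two populations below are hypothetical and constructed to share a risk ratio of 3.0. In each tuple, a and c are exposed and unexposed individuals with the disease, and b and d are exposed and unexposed individuals without it.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio ad / bc."""
    return (a * d) / (b * c)


def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Risk in the exposed, a / (a + b), divided by risk in the unexposed, c / (c + d)."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical populations, both with a true risk ratio of 3.0.
rare_disease = (30, 9970, 10, 9990)    # risks of 0.3% and 0.1%
common_disease = (600, 400, 200, 800)  # risks of 60% and 20%

for label, table in [("Rare", rare_disease), ("Common", common_disease)]:
    print(f"{label:6s} disease: RR = {risk_ratio(*table):.2f}, OR = {odds_ratio(*table):.2f}")
```

For the rare disease the odds ratio (about 3.01) is nearly identical to the risk ratio, while for the common disease it doubles to 6.0. An odds ratio from a study of a common outcome should therefore not be read as "times as likely".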
Comparing Cohort and Case-Control Designs
| Feature | Cohort Study | Case-Control Study |
|---|---|---|
| Direction of inquiry | Exposure → Outcome | Outcome → Exposure |
| Starting point | Defined by exposure status | Defined by disease status |
| Measures incidence directly? | Yes | No |
| Primary measure of association | Risk ratio, rate ratio | Odds ratio |
| Best suited for | Rare exposures, multiple outcomes | Rare diseases, multiple exposures |
| Temporal sequence | Clearly established | Relies on retrospective data |
| Cost and time | Often expensive and lengthy | Generally less expensive and faster |
| Key biases | Loss to follow-up, selective attrition | Recall bias, selection bias in controls |
Choosing the Right Design: A Practical Example
Suppose you want to study whether exposure to a specific industrial solvent increases the risk of a rare liver cancer. A prospective cohort study would require following thousands of exposed and unexposed workers for decades — extremely expensive and slow. A case-control study, by contrast, could identify 200 liver cancer cases from a cancer registry, select 400 matched controls, and assess past occupational exposure through interviews and employment records — producing results in months rather than years.
For rare diseases, case-control studies are usually the design of choice because they can efficiently identify enough cases to detect meaningful associations.
Key Takeaways
- Cohort studies follow exposed and unexposed groups forward in time to observe outcome development; they can directly calculate incidence and risk ratios.
- Case-control studies compare exposure histories of cases and controls; they estimate the odds ratio, which approximates the risk ratio when the disease is rare.
- Cohort studies are preferred for rare exposures and when multiple outcomes are of interest; case-control studies are preferred for rare diseases and when multiple exposures are of interest.
- Both designs are susceptible to specific biases: cohort studies to loss to follow-up, case-control studies to recall bias and control selection bias.
1. In a cohort study, the direction of inquiry is:
2. The odds ratio is the primary measure of association in case-control studies because:
3. Case-control studies are especially efficient for studying:
Ecological Studies & Evidence Synthesis
⏱ Estimated reading time: 15 minutes
Learning Objectives
- Describe the design and purpose of ecological studies.
- Explain the concept of group-level variables and the ecologic fallacy.
- Discuss how systematic reviews synthesize evidence across study types.
- Understand where ecological studies and systematic reviews fit in the hierarchy of evidence.
Ecological Studies
An ecological study (also called a group-level or aggregate study) uses groups — rather than individuals — as the unit of analysis. Instead of measuring exposure and outcome in each person, the investigator compares exposure levels and disease rates across defined populations, such as countries, regions, or time periods.
Types of Ecological Studies
- Comparison
- (Temporal)
- Design
Group-Level Variables
A distinctive feature of ecological studies is their reliance on group-level variables. These can be classified into three types:
Aggregate measures are summaries of individual-level data for a group — for example, the mean blood pressure of a country's population, or the percentage of a county's residents who smoke. These are the most common type of group-level variable and are derived from individual measurements.
Environmental measures are physical or chemical characteristics of the environment shared by members of a group, for example, ambient air pollution levels in a city, fluoride concentration in a water supply, or average annual temperature. They are recorded for the place as a whole, but each has an individual-level analogue (a person's actual exposure), which typically varies among group members and often goes unmeasured.
Global measures are attributes of the group that have no individual-level analogue — for example, population density, the existence of a specific public health law, type of healthcare system, or the Gini coefficient (income inequality). These contextual factors can only be measured at the group level and are particularly interesting because they may represent structural or policy-level determinants of health.
The Ecologic Fallacy
The ecologic fallacy is the most important limitation of ecological studies. It occurs when an association observed at the group level is incorrectly assumed to hold at the individual level. Just because countries with higher fat consumption have higher breast cancer rates does not mean that the individuals within those countries who eat more fat are the same individuals who develop breast cancer.
Classic Example: Durkheim and Suicide
Émile Durkheim observed that regions with higher proportions of Protestant residents had higher suicide rates than predominantly Catholic regions. However, this group-level association does not necessarily mean that Protestant individuals were more likely to die by suicide. It is possible that the social environment of predominantly Protestant regions (perhaps characterized by greater individualism or weaker social networks) affected everyone living there, including Catholics. Attributing the group-level finding to individual Protestants would be an ecologic fallacy.
The ecologic fallacy arises because within-group variation is hidden when data are aggregated. Individuals who are exposed and individuals who develop disease may be entirely different people within the same group.
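A small constructed dataset makes the point concrete. In the sketch below, every number is hypothetical and chosen so that the group-level correlation between exposure prevalence and disease risk is perfect, while within each region exposed and unexposed individuals have exactly the same risk. The helper names are illustrative.

```python
from math import sqrt

# Hypothetical regions: (exposed_n, exposed_cases, unexposed_n, unexposed_cases)
regions = {
    "A": (20, 2, 80, 8),
    "B": (50, 10, 50, 10),
    "C": (80, 24, 20, 6),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def group_level_summary(region):
    """What an ecological study sees: exposure prevalence and overall disease risk."""
    exposed_n, exposed_cases, unexposed_n, unexposed_cases = region
    total = exposed_n + unexposed_n
    return exposed_n / total, (exposed_cases + unexposed_cases) / total


def within_region_risk_ratio(region):
    """What individual-level data would show inside a single region."""
    exposed_n, exposed_cases, unexposed_n, unexposed_cases = region
    return (exposed_cases / exposed_n) / (unexposed_cases / unexposed_n)


exposure_prevalence, disease_risk = zip(*(group_level_summary(r) for r in regions.values()))
ecological_r = pearson_r(exposure_prevalence, disease_risk)
individual_rrs = {name: within_region_risk_ratio(r) for name, r in regions.items()}

print(f"Group-level correlation: {ecological_r:.2f}")
for name, rr in individual_rrs.items():
    print(f"Region {name}: within-region risk ratio = {rr:.2f}")
```

Here the regions with more exposed residents have higher disease risk, yet exposure makes no difference to any individual's risk. Something about the region itself drives the pattern, which mirrors the contextual explanation offered for Durkheim's data.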
Why Ecological Studies Are Still Valuable
Despite the ecologic fallacy, ecological studies serve several important purposes. They are often the only feasible design when individual-level data are unavailable (e.g., comparing disease rates across countries using national statistics). They are useful for studying the effects of policies, environmental exposures, and other group-level variables that cannot be measured individually. They are also inexpensive and quick, making them excellent for hypothesis generation. The key is to interpret ecological associations cautiously and seek corroboration from individual-level studies.
Evidence Synthesis: Systematic Reviews
No single study, regardless of design, can definitively establish a causal relationship. Scientific knowledge accumulates through the synthesis of evidence across multiple studies, populations, and designs. Systematic reviews are the formal method for achieving this synthesis.
What Is a Systematic Review?
A systematic review uses a pre-specified, transparent, and reproducible protocol to identify, appraise, and synthesize all available evidence on a specific research question. Unlike a narrative review (where the author selects studies subjectively), a systematic review:
- Defines a precise research question (often using the PICO framework: Population, Intervention/Exposure, Comparison, Outcome).
- Conducts comprehensive, systematic searches of multiple databases.
- Applies explicit inclusion and exclusion criteria.
- Critically appraises the quality and risk of bias of each included study.
- Synthesizes findings quantitatively (meta-analysis) or qualitatively.
Meta-Analysis
When studies are sufficiently similar, their results can be combined statistically in a meta-analysis to produce a pooled estimate of effect. Meta-analysis increases statistical power, provides more precise estimates, and can explore heterogeneity across studies (e.g., do results differ by study design, population, or exposure definition?).
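As a rough sketch of how pooling works, the code below applies one common approach, the fixed-effect inverse-variance method, to risk ratios on the log scale. It also reports Cochran's Q and the I² statistic as simple heterogeneity summaries. The three studies, their standard errors, and the function name are hypothetical; a real meta-analysis would also consider random-effects models and the quality of each study.

```python
from math import exp, log, sqrt


def fixed_effect_pool(log_estimates, standard_errors):
    """Inverse-variance fixed-effect pooling of estimates on the log scale.

    Returns (pooled log estimate, pooled standard error, Cochran's Q, I-squared).
    """
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / total_weight
    pooled_se = sqrt(1 / total_weight)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_estimates))
    df = len(log_estimates) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


# Hypothetical studies: (risk ratio, standard error of the log risk ratio)
studies = [(1.5, 0.20), (2.0, 0.25), (1.2, 0.15)]
log_rrs = [log(rr) for rr, _ in studies]
ses = [se for _, se in studies]

pooled_log_rr, pooled_se, q, i2 = fixed_effect_pool(log_rrs, ses)
pooled_rr = exp(pooled_log_rr)
ci_low = exp(pooled_log_rr - 1.96 * pooled_se)
ci_high = exp(pooled_log_rr + 1.96 * pooled_se)

print(f"Pooled RR: {pooled_rr:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
print(f"Cochran's Q: {q:.2f}, I-squared: {i2:.0%}")
```

The pooled standard error is smaller than that of any single study, which is the gain in precision described above. More precise studies (smaller standard errors) receive more weight in the pooled estimate.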
The Hierarchy of Evidence
Systematic reviews and meta-analyses of well-conducted studies sit at the top of the traditional evidence hierarchy. Below them are randomized controlled trials, then cohort studies, then case-control studies, then cross-sectional studies, then ecological studies, and finally case reports and expert opinion. However, the hierarchy is a guideline, not an absolute rule: a well-designed cohort study may provide stronger evidence than a poorly conducted RCT.
Reflection: Thinking Across Study Designs
Consider a public health question you care about (e.g., the effect of air pollution on childhood asthma, or the impact of a sugar tax on obesity rates). Which study designs would be most appropriate for different aspects of this question? Why might you need evidence from multiple designs to draw a convincing causal conclusion?
Key Takeaways
- Ecological studies use groups as the unit of analysis and can examine aggregate, environmental, or global measures.
- The ecologic fallacy occurs when group-level associations are incorrectly attributed to individuals.
- Ecological studies are valuable for hypothesis generation, policy evaluation, and studying group-level exposures, but results must be interpreted cautiously.
- Systematic reviews use transparent, reproducible methods to synthesize evidence across studies.
- Meta-analysis can pool results statistically for greater precision and power.
- No single study design is sufficient for establishing causation; converging evidence from multiple designs strengthens causal inference.
1. The ecologic fallacy occurs when:
2. Population density is an example of which type of group-level variable?
3. A systematic review differs from a narrative review primarily because it:
Knowledge Check & Final Assessment
⏱ Estimated time: 15 minutes
Lesson Summary
In this review lesson, you have refreshed your understanding of the major observational study designs used in epidemiologic research:
- Observational vs. experimental studies: The fundamental distinction based on whether the investigator assigns the exposure.
- Descriptive vs. analytic studies: Descriptive studies characterize disease distribution; analytic studies test hypotheses about exposure-outcome associations using comparison groups.
- Cross-sectional studies: Measure exposure and outcome simultaneously, yielding prevalence data but unable to establish temporal sequence.
- Cohort studies: Follow exposed and unexposed groups forward in time to measure incidence and calculate risk ratios — the strongest observational design for establishing temporal sequence.
- Case-control studies: Compare exposure histories of cases and controls to estimate odds ratios — efficient for rare diseases but susceptible to recall and selection bias.
- Ecological studies: Analyze group-level data, useful for policy evaluation and hypothesis generation, but vulnerable to the ecologic fallacy.
- Systematic reviews and meta-analyses: Synthesize evidence across studies using transparent, reproducible methods, sitting at the top of the evidence hierarchy.
These concepts form the foundation for the study designs you will encounter next: hybrid designs (Lesson 11) and controlled studies (Lesson 12).
Final Reflection
Imagine you are an epidemiologist investigating whether long-term exposure to a newly identified environmental contaminant increases the risk of a rare autoimmune disorder. Describe which study design(s) you would consider, why, and what specific strengths and limitations you would need to address. Consider how you might use multiple study designs to build a stronger evidence base.
Final Knowledge Assessment
Complete the following 12-question assessment. A score of 100% is required to complete the lesson. You may retake the assessment as many times as needed.
1. In an observational study, the investigator:
2. An analytic study differs from a descriptive study primarily because it:
3. The main measure of disease frequency in a cross-sectional study is:
4. A retrospective cohort study differs from a prospective cohort study in that:
5. Which measure of association can be directly calculated from a cohort study but NOT from a case-control study?
6. The "rare disease assumption" in case-control studies refers to the idea that:
7. A key strength of cohort studies is that they:
8. Recall bias is a particular concern in:
9. An ecological study uses which unit of analysis?
10. A country with high average alcohol consumption also has high rates of liver cirrhosis. Concluding that the individuals who drink the most in that country are the ones with cirrhosis would be an example of:
11. Systematic reviews sit at the top of the evidence hierarchy because they:
12. You want to study whether a newly introduced workplace safety regulation reduces injury rates. Individual-level exposure data are unavailable, but national injury statistics exist for the years before and after the regulation. The most appropriate design is: