Introduction to Observational Studies
Evaluating Epidemiological Research
Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University
Learning objectives for this lesson:
- Differentiate between descriptive and explanatory studies
- Differentiate between experimental and observational studies
- Describe the three main elements of the unified approach to observational study design
- Describe the advantages and limitations of case reports, case-series reports, and surveys
- Design a cross-sectional study accounting for its strengths and weaknesses
- Identify circumstances where a cross-sectional study is appropriate
- List three approaches for obtaining incidence estimates from cross-sectional prevalence data
- Differentiate between repeated cross-sectional studies and following a cohort in a longitudinal study
- Apply the STROBE checklist to reporting a cross-sectional study
Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.
Study Classification & Design Framework
Introduction and Overview
Lessons 1 and 2 worked at the level of the discipline: where epidemiology came from, what counts as knowledge, and how the published record can fail and be appraised. Lesson 3 takes the first step from those framing concerns into the working tools of the field. Almost every bias the manufactured-doubt strategy weaponizes, every analytic error the integrity-and-reform material in Lesson 1 catalogued, and every pre-analytic discipline EGAP demands — all of them play out concretely once you have a study design in front of you. So this lesson is about the design.
The four content sections proceed from general to specific. Section 1 sets up a classification scheme for every study you will encounter for the rest of HSCI 230. Section 2 covers the simplest of those study types — descriptive studies, where a comparison group is absent. Section 3 introduces the first true analytic observational design: the cross-sectional study. Section 4 pushes on its limitations and shows what reporting it well requires. By the end of the lesson, the language you'll need for case-control studies (Lesson 4) and cohort studies (Lesson 5) will already be in place.
Learning Objectives
- Distinguish descriptive from explanatory (analytic) studies by their purpose and inferential claims.
- Distinguish experimental from observational studies in terms of investigator control and confounding.
- Apply Hernan’s and Rubin’s “unified approach” (thought experiment, design before data, forward projection) to a research question.
- Locate any study type within the hierarchy of evidence and explain what causal weight that placement does and does not imply.
Descriptive vs. Explanatory Studies
Epidemiologic studies can be classified into two major categories: descriptive and explanatory (analytic). This classification reflects both the study’s objectives and its ability to support causal inference. The two tabs below define each category in turn; click between them and notice that the difference is not about technique but about purpose — whether the study is built to characterize disease occurrence or to compare groups.
Descriptive studies include case reports, case-series reports, and surveys. They are designed solely to describe the nature and distribution of outcome events such as health-related phenomena. They describe the who, what, when, and where of disease occurrence.
Although a descriptive survey is not designed to assess hypotheses about manipulatable causes of the outcome event, the frequency of the outcome is usually described in categories of age, race, sex, season, and space.
Explanatory studies (also called analytic studies) are designed to make comparisons and contrasts between subgroups of study subjects based on exposure or outcome status. They allow the investigator to identify statistical associations between exposures and outcomes.
Explanatory studies can be further subdivided into experimental and observational studies, depending on whether the investigator controls the allocation of study subjects to exposure groups.
The descriptive–explanatory split asks what a study is trying to do. The next split — experimental versus observational — asks how the explanatory studies do it, and is the distinction this entire lesson hinges on.
Experimental vs. Observational Studies
In experimental studies, the investigator controls (usually through randomisation) the allocation of study subjects to exposure groups. In contrast, in observational studies, the investigators try not to influence the natural course of events for the study subjects.
Key Distinction
In experimental studies, we try to reduce variation from all sources through selection and control of the experimental setting. In observational studies, we embrace the presence of natural variation in order to identify important interactions among key variables and the exposure–disease association.
The price paid through the use of observational studies is that considerable efforts are required to prevent confounding (bias) of the exposure–disease association. Experiments are the preferred choice when the treatment is straightforward and easily manipulated, such as a vaccine or a specific therapeutic agent. The major advantage of the experimental approach is the ability to control potential confounders through the process of randomisation.
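A small simulation may help make the randomisation point concrete. The sketch below is only an illustration under stated assumptions: the confounder `age`, the self-selection rule, and the helper `age_gap()` are all hypothetical and do not come from any study in this lesson. It compares how well a confounder is balanced when exposure is self-selected and when it is assigned by a coin flip.

```r
# Hypothetical illustration: balance of a confounder under self-selected
# (observational) versus randomised (experimental) exposure.
set.seed(1)
n   <- 10000
age <- rnorm(n, mean = 50, sd = 10)   # assumed confounder

# Observational setting (assumed): older people are more likely to be exposed.
p_exposed   <- plogis((age - 50) / 5)
exposed_obs <- rbinom(n, 1, p_exposed)

# Experimental setting: a coin flip decides exposure, independent of age.
exposed_rct <- rbinom(n, 1, 0.5)

# Difference in mean age between the exposed and unexposed groups.
age_gap <- function(exposed) {
  mean(age[exposed == 1]) - mean(age[exposed == 0])
}
gap_obs <- age_gap(exposed_obs)
gap_rct <- age_gap(exposed_rct)

cat("Mean age gap, observational:", round(gap_obs, 2), "\n")
cat("Mean age gap, randomised:   ", round(gap_rct, 2), "\n")
```

Under these assumptions, the randomised groups should differ in mean age only by chance, while the self-selected groups differ systematically. That systematic difference is what an observational design has to remove by restriction or analytic control.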
If observational studies have to fight confounding the way experiments avoid it, the natural question is how to design them so that the fight is winnable. Two well-known methodologists — Miguel Hernan and Donald Rubin — offer the same answer in slightly different language: borrow the discipline of the experiment, even when you cannot run one. That answer has three parts, summarized in the accordion below.
A Unified Approach to Study Design
Hernan (2005) stressed that when considering an observational study design, we should think about the design of a field experiment to accomplish the same objective. This approach, reinforced by Rubin (2007), emphasises that ‘design trumps analysis’ and that all elements of the study design should be completed before seeing any outcome data. Expand each of the three steps below in order — the order matters, because each step constrains the next.
Step 1: the thought experiment. As a first step in planning an epidemiological study, carry out a ‘thought experiment’: imagine the field experiment you would run if you could, and specify its key elements (the study group and how it is selected, assignment to exposure, procedures for follow-up, and detection of the outcome). The important point is that formal randomisation would ensure ‘exchangeability’, meaning the groups being compared are so similar that it does not matter which group was assigned to exposure.
Step 2: design before data. All design features are completed before anyone has seen the outcome data. This includes subject exclusion, selection criteria, and control of confounding. Rubin formalises the process by comparing propensity scores (the probability of exposure given the covariates) in the exposed and non-exposed groups. Unless the distributions of these scores are virtually equal in the two groups, some degree of confounding is possible.
Step 3: forward projection. After completing the initial design, we project forward to the presentation of study results under three different scenarios: (1) the exposure appears to increase risk; (2) the exposure appears to decrease risk; or (3) the exposure does not appear to be associated with the outcome. For each scenario, we must be able to defend the proposed design. This process helps identify potential weaknesses before any data are collected.
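The propensity-score check mentioned in the design-before-data step can be sketched in a few lines of R. This is a minimal illustration, not Rubin's full procedure: the covariates `age` and `smoker`, the exposure mechanism, and every coefficient are invented for the example. Note that no outcome variable appears anywhere, which is the point of doing this step before seeing outcome data.

```r
# Hypothetical sketch: estimate propensity scores with logistic regression
# and compare them between exposure groups. No outcome data are used.
set.seed(2)
n      <- 2000
age    <- rnorm(n, 50, 10)            # assumed covariate
smoker <- rbinom(n, 1, 0.3)           # assumed covariate
# Assumed exposure mechanism: depends on both covariates.
exposed <- rbinom(n, 1, plogis(-0.5 + 0.04 * (age - 50) + 0.8 * smoker))

# Propensity score = estimated P(exposed | covariates).
ps_model <- glm(exposed ~ age + smoker, family = binomial)
ps       <- fitted(ps_model)

# Summaries by exposure group; clear differences signal that the groups
# differ on measured covariates, so confounding is possible.
ps_summary <- tapply(ps, exposed, summary)
ps_summary
mean_ps_gap <- mean(ps[exposed == 1]) - mean(ps[exposed == 0])
cat("Difference in mean propensity score:", round(mean_ps_gap, 3), "\n")
```

If the two summaries were close to identical, the exposed and non-exposed groups would look alike on the measured covariates. Here they are built to differ, so the design would need restriction or analytic control before any outcome comparison could be defended.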
The unified approach gives you the discipline that any single study should have. Stepping back further, we can also rank study types by how much causal weight any one well-designed instance of them can carry. That ranking — the so-called hierarchy of evidence — is the last piece of the classification map for this section.
Hierarchy of Evidence for Causal Inference
From the perspective of drawing causal inferences, experimental studies are generally regarded as the gold standard. The hierarchy of causal evidence, from strongest to weakest, is typically as shown in the table below. The Lesson 1 caveat is worth holding onto here: the hierarchy describes causal weight, not absolute quality. A well-conducted cross-sectional study can be far more useful than a poorly conducted RCT for many real-world questions.
| Study Type | Difficulty | Investigator Control | Causal Evidence | Relevance |
|---|---|---|---|---|
| Laboratory trial | Moderate | Very high | Very high | Low |
| Controlled field trial (RCT) | Moderate | High | Very high | High |
| Cohort study | Difficult | High | High | High |
| Case-control study | Moderate | Moderate | Moderate | High |
| Cross-sectional study | Moderate | Low | Low | Moderate |
| Survey | Moderate | Moderate | Not applicable | High |
| Case series | Easy | Very low | Not applicable | Low to high |
| Case report | Very easy | Very low | Not applicable | Low to high |
The reflection below asks you to use the classification scheme we just built to evaluate a research question of your own. The knowledge check that follows tests the conceptual content directly. Once you have worked through both, Section 2 takes the bottom of the hierarchy — the descriptive designs — and asks what useful work they can do despite their lack of a comparison group.
Reflection
Think about a health outcome you are interested in studying. Would an experimental or observational approach be more appropriate, and why? Consider ethical, practical, and scientific factors in your answer.
1. Which of the following BEST distinguishes explanatory from descriptive studies?
2. What is the major advantage of experimental studies over observational studies for causal inference?
3. The ‘unified approach’ to observational study design includes the thought experiment, completing design features before seeing data, and:
Descriptive Studies: Case Reports, Case Series & Surveys
Introduction and Overview
The hierarchy at the end of Section 1 placed descriptive studies at the bottom for causal inference, and that ranking is fair — but it can give the wrong impression about their usefulness. Descriptive studies are where many real epidemiologic investigations start — from James Lind's 1753 treatise on scurvy to John Snow's 1854 mapping of the Broad Street cholera outbreak. They generate the hypotheses that the analytic designs in later sections then test.
Descriptive studies are used to describe the main features of a disease or health-related outcome. Although they are not designed to evaluate associations between exposures and outcomes, the observations made in a descriptive study can form the basis of hypotheses which are then further investigated in analytic studies. Three forms of descriptive studies are case reports, case-series reports, and surveys. The three flip cards below introduce each in turn; click through them and notice how each one expands the scope of observation — from a single unusual case, to a documented group of cases, to a full population sample.
Learning Objectives
- Define case reports, case-series, and surveys and explain what each contributes to the evidence base.
- Explain why descriptive designs cannot establish causal associations but are essential for hypothesis generation.
- Identify situations in public-health practice where descriptive studies are the right tool (emerging outbreaks, rare events, signal detection).
- Read the historical case-series literature with appropriate caveats about generalisability and selection.
Key Characteristics of Study Types
Descriptive studies differ from analytic observational studies in important ways. The following comparison highlights these differences:
Important Limitation
A common feature of both case reports and case-series reports is the absence of a comparison group. Without a comparison group, it is impossible to draw valid conclusions about causal associations. This is why descriptive studies are considered hypothesis-generating rather than hypothesis-testing.
The third descriptive type — the survey — is also the bridge into the rest of the lesson. Once a survey starts collecting risk-factor information alongside disease information, it stops being purely descriptive and becomes the design Section 3 is about.
From Survey to Analytic Study
Kalsbeek and Heiss (2000) and Speybroeck et al. (2003) describe how survey data should be analysed in a way that takes the study design into account. If the survey is designed to collect information about both an outcome of interest and potential exposures (risk factors) beyond the categories of people, place, and time, it becomes a cross-sectional analytic study and, as such, can be used to evaluate associations between exposures and outcomes. The Ontario Hypertension Survey scenario below is a good example of this transition in action; we will return to its design choices in Section 3.
Leenen et al. (2008) conducted a survey of the prevalence of hypertension in Ontario. The sampling frame consisted of municipalities and dissemination areas. From 6,436 eligible dwellings, contact was made with 4,559 potential participants. The overall prevalence of hypertension was 21.3%. This survey combined prevalence estimation with risk factor analysis, making it a cross-sectional analytic study.
The reflection below puts the descriptive–analytic boundary to work. Once you have answered it and the knowledge check, Section 3 turns the question around: given that we now have a true analytic observational design (the cross-sectional study), how should we actually build one?
Reflection
Can you think of a disease or health condition for which a case-series report might be the most appropriate initial study design? What hypothesis might it generate for future analytic studies?
1. What is the primary limitation shared by both case reports and case-series reports?
2. A survey becomes a cross-sectional analytic study when it:
3. A case-series report documenting 50 patients with a rare autoimmune condition would be classified as:
Cross-Sectional Studies: Design & Implementation
Introduction and Overview
Section 1 placed observational analytic studies in the larger classification map, and Section 2 walked through the descriptive designs that lead up to them. This section is the first time we sit inside an analytic observational design and look at how it actually works. We start with two classification distinctions that apply to every observational study you will encounter (prospective vs retrospective; the three sampling approaches), and then narrow in on the design that sits at the centre of this lesson: the cross-sectional study.
Learning Objectives
- Distinguish prospective and retrospective observational designs and explain why cross-sectional studies are inherently retrospective.
- Compare the three sampling approaches (cross-sectional, cohort, case-control) and identify the question each is best suited to answer.
- Walk through the four design steps of a cross-sectional study (study group, exposure, outcome, confounding) and apply them to a real example.
- Explain why prevalence — not incidence — is the natural outcome measure for a cross-sectional design.
- Identify situations in which matching is and is not available as a confounding-control strategy.
Observational Studies Overview
Observational studies (a subgroup of analytic, or explanatory, studies) have an explicit formal contrast built into their design: comparing the frequency of the outcome across exposure categories is their central foundation. They differ from descriptive studies in that the comparison of two or more groups is central, and from experiments in that the researcher has no control over the allocation of study subjects to the exposure groups.
Prospective vs. Retrospective Designs
Observational studies can also be classified as prospective or retrospective. In prospective studies, the disease or outcome has not occurred at the time the study starts. In retrospective studies, both the exposure and the outcome have occurred when the study begins—hence cross-sectional studies are inherently retrospective in nature.
Sampling Drives the Design
Three Main Approaches
The choices of observational analytic study design have traditionally been among 3 approaches based on how study subjects are selected:
- Cross-sectional study: A sample is obtained from the source population, and the prevalence of both disease and exposure is determined at the time of subject selection.
- Cohort study: A sample of study subjects from a source population with heterogeneous exposure levels is obtained, and the incidence of the outcome in the follow-up period is determined. Landmark examples include the Framingham Heart Study (Dawber, Meadors, & Moore, 1951) and the Whitehall II study of British civil servants (Marmot et al., 1991).
- Case-control study: Subjects with the outcome (cases) are identified and their exposure history is contrasted with the exposure history of a sample of non-case subjects (controls). Doll & Hill's (1950) case-control study of smoking and lung cancer, and Herbst, Ulfelder & Poskanzer's (1971) investigation of DES and vaginal adenocarcinoma, are the canonical examples.
Of the three sampling approaches above, the cross-sectional study is the one this lesson will fully dissect. Lessons 4 and 5 will pick up case-control and cohort designs in the same way. So the rest of this section is the architectural tour of one design, but the questions it asks (how to obtain study subjects, how to assess exposure, how to define the outcome, how to control confounding) are the same questions you will be asking of every observational design from now on.
Cross-Sectional Study Design
The defining feature of a cross-sectional study is that it is an observational study whose outcome frequency measure is prevalence. The basis of the cross-sectional design is that a sample, or census, of subjects is obtained from the source population and the presence or absence of the outcome is ascertained at that point. The accordion below walks through the four design steps in the order a researcher would take them: define the study group, measure exposure, ascertain outcome, and design in protections against confounding from the start.
Study group. If the researcher wants to make inferences about the frequency of the outcome in a target population, then study subjects should be obtained by a formal random sampling procedure. The source population is the listing (real or implied) of potential study subjects from which the study group is obtained. The study group is the set of subjects who agree to take part in the study.
Exposure. Exposure and other covariate status, such as demographic data, are obtained at the time of study subject selection or at first contact or examination. Because the outcome measure is prevalence, it is sometimes difficult to know the appropriate time frame in which a time-varying exposure might cause the outcome. Studying subjects on the basis of their current (prevalent) exposure can also lead to bias when interpreting the impact of that exposure.
Outcome. It is important to clearly define the outcome or disease of interest. In general, great care should be used if the outcome is a surrogate for a clinically important event. It is also important that widely accepted diagnostic criteria be used to identify the disease or outcome of interest.
Confounding. Confounders are factors that are associated with the outcome and whose distribution differs between exposure groups. The two main approaches used to prevent bias from them are exclusion (restricted sampling) and analytic (statistical) control. Matching cannot be used to prevent confounding in cross-sectional studies, because neither exposure nor outcome status is known when subjects are selected. Analytic control requires the use of a multivariable model.
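The difference that analytic control makes can be seen in a short simulation. The sketch below is a hedged illustration, not a recommended analysis plan: the confounder `older`, the exposure and outcome probabilities, and the choice of logistic regression as the multivariable model are all assumptions made for the example. The outcome is generated so that exposure has no true effect, and any crude association is due to the confounder alone.

```r
# Hypothetical sketch: crude versus adjusted odds ratio when a confounder
# (here, an age-group indicator) is related to both exposure and outcome.
set.seed(3)
n       <- 10000
older   <- rbinom(n, 1, 0.5)                           # assumed confounder
exposed <- rbinom(n, 1, ifelse(older == 1, 0.7, 0.3))  # exposure depends on age
# Assumption: the outcome depends on age only; exposure has no true effect.
outcome <- rbinom(n, 1, ifelse(older == 1, 0.30, 0.10))

crude    <- glm(outcome ~ exposed,         family = binomial)
adjusted <- glm(outcome ~ exposed + older, family = binomial)

or_crude    <- exp(coef(crude)["exposed"])
or_adjusted <- exp(coef(adjusted)["exposed"])
cat("Crude OR:   ", round(or_crude, 2), "\n")
cat("Adjusted OR:", round(or_adjusted, 2), "\n")
```

Under these assumptions, the crude odds ratio sits well above 1 while the adjusted one falls close to 1. Restricted sampling would reach the same place by a different route, for example by analysing only `older == 1`, at the cost of a smaller and less general study group.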
The scenario below is a real Canadian example that uses every step of the accordion above. As you read it, ask yourself what the researchers did at each step — how the source population was defined, how exposure was measured, how the outcome was ascertained — and where you can already see the design being constrained by the cross-sectional choice. The reflection that follows is built around exactly that question.
Lanes et al. (2011) conducted a cross-sectional study of postpartum depression symptomatology (PPDS) among Canadian women. The survey used the Edinburgh Postnatal Depression Scale (EPDS) as the outcome measure. Potential risk factors included socioeconomic status, demographic factors, and maternal characteristics. Of 8,542 selected women, 6,421 responded. The national prevalence of minor/major and major PPDS was found to be 8.46% and 8.69% respectively. The mother’s stress level during pregnancy and prior depression had the strongest associations with PPDS.
Reflection
In the postpartum depression study described above, the exposure ‘stress during pregnancy’ was measured retrospectively at the same time as the outcome. What challenges does this create for causal inference? How might you address these challenges?
1. The defining feature of a cross-sectional study is that its outcome frequency measure is:
2. Cross-sectional studies are inherently:
3. Which approach to controlling confounding CANNOT be applied in cross-sectional studies?
Limitations, Incidence Estimation & Reporting
Introduction and Overview
Section 3 introduced the cross-sectional design and walked through how to build one. This section is its honest counterweight. Once a study is in the field, three practical questions tend to arrive in sequence: What can this design not tell us? Can we squeeze incidence information out of it? And how should we report the whole thing so the next reader can judge it? Those three questions are the three subsections that follow.
Learning Objectives
- Explain why cross-sectional studies confound prevalence with disease duration and how that produces the reverse-causation problem.
- Identify the conditions under which a cross-sectional design can support a defensible causal inference (time-invariant exposures, plausible temporality).
- Describe the conditions under which incidence can be estimated from cross-sectional data, and the assumptions required.
- Apply the STROBE checklist as a reporting standard for observational research and explain why each section matters to the next reader.
Inferential Limitations of Cross-Sectional Studies
By its nature, a cross-sectional study design measures prevalence, which is a function of both incidence and duration of the disease. Consequently, it is often difficult to disentangle factors associated with persistence of the outcome from factors associated with developing the outcome in the first instance (i.e., becoming a new incident case).
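The dependence of prevalence on both incidence and duration can be made concrete with the standard steady-state approximation, in which prevalence odds equal the incidence rate multiplied by the mean duration of disease. The sketch below assumes a stable population with constant incidence and duration. The function name `prevalence_from()` and the numbers are hypothetical.

```r
# Steady-state relationship (assumes a stable population with constant
# incidence and duration): prevalence odds = incidence rate x mean duration.
prevalence_from <- function(incidence_rate, mean_duration) {
  odds <- incidence_rate * mean_duration
  odds / (1 + odds)
}

# Hypothetical numbers: both groups develop disease at the same rate
# (0.02 cases per person-year), but the disease lasts longer in group B.
prev_a <- prevalence_from(incidence_rate = 0.02, mean_duration = 2)   # years
prev_b <- prevalence_from(incidence_rate = 0.02, mean_duration = 10)  # years

round(c(group_a = prev_a, group_b = prev_b), 3)
prev_b / prev_a   # prevalence ratio despite identical incidence
```

With identical incidence, group B still shows roughly four times the prevalence of group A. A cross-sectional comparison of these groups would therefore pick up a factor that prolongs disease just as readily as one that causes it.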
The Reverse Causation Problem
When the exposure factors are time-varying, it is often very difficult to differentiate cause and effect. For example, if one is studying the relationship between dog ownership and blood pressure, and the association is negative, one cannot distinguish people who obtained a dog because they already had low blood pressure from people whose lifestyle changed after obtaining a dog, lowering their blood pressure as a result. The more changeable the exposure, the worse this issue becomes.
Cross-sectional studies are best suited for time-invariant exposures such as race or sex. In these instances, the investigator can be certain that the exposure preceded, or at least was not caused by, the outcome.
Reverse causation and the prevalence/duration confound are the conceptual limits of the design. The next thing to learn is what the design does let you do, computationally — namely, the cross-tabulation that almost every analytic move in this course is built on.
What you'll do: simulate 200 people with a known exposure-disease relationship, build the 2×2 contingency table, and read prevalence off each row. What to take away: the 2×2 table is the unit of analysis the rest of the course is built on; every measure of association you will meet in Lessons 5 and 6 starts from a structure that looks exactly like this.
Most observational studies start the same way: cross-tabulate exposure against outcome. Below we simulate a small cross-sectional dataset, build the 2×2 table, and read off the prevalence of disease in exposed vs unexposed.
# Simulate 200 people: half exposed, with higher disease prevalence among exposed.
set.seed(230)
n <- 200
exposure <- rep(c("exposed", "unexposed"), each = n/2)
disease <- c(rbinom(100, 1, 0.30), rbinom(100, 1, 0.10))
dat <- data.frame(exposure = exposure, disease = disease)
# 2x2 cross-tabulation -- the workhorse of descriptive epi.
tab <- table(dat$exposure, dat$disease,
             dnn = c("Exposure", "Disease"))
tab
# Prevalence (proportion with disease=1) within each exposure group.
prop.table(tab, margin = 1) # margin=1 -> divide each row by its row total
Reading the table. 29% of the exposed group have disease vs 8% of the unexposed. That contrast — and the rows that produced it — is the launchpad for every measure of association you will learn in this course.
Reflect on what you just ran
Use the questions below to interpret the output you produced. Look at your console table before answering.
1. Look at the raw tab output. How many exposed individuals had disease? How many unexposed had disease? Write the four cell counts (a, b, c, d) in the standard 2×2 layout (exposed-diseased, exposed-non-diseased, unexposed-diseased, unexposed-non-diseased).
tab: exposed with disease ≈ 29, exposed without disease ≈ 71; unexposed with disease ≈ 8, unexposed without disease ≈ 92. In standard layout: a = 29 (exposed/diseased), b = 71 (exposed/non-diseased), c = 8 (unexposed/diseased), d = 92 (unexposed/non-diseased). Totals: 100 exposed and 100 unexposed, 37 diseased and 163 not, n = 200. From these you can read off prevalence (a/(a+b)) for the exposed group, the prevalence ratio (PR), and the odds ratio (OR), the three numbers that drive interpretation of any cross-sectional dataset.
2. Using prop.table(tab, margin = 1), the prevalence of disease was 0.29 in the exposed group and 0.08 in the unexposed group. Why is this a measure of prevalence and not incidence, given the design that generated the data?
3. The simulation used rbinom(100, 1, 0.30) for the exposed and 0.10 for the unexposed. What would happen to the observed exposed-group prevalence if you changed the seed or doubled n to 400 per group? Would the contrast between groups become more or less reliable?
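As a worked check on the cell counts quoted in the answer to question 1, the prevalence ratio and odds ratio can be computed directly from a, b, c and d. The sketch below enters those approximate counts by hand rather than re-running the simulation, so the object names (`counts`, `prevalence_ratio`, `odds_ratio`) are illustrative and your own console values may differ slightly.

```r
# Cell counts quoted in the answer to question 1 (approximate values from
# the simulated table), in the standard 2x2 layout: a, b / c, d.
counts <- matrix(c(29, 71,
                    8, 92),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Exposure = c("exposed", "unexposed"),
                                 Disease  = c("yes", "no")))

# Row-wise prevalence: a / (a + b) and c / (c + d).
prev_exposed   <- counts["exposed", "yes"]   / sum(counts["exposed", ])
prev_unexposed <- counts["unexposed", "yes"] / sum(counts["unexposed", ])

# Prevalence ratio and odds ratio (ad / bc).
prevalence_ratio <- prev_exposed / prev_unexposed
odds_ratio <- (counts["exposed", "yes"] * counts["unexposed", "no"]) /
              (counts["exposed", "no"]  * counts["unexposed", "yes"])

c(PR = prevalence_ratio, OR = odds_ratio)
```

With these counts the prevalence ratio is about 3.6 and the odds ratio about 4.7. The gap between the two is a reminder that the odds ratio overstates the prevalence ratio when the outcome is common.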
Estimating Incidence from Cross-Sectional Studies
Although cross-sectional studies directly measure prevalence, there are approaches for estimating incidence from prevalence data. This is often desirable because incidence data are more useful for causal inference. The three tabs below introduce the most common approaches in increasing order of analytical sophistication. Each one trades a different practical cost (run two surveys, develop two assays, build a mathematical model) for a different epistemic gain.
A simple way to obtain population-level incidence data is to perform two cross-sectional studies, one before and one after an event of interest. For example, Miller et al. (2010) performed two cross-sectional studies before and after the 2009 H1N1 epidemic in England, giving a population-based estimate of incidence.
Other approaches include using two different tests: one that detects early immune response and one that detects long-lasting immunity. People who test negative on the less sensitive test are followed forward for a defined time period to ascertain how many become positive. This approach has been refined for HIV studies.
Rajan and Sokal (2011) describe how to estimate age-specific incidence from prevalence data. Their general approach uses two prevalence estimates at different time points. The incidence rate at year ‘a’ is:
where ‘n’ is the time between the two prevalence estimates (P_a and P_{a+n}) in the cross-sectional survey.
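The logic of turning two prevalence estimates into an incidence estimate can be illustrated with a simple approximation. The sketch below is not the published Rajan and Sokal estimator. It assumes an irreversible condition (for example, seropositivity) in a closed, stable population, and divides the rise in prevalence by the time elapsed and by the proportion still at risk at the start, that is (P_{a+n} − P_a) / (n × (1 − P_a)). The function name and the input values are hypothetical.

```r
# Hypothetical sketch, not the published Rajan and Sokal estimator:
# approximate average incidence per unit time among those still at risk,
# assuming an irreversible condition and a closed, stable population.
incidence_from_prevalence <- function(p_start, p_end, n_time) {
  stopifnot(p_start >= 0, p_start < 1, p_end >= p_start, n_time > 0)
  (p_end - p_start) / (n_time * (1 - p_start))
}

# Illustrative values only: prevalence rises from 10% to 19% over 3 years.
inc <- incidence_from_prevalence(p_start = 0.10, p_end = 0.19, n_time = 3)
round(inc, 4)
```

With these illustrative inputs the estimate is about 0.033 new cases per person at risk per year. The same arithmetic underlies the before-and-after survey approach described above, with `n_time` set to the interval between the two surveys.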
The first incidence-estimation approach — running two cross-sections at different time points — raises a question of its own. If we are willing to track the same population twice, why not just track the same individuals through time? That is the difference between a repeated cross-section and a cohort.
Repeated Cross-Sectional vs. Cohort Studies
Sometimes it is desirable to follow a population over time. Two options exist: repeated cross-sectional samplings of the population, or a longitudinal study of the initial study subjects (a cohort approach). Each has distinct advantages, and the choice between them is a real one. The two cards below put their strengths and limitations side by side — cohorts return in detail in Lesson 5.
The previous subsections covered what cross-sectional studies cannot do, what they can do, and how they relate to longitudinal designs. The last piece of the puzzle is reporting — how to write up an observational study so the next reader can judge it without having to reverse-engineer the methods.
Reporting Observational Studies: The STROBE Statement
In 2004, a network of methodologists, researchers, and journal editors established what we now know as the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement (von Elm et al., 2007; explained in detail by Vandenbroucke et al., 2007) — one of the EQUATOR-network reporting guidelines we previewed in Lesson 1. It provides a checklist of 22 items considered essential for good reporting of observational studies.
STROBE Checklist Key Sections
The STROBE checklist covers: Title & Abstract (indicate study design), Introduction (background, objectives, hypotheses), Methods (study design, setting, participants, variables, data sources, bias, sample size, statistical methods), Results (participants, descriptive data, outcome data, main results), and Discussion (key results, limitations, interpretation, generalisability).
The reflection below is the practical exit ticket for this section — a textbook reverse-causation case dressed up in plausible language. After working through it and the knowledge check, the lesson moves to its comprehensive final assessment, which integrates the classification map (Section 1), the descriptive designs (Section 2), the cross-sectional design itself (Section 3), and the limitations and reporting standards covered here.
Reflection
Consider a cross-sectional study that finds an association between pet ownership and lower blood pressure. Explain why this finding cannot be interpreted as causal evidence that pet ownership lowers blood pressure. What study design would be more appropriate?
1. The primary reason cross-sectional studies have limited ability to support causal inference is:
2. Cross-sectional studies are best suited for studying exposures that are:
3. The STROBE statement provides:
The companion R script r-activities/HSCI_230_Lesson_3_Introduction_to_Observational_Studies.R simulates a cross-sectional dataset of 200 people (half exposed, half unexposed) with a true disease prevalence of 30% in the exposed and 10% in the unexposed, then constructs the 2×2 cross-tabulation that anchors nearly every observational analysis. You will compute row-wise prevalence with prop.table(..., margin = 1) and, as a stretch, derive a prevalence ratio — a first taste of the measures-of-association we revisit in later lessons.
# Simulate 200 people: half exposed, with higher disease prevalence among exposed
set.seed(230)
n <- 200
exposure <- rep(c("exposed", "unexposed"), each = n/2)
disease <- c(rbinom(100, 1, 0.30), rbinom(100, 1, 0.10))
dat <- data.frame(exposure = exposure, disease = disease)
# 2x2 cross-tabulation -- the workhorse of descriptive epi
tab <- table(dat$exposure, dat$disease,
             dnn = c("Exposure", "Disease"))
tab
# Prevalence (proportion with disease=1) within each exposure group
prop.table(tab, margin = 1) # margin=1 -> divide each row by its row total
## -----------------------------------------------------------------------------
## Stretch: prevalence ratio (a quick preview of measures-of-association)
## -----------------------------------------------------------------------------
prev_exposed <- prop.table(tab, margin = 1)["exposed", "1"]
prev_unexposed <- prop.table(tab, margin = 1)["unexposed", "1"]
prev_ratio <- prev_exposed / prev_unexposed
cat("Prevalence ratio:", round(prev_ratio, 2), "\n")
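Because the data are simulated, the prevalence ratio printed above will wobble around the true value of 0.30 / 0.10 = 3 rather than hit it exactly. The sketch below is one way to see how much wobble to expect from 200 people. It is not part of the companion script: `prevalence_ratio()` is a hypothetical helper, and the interval it returns is a log-scale Wald (Katz) interval, a common choice that this lesson does not itself cover.

```r
# Hypothetical helper: prevalence ratio with a log-scale Wald (Katz) interval.
# Assumes a 2x2 table with rows "exposed"/"unexposed" and columns "0"/"1".
prevalence_ratio <- function(tab, level = 0.95) {
  a  <- tab["exposed", "1"]        # exposed with disease
  c0 <- tab["unexposed", "1"]      # unexposed with disease
  n1 <- sum(tab["exposed", ])      # all exposed
  n0 <- sum(tab["unexposed", ])    # all unexposed
  pr <- (a / n1) / (c0 / n0)
  se_log <- sqrt(1 / a - 1 / n1 + 1 / c0 - 1 / n0)  # SE of log(PR)
  z <- qnorm(1 - (1 - level) / 2)
  c(estimate = pr,
    lower = exp(log(pr) - z * se_log),
    upper = exp(log(pr) + z * se_log))
}

# Expected counts if a sample of 100 + 100 matched the true prevalences exactly
expected <- matrix(c(70, 90, 30, 10), nrow = 2,
                   dimnames = list(Exposure = c("exposed", "unexposed"),
                                   Disease  = c("0", "1")))
res <- prevalence_ratio(expected)
round(res, 2)
```

The same helper can be applied to the simulated table with prevalence_ratio(tab). Even when the counts match the truth exactly, the interval is wide, which is a useful reminder that a single cross-sectional sample of this size estimates the ratio only roughly.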
Lesson 3 — Final Review & Assessment
Bringing It All Together
This lesson started with a classification map (Section 1), walked down it to the descriptive designs (Section 2), then up into the first true analytic observational design (Section 3), and finished by interrogating that design’s honest limits and reporting standards (Section 4). The arc was deliberate. The vocabulary you now have — descriptive vs. explanatory, experimental vs. observational, prospective vs. retrospective, the three sampling approaches, prevalence vs. incidence, the unified approach to design, the STROBE reporting framework — is the working vocabulary of every observational study you will read for the rest of HSCI 230.
The deeper move was the one Hernán and Rubin pushed: design trumps analysis. An observational study cannot randomise its way out of confounding, but it can be built so that the analytic decisions are committed in advance, the source population is clearly defined, and the confounding strategy is not an afterthought. That is what the unified approach (the thought experiment, design before seeing data, forward projection) makes operational. Applied seriously, it converts an observational study from something the manufactured-doubt strategy can attack into something a critical reader can actually evaluate.
The two designs that come next — the case-control study (Lesson 4) and the cohort study (Lesson 5) — do not introduce a new framework. They are two more sampling approaches inside the one you already have, each making different trade-offs around how subjects are selected, how outcomes and exposures are ascertained, and what measure of association the design naturally produces. Read those lessons through the lens of this one and the through-line will be clear.
Key Takeaways from Lesson 3
- Descriptive vs. explanatory is a distinction of purpose: descriptive studies characterise disease occurrence and generate hypotheses; explanatory studies compare groups to test them.
- Experimental vs. observational is a distinction of investigator control: in an experiment the investigator assigns the exposure, typically by randomisation, which manages confounding; observational studies must build in confounding control through design.
- The unified approach — thought experiment, design before data, forward projection — is how an observational study borrows the discipline of an experiment without being one.
- Cross-sectional studies measure prevalence, not incidence, which makes them useful for time-invariant exposures and unreliable for time-varying ones because of the reverse-causation problem.
- Matching is not available in cross-sectional studies; restriction and statistical adjustment are.
- STROBE-compliant reporting is part of an observational study’s integrity, not an administrative add-on — it is what allows the next reader to judge whether the design carries the inference the authors claim.
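The reverse-causation takeaway can be made concrete with a small simulation in the spirit of the pet-ownership reflection. This is a sketch under invented assumptions: every probability below is made up for illustration, hypertension stands in for blood pressure as a binary outcome, and pet ownership is given no effect on it at all.

```r
set.seed(230)
n <- 5000

# Step 1: blood pressure status is determined first. Pets play no role here,
# so the true causal effect of pet ownership on hypertension is null.
hypertension <- rbinom(n, 1, 0.30)

# Step 2 (assumed for illustration): people who already have hypertension are
# less likely to take on a pet, so the outcome influences the exposure.
p_pet <- ifelse(hypertension == 1, 0.20, 0.40)
pet <- rbinom(n, 1, p_pet)

# Step 3: the cross-sectional snapshot records current status only,
# not which of the two came first
snapshot <- table(Pet = ifelse(pet == 1, "owner", "non-owner"),
                  Hypertension = hypertension)
prev_by_pet <- prop.table(snapshot, margin = 1)
pr_pet <- prev_by_pet["owner", "1"] / prev_by_pet["non-owner", "1"]
cat("Prevalence ratio (owners vs non-owners):", round(pr_pet, 2), "\n")
```

The prevalence ratio comes out well below 1, so pet ownership looks protective even though the simulation gives it no effect. Nothing in the snapshot distinguishes this scenario from one in which pets truly lower blood pressure, which is why a design that establishes the exposure before the outcome is needed.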
Final Reflection
Reflect on the full range of study designs discussed in this lesson. If you were asked to investigate the relationship between a novel environmental exposure and a chronic health outcome, what type of study would you begin with and why? How might your study design evolve as evidence accumulates?
Lesson 3 Comprehensive Assessment
This assessment covers all sections of Lesson 3. You must answer all 15 questions correctly to complete the lesson. Review the feedback after each attempt.
1. Which study type is designed to make comparisons between subgroups based on exposure or outcome status?
2. The key difference between experimental and observational studies is:
3. The ‘thought experiment’ in the unified approach to study design involves:
4. Case reports are considered useful primarily because they:
5. A case-series report should document which of the following?
6. In the study classification hierarchy, which observational study design provides the strongest evidence for causal inference?
7. In a cross-sectional study, subjects are selected from the source population based on:
8. The natural measure of association in a cross-sectional study of a binary outcome is:
9. A major limitation of cross-sectional studies when studying time-varying exposures is:
10. Prevalence is a function of:
11. Repeated cross-sectional studies are preferred over cohort studies when:
12. Which of the following is NOT a component of the STROBE checklist?
13. Propensity scores in observational study design are used to:
14. A purposive non-random sample in a cross-sectional study primarily threatens:
15. A study that samples subjects from the general population, measures their current blood pressure and dietary habits simultaneously, and compares blood pressure between high-salt and low-salt diet groups is: