HSCI 230 — Lesson 8

Introduction to Observational Studies

Evaluating Epidemiological Research

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Differentiate between descriptive and explanatory studies
  • Differentiate between experimental and observational studies
  • Describe the three main elements of the unified approach to observational study design
  • Describe the advantages and limitations of case reports, case-series reports, and surveys
  • Design a cross-sectional study accounting for its strengths and weaknesses
  • Identify circumstances where a cross-sectional study is appropriate
  • List three approaches for obtaining incidence estimates from cross-sectional prevalence data
  • Differentiate between repeated cross-sectional studies and following a cohort in a longitudinal study
  • Apply the STROBE checklist to reporting a cross-sectional study

Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Section 1

Study Classification & Design Framework

Sections 7.1–7.2 of Dohoo, Martin & Stryhn

Descriptive vs. Explanatory Studies

Epidemiologic studies can be classified into two major categories: descriptive and explanatory (analytic). This classification reflects both the study’s objectives and its ability to support causal inference.

Descriptive studies include case reports, case-series reports, and surveys. They are designed solely to describe the nature and distribution of outcome events such as health-related phenomena. They describe the who, what, when, and where of disease occurrence.

Although a descriptive survey is not designed to assess hypotheses about manipulatable causes of the outcome event, the frequency of the outcome is usually described in categories of age, race, sex, season, and space.

Explanatory studies (also called analytic studies) are designed to make comparisons and contrasts between subgroups of study subjects based on exposure or outcome status. They allow the investigator to identify statistical associations between exposures and outcomes.

Explanatory studies can be further subdivided into experimental and observational studies, depending on whether the investigator controls the allocation of study subjects to exposure groups.

Experimental vs. Observational Studies

In experimental studies, the investigator controls (usually through randomisation) the allocation of study subjects to exposure groups. In contrast, in observational studies, the investigators try not to influence the natural course of events for the study subjects.

Key Distinction

In experimental studies, we try to reduce variation from all sources through selection and control of the experimental setting. In observational studies, we embrace the presence of natural variation in order to identify important interactions among key variables and the exposure–disease association.

The price paid through the use of observational studies is that considerable efforts are required to prevent confounding (bias) of the exposure–disease association. Experiments are the preferred choice when the treatment is straightforward and easily manipulated, such as a vaccine or a specific therapeutic agent. The major advantage of the experimental approach is the ability to control potential confounders through the process of randomisation.

A Unified Approach to Study Design

Hernan (2005) stressed that when considering an observational study design, we should think about the design of a field experiment to accomplish the same objective. This approach, reinforced by Rubin (2007), emphasises that ‘design trumps analysis’ and that all elements of the study design should be completed before seeing any outcome data.

1. The ‘Thought Experiment’

As a first step in considering an epidemiological study, a ‘thought experiment’ should be carried out that specifies the key elements: the study group and its selection, assignment to exposure, procedures for follow-up, and detection of the outcome. The important point is that formal randomisation would ensure ‘exchangeability’: the groups being compared are so similar that it does not matter which group was assigned to exposure.
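To make the idea of exchangeability concrete, the sketch below randomly allocates a simulated study group to two exposure groups and compares a baseline covariate between them. The ages, group sizes, seeds, and the function name `randomise` are all hypothetical illustrations, not taken from the textbook.

```python
import random
import statistics


def randomise(subjects, seed=0):
    """Randomly split a list of subjects into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = subjects[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical study group: 1000 subjects described only by age in years.
rng = random.Random(42)
ages = [rng.randint(20, 80) for _ in range(1000)]

exposed, unexposed = randomise(ages, seed=1)

# With random allocation, baseline covariates should be similar by chance alone.
diff = statistics.mean(exposed) - statistics.mean(unexposed)
print(f"exposed n={len(exposed)}, unexposed n={len(unexposed)}")
print(f"difference in mean age: {diff:.2f} years")
```

In an observational study no such allocation step exists, which is why the thought experiment asks what it would have balanced.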

2. Design Features Before Seeing Data

All design features are completed before anyone has seen the outcome data. This includes subject exclusion, selection criteria, and control of confounding. Rubin formalises the process through propensity scores (the probability of exposure given the covariates), which are compared between the exposed and non-exposed groups. Unless the scores in the two groups are virtually equal, some degree of confounding is possible.
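The comparison of propensity scores can be illustrated with a deliberately simplified sketch. It assumes a single categorical covariate (age group), so the propensity score is just the proportion exposed within each stratum; in practice the scores are usually estimated with a regression model. The data and function names are hypothetical.

```python
from collections import defaultdict


def propensity_by_stratum(records):
    """Estimate P(exposed | stratum) as the proportion exposed in each stratum."""
    counts = defaultdict(lambda: [0, 0])   # stratum -> [n_exposed, n_total]
    for stratum, exposed in records:
        counts[stratum][0] += int(exposed)
        counts[stratum][1] += 1
    return {s: n_exp / n_tot for s, (n_exp, n_tot) in counts.items()}


def mean_score(records, scores, exposed_flag):
    """Average propensity score among subjects with the given exposure status."""
    values = [scores[s] for s, e in records if e == exposed_flag]
    return sum(values) / len(values)


# Hypothetical records: (age group, exposed?)
records = (
    [("young", True)] * 10 + [("young", False)] * 30
    + [("old", True)] * 30 + [("old", False)] * 10
)

scores = propensity_by_stratum(records)
mean_exposed = mean_score(records, scores, True)
mean_unexposed = mean_score(records, scores, False)

print(scores)
print(f"mean score, exposed: {mean_exposed:.3f}")
print(f"mean score, non-exposed: {mean_unexposed:.3f}")
```

Here the two group means differ markedly, signalling that age is distributed differently between the exposure groups and could confound the association.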

3. Forward Projection (Critical Appraisal)

After completing the initial design, we project forward to the presentation of study results under 3 different scenarios: (1) the exposure appears to increase risk; (2) the exposure appears to decrease risk; or (3) the exposure does not appear to be associated. For each scenario, we must defend the proposed design. This process helps identify potential weaknesses.

Hierarchy of Evidence for Causal Inference

From the perspective of drawing causal inferences, experimental studies are generally referred to as the gold standard. The hierarchy of causal evidence (from strongest to weakest) is typically:

Study Type                    Difficulty  Investigator Control  Causal Evidence  Relevance
Laboratory trial              Moderate    Very high             Very high        Low
Controlled field trial (RCT)  Moderate    High                  Very high        High
Cohort study                  Difficult   High                  High             High
Case-control study            Moderate    Moderate              Moderate         High
Cross-sectional study         Moderate    Low                   Low              Moderate
Survey                        Moderate    Moderate              Not applicable   High
Case series                   Easy        Very low              Not applicable   Low to high
Case report                   Very easy   Very low              Not applicable   Low to high

Reflection

Think about a health outcome you are interested in studying. Would an experimental or observational approach be more appropriate, and why? Consider ethical, practical, and scientific factors in your answer.

Section 1 Knowledge Check

1. Which of the following BEST distinguishes explanatory from descriptive studies?

Explanatory (analytic) studies are specifically designed to compare subgroups of subjects based on exposure or outcome status, whereas descriptive studies only characterise the distribution of disease.

2. What is the major advantage of experimental studies over observational studies for causal inference?

The major advantage of the experimental approach is the ability to control potential confounders, both measured and unmeasured, through the process of randomisation.

3. The ‘unified approach’ to observational study design includes the thought experiment, completing design features before seeing data, and:

The third component of the unified approach is forward projection, where the researcher projects forward to three scenarios (increased risk, decreased risk, no association) and defends the proposed design under each.
Section 2

Descriptive Studies: Case Reports, Case Series & Surveys

Section 7.3 of Dohoo, Martin & Stryhn

Descriptive studies are used to describe the main features of a disease or health-related outcome. Although they are not designed to evaluate associations between exposures and outcomes, the observations made in a descriptive study can form the basis of hypotheses which are then further investigated in analytic studies. Three forms of descriptive studies are case reports, case-series reports, and surveys.

Key Characteristics of Study Types

Descriptive studies differ from analytic observational studies in important ways, chiefly in whether a comparison group is part of the design.

Important Limitation

A common feature of both case reports and case-series reports is the absence of a comparison group. Without a comparison group, it is impossible to draw valid conclusions about causal associations. This is why descriptive studies are considered hypothesis-generating rather than hypothesis-testing.

From Survey to Analytic Study

Kalsbeek and Heiss (2000) and Speybroeck et al (2003) have described how to analyse surveys in a way that accounts for the study design. If the survey is designed to collect information about both an outcome of interest and potential exposures (risk factors) beyond the categories of people, place, and time, it becomes a cross-sectional analytic study and, as such, can be used to evaluate associations between exposures and outcomes.

Scenario: The Ontario Hypertension Survey

Leenen et al (2008) conducted a survey of the prevalence of hypertension in Ontario. The sampling frame consisted of municipalities and dissemination areas. From 6,436 eligible dwellings, contact was made with 4,559 potential participants. Hypertension prevalence was found to be 21.3% of the population overall. This survey combined both prevalence estimation and risk factor analysis, making it a cross-sectional analytic study.

Reflection

Can you think of a disease or health condition for which a case-series report might be the most appropriate initial study design? What hypothesis might it generate for future analytic studies?

Section 2 Knowledge Check

1. What is the primary limitation shared by both case reports and case-series reports?

Both case reports and case-series reports include only cases; they lack an explicit comparison group, making it impossible to draw valid conclusions about causal associations.

2. A survey becomes a cross-sectional analytic study when it:

When a survey is designed to collect information about both an outcome of interest and potential exposures (risk factors) beyond the basic categories of people, place, and time, it becomes a cross-sectional analytic study.

3. A case-series report documenting 50 patients with a rare autoimmune condition would be classified as:

Case-series reports are descriptive studies. They describe the characteristics of a group of cases but do not make formal comparisons with a control or unexposed group.
Section 3

Cross-Sectional Studies: Design & Implementation

Sections 7.4–7.5 of Dohoo, Martin & Stryhn

Observational Studies Overview

Observational studies (a subgroup of analytic or explanatory studies) have an explicit formal contrast as part of their design: the comparison of outcome frequency across exposure categories is their central foundation. They differ from descriptive studies in that the comparison of two or more groups is central, and from experiments in that the researcher has no control over the allocation of study subjects to the exposure groups.

Prospective vs. Retrospective Designs

Observational studies can also be classified as prospective or retrospective. In prospective studies, the disease or outcome has not occurred at the time the study starts. In retrospective studies, both the exposure and the outcome have occurred when the study begins—hence cross-sectional studies are inherently retrospective in nature.

Sampling Drives the Design

Three Main Approaches

The choices of observational analytic study design have traditionally been among 3 approaches based on how study subjects are selected:

  • Cross-sectional study: A sample is obtained from the source population, and the prevalence of both disease and exposure is determined at the time of subject selection.
  • Cohort study: A sample of study subjects from a source population with heterogeneous exposure levels is obtained, and the incidence of the outcome in the follow-up period is determined.
  • Case-control study: Subjects with the outcome (cases) are identified and their exposure history is contrasted with the exposure history of a sample of non-case subjects (controls).

Cross-Sectional Study Design

The defining feature of a cross-sectional study is that it is an observational study whose outcome frequency measure is prevalence. The basis of the cross-sectional design is that a sample, or census, of subjects is obtained from the source population and the presence or absence of the outcome is ascertained at that point.
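Because the outcome frequency measure is prevalence, the basic contrast in a cross-sectional study is between the prevalence in exposed and non-exposed subjects. The sketch below computes both prevalences and their ratio from a 2x2 table; the counts and function names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def prevalence(cases, total):
    """Proportion of subjects with the outcome at the time of sampling."""
    if total <= 0:
        raise ValueError("total must be positive")
    return cases / total


def prevalence_ratio(exposed_cases, exposed_total,
                     unexposed_cases, unexposed_total):
    """Prevalence in the exposed divided by prevalence in the non-exposed."""
    p_exposed = prevalence(exposed_cases, exposed_total)
    p_unexposed = prevalence(unexposed_cases, unexposed_total)
    if p_unexposed == 0:
        raise ValueError("prevalence in the non-exposed group is zero")
    return p_exposed / p_unexposed


# Hypothetical survey: 40 of 200 exposed and 30 of 300 non-exposed have the outcome.
p1 = prevalence(40, 200)
p0 = prevalence(30, 300)
pr = prevalence_ratio(40, 200, 30, 300)
print(f"prevalence exposed={p1:.2f}, non-exposed={p0:.2f}, ratio={pr:.2f}")
```

A ratio above 1 indicates only that existing cases are more common among the exposed; it does not by itself show that exposure preceded the outcome.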

Obtaining the Study Group

If the researcher wants to make inferences about the frequency of the outcome in a target population, then study subjects should be obtained by a formal random sampling procedure. The source population is the listing (real or implied) of potential study subjects from which the study group is obtained. The study group is that set of subjects who agree to take part in the study.
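A formal random sampling procedure can be as simple as drawing a simple random sample from the listing that defines the source population. The sketch below assumes such a listing exists as a Python list; the subject identifiers, sample size, and seed are hypothetical.

```python
import random


def simple_random_sample(source_population, n, seed=None):
    """Draw n distinct subjects, each with equal probability of selection."""
    rng = random.Random(seed)
    return rng.sample(source_population, n)


# Hypothetical listing of 5000 potential study subjects.
source_population = [f"subject_{i:04d}" for i in range(1, 5001)]

selected = simple_random_sample(source_population, 200, seed=2024)
print(len(selected), "subjects selected; first five:", selected[:5])
```

The study group would then be the subset of `selected` who agree to take part, so non-response still needs to be recorded and reported.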

Assessing Exposure

Exposure and other covariate status, such as demographic data, are obtained at the time of study subject selection or first contact/examination. Because the outcome measure is prevalence, it is sometimes difficult to know the appropriate time frame in which the exposure, if time-varying, might cause the outcome. Studying subjects who are currently exposed (prevalent exposure) can also lead to bias when interpreting the impact of these exposures.

Assessing the Outcome of Interest

It is important to clearly define the outcome/disease of interest. In general, great care should be used if the outcome is a surrogate for a clinically important event. It is also important that widely accepted diagnostic criteria be used to identify the disease or outcome of interest.

Ensuring Comparability

The two main approaches used to prevent bias from factors associated with the outcome and whose distribution differs between exposure groups (confounders) are exclusion (restricted sampling) and analytic (statistical) control. Matching to prevent confounding cannot be applied in cross-sectional studies. Analytic control requires the use of a multivariable model.

Scenario: Postpartum Depression in Canadian Women

Lanes et al (2011) conducted a cross-sectional study of postpartum depression symptomatology (PPDS) among Canadian women. The survey used the Edinburgh Postnatal Depression Scale (EPDS) as the outcome measure. Potential risk factors included socioeconomic status, demographic factors, and maternal characteristics. Of 8,542 selected women, 6,421 responded. The national prevalence of minor/major and major PPDS was found to be 8.46% and 8.69% respectively. The mother’s stress level during pregnancy and prior depression had the strongest associations.

Reflection

In the postpartum depression study described above, the exposure ‘stress during pregnancy’ was measured retrospectively at the same time as the outcome. What challenges does this create for causal inference? How might you address these challenges?

Section 3 Knowledge Check

1. The defining feature of a cross-sectional study is that its outcome frequency measure is:

The defining feature of a cross-sectional study is that its outcome frequency measure is prevalence, based on the number of existing cases at the time of the study.

2. Cross-sectional studies are inherently:

In cross-sectional studies, both the exposure and the outcome have already occurred when the study begins. The exposure and outcome are assessed at the same point in time, making them inherently retrospective.

3. Which approach to controlling confounding CANNOT be applied in cross-sectional studies?

Matching to prevent confounding cannot be applied in cross-sectional studies because subjects are sampled from the population without regard to their exposure or outcome status, unlike case-control studies where matching is feasible.
Section 4

Limitations, Incidence Estimation & Reporting

Sections 7.6–7.9 of Dohoo, Martin & Stryhn

Inferential Limitations of Cross-Sectional Studies

By its nature, a cross-sectional study design measures prevalence, which is a function of both incidence and duration of the disease. Consequently, it is often difficult to disentangle factors associated with persistence of the outcome from factors associated with developing the outcome in the first instance (i.e., becoming a new incident case).
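The dependence of prevalence on both incidence and duration can be illustrated numerically. The sketch below assumes a steady-state population, in which the prevalence odds, P / (1 - P), equal the incidence rate multiplied by the mean duration of disease. That steady-state relationship is a standard simplification and is not stated in the lesson itself; the rates and durations are hypothetical.

```python
def steady_state_prevalence(incidence_rate, mean_duration):
    """Prevalence implied by P / (1 - P) = incidence rate x mean duration.

    Assumes a steady state: constant incidence and duration, stable population.
    """
    odds = incidence_rate * mean_duration
    return odds / (1 + odds)


# Same incidence (1 new case per 100 person-years), different durations.
short_lived = steady_state_prevalence(0.01, 1)    # disease lasts about 1 year
long_lasting = steady_state_prevalence(0.01, 10)  # disease lasts about 10 years

print(f"prevalence, short duration: {short_lived:.4f}")
print(f"prevalence, long duration:  {long_lasting:.4f}")
```

Two conditions with identical incidence therefore show very different prevalence, so a factor that prolongs disease can look like a risk factor in cross-sectional data.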

The Reverse Causation Problem

When the exposure factors are time-varying, it is often very difficult to differentiate cause and effect. For example, if one is studying the relationship between dog ownership and blood pressure, and the association is negative, one cannot differentiate between people who obtained a dog because they had low blood pressure and those whose lifestyle changed after obtaining a dog, consequently lowering their blood pressure. The more changeable the exposure, the worse this issue becomes.

Cross-sectional studies are best suited for time-invariant exposures such as race or sex. In these instances, the investigator can be certain that the exposure preceded, or at least was not caused by, the outcome.

Estimating Incidence from Cross-Sectional Studies

Although cross-sectional studies directly measure prevalence, there are approaches for estimating incidence from prevalence data. This is often desirable because incidence data are more useful for causal inference.

A simple way to obtain population-level incidence data is to perform two cross-sectional studies, one before and one after an event of interest. For example, Miller et al (2010) performed two cross-sectional studies before and after the 2009 H1N1 epidemic in England, giving a population-based estimate of incidence.

Other approaches include using two different tests: one that detects early immune response and one that detects long-lasting immunity. People who test negative on the less sensitive test are followed forward for a defined time period to ascertain how many become positive. This approach has been refined for HIV studies.

Rajan and Sokal (2011) describe how to estimate age-specific incidence from prevalence data. Their general approach uses two prevalence estimates at different time points. The incidence rate at year ‘a’ is:

Incidence from Prevalence (Eq 7.1)
\[ I_a = 1 - \left[ 1 - \frac{P_{a+n} - P_a}{1 - P_a} \right]^{1/n} \]

where $n$ is the time between the two prevalence estimates ($P_a$ and $P_{a+n}$) in the cross-sectional survey.
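As a worked illustration of Eq 7.1, the sketch below evaluates the formula for hypothetical prevalence values. The function name, argument names, and input checks are assumptions added for the example.

```python
def incidence_from_prevalence(p_a, p_a_plus_n, n):
    """Incidence at 'a' from two prevalence estimates n time units apart (Eq 7.1)."""
    if not 0 <= p_a < 1:
        raise ValueError("p_a must be in [0, 1)")
    if not p_a <= p_a_plus_n <= 1:
        raise ValueError("p_a_plus_n must lie between p_a and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    # Share of those unaffected at 'a' who are affected by 'a + n'.
    newly_affected = (p_a_plus_n - p_a) / (1 - p_a)
    # Convert the n-period proportion to a per-period incidence.
    return 1 - (1 - newly_affected) ** (1 / n)


# Hypothetical prevalences: 10% at 'a' and 19% at 'a + n'.
one_year = incidence_from_prevalence(0.10, 0.19, 1)
two_year = incidence_from_prevalence(0.10, 0.19, 2)
print(f"n = 1: {one_year:.4f}")
print(f"n = 2: {two_year:.4f}")
```

With these values, 0.09 / 0.90 = 0.10 of the unaffected become affected over the interval, giving an incidence of 0.10 when n = 1 and about 0.051 per period when the same change is spread over n = 2.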

Repeated Cross-Sectional vs. Cohort Studies

Sometimes it is desirable to follow a population over time. Two options exist: repeated cross-sectional samplings of the population, or a longitudinal study of the initial study subjects (a cohort approach). Each has distinct advantages:

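  • Repeated cross-sectional studies: preferred when the research objective relates to events and associations within the full population at different periods of time.
  • Cohort studies: preferred when the objective requires following the same individuals over time.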

Reporting Observational Studies: The STROBE Statement

In 2004, a network of methodologists, researchers, and journal editors established what we now know as the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement. It provides a checklist of 22 items considered essential for good reporting of observational studies.

STROBE Checklist Key Sections

The STROBE checklist covers: Title & Abstract (indicate study design), Introduction (background, objectives, hypotheses), Methods (study design, setting, participants, variables, data sources, bias, sample size, statistical methods), Results (participants, descriptive data, outcome data, main results), and Discussion (key results, limitations, interpretation, generalisability).
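One way to apply the checklist to a draft report is to track which topics have been addressed under each section. The sketch below encodes only the section topics summarised above, not the 22 official items; the data structure, function name, and example draft are hypothetical.

```python
# Section topics as summarised in this lesson (not the full 22-item checklist).
STROBE_SECTIONS = {
    "Title & Abstract": ["study design indicated"],
    "Introduction": ["background", "objectives", "hypotheses"],
    "Methods": ["study design", "setting", "participants", "variables",
                "data sources", "bias", "sample size", "statistical methods"],
    "Results": ["participants", "descriptive data", "outcome data",
                "main results"],
    "Discussion": ["key results", "limitations", "interpretation",
                   "generalisability"],
}


def missing_topics(reported):
    """Return, per section, the topics a draft has not yet covered."""
    gaps = {}
    for section, topics in STROBE_SECTIONS.items():
        covered = reported.get(section, set())
        absent = [t for t in topics if t not in covered]
        if absent:
            gaps[section] = absent
    return gaps


# Hypothetical draft of a cross-sectional study report.
draft = {
    "Title & Abstract": {"study design indicated"},
    "Introduction": {"background", "objectives", "hypotheses"},
    "Methods": {"study design", "setting", "participants", "variables",
                "data sources", "statistical methods"},
    "Results": {"participants", "descriptive data", "outcome data",
                "main results"},
    "Discussion": {"key results", "interpretation"},
}

gaps = missing_topics(draft)
for section, topics in gaps.items():
    print(f"{section}: missing {', '.join(topics)}")
```

For this draft the output flags bias and sample size under Methods, and limitations and generalisability under Discussion.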

Reflection

Consider a cross-sectional study that finds an association between pet ownership and lower blood pressure. Explain why this finding cannot be interpreted as causal evidence that pet ownership lowers blood pressure. What study design would be more appropriate?

Section 4 Knowledge Check

1. The primary reason cross-sectional studies have limited ability to support causal inference is:

Cross-sectional studies measure prevalence, which is a function of both incidence and duration. This makes it difficult to distinguish factors that cause disease from factors that affect disease duration or survival.

2. Cross-sectional studies are best suited for studying exposures that are:

Cross-sectional studies are best suited for time-invariant exposures such as race or sex, where the investigator can be certain that the exposure preceded the outcome and was not affected by it.

3. The STROBE statement provides:

The STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) statement provides a checklist of 22 items considered essential for good reporting of observational studies.
Final Assessment

Lesson 8 — Final Review & Assessment

Final Reflection

Reflect on the full range of study designs discussed in this lesson. If you were asked to investigate the relationship between a novel environmental exposure and a chronic health outcome, what type of study would you begin with and why? How might your study design evolve as evidence accumulates?

Lesson 8 Comprehensive Assessment

This assessment covers all sections of Lesson 8. You must answer all 15 questions correctly to complete the lesson. Review the feedback after each attempt.

Final Assessment — Lesson 8: Introduction to Observational Studies (15 Questions)

1. Which study type is designed to make comparisons between subgroups based on exposure or outcome status?

Explanatory (analytic) studies are specifically designed to make comparisons and contrasts between subgroups of study subjects based on exposure or outcome status.

2. The key difference between experimental and observational studies is:

In experimental studies, the investigator controls (usually through randomisation) the allocation of study subjects to study groups. In observational studies, the investigator does not control the allocation.

3. The ‘thought experiment’ in the unified approach to study design involves:

The thought experiment involves imagining a field experiment that would accomplish the same objective as the proposed observational study, specifying study group selection, assignment to exposure, follow-up, and outcome detection.

4. Case reports are considered useful primarily because they:

Case reports describe unusual observations that can help researchers generate useful hypotheses to be investigated in future studies. They do not include comparison groups and cannot support causal claims.

5. A case-series report should document which of the following?

A case-series report documents the who (affected subjects), when (temporal aspects), and where (geographic aspects) of the condition in a group of subjects.

6. In the study classification hierarchy, which observational study design provides the strongest evidence for causal inference?

Among observational study designs, cohort studies are generally considered to provide the strongest evidence for causal inference because they can establish temporal sequence and measure incidence directly.

7. In a cross-sectional study, subjects are selected from the source population based on:

In a cross-sectional study, a sample or census of subjects is obtained from the source population, and the presence or absence of both exposure and outcome are determined at the same time.

8. The natural measure of association in a cross-sectional study of a binary outcome is:

The main comparison in a cross-sectional study is between the prevalence in the exposed and non-exposed subjects, and the natural measure of association is the prevalence risk ratio.

9. A major limitation of cross-sectional studies when studying time-varying exposures is:

When exposure factors are time-varying, it is very difficult to differentiate cause and effect because exposure and outcome are measured simultaneously. This reverse causation problem is a major limitation.

10. Prevalence is a function of:

Prevalence is a function of both the incidence (rate of new cases) and the duration of disease, which is why cross-sectional studies have limited ability to support causal inference about disease occurrence.

11. Repeated cross-sectional studies are preferred over cohort studies when:

Repeated cross-sectional studies are preferred when the research objective relates to events and associations within the full population at different periods of time, rather than tracking specific individuals.

12. Which of the following is NOT a component of the STROBE checklist?

STROBE is specifically for reporting observational studies, which do not involve randomisation. Specifying randomisation procedures would be part of the CONSORT statement for reporting randomised trials.

13. Propensity scores in observational study design are used to:

Rubin formalises the assessment of comparability through propensity scores, which estimate the probability of exposure given the covariates. If propensity scores differ substantially between groups, confounding is likely present.

14. A purposive non-random sample in a cross-sectional study primarily threatens:

Using a purposive non-random sample limits the external validity (ability to extrapolate results beyond the source population). While internal validity may be maintained if the study group is representative of the source population, generalisability is compromised.

15. A study that samples subjects from the general population, measures their current blood pressure and dietary habits simultaneously, and compares blood pressure between high-salt and low-salt diet groups is:

This study measures both exposure (dietary habits) and outcome (blood pressure) simultaneously in a sample from the population, making it a cross-sectional analytic study. It measures prevalence and makes comparisons between exposure groups.

Congratulations!

You have completed Lesson 8: Introduction to Observational Studies. You now understand the classification of epidemiological study designs, the unified approach to observational study design, the characteristics of descriptive studies, the design and implementation of cross-sectional studies, their inferential limitations, and the STROBE reporting guidelines.