Introduction to Observational Studies

Evaluating Epidemiological Research

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

Differentiate between descriptive and explanatory studies
Differentiate between experimental and observational studies
Describe the three main elements of the unified approach to observational study design
Describe the advantages and limitations of case reports, case-series reports, and surveys
Design a cross-sectional study accounting for its strengths and weaknesses
Identify circumstances where a cross-sectional study is appropriate
List three approaches for obtaining incidence estimates from cross-sectional prevalence data
Differentiate between repeated cross-sectional studies and following a cohort in a longitudinal study
Apply the STROBE checklist to reporting a cross-sectional study

Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Section 1

Study Classification & Design Framework

Sections 7.1–7.2 of Dohoo, Martin & Stryhn

Descriptive vs. Explanatory Studies

Epidemiologic studies can be classified into two major categories: descriptive and explanatory (analytic). This classification reflects both the study’s objectives and its ability to support causal inference.

Descriptive studies include case reports, case-series reports, and surveys. They are designed solely to describe the nature and distribution of outcome events such as health-related phenomena. They describe the who, what, when, and where of disease occurrence.

Although a descriptive survey is not designed to assess hypotheses about manipulatable causes of the outcome event, the frequency of the outcome is usually described in categories of age, race, sex, season, and space.

Explanatory studies (also called analytic studies) are designed to make comparisons and contrasts between subgroups of study subjects based on exposure or outcome status. They allow the investigator to identify statistical associations between exposures and outcomes.

Explanatory studies can be further subdivided into experimental and observational studies, depending on whether the investigator controls the allocation of study subjects to exposure groups.

Experimental vs. Observational Studies

In experimental studies, the investigator controls (usually through randomisation) the allocation of study subjects to exposure groups. In contrast, in observational studies, the investigators try not to influence the natural course of events for the study subjects.

Key Distinction

In experimental studies, we try to reduce variation from all sources through selection and control of the experimental setting. In observational studies, we embrace the presence of natural variation in order to identify important interactions among key variables and the exposure–disease association.

The price paid through the use of observational studies is that considerable efforts are required to prevent confounding (bias) of the exposure–disease association. Experiments are the preferred choice when the treatment is straightforward and easily manipulated, such as a vaccine or a specific therapeutic agent. The major advantage of the experimental approach is the ability to control potential confounders through the process of randomisation.
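The effect of randomisation on a potential confounder can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population size, the uniform age distribution, and the rule linking age to self-selected exposure are all hypothetical and do not come from the textbook.

```python
import random
import statistics


def mean_age_gap(ages, exposed_flags):
    """Absolute difference in mean age between exposed and unexposed subjects."""
    exposed = [a for a, e in zip(ages, exposed_flags) if e]
    unexposed = [a for a, e in zip(ages, exposed_flags) if not e]
    return abs(statistics.mean(exposed) - statistics.mean(unexposed))


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical source population: ages drawn uniformly between 20 and 80.
ages = [rng.uniform(20, 80) for _ in range(10_000)]

# Experimental allocation: a coin flip that ignores every subject characteristic.
randomised = [rng.random() < 0.5 for _ in ages]

# Observational "allocation" (assumed rule): older subjects are more likely
# to be exposed, so age becomes a potential confounder.
self_selected = [rng.random() < (age - 20) / 60 for age in ages]

gap_randomised = mean_age_gap(ages, randomised)
gap_observational = mean_age_gap(ages, self_selected)

print(f"Mean age gap, randomised allocation:    {gap_randomised:.2f} years")
print(f"Mean age gap, self-selected allocation: {gap_observational:.2f} years")
```

Under random allocation the two groups end up with nearly the same mean age, and the same would hold for covariates that were never measured. Under the self-selected rule the exposed group is markedly older, so any age-related outcome would be confounded.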

A Unified Approach to Study Design

Hernán (2005) stressed that when considering an observational study design, we should first think about how a field experiment would be designed to accomplish the same objective. This approach, reinforced by Rubin (2007), emphasises that ‘design trumps analysis’ and that all elements of the study design should be completed before seeing any outcome data.

1. The ‘Thought Experiment’

As a first step in planning an epidemiological study, we carry out a ‘thought experiment’ that specifies the key elements of an ideal trial: the study group and how it is selected, how subjects are assigned to exposure, the follow-up procedures, and how the outcome is detected. The important point is that formal randomisation would ensure ‘exchangeability’: the groups being compared are so similar that it does not matter which group was assigned to exposure.

2. Design Features Before Seeing Data

All design features are completed before anyone has seen the outcome data. This includes subject exclusion, selection criteria, and control of confounding. Rubin formalises the process by comparing the distributions of propensity scores (the probability of exposure given the covariates) in the exposed and non-exposed groups. Unless these distributions are virtually equal, some degree of confounding is possible.
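The idea of a propensity score can be sketched with a single categorical covariate, where the score is simply the fraction of subjects exposed within each stratum. The records and function names below are hypothetical; in practice the score is usually estimated with a regression model over many covariates, which this sketch does not attempt.

```python
from collections import defaultdict


def propensity_by_stratum(records):
    """Estimate P(exposed | stratum) as the exposed fraction within each stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_exposed, n_total]
    for stratum, exposed in records:
        counts[stratum][0] += int(exposed)
        counts[stratum][1] += 1
    return {s: n_exp / n_tot for s, (n_exp, n_tot) in counts.items()}


def mean_propensity(records, scores, exposed_group):
    """Average propensity score among exposed (True) or non-exposed (False) subjects."""
    values = [scores[s] for s, e in records if e == exposed_group]
    return sum(values) / len(values)


# Hypothetical records: (age stratum, exposed?)
records = (
    [("young", True)] * 20 + [("young", False)] * 80
    + [("old", True)] * 60 + [("old", False)] * 40
)

scores = propensity_by_stratum(records)
mean_exposed = mean_propensity(records, scores, True)
mean_unexposed = mean_propensity(records, scores, False)

print(f"Propensity scores by stratum: {scores}")
print(f"Mean score, exposed:     {mean_exposed:.3f}")
print(f"Mean score, non-exposed: {mean_unexposed:.3f}")
```

In this made-up example the exposed group has a clearly higher average score than the non-exposed group, which signals that the covariate is unevenly distributed and that some confounding is possible unless the design addresses it.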

3. Forward Projection (Critical Appraisal)

After completing the initial design, we project forward to the presentation of study results under 3 different scenarios: (1) the exposure appears to increase risk; (2) the exposure appears to decrease risk; or (3) the exposure does not appear to be associated. For each scenario, we must defend the proposed design. This process helps identify potential weaknesses.

Hierarchy of Evidence for Causal Inference

From the perspective of drawing causal inferences, experimental studies are generally referred to as the gold standard. The hierarchy of causal evidence (from strongest to weakest) is typically:

Study Type	Difficulty	Investigator Control	Causal Evidence	Relevance
Laboratory trial	Moderate	Very high	Very high	Low
Controlled field trial (RCT)	Moderate	High	Very high	High
Cohort study	Difficult	High	High	High
Case-control study	Moderate	Moderate	Moderate	High
Cross-sectional study	Moderate	Low	Low	Moderate
Survey	Moderate	Moderate	Not applicable	High
Case series	Easy	Very low	Not applicable	Low to high
Case report	Very easy	Very low	Not applicable	Low to high

Reflection

Think about a health outcome you are interested in studying. Would an experimental or observational approach be more appropriate, and why? Consider ethical, practical, and scientific factors in your answer.

Section 1 Knowledge Check

1. Which of the following BEST distinguishes explanatory from descriptive studies?

Explanatory studies use larger sample sizes
Explanatory studies make comparisons between subgroups based on exposure or outcome status
Explanatory studies are always experimental

Explanatory (analytic) studies are specifically designed to compare subgroups of subjects based on exposure or outcome status, whereas descriptive studies only characterise the distribution of disease.

2. What is the major advantage of experimental studies over observational studies for causal inference?

They are less expensive to conduct
Randomisation controls for both measured and unmeasured confounders
They always have larger sample sizes

The major advantage of the experimental approach is the ability to control potential confounders, both measured and unmeasured, through the process of randomisation.

3. The ‘unified approach’ to observational study design includes the thought experiment, completing design features before seeing data, and:

Conducting a meta-analysis
Obtaining ethics approval
Forward projection under three different result scenarios

The third component of the unified approach is forward projection, where the researcher projects forward to three scenarios (increased risk, decreased risk, no association) and defends the proposed design under each.

Section 2

Descriptive Studies: Case Reports, Case Series & Surveys

Section 7.3 of Dohoo, Martin & Stryhn

Descriptive studies are used to describe the main features of a disease or health-related outcome. Although they are not designed to evaluate associations between exposures and outcomes, the observations made in a descriptive study can form the basis of hypotheses which are then further investigated in analytic studies. Three forms of descriptive studies are case reports, case-series reports, and surveys.

Key Characteristics of Study Types

Descriptive studies differ from analytic observational studies in important ways. The following comparison highlights these differences:

Important Limitation

A common feature of both case reports and case-series reports is the absence of a comparison group. Without a comparison group, it is impossible to draw valid conclusions about causal associations. This is why descriptive studies are considered hypothesis-generating rather than hypothesis-testing.

From Survey to Analytic Study

Kalsbeek and Heiss (2000) and Speybroeck et al (2003) have described how to analyse surveys appropriately, taking the study design into account. If the survey is designed to collect information about both an outcome of interest and potential exposures (risk factors) beyond the categories of people, place, and time, it becomes a cross-sectional analytic study and, as such, can be used to evaluate associations between exposures and outcomes.

Scenario: The Ontario Hypertension Survey

Leenen et al (2008) conducted a survey of the prevalence of hypertension in Ontario. The sampling frame consisted of municipalities and dissemination areas. From 6,436 eligible dwellings, contact was made with 4,559 potential participants. Hypertension prevalence was found to be 21.3% of the population overall. This survey combined both prevalence estimation and risk factor analysis, making it a cross-sectional analytic study.

Reflection

Can you think of a disease or health condition for which a case-series report might be the most appropriate initial study design? What hypothesis might it generate for future analytic studies?

Section 2 Knowledge Check

1. What is the primary limitation shared by both case reports and case-series reports?

They cannot describe disease occurrence
They require very large sample sizes
They lack a comparison group for evaluating causal associations

Both case reports and case-series reports include only cases; they lack an explicit comparison group, making it impossible to draw valid conclusions about causal associations.

2. A survey becomes a cross-sectional analytic study when it:

Includes more than 1,000 participants
Collects data on both an outcome and potential exposures beyond person, place, and time
Uses random sampling exclusively

When a survey is designed to collect information about both an outcome of interest and potential exposures (risk factors) beyond the basic categories of people, place, and time, it becomes a cross-sectional analytic study.

3. A case-series report documenting 50 patients with a rare autoimmune condition would be classified as:

A descriptive study
An explanatory observational study
An experimental study

Case-series reports are descriptive studies. They describe the characteristics of a group of cases but do not make formal comparisons with a control or unexposed group.

Section 3

Cross-Sectional Studies: Design & Implementation

Sections 7.4–7.5 of Dohoo, Martin & Stryhn

Observational Studies Overview

Observational studies (a subgroup of analytic or explanatory studies) have an explicit formal contrast as part of their design: the comparison of outcome frequency across exposure categories is the central foundation. They differ from descriptive studies in that the comparison of two or more groups is central, and from experiments in that the researcher has no control over the allocation of study subjects to the exposure groups.

Prospective vs. Retrospective Designs

Observational studies can also be classified as prospective or retrospective. In prospective studies, the disease or outcome has not occurred at the time the study starts. In retrospective studies, both the exposure and the outcome have already occurred when the study begins; cross-sectional studies are therefore inherently retrospective in nature.

Sampling Drives the Design

Three Main Approaches

The choices of observational analytic study design have traditionally been among 3 approaches based on how study subjects are selected:

Cross-sectional study: A sample is obtained from the source population, and the prevalence of both disease and exposure is determined at the time of subject selection.
Cohort study: A sample of study subjects from a source population with heterogeneous exposure levels is obtained, and the incidence of the outcome in the follow-up period is determined.
Case-control study: Subjects with the outcome (cases) are identified and their exposure history is contrasted with the exposure history of a sample of non-case subjects (controls).

Cross-Sectional Study Design

The defining feature of a cross-sectional study is that it is an observational study whose outcome frequency measure is prevalence. The basis of the cross-sectional design is that a sample, or census, of subjects is obtained from the source population and the presence or absence of the outcome is ascertained at that point.
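Because prevalence is the outcome measure, the basic arithmetic of a cross-sectional analysis is a comparison of prevalences between exposure groups. The sketch below uses hypothetical counts and function names to show the overall prevalence and the prevalence ratio from a single sample.

```python
def prevalence(cases, total):
    """Proportion of subjects with the outcome at the time of sampling."""
    if total <= 0:
        raise ValueError("total must be positive")
    return cases / total


def prevalence_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Prevalence in the exposed divided by prevalence in the unexposed."""
    return prevalence(cases_exposed, n_exposed) / prevalence(cases_unexposed, n_unexposed)


# Hypothetical cross-sectional sample of 1,000 subjects.
cases_exposed, n_exposed = 30, 200
cases_unexposed, n_unexposed = 40, 800

overall = prevalence(cases_exposed + cases_unexposed, n_exposed + n_unexposed)
pr = prevalence_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed)

print(f"Overall prevalence: {overall:.3f}")
print(f"Prevalence ratio (exposed vs unexposed): {pr:.2f}")
```

The ratio describes an association among existing cases only; it says nothing about when the exposure or the outcome arose.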

Obtaining the Study Group

If the researcher wants to make inferences about the frequency of the outcome in a target population, then study subjects should be obtained by a formal random sampling procedure. The source population is the listing (real or implied) of potential study subjects from which the study group is obtained. The study group is that set of subjects who agree to take part in the study.

Assessing Exposure

Exposure and other covariate status, such as demographic data, are obtained at the time of study subject selection or first contact/examination. Because the outcome measure is prevalence, it is sometimes difficult to know the appropriate time frame in which the exposure, if time-varying, might cause the outcome. Studying subjects whose exposure is current (prevalent) can also lead to bias when interpreting the impact of these exposures.

Assessing the Outcome of Interest

It is important to clearly define the outcome/disease of interest. In general, great care should be used if the outcome is a surrogate for a clinically important event. It is also important that widely accepted diagnostic criteria be used to identify the disease or outcome of interest.

Ensuring Comparability

The two main approaches used to prevent bias from factors associated with the outcome and whose distribution differs between exposure groups (confounders) are exclusion (restricted sampling) and analytic (statistical) control. Matching to prevent confounding cannot be applied in cross-sectional studies. Analytic control requires the use of a multivariable model.
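The effect of a confounder, and what controlling for it achieves, can be seen with a stratified analysis. Note that this sketch swaps in a Mantel-Haenszel summary ratio as a simpler stand-in for the multivariable model mentioned above, and that all counts are invented for illustration.

```python
def prevalence_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Prevalence in the exposed divided by prevalence in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def mantel_haenszel_ratio(strata):
    """Mantel-Haenszel summary ratio across strata of a confounder.

    Each stratum is (cases_exposed, n_exposed, cases_unexposed, n_unexposed).
    """
    numerator = 0.0
    denominator = 0.0
    for cases_exp, n_exp, cases_unexp, n_unexp in strata:
        total = n_exp + n_unexp
        numerator += cases_exp * n_unexp / total
        denominator += cases_unexp * n_exp / total
    return numerator / denominator


# Hypothetical data in which age is associated with both exposure and outcome.
strata = {
    "young": (10, 100, 40, 400),
    "old": (120, 400, 30, 100),
}

# Crude analysis: collapse the strata and ignore age.
crude = prevalence_ratio(
    sum(s[0] for s in strata.values()), sum(s[1] for s in strata.values()),
    sum(s[2] for s in strata.values()), sum(s[3] for s in strata.values()),
)

# Restriction corresponds to analysing a single stratum on its own.
stratum_ratios = {name: prevalence_ratio(*s) for name, s in strata.items()}

# Analytic control: pool the stratum-specific contrasts.
adjusted = mantel_haenszel_ratio(strata.values())

print(f"Crude prevalence ratio:    {crude:.2f}")
print(f"Stratum-specific ratios:   {stratum_ratios}")
print(f"Age-adjusted (MH) ratio:   {adjusted:.2f}")
```

In these made-up data the crude ratio suggests an association, yet within each age stratum the prevalence is identical in exposed and unexposed subjects, so the apparent effect is produced entirely by age.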

Scenario: Postpartum Depression in Canadian Women

Lanes et al (2011) conducted a cross-sectional study of postpartum depression symptomatology (PPDS) among Canadian women. The survey used the Edinburgh Postnatal Depression Scale (EPDS) as the outcome measure. Potential risk factors included socioeconomic status, demographic factors, and maternal characteristics. Of 8,542 selected women, 6,421 responded. The national prevalence of minor/major and major PPDS was found to be 8.46% and 8.69% respectively. The mother’s stress level during pregnancy and prior depression had the strongest associations with PPDS.

Reflection

In the postpartum depression study described above, the exposure ‘stress during pregnancy’ was measured retrospectively at the same time as the outcome. What challenges does this create for causal inference? How might you address these challenges?

Section 3 Knowledge Check

1. The defining feature of a cross-sectional study is that its outcome frequency measure is:

Prevalence
Incidence rate
Cumulative incidence

The defining feature of a cross-sectional study is that its outcome frequency measure is prevalence, based on the number of existing cases at the time of the study.

2. Cross-sectional studies are inherently:

Prospective
Retrospective
Experimental

In cross-sectional studies, both the exposure and the outcome have already occurred when the study begins. The exposure and outcome are assessed at the same point in time, making them inherently retrospective.

3. Which approach to controlling confounding CANNOT be applied in cross-sectional studies?

Restriction (exclusion criteria)
Statistical control via multivariable models
Matching

Matching to prevent confounding cannot be applied in cross-sectional studies because subjects are sampled from the population without regard to their exposure or outcome status, unlike case-control studies where matching is feasible.

Section 4

Limitations, Incidence Estimation & Reporting

Sections 7.6–7.9 of Dohoo, Martin & Stryhn

Inferential Limitations of Cross-Sectional Studies

By its nature, a cross-sectional study design measures prevalence, which is a function of both incidence and duration of the disease. Consequently, it is often difficult to disentangle factors associated with persistence of the outcome from factors associated with developing the outcome in the first instance (i.e., becoming a new incident case).
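The dependence of prevalence on both incidence and duration can be made concrete with the standard steady-state relationship, in which the prevalence odds equal the incidence rate multiplied by the mean duration of disease. The steady-state assumption and the numbers below are illustrative and are not taken from the textbook section.

```python
def steady_state_prevalence(incidence_rate, mean_duration):
    """Prevalence implied by P / (1 - P) = incidence rate x mean duration.

    Assumes a stable population in which incidence and duration are constant.
    Both arguments must use the same time unit (e.g. per year and years).
    """
    odds = incidence_rate * mean_duration
    return odds / (1 + odds)


# Same hypothetical incidence (1 new case per 100 person-years),
# but a factor that doubles how long cases persist.
short_duration = steady_state_prevalence(0.01, 2)
long_duration = steady_state_prevalence(0.01, 4)

print(f"Prevalence with 2-year duration: {short_duration:.4f}")
print(f"Prevalence with 4-year duration: {long_duration:.4f}")
```

Prevalence nearly doubles even though no additional cases are being generated, which is why a factor that only prolongs disease can look like a risk factor in cross-sectional data.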

The Reverse Causation Problem

When the exposure factors are time-varying, it is often very difficult to differentiate cause and effect. For example, if one is studying the relationship between dog ownership and blood pressure, and the association is negative, one cannot distinguish people who obtained a dog because they had low blood pressure from those whose lifestyle changed after obtaining a dog, consequently lowering their blood pressure. The more changeable the exposure, the worse this issue becomes.

Cross-sectional studies are best suited for time-invariant exposures such as race or sex. In these instances, the investigator can be certain that the exposure preceded, or at least was not caused by, the outcome.

Estimating Incidence from Cross-Sectional Studies

Although cross-sectional studies directly measure prevalence, there are approaches for estimating incidence from prevalence data. This is often desirable because incidence data are more useful for causal inference.

A simple way to obtain population-level incidence data is to perform two cross-sectional studies, one before and one after an event of interest. For example, Miller et al (2010) performed two cross-sectional studies before and after the 2009 H1N1 epidemic in England, giving a population-based estimate of incidence.

Other approaches include using two different tests: one that detects early immune response and one that detects long-lasting immunity. People who test negative on the less sensitive test are followed forward for a defined time period to ascertain how many become positive. This approach has been refined for HIV studies.

Rajan and Sokal (2011) describe how to estimate age-specific incidence from prevalence data. Their general approach uses two prevalence estimates at different time points. The incidence rate at year ‘a’ is:

Incidence from Prevalence (Eq 7.1)

I_a = 1 − [1 − (P_{a+n} − P_a) / (1 − P_a)]^{1/n}

where ‘n’ is the time between the two prevalence estimates (P_a and P_{a+n}) in the cross-sectional survey.
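Eq 7.1 can be worked through numerically. The function name and the prevalence values below are hypothetical; the calculation follows the formula as given, where the bracketed term is the fraction of initially unaffected subjects who remain unaffected after n time units.

```python
def incidence_from_prevalence(p_start, p_end, n):
    """Per-time-unit incidence implied by two prevalence estimates (Eq 7.1).

    p_start: prevalence at time a (P_a)
    p_end:   prevalence at time a + n (P_{a+n})
    n:       time between the two estimates
    """
    if not 0 <= p_start < 1:
        raise ValueError("p_start must be in [0, 1)")
    if n <= 0:
        raise ValueError("n must be positive")
    # Fraction of initially unaffected subjects who became cases over n units.
    newly_affected = (p_end - p_start) / (1 - p_start)
    # Convert the n-unit risk to a per-unit risk.
    return 1 - (1 - newly_affected) ** (1 / n)


# Hypothetical example: prevalence rises from 10% to 19% over 2 years.
annual = incidence_from_prevalence(0.10, 0.19, 2)
print(f"Estimated annual incidence: {annual:.4f}")
```

With n = 1 the expression reduces to (P_{a+1} − P_a) / (1 − P_a), the proportion of previously unaffected subjects who became cases between the two estimates.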

Repeated Cross-Sectional vs. Cohort Studies

Sometimes it is desirable to follow a population over time. Two options exist: repeated cross-sectional samplings of the population, or a longitudinal study of the initial study subjects (a cohort approach). Each has distinct advantages:

Reporting Observational Studies: The STROBE Statement

In 2004, a network of methodologists, researchers, and journal editors established what we now know as the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement. It provides a checklist of 22 items considered essential for good reporting of observational studies.

STROBE Checklist Key Sections

The STROBE checklist covers: Title & Abstract (indicate study design), Introduction (background, objectives, hypotheses), Methods (study design, setting, participants, variables, data sources, bias, sample size, statistical methods), Results (participants, descriptive data, outcome data, main results), and Discussion (key results, limitations, interpretation, generalisability).
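One way to apply the checklist while drafting a report is to track which topics have been addressed. The structure below is a hypothetical working aid built only from the topic groupings summarised above; it is not the official list of 22 numbered STROBE items, and the draft being checked is invented.

```python
# Topic groupings as summarised in the lesson text (an assumption of this
# sketch, not the official numbered STROBE items).
STROBE_SECTIONS = {
    "Title & Abstract": ["study design"],
    "Introduction": ["background", "objectives", "hypotheses"],
    "Methods": ["study design", "setting", "participants", "variables",
                "data sources", "bias", "sample size", "statistical methods"],
    "Results": ["participants", "descriptive data", "outcome data", "main results"],
    "Discussion": ["key results", "limitations", "interpretation", "generalisability"],
}


def missing_topics(addressed):
    """Return {section: [topics not yet addressed]} for a draft manuscript.

    `addressed` maps a section name to the set of topics the draft covers.
    """
    gaps = {}
    for section, topics in STROBE_SECTIONS.items():
        covered = addressed.get(section, set())
        outstanding = [t for t in topics if t not in covered]
        if outstanding:
            gaps[section] = outstanding
    return gaps


# Hypothetical draft of a cross-sectional study report.
draft = {
    "Title & Abstract": {"study design"},
    "Introduction": {"background", "objectives"},
    "Methods": {"study design", "setting", "participants", "variables",
                "data sources", "statistical methods"},
    "Results": {"participants", "descriptive data", "outcome data", "main results"},
    "Discussion": {"key results", "interpretation"},
}

gaps = missing_topics(draft)
for section, topics in gaps.items():
    print(f"{section}: still to report -> {', '.join(topics)}")
```

For a real manuscript, the published STROBE checklist for the relevant design should be used item by item rather than this abbreviated grouping.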

Reflection

Consider a cross-sectional study that finds an association between pet ownership and lower blood pressure. Explain why this finding cannot be interpreted as causal evidence that pet ownership lowers blood pressure. What study design would be more appropriate?

Section 4 Knowledge Check

1. The primary reason cross-sectional studies have limited ability to support causal inference is:

They cannot include large sample sizes
Prevalence reflects both incidence and duration, making it difficult to distinguish cause from consequence
They always have selection bias

Cross-sectional studies measure prevalence, which is a function of both incidence and duration. This makes it difficult to distinguish factors that cause disease from factors that affect disease duration or survival.

2. Cross-sectional studies are best suited for studying exposures that are:

Time-invariant (e.g., sex, race, genetic factors)
Easily modified by treatment
Measured only in laboratory settings

Cross-sectional studies are best suited for time-invariant exposures such as race or sex, where the investigator can be certain that the exposure preceded the outcome and was not affected by it.

3. The STROBE statement provides:

A method for calculating sample size in cross-sectional studies
A ranking of the quality of different study designs
A checklist of 22 items for reporting observational studies

The STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) statement provides a checklist of 22 items considered essential for good reporting of observational studies.

HSCI 230 — Lesson 8
