Systematic Reviews and Meta-Analysis

Evaluating Epidemiological Research

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

Carry out the steps of a systematic review, from specifying the question to synthesising results
Complete the data-extraction process to provide data suitable for meta-analysis
Calculate summary estimates of effect and evaluate heterogeneity among study results
Choose between fixed- and random-effects models and explain when each is appropriate
Present and interpret forest plots and other graphical displays of meta-analysis results
Evaluate potential causes of heterogeneity using subgroup analysis, stratification, and meta-regression
Evaluate the potential impact of publication bias using funnel plots and related methods
Determine if results have been influenced by an individual study (sensitivity analysis)

Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Section 1 of 5

Introduction & Systematic Reviews

⏱ Estimated reading time: 45 minutes

16.1 Why Systematic Reviews?

When making decisions about health interventions, we want to use all available information. Unfortunately, the literature is often inconclusive and conflicting—individual studies may produce results ranging from statistically significant to inconsequential, and the variation among results may be greater than expected from chance alone.

There are two fundamental approaches to formally reviewing available data: a narrative review and a systematic review (which may include a meta-analysis).

Narrative Reviews

In a narrative review, each study is considered individually, and the reviewer subjectively assesses the evidence. Narrative reviews have several limitations:

They tend to be carried out by subject experts who may bring preconceived opinions, resulting in a biased review
They often lack a structured methodology for identifying and assessing relevant studies
Small but well-designed studies may be omitted if they lack statistical power
Inclusion criteria are often not described in adequate detail
There is a tendency to weight all studies equally, when they should not all receive equal weight

Narrative reviews should only be used to provide an overview of literature, not to guide treatment or policy decisions.

Systematic Reviews

A systematic review uses a structured, transparent methodology to identify, evaluate, and synthesise all relevant studies on a specific question. It minimises bias and provides reproducible results. A systematic review may or may not include a quantitative meta-analysis, depending on the nature and quality of available data.

16.2 Steps of a Systematic Review

A systematic review follows seven key steps (Sargeant et al., 2006):

1. Specify the Question

The question should be driven by a clinical or health-policy objective, not by data availability. It is often more desirable to address a broad question (e.g., the ability of β-blockers as a class to reduce myocardial infarction risk) rather than a narrow one (e.g., one specific drug), to enhance generalisability. The question should specify the intervention(s), outcomes, comparisons, and eligible study designs.

2. Lay Out the Protocol

The review protocol should be objective and transparent—a reader should be able to duplicate it. This corresponds to the “Materials and Methods” of a primary study and covers all subsequent steps. A clear protocol minimises subjective decisions during the review process.

3. Find All the Studies

The literature search must be complete and well-documented. This involves searching major electronic databases (e.g., PubMed/Medline), reviewing reference lists of identified papers, and searching for grey literature (conference proceedings, theses, unpublished studies). The search strategy, databases, date ranges, and keywords must all be documented.

4. Determine Relevance (Inclusion/Exclusion Criteria)

Inclusion criteria specify the intervention(s), population(s), outcome(s), and study types eligible for the review. Exclusion criteria may include language restrictions, publication date cutoffs, or accessibility. Relevance should be assessed independently by two or more reviewers using the title and abstract, followed by full-text review.

5. Evaluate Study Quality

Each study’s internal and external validity must be evaluated. The Cochrane Collaboration’s risk-of-bias tool assesses six domains: sequence generation, allocation concealment, blinding, incomplete data, selective reporting, and other sources of bias. Quality assessment results can be used to exclude studies, weight them differentially, or evaluate quality as a source of heterogeneity.

6. Extract the Relevant Data

From each study, you need the point estimate of the outcome and a measure of its precision (SE or CI). Data extraction should be carried out independently by two investigators using a standardised template, with any differences resolved by discussion. Watch for duplicate reporting of the same data in multiple publications.
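When a paper reports only a confidence interval, the SE needed for extraction can usually be back-calculated. The sketch below assumes a 95% CI built from a normal approximation (z = 1.96) and, for ratio measures such as RR and OR, an interval that is symmetric on the log scale. The function name `se_from_ci` and the example values are hypothetical.

```python
import math

def se_from_ci(lower, upper, ratio_measure=True, z=1.96):
    """Back-calculate an SE from a reported confidence interval.

    For ratio measures (RR, OR) the limits are log-transformed first,
    so the value returned is the SE of the log estimate.
    """
    if ratio_measure:
        lower, upper = math.log(lower), math.log(upper)
    # A normal-approximation CI spans 2 * z standard errors.
    return (upper - lower) / (2 * z)

# Hypothetical study reporting OR = 0.80 (95% CI 0.64 to 1.00)
se_log_or = se_from_ci(0.64, 1.00)
print(f"SE of log(OR): {se_log_or:.4f}")

# Hypothetical study reporting a mean difference with 95% CI -3.96 to -0.04
se_md = se_from_ci(-3.96, -0.04, ratio_measure=False)
print(f"SE of mean difference: {se_md:.4f}")
```

If a study used a different confidence level or a t-based interval, the value of `z` would need to change accordingly.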

7. Summarise and Synthesise the Results

Results can be summarised qualitatively (narrative description with tabular/graphical display) or quantitatively (meta-analysis). A quantitative meta-analysis computes a pooled summary estimate of the effect, weighted by the precision of each study, and investigates reasons for variation across studies.

Reflection

Think of a public health question relevant to your interests. How would you specify the question for a systematic review? What databases would you search, and what inclusion/exclusion criteria would you set?


Knowledge Check: Section 1

1. Which of the following is a limitation of narrative reviews?

They use a structured, reproducible methodology
They always include a meta-analysis
They may bring preconceived opinions and selectively include studies

Narrative reviews tend to be subjective, with reviewers potentially bringing biased perspectives and selectively including studies that support their opinions. They lack the structured methodology of systematic reviews.

2. What is the first step in conducting a systematic review?

Searching all electronic databases
Specifying the question to be answered
Evaluating study quality

The first step is to specify a clear research question driven by a clinical or health-policy objective. This question guides all subsequent steps of the review, including the search strategy and inclusion criteria.

3. Why should data extraction in a systematic review be carried out by two independent investigators?

To minimise errors and subjective bias in recording study results
To double the amount of data available for analysis
To ensure that narrative and systematic reviews produce the same results

Duplicate independent data extraction minimises errors and subjective bias. The two datasets are then compared, and any differences are resolved by discussion, ensuring the accuracy and reliability of extracted data.

Section 2 of 5

Meta-Analysis: Data Types & Effect Models

⏱ Estimated reading time: 50 minutes

16.3 What Is a Meta-Analysis?

A meta-analysis is “the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings” (Glass, 1976). It is a formal process for combining results from multiple studies and is considered the “gold standard” for providing summary information about health interventions.

Objectives of Meta-Analysis

The objectives are to: (1) provide an overall estimate of an association or effect based on data from multiple studies, and (2) explore reasons for variation in the observed effect across studies. Because it combines data from multiple studies, meta-analysis gains statistical power for detecting effects.

16.3.1 Types of Data in Meta-Analysis

Three types of data can be used in a meta-analysis, each with different capabilities:

Data Type	Binary Outcome	Continuous Outcome
Summary estimate	Point estimate: RR, OR, RD, or IR; precision: SE or CI	Point estimate: mean difference (MD); precision: SE or CI
Group data	Cell values for treated and control groups (2×2 table)	Number, mean, and SD in each group
Individual patient data (IPD)	Raw data: outcome value (0 or 1) and individual characteristics	Raw data: outcome value and individual characteristics

Summary data are most commonly used. Group data allow computation of various effect measures. IPD are the most flexible but rarely available—they allow evaluation of study-, group-, and individual-level variables as sources of heterogeneity.
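Group data can be converted into any of the binary effect measures listed in the table. As a minimal sketch, the function below computes RR, OR, and RD from a 2×2 table, with large-sample SEs (on the log scale for the ratio measures). The function name and the counts are made up for illustration, and the code assumes no cell is zero; sparse tables would need a continuity correction or a different method.

```python
import math

def effects_from_2x2(a, b, c, d):
    """a, b = events and non-events in the treated group;
    c, d = events and non-events in the control group."""
    n1, n0 = a + b, c + d
    risk1, risk0 = a / n1, c / n0
    return {
        "RR": risk1 / risk0,
        "OR": (a * d) / (b * c),
        "RD": risk1 - risk0,
        # Large-sample standard errors
        "SE_logRR": math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0),
        "SE_logOR": math.sqrt(1 / a + 1 / b + 1 / c + 1 / d),
        "SE_RD": math.sqrt(risk1 * (1 - risk1) / n1 + risk0 * (1 - risk0) / n0),
    }

# Hypothetical trial: 15/100 events in treated, 30/100 in control
result = effects_from_2x2(15, 85, 30, 70)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```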

16.4 Fixed- vs. Random-Effects Models

A fundamental decision in any meta-analysis is whether to use a fixed-effects or random-effects model:

Fixed-Effects Model

Assumes the true treatment effect is constant across all studies. Any variation among observed study results is due solely to within-study random variation (sampling error).

Fixed-Effects Model (Eq 28.1)

T_i = θ + ε_i where ε_i ~ N(0, V_i)

Where T_i is the observed effect from study i, θ is the true overall effect, and V_i = [SE(T_i)]² is the known within-study variance. Weights are computed as W_i = 1/V_i (inverse variance weighting).

Advantage: Does not require estimating between-study variance (τ²).

Limitation: The assumption of a constant effect across all studies is often untenable. Ignoring between-study variation inflates the Type I error rate and yields confidence intervals that are too narrow.
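The inverse variance weighting used by the fixed-effects model can be sketched in a few lines. The pooled estimate is the weighted mean Σ W_i T_i / Σ W_i, and its SE is sqrt(1 / Σ W_i). The function name and the three study values are illustrative assumptions, not data from this lesson.

```python
import math

def fixed_effect(effects, ses, z=1.96):
    """Inverse variance fixed-effects pooled estimate, SE, and 95% CI."""
    weights = [1 / se ** 2 for se in ses]          # W_i = 1 / V_i
    total_w = sum(weights)
    pooled = sum(w * t for w, t in zip(weights, effects)) / total_w
    se_pooled = math.sqrt(1 / total_w)
    ci = (pooled - z * se_pooled, pooled + z * se_pooled)
    return pooled, se_pooled, ci

# Hypothetical effects (e.g., log odds ratios) and their SEs
effects = [-0.5, -0.2, -0.8]
ses = [0.2, 0.1, 0.4]

pooled, se_pooled, ci = fixed_effect(effects, ses)
print(f"Pooled estimate: {pooled:.4f} (SE {se_pooled:.4f})")
print(f"95% CI: {ci[0]:.4f} to {ci[1]:.4f}")
```

The most precise study (SE = 0.1) carries about three quarters of the total weight here, which is why the pooled estimate sits close to its value.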

Random-Effects Model

Assumes a distribution of true treatment effects across studies (heterogeneity), with additional variability beyond within-study sampling error.

Random-Effects Model (Eq 28.4)

T_i = θ + u_i + ε_i where u_i ~ N(0, τ²) and ε_i ~ N(0, V_i)

Where u_i is the random effect for study i, and τ² is the between-study variance (heterogeneity). Weights become W_i = 1/(V_i + τ²).

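Result: Often produces a point estimate similar to the fixed-effects estimate, but with a wider confidence interval because it accounts for between-study variation. Because the weights W_i = 1/(V_i + τ²) are more nearly equal across studies, small studies gain relative influence, so the point estimate can shift when small and large studies disagree. Random-effects models are now more commonly used.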
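The model above requires an estimate of τ², and this lesson does not say which estimator is used. One widely used option is the DerSimonian and Laird method-of-moments estimator, which the sketch below assumes; other estimators (for example, restricted maximum likelihood) can give different values. The function name and study values are hypothetical.

```python
import math

def dersimonian_laird(effects, ses):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2.

    Returns (pooled estimate, SE of pooled estimate, tau^2).
    """
    v = [se ** 2 for se in ses]
    w = [1 / vi for vi in v]                       # fixed-effects weights
    sum_w = sum(w)
    fixed = sum(wi * ti for wi, ti in zip(w, effects)) / sum_w
    q = sum(wi * (ti - fixed) ** 2 for wi, ti in zip(w, effects))
    k = len(effects)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)             # truncated at zero
    w_re = [1 / (vi + tau2) for vi in v]           # W_i = 1 / (V_i + tau^2)
    pooled = sum(wi * ti for wi, ti in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return pooled, se_pooled, tau2

# Hypothetical effects and SEs
effects = [-0.5, -0.2, -0.8]
ses = [0.2, 0.1, 0.4]
pooled, se_pooled, tau2 = dersimonian_laird(effects, ses)
print(f"tau^2 = {tau2:.4f}")
print(f"Random-effects estimate: {pooled:.4f} (SE {se_pooled:.4f})")
```

With these made-up values the random-effects SE is larger than the fixed-effects SE for the same data, and the estimate moves toward the smaller studies because the weights are more even.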

Key Distinction

The fixed-effects model asks: “What is the single true effect?” The random-effects model asks: “What is the average of the distribution of true effects?” Random-effects models are generally preferred because the assumption of a constant treatment effect across all studies is rarely justified.

16.4.1 Weighting Methods

The most common weighting procedure is inverse variance weighting, applicable to both continuous and binary outcomes. For binary outcomes with sparse data, the Mantel-Haenszel procedure or the Peto method may be preferred. For continuous outcomes, when studies use different scales, standardised mean differences (effect sizes such as Cohen’s d or Hedges’ g) are used.
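As an illustration of a standardised mean difference, the sketch below computes Cohen’s d and Hedges’ g from group data (n, mean, SD per group). It assumes the pooled-SD form of d, the common approximation J = 1 − 3/(4·df − 1) for the small-sample correction, and a large-sample variance formula for d. The function name and input values are hypothetical.

```python
import math

def standardised_mean_difference(m1, sd1, n1, m0, sd0, n0):
    """Return (Cohen's d, Hedges' g, SE of g) for two independent groups."""
    df = n1 + n0 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n0 - 1) * sd0 ** 2) / df)
    d = (m1 - m0) / sd_pooled
    j = 1 - 3 / (4 * df - 1)          # small-sample correction (approximate)
    g = j * d
    # Large-sample variance of d, then scaled by J^2 for g
    var_d = (n1 + n0) / (n1 * n0) + d ** 2 / (2 * (n1 + n0))
    se_g = j * math.sqrt(var_d)
    return d, g, se_g

# Hypothetical trial: treated mean 10 (SD 4, n 20), control mean 12 (SD 4, n 20)
d, g, se_g = standardised_mean_difference(10, 4, 20, 12, 4, 20)
print(f"Cohen's d = {d:.4f}, Hedges' g = {g:.4f}, SE(g) = {se_g:.4f}")
```

The resulting g and its SE can then be pooled with inverse variance weights in the same way as any other effect estimate.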

Reflection

Consider a meta-analysis of 10 studies examining the effect of a drug on blood pressure. Five studies were conducted in elderly populations and five in young adults. Would you expect a fixed-effects or random-effects model to be more appropriate? Why?


Knowledge Check: Section 2

1. In a fixed-effects model, what is assumed about the true treatment effect?

It varies randomly across studies following a normal distribution
It is constant across all studies, with variation due only to sampling error
It is zero in all studies (null hypothesis)

The fixed-effects model assumes there is one true effect (θ) common to all studies. Any observed variation in study results is attributed to within-study random variation (sampling error) only.

2. What does τ² represent in a random-effects meta-analysis?

The within-study variance for each individual study
The total variance across all observations
The between-study variance (heterogeneity)

τ² represents the between-study variance—the variability in true treatment effects across studies. It quantifies heterogeneity beyond what would be expected from within-study sampling error alone.

3. Which type of data provides the most flexibility for exploring sources of heterogeneity in a meta-analysis?

Individual patient data (IPD)
Summary estimate data
Group data

IPD allow evaluation of study-, group-, and individual-level variables as sources of heterogeneity. Summary data can only evaluate study-level variables, while group data add some flexibility but not at the individual level.

Section 3 of 5

Forest Plots & Heterogeneity

⏱ Estimated reading time: 50 minutes

16.5 Presentation of Results: The Forest Plot

The forest plot is the most important graphical output of a meta-analysis. It displays the point estimate and confidence interval of the effect observed in each study, along with the summary estimate.

Anatomy of a Forest Plot

Each horizontal line represents one study’s results. The length of the line is the 95% CI. The centre box marks the point estimate, and the area of the box is proportional to the study’s weight. The dashed vertical line shows the overall summary estimate. The diamond at the bottom represents the pooled estimate and its CI. The solid vertical line marks the null value (e.g., 0 for mean difference, 1 for ratio measures).
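The numbers behind each row of a forest plot can be computed before any drawing is done. The sketch below assumes fixed-effects inverse variance weights and a 95% CI of estimate ± 1.96 × SE; because the box area is proportional to the weight, the box side is taken as the square root of the percentage weight. The function name, dictionary keys, and study values are hypothetical.

```python
import math

def forest_rows(labels, effects, ses, z=1.96):
    """Per-study estimate, CI, percentage weight, and box side length."""
    weights = [1 / se ** 2 for se in ses]
    total = sum(weights)
    rows = []
    for label, t, se, w in zip(labels, effects, ses, weights):
        pct = 100 * w / total
        rows.append({
            "study": label,
            "estimate": t,
            "ci": (t - z * se, t + z * se),
            "weight_pct": pct,
            "box_side": math.sqrt(pct),   # area proportional to weight
        })
    return rows

# Hypothetical studies
rows = forest_rows(["A", "B", "C"], [-0.5, -0.2, -0.8], [0.2, 0.1, 0.4])
for r in rows:
    lo, hi = r["ci"]
    print(f"{r['study']}: {r['estimate']:+.2f} [{lo:+.2f}, {hi:+.2f}] "
          f"weight {r['weight_pct']:.1f}%")
```

A plotting library could then draw one horizontal line per row from the lower to the upper CI limit, with a square of side `box_side` centred on the estimate.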

Reading a Forest Plot

If all study CIs overlap considerably and cluster near the summary estimate, there is little heterogeneity. If CIs are widely scattered and many do not overlap, heterogeneity is substantial. Studies may be ordered by publication year (to detect time trends), quality score, or effect size.

16.6 Heterogeneity

Heterogeneity refers to variability among study results beyond what would be expected from random variation alone. It should always be evaluated in a meta-analysis.

16.6.1 Real vs. Artifactual Heterogeneity


An important distinction is between clinical heterogeneity (real differences between populations, interventions, and settings) and statistical heterogeneity (variation in observed results beyond chance). Clinical heterogeneity is always expected; the key question is whether statistical heterogeneity is also present.

16.6.2 Measuring Heterogeneity: Cochran’s Q and Higgins I²

Cochran’s Q Statistic (Eq 28.7)

Q = Σ w_i(T_i − θ)²

Where w_i = 1/V_i are the inverse variance study weights (W_i in the fixed-effects model), T_i are the study effects, θ is here the pooled fixed-effects estimate of the overall effect, and k is the number of studies. Under the null hypothesis of no heterogeneity, Q follows a χ² distribution with k−1 degrees of freedom. However, the Q test has low power when the number of studies is small, so a non-significant result does not rule out heterogeneity. Consider using a relaxed P-value threshold (e.g., 0.10 instead of 0.05).

Higgins I² (Eq 28.8)

I² = [Q − (k − 1)] / Q × 100%

I² quantifies the proportion of the total variation in observed study effects that is due to heterogeneity rather than chance. When Q is smaller than k − 1 the formula gives a negative value, which is set to 0%. Benchmarks: 25% = low, 50% = moderate, 75% = high heterogeneity. An evaluation of possible causes should be undertaken whenever I² exceeds 25%.
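Both statistics follow directly from the study effects and SEs. The sketch below computes Q using inverse variance weights and the fixed-effects pooled estimate, then I² with negative values truncated to zero. The function name and data are hypothetical; obtaining a P-value for Q would need a χ² distribution function, which is left out here.

```python
def cochran_q_i2(effects, ses):
    """Return (Q, degrees of freedom, I-squared as a percentage)."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * t for w, t in zip(weights, effects)) / sum(weights)
    q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, effects))
    df = len(effects) - 1
    # I^2 = [Q - (k - 1)] / Q x 100%, set to 0 when Q <= k - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2

# Hypothetical effects and SEs
q, df, i2 = cochran_q_i2([-0.5, -0.2, -0.8], [0.2, 0.1, 0.4])
print(f"Q = {q:.3f} on {df} df, I^2 = {i2:.1f}%")
```

With only three made-up studies, Q has little power, which illustrates why a non-significant Q should not be read as evidence of homogeneity.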

16.6.3 Evaluating Causes of Heterogeneity

Subgroup Analysis

Identify a specific subgroup of studies defined by a characteristic of interest and examine the effect within that subgroup. However, results should be interpreted with caution—the best estimate for any subgroup is provided by considering all the evidence (Stein’s Paradox), not just the subgroup data. Subgroup analyses should be pre-specified in the review protocol.

Stratified Analysis

Data are stratified by a factor thought to influence the treatment effect, and a separate meta-analysis is carried out in each stratum. The between-strata heterogeneity can be tested using Q_B = Q_T − ΣQ_S. A disadvantage is that individual strata may contain few studies.
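The partition Q_B = Q_T − ΣQ_S can be sketched as follows. The code assumes fixed-effects inverse variance weights within each stratum and overall, and Q_B is then compared with a χ² distribution on (number of strata − 1) degrees of freedom. The function names, stratum labels, and values are hypothetical.

```python
def q_statistic(effects, ses):
    """Cochran's Q for one set of studies (inverse variance weights)."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * t for w, t in zip(weights, effects)) / sum(weights)
    return sum(w * (t - pooled) ** 2 for w, t in zip(weights, effects))

def between_strata_q(strata):
    """strata maps a stratum name to a (effects, ses) pair."""
    all_effects = [t for effects, _ in strata.values() for t in effects]
    all_ses = [s for _, ses in strata.values() for s in ses]
    q_total = q_statistic(all_effects, all_ses)                 # Q_T
    q_within = {name: q_statistic(e, s) for name, (e, s) in strata.items()}
    q_between = q_total - sum(q_within.values())                # Q_B
    return q_total, q_within, q_between, len(strata) - 1

# Hypothetical strata: three studies each, all with SE = 0.2
strata = {
    "elderly": ([-0.8, -0.7, -0.9], [0.2, 0.2, 0.2]),
    "young": ([-0.2, -0.3, -0.1], [0.2, 0.2, 0.2]),
}
q_total, q_within, q_between, df_between = between_strata_q(strata)
print(f"Q_T = {q_total:.2f}, within = {q_within}, "
      f"Q_B = {q_between:.2f} on {df_between} df")
```

In this made-up example nearly all of the total heterogeneity lies between the strata, which is the pattern expected when the stratifying factor modifies the treatment effect.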

Galbraith Plot

A Galbraith plot displays each study’s Z statistic (T_i/SE(T_i)) against its precision (1/SE(T_i)). The slope of the regression line through the origin is the overall fixed-effect estimate, and lines drawn 2 units above and below this line should encompass about 95% of the points if there is no significant heterogeneity. Points outside these bounds are potential outliers contributing to heterogeneity.
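The coordinates and the ±2 band of a Galbraith plot are simple to compute without drawing anything. In the sketch below, each study gets x = 1/SE and z = T/SE, the slope is the fixed-effects pooled estimate, and a study is flagged when its vertical distance from the line exceeds 2. The function name and the four studies are hypothetical.

```python
def galbraith_points(effects, ses):
    """Return (slope, list of (x, z, flagged)) for a Galbraith plot."""
    weights = [1 / se ** 2 for se in ses]
    slope = sum(w * t for w, t in zip(weights, effects)) / sum(weights)
    points = []
    for t, se in zip(effects, ses):
        x = 1 / se                      # precision
        z = t / se                      # Z statistic
        residual = z - slope * x        # vertical distance from the line
        points.append((x, z, abs(residual) > 2))
    return slope, points

# Hypothetical studies; the last one has a much larger effect
effects = [-0.5, -0.4, -0.6, -2.0]
ses = [0.1, 0.1, 0.1, 0.25]
slope, points = galbraith_points(effects, ses)
print(f"Slope (fixed-effect estimate): {slope:.4f}")
for x, z, flagged in points:
    print(f"x = {x:5.2f}, z = {z:6.2f}, outside band: {flagged}")
```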

Meta-Regression

Meta-regression is the most flexible approach: a weighted regression of observed treatment effects against study-level predictors (with inverse variance weights). It extends the random-effects model by adding predictors. Cautions: (1) even with RCTs, meta-regression is observational; (2) multiple comparisons inflate Type I error; (3) ecological fallacy applies since predictors are study-level averages.
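A meta-regression with a single study-level predictor reduces to weighted least squares. The sketch below assumes weights of 1/(V_i + τ²), with τ² supplied by the user (0 gives a fixed-effects meta-regression); estimating the residual τ² is a separate step that is omitted here. The function name, the predictor, and the data are hypothetical.

```python
import math

def meta_regression(effects, ses, x, tau2=0.0):
    """Weighted regression of study effects on one study-level predictor.

    Returns (intercept, slope, SE of slope).
    """
    w = [1 / (se ** 2 + tau2) for se in ses]
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    t_bar = sum(wi * ti for wi, ti in zip(w, effects)) / sum_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxt = sum(wi * (xi - x_bar) * (ti - t_bar)
              for wi, xi, ti in zip(w, x, effects))
    slope = sxt / sxx
    intercept = t_bar - slope * x_bar
    # With weights equal to inverse variances, Var(slope) = 1 / sxx
    se_slope = math.sqrt(1 / sxx)
    return intercept, slope, se_slope

# Hypothetical data: effect against mean age of participants (decades)
effects = [-0.25, -0.30, -0.55, -0.60]
ses = [0.10, 0.20, 0.15, 0.30]
mean_age = [3.0, 4.0, 6.0, 7.0]
intercept, slope, se_slope = meta_regression(effects, ses, mean_age, tau2=0.01)
print(f"Intercept {intercept:.3f}, slope {slope:.3f} (SE {se_slope:.3f})")
```

Because the predictor is a study-level average, a slope found this way describes differences between studies and should not be read as an effect of age within individuals.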

Reflection

You conduct a meta-analysis of 20 studies and find I² = 82%. The forest plot shows widely scattered effect sizes. What steps would you take to investigate the causes of this high heterogeneity? Which methods from this section would you prioritise and why?


Knowledge Check: Section 3

1. In a forest plot, what does the area of the box on each study line represent?

The sample size of the study
The weight assigned to the study in the meta-analysis
The p-value of the study result

In a forest plot, the area of the box is proportional to the weight assigned to that study in the meta-analysis. Studies with more precise estimates (smaller SEs) receive larger weights and thus larger boxes.

2. A meta-analysis reports I² = 75%. How should this be interpreted?

75% of studies found a statistically significant effect
The treatment effect is 75% larger than the control
75% of the variance between studies is due to heterogeneity rather than chance

I² = 75% means that 75% of the observed variance between study results is attributable to real heterogeneity rather than sampling error. This is considered “high” heterogeneity, and an investigation of its causes is warranted.

3. What is meta-regression used for in a meta-analysis?

Evaluating whether study-level characteristics explain heterogeneity in treatment effects
Computing the pooled estimate of effect
Testing whether publication bias is present

Meta-regression is a weighted regression of observed treatment effects against study-level predictors. It is the most flexible approach for evaluating whether specific study characteristics (e.g., study design, population, intervention type) explain heterogeneity.

Section 4 of 5

Publication Bias, Influential Studies & Data Issues

⏱ Estimated reading time: 50 minutes

16.7 Publication Bias

A critical concern in meta-analysis is publication bias—studies with statistically significant or favourable results are more likely to be published than those with null or unfavourable results. Consequently, published studies may represent a biased subset of all work conducted on a topic.

Why Publication Bias Matters

If the meta-analysis only includes published studies, and published studies tend to overestimate the effect, the summary estimate will be biased away from the null. This can lead to erroneous conclusions about the effectiveness of interventions.

16.7.1 Detecting Publication Bias: The Funnel Plot

A funnel plot displays each study’s SE (or its inverse, 1/SE) plotted against its estimated effect. In the absence of publication bias, the plot should resemble an inverted funnel—symmetric around the summary estimate, with small studies (large SEs) scattered widely at the bottom and large studies (small SEs) clustered near the top.

Interpreting a Funnel Plot

Asymmetry in the funnel plot suggests publication bias. For example, if studies with large effects and large SEs are present, but studies with small or null effects and large SEs are missing (a “gap” on one side), this suggests that null-result studies were not published. However, asymmetry can also arise from other factors, so interpretation should be cautious.

16.7.2 Statistical Tests for Publication Bias

Two commonly used tests evaluate the relationship between study results and their precision:

Begg’s test: A rank correlation between effect estimates and their SEs. Simple but low power with few studies.
Egger’s test: A linear regression approach that is generally more powerful at detecting publication bias.

Neither test is very sensitive when the number of studies is small (<20), and both may produce false positives when there are large treatment effects, few events per trial, or all trials are of similar size.
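The regression behind Egger’s test can be sketched with ordinary least squares. The version assumed here regresses each study’s standardised effect (T_i/SE_i) on its precision (1/SE_i); an intercept far from zero suggests funnel plot asymmetry. The code returns the intercept, its SE, and the t statistic on k − 2 degrees of freedom, and leaves the P-value lookup to a t table or statistics package. The function name and data are hypothetical.

```python
import math

def egger_regression(effects, ses):
    """Return (intercept, slope, SE of intercept, t statistic, df)."""
    k = len(effects)
    x = [1 / se for se in ses]                       # precision
    y = [t / se for t, se in zip(effects, ses)]      # standardised effect
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = resid_ss / (k - 2)                          # residual variance
    se_intercept = math.sqrt(s2 * (1 / k + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, se_intercept, t_stat, k - 2

# Hypothetical studies
effects = [-0.3, -0.5, -0.25, -1.0]
ses = [0.1, 0.2, 0.25, 0.5]
intercept, slope, se_int, t_stat, df = egger_regression(effects, ses)
print(f"Intercept {intercept:.3f} (SE {se_int:.3f}), t = {t_stat:.2f} on {df} df")
```

Four studies are far too few for the test to be informative; the example only shows the mechanics.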

16.7.3 Trim-and-Fill Method

How Trim-and-Fill Works

The trim-and-fill method (Duval and Tweedie, 2000) is a practical approach to assessing and adjusting for publication bias:

“Trim”: Produce a funnel plot and sequentially omit the most extreme studies on one side until the plot is approximately symmetrical.
Determine the centre of the trimmed, symmetrical plot (a new estimate of the treatment effect).
“Fill”: Replace the omitted studies along with their hypothetical “counterparts” on the other side of the centre line.
Redo the meta-analysis including both the original data and the hypothetical studies.

This provides an estimate of what the treatment effect would be if all studies had been published. The difference between the original and adjusted estimates indicates the potential impact of publication bias.

16.8 Influential Studies

It is important to determine whether individual studies have a profound influence on the summary estimate. A study might be much larger than others or have an extreme effect size. To evaluate this, sequentially delete each study from the meta-analysis and observe how the summary estimate changes.

Example: Sensitivity Analysis

In a meta-analysis of 25 studies with a pooled estimate of −2.121, one study (refid 218) was identified as a potential outlier in the Galbraith plot. Removing it changed the estimate to −2.011 (a 5% reduction in magnitude) and reduced I² from 95.6% to 88.1%. While the heterogeneity remained high, the analysis demonstrated that this single study had a meaningful influence on the results.

16.9 Outcome Scales and Data Issues

Published studies vary substantially in how they present data. Several practical issues arise:

Computing SEs
Different Scales
Combining Outcomes

Reflection

You produce a funnel plot for your meta-analysis and notice asymmetry—there appear to be “missing” studies with small effects and large standard errors. What are the possible explanations for this pattern beyond publication bias? How would you investigate further?


Knowledge Check: Section 4

1. What pattern in a funnel plot suggests publication bias?

Perfect symmetry around the pooled estimate
Asymmetry, with “missing” studies on one side of the plot
All studies clustered at the bottom of the funnel

An asymmetric funnel plot, where studies with certain characteristics (e.g., small effects and large SEs) appear to be “missing,” suggests publication bias. However, asymmetry can also be caused by other factors such as heterogeneity.

2. What is the purpose of the trim-and-fill method?

To remove low-quality studies from the meta-analysis
To standardise effect sizes across different outcome scales
To estimate the treatment effect adjusted for potential publication bias

The trim-and-fill method creates a symmetrical funnel plot by adding hypothetical “missing” studies, then re-estimates the pooled effect. This provides an adjusted estimate showing what the result might be if all studies had been published.

3. Why is sensitivity analysis (sequentially removing studies) important in meta-analysis?

To determine whether the summary estimate is driven by a single influential study
To increase the statistical power of the meta-analysis
To convert between fixed- and random-effects models

Sensitivity analysis identifies studies that have a disproportionate influence on the summary estimate. If removing one study substantially changes the result, it warrants careful evaluation of that study’s quality and characteristics.

HSCI 230 — Lesson 14
