HSCI 341 — Lesson 10

Controlled
Studies

Fundamental Epidemiological Concepts and Approaches

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Design a controlled trial that produces a valid and efficient evaluation of an intervention
  • State trial objectives clearly and specify the target and source populations
  • Describe the phases of clinical research from Phase 0 through Phase IV
  • Allocate subjects to interventions using simple, stratified, cross-over, factorial, cluster, and split-plot randomisation
  • Distinguish single, double, and triple blinding and identify the bias each prevents
  • Compute and interpret sample size requirements, including the inflation factor for cluster randomised trials
  • Compare intent-to-treat and per-protocol analyses and identify when each is appropriate
  • Define direct, indirect, and total vaccine efficacy and apply the equations to a worked example
  • Apply the CONSORT 2010 reporting standards to plan and report a randomised trial

This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Reference

Glossary — Key Terms, People & Concepts

📚 Reference page — available throughout the lesson

This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments. Type in the search box to filter entries.

Key Concepts & Ideas
Randomised Controlled Trial (RCT) A prospective experimental study in which the investigator assigns the intervention to participants using a chance-based mechanism, then compares outcomes between groups. The reference standard for causal inference about treatment effects.
Clinical Equipoise Genuine uncertainty in the expert clinical community about which arm of a trial is superior — the ethical precondition for randomising participants to one of several treatments (Freedman, 1987).
Allocation Concealment Procedures (e.g., central randomisation, sequentially numbered opaque sealed envelopes) that prevent investigators or participants from knowing the upcoming allocation before consent and enrolment, protecting against selection bias at entry (Schulz & Grimes, 2002b).
Blinding (Masking) Concealment of allocation status from participants, providers, outcome assessors, and/or analysts after randomisation. Reduces performance bias and detection bias. Single-, double-, triple-, and quadruple-blind designs differ in who is masked (Schulz & Grimes, 2002c).
Intention-to-Treat (ITT) Analysis principle in which participants are analysed in the group to which they were randomised, regardless of adherence, switching, or loss to follow-up. Preserves the benefits of randomisation and provides a pragmatic effect estimate (Hernán & Hernández-Díaz, 2012).
Per-Protocol Analysis Analysis restricted to participants who adhered to the assigned intervention as specified by the protocol. Estimates an idealised efficacy but is vulnerable to selection bias.
Efficacy vs Effectiveness Efficacy: the effect of an intervention under ideal/controlled conditions (explanatory trials). Effectiveness: the effect under usual real-world conditions (pragmatic trials). The PRECIS-2 tool maps trials along this continuum across nine design domains (Loudon et al., 2015).
Placebo An inert intervention indistinguishable from the active treatment, used to control for the placebo effect, regression to the mean, and natural history of the condition.
Contamination When control-arm participants inadvertently receive the intervention (or vice versa). Dilutes the estimated effect; cluster designs reduce this risk for community-level interventions.
Compliance / Adherence The extent to which participants follow the assigned regimen. Imperfect compliance attenuates ITT estimates and motivates per-protocol or instrumental-variable analyses.
Loss to Follow-Up Participants for whom outcome data are not obtained at the end of follow-up. Differential loss between arms threatens internal validity.
Phases of Clinical Research Phase I (safety, dose-finding, small N), Phase II (preliminary efficacy and side-effect profile), Phase III (definitive efficacy vs comparator, large RCTs), Phase IV (post-marketing surveillance).
Direct, Indirect, and Total Vaccine Efficacy Direct: protection of the vaccinated individual. Indirect: protection of unvaccinated people in vaccinated communities (herd effect). Total: combined direct + indirect protection within a vaccinated community.
CONSORT 2010 Consolidated Standards of Reporting Trials — the 25-item checklist and flow diagram used to standardise transparent reporting of parallel-group RCTs (Schulz, Altman, & Moher, 2010).
Methods, Measures & Designs
Simple Randomisation Each participant has an independent probability of being allocated to each arm (analogous to a coin toss). Easy but may yield unequal arm sizes in small trials.
Block Randomisation Allocation within sequential blocks of fixed size to maintain balanced arm sizes throughout the trial. Variable block lengths help conceal sequence in unblinded settings.
Stratified Randomisation Randomisation performed separately within strata defined by prognostic factors (e.g., site, age, severity), ensuring balance on those covariates.
Cluster Randomised Trial A trial in which intact groups (clinics, schools, villages) rather than individuals are randomised. Required when the intervention is delivered at the cluster level or contamination is unavoidable; analysis must account for intracluster correlation (Campbell, Elbourne, & Altman, 2004).
Crossover Trial Each participant receives both interventions in a randomly assigned order, separated by a washout period. Each person serves as their own control. Suitable only for stable, chronic conditions with reversible outcomes.
Factorial Design Two or more interventions tested simultaneously in a single trial (e.g., 2×2). Efficient when interventions act independently; allows tests of interaction.
Non-Inferiority & Equivalence Trials Trials designed to show that a new intervention is not unacceptably worse (non-inferiority) or is essentially equivalent (equivalence) to an active comparator within a pre-specified margin.
Sample Size & Power Calculation Pre-trial calculation of the number of participants needed to detect a clinically meaningful effect of a given size with specified type-I error (alpha) and power (1 - beta).
Data Safety Monitoring Board (DSMB) An independent committee that periodically reviews accumulating trial data and recommends continuation, modification, or early stopping for benefit, harm, or futility.
Key People
Austin Bradford Hill (1897–1991) British epidemiologist who designed the 1948 MRC streptomycin trial (Medical Research Council, 1948) — the first published modern RCT — and articulated the Bradford Hill viewpoints on causation.
Ronald A. Fisher (1890–1962) Statistician who introduced randomisation as a foundational principle of experimental design in his agricultural work at Rothamsted, providing the mathematical basis for RCT inference.
Archie Cochrane (1909–1988) Scottish epidemiologist whose advocacy for evidence-based medicine and systematic appraisal of RCT evidence inspired the Cochrane Collaboration.
No matching entries. Try a different search term.
Section 1

Foundations of RCTs & Trial Setup

⏱ Estimated reading time: 18 minutes

Introduction and Overview

Lessons 5–9 worked through the observational designs — cross-sectional, cohort, case-control, ecological, and the hybrid variants. Every one of them measures exposure as it occurs naturally, and every one of them must spend effort defending against confounding. Lesson 10 turns to the experimental alternative: when the investigator controls who is exposed via random assignment, confounding is broken by the design itself rather than by statistical adjustment. The three content sections move through an RCT in the order an investigator would build one: foundations and setup (Section 1), the central design choices around allocation, outcome measurement, sample size, and blinding (Section 2), and finally the conduct, analysis, and reporting of the trial (Section 3).

Learning Objectives

  • Define a randomised controlled trial and explain why RCTs are considered the gold standard for evaluating interventions.
  • Describe the five phases of clinical research from pre-clinical work through Phase IV post-marketing surveillance.
  • Distinguish between target population, source population, and study group.
  • Develop appropriate eligibility criteria that balance internal validity with generalisability.
  • Specify an intervention with the precision needed for replication.

What Is a Randomised Controlled Trial?

A randomised controlled trial (RCT) is a planned experiment in which the investigator deliberately allocates participants to one or more interventions, then follows them to observe outcomes. Because the allocation is determined by the investigator (rather than by self-selection or by clinical circumstance), randomisation produces groups that are comparable on both measured and unmeasured factors at baseline. This is the property that gives the RCT its inferential power.

▸ INTERACTIVE STORY — STREPTOMYCIN 1948 Open full screen ↗

Walk through the first modern RCT — the trial that turned scarcity into science. Next ▶ advances scenes.

A 7-scene retelling of the 1948 MRC streptomycin trial: postwar TB epidemic, the scarce U.S. drug shipment, Bradford Hill's elegant solution (random allocation), the slot-machine randomization, dramatic 6-month outcomes, BMJ publication, and the rise of the RCT as gold standard.

Throughout this lesson we follow the chapter convention of using RCT to describe any planned experiment evaluating products or procedures outside the laboratory. The terms clinical trial (often restricted to therapeutic products in clinical settings) and field trial (carried out in general population settings) are used interchangeably with RCT. The factor under investigation is called the intervention; the effect of interest is called the outcome; people or groups participating are called subjects or participants. The lineage of the controlled comparison reaches back to James Lind's 1753 shipboard trial of citrus for scurvy, but it was the 1948 MRC streptomycin trial that introduced formal random allocation as the design's defining feature.

Why RCTs Are the Gold Standard

RCTs allow much better control of potential confounders than observational studies and reduce bias from selection and misinformation. By randomly assigning the intervention, the investigator breaks any link between exposure and unmeasured confounders — a feat that no observational design can match. As Lavori and Kelsey put it, the RCT is at present the unchallenged source of the highest standard of evidence used to guide clinical decision-making. That said, a single RCT is rarely sufficient to answer questions about complex interventions, and concerns persist that some trials lack relevance to real-world practice.

CONSORT and Trial Registration

The Consolidated Standards of Reporting Trials (CONSORT) statement was developed to improve the quality of trial reporting. It was first published in 1996, updated in 2001, and again in 2010 (Schulz, Altman, & Moher, 2010). We will use its main headings as the structural template throughout this lesson; the full 25-item checklist appears in Section 3. As of 2005, the International Committee of Medical Journal Editors required investigators to register their trials prior to participant enrolment as a precondition for publishing in member journals. The WHO operates an International Clinical Trials Registry Platform (ICTRP), although registration remains effectively voluntary in many jurisdictions.

The Phases of Clinical Research

While controlled trials are valuable for assessing a wide range of factors affecting health, one of their most common uses is to evaluate pharmacological products. Before any trial in humans, extensive pre-clinical studies are conducted in vitro (test tube or cell culture) and in vivo (animal) using wide-ranging doses to obtain preliminary efficacy, toxicity, and pharmacokinetic information.

Click any phase to see its purpose, typical sample size, and key design features.

Phase 0
First-in-Human
Tap for details
Phase I
Safety
Tap for details
Phase II
Efficacy Signal
Tap for details
Phase III
Pivotal Efficacy
Tap for details
Phase IV
Post-Marketing
Tap for details

Background, Objectives, and Trial Design

The objectives of a trial must be stated clearly and succinctly. A good objective describes the intervention, the allocation design (parallel, factorial, cross-over, etc.), and the primary outcome(s). Each trial should have a limited number of objectives plus, if needed, a small number of secondary outcomes. Increasing the number of objectives complicates the protocol, jeopardises compliance, and sacrifices statistical power.

Most trials contrast two groups — intervention and comparison — and are sometimes referred to as two-arm studies. Trials with more arms can be efficient when factorial designs are used (Section 2). The comparison group might receive a placebo, no treatment, the usual treatment, or a different dose of the same product.

Choosing the Comparator: A Consequential Decision

Placebos are ideal when there is no established alternative intervention; where possible, a placebo is preferred to “no treatment.” However, when an effective standard treatment exists, withholding it from the comparison arm may be unethical — randomisation is ethically defensible only when there is genuine uncertainty in the expert community about which arm is superior, a state Freedman (1987) called clinical equipoise. In these settings, the standard of care serves as the comparator, and the trial may take the form of a non-inferiority trial — a trial that aims to show the new intervention is no worse than the existing standard by more than a clinically unimportant margin (delta). Determining the appropriate value of delta is one of the most consequential design decisions in a non-inferiority trial.

Example 12.1: A Trial of Prostate Cancer Screening

The Andriole and colleagues (2012) trial randomised 38,340 men aged 55–74 to a screening intervention and 38,345 to usual care across 10 USA screening centres between 1993 and 2001. Men in the intervention arm were offered annual PSA tests for six years and digital rectal examination for four years. Follow-up extended through 2009 or 13 years from trial entry. The primary analysis was an intention-to-screen comparison of prostate cancer-specific mortality. The trial illustrates how a clearly stated objective (mortality reduction from screening) can be embedded in a simple two-arm parallel design that can nonetheless run for two decades.

Notice that the comparator was “usual care,” which sometimes included opportunistic screening. This pragmatic choice (Tunis, Stryer, & Clancy, 2003; Loudon et al., 2015) makes the results applicable to the real US health system but complicates interpretation of the “true” effect of organised screening.

Participants: Defining the Study Group

Three nested populations must be distinguished in any trial:

Target Population (who results should apply to) Source Population (eligible & reachable) Study Group (eligible + consenting)

Figure 12.1 — The target population is the group to which results should generalise. The source population is the subset that is eligible and reachable. The study group is the smaller subset that meets eligibility criteria and consents to participate.

Defining the Three Populations

  • Target population — the population to which you want results to apply. Stating this explicitly helps ensure that conclusions remain practical and relevant.
  • Source population — representative of the target population and consisting of eligible subjects from whom the study group is drawn. The setting and location of the source population should be described.
  • Study group — the actual subjects who fit inclusion/exclusion criteria and agree to participate. Volunteers are unavoidable; how well they represent the source and target populations must be considered when extrapolating results.

Unit of Concern: Individuals or Clusters?

An early decision is the level at which the intervention will be applied. Some interventions can only be delivered to groups (e.g., medication added to drinking water; a school-wide curriculum). When the intervention is applied at the group level and the outcome is measured at the group level, this is a group-level study. When the outcome is measured on individuals within those groups, this is a cluster randomised study, which is discussed in Section 2.

Eligibility Criteria

Document the subject's relevant past history

Adequate records should be available to verify previous health history, prior treatments, and any condition relevant to the trial. This is especially important when the intervention may interact with prior therapies.

Use clear case definitions for therapeutic trials

For trials of therapeutic agents, a precise definition of the disease being treated is essential. Subjects who do not actually have the condition will dilute any treatment effect and reduce statistical power.

For prophylactic trials, document baseline health

For trials of preventive products, healthy subjects are required, and procedures must be in place to confirm and document health status at the start of the trial.

Subjects should be able to benefit; avoid high adverse-effect risk

Restricting a trial to subjects most likely to benefit increases statistical power but may limit generalisability. Subjects at high risk for adverse effects should generally be excluded both for ethical reasons and to protect the validity of the safety assessment.

The Width of Eligibility Criteria: A Trade-off

A narrow set of eligibility criteria yields a more homogeneous response and increases statistical power, but reduces generalisability. A broad set increases the pool of potential participants and can reveal subgroup variation, but introduces greater background variability that may hurt overall power. The recommended balance: use criteria reflecting the breadth of subjects who would receive the intervention in real-world practice if it proves effective.

Specifying the Intervention

The nature of the intervention and how it is administered must be clearly defined — with enough detail that another investigator could replicate it. Interventions vary widely:

  • Medical interventions (e.g., creatine supplementation, antibiotic regimens)
  • Surgical techniques (e.g., single-layer vs. double-layer uterine closure)
  • Devices or instruments (e.g., progressive lenses for presbyopia)
  • Screening programmes (e.g., PSA testing for prostate cancer)
  • Behavioural programmes (e.g., adolescent smoking cessation)

A fixed intervention (one with no flexibility) is appropriate for assessing new products in Phase III trials. A more flexible protocol is appropriate for products that have been in use long enough that some clinical judgment has accumulated. Whenever possible, the initial treatment assignment should remain masked so clinical decisions are not influenced by knowledge of group allocation. Clear instructions are critical when participants administer some or all of the intervention themselves (such as instructions for taking medication). A monitoring system should be in place to verify that the intervention is delivered as planned.

Example 12.2: A Sequential Trial of Creatine in ALS

Groeneveld and colleagues (2003) recruited ALS patients from neuromuscular outpatient clinics in Utrecht and Amsterdam. The two interventions were creatine monohydrate and a matching placebo, designated A and B. An independent physician, masked to assignment, instructed the research pharmacist which medication to dispense. Patients were seen at 1 month, 2 months, and every 4 months thereafter. Reasons for withdrawal (serious adverse events, withdrawal of consent) were documented; patients who stopped trial medication remained in the intent-to-treat analysis. The careful specification of this masking-and-allocation chain — from the masked physician through the pharmacist to the patient — illustrates how an unambiguous intervention specification protects the integrity of the trial.

Key Takeaways

  • RCTs are the gold standard for evaluating interventions because random allocation breaks the link between exposure and unmeasured confounders.
  • Clinical research progresses through pre-clinical studies, then Phase 0 (first-in-human), Phase I (safety), Phase II (efficacy signal), Phase III (pivotal efficacy), and Phase IV (post-marketing surveillance).
  • The CONSORT 2010 statement structures both trial design and reporting; trial registration is required by major journals.
  • Three nested populations matter: the target population (who results should apply to), the source population (eligible and reachable), and the study group (eligible + consenting).
  • Eligibility criteria balance internal validity against generalisability. Recommend using criteria that mirror who would receive the intervention in real practice.
  • Interventions must be specified with the precision needed for replication, and a monitoring system should verify delivery.
Knowledge Check — Section 1

1. Phase II clinical trials are primarily designed to:

Phase 0 is first-in-human (option 0); Phase I focuses on safety in healthy volunteers (option 1); Phase II is the first efficacy evaluation in patients (option 2); Phase IV is post-marketing surveillance (option 3).

2. The target population in an RCT refers to:

The target population is the broadest population to which results should generalise. The source population is the eligible and reachable subset; the study group is the further subset that consents.

3. A non-inferiority trial differs from a standard superiority RCT in that the null hypothesis is that:

In a non-inferiority trial, the goal is to show the new intervention is not worse than the standard by more than a clinically tolerable amount. Choosing the value of delta is a critical and sometimes contentious design decision.

✦ Pass the knowledge check with 100% to continue

Section 2

Allocation, Outcomes & Sample Size

⏱ Estimated reading time: 22 minutes

Introduction and Overview

Section 1 covered the up-front decisions: what an RCT is, what phase of clinical research it falls under, what the trial design is meant to test, and who gets included. Section 2 turns to the four central design decisions that follow: how the outcome is measured, how large the sample needs to be to detect the effect, how participants are allocated to groups, and who is masked from the allocation. Each of these is a place where a poorly run RCT can lose its inferential advantage.

Learning Objectives

  • Identify primary and secondary outcomes that are clinically relevant and measurable.
  • Understand the inputs to sample size calculations and how cluster randomisation, sequential designs, and adaptive designs change them.
  • Distinguish simple, stratified, cross-over, factorial, cluster, split-plot, and multicentre allocation strategies.
  • Compare single, double, and triple blinding and identify the bias each is designed to prevent.

Measuring the Outcome

A controlled trial should be limited to one or two primary outcomes and a small number (one to three) of secondary outcomes. Having too many outcomes leads to multiple-comparisons problems (covered in Section 3) and risks inflating the false-positive rate. Composite outcomes — combining several events into a single measure — are sometimes used but remain controversial; for our purposes a limited number of primary and secondary hypotheses is preferred.

Outcome Scales

Dichotomous Outcomes

The outcome is yes/no: occurrence of disease, death, recovery. This is the most common type in medical trials. Dichotomous outcomes generally require larger sample sizes than continuous outcomes for the same effect size, because they convey less information per subject.

Results should be reported in both absolute terms (risk difference, number needed to treat) and relative terms (risk ratio).

Continuous Outcomes

The outcome is measured on a numeric scale: blood pressure, FEV1, quality-of-life score. Continuous outcomes are often more statistically efficient than dichotomous ones because each subject contributes more information.

Adjustment for baseline (pre-intervention) values can substantially improve precision when the correlation between baseline and follow-up exceeds 0.5.

Time-to-Event Outcomes

Survival analysis, time to disease occurrence, time to relapse. These designs can be more powerful than simple occurrence-or-not in a defined follow-up window because they use all of the timing information. The accuracy of the actual time of event matters, although Korn and colleagues note that non-differential errors in event timing rarely have a major impact on treatment-effect estimation.

Choosing Clinically Relevant Outcomes

Outcomes that can be assessed objectively are preferred, but sometimes subjective outcomes are unavoidable (e.g., self-reported symptoms). When the outcome is not assessed by a near-gold-standard procedure, the impact of the intervention on the true outcome may differ from the impact on the surrogate. Intermediate outcomes (e.g., antibody titres in a vaccine trial) can illuminate mechanism but should not replace clinically relevant primary endpoints. Clinically relevant outcomes typically include:

  • Diagnosis of a particular disease — requires a clear case definition
  • Mortality — objective but still requires criteria for cause and time of death
  • Severity scores — difficult to develop reliably
  • Objective clinical measures (rectal temperature, blood biomarkers)
  • Quality-of-life and other patient-reported outcome measures

Sample Size

The size of a trial is determined through formal sample-size calculations that incorporate the estimated intervention effect, Type I error rate, and Type II error rate. Power is conventionally set to 90 percent. The sample sizes do not need to be equal in both arms.

Sample Size for Cluster Randomised Trials

When subjects are randomised in clusters (families, schools, clinics), the analysis must account for within-cluster similarity, summarised by the intra-cluster correlation coefficient (rho or ICC) and the cluster size (m). The required sample size is inflated by the design effect:

Design effect = 1 + ρ(m − 1)
Inflation factor for cluster randomisation. Even small ρ produces large inflation when m is large.

Even when rho is small, large clusters cause substantial inflation. Notably, the power of a cluster trial does not increase appreciably once the number of subjects per cluster exceeds 1/rho — so adding more individuals to existing clusters yields diminishing returns. Adding more clusters is often more efficient than adding more individuals per cluster.

Sequential and Adaptive Designs

Sequential designs

A sequential design (sometimes called a monitored study) allows hypothesis tests to be conducted on a number of occasions as data accumulate. Sample size is not fixed; instead, prespecified stopping rules halt the trial when efficacy, harm, or futility becomes clear. Sequential designs can be efficient but tend to lack power on a per-subject basis. Stopping early for benefit can produce overestimates of treatment effect, although the bias is often modest. Interim analyses should not be conducted unless the trial design accommodates them.

Adaptive designs

Adaptive designs allow the trial design to change as the study progresses. The most common adaptation is modifying the second-stage sample size based on first-stage power. Other adaptations include dropping or adding treatment arms, changing the primary endpoint, or even switching from non-inferiority to superiority. Outcome-adaptive designs use accumulating evidence to assign more subjects to the better-performing intervention (e.g., “play-the-winner”); these are appropriate only when the result of the intervention is identifiable shortly after treatment. The platform trial — an extension that evaluates multiple treatments simultaneously against a shared control under a master protocol, adding and dropping arms over time — has become an important variant in oncology and infectious-disease research (Berry, Connor, & Lewis, 2015).

Other sample-size considerations

Recruitment time matters. If season influences treatment response, the recruitment window should span a full calendar year. Loss to follow-up, non-compliance, and competing risks should be anticipated and the sample size adjusted upward to preserve power.

R Analysing a two-arm RCT: t-test, chi-square, log-rank

The three classic outcome scales (continuous / dichotomous / time-to-event) each have a one-line analysis in R. Below: simulate an RCT with all three outcome types and run the standard test for each.

set.seed(341)
n <- 200
arm <- factor(rep(c("control", "treatment"), each = n/2))

## (1) Continuous outcome: blood pressure
sbp <- rnorm(n, mean = ifelse(arm == "treatment", 128, 135), sd = 10)
t.test(sbp ~ arm)

## (2) Dichotomous outcome: cure (yes/no)
cure <- rbinom(n, 1, prob = ifelse(arm == "treatment", 0.45, 0.30))
tab <- table(arm, cure)
chisq.test(tab)
prop.test(table(arm, cure))            # with 95% CI for the difference

## (3) Time-to-event outcome: relapse (with right censoring)
library(survival)
time   <- rexp(n, rate = ifelse(arm == "treatment", 1/24, 1/14))
event  <- rbinom(n, 1, 0.7)
survdiff(Surv(time, event) ~ arm)         # log-rank test

One file, three deliverables. Continuous → t.test; dichotomous → chisq.test/prop.test; time-to-event → survdiff. Pre-specifying the test statistic in your protocol — before unblinding — is a key defence against p-hacking.

R Reflect on what you just ran

Use the questions below to interpret the actual numbers from your three RCT analyses. Look at your console output before answering.

1. From t.test(sbp ~ arm), report the difference in mean SBP between treatment and control and the p-value. Did the treatment lower SBP by roughly the 7 mmHg effect that was simulated?

Model answerThe Welch t-test on simulated SBP gives a mean treatment-vs-control difference of roughly −7 mmHg (treatment lower) with p typically < 0.01. That recovers the simulated 7 mmHg true effect, and the small p-value confirms the difference is unlikely to be chance. The standard test reads off the size of the simulated effect with no adjustment needed because randomisation balanced confounders in expectation.

2. From chisq.test(tab) (cure outcome), what p-value did you get? The simulated cure probabilities were 0.45 vs. 0.30 - did your observed proportions land near those values?

Model answerchisq.test(tab) on the cure outcome gives p typically around 0.01–0.05 with observed cure proportions close to the simulated 0.45 (treatment) vs. 0.30 (control) — usually 0.42–0.48 and 0.28–0.32 depending on seed. The 15 percentage-point gap is statistically detectable at n = 120 per arm, in line with the simulated truth.

3. From survdiff(Surv(time, event) ~ arm), report the chi-square statistic and p-value. Given that the simulated mean relapse times were 14 days (control) vs. 24 days (treatment), does the log-rank result agree with what you would expect?

Model answerThe log-rank chi-square is around 8–12 with p < 0.01, agreeing with the simulated 10-day mean-time difference (24 vs. 14). The longer median relapse time in the treatment arm makes survival curves separate visibly on a Kaplan-Meier plot, and the log-rank confirms that separation is unlikely under the null of equal hazards. All three R outputs (t-test, chi-square, log-rank) recover the simulated effects in the expected direction and magnitude, as randomised trials should.
Saved.

Allocation of Study Subjects

Once enrolled, subjects must be allocated to interventions. Formal randomisation is the strongest method — without it, bias is very likely to distort findings (Schulz & Grimes, 2002a). Random allocation should occur as close to the start of the intervention as possible to minimise withdrawals between assignment and treatment, and concealment of the upcoming allocation from those enrolling participants is essential to prevent selection bias (Schulz & Grimes, 2002b).

Alternatives to Randomisation

Historical Controls and Systematic Assignment

Historical control trials compare outcomes after an intervention with outcomes from a pre-intervention period. For validity, four conditions must hold: predictable outcome, complete and accurate databases, constant diagnostic criteria, and no environmental changes for subjects. Rarely are all four met. Blinding is impossible in this design.

Systematic assignment (every other subject) can be reasonable in field settings (e.g., a vaccine clinic) and is often as effective as randomisation when outcome assessment is blinded. Randomise the very first subject's assignment to avoid predictability. Never give the intervention to the first half of subjects and the comparison to the second half — that introduces secular confounding.

R Generate the allocation list before the trial starts

Pre-generating — and version-controlling — the allocation sequence is one of the cleanest defences against allocation-concealment failures. Below: simple, stratified, and permuted-block randomisations.

set.seed(202509)                       # commit this seed before enrolment opens
N <- 120; arms <- c("Drug", "Placebo")

## (1) Simple randomisation
simple <- sample(arms, N, replace = TRUE)
table(simple)

## (2) Stratified by site (3 sites, 40 each)
strat <- unlist(lapply(c("Site_A", "Site_B", "Site_C"), function(s)
  paste(s, sample(rep(arms, 20)), sep = ":")))

## (3) Permuted-block randomisation (block size 4)
block <- unlist(replicate(N/4, sample(rep(arms, 2))))
head(block, 12)                             # every block of 4 has 2 of each arm

## Save the list - this is your audit trail
# write.csv(data.frame(id = 1:N, arm = block), "allocation_list_2025-09-01.csv",
#           row.names = FALSE)

Why pre-generate? If the allocation is generated only when each patient enrols, an unblinded coordinator could (consciously or not) game who enrols when. A locked, pre-generated list removes the temptation entirely. The block design also prevents prolonged imbalance during early enrolment.

R Reflect on what you just ran

Use the questions below to interpret the allocation lists you just generated. Compare the output of table(simple) and head(block, 12) before answering.

1. Look at table(simple). With simple randomisation, did you get exactly 60 Drug and 60 Placebo? Why is some imbalance expected under simple randomisation but not under permuted-block randomisation?

Model answerWith simple randomisation at n = 120 you typically get a few participants of imbalance — e.g., 63 Drug and 57 Placebo — not exactly 60/60. Simple randomisation flips an unbiased coin for each person, so by chance the counts drift slightly; the expected absolute imbalance scales with √n. Permuted-block randomisation forces balance within each block (every 4 participants give 2 Drug and 2 Placebo), so the running counts can never drift more than the block size minus one. Block randomisation is preferred when interim analyses or stratified subgroup checks are planned.

2. From head(block, 12) (the first three blocks of size 4), confirm by eye that every block of 4 contains exactly 2 Drug and 2 Placebo. Why is this an attractive feature if a safety committee plans an interim analysis at n = 60?

Model answerVisual inspection of head(block, 12) confirms two D's and two P's in each block (e.g., DPDP, PDDP, DPPD). At an n=60 interim analysis the data-safety committee will see exactly 30 Drug and 30 Placebo participants — balanced exposure makes the interim treatment-effect estimate efficient and unbiased by allocation imbalance. Without blocked randomisation, the interim might see 35 vs. 25 and produce a noisier effect estimate, complicating stopping decisions.

3. Why MUST set.seed(202509) be committed alongside the analysis script before enrolment opens? What happens to reproducibility (and to your audit trail) if it is forgotten?

Model answerCommitting set.seed(202509) before enrolment opens guarantees that anyone with the script and the data can re-run the analysis and get the same allocation, statistical tests, and CIs. Without the seed pre-committed, the allocation could be reproduced by trial-and-error to favour any particular outcome, destroying the audit trail and inviting accusations of post hoc data manipulation. Regulators and journals increasingly require pre-registration and seed commitment because it is the cheapest, strongest defence against accidental or motivated post hoc shifting of results.
Saved.

The Family of Random Allocation Designs

Random allocation does not mean haphazard allocation. A formal process — computer-based random number generator, sealed-envelope randomisation, or even a coin toss — must be used (Schulz & Grimes, 2002a). Click any design to see its features and an example.

Simple
Randomisation
Tap for details
Stratified
Randomisation
Tap for details
Cross-Over
Design
Tap for details
Factorial
Design
Tap for details
Cluster
Randomisation
Tap for details
Split-Plot
Design
Tap for details

Cluster Randomisation: Why It Is Less Efficient

Cluster randomised trials are statistically less efficient than individually randomised trials of the same total sample size (Campbell, Elbourne, & Altman, 2004). The clustering of subjects within groups must be accounted for in the analysis. The best follow-up scenario is to monitor all individuals for the duration of the study; if not, following a randomly selected cohort is the next most powerful approach. In some settings, repeated cross-sectional samples within each cluster must be used. Matched-cluster designs may be appropriate when the number of clusters is small, although “breaking the matches” can sometimes improve statistical efficiency.

Example 12.3: A Cluster RCT of Adolescent Smoking Cessation

Dalum and colleagues (2012) randomised 22 continuation schools in Denmark by coin toss to deliver a smoking cessation intervention or to act as control. The randomisation was blocked so that each county contained both intervention and control schools, balanced across school types (commercial vs. social-and-health). Smoking status was self-reported in surveys at baseline, week 11/2005 (short-term), and week 11/2006 (long-term). Analyses were intent-to-treat, with school as a random factor in logistic regression.

Why was cluster randomisation appropriate here? Because the intervention was delivered at the school level, individual randomisation would have introduced contamination — intervention and control students in the same hallway would influence each other.

Example 12.4: A 2×2×2 Factorial Caesarean-Section Trial

The CAESAR trial (CAESAR study collaborative group, 2010) randomised women aged >15 undergoing their first Caesarean section to three independent factors: single- vs. double-layer uterine closure, closure vs. non-closure of the peritoneum, and liberal vs. restricted use of a subsheath drain. Telephone randomisation with a minimisation algorithm balanced participating centre, labour status, and pregnancy multiplicity. The primary outcome was maternal infectious morbidity (any of: antibiotic use for febrile morbidity, endometritis, or treated wound infection). The 3,500 women required to detect a 12% to 9% reduction with 80% power illustrate how factorial designs efficiently address multiple research questions in one trial.

Example 12.5: A Split-Plot Shoulder-Pain Trial

Watson and colleagues (2008) conducted a pragmatic split-plot trial across UK general practices. Physicians in 91 practices (the whole plot) were randomised to additional training in shoulder injection or to no additional training. Within the practices, 215 patients with acute shoulder pain were then randomised to receive either a corticosteroid or a lignocaine injection (the split plot). The main outcome was the British Shoulder Disability Questionnaire score. Notice how this design lets the trial answer two questions simultaneously: does training help, and does corticosteroid outperform lignocaine?

Example 12.6: A Multicentre Cross-Over Trial of Progressive Lenses

Boutron and colleagues (2008) compared two generations of progressive lenses for presbyopia at five primary-care optical dispensaries. 127 patients aged 43–60 were randomised to wear one lens for four weeks, then cross over to the other for four weeks, blinded to the lens sequence. Patients and the statistical analyst were both blinded; all equipment was assembled in one laboratory to ensure consistency. The primary outcome was patient preference at week 8. The cross-over design is appropriate here because lens preference is reversible and short-acting — conditions are stable, and switching has no carry-over.

Multicentre Trials

If an adequate sample is not available at one site, a multicentre trial is required. Within-centre and between-centre variances must be accounted for in design and analysis. Multicentre trials enhance generalisability (because of the broader geographic and clinical reach) and create opportunities to detect interaction effects across sites. For statistical efficiency, the number of subjects per centre should be approximately equal. Example 12.6 was conducted across five centres.

Masking (Blinding)

Blinding (or masking) refers both to the methodological principle of withholding information from individuals to prevent bias and to the specific procedures used to do so (Schulz & Grimes, 2002c). Terms can be used inconsistently in the literature, so the specific masking mechanisms always need to be described, and ideally pilot-tested in larger trials.

Triple-blind: + analysts blinded Double-blind: + clinicians/outcome assessors blinded Single-blind: participant unaware of allocation reduces response bias, placebo effect

Figure 12.2 — The three nested layers of blinding. Each additional layer is added on top of the prior layers, prevents an additional source of bias, and is harder to achieve in practice.

What Each Level of Blinding Prevents

Single-Blind: Participant Unaware

In a single-blind study the participant does not know which intervention they are receiving. This helps:

  • Reduce response bias (subjects reporting symptoms differently based on what they think they are getting)
  • Prevent the placebo effect
  • Equalise differential attrition and non-compliance
  • Reduce co-intervention bias and follow-up bias

Double-Blind: Participant + Treatment/Outcome Personnel Unaware

In a double-blind study, both the participants and the people administering the intervention or assessing the outcome are unaware of allocation. This adds protection against:

  • Patient–provider interaction effects on the placebo response
  • Differential attrition, non-compliance, or co-intervention driven by clinician knowledge
  • Selective decisions and referrals based on treatment knowledge
  • Observer bias and diagnostic bias when assessing outcomes

Triple-Blind: + Data Analysts Unaware

In a triple-blind study, the people analysing the data are also unaware of group identity (often coded as A vs. B). This is designed to ensure unbiased analytic decisions — choice of subgroups, handling of outliers, modelling decisions — that might otherwise be subtly influenced by knowledge of which group is the “new” treatment.

The success of blinding should be evaluated rather than assumed. Methods for assessing blinding success have been published.

The Role of Placebos

A placebo is a product indistinguishable from the active intervention, administered to the comparison group. In drug trials, the placebo is often the vehicle without the active ingredient. Even apparently inert placebos can have positive or negative effects (e.g., a placebo vaccine without antigen can still induce some immunity through adjuvant). These issues should be addressed before the trial begins. In some situations, blinding cannot be achieved with placebos alone — but masking should be implemented wherever feasible.

The Take-Away on Blinding

Each additional layer of blinding prevents a different bias, but each is harder to achieve in practice. Whenever feasible, design the trial so that as many sources of error as possible are prevented from the start — this also reduces the impact of any differential errors that do arise. The goal is not maximum blinding for its own sake, but rigorous masking matched to the biases that most threaten the specific trial.

Key Takeaways

  • Limit a trial to 1–2 primary outcomes and 1–3 secondary outcomes; report effects in both absolute and relative terms when outcomes are dichotomous.
  • Sample size depends on the expected effect, Type I and Type II error rates, and the outcome scale. Cluster randomisation requires inflation by the design effect 1 + ρ(m − 1).
  • Sequential designs allow stopping for efficacy, harm, or futility but tend to lack power per subject; adaptive designs offer flexibility but require careful pre-specification.
  • Random allocation is the strongest assignment method. The major design types are simple, stratified, cross-over, factorial, cluster, and split-plot, plus multicentre extensions.
  • Single-blind protects against participant response bias; double-blind adds clinician and outcome-assessor blinding; triple-blind adds analyst blinding. Each layer prevents a different category of bias.
Knowledge Check — Section 2

1. In a cluster randomised trial with intra-cluster correlation ρ = 0.05 and average cluster size m = 41, the design effect (sample-size inflation factor) is:

Design effect = 1 + ρ(m − 1) = 1 + 0.05 × 40 = 1 + 2 = 3.0. Even a small intra-cluster correlation produces substantial inflation when cluster size is modest.

2. A cross-over design is appropriate when:

Cross-over designs require a stable condition and a short-acting intervention so that each subject can experience both treatments without carry-over. Mortality and irreversible disease progression rule out cross-over by definition.

3. The chief additional bias prevented by triple-blinding (over and above double-blinding) is:

Single-blind addresses participant response bias; double-blind adds clinician and outcome-assessor bias. Triple-blind extends masking to data analysts so that analytic choices (subgroup definitions, outlier handling, model specification) are not subtly influenced by knowledge of which group is which.

✦ Pass the knowledge check with 100% to continue

Section 3

Conduct, Analysis & Special Topics

⏱ Estimated reading time: 22 minutes

Introduction and Overview

Sections 1 and 2 settled the design. Section 3 walks through what happens once the trial is running: how to track participants through follow-up, how to analyse the data (including the intention-to-treat principle and the special case of vaccine efficacy), and how to report the trial honestly via the CONSORT checklist. The reporting framework here connects directly back to the integrity material in HSCI 230 Lesson 3.

Learning Objectives

  • Implement effective follow-up and compliance monitoring during a trial.
  • Distinguish intent-to-treat from per-protocol analysis and choose the appropriate approach.
  • Identify the sources of multiple comparisons in RCTs and apply the Bonferroni adjustment.
  • Compute and interpret direct, indirect, and total vaccine efficacy.
  • Use the CONSORT 2010 checklist to plan and report a randomised trial.

Follow-Up and Compliance

One of the most important practical issues is ensuring that all groups are followed rigorously and equally. The follow-up period must be long enough to capture all outcomes of interest. Some loss is inevitable through drop-out or non-compliance; for trials with long follow-up, the status of all subjects should be ascertained at regular intervals. The CONSORT statement strongly recommends a flow diagram showing participant numbers at allocation, intended intervention, protocol completion, and outcome assessment.

Strategies to Minimise Loss and Maximise Compliance

Maintain regular communication with all participants

Frequent contact — reminder messages, newsletters, study updates — reduces attrition. Incentives may be provided, including study-related information that participants would not otherwise have, or public recognition of their contribution (subject to confidentiality).

Capture data on dropouts whenever possible

For participants who drop out, information may still be available through routine databases if the participant consents. Documenting reasons for withdrawal allows comparison of withdrawn and remaining subjects, helping characterise potential bias.

Verify compliance directly and indirectly

Compliance can be assessed through interviews, biological samples (drug or metabolite levels), or indirect indicators such as collecting empty pill containers, vials, and packaging. Compliance data are essential for interpreting the difference between intent-to-treat and per-protocol results.

Statistical Methods and Analysis

Outcomes might be analysed on a continuous scale, as categorical (often dichotomous) data, or as time-to-event measurements. Time-to-event analyses can have greater power than simple occurrence-or-not in a defined window. Whatever the analysis, results should report both the effect size and its precision (typically a 95 percent confidence interval), and dichotomous outcomes should appear in both absolute (risk difference) and relative (risk ratio) terms.

Intent-to-Treat vs. Per-Protocol Analysis

This distinction is one of the most important in RCT analysis. Click the tabs to compare the two approaches.

Intent-to-Treat (ITT) Analysis

All subjects assigned to a specific intervention are analysed in that group, regardless of whether they completed the study or complied with the protocol. ITT yields a conservative estimate of the intervention effect — if anything, it will under-estimate the maximum potential benefit by including non-compliers and dropouts in the “treated” arm.

Why use it: ITT estimates the expected response when the intervention is rolled out in similar populations — in real-world use, some non-compliance and loss to follow-up are inevitable. ITT preserves the benefits of randomisation and is the recommended primary analysis.

Per-Protocol (PP) Analysis

Only subjects who complied with and completed the study as specified in the protocol are analysed. PP yields an estimate of effect under ideal compliance.

Why caution is needed: First, non-compliance is rarely a random event; non-compliers are often systematically different from compliers, so the PP estimate is likely biased. Second, future use of the intervention will involve some non-compliance, so an “assuming 100% compliance” estimate is unrealistic. PP should be reported alongside ITT but should not replace it.

Stating Numbers and Compliance Is Essential

Whichever analysis is primary, the number of subjects in each group, and whether or not they complied, must be reported. If there are considerable losses or major adherence problems, Hernán and Hernández-Díaz (2012) suggest using inverse probability weighting to reduce potential bias.

Both ITT and PP analyses are typically reported. ITT is the primary analysis for efficacy claims; PP is a secondary or sensitivity analysis to show the effect among those who actually adhered.

Baseline Comparison and Covariate Adjustment

Analysis usually starts with a baseline comparison of group characteristics as a check on randomisation. This should not be a statistical significance test but rather an assessment of comparability. Differences between groups, even if not statistically significant, should be noted and may justify covariate adjustment.

For dichotomous outcomes, adjustment for covariates is recommended — ideally for strong predictors identified a priori; failing that, for variables predictive of the outcome in the trial data. Adjustment can substantially increase power or reduce required sample size. Adjustment for non-confounders does little harm provided they are not intervening variables (mediators).

For continuous outcomes, controlling for baseline (pre-intervention) values can substantially improve precision. This can be done either by analysing the change score (post minus pre) or by including baseline as a covariate. Either approach gains power, particularly when the baseline-to-follow-up correlation exceeds 0.5.

The Multiple Comparisons Problem

Multiple comparisons in RCTs arise from three sources: examining multiple outcomes, examining multiple subgroups, and performing periodic interim analyses during the trial. The problem is that the experiment-wise (family-wise) error rate is much larger than the per-test rate — making spurious “significant” findings increasingly likely as the number of tests grows.

Bonferroni-adjusted α = αnominal / k
For k comparisons at experiment-wise rate αnominal (typically 0.05). With 5 comparisons, each test must reach p < 0.01 to claim significance.

The Bonferroni adjustment is the simplest fix: divide the desired experiment-wise alpha by the number of comparisons. It is conservative; less conservative procedures (Holm, Hochberg, Benjamini-Hochberg) are available in standard texts.

The Subgroup Analysis Trap

It is tempting to evaluate many subgroups to see whether the intervention works in any of them. Don't. Only subgroup analyses planned a priori should be carried out; data-driven subgroup analyses generate spurious associations at alarming rates. The recommended way to test whether an intervention's effect varies by subgroup is a single overall interaction test. Note that detecting interactions reliably typically requires a sample size at least four times larger than detecting the overall main effect; effect sizes for interactions need to be roughly twice the magnitude of the main effect to be detected with similar power.

Sequential Analyses: When to Stop

Sequential design studies plan periodic analyses throughout the trial to allow early stopping for one of three reasons:

  • Clear (and statistically significant) evidence of the superiority of one intervention over the other
  • Convincing evidence of harm from the intervention (regardless of statistical significance)
  • Little likelihood that the trial will produce evidence of an effect even if completed (futility)

Interim analyses must be pre-specified in the trial design; ad-hoc “peeking” inflates the false-positive rate and is regarded as a serious methodological breach.

Vaccine Trials: Direct, Indirect, and Total Efficacy

Standard RCT designs need modification when the intervention is a prophylactic against a communicable organism. The reason: an effective vaccine has effects on the vaccinated and on the unvaccinated, because vaccination reduces transmission. This means study subjects are not independent — an effect Hudgens and Halloran (2008) call interference.

Why Standard Direct Efficacy Estimates Are Misleading

The standard individually randomised, placebo-controlled trial of a vaccine yields an estimate of vaccine efficacy that is confounded by the proportion of the study population vaccinated. Two identical trials in populations with different transmission levels — or different vaccination coverage levels — will report different vaccine efficacy estimates even if the underlying biological protection is the same.

Three Measures of Vaccine Efficacy

To get a fuller picture, epidemiologists distinguish three measures, each requiring information from at least two populations with different vaccination coverage. Use Iv for the incidence in the vaccinated and Inv for the incidence in the unvaccinated; subscript A denotes the higher-coverage population, B the lower-coverage population.

VEd = (Inv − Iv) / Inv
Equation 11.1: Direct vaccine efficacy — computed within a single population by comparing vaccinated and unvaccinated individuals.
VEind = (InvB − InvA) / InvB
Equation 11.2: Indirect vaccine efficacy — the “herd immunity” effect, comparing the unvaccinated in the higher-coverage area with the unvaccinated in the lower-coverage area.
VEtot = (IB − IA) / IB
Equation 11.3: Total vaccine effect — comparing the overall incidence in the higher-coverage population with the lower-coverage population.

Example 12.7: Cholera Vaccine Effects in Bangladesh

Ali and colleagues (2005) and Hudgens and Halloran (2008) analysed an individually randomised, placebo-controlled trial of killed oral cholera vaccines in residential areas (baris) in Bangladesh. Two groups were compared: Group A (more than 50% coverage) and Group B (less than 28% coverage). The first-year risks per 1,000 were: RnvB = 7.01, RvB = 2.66, RnvA = 1.47, RvA = 1.27, RB = 4.13, RA = 1.34.

Direct effect in the high-coverage group A: VEd = (1.47 − 1.27) / 1.47 = 0.14. Looking at A alone you might conclude the vaccine has little effect.

Direct effect in the low-coverage group B: VEd = (7.01 − 2.66) / 7.01 = 0.62 — the vaccine reduces risk by 62% in the unprotected population.

Indirect effect (in the unvaccinated): (7.01 − 1.47) / 7.01 = 0.79 — herd immunity reduced the unvaccinated risk by 79%.

Total relative effect: (7.01 − 1.27) / 7.01 = 0.82. Overall effect: (4.13 − 1.34) / 4.13 = 0.68.

The lesson is striking: limiting analysis to the high-coverage population alone (Group A) would have suggested the vaccine barely worked. Looking across populations with different coverage reveals the dominant role of indirect effects.

Designing Vaccine Trials to Estimate All Three Measures

To estimate VEd, VEind, and VEtot, the design must include at least two comparable populations with different vaccination coverage. The recommended approach: in population A, randomise individuals to vaccine or placebo. In population B, leave everyone unvaccinated (or assign a lower coverage). The two populations must be (a) comparable in characteristics that affect the outcome, especially baseline transmission level, and (b) physically separated so subjects do not intermix.

An alternative when finding two similar populations is impractical is to exploit natural clustering — for example, randomly assign vaccination to half the children in a geographic area, then study spread within schools, recording the proportion of children at each school who were vaccinated. Riggs and Koopman (2005) and Longini and colleagues (1998, 2002) describe how to design and analyse such trials.

Reporting: The CONSORT 2010 Checklist

The CONSORT 2010 statement is the dominant reporting guideline for parallel-group randomised trials. Its 25 items align with the design and conduct topics covered throughout this lesson. The table below highlights items most relevant to the topics we have studied.

Section / ItemWhat to Report
1a TitleIdentify the study as a randomised trial in the title (improves “searchability”).
2a–b Background & objectivesScientific rationale and explicit hypotheses or objectives.
3a Trial designType of design (parallel, factorial, cross-over, cluster, split-plot) and allocation ratio.
4a–b ParticipantsEligibility criteria; settings and locations of data collection.
5 InterventionsSufficient detail for replication, including how and when administered.
6a–b OutcomesPre-specified primary and secondary outcomes and how they were assessed.
7a–b Sample sizeHow sample size was determined; explanation of any interim analyses or stopping guidelines.
8–10 RandomisationHow the allocation sequence was generated, what type of restriction (blocking) was used, the mechanism for implementing allocation, and who did what at enrolment.
11a–b BlindingWho was blinded after assignment (participants, providers, outcome assessors); description of intervention similarity if relevant.
12a–b Statistical methodsMethods for primary and secondary outcomes; methods for subgroup and adjusted analyses.
13a–b Participant flowFor each group, numbers randomly assigned, receiving intended treatment, and analysed; losses and exclusions with reasons. A flow diagram is strongly recommended.
14a–b RecruitmentDates of recruitment and follow-up; reasons the trial ended or was stopped.
15 Baseline dataTable of baseline demographic and clinical characteristics for each group.
16 Numbers analysedFor each group, denominators in each analysis and whether by originally assigned groups.
17a–b Outcomes & estimationEffect size and precision (e.g., 95% CI); for binary outcomes, both absolute and relative effect sizes.
18 Ancillary analysesOther analyses (subgroup, adjusted), distinguishing pre-specified from exploratory.
19 HarmsAll important harms or unintended effects in each group.
20–22 DiscussionLimitations (bias, imprecision, multiplicity); generalisability; interpretation balanced against benefits and harms.
23–25 OtherTrial registration number; where the full protocol can be accessed; sources of funding and role of funders.

CONSORT extensions exist for cluster randomised trials, non-inferiority and equivalence trials, non-pharmacological treatments, herbal interventions, and pragmatic trials. Up-to-date references are available at www.consort-statement.org.

Why Reporting Quality Matters

Poor reporting prevents readers from judging the reliability and validity of trial findings, prevents extraction for systematic reviews, and is associated with biased estimates of treatment effects. Inadequate reporting is therefore a methodological problem, not just a stylistic one. Following CONSORT during planning — not just at write-up — helps ensure that the methodological choices required for transparent reporting are actually made.

Reflection: Designing Your Own RCT

Imagine you have been funded to design a randomised controlled trial of a new mobile-app-based behavioural intervention to reduce daily smoking among young adults aged 18–25 living in rural communities. Briefly outline how you would address: (1) target population vs. source population vs. study group; (2) choice of allocation strategy (individual? cluster? factorial?); (3) choice of comparator (no app? sham app? existing standard care?); (4) how you would handle the analysis if compliance with the app turned out to be poor. Identify at least one trade-off you would face and explain how you would resolve it.

Model answer(1) Populations: target = all 18–25-year-old smokers in rural Canadian communities; source = those with smartphone access in the 4–6 sampled rural regions; study group = consenting volunteers from the source who pass screening (smoking ≥ 5/day, no current treatment). (2) Allocation: individual randomisation 1:1 stratified by community to balance regional effects, with permuted blocks of 4 within each stratum. Cluster randomisation isn't warranted because the intervention is delivered to the individual; factorial would dilute power unless a second clearly orthogonal arm is needed. (3) Comparator: sham app (matched look-and-feel, generic health content) rather than no-app control; this controls for app-engagement and Hawthorne effects. (4) Outcome: 7-day point-prevalence abstinence at 6 months, biochemically validated (cotinine in saliva or eCO breath test); secondary measure of cigarettes/day at 1, 3, 6 months. (5) Sample size: calculate from expected 25% abstinence in intervention vs. 12% in control, α=0.05, power=0.80, accounting for ~30% loss-to-follow-up. (6) Threats: selection (rural digital access varies), differential attrition (relapsers may drop out), contamination (intervention participants share app content with controls in same community), Hawthorne (knowing they're in a trial changes behaviour).

Minimum 20 characters required.

✓ Reflection saved

Key Takeaways

  • Rigorous and equal follow-up of all groups is essential. Document compliance and reasons for withdrawal; a CONSORT flow diagram is strongly recommended.
  • Intent-to-treat analysis preserves randomisation and gives a conservative, real-world estimate; per-protocol analysis estimates the effect under ideal compliance and should be reported alongside ITT, not in place of it.
  • Multiple comparisons inflate the experiment-wise error rate. The Bonferroni adjustment (α / k) is the simplest correction. Subgroup analyses should be pre-specified and tested via interaction terms; data-driven subgroup analyses generate spurious findings.
  • For prophylactic vaccines, three measures matter: direct (VEd), indirect or herd-immunity (VEind), and total (VEtot) efficacy. Estimating all three requires data from at least two populations with different coverage.
  • The CONSORT 2010 checklist of 25 items structures both the planning and the reporting of randomised trials. Following CONSORT during design — not just at write-up — is what makes transparent reporting achievable.
Knowledge Check — Section 3

1. The major reason that intent-to-treat analysis is preferred as the primary analysis is that:

ITT analyses include all randomised subjects in their assigned group regardless of compliance, preserving the comparability that randomisation generated. Because real-world deployment will involve some non-compliance, ITT gives a realistic estimate of the effectiveness clinicians and policymakers can expect.

2. In a vaccine trial, the indirect vaccine efficacy (VEind) is estimated by:

VEind = (InvB − InvA) / InvB. This is the herd-immunity effect — how much lower the disease risk is among unvaccinated people in the high-coverage area than among unvaccinated people in the low-coverage area.

3. With four pre-specified primary outcome comparisons and a desired family-wise error rate of 0.05, the Bonferroni-adjusted significance threshold for each comparison is:

Bonferroni-adjusted alpha = 0.05 / 4 = 0.0125. Each comparison must reach p < 0.0125 to be declared significant at the experiment-wise 0.05 level.

✦ Complete the reflection and pass the knowledge check with 100% to continue

Section 4

Knowledge Check & Final Assessment

⏱ Estimated time: 15 minutes

Bringing It All Together

This lesson moved from observational into experimental epidemiology. You worked through the rationale for randomised controlled trials, the five phases of clinical research, the practical mechanics of allocation and masking, and the analytic and reporting standards (intent-to-treat, CONSORT 2010) that govern modern trial conduct. The shift from observational to experimental is conceptually small — the investigator now assigns the exposure — but methodologically transformative, because randomisation is what gives RCTs their causal authority.

Together with the observational designs covered in earlier lessons and the hybrid designs covered in Lesson 9, the controlled trial completes your toolkit of analytic study designs in epidemiology. The takeaways below summarise the design and reporting decisions that distinguish a credible trial from one that merely uses the word "randomised."

Key Takeaways from Lesson 10

  • Randomisation is what makes the RCT the gold standard for causal inference: it balances measured and unmeasured confounders in expectation.
  • Trial design begins with explicit target, source, and study populations, eligibility criteria, and a precisely specified intervention — vague specifications produce unreplicable results.
  • Sample-size calculations must inflate for clustering by 1 + ρ(m − 1); sequential and adaptive designs change the inferential rules and require pre-specification.
  • Allocation strategies (simple, stratified, cross-over, factorial, cluster, split-plot, multicentre) and masking (single, double, triple) each prevent specific biases.
  • Intent-to-treat is the primary analysis because it preserves randomisation; per-protocol analyses are secondary and informative about efficacy under perfect compliance.
  • Vaccine trials distinguish direct (VEd), indirect (VEind), and total (VEtot) efficacy; the CONSORT 2010 25-item checklist structures both planning and reporting of trials.

Final Reflection

Imagine you are leading a multidisciplinary team designing a randomised controlled trial of a community-level intervention to reduce opioid overdose deaths in a rural region. Drawing on what you have learned across this lesson, identify three design decisions you would face that involve trade-offs (for example: individual vs. cluster randomisation; placebo vs. active control; broad vs. narrow eligibility criteria). For each decision, briefly describe the trade-off, your recommended choice, and one specific limitation of that choice that you would acknowledge in your discussion of the trial.

Model answerThree trade-offs for a community-level opioid-overdose RCT. (1) Individual vs. cluster randomisation: community-level interventions (e.g., distribution of naloxone, prescribing-norms changes) naturally cluster — cluster randomisation respects the intervention scale but requires far larger n to overcome the design effect. The trade-off is statistical power vs. ecological validity; opt for cluster randomisation here because the intervention has community-level mechanisms that individual randomisation can't capture. (2) Placebo vs. active control: in opioid-overdose prevention there is no placebo equivalent; the appropriate active control is current standard of care (e.g., status-quo naloxone distribution). Trade-off: pure null comparator gives the largest effect estimate but is unethical when effective interventions exist; active control gives smaller but practically meaningful effects. (3) Broad vs. narrow eligibility: broad inclusion (anyone in the community) maximises generalisability but dilutes the effect among lower-risk participants; narrow inclusion (high-risk users) maximises statistical power but limits external validity. Combine with pre-specified subgroup analyses to address both.

Minimum 20 characters required.

✓ Reflection saved

Final Knowledge Assessment

Complete the following 12-question assessment. A score of 100% is required to complete the lesson. You may retake the assessment as many times as needed.

Final Assessment — 12 Questions

1. The defining feature that distinguishes a randomised controlled trial from an observational analytic study is that:

The defining feature of an RCT is investigator-controlled allocation of the intervention. Random assignment is what produces baseline comparability on both measured and unmeasured factors — the inferential strength that observational designs cannot match.

2. Phase III clinical trials are typically:

Phase III trials enrol 300 to 3,000+ patients with the target condition, typically across multiple sites. They are the pivotal efficacy trials submitted for regulatory approval.

3. A trial's eligibility criteria that are very narrow will most likely:

Narrow eligibility criteria produce a more homogeneous study group, which reduces background variability and increases power. The trade-off is reduced generalisability: the results apply less clearly to the broader population in real-world practice.

4. In a cluster randomised trial of a school-based health programme with intra-cluster correlation ρ = 0.02 and an average cluster size of 51 students per school, the sample-size inflation factor is:

Design effect = 1 + ρ(m − 1) = 1 + 0.02 × 50 = 1 + 1 = 2.0. The trial requires twice the sample size of an individually randomised trial.

5. A factorial design is most appropriate when:

Factorial designs assign all combinations of two or more interventions, allowing each to be evaluated and any interactions to be tested in a single trial. Cluster randomisation refers to randomisation at the group level (option 2); cross-over fits stable conditions with short-lived interventions (option 3).

6. Triple-blinding adds which additional layer of masking compared with double-blinding?

In triple-blinded studies, the people analysing the data are also masked to allocation, preventing analytic decisions (subgroup definitions, outlier handling, model specification) from being subtly influenced by knowledge of which group is the new treatment.

7. The recommended primary analysis for a randomised controlled trial is:

ITT preserves the benefits of randomisation by analysing subjects in their originally assigned groups regardless of compliance. It gives a conservative, real-world estimate of effectiveness and is the recommended primary analysis. Per-protocol may be reported as a secondary analysis.

8. With seven pre-specified primary outcome tests and a desired family-wise error rate of 0.05, the Bonferroni-adjusted alpha for each test is approximately:

0.05 / 7 ≈ 0.0071. Each individual test must reach p < 0.0071 to claim significance at the experiment-wise 0.05 level. Bonferroni is conservative; less conservative alternatives (Holm, Hochberg) are available in standard texts.

9. Examining a wide range of post-hoc subgroups for differential treatment effects is risky because:

Trials are usually powered for the overall main effect, not subgroup-specific effects, so most subgroup tests are underpowered for true effects but overpowered for chance findings. The recommended approach is a single pre-specified interaction test rather than a battery of subgroup-specific significance tests.

10. In a vaccine trial of two populations (Group A with high coverage, Group B with low coverage), which of the following is the formula for the indirect vaccine efficacy?

VEind = (InvB − InvA) / InvB. Option 0 is direct VE within a population; option 1 is total VE across populations. Indirect VE compares unvaccinated subjects across populations to capture the herd-immunity effect.

11. The Bangladesh cholera vaccine example (Example 12.7) illustrated that:

In the high-coverage group A, direct VE was only 0.14 — suggesting the vaccine barely worked. In the low-coverage group B, direct VE was 0.62, and the indirect (herd-immunity) effect among unvaccinated people was 0.79. The lesson is that confining analysis to a single population obscures the dominant indirect effects.

12. The CONSORT 2010 statement is:

CONSORT (Consolidated Standards of Reporting Trials) is a reporting guideline first published in 1996, updated in 2001 and 2010. The 25-item checklist covers all aspects of trial design and reporting and has formal extensions for special trial types. It is endorsed by major journals but is not a regulatory mandate.

✦ Complete the final reflection above before submitting