Surveillance & Outbreak Investigation
Fundamental Epidemiological Concepts and Approaches
Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University
Learning objectives for this lesson:
- Distinguish passive, active, sentinel, and syndromic surveillance and identify Canadian examples of each
- Trace the flow of a notifiable disease report from the clinic to the Public Health Agency of Canada
- Navigate the major federal and BC surveillance products (CNDSS, FluWatch, CCDSS, BCCDC dashboards, CVSD)
- Identify the registries and vital-statistics infrastructure that underpin Canadian population health data
- Apply the classic CDC 10-step outbreak investigation framework and the Canadian FIORP analogue
- Read a real Canadian outbreak case study and identify which surveillance signals triggered the response
- Explore PHAC and BCCDC dashboards firsthand and interpret what they show
This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.
Foundations: What Surveillance Is and the Four System Types
Introduction and Overview
You are the duty epidemiologist at a regional health authority on a Tuesday afternoon. Three minutes ago, your phone buzzed: a paediatrician at a community clinic just called to say she has seen four children from the same school presenting with bloody diarrhoea over two days. She is requesting stool cultures and wants to know whether you have seen anything similar elsewhere. By the time you put the phone down, you will need to know — fast — whether four cases is unusual for this organism in this catchment, whether a notifiable disease report has already been filed, who else needs to be looped in, and what data sources you can pull within the next hour. That sequence of questions is what this lesson is about. The infrastructure that lets you answer them is called public-health surveillance, and the structured response that follows is called outbreak investigation.
Learning Objectives
- State Langmuir's working definition of surveillance and explain what it means to "close the loop."
- List the five core purposes of surveillance and the four conventional system types (passive, active, sentinel, syndromic).
- Distinguish strengths, weaknesses, and Canadian examples of each system type.
- Trace the notifiable-disease reporting flow from clinician to MOH to province to PHAC, and explain why under-reporting in passive systems is non-random.
2.1 What Public-Health Surveillance Is — and What It Is For
The classic working definition, attributed to Langmuir (1963) and refined by Thacker & Berkelman (1988), is the ongoing, systematic collection, analysis, interpretation, and dissemination of health-related data for the planning, implementation, and evaluation of public-health practice. Each of those verbs is doing real work. Ongoing distinguishes surveillance from a one-time study. Systematic rules out anecdote. Analysis and interpretation rule out a passive data warehouse. Dissemination for action rules out research that is never returned to the people who can act on it. A famous shorthand attributed to Langmuir (EIS founder) is that surveillance only counts if it “closes the loop”: information must come back out as decisions, alerts, or programs, otherwise the system is just bookkeeping.
The five purposes of surveillance
When you read a published surveillance report, the authors are usually doing one or more of these five things: (1) detect outbreaks, clusters, and unusual events early; (2) characterize who is getting sick, where, and why (descriptive epi by person, place, and time); (3) monitor trends in incidence, prevalence, and risk factors over time; (4) evaluate the effect of interventions and programs; and (5) plan resource allocation and policy. A single dataset can serve more than one purpose, but the design tradeoffs are different for each.
2.2 The Surveillance Action Loop
It helps to picture surveillance as a closed loop with four moving parts: data sources (clinical reports, lab results, vital records, administrative data), data systems (the registries, dashboards, and notifiable-disease platforms that ingest and store data), analysis and interpretation (the epidemiologists who turn case counts into rates, anomalies, and narratives), and action (the case finding, control measures, public communications, and policy decisions that flow from the analysis). Each handoff between these stages is also a place where the system can leak: a clinician who never reports a case, a province that does not share data with the federal level, an analyst who does not see a signal in time, or a recommendation that never reaches a decision-maker.
For the rest of this section we focus on the data-system layer because it determines what kinds of signals you can detect at all. The four conventional system types differ on a single axis: who is doing the work of finding cases.
2.3 The Four Surveillance System Types
Most public-health surveillance systems can be sorted into one of four types; the syndromic category is the newest, codified by the CDC framework of Buehler and colleagues (2004). They are not mutually exclusive — a single disease can be tracked by several at once — but they have different strengths, costs, and biases.
| Type | Who initiates the report | Strengths | Limitations | Canadian example |
|---|---|---|---|---|
| Passive | Clinicians and labs report when they encounter a notifiable disease. | Cheap, broad coverage, mandated by law, runs continuously. | Under-reporting (often substantial), variable timeliness, completeness depends on clinician burden. | The Canadian Notifiable Disease Surveillance System (CNDSS) — aggregated case counts for ~50 reportable conditions submitted by provinces to PHAC. |
| Active | Public-health staff actively contact providers, labs, or households to find cases. | Higher case ascertainment, better data quality, useful in outbreak investigations. | Resource-intensive, narrow scope, hard to sustain over time. | The 100-clinician active-search component of FluWatch, and contact tracing during COVID-19 case investigations. |
| Sentinel | A small, designated network of providers reports systematically — trading breadth for depth. | High data quality, manageable cost, can collect richer data than passive systems. | Not population-representative; trends but not absolute counts. | FluWatch sentinel practitioners (general practitioners reporting weekly influenza-like-illness rates) and the Canadian Paediatric Surveillance Program (CPSP). |
| Syndromic | Real-time signals (chief-complaint codes, EMS calls, OTC drug sales, school absences) flag clusters before lab confirmation. | Fast — can detect events before a definitive diagnosis. Useful for emerging or rare events. | Low specificity (lots of false alarms); validation is hard; needs analytic infrastructure. | BC's Acute and Communicable Disease Prevention ED chief-complaint monitoring; PHAC's pandemic-era wastewater surveillance dashboards. |
A fifth category, laboratory-based surveillance, is sometimes broken out separately because it sits beside (rather than under) the clinician's desk: provincial public-health labs aggregate isolates from clinical labs, perform serotyping or whole-genome sequencing, and feed the results into both passive and active systems. PulseNet Canada is the best-known example — the network that enabled the 2008 Maple Leaf listeriosis outbreak to be linked across provinces by genome (Gilmour et al., 2010).
Why “passive” under-reporting is not random
A common student misconception is that under-reporting in passive surveillance just makes case counts smaller. It does — but it also biases who is counted. Cases that present to the health system, get tested, and produce a positive lab result are systematically over-represented. The result is that severe cases, urban cases, and cases in well-insured populations are more visible in the data than mild, rural, or uninsured ones. When you read a CNDSS rate, the denominator is the catchment population, but the numerator is selected.
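A small numerical sketch can make the selection effect concrete. The case counts and reporting probabilities below are invented purely for illustration; they are not estimates for any real disease or for CNDSS.

```python
# Hypothetical true case counts and reporting probabilities (illustrative only).
true_cases = {"severe": 100, "mild": 900}
p_reported = {"severe": 0.80, "mild": 0.10}

# Expected number of cases that reach the passive system in each group.
reported = {group: true_cases[group] * p_reported[group] for group in true_cases}

true_total = sum(true_cases.values())
reported_total = sum(reported.values())

completeness = reported_total / true_total
true_share_severe = true_cases["severe"] / true_total
reported_share_severe = reported["severe"] / reported_total

print(f"Overall completeness: {completeness:.0%}")
print(f"Severe share among true cases: {true_share_severe:.0%}")
print(f"Severe share among reported cases: {reported_share_severe:.0%}")
```

Under these assumed numbers the system captures 17% of cases overall, but severe cases rise from 10% of true cases to about 47% of reported ones. The count is too small and the case mix is distorted at the same time.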
2.4 The Notifiable Disease Reporting Flow in Canada
For passive surveillance to function, every link in the chain has to do its part. The Canadian flow looks like this:
- Clinician or laboratory identifies a case of a notifiable disease (e.g., a positive Shiga toxin-producing E. coli culture or a clinical diagnosis of measles). Each province publishes its own list of conditions reportable under public-health legislation.
- Local Medical Officer of Health (MOH) or regional health authority receives the report (typically within 24 hours for urgent conditions, longer windows for routine ones). The MOH may immediately initiate case investigation, contact tracing, or public-health control measures.
- Provincial / territorial public-health authority (e.g., BCCDC, Public Health Ontario, the Quebec INSPQ) aggregates reports from the regional level, performs initial analysis, and shares anonymized aggregate data with the federal level.
- Public Health Agency of Canada (PHAC) publishes nationally aggregated counts in CNDSS and feeds disease-specific products like FluWatch. PHAC also reports onward under the WHO International Health Regulations (2005) for events of international concern; the post-SARS rationale for that regime is laid out by Heymann & Rodier (2004).
Two features of this flow deserve emphasis. First, public-health legislation is provincial in Canada, so the list of notifiable conditions and reporting timelines differ across provinces. A condition might be urgent-reportable in one province and routine in another. Second, the federal level receives aggregated, de-identified data only; PHAC cannot pull individual records. This federal-provincial division is one reason that the COVID-19 pandemic exposed gaps in real-time data sharing: the legal architecture was not designed for the low-latency reporting that a respiratory pandemic demands.
Reflection
You are designing a surveillance system to track post-secondary student mental-health outcomes in BC. Which of the four system types (or which combination) would you choose, and why? What kinds of cases would your design miss, and what would you do about that gap?
Question 1: A provincial laboratory operates a network of 80 family-medicine clinics across Canada that submit weekly counts of patients presenting with influenza-like illness, alongside the results of throat-swab subtyping. This is best described as:
Question 2: Which of the following is the strongest reason that passive surveillance under-counts cases?
Question 3: In Canada, who is legally responsible for maintaining the list of notifiable diseases that clinicians must report?
Canadian Surveillance Products and Data Infrastructure
Introduction and Overview
Section 1 sketched four types of surveillance system. This section turns to the actual products a Canadian epidemiologist works with. They sit in three layers — federal, provincial/territorial, and the long-running data infrastructure that underpins both. Knowing which product to reach for, and what its limitations are, is most of the practical skill of the job.
Learning Objectives
- Identify the major PHAC surveillance products (CNDSS, FluWatch, CCDSS, CVSD) and what each is best used for.
- Describe the provincial layer in BC: BCCDC dashboards, IRIS, and Panorama.
- Recognize the long-running data infrastructure (vital stats, cancer registries, DAD/NACRS, CCHS, wastewater) that surveillance products read from.
- Apply the five dimensions of surveillance data quality — timeliness, completeness, representativeness, sensitivity, and predictive value positive — to interrogate any data source.
2.5 The Federal Layer
PHAC operates several headline products. They are aggregated and curated — the agency cannot dispense individual-level data — and they each have a different cadence, scope, and intended audience.
CNDSS (Canadian Notifiable Disease Surveillance System). The flagship passive system. Provinces submit weekly aggregated counts for ~50 nationally notifiable conditions (the list is harmonized but not identical to provincial lists). CNDSS feeds the agency's annual Notifiable Diseases Online reports and the underlying open-data tables on open.canada.ca. Useful for long-run trends and inter-provincial comparisons; less useful for real-time outbreak detection because of the lag, often weeks to months, between clinic and PHAC.
FluWatch. A hybrid system: a sentinel network of ~150 family-medicine clinicians reports influenza-like-illness rates each week, lab partners across the country submit subtyped influenza and (post-2020) SARS-CoV-2 results, and provincial outbreak counts feed the national picture. The weekly FluWatch report is what most public-health communicators reach for during respiratory season.
CCDSS (Canadian Chronic Disease Surveillance System). A federal-provincial collaboration that uses validated case definitions on top of health-administrative records (physician billing claims, hospital discharge abstracts) to estimate prevalence and incidence of chronic conditions like diabetes, hypertension, asthma, dementia, and ischaemic heart disease. CCDSS is the largest single source of population-level chronic-disease data in Canada and powers the agency's chronic-disease infobase.
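To make the idea of a case definition applied to administrative records concrete, here is a minimal sketch. It assumes a rule of the general form "one hospital record, or two physician claims within a fixed window"; the function name, the inputs, and the 730-day window are illustrative assumptions, not the official CCDSS specification for any condition.

```python
from datetime import date, timedelta


def meets_case_definition(hospital_dates, claim_dates, window_days=730):
    """Apply a hypothetical administrative-data case definition to one person.

    A person qualifies with at least one hospital record carrying the
    diagnosis, or at least two physician claims no more than `window_days`
    apart. All thresholds here are illustrative assumptions.
    """
    if hospital_dates:
        return True
    claims = sorted(claim_dates)
    window = timedelta(days=window_days)
    # If any two claims fall within the window, some adjacent pair does too.
    return any(later - earlier <= window
               for earlier, later in zip(claims, claims[1:]))


# Two claims about 17 months apart: qualifies under the assumed rule.
person_a = meets_case_definition([], [date(2020, 1, 1), date(2021, 6, 1)])
# Two claims more than three years apart: does not qualify.
person_b = meets_case_definition([], [date(2018, 1, 1), date(2021, 6, 1)])
print(person_a, person_b)
```

The design point is that the "case" is defined by a pattern in billing and discharge data rather than by a clinician's report, which is why validation of the rule against chart review matters so much.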
CVSD (Canadian Vital Statistics Death Database). Not strictly a surveillance product but the spine of mortality-based surveillance. Statistics Canada compiles every death registered in Canada, with cause coded to ICD-10. CVSD enables life-expectancy estimates, cause-specific mortality trends, and excess-mortality analyses (most prominently used during COVID-19 to estimate pandemic burden).
PHAC also runs targeted systems for HIV (including the Canadian Perinatal HIV Surveillance Program), tuberculosis (CTBRS), antimicrobial resistance (CIPARS, CARSS), opioid-related harms, and others. These sit alongside Canada's contribution to global digital surveillance, the Global Public Health Intelligence Network (GPHIN), described by Mykhalovskiy & Weir (2006) and complemented internationally by HealthMap (Brownstein, Freifeld, & Madoff, 2009). Each PHAC system is documented at canada.ca/en/public-health and is worth knowing about.
2.6 The Provincial Layer (with BC examples)
The federal products are valuable for the country-level view, but most case-level work happens provincially. In British Columbia, the BC Centre for Disease Control (BCCDC) is the analytic and operational arm of provincial public health, and most of its products are publicly available.
A short tour of BCCDC dashboards worth bookmarking
- BCCDC respiratory pathogens dashboards — weekly influenza, RSV, and SARS-CoV-2 surveillance with regional breakdowns.
- BCCDC enteric pathogens dashboards — Salmonella, Campylobacter, STEC, Listeria; updated weekly.
- BC Vital Statistics overdose deaths — the unregulated-toxic-drug-supply mortality reports that the BC Coroners Service and BCCDC co-produce.
- BC Cancer Registry & surveillance — cancer-incidence dashboards by health-service-delivery area.
- BC Sexually Transmitted Infection Quarterly — gonorrhea, chlamydia, infectious syphilis, and congenital syphilis trends.
Behind these dashboards sit two case-management platforms: IRIS (Integrated Reporting Information System, BCCDC's communicable-disease platform) and Panorama (a multi-province public-health platform used in BC for outbreak management and immunization records).
2.7 The Long-Running Data Infrastructure
Most surveillance products do not generate their own data — they read from infrastructure that exists for other reasons.
- Vital statistics (births, deaths, marriages, divorces) are the oldest population health data in Canada, with continuous coverage since the late 19th century in some provinces. They are the denominator for life expectancy and the numerator for mortality surveillance.
- Cancer registries — the Canadian Cancer Registry aggregates provincial cancer registries and is one of the few systems with active follow-up and near-complete case ascertainment. The BC Cancer Registry is the provincial source.
- Health-administrative data — the Discharge Abstract Database (DAD, hospital admissions), the National Ambulatory Care Reporting System (NACRS, ED visits), and provincial Medical Services Plan billing claims (MSP in BC). CCDSS and many academic studies are built on top of these.
- Population health surveys — the Canadian Community Health Survey (CCHS) is the workhorse cross-sectional survey for self-reported health behaviours and outcomes; the Canadian Health Measures Survey (CHMS) layers in physical measurement.
- Wastewater-based surveillance — a rapidly maturing infrastructure since 2020. PHAC and many provinces now sample wastewater for pathogens (SARS-CoV-2, influenza, mpox) as a non-clinical signal that is independent of who seeks testing.
2.8 Data Quality: Five Dimensions to Question Every Source
Every surveillance product makes tradeoffs across these five dimensions. When you read a public-health report, the report's author has implicitly resolved each one:
- Timeliness — how long from event to data product? Wastewater can be days. CNDSS can be months.
- Completeness — what fraction of true events are captured? STIs are notoriously incomplete; cancer registries are nearly complete.
- Representativeness — do the captured cases reflect the affected population? Sentinel networks usually do not.
- Sensitivity — will the system detect a true outbreak when it occurs? Syndromic surveillance is highly sensitive; passive surveillance often is not.
- Predictive value positive — when the system flags an event, is it real? In practice it tends to trade off against sensitivity, especially for syndromic systems, because lowering the alert threshold catches more true events but also raises more false alarms.
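The last two dimensions are simple proportions, and a short calculation shows how a system can score well on one and poorly on the other. The tallies below are hypothetical and the function names are chosen for this sketch only.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of true outbreaks that the system flagged."""
    return true_pos / (true_pos + false_neg)


def predictive_value_positive(true_pos: int, false_pos: int) -> float:
    """Share of alerts that corresponded to a true outbreak."""
    return true_pos / (true_pos + false_pos)


# Hypothetical one-year tally for a syndromic system (illustrative only).
alerts_true = 9    # alerts that matched a real outbreak
alerts_false = 41  # alerts that turned out to be noise
missed = 1         # real outbreaks that raised no alert

se = sensitivity(alerts_true, missed)
pvp = predictive_value_positive(alerts_true, alerts_false)
print(f"Sensitivity: {se:.0%}")
print(f"Predictive value positive: {pvp:.0%}")
```

With these assumed numbers the system catches 9 of 10 real outbreaks (90% sensitivity), yet only 9 of its 50 alerts are real (18% predictive value positive), so analysts spend most of their follow-up time on false alarms.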
Pick one of the BCCDC dashboards listed in 2.6 (or a comparable PHAC product) and spend ten minutes with it. Locate (a) the most recent week's case count, (b) the underlying case definition, and (c) any explicit data-quality caveats the dashboard publishes. Note the date of the most recent update versus today's date — how big is the lag?
Reflection
You are asked by a journalist for the “most accurate” current count of chlamydia cases in BC. CNDSS, the BCCDC STI quarterly, and a recent academic estimate using CCHS self-report all give different numbers. How would you explain the discrepancy without making any of the three systems sound discredited?
Question 1: Which Canadian surveillance product is built on top of physician billing claims and hospital discharge abstracts to estimate the prevalence of conditions like diabetes and hypertension?
Question 2: A surveillance system has high sensitivity but low predictive value positive. What does this mean in practice?
Question 3: Which feature distinguishes wastewater-based surveillance from clinical case-based surveillance?
Defining and Investigating Outbreaks
Introduction and Overview
Surveillance is meant to detect signals; outbreak investigation is what you do when a signal turns into a problem. This section gives you the operational vocabulary — what counts as an outbreak, the standard 10-step framework that organizes the response, and the specifically Canadian protocol used for foodborne investigations.
Learning Objectives
- Distinguish cluster, outbreak, epidemic, and pandemic, and explain why each is partly a statistical and partly an operational judgement.
- Walk through the CDC 10-step outbreak investigation framework and recognize which steps typically run in parallel.
- Describe the Canadian Foodborne Illness Outbreak Response Protocol (FIORP) and the roles of PHAC, CFIA, and Health Canada.
- Articulate the speed-vs-accuracy tension in outbreak response and why equity considerations belong in surveillance design.
2.9 What Counts as an Outbreak?
The textbook definition is “the occurrence of more cases of a disease than expected in a given population, place, or time.” Two phrases in that definition are doing the work. “More than expected” presupposes a baseline, such as the local 5-year average for influenza in the same week or the seasonal threshold from a regression model. “Population, place, time” says outbreaks are local: 30 cases of Campylobacter across Canada in a week is not unusual; 30 cases at one wedding is.
Four working terms you need to use precisely
- Cluster — an aggregation of cases in time and/or space that may or may not be statistically unusual. Clusters are flagged by surveillance and triaged for further investigation.
- Outbreak — a cluster judged to exceed the expected baseline. The threshold is decided by the public-health authority, not by the data alone.
- Epidemic — a large-scale outbreak, often across multiple jurisdictions; in international usage often synonymous with outbreak.
- Pandemic — an epidemic with worldwide geographic spread. The WHO declares pandemics; severity is a separate dimension and is not part of the definition.
A common student error is to treat “pandemic” as “severe outbreak” — it is a geographic claim, not a severity claim.
The decision to call something an outbreak is partly statistical and partly operational. Statistically, you can compare current counts to a baseline distribution and apply a threshold (e.g., the upper 95% confidence limit of the 5-year mean). Operationally, an outbreak declaration mobilizes resources, triggers a coordinated response, and may require public communication — and authorities are reasonably cautious about both over- and under-calling.
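The statistical half of that decision can be sketched in a few lines. The example below uses one simple reading of the rule: the mean of the same-week counts from five prior years plus 1.96 sample standard deviations. That choice, the function name, and the counts are assumptions for illustration; real programs use a range of methods, and with only five baseline values any such limit is rough.

```python
import statistics


def alert_threshold(baseline_counts, z=1.96):
    """Upper limit for the expected count: baseline mean plus z sample SDs."""
    mean = statistics.mean(baseline_counts)
    sd = statistics.stdev(baseline_counts)  # sample SD (n - 1 denominator)
    return mean + z * sd


# Hypothetical counts for the same calendar week in five prior years.
baseline = [4, 6, 5, 7, 3]
current = 11

threshold = alert_threshold(baseline)
exceeds = current > threshold
print(f"Threshold: {threshold:.1f}; current count: {current}; exceeds: {exceeds}")
```

A count above the threshold is a prompt to look closer, not an outbreak declaration; the operational judgement described above still applies.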
2.10 The CDC 10-Step Outbreak Investigation Framework
The reference framework most North American epidemiologists learn is the CDC's 10-step process, articulated in the modern era by Reingold (1998) and elaborated in the CDC Field Epidemiology Manual. The steps look orderly on paper but in practice you often work several at once and revisit earlier ones as new information arrives.
[Interactive walkthrough of the most famous post-war outbreak investigation: an 8-scene retelling of the 1976 American Legion convention outbreak in Philadelphia, illustrating the CDC's 10-step framework: detection, case definition, descriptive epi, hypothesis generation, analytic study, environmental investigation, agent identification (a brand-new bacterium), and control measures.]
Step 1: Prepare for field work. Before you arrive, you confirm authority and roles, brief on the suspected etiology, gather supplies (case-report forms, lab kits, PPE), and identify local liaisons (MOH, environmental health, lab). Preparation is the step new investigators most often skip and most often regret.
Step 2: Establish the existence of an outbreak. Compare current counts to a baseline. If the baseline is unstable (small denominators, seasonal variation), state how you constructed it. Ruling out artefact (a new lab test, a reporting policy change, a clinician on a reporting kick) is part of this step.
Step 3: Verify the diagnosis. Talk to clinicians, review charts, and check that lab results are correctly attributed. A pseudo-outbreak driven by a contaminated lab reagent or a misclassification is not unheard of.
Step 4: Construct a working case definition. A case definition has three parts: clinical criteria (symptoms, signs, lab tests), person/place/time restrictions (e.g., attendees of the August 12 potluck), and a level of certainty (suspect, probable, confirmed). You will revise it as the investigation evolves; that is normal.
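The three parts of a case definition map naturally onto a small classification function. Everything specific in this sketch is hypothetical: the field names, the symptom criteria, and the dates are placeholders for whatever the investigation team actually agrees on.

```python
from datetime import date


def classify_case(record, window_start, window_end):
    """Classify one record as 'confirmed', 'probable', 'suspect', or None.

    Person/place/time restriction is checked first, then the level of
    certainty is assigned from lab and clinical criteria. The criteria
    themselves are illustrative assumptions.
    """
    in_scope = (record["attended_event"]
                and window_start <= record["onset"] <= window_end)
    if not in_scope:
        return None
    if record["lab_confirmed"]:
        return "confirmed"
    if record["diarrhoea"] and record["vomiting_or_fever"]:
        return "probable"
    if record["diarrhoea"]:
        return "suspect"
    return None


example = {
    "attended_event": True,
    "onset": date(2024, 8, 13),
    "diarrhoea": True,
    "vomiting_or_fever": True,
    "lab_confirmed": False,
}
status = classify_case(example, date(2024, 8, 12), date(2024, 8, 19))
print(status)  # probable
```

Keeping the definition in one function makes revisions cheap: when the team tightens the onset window or adds a lab criterion, the whole line list can be reclassified in one pass.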
Step 5: Find cases systematically and build a line list. Active case finding (chart review, asking clinicians, contacting attendees of the implicated event) yields a line list: one row per case with demographic, clinical, exposure, and outcome variables. The line list is the working dataset for everything that follows.
Step 6: Perform descriptive epidemiology. This is the classic person, place, time triad. The flagship visualization is the epidemic curve (epi curve), a histogram of case counts by date of symptom onset. Its shape (point-source vs propagated) constrains your hypotheses about the exposure window.
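An epi curve is just a count of cases per onset date, with empty days kept in so that gaps stay visible. A minimal text-only sketch, using invented onset dates:

```python
from collections import Counter
from datetime import date, timedelta


def epi_curve(onset_dates):
    """Return (day, count) pairs for every day from first to last onset.

    Days with no cases are included with a count of zero, because gaps
    are part of the curve's shape.
    """
    counts = Counter(onset_dates)
    day, last = min(onset_dates), max(onset_dates)
    curve = []
    while day <= last:
        curve.append((day, counts[day]))  # Counter returns 0 for missing days
        day += timedelta(days=1)
    return curve


# Hypothetical symptom-onset dates from a line list (illustrative only).
onsets = ([date(2024, 8, 13)] * 2 + [date(2024, 8, 14)] * 5
          + [date(2024, 8, 15)] * 3 + [date(2024, 8, 17)])

curve = epi_curve(onsets)
for day, n in curve:
    print(day.isoformat(), "#" * n)
```

In these invented data the cases pile up over three days and then stop, the kind of tight, single-peaked shape that points toward a point-source exposure rather than person-to-person spread.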
Step 7: Develop hypotheses. From the descriptive epi (and from open-ended interviews of cases) you generate plausible exposures: a specific food, a specific water source, a specific event, a specific procedure. Good hypotheses are testable with the data you have or can collect.
Step 8: Evaluate hypotheses with an analytic study. The two workhorse designs in outbreak settings are the retrospective cohort (when you can enumerate everyone who attended an event, e.g., the wedding-guest list) and the case-control study (when you cannot). The 2×2 tables and risk ratios introduced here are formalized later in HSCI 341 (measures of disease frequency in Lesson 5; measures of association in Lesson 7); the outbreak workflow gives you an early concrete reason to want them.
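As a preview of the 2×2 arithmetic, the sketch below computes attack rates, a risk ratio with a large-sample 95% confidence interval, and an odds ratio from a single table. The cell counts are invented and the function names are chosen for this example; the formal treatment comes in the later lessons.

```python
import math


def risk_ratio(a, b, c, d):
    """Cohort 2x2: a, b = ill, well among exposed; c, d = ill, well among unexposed.

    Returns (attack rate exposed, attack rate unexposed, RR, (lower, upper)),
    using the standard large-sample standard error of ln(RR).
    """
    ar_exposed = a / (a + b)
    ar_unexposed = c / (c + d)
    rr = ar_exposed / ar_unexposed
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
    upper = math.exp(math.log(rr) + 1.96 * se_log_rr)
    return ar_exposed, ar_unexposed, rr, (lower, upper)


def odds_ratio(a, b, c, d):
    """Cross-product ratio; the natural measure for a case-control design."""
    return (a * d) / (b * c)


# Hypothetical wedding cohort: 50 guests ate the dish, 50 did not.
ar_exp, ar_unexp, rr, (lower, upper) = risk_ratio(30, 20, 10, 40)
print(f"Attack rates: {ar_exp:.0%} vs {ar_unexp:.0%}")
print(f"Risk ratio: {rr:.1f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"Odds ratio: {odds_ratio(30, 20, 10, 40):.1f}")
```

Note that the odds ratio (6.0 here) sits further from 1 than the risk ratio (3.0) because illness is common in this invented cohort; the two only approximate each other when the outcome is rare.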
Step 9: Refine hypotheses and carry out additional studies. The first analytic pass often points to several plausible exposures. Environmental sampling, traceback investigations (where did the implicated food come from?), and lab characterization (genome-typing of isolates) sharpen the inference.
Step 10: Implement control measures and communicate findings. Control measures (recalling a product, closing a venue, prophylaxis, isolation) often happen before Step 8; the precautionary principle does not require you to wait for a p-value. Communication runs throughout: with the public, with affected communities, with policymakers, and through the final outbreak report.
2.11 The Canadian FIORP and Its Multi-Jurisdictional Structure
For foodborne outbreaks specifically, Canada operates under the Foodborne Illness Outbreak Response Protocol (FIORP), summarised by Vik & Hexemer (2014), which formalizes the roles of the federal partners and provincial/territorial public-health authorities. Three federal partners share the load:
- PHAC — epidemiology and surveillance lead, including PulseNet Canada (the lab network that does whole-genome sequencing of bacterial isolates).
- Canadian Food Inspection Agency (CFIA) — food-safety investigations, traceback, and product recalls.
- Health Canada — risk assessment of contaminated products and health-impact guidance.
FIORP defines escalation triggers (when a multi-jurisdictional outbreak exists), establishes an Outbreak Investigation Coordinating Committee (OICC) for inter-provincial events, and lays out communication protocols. The 2008 Maple Leaf listeriosis outbreak (the deli-meat-associated Listeria monocytogenes outbreak that killed 22 Canadians) is the case that drove the modern revisions to FIORP and to the supporting infrastructure of PulseNet.
2.12 Real-Time vs Retrospective — A Standing Tension
An outbreak investigation is run under two competing pressures. Speed — every day of delay can mean more illness — pushes you toward early hypotheses and precautionary control measures. Accuracy — falsely accusing a food product or a venue has real costs — pulls in the other direction. Experienced investigators learn to act on confident-enough evidence, communicate uncertainty honestly, and revise control measures as data evolve. The skill is partly statistical and partly ethical: who bears the cost of being wrong in either direction is rarely symmetric.
Equity in surveillance and outbreak response
Surveillance systems do not see all populations equally. Data quality, case ascertainment, and willingness to be tested all vary by social position; outbreak investigators are increasingly expected to ask whose communities are over- or under-represented in the line list and how to adjust the response accordingly. The COVID-19 pandemic made this question impossible to ignore in Canada — differential burdens by neighbourhood income, racialized status, and Indigenous identity were visible in surveillance data once collected, and absent when not.
Reflection
Imagine a hospital flags 11 cases of Clostridioides difficile on one ward over a single week. The ward typically sees 1–2 cases per week. As the consulting epidemiologist, walk through the first three CDC steps you would take in the next 24 hours, and identify which of those steps you would do in parallel rather than serially.
Question 1: A working case definition for an outbreak typically includes:
Question 2: Under the Canadian Foodborne Illness Outbreak Response Protocol (FIORP), which federal agency leads the food-safety investigation and product traceback?
Question 3: An epidemic curve with a steep rise, a single peak, and a tail roughly equal to the duration of one incubation period is most consistent with which exposure pattern?
Case Study and Hands-on Outbreak Investigation
Introduction and Overview
The two preceding sections gave you the institutional and methodological vocabulary. This section makes them concrete: a real Canadian outbreak walked through against the 10-step framework, and a hands-on R activity in which you investigate a simulated outbreak yourself. You will leave Section 4 having previewed the methods that the rest of HSCI 341 formalizes — attack rates, 2×2 tables, risk and rate ratios — on data you have generated and interrogated yourself.
Learning Objectives
- Walk the 2017–18 romaine lettuce E. coli O157:H7 outbreak against the CDC 10-step framework and identify the role of whole-genome sequencing.
- Explain when a case-case design is preferred to a traditional case-control design in a fast-moving foodborne investigation.
- Compute attack rates and risk ratios across multiple suspect foods from a line list, and interpret an epi curve.
- Identify the validity threats (recall bias, selection bias, confounding by other shared foods) that limit early outbreak inference and articulate what a defensible control action looks like under uncertainty.
2.13 Case Study — The 2017–18 Romaine Lettuce E. coli O157:H7 Outbreak
Between November 2017 and February 2018, PHAC and US CDC investigated a multi-jurisdictional outbreak of E. coli O157:H7 infections that ultimately involved 42 confirmed cases across 5 Canadian provinces (and 25 cases across 15 US states). One death was reported among the Canadian cases, a case fatality of roughly 2% (1 of 42). The outbreak is a useful teaching case because it shows the full FIORP machinery in motion and makes the role of whole-genome sequencing visible.
Steps 2–3: Establishing the outbreak and verifying the diagnosis
PulseNet Canada flagged a cluster of E. coli O157:H7 isolates with matching pulsed-field gel electrophoresis (PFGE) patterns, later confirmed by whole-genome sequencing (WGS) to share a tight phylogenetic neighbourhood. The genomic signal is what made this an investigable outbreak rather than a scattered set of unrelated cases — without WGS, the same cases would have been distributed across the routine STEC surveillance baseline.
Steps 4–6: Case definition and descriptive epidemiology
Confirmed cases were Canadian residents with WGS-matched E. coli O157:H7 isolates and symptom onset between mid-November 2017 and the closing date. Probable cases were household contacts of a confirmed case with compatible symptoms. The descriptive analysis (epi curve, geographic distribution, age and sex distribution) showed a multi-peaked temporal pattern with several waves, typical of a continuous common-source outbreak driven by an ongoing contaminated supply chain rather than a single point exposure.
Steps 7–8: Hypothesis development and the case-case analysis
Standard hypothesis-generating interviews were conducted with confirmed cases using PHAC's Hypothesis Generating Questionnaire (HGQ) for STEC, which lists hundreds of food and exposure variables. Cases reported leafy greens consumption substantially more often than expected based on national consumption patterns. A case-case analysis comparing outbreak cases to historical sporadic STEC cases pointed strongly to romaine lettuce. CFIA conducted parallel traceback investigations from grocery purchases reported by cases.
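The logic of a case-case comparison can be sketched as an exposure odds ratio computed for each questionnaire item, with outbreak cases in one group and sporadic comparison cases in the other. The counts and item names below are invented for illustration and are not the figures from the actual investigation; the 0.5 continuity correction for empty cells is one common convention, assumed here.

```python
import math


def exposure_odds_ratio(out_exp, out_unexp, comp_exp, comp_unexp):
    """Odds ratio and Woolf (log-based) 95% CI for one exposure.

    out_* are outbreak cases, comp_* are sporadic comparison cases.
    If any cell is zero, 0.5 is added to every cell.
    """
    cells = [out_exp, out_unexp, comp_exp, comp_unexp]
    if 0 in cells:
        cells = [x + 0.5 for x in cells]
    a, b, c, d = cells
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (or_,
            math.exp(math.log(or_) - 1.96 * se),
            math.exp(math.log(or_) + 1.96 * se))


# Hypothetical counts per item:
# (outbreak exposed, outbreak unexposed, sporadic exposed, sporadic unexposed)
items = {
    "item_a": (30, 10, 40, 60),
    "item_b": (18, 22, 45, 55),
    "item_c": (2, 38, 6, 94),
}

results = {name: exposure_odds_ratio(*counts) for name, counts in items.items()}
ranked = sorted(results, key=lambda name: results[name][0], reverse=True)
for name in ranked:
    or_, lower, upper = results[name]
    print(f"{name}: OR {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the comparison group is made up of people who were also ill, an elevated odds ratio says the exposure is more common in the outbreak strain's cases than in STEC cases generally; it does not estimate risk relative to healthy people, which is part of the validity traded away for speed.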
Steps 9–10: Refinement, control, and communication
Joint PHAC–CFIA–US-CDC coordination produced a public advisory in late December 2017 urging Canadians in the affected provinces to avoid romaine lettuce. The Canadian outbreak was declared over in mid-January 2018. Importantly, despite intensive traceback, no specific grower or facility could be definitively implicated — a sobering reminder that even good investigations sometimes end without the closure of a single confirmed source.
Three teaching points are worth pulling out of this case. First, the outbreak would not have been detected at all without genomic surveillance — the case counts in any one province in any one week looked like background noise. Second, the case-case study design is a pragmatic alternative to a traditional case-control study when controls are hard to recruit during a fast-moving foodborne investigation; you trade some validity for considerable speed. Third, control measures (the public advisory) were issued before the source was definitively confirmed — precaution is a defensible public-health stance when downside asymmetry favours acting early.
2.14 Why Surveillance Comes Early — and What You Will Build On It
Surveillance comes early in HSCI 341 on purpose: every later lesson depends on having data to design against, sample from, measure with, and reason about. The outbreak workflow you will run below previews methods you have not yet formally met — sampling (Lesson 3), questionnaire design for hypothesis-generating interviews (Lesson 4), measures of disease frequency such as attack rates and incidence (Lesson 5), measures of association including risk ratios and odds ratios (Lesson 7), and the validity and confounding vocabulary that arrives in Lessons 8–12. Modern surveillance is also being reshaped by big-data and machine-learning approaches reviewed by Mooney & Pejaver (2018). Treat the 2×2 tables and risk ratios you compute in the R activity as a working preview; the formal derivations and validity diagnostics come in the lessons that follow. The point of doing it now is to ground the rest of the course in a concrete operational setting that needs all of it.
The companion R script r-activities/HSCI_341_Lesson_2_Surveillance_and_Outbreak_Investigation.R walks through a canonical foodborne-outbreak workflow on a simulated 200-attendee community potluck (line list phaa_outbreak.csv) plus 60 days of daily new-case counts in the surrounding town (phaa_outbreak_curve.csv). The same datasets are revisited later in HSCI 341 once measures of association and regression tools have been formally introduced.
# PART A -- read the line list and describe the outbreak
ll <- read.csv("phaa_outbreak.csv", stringsAsFactors = FALSE,
               na.strings = c("", "NA"))
mean(ll$ill, na.rm = TRUE)   # overall attack rate (ill is coded 0/1)

# Epidemic curve: histogram of symptom-onset days
# (the breaks assume onset_day falls between 1 and 12)
hist(ll$onset_day, breaks = seq(0.5, 12.5, 1),
     col = "tomato", main = "Days since the potluck",
     xlab = "Day of symptom onset")

# PART B -- 2x2 table for one suspect food (chicken)
tab <- table(Chicken = ll$chicken, Ill = ll$ill)
addmargins(tab)
# Cell counts (exposure and illness coded 0/1); names avoid masking base::c()
exp_ill   <- tab["1", "1"]; exp_well   <- tab["1", "0"]
unexp_ill <- tab["0", "1"]; unexp_well <- tab["0", "0"]
n_exp   <- exp_ill + exp_well
n_unexp <- unexp_ill + unexp_well
rr <- (exp_ill / n_exp) / (unexp_ill / n_unexp)   # relative risk
se <- sqrt(1/exp_ill - 1/n_exp + 1/unexp_ill - 1/n_unexp)   # SE of log(RR)
lo <- exp(log(rr) - 1.96 * se); hi <- exp(log(rr) + 1.96 * se)
round(c(RR = rr, lo = lo, hi = hi), 2)
chisq.test(tab)   # or fisher.test(tab) when expected counts are small

# PART C -- surveillance time series with a 7-day moving average
curve <- read.csv("phaa_outbreak_curve.csv")
# Centred moving average; the first and last 3 days are NA
curve$ma7 <- stats::filter(curve$new_cases, rep(1/7, 7), sides = 2)
plot(curve$day, curve$new_cases, type = "h", col = "#0B7B6B",
     xlab = "Day", ylab = "New cases", main = "Surveillance time series")
lines(curve$day, curve$ma7, col = "#CC0033", lwd = 2)

# PART D -- attack rates across all suspect foods, ranked
foods <- c("chicken", "potato_salad", "raw_milk_cheese", "green_salad", "punch")
ar <- sapply(foods, function(f) {
  ex   <- ll[[f]] == 1
  ar_e <- mean(ll$ill[ex], na.rm = TRUE)    # attack rate, exposed
  ar_u <- mean(ll$ill[!ex], na.rm = TRUE)   # attack rate, unexposed
  round(c(ar_exposed = ar_e, ar_unexposed = ar_u, rr = ar_e / ar_u), 2)
})
ar[, order(ar["rr", ], decreasing = TRUE)]   # foods ordered by RR, largest first
Reflect on what you just ran
Use the questions below to interpret the actual numbers, table, and plot you produced. Answer in the matching numbered boxes in your R script.
1. What overall attack rate did mean(ll$ill, na.rm = TRUE) return, and how do you interpret that single number for the potluck?
2. From your ranked sapply() output, which food had the largest RR? Repeat the Part B confidence-interval calculation for that food: is its 95% CI clearly above 1? In one sentence, name the prime suspect.
3. Looking at the surveillance plot, on roughly which day does the 7-day moving average peak? What problem does the moving average solve that the raw daily bars do not?
Reflection
Your ranked attack-rate table from the R activity puts two foods at the top: chicken (RR = 4.1, attack rate 62% vs 15%) and potato salad (RR = 3.4, attack rate 55% vs 16%). Many guests ate both. What additional analytic step would you take to disentangle the two? (Think about stratifying on one food while looking at the other — a technique that HSCI 341 will formalize as confounding control in Lesson 12.) Briefly describe what evidence would convince you that one is the source rather than the other.
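The stratification hinted at in the prompt can be sketched in a few lines of R. The counts below are invented so that the pattern is easy to see; they are not the potluck line list. In this toy example chicken drives illness, and potato salad only looks risky because most chicken eaters also ate it.

```r
# Invented counts for illustration: 200 guests cross-classified by both foods
counts <- data.frame(
  chicken      = c(1, 1, 0, 0),
  potato_salad = c(1, 0, 1, 0),
  n            = c(80, 20, 20, 80),
  ill          = c(48, 12, 3, 12)
)

# Risk ratio for potato salad within whatever rows are passed in
rr_potato <- function(d) {
  risk_exp   <- sum(d$ill[d$potato_salad == 1]) / sum(d$n[d$potato_salad == 1])
  risk_unexp <- sum(d$ill[d$potato_salad == 0]) / sum(d$n[d$potato_salad == 0])
  risk_exp / risk_unexp
}

crude_rr   <- rr_potato(counts)                         # ignores chicken
rr_chick_1 <- rr_potato(counts[counts$chicken == 1, ])  # chicken eaters only
rr_chick_0 <- rr_potato(counts[counts$chicken == 0, ])  # non-eaters only

round(c(crude = crude_rr, chicken_yes = rr_chick_1, chicken_no = rr_chick_0), 2)
```

When the crude RR is well above 1 but the stratum-specific RRs sit near 1, the crude association is explained by the stratifying food. With a real line list, a three-way table such as table(ll$potato_salad, ll$ill, ll$chicken) gives the same strata, and mantelhaen.test() offers a pooled summary on the odds-ratio scale.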
Question 1: In the 2017–18 romaine lettuce E. coli O157:H7 outbreak, what was the role of whole-genome sequencing?
Question 2: Why might investigators use a case-case design (comparing outbreak cases to sporadic background cases of the same organism) rather than a traditional case-control design during a fast-moving foodborne outbreak?
Question 3: An R-activity attack-rate table shows that 60% of attendees who ate chicken got ill, vs 15% of those who did not. The crude risk ratio is approximately:
Final Assessment
Bringing It All Together
Surveillance sits at position 2 in HSCI 341 because every later lesson depends on having data to work with. This lesson set up the institutional and methodological vocabulary — the four system types, the Canadian product layer, the CDC 10-step framework, FIORP, and a worked outbreak — that the rest of HSCI 341 will give analytic teeth to. The duty epidemiologist who answered the phone in the opening scenario is not doing something separate from research; she is doing the same kind of reasoning the rest of this course formalizes, but under time pressure and inside a multi-jurisdictional architecture of legislation, agencies, and dashboards.
The arc of the lesson moved from definitions and system types, to the actual Canadian products an epidemiologist reaches for, to the CDC 10-step framework and FIORP, and finally to a worked outbreak and a hands-on R activity. The final assessment below asks you to integrate across all four sections — recognizing system types from descriptions, choosing the right surveillance product for a question, walking an investigation through the framework, and reading attack rates from a line list.
What you take away from this lesson sets up the next stretch of HSCI 341. Lesson 3 (Sampling) tackles how to draw a defensible sample when surveillance can't enumerate everyone — the case-control side of the outbreak workflow you just previewed. Lesson 4 (Questionnaire Design) turns the hypothesis-generating interview into a measured instrument. Lessons 5 and 7 formalize the disease-frequency and association measures you computed by hand here, and Lessons 8–12 give you the validity, screening, and confounding vocabulary to defend those numbers.
Key Takeaways from Lesson 2
- Surveillance is defined by closing the loop: ongoing data collection only counts as surveillance if analysis and dissemination produce decisions, alerts, or programs.
- The four system types — passive, active, sentinel, syndromic — trade off coverage, cost, depth, and timeliness; almost every modern surveillance product is a hybrid.
- Canadian surveillance is multi-layered and provincially anchored: federal products like CNDSS, FluWatch, and CCDSS sit on top of provincial reporting (BCCDC, IRIS, Panorama) and on long-running data infrastructure (vital stats, DAD/NACRS, CCHS, wastewater).
- Data quality has five interrogable dimensions — timeliness, completeness, representativeness, sensitivity, and predictive value positive — and every dashboard implicitly resolves them.
- Outbreak investigation is the CDC 10-step framework applied under FIORP's federal/provincial division of labour; speed and accuracy compete, and equity is no longer optional in either.
- Outbreak methods preview the rest of HSCI 341 under time pressure: the cohort, case-control, 2×2, and confounding-control tools that Lessons 3–12 formalize are exactly what you compute in a line-list investigation.
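Three of the data-quality dimensions named in the takeaways reduce to simple arithmetic once counts or dates are in hand. The R sketch below uses invented counts and dates, purely to show how sensitivity, predictive value positive, and timeliness might be computed in an evaluation.

```r
# Invented counts for illustration: system reports vs. true case status
true_pos  <- 60   # reported by the system and truly cases
false_pos <- 15   # reported by the system but not true cases
false_neg <- 40   # true cases the system missed

sensitivity <- true_pos / (true_pos + false_neg)  # share of true cases detected
pvp         <- true_pos / (true_pos + false_pos)  # share of reports that are true cases
round(c(sensitivity = sensitivity, pvp = pvp), 2)

# Timeliness: lag in days from symptom onset to appearance in the system
onset  <- as.Date(c("2024-03-01", "2024-03-03", "2024-03-04"))  # invented dates
report <- as.Date(c("2024-03-12", "2024-03-18", "2024-03-20"))
median_lag <- median(as.numeric(report - onset))
median_lag
```

Note that false negatives are rarely observed directly; estimating them usually requires a second data source or a special study, which is part of why sensitivity is hard to pin down for a live system.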
Core Concepts Reviewed
Section 1: Langmuir's working definition of surveillance and the action loop (data → information → action); the five purposes (detect, characterize, monitor, evaluate, plan); the four system types — passive, active, sentinel, syndromic — with Canadian examples; the notifiable-disease reporting flow from clinic to MOH to province to PHAC to WHO.
Section 2: The federal product layer (CNDSS, FluWatch, CCDSS, CVSD, specialty systems); the BC provincial layer (BCCDC dashboards, IRIS, Panorama); long-running data infrastructure (vital statistics, cancer registries, DAD/NACRS, CCHS, wastewater); five dimensions of surveillance data quality (timeliness, completeness, representativeness, sensitivity, predictive value positive).
Section 3: Outbreak vs cluster vs epidemic vs pandemic; the CDC 10-step investigation framework; the Canadian FIORP and its three federal partners (PHAC, CFIA, Health Canada); the standing tension between speed and accuracy; the equity dimension of surveillance.
Section 4: The 2017–18 romaine lettuce E. coli O157:H7 outbreak as a worked example; the role of whole-genome sequencing in cluster detection; the case-case design as a pragmatic alternative; a working preview of the methods the rest of HSCI 341 will formalize — cohort and case-control designs, 2×2 tables, risk ratios — applied here under time pressure.
The final reflection below asks you to step out of method-mode and articulate what you would carry into the next investigation. There is no single right answer; the goal is to leave the lesson with an articulated stance, because the operational settings you encounter in the rest of HSCI 341 (and beyond) will keep pushing on it.
Final Reflection
You are the duty epidemiologist from the opening scenario. Two weeks have passed; the four bloody-diarrhoea cases turned out to be linked to a single school cafeteria. Looking back at how you used (or did not use) the four surveillance system types, the 10-step framework, and the FIORP machinery, what would you say is the single most important lesson you would carry into the next investigation? You can write about a methodological lesson, a communication lesson, or an equity lesson — whichever felt most generative.
Question 1: Which of the following is the strongest argument for treating surveillance as more than just “data collection”?
Question 2: A regional health authority sets up a system in which paramedics flag the chief complaint of every 911 call into a real-time dashboard, with anomaly-detection algorithms triggering review when call volume for a syndrome exceeds the baseline. This is best described as:
Question 3: Which Canadian surveillance product is the primary national source of weekly influenza-like-illness rates from a designated network of family-medicine clinicians?
Question 4: Public-health legislation in Canada is structured such that:
Question 5: The Canadian Chronic Disease Surveillance System (CCDSS) primarily uses what data source to estimate prevalence of chronic conditions?
Question 6: A surveillance dashboard reports a weekly count of laboratory-confirmed STEC infections in BC. The lag from symptom onset to appearance on the dashboard is roughly 14 days. Which dimension of surveillance data quality is most directly described by this lag?
Question 7: Which of the following best distinguishes an outbreak from a cluster?
Question 8: The shape of an epi curve where cases rise sharply, peak briefly, and fall away over a period roughly equal to one incubation period is most consistent with:
Question 9: Under the Canadian Foodborne Illness Outbreak Response Protocol, which agency conducts traceback investigations and product recalls?
Question 10: A working case definition during an outbreak typically includes a level of certainty (suspect, probable, confirmed). Why allow suspect cases at all?
Question 11: In a closed-population outbreak (e.g., a wedding), the appropriate analytic study design is typically:
Question 12: In the 2017–18 romaine lettuce E. coli O157:H7 outbreak, public-health authorities issued a consumption advisory before the source farm or facility was definitively identified. The most defensible justification for this is:
Question 13: What is the role of PulseNet Canada in modern foodborne outbreak detection?
Question 14: A surveillance system designed to capture self-reported exposures via a national household survey will be most vulnerable to which type of bias when used as a denominator for foodborne outbreak comparisons?
Question 15: Reflecting on the lesson as a whole, which is the most accurate characterization of the relationship between routine surveillance and outbreak investigation?