Surveillance & Outbreak Investigation

Fundamental Epidemiological Concepts and Approaches

Learning objectives for this lesson:

Distinguish passive, active, sentinel, and syndromic surveillance and identify Canadian examples of each
Trace the flow of a notifiable disease report from the clinic to the Public Health Agency of Canada
Navigate the major federal and BC surveillance products (CNDSS, FluWatch, CCDSS, BCCDC dashboards, CVSD)
Identify the registries and vital-statistics infrastructure that underpin Canadian population health data
Apply the classic CDC 10-step outbreak investigation framework and the Canadian FIORP analogue
Read a real Canadian outbreak case study and identify which surveillance signals triggered the response
Explore PHAC and BCCDC dashboards firsthand and interpret what they show

This course was developed by Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University, based on Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Reference

Glossary: Key Terms, People & Concepts

This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.

Surveillance Concepts

Public Health Surveillance: Ongoing systematic collection, analysis, and dissemination of health-related data for public health action. Distinguished from research by its operational mandate and rapid feedback loop to action.

Passive Surveillance: Surveillance system in which clinicians, labs, or facilities report cases on their own initiative according to legal or professional requirements. Cheap and broad but vulnerable to under-reporting.

Active Surveillance: Surveillance staff regularly contact reporters or examine records to actively elicit case reports. More complete and timely than passive systems but resource-intensive; often used during outbreaks.

Sentinel Surveillance: High-quality data collected from a selected subset of providers, sites, or populations chosen to represent broader trends. Trades coverage for timeliness and depth.

Syndromic Surveillance: Use of pre-diagnostic data (chief complaints, ED visits, school absences, pharmacy sales) to detect outbreaks earlier than confirmed-case reporting allows.

Notifiable (Reportable) Disease: A condition that, by law, providers and laboratories must report to public-health authorities (e.g., measles, TB, COVID-19). Lists are maintained provincially in Canada.

Case Definition: Standardised criteria (clinical, epidemiologic, laboratory) used to classify suspect, probable, and confirmed cases. Tightening or loosening the definition trades sensitivity against specificity.

Cluster, Outbreak, Epidemic, Pandemic: A cluster is an aggregation of cases in a place or time. An outbreak is cases above expected (often local). An epidemic is a widespread excess (the term is often used interchangeably with outbreak). A pandemic is an epidemic across multiple countries or continents.

Index Case (Primary Case): The first case identified in an outbreak investigation (index) or the first case in a transmission chain (primary). They are not always the same person.

Super-Spreader / Super-Spreading Event: An individual or event that generates substantially more secondary infections than the population average, typically driven by host, agent, environmental, and behavioural factors converging.

R₀ (Basic Reproduction Number): The expected number of secondary cases produced by a typical infected individual in a fully susceptible population. R₀ > 1 implies epidemic potential.

R_t (Effective Reproduction Number): The average number of secondary cases per infected individual at a specific time, accounting for current immunity and interventions. R_t < 1 indicates a shrinking epidemic.

Attack Rate: The proportion of an at-risk population that develops disease during an outbreak (cumulative incidence). Secondary attack rate refers to spread among contacts of a primary case.

Incubation Period: Time from exposure to onset of symptoms. Knowing the typical incubation distribution allows back-calculation from onset to likely exposure window.

Generation Time / Serial Interval: Generation time is the interval between infection of a primary case and infection of a secondary case; serial interval is the interval between symptom onsets in successive cases. Both are used to estimate R_t.

Herd (Population) Immunity: Indirect protection against infection that occurs when a sufficient proportion of the population is immune, reducing the chance that susceptible people contact infectious ones.

Outbreak Investigation Methods

Ten Steps of Outbreak Investigation (CDC): A standardised sequence: prepare, confirm outbreak, verify diagnosis, define and find cases, descriptive epi, hypothesise, test hypotheses, refine hypotheses, implement control, communicate. Steps overlap in practice.

Descriptive Epidemiology (Person, Place, Time): Characterisation of cases by who is affected, where they live or were exposed, and when illness began. The first analytic step in any outbreak investigation.

Epidemic (Epi) Curve: A histogram of case onsets over time. Shape suggests transmission pattern: point-source (sharp peak), continuous-source (plateau), propagated/person-to-person (successive peaks at intervals of one incubation period).

Line List: A structured table with one row per case capturing demographic, clinical, exposure, and outcome variables. The operational data backbone of outbreak investigation.

Contact Tracing: Identification, notification, and follow-up of people exposed to a confirmed case to enable testing, isolation, prophylaxis, or vaccination. Foundational for STI, TB, Ebola, and COVID-19 control.

Hypothesis-Generating Interview: An open-ended structured interview with a small number of cases (typically 5–10) to elicit possible exposures, such as foods, places, and activities, before a focused case-control or cohort analysis is launched.

Retrospective Cohort (Outbreak): When the at-risk population is well defined (a wedding, a cruise, a daycare), exposed and unexposed cohorts can be reconstructed and attack rates compared directly; this is the design of choice in defined-population outbreaks.

Case-Control (Outbreak): When the at-risk population is unbounded (community), confirmed cases are compared with appropriate controls on possible exposures to identify the source.

Environmental Sampling & Molecular Subtyping: Laboratory analysis of suspect food, water, surfaces, or air, often combined with whole-genome sequencing of patient isolates, to confirm the outbreak source and link cases.

PHAC & BCCDC: Public Health Agency of Canada (national surveillance and outbreak coordination) and the British Columbia Centre for Disease Control (provincial counterpart). Both publish public dashboards used in this lesson.

Key People

Alexander Langmuir (1910–1993): Founder of the CDC Epidemic Intelligence Service (EIS) in 1951; established modern shoe-leather outbreak investigation as a core public-health discipline.

William Farr (1807–1883): First compiler of statistical abstracts at the General Register Office in London; established systematic mortality surveillance and the foundation of routine vital statistics.

John Snow (1813–1858): London physician whose 1854 investigation of the Broad Street cholera outbreak combined a spot map, water-supply comparison, and shoe-leather case interviews; it remains an enduring template for outbreak epidemiology.

Section 1 of 5

Foundations: What Surveillance Is and the Four System Types

⏱ Estimated reading time: 25 minutes

Lesson 2 · HSCI 341

The Phone Call That Drives the Course

A duty epidemiologist gets a call about four children with bloody diarrhoea from one school.

The definition, the action loop, and the four system types with Canadian examples.

Langmuir 1963

What surveillance is, and is not

The ongoing, systematic collection, analysis, interpretation, and dissemination of health-related data for public-health action. (Langmuir, 1963; refined by Thacker & Berkelman, 1988)

The key distinction: surveillance that does not feed back into decisions, alerts, or programs is bookkeeping, not surveillance.

The loop

The surveillance action loop

Four types

Passive, active, sentinel, syndromic

Passive

Clinicians and labs report on their own initiative. Cheap and broad; vulnerable to non-random under-reporting.

Active

Public-health staff elicit reports. Higher quality; resource-intensive and narrow.

Sentinel

A designated network reports systematically. Trades coverage for depth and data quality.

Syndromic

Pre-diagnostic signals flag clusters early. High sensitivity; low specificity.

Canadian examples

Matching types to real systems

Passive

CNDSS: ~50 notifiable conditions, weekly provincial aggregates to PHAC.

Active

FluWatch active: 100-clinician search component; COVID-19 contact tracing.

Sentinel

FluWatch sentinel: ~150 GPs reporting weekly; Canadian Paediatric Surveillance Program.

Syndromic

BC ED monitoring; PHAC wastewater dashboards (SARS-CoV-2, influenza, mpox).

Carry forward

What to take into the next section

Surveillance closes the loop: collection without feedback to action is bookkeeping.
The four types trade coverage, cost, depth, and timeliness; modern products are hybrids.
Passive under-reporting is non-random: severe and urban cases are over-represented.

Introduction and Overview

You are the duty epidemiologist at a regional health authority on a Tuesday afternoon. Three minutes ago, your phone buzzed: a paediatrician at a community clinic just called to say she has seen four children from the same school presenting with bloody diarrhoea over two days. She is requesting stool cultures and wants to know whether you have seen anything similar elsewhere. By the time you put the phone down, you will need to know, quickly, whether four cases is unusual for this organism in this catchment, whether a notifiable disease report has already been filed, who else needs to be looped in, and what data sources you can pull within the next hour. That sequence of questions is what this lesson is about. The infrastructure that lets you answer them is called public-health surveillance, and the structured response that follows is called outbreak investigation.
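One rough way to frame the question "is four cases unusual?" is to compare the observed count with a historical baseline. The sketch below assumes, purely for illustration, that this catchment averages 0.5 such reports per two-day window and that counts are approximately Poisson; the baseline value and variable names are hypothetical and do not come from the lesson's data.

```r
# Hypothetical baseline: average number of bloody-diarrhoea reports per
# two-day window in this catchment (an assumed value, not real data).
baseline_mean <- 0.5
observed      <- 4

# P(X >= observed) under a Poisson model with the assumed baseline.
# ppois(q, lower.tail = FALSE) gives P(X > q), so we pass observed - 1.
p_at_least <- ppois(observed - 1, lambda = baseline_mean, lower.tail = FALSE)
p_at_least
```

A small probability says only that the count is hard to explain by the assumed baseline; whether the baseline itself is trustworthy is exactly what the rest of this section is about.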

Learning Objectives

State Langmuir's working definition of surveillance and explain what it means to "close the loop."
List the five core purposes of surveillance and the four conventional system types (passive, active, sentinel, syndromic).
Distinguish strengths, weaknesses, and Canadian examples of each system type.
Trace the notifiable-disease reporting flow from clinician to MOH to province to PHAC, and explain why under-reporting in passive systems is non-random.

2.1 What Public-Health Surveillance Is, and What It Is For

The classic working definition, attributed to Langmuir (1963) and refined by Thacker & Berkelman (1988), is the ongoing, systematic collection, analysis, interpretation, and dissemination of health-related data for the planning, implementation, and evaluation of public-health practice. Each of those verbs is doing real work. Ongoing distinguishes surveillance from a one-time study. Systematic rules out anecdote. Analysis and interpretation rule out a passive data warehouse. Dissemination for action rules out research that is never returned to the people who can act on it. A famous shorthand attributed to Langmuir (EIS founder) is that surveillance only counts if it “closes the loop”: information must come back out as decisions, alerts, or programs, otherwise the system is just bookkeeping.

The five purposes of surveillance

When you read a published surveillance report, the authors are usually doing one or more of these five things: (1) detect outbreaks, clusters, and unusual events early; (2) characterize who is getting sick, where, and why (descriptive epi by person, place, and time); (3) monitor trends in incidence, prevalence, and risk factors over time; (4) evaluate the effect of interventions and programs; and (5) plan resource allocation and policy. A single dataset can serve more than one purpose, but the design tradeoffs are different for each.

2.2 The Surveillance Action Loop

It helps to picture surveillance as a closed loop with four moving parts: data sources (clinical reports, lab results, vital records, administrative data), data systems (the registries, dashboards, and notifiable-disease platforms that ingest and store data), analysis and interpretation (the epidemiologists who turn case counts into rates, anomalies, and narratives), and action (the case finding, control measures, public communications, and policy decisions that flow from the analysis). Each handoff between these stages is also a place where the system can leak: a clinician who never reports a case, a province that does not share data with the federal level, an analyst who does not see a signal in time, or a recommendation that never reaches a decision-maker.

For the rest of this section we focus on the data-system layer because it determines what kinds of signals you can detect at all. The four conventional system types differ on a single axis: who is doing the work of finding cases.

2.3 The Four Surveillance System Types

Most public-health surveillance systems can be sorted into one of four types; the syndromic category is the newest, codified by the CDC framework of Buehler and colleagues (2004). They are not mutually exclusive, since a single disease can be tracked by several at once, but they have different strengths, costs, and biases.

Type	Who initiates the report	Strengths	Limitations	Canadian example
Passive	Clinicians and labs report when they encounter a notifiable disease.	Cheap, broad coverage, mandated by law, runs continuously.	Under-reporting (often substantial), variable timeliness, completeness depends on clinician burden.	The Canadian Notifiable Disease Surveillance System (CNDSS), aggregated case counts for ~50 reportable conditions submitted by provinces to PHAC.
Active	Public-health staff actively contact providers, labs, or households to find cases.	Higher case ascertainment, better data quality, useful in outbreak investigations.	Resource-intensive, narrow scope, hard to sustain over time.	The 100-clinician active-search component of FluWatch, and contact tracing during COVID-19 case investigations.
Sentinel	A small, designated network of providers reports systematically, trading breadth for depth.	High data quality, manageable cost, can collect richer data than passive systems.	Not population-representative; trends but not absolute counts.	FluWatch sentinel practitioners (general practitioners reporting weekly influenza-like-illness rates) and the Canadian Paediatric Surveillance Program (CPSP).
Syndromic	Real-time signals (chief-complaint codes, EMS calls, OTC drug sales, school absences) flag clusters before lab confirmation.	Fast, and can detect events before a definitive diagnosis. Useful for emerging or rare events.	Low specificity (lots of false alarms); validation is hard; needs analytic infrastructure.	BC's Acute and Communicable Disease Prevention ED chief-complaint monitoring; PHAC's pandemic-era wastewater surveillance dashboards.
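The "fast but low specificity" trade-off in the syndromic row can be illustrated with a simple threshold rule. This is a sketch on simulated ED visit counts, assuming a fixed historical baseline and an alert threshold of the baseline mean plus two standard deviations; operational systems use more refined aberration-detection methods, and every name and number here is hypothetical.

```r
set.seed(341)

# Simulated daily ED visits for gastrointestinal complaints (hypothetical):
# 60 baseline days, then 14 monitored days with an injected bump on days 8-10.
baseline  <- rpois(60, lambda = 10)
monitored <- rpois(14, lambda = 10)
signal_days <- 8:10
monitored[signal_days] <- monitored[signal_days] + 12

# Alert threshold: baseline mean plus two standard deviations.
threshold <- mean(baseline) + 2 * sd(baseline)

alerts       <- which(monitored > threshold)   # days that would raise an alert
false_alarms <- setdiff(alerts, signal_days)   # alerts on days with no injected signal

alerts
false_alarms
```

Lowering the multiplier from 2 would flag the injected bump more reliably but would also raise alerts on more ordinary days, which is the sensitivity-versus-specificity tension described in the table.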

A fifth category, laboratory-based surveillance, is sometimes broken out separately because it sits beside (rather than under) the clinician's desk: provincial public-health labs aggregate isolates from clinical labs, perform serotyping or whole-genome sequencing, and feed the results into both passive and active systems. PulseNet Canada is the best-known example: it is the network whose molecular subtyping allowed cases in the 2008 Maple Leaf listeriosis outbreak to be linked across provinces, with the outbreak isolates later characterised by genome sequencing (Gilmour et al., 2010).

Why “passive” under-reporting is not random

A common student misconception is that under-reporting in passive surveillance just makes case counts smaller. It does, but it also biases who is counted. Cases that present to the health system, get tested, and produce a positive lab result are systematically over-represented. The result is that severe cases, urban cases, and cases in well-insured populations are more visible in the data than mild, rural, or uninsured ones. When you read a CNDSS rate, the denominator is the catchment population, but the numerator is selected.
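The selection effect described above can be made concrete with a small simulation. The sketch assumes, hypothetically, that 20% of true cases are severe and that severe cases are reported 80% of the time versus 10% for mild cases; these probabilities are invented for illustration only.

```r
set.seed(2)

# Hypothetical true cases: 20% severe, 80% mild (assumed values).
n      <- 10000
severe <- rbinom(n, 1, 0.20) == 1

# Assumed reporting probabilities: severe cases are far more likely to be
# seen, tested, and reported than mild ones.
p_report <- ifelse(severe, 0.80, 0.10)
reported <- rbinom(n, 1, p_report) == 1

true_prop_severe     <- mean(severe)             # severity mix among all cases
reported_prop_severe <- mean(severe[reported])   # severity mix among reported cases
completeness         <- mean(reported)           # share of true cases that get reported

round(c(true = true_prop_severe,
        reported = reported_prop_severe,
        completeness = completeness), 2)
```

Under these assumptions the reported cases are both fewer and visibly more severe than the true case mix, so a naive reading of the surveillance data would overstate severity as well as understate incidence.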

2.4 The Notifiable Disease Reporting Flow in Canada

For passive surveillance to function, every link in the chain has to do its part. The Canadian flow looks like this:

1. Clinician or laboratory identifies a case of a notifiable disease (e.g., a positive Shiga toxin-producing E. coli culture or a clinical diagnosis of measles). Each province publishes its own list of conditions reportable under public-health legislation.
2. Local Medical Officer of Health (MOH) or regional health authority receives the report (typically within 24 hours for urgent conditions, with longer windows for routine ones). The MOH may immediately initiate case investigation, contact tracing, or public-health control measures.
3. Provincial / territorial public-health authority (e.g., BCCDC, Public Health Ontario, Quebec's INSPQ) aggregates reports from the regional level, performs initial analysis, and shares anonymized aggregate data with the federal level.
4. Public Health Agency of Canada (PHAC) publishes nationally aggregated counts in CNDSS and feeds disease-specific products like FluWatch and the CCDSS. PHAC also reports onward under the WHO International Health Regulations (2005) for events of international concern; the post-SARS rationale for that regime is laid out by Heymann & Rodier (2004).

Two features of this flow deserve emphasis. First, public-health legislation is provincial in Canada, so the list of notifiable conditions and reporting timelines differ across provinces. A condition might be urgent-reportable in one province and routine in another. Second, the federal level receives aggregated, de-identified data only; PHAC cannot pull individual records. This federal-provincial division is one reason that the COVID-19 pandemic exposed gaps in real-time data sharing: the legal architecture was not designed for the low-latency reporting that a respiratory pandemic demands.
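To see what "aggregated, de-identified data only" means in practice, the sketch below collapses a toy set of case-level reports into weekly counts by province. The data frame, its column names, and all values are hypothetical; real submissions follow formats agreed between the provinces and PHAC that are not described in this lesson.

```r
# Hypothetical case-level reports held at the provincial level.
reports <- data.frame(
  case_id   = c("A01", "A02", "A03", "B01", "B02", "C01"),
  province  = c("BC", "BC", "BC", "ON", "ON", "QC"),
  disease   = "STEC",
  report_dt = as.Date(c("2024-03-04", "2024-03-05", "2024-03-12",
                        "2024-03-04", "2024-03-13", "2024-03-06")),
  stringsAsFactors = FALSE
)

# Derive an ISO year-week label and a counting column.
reports$iso_week <- format(reports$report_dt, "%G-W%V")
reports$n_cases  <- 1L

# Aggregate: the result carries counts only, with no case identifiers or dates.
weekly <- aggregate(n_cases ~ province + disease + iso_week,
                    data = reports, FUN = sum)
weekly
```

Once the data are in this shape, the federal level can compute trends and rates but cannot follow up an individual case, which is the design constraint the paragraph above describes.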

Reflection

You are designing a surveillance system to track post-secondary student mental-health outcomes in BC. Which of the four system types (or which combination) would you choose, and why? What kinds of cases would your design miss, and what would you do about that gap?

Model answer: A defensible design uses a mixed surveillance system: (a) a passive system reporting from campus counselling centres (mandatory case reports of presenting concerns) for population denominator and trend, (b) an active sentinel system in a stratified sample of post-secondary institutions where standardised screening (PHQ-9, GAD-7) is administered to every student visit, and (c) periodic syndromic surveillance via EHR data linkage and anonymised counselling-line usage. Cases missed by passive: students who never present (the majority), international students who use community providers, and underserved campuses without counselling. Plug the gap with (i) a probability-sample survey component (annual ABLE survey or CCHS-style oversample on post-secondary students), and (ii) outreach to community providers and Indigenous campus services with mandatory de-identified reporting. Privacy and stigma considerations should drive opt-in / opt-out architecture and Indigenous data sovereignty for relevant subsamples.

Section 4 of 5

Case Study and Hands-on Outbreak Investigation

⏱ Estimated reading time: 30 minutes · R activity: 30–45 minutes

The 2017–18 romaine lettuce E. coli O157:H7 outbreak, and a line-list analysis in R.

The outbreak

2017–18 E. coli O157:H7, romaine lettuce

Scale

42 confirmed cases · 5 eastern provinces.
25 US cases · 15 states.
CFR: about 2% (1 death among 42 cases).

Three teaching points

1. Genomic surveillance made the cluster visible.
2. Case-case design traded validity for speed.
3. Precautionary advisory issued without a confirmed source.

Steps 2–3: The genomic signal

How PulseNet Canada made the cluster visible

WGS, whole-genome sequencing, revealed a shared phylogenetic neighbourhood. Same cases, newly visible as a cluster.

Steps 6–10

Descriptive epi, case-case analysis, precautionary advisory

Epi curve shape: several waves, consistent with a continuous common-source outbreak from an ongoing supply-chain contamination.

Case-case analysis: outbreak isolates vs. historical sporadic STEC cases pointed to romaine lettuce.

Advisory: issued Dec 2017 before a source was confirmed. Outbreak closed Jan 2018.

Outcome

No specific grower or facility was definitively identified. Precautionary action was still the right call.

R activity preview

What the line-list analysis asks you to do

Attack Rate (closed cohort)

\[ \color{#0B7B6B}{\text{AR}} = \frac{\color{#C2410C}{\text{cases among exposed}}}{\color{#6D28D9}{\text{total exposed}}} \]

AR: attack rate. Cases among exposed: ill people who ate the food. Total exposed: everyone who ate the food.

Risk Ratio

\[ \color{#0B7B6B}{RR} = \frac{\color{#C2410C}{AR_{\text{exposed}}}}{\color{#6D28D9}{AR_{\text{unexposed}}}} \]

RR: risk ratio. AR_exposed: attack rate among those who ate the food. AR_unexposed: attack rate among those who did not.

If two foods both show high RR and many guests ate both, stratify: compute the RR for Food A within strata of Food B. The food whose RR stays elevated within both strata of the other is the more likely source.

Carry forward

What this section connected

WGS made the cluster visible; without PulseNet Canada, it would have been background noise.
Case-case design trades validity for speed when community controls are hard to recruit quickly.
Precautionary action is defensible when waiting for certainty means more cases.
The R activity is a preview: later lessons give these methods their formal foundations.

Introduction and Overview

The two preceding sections gave you the institutional and methodological vocabulary. This section makes them concrete: a real Canadian outbreak walked through against the 10-step framework, and a hands-on R activity in which you investigate a simulated outbreak yourself. You will leave this section having previewed the methods that the rest of this course formalizes: the attack rate (the proportion of an at-risk group who fall ill), the 2×2 table, and the risk ratio, all computed on data you have interrogated yourself.

Learning Objectives

Walk the 2017–18 romaine lettuce E. coli O157:H7 outbreak against the CDC 10-step framework and identify the role of whole-genome sequencing.
Explain when a case-case design is preferred to a traditional case-control design in a fast-moving foodborne investigation.
Compute attack rates and risk ratios across multiple suspect foods from a line list, and interpret an epi curve.
Identify the validity threats (recall bias, selection bias, confounding by other shared foods) that limit early outbreak inference and articulate what a defensible control action looks like under uncertainty.

2.13 Case Study: The 2017–18 Romaine Lettuce E. coli O157:H7 Outbreak

Between November 2017 and February 2018, PHAC and US CDC investigated a multi-jurisdictional outbreak of E. coli O157:H7 infections that ultimately involved 42 confirmed cases across 5 Canadian provinces (and 25 cases across 15 US states). All five affected Canadian provinces were in eastern Canada (Ontario, Quebec, New Brunswick, Nova Scotia, and Newfoundland and Labrador); western provinces were not involved. One death was reported among the 42 Canadian cases, a case fatality rate of about 2% (roughly 2 in every 100 diagnosed cases died). The outbreak is a useful teaching case because it shows the full FIORP machinery in motion and makes the role of whole-genome sequencing visible.
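With only one death among 42 cases, the case fatality estimate is imprecise. A minimal sketch of the arithmetic, using the counts quoted above and an exact binomial interval as one reasonable (assumed) choice of method:

```r
deaths <- 1    # deaths among confirmed Canadian cases, as quoted above
cases  <- 42   # confirmed Canadian cases

cfr <- deaths / cases
ci  <- binom.test(deaths, cases)$conf.int   # exact (Clopper-Pearson) 95% interval

# Express the point estimate and interval bounds as percentages.
round(100 * c(cfr = cfr, lower = ci[1], upper = ci[2]), 1)
```

The interval is wide relative to the point estimate, which is why a figure such as "about 2%" from a single outbreak should be read as an order of magnitude rather than a precise severity measure.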

Steps 2–3: Establishing the outbreak and verifying the diagnosis

PulseNet Canada flagged a cluster of E. coli O157:H7 isolates with matching pulsed-field gel electrophoresis (PFGE) patterns, later confirmed by whole-genome sequencing (WGS) to share a tight phylogenetic neighbourhood. The genomic signal is what made this an investigable outbreak rather than a scattered set of unrelated cases; without WGS, the same cases would have been distributed across the routine STEC surveillance baseline.

Steps 4–6: Case definition and descriptive epidemiology

Confirmed cases were Canadian residents with WGS-matched E. coli O157:H7 isolates and symptom onset between mid-November 2017 and the closing date. Probable cases were household contacts of a confirmed case with compatible symptoms. The descriptive analysis (epi curve, geographic distribution, age and sex distribution) showed an irregular temporal pattern with several waves, consistent with a continuous common-source outbreak driven by an ongoing contaminated supply chain rather than a single point exposure.
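The contrast between a point exposure and a continuing common source is easiest to see side by side. The sketch below simulates both patterns under invented assumptions (a log-normal incubation period with a median of about 3 days, 60 cases each, and a 40-day exposure window for the continuing source); none of these values describe the real outbreak.

```r
set.seed(7)

# Assumed incubation distribution: log-normal, median ~3 days (hypothetical).
incubation <- function(n) rlnorm(n, meanlog = log(3), sdlog = 0.4)

# Point source: everyone exposed on day 0.
point_onset <- 0 + incubation(60)

# Continuing common source: exposures spread over 40 days.
cont_onset <- runif(60, min = 0, max = 40) + incubation(60)

# Draw the two epi curves on a shared time axis.
op <- par(mfrow = c(2, 1))
hist(point_onset, breaks = seq(0, 60, 1),
     main = "Point source", xlab = "Day of symptom onset")
hist(cont_onset, breaks = seq(0, 60, 1),
     main = "Continuing common source", xlab = "Day of symptom onset")
par(op)

# The spread of onset days is much wider for the continuing source.
c(point = IQR(point_onset), continuing = IQR(cont_onset))
```

A single tight peak within roughly one incubation period points toward a point exposure; onsets spread over many incubation periods, as in the romaine outbreak, point toward an exposure that kept happening.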

Steps 7–8: Hypothesis development and the case-case analysis

Standard hypothesis-generating interviews were conducted with confirmed cases using PHAC's Hypothesis Generating Questionnaire (HGQ) for STEC, which lists hundreds of food and exposure variables. Cases reported leafy greens consumption substantially more often than expected based on national consumption patterns. A case-case analysis comparing outbreak cases to historical sporadic STEC cases pointed strongly to romaine lettuce. CFIA conducted parallel traceback investigations from grocery purchases reported by cases.
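The logic of a case-case comparison reduces to a 2×2 table in which the comparison group is made up of other ill people rather than healthy controls. The counts below are hypothetical, chosen only to show the calculation, and are not the figures from the actual investigation.

```r
# Hypothetical counts: romaine exposure among outbreak cases vs. historical
# sporadic STEC cases used as the comparison group.
tab <- matrix(c(30, 10,     # outbreak cases: exposed, unexposed
                40, 80),    # sporadic cases: exposed, unexposed
              nrow = 2, byrow = TRUE,
              dimnames = list(Group   = c("Outbreak", "Sporadic"),
                              Romaine = c("Yes", "No")))

# Cross-product odds ratio of exposure, outbreak vs. sporadic cases.
or_hat <- (tab["Outbreak", "Yes"] * tab["Sporadic", "No"]) /
          (tab["Outbreak", "No"]  * tab["Sporadic", "Yes"])

# Exact test; fisher.test() also reports a conditional estimate of the OR.
ft <- fisher.test(tab)

or_hat
ft$p.value
```

Because both groups were ill and interviewed in similar ways, recall differences tend to be smaller than with healthy controls; the cost is that an exposure common to all STEC cases would not stand out in this comparison.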

Steps 9–10: Refinement, control, and communication

Joint PHAC–CFIA–US-CDC coordination produced a public advisory in late December 2017 urging Canadians in the affected provinces to avoid romaine lettuce. The Canadian outbreak was declared over in mid-January 2018. Importantly, despite intensive traceback, no specific grower or facility could be definitively implicated, a sobering reminder that even good investigations sometimes end without the closure of a single confirmed source.

Three teaching points are worth pulling out of this case. First, the outbreak would not have been detected at all without genomic surveillance; the case counts in any one province in any one week looked like background noise. Second, the case-case study design is a pragmatic alternative to a traditional case-control study when controls are hard to recruit during a fast-moving foodborne investigation; you trade some validity for considerable speed. Third, control measures (the public advisory) were issued before the source was definitively confirmed; precaution is a defensible public-health stance when the harm from waiting (additional cases) outweighs the cost of an advisory that later proves unnecessary.

2.14 Why Surveillance Comes Early, and What You Will Build On It

Surveillance comes early in this course on purpose: every later lesson depends on having data to design against, sample from, measure with, and reason about. The outbreak workflow you will run below previews methods you have not yet formally met, each of which is developed in a later lesson: sampling, questionnaire design for hypothesis-generating interviews, measures of disease frequency such as attack rates and incidence, measures of association including risk ratios and odds ratios, and the vocabulary of validity and confounding. Modern surveillance is also being reshaped by big-data and machine-learning approaches reviewed by Mooney & Pejaver (2018). Treat the 2×2 tables and risk ratios you compute in the R activity as a working preview; the formal derivations and validity diagnostics come in the lessons that follow. The point of doing it now is to ground the rest of the course in a concrete operational setting that needs all of it.

R Activity: Investigating a foodborne outbreak from a line list

The companion R script r-activities/HSCI_341_Lesson_2_Surveillance_and_Outbreak_Investigation.R walks through a canonical foodborne-outbreak workflow on a simulated 200-attendee community potluck (line list phaa_outbreak.csv) plus 60 days of daily new-case counts in the surrounding town (phaa_outbreak_curve.csv). The same datasets are revisited later in this course once measures of association and regression tools have been formally introduced.

# PART A -- read the line list and describe the outbreak
ll <- read.csv("phaa_outbreak.csv", stringsAsFactors = FALSE,
               na.strings = c("", "NA"))
mean(ll$ill, na.rm = TRUE)                  # overall attack rate

# Epidemic curve: histogram of symptom-onset days
hist(ll$onset_day, breaks = seq(0.5, 12.5, 1),
     col = "tomato", main = "Epidemic curve",
     xlab = "Day of symptom onset (days since the potluck)")

# PART B -- 2x2 table for one suspect food (chicken)
tab <- table(Chicken = ll$chicken, Ill = ll$ill)
addmargins(tab)

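# Cell counts; the names avoid masking base::c()
n11 <- tab["1","1"];  n10 <- tab["1","0"]    # exposed: ill, not ill
n01 <- tab["0","1"];  n00 <- tab["0","0"]    # unexposed: ill, not ill

rr <- (n11/(n11+n10)) / (n01/(n01+n00))      # risk ratio (relative risk)
se <- sqrt(1/n11 - 1/(n11+n10) + 1/n01 - 1/(n01+n00))   # SE of log(RR)
lo <- exp(log(rr) - 1.96*se);  hi <- exp(log(rr) + 1.96*se)
round(c(RR = rr, lo = lo, hi = hi), 2)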

chisq.test(tab)                              # or fisher.test(tab) when small

# PART C -- surveillance time series with a 7-day moving average
curve <- read.csv("phaa_outbreak_curve.csv")
curve$ma7 <- stats::filter(curve$new_cases, rep(1/7, 7), sides = 2)
plot(curve$day, curve$new_cases, type = "h", col = "#0B7B6B",
     xlab = "Day", ylab = "New cases", main = "Surveillance time series")
lines(curve$day, curve$ma7, col = "#CC0033", lwd = 2)

# PART D -- attack rates across all suspect foods, ranked by RR
foods <- c("chicken", "potato_salad", "raw_milk_cheese", "green_salad", "punch")
ar_tab <- sapply(foods, function(f) {
  ex <- ll[[f]] == 1
  ar_e <- mean(ll$ill[ex], na.rm = TRUE)
  ar_u <- mean(ll$ill[!ex], na.rm = TRUE)
  round(c(ar_exposed = ar_e, ar_unexposed = ar_u, rr = ar_e/ar_u), 2)
})
t(ar_tab[, order(ar_tab["rr", ], decreasing = TRUE)])   # one row per food, largest RR first

What you should be able to do after this activity: compute attack rates and risk ratios across multiple suspect foods, draw and interpret an epi curve, identify the food with the strongest association, and articulate the limitations (small numbers, confounding by other shared foods, recall bias) before recommending a control action.
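Part D of the script reports point estimates only, so judging whether a food's interval excludes 1 means repeating the Part B calculation food by food. One way to do that is sketched below on a simulated stand-in for the line list; `ll_sim`, `rr_ci()`, and the assumed risks are all hypothetical and are not part of the companion script.

```r
set.seed(341)

# Simulated stand-in for the line list (hypothetical; not phaa_outbreak.csv).
n <- 200
ll_sim <- data.frame(chicken = rbinom(n, 1, 0.5),
                     punch   = rbinom(n, 1, 0.5))
# Assumed truth: chicken raises the risk of illness, punch does not.
ll_sim$ill <- rbinom(n, 1, ifelse(ll_sim$chicken == 1, 0.60, 0.15))

# Risk ratio with a log-scale (Wald) 95% CI, mirroring the Part B formula.
rr_ci <- function(exposed, ill) {
  n1 <- sum(exposed == 1); x1 <- sum(ill[exposed == 1])
  n0 <- sum(exposed == 0); x0 <- sum(ill[exposed == 0])
  rr <- (x1 / n1) / (x0 / n0)
  se <- sqrt(1 / x1 - 1 / n1 + 1 / x0 - 1 / n0)
  c(rr = rr, lo = exp(log(rr) - 1.96 * se), hi = exp(log(rr) + 1.96 * se))
}

res <- sapply(c("chicken", "punch"),
              function(f) rr_ci(ll_sim[[f]], ll_sim$ill))

# One row per food, largest RR first.
round(t(res[, order(res["rr", ], decreasing = TRUE)]), 2)
```

The same function could be applied to the real `ll` columns once missing values have been handled; the sketch assumes complete 0/1 data and at least one ill person in each exposure group.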

Reflect on what you just ran

Use the questions below to interpret the actual numbers, table, and plot you produced. Answer in the matching numbered boxes in your R script.

1. What overall attack rate did mean(ll$ill, na.rm = TRUE) return, and how do you interpret that single number for the potluck?

Model answer: The overall attack rate sits around 0.40 (40%): about 40 of every 100 attendees became ill at the potluck. As a single number it tells you the outbreak is substantial (much larger than background gastroenteritis rates of a few per cent per week); read together with the tight clustering of onsets in the epi curve, it suggests a common-source point exposure. Attack rate is the right epi metric for an outbreak because the cohort is closed (everyone there is at risk), the time window is short, and the outcome is detected by active follow-up.

2. From your ranked sapply() output, which food had the largest RR, and was its 95% CI clearly above 1? In one sentence, name the prime suspect.

Model answer: The food with the largest RR is typically the chicken or potato salad in this simulation; assume chicken with RR ≈ 4.0–4.5 and a 95% CI that clearly excludes 1 (e.g., 2.1–7.5). Note that the sapply() output gives point estimates only; the interval comes from repeating the Part B calculation for that food. The prime suspect is that food: a roughly fourfold higher attack rate among consumers vs. non-consumers, with a CI that does not span the null. The univariable RR is the first-pass diagnostic; multi-food adjustment (next reflection) refines it.

3. Looking at the surveillance plot, on roughly which day does the 7-day moving average peak? What problem does the moving average solve that the raw daily bars do not?

Model answer: The 7-day moving average peaks around day 8–10 of the surveillance window, smoothing the day-to-day jitter that the raw bars show. Daily counts are noisy because of reporting lags, weekend reporting drop-offs, and small daily counts; the moving average filters that high-frequency variation while preserving the underlying trend, making the rising and falling phases of the outbreak visible. It is a standard tool for distinguishing signal from reporting noise in surveillance time series.
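
A 7-day moving average can be sketched in base R with stats::filter(). The daily counts below are hypothetical, and whether the activity used a centred or a trailing window is not stated, so both are shown as an assumption-labelled comparison.

```r
# Hypothetical daily case counts for a 21-day surveillance window
daily <- c(2, 3, 1, 4, 6, 5, 9, 12, 14, 11, 13, 8, 7, 4, 3, 2, 1, 2, 0, 1, 0)

w <- rep(1 / 7, 7)  # equal weights over 7 days

# Centred window: average of days t-3 .. t+3 (NA at both ends)
ma_centred <- as.numeric(stats::filter(daily, w, sides = 2))

# Trailing window: average of days t-6 .. t (NA for the first 6 days)
ma_trailing <- as.numeric(stats::filter(daily, w, sides = 1))

# Day on which each series peaks (which.max ignores NA)
peaks <- c(raw      = which.max(daily),
           centred  = which.max(ma_centred),
           trailing = which.max(ma_trailing))
peaks
```

The design choice matters when reading a dashboard: a trailing average can be computed in real time but its peak lags the centred one by three days, while a centred average needs three days of future data before the latest point can be plotted.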

Reflection

Your ranked attack-rate table from the R activity puts two foods at the top: chicken (RR = 4.1, attack rate 62% vs 15%) and potato salad (RR = 3.4, attack rate 55% vs 16%). Many guests ate both. What additional analytic step would you take to disentangle the two? (Think about stratifying on one food while looking at the other, a technique that this course will formalize as confounding control in a later lesson.) Briefly describe what evidence would convince you that one is the source rather than the other.

Model answer: The right next step is cross-tabulated (stratified) analysis: compute the RR for chicken among potato-salad eaters, the RR for chicken among non-eaters of potato salad, and the symmetric two strata for potato salad. If one food's RR remains elevated within both strata of the other while the other food's RR drops to near 1 when stratified, the persistent food is the source and the other was riding along (confounded by co-consumption). If both ratios remain elevated, joint contamination is plausible. Algebraically this is the Mantel-Haenszel logic that this course develops in a later lesson. For a quick sanity check, look at attack rates within the four joint cells (chicken+salad, chicken only, salad only, neither): if only one food is the source, the both-foods cell should have roughly the same attack rate as that food's "only" cell, and the other food's "only" cell should look like the "neither" cell.
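
The stratified comparison described above can be sketched as follows. Everything here is hypothetical: the cell counts are constructed so that chicken is the true source and potato salad is associated with illness only through co-consumption, and the helper names (rr, stratified_rr, mh_rr) are illustrative, not functions from the activity.

```r
# Hypothetical counts for the four joint cells (80 guests in total)
cells <- data.frame(
  chicken = c(1, 1, 0, 0),
  salad   = c(1, 0, 1, 0),
  n       = c(30, 10, 10, 30),
  n_ill   = c(18, 6, 2, 6)
)

# Expand the cell counts into a one-row-per-guest line list
ll_demo <- do.call(rbind, lapply(seq_len(nrow(cells)), function(i) {
  data.frame(
    chicken = cells$chicken[i],
    salad   = cells$salad[i],
    ill     = rep(c(1, 0), c(cells$n_ill[i], cells$n[i] - cells$n_ill[i]))
  )
}))

# Crude risk ratio: attack rate in exposed / attack rate in unexposed
rr <- function(ill, exposed) mean(ill[exposed == 1]) / mean(ill[exposed == 0])

# Risk ratio for one food within each stratum of the other food
stratified_rr <- function(d, exposure, stratifier) {
  sapply(split(d, d[[stratifier]]), function(s) rr(s$ill, s[[exposure]]))
}

# Mantel-Haenszel summary risk ratio across strata
mh_rr <- function(d, exposure, stratifier) {
  parts <- sapply(split(d, d[[stratifier]]), function(s) {
    e <- s[[exposure]] == 1
    c(num = sum(s$ill[e])  * sum(!e) / nrow(s),
      den = sum(s$ill[!e]) * sum(e)  / nrow(s))
  })
  sum(parts["num", ]) / sum(parts["den", ])
}

crude <- c(chicken = rr(ll_demo$ill, ll_demo$chicken),
           salad   = rr(ll_demo$ill, ll_demo$salad))
chicken_by_salad <- stratified_rr(ll_demo, "chicken", "salad")
salad_by_chicken <- stratified_rr(ll_demo, "salad", "chicken")
adjusted <- c(chicken = mh_rr(ll_demo, "chicken", "salad"),
              salad   = mh_rr(ll_demo, "salad", "chicken"))

round(rbind(crude, adjusted), 2)
```

In this constructed example the crude RR for salad is above 1, but it falls to 1 within each chicken stratum, while the chicken RR is unchanged within each salad stratum. Real outbreak data are rarely this clean, and sparse strata make the stratum-specific ratios unstable.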

HSCI 341 · Lesson 2

Glossary: Key Terms, People & Concepts

Foundations: What Surveillance Is and the Four System Types

The Phone Call That Drives the Course

Foundations: What Surveillance Is and the Four System Types

What surveillance is, and is not

The surveillance action loop

Passive, active, sentinel, syndromic

Passive

Active

Sentinel

Syndromic

Matching types to real systems

Passive

Active

Sentinel

Syndromic

What to take into the next section

Introduction and Overview

Learning Objectives

2.1 What Public-Health Surveillance Is, and What It Is For

The five purposes of surveillance

2.2 The Surveillance Action Loop

2.3 The Four Surveillance System Types

Why “passive” under-reporting is not random

2.4 The Notifiable Disease Reporting Flow in Canada

Reflection

Canadian Surveillance Products and Data Infrastructure

Four PHAC headline products

CNDSS

FluWatch

CCDSS

CVSD

BCCDC dashboards worth knowing

The long-running layer everything reads from

Vital statistics

Cancer registries

Health-admin data

CCHS + wastewater

Five dimensions to interrogate every source

Speed vs completeness

Sensitivity vs specificity

What to take into the next section

Introduction and Overview

Learning Objectives

2.5 The Federal Layer

2.6 The Provincial Layer (with BC examples)

A short tour of BCCDC dashboards worth bookmarking

2.7 The Long-Running Data Infrastructure

2.8 Data Quality: Five Dimensions to Question Every Source

Reflection

Defining and Investigating Outbreaks

Cluster, outbreak, epidemic, pandemic

Cluster

Outbreak

Epidemic

Pandemic

Steps 1 through 6

Reading the curve, then testing hypotheses

Three federal partners, one protocol

PHAC

CFIA

Health Canada

What to take into the next section

Introduction and Overview

Learning Objectives

2.9 What Counts as an Outbreak?

Three working terms you need to use precisely

2.10 The CDC 10-Step Outbreak Investigation Framework

2.11 The Canadian FIORP and Its Multi-Jurisdictional Structure

2.12 Real-Time vs Retrospective: A Standing Tension

Equity in surveillance and outbreak response

Reflection

Case Study and Hands-on Outbreak Investigation