# Lesson 2 — Surveillance & Outbreak Investigation (v3 expanded)

*Companion-podcast transcript • Sarah & Kiffer* 
*~5,500 words • ~30 min audio*

---

**Sarah:** Welcome back to Office Hours. I'm Sarah.

**Kiffer:** And I'm Kiffer. Today we're working through Lesson 2, Surveillance and Outbreak Investigation. This lesson comes early in the course on purpose, because it gives you a working picture of what an epidemiologist actually does day to day before we dive deeper into the design and measurement machinery.

**Sarah:** It's the operational anchor for the course. The rest of this material will fill in study design, measures of disease frequency, screening, measures of association, validity, and confounding. But by starting here, we get to see why those tools matter, because they are what surveillance and outbreak investigation actually run on.

**Kiffer:** Right. Most of the course is going to be about how to design a study or appraise a paper. Today we're talking about how a working epidemiologist spends a lot of their day. Two main activities. Public health surveillance, the ongoing monitoring of disease in populations. And outbreak investigation, the structured response when surveillance flags something unusual.

**Sarah:** And these are connected. Surveillance is what tells you something might be wrong. Outbreak investigation is what you do once it does.

**Kiffer:** Exactly. Section 1 of the lesson is surveillance. Section 2 is outbreak investigation. We'll do both, in that order, with concrete Canadian and American examples and a worked outbreak case study at the end.

**Sarah:** Before we dive in, let me set the scene with the opening vignette from the lesson, because I think it captures why this material matters. Imagine you're the duty epidemiologist at a regional health authority on a Tuesday afternoon. Your phone buzzes. A pediatrician at a community clinic just called. She has seen four children from the same school presenting with bloody diarrhea over two days. She wants to know if you've seen anything similar elsewhere.

**Kiffer:** And what's striking about that scenario is the questions you have to answer in the next hour. Is four cases unusual for this organism in this catchment? Has a notifiable disease report already been filed? Who else needs to be looped in? What data sources can you pull in real time?

**Sarah:** And the infrastructure that lets you answer those questions, in any reasonable amount of time, is what we call public health surveillance. The structured response that follows is outbreak investigation. So those two activities are the spine of the lesson.

**Kiffer:** Okay. Let's start with surveillance. Section 1.

**Sarah:** The classic working definition, attributed originally to Alexander Langmuir at the United States Centers for Disease Control in the nineteen-sixties, and refined later by the World Health Organization, goes like this. Surveillance is the ongoing, systematic collection, analysis, interpretation, and dissemination of health-related data, for the planning, implementation, and evaluation of public health practice.

**Kiffer:** And a quick note on Langmuir. He was the founder of the Epidemic Intelligence Service at the United States Centers for Disease Control, and he is widely credited with formalizing modern surveillance as a public health activity. The Epidemic Intelligence Service is essentially a training program for outbreak investigators, the disease detectives. So when we say Langmuir, we mean the person who basically invented this field as we know it.

**Sarah:** And the definition is doing real work, every word of it. Ongoing distinguishes surveillance from a one-time study. Systematic rules out anecdote. Analysis and interpretation rule out a passive data warehouse. And dissemination for action rules out research that just sits on a server somewhere.

**Kiffer:** Langmuir had a famous shorthand. Surveillance only counts if it closes the loop. Information has to come back out as decisions, alerts, programs. Otherwise the system is just bookkeeping.

**Sarah:** I love that phrase. Closes the loop. Because you can imagine a system that collects beautiful data and never produces an action. And by Langmuir's standard, that's not surveillance, that's just data collection.

**Kiffer:** Right. Surveillance is defined by the action it enables, not by the data it stores.

**Sarah:** Surveillance has roughly five purposes. One, detect outbreaks, clusters, and unusual events early. Two, characterize who is getting sick, where, and why, the descriptive epidemiology by person, place, and time. Three, monitor trends in incidence, prevalence, and risk factors over time. Four, evaluate the effect of interventions and programs. And five, plan resource allocation and policy.

**Kiffer:** And the same dataset can serve more than one purpose, but the design tradeoffs are different. A system that's optimized for early detection looks different from a system optimized for long-run trend monitoring.

**Sarah:** Now the lesson breaks surveillance systems into four classic types. Passive, active, sentinel, and syndromic. They are not mutually exclusive. A single disease can be tracked by several at once. But they differ on a single axis. Who is doing the work of finding cases.

**Kiffer:** Let's go through them. Passive first.

**Sarah:** Passive surveillance is the default for most reportable diseases. Clinicians and laboratories report cases when they encounter a notifiable condition, through routine channels. It's cheap, it's broad, it's mandated by public health legislation, and it runs continuously.

**Kiffer:** And the limitation is under-reporting. Often substantial. Completeness depends on clinician burden. A busy emergency room physician at the end of a shift may not file the form. The Canadian example is the Canadian Notifiable Disease Surveillance System. We'll just say its initials, CNDSS, after this. CNDSS aggregates case counts for roughly fifty reportable conditions submitted by provinces to the Public Health Agency of Canada, which we'll call PHAC.

**Sarah:** Active surveillance is the opposite. Public health staff actively contact providers, labs, or households to find cases. So during a Covid case investigation, a contact tracer phoning every named contact and asking about symptoms, that's active surveillance.

**Kiffer:** Higher case ascertainment, better data quality, very useful in outbreak investigations. But resource-intensive, narrow scope, and hard to sustain over time. The Canadian example is the active-search component of FluWatch. About a hundred clinicians who get directly contacted to confirm influenza-like-illness data each week.

**Sarah:** Sentinel surveillance is a middle path. A small designated network of providers reports systematically. So you trade breadth for depth. You're not trying to count every case. You're trying to follow trends from a representative sample with very high data quality.

**Kiffer:** The classic Canadian example is the FluWatch sentinel practitioners. About a hundred and fifty family-medicine clinicians across the country report weekly influenza-like-illness rates. Another example is the Canadian Paediatric Surveillance Program, where pediatricians report rare childhood conditions through a national network.

**Sarah:** And syndromic surveillance is the fourth type. Real-time signals based on chief complaint codes, emergency-medical-service calls, over-the-counter drug sales, school absences, that sort of thing. Things that flag clusters before lab confirmation.

**Kiffer:** So instead of waiting for a confirmed measles diagnosis, you might watch for the syndrome of fever plus rash showing up in emergency department triage. The advantage is speed. You can detect events before a definitive diagnosis. The disadvantage is low specificity. Lots of false alarms. Validation is hard.

**Sarah:** An example would be British Columbia's Acute and Communicable Disease Prevention emergency-department chief-complaint monitoring. Or PHAC's pandemic-era wastewater surveillance dashboards, which sample sewage for SARS-CoV-2 and other pathogens and flag rises before clinical cases climb.

**Kiffer:** And there's a fifth thing the lesson mentions, laboratory-based surveillance, which sits beside rather than under the clinician's desk. Provincial public-health labs aggregate isolates from clinical labs, do serotyping or whole-genome sequencing, and feed results into the other systems. PulseNet Canada is the network for foodborne-pathogen sequencing. We'll come back to it in the outbreak case study.

**Sarah:** Now there's a really important point about passive surveillance under-reporting that students sometimes miss. Under-reporting in passive surveillance does not just make case counts smaller. It biases who is counted.

**Kiffer:** Right. Cases that present to the health system, get tested, and produce a positive lab result are systematically over-represented. The result is that severe cases, urban cases, and cases in well-insured populations are more visible in the data than mild cases, rural cases, or cases in marginalized populations.

**Sarah:** When you read a CNDSS rate, the denominator is the catchment population. But the numerator is selected. And that selection is not random. It tracks access to care.

**Kiffer:** So a passive surveillance count is a partial picture. Always. And the unevenness of the partial picture matters for equity.
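
A minimal sketch of that selection effect, using invented numbers rather than any real CNDSS figures. The `Group` class, the two populations, and the ascertainment fractions are all assumptions made up for illustration; the point is only that equal true rates can produce unequal reported rates when access to care differs.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    population: int        # catchment population (the denominator)
    true_cases: int        # cases that actually occurred (unknowable in practice)
    ascertainment: float   # hypothetical fraction of true cases that get reported

    def true_rate(self) -> float:
        """True cases per 100,000 population."""
        return self.true_cases / self.population * 100_000

    def reported_rate(self) -> float:
        """Reported cases per 100,000: the numerator is filtered by ascertainment."""
        return self.true_cases * self.ascertainment / self.population * 100_000


# Hypothetical values: same underlying rate, different access to testing.
groups = [
    Group("urban", population=500_000, true_cases=250, ascertainment=0.60),
    Group("rural", population=100_000, true_cases=50, ascertainment=0.20),
]

for g in groups:
    print(f"{g.name}: true {g.true_rate():.1f}, reported {g.reported_rate():.1f} per 100,000")
```

With these made-up inputs both groups have the same true rate, about 50 per 100,000, but the reported rates come out near 30 and 10, so the surveillance data would suggest a threefold difference that does not exist.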

**Sarah:** Let's walk through the Canadian notifiable-disease reporting flow, because it's worth knowing concretely. A clinician or laboratory identifies a case of a notifiable disease, like a positive shiga-toxin-producing E. coli culture, or a clinical diagnosis of measles. They report to the local Medical Officer of Health, often within twenty-four hours for urgent conditions.

**Kiffer:** The Medical Officer of Health is a physician with a public health mandate at the regional level. They receive the report, may immediately initiate case investigation, contact tracing, or control measures. Then the regional level passes the report to the provincial public health authority. In British Columbia that's the BC Centre for Disease Control, or BCCDC.

**Sarah:** The province aggregates reports, does initial analysis, and shares anonymized aggregate data with the federal level, which is PHAC. PHAC publishes nationally aggregated counts in CNDSS and feeds disease-specific products like FluWatch.

**Kiffer:** And PHAC also reports onward to the World Health Organization under the International Health Regulations, for events of international concern. So that's the full chain. Clinic to Medical Officer of Health to province to PHAC to World Health Organization.

**Sarah:** Two features of this flow really matter. First, public health legislation is provincial in Canada, under the Constitution Act. So the list of notifiable conditions and the reporting timelines actually differ across provinces. A condition might be urgent-reportable in one province and routine in another.

**Kiffer:** Second, the federal level receives aggregated, de-identified data only. PHAC cannot pull individual records. This federal-provincial division is one reason that the Covid pandemic exposed gaps in real-time data sharing in Canada. The legal architecture was not designed for the speed a respiratory pandemic demands.

**Sarah:** Okay. That's the four system types and the reporting flow. Now Section 1's other big topic, which we'll fold in here, is the Canadian product layer. The actual surveillance dashboards an epidemiologist works with. Let's walk through the federal layer first.

**Kiffer:** PHAC operates several headline products. They are aggregated and curated. The agency cannot dispense individual-level data. Each one has a different cadence, scope, and intended audience.

**Sarah:** We already named CNDSS. It's the flagship passive system. Provinces submit weekly aggregated counts for about fifty nationally notifiable conditions. CNDSS feeds the agency's annual Notifiable Diseases Online reports. It is useful for long-run trends and inter-provincial comparisons. It is less useful for real-time outbreak detection, because of the multi-week lag from clinic to PHAC.

**Kiffer:** FluWatch is the weekly respiratory-virus surveillance product. It's actually a hybrid system. The sentinel network of clinicians, plus lab partners across the country submitting subtyped influenza and now SARS-CoV-2 results, plus provincial outbreak counts. That all feeds the national picture. The weekly FluWatch report is what most public health communicators reach for during respiratory season.

**Sarah:** Tied closely to FluWatch is the Respiratory Virus Detection Surveillance System, or RVDSS. RVDSS aggregates lab-confirmed detections for influenza, respiratory syncytial virus, SARS-CoV-2, and other respiratory pathogens. It's the lab-based companion to the clinical signal in FluWatch.

**Kiffer:** Then there's the Canadian Chronic Disease Surveillance System, CCDSS. This is a federal-provincial collaboration that uses validated case definitions on top of health-administrative records, physician billing claims and hospital discharge abstracts, to estimate prevalence and incidence of chronic conditions like diabetes, hypertension, asthma, dementia, and ischemic heart disease.

**Sarah:** CCDSS is the largest single source of population-level chronic-disease data in Canada. So when you read a Canadian estimate of diabetes prevalence, it is almost certainly coming out of CCDSS.

**Kiffer:** Then the Canadian Vital Statistics Death Database. This is not strictly a surveillance product, but it's the spine of mortality-based surveillance. Statistics Canada compiles every death registered in Canada, with cause coded to the International Classification of Diseases, tenth revision.

**Sarah:** It's what enables life-expectancy estimates, cause-specific mortality trends, and excess-mortality analyses. The Covid pandemic's burden was largely estimated using vital statistics data on excess mortality.

**Kiffer:** PHAC also runs targeted specialty systems. There's a tuberculosis system, antimicrobial resistance systems, opioid-harm systems, and others. Alongside those sits the Canadian Cancer Registry, which Statistics Canada maintains together with the provincial and territorial registries, and which we'll come back to in a second.

**Sarah:** The Canadian Cancer Registry is worth a special mention because it's one of the few Canadian systems with active follow-up and nearly complete case ascertainment. Provincial cancer registries feed into a national aggregate. Cancer is reportable, and the registries chase down completeness in ways most other systems don't.

**Kiffer:** Then the British Columbia provincial layer, anchored by BCCDC. The federal products are valuable for the country-level view, but most case-level work happens provincially. BCCDC operates many publicly available dashboards. The respiratory pathogens dashboards for influenza, RSV, and SARS-CoV-2 with regional breakdowns. The enteric pathogens dashboards for Salmonella, Campylobacter, shiga-toxin-producing E. coli, and Listeria. Cancer-incidence dashboards. Sexually transmitted infection quarterly reports.

**Sarah:** And really worth flagging, the BC Coroners Service produces the unregulated-toxic-drug-supply mortality reports, which BCCDC co-publishes. Those reports have been the central surveillance product for the toxic-drug-supply crisis in British Columbia. They also report on suicide deaths. So when you cite a BC overdose-death rate or a BC suicide rate, the BC Coroners Service is the source.

**Kiffer:** And then the Canadian Institute for Health Information, or CIHI. CIHI is not technically a surveillance system, but it stewards a lot of the underlying data. The Discharge Abstract Database for hospital admissions. The National Ambulatory Care Reporting System for emergency department visits. CCDSS sits on top of these. And many academic studies do too.

**Sarah:** So when you cite a Canadian rate, always state which system produced it. CNDSS, CCDSS, FluWatch, RVDSS, the Cancer Registry, Vital Statistics, BCCDC dashboards, the BC Coroners Service, or CIHI. Each has its own case definition, its own denominator, and its own reporting lag.

**Kiffer:** And those three things, case definition, denominator, and lag, drive almost all the differences in the numbers you see across systems. The journalist who cites the highest number does not necessarily have the most accurate count. They may just have the count from the system with the broadest case definition or the fastest publication schedule.

**Sarah:** Which connects directly to the next big idea. How do you evaluate a surveillance system? The United States Centers for Disease Control, CDC, has published a framework with nine attributes for evaluating surveillance systems. They are worth knowing by name.

**Kiffer:** Let's go through them. One, simplicity. Is the system structurally simple, or does it require complex machinery to operate? Two, flexibility. Can it adapt when a new disease appears or the case definition changes? Three, data quality. Are the entries complete and valid?

**Sarah:** Four, acceptability. Are the people who feed data into the system willing to keep doing so? A surveillance system that asks too much of clinicians eventually breaks down.

**Kiffer:** Five, sensitivity. The proportion of true cases the system actually captures. Six, predictive value positive, or PV plus. Of the cases the system flags, the proportion that are real. Seven, representativeness. Does the system describe the affected population accurately?

**Sarah:** Eight, timeliness. How fast does information flow from event to product? And nine, stability. Is the system reliably available and operational?

**Kiffer:** And there's a really important relationship between sensitivity and predictive value positive. In practice they trade off against each other, especially for syndromic systems. If you make a system very sensitive, very willing to flag possible cases, you also make it produce more false alarms, so PV plus goes down.

**Sarah:** And the inverse. If you make a system more specific, more conservative about flagging, fewer false alarms, but you miss real events. Sensitivity goes down.

**Kiffer:** And the optimal balance depends entirely on what the system is for. For early detection of a serious emerging pathogen, you tolerate more false alarms in exchange for catching real events fast. For a long-run prevalence estimate of a chronic condition, you want the opposite.

**Sarah:** And this trade-off has a familiar shape that you'll see again later in the course. It's the same logic as sensitivity and specificity in screening tests, which we'll cover in Lesson 5. You're tuning a threshold, and where you set it depends on the costs of being wrong in either direction.

**Kiffer:** Right. The math is the same. Just applied to populations and time-series rather than to individual patients.
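
A minimal sketch of that threshold trade-off, assuming a toy syndromic system that assigns each day an alert score. The `signals` list and the `evaluate` helper are hypothetical, invented only to show how sensitivity and PV plus move in opposite directions as the flagging threshold changes.

```python
# Hypothetical (alert score, was a real event under way?) pairs. Not real data.
signals = [
    (9, True), (8, True), (7, False), (6, True), (5, False),
    (5, True), (4, False), (3, False), (3, True), (2, False),
]


def evaluate(threshold: int) -> tuple[float, float]:
    """Return (sensitivity, predictive value positive) when flagging scores >= threshold.

    Assumes at least one score is flagged and at least one real event exists;
    otherwise the divisions below would fail.
    """
    true_pos = sum(1 for score, real in signals if score >= threshold and real)
    false_pos = sum(1 for score, real in signals if score >= threshold and not real)
    false_neg = sum(1 for score, real in signals if score < threshold and real)
    sensitivity = true_pos / (true_pos + false_neg)   # share of real events caught
    pvp = true_pos / (true_pos + false_pos)           # share of flags that are real
    return sensitivity, pvp


for threshold in (3, 6, 8):
    sens, pvp = evaluate(threshold)
    print(f"flag at >= {threshold}: sensitivity {sens:.2f}, PV plus {pvp:.2f}")
```

With these invented scores, lowering the threshold from 6 to 3 raises sensitivity from 0.60 to 1.00 while PV plus falls from 0.75 to about 0.56.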

**Sarah:** Okay. That covers Section 1. Surveillance is structured ongoing monitoring. Four system types. Major Canadian products. Nine evaluation attributes. Should we move into Section 2?

**Kiffer:** Yeah. Section 2. Outbreak investigation.

**Sarah:** An outbreak is the occurrence of cases of a disease in excess of what would normally be expected, in a defined population, place, and time. Each of those phrases is doing work.

**Kiffer:** More than expected presupposes a baseline. The local five-year average for influenza in the same week, or a seasonal threshold from a regression model. Population, place, time says outbreaks are local. Thirty cases of Campylobacter across Canada in a week is not unusual. Thirty cases at one wedding is.

**Sarah:** Four working terms you need to use precisely. Cluster, an aggregation of cases in time or space that may or may not be statistically unusual. Outbreak, a cluster judged to exceed the expected baseline. Epidemic, a large-scale outbreak, often across multiple jurisdictions. And pandemic, an epidemic with worldwide geographic spread.

**Kiffer:** And one common student error. Pandemic is a geographic claim, not a severity claim. The World Health Organization declares a pandemic based on geographic spread. Severity is a separate dimension. You can have a mild pandemic, in principle, and you can have a very severe outbreak that never becomes pandemic.

**Sarah:** The decision to call something an outbreak is partly statistical and partly operational. Statistically, you compare current counts to a baseline distribution and apply a threshold. Operationally, an outbreak declaration mobilizes resources and triggers public communication. Authorities are reasonably cautious about both over-calling and under-calling.
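
A minimal sketch of the statistical half of that decision. It assumes a simple rule, mean plus two standard deviations of the same-week counts from the previous five years; that rule, the `exceeds_baseline` helper, and the counts are illustrative assumptions, not the method any particular agency uses.

```python
from statistics import mean, stdev


def exceeds_baseline(current: int, historical: list[int], z: float = 2.0) -> bool:
    """Flag the current count if it is above mean + z * SD of the historical counts.

    The mean-plus-z-SD rule is an assumption for illustration; real systems may
    use regression-based seasonal thresholds or other aberration-detection methods.
    """
    threshold = mean(historical) + z * stdev(historical)
    return current > threshold


# Hypothetical counts for the same calendar week in each of the last five years.
same_week_last_five_years = [3, 5, 4, 6, 2]

print(exceeds_baseline(4, same_week_last_five_years))   # an ordinary week
print(exceeds_baseline(12, same_week_last_five_years))  # a week worth a closer look
```

Here the baseline mean is 4 and the threshold works out to roughly 7.2, so a count of 4 is not flagged and a count of 12 is. A flag is only the starting point; the operational judgment about whether to declare an outbreak still sits with the investigators.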

**Kiffer:** And the methods are reasonably standardized. The reference framework most North American epidemiologists learn is the CDC's ten-step process from the Field Epidemiology Manual. The steps look orderly on paper, but in practice you often work several at once and revisit earlier ones as new information arrives. Let's walk through them.

**Sarah:** Step one, prepare for fieldwork. Before you arrive, you confirm authority and roles. You brief on the suspected cause. You gather supplies, case-report forms, lab kits, personal protective equipment. You identify local liaisons, the Medical Officer of Health, environmental health, the lab. Preparation is the step new investigators most often skip and most often regret.

**Kiffer:** Step two, establish the existence of an outbreak. Compare current counts to a baseline. If the baseline is unstable, small denominators, seasonal variation, you state how you constructed it. And ruling out artifact is part of this step. A new lab test, a reporting policy change, a clinician on a reporting kick. All of those can produce an apparent outbreak that isn't real.

**Sarah:** Step three, verify the diagnosis. Talk to clinicians. Review charts. Check that lab results are correctly attributed. A pseudo-outbreak driven by a contaminated lab reagent or a misclassification is not unheard of.

**Kiffer:** Step four, construct a working case definition. A case definition has three parts. Clinical criteria, the symptoms, signs, lab tests. Person, place, time restrictions, like attendees of the August twelfth potluck. And a level of certainty, suspect, probable, confirmed.

**Sarah:** And worth expanding on that, because it's important. Cases are graded by certainty. Confirmed cases meet the full clinical and laboratory criteria. Probable cases meet most criteria but lack final lab confirmation. Suspect cases meet looser clinical criteria, often used for early case-finding before testing is complete.

**Kiffer:** And the case definition is expected to evolve. You start broad to find cases, then tighten as you learn more. That is normal. Not a sign you got it wrong the first time.
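
A minimal sketch of how a working case definition might be encoded, built around the potluck example. The `Report` fields, the onset window, the year, and the specific clinical criteria are all hypothetical; a real definition would come from the investigation team and would be revised as it evolves.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Report:
    attended_potluck: bool   # person/place restriction (hypothetical event)
    onset: date              # date of symptom onset
    bloody_diarrhea: bool    # stricter clinical criterion (assumed)
    any_diarrhea: bool       # looser clinical criterion (assumed)
    lab_confirmed: bool      # e.g. outbreak organism isolated from a stool sample


# Hypothetical time restriction: onset within ten days of the August 12 potluck.
WINDOW_START = date(2024, 8, 12)
WINDOW_END = date(2024, 8, 22)


def classify(r: Report) -> str:
    """Grade a report as confirmed, probable, suspect, or not a case."""
    in_scope = r.attended_potluck and WINDOW_START <= r.onset <= WINDOW_END
    if not in_scope:
        return "not a case"
    if r.lab_confirmed and r.any_diarrhea:
        return "confirmed"    # clinical and laboratory criteria both met
    if r.bloody_diarrhea:
        return "probable"     # strong clinical picture, no lab confirmation yet
    if r.any_diarrhea:
        return "suspect"      # looser criteria, useful for early case-finding
    return "not a case"


print(classify(Report(True, date(2024, 8, 14), True, True, False)))
```

Tightening the definition later amounts to changing the constants or the criteria inside `classify` and re-running it over the same reports.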

**Sarah:** Step five, find cases systematically and record information. Active case finding through chart review, asking clinicians, contacting attendees of an implicated event. The product is a line list. One row per case, with demographic, clinical, exposure, and outcome variables. The line list is your working dataset for everything that follows.

**Kiffer:** Step six, perform descriptive epidemiology. The classic person, place, time triad. The flagship visualization is the epidemic curve, often shortened to epi curve. A histogram of case counts by date of symptom onset.

**Sarah:** And the shape of the epi curve is genuinely diagnostic. It constrains your hypotheses about the exposure. Let me walk through the three main shapes.

**Kiffer:** Please.

**Sarah:** A steep rise to a single sharp peak, then a tail, with the cases falling within roughly one incubation period, suggests a point-source outbreak. Everyone was exposed in a single brief window, like a wedding meal. The shape mirrors the incubation-period distribution.

**Kiffer:** A plateau, a sustained elevated level over time, suggests continued common-source exposure. Like a contaminated water supply that keeps producing new cases until the source is identified and shut down.

**Sarah:** And multiple peaks, repeating waves spaced by roughly one incubation period, suggest secondary transmission. Person-to-person spread. The first wave infects contacts who become the second wave who infect more contacts.

**Kiffer:** So the shape of the curve is genuinely informative. A new investigator can sometimes guess the transmission mode from the curve alone, before the analytic study is done.
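
A minimal sketch of building an epi curve from a line list, assuming the only field needed is each case's date of symptom onset. The onset dates and the `epi_curve` helper are hypothetical, and the text histogram stands in for a proper chart.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical onset dates pulled from a line list, one entry per case.
onsets = [
    date(2024, 8, 13), date(2024, 8, 14), date(2024, 8, 14), date(2024, 8, 14),
    date(2024, 8, 15), date(2024, 8, 15), date(2024, 8, 17),
]


def epi_curve(onset_dates: list[date]) -> list[tuple[date, int]]:
    """Count cases per day of onset, keeping zero-case days between first and last onset."""
    counts = Counter(onset_dates)
    day, last = min(onset_dates), max(onset_dates)
    curve = []
    while day <= last:
        curve.append((day, counts[day]))  # Counter returns 0 for days with no cases
        day += timedelta(days=1)
    return curve


for day, n in epi_curve(onsets):
    print(f"{day.isoformat()} {'#' * n}")
```

Keeping the zero-count days matters: gaps between waves are part of what distinguishes a point-source shape from repeated peaks.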

**Sarah:** Plus you describe by place, mapping cases geographically. And by person, looking at age, sex, occupation, and other demographics. The classic time-place-person triad.

**Kiffer:** Step seven, develop hypotheses. From the descriptive epidemiology and from open-ended interviews of cases, you generate plausible exposures. A specific food. A specific water source. A specific event. A specific procedure. Good hypotheses are testable with the data you have or can collect.

**Sarah:** Step eight, evaluate hypotheses with an analytic study. The two workhorse designs in outbreak settings are the retrospective cohort, when you can enumerate everyone who attended an event, like a wedding-guest list, and the case-control study, when you cannot.

**Kiffer:** And the two-by-two tables, risk ratios, and odds ratios you will compute later in the course are exactly what you compute here. Outbreak investigation is not a new method. It's a familiar method stack applied under time pressure. Cohort and case-control designs are covered elsewhere in this series, and they come up again here. We'll build measures of association in Lesson 6 of this material, and they show up directly in step eight.
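
A minimal sketch of the two-by-two arithmetic, assuming a retrospective cohort of wedding guests. The cell counts are invented, and the function names are just labels for this example.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Risk in the exposed divided by risk in the unexposed.

    a = exposed and ill, b = exposed and well,
    c = unexposed and ill, d = unexposed and well.
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio, the measure used when only cases and controls are sampled."""
    return (a * d) / (b * c)


# Hypothetical cohort: 40 guests ate the implicated dish, 60 did not.
a, b, c, d = 30, 10, 6, 54

print(f"risk ratio: {risk_ratio(a, b, c, d):.1f}")
print(f"odds ratio: {odds_ratio(a, b, c, d):.1f}")
```

With these made-up counts the attack rate is 75 percent among guests who ate the dish and 10 percent among those who did not, giving a risk ratio of about 7.5 and an odds ratio of 27. The odds ratio sits well above the risk ratio here because illness is common in this cohort.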

**Sarah:** Step nine, reconsider and refine hypotheses. The first analytic pass often points to several plausible exposures. You take that ambiguity, do environmental sampling, traceback investigations, lab characterization through genome typing, and you sharpen the inference.

**Kiffer:** And the reconciliation with laboratory and environmental studies is part of step nine in the lesson framing. The epidemiologic evidence and the laboratory and environmental evidence have to agree, or you have a problem you need to explain.

**Sarah:** Step ten, implement control and prevention measures. Once the source is identified, or even before, you act. Recall a contaminated product. Close a venue. Issue prophylaxis. Isolate cases.

**Kiffer:** And here's a really important point. Control measures often happen before step eight. The precautionary principle does not require you to wait for a p-value. If the cost of acting on a wrong hypothesis is small, and the cost of failing to act on a right one is large, you act early. The romaine lettuce advisory we'll talk about in a minute is a clear example.

**Sarah:** And then maintain or initiate surveillance to track whether your control measures worked. New cases falling, new cases stopping, new cases continuing. That tells you if the intervention worked.

**Kiffer:** And finally, communicate findings. Both during the investigation, to inform the public-health response, and after, through the final outbreak report, to share what was learned. Communication runs throughout, not just at the end.

**Sarah:** That's the ten steps, with follow-up surveillance and communication running alongside them. Now let's make this concrete with the worked example the lesson uses. The nineteen-seventy-six Legionnaires' disease outbreak in Philadelphia.

**Kiffer:** Quick context. In July of nineteen-seventy-six, the American Legion, which is a United States veterans' service organization, held its annual convention at the Bellevue-Stratford Hotel in Philadelphia, Pennsylvania. About four thousand attendees. Within days, hundreds were sick. By the end, two hundred and twenty-one cases of severe pneumonia, thirty-four deaths.

**Sarah:** A really alarming cluster. People started getting sick within days of the convention. Severe pneumonia, often requiring hospitalization. The case fatality rate was about fifteen percent. Investigators from CDC and the Pennsylvania Department of Health were on it almost immediately.
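
The fifteen percent figure follows directly from the counts quoted above, taking the 221 cases as the denominator and the 34 deaths as the numerator:

$$
\text{case fatality} = \frac{\text{deaths}}{\text{cases}} = \frac{34}{221} \approx 0.154 \approx 15\%
$$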

**Kiffer:** And the early hypotheses were wrong. The initial suspicion was a respiratory virus, possibly a novel influenza strain, given the recent swine flu concerns. Or alternatively, a chemical exposure, possibly a nickel-carbonyl poisoning or sabotage.

**Sarah:** Investigators systematically ruled out each of those. They did extensive case-finding, line-listing every attendee with confirmed pneumonia. They built an epi curve, which showed a point-source pattern. They did case-control analyses comparing attendees who got sick to attendees who did not.

**Kiffer:** And they collected environmental samples. Air samples. Water samples. Surface samples from the hotel. Then the CDC laboratory team, working with samples from autopsy lung tissue and environmental samples, eventually isolated a previously unknown bacterium.

**Sarah:** It took about six months. The bacterium was named Legionella pneumophila. Pneumophila roughly meaning lung-loving. The investigators traced its transmission to the hotel's air-conditioning cooling tower, which had aerosolized contaminated water and spread the bacterium through the hotel ventilation system.

**Kiffer:** And what's amazing in retrospect is that Legionella pneumophila turned out to be a not-uncommon environmental bacterium that thrives in warm-water systems. Cooling towers. Hot tubs. Plumbing systems. Once the organism was characterized, public health labs could go back and find it in earlier outbreaks of unexplained pneumonia. It had been there all along.

**Sarah:** There are three lessons from Legionnaires that the lesson really wants to land. First, initial hypotheses are often wrong. Respiratory virus, chemical exposure, sabotage. None of those were right. The systematic methods are what got investigators to the answer, not the initial intuition.

**Kiffer:** Second, patience and systematic methods matter. Six months from outbreak to organism. That is a long time when people are dying. But the alternative, declaring a cause prematurely, would have been worse.

**Sarah:** And third, new pathogens can emerge in seemingly familiar settings. A hotel in Philadelphia. An air-conditioning cooling tower. The Legionnaires' outbreak helped establish that the universe of human pathogens is larger than the textbook list, and that environmental sources can produce unexpected outbreaks.

**Kiffer:** And the lesson notes a quick contemporary parallel that's worth flagging. The twenty-seventeen to twenty-eighteen multi-province E. coli outbreak in Canada, eventually traced to romaine lettuce. PulseNet Canada, the genome-sequencing network we mentioned earlier, was what allowed scattered cases across provinces to be linked into a single investigable cluster. Without whole-genome sequencing, the cases would have looked like background noise.

**Sarah:** And in that outbreak the public advisory was issued before the source was definitively confirmed. Precaution is a defensible public-health stance when the asymmetry favors acting early.

**Kiffer:** Right. The cost of being wrong, in either direction, is rarely symmetric. Falsely accusing a food product hurts producers. Failing to recall a real contaminant hurts consumers. Experienced investigators learn to act on confident-enough evidence, communicate uncertainty honestly, and revise control measures as data evolve.

**Sarah:** And there's an equity dimension to this we should mention. Surveillance systems do not see all populations equally. Data quality, case ascertainment, willingness to be tested, all vary by social position. So outbreak investigators are increasingly expected to ask whose communities are over- or under-represented in the line list, and how to adjust the response accordingly.

**Kiffer:** The Covid pandemic made this question impossible to ignore in Canada. Differential burdens by neighborhood income, by racialized status, by Indigenous identity, were visible in surveillance data once collected, and absent when not. Equitable surveillance is not an add-on. It's part of the system design.

**Sarah:** Okay. Let's pull the takeaways for the lesson, because there's a lot here.

**Kiffer:** Yeah. Let's do them carefully.

**Sarah:** First. Surveillance is the ongoing, systematic collection, analysis, interpretation, and dissemination of health-related data, for the planning, implementation, and evaluation of public health practice. Langmuir's standard is that it has to close the loop. Information must come back out as decisions and actions, otherwise it's just bookkeeping.

**Kiffer:** Second. Four system types. Passive, the routine reporting of notifiable diseases. Active, public health staff actively soliciting cases. Sentinel, a designated network reporting systematically. Syndromic, real-time signals before lab confirmation. Each is suited to different priorities, and most diseases are tracked by more than one.

**Sarah:** Third. Major Canadian surveillance systems. CNDSS for notifiable diseases. CCDSS for chronic disease prevalence built on health-administrative data. FluWatch and RVDSS for respiratory viruses. The Canadian Cancer Registry. Canadian Vital Statistics for mortality. BCCDC dashboards for the BC provincial picture. The BC Coroners Service for unregulated drug deaths and suicide. CIHI for hospital and emergency-department data.

**Kiffer:** And when you cite a Canadian rate, always state which system produced it. Different case definitions, different denominators, different lags. The number you see depends on which system you asked.

**Sarah:** Fourth. CDC's nine attributes for evaluating a surveillance system. Simplicity, flexibility, data quality, acceptability, sensitivity, predictive value positive, representativeness, timeliness, stability. And the central trade-off is sensitivity versus predictive value positive. More sensitive systems catch more real events but also generate more false alarms.
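
To make the sensitivity versus predictive value positive trade-off concrete, here is a minimal Python sketch. The counts are hypothetical, invented purely for illustration, and the two "thresholds" stand in for any change that makes a system more willing to raise an alert.

```python
def sensitivity(true_pos, false_neg):
    """Share of real events that the system detected."""
    return true_pos / (true_pos + false_neg)


def predictive_value_positive(true_pos, false_pos):
    """Share of the system's alerts that turned out to be real events."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts for the same system at two alert thresholds.
# Both scenarios assume 100 real events occurred in total.
scenarios = {
    "strict threshold": {"true_pos": 60, "false_pos": 15, "false_neg": 40},
    "loose threshold": {"true_pos": 90, "false_pos": 210, "false_neg": 10},
}

for name, c in scenarios.items():
    se = sensitivity(c["true_pos"], c["false_neg"])
    pvp = predictive_value_positive(c["true_pos"], c["false_pos"])
    print(f"{name}: sensitivity={se:.2f}, PVP={pvp:.2f}")
# strict threshold: sensitivity=0.60, PVP=0.80
# loose threshold: sensitivity=0.90, PVP=0.30
```

In this made-up example, loosening the threshold catches thirty more real events, but most alerts are now false alarms, which is the trade-off in miniature.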

**Kiffer:** Fifth. An outbreak is the occurrence of cases in excess of what would normally be expected, in a defined population, place, and time. Cluster, outbreak, epidemic, pandemic are graded terms. Pandemic is a geographic claim, not a severity claim.
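
One way to picture "in excess of what would normally be expected" is to compare this week's count against a historical baseline. The sketch below uses a mean plus two standard deviations rule, which is an assumption chosen for simplicity rather than the lesson's method, and the weekly counts are hypothetical. Real systems use more careful baselines that account for seasonality and trend.

```python
from statistics import mean, stdev


def exceeds_expected(history, observed, z=2.0):
    """Flag a count that exceeds mean + z * SD of the historical counts.

    Returns (flag, threshold). The z=2.0 default is an illustrative
    convention, not a standard prescribed by the lesson.
    """
    threshold = mean(history) + z * stdev(history)
    return observed > threshold, threshold


# Hypothetical counts for the same calendar week in eight earlier years.
baseline_weeks = [2, 3, 1, 4, 2, 3, 2, 3]

flag, threshold = exceeds_expected(baseline_weeks, observed=9)
print(f"threshold={threshold:.2f}, flagged={flag}")
```

A flag like this only establishes that the count is unusual. Whether it is a real outbreak still depends on verifying the diagnosis and ruling out reporting artifacts.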

**Sarah:** Sixth. The CDC ten-step framework, listed here at a finer grain as thirteen items. Prepare for fieldwork. Establish the existence of an outbreak. Verify the diagnosis. Construct a case definition. Find cases systematically. Perform descriptive epidemiology. Develop hypotheses. Evaluate hypotheses with analytic studies. Reconsider and refine. Reconcile with laboratory and environmental studies. Implement control. Maintain surveillance. Communicate findings.

**Kiffer:** And those ten steps are not strictly serial. You revisit earlier ones as new information arrives. Control measures often happen before the analytic study is finished. The framework is a checklist, not a flowchart.

**Sarah:** Seventh. The epidemic curve is your single most valuable initial visualization. A sharp single peak suggests a point source. A plateau suggests continued common-source exposure. Successive peaks, roughly one incubation period apart, suggest propagated person-to-person transmission. The shape genuinely informs the hypothesis.
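
An epidemic curve is just a count of cases by onset date, with the empty days kept in. Here is a minimal Python sketch that builds one from a line list and prints it as a text histogram. The onset dates are hypothetical, and `epidemic_curve` is an illustrative helper name, not something from the lesson.

```python
from collections import Counter
from datetime import date, timedelta


def epidemic_curve(onset_dates):
    """Count cases per onset day, keeping zero-case days between first and last onset.

    Assumes onset_dates is a non-empty list of datetime.date values.
    """
    counts = Counter(onset_dates)
    day, last = min(onset_dates), max(onset_dates)
    curve = []
    while day <= last:
        curve.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return curve


# Hypothetical onset dates taken from an imagined line list.
onsets = (
    [date(2024, 6, 3)]
    + [date(2024, 6, 4)] * 4
    + [date(2024, 6, 5)] * 7
    + [date(2024, 6, 6)] * 3
    + [date(2024, 6, 8)]
)

curve = epidemic_curve(onsets)
for day, n in curve:
    print(day.isoformat(), "#" * n)
```

Keeping the zero-case days matters, because gaps are part of the shape. In this made-up list, the single sharp peak is the pattern you would read as a possible point source.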

**Kiffer:** Eighth. Case definitions are graded by certainty. Confirmed, probable, suspect. They evolve as the investigation matures. That's normal.
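
The graded structure can be sketched as a small classification rule. The criteria below are assumptions for illustration only. A real case definition spells out specific symptoms, a time window, a place, and the laboratory test, and the rule gets revised as the investigation matures.

```python
def classify_case(clinically_compatible, epi_linked, lab_confirmed):
    """Assign a certainty grade using a hypothetical three-tier rule.

    Assumed rule for illustration:
      confirmed: laboratory confirmation
      probable:  compatible illness plus an epidemiologic link
      suspect:   compatible illness only
    """
    if lab_confirmed:
        return "confirmed"
    if clinically_compatible and epi_linked:
        return "probable"
    if clinically_compatible:
        return "suspect"
    return "not a case"


# Hypothetical line-list entries: (compatible illness, epi link, lab result).
for entry in [(True, True, True), (True, True, False), (True, False, False)]:
    print(entry, "->", classify_case(*entry))
```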

**Sarah:** Ninth. The nineteen-seventy-six Legionnaires' disease outbreak. Two hundred and twenty-one cases of pneumonia linked to an American Legion convention in Philadelphia. Thirty-four deaths. Initial hypotheses, respiratory virus and chemical exposure, were wrong. It took about six months of systematic investigation to identify Legionella pneumophila, a previously unknown bacterium, transmitted through the hotel's air-conditioning cooling tower.

**Kiffer:** And the three lessons from Legionnaires. Initial hypotheses are often wrong. Patience and systematic methods matter. New pathogens can emerge in familiar settings.

**Sarah:** And tenth. Outbreak investigation is not a new method. It is the methods stack you will build across this course, applied under time pressure. Cohort and case-control designs. Two-by-two tables. Risk ratios. Validity vocabulary. Confounding control. All deployed in days, not years.
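
As a small preview of the two-by-two table and risk ratio in action, here is a Python sketch for a retrospective cohort at a shared meal. The counts are hypothetical and are not the lesson's worked dataset.

```python
def risk_ratio(exposed_ill, exposed_well, unexposed_ill, unexposed_well):
    """Return (risk in exposed, risk in unexposed, risk ratio) from a 2x2 table."""
    risk_exposed = exposed_ill / (exposed_ill + exposed_well)
    risk_unexposed = unexposed_ill / (unexposed_ill + unexposed_well)
    return risk_exposed, risk_unexposed, risk_exposed / risk_unexposed


# Hypothetical cohort: 40 people ate the suspect dish and 30 became ill;
# 50 people did not eat it and 10 became ill.
r_exp, r_unexp, rr = risk_ratio(
    exposed_ill=30, exposed_well=10, unexposed_ill=10, unexposed_well=40
)
print(f"attack rate, exposed:   {r_exp:.2f}")
print(f"attack rate, unexposed: {r_unexp:.2f}")
print(f"risk ratio:             {rr:.2f}")
# attack rate, exposed:   0.75
# attack rate, unexposed: 0.20
# risk ratio:             3.75
```

In an outbreak you would compute this for each food item on the menu, then ask whether the item with the highest ratio also accounts for most of the cases.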

**Kiffer:** Which is also why we're putting this lesson early in the course. The rest of the term, when we work on study design or measures of association or confounding, you can come back to this lesson and ask, how does this concept actually get used in surveillance or in an outbreak. The applied frame is up and running from week two.

**Sarah:** Right. The toolkit will keep getting deeper. Sampling next lesson. Then questionnaire design. Measures of disease frequency. Screening. Measures of association. Validity. Confounding. And the analytic side carries forward into later material.

**Kiffer:** And the worked outbreak datasets in this lesson carry forward into later material, where we revisit the same potluck data with logistic regression and an SIR transmission model. So the bridge is concrete, not just thematic.

**Sarah:** If there's one thing I'd want a student to walk away with from this lesson, it's the connection between the data infrastructure and the analytic methods. Surveillance is what generates the signal. Outbreak investigation is what turns a signal into action. The methods you're going to spend the course learning are how you do that turning. Without them, surveillance is just numbers. With them, it's public health.

**Kiffer:** And one practical note. Spend ten minutes on the BCCDC dashboards or the PHAC Health Infobase. Look up an actual recent surveillance product. Read its case definition. Read its data-quality caveats. Notice how big the lag is between event and report. That ten minutes will teach you something the lesson can't fully convey in writing.

**Sarah:** Because the abstract idea of timeliness is one thing. Seeing that the most recent FluWatch report covers the week that ended ten days ago is another. The lag is real. The trade-offs are real. And once you've seen them concretely, the rest of the lesson clicks into place.

**Kiffer:** And that's the work. Read a real surveillance product, recognize what type it is, identify which of CDC's nine attributes it prioritizes, and form your own judgment about what it can and cannot tell you. That's epidemiology in practice.

**Sarah:** Next lesson is Sampling. We'll shift from the system level back to the question of who actually gets counted, and how a sample either represents or distorts the population it claims to describe.

**Kiffer:** Which connects directly back to today, because every surveillance product we just walked through is, in effect, a sampling problem. Who shows up in the data, and who doesn't. That's the thread we'll pick up next time.

**Sarah:** Thanks for sticking with us through Lesson 2. See you in Lesson 3.

**Kiffer:** Take care, everyone.