# Lesson 3 — Sampling (v3 expanded)

*Companion-podcast transcript • Sarah & Kiffer* 
*~5060 words • ~27.3 min audio*

---

**Sarah:** Welcome back to Office Hours. I'm Sarah.

**Kiffer:** And I'm Kiffer. Today we're working through Lesson 3, Sampling. This is the first design lesson of the course, and it sits at the foundation of every study you'll ever read or design.

**Sarah:** Before we dive in, can you set up why this lesson matters? Lesson 1 was about causal concepts and Lesson 2 was about surveillance and outbreak investigation. Where does sampling fit in?

**Kiffer:** Lesson 1 closed with the counterfactual framework. The idea that to talk about a cause, you have to compare what happened under one exposure to what would have happened under a different exposure. Lesson 2 then showed how surveillance systems actually collect the data that public health acts on. Lesson 3 asks the immediate next question. Which people end up in those comparison groups, or in those surveillance systems? Because the choice of who is in your study determines almost everything else. The validity of the result, the generalizability of the conclusion, the bias inventory students built up earlier in this series. All of it traces back to who got sampled.

**Sarah:** Okay. And the lesson is structured into four sections.

**Kiffer:** Right. First, the population hierarchy and the probability theory that makes inference from a sample possible. Second, types of error and the sampling methods that abandon probability theory. Third, the major probability sampling designs. And fourth, complex survey analysis and how to calculate a sample size before you collect data.

**Sarah:** Let's take Section 1. Census versus sample. Walk me through that.

**Kiffer:** A census evaluates every individual in the target population. A sample evaluates only a subset. Sampling is generally cheaper, faster, and more practical, and a well-planned sample can give you essentially the same information at a fraction of the cost. Interestingly, even a census is a kind of sample. It captures the population at a single point in time, so it's a sample of the population over time.

**Sarah:** And the practical distinction is what kinds of error each one introduces.

**Kiffer:** In a census, the only source of error is measurement error. The instrument you used, how the question was asked, whether the respondent answered honestly. In a sample, you contend with measurement error and a second source called sampling error, which is the variability that comes from the fact that any subset of people you draw will be slightly different from the population as a whole.

**Sarah:** Can we ground this in real Canadian examples? Students sometimes hear census and immediately think of a textbook abstraction.

**Kiffer:** Sure. Start with the Census of Population. It's run by Statistics Canada, the national statistical agency, every five years. The most recent one was in 2021, and the next one is in 2026. It's a near-complete enumeration of every household in the country. The short-form census goes to all households. The long-form census goes to a 25 percent mandatory sample. The Census provides the denominators behind almost every population health rate you will ever calculate.

**Sarah:** And the major health surveys?

**Kiffer:** The big one is the Canadian Community Health Survey, abbreviated CCHS. It's a continuous cross-sectional sample survey jointly run by Statistics Canada, Health Canada, and the Public Health Agency of Canada. Roughly 65,000 respondents per cycle. It covers self-reported health, health behaviors, and health-care use. It's the flagship descriptive survey for population health surveillance in Canada.

**Sarah:** Self-reported is a key qualifier there.

**Kiffer:** It is. Self-report is cheap and scalable, but smokers under-report cigarettes, drinkers under-report drinks, and everyone over-reports vegetables. So Statistics Canada also runs the Canadian Health Measures Survey, or CHMS. It's smaller, about 5,700 respondents per cycle. The CHMS does direct physical measurements. Blood pressure measured by a trained technician. Height and weight measured, not asked. Blood drawn for biomarkers. There's even a biobank attached. The CHMS anchors objective measurement of population health when self-report isn't trustworthy enough.

**Sarah:** And there's a longitudinal predecessor, right?

**Kiffer:** The National Population Health Survey, the NPHS. It ran from 1994 to 2011. Longitudinal, meaning it followed the same people over time rather than drawing a fresh cross-section each cycle. It's still used for life-course research today, even though it's been replaced by the CCHS.

**Sarah:** So the Census gives you denominators. The CCHS and CHMS give you population estimates of health states with sampling error attached. Choosing among them is the first applied sampling decision a public health analyst makes.

**Kiffer:** Exactly. The second decision is descriptive versus analytic studies.

**Sarah:** Walk me through that distinction.

**Kiffer:** A descriptive study describes a population. What proportion of British Columbians had a flu shot last winter? An analytic study estimates the magnitude of an association. Is your water source associated with the incidence of diarrhea? Establishing an association is the first step to inferring causation, which is exactly what Lesson 1 was about.

**Sarah:** And on top of those decisions sits the hierarchy of populations.

**Kiffer:** Three nested groups. First, the target population. That's the broadest group to which you want to generalize. The classic example is researchers studying rainwater cisterns in Pernambuco State, in northeastern Brazil. Pernambuco is a semi-arid state with chronic water access challenges. They might define the target as residents of Pernambuco. But somebody else reading the study might want to generalize to all semi-arid regions globally.

**Sarah:** Second, the source population.

**Kiffer:** The population from which study subjects are actually drawn. All units in the source population should be listable, in principle, with a non-zero probability of inclusion. In the Brazil study, the source was families from households participating in the One Million Cisterns Project, a large rural water-infrastructure program.

**Sarah:** And the third is the study sample.

**Kiffer:** The individuals who actually end up in the study. The actual rows in your dataset. The set of people who agreed to participate and whose data met quality requirements.

**Sarah:** And there's vocabulary attached to the arrows between the three populations.

**Kiffer:** Three labels. External validity links target to source. It's a subjective assessment of whether the source is representative enough that results generalize to the broader target. Sampling frame links source to study sample. The frame is the actual list of units the sample is drawn from. Internal validity asks whether the study results are valid for members of the source population at all. So, internal validity is whether you got the right answer for the source. External validity is whether that answer travels to the target.

**Sarah:** And the sampling frame is where most exclusions and selection bias enter, in real public health work.

**Kiffer:** Yes. A sampling frame is the list of all sampling units in the source population. Sampling units are the basic elements that will be sampled. Sometimes households, sometimes individuals, sometimes clinics or classrooms.

**Sarah:** Where do those frames actually come from in Canadian practice?

**Kiffer:** Several places. First, Statistics Canada's Address Register, abbreviated AR. It's a maintained list of every residential address in Canada, used by the Census and many StatCan household surveys. Second, the Labour Force Survey area frame. The Labour Force Survey, or LFS, is a monthly StatCan survey that produces the official unemployment rate. Its area frame is built from Census enumeration areas, and the CCHS draws part of its sample from it.

**Sarah:** Third?

**Kiffer:** Provincial health insurance registries. In British Columbia it's the Medical Services Plan, the MSP, with a client registry of nearly every BC resident. Population Data BC, sometimes shortened to PopData BC, is the data linkage centre researchers use to access linked records derived from MSP. And fourth, disease registries. The Canadian Cancer Registry covers essentially every cancer diagnosis. Provincial reportable-disease lists do the same for infectious diseases.

**Sarah:** And each frame has different coverage. That's where bias enters.

**Kiffer:** A registry-based frame misses people without provincial coverage. Undocumented residents, recent immigrants without a health card yet, people experiencing severe homelessness who have lost theirs. The LFS area frame excludes residents on First Nations reserves and people in institutions, including long-term care homes and prisons. The frame, not the questionnaire, is usually where exclusions and selection bias enter.

**Sarah:** Okay. Now the lesson moves into probability theory. Why is that the next step?

**Kiffer:** Because the deep question underneath all of sampling is, why can we learn about a whole population from a small subset at all? The answer comes from probability theory. Three concepts do most of the work.

**Sarah:** Walk me through them.

**Kiffer:** First, a random variable. A numerical outcome of a random process. The number of new tuberculosis cases reported in Vancouver Coastal Health next week is a random variable. The systolic blood pressure of the next adult who walks into a clinic is a random variable. Second, the expected value, sometimes called the mean. Written using the Greek letter mu. The long-run average across many repetitions of the process. In most studies, the expected value is exactly what you're trying to estimate. Third, the variance and its square root, the standard deviation. Variance is sigma squared. Standard deviation is sigma. Both measure spread around the mean. And spread, not the mean alone, is what controls how precisely you can estimate things from a sample.

**Sarah:** And there's a fourth idea worth naming.

**Kiffer:** Independence. Two observations are independent when knowing one tells you nothing about the other. Independence is the assumption that lets a small random sample stand in for a much larger population. And it's the assumption most often violated in real public health data. People in the same household are not independent. Repeated measurements on the same person are not independent. Cases of an infectious disease in the same school are not independent because of contagion.

**Sarah:** And every confidence interval or hypothesis test is arithmetic on a probability distribution.

**Kiffer:** If you don't know what distribution your statistic comes from, you can't honestly attach uncertainty to it. So the lesson introduces the most common distributions you'll meet in public health. Discrete distributions describe countable outcomes, zero, one, two cases. Continuous distributions describe measurements that can take any value on a range. Height, blood pressure, time.

**Sarah:** Let's start with Bernoulli.

**Kiffer:** A single yes-or-no trial with success probability p. Mean is p. Variance is p times one minus p. The example is whether one randomly chosen adult currently smokes.

**Sarah:** Second, Binomial.

**Kiffer:** The number of successes in n independent Bernoulli trials. Mean is n times p. If you take a CCHS sample of 1,000 adults and count the smokers, that count is Binomial.

**Sarah:** Third, Poisson.

**Kiffer:** The number of rare events in a fixed interval of time, area, or person-time. The mean equals the variance, both written as the Greek letter lambda. The classic example is new cases of measles per week reported to a public health unit. Or emergency-room visits per hour. Anytime you're counting rare events arriving roughly independently, Poisson is your default.
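
The mean and variance formulas for these three discrete distributions can be checked numerically. The sketch below sums over each probability mass function directly. The values of `p`, `n`, and `lam` are hypothetical, chosen only for illustration, and the binomial variance formula n times p times one minus p is the standard result rather than something stated in the episode.

```python
import math

def binomial_pmf(n: int, p: float, k: int) -> float:
    """P(K = k) for K ~ Binomial(n, p); n = 1 gives the Bernoulli case."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(lam: float, k: int) -> float:
    """P(K = k) for K ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def mean_and_variance(pmf, support):
    """Mean and variance computed by summing over the support."""
    mean = sum(k * pmf(k) for k in support)
    var = sum((k - mean) ** 2 * pmf(k) for k in support)
    return mean, var

p = 0.12    # hypothetical smoking prevalence
n = 50      # hypothetical number of adults sampled
lam = 3.0   # hypothetical mean number of measles cases per week

bern_mean, bern_var = mean_and_variance(lambda k: binomial_pmf(1, p, k), range(2))
binom_mean, binom_var = mean_and_variance(lambda k: binomial_pmf(n, p, k), range(n + 1))
# The Poisson support is infinite; 0..99 captures essentially all the mass for lam = 3.
pois_mean, pois_var = mean_and_variance(lambda k: poisson_pmf(lam, k), range(100))

print(f"Bernoulli: mean={bern_mean:.4f} var={bern_var:.4f}")
print(f"Binomial:  mean={binom_mean:.4f} var={binom_var:.4f}")
print(f"Poisson:   mean={pois_mean:.4f} var={pois_var:.4f}")
```

The Poisson line is the one to watch in practice: if a real count series shows variance well above its mean, the Poisson default is probably too optimistic.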

**Sarah:** And the continuous distributions?

**Kiffer:** Start with Uniform. Every value between two bounds is equally likely. The default-ignorance distribution. It shows up in random-digit dialing within an area code. Then Normal. The famous bell curve. Symmetric, centered on its mean, with about 95 percent of values within roughly two sigma of mu. It's the default for adult height, systolic blood pressure, standardized test scores. Then Exponential. Time between independent events occurring at a constant rate lambda. Right-skewed and memoryless. Time between successive emergency-department arrivals. Survival time under a constant hazard. And finally Log-normal. Variables that are positive and span several orders of magnitude. The log of the variable is approximately Normal. Household income, hospital length of stay, viral loads in HIV or hepatitis. Long right tail.

**Sarah:** Okay. Those are distributions of individual values. Now the distribution of a statistic. A summary of a sample.

**Kiffer:** And this is where the Central Limit Theorem, abbreviated CLT, comes in. There are two key facts. First, the standard error. If individual observations have standard deviation sigma, and you draw a random sample of size n, then the standard deviation of the sample mean is sigma divided by the square root of n. That quantity, the standard deviation of a statistic, is called the standard error. Quadrupling sample size halves the standard error. Precision grows with the square root of n, not linearly.

**Sarah:** So if you want to make your estimate twice as precise, you need four times the sample.

**Kiffer:** Exactly. And the second fact is the Central Limit Theorem itself. For sufficiently large n, the sampling distribution of the mean is approximately Normal, regardless of what the underlying population looks like. Even if the population is heavily skewed or bimodal or discrete, the distribution of sample means becomes approximately Normal as n grows. Sufficiently large is often around n equals 30 for moderately skewed distributions, smaller for symmetric ones.

**Sarah:** And this is the engine behind every confidence interval.

**Kiffer:** The 1.96 in the formula estimate plus or minus 1.96 times standard error comes from the Normal distribution that the CLT promises us applies to the estimate, even when the underlying data are not Normal at all.
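
As a small illustration of the standard error and the plus-or-minus 1.96 interval, here is a sketch using hypothetical numbers: a mean systolic blood pressure of 120 with a population standard deviation of 20. The function names are illustrative only.

```python
import math

Z_95 = 1.96  # Normal critical value for 95 percent confidence

def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def confidence_interval(estimate: float, se: float, z: float = Z_95):
    """Normal-approximation interval: estimate +/- z * se."""
    return estimate - z * se, estimate + z * se

sigma = 20.0         # hypothetical population SD (mmHg)
sample_mean = 120.0  # hypothetical sample mean (mmHg)

se_100 = standard_error(sigma, 100)   # 2.0
se_400 = standard_error(sigma, 400)   # 1.0, four times the sample halves the SE
ci_400 = confidence_interval(sample_mean, se_400)

print(se_100, se_400)
print(ci_400)
```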

**Sarah:** What's a common student misconception here?

**Kiffer:** Students sometimes think a large enough sample makes the data themselves Normal. It does not. Income data with 1,000 observations is still right-skewed. What becomes Normal is the distribution of the sample mean across hypothetical repeated samples. The CLT is about means, not individual data. A proportion is just the mean of zeros and ones, so the CLT covers it too once n is reasonably large. Other statistics, like medians and extreme values, may need different methods like bootstrapping.
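
A short simulation can make the distinction concrete. This is a sketch under stated assumptions: the right-skewed population is an Exponential with mean 1 (so sigma is also 1), and the sample size of 50 and the 2,000 repetitions are arbitrary choices. The gap between mean and median is used as a crude skewness indicator.

```python
import random
import statistics

random.seed(3)  # fixed seed only so the run is reproducible

N_PER_SAMPLE = 50   # size of each hypothetical study
N_SAMPLES = 2000    # number of hypothetical repeated studies

# One large batch of raw observations: still right-skewed, however many we draw.
raw = [random.expovariate(1.0) for _ in range(1000)]

# The mean from each of many repeated samples.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(N_PER_SAMPLE)])
    for _ in range(N_SAMPLES)
]

# In a right-skewed distribution the mean sits well above the median.
raw_skew_gap = statistics.fmean(raw) - statistics.median(raw)
means_skew_gap = statistics.fmean(sample_means) - statistics.median(sample_means)

# Spread of the sample means, compared with sigma / sqrt(n) where sigma = 1.
sd_of_means = statistics.stdev(sample_means)
theoretical_se = 1.0 / N_PER_SAMPLE ** 0.5

print(f"mean minus median, raw data:     {raw_skew_gap:.3f}")
print(f"mean minus median, sample means: {means_skew_gap:.3f}")
print(f"SD of sample means: {sd_of_means:.3f}  vs sigma/sqrt(n): {theoretical_se:.3f}")
```

The raw data keep a clear gap between mean and median, while the sample means are close to symmetric with a spread near sigma over root n.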

**Sarah:** Okay. Section 2 turns to errors and non-probability sampling. Start with errors.

**Kiffer:** There are two kinds in hypothesis testing. A Type I error happens when you conclude an effect exists when it really doesn't. You falsely reject the null. The probability of a Type I error is denoted alpha. The convention is alpha at 0.05, a 5 percent chance of a false positive. A Type II error is the opposite. You conclude there's no association when there is one. Probability of a Type II error is beta.

**Sarah:** And related to beta is statistical power.

**Kiffer:** Power is the probability of finding a statistically significant difference when a real difference of a defined magnitude exists. Power equals one minus beta. The conventional target is 80 percent. Increasing power mostly means increasing sample size. So-called negative findings, where a study fails to find a difference, are less commonly reported, partly because many of those studies were just underpowered.

**Sarah:** Now non-probability sampling. The methods that abandon formal random selection.

**Kiffer:** A non-probability sample is one with no formal process for determining each individual's probability of selection. There are three flavors. First, judgment sampling. The investigator picks who to include based on professional judgment. They think these particular cases are representative. Second, convenience sampling. The investigator includes whoever is easy to access. Students in their own classroom, patients in their own clinic. Third, purposive sampling. The investigator selects participants who fit specific criteria of interest, like a particular professional role. Often used in qualitative research.

**Sarah:** And the limitation is the same for all three?

**Kiffer:** Yes. Non-probability samples are generally inappropriate for descriptive studies, because you can't generalize prevalence estimates without knowing each individual's probability of being included. If you don't know the probabilities, you can't compute a valid weighted average that estimates population prevalence. Non-probability methods are sometimes used in analytic studies, where comparing groups is the priority and absolute prevalence matters less. But for descriptive estimates, they're not the right tool.

**Sarah:** Okay. Section 3 is the heart of the lesson. The six probability sampling designs.

**Kiffer:** First, simple random sampling. Every member of the source population has an equal probability of being included. You need a complete list and a formal random process, like computer-generated random numbers.

**Sarah:** Worked example?

**Kiffer:** You want to study wait times in an emergency room. You have 13,000 admissions over the past year and want 1,000 records. Generate 1,000 distinct random numbers between 1 and 13,000 and pull those records. The advantages are that it's conceptually simple and all standard analyses apply directly. The limitation is that you need that complete list of all 13,000 admissions.
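
A minimal sketch of that draw, assuming the admissions are simply numbered 1 to 13,000. The seed is arbitrary and only there to make the draw reproducible.

```python
import random

N_ADMISSIONS = 13_000
SAMPLE_SIZE = 1_000

rng = random.Random(2024)  # arbitrary seed, for reproducibility only

# random.sample draws without replacement, so no record is picked twice.
record_ids = sorted(rng.sample(range(1, N_ADMISSIONS + 1), SAMPLE_SIZE))

print(record_ids[:10])
```

Drawing without replacement matters here: independent random numbers with repeats would leave you with fewer than 1,000 unique records.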

**Sarah:** Second, systematic random sampling.

**Kiffer:** Here you don't need a complete list. You just need an estimate of the total population size and sequential access. Compute a sampling interval j, which is population size divided by desired sample size. Randomly pick a starting point between 1 and j, then take every j-th individual. Same emergency room. 13,000 patients, sample of 1,000. Sampling interval is 13. Pick a number between 1 and 13. Say you pick 7. Select patient 7, then 20, then 33, then 46, every 13th patient. One caution. Bias may occur if there's a periodic pattern in admissions that lines up with your sampling interval. Suppose every 13th admission tends to be an overnight transfer from a particular satellite clinic. You've systematically over-sampled or under-sampled that subgroup.
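
The emergency-room example can be sketched as follows. The function name is illustrative, and the sketch assumes patients can be numbered in arrival order.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Every j-th unit after a random start, where j = population // sample."""
    interval = population_size // sample_size
    if start is None:
        start = random.randint(1, interval)  # random start between 1 and j
    # If population_size is not a multiple of sample_size, the number of
    # selected units can differ slightly from sample_size.
    return list(range(start, population_size + 1, interval))

selected = systematic_sample(13_000, 1_000, start=7)
print(selected[:4])    # [7, 20, 33, 46]
print(len(selected))   # 1000
```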

**Sarah:** Third, stratified random sampling.

**Kiffer:** Divide the population into mutually exclusive strata, based on factors likely to affect the outcome. Within each stratum, draw a simple or systematic random sample. In proportional stratified sampling, the number sampled from each stratum is proportional to that stratum's share of the population.

**Sarah:** And the three big advantages?

**Kiffer:** First, it ensures all strata are represented. You don't accidentally end up with too few people from some small subgroup. Second, more precise overall estimates than simple random sampling, because between-strata variation is removed from the overall estimate. Third, it allows estimation of stratum-specific outcomes, which means you can report separate estimates for, say, men and women, with reasonable precision in each. For example, if hospital wait times differ by sex, you'd stratify records by sex and randomly sample within each group. The CCHS does this at a much larger scale, stratifying by health region across all of Canada.
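
Proportional allocation is simple arithmetic. The sketch below assumes a hypothetical split of the 13,000 emergency-room records into 7,150 from women and 5,850 from men. Those counts are invented for illustration and are not from the lesson.

```python
import random

def proportional_allocation(stratum_sizes, total_sample):
    """Sample size per stratum, proportional to the stratum's population share."""
    population = sum(stratum_sizes.values())
    # With awkward shares, rounding can leave the total off by one or two.
    return {
        name: round(total_sample * size / population)
        for name, size in stratum_sizes.items()
    }

def stratified_sample(strata, total_sample, rng):
    """Simple random sample within each stratum, using proportional allocation."""
    sizes = {name: len(ids) for name, ids in strata.items()}
    allocation = proportional_allocation(sizes, total_sample)
    return {name: rng.sample(ids, allocation[name]) for name, ids in strata.items()}

# Hypothetical record IDs: 1..7150 are women, 7151..13000 are men.
strata = {
    "female": list(range(1, 7151)),
    "male": list(range(7151, 13001)),
}

allocation = proportional_allocation({k: len(v) for k, v in strata.items()}, 1000)
sample = stratified_sample(strata, 1000, random.Random(11))

print(allocation)  # {'female': 550, 'male': 450}
```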

**Sarah:** Fourth, cluster sampling.

**Kiffer:** A cluster is a natural grouping of study subjects. A household is a cluster of people. A classroom is a cluster of students. A clinic is a cluster of patients. The cluster itself is the primary sampling unit, abbreviated PSU. You sample whole clusters, and every individual within a selected cluster is included.

**Sarah:** Worked example?

**Kiffer:** To estimate smoking prevalence among grade 12 students in a district, randomly select 10 of 47 grade 12 classes, and survey every student in those 10. The practical advantages are clear. It's easier when getting a list of clusters is simpler than listing every individual. And it's cheaper to visit fewer locations. The statistical limitation is that individuals within a cluster tend to be more alike than individuals chosen at random. Students in the same class share neighborhood, socioeconomic profile, friend group, and a teacher. They're not independent. That non-independence inflates sampling variation for a given sample size compared to simple random sampling.

**Sarah:** There's an important nuance from the lesson here.

**Kiffer:** A sample is only a cluster sample if the group is the sampling unit and individuals within the group are the unit of concern. If the group itself is the unit of concern, like asking whether anyone in the household smokes indoors, where the household is the unit of analysis, then it's not a cluster sample. It's just a household-level survey.

**Sarah:** Fifth, multistage sampling.

**Kiffer:** This is similar to cluster sampling, except after selecting primary sampling units, you draw a sample of secondary sampling units within each PSU rather than surveying everyone. Same smoking study. Randomly select 10 classes, then within each selected class randomly select 5 students rather than surveying all 30. To make sure every student has the same probability of selection, you either choose PSUs with probability proportional to their size and then take a fixed number of students from each, or choose PSUs by simple random sampling and use a constant sampling proportion within each PSU. A fixed 5 students from simple-randomly chosen classes gives equal probabilities only if every class is the same size.
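
To see why the pairing of first-stage and second-stage rules matters, here is a sketch that computes each student's overall selection probability under three schemes. The 47 class sizes are invented, and the probability-proportional-to-size formula assumes no class is large enough to push its inclusion probability above 1.

```python
N_PSU = 10     # classes to select
TAKE = 5       # fixed number of students per selected class
FRACTION = 0.2 # constant within-class sampling proportion (hypothetical)

# Hypothetical sizes for the 47 grade 12 classes (24 to 36 students each).
class_sizes = [24 + 3 * (i % 5) for i in range(47)]
n_classes = len(class_sizes)
total_students = sum(class_sizes)

def p_srs_fixed(size):
    """Simple random sample of classes, then a fixed number of students."""
    return (N_PSU / n_classes) * (TAKE / size)

def p_srs_fraction(size):
    """Simple random sample of classes, then a constant proportion of students."""
    return (N_PSU / n_classes) * FRACTION

def p_pps_fixed(size):
    """Classes chosen proportional to size, then a fixed number of students."""
    return (N_PSU * size / total_students) * (TAKE / size)

for size in (24, 30, 36):
    print(size, round(p_srs_fixed(size), 4),
          round(p_srs_fraction(size), 4), round(p_pps_fixed(size), 4))
```

Only the first scheme gives students in small classes a better chance of selection than students in large ones, which is the situation that later calls for sampling weights.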

**Sarah:** Sixth, targeted sampling.

**Kiffer:** Sometimes called risk-based sampling. It stratifies the population based on characteristics associated with the probability of disease, then focuses sampling on the strata where disease is most likely to be found. Individuals are assigned point values based on their probability of having the disease, and sampling proceeds until a predetermined number of points are sampled. It's unequal probability sampling. Some individuals may have a zero probability of inclusion. The advantage is huge for rare diseases. You need a much smaller sample when key risk characteristics can be identified. The limitation is that key parameters like risk ratios may not be known in advance and have to be estimated from other evidence.

**Sarah:** And then the lesson has a beautiful worked example showing how the CCHS combines all of these.

**Kiffer:** First, stratification. The Canadian population is stratified by health region. About 110 health regions across Canada. A target sample size is allocated to each region. Second, clustering. Within each health region, dwellings are sampled from the LFS area frame. Groups of dwellings sharing a geographic boundary form the clusters. Third, selection within cluster. One person is randomly selected from each chosen household. Fourth, top-up samples. A random-digit-dialing telephone frame, abbreviated RDD, fills in coverage where the area frame is sparse.

**Sarah:** And the result is a probability sample where every Canadian resident has a known, non-zero chance of selection.

**Kiffer:** But the selection probability differs by region, household size, and frame. That is why CCHS data must be analyzed with survey weights and bootstrap replicate weights. Which brings us to Section 4.

**Sarah:** Section 4. Complex survey analysis and sample size calculation.

**Kiffer:** Real surveys typically combine several designs. Stratification at the top level, clustering within strata, multistage selection within clusters. That combination is precisely what makes the analysis non-trivial.

**Sarah:** Why can't you just throw CCHS data into ordinary regression?

**Kiffer:** Because ordinary regression assumes simple random sampling, which means it assumes every observation is independent and equally weighted. CCHS data violates both. People in the same cluster aren't independent. People sampled with different probabilities aren't equally weighted. Ignore the design and you get the wrong point estimate and wildly underestimated standard errors.

**Sarah:** So there are three things to account for.

**Kiffer:** First, stratification. If the population was divided into strata before sampling, that has to be reflected in the analysis. Stratification mainly affects precision, not the point estimate. The total population size in each stratum has to be known to compute appropriate weights. Second, sampling weights. Not all individuals have the same probability of selection. The sampling weight for each individual is the inverse of their overall selection probability. For a household survey, the per-person probability is roughly the probability of selecting their household times the probability of being chosen within the household. The weight tells you how many people in the source population each sampled person represents. Third, clustering. Observations in the same cluster are not independent, and standard errors must be adjusted upward. The most common approach is variance linearization, which identifies the primary sampling unit and adjusts standard error calculations for clustering at that level.
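
The weight arithmetic for a household survey can be sketched like this, assuming one person is selected per household and a hypothetical household selection probability of 0.002. The four respondents are invented for illustration.

```python
def sampling_weight(p_household, eligible_in_household):
    """Inverse of the overall selection probability for one respondent."""
    p_within = 1.0 / eligible_in_household  # one person chosen per household
    return 1.0 / (p_household * p_within)

P_HOUSEHOLD = 0.002  # hypothetical: 1 in 500 households selected

# Hypothetical respondents: (has the outcome? 1/0, eligible people in household)
respondents = [(1, 1), (0, 2), (1, 4), (0, 1)]

weights = [sampling_weight(P_HOUSEHOLD, size) for _, size in respondents]

unweighted = sum(y for y, _ in respondents) / len(respondents)
weighted = sum(w * y for (y, _), w in zip(respondents, weights)) / sum(weights)

print(weights)               # roughly [500, 1000, 2000, 500]
print(unweighted, weighted)  # 0.5 unweighted vs about 0.625 weighted
```

The respondent from the four-person household stands in for about 2,000 people, so their outcome pulls the weighted estimate more than the unweighted one.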

**Sarah:** And there's a single number that summarizes how much the design hurts precision.

**Kiffer:** The design effect, abbreviated deff. It's the ratio of variance from the complex design to variance from a simple random sample of the same size. A deff greater than one means the complex design produces less precise estimates than a simple random sample would. In the Brazil diarrhea study, deff was 4.43, meaning the variance was 4.43 times larger. So you'd need roughly 4.43 times the sample size to achieve the same precision as a simple random sample.

**Sarah:** And the lesson has a table showing how each design feature shifts the estimate.

**Kiffer:** Yes. The unadjusted simple-random-sample estimate of incidence was 14.62 percent with a standard error of 0.0061. Adding stratification alone barely moves anything. Adding weights alone bumps the estimate up to 17.51 percent and the standard error to 0.0091. Adding clustering alone raises the standard error from 0.0061 to 0.0088. With all features combined, the estimate is 17.51 percent and the standard error is 0.0128. Ignoring the design would give a point estimate off by about 3 percentage points and a confidence interval about half as wide as it should be.
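
Those closing claims can be checked from the quoted numbers. This sketch uses only the two estimates and standard errors mentioned in the episode and a Normal-approximation 95 percent interval. The variance ratio it computes comes from rounded standard errors, so it lands near, not exactly on, the design effect of 4.43 quoted earlier.

```python
Z_95 = 1.96

naive = {"estimate": 0.1462, "se": 0.0061}    # analyzed as a simple random sample
design = {"estimate": 0.1751, "se": 0.0128}   # strata, weights, and clusters included

def ci(estimate, se, z=Z_95):
    """Normal-approximation confidence interval."""
    return estimate - z * se, estimate + z * se

naive_ci = ci(**naive)
design_ci = ci(**design)

shift = design["estimate"] - naive["estimate"]
width_ratio = (naive_ci[1] - naive_ci[0]) / (design_ci[1] - design_ci[0])
variance_ratio = (design["se"] / naive["se"]) ** 2

print(f"point estimate shift: {shift * 100:.2f} percentage points")
print(f"naive CI width as a share of the design-based width: {width_ratio:.2f}")
print(f"variance ratio from the rounded SEs: {variance_ratio:.2f}")
```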

**Sarah:** And in Canadian practice, Statistics Canada distributes the CCHS and CHMS with bootstrap replicate weights.

**Kiffer:** 500 bootstrap replicate weights, specifically. StatCan doesn't release the underlying cluster identifiers because that would risk re-identification. Instead they release 500 alternate weights. To get correct standard errors, you re-run your analysis 500 times, once with each replicate weight, and combine the results. Most analysts use the survey or srvyr packages in R, the svy commands in Stata, or the PROC SURVEY procedures in SAS. If you ignore the bootstrap weights and analyze the CCHS as if it were a simple random sample, your standard errors will typically be 30 to 80 percent too small.
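
The re-run-and-combine step can be sketched in a few lines. Everything below is a toy under loud assumptions: the data are simulated, the replicate weights are generated by a naive resampling stand-in rather than taken from a real StatCan file, and the variance formula (average squared deviation of the replicate estimates from the full-sample estimate) is one common convention. The survey's user guide is the authority on the exact formula to use.

```python
import math
import random

def weighted_prevalence(outcomes, weights):
    """Weighted proportion of respondents with the outcome."""
    return sum(w * y for y, w in zip(outcomes, weights)) / sum(weights)

def bootstrap_se(outcomes, main_weights, replicate_weights):
    """Estimate with the main weight, then once per replicate weight, and combine."""
    full = weighted_prevalence(outcomes, main_weights)
    reps = [weighted_prevalence(outcomes, w) for w in replicate_weights]
    variance = sum((r - full) ** 2 for r in reps) / len(reps)
    return full, math.sqrt(variance)

# --- Toy data (hypothetical, not CCHS) ---
rng = random.Random(7)
n = 200
outcomes = [1 if rng.random() < 0.2 else 0 for _ in range(n)]
main_weights = [rng.choice([500.0, 1000.0, 2000.0]) for _ in range(n)]

def toy_replicate(rng, main_weights):
    """Stand-in for a released replicate weight: resample respondents with
    replacement and scale each main weight by how often that person was drawn."""
    counts = [0] * len(main_weights)
    for _ in range(len(main_weights)):
        counts[rng.randrange(len(main_weights))] += 1
    return [w * c for w, c in zip(main_weights, counts)]

replicate_weights = [toy_replicate(rng, main_weights) for _ in range(500)]

estimate, se = bootstrap_se(outcomes, main_weights, replicate_weights)
print(f"prevalence {estimate:.3f}, bootstrap SE {se:.3f}")
```

With real data the replicate weights are columns already on the file, so only `bootstrap_se` has an analogue in practice. Survey software does the same looping internally.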

**Sarah:** There's also a small detail called the finite population correction.

**Kiffer:** When the proportion of the population sampled is greater than about 10 percent, precision improves beyond what you'd expect from an infinite population. The finite population correction, the FPC, adjusts the variance downward. The FPC equals the population size minus the sample size, divided by the population size minus one. The FPC should not be applied in multistage sampling. It's only applicable to descriptive studies using simple or stratified random sampling.
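
A sketch of the correction applied to the variance of a proportion from a simple random sample. The population of 5,000, sample of 1,000, and prevalence of 15 percent are hypothetical.

```python
import math

def fpc(population_size, sample_size):
    """Finite population correction: (N - n) / (N - 1), applied to the variance."""
    return (population_size - sample_size) / (population_size - 1)

def se_proportion_srs(p, n, population_size=None):
    """SE of a proportion under simple random sampling, optionally with the FPC."""
    variance = p * (1 - p) / n
    if population_size is not None:
        variance *= fpc(population_size, n)
    return math.sqrt(variance)

N, n, p = 5_000, 1_000, 0.15  # hypothetical: 20 percent of the population sampled

correction = fpc(N, n)
se_without = se_proportion_srs(p, n)
se_with = se_proportion_srs(p, n, population_size=N)

print(f"FPC = {correction:.4f}")
print(f"SE without FPC = {se_without:.5f}, with FPC = {se_with:.5f}")
```

Because the correction multiplies the variance, the standard error shrinks by its square root, about 10 percent in this example.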

**Sarah:** Okay. The second half of Section 4 is sample-size calculation. The before-the-data version of all this.

**Kiffer:** Analyzing data correctly is necessary but not sufficient. An underpowered study cannot be rescued by clever analysis. So sample size has to be planned in advance. There are statistical and non-statistical considerations. Non-statistical means resources. Time, money, personnel, the practical limits of the sampling frame. Statistical means four things.

**Sarah:** First?

**Kiffer:** Precision. The more precise you need your estimate, the larger the sample. If you want diarrhea prevalence within plus or minus 5 percent, you need more subjects than if plus or minus 10 percent is acceptable. Precision is denoted L, the allowable error, or half the desired confidence interval width.

**Sarah:** Second?

**Kiffer:** Expected variation. For a proportion, variance is p times q, where q is one minus p. So you need a rough estimate of the proportion. For a continuous variable like BMI, you need an estimate of the population variance, sigma squared. Here's one practical trick. Estimate the range that covers 95 percent of values, divide by 4 to get sigma, then square it for sigma squared.

**Sarah:** Third?

**Kiffer:** Confidence level. Typically 95 percent. It determines how confident you want to be that the confidence interval includes the true population value. For 95 percent confidence, the corresponding Z-alpha is 1.96. Higher confidence requires a larger sample.

**Sarah:** And fourth, for analytic studies.

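**Kiffer:** Power. Often 80 percent. It determines the sample size needed to detect a specific effect size. For 80 percent power, Z-beta is 0.84 in magnitude. Some textbooks write it as negative 0.84 and subtract it in the formula, which gives the same result. Greater power requires a larger sample.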

**Sarah:** Walk me through the formula for estimating a single proportion, in plain words.

**Kiffer:** Required sample size for a proportion is, in plain words, Z-alpha squared times p times q, divided by L squared. Z-alpha is the critical value from the Normal distribution corresponding to your chosen confidence level. p is your best guess at the proportion. q is one minus p. L is the allowable error. Sample size grows with the square of Z-alpha and shrinks with the square of L. Tighter intervals or higher confidence both inflate the required sample quickly.
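
In code, the formula is a one-liner. The guess of 15 percent for the proportion is hypothetical, and the result is rounded up because you cannot sample a fraction of a person.

```python
import math

def sample_size_proportion(p, allowable_error, z_alpha=1.96):
    """n = Z_alpha^2 * p * q / L^2, rounded up to a whole subject."""
    q = 1 - p
    return math.ceil(z_alpha**2 * p * q / allowable_error**2)

p_guess = 0.15  # hypothetical best guess at the proportion

n_within_5 = sample_size_proportion(p_guess, 0.05)
n_within_10 = sample_size_proportion(p_guess, 0.10)

print(n_within_5)   # 196
print(n_within_10)  # 49
```

Halving the allowable error from 10 to 5 percentage points quadruples the requirement, which is the square-of-L behavior described above.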

**Sarah:** And for comparing two groups, the formula depends on effect size and power.

**Kiffer:** Yes. The formula adds Z-beta to Z-alpha. Required sample size per group depends on the effect size, the variability in the data, and the power. Smaller effects require larger samples, because they're harder to detect against background noise. The lesson includes a worked example. Determine whether rainwater cisterns reduce monthly diarrhea risk from 15 percent to 10 percent. With 95 percent confidence and 80 percent power, the formula gives 685 per group, so 1,370 individuals total.
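
The episode doesn't read the two-group formula aloud. One common textbook form, which reproduces the 685 figure, is n per group = [Z-alpha × √(2 p̄ q̄) + Z-beta × √(p1 q1 + p2 q2)]² / (p1 − p2)², where p̄ is the average of the two proportions. Treating this as the lesson's formula is an assumption. The sketch uses the rounded critical values 1.96 and 0.84.

```python
import math

def sample_size_two_proportions(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Per-group sample size for detecting a difference between two proportions."""
    p_bar = (p1 + p2) / 2
    q_bar = 1 - p_bar
    term_alpha = z_alpha * math.sqrt(2 * p_bar * q_bar)
    term_beta = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil((term_alpha + term_beta) ** 2 / (p1 - p2) ** 2)

n_per_group = sample_size_two_proportions(0.15, 0.10)
print(n_per_group)       # 685
print(2 * n_per_group)   # 1370
```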

**Sarah:** And then they apply a clustering correction.

**Kiffer:** If the outcome is clustered within households, with intra-class correlation 0.45 and average household size 6, the clustering adjustment increases the requirement to roughly 2,230 per group. That's more than triple the unadjusted estimate. It's the design effect biting back at sample-size calculation. If you're going to do cluster sampling, you have to budget for it up front.
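
The adjustment can be sketched with the standard approximation for the design effect of equal-sized clusters, 1 + (m − 1) × ICC, where m is the cluster size. Assuming that is the correction the lesson applies, exact arithmetic gives a figure just under the rounded value quoted in the episode.

```python
import math

def clustering_design_effect(icc, cluster_size):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

N_UNADJUSTED = 685  # per-group requirement before the clustering correction

deff = clustering_design_effect(icc=0.45, cluster_size=6)
n_adjusted = math.ceil(N_UNADJUSTED * deff)

print(deff)        # 3.25
print(n_adjusted)  # 2227, which the episode rounds to about 2,230
```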

**Sarah:** Okay. Let's pull this all together. Takeaways.

**Kiffer:** First. A census measures everyone, a sample measures a subset. Both are subject to measurement error, but samples also introduce sampling error. The Canadian Census of Population gives you denominators every five years. The Canadian Community Health Survey and Canadian Health Measures Survey give you population estimates of health states with sampling error attached.

**Sarah:** Second. Descriptive studies characterize populations. Analytic studies evaluate associations.

**Kiffer:** Third. Three populations form a hierarchy. Target, source, and study sample. External validity links target to source. Sampling frame links source to study sample. Internal validity asks whether results are valid for the source population at all. The frame, not the questionnaire, is usually where exclusions and selection bias enter.

**Sarah:** Fourth. The math underneath sampling is probability theory. Random variable, expected value, variance, independence. Common public-health distributions are Bernoulli, Binomial, Poisson, Uniform, Normal, Exponential, and log-normal.

**Kiffer:** Fifth. The standard error of the mean equals sigma divided by the square root of n. Quadrupling sample size halves standard error. Precision grows with the square root of n, not linearly.

**Sarah:** Sixth. The Central Limit Theorem says the sampling distribution of the mean is approximately Normal for sufficiently large n, regardless of the underlying population shape. The CLT is about means, not individual data.

**Kiffer:** Seventh. Type I error is a false positive, controlled at alpha, conventionally 0.05. Type II error is a false negative, controlled at beta. Power is one minus beta, conventionally 80 percent.

**Sarah:** Eighth. Non-probability sampling, judgment, convenience, and purposive, lacks formal random selection and is generally inappropriate for descriptive studies.

**Kiffer:** Ninth. There are six probability designs. Simple random, systematic random, stratified random, cluster, multistage, and targeted. Real surveys like the CCHS combine stratification, clustering, individual selection, and a random-digit-dialing top-up frame.

**Sarah:** Tenth. Complex survey data must be analyzed with survey weights and design-based variance estimation. Ignoring the design typically understates standard errors by 30 to 80 percent, which makes confidence intervals too narrow and p-values too small.

**Kiffer:** Eleventh. Sample size for a proportion depends on desired precision, expected proportion, and confidence level. For comparing two groups, also effect size and power. Clustering adjustments can multiply the required sample size by a factor of 2 to 5.

**Sarah:** And the practical recommendation?

**Kiffer:** Two things. First, when you read a paper, read the methods section for the sampling frame and design before the results. Most of what determines whether you can trust the headline number is in those paragraphs. Second, when you design a study, calculate sample size before you start collecting, and budget explicitly for the design effect of any clustering. An underpowered study is wasted effort.

**Sarah:** And the connection back to earlier lessons and forward to Lesson 4?

**Kiffer:** Lesson 1 set up the counterfactual framework. Lesson 2 showed how surveillance systems detect and respond to disease patterns. Lesson 3 now answers the question of who ends up in the comparison groups, and how to think about the uncertainty that comes with sampling. Lesson 4 turns to questionnaire design, which is how you measure the people you've sampled. The frame chooses who. The questionnaire chooses what. Both decisions sit upstream of every analysis you'll ever do.

**Sarah:** Up next is Lesson 4. Questionnaire design.

**Kiffer:** Take care, everyone.

**Sarah:** See you there.
