Sampling

Fundamental Epidemiological Concepts and Approaches

Learning objectives for this lesson:

Distinguish between a census and a sample, and between descriptive and analytic studies
Describe the hierarchy of populations and the concept of a sampling frame
Explain types of error, including Type I and Type II errors, and the concept of statistical power
Compare non-probability sampling methods (judgement, convenience, purposive)
Describe probability sampling methods (simple random, systematic, stratified, cluster, multistage, targeted)
Understand the implications of complex sampling designs on data analysis
Compute required sample sizes for common analytic objectives

This course was developed by Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University based on Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Reference

Glossary: Key Terms, People & Concepts

📚 Reference page, available throughout the lesson

This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.

Key Concepts & Ideas

Target Population The full population to which study findings are intended to apply, defined by the research question.

Study (Source) Population The accessible subset of the target population from which a sample can actually be drawn (also called the source population).

Sampling Frame The actual list, register, or operational mechanism used to enumerate and reach members of the study population. Coverage error arises when the frame omits or double-counts target members.

Sampling Unit The element selected at each stage of the sampling process; this could be an individual, household, school, or cluster, depending on design.

Census Data collection from every member of a population. Eliminates sampling error but is usually impractical and may suffer from large non-response.

Probability Sampling Sampling design in which every unit in the frame has a known, non-zero probability of selection. Required for valid statistical inference using survey weights.

Non-Probability Sampling Sampling without known selection probabilities, for example convenience, purposive, snowball, or quota. Useful for hard-to-reach groups but inference to a population is limited.

Sampling Error Random variation in estimates that arises because we observe a sample rather than the whole population. Quantified by the standard error and decreases as sample size increases.

Non-Sampling Error Errors arising from sources other than random sampling, such as coverage, non-response, measurement, and processing. Often more consequential than sampling error.

Type I Error (α) Rejecting a true null hypothesis: a false positive. Conventionally controlled at 0.05.

Type II Error (β) Failing to reject a false null hypothesis: a false negative. Power = 1 − β, conventionally targeted at 0.80 or higher.

Statistical Power The probability that a study correctly detects an effect of a given size when one exists. Depends on sample size, effect size, variability, and α.

Design Effect (DEFF) Ratio of the variance under the actual complex design to the variance under simple random sampling of the same size. DEFF > 1 indicates the effective sample size is smaller than n.

Oversampling Deliberately sampling certain subgroups at higher rates than their population share to improve subgroup precision. Requires weighting in analysis.

Non-Response Bias Systematic error that occurs when individuals who do not respond differ in important ways from those who do.

Sampling Designs & Methods

Simple Random Sample (SRS) A probability sample in which every possible sample of size n from the frame has an equal chance of selection. The reference design against which others are compared.

Systematic Sampling Selecting every kth unit from the frame after a random start. Approximates SRS unless periodicity in the list aligns with the sampling interval.

Stratified Sampling Dividing the population into mutually exclusive strata and sampling independently within each. Improves precision when strata differ in the outcome and ensures representation of subgroups.

Cluster Sampling Sampling intact groups (clusters) such as schools or villages instead of individuals. Reduces field cost but inflates variance because units within a cluster are correlated.

Multistage Sampling A design that selects clusters in successive stages (e.g., regions → neighbourhoods → households → people). Common in national surveys.

Sampling Weight The reciprocal of the selection probability (often adjusted for non-response and post-stratification). A respondent with weight w stands in for roughly w people in the population; for example, someone selected with probability 1/500 has a design weight of 500.

Convenience Sampling Recruiting whoever is most accessible. Fast and cheap but vulnerable to selection bias; common in pilot studies.

Purposive (Judgement) Sampling Researcher selects units believed to be informative or representative based on theoretical criteria. Common in qualitative work.

Snowball Sampling Existing participants refer additional participants. Useful for hidden or stigmatised populations; respondent-driven sampling (RDS) is a probability-based variant.

Quota Sampling Non-probability sampling that fills predefined quotas (e.g., by age and sex) using convenience recruitment within each cell.
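Two of the glossary definitions above reduce to simple arithmetic: a design weight is 1 / (selection probability), and a design effect shrinks the effective sample size to n / DEFF. The Python sketch below illustrates both under stated assumptions: the five respondents, their selection probabilities (an oversampled subgroup at 1/100 and everyone else at 1/500), and the DEFF of 2.0 are hypothetical numbers, and the weights are not adjusted for non-response.

```python
# Hypothetical respondents: (outcome: 1 = has condition, selection probability).
respondents = [
    (1, 0.01),   # oversampled subgroup, selected at 1 in 100
    (0, 0.01),
    (1, 0.002),  # general population, selected at 1 in 500
    (0, 0.002),
    (0, 0.002),
]

def sampling_weight(p_select: float) -> float:
    """Design weight = 1 / selection probability (no non-response adjustment)."""
    return 1.0 / p_select

def weighted_prevalence(data) -> float:
    """Weighted proportion: sum of w * y divided by sum of w."""
    total_w = sum(sampling_weight(p) for _, p in data)
    return sum(y * sampling_weight(p) for y, p in data) / total_w

def effective_sample_size(n: int, deff: float) -> float:
    """SRS sample size with the same precision as n units under a design with this DEFF."""
    return n / deff

print(sampling_weight(0.002))   # each such respondent stands in for about 500 people
unweighted = sum(y for y, _ in respondents) / len(respondents)
print(f"unweighted {unweighted:.3f} vs weighted {weighted_prevalence(respondents):.3f}")
print(effective_sample_size(2000, 2.0))   # a DEFF of 2 halves the effective n
```

The weighted estimate is lower than the unweighted one because the oversampled respondents (who include a case) are down-weighted back to their true population share.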


Section 1

Introduction to Sampling

⏱ Estimated reading time: 22 minutes

Lesson 3 · HSCI 341

The Bridge Between Question and Data

Who enters your study determines validity, precision, and every inference you later draw.

Section 1 of 4


Population hierarchy, sampling frames, and the probability theory that makes inference from a sample possible.

First distinction

Census vs. sample

Census

Every individual measured. No sampling error, though measurement error still applies.

Sample

A subset measured. Both measurement error and sampling error apply, but a well-designed sample provides nearly the same information at a fraction of the cost.

Canada runs both: the Census of Population (every 5 years) and surveys like the CCHS and CHMS that produce population estimates with sampling error attached.

Two study purposes

Descriptive vs. analytic

Descriptive

Characterises population attributes: frequency, prevalence, average. Requires representative sampling.

Analytic

Estimates the magnitude of an exposure-outcome association. Comparison is the priority.

Three levels

The population hierarchy

The sampling frame is the list of units in the source population. In Canadian practice it is often the Statistics Canada Address Register, an LFS area frame, or a provincial health insurance registry.

Probability foundations

Random variables, expected value, variance

Random variable

A numerical outcome of a random process: e.g., the number of new TB cases in a health region next week.

Expected value (μ)

The long-run average across many repetitions. The quantity we are usually trying to estimate.

Variance (σ²)

How spread out the values are around the mean. Spread controls the precision of sample-based estimates.

Standard error of the mean

\[ \color{#0B7B6B}{SE} = \frac{\color{#C2410C}{\sigma}}{\sqrt{\color{#6D28D9}{n}}} \]

SE: standard error of the mean; σ: population standard deviation; n: sample size

Key distributions

Probability distributions in public health

Discrete

Bernoulli(p): single yes/no; Binomial(n, p): count of successes; Poisson(λ): rare event counts

Continuous

Normal(μ, σ²): symmetric measurements; Exponential(λ): skewed waiting times; Uniform(a, b): equal probability over a range

The engine of inference

The Central Limit Theorem

Sampling distribution of the mean

\[ \color{#0B7B6B}{\bar{X}} \sim N\!\left(\color{#C2410C}{\mu},\; \frac{\color{#6D28D9}{\sigma^2}}{\color{#1D4ED8}{n}}\right) \text{ for large } n \]

X̄: sample mean; μ: population mean; σ²: population variance; n: sample size

No matter how skewed or irregular the underlying population (as long as its variance is finite), the distribution of sample means approaches Normal as n grows. This is what permits confidence intervals and hypothesis tests even with non-Normal raw data.

The interactive simulator below this walkthrough lets you watch the theorem in action across six different population shapes.

Carry forward

What to take into the next section

Target → source → study is the hierarchy; external and internal validity describe how well each transition holds.
The sampling frame is where exclusions and selection bias typically enter, not the questionnaire.
The Central Limit Theorem and $ SE = \sigma/\sqrt{n} $ are the foundation of frequentist inference.

Introduction and Overview

An earlier lesson closed with the counterfactual framework: causal inference requires us to compare groups under different exposure conditions. This lesson asks the immediate next question: which people should be in those groups? Sampling sits at the foundation of every study you will design in this course, because who ends up in the study determines almost everything else: the validity of the results, the generalisability of the conclusions, and which of the biases you learned to recognise in an earlier course are likely to appear. The four content sections move from broad principles to concrete formulas. This section sets up the population hierarchy, the sampling frame, and the probability theory that makes inference from a sample possible; Section 2 covers types of error and non-probability sampling; Section 3 details the major probability sampling designs; and Section 4 closes with how to analyse data from complex surveys and how to determine sample size in advance.

Learning Objectives

Distinguish between a census and a sample.
Contrast descriptive and analytic studies.
Describe the hierarchy of populations (target, source, study sample).
Explain the concepts of internal and external validity.
Define a sampling frame and explain its importance.
Describe the foundational ideas of probability theory (random variables, expected value, and variance) that underpin statistical inference from a sample.
Recognise common probability distributions (Bernoulli, Binomial, Poisson, Normal, Exponential, Uniform) and identify the public-health phenomena they typically describe.
Explain what a sampling distribution is and state the Central Limit Theorem in plain language.

Census vs. Sample

When we conduct research, we need data from either all individuals in a population or a subset of them. The process of obtaining these data is called measurement.

In a census, every individual in the population is evaluated. In a sample, data are collected from only a subset. Sampling is generally more convenient and less costly than conducting a full census. Interestingly, even a census can be viewed as a kind of sample: it captures the population at one point in time, making it a "sample" of the population over time.

Key Distinction

In a census there is no sampling error; the remaining error comes from the measurement itself (and from other non-sampling sources such as incomplete coverage or non-response). With a sample, you contend with sampling error as well. However, a well-planned sample can provide virtually the same information as a census at a fraction of the cost.

Canadian Examples: Census vs. National Health Surveys

Canada runs both kinds of data collection at the population scale, and you will encounter all of them in public health practice:

Census of Population (Statistics Canada, every 5 years; most recent 2021). A near-complete enumeration of every household in Canada. The short-form census goes to all households; the long-form census goes to a 25% mandatory sample. Provides the denominators behind almost every population health rate you will calculate.
Canadian Community Health Survey (CCHS): Statistics Canada / Health Canada / PHAC. A continuous cross-sectional sample survey (~65,000 respondents per cycle) covering self-reported health, behaviours, and health-care use. The flagship descriptive survey for population health surveillance.
Canadian Health Measures Survey (CHMS): Statistics Canada / Health Canada / PHAC. A multi-stage sample survey that adds direct physical measurements (blood pressure, biomarkers, fitness) and a biobank to self-report data. Smaller (~5,700 respondents per cycle) but anchors objective measurement of population health.
National Population Health Survey (NPHS): the longitudinal predecessor (1994–2011) to the CCHS, still used for life-course research.

The Census gives you a denominator and demographic context; CCHS and CHMS give you population estimates of health states with sampling error attached. Choosing among them is the first applied sampling decision a public-health analyst makes.

Descriptive vs. Analytic Studies

Samples support two fundamental types of studies:

Descriptive Studies (Surveys)

A descriptive study aims to describe population attributes such as the frequency of disease or the prevalence of an exposure. Surveys answer questions like: "What proportion of people had diarrhea over a 1-month period?" or "What is the average BMI of students in Grade 12?"

The focus is on characterizing the current state of a population rather than establishing cause-and-effect relationships.

Analytic Studies

An analytic study is designed to estimate the magnitude of an association between exposures and outcomes. These studies contrast groups and seek explanations for differences between them.

Examples: "Is water source associated with the incidence of diarrhea?" or "How does time spent playing video games affect the BMI of Grade 12 students?"

Establishing an association is the first step to inferring causation, as discussed in an earlier lesson.

Choosing between a census and a sample is the first decision; choosing between a descriptive and an analytic purpose is the second. Both decisions sit on top of an even more fundamental concept: the relationship among the different populations a single study touches.

Hierarchy of Populations

Understanding the different populations involved in a study is essential for evaluating validity. There are three key populations to consider, each nested inside the next. The figure and definitions below describe them in turn; the labels next to the arrows in the figure (external validity, sampling frame, internal validity) are the technical vocabulary you will use to describe how a sample inherits or loses information from the broader population it was meant to represent.

Figure 3.1. Hierarchy of populations in epidemiologic research. The target population is the broadest; the source population is the accessible subset; the study sample consists of those who actually participate.

Target Population

The target population is the population to which you want to extrapolate your results. It is often not clearly defined and may vary depending on the perspective of the person interpreting the study. For example, researchers studying rainwater cisterns in Pernambuco State, Brazil might define the target as that state, while someone else may want to generalize the findings to all semi-arid regions of Brazil.

Source Population

The source population is the population from which study subjects are actually drawn. All units in the source population should be "listable" and have a non-zero probability of being included in the study. For example, in a diarrhea study in Brazil, the source population included families from households participating in the One Million Cisterns Project (OMCP).

Study Sample

The study sample (or study group) consists of the individuals who actually end up in the study. It is typically a subset drawn from the source population. Researchers determine the necessary sample size, draw their sample, collect data from eligible subjects, and the final study sample consists of those who agreed to participate and whose data met quality requirements.

Validity: Internal and External

Internal validity refers to whether the study results are valid for members of the source population. It indicates whether the study obtained the "correct" answer for that population. Much of epidemiology is dedicated to methods that ensure internal validity.

External validity involves a subjective assessment of whether results can be generalized to the broader target population. It is generally easier to generalize results from analytic studies (which evaluate associations) than from descriptive studies (which estimate prevalence).

The Sampling Frame

The sampling frame is the list of all sampling units in the source population. Sampling units are the basic elements that will be sampled (e.g., households, individuals). A complete list of all sampling units is required for drawing a simple random sample, though some other methods do not require such a complete listing.

Example: Brazil Diarrhea Study

In a study of water cisterns and diarrhea in Brazil, a suitable sampling frame was the list of all households eligible for the One Million Cisterns Project. Once households were selected, a separate strategy was used for selecting individuals within each household.
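The two-stage selection in this example can be sketched in Python. This is a minimal illustration under stated assumptions: the six-household frame and its members are hypothetical, three households are drawn by simple random sampling from the complete list, and one person is then chosen at random within each selected household. That within-household rule is only one possibility; the original study's rule is not described here.

```python
import random

# Hypothetical sampling frame: household ID -> list of eligible members.
frame = {
    "HH01": ["A", "B", "C"],
    "HH02": ["D"],
    "HH03": ["E", "F"],
    "HH04": ["G", "H", "I", "J"],
    "HH05": ["K", "L"],
    "HH06": ["M"],
}

def two_stage_sample(frame, n_households, rng):
    """Stage 1: simple random sample of households from the complete frame.
    Stage 2: one member chosen at random within each selected household.
    Returns a list of (person, household, overall selection probability)."""
    N = len(frame)
    chosen = rng.sample(sorted(frame), n_households)   # SRS without replacement
    result = []
    for hh in chosen:
        members = frame[hh]
        person = rng.choice(members)
        # P(household selected) * P(person selected | household selected)
        prob = (n_households / N) * (1 / len(members))
        result.append((person, hh, prob))
    return result

rng = random.Random(2024)   # fixed seed so the draw is reproducible
for person, hh, prob in two_stage_sample(frame, 3, rng):
    print(f"{hh}: person {person}, selection probability {prob:.3f}")
```

Note that a person in a four-member household has a quarter of the selection probability of someone who lives alone; unequal probabilities like this are why such designs need weights at the analysis stage.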

Canadian Examples: Sampling Frames You Will Actually Use

National-scale Canadian surveys rarely have a single tidy list of every person. Instead, they assemble a frame from administrative listings:

Statistics Canada Address Register (AR): the dwelling-level frame used by the Census and many StatCan household surveys.
Labour Force Survey (LFS) area frame: CCHS draws part of its sample from the LFS area frame (which is itself based on Census enumeration areas).
Provincial health insurance registries (e.g., the BC Medical Services Plan client registry), close to a population census of residents and the backbone of administrative-data research at Population Data BC (PopData BC).
Disease registries such as the Canadian Cancer Registry or provincial reportable-disease lists serve as case frames for surveillance.

Notice how each frame has different coverage: a registry-based frame misses people without provincial coverage; an LFS area frame excludes residents on First Nations reserves and in institutions. The frame, not the questionnaire, is usually where exclusions and selection bias enter.

The vocabulary of populations and sampling frames is necessary but not sufficient. The deeper question is why we can learn about a whole population from a small sample at all. The answer comes from probability theory.

Probability Theory: Why Sampling Works

Every quantity we estimate from a sample, whether a prevalence, a mean, or a risk ratio, is the value of a random variable. If we drew a different sample tomorrow, the value would be slightly different. Probability theory is the formal language we use to describe how those values vary, and it is what lets us turn a single sample into a defensible statement about a population.

Three concepts do most of the work in introductory biostatistics:

Three Foundational Ideas

1. A random variable is a numerical outcome of a random process. For example, the number of new TB cases reported in a health region next week, or the systolic blood pressure of the next adult who walks into a clinic.

2. The expected value (also called the mean, written μ or E[X]) is the long-run average of a random variable across many repetitions of the random process. It is the value we are usually trying to estimate.

3. The variance (σ²) and its square root, the standard deviation (σ), measure how spread out the random variable is around its mean. Spread, not the mean alone, is what controls how precisely we can estimate things from a sample.
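To make the three ideas concrete, the Python sketch below assumes a hypothetical probability distribution for the number of people per household (the probabilities are invented for illustration). It computes E[X] and Var(X) directly from their definitions, then checks that the average of many simulated independent draws settles near E[X].

```python
import random

# Hypothetical distribution of household size X: value -> probability.
pmf = {1: 0.30, 2: 0.35, 3: 0.15, 4: 0.15, 5: 0.05}

def expected_value(pmf):
    """E[X] = sum over x of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum over x of (x - E[X])^2 * P(X = x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

mu = expected_value(pmf)
sigma2 = variance(pmf)
print(f"E[X] = {mu:.2f}, Var(X) = {sigma2:.2f}, SD = {sigma2 ** 0.5:.3f}")

# Long-run average: simulate many independent draws of X.
rng = random.Random(341)
draws = rng.choices(list(pmf), weights=list(pmf.values()), k=100_000)
print(f"average of 100,000 draws = {sum(draws) / len(draws):.3f}")
```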

A second idea worth naming is independence. Two observations are independent when knowing the value of one tells you nothing about the value of the other. Independence is the assumption that lets a small random sample stand in for a much larger population, and it is the assumption most often violated in real public-health data (clustered households, repeated measures on the same person, contagion in infectious disease).

Why You Should Care

Whenever you compute a confidence interval, run a hypothesis test, or quote a margin of error, you are doing arithmetic on a probability distribution, usually a Normal distribution, that describes what would happen if you repeated the study many times. If you don’t know what distribution your statistic comes from, you can’t honestly attach uncertainty to it.

Types of Probability Distributions

A probability distribution describes the values a random variable can take and how likely each value is. Distributions split first into discrete (countable outcomes such as 0, 1, 2 cases) and continuous (any value on a range, such as height, blood pressure, or waiting time). The handful below are the ones you will encounter again and again in public health.

Discrete distributions

Distribution	What it models	Public-health example
Bernoulli(p)	A single yes/no trial with success probability p. Mean = p, variance = p(1−p).	Whether one randomly chosen adult currently smokes.
Binomial(n, p)	Number of "successes" in n independent Bernoulli trials. Mean = np, variance = np(1−p).	Number of smokers in a CCHS sample of 1,000 adults.
Poisson(λ)	Number of rare events in a fixed interval of time, area, or person-time. Mean = variance = λ.	New cases of measles per week in a public-health unit; ER visits per hour.
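The mean and variance formulas in the table can be checked numerically from the probability mass functions. This Python sketch uses only the standard library; the smoking prevalence p = 0.15 and the measles rate λ = 2 cases per week are hypothetical values chosen for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def mean_and_variance(pmf_values):
    """Mean and variance from a list of (value, probability) pairs."""
    mean = sum(k * p for k, p in pmf_values)
    var = sum((k - mean) ** 2 * p for k, p in pmf_values)
    return mean, var

# Number of smokers among 1,000 adults, assuming p = 0.15 (hypothetical).
n, p = 1000, 0.15
binom = [(k, binomial_pmf(k, n, p)) for k in range(n + 1)]
print(mean_and_variance(binom))   # close to (np, np(1-p)) = (150, 127.5)

# Measles cases per week, assuming lam = 2 (hypothetical); the tail past 59 is negligible.
lam = 2.0
pois = [(k, poisson_pmf(k, lam)) for k in range(60)]
print(mean_and_variance(pois))    # close to (2, 2): mean equals variance
```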

Continuous distributions

Distribution	What it models	Public-health example
Uniform(a, b)	Every value between a and b is equally likely. The "default ignorance" distribution.	Random-digit dialling within an area code; random selection from a list.
Normal(μ, σ²)	The classic bell curve: symmetric, with most mass within 2σ of the mean. The default for many continuous biological measurements and, importantly, for sample means (see CLT below).	Adult height; systolic blood pressure; standardized test scores.
Exponential(λ)	Time between independent events occurring at a constant rate λ. Right-skewed, memoryless. Mean = 1/λ.	Time between successive ED arrivals; survival time under a constant hazard.
Log-normal / Right-skewed	Variables that are positive and span several orders of magnitude. The log of the variable is approximately Normal.	Household income; hospital length of stay; viral loads.
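A quick simulation illustrates two rows of this table. The sketch uses Python's `random.expovariate` and `random.lognormvariate`; the ED arrival rate (λ = 0.3 per minute) and the log-scale parameters for length of stay (μ = 1.0, σ = 0.8) are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(7)

# Exponential: time between ED arrivals at a constant rate lam (hypothetical 0.3 per minute).
lam = 0.3
gaps = [rng.expovariate(lam) for _ in range(50_000)]
print(f"mean gap = {statistics.fmean(gaps):.2f} min (theory 1/lam = {1 / lam:.2f})")

# Log-normal: length of stay whose log is Normal(1.0, 0.8^2) (hypothetical).
stays = [rng.lognormvariate(1.0, 0.8) for _ in range(50_000)]
logs = [math.log(s) for s in stays]
print(f"raw scale: mean {statistics.fmean(stays):.2f} > median {statistics.median(stays):.2f} (right skew)")
print(f"log scale: mean {statistics.fmean(logs):.2f}, SD {statistics.stdev(logs):.2f} (approximately Normal)")
```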

How to Read a Distribution

Every distribution is summarized by two things: a shape (symmetric? right-skewed? bimodal?) and a small set of parameters that control its location and spread. When you see "BMI ~ Normal(27, 4²)", read it as: BMI is approximately Normal, centred at 27 kg/m², with a standard deviation of 4. About 95% of the population falls within 2 SDs of the mean, roughly 19 to 35.
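The BMI example can be reproduced with Python's `statistics.NormalDist`, under the paragraph's assumption that BMI is exactly Normal(27, 4²):

```python
from statistics import NormalDist

bmi = NormalDist(mu=27, sigma=4)   # BMI ~ Normal(27, 4^2), as in the example

lower = bmi.mean - 2 * bmi.stdev
upper = bmi.mean + 2 * bmi.stdev
share = bmi.cdf(upper) - bmi.cdf(lower)   # P(19 <= BMI <= 35)

print(f"range: {lower:.0f} to {upper:.0f} kg/m^2")
print(f"share within 2 SD: {share:.4f}")   # 0.9545
```

The exact 95% band is ±1.96 SD; ±2 SD is the usual rounding, which is why the share comes out slightly above 0.95.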

Figure 3.2. Four common distributions you will meet in public-health data. Discrete distributions assign probability to whole-number outcomes (left two); continuous distributions describe a smooth curve over a real-valued measurement (right two).

🔥 Try it Yourself: Distribution Simulator

What you'll do: use the simulator below to play with each distribution's parameters and watch the shape change in real time, then run the Central Limit Theorem demo to see why sample means from any population eventually look Normal. What to take away: distributions describe specific public-health phenomena rather than serving as textbook abstractions, and the CLT is what lets us trust confidence intervals and power calculations even when the underlying data are skewed.

Aim for at least 10–15 minutes of play; this is the kind of intuition you will draw on for every confidence interval and power calculation in the rest of the course. Use the tabs to switch between exploring single distributions, the CLT demonstration, and a side-by-side comparison.

How to use this

Choose any of the six common distributions on the left. Adjust its parameters to see how the shape, mean, and spread respond. Then click "Draw a sample" to take a random draw and watch the empirical histogram converge on the theoretical curve as your sample grows.

Distribution shown: Binomial

Number of successes in n independent yes/no trials. The natural model for "how many out of N have the condition?"

Default settings: number of trials n = 20, probability of success p = 0.30, sample size 100.

Tip: Bigger samples produce histograms that look more like the theoretical curve. The "law of large numbers" in action.

Theoretical distribution · Binomial(n = 20, p = 0.30)

The probability mass (discrete) or density (continuous) of the chosen distribution. Red dashed line marks the mean.

Empirical histogram from your sample

Each draw of n values gets binned. Compare the bars (your data) to the red line (the theoretical curve). They should align as n grows.


About Binomial: n = 20, p = 0.30

Binomial(n, p) counts the number of successes in n independent Bernoulli trials with the same success probability p.

Mean = np = 6.00, SD = √(np(1−p)) = 2.05.

For large n and moderate p, the binomial looks Normal; this is one of the oldest examples of the CLT.
Used for sample-size formulas around proportions and prevalence estimates.
Assumes independence and constant p; both can fail in clustered data (households, schools).
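The numbers quoted above, and the Normal approximation mentioned in the first bullet, can be checked with this Python sketch. The cut-off of 8 successes is an arbitrary choice for illustration, and the ±0.5 continuity correction is a standard textbook adjustment, not necessarily what the simulator uses.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.30
mean = n * p                   # np
sd = sqrt(n * p * (1 - p))     # sqrt(np(1-p))
print(f"mean = {mean:.2f}, SD = {sd:.3f}")   # mean = 6.00, SD = 2.049

# Exact P(X <= 8) versus the Normal approximation with continuity correction.
k = 8
exact = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))
approx = NormalDist(mean, sd).cdf(k + 0.5)
print(f"exact = {exact:.3f}, Normal approx = {approx:.3f}")
```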

Central Limit Theorem demonstration

Pick a population shape (the heavily skewed ones show the effect most clearly). Set a sample size n, then click Draw 1,000 sample means. Each sample of size n is summarised by its mean, and we plot those means. Watch the histogram of means converge on a Normal curve as n grows, even when the underlying population is wildly non-Normal.

Population shape


The CLT says it does not matter which one of these you pick; sample means converge on a Normal curve.

Sample size


Each "trial" draws n values from the population and computes their mean.


Compare: try n = 1 (each "mean" is a single draw, so the histogram simply reproduces the population). Then increase n to 5, 30, and 100. Notice that the histogram becomes Normal-shaped and narrower.

Population distribution

This is the underlying "true" distribution we are sampling from. It can be anything: symmetric, skewed, or bimodal.

Sampling distribution of the mean (n = 5)

Each bar counts how many trials produced a sample mean in that range. The red curve is the Normal approximation predicted by the CLT.


Side-by-side: shapes at a glance

Six distributions, plotted on the same axes. Use this view to remember which shape goes with which name, and to compare how parameters change appearance.


Binomial(20, 0.3)

Poisson(λ=3)

Normal(μ=10, σ=2)

Exponential(λ=0.3)

Uniform(0, 20)

Log-normal(μ=2, σ=0.5)

Different distributions can describe similar-looking data, so choosing the right one depends on the underlying mechanism as much as the shape.

Comparison of common distributions (rescaled to relative density)

Each curve is rescaled so peak density is 1, to make shape comparison easier. Shape, not absolute height, is what matters here.

How to choose a distribution in practice

Binary outcome (yes/no for one person)? Bernoulli. Count of "yes"s in a fixed sample? Binomial.
Counting rare events in time/space (cases per week, ER visits per hour)? Poisson.
Continuous, symmetric biological measurement (BP, height, lab values)? Normal, after checking that the variable really is approximately Normal.
Time until next event under a constant hazard? Exponential. Survival under varying hazard? Weibull or Gamma (advanced).
Strictly positive, right-skewed, multiplicative process (income, length of stay, viral load)? Log-normal; analyze on the log scale.
No prior information about likely values? Uniform on a sensible range.

Each distribution above describes the underlying behaviour of a single random variable. But when we draw a sample from a population, the quantity we typically care about is a summary of that sample: a mean, proportion, or rate. To make inferences from such summaries, we need one more layer of theory.

Sampling Distributions and the Central Limit Theorem

So far we have talked about distributions of individuals in a population. Now consider the distribution of a statistic, for example the mean of a random sample of n people. Because each sample of size n would give a slightly different mean, the mean itself has a distribution. We call it the sampling distribution of the mean.

Two facts about that sampling distribution drive almost all of frequentist inference:

The Two Pillars

1. Standard error. If individual observations have standard deviation σ, then sample means of size n have standard deviation σ/√n. This quantity, the SD of a statistic, is called the standard error (SE). Quadrupling your sample size halves the standard error. Keep the two ideas apart: the standard deviation tells you how much individual people differ from one another, while the standard error tells you how much a summary such as the mean would bounce around if you repeated the whole study.

2. The Central Limit Theorem (CLT). For sufficiently large n, the sampling distribution of the mean is approximately Normal, no matter what the underlying population looks like (provided the observations are independent and the population variance is finite), even if the population is heavily skewed, bimodal, or discrete. "Sufficiently large" is often around n = 30 for moderately skewed distributions, and much smaller for symmetric ones.
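Both pillars can be checked by simulation. The Python sketch below assumes an Exponential population with rate 1 (so μ = σ = 1) as a stand-in for a strongly right-skewed variable, draws 2,000 samples at each of n = 1, 4, 16, 64, and compares the SD of the sample means with σ/√n. The `skewness` helper is a simple moment estimator written for this example, not a library function.

```python
import random
import statistics

rng = random.Random(341)
SIGMA = 1.0   # an Exponential(rate = 1) population has mean 1 and SD 1

def sample_means(n, trials=2000):
    """Draw `trials` independent samples of size n and return their means."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]

def skewness(xs):
    """Moment-based skewness: about 0 for a symmetric (e.g. Normal) distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

results = {}
for n in (1, 4, 16, 64):
    means = sample_means(n)
    results[n] = (statistics.stdev(means), skewness(means))
    print(f"n={n:3d}  SD of means={results[n][0]:.3f}  "
          f"sigma/sqrt(n)={SIGMA / n ** 0.5:.3f}  skewness={results[n][1]:.2f}")
```

Each fourfold increase in n should roughly halve the SD of the means, and the skewness should shrink toward 0 (for this population, theory gives 2/√n).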

The CLT is the hidden engine behind the bell curve that shows up everywhere in statistics. It is the reason a 95% confidence interval can be written as estimate ± 1.96 · SE: the 1.96 comes from the Normal distribution that the CLT promises us applies to the estimate, even when we have no idea what shape the underlying population has. The theorem traces from de Moivre (1733) through Laplace's 1812 binomial approximation to Lyapunov's 1901 general proof (Central limit theorem, Wikipedia).
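The estimate ± 1.96 · SE recipe takes only a few lines. This Python sketch assumes a hypothetical descriptive survey in which 180 of 1,200 respondents report diarrhea in the past month, and uses the usual Normal-approximation (Wald) interval for a proportion, whose SE is √(p̂(1−p̂)/n). It ignores any design effect, so it applies only to a simple random sample.

```python
from math import sqrt
from statistics import NormalDist

cases, n = 180, 1200
p_hat = cases / n                     # estimated prevalence
se = sqrt(p_hat * (1 - p_hat) / n)    # SE of a proportion under simple random sampling
z = NormalDist().inv_cdf(0.975)       # about 1.96: the 97.5th percentile of N(0, 1)

lower, upper = p_hat - z * se, p_hat + z * se
print(f"prevalence = {p_hat:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```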

Figure 3.3. The Central Limit Theorem in action. The population can be wildly skewed, but as n grows the sampling distribution of the mean becomes increasingly Normal and increasingly narrow (its SD = σ/√n).

Caveat: The CLT Is About Means, Not Individuals

A common student misconception is that “a large enough sample makes the data Normal.” It does not. Income data with 1,000 observations is still right-skewed. What becomes Normal is the distribution of the sample mean across hypothetical repeated samples; that is what we use to construct confidence intervals around the mean. Inference for medians, proportions, ratios, or extreme values relies on the CLT or its analogues in different ways and may need larger n or different methods (bootstrapping, exact methods).
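For a statistic such as the median, one alternative the caveat mentions is the bootstrap. The Python sketch below is a minimal percentile bootstrap under stated assumptions: the "income" values are simulated from a log-normal distribution purely for illustration, and 2,000 resamples is a common but arbitrary choice.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical right-skewed "income" data (in $1,000s): 1,000 log-normal draws.
incomes = [rng.lognormvariate(4.0, 0.6) for _ in range(1000)]

def bootstrap_median_ci(data, rng, reps=2000, level=0.95):
    """Percentile bootstrap CI: resample with replacement and take the median each time."""
    medians = sorted(
        statistics.median(rng.choices(data, k=len(data))) for _ in range(reps)
    )
    lo_idx = int((1 - level) / 2 * reps)
    hi_idx = int((1 + level) / 2 * reps) - 1
    return medians[lo_idx], medians[hi_idx]

print(f"mean {statistics.fmean(incomes):.1f} vs median {statistics.median(incomes):.1f} (data still skewed)")
lo, hi = bootstrap_median_ci(incomes, rng)
print(f"95% bootstrap CI for the median: {lo:.1f} to {hi:.1f}")
```

The raw data stay right-skewed (mean above median) however many observations we add; the bootstrap builds an interval for the median without assuming Normality.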

Key Takeaways

A census measures everyone; a sample measures a subset; both involve measurement, but samples also introduce sampling error.
Descriptive studies characterize populations; analytic studies evaluate associations between exposures and outcomes.
The three populations (target, source, study sample) form a hierarchy, each linked to different aspects of study validity.
The sampling frame is the list of all units from which the sample is drawn; in Canadian practice this often means a StatCan address register, an LFS area frame, or a provincial health insurance registry.
Random variables, expected values, and variances are the language we use to describe how sample-based estimates vary; they are the foundation of every confidence interval and hypothesis test that follows.
A small set of probability distributions (Bernoulli, Binomial, Poisson, Normal, Exponential, log-normal) describes the bulk of public-health phenomena you will encounter.
The standard error of the mean is σ/√n, and by the Central Limit Theorem the sampling distribution of the mean is approximately Normal for sufficiently large n; this is the engine behind frequentist inference.


Section 2

Types of Error & Non-Probability Sampling

⏱ Estimated reading time: 12 minutes


Type I and Type II error, statistical power, and the designs that abandon random selection.

Two ways to be wrong

Type I and Type II error

Type I error (α)

Reject the null when it is true. A false positive. Conventional threshold: α = 0.05.

Type II error (β)

Fail to reject the null when it is false. A false negative. Usually set at β = 0.20.

Decision matrix

\[ \begin{array}{l|cc} & \text{Effect present} & \text{No effect} \\ \hline \text{Reject } H_0 & \text{Correct} & \color{#C2410C}{\alpha}\text{ error} \\ \text{Fail to reject} & \color{#6D28D9}{\beta}\text{ error} & \text{Correct} \end{array} \]

α error: Type I, rejecting a true null (false positive)
β error: Type II, failing to reject a false null (false negative)

Power

Power = 1 − β

Statistical power

\[ \color{#0B7B6B}{\text{Power}} = 1 - \color{#6D28D9}{\beta} \]

Power: probability of detecting a true effect
β: Type II error rate

A study with 80% power (β = 0.20) has an 80% probability of detecting a true effect of the specified size: across many repetitions of the study, it would succeed about eight times out of ten.

To increase power: increase sample size, reduce measurement error, or target a larger effect size.
A negative finding is only informative if power was adequate. A null result from an underpowered study is inconclusive, not reassuring.

No random selection

Three non-probability designs

Judgement

Investigator selects participants believed to be representative. Subject to unmeasured selection bias.

Convenience

Whoever is easiest to reach. Fast and cheap; rarely generalisable.

Purposive

Deliberate targeting of specific individuals or groups. Common in qualitative research and case series.

Hidden populations

Snowball and hybrid designs

Snowball sampling (Goodman, 1961): participants recruit peers. No sampling frame needed. Used for hidden populations.

Respondent-driven sampling

Heckathorn (1997). Uses structured referral chains to estimate population parameters. Assumes the referral process reaches equilibrium, so that estimates no longer depend on the initial seeds.

Time-location sampling

Systematic sampling at known venues. Used in HIV behavioural surveillance. Requires venue attendance to be unrelated to the outcome.

Carry forward

What to take into the next section

Type I (α) = false positive; Type II (β) = false negative; power = 1 − β.
Non-probability samples are generally inappropriate for descriptive studies: without known selection probabilities, prevalence estimates cannot be generalised.
Selection bias enters when the selection mechanism is correlated with the outcome, not merely with exposure.

Introduction and Overview

An earlier section set up the population hierarchy and the probability theory behind sampling. This section turns to two practical consequences. First: when we draw conclusions from a sample, we can make two characteristic kinds of mistake (Type I and Type II errors), and their probabilities can be quantified. Second: some sampling strategies abandon probability theory altogether (convenience samples, judgement samples, snowball samples), with predictable consequences for inference. Together, the two halves of this section give you the vocabulary to judge when those approaches are acceptable and when they are not.

Learning Objectives

Explain the two types of statistical error (Type I and Type II).
Define the null hypothesis, P-values, and statistical power.
Describe three non-probability sampling methods and their limitations.

Types of Error

In any study based on a sample, the variability of the outcome, measurement error, and sample-to-sample variability all affect the results, so inferences drawn from sample data are subject to error. Within hypothesis testing in analytic studies, there are two key types of error:

Table 3.1. Types of Error

Conclusion of Analysis	Effect Truly Present	Effect Truly Absent
Effect present (reject null)	Correct	Type I (α) error
No effect (fail to reject null)	Type II (β) error	Correct

Type I (α) Error

A Type I error occurs when you conclude that the outcomes in the groups differ (i.e., that an association exists) when in fact they do not. In other words, you falsely reject the null hypothesis. The probability of a Type I error is denoted α.

Statistical tests assess the evidence against the null hypothesis (that there is no difference between groups). If we reject the null whenever P ≤ 0.05, then in situations where the null is actually true we will wrongly reject it 5% of the time: α = 0.05 is the Type I error rate of the testing procedure. It is not the probability that any particular significant result is a false positive.
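To see what α = 0.05 means as an error rate, the sketch below simulates many studies in which the null hypothesis is true by construction (both groups drawn from the same Normal population) and counts how often a two-sided z-test rejects at P ≤ 0.05. It assumes Normally distributed outcomes with a known SD of 1; the function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def two_sample_z_p_value(a, b, sigma=1.0):
    """Two-sided P-value for a difference in means, assuming a known common SD `sigma`."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_i_error_rate(n_per_group=30, reps=10_000, alpha=0.05, seed=1):
    """Fraction of simulated studies that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]  # same population: the null is true
        if two_sample_z_p_value(a, b) <= alpha:
            rejections += 1
    return rejections / reps

print(type_i_error_rate())  # expected to be near 0.05
```

Every rejection counted here is a false positive, so the printed rate estimates α itself.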

Type II (β) Error

A Type II error occurs when you conclude that there is no association between the exposure and outcome, when in fact there is. You fail to reject the null hypothesis when you should have. The probability of a Type II error is denoted β.

Reasons a study might fail to find a real effect include: the exposure truly had no effect, the study design was inappropriate, the sample size was too small (low power), or simply bad luck.

Statistical Power

Power is the probability that you will find a statistically significant difference when a real difference of a defined magnitude exists. Mathematically, power = 1 − β.

For example, if a study has 80% power, it has an 80% chance of detecting a true effect of the specified size. The most direct way to increase power is to increase the sample size. So-called negative findings (failure to find a difference) are less commonly reported in the literature, partly because a negative result from an underpowered study is inconclusive.
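A minimal sketch of how power grows with sample size, assuming a comparison of two proportions and the normal approximation that underlies the per-group sample-size formula in Section 4 (it ignores the negligible chance of rejecting in the wrong direction). The 20% vs 30% risks are hypothetical.

```python
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)       # about 1.96 for alpha = 0.05
    p_bar = (p1 + p2) / 2
    null_sd = (2 * p_bar * (1 - p_bar)) ** 0.5           # SD term under H0 (pooled proportion)
    alt_sd = (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5      # SD term under the alternative
    z = (abs(p1 - p2) * n_per_group ** 0.5 - z_alpha * null_sd) / alt_sd
    return NormalDist().cdf(z)

# Hypothetical scenario: 20% vs 30% outcome risk; power rises with n per group.
for n in (50, 100, 200, 300, 400):
    print(n, round(power_two_proportions(0.20, 0.30, n), 2))
```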

🎲 Interactive: Sample Size & the Law of Large Numbers

What you'll do: pick a population, set a sample size n, draw repeated samples, and watch the distribution of sample means concentrate around the true population mean as n grows. What to take away: the standard error shrinks as 1/√n; this is the formal reason why “more data” means better precision and higher statistical power. The intuition you build here drives every confidence interval and sample-size calculation in the rest of the course.

Interactive panels: Population Distribution (the "true" distribution being sampled, with the true mean μ marked) and Sampling Distribution of the Mean (a histogram of the sample means drawn so far). Default readouts: true mean μ = 3.50; theoretical SE = σ/√n = 0.54 at n = 10.

Try this: set n = 1 and draw 100 samples; then press Reset, set n = 50, and draw 100 more. The spread of the histogram of sample means shrinks dramatically (by a factor of about √50 ≈ 7): the 1/√n precision gain in action.
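The default readouts (μ = 3.50 and a theoretical SE of 0.54 at n = 10) are consistent with a fair six-sided die, whose population SD is about 1.708. Assuming that population, this sketch reproduces the exercise offline: it draws 100 samples at each n and compares the observed spread of the sample means with σ/√n.

```python
import random
import statistics

FACES = [1, 2, 3, 4, 5, 6]              # assumed population: a fair six-sided die
MU = statistics.fmean(FACES)            # 3.5
SIGMA = statistics.pstdev(FACES)        # about 1.708 (population SD)

def draw_sample_means(n, reps, rng):
    """Means of `reps` independent samples of size n, drawn with replacement."""
    return [statistics.fmean(rng.choices(FACES, k=n)) for _ in range(reps)]

rng = random.Random(42)                 # fixed seed for reproducibility
for n in (1, 10, 50):
    means = draw_sample_means(n, reps=100, rng=rng)
    print(f"n={n:>2}: mean of means={statistics.fmean(means):.2f}, "
          f"observed SE={statistics.stdev(means):.3f}, "
          f"theoretical SE={SIGMA / n ** 0.5:.3f}")
```

With only 100 samples per setting the observed SE will wobble around the theoretical value; increasing `reps` tightens the agreement.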

The error framework above assumes a probability sample. The next subsection covers what happens when investigators forgo formal random selection altogether, and what that costs them.

Non-Probability Sampling

Samples drawn without an explicit method for determining each individual's probability of selection are known as non-probability samples. Whenever there is no formal process for random selection, the sample should be considered non-probability. Sample selection that is unrelated to the outcome of interest leaves inference intact, but selection that depends on unmeasured determinants of the outcome produces selection bias, a form of specification error formalised by Heckman (1979) and reviewed for hidden populations by Sudman & Kalton (1986). There are three main types:

Judgement sample
Convenience sample
Purposive sample

Important Limitation

Non-probability samples are generally inappropriate for descriptive studies because you cannot generalize prevalence estimates to the source population without knowing each individual's probability of being included. However, non-probability methods are commonly used in analytical studies where comparing exposure groups is the priority.

Chain-Referral and Hybrid Designs

Snowball sampling, first formalised by Goodman (1961), recruits hidden-population members through peer referrals and is widely used when no sampling frame exists. Two newer hybrids partially recover probability-style inference: respondent-driven sampling (RDS), introduced by Heckathorn (1997) and extended with unbiased estimators by Salganik & Heckathorn (2004); and time-location (venue-based) sampling, applied at national scale for HIV behavioural surveillance by MacKellar and colleagues (2007). Magnani, Sabin, Saidel, & Heckathorn (2005) review when each design is appropriate for hard-to-reach populations.

Key Takeaways

Type I (α) error means falsely concluding there is an effect; Type II (β) error means missing a real effect.
Power (1 − β) is the probability of detecting a true effect; increasing sample size increases power.
Non-probability samples (judgement, convenience, purposive) lack a formal random selection process and are primarily used in analytic studies.


Section 3

Probability Sampling Methods

⏱ Estimated reading time: 15 minutes


Simple, systematic, stratified, cluster, multistage, and targeted designs, and how to choose among them.

The defining property

What makes a sample a probability sample

Every element in the sampling frame has a known, non-zero probability of selection, assigned through a formal random process.

This is what separates probability sampling from all non-probability designs. The known probability is what allows you to compute unbiased estimates and valid standard errors.

Random does not mean haphazard. A formal, reproducible process is required: computer-generated numbers or a random number table.

The building blocks

Simple random and systematic sampling

Simple random

Equal probability for all. Needs complete list. All standard analyses apply directly.

Systematic

Pick a random start, then every jth unit. Practical for sequential records. Watch for periodicity.

Sampling interval

\[ \color{#0B7B6B}{j} = \frac{\color{#C2410C}{N}}{\color{#6D28D9}{n}} \]

j: sampling interval (take every j-th unit)
N: population size
n: desired sample size

Dividing first

Stratified random sampling

Sampling groups

Cluster sampling

Primary sampling unit = the cluster (household, class, clinic).

Randomly select clusters; measure everyone inside each selected cluster.

Advantages: only a cluster list is needed; fewer locations to visit.

Limitation: within-cluster similarity inflates variance relative to SRS for the same n.

Two-stage selection

Multistage sampling

Select primary sampling units (PSUs), then randomly sample individuals within each PSU rather than taking everyone.

CCHS design

Stratify by health region → cluster by dwelling within each region → select one person per dwelling (Kish grid).

Key requirement

Equal selection probabilities require either probability-proportional-to-size (PPS) selection of PSUs with a fixed number of individuals taken per PSU, or equal-probability selection of PSUs with a constant sampling fraction within each PSU.

Carry forward

Targeted sampling and the design decision

Targeted (risk-based) sampling focuses on high-risk strata: efficient for rare diseases but requires prior knowledge of risk factors and cannot yield standard risk ratios without external data.

Decision question

Does your frame support random selection? If yes, choose a probability design. If no, document the non-probability design and its analytic limits.

Complexity cost

Stratification, clustering, and multistage designs require design-corrected analysis in Section 4. The analysis must match the design.

Introduction and Overview

An earlier section closed by walking through what non-probability sampling looks like and why it forfeits inferential validity. This section returns to probability sampling and details the major variants you'll encounter in real surveys. The six designs below are not interchangeable; each buys a different combination of cost, complexity, and statistical efficiency. Read the comparison table at the end of the section as a decision aid you can return to whenever you're choosing a design.

Learning Objectives

Define a probability sample and explain why random selection is essential.
Describe simple random, systematic random, stratified random, cluster, multistage, and targeted (risk-based) sampling methods.
Identify the advantages and disadvantages of each method.

What Is a Probability Sample?

A probability sample is one in which every element in the population has a known, non-zero probability of being included. This implies that a formal process of random selection has been applied to the sampling frame. The key advantage is that probability samples allow for valid statistical inferences about the source population.

Random ≠ Haphazard

Random selection uses a formal, reproducible process (e.g., computer-generated random numbers, random number tables); it is not the same as selecting participants haphazardly or arbitrarily.

Types of Probability Sampling

Simple Random Sample

In a simple random sample, every study subject in the source population has an equal probability of being included. A complete list of the source population is required, and a formal random process is used to select individuals.

Example: To study wait times in a hospital emergency room, you need 1,000 records from 13,000 admissions over the past year. You randomly generate 1,000 numbers between 1 and 13,000 and pull those records.
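The emergency-room example can be sketched in a few lines of Python, assuming records are numbered 1 to 13,000; the function name is illustrative. Recording the seed makes the selection reproducible and auditable, which is what distinguishes a formal random process from haphazard picking.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Select record numbers 1..population_size without replacement, each equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

records = simple_random_sample(13_000, 1_000, seed=2024)
print(len(records), records[:5])
```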

Advantage: Conceptually simple; all standard statistical analyses apply directly.

Limitation: Requires a complete list of the entire source population.

Systematic Random Sample

In a systematic random sample, a complete list is not required; you only need an estimate of the total population and sequential access to individuals. The sampling interval (j) is computed as the population size divided by the desired sample size.

How it works: Randomly pick a starting point between 1 and j, then select every jth subject after that.

Example: To sample 1,000 from 13,000 emergency patients, the sampling interval is 13. Randomly pick a number between 1 and 13 for your starting patient, then select every 13th patient thereafter. So if your random start happens to be 7, you would sample patients 7, 20, 33, 46, and so on down the list.
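A minimal sketch of the same procedure, assuming records can be numbered 1 to N in sequence; the helper name is hypothetical. When N/n is not a whole number, rounding j down (as here) can return slightly more than n records.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Random start in 1..j, then every j-th record, with j = N // n."""
    j = population_size // sample_size          # sampling interval: 13,000 // 1,000 = 13
    rng = random.Random(seed)
    start = rng.randint(1, j)                   # randint includes both endpoints
    return list(range(start, population_size + 1, j))

# Reproduce the worked example with a fixed start of 7 instead of a random one.
example = list(range(7, 13_000 + 1, 13))
print(example[:4], len(example))   # [7, 20, 33, 46] 1000
```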

Caution: Bias may occur if the factor you are studying is related to the sampling interval (e.g., periodic patterns in admissions).

Stratified Random Sample

The population is divided into mutually exclusive strata based on factors likely to affect the outcome. Then, within each stratum, a simple or systematic random sample is chosen. The mathematical foundations of stratification, including the now-standard optimum (Neyman) allocation rule for assigning sample size across strata, were laid out by Neyman (1934) in his landmark Royal Statistical Society paper that effectively founded probability sampling theory.

In proportional stratified sampling, the number sampled from each stratum is proportional to that stratum's share of the total population.

Three key advantages:

Ensures all strata are represented in the sample.
Can produce more precise overall estimates than a simple random sample because between-strata variation is removed.
Allows estimation of stratum-specific outcomes.

Example: If hospital wait times differ between males and females, stratify records by sex and randomly sample within each group.
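To make the allocation choices concrete, the sketch below compares proportional allocation (n_h ∝ N_h) with Neyman's optimum allocation (n_h ∝ N_h S_h, which assumes equal cost per unit across strata). The stratum sizes and SDs of wait time are invented for illustration; in practice the allocations would be rounded to whole records.

```python
def proportional_allocation(stratum_sizes, n):
    """n_h proportional to N_h, the stratum's share of the population."""
    total = sum(stratum_sizes.values())
    return {h: n * N_h / total for h, N_h in stratum_sizes.items()}

def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Neyman (optimum, equal-cost) allocation: n_h proportional to N_h * S_h."""
    products = {h: N_h * stratum_sds[h] for h, N_h in stratum_sizes.items()}
    total = sum(products.values())
    return {h: n * v / total for h, v in products.items()}

# Hypothetical inputs: 13,000 ER records split by sex, with assumed SDs of wait time (minutes).
sizes = {"male": 7_000, "female": 6_000}
sds = {"male": 40.0, "female": 60.0}
print(proportional_allocation(sizes, 1_000))
print(neyman_allocation(sizes, sds, 1_000))
```

Neyman allocation shifts sample toward the more variable stratum, which is where extra observations buy the most precision.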

Cluster Sampling

A cluster is a natural grouping of study subjects with one or more common characteristics (e.g., a household is a cluster of people; a classroom is a cluster of students; a clinic is a cluster of patients).

In cluster sampling, the primary sampling unit (PSU) is the cluster itself, and it is often larger than the unit of concern. Every individual within a selected cluster is included in the sample.

Example: To estimate smoking prevalence among Grade 12 students, randomly select 10 of 47 Grade 12 classes and survey all students in those 10 classes.

Advantage: Easier when getting a list of clusters is simpler than listing all individuals. Often cheaper to visit fewer locations.

Limitation: Individuals within a cluster tend to be more alike, increasing sampling variation for a given sample size compared to SRS.

Important: A sample is only a "cluster sample" if the group is the sampling unit and the individuals within it are the unit of concern. If the group itself is the unit of concern (e.g., "does anyone in the household smoke indoors?"), it is not a cluster sample.
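The Grade 12 example as a sketch, assuming a hypothetical frame of 47 classes with 25 students each (real class sizes would vary); note that the random step selects classes, never individual students.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random, then keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # clusters are the sampling units
    return {name: clusters[name] for name in sorted(chosen)}

# Hypothetical frame: 47 Grade 12 classes of 25 students each.
classes = {f"class_{i:02d}": [f"student_{i:02d}_{j:02d}" for j in range(1, 26)]
           for i in range(1, 48)}
sample = cluster_sample(classes, n_clusters=10, seed=12)
students = [s for members in sample.values() for s in members]
print(len(sample), "classes,", len(students), "students surveyed")
```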

Multistage Sampling

Multistage sampling is similar to cluster sampling, except that after selecting primary sampling units (PSUs), a sample of secondary sampling units (individuals) is drawn within each PSU rather than surveying everyone.

Example: To study smoking among students, first randomly select 10 classes (PSUs), then randomly select 5 students from each class rather than surveying all students in every class. Within-household selection in face-to-face surveys has traditionally been done using a Kish grid, the objective respondent-selection procedure introduced by Kish (1949).

To ensure all individuals have the same probability of being selected, either choose PSUs with probability proportional to their size (PPS) and then select a fixed number of individuals from each, or choose PSUs with equal probability and then use a constant sampling proportion within each PSU.
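A quick check of the two equal-probability routes, using exact fractions and invented class sizes; the stage-1 inclusion probabilities are idealized as k × size / total for PPS and k / (number of PSUs) otherwise. The third dictionary shows why PPS must be paired with a fixed number per PSU rather than a constant fraction.

```python
from fractions import Fraction

# Hypothetical frame: four classes (PSUs) of different sizes; two classes will be selected.
psu_sizes = {"A": 20, "B": 30, "C": 40, "D": 50}
total = sum(psu_sizes.values())   # 140 students
k = 2                             # PSUs selected; 2 * 50 / 140 < 1, so no PPS probability exceeds 1

# Route 1: PPS at stage 1, then a FIXED number m of students per selected class.
m = 5
pps_fixed = {c: Fraction(k * size, total) * Fraction(m, size) for c, size in psu_sizes.items()}

# Route 2: equal-probability classes at stage 1, then a CONSTANT fraction f within each class.
f = Fraction(1, 4)
equal_constant = {c: Fraction(k, len(psu_sizes)) * f for c in psu_sizes}

# Mixing the two (PPS plus a constant fraction) does NOT give equal probabilities.
pps_constant = {c: Fraction(k * size, total) * f for c, size in psu_sizes.items()}

print(pps_fixed)       # every student: 1/14
print(equal_constant)  # every student: 1/8
print(pps_constant)    # students in larger classes get a higher probability
```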

The number of individuals per cluster (n_i) can be optimized by balancing within-cluster and between-cluster variance against the costs of sampling groups versus individuals.

Targeted (Risk-Based) Sampling

Targeted sampling stratifies the source population based on characteristics associated with the probability of disease occurrence, then focuses sampling on strata where disease is most likely to be found.

Individuals are assigned point values based on their probability of having the disease of interest, and sampling proceeds until a predetermined number of points have been sampled. This is an unequal probability sampling strategy; some individuals may even have a zero probability of inclusion.

Advantage: Requires a much smaller sample to detect rare diseases when key risk characteristics can be identified.

Limitation: Key epidemiological parameters (e.g., risk ratios) may not be known for the study population and must be estimated from other evidence.

Comparison of Sampling Methods

Method	Requires Complete List?	Key Advantage	Key Limitation
Simple Random	Yes	Simple; all standard analyses apply	Needs complete population list
Systematic	No (needs estimate)	Practical; easy to implement	Periodic bias if factor linked to interval
Stratified	Yes (within strata)	More precise; ensures representation	Needs to know stratum membership
Cluster	List of clusters only	Cheaper; no need to list individuals	Higher variance than SRS for same n
Multistage	List of PSUs only	Flexible; cost-effective	Complex design; needs more subjects
Targeted	No (risk-based)	Efficient for rare diseases	Needs prior knowledge of risk factors

Worked Example: How the CCHS Combines These Methods

The Canadian Community Health Survey illustrates a real multistage probability design in action:

Stratification: the population is first stratified by health region (about 110 health regions across Canada), and a target sample size is allocated to each so that every region produces stable estimates.
Clustering: within each health region, dwellings are sampled in clusters from the LFS area frame (each cluster is a group of dwellings sharing a geographic boundary). This is the cluster stage.
Selection within cluster: one person is randomly selected from each chosen household to complete the interview.
Top-up samples: an RDD (random digit dialling) telephone frame fills in coverage for areas where the area frame is sparse.

The result is a probability sample in which every member of the survey's target population has a known, non-zero chance of selection, but the selection probability differs by region, household size, and frame. That is why CCHS data must be analysed with survey weights and bootstrap replicate weights (covered in Section 4).

Reflection

Think of a health research question you are interested in. Which sampling method would be most appropriate, and why? What practical constraints (cost, time, available lists) would influence your choice?

Model answer: A defensible answer names the question (e.g., prevalence of food insecurity among BC post-secondary students) and matches the design to it. Stratified random sampling is a reasonable default: stratify by institution type (research-intensive vs. teaching-focused vs. community college) and draw an SRS within each stratum, allocating roughly in proportion to enrolment but oversampling smaller strata so that their estimates are precise (and weighting the analysis to undo the oversampling). Practical constraints: cost favours administrative-data sampling frames over door-to-door recruitment; time favours online survey delivery; available lists (institution registrar files) determine which stratification variables are feasible. If institutional lists are unavailable, fall back to cluster sampling of courses or classes, accept the design effect, and inflate n accordingly. Document refusal rates and use weighted analysis to address differential non-response.


Key Takeaways

Probability samples give every element a known, non-zero chance of selection, enabling valid statistical inference.
Simple random sampling requires a complete list; systematic sampling needs only sequential access.
Stratified sampling improves precision by removing between-strata variation.
Stratified versus cluster sampling. Every stratum is sampled, so between-stratum variation drops out of the standard error; the gain is largest when strata are internally homogeneous. Only a subset of clusters is sampled, and because members of a cluster tend to resemble one another, cases within a cluster carry overlapping information and precision falls (the design effect).
Cluster and multistage sampling are useful when listing all individuals is impractical, but they require more subjects for the same precision.
Targeted sampling is efficient for rare outcomes but requires prior knowledge of risk characteristics.


Section 4

Analysing Survey Data & Sample Size

⏱ Estimated reading time: 15 minutes


Design-correct analysis, the design effect, and the formulae for determining sample size in advance.

Three adjustments

Stratification, weights, clustering

Stratification

Combine stratum-specific estimates. Can reduce overall SE when the stratifying variable predicts the outcome.

Sampling weights

Weight = 1 / p(selection). Corrects for unequal selection probabilities. May change both point estimate and SE.

Clustering

Within-cluster similarity inflates variance. Adjust SE at the PSU level. For CCHS data, ignoring it typically produces SE values that are 30–80% too small.

Horvitz-Thompson weight

\[ \color{#0B7B6B}{w_i} = \frac{1}{\color{#C2410C}{\pi_i}} \]

w_i: survey weight for unit i
π_i: selection probability of unit i

Quantifying the cost of complexity

The design effect (deff)

Design effect (Kish, 1965)

\[ \color{#0B7B6B}{\text{deff}} = \frac{\color{#C2410C}{\text{Var}_{\text{complex}}}}{\color{#6D28D9}{\text{Var}_{\text{SRS}}}} \]

deff: design effect
Var_complex: variance under the actual design
Var_SRS: variance under simple random sampling

deff = 1

Complex design is as precise as SRS.

deff > 1

Less precise than SRS. Brazil diarrhea study: deff = 4.43. The variance was 4.43 times the SRS equivalent.

Two corrections

Finite population correction and bootstrap weights

Finite population correction (FPC)

\[ \color{#0B7B6B}{\text{FPC}} = \frac{\color{#C2410C}{N} - \color{#6D28D9}{n}}{\color{#C2410C}{N} - 1} \]

FPC: finite population correction
N: population size
n: sample size

Apply when sampling fraction > 10%, in simple or stratified designs only. Not applicable in multistage sampling.

CCHS / CHMS practice: Statistics Canada releases 500 bootstrap replicate weights (Rao & Wu, 1988). Run your analysis 500 times; use survey or srvyr in R. Ignoring the weights typically makes standard errors 30–80% too small.

Four formulae

Key sample-size formulae

Estimate a proportion

\[ \color{#0B7B6B}{n} = \frac{\color{#C2410C}{Z_{\alpha}^{2}}\, \color{#6D28D9}{p}\, \color{#1D4ED8}{q}}{\color{#BE185D}{L^{2}}} \]

n: required sample size
Z_α: z-value for the confidence level
p: expected proportion
q: 1 − p
L: margin of error

Estimate a mean

\[ \color{#0B7B6B}{n} = \frac{\color{#C2410C}{Z_{\alpha}^{2}}\, \color{#6D28D9}{\sigma^{2}}}{\color{#BE185D}{L^{2}}} \]

n: required sample size
Z_α: z-value for the confidence level
σ²: population variance
L: margin of error
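The two single-sample formulae above translate directly into code. This sketch assumes 95% confidence (Z = 1.96) and rounds up to the next whole subject; the prevalence, margins, and BMI range are hypothetical, and σ uses the range/4 rule described under Expected Variation in the Data.

```python
import math

def n_for_proportion(p, L, z=1.96):
    """n = Z^2 * p * q / L^2, rounded up to the next whole subject."""
    return math.ceil(z ** 2 * p * (1 - p) / L ** 2)

def n_for_mean(sigma, L, z=1.96):
    """n = Z^2 * sigma^2 / L^2, rounded up to the next whole subject."""
    return math.ceil(z ** 2 * sigma ** 2 / L ** 2)

# Expected prevalence of 20%, estimated within +/-5 vs +/-10 percentage points.
print(n_for_proportion(0.20, 0.05))   # 246
print(n_for_proportion(0.20, 0.10))   # 62

# BMI: if about 95% of values fall between 18 and 38, sigma is roughly 20 / 4 = 5.
print(n_for_mean(20 / 4, 0.5))        # 385
```

Halving the margin of error roughly quadruples n, because L enters the formula squared.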

Compare two proportions (per group)

\[ \color{#0B7B6B}{n} = \frac{\left[\color{#C2410C}{Z_{\alpha}}\sqrt{2\bar{p}\bar{q}} - \color{#6D28D9}{Z_{\beta}}\sqrt{p_1 q_1 + p_2 q_2}\right]^{2}}{(\color{#1D4ED8}{p_1} - \color{#BE185D}{p_2})^{2}} \]

n: sample size per group
Z_α: z-value for significance
Z_β: z-value for power
p₁, p₂: proportions in groups 1 and 2 (q₁ = 1 − p₁, q₂ = 1 − p₂)
p̄: average of p₁ and p₂ (q̄ = 1 − p̄)
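A sketch of the per-group formula, assuming a two-sided α and this lesson's sign convention in which Z_β is the lower-tail value (about −0.84 for 80% power), so subtracting it adds to the bracket. The 20% vs 30% proportions are hypothetical.

```python
import math
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for comparing two proportions, rounded up to a whole subject."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided: about 1.96
    z_beta = NormalDist().inv_cdf(1 - power)        # lower-tail value: about -0.84 for 80% power
    p_bar = (p1 + p2) / 2
    q_bar = 1 - p_bar
    numerator = (z_alpha * math.sqrt(2 * p_bar * q_bar)
                 - z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

print(n_two_proportions(0.20, 0.30))   # 294 per group
```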

Clustering inflates n

Design-effect and FPC adjustments

Clustering adjustment

\[ \color{#0B7B6B}{n'} = \color{#C2410C}{n} \times \bigl[1 + \color{#6D28D9}{\rho}\,(\color{#1D4ED8}{m} - 1)\bigr] \]

n': cluster-adjusted sample size
n: unadjusted sample size
ρ: intracluster correlation
m: average cluster size

Finite population correction adjustment

\[ \color{#0B7B6B}{n'} = \frac{1}{\dfrac{1}{\color{#C2410C}{n}} + \dfrac{1}{\color{#6D28D9}{N}}} \]

n': corrected sample size
n: uncorrected sample size
N: population size

Brazil example: unadjusted n = 685 per group → after clustering (ρ = 0.45, m = 6, so 1 + 0.45 × 5 = 3.25) → n ≈ 685 × 3.25 ≈ 2,227, or roughly 2,230 per group. More than three times the unadjusted estimate.
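Both adjustments are one-line functions. The sketch reproduces the Brazil arithmetic and applies the finite-population formula to a hypothetical population of 1,000; the function names are illustrative.

```python
import math

def adjust_for_clustering(n, rho, m):
    """Multiply n by the design effect 1 + rho * (m - 1)."""
    return n * (1 + rho * (m - 1))

def adjust_for_finite_population(n, N):
    """Finite-population-corrected sample size n' = 1 / (1/n + 1/N)."""
    return 1 / (1 / n + 1 / N)

# Brazil example from the text: n = 685 per group, rho = 0.45, m = 6.
print(1 + 0.45 * (6 - 1))                                # 3.25
print(math.ceil(adjust_for_clustering(685, 0.45, 6)))   # 2227

# Hypothetical FPC case: n = 246 needed, but the population has only N = 1,000 members.
print(math.ceil(adjust_for_finite_population(246, 1_000)))  # 198
```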

Carry forward

What to take into the final section

Design-correct analysis requires accounting for stratification, weights, and clustering. Ignoring them produces false precision.
deff > 1 means the design has cost you precision; when clustering is the cause, inflate the required sample size accordingly with $ n' = n \times [1 + \rho(m-1)] $.
Sample-size calculations are assumption documents: precision, variance, confidence level, power, plus adjustments for clustering, attrition, and finite populations.

Introduction and Overview

An earlier section walked through the major probability designs. Real surveys (the CCHS being the canonical Canadian example) typically combine several of these: stratification at the top level, clustering within strata, multistage selection within clusters. That combination is precisely what makes the analysis non-trivial: a complex sample design demands a complex analysis. The first half of this section covers how to do that analysis correctly. The second half closes the loop by walking through how to determine sample size before the data are collected.

Learning Objectives

Explain how stratification, sampling weights, and clustering affect the analysis of survey data.
Define the design effect and the finite population correction.
Describe the key factors that determine sample size.
Apply basic sample-size formulae for estimating proportions and means.

Analysing Complex Survey Data

When data come from a complex sampling design (involving stratification, weighting, or clustering), the analysis must account for these features. Ignoring them can bias point estimates (when unequal selection probabilities are ignored) and misstate standard errors, most often by underestimating them (when clustering is ignored).

Accounting for Stratification

If the population was divided into strata before sampling, this must be reflected in the analysis. Stratification provides stratum-specific estimates and can reduce the standard error of the overall estimate if the stratifying variable is related to the outcome.

However, stratification alone does not change the overall point estimate; it primarily affects precision. The total population size in each stratum must be known to compute appropriate sampling weights.

Sampling Weights

Not all individuals in a probability sample necessarily have the same probability of selection. The sampling weight for each individual is the inverse of their overall selection probability; this inverse-probability weighting underlies the Horvitz-Thompson estimator introduced by Horvitz & Thompson (1952), which produces unbiased totals and means from any probability sample with known inclusion probabilities.

The probability of selection depends on multiple stages. For example, in a household survey:

p(selection) = (n/N) × (m/M)

where n = households in sample, N = households in source population, m = individuals selected per household, and M = total people in that household.

Multiplying the two stages captures a simple idea: your overall chance of being sampled is the chance your household is chosen times the chance you are then picked within it.

The sampling weight = 1/p(selection). This weight reflects how many people in the source population each sampled individual "represents." Incorporating weights may change both the point estimate and the standard error.
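A minimal sketch of the two-stage probability and the resulting weights, assuming one adult is interviewed per sampled household; the household sizes and outcomes are invented. The weighted mean here divides by the sum of the weights (the ratio, or Hájek, form) rather than by a known population size, a common variant of the Horvitz-Thompson idea.

```python
def selection_probability(n_households, N_households, m_selected, M_in_household):
    """Two-stage probability: P(household chosen) x P(person chosen within household)."""
    return (n_households / N_households) * (m_selected / M_in_household)

def weighted_mean(values, weights):
    """Ratio (Hajek) form of the weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)

# Hypothetical survey: 500 of 20,000 households sampled, one adult interviewed per household.
household_sizes = [1, 2, 4, 3]        # adults in each sampled household (illustrative)
outcome = [0, 1, 1, 0]                # e.g., 1 = reports the condition of interest
weights = [1 / selection_probability(500, 20_000, 1, M) for M in household_sizes]

print(weights)                                          # larger households -> larger weights
print(sum(outcome) / len(outcome), weighted_mean(outcome, weights))
```

The unweighted mean is 0.5, while the weighted mean is 0.6: the people in the larger households each "represent" more of the population, so their answers count for more.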

Accounting for Clustering

In cluster and multistage sampling, individuals within groups are usually more alike than randomly chosen individuals. This means observations are not independent, and standard errors must be adjusted upward.

The most common approach is to identify the primary sampling unit (PSU) and adjust all standard error calculations for clustering at that level. The technique called variance linearisation is widely used for this purpose and requires a large number of PSUs to be reliable.

The Design Effect (deff)

The design effect (deff) summarizes the overall impact of the sampling plan on precision. It is the ratio of the variance from the complex sampling design to the variance that would have been obtained from a simple random sample of the same size. The concept and the term were coined by Leslie Kish in his classic textbook Survey Sampling (Kish, 1965) and remain the standard summary statistic for complex-design efficiency.

Interpreting the Design Effect

A deff > 1 means the complex design produces less precise (larger variance) estimates than a simple random sample would. For example, in the Brazil diarrhea study, the deff was 4.43, meaning the variance of the incidence estimate was 4.43 times larger than what a simple random sample of the same size would have produced.

Example: Impact of Survey Design on Estimates

Type of Analysis	Incidence Estimate	SE
Simple random sample (assumed)	0.1462	0.0061
+ Stratification	0.1462	0.0059
+ Stratification + Weights	0.1751	0.0091
+ Clustering	0.1462	0.0088
All features combined	0.1751	0.0128

Notice how incorporating all features of the sampling plan changes both the point estimate (from 14.62% to 17.51%) and dramatically increases the standard error (from 0.0061 to 0.0128). Ignoring the sampling design would give a misleadingly precise, and potentially incorrect, result.
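As a quick check on the table, squaring each SE and dividing by the SRS variance approximates the design effect of each analysis. Treat the ratios as approximate: the point estimates also change between rows, and the SEs are rounded to four decimals, which is why the all-features ratio comes out near 4.40 rather than the reported 4.43.

```python
# SEs copied from the table above.
ses = {
    "SRS (assumed)": 0.0061,
    "+ Stratification": 0.0059,
    "+ Stratification + Weights": 0.0091,
    "+ Clustering": 0.0088,
    "All features combined": 0.0128,
}
srs_var = ses["SRS (assumed)"] ** 2
for label, se in ses.items():
    print(f"{label:<28} variance ratio vs SRS = {se ** 2 / srs_var:.2f}")
```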

Canadian Practice: Bootstrap Weights for the CCHS and CHMS

Statistics Canada distributes the CCHS and CHMS with a set of 500 bootstrap replicate weights rather than releasing the underlying cluster identifiers (which would risk re-identification). The rescaling-bootstrap method that produces these weights was developed by Rao & Wu (1988). To get correct standard errors you re-run your analysis 500 times, once with each replicate weight, and combine the results.
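Statistics Canada supplies ready-made replicate weights, so in practice you only perform the combining step. To show what those weights represent, here is a minimal sketch assuming a stratified design with at least two PSUs per stratum and the common Rao-Wu choice of resampling n_h − 1 PSUs per stratum. It is not Statistics Canada's production procedure, the data are invented, and the SE uses one common form of the combining rule: the root mean squared deviation of the replicate estimates from the full-sample estimate.

```python
import random
from collections import defaultdict

def rao_wu_replicate_weights(strata, psus, weights, n_replicates, seed=None):
    """Rescaled bootstrap replicate weights, resampling n_h - 1 PSUs per stratum.

    With n_h - 1 PSUs drawn with replacement in stratum h, each unit's replicate weight is
    w * t * n_h / (n_h - 1), where t is how many times its PSU was drawn.
    """
    rng = random.Random(seed)
    psus_by_stratum = defaultdict(set)
    for h, c in zip(strata, psus):
        psus_by_stratum[h].add(c)
    replicates = []
    for _ in range(n_replicates):
        factor = {}
        for h, members in psus_by_stratum.items():
            members = sorted(members)
            n_h = len(members)
            draws = rng.choices(members, k=n_h - 1)     # resample PSUs with replacement
            for c in members:
                factor[(h, c)] = draws.count(c) * n_h / (n_h - 1)
        replicates.append([w * factor[(h, c)] for h, c, w in zip(strata, psus, weights)])
    return replicates

def weighted_mean(y, w):
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def bootstrap_se(y, full_weights, replicate_weights):
    """Root mean squared deviation of replicate estimates from the full-sample estimate."""
    theta = weighted_mean(y, full_weights)
    estimates = [weighted_mean(y, w_b) for w_b in replicate_weights]
    return (sum((t - theta) ** 2 for t in estimates) / len(estimates)) ** 0.5

# Hypothetical data: 2 strata x 4 PSUs x 5 respondents, each with a design weight of 50.
rng = random.Random(7)
strata, psus, weights, y = [], [], [], []
for h in ("north", "south"):
    for c in range(4):
        psu_risk = rng.choice([0.1, 0.5])           # PSUs differ, so responses are clustered
        for _ in range(5):
            strata.append(h)
            psus.append(c)
            weights.append(50.0)
            y.append(1 if rng.random() < psu_risk else 0)

reps = rao_wu_replicate_weights(strata, psus, weights, n_replicates=500, seed=1)
print(f"estimate={weighted_mean(y, weights):.3f}, "
      f"bootstrap SE={bootstrap_se(y, weights, reps):.3f}")
```

In R, the survey package performs this combining step for you once the replicate weights are declared in the design object.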

Most analysts use survey or srvyr in R, svy commands in Stata, or SAS PROC SURVEY* procedures. If you ignore the bootstrap weights and just analyse the CCHS as if it were a simple random sample, your standard errors will typically be 30–80% too small, so your confidence intervals will be too narrow and your p-values too small.

Finite Population Correction (FPC)

When the proportion of the population sampled is relatively large (>10%), precision improves beyond what would be expected from an "infinite" population. The finite population correction adjusts the estimated variance downward:

FPC Formula

FPC = (N − n) / (N − 1)

where N is the population size and n is the sample size. The FPC should not be applied in multistage sampling even if the number of PSUs sampled exceeds 10% of the total PSUs. It is only applicable to descriptive studies using simple or stratified random sampling.

The intuition is straightforward: once you have already measured a large share of the population, little of it is left to be uncertain about, so the estimate is more precise than the standard (infinite-population) formula assumes.
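A small numeric sketch of the FPC, using hypothetical values (a population of N = 1,000 from which n = 300 are sampled, with an observed proportion of 0.30). It shows both uses: shrinking the variance of an estimate, and shrinking a sample-size requirement with the planning adjustment n′ = 1/(1/n + 1/N).

```r
# Hypothetical values for illustration
N <- 1000    # population size
n <- 300     # sample size (30% of the population, well above 10%)
p <- 0.30    # observed proportion

fpc     <- (N - n) / (N - 1)        # finite population correction
var_inf <- p * (1 - p) / n          # usual "infinite population" variance
var_fpc <- var_inf * fpc            # corrected (smaller) variance
round(c(fpc = fpc, se_inf = sqrt(var_inf), se_fpc = sqrt(var_fpc)), 4)

# Planning: an initial n for +/-5 percentage points at 95% confidence,
# shrunk by the adjustment n' = 1 / (1/n + 1/N)
n0    <- 1.96^2 * p * (1 - p) / 0.05^2   # about 322.7
n_adj <- 1 / (1 / n0 + 1 / N)            # about 244
c(initial = ceiling(n0), fpc_adjusted = ceiling(n_adj))
```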

Analysing data correctly is necessary but not sufficient. Just as important is making sure you collected enough data to begin with; an underpowered study cannot be rescued by clever analysis. The remainder of this section covers sample-size calculation.

Sample-Size Determination

Choosing the right sample size involves both statistical and non-statistical considerations. Non-statistical factors include available resources (time, money, personnel) and the nature of the sampling frame. Statistical considerations include:

Precision of the Estimate

The more precise you need your estimate to be, the larger the sample you need. If you want to know diarrhea prevalence within ±5%, you need more subjects than if ±10% is acceptable. Precision is denoted L (the "allowable error" or half the desired confidence interval width).
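To see how strongly precision drives sample size, the proportion formula n = Z_α² × p × q / L² can be evaluated for the two precision targets just mentioned. The expected prevalence of 15% is an assumption chosen only for illustration, and the function name n_for_proportion is hypothetical.

```r
# n = Z^2 * p * q / L^2 for estimating a single proportion
n_for_proportion <- function(p, L, z = 1.96) {
  ceiling(z^2 * p * (1 - p) / L^2)
}

p_guess <- 0.15                       # assumed diarrhea prevalence (hypothetical)
n_for_proportion(p_guess, L = 0.10)   # within +/-10 percentage points
n_for_proportion(p_guess, L = 0.05)   # within +/-5 percentage points
```

Because L enters the formula squared, halving the allowable error roughly quadruples the required sample (49 versus 196 under these assumptions).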

Expected Variation in the Data

For proportions, variance = p × q (where q = 1 − p). You need a rough estimate of the proportion to calculate the required sample size. For continuous variables like BMI, you need an estimate of the population variance (σ²). One approach: estimate the range that covers 95% of values, divide by 4 to get σ (about 95% of a normally distributed variable lies within ±2σ of the mean, so that range spans roughly 4σ), then square it for σ².
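The range-based shortcut can be sketched as follows. The 90 to 146 mmHg range for systolic blood pressure (SBP) and the ±2 mmHg precision target are hypothetical inputs; the range gives σ = 14, the same SD used in the power.t.test() call in the R activity. The mean formula n = Z_α² × σ² / L² then gives the sample needed to estimate mean SBP within ±2 mmHg.

```r
# Rough sigma from the range expected to cover ~95% of values:
# that range spans about 4 standard deviations (mean +/- 2 SD)
lower  <- 90; upper <- 146            # hypothetical 95% range for SBP (mmHg)
sigma  <- (upper - lower) / 4         # 14
sigma2 <- sigma^2                     # 196

# n = Z^2 * sigma^2 / L^2 to estimate the mean within +/- L
L <- 2                                # allowable error in mmHg (hypothetical)
n_mean <- ceiling(1.96^2 * sigma2 / L^2)
c(sigma = sigma, variance = sigma2, n = n_mean)
```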

Level of Confidence

The confidence level (typically 95%) determines how sure you want to be that the confidence interval includes the true population value. This is linked to the Z-value: for 95% confidence, Z_α = 1.96. Higher confidence requires a larger sample.

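Power (for Analytic Studies)

In analytical studies, you also need to specify the desired power (often 80%). Together with the effect size you want to detect, power determines the required sample size. For 80% power, Z_β = −0.84, the 20th percentile of the standard normal distribution; because the formulas below subtract Z_β, its magnitude is effectively added. Greater power requires a larger sample.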
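The Z-values quoted above are quantiles of the standard normal distribution, so in R they come from qnorm(). The sketch below also shows how much a stricter confidence level or higher power inflates n. It relies on the two-means formula in the table that follows, where n is proportional to (Z_α − Z_β)², so the ratios hold for any σ and difference.

```r
# Two-sided confidence: Z_alpha is the 1 - alpha/2 quantile
z_alpha_95 <- qnorm(0.975)   # approx. 1.96
z_alpha_99 <- qnorm(0.995)   # approx. 2.58

# Power: with the sign convention used here, Z_beta = qnorm(beta), beta = 1 - power
z_beta_80 <- qnorm(0.20)     # approx. -0.84
z_beta_90 <- qnorm(0.10)     # approx. -1.28

# n is proportional to (Z_alpha - Z_beta)^2, so these ratios give the
# relative increase in required n versus 95% confidence and 80% power
base <- (z_alpha_95 - z_beta_80)^2
round(c(conf99_power80 = (z_alpha_99 - z_beta_80)^2 / base,
        conf95_power90 = (z_alpha_95 - z_beta_90)^2 / base), 2)
```

Moving from 95% to 99% confidence raises n by roughly half, and moving from 80% to 90% power raises it by about a third.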

Key Sample-Size Formulae

Objective	Formula	Variables
Estimate a proportion	n = Z_α² × p × q / L²	p = expected proportion; L = precision
Estimate a mean	n = Z_α² × σ² / L²	σ² = population variance; L = precision
Compare 2 proportions	n = [Z_α√(2pq) − Z_β√(p₁q₁ + p₂q₂)]² / (p₁−p₂)²	p = (p₁+p₂)/2; n = per group
Compare 2 means	n = 2[(Z_α−Z_β)² × σ²] / (μ₁−μ₂)²	σ² = population variance; n = per group
FPC adjustment	n′ = 1 / (1/n + 1/N)	n = initial estimate; N = population size
Clustering adjustment	n′ = n × [1 + ρ(m−1)]	ρ = intra-class correlation; m = cluster size
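The two-means row of the table can be coded directly. This sketch uses a 5 mmHg difference in mean SBP with σ = 14, matching the power.t.test() example in the R activity; the function name n_two_means is hypothetical. power.t.test() works with the t distribution rather than fixed Z-values, so expect it to return a slightly larger n per group.

```r
# n per group = 2 * (Z_alpha - Z_beta)^2 * sigma^2 / (mu1 - mu2)^2
n_two_means <- function(delta, sigma, z_alpha = 1.96, z_beta = -0.84) {
  ceiling(2 * (z_alpha - z_beta)^2 * sigma^2 / delta^2)
}

n_formula <- n_two_means(delta = 5, sigma = 14)   # per group
n_formula

# Cross-check with base R (t-based, so n is slightly larger)
power.t.test(delta = 5, sd = 14, power = 0.80, sig.level = 0.05)$n
```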

Worked Example: Comparing Two Proportions

Suppose you want to determine if rainwater cisterns reduce the monthly risk of diarrhea from 15% to 10%. With 95% confidence and 80% power:

p₁ = 0.15, p₂ = 0.10, p = 0.125, q = 0.875

Applying the formula yields n = 685 per group, so you would need 1,370 total individuals (685 with cisterns, 685 without).

If the outcome is clustered within households (ρ = 0.45, average household size m = 6), the clustering adjustment increases the requirement to 2,227 per group (685 × 3.25, rounded up), more than triple the unadjusted estimate.

That multiplier, 1 + ρ(m − 1) = 1 + 0.45 × 5 = 3.25, is itself the design effect for this clustered design: because responses within a household are correlated, each additional person in a cluster carries less new information than an independent draw, so a larger overall sample is needed to reach the same precision.
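The arithmetic in the worked example can be reproduced with a few lines of R. The sketch uses the rounded Z-values from the text (1.96 and −0.84), which is why it lands on 685; power.prop.test() in the R activity works with exact quantiles and may differ slightly. The function name n_two_props is hypothetical.

```r
# Two-proportion formula from the table (n per group)
n_two_props <- function(p1, p2, z_alpha = 1.96, z_beta = -0.84) {
  p <- (p1 + p2) / 2; q <- 1 - p
  num <- (z_alpha * sqrt(2 * p * q) -
          z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))^2
  ceiling(num / (p1 - p2)^2)
}

n_srs <- n_two_props(p1 = 0.15, p2 = 0.10)   # 685 per group

# Clustering adjustment: multiply by the design effect 1 + rho * (m - 1)
rho  <- 0.45; m <- 6
deff <- 1 + rho * (m - 1)                    # 3.25
n_clust <- ceiling(n_srs * deff)             # per group

c(n_srs = n_srs, deff = deff, n_clust = n_clust, total = 2 * n_clust)
```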

R Activity: Sampling designs, survey weights, and sample size

The companion R script r-activities/HSCI_341_Lesson_3_Sampling.R walks through three blocks: (A) drawing simple random, stratified, and cluster samples in base R; (B) computing weighted prevalence with the survey package; and (C) running sample-size calculations with power.prop.test and power.t.test, then adjusting for clustering via a design effect.

# PART A -- three probability sampling designs from a 1,000-row frame
set.seed(341)
N <- 1000
frame <- data.frame(id = 1:N,
                    province  = sample(c("BC", "AB", "ON", "QC"), N, replace = TRUE),
                    household = sample(1:300, N, replace = TRUE))

srs   <- frame[sample(N, 100), ]                        # simple random
strat <- do.call(rbind, by(frame, frame$province,
                          function(d) d[sample(nrow(d), 25), ])) # stratified
sel_hh <- sample(unique(frame$household), 30)
clust  <- frame[frame$household %in% sel_hh, ]               # cluster

c(SRS = nrow(srs), Stratified = nrow(strat), Cluster = nrow(clust))

# PART B -- design-corrected prevalence with the survey package
library(survey)
dat <- data.frame(province = sample(c("BC","AB","ON","QC"), 2000, replace = TRUE),
                  smoker   = rbinom(2000, 1, 0.18),
                  weight   = runif(2000, 800, 2200))
des <- svydesign(ids = ~1, strata = ~province, weights = ~weight, data = dat)

mean(dat$smoker)                                # naive (unweighted)
svymean(~smoker, design = des)                  # design-corrected
confint(svymean(~smoker, design = des))         # 95% CI

# PART C -- sample-size calculations + design-effect adjustment
power.prop.test(p1 = 0.15, p2 = 0.10,
                power = 0.80, sig.level = 0.05)        # two proportions
power.t.test(delta = 5, sd = 14,
             power = 0.80, sig.level = 0.05)           # two means (SBP)

n_srs <- 685; rho <- 0.45; m <- 6
ceiling(n_srs * (1 + rho*(m - 1)))                # cluster-adjusted n

What you should be able to do after this activity: draw each of the three probability samples, fit a survey design and report a weighted prevalence with its CI, and compute a sample size for two proportions, two means, and a cluster design.

Reflect on what you just ran

Use the questions below to interpret the actual numbers you produced. Look at your console output before answering.

1. The line c(SRS = ..., Stratified = ..., Cluster = ...) printed three sample sizes. What were the three numbers, and which design gave you the most variable sample size on a re-run? Why is the cluster sample size NOT exactly 100 or 200 here?

Model answer: The three numbers are typically SRS = 100 (exact), Stratified = 100 (exact, because 25 people are drawn from each of the four provinces), and Cluster = roughly 96–120, depending on the seed. The cluster design has the most variable sample size on re-run because it samples clusters, not individuals: once a household is picked, everyone in it is taken, so the total n depends on the sizes of the sampled households. SRS and stratified sampling draw fixed numbers of individuals, so their n is the same on every run.

2. Compare mean(dat$smoker) with svymean(~smoker, design = des). Were they nearly the same, and why does that make sense given that the weights came from runif(2000, 800, 2200) with no relationship to province or smoker?

Model answer: mean(dat$smoker) and svymean(~smoker, design = des) were nearly identical because the weights drawn from runif(2000, 800, 2200) are independent of both province and smoker status. Weights only change an estimate when they are correlated with the variable being estimated (that is, when selection probabilities are related to the outcome); when the weights carry no information about the outcome, the weighted mean equals the unweighted mean in expectation. The point of the simulation: weights fix design-induced bias only when there is design-induced bias to fix.

3. power.prop.test(p1 = 0.15, p2 = 0.10, power = 0.80, ...) returned an n per group, and the cluster adjustment (rho = 0.45, m = 6) multiplied 685 by 3.25 to give 2,227. In one sentence, what does that ratio tell you about the price of cluster sampling vs. SRS?

Model answer: The cluster sample needs 2,227 / 685 ≈ 3.25 times as many participants as an SRS to achieve the same statistical power. That ratio is the design effect, 1 + (m−1)ρ = 1 + 5(0.45) = 3.25, confirming the formula. In plain terms: when responses within clusters are correlated (ρ = 0.45 is large), each additional person in a cluster adds less new information than a fresh independent draw; you pay for the operational convenience of cluster sampling by needing a much larger overall n.


Reflection

Why do you think it is important to account for clustering when determining sample size? What would happen to your study conclusions if you ignored the clustering effect?

Model answer: Ignoring clustering treats every observation as an independent draw, but cluster-correlated data violate that assumption: students in the same class share the same teacher, the same socioeconomic context, and the same outbreak exposures. The result is understated standard errors: CIs are too narrow, p-values too small, and you reject the null when you should not. Concretely, with the household values from the worked example (ρ = 0.45, m = 6), analysing 1,200 people from 200 households as if they were 1,200 independent observations would report standard errors that are too small by a factor of √DE = √3.25 ≈ 1.8. Conclusions would be over-confident: spurious associations declared real, failed replications, and policy decisions built on sand. The fix is to use cluster-robust standard errors, multilevel (mixed) models, or generalised estimating equations, chosen according to the data structure and the inferential target.
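The understatement is easy to see in a quick simulation, sketched here under assumed values: 200 hypothetical households of six people with a continuous outcome constructed to have an intra-class correlation of 0.45. With equal cluster sizes, a standard error computed from the household means respects the clustering, while the naive formula treats all 1,200 people as independent.

```r
set.seed(341)
n_clusters <- 200; m <- 6; rho <- 0.45           # assumed design values
cluster <- rep(seq_len(n_clusters), each = m)

# Outcome = cluster effect + individual noise with total variance 1,
# so the intra-class correlation equals var(cluster effect) = rho
u <- rnorm(n_clusters, 0, sqrt(rho))
y <- u[cluster] + rnorm(n_clusters * m, 0, sqrt(1 - rho))

se_naive   <- sd(y) / sqrt(length(y))            # treats 1,200 draws as independent
clus_means <- tapply(y, cluster, mean)
se_cluster <- sd(clus_means) / sqrt(n_clusters)  # valid for equal cluster sizes

round(c(se_naive = se_naive, se_cluster = se_cluster,
        ratio = se_cluster / se_naive, sqrt_deff = sqrt(1 + rho * (m - 1))), 3)
```

The simulated ratio should sit close to √3.25 ≈ 1.8, the factor by which the naive analysis overstates precision.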


Key Takeaways

Complex survey analyses must account for stratification, sampling weights, and clustering to produce correct estimates and valid standard errors.
The design effect (deff) quantifies how much less precise a complex design is relative to a simple random sample.
Sample size depends on desired precision, expected variance, confidence level, and (for analytic studies) power.
Clustering can dramatically increase the required sample size, especially when the intra-class correlation is high.
The finite population correction reduces sample size requirements when sampling a large fraction (>10%) of the population.


HSCI 341 · Lesson 3
