HSCI 341 — Lesson 3

Sampling

Fundamental Epidemiological Concepts and Approaches

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Distinguish between a census and a sample, and between descriptive and analytic studies
  • Describe the hierarchy of populations and the concept of a sampling frame
  • Explain types of error, including Type I and Type II errors, and the concept of statistical power
  • Compare non-probability sampling methods (judgement, convenience, purposive)
  • Describe probability sampling methods (simple random, systematic, stratified, cluster, multistage, targeted)
  • Understand the implications of complex sampling designs on data analysis
  • Compute required sample sizes for common analytic objectives

This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Reference

Glossary — Key Terms, People & Concepts

This glossary collects the key concepts, people, and ideas you will meet in this lesson. Use it as a reference while you work through the material, or as a review before assessments.

Key Concepts & Ideas
Target Population: The full population to which study findings are intended to apply, as defined by the research question.
Study (Source) Population: The accessible subset of the target population from which a sample can actually be drawn (also called the source population).
Sampling Frame: The actual list, register, or operational mechanism used to enumerate and reach members of the study population. Coverage error arises when the frame omits or double-counts target members.
Sampling Unit: The element selected at each stage of the sampling process. Depending on the design, it could be an individual, household, school, or cluster.
Census: Data collection from every member of a population. Eliminates sampling error but is usually impractical and may suffer from large non-response.
Probability Sampling: Sampling design in which every unit in the frame has a known, non-zero probability of selection. Required for valid statistical inference using survey weights.
Non-Probability Sampling: Sampling without known selection probabilities (e.g., convenience, purposive, snowball, quota). Useful for hard-to-reach groups but inference to a population is limited.
Sampling Error: Random variation in estimates that arises because we observe a sample rather than the whole population. Quantified by the standard error and decreases as sample size increases.
Non-Sampling Error: Errors arising from sources other than random sampling, such as coverage, non-response, measurement, and processing. Often more consequential than sampling error.
Type I Error (α): Rejecting a true null hypothesis (a false positive). Conventionally controlled at 0.05.
Type II Error (β): Failing to reject a false null hypothesis (a false negative). Power = 1 − β, conventionally targeted at 0.80 or higher.
Statistical Power: The probability that a study correctly detects an effect of a given size when one exists. Depends on sample size, effect size, variability, and α.
Design Effect (DEFF): Ratio of the variance under the actual complex design to the variance under simple random sampling of the same size. DEFF > 1 indicates the effective sample size is smaller than n.
Oversampling: Deliberately sampling certain subgroups at higher rates than their population share to improve subgroup precision. Requires weighting in analysis.
Non-Response Bias: Systematic error that occurs when individuals who do not respond differ in important ways from those who do.

Sampling Designs & Methods
Simple Random Sample (SRS): A probability sample in which every possible sample of size n from the frame has an equal chance of selection. The reference design against which others are compared.
Systematic Sampling: Selecting every kth unit from the frame after a random start. Approximates SRS unless periodicity in the list aligns with the sampling interval.
Stratified Sampling: Dividing the population into mutually exclusive strata and sampling independently within each. Improves precision when strata differ in the outcome and ensures representation of subgroups.
Cluster Sampling: Sampling intact groups (clusters) such as schools or villages instead of individuals. Reduces field cost but inflates variance because units within a cluster are correlated.
Multistage Sampling: A design that selects clusters in successive stages (e.g., regions → neighbourhoods → households → people). Common in national surveys.
Sampling Weight: The reciprocal of the selection probability (often adjusted for non-response and post-stratification). Each respondent represents a number of people in the population equal to their weight.
Convenience Sampling: Recruiting whoever is most accessible. Fast and cheap but vulnerable to selection bias; common in pilot studies.
Purposive (Judgement) Sampling: Researcher selects units believed to be informative or representative based on theoretical criteria. Common in qualitative work.
Snowball Sampling: Existing participants refer additional participants. Useful for hidden or stigmatised populations; respondent-driven sampling (RDS) is a variant that aims to approximate probability sampling under strong assumptions.
Quota Sampling: Non-probability sampling that fills predefined quotas (e.g., by age and sex) using convenience recruitment within each cell.
Section 1

Introduction to Sampling

⏱ Estimated reading time: 22 minutes

Introduction and Overview

Lesson 1 closed with the counterfactual framework: causal inference requires us to compare groups under different exposure conditions. This lesson asks the immediate next question: which people should be in those groups? Sampling sits at the foundation of every study you'll design in HSCI 341 because the choice of who ends up in the study determines almost everything else — the validity of the results, the generalisability of conclusions, and the bias inventory you spent HSCI 230 learning to recognise. The four content sections move from broad principles to concrete formulas: Section 1 sets up the population hierarchy, sampling frame, and the probability theory that makes inference from a sample possible; Section 2 covers types of error and non-probability sampling; Section 3 details the major probability sampling designs; Section 4 closes with how to analyse data from complex surveys and how to determine sample size in advance.

Learning Objectives

  • Distinguish between a census and a sample.
  • Contrast descriptive and analytic studies.
  • Describe the hierarchy of populations (target, source, study sample).
  • Explain the concepts of internal and external validity.
  • Define a sampling frame and explain its importance.
  • Describe the foundational ideas of probability theory — random variables, expected value, and variance — that underpin statistical inference from a sample.
  • Recognise common probability distributions (Bernoulli, Binomial, Poisson, Normal, Exponential, Uniform) and identify the public-health phenomena they typically describe.
  • Explain what a sampling distribution is and state the Central Limit Theorem in plain language.

Census vs. Sample

When we conduct research, we need data from either all individuals in a population or a subset of them. The process of obtaining this data is called measurement.

In a census, every individual in the population is evaluated. In a sample, data are collected from only a subset. Sampling is generally more convenient and less costly than conducting a full census. Interestingly, even a census can be viewed as a kind of sample — it captures the population at one point in time, making it a "sample" of the population over time.

Key Distinction

In a census, the only source of error is the measurement itself. With a sample, you contend with both measurement error and sampling error. However, a well-planned sample can provide virtually the same information as a census at a fraction of the cost.

Canadian Examples: Census vs. National Health Surveys

Canada runs both kinds of data collection at the population scale, and you will encounter all of them in public health practice:

  • Census of Population (Statistics Canada, every 5 years; most recent 2021). A near-complete enumeration of every household in Canada. The short-form census goes to all households; the long-form census goes to a 25% mandatory sample. Provides the denominators behind almost every population health rate you will calculate.
  • Canadian Community Health Survey (CCHS) — Statistics Canada / Health Canada / PHAC. A continuous cross-sectional sample survey (~65,000 respondents per cycle) covering self-reported health, behaviours, and health-care use. The flagship descriptive survey for population health surveillance.
  • Canadian Health Measures Survey (CHMS) — Statistics Canada / Health Canada / PHAC. A multi-stage sample survey that adds direct physical measurements (blood pressure, biomarkers, fitness) and a biobank to self-report data. Smaller (~5,700 respondents per cycle) but anchors objective measurement of population health.
  • National Population Health Survey (NPHS) — the longitudinal predecessor (1994–2011) to the CCHS, still used for life-course research.

The Census gives you a denominator and demographic context; CCHS and CHMS give you population estimates of health states with sampling error attached. Choosing among them is the first applied sampling decision a public-health analyst makes.

Descriptive vs. Analytic Studies

Samples support two fundamental types of studies:

Descriptive Studies (Surveys)

A descriptive study aims to describe population attributes such as the frequency of disease or the prevalence of an exposure. Surveys answer questions like: "What proportion of people had diarrhea over a 1-month period?" or "What is the average BMI of students in Grade 12?"

The focus is on characterizing the current state of a population rather than establishing cause-and-effect relationships.

Analytic Studies

An analytic study is designed to estimate the magnitude of an association between exposures and outcomes. These studies contrast groups and seek explanations for differences between them.

Examples: "Is water source associated with the incidence of diarrhea?" or "How does time spent playing video games affect the BMI of Grade 12 students?"

Establishing an association is the first step to inferring causation, as discussed in Lesson 1.

Choosing between a census and a sample is the first decision; choosing between a descriptive and an analytic purpose is the second. Both decisions sit on top of an even more fundamental concept — the relationship among the different populations a single study touches.

Hierarchy of Populations

Understanding the different populations involved in a study is essential for evaluating validity. There are three key populations to consider, each nested inside the one before it. The diagram and accordion below define them in turn; the labels that appear next to the arrows (external validity, sampling frame, internal validity) are the technical vocabulary you will use to talk about how a sample inherits or loses information from the broader population it was meant to represent.

[Diagram: three nested populations. Target Population: the broadest group to which you want to generalize results. Source Population: the population from which study subjects are drawn. Study Sample: the individuals who actually participate and provide data. Labels beside the arrows: external validity, sampling frame, internal validity.]

Figure 3.1 — Hierarchy of populations in epidemiologic research. The target population is the broadest; the source population is the accessible subset; the study sample consists of those who actually participate.

Target Population

The target population is the population to which you want to extrapolate your results. It is often not clearly defined and may vary depending on the perspective of the person interpreting the study. For example, researchers studying rainwater cisterns in Pernambuco State, Brazil might define the target as that state, while someone else may want to generalize the findings to all semi-arid regions of Brazil.

Source Population

The source population is the population from which study subjects are actually drawn. All units in the source population should be "listable" and have a non-zero probability of being included in the study. For example, in a diarrhea study in Brazil, the source population included families from households participating in the One Million Cisterns Project (OMCP).

Study Sample

The study sample (or study group) consists of the individuals who actually end up in the study. It is typically a subset drawn from the source population. Researchers determine the necessary sample size, draw their sample, collect data from eligible subjects, and the final study sample consists of those who agreed to participate and whose data met quality requirements.

Validity: Internal and External

Internal validity refers to whether the study results are valid for members of the source population. It indicates whether the study obtained the "correct" answer for that population. Much of epidemiology is dedicated to methods that ensure internal validity.

External validity involves a subjective assessment of whether results can be generalized to the broader target population. It is generally easier to generalize results from analytic studies (which evaluate associations) than from descriptive studies (which estimate prevalence).

The Sampling Frame

The sampling frame is the list of all sampling units in the source population. Sampling units are the basic elements that will be sampled (e.g., households, individuals). A complete list of all sampling units is required for drawing a simple random sample, though some other methods do not require such a complete listing.

Example: Brazil Diarrhea Study

In a study of water cisterns and diarrhea in Brazil, a suitable sampling frame was the list of all households eligible for the One Million Cisterns Project. Once households were selected, a separate strategy was used for selecting individuals within each household.
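Drawing a simple random sample from a complete frame can be sketched in a few lines. This is a minimal illustration only: the household IDs, the frame size of 5,000, and the sample size of 200 are hypothetical and are not taken from the Brazil study.

```python
import random

def draw_srs(frame, n, seed=None):
    """Draw a simple random sample of n units, without replacement, from a frame."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: one ID per listed household.
frame = [f"HH-{i:04d}" for i in range(1, 5001)]

sample = draw_srs(frame, n=200, seed=341)

# Under simple random sampling every listed household has the same
# inclusion probability, n / N.
inclusion_probability = len(sample) / len(frame)

print(sample[:5])
print(f"Each listed household had a {inclusion_probability:.1%} chance of selection")
```

Households missing from the list have a selection probability of zero no matter how the draw is done, which is why frame coverage matters more than the sampling code.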

Canadian Examples: Sampling Frames You Will Actually Use

National-scale Canadian surveys rarely have a single tidy list of every person. Instead, they assemble a frame from administrative listings:

  • Statistics Canada Address Register (AR) — the dwelling-level frame used by the Census and many StatCan household surveys.
  • Labour Force Survey (LFS) area frame — CCHS draws part of its sample from the LFS area frame (which is itself based on Census enumeration areas).
  • Provincial health insurance registries (e.g., the BC Medical Services Plan client registry) — close to a population census of residents and the backbone of administrative-data research at Population Data BC (PopData BC).
  • Disease registries such as the Canadian Cancer Registry or provincial reportable-disease lists serve as case frames for surveillance.

Notice how each frame has different coverage: a registry-based frame misses people without provincial coverage; an LFS area frame excludes residents on First Nations reserves and in institutions. The frame — not the questionnaire — is usually where exclusions and selection bias enter.

The vocabulary of populations and sampling frames is necessary but not sufficient. The deeper question is why we can learn about a whole population from a small sample at all. The answer comes from probability theory.

Probability Theory: Why Sampling Works

Every quantity we estimate from a sample — a prevalence, a mean, a risk ratio — is the value of a random variable. If we drew a different sample tomorrow, the value would be slightly different. Probability theory is the formal language we use to describe how those values vary, and it is what lets us turn a single sample into a defensible statement about a population.

Three concepts do most of the work in introductory biostatistics:

Three Foundational Ideas

1. A random variable is a numerical outcome of a random process — for example, the number of new TB cases reported in a health region next week, or the systolic blood pressure of the next adult who walks into a clinic.

2. The expected value (also called the mean, written μ or E[X]) is the long-run average of a random variable across many repetitions of the random process. It is the value we are usually trying to estimate.

3. The variance (σ²) and its square root, the standard deviation (σ), measure how spread out the random variable is around its mean. Spread, not the mean alone, is what controls how precisely we can estimate things from a sample.

A second idea worth naming is independence. Two observations are independent when knowing the value of one tells you nothing about the value of the other. Independence is the assumption that lets a small random sample stand in for a much larger population — and it is the assumption most often violated in real public-health data (clustered households, repeated measures on the same person, contagion in infectious disease).
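The three ideas above can be made concrete with a short sketch. The random variable here (number of chronic conditions per adult) and its probabilities are invented for illustration; the point is only that the expected value and variance follow directly from the distribution, and that the average of many independent draws settles near the expected value.

```python
import random
import statistics

# Hypothetical distribution of X = number of chronic conditions per adult.
values = [0, 1, 2, 3]
probs = [0.50, 0.30, 0.15, 0.05]

# Expected value: probability-weighted average of the possible values.
expected_value = sum(x * p for x, p in zip(values, probs))

# Variance: probability-weighted average squared distance from the mean.
variance = sum((x - expected_value) ** 2 * p for x, p in zip(values, probs))
std_dev = variance ** 0.5

# Long-run average across many independent repetitions of the random process.
rng = random.Random(1)
draws = rng.choices(values, weights=probs, k=100_000)
long_run_mean = statistics.fmean(draws)

print(f"E[X] = {expected_value:.4f}, Var(X) = {variance:.4f}, SD = {std_dev:.4f}")
print(f"Mean of 100,000 independent draws = {long_run_mean:.4f}")
```

The draws are independent by construction. With clustered or repeated-measures data that assumption would not hold, and the simple formulas would understate the uncertainty.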

Why You Should Care

Whenever you compute a confidence interval, run a hypothesis test, or quote a margin of error, you are doing arithmetic on a probability distribution — usually a Normal distribution — that describes what would happen if you repeated the study many times. If you don’t know what distribution your statistic comes from, you can’t honestly attach uncertainty to it.

Types of Probability Distributions

A probability distribution describes the values a random variable can take and how likely each value is. Distributions split first into discrete (countable outcomes — 0, 1, 2 cases) and continuous (any value on a range — height, blood pressure, time). The handful below are the ones you will encounter again and again in public health.

Discrete distributions

Distribution | What it models | Public-health example
Bernoulli(p) | A single yes/no trial with success probability p. Mean = p, variance = p(1−p). | Whether one randomly chosen adult currently smokes.
Binomial(n, p) | Number of "successes" in n independent Bernoulli trials. Mean = np, variance = np(1−p). | Number of smokers in a CCHS sample of 1,000 adults.
Poisson(λ) | Number of rare events in a fixed interval of time, area, or person-time. Mean = variance = λ. | New cases of measles per week in a public-health unit; ER visits per hour.
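The mean and variance formulas in the table can be checked numerically from the probability mass functions themselves. The sketch below assumes a smoking prevalence of 12% and a rate of 3 cases per week, both made-up values chosen only for the arithmetic.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def mean_and_variance(pmf_values):
    """pmf_values[k] must hold P(X = k) for k = 0, 1, 2, ..."""
    mean = sum(k * pk for k, pk in enumerate(pmf_values))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf_values))
    return mean, var

p = 0.12   # hypothetical smoking prevalence
n = 1000   # sample size from the table's example
lam = 3.0  # hypothetical mean number of cases per week

bern_mean, bern_var = mean_and_variance([1 - p, p])
binom_mean, binom_var = mean_and_variance([binomial_pmf(k, n, p) for k in range(n + 1)])
# The Poisson has no upper limit; terms beyond k = 99 are negligible for lam = 3.
pois_mean, pois_var = mean_and_variance([poisson_pmf(k, lam) for k in range(100)])

print(f"Bernoulli: mean {bern_mean:.4f} vs p = {p}; variance {bern_var:.4f} vs p(1-p) = {p * (1 - p):.4f}")
print(f"Binomial:  mean {binom_mean:.2f} vs np = {n * p:.2f}; variance {binom_var:.2f} vs np(1-p) = {n * p * (1 - p):.2f}")
print(f"Poisson:   mean {pois_mean:.4f} and variance {pois_var:.4f} vs lambda = {lam}")
```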

Continuous distributions

Distribution | What it models | Public-health example
Uniform(a, b) | Every value between a and b is equally likely. The "default ignorance" distribution. | Random-digit dialling within an area code; random selection from a list.
Normal(μ, σ²) | The classic bell curve: symmetric, with most mass within 2σ of the mean. The default for many continuous biological measurements and, crucially, for sample means (see CLT below). | Adult height; systolic blood pressure; standardized test scores.
Exponential(λ) | Time between independent events occurring at a constant rate λ. Right-skewed, memoryless. Mean = 1/λ. | Time between successive ED arrivals; survival time under a constant hazard.
Log-normal / Right-skewed | Variables that are positive and span several orders of magnitude. The log of the variable is approximately Normal. | Household income; hospital length of stay; viral loads.
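Two of the properties in the table, the Exponential mean of 1/λ and the log-normal's Normal-looking logarithm, are easy to see in simulated data. The sketch below uses Python's built-in `random` module; the rate of 0.5 arrivals per minute and the log-scale parameters (1.0, 0.8) are arbitrary assumptions for illustration.

```python
import math
import random
import statistics

rng = random.Random(2024)
N = 50_000

# Exponential waiting times: hypothetical rate of 0.5 ED arrivals per minute,
# so the theoretical mean wait is 1 / 0.5 = 2 minutes.
rate = 0.5
waits = [rng.expovariate(rate) for _ in range(N)]
wait_mean = statistics.fmean(waits)
wait_median = statistics.median(waits)

# Log-normal lengths of stay: log(X) ~ Normal(mean 1.0, SD 0.8), values assumed.
stays = [rng.lognormvariate(1.0, 0.8) for _ in range(N)]
stay_mean = statistics.fmean(stays)
stay_median = statistics.median(stays)
log_stay_mean = statistics.fmean(math.log(x) for x in stays)

print(f"Exponential: mean {wait_mean:.2f} (theory {1 / rate:.2f}), median {wait_median:.2f}")
print(f"Log-normal:  mean {stay_mean:.2f}, median {stay_median:.2f}, mean of logs {log_stay_mean:.2f}")
```

In both cases the mean sits above the median, which is the numerical signature of right skew.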

How to Read a Distribution

Every distribution is summarized by two things: a shape (symmetric? right-skewed? bimodal?) and a small set of parameters that control its location and spread. When you see "BMI ~ Normal(27, 4²)", read it as: BMI is approximately Normal, centred at 27 kg/m², with a standard deviation of 4. About 95% of the population falls within 2 SDs of the mean, that is, roughly 19 to 35.
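The "within 2 SDs" reading of the BMI example can be verified with the standard library's `statistics.NormalDist`. This is a small check that treats BMI as exactly Normal with mean 27 and SD 4, as the example assumes.

```python
from statistics import NormalDist

bmi = NormalDist(mu=27, sigma=4)

lower = bmi.mean - 2 * bmi.stdev
upper = bmi.mean + 2 * bmi.stdev

# Probability mass between the two limits.
share_within_2sd = bmi.cdf(upper) - bmi.cdf(lower)

print(f"Range: {lower:.0f} to {upper:.0f} kg/m^2")
print(f"Share of population inside that range: {share_within_2sd:.1%}")
```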

[Figure 3.2 panels: Binomial(20, 0.3), discrete, symmetric-ish; Poisson(3), discrete, counts of rare events; Normal(μ, σ), continuous, symmetric bell; Exponential(λ), continuous, right-skewed waits.]

Figure 3.2 — Four common distributions you will meet in public-health data. Discrete distributions assign probability to whole-number outcomes (left two); continuous distributions describe a smooth curve over a real-valued measurement (right two).

🔥 Try it Yourself: Distribution Simulator

What you'll do: use the simulator below to play with each distribution's parameters and watch the shape change in real time, then run the Central Limit Theorem demo to see why sample means from any population eventually look Normal. What to take away: distributions are not just textbook abstractions — they describe specific public-health phenomena, and the CLT is what lets us trust confidence intervals and power calculations even when the underlying data are skewed.

Aim for at least 10–15 minutes of play; this is the kind of intuition you will draw on for every confidence interval and power calculation in the rest of the course. Use the tabs to switch between exploring single distributions, the CLT demonstration, and a side-by-side comparison.

How to use this

Choose any of the six common distributions on the left. Adjust its parameters to see how the shape, mean, and spread respond. Then click "Draw a sample" to take a random draw and watch the empirical histogram converge on the theoretical curve as your sample grows.

Distribution: Binomial. Number of successes in n independent yes/no trials. The natural model for "how many out of N have the condition?"

Tip: Bigger samples produce histograms that look more like the theoretical curve. The "law of large numbers" in action.
[Chart: theoretical distribution, Binomial(n = 20, p = 0.30). Shows the probability mass (discrete) or density (continuous) of the chosen distribution; a red dashed line marks the mean. Theoretical mean (μ) = 6.000; theoretical SD (σ) = 2.049.]

About Binomial — n = 20, p = 0.30

Binomial(n, p) counts the number of successes in n independent Bernoulli trials with the same success probability p.

Mean = np = 6.00, SD = √(np(1−p)) = 2.05.

  • For large n and moderate p, the binomial looks Normal — this is one of the oldest examples of the CLT.
  • Used for sample-size formulas around proportions and prevalence estimates.
  • Assumes independence and constant p — both can fail in clustered data (households, schools).
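The mean and SD quoted for Binomial(20, 0.30), and the claim that the binomial starts to look Normal, can be checked directly. The sketch below compares an exact binomial probability with a Normal approximation; the continuity correction (evaluating the Normal at 6.5 rather than 6) is a standard refinement that the text above does not mention.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.30
mean = n * p
sd = sqrt(n * p * (1 - p))

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

# P(X <= 6): exact value versus a Normal curve with the same mean and SD.
exact = binomial_cdf(6, n, p)
approx = NormalDist(mean, sd).cdf(6.5)  # 6.5 applies the continuity correction

print(f"mean = {mean:.2f}, SD = {sd:.3f}")
print(f"P(X <= 6): exact {exact:.4f}, Normal approximation {approx:.4f}")
```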

Central Limit Theorem demonstration

Pick a population shape (try the heavily skewed ones — that's where the CLT feels most magical). Set a sample size n, then click Draw 1,000 sample means. Each sample of size n is summarised by its mean, and we plot those means. Watch the histogram of means converge on a Normal curve as n grows — even when the underlying population is wildly non-Normal.

Population shape

The CLT says it doesn't matter which one of these you pick — sample means converge on a Normal curve.

Sample size

Each "trial" draws n values from the population and computes their mean.
Compare: Try n = 1 (each "mean" is a single observation, so the histogram simply reproduces the population). Then increase to n = 5, 30, 100. Notice the histogram becomes Normal-shaped and narrower.
Population distribution
This is the underlying "true" distribution we are sampling from. It can be anything — symmetric, skewed, bimodal.

Side-by-side: shapes at a glance

Six distributions, plotted on the same axes. Use this view to remember which shape goes with which name, and to compare how parameters change appearance. Hover for tooltips.

Different distributions can describe similar-looking data — choosing the right one depends on the underlying mechanism, not just the shape.
Comparison of common distributions (rescaled to relative density)
Each curve is rescaled so peak density is 1, to make shape comparison easier. Shape, not absolute height, is what matters here.

How to choose a distribution in practice

  • Binary outcome (yes/no for one person)? Bernoulli. Count of "yes"s in a fixed sample? Binomial.
  • Counting rare events in time/space (cases per week, ER visits per hour)? Poisson.
  • Continuous, symmetric biological measurement (BP, height, lab values)? Normal — or check first whether the variable is approximately Normal.
  • Time until next event under a constant hazard? Exponential. Survival under varying hazard? Weibull or Gamma (advanced).
  • Strictly positive, right-skewed, multiplicative process (income, length of stay, viral load)? Log-normal — analyze on the log scale.
  • No prior information about likely values? Uniform on a sensible range.

Each distribution above describes the underlying behaviour of a single random variable. But when we draw a sample from a population, the quantity we typically care about is a summary of that sample — a mean, proportion, or rate. To make inferences from such summaries, we need one more layer of theory.

Sampling Distributions and the Central Limit Theorem

So far we have talked about distributions of individuals in a population. Now consider the distribution of a statistic — for example, the mean of a random sample of n people. Because each sample of size n would give a slightly different mean, the mean itself has a distribution. We call it the sampling distribution of the mean.

Two facts about that sampling distribution drive almost all of frequentist inference:

The Two Pillars

1. Standard error. If individual observations have standard deviation σ, then sample means of size n have standard deviation σ/√n. This quantity — the SD of a statistic — is called the standard error (SE). Quadrupling your sample size halves the standard error.

2. The Central Limit Theorem (CLT). For sufficiently large n, the sampling distribution of the mean is approximately Normal whatever the shape of the underlying population (heavily skewed, bimodal, or discrete), provided the population has a finite variance. "Sufficiently large" is often around n = 30 for moderately skewed distributions, and much smaller for symmetric ones.
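Both pillars can be seen in a small simulation. The sketch below assumes a right-skewed Exponential population with rate 1 (so σ = 1); the population, the sample sizes, and the 5,000 repetitions are illustrative choices, not part of the lesson's simulator.

```python
import random
import statistics

def simulate_sample_means(n, trials, rng, rate=1.0):
    """Draw `trials` samples of size n from an Exponential(rate) population
    and return the mean of each sample."""
    return [
        statistics.fmean([rng.expovariate(rate) for _ in range(n)])
        for _ in range(trials)
    ]

rng = random.Random(7)
sigma = 1.0  # SD of an Exponential(rate=1) population is 1 / rate

results = {}
for n in (1, 4, 16, 64):
    means = simulate_sample_means(n, trials=5000, rng=rng)
    observed_se = statistics.stdev(means)  # SD of the simulated sample means
    predicted_se = sigma / n ** 0.5        # sigma / sqrt(n)
    results[n] = (observed_se, predicted_se)
    print(f"n = {n:>2}: observed SE = {observed_se:.3f}, predicted SE = {predicted_se:.3f}")
```

Each fourfold increase in n should roughly halve the observed SE. Plotting a histogram of `means` at each n would show the shape moving from skewed toward a bell curve.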

The CLT is the hidden engine behind the bell curve that shows up everywhere in statistics. It is the reason a 95% confidence interval can be written as estimate ± 1.96 · SE: the 1.96 comes from the Normal distribution that the CLT promises us applies to the estimate, even when we have no idea what shape the underlying population has. The theorem traces from de Moivre (1733) through Laplace's 1812 binomial approximation to Lyapunov's 1901 general proof (Central limit theorem, Wikipedia).
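The estimate ± 1.96 · SE recipe can be written out as a short function. The blood-pressure readings below are simulated stand-ins (400 values drawn from an assumed Normal population with mean 125 and SD 15), and the function name is hypothetical; only the formula comes from the text.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

def z_critical(confidence=0.95):
    """Two-sided critical value from the standard Normal (about 1.96 for 95%)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def mean_confidence_interval(data, confidence=0.95):
    """Return (estimate, lower, upper) using estimate +/- z * SE."""
    estimate = fmean(data)
    se = stdev(data) / sqrt(len(data))  # estimated standard error of the mean
    margin = z_critical(confidence) * se
    return estimate, estimate - margin, estimate + margin

# Simulated systolic blood pressure readings (illustrative data only).
rng = random.Random(11)
readings = [rng.gauss(125, 15) for _ in range(400)]

estimate, lower, upper = mean_confidence_interval(readings)
print(f"z = {z_critical():.3f}")
print(f"Mean = {estimate:.1f} mmHg, 95% CI = ({lower:.1f}, {upper:.1f})")
```

The sample SD stands in for the unknown σ here, which is reasonable for a sample this large. Small samples would normally use a t critical value instead.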

[Figure 3.3 panels: a skewed population (e.g., household income), from which samples of size n are drawn; means at n = 2, still skewed; means at n = 10, approaching Normal; means at n = 50, tightly Normal.]

Figure 3.3 — The Central Limit Theorem in action. The population can be wildly skewed, but as n grows the sampling distribution of the mean becomes increasingly Normal and increasingly narrow (its SD = σ/√n).

Caveat: The CLT Is About Means, Not Individuals

A common student misconception is that “a large enough sample makes the data Normal.” It does not. Income data with 1,000 observations is still right-skewed. What becomes Normal is the distribution of the sample mean across hypothetical repeated samples — that is what we use to construct confidence intervals around the mean. Inference for medians, proportions, ratios, or extreme values relies on the CLT or its analogues in different ways and may need larger n or different methods (bootstrapping, exact methods).

Key Takeaways

  • A census measures everyone; a sample measures a subset — both involve measurement, but samples also introduce sampling error.
  • Descriptive studies characterize populations; analytic studies evaluate associations between exposures and outcomes.
  • The three populations (target, source, study sample) form a hierarchy, each linked to different aspects of study validity.
  • The sampling frame is the list of all units from which the sample is drawn — in Canadian practice this often means a StatCan address register, an LFS area frame, or a provincial health insurance registry.
  • Random variables, expected values, and variances are the language we use to describe how sample-based estimates vary — they are the foundation of every confidence interval and hypothesis test that follows.
  • A small set of probability distributions (Bernoulli, Binomial, Poisson, Normal, Exponential, log-normal) describes the bulk of public-health phenomena you will encounter.
  • The standard error of the mean is σ/√n, and by the Central Limit Theorem the sampling distribution of the mean is approximately Normal for sufficiently large n — this is the engine behind frequentist inference.
Knowledge Check — Section 1

1. What is the key difference between a census and a sample?

A census collects data from every individual in the population, while a sample collects data from a subset. Samples introduce sampling error but are more practical and cost-effective.

2. An analytic study differs from a descriptive study in that it:

Analytic studies contrast groups and seek to estimate the strength of associations between exposures and outcomes, while descriptive studies aim to characterize population attributes.

3. Internal validity refers to whether:

Internal validity concerns whether the study produced the "correct" answer for the source population. External validity concerns generalizability to the broader target population.

4. Which distribution would best describe the number of new measles cases reported per week in a public-health unit, where cases are rare and arrive roughly independently?

The Poisson distribution describes the number of rare, independent events occurring in a fixed interval of time, area, or person-time — the canonical model for case counts in surveillance.

5. The Central Limit Theorem says that, for sufficiently large n:

The CLT is about the sampling distribution of the mean, not the raw data. Even if the underlying population is heavily skewed, the distribution of the sample mean across many hypothetical samples becomes approximately Normal as n grows.

6. If a measurement has population standard deviation σ = 12, and you draw a random sample of n = 144, what is the standard error of the sample mean?

The standard error of the mean is σ/√n = 12/√144 = 12/12 = 1. Note that quadrupling n halves the SE — precision grows with the square root of sample size, not linearly.

Section 2

Types of Error & Non-Probability Sampling

⏱ Estimated reading time: 12 minutes

Introduction and Overview

Section 1 set up the population hierarchy and the probability theory behind sampling. Section 2 turns to two practical consequences. First: when we draw conclusions from a sample, we can make two well-defined kinds of mistake (Type I and Type II errors), and the probability of each can be quantified. Second: there are sampling strategies that abandon probability theory altogether (convenience samples, judgement samples, snowball samples), with predictable consequences for inference. Both halves of this section give you the vocabulary to evaluate when those approaches are acceptable and when they are not.

Learning Objectives

  • Explain the two types of statistical error (Type I and Type II).
  • Define the null hypothesis, P-values, and statistical power.
  • Describe three non-probability sampling methods and their limitations.

Types of Error

In any study based on a sample, the variability of the outcome, measurement error, and sample-to-sample variability all affect results, so inferences based on sample data are subject to error. Within hypothesis testing in analytic studies, there are two key types of error:

Table 3.1 — Types of Error

Conclusion of Analysis | Effect Truly Present | Effect Truly Absent
Effect present (reject null) | Correct | Type I (α) error
No effect (fail to reject null) | Type II (β) error | Correct

Type I (α) Error

A Type I error occurs when you conclude that the outcomes in the groups are different (i.e., that an association exists), when in fact they are not. In other words, you falsely reject the null hypothesis. The probability of a Type I error is denoted α.

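Statistical tests assess the evidence against the null hypothesis (that there is no difference between groups). When P ≤ 0.05, we conventionally call the effect "statistically significant", meaning that a result at least this extreme would be unlikely if chance alone were operating. Using this threshold sets α = 0.05: when the null hypothesis is in fact true, we will still reject it, and so commit a Type I error, about 5% of the time.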

Type II (β) Error

A Type II error occurs when you conclude that there is no association between the exposure and outcome, when in fact there is. You fail to reject the null hypothesis when you should have. The probability of a Type II error is denoted β.

Reasons a study might find no effect include: the exposure truly had no effect (in which case the conclusion is correct, not a Type II error), the study design was inappropriate, the sample size was too small (low power), or simply bad luck.

Statistical Power

Power is the probability that you will find a statistically significant difference when a real difference of a defined magnitude exists. Mathematically, power = 1 − β.

For example, if a study has 80% power, it has an 80% chance of detecting a true effect of the specified size. The most direct way to increase power is to increase the sample size. So-called negative findings (failure to find a difference) are less commonly reported in the literature, partly because a negative result from a study that lacks adequate power is hard to interpret.
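As a rough illustration of how power rises with sample size, the sketch below uses base R's power.prop.test. The 15% versus 10% risks and the four per-group sample sizes are illustrative assumptions, not values from a particular study.

```r
# Power to detect a difference between 15% and 10% risk (illustrative values)
# at several per-group sample sizes, two-sided alpha = 0.05.
n_per_group <- c(100, 300, 700, 1500)
power <- sapply(n_per_group, function(n) {
  power.prop.test(n = n, p1 = 0.15, p2 = 0.10, sig.level = 0.05)$power
})
type2 <- 1 - power   # beta: probability of a Type II error at each n
data.frame(n_per_group, power = round(power, 3), beta = round(type2, 3))
```

Reading down the table, β falls as n grows; with these inputs power crosses 80% at roughly 700 per group.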

🎲 Interactive: Sample Size & the Law of Large Numbers

What you'll do: pick a population, set a sample size n, draw repeated samples, and watch the distribution of sample means concentrate around the true population mean as n grows. What to take away: the standard error shrinks as 1/√n — this is the formal reason why “more data” means better precision and higher statistical power. The intuition you build here drives every confidence interval and sample-size calculation in the rest of the course.

[Interactive figure, two panels. Population Distribution: the "true" distribution being sampled (default: fair die, μ = 3.50), with a red line at the true mean; x-axis from 0.50 to 6.50. Sampling Distribution of the Mean: histogram of sample means (x̄) on the same axis, with the most recent sample's mean in yellow. Readouts: true mean (μ), mean of sample means, observed standard error, theoretical SE = σ/√n, number of samples drawn, and last sample mean.]
Try this: set n = 1, draw 100 samples, then bump n to 50 and draw 100 more (after Reset). The spread of the yellow histogram shrinks dramatically — that's 1/√n precision gain in action.
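If the interactive is unavailable, the same experiment can be approximated in base R. This is a minimal sketch assuming a fair six-sided die as the population; the seed, the number of repeated draws, and the helper name sim_se are arbitrary choices made for illustration.

```r
# Simulate the sampling distribution of the mean for a fair six-sided die.
set.seed(1)                                   # arbitrary seed for reproducibility
die   <- 1:6
mu    <- mean(die)                            # true mean = 3.5
sigma <- sqrt(mean((die - mu)^2))             # population SD, about 1.708

# sim_se() is a hypothetical helper: draw many samples of size n and compare
# the observed SD of the sample means with the theoretical sigma / sqrt(n).
sim_se <- function(n, draws = 5000) {
  means <- replicate(draws, mean(sample(die, n, replace = TRUE)))
  c(n = n, observed_se = sd(means), theoretical_se = sigma / sqrt(n))
}
res <- t(sapply(c(1, 10, 50), sim_se))
round(res, 3)
```

The observed and theoretical columns should agree closely, and both shrink in proportion to 1/√n as n increases.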

The error framework above assumes a probability sample. The next subsection covers what happens when investigators forgo formal random selection altogether — and what that costs them.

Non-Probability Sampling

Samples drawn without an explicit method for determining each individual's probability of selection are known as non-probability samples. Whenever there is no formal process for random selection, the sample should be considered non-probability. Sample selection that is unrelated to the outcome of interest leaves inference intact, but selection that depends on unmeasured determinants of the outcome produces selection bias — a form of specification error formalised by Heckman (1979) and reviewed for hidden populations by Sudman & Kalton (1986). There are three main types:

  • Judgement sample
  • Convenience sample
  • Purposive sample

Important Limitation

Non-probability samples are generally inappropriate for descriptive studies because you cannot generalize prevalence estimates to the source population without knowing each individual's probability of being included. However, non-probability methods are commonly used in analytical studies where comparing exposure groups is the priority.

Chain-Referral and Hybrid Designs

Snowball sampling — first formalised by Goodman (1961) — recruits hidden-population members through peer referrals and is widely used when no sampling frame exists. Two newer hybrids partially recover probability-style inference: respondent-driven sampling (RDS), introduced by Heckathorn (1997) and extended with unbiased estimators by Salganik & Heckathorn (2004); and time-location (venue-based) sampling, applied at national scale for HIV behavioural surveillance by MacKellar and colleagues (2007). Magnani, Sabin, Saidel, & Heckathorn (2005) review when each design is appropriate for hard-to-reach populations.

Key Takeaways

  • Type I (α) error means falsely concluding there is an effect; Type II (β) error means missing a real effect.
  • Power (1 − β) is the probability of detecting a true effect; increasing sample size increases power.
  • Non-probability samples (judgement, convenience, purposive) lack a formal random selection process and are primarily used in analytic studies.
Knowledge Check — Section 2

1. A Type I (α) error occurs when you:

A Type I error is a "false positive" — you reject the null hypothesis and conclude there is an association when there really is not.

2. Statistical power is defined as:

Power = 1 − β, where β is the Type II error rate. A study with 80% power will detect a true effect 80% of the time.

3. Why are non-probability samples generally inappropriate for descriptive studies?

Without knowing the probability of selection for each individual, you cannot reliably estimate population parameters like prevalence or incidence.

✦ Pass the knowledge check with 100% to continue

Section 3

Probability Sampling Methods

⏱ Estimated reading time: 15 minutes

Introduction and Overview

Section 2 closed by walking through what non-probability sampling looks like and why it forfeits inferential validity. Section 3 returns to probability sampling and details the major variants you'll encounter in real surveys. The six tabs below are not interchangeable — each design buys a different combination of cost, complexity, and statistical efficiency. Read the comparison table at the end of the section as a decision aid you can return to whenever you're choosing a design.

Learning Objectives

  • Define a probability sample and explain why random selection is essential.
  • Describe simple random, systematic random, stratified random, cluster, multistage, and targeted (risk-based) sampling methods.
  • Identify the advantages and disadvantages of each method.

What Is a Probability Sample?

A probability sample is one in which every element in the population has a known, non-zero probability of being included. This implies that a formal process of random selection has been applied to the sampling frame. The key advantage is that probability samples allow for valid statistical inferences about the source population.

Random ≠ Haphazard

Random selection uses a formal, reproducible process (e.g., computer-generated random numbers, random number tables) — it is not the same as selecting participants haphazardly or arbitrarily.

Types of Probability Sampling

Simple Random Sample

In a simple random sample, every study subject in the source population has an equal probability of being included. A complete list of the source population is required, and a formal random process is used to select individuals.

Example: To study wait times in a hospital emergency room, you need 1,000 records from 13,000 admissions over the past year. You randomly generate 1,000 numbers between 1 and 13,000 and pull those records.

Advantage: Conceptually simple; all standard statistical analyses apply directly.

Limitation: Requires a complete list of the entire source population.
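The emergency-room example can be sketched in a few lines of R. This assumes the 13,000 admissions are numbered 1 to 13,000 in the frame; the seed is arbitrary.

```r
set.seed(2024)                          # arbitrary seed for reproducibility
N <- 13000                              # admissions in the sampling frame
n <- 1000                               # records needed
record_ids <- sort(sample.int(N, n))    # draws n distinct record numbers
head(record_ids)
n / N                                   # each record's probability of inclusion
```

Because sample.int draws without replacement by default, no record number can be selected twice, which is what a simple random sample of records requires.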

Systematic Random Sample

In a systematic random sample, a complete list is not required — you only need an estimate of the total population and sequential access to individuals. The sampling interval (j) is computed as the population size divided by the desired sample size.

How it works: Randomly pick a starting point between 1 and j, then select every jth subject after that.

Example: To sample 1,000 from 13,000 emergency patients, the sampling interval is 13. Randomly pick a number between 1 and 13 for your starting patient, then select every 13th patient thereafter.

Caution: Bias may occur if the factor you are studying is related to the sampling interval (e.g., periodic patterns in admissions).
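A minimal sketch of the systematic selection described above, assuming patients can be numbered in order of arrival; the seed is arbitrary.

```r
set.seed(7)                               # arbitrary seed
N <- 13000                                # estimated population size
n <- 1000                                 # desired sample size
j <- N %/% n                              # sampling interval: 13
start <- sample.int(j, 1)                 # random start between 1 and j
selected <- seq(from = start, by = j, length.out = n)
range(selected)
length(selected)
```

Only the starting point is random; every later selection is determined by the interval, which is why a periodic pattern with the same period as j can bias the sample.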

Stratified Random Sample

The population is divided into mutually exclusive strata based on factors likely to affect the outcome. Then, within each stratum, a simple or systematic random sample is chosen. The mathematical foundations of stratification — including the now-standard optimum (Neyman) allocation rule for assigning sample size across strata — were laid out by Neyman (1934) in his landmark Royal Statistical Society paper that effectively founded probability sampling theory.

In proportional stratified sampling, the number sampled from each stratum is proportional to that stratum's share of the total population.

Three key advantages:

  • Ensures all strata are represented in the sample.
  • Can produce more precise overall estimates than a simple random sample because between-strata variation is removed.
  • Allows estimation of stratum-specific outcomes.

Example: If hospital wait times differ between males and females, stratify records by sex and randomly sample within each group.
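Proportional allocation for the sex-stratified example might look like the sketch below. The frame is simulated, and the 55/45 split between strata is an assumption made purely for illustration.

```r
set.seed(11)                                        # arbitrary seed
# Hypothetical frame: 13,000 records with a recorded sex (split is assumed)
frame <- data.frame(id  = 1:13000,
                    sex = sample(c("F", "M"), 13000, replace = TRUE,
                                 prob = c(0.55, 0.45)))
n_total <- 1000
N_h <- c(table(frame$sex))                          # stratum sizes, named F and M
n_h <- round(n_total * N_h / sum(N_h))              # proportional allocation

# Simple random sample of n_h records within each stratum
strat <- do.call(rbind, lapply(names(n_h), function(s) {
  d <- frame[frame$sex == s, ]
  d[sample(nrow(d), n_h[[s]]), ]
}))
rbind(frame_share = N_h / sum(N_h), sample_share = n_h / sum(n_h))
```

The two rows of the final table should match almost exactly, which is the defining property of proportional stratified sampling.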

Cluster Sampling

A cluster is a natural grouping of study subjects with one or more common characteristics (e.g., a household is a cluster of people; a classroom is a cluster of students; a clinic is a cluster of patients).

In cluster sampling, the primary sampling unit (PSU) is the cluster itself, and it is often larger than the unit of concern. Every individual within a selected cluster is included in the sample.

Example: To estimate smoking prevalence among Grade 12 students, randomly select 10 of 47 Grade 12 classes and survey all students in those 10 classes.

Advantage: Easier when getting a list of clusters is simpler than listing all individuals. Often cheaper to visit fewer locations.

Limitation: Individuals within a cluster tend to be more alike, increasing sampling variation for a given sample size compared to SRS.

Important: A sample is only a "cluster sample" if the group is the sampling unit and the individuals within it are the unit of concern. If the group itself is the unit of concern (e.g., "does anyone in the household smoke indoors?"), it is not a cluster sample.

Multistage Sampling

Multistage sampling is similar to cluster sampling, except that after selecting primary sampling units (PSUs), a sample of secondary sampling units (individuals) is drawn within each PSU rather than surveying everyone.

Example: To study smoking among students, first randomly select 10 classes (PSUs), then randomly select 5 students from each class rather than surveying all students in every class. Within-household selection in face-to-face surveys is most often done using a Kish grid — the objective respondent-selection procedure introduced by Kish (1949).

To ensure all individuals have the same overall probability of being selected, use one of two approaches: either select PSUs with probability proportional to their size (PPS) and then sample a fixed number of individuals within each selected PSU, or select PSUs by simple random sampling and then sample a constant proportion of individuals within each selected PSU.

The number of individuals per cluster (ni) can be optimized by balancing within-cluster and between-cluster variance against the costs of sampling groups versus individuals.
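The arithmetic behind equal overall selection probabilities can be checked directly. The sketch below only computes inclusion probabilities (it does not draw a PPS sample), and the class sizes and the 20% within-class fraction are assumed values.

```r
set.seed(3)                                        # arbitrary seed
class_size <- sample(18:35, 47, replace = TRUE)    # hypothetical sizes of 47 classes
k <- 10                                            # classes (PSUs) to select
m <- 5                                             # students per selected class

# Option 1: PPS selection of classes, then a fixed number m of students per class
p_psu_pps   <- k * class_size / sum(class_size)    # PSU inclusion probability
p_within_1  <- m / class_size                      # within-class probability
p_overall_1 <- p_psu_pps * p_within_1              # equals k * m / total students

# Option 2: equal-probability selection of classes, then a constant fraction f
f <- 0.2
p_overall_2 <- rep(k / length(class_size), length(class_size)) * f

range(p_overall_1)    # identical for every class
range(p_overall_2)    # identical for every class
```

In option 1 the class size cancels out of the product, so large classes are more likely to be chosen but each of their students is less likely to be picked within the class.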

Targeted (Risk-Based) Sampling

Targeted sampling stratifies the source population based on characteristics associated with the probability of disease occurrence, then focuses sampling on strata where disease is most likely to be found.

Individuals are assigned point values based on their probability of having the disease of interest, and sampling proceeds until a predetermined number of points have been sampled. This is an unequal probability sampling strategy — some individuals may even have a zero probability of inclusion.

Advantage: Requires a much smaller sample to detect rare diseases when key risk characteristics can be identified.

Limitation: Key epidemiological parameters (e.g., risk ratios) may not be known for the study population and must be estimated from other evidence.

Comparison of Sampling Methods

Method | Requires Complete List? | Key Advantage | Key Limitation
Simple Random | Yes | Simple; all standard analyses apply | Needs complete population list
Systematic | No (needs estimate) | Practical; easy to implement | Periodic bias if factor linked to interval
Stratified | Yes (within strata) | More precise; ensures representation | Needs to know stratum membership
Cluster | List of clusters only | Cheaper; no need to list individuals | Higher variance than SRS for same n
Multistage | List of PSUs only | Flexible; cost-effective | Complex design; needs more subjects
Targeted | No (risk-based) | Efficient for rare diseases | Needs prior knowledge of risk factors

Worked Example: How the CCHS Combines These Methods

The Canadian Community Health Survey illustrates a real multistage probability design in action:

  1. Stratification — the population is first stratified by health region (about 110 health regions across Canada), and a target sample size is allocated to each so that every region produces stable estimates.
  2. Clustering — within each health region, dwellings are sampled from the LFS area frame (groups of dwellings that share a geographic boundary). This is the cluster stage.
  3. Selection within cluster — one person is randomly selected from each chosen household to complete the interview.
  4. Top-up samples — an RDD (random digit dialling) telephone frame fills in coverage for areas where the area frame is sparse.

The result is a probability sample where every Canadian resident has a known, non-zero chance of selection — but where the selection probability differs by region, household size, and frame. That is why CCHS data must be analysed with survey weights and bootstrap replicate weights (covered on the next page).

Reflection

Think of a health research question you are interested in. Which sampling method would be most appropriate, and why? What practical constraints (cost, time, available lists) would influence your choice?

Model answer: A defensible answer names the question (e.g., prevalence of food insecurity in BC post-secondary students) and matches design to it. Stratified random sampling is the right default: stratify by institution type (research-intensive vs. teaching-focused vs. community college) and within each stratum draw an SRS proportional to enrolment, oversampling smaller strata to ensure precise estimates. Practical constraints: cost favours administrative-data sampling frames over door-to-door; time favours online survey delivery; available lists (institution registrar files) determine the stratification variables that are feasible. If institutional lists are unavailable, fall back to cluster sampling on courses or classes, accept the design effect, and inflate n accordingly. Document refusal rates and run weighted analysis to address differential non-response.

Key Takeaways

  • Probability samples give every element a known, non-zero chance of selection, enabling valid statistical inference.
  • Simple random sampling requires a complete list; systematic sampling needs only sequential access.
  • Stratified sampling improves precision by removing between-strata variation.
  • Cluster and multistage sampling are practical when listing all individuals is impractical, but they require more subjects for the same precision.
  • Targeted sampling is efficient for rare outcomes but requires prior knowledge of risk characteristics.
Knowledge Check — Section 3

1. What defines a probability sample?

A probability sample ensures every member of the population has a known, non-zero probability of being selected through a formal random process.

2. A key advantage of stratified random sampling over simple random sampling is that it:

By dividing the population into homogeneous strata and sampling within each, the between-strata variation is explicitly removed from the overall estimate, potentially improving precision.

3. In cluster sampling, why is sampling variation typically greater than in simple random sampling for the same sample size?

Individuals within a cluster (e.g., students in the same class) are more similar to each other than to individuals in other clusters, which increases sampling variation compared to drawing individuals independently.

4. Targeted (risk-based) sampling is most useful when:

Targeted sampling focuses on high-risk strata to efficiently detect rare diseases, requiring prior knowledge of risk characteristics and a smaller sample size than other methods.

✦ Complete the reflection and pass the knowledge check with 100% to continue

Section 4

Analysing Survey Data & Sample Size

⏱ Estimated reading time: 15 minutes

Introduction and Overview

Section 3 walked through the major probability designs. Real surveys (the CCHS being the canonical Canadian example) typically combine several of these — stratification at the top level, clustering within strata, multistage selection within clusters. That combination is precisely what makes the analysis non-trivial: a complex sample design demands a complex analysis. The first half of this section covers how to do that analysis correctly. The second half closes the loop by walking through how to determine sample size before the data are collected.

Learning Objectives

  • Explain how stratification, sampling weights, and clustering affect the analysis of survey data.
  • Define the design effect and the finite population correction.
  • Describe the key factors that determine sample size.
  • Apply basic sample-size formulae for estimating proportions and means.

Analysing Complex Survey Data

When data come from a complex sampling design (involving stratification, weighting, or clustering), the analysis must account for these features. Ignoring them can lead to incorrect point estimates and underestimated standard errors.

Accounting for Stratification

If the population was divided into strata before sampling, this must be reflected in the analysis. Stratification provides stratum-specific estimates and can reduce the standard error of the overall estimate if the stratifying variable is related to the outcome.

However, stratification alone does not change the overall point estimate — it primarily affects precision. The total population size in each stratum must be known to compute appropriate sampling weights.

Sampling Weights

Not all individuals in a probability sample necessarily have the same probability of selection. The sampling weight for each individual is the inverse of their overall selection probability — this inverse-probability weighting underlies the Horvitz-Thompson estimator introduced by Horvitz & Thompson (1952), which produces unbiased totals and means from any probability sample with known inclusion probabilities.

The probability of selection depends on multiple stages. For example, in a household survey:

p(selection) = (n/N) × (m/M)

where n = households in sample, N = households in source population, m = individuals selected per household, and M = total people in that household.

The sampling weight = 1/p(selection). This weight reflects how many people in the source population each sampled individual "represents." Incorporating weights may change both the point estimate and the standard error.
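A small numeric sketch of the two-stage weight calculation follows. All counts are hypothetical: 200 of 10,000 households are sampled and one person is selected per household.

```r
# Hypothetical two-stage household survey (all numbers assumed)
n_hh <- 200                  # households in the sample (n)
N_hh <- 10000                # households in the source population (N)
m    <- 1                    # individuals selected per household
M    <- c(1, 2, 4, 6)        # household sizes for four example respondents

p_sel  <- (n_hh / N_hh) * (m / M)   # overall selection probability
weight <- 1 / p_sel                 # people each respondent "represents"
data.frame(household_size = M, p_sel, weight)   # weights: 50, 100, 200, 300
```

A respondent from a six-person household carries six times the weight of someone living alone, because their chance of being the one person selected was six times smaller.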

Accounting for Clustering

In cluster and multistage sampling, individuals within groups are usually more alike than randomly chosen individuals. This means observations are not independent, and standard errors must be adjusted upward.

The most common approach is to identify the primary sampling unit (PSU) and adjust all standard error calculations for clustering at that level. The technique called variance linearisation is widely used for this purpose and requires a large number of PSUs to be reliable.

The Design Effect (deff)

The design effect (deff) summarizes the overall impact of the sampling plan on precision. It is the ratio of the variance from the complex sampling design to the variance that would have been obtained from a simple random sample of the same size. The concept and the term were coined by Leslie Kish in his classic textbook Survey Sampling (Kish, 1965) and remain the standard summary statistic for complex-design efficiency.

Interpreting the Design Effect

A deff > 1 means the complex design produces less precise (larger variance) estimates than a simple random sample would. For example, in the Brazil diarrhea study, the deff was 4.43, meaning the variance of the incidence estimate was 4.43 times larger than what a simple random sample of the same size would have produced.

Example: Impact of Survey Design on Estimates

Type of Analysis | Incidence Estimate | SE
Simple random sample (assumed) | 0.1462 | 0.0061
+ Stratification | 0.1462 | 0.0059
+ Stratification + Weights | 0.1751 | 0.0091
+ Clustering | 0.1462 | 0.0088
All features combined | 0.1751 | 0.0128

Notice how incorporating all features of the sampling plan both changes the point estimate (from 14.62% to 17.51%) and more than doubles the standard error (from 0.0061 to 0.0128). Ignoring the sampling design would give a misleadingly precise, and potentially incorrect, result.
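The design effect can be approximated from the two standard errors in the table, since deff is a ratio of variances. This is a back-of-envelope sketch: the tabulated SEs are rounded, so the result will only be close to a deff computed from the unrounded variances.

```r
se_srs     <- 0.0061                   # SE assuming simple random sampling (table)
se_complex <- 0.0128                   # SE with all design features (table)
deff <- (se_complex / se_srs)^2        # ratio of variances
deft <- se_complex / se_srs            # ratio of standard errors (square root of deff)
round(c(deff = deff, deft = deft), 2)
```

The value comes out at about 4.4: the variance is roughly four times, and the standard error roughly twice, what a simple random sample of the same size would give.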

Canadian Practice: Bootstrap Weights for the CCHS and CHMS

Statistics Canada distributes the CCHS and CHMS with a set of 500 bootstrap replicate weights rather than releasing the underlying cluster identifiers (which would risk re-identification). The rescaling-bootstrap method that produces these weights was developed by Rao & Wu (1988). To get correct standard errors you re-run your analysis 500 times — once with each replicate weight — and combine the results.

Most analysts use survey or srvyr in R, svy commands in Stata, or SAS PROC SURVEY* procedures. If you ignore the bootstrap weights and just analyse the CCHS as if it were a simple random sample, your standard errors will typically be 30–80% too small — and your confidence intervals and p-values become meaningless.
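The "re-run and combine" step can be sketched in base R. Everything here is simulated: the replicate weights are produced by simple resampling of respondents as a stand-in, which is not the Rao-Wu rescaling procedure, and the combining rule shown (the mean squared deviation of the replicate estimates from the full-sample estimate) is the usual convention for bootstrap replicate weights. Check the survey's user guide before relying on it.

```r
set.seed(5)                              # arbitrary seed
n <- 500                                 # respondents (simulated)
B <- 500                                 # number of replicate weights
y <- rbinom(n, 1, 0.18)                  # simulated binary outcome
w <- runif(n, 800, 2200)                 # simulated main survey weight

# Stand-in replicate weights: resample respondents with replacement and
# multiply the main weight by how often each respondent was drawn.
rep_w <- replicate(B, w * tabulate(sample.int(n, n, replace = TRUE), nbins = n))

theta_hat <- weighted.mean(y, w)                              # full-sample estimate
theta_b   <- apply(rep_w, 2, function(wb) weighted.mean(y, wb))  # one per replicate
se_boot   <- sqrt(mean((theta_b - theta_hat)^2))              # bootstrap SE
c(estimate = theta_hat, se = se_boot)
```

With real CCHS or CHMS files the replicate weights are supplied as columns in the data, so only the last three steps apply.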

Finite Population Correction (FPC)

When the proportion of the population sampled is relatively large (>10%), precision improves beyond what would be expected from an "infinite" population. The finite population correction adjusts the estimated variance downward:

FPC Formula

FPC = (N − n) / (N − 1)

where N is the population size and n is the sample size. The FPC should not be applied in multistage sampling even if the number of PSUs sampled exceeds 10% of the total PSUs. It is only applicable to descriptive studies using simple or stratified random sampling.
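Because the FPC multiplies the variance, the standard error is multiplied by its square root. The sketch below uses illustrative numbers: a simple random sample of 1,000 from a population of 5,000 (a 20% sampling fraction) and an assumed proportion of 0.15.

```r
p <- 0.15; n <- 1000; N <- 5000          # illustrative values
se_inf <- sqrt(p * (1 - p) / n)          # SE ignoring the finite population
fpc    <- (N - n) / (N - 1)              # variance multiplier, about 0.80
se_fpc <- se_inf * sqrt(fpc)             # FPC-adjusted SE
round(c(fpc = fpc, se_inf = se_inf, se_fpc = se_fpc), 4)
```

Sampling a fifth of the population shrinks the standard error by roughly 10% compared with the infinite-population formula.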

Analysing data correctly is necessary but not sufficient. Just as important is making sure you collected enough data to begin with — an underpowered study cannot be rescued by clever analysis. The remainder of this section covers sample-size calculation.

Sample-Size Determination

Choosing the right sample size involves both statistical and non-statistical considerations. Non-statistical factors include available resources (time, money, personnel) and the nature of the sampling frame. Statistical considerations include:

Precision of the Estimate

The more precise you need your estimate to be, the larger the sample you need. If you want to know diarrhea prevalence within ±5%, you need more subjects than if ±10% is acceptable. Precision is denoted L (the "allowable error" or half the desired confidence interval width).

Expected Variation in the Data

For proportions, variance = p × q (where q = 1 − p). You need a rough estimate of the proportion to calculate the required sample size. For continuous variables like BMI, you need an estimate of the population variance (σ²). One approach: estimate the range that covers 95% of values, divide by 4 to get σ, then square it for σ².

Level of Confidence

The confidence level (typically 95%) determines how sure you want to be that the confidence interval includes the true population value. This is linked to the Z-value: for 95% confidence, Zα = 1.96. Higher confidence requires a larger sample.

Power (for Analytic Studies)

In analytical studies, you also need to specify the desired power (often 80%). Power determines the sample size needed to detect a specific effect size. For 80% power, Zβ = −0.84. Greater power requires a larger sample.

Key Sample-Size Formulae

Objective | Formula | Variables
Estimate a proportion | n = Zα² × p × q / L² | p = expected proportion; L = precision
Estimate a mean | n = Zα² × σ² / L² | σ² = population variance; L = precision
Compare 2 proportions | n = [Zα√(2pq) − Zβ√(p1q1 + p2q2)]² / (p1 − p2)² | p = (p1 + p2)/2; n = per group
Compare 2 means | n = 2[(Zα − Zβ)² × σ²] / (μ1 − μ2)² | σ² = population variance; n = per group
FPC adjustment | n′ = 1 / (1/n + 1/N) | n = initial estimate; N = population size
Clustering adjustment | n′ = n × [1 + ρ(m − 1)] | ρ = intra-class correlation; m = cluster size
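The single-sample formulae and the FPC adjustment can be applied directly in R. The inputs below are assumptions chosen for illustration: an expected proportion of 0.15 estimated to within ±0.05, a continuous measure whose central 95% of values is assumed to span 18 to 38 (so σ is about range/4 = 5) estimated to within ±1, and a source population of 1,000.

```r
z_alpha <- 1.96                                     # 95% confidence

# Estimate a proportion: n = Z^2 * p * q / L^2
p <- 0.15; L_prop <- 0.05
n_prop <- ceiling(z_alpha^2 * p * (1 - p) / L_prop^2)

# Estimate a mean: n = Z^2 * sigma^2 / L^2, with sigma from the range / 4 rule
sigma  <- (38 - 18) / 4
L_mean <- 1
n_mean <- ceiling(z_alpha^2 * sigma^2 / L_mean^2)

# FPC adjustment of the proportion sample size for a population of N = 1000
N <- 1000
n_prop_fpc <- ceiling(1 / (1 / n_prop + 1 / N))

c(n_prop = n_prop, n_mean = n_mean, n_prop_fpc = n_prop_fpc)   # 196, 97, 164
```

Rounding up with ceiling() is a conservative convention; halving L in either formula would quadruple the required n.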

Worked Example: Comparing Two Proportions

Suppose you want to determine if rainwater cisterns reduce the monthly risk of diarrhea from 15% to 10%. With 95% confidence and 80% power:

p1 = 0.15, p2 = 0.10, p = 0.125, q = 0.875

Applying the formula yields n = 685 per group, so you would need 1,370 total individuals (685 with cisterns, 685 without).

If the outcome is clustered within households (ρ = 0.45, average household size m = 6), the clustering adjustment multiplies the requirement by 1 + 0.45 × (6 − 1) = 3.25, giving about 2,227 per group: more than triple the unadjusted estimate!
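The cistern calculation can be reproduced by typing the two-proportion formula from the table into R, using the rounded Z-values given in the text (Zα = 1.96, Zβ = −0.84). The variable names are illustrative.

```r
z_alpha <- 1.96; z_beta <- -0.84
p1 <- 0.15; p2 <- 0.10
p <- (p1 + p2) / 2                        # 0.125
q <- 1 - p                                # 0.875

n_raw <- (z_alpha * sqrt(2 * p * q) -
          z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))^2 / (p1 - p2)^2
n_per_group <- ceiling(n_raw)             # 685

rho <- 0.45; m <- 6
deff <- 1 + rho * (m - 1)                 # clustering multiplier: 3.25
n_clustered <- ceiling(n_per_group * deff)

c(n_per_group = n_per_group, n_clustered = n_clustered)   # 685, 2227
```

Base R's power.prop.test with the same inputs gives a per-group n within about one subject of this figure, because it uses unrounded Z-values.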

R Activity — Sampling designs, survey weights, and sample size

The companion R script r-activities/HSCI_341_Lesson_3_Sampling.R walks through three blocks: (A) drawing simple random, stratified, and cluster samples in base R; (B) computing weighted prevalence with the survey package; and (C) running sample-size calculations with power.prop.test and power.t.test, then adjusting for clustering via a design effect.

# PART A -- three probability sampling designs from a 1,000-row frame
set.seed(341)
N <- 1000
frame <- data.frame(id = 1:N,
                    province  = sample(c("BC", "AB", "ON", "QC"), N, replace = TRUE),
                    household = sample(1:300, N, replace = TRUE))

srs   <- frame[sample(N, 100), ]                        # simple random
strat <- do.call(rbind, by(frame, frame$province,
                          function(d) d[sample(nrow(d), 25), ])) # stratified
sel_hh <- sample(unique(frame$household), 30)
clust  <- frame[frame$household %in% sel_hh, ]               # cluster

c(SRS = nrow(srs), Stratified = nrow(strat), Cluster = nrow(clust))

# PART B -- design-corrected prevalence with the survey package
library(survey)
dat <- data.frame(province = sample(c("BC","AB","ON","QC"), 2000, replace = TRUE),
                  smoker   = rbinom(2000, 1, 0.18),
                  weight   = runif(2000, 800, 2200))
des <- svydesign(ids = ~1, strata = ~province, weights = ~weight, data = dat)

mean(dat$smoker)                                # naive (unweighted)
svymean(~smoker, design = des)                  # design-corrected
confint(svymean(~smoker, design = des))         # 95% CI

# PART C -- sample-size calculations + design-effect adjustment
power.prop.test(p1 = 0.15, p2 = 0.10,
                power = 0.80, sig.level = 0.05)        # two proportions
power.t.test(delta = 5, sd = 14,
             power = 0.80, sig.level = 0.05)           # two means (SBP)

n_srs <- 685; rho <- 0.45; m <- 6
ceiling(n_srs * (1 + rho*(m - 1)))                # cluster-adjusted n

What you should be able to do after this activity: draw each of the three probability samples, fit a survey design and report a weighted prevalence with its CI, and compute a sample size for two proportions, two means, and a cluster design.

R Reflect on what you just ran

Use the questions below to interpret the actual numbers you produced. Look at your console output before answering.

1. The line c(SRS = ..., Stratified = ..., Cluster = ...) printed three sample sizes. What were the three numbers, and which design gave you the most variable sample size on a re-run? Why is the cluster sample size NOT exactly 100 or 200 here?

Model answer: The three numbers are typically SRS = 100 (exact), Stratified = 100 (exact, because 25 records are drawn from each of the four provinces), and Cluster = around 96–120 depending on the seed. Cluster has the most variable sample size on re-run because it samples clusters, not individuals: once you pick a cluster, you take everyone in it, so the total n depends on the actual sizes of the sampled clusters. SRS and stratified explicitly draw fixed individual counts, so their n is deterministic.

2. Compare mean(dat$smoker) with svymean(~smoker, design = des). Were they nearly the same, and why does that make sense given that the weights came from runif(2000, 800, 2200) with no relationship to province or smoker?

Model answer: mean(dat$smoker) and svymean(~smoker, design = des) were nearly identical because the weights drawn from runif(2000, 800, 2200) are independent of both province and smoker status. Weights only matter when they are correlated with the variable being estimated (or with selection probability); under random weights with no informative structure, the weighted mean equals the unweighted mean in expectation. The simulation's point: weights fix design-induced bias only when there is design-induced bias to fix.

3. power.prop.test(p1 = 0.15, p2 = 0.10, power = 0.80, ...) returned an n per group, and the cluster adjustment (rho = 0.45, m = 6) multiplied 685 by roughly 3.25 to give ~2,227. In one sentence, what does that ratio tell you about the price of cluster sampling vs. SRS?

Model answer: The cluster sample needs 2,227 / 685 ≈ 3.25 times as many participants as an SRS to achieve the same statistical power. That ratio is the design effect = 1 + (m−1)ρ = 1 + 5(0.45) = 3.25, confirming the formula. In plain terms: when responses within clusters are correlated (ρ = 0.45 is large), each additional person in a cluster adds less new information than a fresh independent draw, so you pay for the operational convenience of cluster sampling by needing a much larger overall n.

Reflection

Why do you think it is important to account for clustering when determining sample size? What would happen to your study conclusions if you ignored the clustering effect?

Model answer: Ignoring clustering treats every observation as an independent draw, but cluster-correlated data violate that assumption: students in the same class share the same teacher, the same socioeconomic context, the same outbreak exposure. The result is understated standard errors: CIs are too narrow, p-values too small, and you reject the null when you should not. Concretely, a study analyzing 1,000 students from 50 classes (m = 20 per class) as if they were 1,000 independent observations would, for ρ = 0.45, report an SE that is too small by a factor of √deff = √(1 + 19 × 0.45) = √9.55 ≈ 3.1. Conclusions would be over-confident: false-positive associations declared as real, replication fails, and policy decisions made on a sandcastle. The fix: cluster-robust standard errors, multilevel/mixed models, or generalised estimating equations, chosen by data structure and inferential target.

Key Takeaways

  • Complex survey analyses must account for stratification, sampling weights, and clustering to produce correct estimates and valid standard errors.
  • The design effect (deff) quantifies how much less precise a complex design is relative to a simple random sample.
  • Sample size depends on desired precision, expected variance, confidence level, and (for analytic studies) power.
  • Clustering can dramatically increase the required sample size, especially when the intra-class correlation is high.
  • The finite population correction reduces sample size requirements when sampling a large fraction (>10%) of the population.
Knowledge Check — Section 4

1. What does a design effect (deff) of 4.43 indicate?

The design effect is the ratio of variance from the complex design to variance from a simple random sample. A deff > 1 means less precision (more variance) than SRS.

2. Sampling weights are computed as:

Sampling weights = 1/p(selection). They represent how many individuals in the source population each sampled individual "represents."
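
A minimal sketch of how inverse-probability weights change an estimate is shown below. The data are invented for illustration (two people sampled at 1 in 100 and four sampled at 1 in 20), and the function names are hypothetical.

```python
def sampling_weight(p_selection: float) -> float:
    """Weight = 1 / probability of selection."""
    if not 0 < p_selection <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1 / p_selection

def weighted_prevalence(records) -> float:
    """records: iterable of (has_disease, p_selection) pairs."""
    records = list(records)
    total_weight = sum(sampling_weight(p) for _, p in records)
    case_weight = sum(sampling_weight(p) for diseased, p in records if diseased)
    return case_weight / total_weight

# Hypothetical sample: (has_disease, probability of selection)
sample = [
    (True, 0.01), (False, 0.01),                               # sampled 1 in 100
    (True, 0.05), (False, 0.05), (False, 0.05), (False, 0.05), # sampled 1 in 20
]
unweighted = sum(d for d, _ in sample) / len(sample)
weighted = weighted_prevalence(sample)
print(f"unweighted {unweighted:.3f}, weighted {weighted:.3f}")
```

The two estimates differ because each person sampled at 1 in 100 represents 100 people in the source population, while each person sampled at 1 in 20 represents only 20.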

3. Which of the following increases the required sample size?

Greater precision means a smaller L (allowable error), which increases the required sample size. Higher confidence levels, greater variance, and clustering also increase sample size requirements.

Section 5

Lesson Review & Final Assessment

⏱ Estimated time: 15 minutes

Bringing It All Together

This lesson moved sampling from a vague intuition to a structured set of decisions. You worked through the hierarchy of populations (target population, source population, study sample) and the way it maps onto internal and external validity. From there you built up the probability machinery (sampling distributions, the central limit theorem, Type I and Type II error, power) that lets a sample stand in for the population it was drawn from.

The second half of the lesson made those ideas operational: when to use simple, systematic, stratified, cluster, or multistage probability sampling; when non-probability designs are defensible; how complex survey data must be weighted and clustered in analysis; and how a sample-size calculation actually gets done. Lesson 4 will turn from who you measure to how — the design of the questionnaires those samples respond to.

Key Takeaways from Lesson 3

  • Sampling is the bridge between a research question and feasible data collection: choose a sample so the inference back to the source and target populations is defensible.
  • The target → source → study sample hierarchy is what makes internal vs. external validity a precise distinction rather than a slogan.
  • The central limit theorem is what makes inference from a sample to a population work — sampling distributions, standard errors, and confidence intervals all depend on it.
  • Type I (α), Type II (β), and power (1−β) are design parameters you set deliberately, not after-the-fact diagnostics.
  • Probability designs (simple, systematic, stratified, cluster, multistage) trade off precision, cost, and feasibility; complex designs require weighting and a design effect in analysis.
  • Sample-size calculations are explicit assumption documents: precision/effect size, variance, confidence level, and adjustments for clustering, attrition, and finite populations.
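
To illustrate how α and power enter a sample-size calculation as explicit design choices, here is a hedged sketch for comparing two proportions. It assumes a widely used normal-approximation formula, n per group = [Zα√(2p̄q̄) + Zβ√(p₁q₁ + p₂q₂)]² / (p₁ − p₂)²; the function name and the example proportions (0.10 vs. 0.20) are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_two_proportions(p1: float, p2: float,
                                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group to detect p1 vs. p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for Type I error
    z_beta = NormalDist().inv_cdf(power)           # critical value for power (1 - beta)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

n_80 = n_per_group_two_proportions(0.10, 0.20, alpha=0.05, power=0.80)
n_90 = n_per_group_two_proportions(0.10, 0.20, alpha=0.05, power=0.90)
print(n_80, n_90)
```

Every argument is an assumption the investigator has to defend: change the expected proportions, α, or power and the required n changes with them.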

Reflection

Imagine you are designing a study to estimate the prevalence of a waterborne disease in a rural region with scattered villages. Describe the sampling strategy you would use, including the type of sampling, how you would define your populations, and what factors would influence your sample-size calculation.

Model answer: Design: multistage cluster sampling. Villages are natural clusters because individuals within a village share a water source, so within-cluster correlation is likely high (perhaps ρ > 0.3). Stage 1: stratify villages by water source (river-fed vs. groundwater vs. seasonal stream) and select villages within each stratum with probability proportional to size. Stage 2: within each sampled village, draw a random sample of households (or take all of them if villages are small). Define populations: target = all residents of the region; sampled = village residents available during the visit window; analytic = those with a completed survey and verified residence > 6 months. Sample-size factors: expected prevalence (0.10 as a working assumption; 0.50 gives the largest, most conservative n when prevalence is unknown), desired CI half-width (± 0.03), design effect from clustering and stratification (likely 2.5–3.5), expected non-response rate (~25% in rural settings), and a budgeted buffer (~20%). Apply weights for unequal selection probabilities and use design-based, cluster-robust variance for any inferential analysis (for example, svyglm in R's survey package).
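
The sample-size factors in this answer can be chained together in a few lines. The sketch below is one possible ordering under stated assumptions: the normal-approximation formula for a proportion, a design effect of 3.0 (the midpoint of the 2.5–3.5 range), and 25% non-response. The additional ~20% buffer is not applied, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def prevalence_survey_n(p: float, half_width: float, deff: float,
                        nonresponse: float, confidence: float = 0.95) -> dict:
    """Step-by-step sample size: SRS -> design effect -> non-response."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_srs = z ** 2 * p * (1 - p) / half_width ** 2  # simple random sample
    n_design = n_srs * deff                          # inflate for clustering
    n_invited = n_design / (1 - nonresponse)         # inflate for expected non-response
    return {
        "srs": math.ceil(n_srs),
        "with_deff": math.ceil(n_design),
        "to_invite": math.ceil(n_invited),
    }

plan = prevalence_survey_n(p=0.10, half_width=0.03, deff=3.0, nonresponse=0.25)
print(plan)
```

Under these assumed inputs, roughly 385 respondents under SRS become about 1,150 completed surveys once clustering is accounted for, and more than 1,500 people to invite.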

Final Knowledge Assessment

Complete the following 15-question assessment. A score of 100% is required to complete the lesson. You may retake the assessment as many times as needed.

Final Assessment — 15 Questions

1. In a census, the only source of error is:

Because a census evaluates every individual in the population, there is no sampling error — the only source of error is the measurement process itself.

2. A descriptive study aims to:

Descriptive studies (surveys) aim to characterize population attributes, answering questions like "what proportion of people have X?"

3. The source population is best described as:

The source population is the accessible population from which study subjects are drawn. All units should have a non-zero probability of being included.

4. External validity refers to:

External validity is a subjective assessment of whether results from the source population can be generalized to the broader target population.

5. A Type II (β) error occurs when:

A Type II error is a "false negative": you fail to reject the null hypothesis and conclude there is no effect when one actually exists.

6. A convenience sample is characterized by:

A convenience sample is chosen because it is easy to obtain — for example, selecting households close to a research centre.

7. In a simple random sample, every subject has:

In a simple random sample, every individual in the source population has an equal probability of being included.

8. The sampling interval in systematic random sampling is calculated as:

The sampling interval (j) = population size / desired sample size. You randomly select a starting point between 1 and j, then sample every j-th subject.
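
The procedure can be sketched in Python as follows. This is a minimal illustration that assumes a complete, ordered sampling frame held in a list; the function name is hypothetical, and the interval is rounded down when the population size is not an exact multiple of n.

```python
import random

def systematic_sample(frame: list, n: int, seed=None) -> list:
    """Select every j-th unit from an ordered frame after a random start."""
    N = len(frame)
    j = N // n                       # sampling interval, rounded down
    if j < 1:
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)
    start = rng.randint(1, j)        # random start between 1 and j (1-based)
    positions = range(start, N + 1, j)
    return [frame[pos - 1] for pos in positions][:n]

frame = list(range(1, 1001))         # hypothetical frame of 1,000 numbered subjects
sample = systematic_sample(frame, n=50, seed=3)
print(sample[:5])
```

With 1,000 subjects and a target of 50, the interval is 20, so the sample consists of the starting subject and every 20th subject after it.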

9. A key advantage of stratified random sampling is:

By dividing the population into internally homogeneous strata and sampling within each, stratified sampling removes between-strata variation from the sampling error of the overall estimate, which improves precision.

10. In cluster sampling, the primary sampling unit (PSU) is:

In cluster sampling, the PSU is the cluster itself (e.g., household, classroom, clinic), which is typically larger than the unit of concern (the individuals within).

11. Sampling weights reflect:

Sampling weights are the inverse of the selection probability. They represent how many individuals in the source population each sampled person "stands for."

12. The design effect (deff) is the ratio of:

The design effect quantifies how much less precise (or more precise) the complex design is compared to a simple random sample of equal size.

13. When computing sample size for estimating a proportion, which factor does NOT increase the required sample size?

A lower confidence level (e.g., 90% instead of 95%) reduces Zα and therefore requires fewer subjects. All other options increase the required sample size.

14. The clustering adjustment formula n′ = n[1 + ρ(m−1)] shows that the required sample size increases when:

When ρ is high (individuals within clusters are very similar) and m is large (many individuals per cluster), the correction factor [1 + ρ(m−1)] becomes large, substantially increasing the required sample size.

15. Which statement best summarizes the importance of understanding sampling methods?

Sampling methods fundamentally affect the study's validity, the precision of estimates, the cost and feasibility, and the appropriate statistical methods. Matching the method to the research context is essential for sound epidemiologic research.
