HSCI 341 — Lesson 2

Sampling

Fundamental Epidemiological Concepts and Approaches

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Distinguish between a census and a sample, and between descriptive and analytic studies
  • Describe the hierarchy of populations and the concept of a sampling frame
  • Explain types of error, including Type I and Type II errors, and the concept of statistical power
  • Compare non-probability sampling methods (judgement, convenience, purposive)
  • Describe probability sampling methods (simple random, systematic, stratified, cluster, multistage, targeted)
  • Understand the implications of complex sampling designs on data analysis
  • Compute required sample sizes for common analytic objectives

This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Section 1

Introduction to Sampling

⏱ Estimated reading time: 12 minutes

Learning Objectives

  • Distinguish between a census and a sample.
  • Contrast descriptive and analytic studies.
  • Describe the hierarchy of populations (target, source, study sample).
  • Explain the concepts of internal and external validity.
  • Define a sampling frame and explain its importance.

Census vs. Sample

When we conduct research, we need data from either all individuals in a population or a subset of them. The process of obtaining these data is called measurement.

In a census, every individual in the population is evaluated. In a sample, data are collected from only a subset. Sampling is generally more convenient and less costly than conducting a full census. Interestingly, even a census can be viewed as a kind of sample — it captures the population at one point in time, making it a "sample" of the population over time.

Key Distinction

In a census, the only source of error is the measurement itself. With a sample, you contend with both measurement error and sampling error. However, a well-planned sample can provide virtually the same information as a census at a fraction of the cost.

Descriptive vs. Analytic Studies

Samples support two fundamental types of studies:

Descriptive Studies (Surveys)

A descriptive study aims to describe population attributes such as the frequency of disease or the prevalence of an exposure. Surveys answer questions like: "What proportion of people had diarrhea over a 1-month period?" or "What is the average BMI of students in Grade 12?"

The focus is on characterizing the current state of a population rather than establishing cause-and-effect relationships.

Analytic Studies

An analytic study is designed to estimate the magnitude of an association between exposures and outcomes. These studies contrast groups and seek explanations for differences between them.

Examples: "Is water source associated with the incidence of diarrhea?" or "How does time spent playing video games affect the BMI of Grade 12 students?"

Establishing an association is the first step to inferring causation, as discussed in Lesson 1.

Hierarchy of Populations

Understanding the different populations involved in a study is essential for evaluating validity. There are three key populations to consider:

  • Target population — the broadest group to which you want to generalize results (linked to external validity)
  • Source population — the population from which study subjects are drawn (defined by the sampling frame)
  • Study sample — the individuals who actually participate and provide data (linked to internal validity)

Figure 2.1 — Hierarchy of populations in epidemiologic research. The target population is the broadest; the source population is the accessible subset; the study sample consists of those who actually participate.

Target Population

The target population is the population to which you want to extrapolate your results. It is often not clearly defined and may vary depending on the perspective of the person interpreting the study. For example, researchers studying rainwater cisterns in Pernambuco State, Brazil might define the target as that state, while someone else may want to generalize the findings to all semi-arid regions of Brazil.

Source Population

The source population is the population from which study subjects are actually drawn. All units in the source population should be "listable" and have a non-zero probability of being included in the study. For example, in a diarrhea study in Brazil, the source population included families from households participating in the One Million Cisterns Project (OMCP).

Study Sample

The study sample (or study group) consists of the individuals who actually end up in the study. It is typically a subset drawn from the source population. Researchers determine the necessary sample size, draw their sample, collect data from eligible subjects, and the final study sample consists of those who agreed to participate and whose data met quality requirements.

Validity: Internal and External

Internal validity refers to whether the study results are valid for members of the source population. It indicates whether the study obtained the "correct" answer for that population. Much of epidemiology is dedicated to methods that ensure internal validity.

External validity involves a subjective assessment of whether results can be generalized to the broader target population. It is generally easier to generalize results from analytic studies (which evaluate associations) than from descriptive studies (which estimate prevalence).

The Sampling Frame

The sampling frame is the list of all sampling units in the source population. Sampling units are the basic elements that will be sampled (e.g., households, individuals). A complete list of all sampling units is required for drawing a simple random sample, though some other methods do not require such a complete listing.

Example: Brazil Diarrhea Study

In a study of water cisterns and diarrhea in Brazil, a suitable sampling frame was the list of all households eligible for the One Million Cisterns Project. Once households were selected, a separate strategy was used for selecting individuals within each household.

Key Takeaways

  • A census measures everyone; a sample measures a subset — both involve measurement, but samples also introduce sampling error.
  • Descriptive studies characterize populations; analytic studies evaluate associations between exposures and outcomes.
  • The three populations (target, source, study sample) form a hierarchy, each linked to different aspects of study validity.
  • The sampling frame is the list of all units from which the sample is drawn.
Knowledge Check — Section 1

1. What is the key difference between a census and a sample?

A census collects data from every individual in the population, while a sample collects data from a subset. Samples introduce sampling error but are more practical and cost-effective.

2. An analytic study differs from a descriptive study in that it:

Analytic studies contrast groups and seek to estimate the strength of associations between exposures and outcomes, while descriptive studies aim to characterize population attributes.

3. Internal validity refers to whether:

Internal validity concerns whether the study produced the "correct" answer for the source population. External validity concerns generalizability to the broader target population.

✦ Pass the knowledge check with 100% to continue

Section 2

Types of Error & Non-Probability Sampling

⏱ Estimated reading time: 12 minutes

Learning Objectives

  • Explain the two types of statistical error (Type I and Type II).
  • Define the null hypothesis, P-values, and statistical power.
  • Describe three non-probability sampling methods and their limitations.

Types of Error

In any study based on a sample, results are affected by the variability of the outcome, by measurement error, and by sample-to-sample variability. Inferences drawn from sample data are therefore subject to error. Within hypothesis testing in analytic studies, there are two key types of error:

Table 2.1 — Types of Error

Conclusion of Analysis       | Effect Truly Present | Effect Truly Absent
Effect present (reject null) | Correct              | Type I (α) error
No effect (accept null)      | Type II (β) error    | Correct

Type I (α) Error

A Type I error occurs when you conclude that the outcomes in the groups are different (i.e., that an association exists), when in fact they are not. In other words, you falsely reject the null hypothesis. The probability of a Type I error is denoted α.

Statistical tests are aimed at disproving the null hypothesis (that there is no difference between groups). When P ≤ 0.05, we are "reasonably sure" that any detected effect is not due to chance — but if the null hypothesis is in fact true, there remains a 5% chance of making a Type I error.

Type II (β) Error

A Type II error occurs when you conclude that there is no association between the exposure and outcome, when in fact there is. You fail to reject the null hypothesis when you should have. The probability of a Type II error is denoted β.

Reasons a study might fail to find a real effect include: the exposure truly had no effect, the study design was inappropriate, the sample size was too small (low power), or simply bad luck.

Statistical Power

Power is the probability that you will find a statistically significant difference when a real difference of a defined magnitude exists. Mathematically, power = 1 − β.

For example, if a study has 80% power, it has an 80% chance of detecting a true effect of the specified size. To increase power, you need to increase the sample size. So-called negative findings (failure to find a difference) are less commonly reported in the literature, partly because many studies lack adequate power.
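The link between power and sample size can be illustrated with a small simulation — a sketch with illustrative values, not drawn from the text: true risks of 15% vs 10%, 685 subjects per group, compared with a two-sided two-proportion z-test at α = 0.05.

```python
import math
import random

random.seed(0)

def rejects_null(p1, p2, n, z_crit=1.96):
    """Simulate one study and apply a two-sided two-proportion z-test."""
    x1 = sum(random.random() < p1 for _ in range(n))  # cases in group 1
    x2 = sum(random.random() < p2 for _ in range(n))  # cases in group 2
    pooled = (x1 + x2) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    return abs(x1 / n - x2 / n) / se > z_crit

# Power = fraction of repeated studies that detect the true difference
power = sum(rejects_null(0.15, 0.10, 685) for _ in range(1000)) / 1000
print(power)  # close to 0.80
```

Rerunning with a much smaller group size drives the detection rate well below 80% — one reason underpowered studies so often fail to find real effects.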

Non-Probability Sampling

Samples drawn without an explicit method for determining each individual's probability of selection are known as non-probability samples. Whenever there is no formal process for random selection, the sample should be considered non-probability. There are three main types:

  • Judgement sample — the researcher deliberately selects units they judge to be representative of the population; representativeness depends entirely on that judgement.
  • Convenience sample — units are selected because they are easy to access (e.g., households located close to the research centre).
  • Purposive sample — units are selected because they have a specific characteristic of interest, such as a known exposure or disease status.

Important Limitation

Non-probability samples are generally inappropriate for descriptive studies because you cannot generalize prevalence estimates to the source population without knowing each individual's probability of being included. However, non-probability methods are commonly used in analytical studies where comparing exposure groups is the priority.

Key Takeaways

  • Type I (α) error means falsely concluding there is an effect; Type II (β) error means missing a real effect.
  • Power (1 − β) is the probability of detecting a true effect; increasing sample size increases power.
  • Non-probability samples (judgement, convenience, purposive) lack a formal random selection process and are primarily used in analytic studies.
Knowledge Check — Section 2

1. A Type I (α) error occurs when you:

A Type I error is a "false positive" — you reject the null hypothesis and conclude there is an association when there really is not.

2. Statistical power is defined as:

Power = 1 − β, where β is the Type II error rate. A study with 80% power will detect a true effect 80% of the time.

3. Why are non-probability samples generally inappropriate for descriptive studies?

Without knowing the probability of selection for each individual, you cannot reliably estimate population parameters like prevalence or incidence.

✦ Pass the knowledge check with 100% to continue

Section 3

Probability Sampling Methods

⏱ Estimated reading time: 15 minutes

Learning Objectives

  • Define a probability sample and explain why random selection is essential.
  • Describe simple random, systematic random, stratified random, cluster, multistage, and targeted (risk-based) sampling methods.
  • Identify the advantages and disadvantages of each method.

What Is a Probability Sample?

A probability sample is one in which every element in the population has a known, non-zero probability of being included. This implies that a formal process of random selection has been applied to the sampling frame. The key advantage is that probability samples allow for valid statistical inferences about the source population.

Random ≠ Haphazard

Random selection uses a formal, reproducible process (e.g., computer-generated random numbers, random number tables) — it is not the same as selecting participants haphazardly or arbitrarily.

Types of Probability Sampling

Simple Random Sample

In a simple random sample, every study subject in the source population has an equal probability of being included. A complete list of the source population is required, and a formal random process is used to select individuals.

Example: To study wait times in a hospital emergency room, you need 1,000 records from 13,000 admissions over the past year. You randomly generate 1,000 numbers between 1 and 13,000 and pull those records.

Advantage: Conceptually simple; all standard statistical analyses apply directly.

Limitation: Requires a complete list of the entire source population.
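The emergency-room example can be sketched in Python; `random.sample` draws without replacement, giving each record the same chance of inclusion (record IDs 1–13,000 are an assumption for illustration):

```python
import random

random.seed(42)

# 13,000 admission records, each identified by a number from 1 to 13,000
record_ids = range(1, 13_001)

# Draw 1,000 records without replacement; every record has an equal
# probability (1,000/13,000) of being included
sample = random.sample(record_ids, k=1_000)

print(len(sample), min(sample) >= 1, max(sample) <= 13_000)  # 1000 True True
```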

Systematic Random Sample

In a systematic random sample, a complete list is not required — you only need an estimate of the total population and sequential access to individuals. The sampling interval (j) is computed as the population size divided by the desired sample size.

How it works: Randomly pick a starting point between 1 and j, then select every jth subject after that.

Example: To sample 1,000 from 13,000 emergency patients, the sampling interval is 13. Randomly pick a number between 1 and 13 for your starting patient, then select every 13th patient thereafter.

Caution: Bias may occur if the factor you are studying is related to the sampling interval (e.g., periodic patterns in admissions).
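A minimal sketch of the systematic procedure, assuming patients can be numbered sequentially as they arrive:

```python
import random

random.seed(1)

N, n = 13_000, 1_000
j = N // n                    # sampling interval: 13,000 / 1,000 = 13
start = random.randint(1, j)  # random starting point between 1 and j
selected = list(range(start, N + 1, j))  # every 13th patient thereafter

print(j, len(selected))  # 13 1000
```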

Stratified Random Sample

The population is divided into mutually exclusive strata based on factors likely to affect the outcome. Then, within each stratum, a simple or systematic random sample is chosen.

In proportional stratified sampling, the number sampled from each stratum is proportional to that stratum's share of the total population.

Three key advantages:

  • Ensures all strata are represented in the sample.
  • Can produce more precise overall estimates than a simple random sample because between-strata variation is removed.
  • Allows estimation of stratum-specific outcomes.

Example: If hospital wait times differ between males and females, stratify records by sex and randomly sample within each group.
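A proportional stratified draw can be sketched as follows (the stratum sizes — 7,000 female and 6,000 male records — are hypothetical):

```python
import random

random.seed(7)

# Hypothetical stratum sizes: hospital records split by sex
strata = {"female": 7_000, "male": 6_000}
total_n = 1_000  # desired overall sample size
N = sum(strata.values())

# Proportional allocation: each stratum contributes according to its share
# of the population, then a simple random sample is drawn within each stratum
sample = {
    sex: random.sample(range(size), k=round(total_n * size / N))
    for sex, size in strata.items()
}

print({sex: len(ids) for sex, ids in sample.items()})  # {'female': 538, 'male': 462}
```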

Cluster Sampling

A cluster is a natural grouping of study subjects with one or more common characteristics (e.g., a household is a cluster of people; a classroom is a cluster of students; a clinic is a cluster of patients).

In cluster sampling, the primary sampling unit (PSU) is the cluster itself, and it is often larger than the unit of concern. Every individual within a selected cluster is included in the sample.

Example: To estimate smoking prevalence among Grade 12 students, randomly select 10 of 47 Grade 12 classes and survey all students in those 10 classes.

Advantage: Easier when getting a list of clusters is simpler than listing all individuals. Often cheaper to visit fewer locations.

Limitation: Individuals within a cluster tend to be more alike, increasing sampling variation for a given sample size compared to SRS.

Important: A sample is only a "cluster sample" if the group is the sampling unit and the individuals within it are the unit of concern. If the group itself is the unit of concern (e.g., "does anyone in the household smoke indoors?"), it is not a cluster sample.

Multistage Sampling

Multistage sampling is similar to cluster sampling, except that after selecting primary sampling units (PSUs), a sample of secondary sampling units (individuals) is drawn within each PSU rather than surveying everyone.

Example: To study smoking among students, first randomly select 10 classes (PSUs), then randomly select 5 students from each class rather than surveying all students in every class.

To ensure all individuals have the same probability of being selected, either choose PSUs proportional to their size, or use a constant sampling proportion within each PSU.

The number of individuals per cluster (ni) can be optimized by balancing within-cluster and between-cluster variance against the costs of sampling groups versus individuals.
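The two-stage classroom example can be sketched as follows (class enrolments are hypothetical; the final comment restates the equal-probability caveat from the text):

```python
import random

random.seed(3)

# Hypothetical enrolments for the 47 Grade 12 classes in the example
classes = {c: random.randint(20, 35) for c in range(1, 48)}

# Stage 1: randomly select 10 classes (the primary sampling units)
psus = random.sample(sorted(classes), k=10)

# Stage 2: randomly select 5 students within each selected class
students = {c: random.sample(range(classes[c]), k=5) for c in psus}

# Caveat: a fixed 5 per class gives students in small classes a higher
# selection probability; choosing PSUs proportional to size (or using a
# constant within-class sampling fraction) would equalize the probabilities.
print(sum(len(s) for s in students.values()))  # 50 students sampled
```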

Targeted (Risk-Based) Sampling

Targeted sampling stratifies the source population based on characteristics associated with the probability of disease occurrence, then focuses sampling on strata where disease is most likely to be found.

Individuals are assigned point values based on their probability of having the disease of interest, and sampling proceeds until a predetermined number of points have been sampled. This is an unequal probability sampling strategy — some individuals may even have a zero probability of inclusion.

Advantage: Requires a much smaller sample to detect rare diseases when key risk characteristics can be identified.

Limitation: Key epidemiological parameters (e.g., risk ratios) may not be known for the study population and must be estimated from other evidence.

Comparison of Sampling Methods

Method        | Requires Complete List? | Key Advantage                        | Key Limitation
Simple Random | Yes                     | Simple; all standard analyses apply  | Needs complete population list
Systematic    | No (needs estimate)     | Practical; easy to implement         | Periodic bias if factor linked to interval
Stratified    | Yes (within strata)     | More precise; ensures representation | Needs to know stratum membership
Cluster       | List of clusters only   | Cheaper; no need to list individuals | Higher variance than SRS for same n
Multistage    | List of PSUs only       | Flexible; cost-effective             | Complex design; needs more subjects
Targeted      | No (risk-based)         | Efficient for rare diseases          | Needs prior knowledge of risk factors

Reflection

Think of a health research question you are interested in. Which sampling method would be most appropriate, and why? What practical constraints (cost, time, available lists) would influence your choice?


Key Takeaways

  • Probability samples give every element a known, non-zero chance of selection, enabling valid statistical inference.
  • Simple random sampling requires a complete list; systematic sampling needs only sequential access.
  • Stratified sampling improves precision by removing between-strata variation.
  • Cluster and multistage sampling are practical when listing all individuals is impractical, but they require more subjects for the same precision.
  • Targeted sampling is efficient for rare outcomes but requires prior knowledge of risk characteristics.
Knowledge Check — Section 3

1. What defines a probability sample?

A probability sample ensures every member of the population has a known, non-zero probability of being selected through a formal random process.

2. A key advantage of stratified random sampling over simple random sampling is that it:

By dividing the population into homogeneous strata and sampling within each, the between-strata variation is explicitly removed from the overall estimate, potentially improving precision.

3. In cluster sampling, why is sampling variation typically greater than in simple random sampling for the same sample size?

Individuals within a cluster (e.g., students in the same class) are more similar to each other than to individuals in other clusters, which increases sampling variation compared to drawing individuals independently.

4. Targeted (risk-based) sampling is most useful when:

Targeted sampling focuses on high-risk strata to efficiently detect rare diseases, requiring prior knowledge of risk characteristics and a smaller sample size than other methods.

✦ Complete the reflection and pass the knowledge check with 100% to continue

Section 4

Analysing Survey Data & Sample Size

⏱ Estimated reading time: 15 minutes

Learning Objectives

  • Explain how stratification, sampling weights, and clustering affect the analysis of survey data.
  • Define the design effect and the finite population correction.
  • Describe the key factors that determine sample size.
  • Apply basic sample-size formulae for estimating proportions and means.

Analysing Complex Survey Data

When data come from a complex sampling design (involving stratification, weighting, or clustering), the analysis must account for these features. Ignoring them can lead to incorrect point estimates and underestimated standard errors.

Accounting for Stratification

If the population was divided into strata before sampling, this must be reflected in the analysis. Stratification provides stratum-specific estimates and can reduce the standard error of the overall estimate if the stratifying variable is related to the outcome.

However, stratification alone does not change the overall point estimate — it primarily affects precision. The total population size in each stratum must be known to compute appropriate sampling weights.

Sampling Weights

Not all individuals in a probability sample necessarily have the same probability of selection. The sampling weight for each individual is the inverse of their overall selection probability.

The probability of selection depends on multiple stages. For example, in a household survey:

p(selection) = (n/N) × (m/M)

where n = households in sample, N = households in source population, m = individuals selected per household, and M = total people in that household.

The sampling weight = 1/p(selection). This weight reflects how many people in the source population each sampled individual "represents." Incorporating weights may change both the point estimate and the standard error.
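A quick numeric sketch of the weight calculation, using hypothetical figures (500 of 10,000 households sampled; 2 of 4 members selected in a given household):

```python
# Selection probability and sampling weight for one respondent in a household
# survey; all numbers are hypothetical.
n, N = 500, 10_000  # households sampled / households in the source population
m, M = 2, 4         # individuals selected / individuals in this household

p_selection = (n / N) * (m / M)  # 0.05 * 0.5 = 0.025
weight = 1 / p_selection         # this respondent "represents" ~40 people

print(p_selection, round(weight))  # 0.025 40
```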

Accounting for Clustering

In cluster and multistage sampling, individuals within groups are usually more alike than randomly chosen individuals. This means observations are not independent, and standard errors must be adjusted upward.

The most common approach is to identify the primary sampling unit (PSU) and adjust all standard error calculations for clustering at that level. The technique called variance linearisation is widely used for this purpose and requires a large number of PSUs to be reliable.

The Design Effect (deff)

The design effect (deff) summarizes the overall impact of the sampling plan on precision. It is the ratio of the variance from the complex sampling design to the variance that would have been obtained from a simple random sample of the same size.

Interpreting the Design Effect

A deff > 1 means the complex design produces less precise (larger variance) estimates than a simple random sample would. For example, in the Brazil diarrhea study, the deff was 4.43, meaning the variance of the incidence estimate was 4.43 times larger than what a simple random sample of the same size would have produced.

Example: Impact of Survey Design on Estimates

Type of Analysis               | Incidence Estimate | SE
Simple random sample (assumed) | 0.1462             | 0.0061
+ Stratification               | 0.1462             | 0.0059
+ Stratification + Weights     | 0.1751             | 0.0091
+ Clustering                   | 0.1462             | 0.0088
All features combined          | 0.1751             | 0.0128

Notice how incorporating all features of the sampling plan changes both the point estimate (from 14.62% to 17.51%) and dramatically increases the standard error (from 0.0061 to 0.0128). Ignoring the sampling design would give a misleadingly precise — and potentially incorrect — result.
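The design effect can be recovered from the table by squaring the standard errors (variance = SE²); using the rounded SEs shown, the ratio comes out close to the reported 4.43:

```python
# Design effect from the table above: ratio of the variance obtained under the
# full complex design to the variance under an assumed simple random sample
se_srs = 0.0061      # SE, simple random sample (assumed)
se_complex = 0.0128  # SE, all design features combined

deff = se_complex ** 2 / se_srs ** 2
print(round(deff, 2))  # 4.4 -- matches the reported 4.43 up to rounding of the SEs
```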

Finite Population Correction (FPC)

When the proportion of the population sampled is relatively large (>10%), precision improves beyond what would be expected from an "infinite" population. The finite population correction adjusts the estimated variance downward:

FPC Formula

FPC = (N − n) / (N − 1)

where N is the population size and n is the sample size. The FPC should not be applied in multistage sampling even if the number of PSUs sampled exceeds 10% of the total PSUs. It is only applicable to descriptive studies using simple or stratified random sampling.

Sample-Size Determination

Choosing the right sample size involves both statistical and non-statistical considerations. Non-statistical factors include available resources (time, money, personnel) and the nature of the sampling frame. Statistical considerations include:

Precision of the Estimate

The more precise you need your estimate to be, the larger the sample you need. If you want to know diarrhea prevalence within ±5%, you need more subjects than if ±10% is acceptable. Precision is denoted L (the "allowable error" or half the desired confidence interval width).

Expected Variation in the Data

For proportions, variance = p × q (where q = 1 − p). You need a rough estimate of the proportion to calculate the required sample size. For continuous variables like BMI, you need an estimate of the population variance (σ²). One approach: estimate the range that covers 95% of values, divide by 4 to get σ, then square it for σ².

Level of Confidence

The confidence level (typically 95%) determines how sure you want to be that the confidence interval includes the true population value. This is linked to the Z-value: for 95% confidence, Zα = 1.96. Higher confidence requires a larger sample.

Power (for Analytic Studies)

In analytical studies, you also need to specify the desired power (often 80%). Power determines the sample size needed to detect a specific effect size. For 80% power, Zβ = −0.84. Greater power requires a larger sample.

Key Sample-Size Formulae

Objective             | Formula                                         | Variables
Estimate a proportion | n = Zα² × p × q / L²                            | p = expected proportion; q = 1 − p; L = precision
Estimate a mean       | n = Zα² × σ² / L²                               | σ² = population variance; L = precision
Compare 2 proportions | n = [Zα√(2pq) − Zβ√(p1q1 + p2q2)]² / (p1 − p2)² | p = (p1 + p2)/2; q = 1 − p; n = per group
Compare 2 means       | n = 2(Zα − Zβ)² × σ² / (μ1 − μ2)²               | σ² = population variance; n = per group
FPC adjustment        | n′ = 1 / (1/n + 1/N)                            | n = initial estimate; N = population size
Clustering adjustment | n′ = n × [1 + ρ(m − 1)]                         | ρ = intra-class correlation; m = cluster size
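The estimation formulas and the two adjustments can be sketched as small Python functions (`math.ceil` rounds up, since you cannot sample a fraction of a subject; the example values are illustrative):

```python
import math

Z_ALPHA = 1.96  # Z-value for 95% confidence

def n_proportion(p, L):
    """Subjects needed to estimate a proportion p within +/- L."""
    return math.ceil(Z_ALPHA ** 2 * p * (1 - p) / L ** 2)

def n_mean(sigma2, L):
    """Subjects needed to estimate a mean within +/- L, given variance sigma2."""
    return math.ceil(Z_ALPHA ** 2 * sigma2 / L ** 2)

def fpc_adjust(n, N):
    """Finite population adjustment: n' = 1 / (1/n + 1/N)."""
    return math.ceil(1 / (1 / n + 1 / N))

def cluster_adjust(n, rho, m):
    """Inflate n for clustering: n' = n * [1 + rho * (m - 1)]."""
    return math.ceil(n * (1 + rho * (m - 1)))

# Estimating a prevalence thought to be ~15%, within +/-5%, at 95% confidence:
print(n_proportion(0.15, 0.05))  # 196

# The same estimate drawn from a population of only 1,000 people:
print(fpc_adjust(n_proportion(0.15, 0.05), 1_000))  # 164
```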

Worked Example: Comparing Two Proportions

Suppose you want to determine if rainwater cisterns reduce the monthly risk of diarrhea from 15% to 10%. With 95% confidence and 80% power:

p1 = 0.15, p2 = 0.10, p = 0.125, q = 0.875

Applying the formula yields n = 685 per group, so you would need 1,370 total individuals (685 with cisterns, 685 without).

If the outcome is clustered within households (ρ = 0.45, average household size m = 6), the clustering adjustment increases the requirement to 2,230 per group — more than triple the unadjusted estimate!
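The worked example can be checked numerically (Zβ = −0.84 enters with a minus sign, as in the formula table):

```python
import math

z_a, z_b = 1.96, -0.84  # 95% confidence; 80% power (Z_beta = -0.84)
p1, p2 = 0.15, 0.10
p = (p1 + p2) / 2       # 0.125
q = 1 - p               # 0.875

numerator = (z_a * math.sqrt(2 * p * q)
             - z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
n = math.ceil(numerator / (p1 - p2) ** 2)
print(n)  # 685 per group

# Clustering adjustment for rho = 0.45 and household size m = 6
n_adj = n * (1 + 0.45 * (6 - 1))
print(round(n_adj))  # 2226 -- the lesson reports ~2,230 after rounding
```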

Reflection

Why do you think it is important to account for clustering when determining sample size? What would happen to your study conclusions if you ignored the clustering effect?


Key Takeaways

  • Complex survey analyses must account for stratification, sampling weights, and clustering to produce correct estimates and valid standard errors.
  • The design effect (deff) quantifies how much less precise a complex design is relative to a simple random sample.
  • Sample size depends on desired precision, expected variance, confidence level, and (for analytic studies) power.
  • Clustering can dramatically increase the required sample size, especially when the intra-class correlation is high.
  • The finite population correction reduces sample size requirements when sampling a large fraction (>10%) of the population.
Knowledge Check — Section 4

1. What does a design effect (deff) of 4.43 indicate?

The design effect is the ratio of variance from the complex design to variance from a simple random sample. A deff > 1 means less precision (more variance) than SRS.

2. Sampling weights are computed as:

Sampling weights = 1/p(selection). They represent how many individuals in the source population each sampled individual "represents."

3. Which of the following increases the required sample size?

Greater precision means a smaller L (allowable error), which increases the required sample size. Higher confidence levels, greater variance, and clustering also increase sample size requirements.

✦ Complete the reflection and pass the knowledge check with 100% to continue

Section 5

Lesson Review & Final Assessment

⏱ Estimated time: 15 minutes

Lesson Summary

In this lesson, you explored the foundational concepts and methods of sampling in epidemiology. Here is a recap of what you covered:

  1. Census vs. sample: A census evaluates every individual; a sample evaluates a subset. Samples introduce sampling error but are far more practical and cost-effective.
  2. Descriptive vs. analytic studies: Descriptive studies characterize populations; analytic studies estimate associations between exposures and outcomes.
  3. Hierarchy of populations: Target population (broadest), source population (from which subjects are drawn), and study sample (those who actually participate).
  4. Validity: Internal validity concerns the source population; external validity concerns generalizability to the target population.
  5. Types of error: Type I (α) error is a false positive; Type II (β) error is a false negative. Power (1−β) is the ability to detect a real effect.
  6. Non-probability sampling: Judgement, convenience, and purposive samples lack formal random selection and are mainly used in analytic studies.
  7. Probability sampling: Simple random, systematic, stratified, cluster, multistage, and targeted methods each have specific requirements, advantages, and limitations.
  8. Survey data analysis: Stratification, sampling weights, and clustering must be accounted for. The design effect quantifies the impact of complex designs on precision.
  9. Sample-size determination: Depends on precision, variance, confidence, power, and adjustments for clustering, confounding, and finite populations.

Reflection

Imagine you are designing a study to estimate the prevalence of a waterborne disease in a rural region with scattered villages. Describe the sampling strategy you would use, including the type of sampling, how you would define your populations, and what factors would influence your sample-size calculation.


Final Knowledge Assessment

Complete the following 15-question assessment. A score of 100% is required to complete the lesson. You may retake the assessment as many times as needed.

Final Assessment — 15 Questions

1. In a census, the only source of error is:

Because a census evaluates every individual in the population, there is no sampling error — the only source of error is the measurement process itself.

2. A descriptive study aims to:

Descriptive studies (surveys) aim to characterize population attributes, answering questions like "what proportion of people have X?"

3. The source population is best described as:

The source population is the accessible population from which study subjects are drawn. All units should have a non-zero probability of being included.

4. External validity refers to:

External validity is a subjective assessment of whether results from the source population can be generalized to the broader target population.

5. A Type II (β) error occurs when:

A Type II error is a "false negative" — you accept the null hypothesis and conclude there is no effect when one actually exists.

6. A convenience sample is characterized by:

A convenience sample is chosen because it is easy to obtain — for example, selecting households close to a research centre.

7. In a simple random sample, every subject has:

In a simple random sample, every individual in the source population has an equal probability of being included.

8. The sampling interval in systematic random sampling is calculated as:

The sampling interval (j) = population size / desired sample size. You randomly select a starting point between 1 and j, then sample every j-th subject.

9. A key advantage of stratified random sampling is:

By dividing the population into homogeneous strata and sampling within each, stratified sampling explicitly removes between-strata variation from the overall estimate.

10. In cluster sampling, the primary sampling unit (PSU) is:

In cluster sampling, the PSU is the cluster itself (e.g., household, classroom, clinic), which is typically larger than the unit of concern (the individuals within).

11. Sampling weights reflect:

Sampling weights are the inverse of the selection probability. They represent how many individuals in the source population each sampled person "stands for."

12. The design effect (deff) is the ratio of:

The design effect quantifies how much less precise (or more precise) the complex design is compared to a simple random sample of equal size.

13. When computing sample size for estimating a proportion, which factor does NOT increase the required sample size?

A lower confidence level (e.g., 90% instead of 95%) reduces Zα and therefore requires fewer subjects. All other options increase the required sample size.

14. The clustering adjustment formula n′ = n[1 + ρ(m−1)] shows that the required sample size increases when:

When ρ is high (individuals within clusters are very similar) and m is large (many individuals per cluster), the correction factor [1 + ρ(m−1)] becomes large, substantially increasing the required sample size.

15. Which statement best summarizes the importance of understanding sampling methods?

Sampling methods fundamentally affect the study's validity, the precision of estimates, the cost and feasibility, and the appropriate statistical methods. Matching the method to the research context is essential for sound epidemiologic research.

✦ Complete the final reflection above before submitting