HSCI 341 — Lesson 2

Sampling

Fundamental Epidemiological Concepts and Approaches

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Distinguish between a census and a sample, and between descriptive and analytic studies
  • Describe the hierarchy of populations and the concept of a sampling frame
  • Explain types of error, including Type I and Type II errors, and the concept of statistical power
  • Compare non-probability sampling methods (judgement, convenience, purposive)
  • Describe probability sampling methods (simple random, systematic, stratified, cluster, multistage, targeted)
  • Understand the implications of complex sampling designs on data analysis
  • Compute required sample sizes for common analytic objectives

This course was developed by Kiffer G. Card, PhD, as a companion to Dohoo, I. R., Martin, S. W., & Stryhn, H. (2012). Methods in Epidemiologic Research. VER Inc.

Section 1

Introduction to Sampling

⏱ Estimated reading time: 12 minutes

Learning Objectives

  • Distinguish between a census and a sample.
  • Contrast descriptive and analytic studies.
  • Describe the hierarchy of populations (target, source, study sample).
  • Explain the concepts of internal and external validity.
  • Define a sampling frame and explain its importance.

Census vs. Sample

When we conduct research, we need data from either all individuals in a population or a subset of them. The process of obtaining these data is called measurement.

In a census, every individual in the population is evaluated. In a sample, data are collected from only a subset. Sampling is generally more convenient and less costly than conducting a full census. Interestingly, even a census can be viewed as a kind of sample — it captures the population at one point in time, making it a "sample" of the population over time.

Key Distinction

In a census, the only source of error is the measurement itself. With a sample, you contend with both measurement error and sampling error. However, a well-planned sample can provide virtually the same information as a census at a fraction of the cost.

Descriptive vs. Analytic Studies

Samples support two fundamental types of studies:

Descriptive Studies (Surveys)

A descriptive study aims to describe population attributes such as the frequency of disease or the prevalence of an exposure. Surveys answer questions like: "What proportion of people had diarrhea over a 1-month period?" or "What is the average BMI of students in Grade 12?"

The focus is on characterizing the current state of a population rather than establishing cause-and-effect relationships.

Analytic Studies

An analytic study is designed to estimate the magnitude of an association between exposures and outcomes. These studies contrast groups and seek explanations for differences between them.

Examples: "Is water source associated with the incidence of diarrhea?" or "How does time spent playing video games affect the BMI of Grade 12 students?"

Establishing an association is the first step to inferring causation, as discussed in Lesson 1.

Hierarchy of Populations

Understanding the different populations involved in a study is essential for evaluating validity. There are three key populations to consider:

  • Target population — the broadest group to which you want to generalize results (linked to external validity)
  • Source population — the population from which study subjects are drawn (defined by the sampling frame)
  • Study sample — the individuals who actually participate and provide data (linked to internal validity)

Figure 2.1 — Hierarchy of populations in epidemiologic research. The target population is the broadest; the source population is the accessible subset; the study sample consists of those who actually participate.

Target Population

The target population is the population to which you want to extrapolate your results. It is often not clearly defined and may vary depending on the perspective of the person interpreting the study. For example, researchers studying rainwater cisterns in Pernambuco State, Brazil might define the target as that state, while someone else may want to generalize the findings to all semi-arid regions of Brazil.

Source Population

The source population is the population from which study subjects are actually drawn. All units in the source population should be "listable" and have a non-zero probability of being included in the study. For example, in a diarrhea study in Brazil, the source population included families from households participating in the One Million Cisterns Project (OMCP).

Study Sample

The study sample (or study group) consists of the individuals who actually end up in the study. It is typically a subset drawn from the source population. Researchers determine the necessary sample size, draw their sample, collect data from eligible subjects, and the final study sample consists of those who agreed to participate and whose data met quality requirements.

Validity: Internal and External

Internal validity refers to whether the study results are valid for members of the source population. It indicates whether the study obtained the "correct" answer for that population. Much of epidemiology is dedicated to methods that ensure internal validity.

External validity involves a subjective assessment of whether results can be generalized to the broader target population. It is generally easier to generalize results from analytic studies (which evaluate associations) than from descriptive studies (which estimate prevalence).

The Sampling Frame

The sampling frame is the list of all sampling units in the source population. Sampling units are the basic elements that will be sampled (e.g., households, individuals). A complete list of all sampling units is required for drawing a simple random sample, though some other methods do not require such a complete listing.

Example: Brazil Diarrhea Study

In a study of water cisterns and diarrhea in Brazil, a suitable sampling frame was the list of all households eligible for the One Million Cisterns Project. Once households were selected, a separate strategy was used for selecting individuals within each household.

Key Takeaways

  • A census measures everyone; a sample measures a subset — both involve measurement, but samples also introduce sampling error.
  • Descriptive studies characterize populations; analytic studies evaluate associations between exposures and outcomes.
  • The three populations (target, source, study sample) form a hierarchy, each linked to different aspects of study validity.
  • The sampling frame is the list of all units from which the sample is drawn.
Knowledge Check — Section 1

1. What is the key difference between a census and a sample?

A census collects data from every individual in the population, while a sample collects data from a subset. Samples introduce sampling error but are more practical and cost-effective.

2. An analytic study differs from a descriptive study in that it:

Analytic studies contrast groups and seek to estimate the strength of associations between exposures and outcomes, while descriptive studies aim to characterize population attributes.

3. Internal validity refers to whether:

Internal validity concerns whether the study produced the "correct" answer for the source population. External validity concerns generalizability to the broader target population.

✦ Pass the knowledge check with 100% to continue

Section 2

Types of Error & Non-Probability Sampling

⏱ Estimated reading time: 12 minutes

Learning Objectives

  • Explain the two types of statistical error (Type I and Type II).
  • Define the null hypothesis, P-values, and statistical power.
  • Describe three non-probability sampling methods and their limitations.

Types of Error

In any study based on a sample, results are affected by the variability of the outcome, by measurement error, and by sample-to-sample variability. Inferences drawn from sample data are therefore subject to error. Within hypothesis testing in analytic studies, there are two key types of error:

Table 2.1 — Types of Error

Conclusion of Analysis       | Effect Truly Present | Effect Truly Absent
Effect present (reject null) | Correct              | Type I (α) error
No effect (accept null)      | Type II (β) error    | Correct

Type I (α) Error

A Type I error occurs when you conclude that the outcomes in the groups are different (i.e., that an association exists), when in fact they are not. In other words, you falsely reject the null hypothesis. The probability of a Type I error is denoted α.

Statistical tests are aimed at disproving the null hypothesis (that there is no difference between groups). When P ≤ 0.05, we are "reasonably sure" that any detected effect is not due to chance — but if the null hypothesis is in fact true, there remains a 5% chance of making a Type I error.

Type II (β) Error

A Type II error occurs when you conclude that there is no association between the exposure and outcome, when in fact there is. You fail to reject the null hypothesis when you should have. The probability of a Type II error is denoted β.

Reasons a study might fail to find a real effect include: the exposure truly had no effect, the study design was inappropriate, the sample size was too small (low power), or simply bad luck.

Statistical Power

Power is the probability that you will find a statistically significant difference when a real difference of a defined magnitude exists. Mathematically, power = 1 − β.

For example, if a study has 80% power, it has an 80% chance of detecting a true effect of the specified size. To increase power, you need to increase the sample size. So-called negative findings (failure to find a difference) are less commonly reported in the literature, partly because many studies lack adequate power.
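The link between power and sample size can be illustrated with a small simulation — a sketch with illustrative values, not drawn from the text: true risks of 15% vs 10%, 685 subjects per group, compared with a two-sided two-proportion z-test at α = 0.05.

```python
import math
import random

random.seed(0)

def rejects_null(p1, p2, n, z_crit=1.96):
    """Simulate one study and apply a two-sided two-proportion z-test."""
    x1 = sum(random.random() < p1 for _ in range(n))  # cases in group 1
    x2 = sum(random.random() < p2 for _ in range(n))  # cases in group 2
    pooled = (x1 + x2) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    return abs(x1 / n - x2 / n) / se > z_crit

# Power = fraction of repeated studies that detect the true difference
power = sum(rejects_null(0.15, 0.10, 685) for _ in range(1000)) / 1000
print(power)  # close to 0.80
```

Rerunning with a much smaller group size drives the detection rate well below 80% — one reason underpowered studies so often fail to find real effects.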

Non-Probability Sampling

Samples drawn without an explicit method for determining each individual's probability of selection are known as non-probability samples. Whenever there is no formal process for random selection, the sample should be considered non-probability. There are three main types:

  • Judgement sample — the researcher deliberately selects units they judge to be representative of the population; representativeness depends entirely on that judgement.
  • Convenience sample — units are selected because they are easy to access (e.g., households located close to the research centre).
  • Purposive sample — units are selected because they have a specific characteristic of interest, such as a known exposure or disease status.

Important Limitation

Non-probability samples are generally inappropriate for descriptive studies because you cannot generalize prevalence estimates to the source population without knowing each individual's probability of being included. However, non-probability methods are commonly used in analytical studies where comparing exposure groups is the priority.

Key Takeaways

  • Type I (α) error means falsely concluding there is an effect; Type II (β) error means missing a real effect.
  • Power (1 − β) is the probability of detecting a true effect; increasing sample size increases power.
  • Non-probability samples (judgement, convenience, purposive) lack a formal random selection process and are primarily used in analytic studies.
Knowledge Check — Section 2

1. A Type I (α) error occurs when you:

A Type I error is a "false positive" — you reject the null hypothesis and conclude there is an association when there really is not.

2. Statistical power is defined as:

Power = 1 − β, where β is the Type II error rate. A study with 80% power will detect a true effect 80% of the time.

3. Why are non-probability samples generally inappropriate for descriptive studies?

Without knowing the probability of selection for each individual, you cannot reliably estimate population parameters like prevalence or incidence.

✦ Pass the knowledge check with 100% to continue

Section 3

Probability Sampling Methods

⏱ Estimated reading time: 15 minutes

Learning Objectives

  • Define a probability sample and explain why random selection is essential.
  • Describe simple random, systematic random, stratified random, cluster, multistage, and targeted (risk-based) sampling methods.
  • Identify the advantages and disadvantages of each method.

What Is a Probability Sample?

A probability sample is one in which every element in the population has a known, non-zero probability of being included. This implies that a formal process of random selection has been applied to the sampling frame. The key advantage is that probability samples allow for valid statistical inferences about the source population.

Random ≠ Haphazard

Random selection uses a formal, reproducible process (e.g., computer-generated random numbers, random number tables) — it is not the same as selecting participants haphazardly or arbitrarily.

Types of Probability Sampling

Simple Random Sample

In a simple random sample, every study subject in the source population has an equal probability of being included. A complete list of the source population is required, and a formal random process is used to select individuals.

Example: To study wait times in a hospital emergency room, you need 1,000 records from 13,000 admissions over the past year. You randomly generate 1,000 numbers between 1 and 13,000 and pull those records.

Advantage: Conceptually simple; all standard statistical analyses apply directly.

Limitation: Requires a complete list of the entire source population.
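The emergency-room example can be sketched in Python; `random.sample` draws without replacement, giving each record the same chance of inclusion (record IDs 1–13,000 are an assumption for illustration):

```python
import random

random.seed(42)

# 13,000 admission records, each identified by a number from 1 to 13,000
record_ids = range(1, 13_001)

# Draw 1,000 records without replacement; every record has an equal
# probability (1,000/13,000) of being included
sample = random.sample(record_ids, k=1_000)

print(len(sample), min(sample) >= 1, max(sample) <= 13_000)  # 1000 True True
```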

Systematic Random Sample

In a systematic random sample, a complete list is not required — you only need an estimate of the total population and sequential access to individuals. The sampling interval (j) is computed as the population size divided by the desired sample size.

How it works: Randomly pick a starting point between 1 and j, then select every jth subject after that.

Example: To sample 1,000 from 13,000 emergency patients, the sampling interval is 13. Randomly pick a number between 1 and 13 for your starting patient, then select every 13th patient thereafter.

Caution: Bias may occur if the factor you are studying is related to the sampling interval (e.g., periodic patterns in admissions).
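A minimal sketch of the systematic procedure, assuming patients can be numbered sequentially as they arrive:

```python
import random

random.seed(1)

N, n = 13_000, 1_000
j = N // n                    # sampling interval: 13,000 / 1,000 = 13
start = random.randint(1, j)  # random starting point between 1 and j
selected = list(range(start, N + 1, j))  # every 13th patient thereafter

print(j, len(selected))  # 13 1000
```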

Stratified Random Sample

The population is divided into mutually exclusive strata based on factors likely to affect the outcome. Then, within each stratum, a simple or systematic random sample is chosen.

In proportional stratified sampling, the number sampled from each stratum is proportional to that stratum's share of the total population.

Three key advantages:

  • Ensures all strata are represented in the sample.
  • Can produce more precise overall estimates than a simple random sample because between-strata variation is removed.
  • Allows estimation of stratum-specific outcomes.

Example: If hospital wait times differ between males and females, stratify records by sex and randomly sample within each group.
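A proportional stratified draw can be sketched as follows (the stratum sizes — 7,000 female and 6,000 male records — are hypothetical):

```python
import random

random.seed(7)

# Hypothetical stratum sizes: hospital records split by sex
strata = {"female": 7_000, "male": 6_000}
total_n = 1_000  # desired overall sample size
N = sum(strata.values())

# Proportional allocation: each stratum contributes according to its share
# of the population, then a simple random sample is drawn within each stratum
sample = {
    sex: random.sample(range(size), k=round(total_n * size / N))
    for sex, size in strata.items()
}

print({sex: len(ids) for sex, ids in sample.items()})  # {'female': 538, 'male': 462}
```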

Cluster Sampling

A cluster is a natural grouping of study subjects with one or more common characteristics (e.g., a household is a cluster of people; a classroom is a cluster of students; a clinic is a cluster of patients).

In cluster sampling, the primary sampling unit (PSU) is the cluster itself, and it is often larger than the unit of concern. Every individual within a selected cluster is included in the sample.

Example: To estimate smoking prevalence among Grade 12 students, randomly select 10 of 47 Grade 12 classes and survey all students in those 10 classes.

Advantage: Easier when getting a list of clusters is simpler than listing all individuals. Often cheaper to visit fewer locations.

Limitation: Individuals within a cluster tend to be more alike, increasing sampling variation for a given sample size compared to SRS.

Important: A sample is only a "cluster sample" if the group is the sampling unit and the individuals within it are the unit of concern. If the group itself is the unit of concern (e.g., "does anyone in the household smoke indoors?"), it is not a cluster sample.

Multistage Sampling

Multistage sampling is similar to cluster sampling, except that after selecting primary sampling units (PSUs), a sample of secondary sampling units (individuals) is drawn within each PSU rather than surveying everyone.

Example: To study smoking among students, first randomly select 10 classes (PSUs), then randomly select 5 students from each class rather than surveying all students in every class.

To ensure all individuals have the same probability of being selected, either choose PSUs proportional to their size, or use a constant sampling proportion within each PSU.

The number of individuals per cluster (ni) can be optimized by balancing within-cluster and between-cluster variance against the costs of sampling groups versus individuals.
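The two-stage classroom example can be sketched as follows (class enrolments are hypothetical; the final comment restates the equal-probability caveat from the text):

```python
import random

random.seed(3)

# Hypothetical enrolments for the 47 Grade 12 classes in the example
classes = {c: random.randint(20, 35) for c in range(1, 48)}

# Stage 1: randomly select 10 classes (the primary sampling units)
psus = random.sample(sorted(classes), k=10)

# Stage 2: randomly select 5 students within each selected class
students = {c: random.sample(range(classes[c]), k=5) for c in psus}

# Caveat: a fixed 5 per class gives students in small classes a higher
# selection probability; choosing PSUs proportional to size (or using a
# constant within-class sampling fraction) would equalize the probabilities.
print(sum(len(s) for s in students.values()))  # 50 students sampled
```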

Targeted (Risk-Based) Sampling

Targeted sampling stratifies the source population based on characteristics associated with the probability of disease occurrence, then focuses sampling on strata where disease is most likely to be found.

Individuals are assigned point values based on their probability of having the disease of interest, and sampling proceeds until a predetermined number of points have been sampled. This is an unequal probability sampling strategy — some individuals may even have a zero probability of inclusion.

Advantage: Requires a much smaller sample to detect rare diseases when key risk characteristics can be identified.

Limitation: Key epidemiological parameters (e.g., risk ratios) may not be known for the study population and must be estimated from other evidence.

Comparison of Sampling Methods

Method        | Requires Complete List? | Key Advantage                        | Key Limitation
Simple Random | Yes                     | Simple; all standard analyses apply  | Needs complete population list
Systematic    | No (needs estimate)     | Practical; easy to implement         | Periodic bias if factor linked to interval
Stratified    | Yes (within strata)     | More precise; ensures representation | Needs to know stratum membership
Cluster       | List of clusters only   | Cheaper; no need to list individuals | Higher variance than SRS for same n
Multistage    | List of PSUs only       | Flexible; cost-effective             | Complex design; needs more subjects
Targeted      | No (risk-based)         | Efficient for rare diseases          | Needs prior knowledge of risk factors

Reflection

Think of a health research question you are interested in. Which sampling method would be most appropriate, and why? What practical constraints (cost, time, available lists) would influence your choice?


Key Takeaways

  • Probability samples give every element a known, non-zero chance of selection, enabling valid statistical inference.
  • Simple random sampling requires a complete list; systematic sampling needs only sequential access.
  • Stratified sampling improves precision by removing between-strata variation.
  • Cluster and multistage sampling are practical when listing all individuals is impractical, but they require more subjects for the same precision.
  • Targeted sampling is efficient for rare outcomes but requires prior knowledge of risk characteristics.
Knowledge Check — Section 3

1. What defines a probability sample?

A probability sample ensures every member of the population has a known, non-zero probability of being selected through a formal random process.

2. A key advantage of stratified random sampling over simple random sampling is that it:

By dividing the population into homogeneous strata and sampling within each, the between-strata variation is explicitly removed from the overall estimate, potentially improving precision.

3. In cluster sampling, why is sampling variation typically greater than in simple random sampling for the same sample size?

Individuals within a cluster (e.g., students in the same class) are more similar to each other than to individuals in other clusters, which increases sampling variation compared to drawing individuals independently.

4. Targeted (risk-based) sampling is most useful when:

Targeted sampling focuses on high-risk strata to efficiently detect rare diseases, requiring prior knowledge of risk characteristics and a smaller sample size than other methods.

✦ Complete the reflection and pass the knowledge check with 100% to continue

Section 4

Analysing Survey Data & Sample Size

⏱ Estimated reading time: 15 minutes

Learning Objectives

  • Explain how stratification, sampling weights, and clustering affect the analysis of survey data.
  • Define the design effect and the finite population correction.
  • Describe the key factors that determine sample size.
  • Apply basic sample-size formulae for estimating proportions and means.

Analysing Complex Survey Data

When data come from a complex sampling design (involving stratification, weighting, or clustering), the analysis must account for these features. Ignoring them can lead to incorrect point estimates and underestimated standard errors.

Accounting for Stratification

If the population was divided into strata before sampling, this must be reflected in the analysis. Stratification provides stratum-specific estimates and can reduce the standard error of the overall estimate if the stratifying variable is related to the outcome.

However, stratification alone does not change the overall point estimate — it primarily affects precision. The total population size in each stratum must be known to compute appropriate sampling weights.

Sampling Weights

Not all individuals in a probability sample necessarily have the same probability of selection. The sampling weight for each individual is the inverse of their overall selection probability.

The probability of selection depends on multiple stages. For example, in a household survey:

p(selection) = (n/N) × (m/M)

where n = households in sample, N = households in source population, m = individuals selected per household, and M = total people in that household.

The sampling weight = 1/p(selection). This weight reflects how many people in the source population each sampled individual "represents." Incorporating weights may change both the point estimate and the standard error.
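A quick numeric sketch of the weight calculation, using hypothetical figures (500 of 10,000 households sampled; 2 of 4 members selected in a given household):

```python
# Selection probability and sampling weight for one respondent in a household
# survey; all numbers are hypothetical.
n, N = 500, 10_000  # households sampled / households in the source population
m, M = 2, 4         # individuals selected / individuals in this household

p_selection = (n / N) * (m / M)  # 0.05 * 0.5 = 0.025
weight = 1 / p_selection         # this respondent "represents" ~40 people

print(p_selection, round(weight))  # 0.025 40
```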

Accounting for Clustering

In cluster and multistage sampling, individuals within groups are usually more alike than randomly chosen individuals. This means observations are not independent, and standard errors must be adjusted upward.

The most common approach is to identify the primary sampling unit (PSU) and adjust all standard error calculations for clustering at that level. The technique called variance linearisation is widely used for this purpose and requires a large number of PSUs to be reliable.

The Design Effect (deff)

The design effect (deff) summarizes the overall impact of the sampling plan on precision. It is the ratio of the variance from the complex sampling design to the variance that would have been obtained from a simple random sample of the same size.

Interpreting the Design Effect

A deff > 1 means the complex design produces less precise (larger variance) estimates than a simple random sample would. For example, in the Brazil diarrhea study, the deff was 4.43, meaning the variance of the incidence estimate was 4.43 times larger than what a simple random sample of the same size would have produced.

Example: Impact of Survey Design on Estimates

Type of Analysis               | Incidence Estimate | SE
Simple random sample (assumed) | 0.1462             | 0.0061
+ Stratification               | 0.1462             | 0.0059
+ Stratification + Weights     | 0.1751             | 0.0091
+ Clustering                   | 0.1462             | 0.0088
All features combined          | 0.1751             | 0.0128

Notice how incorporating all features of the sampling plan changes both the point estimate (from 14.62% to 17.51%) and dramatically increases the standard error (from 0.0061 to 0.0128). Ignoring the sampling design would give a misleadingly precise — and potentially incorrect — result.
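The design effect can be recovered from the table by squaring the standard errors (variance = SE²); using the rounded SEs shown, the ratio comes out close to the reported 4.43:

```python
# Design effect from the table above: ratio of the variance obtained under the
# full complex design to the variance under an assumed simple random sample
se_srs = 0.0061      # SE, simple random sample (assumed)
se_complex = 0.0128  # SE, all design features combined

deff = se_complex ** 2 / se_srs ** 2
print(round(deff, 2))  # 4.4 -- matches the reported 4.43 up to rounding of the SEs
```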

Finite Population Correction (FPC)

When the proportion of the population sampled is relatively large (>10%), precision improves beyond what would be expected from an "infinite" population. The finite population correction adjusts the estimated variance downward:

FPC Formula

FPC = (N − n) / (N − 1)

where N is the population size and n is the sample size. The FPC should not be applied in multistage sampling even if the number of PSUs sampled exceeds 10% of the total PSUs. It is only applicable to descriptive studies using simple or stratified random sampling.

Sample-Size Determination

Choosing the right sample size involves both statistical and non-statistical considerations. Non-statistical factors include available resources (time, money, personnel) and the nature of the sampling frame. Statistical considerations include:

Precision of the Estimate

The more precise you need your estimate to be, the larger the sample you need. If you want to know diarrhea prevalence within ±5%, you need more subjects than if ±10% is acceptable. Precision is denoted L (the "allowable error" or half the desired confidence interval width).

Expected Variation in the Data

For proportions, variance = p × q (where q = 1 − p). You need a rough estimate of the proportion to calculate the required sample size. For continuous variables like BMI, you need an estimate of the population variance (σ²). One approach: estimate the range that covers 95% of values, divide by 4 to get σ, then square it for σ².

Level of Confidence

The confidence level (typically 95%) determines how sure you want to be that the confidence interval includes the true population value. This is linked to the Z-value: for 95% confidence, Zα = 1.96. Higher confidence requires a larger sample.

Power (for Analytic Studies)

In analytical studies, you also need to specify the desired power (often 80%). Power determines the sample size needed to detect a specific effect size. For 80% power, Zβ = −0.84. Greater power requires a larger sample.

Key Sample-Size Formulae

Objective             | Formula                                         | Variables
Estimate a proportion | n = Zα² × p × q / L²                            | p = expected proportion; q = 1 − p; L = precision
Estimate a mean       | n = Zα² × σ² / L²                               | σ² = population variance; L = precision
Compare 2 proportions | n = [Zα√(2pq) − Zβ√(p1q1 + p2q2)]² / (p1 − p2)² | p = (p1 + p2)/2; q = 1 − p; n = per group
Compare 2 means       | n = 2(Zα − Zβ)² × σ² / (μ1 − μ2)²               | σ² = population variance; n = per group
FPC adjustment        | n′ = 1 / (1/n + 1/N)                            | n = initial estimate; N = population size
Clustering adjustment | n′ = n × [1 + ρ(m − 1)]                         | ρ = intra-class correlation; m = cluster size
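The estimation formulas and the two adjustments can be sketched as small Python functions (`math.ceil` rounds up, since you cannot sample a fraction of a subject; the example values are illustrative):

```python
import math

Z_ALPHA = 1.96  # Z-value for 95% confidence

def n_proportion(p, L):
    """Subjects needed to estimate a proportion p within +/- L."""
    return math.ceil(Z_ALPHA ** 2 * p * (1 - p) / L ** 2)

def n_mean(sigma2, L):
    """Subjects needed to estimate a mean within +/- L, given variance sigma2."""
    return math.ceil(Z_ALPHA ** 2 * sigma2 / L ** 2)

def fpc_adjust(n, N):
    """Finite population adjustment: n' = 1 / (1/n + 1/N)."""
    return math.ceil(1 / (1 / n + 1 / N))

def cluster_adjust(n, rho, m):
    """Inflate n for clustering: n' = n * [1 + rho * (m - 1)]."""
    return math.ceil(n * (1 + rho * (m - 1)))

# Estimating a prevalence thought to be ~15%, within +/-5%, at 95% confidence:
print(n_proportion(0.15, 0.05))  # 196

# The same estimate drawn from a population of only 1,000 people:
print(fpc_adjust(n_proportion(0.15, 0.05), 1_000))  # 164
```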

Worked Example: Comparing Two Proportions

Suppose you want to determine if rainwater cisterns reduce the monthly risk of diarrhea from 15% to 10%. With 95% confidence and 80% power:

p1 = 0.15, p2 = 0.10, p = 0.125, q = 0.875

Applying the formula yields n = 685 per group, so you would need 1,370 total individuals (685 with cisterns, 685 without).

If the outcome is clustered within households (ρ = 0.45, average household size m = 6), the clustering adjustment increases the requirement to 2,230 per group — more than triple the unadjusted estimate!
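The worked example can be checked numerically (Zβ = −0.84 enters with a minus sign, as in the formula table):

```python
import math

z_a, z_b = 1.96, -0.84  # 95% confidence; 80% power (Z_beta = -0.84)
p1, p2 = 0.15, 0.10
p = (p1 + p2) / 2       # 0.125
q = 1 - p               # 0.875

numerator = (z_a * math.sqrt(2 * p * q)
             - z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
n = math.ceil(numerator / (p1 - p2) ** 2)
print(n)  # 685 per group

# Clustering adjustment for rho = 0.45 and household size m = 6
n_adj = n * (1 + 0.45 * (6 - 1))
print(round(n_adj))  # 2226 -- the lesson reports ~2,230 after rounding
```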

Reflection

Why do you think it is important to account for clustering when determining sample size? What would happen to your study conclusions if you ignored the clustering effect?


Key Takeaways

  • Complex survey analyses must account for stratification, sampling weights, and clustering to produce correct estimates and valid standard errors.
  • The design effect (deff) quantifies how much less precise a complex design is relative to a simple random sample.
  • Sample size depends on desired precision, expected variance, confidence level, and (for analytic studies) power.
  • Clustering can dramatically increase the required sample size, especially when the intra-class correlation is high.
  • The finite population correction reduces sample size requirements when sampling a large fraction (>10%) of the population.
Knowledge Check — Section 4

1. What does a design effect (deff) of 4.43 indicate?

The design effect is the ratio of variance from the complex design to variance from a simple random sample. A deff > 1 means less precision (more variance) than SRS.

2. Sampling weights are computed as:

Sampling weights = 1/p(selection). They represent how many individuals in the source population each sampled individual "represents."

3. Which of the following increases the required sample size?

Greater precision means a smaller L (allowable error), which increases the required sample size. Higher confidence levels, greater variance, and clustering also increase sample size requirements.

✦ Complete the reflection and pass the knowledge check with 100% to continue

Section 5

Lesson Review & Final Assessment

⏱ Estimated time: 15 minutes

Lesson Summary

In this lesson, you explored the foundational concepts and methods of sampling in epidemiology. Here is a recap of what you covered:

  1. Census vs. sample: A census evaluates every individual; a sample evaluates a subset. Samples introduce sampling error but are far more practical and cost-effective.
  2. Descriptive vs. analytic studies: Descriptive studies characterize populations; analytic studies estimate associations between exposures and outcomes.
  3. Hierarchy of populations: Target population (broadest), source population (from which subjects are drawn), and study sample (those who actually participate).
  4. Validity: Internal validity concerns the source population; external validity concerns generalizability to the target population.
  5. Types of error: Type I (α) error is a false positive; Type II (β) error is a false negative. Power (1−β) is the ability to detect a real effect.
  6. Non-probability sampling: Judgement, convenience, and purposive samples lack formal random selection and are mainly used in analytic studies.
  7. Probability sampling: Simple random, systematic, stratified, cluster, multistage, and targeted methods each have specific requirements, advantages, and limitations.
  8. Survey data analysis: Stratification, sampling weights, and clustering must be accounted for. The design effect quantifies the impact of complex designs on precision.
  9. Sample-size determination: Depends on precision, variance, confidence, power, and adjustments for clustering, confounding, and finite populations.

Reflection

Imagine you are designing a study to estimate the prevalence of a waterborne disease in a rural region with scattered villages. Describe the sampling strategy you would use, including the type of sampling, how you would define your populations, and what factors would influence your sample-size calculation.


Final Knowledge Assessment

Complete the following 15-question assessment. A score of 100% is required to complete the lesson. You may retake the assessment as many times as needed.

Final Assessment — 15 Questions

1. In a census, the only source of error is:

Because a census evaluates every individual in the population, there is no sampling error — the only source of error is the measurement process itself.

2. A descriptive study aims to:

Descriptive studies (surveys) aim to characterize population attributes, answering questions like "what proportion of people have X?"

3. The source population is best described as:

The source population is the accessible population from which study subjects are drawn. All units should have a non-zero probability of being included.

4. External validity refers to:

External validity is a subjective assessment of whether results from the source population can be generalized to the broader target population.

5. A Type II (β) error occurs when:

A Type II error is a "false negative" — you accept the null hypothesis and conclude there is no effect when one actually exists.

6. A convenience sample is characterized by:

A convenience sample is chosen because it is easy to obtain — for example, selecting households close to a research centre.

7. In a simple random sample, every subject has:

In a simple random sample, every individual in the source population has an equal probability of being included.

8. The sampling interval in systematic random sampling is calculated as:

The sampling interval (j) = population size / desired sample size. You randomly select a starting point between 1 and j, then sample every j-th subject.

9. A key advantage of stratified random sampling is:

By dividing the population into homogeneous strata and sampling within each, stratified sampling explicitly removes between-strata variation from the overall estimate.

10. In cluster sampling, the primary sampling unit (PSU) is:

In cluster sampling, the PSU is the cluster itself (e.g., household, classroom, clinic), which is typically larger than the unit of concern (the individuals within).

11. Sampling weights reflect:

Sampling weights are the inverse of the selection probability. They represent how many individuals in the source population each sampled person "stands for."

12. The design effect (deff) is the ratio of:

The design effect quantifies how much less precise (or more precise) the complex design is compared to a simple random sample of equal size.

13. When computing sample size for estimating a proportion, which factor does NOT increase the required sample size?

A lower confidence level (e.g., 90% instead of 95%) reduces Zα and therefore requires fewer subjects. All other options increase the required sample size.

14. The clustering adjustment formula n′ = n[1 + ρ(m−1)] shows that the required sample size increases when:

When ρ is high (individuals within clusters are very similar) and m is large (many individuals per cluster), the correction factor [1 + ρ(m−1)] becomes large, substantially increasing the required sample size.

15. Which statement best summarizes the importance of understanding sampling methods?

Sampling methods fundamentally affect the study's validity, the precision of estimates, the cost and feasibility, and the appropriate statistical methods. Matching the method to the research context is essential for sound epidemiologic research.

✦ Complete the final reflection above before submitting