Sampling in Qualitative Research

Qualitative Research Methods & Analysis in Public Health

Learning objectives for this lesson:

Distinguish probability and nonprobability sampling and explain when each is the right tool for the job
Define saturation and contrast classical, empirical, and revised contemporary heuristics for qualitative sample size
Identify and contrast the six nonprobability sampling strategies covered in Bernard, Wutich & Ryan, Chapter 3
Distinguish theoretical sampling (Glaserian) from purposive sampling and explain why the difference matters
Recognize when key-informant sampling is the right tool and how it relates to other strategies
Defend a qualitative sample in writing: what your methods section owes the reader
Document the loneliness dataset's sampling logic and its limits on what your capstone can claim
Complete the capstone milestone: a 600-word sampling memo and a one-page sampling matrix

This course was developed by Dr. Kiffer G. Card, Faculty of Health Sciences, Simon Fraser University, based on Bernard, H. R., Wutich, A., & Ryan, G. W. (2017). Analyzing Qualitative Data: Systematic Approaches (2nd ed.). SAGE.

Section 1 of 5

Two Kinds of Samples, and the Question Each Was Built to Answer

⏱ Estimated reading time: 25 minutes

Lesson 3 · HSCI 841

A second model of sampling, equally rigorous, built on premises your quantitative training never needed.

Probability sampling

Known selection probabilities, statistical inference

Every member of the defined population has a known, non-zero probability of selection. That knowability makes population inference possible.

Sample size is a function of variance, desired precision, and (for hypothesis testing) the effect size to detect at a target power. The logic is monotone: more cases always purchase more precision.

Nonprobability sampling

Selection probabilities unknown; inferential target is the phenomenon

Nonprobability designs include purposive, quota, convenience, snowball, respondent-driven, theoretical, and key-informant sampling.

The selection logic is curatorial, not statistical. You choose cases that illuminate the phenomenon, not cases whose selection probability you can calculate.

Extensity vs. intensity

Two complementary lenses on the same phenomenon

Probability → extensity

What percentage? How does prevalence vary? How has the rate changed? Statistical inference to a defined population.

Nonprobability → intensity

What kinds exist? What are the mechanisms? How do people experience it? Analytic description of the phenomenon.

Bernard, Wutich & Ryan (2017), pp. 37–38.

The wrong reading

A defensible nonprobability sample is not a failed probability sample

The apologetic methods section assumes nonprobability sampling is a fallback. Bernard, Wutich & Ryan are clear that this framing is wrong.

The curatorial logic selects for variation in the phenomenon, not for representativeness of a population. (Bernard, Wutich & Ryan, 2017, Ch. 3)

Sample-size logic

Why the question has to be different

Carry forward

What to take into the next section

Two jobs, not two quality levels. Probability and nonprobability sampling answer different questions.
Curatorial logic. Nonprobability samples select for variation in the phenomenon, not population representativeness.
Saturation, not power. The sample-size question for nonprobability work is answered by watching the analysis stop producing new information.

Introduction and Overview

A first-year epidemiology student arriving at this course already has a clean mental model of sampling. There is a population. You define it, you draw a probability sample from it, you measure something, and the sample mean is an unbiased estimator of the population mean, with a confidence interval whose width is determined by sample size and design effect. That model is correct, it is powerful, and across earlier courses you have used it to estimate prevalence, to compare exposed and unexposed groups, and to fit regression models. The model is also, as a description of how qualitative researchers actually sample, almost entirely wrong.

Bernard, Wutich, and Ryan (2017, p. 37) open Chapter 3 by drawing the line clearly: there are two kinds of samples in social-science research, and they were built to do different jobs. Probability samples were built so you can estimate population-level magnitudes with calculable error. Nonprobability samples were built so you can characterize a phenomenon: identify its categories, understand its mechanisms, and describe how people make sense of it. These are not different ways of doing the same job badly or well. They are different jobs.

This section unpacks that distinction. We start with what probability sampling is and what its sample-size logic looks like (briefly, because you already know it). We then introduce nonprobability sampling on its own terms, not as the disappointing-cousin-of-real-sampling that some introductory texts make it out to be. The remaining sections of the lesson work through the operational details: the six nonprobability strategies, the empirics of saturation, theoretical sampling, key informants, and the methods-section discipline you will need to defend your eventual capstone sample to a public-health reader.

Learning Objectives for this section

Articulate the operational difference between probability and nonprobability sampling.
Explain why “a nonprobability sample is a bad probability sample” is the wrong framing.
Recall the sample-size logic of probability sampling in one sentence each: power, design effect, finite-population correction.
Recognize that the sample-size question for nonprobability work is fundamentally different from the one you have been trained on.

1.1 What Probability Sampling Is, in One Page

A probability sample is one in which every member of a defined population has a known, non-zero probability of being selected, and that probability is built into the design (Bernard, Wutich & Ryan, 2017, p. 38). Simple random sampling, stratified sampling, cluster sampling, multi-stage sampling, and probability-proportional-to-size designs are all probability designs. The thing that makes them probability samples is not that they involve a random-number generator; it is that the selection probabilities are knowable. That knowability is what makes statistical inference to the population possible.

In probability sampling, sample size is a function of three quantities: the variance of the thing being measured, the precision you want around the estimate (the half-width of the confidence interval), and, for hypothesis-testing applications, the effect size you want to detect with a given power. Bernard, Wutich, and Ryan do not belabour this because if you have made it to a graduate qualitative methods course you have already done sample-size calculations in an earlier course. You know that a prevalence study aiming for ±3% around a 50% prevalence at 95% confidence needs about 1,067 respondents; that a t-test for a moderate effect needs ~64 per arm at 80% power; that a cluster-randomized trial with a design effect of 1.5 needs roughly half as much again. The arithmetic is well understood and the textbooks for it sit in the room next door to this course.
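The three figures in the paragraph above can be reproduced with a few lines of arithmetic. What follows is a minimal sketch rather than a substitute for a proper power analysis: it uses the normal approximation throughout, it assumes that "a moderate effect" means a standardized difference of d = 0.5 (an assumption; the text gives no number), and the function names are illustrative only.

```python
from statistics import NormalDist


def z(p):
    """Standard-normal quantile for cumulative probability p."""
    return NormalDist().inv_cdf(p)


def n_for_proportion(p, margin, confidence=0.95):
    """Unrounded n needed to estimate a proportion p to within +/- margin."""
    z_crit = z(1 - (1 - confidence) / 2)
    return z_crit ** 2 * p * (1 - p) / margin ** 2


def n_per_arm(d, alpha=0.05, power=0.80):
    """Unrounded n per arm for a two-sided, two-group comparison of means.

    d is the standardized effect size; d = 0.5 is ASSUMED below to stand in
    for "a moderate effect". Normal approximation, not the exact t-based value.
    """
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2


def apply_design_effect(n, deff):
    """Inflate a simple-random-sample n by a design effect."""
    return n * deff


prevalence_n = n_for_proportion(p=0.5, margin=0.03)
print(round(prevalence_n))                            # about 1,067
print(round(n_per_arm(d=0.5), 1))                     # about 62.8
print(round(apply_design_effect(prevalence_n, 1.5)))  # about 1,601
```

The normal approximation gives just under 63 per arm; the familiar figure of 64 comes from the exact calculation with the t distribution, which adds about one case per arm. A design effect of 1.5 multiplies the simple-random-sample requirement by 1.5, which takes the prevalence study from about 1,067 to about 1,601 respondents.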

What matters for our purposes is the shape of the probability-sampling sample-size argument. It is a shape that says: the more cases you collect, the more precisely you can estimate a population-level magnitude. Each additional case purchases a (diminishing) amount of statistical information about the same quantity. The argument is monotone. Twenty is better than ten; two hundred is better than twenty; two thousand is better than two hundred. The only reason to stop is cost.
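The monotone, diminishing-returns shape is easy to see numerically. The sketch below assumes a proportion near 50% and a 95% confidence level (both choices are illustrative) and prints the half-width of the confidence interval at the four sample sizes named in the paragraph.

```python
import math
from statistics import NormalDist


def half_width(n, p=0.5, confidence=0.95):
    """Normal-approximation half-width of a confidence interval for a proportion."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_crit * math.sqrt(p * (1 - p) / n)


for n in (10, 20, 200, 2000):
    print(f"n = {n:>4}: +/- {100 * half_width(n):.1f} percentage points")
```

Under these assumptions the half-width falls from roughly 31 points at n = 10 to roughly 2 points at n = 2,000. Because precision scales with the square root of n, quadrupling the sample only halves the interval: every additional case helps, but each helps less than the one before.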

1.2 What Nonprobability Sampling Is, and What Question It Was Built To Answer

Key insight - The probability sample is the wrong baseline

Qualitative sampling is often defended (or attacked) by comparing it to probability sampling. This is the wrong comparison. Probability samples are designed to support claims about population frequencies; qualitative samples are designed to support claims about meaning, mechanism, or variation. Asking a qualitative sample to deliver population estimates is like asking a microscope to map a continent. Different tools, different jobs.

A nonprobability sample is one in which the selection probabilities are not knowable and inference to a defined population is not the analytic goal. Convenience sampling, purposive sampling, quota sampling, snowball sampling, respondent-driven sampling, theoretical sampling, and key-informant sampling are all nonprobability designs. They differ from each other in important ways, and this lesson is largely about how they differ, but they share that defining feature: you cannot calculate the probability that any given member of a hypothetical population was selected for your study.

The temptation, especially for analysts trained on probability sampling, is to read this as a deficiency. It is not. Nonprobability sampling was developed because some research questions are not population-magnitude questions. The questions qualitative researchers most commonly ask, such as what is loneliness, how does it show up, what configurations of loneliness exist, what are the mechanisms, and what do people do about it, are not answered by knowing what proportion of British Columbians experience loneliness. They are answered by deliberately choosing cases that illuminate the phenomenon. The selection logic is curatorial rather than statistical. You choose cases the way a museum curator chooses pieces, for what each one reveals, rather than pulling crates from the warehouse at random.

Bernard, Wutich, and Ryan put the difference this way (paraphrasing pp. 37–38): probability samples are about extensity (the breadth of a phenomenon in a population); nonprobability samples are about intensity (the depth, detail, and configuration of the phenomenon). You can imagine the same phenomenon, loneliness for instance, as having two complementary aspects that the two sampling logics give you access to. A probability sample of 5,000 Canadians would tell you what percentage report frequent loneliness, how the prevalence varies by age and income, and how the rate has changed over time. A nonprobability sample of 20 deliberately chosen interviews would tell you what the experience of loneliness is, what kinds of loneliness exist, what triggers it, how people interpret it, and what they do about it. Each design answers a question the other cannot.

The two-jobs framing in one table

Feature	Probability sampling	Nonprobability sampling
Goal	Estimate a population magnitude	Characterize a phenomenon
Selection logic	Statistical (probabilities known)	Curatorial (probabilities not knowable)
Sample-size logic	Power / precision / variance	Saturation; informational redundancy
Inferential target	The defined population	The phenomenon, transferably described
Generalization claim	Statistical (CI around a parameter)	Theoretical / analytic (categories, mechanisms)
Typical n	Hundreds to thousands	One to several dozen, occasionally more

1.3 Why “Nonprobability Is a Bad Probability Sample” Is the Wrong Reading

The single most common misreading of qualitative sampling in published health research is the assumption that the researcher tried to draw a probability sample, failed, and ended up with a convenience sample as a fallback. The published methods section often reinforces this misreading by describing the sample in apologetic terms (“owing to resource constraints we recruited a convenience sample of 18 participants”). Bernard, Wutich, and Ryan are emphatic that this framing is wrong and that letting it pass is bad for the field. A defensible nonprobability sample is not a failed probability sample. It is a sample assembled to do a job that probability sampling could not have done.

The loneliness dataset is a useful concrete case. The 20 transcripts vary deliberately across age (18 to 82), gender (women, men, and non-binary participants), life-stage, immigration status, caregiving role, and identity. The sample is not random; the configuration is engineered. The point of including P14 Kenji (60, late-life coming out after a long heterosexual marriage) is not that he is statistically representative of British Columbian men in his sixties. He almost certainly is not. The point is that his configuration, late-life sexual-identity disclosure interacting with the loss of long-standing social ties, is a kind of loneliness the literature under-describes, and his transcript gives you analytic purchase on that configuration. A randomly drawn sample of 20 BC men in their sixties would almost certainly miss him. A probability sample of 5,000 would catch him, if at all, as one respondent in 5,000 (0.02% of a frequency table) and would not produce 8,000 words of his account.

This is the curatorial logic. You are not trying to be representative of a population; you are trying to be representative of a phenomenon's variation. Bernard, Wutich, and Ryan call this maximum-variation sampling when the variation is the explicit goal (we will come back to it as a kind of purposive sampling in a later section). What you owe your reader, in such a design, is not a calculation of selection probabilities; it is a defence of the variation you chose to capture and an honest accounting of the variation you did not.

1.4 Why the Sample-Size Question Has To Be Different

The shape of the probability-sampling sample-size argument, namely that more cases purchase more precision around the same quantity, does not apply to nonprobability work. The reason is straightforward. In nonprobability sampling you are not estimating a population-level magnitude. There is no parameter that the additional case is sharpening your estimate of. The relevant question is different: at what point do additional cases stop adding new information about the phenomenon?

This is the question of saturation, and it is the operational concept that has replaced the power calculation in nonprobability work. We give it the full treatment in a later section. For now, the key idea is that nonprobability sample size is governed by informational redundancy, not by precision. When the twenty-first interview produces nothing you had not already heard in the first twenty, with no new themes, no new mechanisms, no new variations on the categories you have developed, you have hit the floor of marginal information return, and the design is telling you that you have enough cases for the analytic claims you can defensibly make.
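One simple way to watch informational redundancy arrive is to count, interview by interview, how many codes appear for the first time. The sketch below is only an illustration of that bookkeeping: the theme labels are made up, and the stopping rule (three consecutive interviews with no new theme) is an assumed placeholder, not a heuristic taken from Bernard, Wutich, and Ryan.

```python
def new_themes_per_interview(coded_interviews):
    """For each interview, in order, list the themes not seen in any earlier one."""
    seen = set()
    novelty = []
    for themes in coded_interviews:
        fresh = set(themes) - seen
        novelty.append(sorted(fresh))
        seen |= fresh
    return novelty


def saturation_point(coded_interviews, run_length=3):
    """Return the 1-based index of the last interview that added a new theme,
    once it has been followed by run_length interviews adding nothing new.
    Returns None if no such quiet run has occurred yet.
    run_length=3 is an ASSUMED illustrative threshold.
    """
    quiet = 0
    for i, fresh in enumerate(new_themes_per_interview(coded_interviews), start=1):
        quiet = 0 if fresh else quiet + 1
        if quiet == run_length:
            return i - run_length
    return None


# Hypothetical codes for six interviews (not taken from the loneliness dataset).
coded = [
    {"isolation", "loss"},
    {"loss", "stigma"},
    {"isolation", "routine"},
    {"stigma"},
    {"loss", "routine"},
    {"isolation"},
]

print(new_themes_per_interview(coded))
print(saturation_point(coded))
```

In this toy sequence the third interview is the last to contribute a new theme, and the next three add nothing, so the function reports interview 3. A count like this is evidence to weigh, not a verdict: it depends on the order of interviews, on how finely the codes are drawn, and on whether the sample has actually reached the variation you care about.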

One thing to set aside before going further

If you were trained to read “n = 20” as an underpowered study, you will need to set that reading aside for the rest of this lesson and the rest of the course. The right question for a qualitative study with 20 transcripts is not “was the power adequate?” The right questions are: what configurations were captured? What variation does the sample cover? What claims can the analyst defensibly make on that basis? Those questions are answered by the methods section, not by arithmetic. A later section of this lesson tells you what the methods section has to contain.

Reflection

Think of one qualitative study (published or hypothetical) and one quantitative study addressing the same broad topic. Briefly describe what each sampling logic gives the field that the other cannot. Try to avoid the framing of “the qualitative work is a follow-up to” or “a stepping stone toward” the quantitative work; the two should be co-equal in your answer.

Model answer: A strong response names a specific topic and then names the distinct epistemic contribution of each design. Example: “On the topic of vaccine hesitancy, the Canadian Community Health Survey can tell us what proportion of adults declined a recommended COVID-19 booster, how the rate varies by income, education, and immigration status, and how the rate has changed across waves, a population-magnitude account that no qualitative study can produce. A qualitative interview study with 25 deliberately varied hesitant adults can tell us what the specific arguments are that hesitant people give themselves and each other, how those arguments differ across kinds of hesitancy (distrust of pharma vs. distrust of government vs. religious objection vs. needle phobia presenting as ideological), and what an intervention would need to engage to be persuasive. Neither contribution is a precursor to or follow-up of the other; they are complementary epistemic outputs. The qualitative work could not have been done with a probability design because the categories of interest are themselves part of what the study is discovering; the quantitative work could not have produced configurations of argument because survey items collapse the very nuance the qualitative study is built to surface.”


Section 3 of 5

The Six Nonprobability Sampling Strategies

⏱ Estimated reading time: 35 minutes

Each with its own logic, appropriate uses, and failure modes.

Strategy 1

Quota sampling

Logic: pre-specified target cells; recruit until each is filled.

Strength: coverage of known dimensions is guaranteed by construction.

Weakness: within each cell, selection is typically convenience-based.

Use when: you need to ensure the sample spans dimensions you already know matter for your question.

Strategy 2

Purposive / judgment sampling

Logic: deliberate case selection based on theoretical or substantive criteria.

Sub-types (Patton, 2015):

Maximum-variation (span the variation)
Homogeneous (within-group patterns)
Extreme/deviant case (boundary illumination)
Critical case (if here, then anywhere)
Confirming/disconfirming (test an interpretation)

The loneliness dataset

Purposive with quota elements, leaning maximum-variation across age, gender, immigration, caregiving, life-stage, and identity.

Strategy 3

Convenience sampling

Recruit whoever is available. Defensible for piloting and training; rarely defensible for substantive analytic claims unless transparently acknowledged.

Defensible uses

Piloting an interview guide. Testing equipment. Building community rapport before formal recruitment.

Indefensible uses

A final analytic sample whose limitations go unacknowledged. The apologetic methods section in disguise.

Strategy 4

Network sampling: snowball and respondent-driven sampling

Snowball sampling

Participants refer others. Access to hidden/stigmatized populations. Seed-network dependent; not for prevalence estimation. (Goodman, 1961)

Respondent-driven sampling (RDS)

Dual incentives + limited coupons + weighting framework. Approximates population inference for hidden populations under strong network assumptions. (Heckathorn, 1997)

Strategy 5

Theoretical sampling (Glaserian)

Iterative and analysis-driven. Emerging theoretical concepts determine who or what to sample next. Sampling and analysis proceed together.

Theoretical sampling

Emergent. Criteria determined by developing analysis. Cannot be pre-specified. Requires iterative rounds. (Glaser & Strauss, 1967)

Purposive sampling

A-priori. Criteria specified before recruitment. Variation targets set in advance. The loneliness dataset is this, not theoretical.

Strategy 6 · Carry forward

Key-informant sampling

Selection by unusual knowledge or strategic position. Can replace or supplement broader sampling.

Strength: system-level and institutional perspectives no lay participant can provide.

Watch out for: the analyst adopting the informant’s framings uncritically (“going native”), and community politics around who gets named a key informant.

Next: what a qualitative methods section must say to defend the sample to a public-health reader.

Introduction and Overview

Bernard, Wutich, and Ryan organize nonprobability sampling into six strategies (Chapter 3, pp. 42–56). The strategies are not mutually exclusive, and most real studies combine two or three, but each has its own logic, its own appropriate uses, and its own characteristic failure modes. This section walks through all six in turn, with the loneliness dataset as a concrete reference and with notes on when each strategy is the right choice.

Learning Objectives for this section

Identify and contrast the six nonprobability sampling strategies in Bernard, Wutich, and Ryan.
Recognize the loneliness dataset's sampling logic as purposive with quota elements.
Distinguish snowball sampling from respondent-driven sampling and explain why RDS's inference machinery matters.
Recognize when each strategy is the right tool for the analytic question.

3.1 Quota Sampling

Quota sampling

Sets quotas for specific subgroups (e.g., 10 men and 10 women; 5 in each age bracket). Within quotas, recruitment is convenient. Useful when you need to ensure representation of categories that matter for your question. Not equivalent to probability stratified sampling.

Purposive / judgment sampling

Deliberate selection of participants who can speak to the question at hand. Maximum-variation (deliberately diverse), homogeneous (deliberately similar), critical-case (theoretically pivotal cases), and deviant-case (extreme exemplars) are common purposive strategies. The dominant strategy in qualitative health research.

Convenience sampling

Recruits whoever is reachable: undergraduates on campus, patients in clinic, friends of the researcher. Cheap, fast, and almost always biased in ways that limit transferability. Sometimes the right choice for a pilot study; rarely the right choice for a final analysis.

Network sampling (snowball, RDS)

Asks existing participants to refer others. Snowball sampling is straightforward but can be deeply biased. Respondent-driven sampling (RDS), developed by Heckathorn (1997), adds dual-incentive structure and mathematical adjustment to approximate probability sampling in hidden populations such as people who inject drugs, sex workers, and undocumented migrants. Widely used in HIV epidemiology.

Theoretical sampling (Glaserian)

The defining sampling logic of grounded theory. Sampling is iterative: emerging theoretical concepts dictate who or what to sample next, with the goal of elaborating, contrasting, or testing concepts. Sampling and analysis proceed together. Not pre-specified at the start of the study.

Quota sampling is the deliberate construction of a sample to match pre-specified target cells across one or more dimensions of variation. You decide in advance that you want, say, five women and five men; or three participants in each of four age quartiles; or equal representation of immigrants and Canadian-born. You recruit until each cell is filled, accepting whoever in that cell happens to be available.

Quota sampling is the most common nonprobability design in applied health research, and you will encounter it routinely in market research, polling, and the rapid-turnaround qualitative components of evaluation studies. Its strength is that the sample's coverage of the dimensions of interest is guaranteed by construction. Its weakness is that within each cell, selection is typically convenience-based, meaning whoever showed up first, whoever the recruiter knew, or whoever responded to the flyer, and the within-cell sample is therefore a convenience sample.
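The recruit-until-each-cell-is-filled logic, and the convenience element inside it, can be made concrete with a short sketch. The cell labels, targets, and participant IDs below are hypothetical, and the function name is illustrative.

```python
from collections import Counter


def fill_quotas(arrivals, targets):
    """Accept volunteers in the order they become available until every cell is full.

    arrivals: iterable of (participant_id, cell) pairs, in order of availability.
    targets:  dict mapping each cell to the number of participants wanted.
    """
    filled = Counter()
    recruited = []
    for participant_id, cell in arrivals:
        # Take the volunteer only if their cell still has room.
        if filled[cell] < targets.get(cell, 0):
            recruited.append((participant_id, cell))
            filled[cell] += 1
        # Stop as soon as every quota has been met.
        if all(filled[c] >= t for c, t in targets.items()):
            break
    return recruited


# Hypothetical targets and arrivals.
targets = {"women": 2, "men": 2}
arrivals = [("A", "women"), ("B", "women"), ("C", "women"),
            ("D", "men"), ("E", "men"), ("F", "men")]

recruited = fill_quotas(arrivals, targets)
print(recruited)
```

Participant C is turned away because the "women" cell is already full, and F is never reached because both cells are complete after E. Coverage of the two cells is guaranteed, but who ends up in each cell depends entirely on arrival order, which is exactly the within-cell convenience problem described above.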

The loneliness dataset uses quota elements. The recruitment notes in the dataset's Interview Guide document, which follow the logic of Bernard, Wutich, and Ryan (2017, Ch. 3), specify variation targets across four age quartiles spanning 18–80+, across gender (women, men, gender-diverse), across living arrangement, and across major life-stage transitions (recent immigration, recent loss, recent retirement, recent caregiving role, recent relationship dissolution). The sample of 20 was assembled to hit those targets. It is not pure quota sampling, because the framework was richer than a fixed-cell design (we discuss that richness just below), but it was assembled with quota-like discipline about coverage.

3.2 Purposive / Judgment Sampling

Purposive sampling (sometimes called judgment sampling or purposeful sampling) is the deliberate selection of cases on the basis of theoretical or substantive criteria. The analyst chooses cases that are expected to illuminate the phenomenon, on the grounds that those cases are richer, more variable, more strategically located, or more theoretically informative than randomly drawn ones would be. The selection criteria are explicit; the choices are defensible; the logic is curatorial.

Patton (2015; see also Palinkas et al., 2015) catalogued more than a dozen sub-types of purposive sampling, of which the most commonly invoked in health research are:

Maximum-variation sampling. Deliberately select cases that span the variation in the phenomenon. The loneliness dataset is an instance.
Homogeneous sampling. Select cases that share key features so within-group patterns become visible.
Extreme or deviant case sampling. Select unusual or boundary cases on the theory that they make analytically invisible features visible.
Critical case sampling. Select cases on the theory that if a phenomenon shows up here, it will show up anywhere; if it does not show up here, it will not show up anywhere.
Typical case sampling. Select cases that exemplify the modal pattern.
Confirming/disconfirming case sampling. Select cases specifically to test or strain an emerging interpretation.

The loneliness dataset is best characterized as purposive with quota elements, leaning toward maximum-variation. The 20 transcripts were not drawn to be statistically representative of British Columbia, and they were not assembled by simply filling demographic cells. They were chosen to capture the variation that the literature suggests matters for loneliness: variation in age (P01 Maya, 22; P11 Helen, 78; P20 Frank, 82), in gender and gender-identity history (P12 Tyler, non-binary; P14 Kenji, late-life coming out), in immigration trajectory (P15 Amira, recent refugee; P18 Chen, decades-long bicultural negotiation), in caregiving role (P05 Linda, daughter-of-aging-parent; P07 Diana, partner-of-someone-with-dementia), in life-stage transition (P03 Sarah, post-romantic-dissolution; P16 Elena, post-job-loss; P19 Rose, late-life widowhood), and in identity configurations the literature under-describes (P14 Kenji's late-life coming out; P12 Tyler's non-binary identity in a small town). The dataset is engineered, not random; that engineering is its strength as a teaching dataset.

3.3 Convenience Sampling

Convenience sampling is the recruitment of whoever is available, accessible, or willing, with no purposive criterion other than availability. Examples include the “intercept” survey at a clinic entrance, the email blast to a departmental listserv, the friend-of-a-friend pilot interview, and the all-too-common “we recruited 18 first-year nursing students from a course taught by the second author.”

Bernard, Wutich, and Ryan (2017, pp. 48–49) are careful, not dismissive, about convenience sampling. There are defensible uses of convenience sampling: piloting an interview guide, testing the recording equipment, training a new interviewer, or building rapport in a community before formal recruitment begins. There are also indefensible uses: a published study that recruited only the readily available and then claims findings that generalize beyond them. The line between the two is what the methods section says. If a convenience sample is reported transparently as such, with limitations honestly acknowledged, it is a defensible piece of empirical work. If it is dressed up as something more representative than it is, it is not.

Most published qualitative health studies are, in fact, convenience samples whether they say so or not. A more honest field would acknowledge this and would more carefully describe what claims a convenience sample does and does not support.

3.4 Network Sampling: Snowball and Respondent-Driven Sampling

Network sampling uses the social networks of initial participants to recruit additional ones. The two main forms are snowball sampling (Goodman, 1961) and respondent-driven sampling (RDS), and the distinction between them is methodologically important.

Snowball sampling. The classical form: initial participants (“seeds”) are asked to refer others, who are asked to refer others, and so on. Snowball sampling is the standard tool for reaching populations that are hidden, stigmatized, hard to identify from sampling frames, or organized around relationships rather than addresses. Examples: people who use drugs, sex workers, undocumented migrants, LGBTQ+ adults in regions where outness is risky, members of religious or political minorities.

The strength of snowball sampling is access. The weakness is that the sample is shaped by the social networks of the initial seeds, and those networks are unlikely to be representative of the target population. A snowball sample of people who use drugs starting from one community-health-centre client will look very different from one starting from a university-affiliated harm-reduction researcher. Bernard, Wutich, and Ryan are clear that snowball samples are inappropriate for prevalence estimation and have to be reported as the network samples they are.

Respondent-driven sampling (RDS). Developed by Douglas Heckathorn (1997; Heckathorn 2002 for the inference machinery), RDS is a sophisticated extension of snowball sampling that adds dual incentives (participants are paid for their own participation and for the participation of recruits they bring in), limited coupons (each participant can recruit only a fixed number of others, typically three), and a tracking-and-weighting framework that allows population-level inference under specific assumptions about network structure.

The reason RDS matters, and the reason it gets discussed in a graduate qualitative methods course even though much RDS work is quantitative, is that it represents the most serious attempt in the methods literature to turn network sampling into something with calculable inferential properties. Under Heckathorn's assumptions (long recruitment chains, accurate self-reported network sizes, random recruitment within social networks), RDS estimates can be weighted to approximate population-level prevalence and association estimates with calculable error bounds. The assumptions are strong and have been criticized in subsequent methodological work (Goel and Salganik, 2010; Gile and Handcock, 2010), but RDS remains the most defensible network-sampling design for many populations of public-health interest.
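To see why the self-reported network sizes matter, consider the simplest form the weighting idea can take. In a referral chain, people with many contacts are more likely to be recruited, so each respondent can be weighted by the inverse of their reported network size. The sketch below illustrates that inverse-degree idea, which the RDS literature associates with the estimator usually labelled RDS-II (Volz-Heckathorn). It is not the estimator from Heckathorn (1997) or Heckathorn (2002), it ignores variance estimation entirely, and the data are invented.

```python
def inverse_degree_estimate(respondents):
    """Weighted proportion with a trait, weighting each person by 1 / network size.

    respondents: list of (has_trait, network_size) pairs, where network_size is
    the respondent's self-reported number of contacts in the target population.
    """
    if any(size <= 0 for _, size in respondents):
        raise ValueError("network sizes must be positive")
    weights = [1 / size for _, size in respondents]
    trait_weight = sum(w for (has_trait, _), w in zip(respondents, weights) if has_trait)
    return trait_weight / sum(weights)


# Hypothetical respondents: those with the trait report larger networks.
respondents = [(True, 20), (True, 10), (False, 5), (False, 5)]

naive = sum(has_trait for has_trait, _ in respondents) / len(respondents)
weighted = inverse_degree_estimate(respondents)
print(f"naive: {naive:.2f}, inverse-degree weighted: {weighted:.2f}")
```

In this toy example the unweighted proportion is 0.50, while the weighted estimate is about 0.27, because the well-connected respondents who carry the trait are over-represented in the chain. The adjustment is only as good as the assumptions behind it: if network sizes are misreported or recruitment within networks is not random, the weights correct for the wrong thing.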

The loneliness dataset did not use snowball or respondent-driven sampling. The dataset's purposive-with-quota design was assembled directly through recruiters, not through participant-driven referral. This is important to record in your capstone's methods section: snowball and RDS designs come with specific analytic obligations, and a study that did not use them cannot claim the inferential properties they enable, but also has none of the network-dependence biases they introduce.

3.5 Theoretical Sampling (Glaserian)

CAPSTONE Plan it - Your sampling matrix

For your capstone, draft a one-page sampling plan:

Sampling strategy: Which of the six strategies in this section are you using, and why?
Sampling matrix: What categories matter for your question (age, role, geography, condition)? How many participants per cell?
Recruitment: Through what channels will you reach each cell?
Stopping rule: What is your operationalization of saturation, and what would tell you when you have reached it?

A defensible qualitative sample needs the same explicit planning a survey sample does. The strategy is different; the discipline is the same.
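If it helps to see the sampling matrix as a concrete object, the sketch below crosses two dimensions and assigns a target to every cell. The dimensions, levels, and per-cell target are hypothetical placeholders; substitute the categories that matter for your own question.

```python
from itertools import product


def build_matrix(dimensions, per_cell):
    """Return {cell: target} for every combination of dimension levels.

    dimensions: dict mapping a dimension name to its list of levels.
    per_cell:   number of participants planned for each cell.
    """
    return {cell: per_cell for cell in product(*dimensions.values())}


# Hypothetical dimensions and target.
dimensions = {
    "age_band": ["18-39", "40-64", "65+"],
    "role": ["caregiver", "non-caregiver"],
}
matrix = build_matrix(dimensions, per_cell=3)

for cell, target in matrix.items():
    print(" x ".join(cell), "->", target)
print("planned total:", sum(matrix.values()))
```

Three age bands crossed with two roles gives six cells and a planned total of 18. Every added dimension multiplies the number of cells, so a fully crossed matrix becomes unrecruitable quickly; in practice you cross the two or three dimensions that matter most and track the others as coverage targets.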

Theoretical sampling is the iterative, emergent sampling strategy developed by Barney Glaser and Anselm Strauss (1967) as part of grounded theory. The procedure: you collect and analyse some data, develop a preliminary theory, identify what additional data would test or extend the theory, sample those additional data deliberately, re-analyse, and continue until theoretical saturation is reached. The sampling and the analysis are not separated: each round of sampling is shaped by what the previous round revealed.

Theoretical sampling differs from purposive sampling in a critical way that is often missed. Purposive sampling is a-priori: you decide before recruitment what kinds of variation you want, and you recruit to those targets. Theoretical sampling is emergent: you cannot say in advance what cases you will want, because the criteria are determined by the developing analysis. The first three participants in a theoretical sampling study might be selected for convenience; the next three might be selected because the analysis of the first three revealed a configuration that needs further exploration; the next three might be selected to test whether a hypothesized boundary condition holds.

The loneliness dataset is not a theoretically sampled dataset. The 20 transcripts were assembled in a single recruitment phase, with the variation targets specified in advance. This matters because some qualitative-methods textbooks use “theoretical sampling” and “purposive sampling” almost interchangeably; Bernard, Wutich, and Ryan are clear that they are different things, and a methods section that calls a purposive sample a theoretically sampled one is making a category mistake.

Glaser vs. Strauss on theoretical sampling

Within grounded theory, the original co-authors split methodologically in the 1980s. Glaser kept theoretical sampling tightly tied to emergence from the data; Strauss (with Corbin) developed a more structured version that allowed more a-priori coding. Charmaz's constructivist grounded theory (which you will meet later in the course) sits closer to the Glaserian original. For the purposes of this course, “theoretical sampling” refers to the Glaserian original: iterative, emergent, analysis-driven.

3.6 Key-Informant Sampling

Key informants are individuals selected because they are unusually knowledgeable, articulate, or strategically located with respect to the phenomenon of interest. In a study of clinic operations, the key informants might be the head nurse, the intake coordinator, and the medical director, people whose roles give them an overview no individual patient could provide. In a study of a religious community, the key informants might be elders, clergy, or longtime members. The selection criterion is access to information, not representativeness.

Bernard, a foundational figure for this concept who draws on cultural-anthropology fieldwork, is careful to distinguish two uses (Bernard, Wutich & Ryan, 2017, pp. 54–56). Key informants can replace broader sampling when the research question is about a system or institution and the key informants are the people who know it: a study of how a province's overdose-response protocol gets implemented might rely largely on interviews with the small number of people who actually implement it. Key informants can also supplement broader sampling when the analytic question requires both lay perspectives and expert ones: a study of loneliness might interview 20 lay participants (the loneliness dataset) and supplement with two or three key-informant interviews with clinicians, community-organisation directors, or older-adult-services coordinators who see loneliness across many people.

Key-informant interviews are typically deeper, longer, and more iterative than lay interviews. They are also more vulnerable to the informant's own framings being adopted by the analyst (the “going native” problem familiar from anthropology), and to the political dynamics around who gets named a key informant in a community.

3.7 Summary Table

Strategy	Logic	Best for	Watch out for
Quota	Pre-specified target cells across dimensions of variation	Guaranteed coverage of known dimensions	Within-cell convenience selection
Purposive	Theoretically/substantively driven selection	Maximum-variation, extreme-case, critical-case designs	Criteria must be explicit and defensible
Convenience	Whoever is available	Piloting, training, equipment testing	Indefensible for substantive claims unless transparently acknowledged
Snowball	Participant-driven referral chains	Reaching hidden or stigmatized populations	Seed-network dependence; not for prevalence
Respondent-driven (RDS)	Dual-incentive, coupon-tracked snowball with inference machinery	Population estimates for hidden populations	Assumption-heavy; demanding to execute well
Theoretical	Iterative, emergent, analysis-driven (Glaserian)	Grounded-theory studies; theory development	Distinct from purposive; do not conflate
Key informant	Selection by unusual knowledge or strategic position	System/institution studies; expert supplement	“Going native”; community politics

Reflection

Imagine you are designing a qualitative study of loneliness among recently arrived refugees in British Columbia. Which two or three of the six strategies would you combine, and why? Be specific: what does each strategy contribute that the others cannot?

Model answer: A defensible answer combines at least two strategies and names what each contributes. Example: “I would combine purposive (maximum-variation) sampling with key-informant interviews, and probably some judicious use of snowball sampling to reach within-community networks. Purposive sampling lets me deliberately capture variation across language groups, settlement trajectories, family configuration, and time-since-arrival, configurations the literature suggests matter for the refugee loneliness experience. Key-informant interviews with settlement-agency staff, ESL instructors, and cultural-broker community leaders would provide system-level context I could not get from any individual participant. Snowball recruitment within the refugee community itself would help me reach participants who are wary of formal recruitment by university researchers and would be more responsive to invitations from trusted community members. I would not use RDS, since it requires longer recruitment chains than a graduate-scale study can sustain, and I would not use pure convenience sampling because the population is heterogeneous and the configurations I most need are not the easily available ones.”

Section 4 of 5

Defending a Qualitative Sample, and the Week 3 Capstone Milestone

⏱ Estimated reading time: 30 minutes

Element 1 of 5

The sampling strategy, named and justified

Not just “a purposive sample was recruited.” Name the sub-type. Justify it. Name what alternative strategies would have given or cost.

The loneliness dataset, characterized: purposive sampling with quota elements, organized around a maximum-variation logic across age, gender, life-stage, immigration status, caregiving role, and identity.

Elements 2, 3, and 4

Recruitment, variation captured, variation NOT captured

(2) Recruitment procedure

How were participants found? What was the screening process? Who was screened out and why?

(3) Variation captured

A sampling matrix: one row per participant, one column per dimension. Lets the reader see what the study covers.

(4) Variation NOT captured

Explicitly name the configurations outside the sample. The most-skipped element; the most important for constraining claims.

Element 5

Sample-size logic and transferability

Defensible sample-size logic for a nonprobability study draws on:

Saturation evidence: which type, at what interview count.
Empirical anchors: Guest et al. (2006); Hennink & Kaiser (2022).
Information-power framework (Malterud, Siersma & Guassora, 2016).

Transferability (Bernard, Wutich & Ryan): the reader’s ability to judge whether the patterns identified are likely to hold in settings they care about. Your job is to describe the sample transparently enough that the reader can make that judgment.

Hands-on in R

Document the sample for your appendix

Age histogram

Distribution of the 20 participants across the 18–82 range.

Gender × life-stage chart

A two-way view of the sample's coverage.

20-row matrix

One row per participant: age, gender, life stage, immigration, caregiving, identity notes.

Built from a hand-made participant_attributes.csv with the tidyverse. This becomes the sampling figure your capstone cites.

Week 3 capstone milestone · Carry forward

Two deliverables

600-word sampling memo

Strategy named & justified. Recruitment procedure. Variation captured. Variation NOT captured. Sample-size logic citing Guest et al. (2006) and Hennink & Kaiser (2022).

One-page sampling matrix

One row per participant. Columns: age, gender, life stage, immigration status, caregiving role, identity notes. The figure for your capstone appendix.

Introduction and Overview

The first three sections gave you the conceptual machinery: probability vs. nonprobability, saturation, and the six strategies. This section turns operational. We work through what the methods section of a qualitative health paper actually has to say about sampling, what Bernard, Wutich, and Ryan call “defending the sample,” and we use that template to set up the capstone milestone, in which you document the loneliness dataset's sampling logic in your own writing.

Learning Objectives for this section

Identify the five things a defensible qualitative methods section owes a reader about sampling.
Recognize the loneliness dataset as a worked example of each element.
Document the sampling structure of a small dataset in R for an appendix figure.
Produce the capstone deliverable: a 600-word sampling memo and a one-page sampling matrix.

4.1 What the Methods Section Owes the Reader

A qualitative methods section that handles sampling well covers five elements. Each element has a corresponding sub-claim, and a paper that omits any one of them leaves the reader unable to evaluate the design.

(1) The sampling strategy, named and justified. Which of the six (or combination) did you use? On what theoretical or substantive grounds? It is not enough to say “a purposive sample was recruited.” You have to say which kind of purposive sampling (maximum-variation, extreme-case, critical-case), why that kind, and what the alternative strategies would have given or cost you. The loneliness dataset would be described as purposive sampling with quota elements, organized around a maximum-variation logic across age, gender, life-stage, immigration status, caregiving role, and identity.

(2) The recruitment procedure. How did you find people? Where did you advertise? What was the screening process? Who was screened out and why? For the loneliness dataset (synthetic though it is, for instructional purposes), the recruitment would be described as conducted through community-based recruiters working with settlement agencies, retirement communities, post-secondary student-health offices, and 2SLGBTQ+ community organisations, with screening for adult age, current BC residence, English-language interview capacity, and self-identified loneliness in the past 12 months.

(3) The variation captured. What configurations does the sample actually cover? The cleanest way to report this is a sampling matrix, a table that lists each participant and the values of each variation dimension. A reader looking at the matrix can immediately see what variation the study captured. The capstone deliverable asks you to produce exactly such a matrix.

(4) The variation NOT captured. What configurations are explicitly outside the sample? This is the discipline that published qualitative health papers most reliably skip, and it is the one that most distinguishes a defensible sample from one dressed up to look more comprehensive than it is. For the loneliness dataset, the variation not captured includes (among others): people under 18, BC residents not interviewable in English, adults living outside BC, currently institutionalised adults (carceral, psychiatric, long-term hospital), unhoused adults, and adults experiencing acute crisis at the time of approach.

(5) The sample-size logic. Why this number of cases? On what evidence is the size defensible? The defensible answers, depending on design, are some combination of: code or meaning saturation reached at case n; cultural-consensus logic (Romney/Weller/Batchelder) for a culturally coherent group; the Guest et al. (2006) and Hennink & Kaiser (2022) empirical literature for an interview study with a reasonably focused question; the information-power framework of Malterud, Siersma, & Guassora (2016); or analytic and resource constraints honestly named.
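Elements (3) and (4) lend themselves to a quick tabular check. The sketch below assumes a hypothetical six-participant excerpt (the identifiers and values are invented for illustration, not drawn from the loneliness dataset) and uses `count()` and `complete()` from the tidyverse to list every gender-by-life-stage cell, including those with no participants.

```r
library(tidyverse)

# Hypothetical six-participant excerpt; values are illustrative only.
attrs_demo <- tibble(
  pid        = sprintf("P%02d", 1:6),
  gender     = c("woman", "man", "woman", "non-binary", "man", "woman"),
  life_stage = c("student", "student", "mid-career",
                 "mid-career", "retired", "retired")
)

# Count participants per gender x life-stage cell, keeping empty cells as zeros.
coverage <- attrs_demo |>
  count(gender, life_stage) |>
  complete(gender, life_stage, fill = list(n = 0L))

# Cells with no participants: combinations absent from the sample.
empty_cells <- coverage |>
  filter(n == 0)

coverage
empty_cells
```

Empty cells only reveal gaps among the values that appear somewhere in the sample. Groups screened out entirely (for example, by the language or residence criteria) never show up in the table and still have to be named in prose.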

4.2 What the Methods Section Does NOT Have To Do

A defensible qualitative methods section does not have to defend the sample on the grounds of probability-sample logic. You do not have to apologize for not being a population sample. You do not have to gesture at “future quantitative work with a larger n.” You do not have to invoke the language of generalizability when the inferential target was never the population in the first place.

What you do have to do is name the inferential target you are claiming. Bernard, Wutich, and Ryan call this transferability: the ability of the reader to assess whether the patterns you identified are likely to hold in other settings or populations the reader is interested in. Transferability is not the same as statistical generalizability, and the burden of judging it is partly on the reader. Your job is to give the reader enough information, through the methods section, the sampling matrix, and the variation-captured and variation-not-captured statements, that the reader can make a defensible transferability judgment.

4.3 Documenting a Sample in R

Most qualitative analyses spend more time on text than on R. Sampling is one of the exceptions: documenting the sample's structure is the kind of small-dataset reporting that R does well, and a one-page sampling matrix figure goes into your capstone appendix with very little work. The block below sketches the procedure for the loneliness dataset.

R: Document the loneliness dataset's sample structure

Assuming you have created a small participant_attributes.csv file with one row per participant and columns for pid, age, gender, life_stage, immigration_status, caregiving_role, and identity_notes, this block produces an age histogram, a gender-by-life-stage bar chart, and a printed sampling matrix you can adapt for your capstone appendix.

library(tidyverse)

# Read the per-participant attribute file you built by hand from the transcripts
attrs <- read_csv("../term projects/HSCI_841/participant_attributes.csv")

glimpse(attrs)
# Should show 20 rows and the columns: pid, name, age, gender, life_stage,
# immigration_status, caregiving_role, identity_notes

# Quick age distribution
ggplot(attrs, aes(x = age)) +
  geom_histogram(binwidth = 10, fill = "#0B7B6B", colour = "white") +
  labs(title = "Age distribution of loneliness sample (n = 20)",
       x = "Age (years)", y = "Count") +
  theme_minimal()

# Variation across gender x life-stage (a 2-way summary of the sample's coverage)
attrs |>
  count(gender, life_stage) |>
  ggplot(aes(x = life_stage, y = n, fill = gender)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(title = "Sample coverage: gender x life-stage",
       x = NULL, y = "Number of participants") +
  theme_minimal()

# Sample matrix table (the kind that goes in your appendix)
attrs |>
  select(pid, age, gender, life_stage, immigration_status,
         caregiving_role, identity_notes) |>
  arrange(age) |>
  print(n = 20)

What success looks like: an age histogram, a gender-by-life-stage bar chart, and a 20-row sampling matrix printed to the console. Save the bar chart as a PNG; export the matrix as a table for your appendix. This is the sampling figure your capstone will cite.
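Saving the chart and exporting the matrix might look like the sketch below. It uses a hypothetical four-row stand-in for the attribute table so that it runs on its own, and it writes to a temporary folder. The object names, file names, and output location are assumptions to replace with your own.

```r
library(tidyverse)

# Hypothetical stand-in for the participant attribute table; values are illustrative.
attrs_demo <- tibble(
  pid        = c("P01", "P02", "P03", "P04"),
  age        = c(19, 34, 58, 77),
  gender     = c("woman", "man", "woman", "man"),
  life_stage = c("student", "mid-career", "mid-career", "retired")
)

# Store the bar chart in an object so it can be saved explicitly.
coverage_plot <- attrs_demo |>
  count(gender, life_stage) |>
  ggplot(aes(x = life_stage, y = n, fill = gender)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(x = NULL, y = "Number of participants") +
  theme_minimal()

# Assumed output location; replace tempdir() with your project folder.
out_dir  <- tempdir()
png_path <- file.path(out_dir, "sample_coverage.png")
csv_path <- file.path(out_dir, "sampling_matrix.csv")

# Save the chart as a PNG (width and height are in inches by default).
ggsave(png_path, plot = coverage_plot, width = 6, height = 4, dpi = 300)

# Export the matrix, sorted by age, as a CSV you can format for the appendix.
attrs_demo |>
  arrange(age) |>
  write_csv(csv_path)
```

Passing the plot object to `ggsave()` explicitly avoids relying on whichever plot happened to be drawn last.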

4.4 The Capstone Milestone

The capstone milestone integrates everything in this lesson. The deliverables are two: a 600-word sampling memo and a one-page sampling matrix. Together they are the first piece of the eventual methods section of your capstone paper.

Reflection

Of the five elements a defensible qualitative methods section owes the reader on sampling (strategy named & justified; recruitment procedure; variation captured; variation NOT captured; sample-size logic), which is the one you are most worried about getting right for your own capstone, and why? Be specific about which loneliness-dataset feature makes this element hardest.

Model answer: The most common honest answer is variation NOT captured, because most students are trained to defend what they did rather than to name what they did not do. The loneliness dataset's particular challenge is that the variation it captures is rich and visible (age, gender, immigration, caregiving, identity), which makes it tempting to over-claim. The not-captured side, including under-18 youth, non-English speakers, unhoused adults, currently institutionalised adults, and adults in acute crisis at the time of approach, is what the methods section has to articulate in a way that constrains the eventual paper's claims. Sample-size logic is the second-most-common answer because n = 20 feels small relative to the population-level instinct of a quantitative reader; the cure is to cite Guest et al. (2006) and Hennink & Kaiser (2022) and to name the saturation flavour you are claiming. A strong reflection picks one element, names what about the dataset makes it hard, and proposes a concrete practice that will get it right.

Reference

Glossary: Sampling Concepts, Strategies & Key People

This glossary collects the key concepts, sampling strategies, and people introduced in this lesson. Use it as a reference while you work through the material, or as a review before the final assessment.

Core Concepts

Probability Sample A sample in which every member of a defined population has a known, non-zero probability of being selected. The knowability of selection probabilities is what makes statistical inference to the population possible.

Nonprobability Sample A sample in which selection probabilities are not knowable and inference to a defined population is not the analytic goal. The selection logic is curatorial; the inferential target is the phenomenon's variation, not the population's distribution.

Extensity vs. Intensity Bernard, Wutich, and Ryan's framing of the two-jobs distinction. Probability sampling is about extensity (the breadth of a phenomenon in a population); nonprobability sampling is about intensity (the depth, detail, and configuration of the phenomenon).

Saturation The point at which collecting additional cases stops producing new information relevant to the analytic question. The operational replacement, in nonprobability sampling, for the statistical power calculation used in probability sampling.

Theoretical Saturation Glaser & Strauss's original usage: the point at which additional cases stop producing new categories, properties, or relationships in the developing theory. Tightly tied to theoretical sampling. The most demanding kind of saturation.

Code Saturation The point at which additional cases stop producing new codes in the codebook. Tends to arrive relatively quickly, as the major themes appear in the first several transcripts. What most empirical saturation studies measure.

Meaning Saturation The point at which additional cases stop deepening the analyst's understanding of the codes already identified. Harder and slower than code saturation; requires more interviews (typically 16–24 in Hennink, Kaiser & Marconi 2017).

Transferability The qualitative analogue of statistical generalizability: the reader's ability to assess whether the patterns identified are likely to hold in other settings or populations, given a transparent description of the sample. The analyst's job is to enable a defensible transferability judgment, not to make it for the reader.

Sampling Matrix A table listing each participant and the values of each variation dimension (age, gender, life-stage, etc.). A reader looking at the matrix can immediately see what variation the study captured. The capstone deliverable includes a one-page matrix for the loneliness dataset.

Sampling Strategies

Quota Sampling Deliberate construction of a sample to match pre-specified target cells across dimensions of variation. Strength: guaranteed coverage of the named dimensions. Weakness: within-cell selection is typically convenience-based.

Purposive (Judgment) Sampling Deliberate selection of cases on the basis of theoretical or substantive criteria. Patton catalogued more than a dozen sub-types: maximum-variation, homogeneous, extreme-case, critical-case, typical-case, confirming/disconfirming, and others.

Maximum-Variation Sampling A sub-type of purposive sampling in which cases are deliberately selected to span the variation in the phenomenon. The loneliness dataset's dominant logic.

Convenience Sampling Recruitment of whoever is available, accessible, or willing. Defensible uses: piloting, training, equipment testing, community rapport-building. Indefensible uses: substantive published studies that disguise a convenience sample as something more representative.

Snowball Sampling Network-based recruitment in which initial participants (seeds) refer additional participants, who in turn refer others. The standard tool for reaching hidden, stigmatized, or relationship-organized populations. Not appropriate for prevalence estimation.

Respondent-Driven Sampling (RDS) An extension of snowball sampling (Heckathorn 1997) that adds dual incentives, limited recruitment coupons, and a tracking-and-weighting framework. Under specific network-structure assumptions, RDS estimates can be weighted to approximate population-level inference.

Theoretical Sampling The iterative, emergent sampling strategy of grounded theory (Glaser & Strauss 1967). Each round of sampling is shaped by what the previous round revealed in analysis. Distinct from purposive sampling, which is a-priori.

Key-Informant Sampling Selection of individuals because they are unusually knowledgeable, articulate, or strategically located with respect to the phenomenon. Can replace broader sampling (for system or institution studies) or supplement it (for combining lay and expert perspectives).

Key People

H. Russell Bernard, Amber Wutich, Gery W. Ryan Authors of Analyzing Qualitative Data: Systematic Approaches (2nd ed., 2017). Chapter 3, on sampling, is the structural basis for this lesson.

Barney G. Glaser & Anselm L. Strauss Co-authors of The Discovery of Grounded Theory (1967), the foundational text for grounded theory and the original source of theoretical sampling and theoretical saturation. Glaser and Strauss split methodologically in the 1980s; for this course, “theoretical sampling” refers to the Glaserian original.

A. Kimball Romney, Susan C. Weller & William H. Batchelder Authors of the cultural-consensus theory paper (1986) that established the mathematical 4-to-6 rule: 4–6 knowledgeable informants suffice to recover a shared cultural model for a culturally coherent group. The deepest theoretical anchor for the small-n defensibility of qualitative sampling.

Greg Guest, Arwen Bunce & Laura Johnson Authors of the 2006 Field Methods empirical-saturation study that established the 12-interview heuristic (92% of codes identified after the first 12 interviews in a corpus of 60 women's-health interviews). The most cited empirical anchor for qualitative sample size.

Monique Hennink & Bonnie Kaiser Authors of the 2022 systematic review (Social Science & Medicine) of empirical tests of saturation across 23 studies, finding typical saturation between 9 and 17 interviews depending on study design. With Marconi (2017) they also established the code-saturation vs. meaning-saturation distinction.

Douglas D. Heckathorn Sociologist who developed respondent-driven sampling (RDS) at Cornell in the late 1990s (Heckathorn 1997, 2002). RDS's dual-incentive, coupon-tracked, weighting-based framework is the most serious attempt in the methods literature to turn network sampling into a design with calculable inferential properties.

Michael Quinn Patton Evaluation methodologist whose Qualitative Research & Evaluation Methods (4th ed., 2015) catalogued more than a dozen sub-types of purposive sampling (maximum-variation, homogeneous, extreme-case, critical-case, typical-case, confirming/disconfirming, and others) used in applied health and evaluation research.
