HSCI 841 — Lesson 3

Sampling in Qualitative Research

Qualitative Research Methods & Analysis in Public Health

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Distinguish probability and nonprobability sampling and explain when each is the right tool for the job
  • Define saturation and contrast classical, empirical, and revised contemporary heuristics for qualitative sample size
  • Identify and contrast the six nonprobability sampling strategies covered in Bernard, Wutich & Ryan, Chapter 3
  • Distinguish theoretical sampling (Glaserian) from purposive sampling and explain why the difference matters
  • Recognize when key-informant sampling is the right tool and how it relates to other strategies
  • Defend a qualitative sample in writing — what your methods section owes the reader
  • Document the HSCI 841 loneliness dataset's sampling logic and its limits on what your capstone can claim
  • Complete the Week 3 capstone milestone: a 600-word sampling memo and a one-page sampling matrix

This course was developed by Kiffer G. Card, PhD, as a companion to Bernard, H. R., Wutich, A., & Ryan, G. W. (2017). Analyzing Qualitative Data: Systematic Approaches (2nd ed.). SAGE.

Section 1 of 5

Two Kinds of Samples — and the Question Each Was Built to Answer

⏱ Estimated reading time: 25 minutes

Introduction and Overview

A first-year epidemiology student arriving at HSCI 841 already has a clean mental model of sampling. There is a population. You define it, you draw a probability sample from it, you measure something, and the sample mean is an unbiased estimator of the population mean — with a confidence interval whose width is determined by sample size and design effect. That model is correct, it is powerful, and across HSCI 230, 341, and 410 you have used it to estimate prevalence, to compare exposed and unexposed groups, and to fit regression models. The model is also, as a description of how qualitative researchers actually sample, almost entirely wrong.

Bernard, Wutich, and Ryan (2017, p. 37) open Chapter 3 by drawing the line clearly: there are two kinds of samples in social-science research, and they were built to do different jobs. Probability samples were built so you can estimate population-level magnitudes with calculable error. Nonprobability samples were built so you can characterize a phenomenon — identify its categories, understand its mechanisms, and describe how people make sense of it. These are not different ways of doing the same job badly or well. They are different jobs.

This section unpacks that distinction. We start with what probability sampling is and what its sample-size logic looks like (briefly, because you already know it). We then introduce nonprobability sampling on its own terms, not as the disappointing-cousin-of-real-sampling that some introductory texts make it out to be. The remaining sections of the lesson work through the operational details: the six nonprobability strategies, the empirics of saturation, theoretical sampling, key informants, and the methods-section discipline you will need to defend your eventual capstone sample to a public-health reader.

Learning Objectives for Section 1

  • Articulate the operational difference between probability and nonprobability sampling.
  • Explain why “a nonprobability sample is a bad probability sample” is the wrong framing.
  • Recall the sample-size logic of probability sampling in one sentence each: power, design effect, finite-population correction.
  • Recognize that the sample-size question for nonprobability work is fundamentally different from the one you have been trained on.

1.1 What Probability Sampling Is, in One Page

A probability sample is one in which every member of a defined population has a known, non-zero probability of being selected, and that probability is built into the design (Bernard, Wutich & Ryan, 2017, p. 38). Simple random sampling, stratified sampling, cluster sampling, multi-stage sampling, and probability-proportional-to-size designs are all probability designs. The thing that makes them probability samples is not that they involve a random-number generator; it is that the selection probabilities are knowable. That knowability is what makes statistical inference to the population possible.

In probability sampling, sample size is a function of three quantities: the variance of the thing being measured, the precision you want around the estimate (the half-width of the confidence interval), and, for hypothesis-testing applications, the effect size you want to detect with a given power. Bernard, Wutich, and Ryan do not belabour this because if you have made it to a graduate qualitative methods course you have already done sample-size calculations in HSCI 341 or 410. You know that a prevalence study aiming for ±3% around a 50% prevalence at 95% confidence needs about 1,067 respondents; that a t-test for a moderate effect needs ~64 per arm at 80% power; that a cluster-randomized trial with a design effect of 1.5 needs roughly half as many again. The arithmetic is well understood and the textbooks for it sit in the room next door to this course.
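The three figures quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch using the usual normal-approximation formulas, not anything taken from Bernard, Wutich, and Ryan; the function names are illustrative, and "moderate effect" is assumed to mean a standardized difference of 0.5 with a two-sided alpha of 0.05.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def n_for_proportion(p, half_width, confidence=0.95):
    """Respondents needed to estimate a proportion p to within +/- half_width."""
    z = STANDARD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * p * (1 - p) / half_width ** 2


def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided, two-sample comparison of means.

    Normal approximation; the exact t-based answer is slightly larger.
    """
    z_alpha = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STANDARD_NORMAL.inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2


def inflate_for_design_effect(n, design_effect):
    """Scale a simple-random-sample n up for a clustered design."""
    return n * design_effect


prevalence_n = n_for_proportion(p=0.5, half_width=0.03)
t_test_n = n_per_arm(effect_size=0.5)  # assumed meaning of "moderate effect"
clustered_n = inflate_for_design_effect(t_test_n, design_effect=1.5)

print(f"Prevalence study: {prevalence_n:.0f} respondents")
print(f"Two-arm comparison: {t_test_n:.1f} per arm (normal approximation)")
print(f"Same comparison, design effect 1.5: {clustered_n:.1f} per arm")
```

The normal approximation lands a little under the textbook figure of 64 per arm because it ignores the small-sample correction of the t distribution. A design effect of 1.5 multiplies the requirement by 1.5.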

What matters for our purposes is the shape of the probability-sampling sample-size argument. It is a shape that says: the more cases you collect, the more precisely you can estimate a population-level magnitude. Each additional case purchases a (diminishing) amount of statistical information about the same quantity. The argument is monotone. Twenty is better than ten; two hundred is better than twenty; two thousand is better than two hundred. The only reason to stop is cost.

1.2 What Nonprobability Sampling Is, and What Question It Was Built To Answer

Key insight: The probability sample is the wrong baseline

Qualitative sampling is often defended (or attacked) by comparing it to probability sampling. This is the wrong comparison. Probability samples are designed to support claims about population frequencies; qualitative samples are designed to support claims about meaning, mechanism, or variation. Asking a qualitative sample to deliver population estimates is like asking a microscope to map a continent. Different tools, different jobs.

A nonprobability sample is one in which the selection probabilities are not knowable and inference to a defined population is not the analytic goal. Convenience sampling, purposive sampling, quota sampling, snowball sampling, respondent-driven sampling, theoretical sampling, and key-informant sampling are all nonprobability designs. They differ from each other in important ways — this lesson is largely about how they differ — but they share that defining feature: you cannot calculate the probability that any given member of a hypothetical population was selected for your study.

The temptation, especially for analysts trained on probability sampling, is to read this as a deficiency. It is not. Nonprobability sampling was developed because some research questions are not population-magnitude questions. The questions qualitative researchers most commonly ask — what is loneliness, how does it show up, what configurations of loneliness exist, what are the mechanisms, what do people do about it — are not answered by knowing what proportion of British Columbians experience loneliness. They are answered by deliberately choosing cases that illuminate the phenomenon. The selection logic is curatorial rather than statistical.

Bernard, Wutich, and Ryan put the difference this way (paraphrasing pp. 37–38): probability samples are about extensity (the breadth of a phenomenon in a population); nonprobability samples are about intensity (the depth, detail, and configuration of the phenomenon). You can imagine the same phenomenon — loneliness, say — as having two complementary aspects that the two sampling logics give you access to. A probability sample of 5,000 Canadians would tell you what percentage report frequent loneliness, how the prevalence varies by age and income, and how the rate has changed over time. A nonprobability sample of 20 deliberately chosen interviews would tell you what the experience of loneliness is, what kinds of loneliness exist, what triggers it, how people interpret it, and what they do about it. Each design answers a question the other cannot.

The two-jobs framing in one table

Feature              | Probability sampling                | Nonprobability sampling
Goal                 | Estimate a population magnitude     | Characterize a phenomenon
Selection logic      | Statistical (probabilities known)   | Curatorial (probabilities not knowable)
Sample-size logic    | Power / precision / variance        | Saturation; informational redundancy
Inferential target   | The defined population              | The phenomenon, transferably described
Generalization claim | Statistical (CI around a parameter) | Theoretical / analytic (categories, mechanisms)
Typical n            | Hundreds to thousands               | One to several dozen, occasionally more

1.3 Why “Nonprobability Is a Bad Probability Sample” Is the Wrong Reading

The single most common misreading of qualitative sampling in published health research is the assumption that the researcher tried to draw a probability sample, failed, and ended up with a convenience sample as a fallback. The published methods section often reinforces this misreading by describing the sample in apologetic terms (“owing to resource constraints we recruited a convenience sample of 18 participants”). Bernard, Wutich, and Ryan are emphatic that this framing is wrong and that letting it pass is bad for the field. A defensible nonprobability sample is not a failed probability sample. It is a sample assembled to do a job that probability sampling could not have done.

The HSCI 841 loneliness dataset is a useful concrete case. The 20 transcripts vary deliberately across age (18 to 82), gender (women, men, and non-binary participants), life-stage, immigration status, caregiving role, and identity. The sample is not random; the configuration is engineered. The point of including P14 Kenji (60, late-life coming out after a long heterosexual marriage) is not that he is statistically representative of British Columbian men in his sixties. He almost certainly is not. The point is that his configuration (late-life sexual-identity disclosure interacting with the loss of long-standing social ties) is a kind of loneliness the literature under-describes, and his transcript gives you analytic purchase on that configuration. A randomly drawn sample of 20 BC men in their sixties would almost certainly miss him. A probability sample of 5,000 would catch him, at best, as one respondent in 5,000 (0.02% of a frequency table) and would not produce 8,000 words of his account.
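The claim that a small random sample would "almost certainly miss" a rare configuration can be made concrete with one line of probability. The sketch below assumes independent draws from a large population, and the prevalence value is purely hypothetical (the lesson does not report how common Kenji's configuration is); the function names are illustrative.

```python
def prob_sample_misses(prevalence, n):
    """Chance that a random sample of size n contains nobody from a subgroup.

    Assumes independent draws from a population large relative to n.
    """
    return (1 - prevalence) ** n


def expected_count(prevalence, n):
    """Expected number of subgroup members in a random sample of size n."""
    return prevalence * n


# Hypothetical value for illustration only: 1 person in 1,000.
ASSUMED_PREVALENCE = 0.001

miss_20 = prob_sample_misses(ASSUMED_PREVALENCE, 20)
expected_5000 = expected_count(ASSUMED_PREVALENCE, 5000)

print(f"P(sample of 20 misses the configuration): {miss_20:.3f}")
print(f"Expected cases in a sample of 5,000: {expected_5000:.1f}")
```

Under that assumed prevalence a random sample of 20 misses the configuration about 98 times in 100, while a sample of 5,000 catches only a handful of cases, each contributing a row to a frequency table and not an account.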

This is the curatorial logic. You are not trying to be representative of a population; you are trying to be representative of a phenomenon's variation. Bernard, Wutich, and Ryan call this maximum-variation sampling when the variation is the explicit goal (we will come back to it as a kind of purposive sampling in Section 3). What you owe your reader, in such a design, is not a calculation of selection probabilities; it is a defence of the variation you chose to capture and an honest accounting of the variation you did not.

1.4 Why the Sample-Size Question Has To Be Different

The shape of the probability-sampling sample-size argument (more cases purchase more precision around the same quantity) does not apply to nonprobability work. The reason is straightforward. In nonprobability sampling you are not estimating a population-level magnitude, so there is no parameter whose estimate an additional case sharpens. The relevant question is different: at what point do additional cases stop adding new information about the phenomenon?

This is the question of saturation, and it is the operational concept that has replaced the power calculation in nonprobability work. We give it the full treatment in Section 2. For now, the key idea is that nonprobability sample size is governed by informational redundancy, not by precision. When the twenty-first interview produces nothing you had not already heard in the first twenty — no new themes, no new mechanisms, no new variations on the categories you have developed — you have hit the floor of marginal information return, and the design is telling you that you have enough cases for the analytic claims you can defensibly make.

One thing to set aside before going further

If you were trained to read “n = 20” as an underpowered study, you will need to set that reading aside for the rest of this lesson and the rest of the course. The right question for a qualitative study with 20 transcripts is not “was the power adequate?” The right questions are: what configurations were captured? What variation does the sample cover? What claims can the analyst defensibly make on that basis? Those questions are answered by the methods section, not by arithmetic. Section 5 of this lesson tells you what the methods section has to contain.

Reflection

Think of one qualitative study (published or hypothetical) and one quantitative study addressing the same broad topic. Briefly describe what each sampling logic gives the field that the other cannot. Try to avoid the framing of “the qualitative work is a follow-up to” or “a stepping stone toward” the quantitative work — the two should be co-equal in your answer.

Model answer: A strong response names a specific topic and then names the distinct epistemic contribution of each design. Example: “On the topic of vaccine hesitancy, the Canadian Community Health Survey can tell us what proportion of adults declined a recommended COVID-19 booster, how the rate varies by income, education, and immigration status, and how the rate has changed across waves: a population-magnitude account that no qualitative study can produce. A qualitative interview study with 25 deliberately varied hesitant adults can tell us what the specific arguments are that hesitant people give themselves and each other, how those arguments differ across kinds of hesitancy (distrust of pharma vs. distrust of government vs. religious objection vs. needle phobia presenting as ideological), and what an intervention would need to engage to be persuasive. Neither contribution is a precursor to or follow-up of the other; they are complementary epistemic outputs. The qualitative work could not have been done with a probability design because the categories of interest are themselves part of what the study is discovering; the quantitative work could not have produced configurations of argument because survey items collapse the very nuance the qualitative study is built to surface.”

Knowledge Check — Section 1

Question 1: What is the defining feature of a probability sample, according to Bernard, Wutich, and Ryan?

A probability sample is defined by knowable, non-zero selection probabilities for every member of the target population. Random-number generation is one way to implement that; the criterion itself is knowability of selection probabilities, which is what enables statistical inference to the population.

Question 2: Which of the following best describes the analytic goal of nonprobability sampling, as Bernard, Wutich, and Ryan frame it?

Nonprobability sampling is not a failed probability sample. It was built to characterize a phenomenon, not to estimate a population magnitude. The selection logic is curatorial, and the inferential target is the phenomenon's variation, not the population's distribution.

Question 3: Why does the sample-size argument from probability sampling (“more cases purchase more precision”) not apply to nonprobability work?

Sample-size logic in nonprobability work is governed by informational redundancy — the point at which additional cases stop revealing new themes, mechanisms, or variations. The probability-sampling argument is not wrong, it is simply solving a different problem (precision around a parameter rather than coverage of a phenomenon).
Section 2 of 5

Saturation — the Operational Concept and Its Empirical Backbone

⏱ Estimated reading time: 25 minutes

Introduction and Overview

Section 1 ended with a claim that needs unpacking: that the sample-size question for nonprobability work is governed by saturation, the point at which additional cases stop adding new information about the phenomenon. Saturation is the most cited and the most contested concept in qualitative sampling. Glaser and Strauss (1967) introduced it as a feature of theoretical sampling in grounded theory; it has since been generalized far beyond that origin to almost every kind of nonprobability design. The concept has empirical legs — there is now a serious literature attempting to measure when saturation actually occurs in practice — and it also has reasonable critics. This section walks through both.

By the end of the section you should be able to define saturation operationally, recite the major empirical findings (Romney/Weller/Batchelder's 4-to-6 rule; Guest, Bunce, & Johnson’s (2006) finding that saturation arrives around 12 interviews; Hennink & Kaiser’s (2022) systematic review), and explain what saturation does and does not give you the right to claim about your dataset.

Learning Objectives for Section 2

  • Define saturation in operational terms.
  • Distinguish theoretical saturation, code saturation, and meaning saturation.
  • Recall the Romney/Weller/Batchelder 4-to-6 rule and its underlying logic.
  • Cite Guest, Bunce, & Johnson (2006) and Hennink & Kaiser (2022) and explain how their empirical findings constrain general claims about sample size.

2.1 Saturation, Operationally

Saturation is the point at which additional data no longer surface new themes, concepts, or relationships. It is the qualitative analogue of a stopping rule in sequential testing. The concept is widely used and often poorly operationalized; the three summaries that follow review what the empirical literature actually shows.

For cultural domains with high consensus, the Romney-Weller-Batchelder cultural consensus model shows that 4-6 well-chosen informants can recover the shared knowledge of a domain with high reliability. Most usefully applied to focused, bounded questions in homogeneous populations.

Guest, Bunce, & Johnson (2006) analyzed 60 interviews and showed that 92% of codes were identified by interview 12, with diminishing returns thereafter. Now the most-cited single empirical study on qualitative sample size. Important caveats: their study used a relatively homogeneous sample on a focused question; heterogeneous populations and broader questions require more.

Hennink & Kaiser’s (2022) systematic review of 23 empirical studies found 9-17 interviews typically reach code saturation for relatively homogeneous samples; 20-40+ often required for code-meaning saturation and heterogeneous samples. The 'magic number' is a range, not a single value, and depends on the sample, question, and analytic depth.

To restate the working definition: a sample is saturated when collecting additional cases stops producing new information relevant to the analytic question (Bernard, Wutich & Ryan, 2017, pp. 40–42). That is the simple statement. The operationalization (how you actually decide that you have hit saturation) requires more care, because what counts as “new information” depends on what kind of analysis you are doing.

Three flavours of saturation get distinguished in the methodological literature, and you should keep them straight because mixing them up is the most common source of methods-section confusion in published qualitative health papers.

Theoretical saturation. The original Glaser/Strauss usage. A sample is theoretically saturated when additional cases stop producing new categories, properties, or relationships in your developing theory. Theoretical saturation is tightly tied to theoretical sampling (Section 4 of this lesson): you sample iteratively, in response to the emerging analysis, until the theoretical structure stops changing. This is the most demanding kind of saturation and the hardest to demonstrate.

Code saturation. A more recent, more operational concept. A sample is code-saturated when additional cases stop producing new codes in your codebook. Code saturation tends to arrive relatively quickly — the major themes show up in the first several transcripts — and is what most empirical saturation studies actually measure.

Meaning saturation. The point at which additional cases stop deepening your understanding of the codes you already have. This is harder and slower than code saturation. You might have identified a code like “loneliness as spatial” after three interviews, but reach a real understanding of the variations within that code — the specific spatial metaphors people deploy, the sense in which space is doing work, the contrasts with non-spatial framings — only after fifteen or twenty.
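Of the three flavours, code saturation is the one that lends itself most directly to a mechanical check. The sketch below is one possible operationalization, not a rule stated in the readings: it treats each transcript as the set of codes applied to it, in coding order, and declares saturation once a chosen number of consecutive transcripts (`run_length`, an assumed parameter) contribute no new codes. The code labels are invented toy data.

```python
def new_codes_per_transcript(coded_transcripts):
    """For each transcript, in coding order, return the codes not seen earlier."""
    seen = set()
    fresh_by_transcript = []
    for codes in coded_transcripts:
        fresh = set(codes) - seen
        fresh_by_transcript.append(fresh)
        seen |= fresh
    return fresh_by_transcript


def code_saturation_point(coded_transcripts, run_length=3):
    """Return the 1-based position of the transcript after which `run_length`
    consecutive transcripts added no new codes, or None if that never happens.
    """
    quiet = 0  # consecutive transcripts so far with no new codes
    fresh_sets = new_codes_per_transcript(coded_transcripts)
    for position, fresh in enumerate(fresh_sets, start=1):
        quiet = 0 if fresh else quiet + 1
        if quiet == run_length:
            return position - run_length
    return None


# Toy data: invented code labels, six transcripts in coding order.
toy_transcripts = [
    {"isolation", "grief"},
    {"grief", "spatial"},
    {"isolation", "belonging"},
    {"spatial", "grief"},
    {"belonging"},
    {"isolation", "spatial"},
]

new_counts = [len(fresh) for fresh in new_codes_per_transcript(toy_transcripts)]
saturation_at = code_saturation_point(toy_transcripts, run_length=3)
print(new_counts, saturation_at)
```

A check like this speaks only to code saturation. Meaning saturation and theoretical saturation depend on analytic judgment about depth and structure, which a count of new labels cannot capture, and the result also depends on the order in which transcripts were coded.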

Hennink, Kaiser, & Marconi (2017) made this distinction matter

In a widely cited methodological study, Monique Hennink and colleagues showed that in a sample of 25 in-depth interviews, code saturation arrived at 9 interviews, but meaning saturation required 16 to 24. The takeaway is that the rosy headline number you sometimes see (“saturation occurs around 9 interviews”) is doing work for code saturation, not for the deeper meaning saturation a serious analysis usually requires. When you defend your sample, be explicit about which kind of saturation you are claiming and on what evidence.

2.2 The Romney/Weller/Batchelder 4-to-6 Rule

The earliest empirically grounded heuristic for qualitative sample size comes from a body of work in cultural domain analysis (Bernard, Wutich & Ryan, 2017, p. 41). Romney, Weller, & Batchelder (1986) studied cultural consensus — the degree to which members of a group share an underlying cultural model of a domain. Their mathematical model showed that if cultural agreement within a group is high (which it typically is for shared cultural domains within a more-or-less culturally coherent group), as few as four to six knowledgeable respondents are enough to recover the shared cultural model with high confidence.

The 4-to-6 rule comes out of free-list, pile-sort, and similar cultural-consensus tasks where the analytic goal is to recover a structured, shared cognitive model. It does not generalize without qualification to depth interviewing about contested or variable phenomena (loneliness, for example, is not a tightly shared cultural domain in the way that, say, the taxonomy of edible mushrooms is for an experienced forager). But the rule established a principle that the empirical saturation literature has largely confirmed: when the group is culturally homogeneous and the domain is reasonably structured, sample sizes in the single digits are defensible. Sample sizes that look implausibly small to a quantitatively trained reader are not implausibly small to a cultural-consensus theorist who has done the math.
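The intuition behind small consensus samples can be illustrated with a simple binomial calculation. This is a deliberately simplified sketch and not the Romney, Weller, and Batchelder model itself, which estimates each informant's competence from the pattern of agreement and corrects for guessing. Here each informant is assumed to answer a true/false item correctly and independently with a fixed probability, and the competence value of 0.8 is hypothetical.

```python
from math import comb


def prob_majority_correct(n_informants, competence):
    """Chance that a strict majority of informants gives the correct answer
    to one true/false item, when each is independently correct with
    probability `competence`.
    """
    needed = n_informants // 2 + 1  # smallest strict majority
    return sum(
        comb(n_informants, k)
        * competence ** k
        * (1 - competence) ** (n_informants - k)
        for k in range(needed, n_informants + 1)
    )


# Hypothetical competence of 0.8, chosen only for illustration.
for n in (1, 3, 5, 7):
    print(n, round(prob_majority_correct(n, 0.8), 4))
```

Under these assumptions, pooling the answers of five informants recovers the correct answer to a given item about 94 percent of the time. When agreement is lower, many more informants are needed, which is why the rule does not transfer to contested or variable phenomena.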

2.3 Guest, Bunce, and Johnson (2006): The 12-Interview Threshold

The most cited empirical study of saturation in interview research is Greg Guest, Arwen Bunce, & Laura Johnson’s (2006) paper “How many interviews are enough? An experiment with data saturation and variability,” published in Field Methods. Guest and colleagues coded a corpus of 60 women's-health interviews from West African field sites and tracked when new codes were appearing. They found that 74 percent of all codes had been identified after the first six interviews, and 92 percent after the first twelve. After twelve interviews, the new-code-per-interview rate dropped to near zero.
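Percentages of this kind come from a cumulative tally: after each interview, what share of the codes found in the whole corpus has already appeared? The sketch below shows the arithmetic on invented toy data; it is not Guest and colleagues' procedure or dataset, and the function name is illustrative.

```python
def cumulative_code_share(coded_transcripts):
    """After each transcript, the share of the corpus's final codebook seen so far.

    The denominator is every code found anywhere in the corpus, so the last
    value is always 1.0 for a corpus with at least one code.
    """
    all_codes = set().union(*coded_transcripts)
    if not all_codes:
        return [0.0 for _ in coded_transcripts]
    seen = set()
    shares = []
    for codes in coded_transcripts:
        seen |= set(codes)
        shares.append(len(seen) / len(all_codes))
    return shares


# Toy corpus: four transcripts, four distinct codes in total.
toy_corpus = [{"a", "b"}, {"b", "c"}, {"c"}, {"d"}]
shares = cumulative_code_share(toy_corpus)
print([f"{share:.0%}" for share in shares])
```

Because the denominator is the codebook from the full corpus, the curve describes how quickly that particular corpus gave up its codes. It says nothing about codes that a differently composed sample would have produced.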

The Guest et al. study gave qualitative researchers the most cited heuristic in the field: if your sample is reasonably homogeneous and your analytic question is reasonably focused, saturation will arrive around 12 interviews. The 12-interview threshold has been embraced (and often over-applied) by graduate students writing methods sections, by IRBs trying to evaluate qualitative protocols, and by editors weighing the credibility of qualitative health papers. It has also been refined and qualified by subsequent work, which we turn to next.

2.4 Hennink and Kaiser (2022): The Systematic Review

Monique Hennink & Bonnie Kaiser (2022) published a systematic review in Social Science & Medicine titled “Sample sizes for saturation in qualitative research: A systematic review of empirical tests.” They identified 23 empirical studies that had measured saturation in interview research and synthesized the findings. Their conclusions are the contemporary state of the art.

First, saturation in interview research is typically reached between 9 and 17 interviews across the studies they reviewed. Twelve is a reasonable central estimate but is neither a floor nor a ceiling. Second, saturation point varies systematically with study design. Studies with narrower research questions, more homogeneous samples, and more experienced interviewers saturate faster. Studies with broader questions, heterogeneous samples, and less-experienced interviewers require more interviews. Third, different kinds of saturation arrive on different schedules — code saturation early, meaning saturation later, theoretical saturation latest of all. Fourth and most importantly, saturation should be operationalized and reported, not asserted. A methods section that simply says “saturation was reached” without indicating how it was defined and assessed has not done its work.

The honest caveat about saturation

Saturation has been criticized by serious qualitative methodologists (notably Braun and Clarke, 2021, for reflexive thematic analysis) as a concept that does not fit every qualitative method. The Braun-Clarke argument is that for interpretive work, the “new information” framing presupposes a realist epistemology that not every qualitative tradition shares: meaning is co-constructed, not waiting to be inventoried. Bernard, Wutich, and Ryan (and this course) take a pragmatic stance — saturation is a useful default standard for applied health research, and the empirical literature on it is helpful — but you should know that there are traditions in which the concept is contested. If your capstone uses a reflexive-thematic or constructivist-grounded-theory approach, you can defend a sample on grounds other than saturation.

2.5 What Saturation Gives You the Right To Claim

Saturation, when properly assessed and reported, gives you the right to claim that within the variation your sample captured, you have identified the major categories and configurations the phenomenon takes. It does not give you the right to claim that you have found all the categories that exist in the population. It does not give you the right to claim that you have measured the prevalence of any of the categories. And it does not exempt you from describing what variation your sample failed to capture.

For the HSCI 841 loneliness dataset, what saturation can and cannot do is illustrative. With 20 deliberately varied transcripts, you can defensibly claim that you have identified the major kinds of loneliness present in this sample's life-stage and identity range. You have probably saturated on common configurations like existential loneliness in later life (P11 Helen, P13 Margaret, P19 Rose, P20 Frank), situational loneliness following disruption (P03 Sarah following romantic dissolution, P16 Elena following job loss), and the loneliness of cultural-belonging disruption (P15 Amira's wahda, P18 Chen's straddling of Mandarin-speaking and English-speaking community ties). What you have not saturated on, with twenty transcripts, are the rarer configurations: the loneliness of incarcerated parents, the loneliness of unhoused adults, the loneliness of people in long-term institutional psychiatric care. Saturation can be claimed for the territory you mapped; it cannot be claimed for the territory you never went into.

Reflection

You are writing the methods section for your capstone. In one paragraph, draft a defensible saturation claim for the loneliness dataset: which kind of saturation are you claiming (theoretical, code, or meaning), what is the operational evidence, and what configurations are explicitly outside the scope of your saturation claim?

Model answer: A defensible draft might read: “Within the 20-transcript dataset, code saturation was assessed by tracking the appearance of new codes across the cumulative coding order and was reached at approximately transcript 14, after which only one new code (‘loneliness as moral failure’) appeared. Meaning saturation, operationalized as substantive deepening of the major code categories, continued through the final transcripts; the analysis is therefore code-saturated but the meaning of certain categories (notably the cultural-belonging disruption category) is likely to extend further than the sample reveals. Configurations explicitly outside the saturation claim include incarcerated populations, unhoused adults, and adults in long-term institutional psychiatric care, none of whom are represented in the sample.” A strong answer specifies the kind of saturation, names the operational test, and is honest about what the sample did not cover.

Knowledge Check — Section 2

Question 1: Which of the following best operationalizes saturation in nonprobability qualitative sampling?

Saturation is defined by informational redundancy — additional cases ceasing to produce new information about the phenomenon — not by population representativeness or by statistical power, neither of which is the analytic target in nonprobability work.

Question 2: Guest, Bunce, and Johnson (2006) found that, in a corpus of 60 women's-health interviews, what proportion of codes had been identified after the first 12 interviews?

Guest et al. reported 74% of codes after 6 interviews and 92% after 12. The 12-interview threshold has become the most cited heuristic in the qualitative-methods literature for code saturation in reasonably homogeneous samples with focused research questions.

Question 3: The Romney/Weller/Batchelder 4-to-6 rule is derived from work on:

The 4-to-6 rule comes from cultural consensus theory (Romney, Weller & Batchelder 1986). It applies most cleanly to free-list, pile-sort, and similar cultural-consensus tasks where the analytic goal is to recover a shared cultural model from members of a culturally coherent group.
Section 3 of 5

The Six Nonprobability Sampling Strategies

⏱ Estimated reading time: 35 minutes

Introduction and Overview

Bernard, Wutich, and Ryan organize nonprobability sampling into six strategies (Chapter 3, pp. 42–56). The strategies are not mutually exclusive — most real studies combine two or three — but each has its own logic, its own appropriate uses, and its own characteristic failure modes. This section walks through all six in turn, with the loneliness dataset as a concrete reference and with notes on when each strategy is the right choice.

Learning Objectives for Section 3

  • Identify and contrast the six nonprobability sampling strategies in Bernard, Wutich, and Ryan.
  • Recognize the loneliness dataset's sampling logic as purposive with quota elements.
  • Distinguish snowball sampling from respondent-driven sampling and explain why RDS's inference machinery matters.
  • Recognize when each strategy is the right tool for the analytic question.

3.1 Quota Sampling

Quota sampling

Sets quotas for specific subgroups (e.g., 10 men and 10 women; 5 in each age bracket). Within quotas, recruitment is convenient. Useful when you need to ensure representation of categories that matter for your question. Not equivalent to probability stratified sampling.

Purposive / judgment sampling

Deliberate selection of participants who can speak to the question at hand. Maximum-variation (deliberately diverse), homogeneous (deliberately similar), critical-case (theoretically pivotal cases), and deviant-case (extreme exemplars) are common purposive strategies. The dominant strategy in qualitative health research.

Convenience sampling

Recruits whoever is reachable: undergraduates on campus, patients in clinic, friends of the researcher. Cheap, fast, and almost always biased in ways that limit transferability. Sometimes the right choice for a pilot study; rarely the right choice for a final analysis.

Network sampling (snowball, RDS)

Asks existing participants to refer others. Snowball sampling is straightforward but can be deeply biased. Respondent-driven sampling (RDS), developed by Heckathorn (1997), adds dual-incentive structure and mathematical adjustment to approximate probability sampling in hidden populations — people who inject drugs, sex workers, undocumented migrants. Widely used in HIV epidemiology.

Theoretical sampling (Glaserian)

The defining sampling logic of grounded theory. Sampling is iterative: emerging theoretical concepts dictate who or what to sample next, with the goal of elaborating, contrasting, or testing concepts. Sampling and analysis proceed together. Not pre-specified at the start of the study.

Quota sampling is the deliberate construction of a sample to match pre-specified target cells across one or more dimensions of variation. You decide in advance that you want, say, five women and five men; or three participants in each of four age quartiles; or equal representation of immigrants and Canadian-born. You recruit until each cell is filled, accepting whoever in that cell happens to be available.

Quota sampling is the most common nonprobability design in applied health research, and you will encounter it routinely in market research, polling, and the rapid-turnaround qualitative components of evaluation studies. Its strength is that the sample's coverage of the dimensions of interest is guaranteed by construction. Its weakness is that within each cell, selection is typically convenience-based — whoever showed up first, whoever the recruiter knew, whoever responded to the flyer — and the within-cell sample is therefore a convenience sample.

The HSCI 841 loneliness dataset uses quota elements. The interview guide's recruitment notes (following the logic of Bernard, Wutich & Ryan, 2017, Ch. 3; see also the dataset's Interview Guide document) specify variation targets across four age quartiles spanning 18–80+, across gender (women, men, gender-diverse), across living arrangement, and across major life-stage transitions (recent immigration, recent loss, recent retirement, recent caregiving role, recent relationship dissolution). The sample of 20 was assembled to hit those targets. It is not pure quota sampling, because the framework was richer than a fixed-cell design (we will discuss that richness in 3.2 below), but it applies quota-like discipline about coverage.
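The fill-each-cell logic of a quota design can be tracked in a few lines of base R. The sketch below is illustrative only: the cells, targets, and recruitment log are hypothetical and are not taken from the loneliness dataset's recruitment notes.

```r
# Hypothetical quota plan: target number of participants per cell
targets <- data.frame(
  gender   = c("woman", "woman", "man", "man"),
  age_band = c("18-39", "40+", "18-39", "40+"),
  target   = c(3, 3, 3, 3)
)

# Hypothetical recruitment log: one row per participant enrolled so far
recruited <- data.frame(
  pid      = c("X01", "X02", "X03", "X04", "X05"),
  gender   = c("woman", "woman", "man", "woman", "man"),
  age_band = c("18-39", "18-39", "40+", "40+", "40+")
)

# Count enrolled participants per cell
filled <- aggregate(pid ~ gender + age_band, data = recruited, FUN = length)
names(filled)[names(filled) == "pid"] <- "filled"

# Join counts to targets; cells with nobody enrolled yet get 0
status <- merge(targets, filled, by = c("gender", "age_band"), all.x = TRUE)
status$filled[is.na(status$filled)] <- 0
status$remaining <- pmax(status$target - status$filled, 0)

print(status)
```

A tracker like this shows which cells still need recruitment, but it says nothing about how people inside each cell were found; that within-cell selection still has to be reported in the methods section.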

3.2 Purposive / Judgment Sampling

Purposive sampling (sometimes called judgment sampling or purposeful sampling) is the deliberate selection of cases on the basis of theoretical or substantive criteria. The analyst chooses cases that are expected to illuminate the phenomenon, on the grounds that those cases are richer, more variable, more strategically located, or more theoretically informative than randomly drawn ones would be. The selection criteria are explicit; the choices are defensible; the logic is curatorial.

Patton (2015; see also Palinkas et al., 2015) catalogued more than a dozen sub-types of purposive sampling, of which the most commonly invoked in health research are:

  • Maximum-variation sampling. Deliberately select cases that span the variation in the phenomenon. The loneliness dataset is an instance.
  • Homogeneous sampling. Select cases that share key features so within-group patterns become visible.
  • Extreme or deviant case sampling. Select unusual or boundary cases on the theory that they make visible features that typical cases leave analytically invisible.
  • Critical case sampling. Select cases on the theory that if a phenomenon shows up here, it will show up anywhere; if it does not show up here, it will not show up anywhere.
  • Typical case sampling. Select cases that exemplify the modal pattern.
  • Confirming/disconfirming case sampling. Select cases specifically to test or strain an emerging interpretation.

The loneliness dataset is best characterized as purposive with quota elements, leaning toward maximum-variation. The 20 transcripts were not drawn to be statistically representative of British Columbia, and they were not assembled by simply filling demographic cells. They were chosen to capture the variation that the literature suggests matters for loneliness: variation in age (P01 Maya, 22; P11 Helen, 78; P20 Frank, 82), in gender and gender-identity history (P12 Tyler, non-binary; P14 Kenji, late-life coming out), in immigration trajectory (P15 Amira, recent refugee; P18 Chen, decades-long bicultural negotiation), in caregiving role (P05 Linda, daughter-of-aging-parent; P07 Diana, partner-of-someone-with-dementia), in life-stage transition (P03 Sarah, post-romantic-dissolution; P16 Elena, post-job-loss; P19 Rose, late-life widowhood), and in identity configurations the literature under-describes (P14 Kenji's late-life coming out; P12 Tyler's non-binary identity in a small town). The dataset is engineered, not random; that engineering is its strength as a teaching dataset.

3.3 Convenience Sampling

Convenience sampling is the recruitment of whoever is available, accessible, or willing — with no purposive criterion other than availability. Examples include the “intercept” survey at a clinic entrance, the email blast to a departmental listserv, the friend-of-a-friend pilot interview, and the all-too-common “we recruited 18 first-year nursing students from a course taught by the second author.”

Bernard, Wutich, and Ryan (2017, pp. 48–49) are careful, not dismissive, about convenience sampling. There are defensible uses of convenience sampling: piloting an interview guide, testing the recording equipment, training a new interviewer, or building rapport in a community before formal recruitment begins. There are also indefensible uses: a published study that recruited only the readily available and then claims findings that generalize beyond them. The line between the two is what the methods section says. If a convenience sample is reported transparently as such, with limitations honestly acknowledged, it is a defensible piece of empirical work. If it is dressed up as something more representative than it is, it is not.

Most published qualitative health studies in fact rely on convenience samples, whether they say so or not. A more honest field would acknowledge this and would describe more carefully what claims a convenience sample does and does not support.

3.4 Network Sampling: Snowball and Respondent-Driven Sampling

Network sampling uses the social networks of initial participants to recruit additional ones. The two main forms are snowball sampling (Goodman, 1961) and respondent-driven sampling (RDS), and the distinction between them is methodologically important.

Snowball sampling. The classical form: initial participants (“seeds”) are asked to refer others, who are asked to refer others, and so on. Snowball sampling is the standard tool for reaching populations that are hidden, stigmatized, hard to identify from sampling frames, or organized around relationships rather than addresses. Examples: people who use drugs, sex workers, undocumented migrants, LGBTQ+ adults in regions where outness is risky, members of religious or political minorities.

The strength of snowball sampling is access. The weakness is that the sample is shaped by the social networks of the initial seeds, and those networks are unlikely to be representative of the target population. A snowball sample of people who use drugs starting from one community-health-centre client will look very different from one starting from a university-affiliated harm-reduction researcher. Bernard, Wutich, and Ryan are clear that snowball samples are inappropriate for prevalence estimation and have to be reported as the network samples they are.

Respondent-driven sampling (RDS). Developed by Douglas Heckathorn (1997; Heckathorn 2002 for the inference machinery), RDS is a sophisticated extension of snowball sampling that adds dual incentives (participants are paid for their own participation and for the participation of recruits they bring in), limited coupons (each participant can recruit only a fixed number of others, typically three), and a tracking-and-weighting framework that allows population-level inference under specific assumptions about network structure.

The reason RDS matters — and the reason it gets discussed in a graduate qualitative methods course even though much RDS work is quantitative — is that it represents the most serious attempt in the methods literature to turn network sampling into something with calculable inferential properties. Under Heckathorn's assumptions (long recruitment chains, accurate self-reported network sizes, random recruitment within social networks), RDS estimates can be weighted to approximate population-level prevalence and association estimates with calculable error bounds. The assumptions are strong and have been criticized in subsequent methodological work (Goel and Salganik, 2010; Gile and Handcock, 2010), but RDS remains the most defensible network-sampling design for many populations of public-health interest.
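To make the weighting idea concrete, here is a minimal sketch in base R. It uses the inverse-degree weighting of the Volz and Heckathorn (2008) "RDS-II" estimator, which is one simplified member of the RDS estimator family and not the full machinery of Heckathorn (1997, 2002). The data are hypothetical, and the sketch ignores recruitment chains, coupon tracking, and variance estimation.

```r
# Hypothetical RDS data: self-reported network size (degree) and a binary trait
rds <- data.frame(
  pid    = c("R01", "R02", "R03", "R04"),
  degree = c(2, 5, 10, 4),  # how many eligible people each participant reports knowing
  trait  = c(1, 0, 1, 0)    # 1 = has the characteristic of interest
)

# Well-connected people are more likely to be recruited, so each participant
# is weighted by the inverse of their reported degree
rds$weight <- 1 / rds$degree

naive_estimate    <- mean(rds$trait)
weighted_estimate <- sum(rds$weight * rds$trait) / sum(rds$weight)

round(c(naive = naive_estimate, weighted = weighted_estimate), 3)
```

With these made-up numbers the weighted estimate (about 0.57) is higher than the naive proportion (0.50), because one trait-holder reports a small network and is therefore up-weighted. The adjustment is only as good as the self-reported degrees, which is one reason the assumptions above attract criticism.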

The HSCI 841 loneliness dataset did not use snowball or respondent-driven sampling. The dataset's purposive-with-quota design was assembled directly through recruiters, not through participant-driven referral. This is important to record in your capstone's methods section: snowball and RDS designs come with specific analytic obligations, and a study that did not use them cannot claim the inferential properties they enable, but also has none of the network-dependence biases they introduce.

3.5 Theoretical Sampling (Glaserian)

CAPSTONE Plan it - Your sampling matrix

For your HSCI 841 capstone, draft a one-page sampling plan:

  1. Sampling strategy: Which of the six strategies in this section are you using, and why?
  2. Sampling matrix: What categories matter for your question (age, role, geography, condition)? How many participants per cell?
  3. Recruitment: Through what channels will you reach each cell?
  4. Stopping rule: What is your operationalization of saturation, and what would tell you when you have reached it?

A defensible qualitative sample needs the same explicit planning a survey sample does. The strategy is different; the discipline is the same.
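One way to operationalize a stopping rule is to count how many previously unseen codes each new transcript contributes and to stop after a run of transcripts that add none. The base-R sketch below is an illustration under stated assumptions: the codes are hypothetical, and the "three consecutive transcripts with no new codes" threshold is an arbitrary choice for the example, not a standard from the saturation literature.

```r
# Hypothetical coding log: codes applied to each transcript, in interview order
codes_by_transcript <- list(
  T01 = c("isolation", "routine", "family"),
  T02 = c("isolation", "work", "family"),
  T03 = c("routine", "migration"),
  T04 = c("isolation", "work"),
  T05 = c("family", "migration"),
  T06 = c("routine", "work")
)

# Number of codes in each transcript that did not appear in any earlier one
new_codes_per_transcript <- function(code_list) {
  seen <- character(0)
  vapply(code_list, function(codes) {
    fresh <- setdiff(unique(codes), seen)
    seen <<- union(seen, codes)  # update the running set in the enclosing function
    length(fresh)
  }, integer(1))
}

# Stopping rule (an assumption for this sketch): stop once `run_length`
# consecutive transcripts add no new codes
saturation_point <- function(new_counts, run_length = 3) {
  run <- 0
  for (i in seq_along(new_counts)) {
    run <- if (new_counts[i] == 0) run + 1 else 0
    if (run >= run_length) return(i)
  }
  NA_integer_  # rule not yet satisfied
}

new_counts <- new_codes_per_transcript(codes_by_transcript)
new_counts
saturation_point(new_counts, run_length = 3)
```

Whatever rule you choose, state it in the plan before you start coding, so that the stopping point is not chosen after the fact.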

Theoretical sampling is the iterative, emergent sampling strategy developed by Barney Glaser and Anselm Strauss (1967) as part of grounded theory. The procedure: you collect and analyse some data, develop a preliminary theory, identify what additional data would test or extend the theory, sample those additional data deliberately, re-analyse, and continue until theoretical saturation is reached. The sampling and the analysis are not separated: each round of sampling is shaped by what the previous round revealed.

Theoretical sampling differs from purposive sampling in a critical way that is often missed. Purposive sampling is a-priori: you decide before recruitment what kinds of variation you want, and you recruit to those targets. Theoretical sampling is emergent: you cannot say in advance what cases you will want, because the criteria are determined by the developing analysis. The first three participants in a theoretical sampling study might be selected for convenience; the next three might be selected because the analysis of the first three revealed a configuration that needs further exploration; the next three might be selected to test whether a hypothesized boundary condition holds.

The HSCI 841 loneliness dataset is not a theoretically sampled dataset. The 20 transcripts were assembled in a single recruitment phase, with the variation targets specified in advance. This matters because some qualitative-methods textbooks use “theoretical sampling” and “purposive sampling” almost interchangeably; Bernard, Wutich, and Ryan are clear that they are different things, and a methods section that calls a purposive sample a theoretically sampled one is making a category mistake.

Glaser vs. Strauss on theoretical sampling

Within grounded theory, the original co-authors split methodologically in the 1980s. Glaser kept theoretical sampling tightly tied to emergence from the data; Strauss (with Corbin) developed a more structured version that allowed more a-priori coding. Charmaz's constructivist grounded theory (which you will meet in Module 7) sits closer to the Glaserian original. For the purposes of this course, “theoretical sampling” refers to the Glaserian original: iterative, emergent, analysis-driven.

3.6 Key-Informant Sampling

Key informants are individuals selected because they are unusually knowledgeable, articulate, or strategically located with respect to the phenomenon of interest. In a study of clinic operations, the key informants might be the head nurse, the intake coordinator, and the medical director — people whose roles give them an overview no individual patient could provide. In a study of a religious community, the key informants might be elders, clergy, or longtime members. The selection criterion is access to information, not representativeness.

Bernard, drawing on cultural anthropology fieldwork, is a foundational figure for this concept, and the text is careful to distinguish two uses (Bernard, Wutich & Ryan, 2017, pp. 54–56). Key informants can replace broader sampling when the research question is about a system or institution and the key informants are the people who know it: a study of how a province's overdose-response protocol gets implemented might rely largely on interviews with the small number of people who actually implement it. Key informants can also supplement broader sampling when the analytic question requires both lay perspectives and expert ones: a study of loneliness might interview 20 lay participants (the HSCI 841 dataset) and supplement with two or three key-informant interviews with clinicians, community-organisation directors, or older-adult-services coordinators who see loneliness across many people.

Key-informant interviews are typically deeper, longer, and more iterative than lay interviews. They are also more vulnerable to the informant's own framings being adopted by the analyst (the “going native” problem familiar from anthropology), and to the political dynamics around who gets named a key informant in a community.

3.7 Summary Table

| Strategy | Logic | Best for | Watch out for |
|---|---|---|---|
| Quota | Pre-specified target cells across dimensions of variation | Guaranteed coverage of known dimensions | Within-cell convenience selection |
| Purposive | Theoretically/substantively driven selection | Maximum-variation, extreme-case, critical-case designs | Criteria must be explicit and defensible |
| Convenience | Whoever is available | Piloting, training, equipment testing | Indefensible for substantive claims unless transparently acknowledged |
| Snowball | Participant-driven referral chains | Reaching hidden or stigmatized populations | Seed-network dependence; not for prevalence |
| Respondent-driven (RDS) | Dual-incentive, coupon-tracked snowball with inference machinery | Population estimates for hidden populations | Assumption-heavy; demanding to execute well |
| Theoretical | Iterative, emergent, analysis-driven (Glaserian) | Grounded-theory studies; theory development | Distinct from purposive; do not conflate |
| Key informant | Selection by unusual knowledge or strategic position | System/institution studies; expert supplement | “Going native”; community politics |

Reflection

Imagine you are designing a qualitative study of loneliness among recently arrived refugees in British Columbia. Which two or three of the six strategies would you combine, and why? Be specific: what does each strategy contribute that the others cannot?

Model answer: A defensible answer combines at least two strategies and names what each contributes. Example: “I would combine purposive (maximum-variation) sampling with key-informant interviews, and probably some judicious use of snowball sampling to reach within-community networks. Purposive sampling lets me deliberately capture variation across language groups, settlement trajectories, family configuration, and time-since-arrival, the configurations the literature suggests matter for the refugee loneliness experience. Key-informant interviews with settlement-agency staff, ESL instructors, and cultural-broker community leaders would provide system-level context I could not get from any individual participant. Snowball recruitment within the refugee community itself would help me reach participants who are wary of formal recruitment by university researchers and would be more responsive to invitations from trusted community members. I would not use RDS, because it requires longer recruitment chains than a graduate-scale study can sustain, and I would not use pure convenience sampling because the population is heterogeneous and the configurations I most need are not the easily available ones.”

Knowledge Check — Section 3

Question 1: A study recruits 5 women and 5 men in each of four age quartiles, accepting whoever is available within each cell. The sampling strategy is best characterized as:

Pre-specified target cells across dimensions of variation, filled by whoever is available within each cell, is the standard definition of quota sampling. The within-cell recruitment is convenience-based; the design's strength is the guaranteed coverage of the dimensions.

Question 2: The HSCI 841 loneliness dataset is best characterized as:

The 20 transcripts were assembled in a single phase with deliberate variation across age, gender, life-stage, immigration status, caregiving role, and identity (purposive, leaning maximum-variation), with quota-like coverage targets (quota elements). The sample is not theoretical (which would be iterative and emergent) and is not snowball, RDS, or probability.

Question 3: Which is the key methodological distinction between snowball sampling and respondent-driven sampling (RDS)?

RDS extends snowball sampling with the dual-incentive, limited-coupon, tracking-and-weighting machinery developed by Heckathorn (1997, 2002), which under network-structure assumptions allows population-level inference. Snowball sampling has no comparable inferential apparatus and is reported as the network sample it is.
Section 4 of 5

Defending a Qualitative Sample — and the Week 3 Capstone Milestone

⏱ Estimated reading time: 30 minutes

Introduction and Overview

The first three sections gave you the conceptual machinery: probability vs. nonprobability, saturation, and the six strategies. This section turns operational. We work through what the methods section of a qualitative health paper actually has to say about sampling — what Bernard, Wutich, and Ryan call “defending the sample” — and we use that template to set up the Week 3 capstone milestone, in which you document the loneliness dataset's sampling logic in your own writing.

Learning Objectives for Section 4

  • Identify the five things a defensible qualitative methods section owes a reader about sampling.
  • Recognize the loneliness dataset as a worked example of each element.
  • Document the sampling structure of a small dataset in R for an appendix figure.
  • Produce the Week 3 capstone deliverable: a 600-word sampling memo and a one-page sampling matrix.

4.1 What the Methods Section Owes the Reader

A qualitative methods section that handles sampling well covers five elements. Each element has a corresponding sub-claim, and a paper that omits any one of them leaves the reader unable to evaluate the design.

(1) The sampling strategy, named and justified. Which of the six (or combination) did you use? On what theoretical or substantive grounds? It is not enough to say “a purposive sample was recruited.” You have to say which kind of purposive sampling (maximum-variation, extreme-case, critical-case), why that kind, and what the alternative strategies would have given or cost you. The HSCI 841 loneliness dataset would be described as purposive sampling with quota elements, organized around a maximum-variation logic across age, gender, life-stage, immigration status, caregiving role, and identity.

(2) The recruitment procedure. How did you find people? Where did you advertise? What was the screening process? Who was screened out and why? For the HSCI 841 dataset (synthetic though it is, for instructional purposes), the recruitment would be described as conducted through community-based recruiters working with settlement agencies, retirement communities, post-secondary student-health offices, and 2SLGBTQ+ community organisations, with screening for adult age, current BC residence, English-language interview capacity, and self-identified loneliness in the past 12 months.

(3) The variation captured. What configurations does the sample actually cover? The cleanest way to report this is a sampling matrix — a table that lists each participant and the values of each variation dimension. A reader looking at the matrix can immediately see what variation the study captured. The Week 3 capstone deliverable asks you to produce exactly such a matrix.

(4) The variation NOT captured. What configurations are explicitly outside the sample? This is the element that published qualitative health papers most reliably skip, and it is the one that most clearly distinguishes a defensible sample from one dressed up to look more comprehensive than it is. For the HSCI 841 dataset, the variation not captured includes (among others): youth under 18, BC residents not interviewable in English, adults living outside BC, currently institutionalised adults (carceral, psychiatric, long-term hospital), unhoused adults, and adults experiencing acute crisis at the time of approach.

(5) The sample-size logic. Why this number of cases? On what evidence is the size defensible? The defensible answers, depending on design, are some combination of: code or meaning saturation reached at case n; cultural-consensus logic (Romney/Weller/Batchelder) for a culturally coherent group; the Guest et al. (2006) and Hennink & Kaiser (2022) empirical literature for an interview study with a reasonably focused question; the information-power framework of Malterud, Siersma, & Guassora (2016); or analytic and resource constraints honestly named.
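If you claim code saturation, the evidence is usually a cumulative code-discovery curve of the kind Guest et al. (2006) reported. The base-R sketch below shows one way to compute it. The codes are hypothetical placeholders ("A" to "H"), and the order of transcripts is assumed to be the order in which you coded them.

```r
# Hypothetical codes per transcript, in the order the interviews were coded
codes_by_transcript <- list(
  c("A", "B", "C", "D"),
  c("A", "C", "E"),
  c("B", "F"),
  c("A", "E", "G"),
  c("C", "F"),
  c("B", "H")
)

n_transcripts    <- length(codes_by_transcript)
cumulative_codes <- integer(n_transcripts)
seen             <- character(0)

# Running count of distinct codes identified so far
for (i in seq_len(n_transcripts)) {
  seen <- union(seen, codes_by_transcript[[i]])
  cumulative_codes[i] <- length(seen)
}

# Share of the final codebook identified after each transcript
share_found <- cumulative_codes / length(seen)

plot(seq_len(n_transcripts), share_found, type = "b", ylim = c(0, 1),
     xlab = "Transcripts coded", ylab = "Share of all codes identified")
```

Because the share is computed against the codes found in this sample, the curve always reaches 100% at the last transcript. What supports a saturation claim is the shape of the curve (how early it flattens), not the final value.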

4.2 What the Methods Section Does NOT Have To Do

A defensible qualitative methods section does not have to defend the sample on the grounds of probability-sample logic. You do not have to apologize for not being a population sample. You do not have to gesture at “future quantitative work with a larger n.” You do not have to invoke the language of generalizability when the inferential target was never the population in the first place.

What you do have to do is name the inferential target you are claiming. Bernard, Wutich, and Ryan call this transferability: the ability of the reader to assess whether the patterns you identified are likely to hold in other settings or populations the reader is interested in. Transferability is not the same as statistical generalizability, and the burden of judging it is partly on the reader. Your job is to give the reader enough information — through the methods section, the sampling matrix, the variation-captured and variation-not-captured statements — that the reader can make a defensible transferability judgment.
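A first draft of the variation-captured and variation-not-captured statements can come from a cross-tabulation that includes every category you considered, not only the ones you sampled. The base-R sketch below assumes hypothetical attribute values and hypothetical level lists; it is not the loneliness dataset.

```r
# Hypothetical attributes for a small sample (not the loneliness dataset)
sample_attrs <- data.frame(
  age_band = c("18-39", "18-39", "40-64", "65+", "65+", "40-64"),
  living   = c("alone", "with others", "alone", "alone", "alone", "with others")
)

# Declare every level of interest so that unsampled combinations still appear
sample_attrs$age_band <- factor(sample_attrs$age_band,
                                levels = c("18-39", "40-64", "65+"))
sample_attrs$living   <- factor(sample_attrs$living,
                                levels = c("alone", "with others", "institution"))

# Cross-tabulate and reshape to one row per combination
coverage <- as.data.frame(table(sample_attrs$age_band, sample_attrs$living))
names(coverage) <- c("age_band", "living", "n")

captured     <- coverage[coverage$n > 0, ]
not_captured <- coverage[coverage$n == 0, ]

not_captured
```

The empty cells are only as complete as the level lists you declare. Configurations you never thought to list (here, for example, anyone outside the three living arrangements) will not show up, so the table supports the not-captured statement but does not replace the judgment behind it.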

4.3 Documenting a Sample in R

Most qualitative analyses spend more time on text than on R. Sampling is one of the exceptions: documenting the sample's structure is the kind of small-dataset reporting that R does well, and a one-page sampling matrix figure goes into your capstone appendix with very little work. The block below sketches the procedure for the loneliness dataset.

R: Document the loneliness dataset's sample structure

Assuming you have created a small participant_attributes.csv file with one row per participant and columns for pid, name, age, gender, life_stage, immigration_status, caregiving_role, and identity_notes, this block produces an age histogram, a gender-by-life-stage bar chart, and a printed sampling matrix you can drop into your capstone appendix.

library(tidyverse)

# Read the per-participant attribute file you built by hand from the transcripts
attrs <- read_csv("../term projects/HSCI_841/participant_attributes.csv")

glimpse(attrs)
# Should show 20 rows and the columns: pid, name, age, gender, life_stage,
# immigration_status, caregiving_role, identity_notes

# Quick age distribution
ggplot(attrs, aes(x = age)) +
  geom_histogram(binwidth = 10, fill = "#0B7B6B", colour = "white") +
  labs(title = "Age distribution of loneliness sample (n = 20)",
       x = "Age (years)", y = "Count") +
  theme_minimal()

# Variation across gender x life-stage (a 2-way summary of the sample's coverage)
attrs |>
  count(gender, life_stage) |>
  ggplot(aes(x = life_stage, y = n, fill = gender)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(title = "Sample coverage: gender x life-stage",
       x = NULL, y = "Number of participants") +
  theme_minimal()

# Sample matrix table (the kind that goes in your appendix)
attrs |>
  select(pid, age, gender, life_stage, immigration_status,
         caregiving_role, identity_notes) |>
  arrange(age) |>
  print(n = 20)

What success looks like: An age-histogram, a gender-by-life-stage bar chart, and a printed-to-console 20-row sampling matrix. Save the bar chart as a PNG; export the matrix as a table for your appendix. This is the sampling figure your capstone will cite.
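The saving and exporting steps might look like the sketch below. It is self-contained, so it builds a small hypothetical stand-in for `attrs` instead of reading your CSV; the file names and the output folder are assumptions, and `tempdir()` is used only so the example runs anywhere.

```r
library(ggplot2)

# Hypothetical stand-in for `attrs`; in practice use the tibble read from your CSV
attrs <- data.frame(
  pid        = c("X01", "X02", "X03", "X04"),
  age        = c(24, 37, 58, 71),
  gender     = c("woman", "man", "woman", "man"),
  life_stage = c("student", "recent loss", "caregiving", "retirement")
)

# Hypothetical output folder; replace tempdir() with your capstone appendix folder
out_dir <- tempdir()

# Keep the plot in an object so it can be saved explicitly
coverage_plot <- ggplot(attrs, aes(x = life_stage, fill = gender)) +
  geom_bar(position = "dodge") +
  coord_flip() +
  labs(title = "Sample coverage: gender x life-stage",
       x = NULL, y = "Number of participants") +
  theme_minimal()

plot_path   <- file.path(out_dir, "sample_coverage.png")
matrix_path <- file.path(out_dir, "sampling_matrix.csv")

# Save the figure as a PNG and the age-ordered matrix as a CSV table
ggsave(filename = plot_path, plot = coverage_plot,
       width = 6, height = 4, dpi = 300)
write.csv(attrs[order(attrs$age), ], matrix_path, row.names = FALSE)
```

Passing the plot object to `ggsave()` explicitly avoids depending on whichever plot happened to be drawn last.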

4.4 The Week 3 Capstone Milestone

The Week 3 capstone milestone integrates everything in this lesson. The deliverables are two: a 600-word sampling memo and a one-page sampling matrix. Together they are the first piece of the eventual methods section of your capstone paper.

Reflection

Of the five elements a defensible qualitative methods section owes the reader on sampling (strategy named & justified; recruitment procedure; variation captured; variation NOT captured; sample-size logic), which is the one you are most worried about getting right for your own capstone, and why? Be specific about which loneliness-dataset feature makes this element hardest.

Model answer: The most common honest answer is variation NOT captured, because most students are trained to defend what they did rather than to name what they did not do. The loneliness dataset's particular challenge is that the variation it captures is rich and visible (age, gender, immigration, caregiving, identity), which makes it tempting to over-claim. The not-captured side (under-18 youth, non-English speakers, unhoused adults, currently institutionalised adults, adults in acute crisis at the time of approach) is what the methods section has to articulate in a way that constrains the eventual paper's claims. Sample-size logic is the second-most-common answer because n = 20 feels small relative to the population-level instinct of a quantitative reader; the cure is to cite Guest et al. (2006) and Hennink & Kaiser (2022) and to name the saturation flavour you are claiming. A strong reflection picks one element, names what about the dataset makes it hard, and proposes a concrete practice that will get it right.

Knowledge Check — Section 4

Question 1: Which of the following is NOT one of the five elements a defensible qualitative methods section owes the reader on sampling?

A power calculation is not part of a defensible nonprobability methods section — the inferential target is not a population parameter, so there is no power calculation to do. The five elements are: strategy named & justified; recruitment; variation captured; variation NOT captured; and sample-size logic (which cites saturation evidence, not statistical power).

Question 2: What is Bernard, Wutich, and Ryan's concept of transferability in qualitative work?

Transferability is the qualitative analogue of generalizability: the burden is partly on the reader, but the analyst's job is to describe the sample transparently enough (through the methods section, the sampling matrix, and the variation-captured/not-captured statements) that the reader can make a defensible transferability judgment.

Question 3: Which deliverable is required for the Week 3 capstone milestone?

The Week 3 deliverable is a 600-word sampling memo (naming the strategy, defending it, listing variation captured and not captured, and citing the saturation literature) plus a one-page sampling matrix showing each participant's location on the dimensions of variation.
Section 5 of 5

Final Assessment

⏱ Estimated time: 25 minutes

Bringing It All Together

Lesson 3 has given you the sampling vocabulary your capstone methods section will be written in. The probability/nonprobability distinction (Section 1) locates qualitative sampling in its own logic rather than as a deficient cousin of probability work. The saturation framework (Section 2) replaces the power calculation with a defensible operational standard for nonprobability sample size, with empirical anchors from Romney/Weller/Batchelder, Guest et al. (2006), and Hennink & Kaiser (2022). The six-strategy taxonomy (Section 3) gives you the named tools you will use and combine in your own designs, with particular attention to the distinction between purposive and theoretical sampling and to the inferential machinery of RDS. And the methods-section template (Section 4) gives you the five-element discipline a reader of your eventual capstone paper will hold the work to.

What you take away from this lesson directly enables Lessons 4 (Qualitative Data Collection) and 5 (Themes & Codebooks). A defensible sample is the precondition for defensible data collection and defensible analysis; without it, the rest of the methodological discipline of HSCI 841 has nothing to stand on.

Key Takeaways from Lesson 3

  • There are two kinds of samples doing two kinds of jobs. Probability sampling estimates population magnitudes with calculable error. Nonprobability sampling characterizes a phenomenon through deliberate case selection. The two are complementary, not hierarchical.
  • Nonprobability sample size is governed by saturation, not power. The operational question is when additional cases stop producing new information. Code saturation arrives earlier than meaning saturation; theoretical saturation requires iterative sampling.
  • The empirical literature on saturation has converged. Romney/Weller/Batchelder show that 4–6 informants suffice for cultural-consensus tasks. Guest et al. (2006) found ~92% of codes by 12 interviews. Hennink & Kaiser (2022) reviewed 23 studies and report typical saturation between 9 and 17 interviews, varying with study design.
  • Six nonprobability strategies cover the field. Quota, purposive, convenience, network (snowball and RDS), theoretical, and key-informant. Each has its own logic, appropriate use, and characteristic failure modes.
  • Theoretical sampling is not the same as purposive sampling. Theoretical sampling is iterative and analysis-driven (Glaserian); purposive is a-priori and theory- or substance-driven. Conflating them is a common methods-section category mistake.
  • The HSCI 841 loneliness dataset is purposive with quota elements, leaning maximum-variation. It is not snowball; it is not theoretical sampling. Its strength is variation captured; its limits are the variation not captured.
  • A defensible methods section owes the reader five things on sampling: strategy named and justified; recruitment procedure; variation captured; variation NOT captured; sample-size logic.

Core Concepts Reviewed

Section 1: Probability vs. nonprobability sampling as different jobs (extensity vs. intensity); the curatorial logic of nonprobability designs; why “nonprobability is a bad probability sample” is the wrong reading; why the sample-size question has to be different.

Section 2: Saturation as informational redundancy; three flavours (theoretical, code, meaning); the Romney/Weller/Batchelder 4-to-6 rule and its cultural-consensus origin; Guest, Bunce & Johnson (2006) and the 12-interview threshold; Hennink & Kaiser's 2022 systematic review and the 9-to-17 range; what saturation gives you the right to claim.

Section 3: Quota sampling; purposive/judgment sampling and Patton's sub-types (max-variation, homogeneous, extreme-case, critical-case, typical-case, confirming/disconfirming); convenience sampling (defensible and indefensible uses); snowball sampling vs. respondent-driven sampling and Heckathorn's inference machinery; theoretical sampling (Glaserian) vs. purposive sampling; key-informant sampling (replacing vs. supplementing broader samples).

Section 4: The five elements of a defensible methods section on sampling; what the methods section does NOT have to do; transferability vs. statistical generalizability; documenting a sample in R (sampling matrix, gender-by-life-stage coverage figure); the Week 3 sampling-memo capstone milestone.
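
As a reminder of what the sampling matrix and coverage check from Section 4 involve, here is a minimal sketch in Python. The participant IDs, the dimension values, and the helper name `coverage` are hypothetical and are not drawn from the HSCI 841 loneliness dataset. The sketch cross-tabulates two dimensions and lists the cells with no participants.

```python
from collections import Counter
from itertools import product

# Hypothetical sampling matrix: one row per participant,
# one key per dimension of variation.
participants = [
    {"id": "P01", "gender": "woman",     "life_stage": "young adult"},
    {"id": "P02", "gender": "man",       "life_stage": "young adult"},
    {"id": "P03", "gender": "woman",     "life_stage": "midlife"},
    {"id": "P04", "gender": "nonbinary", "life_stage": "midlife"},
    {"id": "P05", "gender": "man",       "life_stage": "older adult"},
    {"id": "P06", "gender": "woman",     "life_stage": "older adult"},
]


def coverage(rows, row_dim, col_dim):
    """Count participants in each (row_dim, col_dim) cell and list the
    cells, among levels that appear in the sample, that are empty."""
    counts = Counter((r[row_dim], r[col_dim]) for r in rows)
    row_levels = sorted({r[row_dim] for r in rows})
    col_levels = sorted({r[col_dim] for r in rows})
    empty = [cell for cell in product(row_levels, col_levels)
             if counts[cell] == 0]
    return counts, empty


counts, empty = coverage(participants, "gender", "life_stage")

# Print the matrix itself, then the uncovered combinations.
for row in participants:
    print(f'{row["id"]:<5}{row["gender"]:<11}{row["life_stage"]}')
for gender, stage in empty:
    print(f"no participant: {gender} / {stage}")
```

An empty cell here only flags a missing combination of levels that already appear in the sample. Groups that were never recruited at all leave no trace in the table, so the analyst still has to name them explicitly under "variation NOT captured".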

The final reflection below asks you to commit to a sampling self-discipline for your capstone. There is no single right answer; the goal is to leave the lesson with an articulated stance on what your sample can and cannot claim.

Final Reflection

In one paragraph, articulate the stance you intend to take on sampling in your capstone paper. Specifically: name the sampling strategy of the HSCI 841 loneliness dataset, identify the kind of saturation you will claim, and commit to one specific kind of variation that you will name as not captured in your methods section.

Model answer: A strong stance is concrete and disciplined. Example: “For my capstone I will describe the HSCI 841 loneliness dataset as a purposive sample with quota elements, organized around a maximum-variation logic across age, gender, life-stage, immigration status, caregiving role, and identity; it is explicitly not theoretical sampling and not snowball. I will claim code saturation on the basis of the cumulative new-code-per-transcript curve in my coding, citing Guest et al. (2006) and Hennink & Kaiser (2022) as my empirical anchors. I will be cautious about claiming meaning saturation, because the configurations my sample captures are heterogeneous and I expect meaning to continue developing past the twentieth transcript. I will name as explicitly NOT captured the loneliness of unhoused adults and adults in long-term institutional psychiatric care, both of which the literature suggests have configurations my dataset does not represent. This stance is what my eventual methods section will rest on, and my findings claims will be constrained to what these design choices defensibly support.” A strong answer names the strategy, names the saturation flavour with evidence, and names what was not captured with specificity.

Final Assessment — Lesson 3: Sampling in Qualitative Research (15 Questions)

Question 1: What is the defining feature of a probability sample?

A probability sample is defined by knowable, non-zero selection probabilities for every member of the target population. That property — not the randomness of any single step — is what makes statistical inference to the population possible.

Question 2: Bernard, Wutich, and Ryan characterize the goal of nonprobability sampling as:

Nonprobability sampling is built to characterize a phenomenon (intensity) rather than estimate a population magnitude (extensity). The selection logic is curatorial; the inferential target is the phenomenon's variation, not the population's distribution.

Question 3: Which empirical study is most associated with the “12-interview threshold” for saturation in interview research?

Guest, Bunce, and Johnson's 2006 Field Methods paper reported 73% of codes identified after 6 interviews and 92% after 12, establishing the 12-interview heuristic that has become the most cited threshold in the qualitative-methods literature.

Question 4: The Romney/Weller/Batchelder 4-to-6 rule applies most directly to:

The 4-to-6 rule comes from cultural-consensus theory (Romney, Weller & Batchelder 1986) and applies cleanly to cultural-consensus tasks where the group is culturally coherent and the domain is structured. It does not generalize without qualification to depth interviewing about contested or heterogeneous phenomena.

Question 5: Hennink and Kaiser's 2022 systematic review found that saturation in interview research is typically reached:

Hennink & Kaiser (2022) synthesized 23 empirical studies and reported that saturation is typically reached between 9 and 17 interviews, with the exact number depending on the breadth of the research question, the homogeneity of the sample, and the experience of the interviewers.

Question 6: Which of the following is NOT one of the six nonprobability sampling strategies in Bernard, Wutich, and Ryan's Chapter 3?

Stratified random sampling is a probability sampling design, not one of the six nonprobability strategies. The six are: quota, purposive/judgment, convenience, network (snowball / RDS), theoretical, and key-informant.

Question 7: The HSCI 841 loneliness dataset is best characterized as:

The 20 transcripts were assembled in a single phase with deliberate variation across age, gender, life-stage, immigration status, caregiving role, and identity. The sample is purposive with quota elements, leaning maximum-variation. It is not snowball (no participant-driven referral chains), not RDS, and not theoretical sampling (no iterative emergence from analysis).

Question 8: The key methodological feature that distinguishes respondent-driven sampling (RDS) from ordinary snowball sampling is:

RDS extends snowball sampling with the dual-incentive, limited-coupon, tracking-and-weighting machinery developed by Heckathorn (1997, 2002), which under network-structure assumptions allows population-level inference with calculable error bounds. Snowball sampling has no comparable inferential apparatus.

Question 9: The critical methodological difference between theoretical sampling and purposive sampling is:

Theoretical sampling (Glaser & Strauss 1967) is iterative: you collect, analyse, identify what additional data would extend the theory, sample those, re-analyse, and continue. Purposive sampling is a-priori: variation targets and selection criteria are set before recruitment. Conflating the two is a common methods-section category mistake.

Question 10: Hennink, Kaiser, and Marconi (2017) distinguished code saturation from meaning saturation. Which of the following best describes the finding?

Hennink, Kaiser, and Marconi (2017) showed that code saturation (no new codes appearing) tends to arrive relatively early, while meaning saturation (substantive deepening of existing codes) requires more interviews. A methods section claiming saturation should specify which kind.

Question 11: Which of the following is a defensible use of convenience sampling, according to Bernard, Wutich, and Ryan?

Convenience sampling has defensible uses (piloting, training, equipment testing, community rapport-building) when transparently reported and not extended to claims the data cannot support. Disguising a convenience sample as something more representative is what Bernard, Wutich, and Ryan critique.

Question 12: Key-informant interviews can either replace or supplement broader sampling. An example of supplementing would be:

A supplementing use combines lay-participant interviews with key-informant interviews to provide complementary perspectives. The lay interviews give the phenomenon from inside; the key informants give system-level context.

Question 13: Which of the following is NOT one of the five elements a defensible qualitative methods section owes the reader on sampling?

A statistical power calculation is not part of a defensible nonprobability methods section, because the inferential target is not a population-level parameter. The five elements are: strategy named & justified; recruitment procedure; variation captured; variation NOT captured; and sample-size logic (citing saturation evidence).

Question 14: The Week 3 capstone milestone deliverables are:

Week 3 asks for a 600-word sampling memo (naming the strategy, defending it, listing variation captured and not captured, citing the saturation literature) plus a one-page sampling matrix showing each participant's location on the dimensions of variation. The full coded analysis comes later; the literature review was Week 2.

Question 15: Saturation, when properly assessed and reported, gives the analyst the right to claim:

Saturation gives the right to claim within-sample coverage of major categories and configurations — not population-level completeness, not prevalence, and not statistical representativeness. It also does not exempt the analyst from describing what variation the sample did not capture.

Congratulations!

You have successfully completed Lesson 3: Sampling in Qualitative Research.

You can now distinguish probability from nonprobability sampling, operationalize saturation with empirical anchors, name and combine the six nonprobability strategies, distinguish theoretical sampling from purposive sampling, and defend a qualitative sample in writing. The Week 3 sampling memo and matrix are your first piece of methods-section work; submit them before the Module 4 lecture.

Next up — Lesson 4: Qualitative Data Collection, which moves from the design vocabulary of sampling into the operational craft of interviewing, focus groups, observation, and field notes.

Reference

Glossary — Sampling Concepts, Strategies & Key People

This glossary collects the key concepts, sampling strategies, and people introduced in Lesson 3. Use it as a reference while you work through the material, or as a review before the final assessment.

Core Concepts
Probability Sample: A sample in which every member of a defined population has a known, non-zero probability of being selected. The knowability of selection probabilities is what makes statistical inference to the population possible.
Nonprobability Sample: A sample in which selection probabilities are not knowable and inference to a defined population is not the analytic goal. The selection logic is curatorial; the inferential target is the phenomenon's variation, not the population's distribution.
Extensity vs. Intensity: Bernard, Wutich, and Ryan's framing of the two-jobs distinction. Probability sampling is about extensity (the breadth of a phenomenon in a population); nonprobability sampling is about intensity (the depth, detail, and configuration of the phenomenon).
Saturation: The point at which collecting additional cases stops producing new information relevant to the analytic question. The operational replacement, in nonprobability sampling, for the statistical power calculation used in probability sampling.
Theoretical Saturation: Glaser & Strauss's original usage: the point at which additional cases stop producing new categories, properties, or relationships in the developing theory. Tightly tied to theoretical sampling. The most demanding kind of saturation.
Code Saturation: The point at which additional cases stop producing new codes in the codebook. Tends to arrive relatively quickly: the major themes appear in the first several transcripts. What most empirical saturation studies measure.
Meaning Saturation: The point at which additional cases stop deepening the analyst's understanding of the codes already identified. Harder and slower than code saturation; requires more interviews (typically 16–24 in Hennink, Kaiser & Marconi 2017).
Transferability: The qualitative analogue of statistical generalizability: the reader's ability to assess whether the patterns identified are likely to hold in other settings or populations, given a transparent description of the sample. The analyst's job is to enable a defensible transferability judgment, not to make it for the reader.
Sampling Matrix: A table listing each participant and the values of each variation dimension (age, gender, life-stage, etc.). A reader looking at the matrix can immediately see what variation the study captured. The Week 3 capstone deliverable includes a one-page matrix for the loneliness dataset.
Sampling Strategies
Quota Sampling: Deliberate construction of a sample to match pre-specified target cells across dimensions of variation. Strength: guaranteed coverage of the named dimensions. Weakness: within-cell selection is typically convenience-based.
Purposive (Judgment) Sampling: Deliberate selection of cases on the basis of theoretical or substantive criteria. Patton catalogued more than a dozen sub-types: maximum-variation, homogeneous, extreme-case, critical-case, typical-case, confirming/disconfirming, and others.
Maximum-Variation Sampling: A sub-type of purposive sampling in which cases are deliberately selected to span the variation in the phenomenon. The HSCI 841 loneliness dataset's dominant logic.
Convenience Sampling: Recruitment of whoever is available, accessible, or willing. Defensible uses: piloting, training, equipment testing, community rapport-building. Indefensible uses: substantive published studies that disguise a convenience sample as something more representative.
Snowball Sampling: Network-based recruitment in which initial participants (seeds) refer additional participants, who in turn refer others. The standard tool for reaching hidden, stigmatized, or relationship-organized populations. Not appropriate for prevalence estimation.
Respondent-Driven Sampling (RDS): An extension of snowball sampling (Heckathorn 1997) that adds dual incentives, limited recruitment coupons, and a tracking-and-weighting framework. Under specific network-structure assumptions, RDS estimates can be weighted to approximate population-level inference.
Theoretical Sampling: The iterative, emergent sampling strategy of grounded theory (Glaser & Strauss 1967). Each round of sampling is shaped by what the previous round revealed in analysis. Distinct from purposive sampling, which is a-priori.
Key-Informant Sampling: Selection of individuals because they are unusually knowledgeable, articulate, or strategically located with respect to the phenomenon. Can replace broader sampling (for system or institution studies) or supplement it (for combining lay and expert perspectives).
Key People
H. Russell Bernard, Amber Wutich & Gery W. Ryan: Authors of Analyzing Qualitative Data: Systematic Approaches (2nd ed., 2017). Chapter 3, on sampling, is the structural basis for this lesson.
Barney G. Glaser & Anselm L. Strauss: Co-authors of The Discovery of Grounded Theory (1967), the foundational text for grounded theory and the original source of theoretical sampling and theoretical saturation. Glaser and Strauss split methodologically in the 1980s; for this course, “theoretical sampling” refers to the Glaserian original.
A. Kimball Romney, Susan C. Weller & William H. Batchelder: Authors of the cultural-consensus theory paper (1986) that established the mathematical 4-to-6 rule: 4–6 knowledgeable informants suffice to recover a shared cultural model for a culturally coherent group. The deepest theoretical anchor for the small-n defensibility of qualitative sampling.
Greg Guest, Arwen Bunce & Laura Johnson: Authors of the 2006 Field Methods empirical-saturation study that established the 12-interview heuristic (92% of codes identified after the first 12 interviews in a corpus of 60 women's-health interviews). The most cited empirical anchor for qualitative sample size.
Monique Hennink & Bonnie Kaiser: Authors of the 2022 systematic review (Social Science & Medicine) of empirical tests of saturation across 23 studies, finding typical saturation between 9 and 17 interviews depending on study design. With Marconi (2017) they also established the code-saturation vs. meaning-saturation distinction.
Douglas D. Heckathorn: Sociologist who developed respondent-driven sampling (RDS) at Cornell in the late 1990s (Heckathorn 1997, 2002). RDS's dual-incentive, coupon-tracked, weighting-based framework is the most serious attempt in the methods literature to turn network sampling into a design with calculable inferential properties.
Michael Quinn Patton: Evaluation methodologist whose Qualitative Research & Evaluation Methods (4th ed., 2015) catalogued more than a dozen sub-types of purposive sampling (maximum-variation, homogeneous, extreme-case, critical-case, typical-case, confirming/disconfirming, and others) used in applied health and evaluation research.