Qualitative Data Collection
Indirect Observation, Direct Observation, Elicitation, Transcription & Field Notes
Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University
Learning objectives for this lesson:
- Distinguish indirect observation, direct observation, and elicitation as the three families of qualitative data collection.
- Identify the analytic uses (and limits) of behavior traces, archival records, and secondary qualitative datasets.
- Locate yourself on the participant-observation spectrum from complete observer to complete participant, and explain what each position can and cannot see.
- Compare unstructured, semi-structured, and structured interviews on what they elicit, how they are designed, and what they cannot reach.
- Explain when focus groups outperform individual interviews and what moderator skills they demand.
- Preview the cultural-domain elicitation methods (free listing, pile sorts, triads, rankings) reserved for Module 12.
- Treat transcription as an analytic act — recognising verbatim conventions, Jefferson notation (preview), intelligent verbatim, and clean verbatim, and what each choice forecloses.
- Write disciplined field notes using the Emerson, Fretz & Shaw progression: jottings → expanded notes → analytic memos.
- Complete the Week 4 capstone milestone: a dataset familiarisation log covering 6–8 transcripts.
This course was developed by Kiffer G. Card, PhD, as a companion to Bernard, H. R., Wutich, A., & Ryan, G. W. (2017). Analyzing Qualitative Data: Systematic Approaches (2nd ed.), Chapter 4 (pp. 63–100). SAGE.
Indirect Observation — Behavior Traces, Archives & Secondary Analysis
Introduction and Overview
Most students arrive at a qualitative methods course assuming that “qualitative data collection” means interviews. Interviews do eventually dominate the chapter, but Bernard, Wutich, and Ryan (2017, pp. 63–100) open Chapter 4 with a deliberately broader frame. The data-collection landscape, they argue, is organised around three families: indirect observation (where the researcher never interacts with the people whose behavior is being studied), direct observation (where the researcher is present and looks at what is happening), and elicitation (where the researcher actively asks people to produce data through interviews, focus groups, free lists, or written responses). The three families are not a hierarchy. They are tools that answer different kinds of questions, that cost different amounts of money and time, and that come with different ethical obligations.
This section concerns the first family, indirect observation, because it is the one most often overlooked and the one with the most counterintuitive payoff. Indirect data are data that already exist in the world — left by past behavior, stored in institutional archives, or generated by other researchers for other purposes — and that the analyst harvests without ever asking a participant a question. Eugene Webb and colleagues coined the term unobtrusive measures for this family in their 1966 classic of the same name, and it remains one of the most fertile underused regions of qualitative methodology in public health.
Learning Objectives for Section 1
- Define indirect observation and distinguish it from direct observation and elicitation.
- Identify the three sub-types: behavior traces, archival records, and secondary qualitative datasets.
- Give a public-health example for each, including at least one that draws on the same loneliness phenomenon your capstone analyses.
- Explain the analytic advantages of indirect data (no reactivity, longitudinal reach, low cost) and the limits (selection biases, missing context, the impossibility of probing).
- Articulate the ethical issues distinctive to secondary qualitative analysis.
1.1 The Logic of Indirect Observation
Indirect observation begins with a simple insight: people leave traces. They wear paths into grass between buildings, they leave fingerprints on museum glass, they fill landfills with packaging, they post on social media, they generate medical records, they file tax returns, they get arrested, they buy products, they vote. Each of those traces is data, and the discipline of harvesting them is, in principle, a qualitative discipline — the analyst is reading the world the way an interviewer reads a transcript. Eugene Webb, Donald Campbell, Richard Schwartz, and Lee Sechrest's Unobtrusive Measures: Nonreactive Research in the Social Sciences (1966; second edition 2000) is the canonical methodological statement. Bernard, Wutich, and Ryan treat the Webb tradition as foundational and devote the opening pages of Chapter 4 to it for two reasons.
First, indirect data are nonreactive. When you interview someone about their loneliness, they know they are being studied; what they say is shaped by the relationship, by the setting, by the desire to appear coherent, by the willingness to disclose. When you analyse a transit log showing how often someone uses public transit alone versus in company, the data have no such reactivity — the person was not performing for you. Reactivity is the qualitative-research equivalent of measurement bias, and indirect data are, in this respect, methodologically cleaner than elicited data on the same phenomenon.
Second, indirect data are typically longitudinal in ways that elicitation cannot match. An interview captures one moment, perhaps two if you do a follow-up. An archival record can stretch back decades. A behavior trace can accumulate over the entire life of the trace-generating activity. If you want to know how loneliness in a community has changed across thirty years, no interview design can give you that — but a content analysis of obituaries, of personal ads, of community-newspaper letters to the editor, of city-council minutes mentioning “isolation” can.
The cost of these advantages is a cost we will return to throughout the section: with indirect data you cannot ask why. You can see the pattern. You cannot ask the person who produced it what it meant to them. Combining indirect observation with elicitation — the two-source design — is often the strongest move.
1.2 Behavior Traces
- Behavior traces: physical evidence left by behavior, such as worn carpet paths, garbage composition, library-book usage patterns, app analytics, and swipe-card logs. Webb et al.’s 1966 Unobtrusive Measures is the founding text. Useful when self-report is unreliable, sensitive, or socially constrained.
- Archival records: pre-existing records made for purposes other than research, such as clinical chart notes, government minutes, court transcripts, news media archives, parish registers, and organizational policy documents. Almost every public health office sits on a treasure of underused archival data.
- Secondary qualitative datasets: re-analysis of qualitative datasets collected by other researchers. The UK Data Service’s Qualitative Data Archive holds 1,000+ qualitative studies; the Murray Research Archive at Harvard holds many more. This sub-type raises important ethical considerations around consent and confidentiality.
- When indirect methods are preferred: (1) the behavior is sensitive and self-report is suspect, (2) the population is hard to reach but leaves records, (3) historical reconstruction is needed, or (4) you want to triangulate against direct measures. Indirect methods are often combined with direct methods rather than replacing them.
Behavior traces are physical or digital records left by past action. Webb and colleagues divided them into two sub-types: accretion measures (things that have built up — graffiti, worn paths, garbage, fingerprints) and erosion measures (things that have worn down through use — the worn linoleum in front of a museum exhibit indicating how popular it was, the most-thumbed page of a Bible in a hotel room, the fading paint on a heavily-used playground). The distinction is more conceptual than practical; the analytic logic in both cases is the same.
In public-health research, behavior traces are useful for the questions that respondents cannot or will not answer accurately. Two examples make the point. A study of harm-reduction service utilisation in the Downtown Eastside used the contents of needle-exchange returns — the actual physical objects, not the user-reported counts — to estimate injection-event volumes. The traces were more reliable than self-report. A 2020 study of children's playground use during the early COVID-19 lockdowns used aerial photographs of trampled grass and ball-sport patterns to characterize informal play in periods when interviews and surveys were impossible.
The qualitative move with behavior traces is interpretive: what does this trace mean? The answer requires the same disciplined work as coding a transcript — specify what you are looking for, code consistently across cases, and be transparent about your interpretive leaps. A worn path between two buildings means someone walked there many times. It does not, by itself, tell you who they were, why they preferred that route, or whether they enjoyed it. The trace is the starting point of analysis, not its conclusion.
A loneliness example
If you wanted to study loneliness without interviewing anyone, behavior traces could give you a great deal. Candidate traces include library-borrowing records (which books are checked out by patrons who borrow alone vs. in groups), grocery-store self-checkout queue patterns at 9 p.m. on a Sunday (the time of week Maya, in your dataset, names as the loneliest), the contents of the “food for one” freezer section across socioeconomically different neighbourhoods, and the timestamps of who turns on their porch light first at dusk. None of these is conclusive on its own. Combined with the kinds of accounts in your capstone transcripts, they give the qualitative claim a quantitative shadow that makes it more defensible to a public-health policy audience.
1.3 Archival Data
Key insight: The archival selection bias is invisible
Archival data are seductive: they already exist, they are often free, they cover periods you could not reach prospectively. But every archive is a sample of what was recorded by people empowered to record. Clinical charts overrepresent compliant patients seen by attentive clinicians; court records overrepresent successful prosecutions; newspaper archives overrepresent stories that fit editorial framings of the era. The selection bias is not a flaw to control for — it is a structural feature of the data. A defensible archival analysis names the selection explicitly.
Archival data are the institutional records people, organisations, and governments generate as a by-product of operating. The category is large. It includes:
- Institutional records — clinic charts, school-attendance registers, court transcripts, council minutes, child-welfare case files, ambulance dispatch logs, organ-donation registries.
- Historical documents — diaries, letters, newspapers, pamphlets, public-health reports from earlier eras, autopsy reports, missionary records, parish death books.
- Media corpora — television news transcripts, podcast episodes, social-media posts, advertising, government information campaigns.
- Personal documents — published memoirs, blogs, online support-group threads, and GoFundMe campaign descriptions that reach researchers through consent or through the public domain.
Bernard, Wutich, and Ryan are explicit that archival work has historically been treated as a sub-discipline of history rather than as qualitative methodology, and that the artificial boundary has cost the social sciences. The methods of qualitative analysis you will learn in Modules 5 through 12 — theme identification, content analysis, narrative analysis, grounded theory — all apply to archival data. A clinic chart can be coded; a newspaper editorial can be theme-analysed; a corpus of council minutes can be content-analysed for keyness over time. The analytic discipline is exactly the same as for interview transcripts; the data just arrived without an interviewer in the picture.
| Archival source | What it can tell you about loneliness | What it cannot tell you |
|---|---|---|
| Coroner reports of solitary deaths | Patterns in who dies alone, in what neighbourhoods, in what seasons, after how long undiscovered | Whether the deceased felt lonely, what their social life looked like before death |
| Newspaper personal ads (1980s–2010s) | Historical changes in how people advertise loneliness and seek connection; what language is and is not sayable | Whether the advertisers found what they were looking for |
| City-council minutes mentioning “social isolation” | When loneliness became politically nameable; what kinds of solutions were proposed and by whom | Whether the proposed solutions worked for the people the policy targeted |
| Online bereavement-support forum threads | How bereaved people describe loneliness to one another, what advice circulates, what stays unsaid | Whether non-posting bereaved people experience loneliness the same way (massive sampling concern) |
| Family-doctor chart notes (with consent) | How clinicians record patient loneliness; what gets coded vs. left in free text; medication histories | The patient's own framing; what they did not tell the doctor |
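As one illustration of content-analysing an archive over time (the council-minutes row above), the following is a minimal Python sketch that counts how often loneliness-related terms appear per year. The corpus, the term list, and the function name `mentions_by_year` are all hypothetical and invented for this example; a real study would document how the archive was sampled and justify its search terms.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: (year, text of one set of council minutes).
# These strings are invented for illustration; they are not real records.
MINUTES = [
    (1995, "Council discussed road repairs and the seniors' centre budget."),
    (2005, "A delegate raised social isolation among seniors living alone."),
    (2015, "Staff report on social isolation; motion on a loneliness and isolation strategy."),
]

# Assumed search terms, matched as whole words and case-insensitively.
TERMS = re.compile(r"\b(isolation|isolated|loneliness|lonely)\b", re.IGNORECASE)


def mentions_by_year(corpus):
    """Return {year: number of term matches}, summed over that year's documents."""
    counts = Counter()
    for year, text in corpus:
        # Adding zero still registers the year, so silent years stay visible.
        counts[year] += len(TERMS.findall(text))
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for year, n in mentions_by_year(MINUTES).items():
        print(year, n)
```

A count like this only shows when a term became sayable in the record. It says nothing about who was empowered to write the minutes, which is the selection question the analysis still has to name.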
The selection-bias problem
The big methodological concern with archival data is that they were not produced for your study. They were produced for someone else's purpose, by someone else, with someone else's selection criteria, in someone else's institutional context. A coroner's report describes the deaths the coroner saw, written in the way the coroner was trained to write them. Newspaper personal ads were written by people who could afford to place them and read by people who read that newspaper. Online forum posts were written by people who used that forum, in English, with the literacy to compose a post, and who chose to post rather than to lurk. Each of these selection mechanisms is a population-defining filter, and your analytic claims have to be calibrated to what the data can actually support.
Bernard, Wutich, and Ryan's recommendation is conservative: treat archival data as one source among others, identify what the source can and cannot represent, write the limitations into your methods section explicitly, and resist the temptation to generalise beyond the population whose traces actually entered the archive.
1.4 Secondary Analysis of Existing Qualitative Datasets
Secondary qualitative analysis — using interview or focus-group data collected by other researchers for other studies — is the youngest of the three indirect-observation sub-types and the one growing fastest. The Qualitative Data Repository at Syracuse, the UK Data Service's Qualitative Bank, the SAGE Research Methods Cases collection, and a growing number of journal data-sharing requirements are making qualitative datasets available the way quantitative datasets have been available for decades.
The methodological promise is real. A graduate student who could never afford to do twenty new interviews on, say, vaccine hesitancy in rural Manitoba can re-analyse a deposited 2018 dataset on the same topic, asking new questions of the existing transcripts. A researcher building a meta-ethnography (a synthesis of multiple qualitative studies, the qualitative analogue of a meta-analysis) can use the deposited transcripts rather than only the published papers about them.
The methodological hazards are also real. Bernard, Wutich, and Ryan flag four. First, the original interview guide constrained what got asked; if your research question requires a probe the original interviewer never offered, the data are silent on it. Second, the analyst was not in the room and cannot recover the relational and embodied context the original researcher had. Third, the participants consented to one study; using their words for a different study requires either explicit re-consent (often impossible) or a defensible ethics argument that the secondary use is sufficiently consonant with the original consent. Fourth, the analytic frame you bring may be inappropriate for the kind of relationship the original interviewer built.
Bernard, Wutich, and Ryan's stance is permissive but cautious: secondary qualitative analysis is legitimate and increasingly necessary, but it should be done with explicit acknowledgement of what was lost and what was not collected, and with humility about interpretations that go beyond what the original interviews can support.
A word about your capstone dataset
The HSCI 841 loneliness dataset is itself a kind of pre-collected dataset — the transcripts were generated as a teaching corpus and you are doing what is, in effect, secondary analysis. The Week 4 milestone (at the end of this lesson) asks you to read the transcripts deeply, in part because that deep reading is the qualitative analogue of getting to know a quantitative dataset before you model it. You cannot ask Maya what she meant when she said the SkyTrain at 9 p.m. on Sunday was the loneliest place in her life. The transcript is the data. Knowing the data well, before you start coding it, is the first move of disciplined secondary analysis.
1.5 When to Use Indirect Methods
Indirect observation is the right first choice when one or more of the following is true: (a) the population is hard to reach by elicitation (e.g., the deceased, people in conflict zones, people who would not consent to interview); (b) the time horizon is long (decades or centuries); (c) reactivity is a serious concern (the topic is socially sensitive or self-presentation will distort responses); (d) the budget is small; (e) the research question is about an institutional or population-level pattern rather than about lived experience.
It is the wrong choice when you need to know what something felt like from the inside, when you need to probe a participant's understanding, or when the trace itself is missing the most interesting part of the phenomenon. For most loneliness research, indirect observation is a complement to elicitation, not a replacement. The chair Helen sits in (an accretion measure of where she spends her time) tells you something. What she says about the chair tells you something else. The disciplined qualitative researcher uses both.
Reflection
You are doing a qualitative study of loneliness in long-term-care facilities. Interviews with residents are slow, expensive, and ethically delicate. What indirect data sources might supplement your interview corpus? Name two, and for each, say what it can and cannot tell you.
Question 1: The Webb et al. (1966) term unobtrusive measures refers to:
Question 2: An accretion measure in the Webb tradition is:
Question 3: Which of the following is NOT a methodological hazard of secondary qualitative analysis identified by Bernard, Wutich, and Ryan?
Direct Observation — Participant Observation, Monitoring & Time Allocation
Introduction and Overview
Direct observation is the family of methods in which the researcher is physically (or virtually) present and watches behavior happen. It is the oldest qualitative methodology in cultural anthropology — ethnographers have been doing it for more than a century (Hammersley & Atkinson, 2019) — and it remains a workhorse of contemporary public-health research, especially in implementation science, clinical-encounter research, organisational ethnography, and Indigenous health. Bernard, Wutich, and Ryan dedicate pp. 71–79 of Chapter 4 to it, organising the discipline around three sub-types: participant observation (immersive engagement over time), continuous monitoring (structured watching of specific behavior streams), and spot observation / time-allocation studies (sampled snapshots of who is doing what, when).
The defining intellectual move in all three sub-types is what Spradley (1980) called “the discipline of seeing.” Ordinary perception is selective and habitual; it filters out the familiar and concentrates on what is novel or salient. Disciplined observation reverses the filter. The observer trains themselves to attend to what is so ordinary it has become invisible — the layout of chairs, the rhythm of who speaks and who waits, the timing of who arrives and who leaves, what is on the walls, what people wear, what they eat, when they touch and when they don't. The discipline is not natural. It is trained.
Learning Objectives for Section 2
- Distinguish the three sub-types of direct observation: participant observation, continuous monitoring, and spot observation.
- Locate yourself on Spradley's participant-observation spectrum from complete observer to complete participant, and explain what each position can and cannot see.
- Describe the discipline of “explicit awareness” (Spradley) and the techniques used to build it.
- Explain when continuous monitoring outperforms self-report (and vice versa).
- Set up a basic time-allocation study.
2.1 Participant Observation — The Spradley Spectrum
- Complete observer: the researcher observes without participating. Often invisible to participants. Used for natural-behavior studies in public spaces. Tradeoff: minimal influence on the scene; minimal access to meanings, motivations, or backstage talk.
- Observer as participant: the researcher participates marginally but is identified as a researcher. Brief field visits, structured site walk-throughs, episodic observation. Tradeoff: easier to enter and leave; less embedded knowledge.
- Participant as observer: the researcher is a participant in the setting and openly identified as also doing research. The most common form in modern ethnographic and implementation research. Tradeoff: deep access; identity management work; complex consent.
- Complete participant: the researcher is fully embedded as a member, with research role hidden. Now ethically constrained (most IRBs will not approve without strong justification). Historical examples: studies of religious cults, criminal organizations, healthcare-system ‘mystery shopper’ designs.
- Continuous monitoring vs. spot observation: continuous monitoring observes one focal person or process across the full activity. Spot observation / time allocation samples many people at random instants to estimate how time is spent across a population. The choice is between depth-on-one and breadth-across-many.
Participant observation (DeWalt & DeWalt, 2011) is the ethnographic core of qualitative methodology: prolonged engagement in a setting, during which the researcher both watches and (to varying degrees) participates. The classic anthropological model is Bronislaw Malinowski's Trobriand Islands fieldwork — years of residence, daily participation, slow accumulation of insight. The same logic applies to a public-health researcher who spends six months attending the weekly meetings of a peer-support group, eating in the staff lounge of a community clinic, or volunteering at a needle exchange. The method's power comes from time: prolonged engagement reveals patterns invisible to a one-time interview.
James Spradley's Participant Observation (1980) is the canonical methodological treatment. Spradley organised participant-observation along a five-point spectrum (building on Gold, 1958) that names the trade-offs at each position:
| Position | What it looks like | What it sees well | What it cannot see |
|---|---|---|---|
| Complete observer | Researcher is fully external to the setting; people may not even know they are being observed (one-way mirror, public-space ethnography). | Behavior in its “natural” state, unaffected by the researcher. | Meaning. The observer cannot ask why; cannot probe. |
| Observer as participant | Researcher is known to be a researcher; spends time in the setting but does not take on participant roles. The default of most contemporary clinic-based ethnography. | Routine practice; staff-side dynamics; the ordinary working of the setting. | What it feels like to be a patient/client/insider; emic categories may stay opaque. |
| Participant as observer | Researcher takes a participant role (volunteer, intern, trainee) and observes from inside it. Known to be a researcher. | Insider perspective on the role taken; embodied knowledge of what the role requires. | Other roles in the setting; risk of “going native” and losing the analyst's distance. |
| Complete participant | Researcher's identity as researcher is hidden; they are seen as a member of the setting. Now ethically restricted in most public-health research. | The fullest insider view; what members say to each other when they don't think they are being studied. | Cannot probe analytically; serious ethical problems with informed consent. |
| Nonparticipation | Researcher observes only documentary or archival traces of the setting — the limit case bordering on indirect observation. | Patterns at scale; longitudinal change. | Anything that requires being there. |
The position you choose for a given study is a design decision with consequences. Bernard, Wutich, and Ryan recommend, for most contemporary public-health work, the middle two positions — observer as participant or participant as observer — because they balance access against ethical clarity (Madden, 2017). The complete-participant position has a long history in mid-twentieth-century sociology (Festinger's When Prophecy Fails, Humphreys' Tearoom Trade) but is now restricted by institutional research-ethics boards in most jurisdictions.
2.2 Building Explicit Awareness
Spradley argued that the central skill of participant observation is what he called explicit awareness: the trained capacity to notice things that ordinary perception filters out. The skill is the qualitative equivalent of learning to read an ECG — the lines on the paper are the same, but the trained eye sees what the untrained eye misses. Spradley described several specific techniques.
The everyday-object exercise. Spend ten minutes describing in writing the contents of a clinic waiting room you have been in dozens of times. Most students struggle for the first three minutes, then find a rhythm. The exercise reveals how little of the familiar setting was ever consciously registered. The training value is that subsequent visits to the setting will register more.
The naive-observer exercise. Pretend you have just arrived from a culture that has never seen a Canadian hospital before. Describe what is happening as if you do not know what a stethoscope is, what a hospital gown is for, why people are wearing masks. The discipline is to make the familiar strange. This is the canonical anthropological move and it is the engine of ethnographic insight.
The widening-attention exercise. Pick a single feature of a setting — say, who speaks first at the start of each meeting — and track it across multiple sessions. Then add a second feature: who interrupts whom. Then a third: where people position themselves around the table. Layered attention builds capacity that single-feature attention cannot.
The body-knowledge exercise. Notice the bodily sensations the setting produces in you. The smell. The temperature. The acoustics. The pace. Your own discomfort or ease. Ethnographic insight is partly embodied; what your body notices is data, even if it never makes it into the final report verbatim.
Why this matters for your capstone
You will not do participant observation for the HSCI 841 capstone — the dataset is pre-collected interview transcripts. But the discipline of explicit awareness applies to reading transcripts as well. The next time you read P01 (Maya), notice what she says about the SkyTrain at 9 p.m. on Sunday as an ethnographer would notice an everyday detail: what is the SkyTrain at that hour, what does it sound like, who is on it, what does the silence among passengers feel like? The trained ethnographic eye is the same eye you should be reading transcripts with. Without it, all you can see is the words; with it, you can see the world the words point to.
2.3 Continuous Monitoring
Continuous monitoring is the structured cousin of participant observation. The observer watches a specific behavior stream — not the whole setting — and records it systematically. Examples in public health include hand-hygiene observation in hospitals (the World Health Organization's “5 Moments” observation protocol), clinical-encounter observation (how much of a 15-minute primary-care visit is spent on what), classroom observation of school-meal consumption, and harm-reduction street-outreach observation of who uses what services in what order.
Continuous monitoring trades the breadth of participant observation for the precision of structured measurement. The observer typically uses a coding sheet, recording each event of interest with a timestamp. The data are dense and tabular. They can be analysed both qualitatively (what is the texture of the behavior? what coding categories needed to be added in the field?) and quantitatively (how often does each behavior occur per unit time? are there temporal patterns?).
Bernard, Wutich, and Ryan are explicit that continuous monitoring outperforms self-report whenever the behavior is one people would misremember, under-report, or report in a socially desirable way. Hand hygiene is the canonical example: when clinicians are asked how often they wash their hands between patients, they say roughly 90 percent of the time. When observed, the rate is closer to 40 percent. The discrepancy arises not because clinicians lie but because a behavior repeated that often is effectively invisible to self-report.
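To make the coding-sheet idea concrete, here is a minimal Python sketch of how timestamped observation records might be turned into a compliance rate and an event rate per hour. The `Event` fields, the opportunity codes, and the five sample rows are hypothetical and invented for illustration; they are not drawn from the WHO protocol or from any real observation session.

```python
from dataclasses import dataclass


@dataclass
class Event:
    minute: float      # minutes since the observation session began
    opportunity: str   # hypothetical code for the type of opportunity observed
    performed: bool    # True if hand hygiene was observed at this opportunity


def compliance_rate(events):
    """Share of observed opportunities at which the behavior was performed."""
    if not events:
        raise ValueError("no events recorded")
    return sum(e.performed for e in events) / len(events)


def opportunities_per_hour(events, session_minutes):
    """Number of recorded opportunities per hour of observation."""
    return len(events) / (session_minutes / 60)


# Invented coding sheet for one 30-minute session.
SHEET = [
    Event(2.0, "before_patient", True),
    Event(9.5, "after_patient", False),
    Event(14.0, "before_patient", False),
    Event(21.5, "after_patient", True),
    Event(27.0, "before_patient", False),
]

if __name__ == "__main__":
    print("compliance:", compliance_rate(SHEET))
    print("opportunities per hour:", opportunities_per_hour(SHEET, 30))
```

Keeping each row as a timestamped record, rather than a running tally, leaves room for the qualitative questions too: categories added in the field can become new `opportunity` codes, and the timestamps allow a later look at temporal patterns.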
2.4 Spot Observation and Time-Allocation Studies
Spot observation is the sampling-based cousin of continuous monitoring. Instead of watching a stream of behavior continuously, the observer makes brief observations at randomly sampled times, recording who is doing what in that instant. Aggregated across many such instants, the data yield a time-allocation profile: what proportion of a population's time, on average, is spent on each activity.
The classical use is in cultural-anthropology economic research — how do subsistence-farming households allocate labour between fields, livestock, household, and leisure? — but the method translates directly to public-health questions. Time-allocation studies have been used to characterise informal caregiving in aging populations, sedentary behavior in office workers, and child supervision in low-income households. The method's strength is that it does not depend on participants' ability to recall how they spent their time (which is notoriously inaccurate). Its weakness is that brief snapshots cannot capture the meaning of activities and may miss rare but important events.
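The two steps of a basic time-allocation study, drawing random observation instants and aggregating what was seen, can be sketched in Python as follows. The observation window, the activity codes, and the resident labels are hypothetical assumptions made for this example, and the fixed seed is there only so the schedule can be reproduced.

```python
import random
from collections import Counter


def sample_spot_times(n, start_hour=8, end_hour=20, seed=0):
    """Draw n random observation times, in minutes since midnight, within the window."""
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    return sorted(rng.randrange(start_hour * 60, end_hour * 60) for _ in range(n))


def time_allocation(observations):
    """Turn (person, activity) spot records into {activity: proportion of all records}."""
    counts = Counter(activity for _, activity in observations)
    total = sum(counts.values())
    return {activity: n / total for activity, n in counts.items()}


# Invented spot records from a hypothetical long-term-care setting.
OBS = [
    ("R1", "alone_in_room"), ("R2", "dining_with_others"),
    ("R1", "alone_in_room"), ("R3", "corridor_walk"),
    ("R2", "alone_in_room"), ("R3", "dining_with_others"),
    ("R1", "alone_in_room"), ("R2", "corridor_walk"),
]

if __name__ == "__main__":
    print("spot times (minutes since midnight):", sample_spot_times(5))
    for activity, share in time_allocation(OBS).items():
        print(f"{activity}: {share:.2f}")
```

The proportions describe the pooled records, not any one person's day. With only a handful of snapshots, a rare but important event may never be sampled at all.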
The reactivity problem
Direct observation has its own version of the measurement-bias problem: the Hawthorne effect. When people know they are being watched, they often change their behavior — sometimes briefly (the first few minutes of an observation), sometimes for the duration. Bernard, Wutich, and Ryan’s recommendation is to acknowledge this explicitly, to extend observation periods so the effect attenuates, and to triangulate with indirect data (the harm-reduction example in Section 1 is a good case — behavior traces revealed what observation would have distorted).
2.5 When to Use Direct Methods
Direct observation is the right first choice when you need to see behavior in context, when you suspect that what people do differs from what they say they do, when the setting itself is part of the object of study, when the behavior is too frequent or too embodied to be accurately recalled, or when you need to characterise an ordinary practice that participants take so for granted they cannot describe it. It is the wrong choice when the population is dispersed, when the behavior is private, when the budget for fieldwork is small, or when the question is fundamentally about meaning rather than action.
For loneliness research, direct observation would be a powerful complement to interviews if the question is, for example, “What does loneliness look like in residential settings?” You could spend weeks in a long-term-care facility watching the morning routine, mealtime seating patterns, who walks with whom in the corridor, and when residents return to their rooms. None of this would replace interviews like Helen's, but it would let you see the loneliness Helen could only describe to you in fragments.
Reflection
If you were designing a participant-observation study of loneliness in a Vancouver senior-housing complex, which of Spradley's five positions would you take, and why? What would that position let you see, and what would you give up?
Question 1: Spradley's participant-observation spectrum runs from:
Question 2: Continuous monitoring of hand hygiene typically finds rates much lower than self-report. This is best characterized as:
Question 3: Time-allocation studies use which sampling strategy?
Elicitation — Interviews, Focus Groups & Cultural-Domain Methods
Introduction and Overview
The third and largest family in Chapter 4 is elicitation: methods in which the researcher actively asks participants to produce data. Bernard, Wutich, and Ryan dedicate most of pages 79–97 to this family because it dominates contemporary qualitative health research and because the design decisions are non-obvious. The family contains several quite different methods — unstructured interviews, semi-structured interviews, structured interviews, focus groups, free lists, pile sorts, triads, ranking tasks, and open-ended survey items. Each has a different relationship between researcher control and participant freedom; each is good for different kinds of questions.
Your capstone dataset was elicited using semi-structured interviewing, so this section dwells on that method longest. But you need to know the others — in part because you will encounter them in the literature you cite, in part because some of them (free listing in particular) will reappear in Module 12 as the basis of cultural-domain analysis.
Learning Objectives for Section 3
- Distinguish unstructured, semi-structured, and structured interviews on the dimensions of researcher control, participant freedom, and analytic affordance.
- Identify the design features of a good semi-structured interview guide, using the HSCI 841 loneliness guide as a worked example.
- Describe the role of probes — what they are, when to use them, and what makes a probe work or fail.
- Explain when focus groups outperform individual interviews and what moderator skills they demand.
- Preview the cultural-domain elicitation methods that Module 12 will return to.
- Recognise the place of open-ended survey items in the elicitation landscape.
3.1 Unstructured Interviews
The unstructured interview — sometimes called the informal, ethnographic, or conversational interview — is the elicitation form closest to an ordinary conversation. There is no fixed script. The interviewer has a topic in mind and follows the participant's lead, asking probes responsively as the conversation unfolds. The form is foundational in cultural anthropology (Spradley's The Ethnographic Interview, 1979, is the classic methodological treatment) and is used heavily in long-term fieldwork where the same researcher interviews the same participants many times over months or years.
The strength of unstructured interviewing is that the participant sets the agenda. They name the topics they think matter; they choose the words; they impose their categories before the researcher imposes theirs. This is invaluable when you are working in an unfamiliar cultural domain or with a population whose ways of talking about the phenomenon are themselves the object of study. A researcher trying to understand how Vietnamese-Canadian elders in East Vancouver talk about buon ba (the rough translation might be “loneliness”) will learn more from unstructured conversation than from a guide constructed in English-language categories.
The weakness of unstructured interviewing is that it sacrifices comparability across cases. If you and I each conduct ten unstructured interviews, we will have asked a different set of questions in every interview; whatever we say about “the pattern across cases” rests on the fragile assumption that the cases are about the same thing. For most public-health qualitative work, where the researcher wants to compare across participants on identifiable dimensions, unstructured interviewing is the wrong tool, or is used only in the early scoping phase.
3.2 Semi-Structured Interviews — The HSCI 841 Format
Sketch a 30-minute semi-structured guide for your capstone topic. Include:
- An opening (warm-up, builds rapport, easy to answer) — 3-5 min
- 2-4 core topic areas, each with a primary question plus 2-3 probe prompts ('Can you tell me more about that?', 'What was that like for you?') — 20 min total
- A closing ('Is there anything important I haven't asked you?') — 5 min
- Demographic items at the end (not the beginning — you want them invested before you ask)
A good interview guide is short. Topics, not scripts. The questions you don’t ask are as important as the ones you do.
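One way to sanity-check a draft guide against the brief above is to lay it out as a small table and test the constraints. This is a sketch only: the segment labels, minute allocations, and probe counts are placeholders, and the demographic items are left out because the brief gives them no time allotment.

```r
# Hypothetical 30-minute guide skeleton; replace labels with your own topics
guide <- data.frame(
  segment  = c("Opening", "Core topic 1", "Core topic 2", "Core topic 3", "Closing"),
  minutes  = c(5, 7, 7, 6, 5),
  n_probes = c(0, 3, 2, 2, 0),
  stringsAsFactors = FALSE
)

total_minutes <- sum(guide$minutes)
core <- guide[grepl("^Core", guide$segment), ]

# Checks against the brief: 2-4 core topics, 2-3 probes each, 20 core minutes
checks <- c(
  fits_30_min      = total_minutes <= 30,
  core_topic_count = nrow(core) >= 2 && nrow(core) <= 4,
  probes_per_topic = all(core$n_probes >= 2 & core$n_probes <= 3),
  core_time_20_min = sum(core$minutes) == 20
)
print(guide)
print(checks)
```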
Semi-structured interviewing is the workhorse of contemporary qualitative health research (Brinkmann, 2007), and it is the method used to produce the 20 transcripts of your capstone dataset. The method sits between unstructured and structured: it uses a written guide that covers the topics the researcher wants to address, but treats the guide as a checklist rather than a script. The interviewer follows the participant's lead within each topic, asks probes responsively, skips items the participant has already covered, and reorders items as needed.
The defining feature is the guide-but-not-script approach (Kvale & Brinkmann, 2018). The interviewer should be able to conduct the interview without looking at the page; the guide is internalised, not read. The reason is that reading questions aloud creates a school-test register that suppresses the discursive material the method is meant to elicit.
The HSCI 841 loneliness guide as a worked example
Open the interview guide in another tab and read along with this discussion. The guide is organised around six conceptual domains (defining and recognising loneliness; triggers and patterns; meaning and identity; responses and coping; social world; systems and policy). Each domain has a small number of main questions, each with two or three italicised probes. Notice three design features:
- Open-ended main questions. “When I say the word loneliness, what comes to mind for you?” (Q2) invites the participant to bring their own framing first.
- Targeted probes underneath. “Is loneliness the word you would use yourself, or would you use something different?” is the probe that produced Amira's response “In Arabic the word is wahda.” The probe opened a door the main question alone would have left closed.
- Domain order. The guide moves from definition to triggers to meaning to coping to social world to policy. The order is deliberate: definition first (before researcher framings contaminate it), policy last (when the participant is warmed up enough to step back from their own experience).
The art of the probe
Bernard, Wutich, and Ryan dedicate substantial attention to probes: the secondary questions the interviewer uses to invite expansion, clarification, or specification. A probe is not a follow-up question in the sense of a planned item; it is a responsive intervention. Drawing on Bernard's earlier work, they identify several probe types worth naming.
| Probe type | Function | Example from the loneliness corpus |
|---|---|---|
| Silent probe | The interviewer says nothing after the participant's response, inviting them to continue. | When Maya pauses after “the SkyTrain at 9 p.m. on a Sunday,” an experienced interviewer waits. Often the most analytically valuable material comes in the second beat. |
| Echo probe | The interviewer repeats a fragment of what the participant just said with a slight rising intonation, inviting expansion. | P: “It feels like fading.” / I: “Fading...?” — Helen's elaboration about “fading at the edges” comes from such a probe. |
| Uh-huh probe | Minimal verbal acknowledgement that does not introduce new content but signals attention. | Used throughout the corpus; not always visible in transcript but evident from the long uninterrupted participant turns. |
| Tell-me-more probe | An explicit invitation to expand on a topic the participant has just raised. | I: “Tell me more about wahda.” Amira's discussion of memories without a place to go follows. |
| Long-question probe | The interviewer asks a longer, more elaborated question that gives the participant more to attach to. | Useful with reticent or laconic participants. |
| Leading probe (cautionary) | A probe that suggests the answer the interviewer expects. To be avoided. Bernard, Wutich, and Ryan are explicit that leading probes contaminate the data. | NOT: “So you'd say loneliness is mostly an emotional thing?” (after the participant has not said that). The leading frame closes off everything except agreement or disagreement with the frame. |
Probes that worked, probes that didn't — lessons from the corpus
Read alongside the transcripts, you can see the probes working. With Maya, the silent probe after “the SkyTrain at 9 p.m. on a Sunday” produced the elaboration “everyone's just on their phones, and nobody acknowledges that anybody else exists.” With Helen, the echo probe “Fading...?” produced the elaboration about three days without speaking and the voice sounding strange. With Amira, the uh-huh probe held the space open during the difficult disclosure about the anniversary of when her home was destroyed.
You can also see, in places, probes that did not produce expansion. Amira's response to a probe about Canadian friends — “To explain everything is to live through it again” — signals that the probe touched material the participant did not want to elaborate on. The interviewer correctly did not push. The discipline of qualitative interviewing is partly the discipline of not probing — recognising when to step back, when silence is what the participant needs, and when the probe would extract material the participant would later regret giving.
Notice in the excerpt that the bracketed [pause] is a transcription convention — an artefact of the analytic choice made during transcription, not of the spoken utterance itself. We return to that in Section 4. Notice also that the participant's response is full of filled pauses (“like,” “um”) and false starts (“nobody — nobody”). The fact that these survived into the transcript means the transcription used an intelligent verbatim convention — preserving most of what was said with some light cleaning. The fact that they did not survive into a polished journal-article quote would mean the analyst then cleaned them for publication. Both are legitimate choices at different stages.
3.3 Structured Interviews
Structured interviewing is what you get when the guide becomes the script. Every participant is asked the same questions in the same order, with no responsive probing. The form blurs into survey research at the structured end — in some ways it is survey research, conducted in conversation rather than on a form — and it is most useful when you need to compare across cases on standardised items.
Bernard, Wutich, and Ryan treat structured interviewing as legitimate and underused in qualitative methods. A grounded-theory study that has matured to the point of needing to test specific propositions can benefit from a structured-interview phase, even if the early phases used semi-structured or unstructured methods. The constant-comparative method is well-served by structured comparison at a late stage.
3.4 Focus Groups
Focus groups (or group interviews) are a qualitatively distinct elicitation method, not just “an interview with more people.” The defining feature is that the data are produced by the group interaction, not just by individuals serially (Kitzinger, 1994). Participants respond to each other; they agree, disagree, build on, qualify, and contest one another's framings. The data are dialogical. Richard Krueger's Focus Groups: A Practical Guide for Applied Research (5th edition, 2015) is the canonical methodological treatment; Kitzinger (1995) is the standard short-form introduction in the medical literature.
When focus groups outperform individual interviews
Focus groups (Krueger & Casey, 2015) are the right choice in five situations identified across Bernard, Wutich, and Ryan (2017) and Krueger:
- When you want to surface the range of views in a community — the group format brings out positions individuals might not articulate alone.
- When the topic is best discussed collectively — community priorities, policy preferences, evaluation of a programme, perceptions of a public-health campaign.
- When you want to observe how categories are negotiated in real time — what becomes consensus, what gets contested, what is unspeakable.
- When you have limited time and the question is breadth more than depth — six focus groups of eight people each produce a dataset of 48 voices in roughly the same time as eight individual interviews.
- When the population is reluctant to speak alone — some communities prefer collective speech; some topics are easier to discuss with peers present than with a researcher alone.
When focus groups are the wrong choice
Focus groups are the wrong choice when the topic is too sensitive for collective disclosure (sexual behavior, intimate partner violence, suicidality, stigmatised illness), when the participants are in unequal power relationships with each other (boss and subordinate, parent and adolescent), when the population is too small to recruit groups, or when the research question requires the kind of sustained individual narrative that group dynamics interrupt.
For loneliness research specifically, the choice is non-obvious. Individual interviews allow the participant to disclose the intimate and stigmatising dimensions of loneliness (Maya's comment “loneliness feels like admitting I'm failing at being 22” would be hard to say in a group). Focus groups allow you to see how loneliness is talked about collectively — what the public vocabulary is, what is performable and what is not, what stays unsayable. A study using both methods would learn things neither alone could reveal.
Moderator skills
The moderator's job is harder than the individual interviewer's. The moderator must (a) keep the discussion on topic without scripting it, (b) ensure the quiet participants speak and the loud ones do not dominate, (c) attend to non-verbal dynamics (who agrees by nodding, who disengages, who is silenced by another participant's comment), and (d) manage the ethics of the group, including the impossibility of guaranteeing confidentiality from other participants. Krueger emphasises that focus-group moderating is a trained skill, not a transferable interviewing skill — many experienced one-on-one interviewers struggle the first time they moderate a group.
3.5 Cultural-Domain Elicitation Methods (Preview)
Bernard, Wutich, and Ryan (pp. 92–96) introduce a family of elicitation methods specifically designed to surface the structure of a cultural domain — how members of a community categorise, order, and relate concepts within a topic area. These methods will be the focus of Module 12, where you will analyse them with networking and clustering techniques. For now, learn the names and what they do.
- Free listing. The simplest cultural-domain method: ask a sample of people to list, say, “all the kinds of loneliness you can think of.” The frequency with which each item appears, and the average position in the list, together index its cultural salience.
- Pile sorts. Hand participants a set of cards (each with a concept or item) and ask them to sort the cards into piles of things that go together. Aggregated across participants, the sorts yield a similarity matrix that can be analysed with multidimensional scaling or hierarchical clustering.
- Triad tests. Show participants three items at a time and ask which two are most similar (or which one is most different). Aggregated across triads, the data yield a similarity matrix comparable to pile-sort output.
- Ranking tasks. Ask participants to order a set of items along a specified dimension (e.g., from least to most lonely, from most to least helpful).
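The two free-listing indicators named in the first item above, frequency of mention and average list position, can be computed in a few lines of base R. The lists below are invented for illustration, and the sketch assumes no participant names the same item twice.

```r
# Hypothetical free lists: one vector per participant, in the order items were named
free_lists <- list(
  p1 = c("being alone", "missing family", "no one to call"),
  p2 = c("missing family", "being alone"),
  p3 = c("being alone", "feeling invisible", "missing family"),
  p4 = c("no one to call", "being alone")
)

# Long format: one row per (participant, item) with the item's position in that list
long <- do.call(rbind, lapply(names(free_lists), function(p) {
  data.frame(participant = p,
             item = free_lists[[p]],
             position = seq_along(free_lists[[p]]),
             stringsAsFactors = FALSE)
}))

# frequency = number of lists that mention the item; mean_position = average rank
frequency     <- tapply(long$position, long$item, length)
mean_position <- tapply(long$position, long$item, mean)

salience <- data.frame(
  item = names(frequency),
  frequency = as.integer(frequency),
  mean_position = as.numeric(mean_position),
  stringsAsFactors = FALSE
)

# Most frequently mentioned first; ties broken by earlier average position
salience <- salience[order(-salience$frequency, salience$mean_position), ]
print(salience, row.names = FALSE)
```

Items that are mentioned often and early sort to the top. Composite salience indices exist that combine the two columns into one score; this sketch keeps them separate so each can be read on its own.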
These methods are powerful for studying the structure of a cultural domain — what its elements are and how they relate — in a way that open-ended interviewing cannot match. They are also fast: a free list can be elicited in two minutes per participant, allowing samples of several hundred people in a single study. The cost is depth: cultural-domain methods are structurally rich but phenomenologically shallow. You learn how loneliness is categorised, not what it feels like.
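The similarity matrix that pile sorts yield can be sketched as the proportion of participants who placed each pair of items in the same pile. The items and sorts below are invented; the helper `co_matrix()` is a hypothetical name introduced for this example, and average-linkage clustering is one reasonable choice rather than a prescribed one.

```r
items <- c("grief", "isolation", "boredom", "homesickness")

# Hypothetical pile sorts: each participant gives every item a pile label
sorts <- list(
  p1 = c(grief = 1, isolation = 2, boredom = 2, homesickness = 1),
  p2 = c(grief = 1, isolation = 2, boredom = 3, homesickness = 1),
  p3 = c(grief = 1, isolation = 1, boredom = 2, homesickness = 1)
)

# For one participant: a 0/1 matrix with 1 where two items share a pile
co_matrix <- function(piles) {
  m <- outer(piles[items], piles[items], "==") * 1
  dimnames(m) <- list(items, items)
  m
}

# Proportion of participants who put each pair of items in the same pile
similarity <- Reduce(`+`, lapply(sorts, co_matrix)) / length(sorts)
print(round(similarity, 2))

# Hierarchical clustering on the corresponding distances (1 - similarity)
clusters <- hclust(as.dist(1 - similarity), method = "average")
print(clusters$merge)
```

The same distance object could be passed to `cmdscale()` for a classical multidimensional scaling plot instead.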
3.6 Open-Ended Survey Items
The final elicitation form is the humble open-ended survey item: the “please describe in your own words” box at the back of a questionnaire. Bernard, Wutich, and Ryan treat these as the most underused qualitative-data source in public health, partly because the items are often included for completeness rather than analysed, partly because the data are typically thin (a sentence or two per response) and easy to dismiss.
The corrective is to design open-ended items deliberately, position them where respondents have time to answer (early in the survey, not buried at the end), provide enough space to invite expansion, and budget the analytic time the responses deserve. A 2,000-respondent survey with even a 40 percent response rate to one open-ended item gives you 800 qualitative responses — a respectable corpus that can be theme-analysed, content-analysed, or topic-modelled (Module 12).
Reflection
You are designing the next phase of a loneliness study, building on the existing 20 semi-structured interviews. Pick ONE elicitation method you have NOT yet used (focus groups, free listing, pile sorts, structured interviews, or an open-ended survey item) and explain what specific question it would let you answer that the existing interview corpus cannot.
Question 1: The HSCI 841 loneliness interview guide is best characterised as:
Question 2: An echo probe is:
Question 3: Focus groups are typically the WRONG method choice when:
Transcription, Field Notes & the HSCI 841 Dataset
Introduction and Overview
You have now seen the three families of qualitative data collection: indirect observation, direct observation, and elicitation. The last section of this lesson turns to what happens after the data are collected: the work of transforming recorded speech into transcripts and field experience into field notes. Bernard, Wutich, and Ryan treat both as analytic acts in their own right, not as neutral preparation. The decisions you make at this stage shape what later analysis can see. In the case of your capstone, the decisions someone else made before you received the dataset have already shaped what is available to you.
Learning Objectives for Section 4
- Recognise transcription as an analytic act, not a neutral technical step.
- Distinguish three levels of transcription — Jefferson notation (preview for Module 10), intelligent verbatim, and clean verbatim — and identify which level your capstone dataset uses.
- Write disciplined field notes using the Emerson, Fretz & Shaw progression: jottings → expanded notes → analytic memos.
- Read transcripts into R as a corpus and produce a basic words-per-transcript summary.
- Complete the Week 4 capstone milestone: a dataset familiarisation log covering 6–8 transcripts.
4.1 Transcription as Analytic Act
Transcription is the conversion of recorded speech into written text. The conversion seems, on first encounter, mechanical: type what you hear. In practice, transcription requires hundreds of small interpretive decisions per transcript. Should “um” be transcribed? What about “you know” and “like” when they are filler? Should false starts be preserved or smoothed? Should the pause between words be timed? Should laughter be marked, and if so, how? Should overlapping speech be shown with brackets? Should non-verbal sounds (a sigh, a cough, a chair scraping) be annotated?
Each of these decisions shapes what later analysis can see. A transcript that omits filled pauses and false starts cannot support conversation analysis, because the analytic categories conversation analysts work with (Module 10) are built on exactly those features. A transcript that omits laughter cannot support narrative-affective analysis, because the laugh-marker is often the most analytically interesting moment in a turn. A transcript that strips out vocalised pauses turns the speech of an articulate participant and the speech of a halting one into the same kind of text, masking a phenomenon that may matter for what you are studying.
Bernard, Wutich, and Ryan frame this as a methodological commitment: the level of transcription should be matched to the analytic intent. If you plan content analysis or thematic analysis, light transcription is sufficient. If you plan discourse or conversation analysis, you need much heavier transcription. If you do not yet know which method you will use, err on the side of more detail — you can simplify a detailed transcript later, but you cannot recover what was never written down.
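The asymmetry in the last sentence can be made concrete with a crude R sketch. The sample line is invented (not from the dataset), and `to_clean_verbatim()` is a hypothetical helper; real cleaning needs human judgement, since a pattern like this cannot tell filler "like" from the verb.

```r
# Hypothetical intelligent-verbatim line, invented for this example
detailed <- "I, um, I don't really [pause] talk to anyone, like, for days. [laughs]"

to_clean_verbatim <- function(x) {
  # Drop bracketed annotations such as [pause] and [laughs]
  x <- gsub("\\[[^]]*\\]", "", x)
  # Drop a few common fillers plus any comma and spaces that follow them
  x <- gsub("\\b(um|uh|like)\\b,?\\s*", "", x, ignore.case = TRUE, perl = TRUE)
  # Collapse leftover runs of whitespace
  trimws(gsub("\\s+", " ", x))
}

cleaned <- to_clean_verbatim(detailed)
print(cleaned)

# The pause and laughter markers are gone and cannot be rebuilt from `cleaned`
print(grepl("[pause]", cleaned, fixed = TRUE))
```

Going from `detailed` to `cleaned` takes three substitutions. Going back is impossible without the recording, because nothing in the cleaned text records where the pause or the laugh occurred.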
4.2 Three Levels of Transcription
| Level | What is preserved | What is removed/cleaned | Used for |
|---|---|---|---|
| Jefferson notation (preview for Module 10) | Every audible feature: pause durations to the tenth of a second, overlaps marked with brackets, in-breath/out-breath marked, intonation contours, stress, latching, audible laughter inside words. | Nothing: the convention's whole point is to preserve everything. | Conversation analysis; discourse analysis; the close study of interactional sequencing. |
| Intelligent verbatim | The participant's words, including most filled pauses (“um,” “like”), false starts, repetitions, and laughter (marked as [laughs] or [laughter]). Long pauses noted ([pause]). | Some throat-clearing, off-topic interruptions, background noise; cleaned of typos and grammar errors that are clearly transcription artefacts. | Most contemporary qualitative health research, including thematic analysis, content analysis, and grounded theory. |
| Clean verbatim | The substantive content of what was said, in grammatical sentences. | Filled pauses, false starts, repetitions, most non-fluent features. Speech is “cleaned up” into readable prose. | Journalistic interviews; oral history; some applied research where readability is the priority. Generally inappropriate for academic qualitative analysis because too much is lost. |
What level is the HSCI 841 dataset?
Your capstone dataset uses intelligent verbatim, leaning toward the more detailed end. Read any transcript and you can see the conventions: filled pauses (“um,” “like”) are preserved, false starts and self-corrections are visible (Maya's “hollow isn't the right word, it's like an ache”), long pauses are marked [pause] or [long pause], laughter is annotated [laughs], and dashes mark mid-utterance self-interruptions.
What the dataset does not have is Jefferson-level detail: pause durations are not numerically timed, overlapping speech is not bracketed in CA notation, intonation contours and stress are not marked, in-breaths are not transcribed. Module 10, which works on discourse and conversation analysis, will return to this. For the Week 4 milestone and most subsequent analytic work in this course, the intelligent-verbatim level is what you have to work with.
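Because the intelligent-verbatim conventions are written consistently, they can be counted. The sketch below uses an invented excerpt (not a real transcript) and a hypothetical helper, `count_matches()`. It leaves "like" out of the filled-pause count because a simple pattern cannot separate the filler from the verb.

```r
# Hypothetical excerpt written in the conventions described above
excerpt <- paste(
  "It's, um, it's like [pause] nobody -- nobody notices.",
  "I, like, stopped calling. [laughs] Um. [long pause] Yeah."
)

# Count non-overlapping, case-insensitive matches of a pattern in one string
count_matches <- function(pattern, text) {
  hits <- gregexpr(pattern, text, ignore.case = TRUE, perl = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

features <- c(
  pause         = count_matches("\\[(long )?pause\\]", excerpt),
  laughter      = count_matches("\\[(laughs|laughter)\\]", excerpt),
  filled_pauses = count_matches("\\b(um|uh)\\b", excerpt)
)
print(features)
```

Counts like these describe the transcript, not the interaction: they depend entirely on what the transcriber chose to mark.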
What you can and cannot do with the existing transcripts
The intelligent-verbatim level supports thematic analysis (Module 5), grounded theory (Module 7), content analysis (Module 8), narrative analysis (Module 9), analytic induction (Module 11), and computational text analysis (Module 12) without modification. It does not support fine-grained conversation analysis. If you wanted to do a CA-style examination of, say, how Helen's interview turns are constructed sequentially with the interviewer's, you would need to re-listen to the original recording (which the dataset does not provide) and re-transcribe at Jefferson level. We will work around this in Module 10 by transcribing a single short excerpt by hand.
4.3 Field Notes — Emerson, Fretz & Shaw
Robert Emerson, Rachel Fretz, and Linda Shaw's Writing Ethnographic Fieldnotes (Emerson, Fretz, & Shaw, 2011) is the standard treatment of field-note writing in contemporary qualitative methodology. Their progression organises field-note writing into three stages, each with a distinct purpose and discipline.
Stage 1: Jottings. Brief, in-the-moment notes scribbled in a small notebook (or, increasingly, in a phone notes app) during or immediately after observation. Jottings are not for the reader; they are for the writer's later self. They typically include a few key words per moment — enough to trigger memory when the writer sits down to expand them. The discipline is to capture the specific concrete particular: who said what, what they were doing, what time it was, what was on the wall. The temptation is to write analytic interpretations on the spot; the discipline is to resist this and write only what could be photographed if a camera had been present.
Stage 2: Expanded notes. Written as soon as possible after the observation (ideally the same day), expanded notes turn the jottings into full descriptive prose. The discipline is to write at length and in the present tense, recreating the scene so a reader could enter it. Expanded notes are typically several pages per hour of observation. They include direct quotations where possible, descriptive detail of setting, and observations of body language, sequence, and timing. They do not yet include analytic interpretation — the discipline is to keep description and analysis separated, because analysis dressed as observation gets remembered as observation and cannot be challenged later.
Stage 3: Analytic memos. Written separately from the expanded notes (often in a different document or notebook section), analytic memos record the researcher's evolving interpretation: what the field experience seems to mean, what patterns are emerging, what theoretical connections are forming, what hypotheses are taking shape. Memos are dated, indexed, and treated as part of the data corpus — you cite your own memos in the eventual methods section because the audit trail of analytic development is part of what transparency owes the reader.
Field notes for your capstone
Your capstone work does not involve fieldwork in the traditional sense. But the discipline of separating description from analysis applies to the work you will do reading transcripts. Maintain an analytic-memo file (a single .md or .docx document, dated entries, never deleted) throughout the term. Every analytic insight you have about the data — every “what if” thought, every emerging connection, every “Maya and Helen both...” observation — goes into the memo file. Module 5 will return to this; at that point you will have several weeks of memos accumulated, and they will be the raw material for codebook development.
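If you keep the memo file as Markdown, a small helper can enforce the dated, append-only habit. This is a sketch under assumptions: `add_memo()` and the default file name are hypothetical, and the entry text is placeholder wording, not an observation about the dataset.

```r
# Append one dated entry to a running memo file; earlier entries are never overwritten
add_memo <- function(text, file = "analytic_memos.md", date = Sys.Date()) {
  entry <- sprintf("## %s\n\n%s\n\n", format(date, "%Y-%m-%d"), text)
  cat(entry, file = file, append = TRUE)
  invisible(file)
}

# Usage with a temporary file so the example leaves nothing behind
memo_file <- tempfile(fileext = ".md")
add_memo("Placeholder entry: a possible link between two transcripts.", memo_file)
add_memo("Placeholder entry: a 'what if' thought to revisit in Module 5.", memo_file)
writeLines(readLines(memo_file))
```

Because the function only appends, the file doubles as a chronological audit trail of how your interpretation developed.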
4.4 R + Taguette — Reading Transcripts as a Corpus
This module is light on R. The serious computational work begins in Module 5 (when you start coding) and Module 12 (when computational text analysis takes over). But you can already do some useful things with the dataset in R, starting with reading the 20 transcripts in as a corpus and producing a basic summary.
This block reads all 20 transcripts into a corpus, counts tokens per document with quanteda::ntoken(), and produces a simple summary that lets you see which interviews are longest and shortest. Run it after you have completed the toolchain install from Module 1.
# Read all 20 loneliness transcripts as a corpus
library(readtext)
library(quanteda)
library(dplyr)
transcript_dir <- "../term projects/HSCI_841/transcripts"
# readtext::readtext() reads every .txt file in a folder into a data frame
# with one row per file, a doc_id column, and a text column.
# docvarsfrom = "filenames" splits each file name on "_" into pid and pseudonym.
loneliness_texts <- readtext(file.path(transcript_dir, "*.txt"),
                             docvarsfrom = "filenames",
                             dvsep = "_",
                             docvarnames = c("pid", "pseudonym"))
# Turn it into a quanteda corpus object
loneliness_corpus <- corpus(loneliness_texts)
# Words per transcript (punctuation tokens excluded from the count)
word_counts <- ntoken(loneliness_corpus, remove_punct = TRUE)
# Build a tidy summary, longest transcript first
summary_df <- tibble(
  pid       = docvars(loneliness_corpus, "pid"),
  pseudonym = docvars(loneliness_corpus, "pseudonym"),
  words     = as.integer(word_counts)
) %>%
  arrange(desc(words))
print(summary_df, n = 20)
# Min, quartiles, median, mean, and max give a sense of the spread in transcript length
summary(summary_df$words)
What success looks like: A tibble with 20 rows showing each transcript's PID, pseudonym, and word count, sorted longest first. The summary() call gives you median word count and range. Expect roughly 1,500–4,500 words per transcript, with the longer ones from more articulate participants and the shorter ones — like Amira's, which was 39 minutes with interpreter pauses — reflecting different speech rhythms, not less interesting material.
4.5 The Week 4 Capstone Milestone — Dataset Familiarisation Log
Week 4 is the last week before coding begins in Modules 5 and 6. Your job this week is to get to know the dataset deeply. Bernard, Wutich, and Ryan are explicit (and most experienced qualitative researchers will tell you) that familiarity with the data is the precondition of analysis. You cannot code well transcripts that you have only skimmed, and you cannot reliably identify themes in a transcript you are reading for the first time inside the coding software.
The Week 4 deliverable is a dataset familiarisation log: a structured table covering 6 to 8 transcripts (your choice, but with deliberate variation in age, gender, and life-stage), with a one-page reflection on what reading the transcripts has taught you about the data.
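One way to start the structured table is as an empty data frame that you fill in as you read. The column names and PIDs below are placeholders chosen for this sketch, not a required format; check the assignment brief for the fields your instructor expects.

```r
# Hypothetical log skeleton: PIDs and column names are placeholders
familiarisation_log <- data.frame(
  pid               = sprintf("P%02d", 1:6),
  pseudonym         = NA_character_,
  age_group         = NA_character_,
  gender            = NA_character_,
  life_stage        = NA_character_,
  first_impressions = NA_character_,
  striking_passages = NA_character_,
  stringsAsFactors  = FALSE
)

# The milestone asks for 6 to 8 transcripts
stopifnot(nrow(familiarisation_log) >= 6, nrow(familiarisation_log) <= 8)

# Saved to a temporary folder here; point this at your project folder in practice
log_path <- file.path(tempdir(), "familiarisation_log.csv")
write.csv(familiarisation_log, log_path, row.names = FALSE)
print(names(familiarisation_log))
```

Keeping age group, gender, and life stage as explicit columns makes it easy to see at a glance whether your selection has the deliberate variation the milestone asks for.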
Reflection
Pick one transcript you have read so far and identify ONE feature of its transcription that shapes what later analysis can see. For example: a marked [laughs] annotation that opens a particular analytic possibility, a [pause] that did the work of unspoken disclosure, a false start preserved that revealed something the participant talked away from. What would have been lost if a clean-verbatim convention had been used?
Question 1: The HSCI 841 capstone dataset uses which level of transcription?
Question 2: The Emerson/Fretz/Shaw field-note progression is:
Question 3: Which R function would you use to count tokens (words) in each document of a quanteda corpus?
ntoken() returns the number of tokens (with punctuation handling controllable via the remove_punct argument) per document in a quanteda corpus. readtext() reads files in; corpus() creates the corpus object; summarise() aggregates a data frame.
Final Assessment
Bringing It All Together
Lesson 4 has mapped the three families of qualitative data collection (indirect, direct, elicitation), located the HSCI 841 capstone dataset within the semi-structured-interview tradition, treated transcription and field-note writing as analytic acts in their own right, and set up the Week 4 dataset familiarisation log. With this lesson complete, you have the full design vocabulary the rest of the term will draw on: in Module 5 you start coding, in Modules 6 and 7 you start building conceptual models, in Modules 8 through 12 you move into specialised analytic techniques. The data are now familiar enough to start working on, not just reading.
Key Takeaways from Lesson 4
- Three families of data collection: indirect observation (Webb's unobtrusive measures — traces, archives, secondary datasets), direct observation (participant observation, monitoring, time allocation), and elicitation (interviews, focus groups, cultural-domain methods).
- Indirect observation is non-reactive and longitudinal but cannot ask why. Behavior traces, archival records, and secondary qualitative analysis are powerful complements to elicitation.
- Direct observation requires the discipline of explicit awareness (Spradley). The five-position spectrum from complete observer to complete participant names what each position can and cannot see.
- Semi-structured interviewing is the workhorse of public-health qualitative research and the format used to produce your capstone dataset. The guide-but-not-script approach, the responsive use of probes, and the discipline of letting the participant set the agenda are the central skills.
- Focus groups produce qualitatively different data from individual interviews (dialogical rather than narrative) and demand trained moderator skills. They are typically the wrong choice for sensitive topics and a good choice for surfacing community priorities.
- Cultural-domain elicitation methods (free listing, pile sorts, triads, ranking tasks) surface the categorical structure of a topic area at scale. They are previewed here and reserved for Module 12.
- Transcription is an analytic act. Jefferson notation, intelligent verbatim, and clean verbatim each foreclose different later analyses. Your dataset uses intelligent verbatim.
- Field notes follow the Emerson/Fretz/Shaw progression: jottings → expanded notes → analytic memos — with description and interpretation rigorously separated.
- The Week 4 milestone is the dataset familiarisation log: a structured table covering 6–8 transcripts plus a 1-page reflection on what reading them has taught you.
Core Concepts Reviewed
Section 1: The three families of data collection; Webb's unobtrusive measures; accretion vs. erosion traces; archival data and the selection-bias problem; secondary qualitative analysis and its four hazards (constrained guide, lost context, consent, frame mismatch).
Section 2: Spradley's five-position participant-observation spectrum; explicit awareness as a trained skill; continuous monitoring vs. time-allocation studies; the Hawthorne / reactivity problem in direct observation.
Section 3: Unstructured, semi-structured, and structured interviewing; the HSCI 841 loneliness guide as a worked example of semi-structured design; six probe types (silent, echo, uh-huh, tell-me-more, long-question, leading); when focus groups outperform individual interviews; the cultural-domain elicitation family (free listing, pile sorts, triads, ranking); open-ended survey items.
Section 4: Transcription as analytic act; the three levels — Jefferson notation, intelligent verbatim, clean verbatim; the level used in your capstone dataset; the Emerson/Fretz/Shaw field-note progression; readtext::readtext() and quanteda::ntoken() for corpus loading and word-count summaries; the Week 4 dataset familiarisation log.
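The corpus-loading and word-count step named in Section 4 can be sketched in a few lines of R. This is a minimal illustration, not the course's own script: the folder name, file names, and sample sentences are hypothetical stand-ins written to a temporary directory so the code runs without the capstone files. It tokenises explicitly with tokens() before calling ntoken(), so the choice to drop punctuation is visible in the code.

```r
library(readtext)
library(quanteda)

# Hypothetical stand-in for a transcript folder: two tiny .txt files
# written to a temporary directory so the sketch runs anywhere.
transcript_dir <- file.path(tempdir(), "transcripts_demo")
dir.create(transcript_dir, showWarnings = FALSE)
writeLines("I felt lonely after the move. [pause] It got better.",
           file.path(transcript_dir, "P01.txt"))
writeLines("We talk every week, so no, not really.",
           file.path(transcript_dir, "P02.txt"))

# readtext() reads every matching file: one row per file (doc_id, text).
raw <- readtext(file.path(transcript_dir, "*.txt"))

# corpus() turns that table into a quanteda corpus object.
corp <- corpus(raw)

# Tokenise with punctuation removed, then count tokens per document.
toks <- tokens(corp, remove_punct = TRUE)
word_counts <- ntoken(toks)

# One row per transcript: document id and word count.
summary_tbl <- data.frame(doc_id = names(word_counts),
                          n_words = as.integer(word_counts))
print(summary_tbl)
```

To use this on real transcripts, point transcript_dir at the folder that holds them and skip the two writeLines() calls. The resulting table has one row per transcript, which is a convenient starting column for a familiarisation log.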
The final reflection asks you to integrate what you have learned about data collection with what you know about your own capstone dataset, before you move into coding next week.
Final Reflection
Your capstone dataset arrived in a particular form — 20 intelligent-verbatim semi-structured interview transcripts plus a guide — and that form has shaped what your capstone can and cannot do. In one paragraph, name TWO research questions about loneliness that this dataset can defensibly answer, and ONE that it cannot. For the one it cannot, briefly describe the data-collection design that would be needed instead.
Question 1: The three families of qualitative data collection identified in Chapter 4 are:
Question 2: Webb et al.'s 1966 term unobtrusive measures refers to:
Question 3: An erosion measure in the Webb tradition is:
Question 4: Spradley's five-position participant-observation spectrum runs from:
Question 5: Continuous monitoring of hand hygiene typically finds rates far below self-report. Bernard, Wutich, and Ryan read this as:
Question 6: The HSCI 841 capstone dataset was elicited with:
Question 7: A probe that repeats a short fragment of what the participant just said, with slight rising intonation, to invite expansion is called:
Question 8: Focus groups are typically the WRONG method choice for:
Question 9: Free listing, pile sorts, triad tests, and ranking tasks are examples of:
Question 10: Transcription is best characterised as:
Question 11: The level of transcription used in your capstone dataset is best described as:
Question 12: The Emerson, Fretz & Shaw progression for field-note writing is:
Question 13: Which R function reads all .txt files in a directory into a data frame for analysis?
readtext::readtext() reads all matching files from a folder into a data frame with one row per file, with options to parse metadata out of filenames. corpus() turns that data frame into a quanteda corpus; ntoken() counts tokens; summarise() aggregates.
Question 14: The Week 4 capstone milestone is:
Question 15: A defensible research question for the existing 20-transcript loneliness dataset is:
Glossary — Key Terms, People & Methodological Stances
📚 Reference page — available throughout the lesson
This glossary collects the key concepts, people, and methodological stances introduced in Lesson 4. Use it as a reference while you work through the material, or as a review before the final assessment.