HSCI 841 — Lesson 1

Foundations of Qualitative Data Analysis

Qualitative Research Methods & Analysis in Public Health

Kiffer G. Card, PhD, Faculty of Health Sciences, Simon Fraser University

Learning objectives for this lesson:

  • Define qualitative data analysis (QDA) operationally and locate it in the public-health evidence landscape
  • Explain why the boundary between qualitative and quantitative work is more porous than introductory texts suggest
  • Identify the four research goals (exploration, description, comparison, model-testing) and which is dominant in qualitative work
  • Recognize the five kinds of qualitative data (objects, still images, sounds, video, texts) and why texts dominate health research
  • Articulate the three methodological commitments — systematic, transparent, replicable — in operational terms
  • Set up the R + Taguette toolchain and orient to the HSCI 841 loneliness capstone dataset
  • Complete the Week 1 capstone milestone: read three transcripts and write a 500-word positionality memo

This course was developed by Kiffer G. Card, PhD, as a companion to Bernard, H. R., Wutich, A., & Ryan, G. W. (2017). Analyzing Qualitative Data: Systematic Approaches (2nd ed.). SAGE.

Section 1 of 5

What Qualitative Data Analysis Is — and Why It Belongs in an Epidemiology Series

⏱ Estimated reading time: 25 minutes

Introduction and Overview

Imagine sitting at a kitchen table in Burnaby with an 82-year-old man in long-term care who has just told you that his loneliness is “the empty space all those people used to fill.” A line like that is data. It has not been counted, scaled, or coded yet, but it is empirical, it is patterned, and it carries information about what loneliness is for the person living it. The problem of qualitative data analysis — the problem this course is about — is how to move from a stack of statements like that one to defensible knowledge claims a public-health audience will believe.

For three terms you have learned to do something different. In HSCI 230 you learned to count cases. In HSCI 341 you learned to detect outbreaks and screen populations. In HSCI 410 you learned to model exposures and outcomes with regression. All three courses worked on data that arrived in your hands as numbers, or that could be coerced into numbers without much loss. HSCI 841 sits beside those courses, not above or below them. It is the analytical companion that handles the data your three previous courses politely refused to deal with: interview transcripts, field notes, archival documents, focus-group recordings, free-list responses, and the open-text comments at the back of every survey.

Learning Objectives for Section 1

  • Define qualitative data analysis (QDA) operationally, not by what it is not.
  • Explain why the boundary between qualitative and quantitative work is more porous than introductory texts suggest.
  • Articulate three claims Bernard, Wutich, and Ryan make about disciplined QDA: it is systematic, transparent, and replicable.
  • Locate qualitative work in the broader landscape of public-health evidence and explain why a methods-trained epidemiologist needs both.

1.1 An Operational Definition

Key insight - An operational definition of qualitative analysis

Qualitative data analysis is the systematic process of organizing, coding, comparing, and interpreting non-numeric data to produce defensible claims about meaning, process, or context. The four verbs — organize, code, compare, interpret — are the steps; the three nouns — meaning, process, context — are what qualitative work uniquely can deliver. Hold this definition in mind for the rest of the course; we will return to each verb in detail.

Bernard, Wutich, & Ryan (2017, p. 1) offer a working definition: qualitative data analysis is the search for patterns in non-numeric data and an explanation of why those patterns are there. Read that sentence carefully. There are three moving parts, and each one is doing work.

The search for patterns means the analyst is doing more than retelling what was said. Patterns are regularities — co-occurrences, sequences, contrasts, gradients, absences. A theme that shows up in four out of twenty transcripts is a pattern. A code that always appears immediately after another code is a pattern. A topic that no one mentions, even though you asked about it, is also a pattern (a particularly informative one).
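Once segments are coded, the three pattern types named above (a theme present in some share of transcripts, one code reliably following another, and a topic no one mentions) can each be checked mechanically. What follows is a minimal Python sketch, not a procedure taken from Bernard, Wutich, and Ryan; the transcript IDs, code labels, and function names are invented for illustration.

```python
from collections import Counter

# Hypothetical coded data: each transcript is an ordered list of the codes
# applied to its segments. IDs and code labels are invented for illustration.
coded = {
    "T01": ["ABSENCE", "CHAIR", "ROUTINE"],
    "T02": ["ROUTINE", "ABSENCE", "CHAIR"],
    "T03": ["ROUTINE", "FAMILY"],
    "T04": ["ABSENCE", "CHAIR", "FAMILY"],
}
codebook = ["ABSENCE", "CHAIR", "FAMILY", "ROUTINE", "FINANCES"]


def transcript_frequency(coded):
    """Count how many transcripts contain each code at least once."""
    return Counter(code for codes in coded.values() for code in set(codes))


def always_followed_by(coded, first, second):
    """True if every occurrence of `first` is immediately followed by `second`."""
    occurrences = 0
    for codes in coded.values():
        for i, code in enumerate(codes):
            if code == first:
                occurrences += 1
                if i + 1 >= len(codes) or codes[i + 1] != second:
                    return False
    return occurrences > 0


freq = transcript_frequency(coded)
# Codes in the codebook that were never applied: the "absence" pattern.
absent = [code for code in codebook if freq[code] == 0]

print(freq["CHAIR"], "of", len(coded), "transcripts mention CHAIR")
print("ABSENCE always followed by CHAIR:", always_followed_by(coded, "ABSENCE", "CHAIR"))
print("Codes never applied:", absent)
```

Finding a regularity this way is only the first part of the definition; the explanation of why it is there still has to come from reading the segments.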

Non-numeric data sounds like the easy part. The hard part is deciding what counts as “non-numeric” in practice. A transcript is non-numeric. A photograph is non-numeric. A 30-second clip of a focus group laughing in unison is non-numeric. So is the layout of a clinic waiting room. We will be precise about the five kinds of qualitative data in Section 3.

Explanation — the third part — is the move that separates analysis from description, what Geertz (1973) called the difference between thin and thick description. A study that shows a pattern but does not explain it is, in Bernard, Wutich, and Ryan's terms, descriptive but not analytic. A loneliness study that finds the word “chair” mentioned in 11 of 20 transcripts has identified a pattern. It becomes analysis when the researcher proposes why — in this case, perhaps, because chairs are the most stable physical traces of absent people in domestic space.

Why Bernard, Wutich, and Ryan start with this definition

Many qualitative methods textbooks begin with the philosophy of qualitative work — ontology, epistemology, the interpretive turn. Bernard, Wutich, and Ryan deliberately do not. They start with a working definition that emphasizes doing over being, because their stance is that qualitative analysis is best learned the way quantitative analysis is learned: by performing it on data, transparently, and being prepared to defend the moves you made.

1.2 Numbers and Words Are Not Different Substances

The misleading common framing

Introductory textbooks often present qualitative and quantitative as opposites: subjective vs objective, soft vs hard, narrative vs numeric, exploratory vs confirmatory (a framing canonized in handbooks such as Denzin & Lincoln, 2017). This framing produces bad work in both traditions. A bad quantitative paper is one whose categories are obviously wrong; a bad qualitative paper is one whose patterns cannot be transmitted to readers without losing them.

What unifies them

Both traditions are in the business of turning observations into claims with warrant. Both require sampling decisions, measurement decisions, analytic procedures, and standards of rigour. The methodological choices differ; the epistemological obligations do not.

The Bernard-Wutich-Ryan position

This course takes the explicit position that numbers and words are not different substances. A frequency count derived from coded transcripts is information of the same kind as a frequency count from a survey. The choice is about what question you are answering, not about which side of a methodological war you are on.

Why this matters for public health

Public health research routinely combines methods (mixed-methods designs are now the norm in implementation research). Researchers who can move between numeric and narrative evidence without an identity crisis are the ones who do the most useful work.

Introductory methods textbooks like to draw a sharp line between “quantitative” and “qualitative” research. In practice, the line is more porous than the textbook chapters that surround it would suggest. A few illustrations make the point.

The annual Canadian Community Health Survey contains thousands of pre-coded numeric items and a smaller set of open-text fields. Researchers routinely quantify the qualitative: they read the open-text comments, develop a coding scheme, count the resulting categories, and analyze the counts with chi-squared tests. They have just done qualitative analysis — they just did not stop there. Conversely, an interview-based grounded-theory study may begin with a frequency table of how many transcripts mention each emerging code. It has used a quantitative move (counting) inside a qualitative project.
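The "quantify the qualitative" workflow described above (read, code, count, test) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: the responses, the grouping variable, and the keyword-based coding rule are all invented, and a real coding scheme would be developed by reading the comments rather than by keyword matching. The chi-squared statistic is computed by hand for a 2×2 table (one degree of freedom, no continuity correction).

```python
import math

# Hypothetical open-text responses paired with a respondent group.
responses = [
    ("urban", "The wait times at the clinic are too long"),
    ("urban", "Could not get an appointment, long wait"),
    ("urban", "Staff were kind and helpful"),
    ("urban", "Waited three hours to be seen"),
    ("rural", "The drive to the clinic is far"),
    ("rural", "Friendly staff, no complaints"),
    ("rural", "Nurses were helpful"),
    ("rural", "Had to wait a while"),
]


def code_response(text):
    """Toy coding rule (an assumption): flag any mention of waiting."""
    return "WAIT" if "wait" in text.lower() else "OTHER"


def crosstab(responses):
    """Count coded categories within each group."""
    table = {}
    for group, text in responses:
        row = table.setdefault(group, {"WAIT": 0, "OTHER": 0})
        row[code_response(text)] += 1
    return table


def chi_squared_2x2(table):
    """Pearson chi-squared for a 2x2 table; the p-value assumes df = 1."""
    groups = sorted(table)
    cats = ["WAIT", "OTHER"]
    observed = [[table[g][c] for c in cats] for g in groups]
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(c) for c in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-squared survival function, df = 1
    return stat, p_value


table = crosstab(responses)
stat, p = chi_squared_2x2(table)
print(table)
print(f"chi-squared = {stat:.3f}, p = {p:.3f}")
```

With counts this small an exact test would be the better choice; the example is meant to show the workflow, not to model good sample sizes.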

The defensible distinction is not between numbers and words but between the type of question a study is answering and the kind of access the data give the analyst to the phenomenon. Quantitative methods are typically the right choice when you want to estimate a population-level magnitude, test a pre-specified hypothesis, or measure an effect. Qualitative methods are typically the right choice when you want to discover what something is, how people make sense of it, or how a process unfolds. Both can use counting. Both can use words. The deeper choice is about whether you are measuring a known phenomenon or characterizing an under-described one.

Question type | Quantitative is usually a good fit | Qualitative is usually a good fit
How common? | Yes: prevalence, incidence, rates | Limited: can suggest commonality but not measure it
How strong is the association? | Yes: regression, odds ratios, effect sizes | No: not the right tool
What is it? | Limited: depends on a pre-existing definition | Yes: especially for new or contested phenomena
How do people make sense of it? | Limited: structured surveys constrain answers | Yes: this is the natural home of QDA
How does the process unfold over time? | Yes for outcomes, limited for mechanism | Yes for mechanism, limited for outcomes

1.3 The Textbook's Methodological Commitment

Bernard, Wutich, and Ryan are explicit about a stance that shapes the entire course: qualitative data analysis can and should be systematic, transparent, and replicable. The three words do specific work.

Systematic means that the analytic procedure is specifiable in advance (or at least in retrospect) and applied consistently across the dataset. If you decide to code every mention of the word “chair” in a transcript, you code every mention of the word “chair” in every transcript — you do not code some and skip others based on whether the mention is interesting. The systematicity is what makes the resulting pattern claim a real finding rather than an anecdote.
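The "every mention in every transcript" rule can be made concrete with a keyword-in-context pass. The sketch below is a minimal Python illustration; the mini-corpus, the transcript IDs, and the `code_keyword` helper are hypothetical. It applies one matching rule to the whole corpus, so no mention is skipped because it looks uninteresting.

```python
import re

# Hypothetical mini-corpus; transcript IDs and text are invented.
transcripts = {
    "T01": "His chair is still by the window. I never move the chair.",
    "T02": "I mostly feel it in the evenings, after supper.",
    "T03": "The Chair at the head of the table stays empty.",
}


def code_keyword(transcripts, keyword, window=20):
    """Return every whole-word mention of `keyword` in every transcript,
    with up to `window` characters of context on each side."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    hits = []
    for tid, text in sorted(transcripts.items()):
        for match in pattern.finditer(text):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            hits.append((tid, match.start(), text[start:end]))
    return hits


hits = code_keyword(transcripts, "chair")
transcripts_with_hit = sorted({tid for tid, _, _ in hits})

for tid, offset, context in hits:
    print(tid, offset, repr(context))
print(len(transcripts_with_hit), "of", len(transcripts), "transcripts mention the keyword")
```

The whole-word rule is itself an analytic decision: as written it would not match "chairs" or "armchair", and a codebook should say so explicitly.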

Transparent means that a reader of the eventual report can see what you did. This is the function of methods sections, audit trails, and codebooks (Lincoln & Guba, 1985). When a published qualitative paper says “themes emerged from the data,” Bernard, Wutich, and Ryan would consider that a methodological failure: themes do not emerge, analysts develop them through specifiable steps, and those steps should be in the paper (Braun & Clarke, 2006).

Replicable is the most contested of the three words and the most often misunderstood. Replicability in qualitative work does not mean that two analysts working on the same dataset would produce identical interpretations — Bernard, Wutich, and Ryan are clear that interpretation is partly perspectival. It means that two analysts following the same procedure would produce defensible interpretations and would identify similar patterns. The standard is not identity. The standard is “another competent researcher would arrive somewhere coherent with mine.”
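One way to put a number on "would identify similar patterns" is an inter-coder agreement statistic. The passage above does not name one; Cohen's kappa is used here as an assumption because it is a widely used choice for two coders. The sketch is a minimal Python illustration with invented labels, and it measures agreement on code assignment only, not agreement on interpretation.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' labels on the same segments."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of segments.")
    n = len(coder_a)
    # Observed agreement: share of segments given the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal label proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two analysts applying the same codebook
# to the same ten segments (1 = theme present, 0 = theme absent).
analyst_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
analyst_2 = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1]

kappa = cohens_kappa(analyst_1, analyst_2)
print(f"kappa = {kappa:.2f}")
```

Here the analysts agree on 8 of 10 segments, chance agreement is 0.52, and kappa is about 0.58. The two segments where they disagree are the ones worth discussing, since they show where the procedure is underspecified.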

Why this matters for your epidemiology training

Public-health audiences — the people who read your eventual reports — have been trained to ask methodological questions about quantitative work: How was the sample drawn? What was the case definition? What is the confidence interval? Most of them have not been trained to ask the same kinds of questions about qualitative work. The Bernard, Wutich, and Ryan stance is that you should welcome those questions and have answers for them. The point of being systematic, transparent, and replicable is not philosophical purity. It is so that the qualitative work you publish is taken seriously by the public-health audiences who decide policy.

1.4 Where Qualitative Work Sits in the Public-Health Evidence Landscape

One question that comes up early in every qualitative methods course is some version of: “Why would a public-health researcher do this kind of work?” The honest answer is that for many of the most important public-health questions, qualitative work is the only way in. Consider three examples.

Why do people refuse vaccines? You can count refusals with surveys (HSCI 341 territory) and you can identify the predictors of refusal in a logistic regression (HSCI 410 territory). But if you want to understand the specific arguments people give themselves and each other for refusing — the narratives, the framings, the felt experience of distrust — you need interviews. The qualitative literature on vaccine hesitancy is what produced the interventions that the quantitative literature later tested (on qualitative inquiry more broadly, see Wikipedia, 2025).

What is it like to live with a chronic illness? The phenomenology of, say, type 2 diabetes or long COVID is not legible in administrative data. Patient-reported outcome measures (HSCI 410 territory) give you scores; qualitative work gives you what the scores are measures of. A scale that says someone has a quality-of-life score of 0.62 tells you a number. The interviews behind such scales are what give the number its content.

Why did the program fail? Implementation science — the study of why evidence-based programs work in trials but stall in real-world rollout — relies heavily on qualitative methods. The numerical fact that a program failed is the starting point. The reasons are uncovered through interviews with implementers, observation of practice, document analysis of organizational policy, and the kinds of analytic moves you will learn in this course.

Reflection

Think of a public-health question from your previous coursework or your current work where the quantitative answer feels incomplete — where the numbers are there but the meaning is missing. What kind of qualitative work would fill the gap, and what kind of data would you want to collect to do it?

Model answer: A defensible response names a specific question, identifies what the quantitative answer is doing and what it is missing, and proposes a concrete qualitative complement. Example: “The provincial overdose mortality rate tells us how many people died, but not why those who survived a near-fatal overdose did or did not subsequently engage with services. I would conduct semi-structured interviews with 20–30 survivors of a recent near-fatal overdose, recruited through a needle-exchange site and an emergency department, to characterize the decision logic around service engagement, the role of stigma in the post-overdose window, and the structural barriers (waitlists, ID requirements, working hours) survivors identify. The qualitative work would directly inform what to measure in a future quantitative engagement-predictor study.” The point is to name the specific phenomenological or mechanistic gap, not to vaguely say “more research is needed.”

Knowledge Check — Section 1

Question 1: According to Bernard, Wutich, and Ryan, the operational definition of qualitative data analysis includes three parts: the search for patterns in non-numeric data and...

The working definition above has three parts: (1) the search for patterns, (2) in non-numeric data, (3) and an explanation of why those patterns are there. Without the third part, you have description, not analysis.

Question 2: Which of the following is NOT one of Bernard, Wutich, and Ryan's three commitments for disciplined qualitative analysis?

The three commitments are systematic, transparent, and replicable. Bernard, Wutich, and Ryan acknowledge that interpretation has a perspectival element but reject the framing of qualitative work as inherently or proudly subjective.

Question 3: A researcher counts how many transcripts in their corpus mention a specific theme and analyzes the counts by participant gender. Which statement best describes this move?

Quantifying themes from qualitative data is a routine and defensible analytic move — content analysis (Module 8 of this course) is built on it. The qualitative/quantitative boundary is porous in practice.
Section 2 of 5

The Four Research Goals: Where Qualitative Work Lives

⏱ Estimated reading time: 20 minutes

Introduction and Overview

Bernard, Wutich, and Ryan organize the entire enterprise of empirical research around four goals: exploration, description, comparison, and the testing of models. Every research study you have ever read or conducted can be located in one of these four (or, more commonly, in two or three of them at once). The goals are not a hierarchy. They are not phases of a single study. They are different jobs that empirical research can do, and each has its preferred methods. This section walks through the four goals, locates qualitative work within them, and uses the loneliness dataset as a running example.

Learning Objectives for Section 2

  • Distinguish exploration, description, comparison, and testing as different research goals.
  • Identify which goals qualitative methods are best suited to and why.
  • Recognize that a single study often pursues more than one goal.
  • Locate the HSCI 841 capstone in this landscape.

The four goals at a glance:

Exploratory qualitative work asks: what is going on here? It surfaces categories, patterns, and concerns that did not pre-exist in the researcher’s head. The output is a vocabulary the field did not have before.

Example: Interviews with newly diagnosed long-COVID patients in 2020-21 surfaced symptom clusters that questionnaire research could only operationalize later.

Descriptive qualitative work asks: what does this phenomenon look like in detail, in its own terms? The output is a rich, context-bound account that gives readers a feel for the world being described.

Example: A field study of a smoking-cessation clinic describes what staff actually do all day, rather than what the protocol says they do.

Comparative qualitative work asks: how do these cases differ, and what does the difference teach us? It compares people, settings, time periods, or institutional arrangements to identify what varies and what holds constant.

Example: Comparing how Indigenous and non-Indigenous focus groups talk about ‘wellness’ reveals which words mean similar things and which do not.

Model-testing qualitative work asks: does this theoretical claim hold up when we look at lived experience? It brings an existing model to data and asks whether the data confirm, qualify, or undermine it.

Example: Bringing the Health Belief Model to interviews with vaccine-hesitant parents to see which of its constructs (susceptibility, severity, benefits, barriers) are actually invoked.

2.1 Exploration

Exploration is what you do when you do not yet know enough about a phenomenon to make a hypothesis about it. The goal is to map the territory: to find out what is there, what the relevant categories are, what people consider important, and what the underlying mechanisms might be. Bernard, Wutich, and Ryan are explicit that qualitative work dominates exploration, because quantitative methods generally require that you have already decided what the variables of interest are, and exploration is the work of deciding that.

Most public-health questions begin in an exploratory phase, even if they later move into hypothesis-testing. When a new pathogen emerges, when a previously invisible population's experience comes onto the policy agenda, when a digital harm (like cyberbullying or AI-mediated relationships) appears that no existing survey can ask about, the first scholars to study it are doing exploration. Their job is to give the rest of the field something to measure.

The loneliness dataset that anchors this course is, in part, exploratory. The transcripts contain accounts of loneliness from 20 people who differ in age, gender, life-stage, immigration status, caregiving role, and many other dimensions. A defensible analysis of these transcripts will help develop or refine the categories that future quantitative surveys of loneliness might use.

2.2 Description

Description is the careful characterization of a phenomenon: what is it, what does it look like, what are its dimensions, who experiences it, in what settings, with what consequences? Description is sometimes treated as the consolation prize of empirical research — the work you do when you cannot do anything “real.” Bernard, Wutich, and Ryan reject this framing emphatically. Many of the most influential studies in public health are descriptive: the Framingham Heart Study began as description, the BC Centre for Disease Control overdose mortality reports are description, the entire field of demography is description.

Both qualitative and quantitative methods can do description, and they describe different aspects of the same phenomenon. A quantitative survey can describe what percentage of adults in BC report being lonely in the past year (this is description by counting). A qualitative study can describe what loneliness feels like from the inside, what triggers it, what people do about it, and how they make sense of it (this is description by characterization). Both are valid. Often, both are necessary.

Your capstone work in HSCI 841 is heavily descriptive. You will be asked to characterize loneliness as experienced by the 20 participants whose transcripts you analyze: its dimensions, its triggers, its embodied features, the meanings participants assign to it. That descriptive characterization is the bulk of what a qualitative health study contributes.

2.3 Comparison

Comparison is what you do when you have characterized a phenomenon and now want to know how it varies across groups, settings, or conditions. The classical home of comparison in epidemiology is the case-control study (does the exposure differ between cases and non-cases?) or the cohort design (does the outcome differ between exposed and unexposed?). Comparison is more often associated with quantitative work because of the statistical machinery available for it, but qualitative comparison is a real and rigorous activity.

Following Bernard, Wutich, and Ryan, this course dedicates substantial time to comparison-based qualitative methods. Grounded theory's constant-comparative method (Glaser & Strauss, 1967), qualitative comparative analysis (QCA), and matrix analysis are all systematic ways to compare across cases, with words instead of variables, and to draw defensible inferences about why groups differ.

In your capstone, you will compare across the 20 loneliness transcripts. You might compare how older participants describe loneliness with how younger participants describe it. You might compare immigrant participants' accounts with native-born participants' accounts. The comparison is qualitative when the units being compared are texts or interpretive cases rather than rows in a spreadsheet, and when the conclusions are about patterns of meaning rather than effect sizes.
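A comparison like older versus younger participants is easier to defend when it is laid out as a group-by-code matrix with the case IDs kept visible in each cell. The sketch below is a minimal Python illustration of that layout; the cases, attribute values, and code labels are invented and are not drawn from the capstone transcripts.

```python
# Hypothetical case attributes and the codes applied to each transcript.
cases = [
    {"id": "T01", "age_group": "older", "codes": {"BEREAVEMENT", "ROUTINE"}},
    {"id": "T02", "age_group": "older", "codes": {"BEREAVEMENT", "MOBILITY"}},
    {"id": "T03", "age_group": "younger", "codes": {"SOCIAL_MEDIA", "ROUTINE"}},
    {"id": "T04", "age_group": "younger", "codes": {"SOCIAL_MEDIA"}},
]


def group_by_code_matrix(cases, attribute):
    """For each group, list the transcripts in which each code appears."""
    matrix = {}
    for case in cases:
        group = matrix.setdefault(case[attribute], {})
        for code in case["codes"]:
            group.setdefault(code, []).append(case["id"])
    return matrix


matrix = group_by_code_matrix(cases, "age_group")
all_codes = sorted({code for case in cases for code in case["codes"]})

# One row per code, one column per group. Case IDs stay visible so every
# cell can be traced back to the transcripts behind it.
for code in all_codes:
    cells = [f"{g}: {sorted(matrix[g].get(code, []))}" for g in sorted(matrix)]
    print(f"{code:<13}", " | ".join(cells))
```

The matrix only shows where codes fall. The qualitative comparison is the reading of the segments behind each cell to see whether, for example, "ROUTINE" means the same thing in both groups.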

2.4 Testing Models

The fourth goal — the testing of theoretical models — is what most introductory methods textbooks treat as the pinnacle of empirical research. You have a theory; you derive predictions from it; you collect data and see whether the predictions hold. This is the standard logic of confirmatory quantitative work.

Qualitative work can do this too, though it is rarer and more contested. Analytic induction (Module 11) is a qualitative model-testing approach: you specify a hypothesis, examine your cases, find a case that does not fit, and revise the hypothesis until it fits all cases. Qualitative comparative analysis (also Module 11) tests Boolean propositions about combinations of conditions sufficient for an outcome. These are real model-testing exercises, just conducted with qualitative cases.
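To make "Boolean propositions about combinations of conditions sufficient for an outcome" concrete, the sketch below checks one such proposition against a handful of cases. It is a minimal crisp-set illustration in Python, not a full QCA procedure (no truth-table minimization); the condition names, the outcome, and the case values are all invented.

```python
# Hypothetical crisp-set data: each case records whether two conditions
# and the outcome are present. Names and values are invented.
cases = [
    {"id": "T01", "lives_alone": True,  "recent_loss": True,  "chronic_loneliness": True},
    {"id": "T02", "lives_alone": True,  "recent_loss": True,  "chronic_loneliness": True},
    {"id": "T03", "lives_alone": True,  "recent_loss": False, "chronic_loneliness": False},
    {"id": "T04", "lives_alone": False, "recent_loss": True,  "chronic_loneliness": False},
    {"id": "T05", "lives_alone": False, "recent_loss": False, "chronic_loneliness": True},
]


def sufficiency(cases, conditions, outcome):
    """Check whether the conjunction of `conditions` is sufficient for `outcome`.

    Returns (consistency, contradicting_ids), where consistency is the share
    of cases showing all conditions that also show the outcome.
    """
    matching = [c for c in cases if all(c[name] for name in conditions)]
    if not matching:
        return None, []  # the combination never occurs, so it cannot be assessed
    contradicting = [c["id"] for c in matching if not c[outcome]]
    consistency = 1 - len(contradicting) / len(matching)
    return consistency, contradicting


consistency, contradictions = sufficiency(
    cases, ["lives_alone", "recent_loss"], "chronic_loneliness"
)
print("consistency:", consistency, "contradicting cases:", contradictions)
```

In this toy data the combination is fully consistent with sufficiency, while case T05 shows the outcome without either condition. Sufficiency is not necessity, and other pathways to the outcome may exist.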

Your capstone may include a small model-testing element if you wish. For example, you might hypothesize that participants who describe loneliness in existential terms (a feature of one's life-stage) cope through reframing, while participants who describe it in situational terms (a feature of current circumstances) cope through behavioral change. You can then check the hypothesis against the 20 transcripts. That is qualitative model-testing.
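Checking this hypothesis against the transcripts amounts to a search for negative cases. The sketch below is a minimal Python illustration; it assumes each transcript has already been summarized with one framing code and one coping code, and the four example cases are invented.

```python
# Hypothetical summary codes for each transcript: how the participant frames
# loneliness and the main coping strategy described. Values are invented.
cases = [
    {"id": "T01", "framing": "existential", "coping": "reframing"},
    {"id": "T02", "framing": "situational", "coping": "behavioral_change"},
    {"id": "T03", "framing": "existential", "coping": "behavioral_change"},
    {"id": "T04", "framing": "situational", "coping": "behavioral_change"},
]

# The hypothesis from the text, expressed as framing -> predicted coping.
hypothesis = {"existential": "reframing", "situational": "behavioral_change"}


def negative_cases(cases, hypothesis):
    """Return the IDs of cases whose coping differs from the prediction."""
    return [
        case["id"]
        for case in cases
        if hypothesis.get(case["framing"]) != case["coping"]
    ]


misfits = negative_cases(cases, hypothesis)
print(f"{len(cases) - len(misfits)} of {len(cases)} cases fit; re-read: {misfits}")
```

A misfit such as T03 is not noise to be discarded. In analytic induction it sends you back to the transcript, either to recode the case or to revise the hypothesis.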

One study, multiple goals

Almost no real study fits cleanly into one of the four research goals. The original Cacioppo and Patrick work on loneliness (2008) explored what loneliness is, described its physiological correlates, compared lonely and non-lonely adults, and tested specific neurobiological models. Most of your capstone work will be predominantly exploratory and descriptive, but you should not avoid comparison or model-testing if your data support them.

Reflection

Of the four research goals (exploration, description, comparison, testing models), which two would you say your capstone is most oriented toward? Why? There is no wrong answer — the question is whether you can defend your choice with reference to what the loneliness dataset can and cannot support.

Model answer: For most students the dominant goals will be description and comparison. The dataset (20 transcripts with deliberate variation in age, gender, life-stage, and immigration status) is well-suited to characterizing what loneliness looks like across kinds of people (description) and to comparing the felt experience across subgroups (comparison). Exploration is plausibly one of your two goals if your specific research question is under-described in the literature (e.g., the loneliness of late-life coming out, or refugee wahda). Model-testing is the least natural fit for a 20-transcript dataset because the analytic power for testing Boolean propositions is limited, though analytic induction on a focused sub-question is defensible. A strong answer names the two goals AND names the feature of the dataset that justifies the choice.

Knowledge Check — Section 2

Question 1: Which of the four research goals does qualitative work most strongly dominate?

Exploration is where qualitative work dominates because quantitative methods generally require pre-specified variables, and exploration is the work of figuring out what those variables should be. Description is also strongly qualitative, but exploration is where Bernard, Wutich, and Ryan are most emphatic.

Question 2: A study compares how older and younger participants describe loneliness using grounded-theory constant comparison across 20 interview transcripts. Which of the four research goals is most central?

Comparing groups (here, older vs. younger participants) is comparison. The use of grounded theory's constant-comparative method is a qualitative comparison engine; comparison need not be quantitative.

Question 3: Why do Bernard, Wutich, and Ryan reject the characterization of description as a “consolation prize” of research?

Bernard, Wutich, and Ryan point out that many landmark public-health contributions are descriptive. Description is a primary scientific achievement, not a fallback, and qualitative description gives a phenomenon its content in a way that counting alone cannot.
Section 3 of 5

The Five Kinds of Qualitative Data — and Your Capstone Dataset

⏱ Estimated reading time: 25 minutes

Introduction and Overview

Bernard, Wutich, and Ryan organize qualitative data into five kinds: physical objects, still images, sounds, moving images, and texts. The categorization may feel pedantic until you realize that which kind of data you have shapes what analytic moves are available to you. A photograph and a transcript both look like “qualitative data” but they are coded differently, sampled differently, and reported differently. This section walks through the five kinds, with public-health examples for each, and then introduces the HSCI 841 capstone dataset.

Learning Objectives for Section 3

  • List the five kinds of qualitative data.
  • Give a public-health example for each kind.
  • Explain why texts dominate this course (and most of contemporary qualitative health research).
  • Recognize that “texts” is a broader category than it appears.
  • Locate and read at least three transcripts from the loneliness capstone dataset.

3.1 Physical Objects

Anthropologists call this material culture: the artefacts people make, use, exchange, and discard. In public health, the relevant physical objects include medication packaging, syringe-exchange kit contents, vaccine cards, ad hoc harm-reduction supplies, mobility aids, the layout of a clinic waiting room, the contents of someone's medicine cabinet, and the materials available (or unavailable) in a school health office. Object-based analysis is not common in mainstream public-health research but it is increasingly important in implementation science, environmental health, and Indigenous health research where physical context carries meaning that words do not.

3.2 Still Images

Still images include photographs, hand-drawn maps, satellite imagery, screenshots of social-media posts, anatomy diagrams in patient education materials, and public-health campaign posters. Still-image analysis is a recognized sub-specialty (visual sociology, visual anthropology) with its own techniques: content analysis (Module 8) is regularly applied to images, and the photovoice method, in which participants take their own photographs and discuss them, is a staple of community-based participatory research.

3.3 Sounds

Recorded speech is the most common form, but other audio data is also analytically tractable: the sound of a clinic at peak hours, the music played in a hospice, the auditory environment of a school cafeteria. The vast majority of qualitative analysis in health research, however, begins with sound (an interview recording) and is converted to text (a transcript) before analysis. This conversion step — transcription — is itself an analytic act, and you will spend serious time on transcription conventions in Module 4 and Module 10.

3.4 Moving Images: Video

Video is sound plus image plus time. Clinical encounter recordings, simulated training scenarios, ethnographic field recordings, TikTok health-influencer content, and telehealth call recordings are all qualitative data. Video analysis is more time-consuming than audio or text analysis but allows attention to non-verbal communication, embodied action, and spatial arrangement. Conversation analysis (Module 10), which began with audio recordings, has increasingly turned to video for exactly this reason.

3.5 Texts

Texts are, by far, the most common qualitative data in contemporary health research. They include:

  • Interview transcripts — the most familiar form, and the form your capstone dataset takes.
  • Focus-group transcripts — like interviews, but multi-party and with conversational dynamics.
  • Field notes — the researcher's own written record of observation.
  • Documents — policy papers, clinical guidelines, organizational reports, news articles, archived correspondence.
  • Open-text survey responses — the qualitative tail of an otherwise quantitative instrument.
  • Social-media corpora — posts, comments, threaded discussions, hashtag streams.
  • Patient-generated text — diaries, symptom journals, illness blogs.
  • Free-list responses — short open-ended elicitation data used in cultural domain analysis (Module 12).

The reason texts dominate this course is partly practical (they are cheap to store, easy to share, computationally tractable) and partly principled: text is the form most amenable to the systematic, transparent, replicable analytic procedures Bernard, Wutich, and Ryan advocate. The methods you learn in this course will be applicable to images, video, and audio with adjustments, but the default unit of analysis is text.

3.6 The HSCI 841 Capstone Dataset

ACTIVITY Try it - Inventory your capstone dataset

Take your HSCI 841 capstone dataset (or, if you don’t have one yet, pick a small body of qualitative material you have access to). Make a quick inventory:

  1. How many cases are there? (People, sites, documents.)
  2. What is the unit of data? (Transcript, field note page, image, audio file.)
  3. What is the volume? (Approximate word count, number of pages, duration of audio.)
  4. What is missing? (Whose voices are not in the corpus? What contexts are not represented?)

This four-line inventory is the qualitative equivalent of nrow(df), ncol(df), summary(df). Do it before you start coding anything.
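The first three inventory questions can be roughed out in base R. The sketch below builds a tiny stand-in corpus in a temporary folder so that it runs anywhere; the file names and the text are hypothetical, and the whitespace-based word count is only an approximation of volume.

```r
# Build a tiny stand-in corpus in a temporary folder (hypothetical names and text)
corpus_dir <- file.path(tempdir(), "toy_transcripts")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines(c("Interviewer: How have things been?",
             "Participant: Quiet. The house is very quiet now."),
           file.path(corpus_dir, "T01.txt"))
writeLines(c("Interviewer: Tell me about a typical week.",
             "Participant: I work, I come home, I scroll."),
           file.path(corpus_dir, "T02.txt"))

# Here the unit of data is the transcript file: one row per file
files <- list.files(corpus_dir, pattern = "\\.txt$", full.names = TRUE)

count_words <- function(path) {
  text <- paste(readLines(path, warn = FALSE), collapse = " ")
  length(strsplit(trimws(text), "\\s+")[[1]])  # whitespace-delimited tokens
}

inventory <- data.frame(
  case  = basename(files),
  lines = vapply(files, function(f) length(readLines(f, warn = FALSE)),
                 integer(1), USE.NAMES = FALSE),
  words = vapply(files, count_words, integer(1), USE.NAMES = FALSE)
)

nrow(inventory)       # how many cases
sum(inventory$words)  # approximate volume in words
inventory
```

The fourth question, what is missing, cannot be computed from the files. It has to be answered by reading and by knowing how the corpus was assembled.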

Your capstone dataset consists of 20 transcripts of semi-structured interviews about experiences of loneliness, conducted with adults in British Columbia. The dataset is paired with the interview guide used to elicit it. Both are stored in the course materials. The 20 participants vary deliberately across age (18 to 82), gender (women, men, non-binary), life-stage (students, parents, retirees, widows), and circumstance (immigration, caregiving, romantic dissolution, late-life coming out, occupational burnout, refugee resettlement, neurodivergent identity, long-term care).

🔎 Hands-on: First look at the capstone dataset

Before continuing, open the interview guide and read at least three transcripts.

Read the guide first; then read the three transcripts side by side. Notice what is shared and what differs across the three accounts. This is the kind of side-by-side reading that anchors every analytic technique you will learn in the next eleven lessons.

Important note on the dataset

The 20 transcripts are fully synthetic composites developed for instructional use. They draw on themes documented in the published loneliness literature but no individual transcript represents a real person. You may treat them as you would any qualitative dataset for the purpose of learning the methods. Your capstone paper should state in its methods section that the data are an instructional dataset.

Reflection

After reading three transcripts of your choice, what is one shared theme you noticed across all three? What is one thing that appeared in only one transcript and surprised you? Don't overthink this — first impressions are analytically useful.

Model answer: There is no single correct answer; what matters is that you have specific, named observations and that you can locate them in particular transcripts. A strong response names a shared theme (e.g., “all three participants described loneliness using a spatial metaphor: a chair, a building, an apartment”) and a distinguishing observation (e.g., “Amira used a non-English word, wahda, to name something the English category could not hold; that surprised me because I had assumed the participants would have a shared vocabulary”). Specific observations like these are what theme-identification (Module 5) systematizes. Vague impressions like “they all seemed sad” are not the analytic raw material the rest of the course will work with.

Knowledge Check — Section 3

Question 1: Bernard, Wutich, and Ryan identify five kinds of qualitative data. Which of the following is NOT one of them?

The five kinds are physical objects, still images, sounds, moving images (video), and texts. Numeric biomarkers are quantitative data, not qualitative data.

Question 2: Why do texts dominate contemporary qualitative health research, according to this lesson?

Texts dominate for practical reasons (storage, sharing, computation) and for principled ones (the systematic-transparent-replicable analytic procedures Bernard, Wutich, and Ryan advocate are most developed for text data).

Question 3: Transcription — converting recorded interview sound into a written transcript — is best characterized as:

Transcription is an interpretive act. The conventions you adopt (verbatim with fillers and pauses, vs. clean intelligent verbatim) determine what later analysis can reveal — conversation analysis in particular requires very detailed transcription that ordinary content analysis does not.
Section 4 of 5

The Three Commitments, the Toolchain, and the Week 1 Capstone Milestone

⏱ Estimated reading time: 30 minutes

Introduction and Overview

The first three sections gave you definitions, the four research goals, and the five kinds of qualitative data. This section turns operational. It unpacks the systematic-transparent-replicable manifesto in working terms, installs the R and Taguette toolchain you will use for the rest of the term, and ends with the Week 1 capstone milestone.

Learning Objectives for Section 4

  • Operationalize each of the three methodological commitments (systematic, transparent, replicable) for your own work.
  • Install the HSCI 841 R toolchain (tidyverse, quanteda, tidytext, igraph, irr) and verify it runs.
  • Set up Taguette for hand-coding.
  • Understand the Week 1 capstone deliverable and produce the positionality memo.

4.1 Systematic Analysis — in Operational Terms

A systematic procedure has three features. First, it is specifiable: you can write it down and another researcher can follow it. Second, it is consistent: the same rule is applied to every case. Third, it is iterative when needed but documented when revised: if you change your codebook halfway through coding, you note when and why, and you re-code the earlier transcripts under the new scheme.

Unsystematic analysis — the kind Bernard, Wutich, and Ryan critique — reads as “we read the transcripts and noted what stood out.” That is not analysis; that is intuition. The systematic version of the same activity reads as “two analysts independently coded each transcript using a codebook developed from the first five transcripts; disagreements were resolved through discussion and the codebook was revised twice during the analysis (revisions documented in the audit trail).”
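One possible way to keep the audit trail mentioned above is a plain decision log. The sketch below is an assumption about what such a log could look like, not a format prescribed by the course or the textbook; the column names, dates, and entries are hypothetical.

```r
# Hypothetical audit-trail log: one row per documented analytic decision
audit_trail <- data.frame(
  date     = as.Date(character()),
  codebook = character(),   # codebook version in force after the decision
  decision = character(),   # what was changed
  reason   = character(),   # why it was changed
  stringsAsFactors = FALSE
)

# Append one decision and return the updated log
log_decision <- function(trail, date, codebook, decision, reason) {
  entry <- data.frame(date = as.Date(date), codebook = codebook,
                      decision = decision, reason = reason,
                      stringsAsFactors = FALSE)
  rbind(trail, entry)
}

audit_trail <- log_decision(
  audit_trail, "2025-01-20", "v1",
  "Initial codebook drafted from the first five transcripts",
  "Starting point for independent coding"
)
audit_trail <- log_decision(
  audit_trail, "2025-02-03", "v2",
  "Split one broad code into two narrower codes; earlier transcripts re-coded",
  "The two coders applied the broad code inconsistently"
)

audit_trail
```

A log like this records when and why each revision happened, which is the information a reader needs to follow a codebook through its revisions.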

4.2 Transparency — What You Owe Your Reader

Transparency is what you owe your reader. Specifically, you owe them four things:

  1. An explicit account of how you got the data — sampling logic, recruitment, interview procedure, the instrument you used.
  2. An explicit account of how you analyzed the data — coding procedure, the codebook (often in an appendix), how disagreements were handled, what software you used.
  3. An explicit account of your positionality — who you are, what you brought to the interpretation, what you might have missed.
  4. An explicit account of the limitations — what your design and dataset cannot tell you, even if you wish they could.

The convention in contemporary public-health qualitative research is that all four are in the methods section or an appendix, not in the body of the discussion as an afterthought.

4.3 Replicability — Coherent, Not Identical

Replicability in qualitative work is best operationalized (see Tracy, 2010) as the question: if a competent researcher with no prior knowledge of your study had access to your dataset, your codebook, and your methods, could they reach interpretations that are coherent with yours? The answer should be yes, even if not identical. Where the answer is no, the cause is usually one of: undocumented analytic decisions, idiosyncratic coding, or interpretations that go far beyond what the data support.

One way to test replicability is the intercoder reliability check (Morse, Barrett, Mayan, Olson, & Spiers, 2002): two analysts independently apply the codebook to the same subset of transcripts and the agreement is calculated. You will do this in Module 5. Intercoder reliability is not the only test of replicability — for some interpretive methods it is the wrong test — but it is the most common operational standard in applied health research.
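As a rough illustration of how agreement can be calculated, the sketch below computes raw percent agreement and Cohen's kappa by hand in base R. The ten coded segments and the three code names are hypothetical, and kappa is only one of several agreement statistics you might report.

```r
# Hypothetical codes assigned by two analysts to the same 10 transcript segments
coder_a <- c("isolation", "isolation", "loss", "loss", "loss",
             "belonging", "belonging", "isolation", "loss", "belonging")
coder_b <- c("isolation", "loss", "loss", "loss", "loss",
             "belonging", "isolation", "isolation", "loss", "belonging")

# Use a shared set of levels so the agreement table is square
codes <- sort(union(coder_a, coder_b))
tab <- table(factor(coder_a, levels = codes),
             factor(coder_b, levels = codes))

n <- sum(tab)
p_observed  <- sum(diag(tab)) / n                        # raw percent agreement
p_expected  <- sum(rowSums(tab) * colSums(tab)) / n^2    # agreement expected by chance
cohen_kappa <- (p_observed - p_expected) / (1 - p_expected)

tab
round(c(agreement = p_observed, chance = p_expected, kappa = cohen_kappa), 2)
```

Kappa is lower than raw agreement because it discounts the agreement two coders would reach by chance given how often each uses each code. The irr package in the course toolchain provides kappa2() for the same statistic; the hand calculation here is only meant to show what the number is made of.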

A word on the “objectivity” debate

Some traditions in qualitative research are skeptical of the language of systematicity, transparency, and replicability. They argue that knowledge is co-constructed and that the analyst is a constitutive part of what gets seen (see also reflexivity in qualitative research; Wikipedia, 2025), and that pretending otherwise is methodologically dishonest. Bernard, Wutich, and Ryan agree with the underlying point about co-construction but reject the implication that systematic methods are therefore inappropriate. Their position — and the position of this course — is that transparency about subjectivity is the modern operational solution: you say what you brought, you make your moves visible, and you let the reader judge.

4.4 The HSCI 841 R Toolchain

You have used R throughout the epidemiology sequence. In HSCI 841 you will use the same R environment but with different packages. The text-analysis ecosystem in R is mature and well-documented. The packages you install below in your Week 1 work block are the ones you will rely on for the rest of the course.

R: Install the HSCI 841 toolchain

Open RStudio. Run the following installation block. Comments after the # explain each package's role in the course.

# Core qualitative-text-analysis stack for HSCI 841
install.packages(c(
  "tidyverse",          # general data wrangling and plotting
  "tidytext",           # text-as-data verbs in the tidyverse idiom
  "quanteda",           # industrial-strength text analysis (Modules 8, 12)
  "quanteda.textstats", # keyness, readability, lexical diversity
  "quanteda.textplots", # keyness plots, word clouds, network plots
  "stringr",            # text manipulation
  "readtext",           # reading text corpora into R
  "igraph",             # network analysis (Module 12)
  "topicmodels",        # LDA topic modelling (Module 12)
  "irr"                 # intercoder reliability stats (Module 5)
))

# Verify the install by loading the core stack
library(tidyverse)
library(quanteda)
library(tidytext)

# Smoke test: read one transcript into R
loneliness_dir <- "../term projects/HSCI_841/transcripts"
p01 <- readLines(file.path(loneliness_dir, "P01_Maya.txt"))
length(p01)  # number of lines
head(p01, 12)  # first 12 lines: metadata header

What success looks like: The installation finishes without errors (the exact console messages vary by operating system), the three library() calls load without error, and the smoke test reads the file and prints its first 12 lines (the metadata header).
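If you would rather check the installation programmatically than scan the console, a minimal sketch follows. It assumes only the package names from the install block above and uses base R's requireNamespace(), which reports whether a package can be loaded without attaching it.

```r
# Packages named in the install block above
hsci841_pkgs <- c("tidyverse", "tidytext", "quanteda", "quanteda.textstats",
                  "quanteda.textplots", "stringr", "readtext", "igraph",
                  "topicmodels", "irr")

# TRUE for each package that can be loaded, FALSE otherwise
installed <- vapply(hsci841_pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) == 0) {
  message("All HSCI 841 packages are available.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

Any package reported as missing can be passed to install.packages() again on its own, which makes the error message for that package easier to read.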

4.5 Taguette for Hand-Coding

Taguette is a free, open-source qualitative coding application. It covers the core coding tasks that NVivo and ATLAS.ti handle (highlight passages, attach codes, build a codebook, export coded extracts) without the licence fee. It runs in your browser. There is nothing to install if you use the hosted version; you can also install it locally.

🔎 Hands-on: Get set up with Taguette
  1. Go to taguette.org.
  2. Create a free account, or download the desktop version if you prefer not to use the hosted instance.
  3. Create a new project called “HSCI 841 Loneliness Capstone”.
  4. Upload one of the transcripts as a test (you can delete and re-upload later).
  5. Familiarize yourself with the interface: how to highlight a passage, how to create a code, how to view the codebook.

You will spend serious time in Taguette in Modules 5, 7, 9, and 10. Setting it up now means you are not configuring software the week you also have content to learn.

4.6 The Week 1 Capstone Milestone

The capstone is a journal-article-format paper, due Week 12, reporting your qualitative analysis of (a subset of) the loneliness dataset. The paper will have the standard structure: introduction, methods, findings, discussion, references. The methods section will be the heaviest in the paper, because it is the section where Bernard, Wutich, and Ryan's three commitments will be most visible.

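Across the term, each module advances the capstone by one concrete milestone. The Week 1 milestone is to read three transcripts, write a 500-word positionality memo, and confirm that the toolchain is installed and working.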

Reflection

Of the three commitments — systematic, transparent, replicable — which one feels least intuitive to you right now, given how you have been trained in quantitative methods? What might you have to unlearn or relearn to meet it in your capstone?

Model answer: Most students arriving from a quantitative epidemiology background find replicability the most uncomfortable of the three, because the standard they were trained on (bit-for-bit reproducibility of a regression output) is not available for qualitative interpretation. The work of unlearning is to accept that “coherent with mine” is a defensible standard, and that documentation of analytic choices (not identity of conclusions) is the operational goal. Systematic is usually the most familiar, because epidemiology students are already trained to specify procedures in advance. Transparent is sometimes harder than it looks because writing a positionality statement is unfamiliar: you have not been trained to declare yourself in your methods section. The best answer will name the specific commitment, name what you would have to change in your habits, and name the practice you will adopt to bridge the gap.

Knowledge Check — Section 4

Question 1: Which of the following best operationalizes “transparency” in qualitative analysis?

Transparency means the reader can see how you got the data, how you analyzed it, who you are (positionality), and what your design cannot tell you (limitations). Raw audio is rarely shareable for confidentiality reasons; emotional reactions are not the same as a positionality statement; and member-checking is a separate validation technique, not what transparency means.

Question 2: Which is Bernard, Wutich, and Ryan's view of replicability in qualitative work?

Bernard, Wutich, and Ryan's standard is “coherent with mine” — defensible interpretations from a competent second analyst, not identical conclusions. Intercoder reliability is one operational test, but not the only one.

Question 3: Which package is the “industrial-strength text analysis” engine that the HSCI 841 toolchain will lean on in Modules 8 and 12?

quanteda is the workhorse text-analysis package: document-feature matrices, keyness, lexical diversity, plotting, network co-occurrence. tidyverse is general-purpose; irr is for intercoder reliability; topicmodels is for LDA specifically.
Section 5 of 5

Final Assessment

⏱ Estimated time: 25 minutes

Bringing It All Together

Lesson 1 has set up the conceptual and operational vocabulary that the rest of HSCI 841 depends on. The four-research-goal frame (Section 2), the five-kinds-of-data frame (Section 3), and the systematic-transparent-replicable manifesto (Section 4) are the scaffolding the next eleven lessons will hang their specific techniques on. The Week 1 capstone milestone — reading three transcripts and writing a positionality memo — is your first piece of real qualitative work, and it intentionally mirrors the work you will do at much larger scale across the term.

What you take away from this lesson sets up Lesson 2 (Research Questions, Theory, and the Literature), which asks you to formulate the specific research question your capstone will answer. Lessons 3 and 4 build the design side — sampling and data collection — even though for this course your data are already collected; the design vocabulary is what your methods section will be written in. Lessons 5 onward move into analysis proper.

Key Takeaways from Lesson 1

  • QDA is defined operationally: the search for patterns in non-numeric data and an explanation of why those patterns are there. Description without explanation is not analysis.
  • The qualitative/quantitative boundary is porous: the deeper distinction is the type of question being answered (magnitude vs. characterization) and what kind of access the data give to the phenomenon.
  • Four research goals organize empirical work: exploration, description, comparison, and testing models. Qualitative work dominates the first two and contributes seriously to the last two.
  • Five kinds of qualitative data: physical objects, still images, sounds, moving images, and texts. Texts dominate health research for practical and principled reasons.
  • Three methodological commitments anchor the course: systematic (specifiable, consistent, documented), transparent (data, procedure, positionality, limitations), and replicable (coherent with, not identical to, a second analyst's interpretation).
  • The HSCI 841 toolchain is R (tidyverse, quanteda, tidytext, igraph, irr) plus Taguette — both free, both open-source, both transferable beyond the course.
  • The Week 1 capstone milestone is the positionality memo: a 500-word document that names what you bring to the dataset before you start coding it, and the toolchain set-up that lets you start coding.

Core Concepts Reviewed

Section 1: The operational definition of QDA (patterns + non-numeric data + explanation); the porousness of the qualitative/quantitative boundary; the three methodological commitments (systematic, transparent, replicable); the case for qualitative work in public-health evidence (vaccine refusal, chronic illness phenomenology, implementation failure).

Section 2: The four research goals — exploration, description, comparison, model-testing; the dominance of qualitative work in exploration and description; the legitimacy of qualitative comparison and qualitative model-testing; the multi-goal character of real studies.

Section 3: The five kinds of qualitative data (physical objects, still images, sounds, moving images, texts); why texts dominate; the analytic status of transcription; the structure and provenance of the HSCI 841 capstone dataset (20 synthetic loneliness transcripts).

Section 4: Systematic, transparent, and replicable in operational terms; the four things owed to a reader for transparency; the “coherent, not identical” standard for replicability; the R toolchain (tidyverse, quanteda, tidytext, igraph, irr); Taguette for hand-coding; the Week 1 positionality memo.

The final reflection below asks you to step out of method-mode and name what you carry forward from Lesson 1 into the rest of the course. There is no single right answer; the goal is to leave the lesson with an articulated stance.

Final Reflection

You are about to begin a course that asks you to take qualitative work as seriously as you take quantitative work. In one paragraph, name one thing you are bringing to this from the prior epidemiology courses that will help, and one thing you may need to set aside in order to do this work well.

Model answer: A strong answer is specific and self-aware. What helps: the discipline of pre-specification (writing your analytic procedure before you run it) translates almost directly from quantitative epidemiology to qualitative analysis, as does the habit of declaring limitations explicitly. The methodological seriousness HSCI 230/341/410 trained (treating your method as something you must defend) is exactly what Bernard, Wutich, and Ryan are asking for. What to set aside: the instinct to reach immediately for population-level claims. Qualitative work on 20 transcripts cannot tell you what percentage of British Columbians experience loneliness; it can tell you what kinds of loneliness exist, what they look like from the inside, and what mechanisms might be at work. Letting go of the prevalence question while doing this work is the hardest unlearning. The other thing many quantitative-trained students need to set aside is the wish for an unambiguous “result”: qualitative findings come with interpretive ranges and competing readings that should be reported, not collapsed.

Final Assessment — Lesson 1: Foundations of Qualitative Data Analysis (15 Questions)

Question 1: Bernard, Wutich, and Ryan define qualitative data analysis as the search for patterns in non-numeric data and...

The three-part definition is patterns + non-numeric data + explanation. The explanation step is what separates analysis from description.

Question 2: Which of the following best characterizes the relationship between qualitative and quantitative methods, as framed by this lesson?

The lesson's stance is that the qualitative/quantitative boundary is porous in practice: counting is used inside qualitative projects, and characterization is used inside quantitative ones. The defensible distinction is between magnitude questions and characterization questions.

Question 3: The three methodological commitments Bernard, Wutich, and Ryan ask of disciplined qualitative analysis are:

The three-word manifesto is systematic, transparent, replicable — the working standards every assignment in this course will be judged against.

Question 4: Which of the four research goals does qualitative work most strongly dominate?

Exploration is where qualitative work dominates because quantitative methods generally require pre-specified variables, and exploration is the work of figuring out what those variables should be.

Question 5: A new pathogen emerges and there are no validated survey instruments to measure how people experience infection. The most appropriate first methodological move is:

Exploration is the appropriate first move when a phenomenon is under-described. Qualitative interviews give later quantitative instruments something to measure.

Question 6: The five kinds of qualitative data identified in this lesson are:

Bernard, Wutich, and Ryan organize qualitative data by its material form: physical objects, still images, sounds, moving images (video), and texts. The other options are categories of methods or stances, not data forms.

Question 7: Why do texts dominate contemporary qualitative health research?

Texts dominate for practical and principled reasons: they are cheap, shareable, computationally tractable, and most compatible with the systematic-transparent-replicable analytic procedures Bernard, Wutich, and Ryan advocate.

Question 8: Transcription — converting interview audio into a written transcript — is best characterized as:

Transcription is interpretive: the conventions adopted determine what analyses become possible. Conversation analysis requires far more detailed transcription than content analysis does.

Question 9: The HSCI 841 capstone dataset consists of:

The dataset is 20 synthetic interview transcripts plus the semi-structured guide. The synthetic nature is disclosed in the dataset note and must be acknowledged in your eventual capstone paper's methods section.

Question 10: Which statement best operationalizes “systematic” analysis in qualitative work?

Systematic means the procedure is specifiable, consistent, and iteration-friendly with full documentation. Idiosyncratic coding of “what stands out” is what Bernard, Wutich, and Ryan critique as unsystematic.

Question 11: Transparency in qualitative analysis means a reader of your paper can see explicit accounts of:

The four-part transparency obligation is: data collection, analytic procedure, positionality, and limitations. Bernard, Wutich, and Ryan expect all four in the methods section or an appendix.

Question 12: Bernard, Wutich, and Ryan's standard for replicability in qualitative work is:

“Coherent, not identical” is Bernard, Wutich, and Ryan's operational standard. It acknowledges the perspectival element of interpretation without abandoning the demand for procedural discipline.

Question 13: In the HSCI 841 toolchain, which package is the primary engine for industrial-strength text analysis used in Modules 8 and 12?

quanteda is the workhorse text-analysis package (document-feature matrices, keyness, lexical diversity, plotting). The others have narrower roles: tidyverse for general data wrangling, irr for intercoder reliability, topicmodels specifically for LDA.

Question 14: The Week 1 capstone deliverable is:

Week 1 asks for a positionality memo (your reading position before you code) and confirmation that the toolchain is installed and working. The literature review comes in Week 2; coding begins in Week 5.

Question 15: What is the best characterization of the relationship between HSCI 841 and the prior courses (HSCI 230, 341, 410)?

HSCI 841 is the qualitative companion to the quantitative methods sequence. The goal is methodological omnivory — an epidemiologist who can do credible qualitative work and a qualitative researcher who can read quantitative work, both in the same person.

Congratulations!

You have successfully completed Lesson 1: Foundations of Qualitative Data Analysis.

You can now define QDA operationally, locate it among the four research goals, recognize the five kinds of qualitative data, articulate the systematic-transparent-replicable manifesto, and operate the R + Taguette toolchain. The Week 1 positionality memo is your first piece of real qualitative work; submit it before the Module 2 lecture.

Next up is Lesson 2: Research Questions, Theory & the Literature, which asks you to formulate the specific research question your capstone will answer.

Reference

Glossary — Key Terms, People & Methodological Stances

📚 Reference page — available throughout the lesson

This glossary collects the key concepts, people, and methodological stances introduced in Lesson 1. Use it as a reference while you work through the material, or as a review before the final assessment.

Core Concepts
Qualitative Data Analysis (QDA): The systematic search for patterns in non-numeric data and the explanation of why those patterns are there (Bernard, Wutich & Ryan, 2017). Distinguished from casual interpretation by its commitment to transparency, replicability, and audit trails.
Qualitative Data: Empirical material that has not yet been reduced to numbers, namely physical objects, still images, sounds (including spoken words), video, and texts. Anything that can be turned into numbers later but is not numeric to begin with.
Four Research Goals: The Bernard/Wutich/Ryan framing of what empirical research does, comprising exploration (what is going on?), description (what does it look like?), comparison (how do groups differ?), and testing models (does this theory hold?). Qualitative work dominates the first two and contributes seriously to the last two.
Exploration: Research aimed at mapping unknown territory, that is, identifying what exists, what the categories are, and what mechanisms might be at work. Qualitative methods dominate exploration because quantitative methods need pre-specified variables.
Description: The careful characterization of a phenomenon (what it is, what it looks like, who experiences it). Bernard, Wutich, and Ryan reject the framing of description as a consolation prize; many landmark public-health contributions (Framingham, BC overdose mortality, demography) are descriptive.
Comparison: Research aimed at how a phenomenon varies across groups, settings, or conditions. Qualitative comparison is real and rigorous; grounded theory, QCA, and matrix analysis are systematic qualitative-comparison engines.
Testing Models: Research aimed at evaluating a theoretical proposition against data. Qualitative model-testing exists, with analytic induction and qualitative comparative analysis (QCA) as the main examples, but is less common than qualitative exploration or description.
The Three Commitments
Systematic: An analytic procedure is systematic when it is specifiable, consistent across cases, and iterative-with-documentation. Unsystematic analysis is what Bernard, Wutich, and Ryan critique as “noting what stood out.”
Transparent: An analysis is transparent when the reader can see four things, namely how you got the data, how you analyzed it, your positionality, and your limitations. The four belong in the methods section or an appendix, not in the discussion.
Replicable: An analysis is replicable when a competent second analyst, using your dataset, codebook, and procedure, would arrive at an interpretation coherent with yours, though not identical to it. The standard is “coherent, not identical.”
Methodological Stances
Positivism / Postpositivism: The view that social phenomena can be studied with the same methods as natural phenomena and that there is a knowable external reality. Contemporary qualitative work usually adopts postpositivism: an objective reality exists but our access to it is fallible.
Interpretivism / Constructivism: The view that social reality is co-constructed through meaning-making by participants and researchers. Common in narrative analysis, constructivist grounded theory (Charmaz, 2014), and most ethnography (Hammersley & Atkinson, 2007).
Critical / Critical Realist: Frameworks (feminist, anti-racist, decolonial, Marxist, critical-realist) that explicitly attend to power and structural inequality in both research question and analysis. Increasingly the default in public-health qualitative work.
Pragmatism: A meta-stance that picks methods to fit the question rather than committing in advance to one paradigm. Common in mixed-methods health research.
Emic vs. Etic: From anthropology. Emic = the insider's account, in the participant's own categories. Etic = the outside analyst's framework, in theory-driven categories. Most QDA moves between the two; making the move explicit is part of being transparent.
Reflexivity: The practice of examining how your own social position, assumptions, and analytic choices shape the interpretations you produce. Qualitative researchers are expected to be reflexive in writing, not merely in private.
Positionality: The specific social location (gender, race, class, profession, age, language) from which a researcher conducts and interprets research. A positionality statement is the standard expectation in contemporary qualitative health research.
Audit Trail: A documented record of analytic decisions (codebook revisions, memos, coder disagreements, sampling adjustments) that another reader could follow to reconstruct how you got from raw data to conclusions.
Key People
H. Russell Bernard, Amber Wutich, Gery W. Ryan: Authors of Analyzing Qualitative Data: Systematic Approaches (2nd ed., 2017). Bernard is a foundational figure in cultural anthropology and research methods; Wutich and Ryan work in applied anthropology and health research at Arizona State University and the RAND Corporation respectively. Their stance, that qualitative analysis can and should be as disciplined as quantitative analysis, defines the course.
John T. Cacioppo (1951–2018): Social neuroscientist who, with Stephanie Cacioppo and Louise Hawkley, established loneliness as a serious public-health and biomedical concern. His Loneliness: Human Nature and the Need for Social Connection (with William Patrick, 2008) is the foundational reference for the empirical study of loneliness.
Kathy Charmaz (1939–2020): Medical sociologist who developed constructivist grounded theory (Charmaz, 2014), the most widely used qualitative-methodology variant in contemporary health research. You will meet her again in Module 7.